API Reference


Extract and populate metadata from a single MD engine log file, validated against the biosim-schema. Preserves canonical casing from schema mappings.

class biosim_extractor.metadata.populatemetadata.MetadataPopulator(schema_path=None, log_file=None, engine=None, top_file=None, traj_file=None, store_file_metadata=True)[source]

Bases: object

Orchestrates extraction of MD engine metadata and population of metadata validated against the biosim schema.

Supports log-file-based engines (Amber, GROMACS) and topology/trajectory parsing via MDAnalysis.

apply_mapping() → Dict[source]

Apply mapping rules to engine data to produce schema-compliant output.

Returns:

Result dictionary with mapped schema values applied.

convert_values(value, term, is_vector=False)[source]

Convert a raw value (or list) to a unit-annotated schema dictionary.

Args:

value: Numeric value or list of values to convert.

term: Forward-mapping entry containing "unit" and "key".

is_vector: If True, stores the result under "vector_value" instead of "value".

Returns:

Dictionary with "value" (or "vector_value") and "value_unit" keys.
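The return shape described above can be illustrated with a small standalone sketch. It is not the method's actual implementation: the function name `to_unit_dict` and the sample term are hypothetical, and only the "unit" entry of the term is used here.

```python
from typing import Any, Dict


def to_unit_dict(value: Any, term: Dict[str, str], is_vector: bool = False) -> Dict[str, Any]:
    """Hypothetical stand-in for convert_values: wrap a raw value with its unit."""
    # Vectors are stored under "vector_value", scalars under "value".
    key = "vector_value" if is_vector else "value"
    return {key: value, "value_unit": term["unit"]}


# Sample forward-mapping entry (illustrative, not taken from the real schema).
term = {"unit": "ps", "key": "SimulationMetadata.timestep"}
scalar = to_unit_dict(0.002, term)
vector = to_unit_dict([1.0, 2.0, 3.0], term, is_vector=True)
```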

load_schema()[source]

Load and parse the extraction schema JSON from self.schema_path.

parse_log()[source]

Parse the MD engine log file and return a flattened parameter dictionary.

Returns:

Flat dictionary of parameter names to raw values.

Raises:

ValueError: If self.engine is not a supported engine.

populate()[source]

Run the full extraction and mapping pipeline.

Returns:

Populated SimulationMetadata dictionary with None-containing entries removed.

populate_toptraj()[source]

Parse topology and trajectory files and apply schema mapping.

Returns:

Schema-mapped result dictionary, or None if topology/trajectory files are not set.

validate(result, biosimschema_path=None, strict=False)[source]

Validate populated metadata against the biosim schema.

Args:

result: Populated metadata dictionary to validate.

biosimschema_path: Optional path to the biosim schema YAML.

strict: If True, raise on warnings in addition to errors.

biosim_extractor.metadata.populatemetadata.add_to_path(d: Dict, path: str, value: Any)[source]

Append a value to a list in a nested dict at a dot-separated path.

Args:

d: Dictionary to modify in place.

path: Dot-separated key path pointing to an existing list.

value: Value to append.

biosim_extractor.metadata.populatemetadata.assign_by_path(d: Dict, path: str, value: Any)[source]

Set a value in a nested dict at a dot-separated path, creating intermediate dicts as needed.

Args:

d: Dictionary to modify in place.

path: Dot-separated key path.

value: Value to assign at the final key.
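A minimal sketch of the dot-path assignment described above, written as an independent re-implementation rather than the library's own code. The name `assign_by_path_sketch` is hypothetical, and it assumes every intermediate value along the path is either missing or already a dict.

```python
from typing import Any, Dict


def assign_by_path_sketch(d: Dict, path: str, value: Any) -> None:
    """Set d[k1][k2]...[kn] = value for path "k1.k2...kn", creating dicts on the way."""
    keys = path.split(".")
    node = d
    for key in keys[:-1]:
        # setdefault creates the intermediate dict only when it is missing.
        node = node.setdefault(key, {})
    node[keys[-1]] = value


meta: Dict = {}
assign_by_path_sketch(meta, "SimulationMetadata.timestep.value", 0.002)
```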

biosim_extractor.metadata.populatemetadata.flatten_dict(d: Dict) → Dict[source]

Recursively flatten a nested dict, keeping the first occurrence of duplicate keys.

Args:

d: Nested dictionary to flatten.

Returns:

Single-level dictionary with all leaf key-value pairs.
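The "first occurrence wins" behaviour can be sketched as follows. This is an illustrative re-implementation under the assumption that "first" means first in depth-first insertion order; the name `flatten_first_wins` is hypothetical.

```python
from typing import Any, Dict


def flatten_first_wins(d: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten nested dicts to leaf key/value pairs, keeping the first value seen per key."""
    flat: Dict[str, Any] = {}
    for key, value in d.items():
        if isinstance(value, dict):
            for sub_key, sub_value in flatten_first_wins(value).items():
                # setdefault leaves an already stored key untouched.
                flat.setdefault(sub_key, sub_value)
        else:
            flat.setdefault(key, value)
    return flat


nested = {"a": 1, "group": {"a": 99, "b": 2}, "other": {"b": 3, "c": 4}}
flat = flatten_first_wins(nested)
```

Here the later values `99` and `3` are discarded because `a` and `b` were already seen.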

biosim_extractor.metadata.populatemetadata.get_by_path(d: Dict, path: str)[source]

Retrieve a value from a nested dict using a dot-separated path.

Args:

d: Dictionary to traverse.

path: Dot-separated key path, e.g. "SimulationMetadata.timestep".

Returns:

Value at the path, or None if any key is missing.
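A minimal sketch of the lookup, assuming that a non-dict value met before the end of the path is treated the same as a missing key. The name `get_by_path_sketch` is hypothetical and this is not the library's own code.

```python
from typing import Any, Dict, Optional


def get_by_path_sketch(d: Dict, path: str) -> Optional[Any]:
    """Walk a dot-separated path through nested dicts; return None on any miss."""
    node: Any = d
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


meta = {"SimulationMetadata": {"timestep": {"value": 0.002, "value_unit": "ps"}}}
found = get_by_path_sketch(meta, "SimulationMetadata.timestep.value")
missing = get_by_path_sketch(meta, "SimulationMetadata.temperature")
```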

biosim_extractor.metadata.populatemetadata.is_numeric(value)[source]

Check whether a value can be interpreted as a float.

Args:

value: Value to test.

Returns:

True if float(value) succeeds, False otherwise.
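The check described above amounts to attempting the conversion and catching the failure. A sketch with the hypothetical name `is_numeric_sketch`:

```python
def is_numeric_sketch(value) -> bool:
    """Return True when float(value) succeeds."""
    try:
        float(value)
    except (TypeError, ValueError):
        # TypeError covers None and containers; ValueError covers non-numeric strings.
        return False
    return True


checks = {
    "1.5e-3": is_numeric_sketch("1.5e-3"),
    "abc": is_numeric_sketch("abc"),
    "None": is_numeric_sketch(None),
}
```

Note that under this definition strings such as "nan" and "inf" also count as numeric, because `float` accepts them.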

biosim_extractor.metadata.populatemetadata.main()[source]

Entry point: parse args, resolve schema sources, run pipeline, validate, write output.

biosim_extractor.metadata.populatemetadata.normalize_key(value: Any) → str[source]

Normalise a value to lowercase stripped string for case-insensitive matching.

Args:

value: Value to normalise.

Returns:

Lowercased, stripped string representation.

biosim_extractor.metadata.populatemetadata.parse_args()[source]

Parse command-line arguments.

Returns:

Parsed argparse.Namespace object.

biosim_extractor.metadata.populatemetadata.remove_null_parents(d)[source]

Recursively remove any dict that contains a None value.

Args:

d: Dictionary to clean.

Returns:

Cleaned dictionary with None-containing dicts removed, or None if the top-level dict itself contains a None value.
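One way to read the rule above: a dict holding a None value is dropped as a whole, and the check is applied recursively to nested dicts. The sketch below follows that reading; the name `remove_null_parents_sketch` is hypothetical, and leaving lists untouched is an assumption.

```python
from typing import Any, Dict, Optional


def remove_null_parents_sketch(d: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Drop every dict that directly contains a None value."""
    if any(value is None for value in d.values()):
        return None
    cleaned: Dict[str, Any] = {}
    for key, value in d.items():
        if isinstance(value, dict):
            child = remove_null_parents_sketch(value)
            # A child that came back as None contained a null, so its key is omitted.
            if child is not None:
                cleaned[key] = child
        else:
            cleaned[key] = value
    return cleaned


raw = {
    "timestep": {"value": 0.002, "value_unit": "ps"},
    "pressure": {"value": None, "value_unit": "bar"},
    "engine": "GROMACS",
}
cleaned = remove_null_parents_sketch(raw)
```

The `pressure` entry disappears entirely, unit included, because its dict contains a None.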

biosim_extractor.metadata.populatemetadata.resolve_schema_inputs(args)[source]

Resolve mapping and biosim schema paths from args or remote schema bundle.

If either path argument (mappingschema, biosimschema) is missing, the function fetches a bundled schema release, updating it if requested. This gives downstream processing valid JSON/YAML sources without manual cache setup.

biosim_extractor.metadata.populatemetadata.transform_value(value: Any, rules: Dict)[source]

Map a raw engine value to its canonical schema equivalent using a rules dict.

Args:

value: Raw value from the engine data.

rules: Mapping of raw keys to canonical values (empty dict skips mapping).

Returns:

Canonical mapped value, or None if the value has no matching rule.
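The mapping step can be sketched as a case-insensitive lookup. Two points here are assumptions rather than documented behaviour: that matching goes through a normalisation like `normalize_key`, and that "skips mapping" means the value is returned unchanged. The function names and the sample rules are hypothetical.

```python
from typing import Any, Dict, Optional


def normalize_key_sketch(value: Any) -> str:
    """Lowercase, stripped string form used for matching."""
    return str(value).strip().lower()


def transform_value_sketch(value: Any, rules: Dict[str, Any]) -> Optional[Any]:
    """Map a raw value to its canonical form; None when no rule matches."""
    if not rules:
        # Assumed meaning of "empty dict skips mapping": pass the value through.
        return value
    lookup = {normalize_key_sketch(raw): canonical for raw, canonical in rules.items()}
    return lookup.get(normalize_key_sketch(value))


# Illustrative rules only; the canonical casing comes from the rule values.
rules = {"berendsen": "Berendsen", "v-rescale": "V-rescale"}
mapped = transform_value_sketch("  BERENDSEN ", rules)
unmapped = transform_value_sketch("nose-hoover", rules)
```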

Validation utilities for extracted MD simulation metadata against the biosim LinkML schema.

biosim_extractor.metadata.validatemetadata.extract_schema_version(schema_path)[source]

Extract the biosim schema version from a local schema YAML file.

Args:

schema_path (str | Path): Path to the biosim schema YAML file.

Returns:

str | None: Parsed schema version string, or None if the path is missing, points to a URL, the file does not exist, or no version field is found.
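The four None cases listed above can be sketched with the standard library alone. This is not the library's implementation: it assumes the version lives in a top-level `version:` key and reads it with a regular expression instead of a YAML parser. The function name, the sample file contents, and the example URL are hypothetical.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional, Union


def extract_version_sketch(schema_path: Union[str, Path, None]) -> Optional[str]:
    """Return the top-level version string of a local schema file, or None."""
    if not schema_path:
        return None  # path missing
    if str(schema_path).startswith(("http://", "https://")):
        return None  # URLs are not read
    path = Path(schema_path)
    if not path.is_file():
        return None  # file does not exist
    text = path.read_text(encoding="utf-8")
    # Assumed layout: an unindented "version: <value>" line, optionally quoted.
    match = re.search(r"^version:[ \t]*['\"]?([^'\"\s#]+)", text, re.MULTILINE)
    return match.group(1) if match else None


with tempfile.TemporaryDirectory() as tmp:
    schema_file = Path(tmp) / "biosim_schema.yaml"
    schema_file.write_text("id: example\nversion: 1.2.3\n", encoding="utf-8")
    local_version = extract_version_sketch(schema_file)
    absent_version = extract_version_sketch(Path(tmp) / "absent.yaml")

url_version = extract_version_sketch("https://example.org/biosim_schema.yaml")
```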

biosim_extractor.metadata.validatemetadata.validate_extracted(instance, schema_path)[source]

Validate extracted MD simulation metadata against the biosim LinkML schema.

Uses a two-pass strategy to work around LinkML’s JSON-Schema compiler not supporting nested array (matrix) constraints:

  1. Custom pass: checks every vector_value field for correct numeric types and (for matrices) consistent row lengths.

  2. LinkML pass: matrix vector_value fields are stripped and the remainder is validated with linkml.validator.validate, which enforces types, enums, required fields, and cardinality on flat vectors.

The working directory is temporarily changed to the directory containing schema_path so that relative $import paths inside the schema resolve correctly.

Args:

instance: Extracted metadata dict conforming to the SimulationMetadata class.

schema_path: Path to the top-level biosim_schema.yaml file, or a raw GitHub URL (https://raw.githubusercontent.com/…).

Returns:

list: Validation error messages. An empty list means the instance is valid.
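The custom pass in step 1 can be sketched as a recursive walk that inspects every vector_value field. This is an independent illustration, not the library's code: the function names and error wording are hypothetical, and treating a list of lists as a matrix is an assumption.

```python
from numbers import Real
from typing import Any, List


def _is_number(x: Any) -> bool:
    # bool is a subclass of int, so it is excluded explicitly.
    return isinstance(x, Real) and not isinstance(x, bool)


def _check_one(value: Any, path: str) -> List[str]:
    """Check a single vector_value: numeric entries, and equal row lengths for matrices."""
    if not isinstance(value, list):
        return [f"{path}: expected a list"]
    is_matrix = bool(value) and all(isinstance(row, list) for row in value)
    if is_matrix:
        errors = []
        if len({len(row) for row in value}) > 1:
            errors.append(f"{path}: rows have inconsistent lengths")
        if not all(_is_number(x) for row in value for x in row):
            errors.append(f"{path}: non-numeric entry")
        return errors
    if not all(_is_number(x) for x in value):
        return [f"{path}: non-numeric entry"]
    return []


def check_vector_values(node: Any, path: str = "") -> List[str]:
    """Collect error messages for every vector_value found under node."""
    errors: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}" if path else key
            if key == "vector_value":
                errors.extend(_check_one(value, child_path))
            else:
                errors.extend(check_vector_values(value, child_path))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            errors.extend(check_vector_values(item, f"{path}[{i}]"))
    return errors


good = {"box": {"vector_value": [[1.0, 0.0], [0.0, 1.0]], "value_unit": "nm"}}
bad = {"box": {"vector_value": [[1.0, 0.0], [0.0]], "value_unit": "nm"}}
good_errors = check_vector_values(good)
bad_errors = check_vector_values(bad)
```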

biosim_extractor.metadata.validatemetadata.validate_metadata(result, biosimschema_path=None, strict=False)[source]

Validate a populated metadata dict, optionally against a biosim schema.

Args:

result: Populated metadata dictionary to validate.

biosimschema_path: Path or URL to the biosim schema YAML. If None, validation is skipped.

strict: If True, raises ValueError on validation errors; otherwise emits a warning.

Raises:

ValueError: If strict=True and validation errors are found.
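The strict/non-strict split described above can be sketched as a small reporting helper. The name `report_errors` and the message format are hypothetical; the real function may word its warning differently.

```python
import warnings
from typing import List


def report_errors(errors: List[str], strict: bool = False) -> None:
    """Raise on validation errors when strict, otherwise emit a warning."""
    if not errors:
        return
    message = "Metadata validation failed:\n" + "\n".join(errors)
    if strict:
        raise ValueError(message)
    warnings.warn(message)


# Non-strict mode: the problem is reported but execution continues.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    report_errors(["timestep: missing value_unit"], strict=False)
warning_count = len(caught)
```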

biosim_extractor.metadata.convertpopulated.convert_populated_metadata_units(metadata: dict) → dict[source]

Recursively convert all value/value_unit and vector_value/value_unit pairs in a metadata dict to standard units.

Args:

metadata (dict): The metadata dictionary to process. This should contain nested dictionaries where physical quantities are represented as {"value": …, "value_unit": …} or {"vector_value": …, "value_unit": …}.

Returns:

dict: A new metadata dictionary with all values converted to standard units as defined by UnitConverter. The structure of the input is preserved.

Raises:

ValueError: If a unit is unknown or conversion fails.
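The recursive walk can be sketched with the converter passed in as a plain callable, so the example runs without the package. Everything named here is hypothetical: `convert_units_sketch`, the `toy_convert` helper, and the choice of picoseconds as the standard time unit are assumptions for illustration, not the behaviour of UnitConverter.

```python
from typing import Any, Callable, Tuple

Converter = Callable[[Any, str], Tuple[Any, str]]


def convert_units_sketch(node: Any, convert: Converter) -> Any:
    """Return a converted copy of node; the input is not modified."""
    if isinstance(node, dict):
        if "value_unit" in node:
            for key in ("value", "vector_value"):
                if key in node:
                    new_value, new_unit = convert(node[key], node["value_unit"])
                    # Copy the remaining keys so extra fields on the quantity survive.
                    return {**node, key: new_value, "value_unit": new_unit}
        return {k: convert_units_sketch(v, convert) for k, v in node.items()}
    if isinstance(node, list):
        return [convert_units_sketch(item, convert) for item in node]
    return node


def toy_convert(value: Any, unit: str) -> Tuple[Any, str]:
    """Toy converter: femtoseconds to picoseconds only; other units are unknown."""
    if unit == "ps":
        return value, "ps"
    if unit == "fs":
        if isinstance(value, list):
            return [x / 1000 for x in value], "ps"
        return value / 1000, "ps"
    raise ValueError(f"Unknown unit: {unit}")


original = {"SimulationMetadata": {"timestep": {"value": 2.0, "value_unit": "fs"}}}
converted = convert_units_sketch(original, toy_convert)
```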

Extract topology and trajectory metadata using MDAnalysis.

class biosim_extractor.mdanalysis.toptraj.TopTrajParser(toppath, trajpath)[source]

Bases: object

Parse topology and trajectory files to extract system and molecule metadata.

parse()[source]

Extract system-level metadata and molecule information.

Returns:

dict: Extracted metadata.

biosim_extractor.mdanalysis.toptraj.classify_box(dim, tolerance=0.001)[source]

Classify the simulation box type based on dimensions and angles.

Args:

dim (list or tuple): Box dimensions [lx, ly, lz, a, b, g].

tolerance (float): Tolerance for angle/length comparison.

Returns:

str: Box type (e.g., “cubic”, “tetragonal”, “orthorhombic”, etc.).
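A simplified sketch of how such a classification might work for the three box types named above. It is not the library's algorithm: it assumes angles in degrees, applies the same absolute tolerance to lengths and angles, and uses "triclinic" as a hypothetical catch-all label for any box with a non-right angle.

```python
from typing import Sequence


def classify_box_sketch(dim: Sequence[float], tolerance: float = 0.001) -> str:
    """Classify a box given [lx, ly, lz, alpha, beta, gamma]."""
    lx, ly, lz, alpha, beta, gamma = dim

    def close(a: float, b: float) -> bool:
        return abs(a - b) <= tolerance

    if not all(close(angle, 90.0) for angle in (alpha, beta, gamma)):
        return "triclinic"  # assumed catch-all; the real function may return finer labels
    equal_pairs = sum([close(lx, ly), close(ly, lz), close(lx, lz)])
    if equal_pairs == 3:
        return "cubic"
    if equal_pairs >= 1:
        return "tetragonal"
    return "orthorhombic"


box_types = [
    classify_box_sketch([50.0, 50.0, 50.0, 90.0, 90.0, 90.0]),
    classify_box_sketch([50.0, 50.0, 80.0, 90.0, 90.0, 90.0]),
    classify_box_sketch([40.0, 50.0, 60.0, 90.0, 90.0, 90.0]),
    classify_box_sketch([50.0, 50.0, 50.0, 60.0, 60.0, 90.0]),
]
```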

biosim_extractor.mdanalysis.toptraj.get_nucleic_sequence(fragment)[source]

Extract the nucleic acid sequence from a molecule fragment.

Args:

fragment (MDAnalysis.AtomGroup): Molecule fragment to analyze.

Returns:

str or None: Nucleic acid sequence as a string, or None if not found.

biosim_extractor.mdanalysis.toptraj.get_protein_sequence(fragment)[source]

Extract the protein sequence from a molecule fragment.

Args:

fragment (MDAnalysis.AtomGroup): Molecule fragment to analyze.

Returns:

str or None: Protein sequence as a string, or None if not found.

biosim_extractor.mdanalysis.toptraj.main()[source]

Entry point: parse args, run extraction, and write output.

biosim_extractor.mdanalysis.toptraj.parse_args()[source]

Parse command-line arguments.

Returns:

Parsed argparse.Namespace object.

biosim_extractor.mdanalysis.toptraj.safe_extract(func)[source]

Safely extract and convert values from a function, handling numpy types.

Args:

func (callable): Function to call.

Returns:

Any: Extracted and converted value.
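A possible shape for such a wrapper is sketched below. Two behaviours are assumptions: that a failing call yields None, and that numpy values are converted through their `tolist()` method, which numpy scalars and arrays both provide. The name `safe_extract_sketch` is hypothetical, and the duck-typed check keeps the sketch free of a numpy import.

```python
from typing import Any, Callable


def safe_extract_sketch(func: Callable[[], Any]) -> Any:
    """Call func and return a plain-Python value, or None if the call fails."""
    try:
        value = func()
    except Exception:
        # Assumed fallback: treat any extraction failure as "no value".
        return None
    if hasattr(value, "tolist"):
        # numpy scalars and arrays expose tolist(), which yields built-in types.
        return value.tolist()
    return value


ok_value = safe_extract_sketch(lambda: 3)
failed_value = safe_extract_sketch(lambda: 1 / 0)
```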

Extract gmx log file metadata into a dictionary.

class biosim_extractor.gromacs.gromacslog.GromacsLogParser(filepath)[source]

Bases: object

Parser for GROMACS .log files, extracting header, input parameters, summary, and averages.

parse()[source]

Parse the log file and return all extracted data.

Returns:

Dictionary containing header fields, input parameters, summary, and averages.

biosim_extractor.gromacs.gromacslog.main()[source]

Entry point: parse args, run extraction, and write output.

biosim_extractor.gromacs.gromacslog.parse_args()[source]

Parse command-line arguments.

Returns:

Parsed argparse.Namespace object.

Extract AMBER log file metadata into a structured dictionary.

This script parses AMBER log files and outputs structured metadata as JSON. It can be used as a standalone CLI tool or imported as a module.

class biosim_extractor.amber.amberlog.AmberLogParser(filepath)[source]

Bases: object

Parser for AMBER log files.

parse()[source]

Parse the AMBER log file.

Returns:

dict: Parsed metadata.

biosim_extractor.amber.amberlog.main()[source]

Entry point: parse args, run extraction, and write output.

biosim_extractor.amber.amberlog.parse_args()[source]

Parse command-line arguments.

Returns:

Parsed argparse.Namespace object.

Convert units from various MD engine outputs to a consistent standard unit system.

This module provides the UnitConverter class for converting scientific values between different units, standardizing to a chosen system (default: SI-like).

class biosim_extractor.units.unitconversion.UnitConverter(standard_units: Dict[str, str] | None = None)[source]

Bases: object

Simple unit conversion class for scientific calculations. Converts values to a chosen standard unit system.

convert(value: float | List[float], from_unit: str, unit_type: str | None = None, decimals: int | None = None) → float | List[float][source]

Convert a value from one unit to the standard unit.

Args:

value (float or list): Value(s) to convert.

from_unit (str): The original unit.

unit_type (str, optional): The unit type (auto-detected if None).

Returns:

float or list: Converted value(s) in standard unit.

Raises:

ValueError: If unit or unit type is unknown.
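The conversion can be pictured as a table lookup followed by a multiplication. The sketch below is a standalone illustration, not the class itself: the unit tables, the choice of nm and ps as standard units, and the reading of `decimals` as a rounding precision are all assumptions.

```python
from typing import Dict, List, Optional, Union

Number = Union[float, List[float]]

# Illustrative tables only; the real class defines its own units and factors.
UNIT_TYPES: Dict[str, str] = {"nm": "length", "angstrom": "length", "ps": "time", "fs": "time"}
TO_STANDARD: Dict[str, float] = {"nm": 1.0, "angstrom": 0.1, "ps": 1.0, "fs": 0.001}
STANDARD: Dict[str, str] = {"length": "nm", "time": "ps"}


def convert_sketch(value: Number, from_unit: str, decimals: Optional[int] = None) -> Number:
    """Convert a scalar or list to the (assumed) standard unit of its type."""
    if from_unit not in UNIT_TYPES:
        raise ValueError(f"Unknown unit: {from_unit}")
    factor = TO_STANDARD[from_unit]

    def one(x: float) -> float:
        result = x * factor
        # Assumed meaning of decimals: round only when a precision is given.
        return round(result, decimals) if decimals is not None else result

    return [one(x) for x in value] if isinstance(value, list) else one(value)


def target_unit_sketch(from_unit: str) -> str:
    """Look up the standard unit for the type that from_unit belongs to."""
    if from_unit not in UNIT_TYPES:
        raise ValueError(f"Unknown unit: {from_unit}")
    return STANDARD[UNIT_TYPES[from_unit]]


length_nm = convert_sketch(25.0, "angstrom", decimals=3)
times_ps = convert_sketch([1000.0, 2000.0], "fs", decimals=6)
```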

convert_with_unit(value: float | List[float], from_unit: str, unit_type: str | None = None) → Tuple[float | List[float], str][source]

Convert a value and return both the value and the target unit.

Args:

value (float or list): Value(s) to convert.

from_unit (str): The original unit.

unit_type (str, optional): The unit type (auto-detected if None).

Returns:

tuple: (converted value(s), target unit)

Raises:

ValueError: If unit or unit type is unknown.

get_target_unit(from_unit: str) → str[source]

Get the standard unit for a given unit.

Args:

from_unit (str): The original unit.

Returns:

str: The standard unit.

Raises:

ValueError: If unit is unknown.

get_unit_type(unit: str) → str | None[source]

Get the unit type for a given unit string.

Args:

unit (str): The unit string.

Returns:

str or None: The unit type, or None if unknown.

is_standard_unit(unit: str) → bool[source]

Check if a unit is the standard for its type.

Args:

unit (str): The unit string.

Returns:

bool: True if unit is standard, False otherwise.

needs_conversion(from_unit: str) → bool[source]

Determine if a unit needs conversion to standard.

Args:

from_unit (str): The unit string.

Returns:

bool: True if conversion is needed, False if already standard.