API Reference¶
Extract and populate metadata from a single MD engine log file, validated against the biosim-schema. Preserves canonical casing from schema mappings.
- class biosim_extractor.metadata.populatemetadata.MetadataPopulator(schema_path=None, log_file=None, engine=None, top_file=None, traj_file=None, store_file_metadata=True)[source]¶
Bases: object
Orchestrates extraction of MD engine metadata and population of metadata validated against the biosim schema.
Supports log-file-based engines (Amber, GROMACS) and topology/trajectory parsing via MDAnalysis.
- apply_mapping() → Dict[source]¶
Apply mapping rules to engine data to produce schema-compliant output.
- Returns:
Result dictionary with mapped schema values applied.
- convert_values(value, term, is_vector=False)[source]¶
Convert a raw value (or list) to a unit-annotated schema dictionary.
- Args:
value: Numeric value or list of values to convert.
term: Forward-mapping entry containing "unit" and "key".
is_vector: If True, stores the result under "vector_value" instead of "value".
- Returns:
Dictionary with "value" (or "vector_value") and "value_unit" keys.
- parse_log()[source]¶
Parse the MD engine log file and return a flattened parameter dictionary.
- Returns:
Flat dictionary of parameter names to raw values.
- Raises:
ValueError: If self.engine is not a supported engine.
- populate()[source]¶
Run the full extraction and mapping pipeline.
- Returns:
Populated SimulationMetadata dictionary with None-containing entries removed.
- biosim_extractor.metadata.populatemetadata.add_to_path(d: Dict, path: str, value: Any)[source]¶
Append a value to a list in a nested dict at a dot-separated path.
- Args:
d: Dictionary to modify in place.
path: Dot-separated key path pointing to an existing list.
value: Value to append.
- biosim_extractor.metadata.populatemetadata.assign_by_path(d: Dict, path: str, value: Any)[source]¶
Set a value in a nested dict at a dot-separated path, creating intermediate dicts as needed.
- Args:
d: Dictionary to modify in place.
path: Dot-separated key path.
value: Value to assign at the final key.
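The two path-writing helpers above can be pictured with a small standalone sketch. This is not the package's code: the `_sketch` function names and the `molecules` key are hypothetical, and error handling for missing paths is an assumption.

```python
from typing import Any, Dict


def assign_by_path_sketch(d: Dict, path: str, value: Any) -> None:
    """Set a value at a dot-separated path, creating intermediate dicts."""
    keys = path.split(".")
    node = d
    for key in keys[:-1]:
        # Create the intermediate dict if it is not there yet.
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def add_to_path_sketch(d: Dict, path: str, value: Any) -> None:
    """Append a value to the list that already exists at the path."""
    node = d
    for key in path.split("."):
        # Assumption: a missing key simply raises KeyError.
        node = node[key]
    node.append(value)


meta: Dict = {}
assign_by_path_sketch(meta, "SimulationMetadata.timestep", 0.002)
assign_by_path_sketch(meta, "SimulationMetadata.molecules", [])  # hypothetical key
add_to_path_sketch(meta, "SimulationMetadata.molecules", "protein")
```

Both helpers modify the dictionary in place, so `meta` ends up holding the nested structure without any return value being used.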
- biosim_extractor.metadata.populatemetadata.flatten_dict(d: Dict) → Dict[source]¶
Recursively flatten a nested dict, keeping the first occurrence of duplicate keys.
- Args:
d: Nested dictionary to flatten.
- Returns:
Single-level dictionary with all leaf key-value pairs.
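One way to read "keeping the first occurrence of duplicate keys" is shown in the sketch below. It assumes a depth-first walk in insertion order; the function name and the sample parameter groups are made up for illustration and may not match the real traversal order.

```python
from typing import Any, Dict


def flatten_dict_sketch(d: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten nested dicts into one level, first occurrence of a key wins."""
    flat: Dict[str, Any] = {}
    for key, value in d.items():
        if isinstance(value, dict):
            # Recurse, then merge without overwriting keys seen earlier.
            for sub_key, sub_value in flatten_dict_sketch(value).items():
                flat.setdefault(sub_key, sub_value)
        else:
            flat.setdefault(key, value)
    return flat


# Made-up nested log data with a duplicated leaf key ("dt").
nested = {"control": {"dt": 0.002, "nstlim": 1000}, "output": {"dt": 0.004}}
flat = flatten_dict_sketch(nested)
```

Here the second `dt` is discarded because the key was already recorded while flattening the first group.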
- biosim_extractor.metadata.populatemetadata.get_by_path(d: Dict, path: str)[source]¶
Retrieve a value from a nested dict using a dot-separated path.
- Args:
d: Dictionary to traverse.
path: Dot-separated key path, e.g. "SimulationMetadata.timestep".
- Returns:
Value at the path, or None if any key is missing.
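A minimal standalone sketch of this lookup, assuming that a non-dict value encountered partway along the path also counts as "missing". The function name is hypothetical and this is not the package's implementation.

```python
from typing import Any, Dict


def get_by_path_sketch(d: Dict, path: str) -> Any:
    """Walk a dot-separated path and return None as soon as a key is missing."""
    node: Any = d
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


data = {"SimulationMetadata": {"timestep": {"value": 0.002, "value_unit": "ps"}}}
found = get_by_path_sketch(data, "SimulationMetadata.timestep")
missing = get_by_path_sketch(data, "SimulationMetadata.thermostat.name")
```

Returning `None` instead of raising keeps call sites simple, at the cost of not distinguishing a missing key from a stored `None`.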
- biosim_extractor.metadata.populatemetadata.is_numeric(value)[source]¶
Check whether a value can be interpreted as a float.
- Args:
value: Value to test.
- Returns:
True if float(value) succeeds, False otherwise.
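The check described above amounts to attempting the conversion and catching the failure. The sketch below is an illustration under that reading; the function name is hypothetical.

```python
from typing import Any


def is_numeric_sketch(value: Any) -> bool:
    """Return True when float(value) succeeds."""
    try:
        float(value)
    except (TypeError, ValueError):
        # TypeError covers None and containers, ValueError covers bad strings.
        return False
    return True


checks = {
    "2.5": is_numeric_sketch("2.5"),
    "1e-3": is_numeric_sketch("1e-3"),
    "abc": is_numeric_sketch("abc"),
    "None": is_numeric_sketch(None),
}
```

Note that a test of this kind also accepts strings such as "nan" and "inf", because `float()` parses them.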
- biosim_extractor.metadata.populatemetadata.main()[source]¶
Entry point: parse args, resolve schema sources, run pipeline, validate, write output.
- biosim_extractor.metadata.populatemetadata.normalize_key(value: Any) → str[source]¶
Normalise a value to lowercase stripped string for case-insensitive matching.
- Args:
value: Value to normalise.
- Returns:
Lowercased, stripped string representation.
- biosim_extractor.metadata.populatemetadata.parse_args()[source]¶
Parse command-line arguments.
- Returns:
Parsed argparse.Namespace object.
- biosim_extractor.metadata.populatemetadata.remove_null_parents(d)[source]¶
Recursively remove any dict that contains a None value.
- Args:
d: Dictionary to clean.
- Returns:
Cleaned dictionary with None-containing dicts removed, or None if the top-level dict itself contains a None value.
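The pruning rule can be sketched as follows. This is an illustrative reading, not the package's code: the function name is hypothetical, and the sketch assumes that lists and other non-dict values are passed through untouched.

```python
from typing import Any, Dict, Optional


def remove_null_parents_sketch(d: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Drop every dict that directly holds a None value."""
    if any(value is None for value in d.values()):
        # The dict itself is "null-containing", so the caller drops it.
        return None
    cleaned: Dict[str, Any] = {}
    for key, value in d.items():
        if isinstance(value, dict):
            child = remove_null_parents_sketch(value)
            if child is not None:
                cleaned[key] = child
        else:
            cleaned[key] = value
    return cleaned


raw = {
    "timestep": {"value": 0.002, "value_unit": "ps"},
    "temperature": {"value": None, "value_unit": "K"},
}
cleaned = remove_null_parents_sketch(raw)
```

In this reading, a removed child does not make its parent "null-containing"; the parent just loses that key.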
- biosim_extractor.metadata.populatemetadata.resolve_schema_inputs(args)[source]¶
Resolve mapping and biosim schema paths from args or remote schema bundle.
If either path argument (mappingschema, biosimschema) is missing, the function fetches a bundled schema release, optionally updating it if requested. This ensures downstream processing has valid JSON/YAML sources without requiring manual caching setup.
- biosim_extractor.metadata.populatemetadata.transform_value(value: Any, rules: Dict)[source]¶
Map a raw engine value to its canonical schema equivalent using a rules dict.
- Args:
value: Raw value from the engine data.
rules: Mapping of raw keys to canonical values (empty dict skips mapping).
- Returns:
Canonical mapped value, or None if the value has no matching rule.
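The sketch below shows one plausible way such a rule lookup could combine with a key normaliser. Two points are assumptions rather than documented behaviour: that matching is case-insensitive via normalised keys, and that "skips mapping" means the value is returned unchanged. The function names and the thermostat rules are hypothetical.

```python
from typing import Any, Dict


def normalize_key_sketch(value: Any) -> str:
    """Lowercase, stripped string form used for matching."""
    return str(value).strip().lower()


def transform_value_sketch(value: Any, rules: Dict[str, Any]) -> Any:
    """Map a raw value to its canonical form, or None when no rule matches."""
    if not rules:
        # Assumption: an empty rules dict means "no mapping", value passes through.
        return value
    lookup = {normalize_key_sketch(k): v for k, v in rules.items()}
    return lookup.get(normalize_key_sketch(value))


# Hypothetical rules: raw engine spellings mapped to canonical schema casing.
rules = {"berendsen": "Berendsen", "nose-hoover": "Nose-Hoover"}
mapped = transform_value_sketch("  BERENDSEN ", rules)
unmatched = transform_value_sketch("langevin", rules)
untouched = transform_value_sketch("langevin", {})
```

Only the lookup key is normalised; the canonical value on the right-hand side keeps its original casing.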
Validation utilities for extracted MD simulation metadata against the biosim LinkML schema.
- biosim_extractor.metadata.validatemetadata.extract_schema_version(schema_path)[source]¶
Extract the biosim schema version from a local schema YAML file.
Args: schema_path (str | Path): Path to the biosim schema YAML file.
Returns: str | None: Parsed schema version string, or None if the path is missing, points to a URL, the file does not exist, or no version field is found.
- biosim_extractor.metadata.validatemetadata.validate_extracted(instance, schema_path)[source]¶
Validate extracted MD simulation metadata against the biosim LinkML schema.
Uses a two-pass strategy to work around LinkML’s JSON-Schema compiler not supporting nested array (matrix) constraints:
Custom pass: checks every vector_value field for correct numeric types and (for matrices) consistent row lengths.
LinkML pass: matrix vector_value fields are stripped and the remainder is validated with linkml.validator.validate, which enforces types, enums, required fields, and cardinality on flat vectors.
The working directory is temporarily changed to the directory containing schema_path so that relative $import paths inside the schema resolve correctly.
- Args:
instance: Extracted metadata dict conforming to the SimulationMetadata class.
schema_path: Path to the top-level biosim_schema.yaml file, or a raw GitHub URL (https://raw.githubusercontent.com/…).
- Returns:
list: Validation error messages. An empty list means the instance is valid.
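The custom pass described above could look roughly like the following standalone sketch. It is not the package's implementation: the function names, the error-message wording, and the choice to treat booleans as non-numeric are all assumptions.

```python
from numbers import Real
from typing import Any, List


def _is_number(x: Any) -> bool:
    # Assumption: bool is excluded even though it subclasses int.
    return isinstance(x, Real) and not isinstance(x, bool)


def _check_vector(value: Any, path: str) -> List[str]:
    """Check one vector_value: flat numeric list or rectangular numeric matrix."""
    if not isinstance(value, list):
        return [f"{path}: expected a list"]
    if value and all(isinstance(row, list) for row in value):
        errors = []
        if len({len(row) for row in value}) > 1:
            errors.append(f"{path}: rows have inconsistent lengths")
        if not all(_is_number(x) for row in value for x in row):
            errors.append(f"{path}: non-numeric entry")
        return errors
    if not all(_is_number(x) for x in value):
        return [f"{path}: non-numeric entry"]
    return []


def check_vector_values_sketch(node: Any, path: str = "") -> List[str]:
    """Walk the instance and collect errors for every vector_value field."""
    errors: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key == "vector_value":
                errors.extend(_check_vector(value, child))
            else:
                errors.extend(check_vector_values_sketch(value, child))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            errors.extend(check_vector_values_sketch(item, f"{path}[{i}]"))
    return errors


ragged = {"box": {"vector_value": [[1.0, 0.0], [0.0]], "value_unit": "nm"}}
problems = check_vector_values_sketch(ragged)
```

As with the documented function, an empty list of messages means nothing was flagged.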
- biosim_extractor.metadata.validatemetadata.validate_metadata(result, biosimschema_path=None, strict=False)[source]¶
Validate a populated metadata dict, optionally against a biosim schema.
- Args:
result: Populated metadata dictionary to validate.
biosimschema_path: Path or URL to the biosim schema YAML. If None, validation is skipped.
strict: If True, raises ValueError on validation errors; otherwise emits a warning.
- Raises:
ValueError: If strict=True and validation errors are found.
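The strict versus non-strict behaviour can be illustrated with a small helper that takes an already computed list of error messages. The helper name and the message format are hypothetical; the real function's warning category and wording are not documented here.

```python
import warnings
from typing import List


def report_errors_sketch(errors: List[str], strict: bool = False) -> None:
    """Raise on errors in strict mode, otherwise emit a warning."""
    if not errors:
        return
    message = "Metadata validation failed:\n" + "\n".join(f"- {e}" for e in errors)
    if strict:
        raise ValueError(message)
    warnings.warn(message)


# Non-strict mode: the problem is reported but execution continues.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    report_errors_sketch(["timestep: missing value_unit"], strict=False)
```

Non-strict mode suits exploratory runs, while strict mode is the safer choice in automated pipelines where invalid metadata should stop the job.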
- biosim_extractor.metadata.convertpopulated.convert_populated_metadata_units(metadata: dict) → dict[source]¶
Recursively convert all value/value_unit and vector_value/value_unit pairs in a metadata dict to standard units.
- Args:
metadata (dict): The metadata dictionary to process. This should contain nested dictionaries where physical quantities are represented as {‘value’: …, ‘value_unit’: …} or {‘vector_value’: …, ‘value_unit’: …}.
- Returns:
dict: A new metadata dictionary with all values converted to standard units as defined by UnitConverter. The structure of the input is preserved.
- Raises:
ValueError: If a unit is unknown or conversion fails.
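A recursive walk of this kind might be sketched as below. The conversion table is a made-up stand-in for UnitConverter (the real standard units and supported unit names are not listed in this reference), and the function names are hypothetical.

```python
from typing import Any, Tuple

# Hypothetical table: source unit -> (factor, target unit). Not the package's data.
_FACTORS = {"fs": (0.001, "ps"), "angstrom": (0.1, "nm")}


def _scale(value: Any, factor: float) -> Any:
    """Scale a number or an arbitrarily nested list of numbers."""
    if isinstance(value, list):
        return [_scale(v, factor) for v in value]
    return value * factor


def _to_standard(value: Any, unit: str) -> Tuple[Any, str]:
    if unit not in _FACTORS:
        raise ValueError(f"Unknown unit: {unit}")
    factor, target = _FACTORS[unit]
    return _scale(value, factor), target


def convert_units_sketch(node: Any) -> Any:
    """Return a converted copy; the input structure is left untouched."""
    if isinstance(node, list):
        return [convert_units_sketch(item) for item in node]
    if not isinstance(node, dict):
        return node
    unit = node.get("value_unit")
    out, target = {}, None
    for key, value in node.items():
        if key in ("value", "vector_value") and unit is not None:
            out[key], target = _to_standard(value, unit)
        else:
            out[key] = convert_units_sketch(value)
    if target is not None:
        out["value_unit"] = target
    return out


meta = {"SimulationMetadata": {
    "timestep": {"value": 2.0, "value_unit": "fs"},
    "box": {"vector_value": [50.0, 50.0, 50.0], "value_unit": "angstrom"},
}}
converted = convert_units_sketch(meta)
```

Building a new dictionary rather than editing in place matches the documented "new metadata dictionary" return value and keeps the raw extraction available for comparison.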
Extract topology and trajectory metadata using MDAnalysis.
- class biosim_extractor.mdanalysis.toptraj.TopTrajParser(toppath, trajpath)[source]¶
Bases: object
Parse topology and trajectory files to extract system and molecule metadata.
- biosim_extractor.mdanalysis.toptraj.classify_box(dim, tolerance=0.001)[source]¶
Classify the simulation box type based on dimensions and angles.
- Args:
dim (list or tuple): Box dimensions [lx, ly, lz, a, b, g].
tolerance (float): Tolerance for angle/length comparison.
- Returns:
str: Box type (e.g., “cubic”, “tetragonal”, “orthorhombic”, etc.).
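A simplified sketch of such a classification is given below, covering only the three labels named above. The decision rules are assumptions based on the usual crystallographic definitions, the function name is hypothetical, and non-orthogonal boxes are lumped into a placeholder "other" label because the remaining box types are not listed in this reference.

```python
from typing import Sequence


def classify_box_sketch(dim: Sequence[float], tolerance: float = 0.001) -> str:
    """Classify a box given [lx, ly, lz, a, b, g] with angles in degrees."""
    lx, ly, lz, a, b, g = dim

    def close(x: float, y: float) -> bool:
        return abs(x - y) <= tolerance

    if not all(close(angle, 90.0) for angle in (a, b, g)):
        # Placeholder: the real function distinguishes further box types.
        return "other"
    if close(lx, ly) and close(ly, lz):
        return "cubic"
    if close(lx, ly) or close(ly, lz) or close(lx, lz):
        return "tetragonal"
    return "orthorhombic"


labels = [
    classify_box_sketch([50.0, 50.0, 50.0, 90.0, 90.0, 90.0]),
    classify_box_sketch([50.0, 50.0, 80.0, 90.0, 90.0, 90.0]),
    classify_box_sketch([40.0, 50.0, 60.0, 90.0, 90.0, 90.0]),
]
```

The same tolerance is applied to lengths and angles here, mirroring the single `tolerance` argument in the documented signature.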
- biosim_extractor.mdanalysis.toptraj.get_nucleic_sequence(fragment)[source]¶
Extract the nucleic acid sequence from a molecule fragment.
- Args:
fragment (MDAnalysis.AtomGroup): Molecule fragment to analyze.
- Returns:
str or None: Nucleic acid sequence as a string, or None if not found.
- biosim_extractor.mdanalysis.toptraj.get_protein_sequence(fragment)[source]¶
Extract the protein sequence from a molecule fragment.
- Args:
fragment (MDAnalysis.AtomGroup): Molecule fragment to analyze.
- Returns:
str or None: Protein sequence as a string, or None if not found.
- biosim_extractor.mdanalysis.toptraj.main()[source]¶
Entry point: parse args, run extraction, and write output.
- biosim_extractor.mdanalysis.toptraj.parse_args()[source]¶
Parse command-line arguments.
- Returns:
Parsed argparse.Namespace object.
- biosim_extractor.mdanalysis.toptraj.safe_extract(func)[source]¶
Safely extract and convert values from a function, handling numpy types.
- Args:
func (callable): Function to call.
- Returns:
Any: Extracted and converted value.
Extract gmx log file metadata into a dictionary.
- class biosim_extractor.gromacs.gromacslog.GromacsLogParser(filepath)[source]¶
Bases: object
Parser for GROMACS .log files, extracting header, input parameters, summary, and averages.
- biosim_extractor.gromacs.gromacslog.main()[source]¶
Entry point: parse args, run extraction, and write output.
- biosim_extractor.gromacs.gromacslog.parse_args()[source]¶
Parse command-line arguments.
- Returns:
Parsed argparse.Namespace object.
Extract AMBER log file metadata into a structured dictionary.
This script parses AMBER log files and outputs structured metadata as JSON. It can be used as a standalone CLI tool or imported as a module.
- class biosim_extractor.amber.amberlog.AmberLogParser(filepath)[source]¶
Bases: object
Parser for AMBER log files.
- biosim_extractor.amber.amberlog.main()[source]¶
Entry point: parse args, run extraction, and write output.
- biosim_extractor.amber.amberlog.parse_args()[source]¶
Parse command-line arguments.
- Returns:
Parsed argparse.Namespace object.
Convert units from various MD engine outputs to a consistent standard unit system.
This module provides the UnitConverter class for converting scientific values between different units, standardizing to a chosen system (default: SI-like).
- class biosim_extractor.units.unitconversion.UnitConverter(standard_units: Dict[str, str] | None = None)[source]¶
Bases: object
Simple unit conversion class for scientific calculations. Converts values to a chosen standard unit system.
- convert(value: float | List[float], from_unit: str, unit_type: str | None = None, decimals: int | None = None) → float | List[float][source]¶
Convert a value from one unit to the standard unit.
- Args:
value (float or list): Value(s) to convert.
from_unit (str): The original unit.
unit_type (str, optional): The unit type (auto-detected if None).
- Returns:
float or list: Converted value(s) in standard unit.
- Raises:
ValueError: If unit or unit type is unknown.
- convert_with_unit(value: float | List[float], from_unit: str, unit_type: str | None = None) → Tuple[float | List[float], str][source]¶
Convert a value and return both the value and the target unit.
- Args:
value (float or list): Value(s) to convert.
from_unit (str): The original unit.
unit_type (str, optional): The unit type (auto-detected if None).
- Returns:
tuple: (converted value(s), target unit)
- Raises:
ValueError: If unit or unit type is unknown.
- get_target_unit(from_unit: str) → str[source]¶
Get the standard unit for a given unit.
- Args:
from_unit (str): The original unit.
- Returns:
str: The standard unit.
- Raises:
ValueError: If unit is unknown.
- get_unit_type(unit: str) → str | None[source]¶
Get the unit type for a given unit string.
- Args:
unit (str): The unit string.
- Returns:
str or None: The unit type, or None if unknown.
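To show how the four methods above fit together, here is a toy stand-in class with the same method names. Everything inside it is an assumption: the unit table, the default standard units, and the way an explicit `unit_type` is checked are hypothetical, and the undocumented `decimals` argument is left out.

```python
from typing import Dict, List, Optional, Tuple, Union

Number = Union[float, List[float]]

# Hypothetical table: unit -> (unit type, size of the unit in a common base).
_UNITS = {
    "fs": ("time", 1e-3), "ps": ("time", 1.0), "ns": ("time", 1e3),
    "angstrom": ("length", 0.1), "nm": ("length", 1.0),
}


class UnitConverterSketch:
    def __init__(self, standard_units: Optional[Dict[str, str]] = None):
        # Assumed defaults; callers may override per unit type.
        self.standard_units = {"time": "ps", "length": "nm"}
        self.standard_units.update(standard_units or {})

    def get_unit_type(self, unit: str) -> Optional[str]:
        entry = _UNITS.get(unit)
        return entry[0] if entry else None

    def get_target_unit(self, from_unit: str) -> str:
        unit_type = self.get_unit_type(from_unit)
        if unit_type is None:
            raise ValueError(f"Unknown unit: {from_unit}")
        return self.standard_units[unit_type]

    def convert(self, value: Number, from_unit: str,
                unit_type: Optional[str] = None) -> Number:
        detected = self.get_unit_type(from_unit)
        if detected is None or (unit_type is not None and unit_type != detected):
            raise ValueError(f"Unknown unit or unit type: {from_unit!r}, {unit_type!r}")
        target = self.standard_units[detected]
        factor = _UNITS[from_unit][1] / _UNITS[target][1]
        if isinstance(value, list):
            return [v * factor for v in value]
        return value * factor

    def convert_with_unit(self, value: Number, from_unit: str,
                          unit_type: Optional[str] = None) -> Tuple[Number, str]:
        return self.convert(value, from_unit, unit_type), self.get_target_unit(from_unit)


converter = UnitConverterSketch()
timestep, unit = converter.convert_with_unit(2.0, "fs")
in_ns = UnitConverterSketch({"time": "ns"}).convert(1000.0, "ps")
```

Passing a `standard_units` mapping changes only the target for that unit type, which is how the second converter ends up reporting times in nanoseconds.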