Auto-Pipelines

This section documents the two automated pipelines provided by OmniGenBench: AutoBench and AutoTrain. They streamline benchmarking and training workflows for genomic foundation models, so that users can evaluate and train models with minimal manual setup.

OmniGenBench’s auto-pipelines offer:

Automated benchmarking of genomic models using standardized protocols and datasets.
Automated training workflows with flexible configuration options.
Integration with model, dataset, and pipeline hubs for easy access to resources.
Command-line interfaces for running benchmarks and training directly from the terminal.

Key Components:

AutoBench: Automates the benchmarking process for genomic models.
AutoTrain: Automates the training process for genomic models.
BenchHub, ModelHub, PipelineHub: Provide access to benchmarks, models, and pipelines.
CLI commands: Allow users to run benchmarks and training via the command line.

Example Usage:

from omnigenbench import AutoBench, AutoTrain

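# Run automated benchmarking on the "RGB" benchmark.
# "model_name" is a placeholder for a model name or path.
bench = AutoBench("RGB", "model_name")
bench.run()

# Train a model; the first argument is the dataset name or path.
trainer = AutoTrain("RGB", "model_name")
trainer.run()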

Refer to the API documentation below for details on each auto-pipeline component, including configuration, usage, and extension options.

Auto Bench

class omnigenbench.auto.auto_bench.auto_bench.AutoBench(benchmark, model_name_or_path, tokenizer=None, **kwargs)

Bases: object

This class provides a comprehensive framework for evaluating genomic models across multiple benchmarks and tasks. It handles loading benchmarks, models, tokenizers, and running evaluations with proper metric tracking and result visualization.

AutoBench supports various evaluation scenarios including:

Single model evaluation across multiple benchmarks
Multi-seed evaluation for robustness testing
Different trainer backends (native, accelerate, Hugging Face Trainer)
Automatic metric visualization and result tracking

Variables:

benchmark (str) – The name or path of the benchmark to use.
model_name_or_path (str) – The name or path of the model to evaluate.
tokenizer – The tokenizer to use for evaluation.
autocast (str) – The autocast precision to use (‘fp16’, ‘bf16’, etc.).
overwrite (bool) – Whether to overwrite existing evaluation results.
trainer (str) – The trainer to use (‘native’, ‘accelerate’, ‘hf_trainer’).
mv_path (str) – Path to the metric visualizer file.
mv (MetricVisualizer) – The metric visualizer instance.
bench_metadata – Metadata about the benchmark configuration.

bench_info()

Prints and returns information about the current benchmark setup.

Returns: str – A string containing benchmark information.

Example

>>> info = bench.bench_info()
>>> print(info)

run(**kwargs)

Runs the benchmarking process. This method iterates through the tasks in the benchmark. For each task it loads the task configuration, initializes the model, tokenizer, and datasets, and then trains and evaluates the model.

Parameters: **kwargs – Additional keyword arguments that override the default parameters in the benchmark configuration.

Example

>>> # Run benchmarking with default settings
>>> bench.run()
>>> # Run with custom parameters
>>> bench.run(learning_rate=1e-4, batch_size=16)
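
The override behaviour described for run() can be pictured as a dictionary merge in which caller-supplied values win over defaults. The following is a minimal sketch of that idea, not OmniGenBench's actual implementation: DEFAULT_CONFIG, its keys and values, and resolve_config are hypothetical names introduced only for illustration.

```python
# Hypothetical defaults standing in for a benchmark task configuration.
DEFAULT_CONFIG = {"learning_rate": 2e-5, "batch_size": 8, "epochs": 10}


def resolve_config(defaults, **overrides):
    """Return a copy of `defaults` with keyword overrides applied on top."""
    config = dict(defaults)    # copy so the shared defaults are never mutated
    config.update(overrides)   # caller-supplied values take precedence
    return config


config = resolve_config(DEFAULT_CONFIG, learning_rate=1e-4, batch_size=16)
print(config)
```

Copying the defaults before updating keeps one task's overrides from leaking into the next task in the loop.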

omnigenbench.auto.auto_bench.auto_bench_cli.bench_command(args: list | None = None)

Parses command-line arguments, initializes AutoBench, and runs the evaluation.

omnigenbench.auto.auto_bench.auto_bench_cli.create_parser() → ArgumentParser

Creates the argument parser for the benchmark CLI.

Returns: An argparse.ArgumentParser instance.
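
For readers unfamiliar with the pattern, a parser factory of this kind might look like the sketch below. The flag names (--benchmark, --model, --trainer, --overwrite) and the program name are assumptions made for illustration; they are not taken from the OmniGenBench CLI, whose real options should be checked with its own --help output. Only the trainer values come from this page.

```python
import argparse


def create_parser_sketch():
    """Build an argument parser with illustrative, hypothetical options."""
    parser = argparse.ArgumentParser(prog="autobench-sketch")
    parser.add_argument("--benchmark", required=True, help="Benchmark name or path")
    parser.add_argument("--model", required=True, help="Model name or path")
    parser.add_argument(
        "--trainer",
        default="native",
        choices=["native", "accelerate", "hf_trainer"],
        help="Trainer backend",
    )
    parser.add_argument("--overwrite", action="store_true", help="Overwrite existing results")
    return parser


# Passing an explicit list avoids reading the real command line.
args = create_parser_sketch().parse_args(["--benchmark", "RGB", "--model", "model_name"])
print(args.benchmark, args.model, args.trainer, args.overwrite)
```

Returning the parser from a function, rather than parsing at import time, lets tests and other entry points reuse the same option definitions.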

omnigenbench.auto.auto_bench.auto_bench_cli.run_bench()

Sets up logging, constructs the command to execute (potentially with accelerate launch), and runs it.
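
The command-construction step can be sketched as follows. This is an assumption about the general shape of such a launcher, not the code in run_bench(): build_command, its use_accelerate flag, and the script name bench_entry.py are hypothetical. The only external fact relied on is that the Accelerate CLI is invoked as "accelerate launch" followed by a script and its arguments.

```python
import sys


def build_command(script, script_args, use_accelerate=False):
    """Return the argv list for launching `script`, optionally via accelerate."""
    if use_accelerate:
        launcher = ["accelerate", "launch"]   # distributed launch through the Accelerate CLI
    else:
        launcher = [sys.executable]           # plain run with the current Python interpreter
    return launcher + [script] + list(script_args)


command = build_command("bench_entry.py", ["--benchmark", "RGB"], use_accelerate=True)
print(" ".join(command))
```

A list built this way can be passed to subprocess.run(command, check=True), which avoids shell quoting problems.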

omnigenbench.auto.auto_bench.config_check.config_check(args)

Performs a basic check on the configuration arguments.

Parameters: args – A dictionary of configuration arguments.
Raises: RuntimeError – If a configuration check fails.
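
The documentation does not say which conditions are checked. As an illustration of the contract only (a dictionary in, a RuntimeError out on failure), here is a sketch with two made-up rules. The function name check_config_sketch and both rules are hypothetical; the trainer values are the ones listed on this page.

```python
ALLOWED_TRAINERS = {"native", "accelerate", "hf_trainer"}  # values listed in this document


def check_config_sketch(args):
    """Raise RuntimeError when an illustrative (hypothetical) rule is violated."""
    trainer = args.get("trainer", "native")
    if trainer not in ALLOWED_TRAINERS:
        raise RuntimeError(f"Unknown trainer: {trainer!r}")
    batch_size = args.get("batch_size", 1)
    if not isinstance(batch_size, int) or batch_size < 1:
        raise RuntimeError("batch_size must be a positive integer")


check_config_sketch({"trainer": "accelerate", "batch_size": 16})  # valid: returns silently

try:
    check_config_sketch({"trainer": "native", "batch_size": 0})
except RuntimeError as err:
    print(f"Configuration rejected: {err}")
```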

Auto Train

class omnigenbench.auto.auto_train.auto_train.AutoTrain(dataset, model_name_or_path, tokenizer=None, **kwargs)

Bases: object

This class provides a comprehensive framework for training genomic models on various datasets with minimal configuration. It handles dataset loading, model initialization, training configuration, and result tracking.

AutoTrain supports various training scenarios including:

Single dataset training with multiple seeds
Different trainer backends (native, accelerate, Hugging Face Trainer)
Automatic metric visualization and result tracking
Configurable training parameters

Variables:

dataset (str) – The name or path of the dataset to use for training.
model_name_or_path (str) – The name or path of the model to train.
tokenizer – The tokenizer to use for training.
autocast (str) – The autocast precision to use (‘fp16’, ‘bf16’, etc.).
overwrite (bool) – Whether to overwrite existing training results.
trainer (str) – The trainer to use (‘native’, ‘accelerate’, ‘hf_trainer’).
mv_path (str) – Path to the metric visualizer file.
mv (MetricVisualizer) – The metric visualizer instance.

run(**kwargs)

Runs the training process. This method loads the dataset configuration, initializes the model and tokenizer, and runs training once per seed. It supports the trainer backends listed above and tracks results automatically.

Parameters: **kwargs – Additional keyword arguments that override the default parameters in the dataset configuration.

Example

>>> # Run training with default settings
>>> trainer.run()
>>> # Run with custom parameters
>>> trainer.run(learning_rate=1e-4, batch_size=16)
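
Training across multiple seeds usually means repeating the same run with different random seeds and summarising the spread of the resulting metric. The sketch below shows that loop in isolation. It is not AutoTrain's implementation: run_across_seeds and dummy_train are hypothetical, and dummy_train returns a synthetic number rather than a real metric.

```python
import random
import statistics


def run_across_seeds(train_fn, seeds):
    """Call train_fn once per seed and summarise the returned scores."""
    per_seed = {seed: train_fn(seed) for seed in seeds}
    values = list(per_seed.values())
    return {
        "per_seed": per_seed,
        "mean": statistics.mean(values),
        # stdev needs at least two data points
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


def dummy_train(seed):
    """Placeholder for a real training run; the value is synthetic, not a metric."""
    return random.Random(seed).random()


summary = run_across_seeds(dummy_train, seeds=[0, 1, 2])
print(summary["mean"], summary["stdev"])
```

Reporting the mean together with the standard deviation makes it easier to tell a real improvement from seed-to-seed noise.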

train_info()

Prints and returns information about the current training setup.

Returns: str – A string containing training setup information.

Example

>>> info = trainer.train_info()
>>> print(info)

omnigenbench.auto.auto_train.auto_train_cli.create_parser() → ArgumentParser

Creates the argument parser for the auto-train CLI.

Returns: An argparse.ArgumentParser instance.

omnigenbench.auto.auto_train.auto_train_cli.run_train()

Entry point for the ‘autotrain’ console script.

omnigenbench.auto.auto_train.auto_train_cli.train_command(args: list | None = None)

Entry point for the OmniGenome auto-train command-line interface.

Parameters: args – A list of command-line arguments. If None, sys.argv is used.
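
The args=None convention matches how argparse itself behaves: parse_args(None) reads sys.argv[1:]. The sketch below shows the pattern under the assumption that the entry point delegates to argparse in this way. The function name train_command_sketch and the --dataset flag are hypothetical.

```python
import argparse
import sys


def train_command_sketch(args=None):
    """Parse `args` if given; otherwise fall back to the process command line."""
    parser = argparse.ArgumentParser(prog="autotrain-sketch")
    parser.add_argument("--dataset", default=None, help="Dataset name or path")
    # argparse reads sys.argv[1:] when args is None
    return parser.parse_args(args)


# An explicit list is convenient in tests and notebooks.
explicit = train_command_sketch(["--dataset", "RGB"])
print(explicit.dataset)
```

Accepting an optional list keeps the function callable both from a console script and from Python code.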

Bench Hub

class omnigenbench.auto.bench_hub.bench_hub.BenchHub

Bases: object

A hub for accessing and managing benchmarks. This class is intended to provide a centralized way to list, download, and inspect available benchmarks for OmniGenome.

run()

Placeholder for running functionality related to the benchmark hub.