
This PR executes several critical robustness improvements and bug fixes across the ModelGenerator repository, resolving known technical debt and improving CLI safety. #30

Open
psMDHamdan wants to merge 2 commits into genbio-ai:main from psMDHamdan:fix-anndata-obs-names

@psMDHamdan psMDHamdan commented Apr 27, 2026

🚀 Pull Request Summary

This PR makes several robustness improvements and bug fixes across the ModelGenerator repository, addressing known technical debt and making CLI configuration safer.

🛠️ Key Changes

  1. AnnData Robustness: Duplicate obs_names in single-cell data are now made unique at load time. A new utility calls .obs_names_make_unique() in the data loaders (CellClassificationDataModule, ClockDataModule, SpatialDataGenerator).
  2. configure_model() Initialization Bug Fix: configure_model() no longer runs more than once during checkpoint loading, thanks to the @once_only decorator applied in rif_task, pif_task, and rna_ss_task.
  3. DRY process_batch() Refactoring: The tokenization logic that was duplicated across 12 backbone models now lives in the HFSequenceBackbone parent class in base.py.
  4. CLI Config Safe-Linking: jsonargparse YAML config conflict warnings are gone because all dynamic CLI argument links in main.py now use apply_on="instantiate".
  5. Documentation & Tests Expanded: The WandB sweep configuration docs are now part of the MkDocs navigation (docs/docs/usage/wandb_sweeps.md), and stub pytest coverage was added for the custom protein & RNA endpoints.
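
As a rough illustration of item 1, a loader-side helper could look like the sketch below. The name `ensure_unique_obs_names` is hypothetical and is not taken from this PR; the sketch only assumes the object exposes AnnData's `obs_names.is_unique` and `obs_names_make_unique(join=...)`.

```python
import warnings


def ensure_unique_obs_names(adata, join="-"):
    """Hypothetical helper (not the PR's actual utility).

    Makes obs_names unique in place when duplicates are present and
    returns the same object so it can be chained inside a data loader.
    """
    # obs_names is a pandas Index on a real AnnData, so is_unique is cheap.
    if not adata.obs_names.is_unique:
        warnings.warn("Duplicate obs_names found; making them unique.")
        # AnnData appends a numeric suffix (joined by `join`) to repeats.
        adata.obs_names_make_unique(join=join)
    return adata
```

A data module would call this once, right after reading the file and before any indexing by cell name. Checking `is_unique` first keeps the call a no-op for clean datasets.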

✅ Verification

  • All backbone models that previously duplicated this logic now inherit it from HFSequenceBackbone.
  • jsonargparse conflict warnings no longer appear in local runs; the fix relies on the framework's native linking mechanism.
  • Added unit tests that verify obs_names sanitization, ready for future CI runs.
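
The inheritance pattern behind the first point can be sketched in plain Python. Everything below is a simplified stand-in under stated assumptions: the class names, the character-level vocabulary, and the padding scheme are hypothetical and do not reproduce HFSequenceBackbone or any real tokenizer.

```python
class SequenceBackboneSketch:
    """Hypothetical parent class that owns the shared process_batch() logic."""

    vocab = {}   # subclasses override this with their own alphabet
    pad_id = 0   # assumed padding token id
    unk_id = 1   # assumed id for characters outside the vocabulary

    def tokenize(self, sequence):
        # Toy character-level tokenizer standing in for a real one.
        return [self.vocab.get(ch, self.unk_id) for ch in sequence]

    def process_batch(self, sequences):
        # Tokenize, then right-pad every sequence to the batch maximum.
        token_ids = [self.tokenize(s) for s in sequences]
        max_len = max((len(t) for t in token_ids), default=0)
        input_ids = [t + [self.pad_id] * (max_len - len(t)) for t in token_ids]
        attention_mask = [[1] * len(t) + [0] * (max_len - len(t)) for t in token_ids]
        return {"input_ids": input_ids, "attention_mask": attention_mask}


class DNABackboneSketch(SequenceBackboneSketch):
    # Only the model-specific part lives in the subclass.
    vocab = {"A": 2, "C": 3, "G": 4, "T": 5}


class ProteinBackboneSketch(SequenceBackboneSketch):
    vocab = {aa: i + 2 for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}
```

With this layout a fix to padding or masking is made once in the parent, and each subclass only declares what differs.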

- Centralized process_batch() across sequence backbone classes
- Resolved configure_model() re-initialization bugs via @once_only
- Fixed jsonargparse YAML config conflicts with instantiate linking
- Migrated WandB sweep documentation to MkDocs structure
- Added initial pytest suite per module
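
For readers unfamiliar with the run-once pattern, here is a minimal sketch of what a decorator like @once_only might do. It is an assumption about the behavior, not the repository's implementation; `DemoTask` and `build_count` are hypothetical names used only for the demonstration.

```python
import functools


def once_only(method):
    """Sketch of a per-instance run-once guard (assumed behavior)."""
    flag = f"_{method.__name__}_has_run"

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        # Skip the body if this instance has already completed a call.
        if getattr(self, flag, False):
            return None
        result = method(self, *args, **kwargs)
        # Set the flag only after success so a failed call can be retried.
        setattr(self, flag, True)
        return result

    return wrapper


class DemoTask:
    """Hypothetical task used only to demonstrate the guard."""

    def __init__(self):
        self.build_count = 0

    @once_only
    def configure_model(self):
        self.build_count += 1


task = DemoTask()
task.configure_model()
task.configure_model()  # second call is a no-op
```

Storing the flag on the instance rather than on the function means each task object gets its own guard, so a second task created in the same process still builds its model.
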
@psMDHamdan psMDHamdan changed the title from "Handle duplicate AnnData obs_names automatically during dataset loading" to "This PR executes several critical robustness improvements and bug fixes across the ModelGenerator repository, resolving known technical debt and improving CLI safety." Apr 27, 2026