MLO-lab/NOVA

NOVA

An implementation of NOVA (Non-Contrastive Vision-Language Learning with Predictive Embedding Alignment), built on top of stable-pretraining.

NOVA trains a randomly initialized ViT image encoder to predict embeddings from a frozen ClinicalBERT text encoder. The loss combines an MSE term, which aligns each image-view prediction with the text anchor, and a SIGReg regularizer applied jointly to all image-view predictions and the text embedding. The weight lambda balances the two terms:

loss = (1 - lambda) * MSE(image_views, text_anchor) + lambda * SIGReg([image_views, text_anchor])
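
The weighting above can be sketched in plain Python. This is a minimal illustration, not the code in forwards.py: the names `mse_to_anchor`, `nova_loss`, and `placeholder_regularizer` are hypothetical, the embeddings are plain lists for a single example, and the regularizer is passed in as a callable because SIGReg itself is not reproduced here. The placeholder only stands in for it so the example runs.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def mse_to_anchor(image_views: Sequence[Vector], text_anchor: Vector) -> float:
    """Mean squared error between each image-view prediction and the text anchor."""
    total = 0.0
    count = 0
    for view in image_views:
        for v, t in zip(view, text_anchor):
            total += (v - t) ** 2
            count += 1
    return total / count


def nova_loss(
    image_views: Sequence[Vector],
    text_anchor: Vector,
    regularizer: Callable[[Sequence[Vector]], float],
    lam: float = 0.02,
) -> float:
    """Combine the two terms as (1 - lam) * MSE + lam * regularizer."""
    alignment = mse_to_anchor(image_views, text_anchor)
    # The regularizer sees the image-view predictions and the text anchor jointly.
    joint = list(image_views) + [text_anchor]
    return (1.0 - lam) * alignment + lam * regularizer(joint)


def placeholder_regularizer(embeddings: Sequence[Vector]) -> float:
    """Stand-in only, NOT SIGReg: penalizes a non-zero mean in each dimension."""
    dim = len(embeddings[0])
    n = len(embeddings)
    means = [sum(e[d] for e in embeddings) / n for d in range(dim)]
    return sum(m * m for m in means) / dim


views = [[1.0, 0.0], [0.0, 1.0]]  # two toy image-view predictions
anchor = [0.0, 0.0]               # toy text embedding
loss = nova_loss(views, anchor, placeholder_regularizer)
```

With lam=0.02 (the paper default listed below), the alignment term dominates and the regularizer acts as a small correction.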

The code follows the neighboring LeVLJEPA-release structure, but replaces the learnable text encoder/cross-prediction setup with the paper's frozen ClinicalBERT target stack and MIMIC-style radiology datasets.

Layout

  • main.py - Hydra/stable-pretraining training entry point
  • forwards.py - NOVA, InfoNCE, and SigLIP forward/loss functions
  • callbacks.py - gradient clipping, embedding diagnostics, checkpointing, zero-shot eval
  • utils/dataset.py - MIMIC/CheXpert/ChestX-ray14 manifest datasets and augmentations
  • utils/eval.py - binary prompt zero-shot AUC evaluation
  • configs/ - ViT-S/ViT-B and objective configs

Install

uv sync

For local framework development, install the parent checkout instead:

uv pip install -e ../stable-pretraining

Data Manifests

Training is manifest-driven so protected datasets stay outside the repo. A training CSV/parquet/jsonl needs at least:

image_path,impression,ViewPosition
p10/p10000032/s50414267/xxx.jpg,"No acute cardiopulmonary abnormality.",PA
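
Before launching a run, it can help to check that a CSV manifest has the expected columns. The sketch below uses only the standard library; `read_manifest_rows` and `REQUIRED_COLUMNS` are hypothetical names, not part of utils/dataset.py, and the required tuple should be adjusted if a report column is used in place of impression.

```python
import csv
import io

# Column names taken from the example manifest above.
REQUIRED_COLUMNS = ("image_path", "impression", "ViewPosition")


def read_manifest_rows(handle, required=REQUIRED_COLUMNS):
    """Read CSV rows as dicts, failing early if a required column is absent."""
    reader = csv.DictReader(handle)
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"manifest is missing columns: {missing}")
    return list(reader)


# In practice, pass open("/path/to/mimic_train.csv", newline="") instead.
sample = io.StringIO(
    "image_path,impression,ViewPosition\n"
    'p10/p10000032/s50414267/xxx.jpg,"No acute cardiopulmonary abnormality.",PA\n'
)
rows = read_manifest_rows(sample)
```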

If impression is missing, set data.report_col to a full radiology report column and the loader extracts the IMPRESSION section.
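
One way such an extraction could work is a regular expression over the report text. This is a sketch under assumptions, not the loader's actual logic: it assumes the section starts at a literal "IMPRESSION:" heading and ends at the next all-caps heading or the end of the report, and `extract_impression` is a hypothetical name.

```python
import re
from typing import Optional

# Assumed layout: "IMPRESSION:" heading, terminated by the next ALL-CAPS
# heading on its own line (e.g. "NOTIFICATION:") or by the end of the text.
_IMPRESSION_RE = re.compile(
    r"IMPRESSION:\s*(.*?)(?=\n\s*[A-Z][A-Z /]+:|\Z)",
    re.DOTALL,
)


def extract_impression(report: str) -> Optional[str]:
    """Return the IMPRESSION section as one line, or None if it is absent or empty."""
    match = _IMPRESSION_RE.search(report)
    if match is None:
        return None
    # Collapse line wraps and repeated whitespace into single spaces.
    text = " ".join(match.group(1).split())
    return text or None


report = (
    "FINDINGS:\nLungs are clear.\n\n"
    "IMPRESSION:\nNo acute cardiopulmonary\nabnormality.\n"
)
impression = extract_impression(report)
```

Reports that use a different heading, or none at all, return None here and would need their own handling.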

Evaluation manifests need an image path and binary label columns. CheXpert-style uncertain labels (-1) are treated as negative by default.
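
The default handling of uncertain labels can be illustrated with a small helper. `binarize_label` and its `uncertain_as` parameter are hypothetical names, and treating blank cells as negative is an assumption of this sketch rather than documented behavior.

```python
def binarize_label(raw: str, uncertain_as: int = 0) -> int:
    """Map a CheXpert-style CSV cell to 0 or 1; -1 (uncertain) becomes `uncertain_as`."""
    text = raw.strip()
    if not text:
        return 0  # assumption: a blank cell is treated as negative
    value = float(text)
    if value == -1.0:
        return uncertain_as  # default 0, i.e. uncertain counts as negative
    return 1 if value == 1.0 else 0


labels = [binarize_label(cell) for cell in ["1.0", "0.0", "-1.0", ""]]
```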

Train NOVA

python main.py \
  data.train_manifest=/path/to/mimic_train.csv \
  data.image_root=/path/to/images \
  run_name=nova_vitb

ViT-S:

python main.py model=small run_name=nova_vits

Multi-GPU is handled by Lightning:

python main.py devices=8 batch_size=256

Paper Defaults

The default configs/nova.yaml matches the paper setup:

  • frozen emilyalsentzer/Bio_ClinicalBERT
  • ViT-B/16 from scratch
  • embedding dimension 64
  • predictor hidden width 2048
  • 2 global crops at 224, 6 local crops at 96
  • AdamW, learning rate cosine-decayed from 1e-4 to 1e-5
  • batch size 256, 100 epochs
  • lambda=0.02, gradient clipping 1.0, bf16 mixed precision
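
For reference, a cosine decay between the two learning rates listed above can be written as follows. This is a generic cosine-annealing sketch with no warmup (whether the actual config uses warmup is not stated here), and `cosine_lr` is a hypothetical helper, not a function from this repository.

```python
import math


def cosine_lr(
    step: int,
    total_steps: int,
    lr_start: float = 1e-4,
    lr_end: float = 1e-5,
) -> float:
    """Cosine-anneal the learning rate from lr_start (step 0) to lr_end (last step)."""
    # Clamp progress to [0, 1] so out-of-range steps hold the boundary values.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_end + 0.5 * (lr_start - lr_end) * (1.0 + math.cos(math.pi * progress))


schedule = [cosine_lr(s, 100) for s in (0, 50, 100)]
```

The schedule starts at 1e-4, passes through the midpoint 5.5e-5 halfway, and ends at 1e-5.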

Zero-Shot Evaluation

Add datasets under evals in a config or CLI override. Example:

evals:
  - name: chexpert
    enabled: true
    manifest: /path/to/chexpert_test.csv
    image_root: /path/to/CheXpert-v1.0
    image_col: image_path
    label_cols: [Atelectasis, Cardiomegaly, Edema, Pleural Effusion, Consolidation]
    positive_prompts: [atelectasis, cardiomegaly, edema, pleural effusion, consolidation]
    negative_prompts: [no atelectasis, no cardiomegaly, no edema, no pleural effusion, no consolidation]

The callback reports per-label AUC and macro AUC every eval_every_n_steps training steps.
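
A binary-prompt zero-shot evaluation of this kind can be sketched as below. The scoring rule (cosine similarity to the positive prompt minus cosine similarity to the negative prompt) is an assumption for illustration; utils/eval.py may score differently. All function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def prompt_score(image_emb, pos_emb, neg_emb) -> float:
    """Higher means the image is closer to the positive prompt than the negative one."""
    return cosine(image_emb, pos_emb) - cosine(image_emb, neg_emb)


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(per_label: Dict[str, float]) -> float:
    """Unweighted mean of the per-label AUCs."""
    return sum(per_label.values()) / len(per_label)


# Toy scores and labels for one label column.
auc = binary_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise AUC is quadratic in the number of samples, which is fine for a sketch but slow for large test sets; a rank-based implementation gives the same value faster.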

Baselines

The same frozen ClinicalBERT + ViT stack can train the comparison objectives:

python main.py --config-name infonce
python main.py --config-name siglip
python main.py --config-name medclip

These baselines intentionally use a single crop per image, matching the paper's setup, whereas NOVA trains with multiple crops.
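
To make the contrast with NOVA's MSE-plus-regularizer loss concrete, here is a textbook symmetric InfoNCE loss in plain Python. It is a generic formulation, not the implementation in forwards.py; the function names are hypothetical and the temperature of 0.07 is an assumed value.

```python
import math


def diagonal_cross_entropy(logits):
    """Mean negative log-softmax of the diagonal entry in each row."""
    losses = []
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_z - row[i])
    return sum(losses) / len(losses)


def info_nce(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE; image_embs[i] and text_embs[i] form a matched pair."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    # logits[i][j] = cosine similarity of image i and text j, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
        for i in imgs
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Unlike the NOVA objective, this loss depends on the other examples in the batch as negatives, which is the contrastive element NOVA avoids.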
