
# Supervised experiments of SimPool

## Pre-trained models

You can download checkpoints, logs, and configs for all supervised models, both the official reproductions and the SimPool variants.

| Architecture | Mode     | Gamma | Epochs | Accuracy | Download                    |
|--------------|----------|-------|--------|----------|-----------------------------|
| ViT-S/16     | Official | -     | 100    | 72.7     | checkpoint / logs / configs |
| ViT-S/16     | SimPool  | -     | 100    | 74.3     | checkpoint / logs / configs |
| ViT-S/16     | SimPool  | 1.25  | 100    | 74.2     | checkpoint / logs / configs |
| ViT-S/16     | SimPool  | 1.25  | 300    | 78.7     | checkpoint / logs / configs |
| ResNet-50    | Official | -     | 100    | 77.4     | checkpoint / logs / configs |
| ResNet-50    | SimPool  | 2.0   | 100    | 78.0     | checkpoint / logs / configs |

## Training

Having created the supervised environment and downloaded the ImageNet dataset, you are ready to train! For our main experiments, we train ViT-S, ResNet-50 and ConvNeXt-S.

### ViT-S

Train ViT-S with SimPool on ImageNet-1k for 100 epochs:

```shell
python3 -m torch.distributed.launch --nproc_per_node=8 train.py --model vit_small_patch16_224 --gp simpool --gamma 1.25 \
--data-dir /path/to/imagenet/ --output /path/to/output/ --experiment vits_supervised_simpool --batch-size 74 --sched cosine \
--epochs 100 --subset -1 --opt adamw -j 8 --warmup-lr 1e-6 --mixup .2 --model-ema --model-ema-decay 0.99996 \
--aa rand-m9-mstd0.5-inc1 --remode pixel --reprob 0.25 --lr 5e-4 --weight-decay .05 --drop 0.1 --drop-path .1
```

For the official ViT-S ([CLS] pooling), use `--gp token`. For ViT-S with GAP, use `--gp avg`. To disable $\gamma$, use `--gamma None`. ❗ NOTE: here we use 8 GPUs × 74 batch size per GPU = 592 global batch size.
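Conceptually, SimPool replaces [CLS]/GAP pooling with a single cross-attention step: the GAP vector acts as the query over the patch features, and with $\gamma$ the weighted average of values becomes an element-wise power (generalized) mean. The NumPy sketch below illustrates this idea only; it paraphrases the paper, not the repo's code, and omits the linear projections and LayerNorm of the actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def simpool(X, gamma=None):
    """Minimal SimPool-style pooling sketch (projections omitted).

    X: (N, d) array of patch features.
    Returns a single (d,) pooled vector.
    """
    d = X.shape[-1]
    q = X.mean(axis=0, keepdims=True)           # (1, d) GAP vector as query
    attn = softmax(q @ X.T / np.sqrt(d))        # (1, N) attention over patches
    if gamma is None:
        z = attn @ X                            # plain attention-weighted average
    else:
        # gamma power average, as in the gamma variant of the recipes above
        z = (attn @ np.abs(X) ** gamma) ** (1.0 / gamma)
    return z[0]
```

With `gamma=None` this reduces to an attention-weighted mean; `--gamma 1.25` in the ViT-S command corresponds to the power-average branch.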

### ResNet-50

Train ResNet-50 with SimPool on ImageNet-1k for 100 epochs:

```shell
python3 -m torch.distributed.launch --nproc_per_node=8 train.py --model resnet50 --gp simpool --gamma 2.0 \
--data-dir /path/to/imagenet/ --output /path/to/output/ --experiment resnet50_supervised_simpool --batch-size 128 \
--epochs 100 --subset -1 --sched cosine --lr 0.4 --remode pixel --reprob 0.6 --aug-splits 3 --aa rand-m9-mstd0.5-inc1 \
--resplit --split-bn --jsd --dist-bn reduce
```

For the official ResNet-50 (GAP), use `--gp avg`. To disable $\gamma$, use `--gamma None`. ❗ NOTE: here we use 8 GPUs × 128 batch size per GPU = 1024 global batch size.
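If you run on a different number of GPUs, it helps to recompute the effective batch size, and, if you keep the recipes otherwise unchanged, to rescale the learning rate. The linear-scaling rule below is a common heuristic, not a documented rule of this repo:

```python
def global_batch_size(n_gpus, per_gpu_batch):
    # Effective batch size under torch.distributed.launch:
    # each of the n_gpus processes sees per_gpu_batch samples per step.
    return n_gpus * per_gpu_batch

def scaled_lr(base_lr, base_global_batch, new_global_batch):
    # Linear learning-rate scaling heuristic (an assumption, verify for your setup).
    return base_lr * new_global_batch / base_global_batch

print(global_batch_size(8, 128))   # 1024, the ResNet-50 / ConvNeXt-S recipes
print(global_batch_size(8, 74))    # 592, the ViT-S recipe
print(scaled_lr(0.4, 1024, 512))   # 0.2: halve the lr if you halve the batch
```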

### ConvNeXt-S

Train ConvNeXt-S with SimPool on ImageNet-1k for 100 epochs:

```shell
python3 -m torch.distributed.launch --nproc_per_node=8 train.py --model convnext_small --gp simpool --gamma 2.0 \
--data-dir /path/to/imagenet/ --output /path/to/output/ --experiment convnexts_supervised_simpool --batch-size 128 \
--sched cosine --epochs 100 --subset -1 --opt adamw -j 8 --warmup-lr 1e-6 --mixup .8 --cutmix 1.0 --model-ema \
--model-ema-decay 0.9999 --aa rand-m9-mstd0.5-inc1 --remode pixel --reprob 0.25 --lr 1e-3 --weight-decay .05 --drop-path .4
```

For the official ConvNeXt-S (GAP), use `--gp avg`. To disable $\gamma$, use `--gamma None`. ❗ NOTE: here we use 8 GPUs × 128 batch size per GPU = 1024 global batch size.

## Extra notes

- Use `--subset 260` to train on the ImageNet-20% dataset.
- When loading our weights via `--pretrained_weights`, take care of any inconsistencies in model keys!
- The default value of $\gamma$ is 1.25 for transformers and 2.0 for convolutional networks.
- In some cases, we observed that disabling $\gamma$ eases training and yields slightly better metrics, but also lowers the attention-map quality.
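A frequent source of key inconsistencies is a `module.` prefix left over from DistributedDataParallel checkpoints. A minimal sketch of how one might strip such a prefix before `load_state_dict` (the prefix and helper name are illustrative; inspect the actual mismatch reported for your checkpoint first):

```python
def strip_prefix(state_dict, prefix="module."):
    """Drop a prefix (e.g. DistributedDataParallel's 'module.') from
    checkpoint keys; keys without the prefix are kept unchanged.

    Hypothetical helper, not part of this repo; adapt to the key
    mismatch you actually observe."""
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }
```

Usage would be along the lines of `model.load_state_dict(strip_prefix(torch.load(path)["state_dict"]))`, adjusted to the checkpoint's actual layout.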

## More training

⚠️ UNDER CONSTRUCTION ⚠️