You can download checkpoints, logs and configs for all supervised models, both the official baselines and their SimPool counterparts.
| Architecture | Mode | Gamma | Epochs | Accuracy | Checkpoint | Logs | Configs |
|---|---|---|---|---|---|---|---|
| ViT-S/16 | Official | - | 100 | 72.7 | checkpoint | logs | configs |
| ViT-S/16 | SimPool | - | 100 | 74.3 | checkpoint | logs | configs |
| ViT-S/16 | SimPool | 1.25 | 100 | 74.2 | checkpoint | logs | configs |
| ViT-S/16 | SimPool | 1.25 | 300 | 78.7 | checkpoint | logs | configs |
| ResNet-50 | Official | - | 100 | 77.4 | checkpoint | logs | configs |
| ResNet-50 | SimPool | 2.0 | 100 | 78.0 | checkpoint | logs | configs |
Having created the supervised environment and downloaded the ImageNet dataset, you are now ready to train! For our main experiments, we train ViT-S, ResNet-50 and ConvNeXt-S.
Train ViT-S with SimPool on ImageNet-1k for 100 epochs:
```
python3 -m torch.distributed.launch --nproc_per_node=8 train.py --model vit_small_patch16_224 --gp simpool --gamma 1.25 \
--data-dir /path/to/imagenet/ --output /path/to/output/ --experiment vits_supervised_simpool --batch-size 74 --sched cosine \
--epochs 100 --subset -1 --opt adamw -j 8 --warmup-lr 1e-6 --mixup .2 --model-ema --model-ema-decay 0.99996 \
--aa rand-m9-mstd0.5-inc1 --remode pixel --reprob 0.25 --lr 5e-4 --weight-decay .05 --drop 0.1 --drop-path .1
```

For the official ViT-S baseline ([CLS] token), adjust `--gp token`. For ViT-S with GAP, adjust `--gp avg`. To train without $\gamma$, adjust `--gamma None`.

❗ NOTE: Here we use 8 GPUs x 74 batch size per GPU = 592 global batch size.
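The `--gp` flag selects the pooling head that turns patch tokens into a single vector. As a rough illustration only (not the repository's implementation), attention-based pooling in the spirit of SimPool can be sketched in NumPy: a global-average-pooled query attends over the patch tokens. The placement of the `gamma` exponent on (clipped, non-negative) values below is an assumption made for illustration; see the paper and repo code for the actual formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_pool(tokens, gamma=None):
    """Schematic attention pooling: a GAP query attends over patch tokens.

    tokens: (N, D) array of patch features.
    gamma:  optional pooling exponent (hypothetical placement, for
            illustration only).
    """
    q = tokens.mean(axis=0)                                 # GAP query, (D,)
    attn = softmax(tokens @ q / np.sqrt(tokens.shape[1]))   # weights, (N,)
    if gamma is not None:
        v = np.clip(tokens, 1e-6, None) ** gamma            # power on values
        return (attn @ v) ** (1.0 / gamma)                  # inverse power
    return attn @ tokens                                    # weighted sum, (D,)

x = np.random.default_rng(0).random((196, 384))  # 14x14 ViT-S patch tokens
print(attention_pool(x).shape)                   # (384,)
print(attention_pool(x, gamma=1.25).shape)       # (384,)
```

With `--gp avg` the head reduces to `tokens.mean(axis=0)`, and with `--gp token` the [CLS] token is used directly instead of any pooling.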
Train ResNet-50 with SimPool on ImageNet-1k for 100 epochs:
```
python3 -m torch.distributed.launch --nproc_per_node=8 train.py --model resnet50 --gp simpool --gamma 2.0 \
--data-dir /path/to/imagenet/ --output /path/to/output/ --experiment resnet50_supervised_simpool --batch-size 128 \
--epochs 100 --subset -1 --sched cosine --lr 0.4 --remode pixel --reprob 0.6 --aug-splits 3 --aa rand-m9-mstd0.5-inc1 \
--resplit --split-bn --jsd --dist-bn reduce
```

For the official ResNet-50 baseline (GAP), adjust `--gp avg`. To train without $\gamma$, adjust `--gamma None`.

❗ NOTE: Here we use 8 GPUs x 128 batch size per GPU = 1024 global batch size.
Train ConvNeXt-S with SimPool on ImageNet-1k for 100 epochs:
```
python3 -m torch.distributed.launch --nproc_per_node=8 train.py --model convnext_small --gp simpool --gamma 2.0 \
--data-dir /path/to/imagenet/ --output /path/to/output/ --experiment convnexts_supervised_simpool --batch-size 128 \
--sched cosine --epochs 100 --subset -1 --opt adamw -j 8 --warmup-lr 1e-6 --mixup .8 --cutmix 1.0 --model-ema \
--model-ema-decay 0.9999 --aa rand-m9-mstd0.5-inc1 --remode pixel --reprob 0.25 --lr 1e-3 --weight-decay .05 --drop-path .4
```

For the official ConvNeXt-S baseline (GAP), adjust `--gp avg`. To train without $\gamma$, adjust `--gamma None`.

❗ NOTE: Here we use 8 GPUs x 128 batch size per GPU = 1024 global batch size.
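The NOTE lines above all follow the same rule: the global batch size is the number of GPUs times the per-GPU `--batch-size`, so changing either changes the effective batch. A trivial, purely illustrative helper (the function name is ours, not part of the repo):

```python
# Hypothetical helper: effective (global) batch size in distributed training.
def global_batch_size(num_gpus: int, per_gpu_batch: int) -> int:
    return num_gpus * per_gpu_batch

print(global_batch_size(8, 74))   # ViT-S command above -> 592
print(global_batch_size(8, 128))  # ResNet-50 / ConvNeXt-S commands -> 1024
```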
- Use `--subset 260` to train on the ImageNet-20% dataset.
- When loading our weights using `--pretrained_weights`, take care of any inconsistencies in model keys!
- The default value of $\gamma$ is 1.25 for transformers and 2.0 for convolutional networks.
- In some cases, we observed that training without $\gamma$ is easier and yields slightly better metrics, but also lowers the attention map quality.
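A common source of model-key inconsistencies is a leftover prefix on checkpoint keys (for example, `module.` added by DistributedDataParallel). Stripping such a prefix before calling `load_state_dict` usually resolves it; the prefix name below is an assumption, so inspect your own checkpoint's keys first. A minimal sketch with plain dicts:

```python
def strip_prefix(state_dict: dict, prefix: str = "module.") -> dict:
    """Return a copy of state_dict with `prefix` removed from matching keys."""
    return {k[len(prefix):] if k.startswith(prefix) else k: v
            for k, v in state_dict.items()}

# Toy checkpoint whose keys carry a leftover "module." prefix:
ckpt = {"module.head.weight": 1, "module.head.bias": 2, "pos_embed": 3}
print(strip_prefix(ckpt))
# {'head.weight': 1, 'head.bias': 2, 'pos_embed': 3}
```

After remapping, `model.load_state_dict(remapped, strict=False)` reports any keys that still disagree, which makes remaining mismatches easy to spot.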