You can download checkpoints, logs and configs for all supervised models, both the official baselines and their SimPool counterparts.
| Architecture | Mode | Gamma | Epochs | Accuracy | Checkpoint | Logs | Configs |
|---|---|---|---|---|---|---|---|
| ViT-S/16 | Official | - | 100 | 72.7 | checkpoint | logs | configs |
| ViT-S/16 | SimPool | - | 100 | 74.3 | checkpoint | logs | configs |
| ViT-S/16 | SimPool | 1.25 | 100 | 74.2 | checkpoint | logs | configs |
| ViT-S/16 | SimPool | 1.25 | 300 | 78.7 | checkpoint | logs | configs |
| ResNet-50 | Official | - | 100 | 77.4 | checkpoint | logs | configs |
| ResNet-50 | SimPool | 2.0 | 100 | 78.0 | checkpoint | logs | configs |
Having created the supervised environment and downloaded the ImageNet dataset, you are now ready to train! For our main experiments, we train ViT-S, ResNet-50 and ConvNeXt-S.
Train ViT-S with SimPool on ImageNet-1k for 100 epochs:
```
python3 -m torch.distributed.launch --nproc_per_node=8 train.py --model vit_small_patch16_224 --gp simpool --gamma 1.25 \
--data-dir /path/to/imagenet/ --output /path/to/output/ --experiment vits_supervised_simpool --batch-size 74 --sched cosine \
--epochs 100 --subset -1 --opt adamw -j 8 --warmup-lr 1e-6 --mixup .2 --model-ema --model-ema-decay 0.99996 \
--aa rand-m9-mstd0.5-inc1 --remode pixel --reprob 0.25 --lr 5e-4 --weight-decay .05 --drop 0.1 --drop-path .1
```

For the official ViT-S baseline ([CLS] token), adjust `--gp token`. For ViT-S with GAP, adjust `--gp avg`. To train without $\gamma$, adjust `--gamma None`.

❗ NOTE: Here we use 8 GPUs x 74 batch size per GPU = 592 global batch size.
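The `--gp` flag selects the pooling head that turns patch tokens into a single vector. As a rough illustration only (not the repository's implementation), attention-based pooling in the spirit of SimPool can be sketched in NumPy: a global-average-pooled query attends over the patch tokens. The placement of the `gamma` exponent on (clipped, non-negative) values below is an assumption made for illustration; see the paper and repo code for the actual formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_pool(tokens, gamma=None):
    """Schematic attention pooling: a GAP query attends over patch tokens.

    tokens: (N, D) array of patch features.
    gamma:  optional pooling exponent (hypothetical placement, for
            illustration only).
    """
    q = tokens.mean(axis=0)                                 # GAP query, (D,)
    attn = softmax(tokens @ q / np.sqrt(tokens.shape[1]))   # weights, (N,)
    if gamma is not None:
        v = np.clip(tokens, 1e-6, None) ** gamma            # power on values
        return (attn @ v) ** (1.0 / gamma)                  # inverse power
    return attn @ tokens                                    # weighted sum, (D,)

x = np.random.default_rng(0).random((196, 384))  # 14x14 ViT-S patch tokens
print(attention_pool(x).shape)                   # (384,)
print(attention_pool(x, gamma=1.25).shape)       # (384,)
```

With `--gp avg` the head reduces to `tokens.mean(axis=0)`, and with `--gp token` the [CLS] token is used directly instead of any pooling.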
Train ResNet-50 with SimPool on ImageNet-1k for 100 epochs:
```
python3 -m torch.distributed.launch --nproc_per_node=8 train.py --model resnet50 --gp simpool --gamma 2.0 \
--data-dir /path/to/imagenet/ --output /path/to/output/ --experiment resnet50_supervised_simpool --batch-size 128 \
--epochs 100 --subset -1 --sched cosine --lr 0.4 --remode pixel --reprob 0.6 --aug-splits 3 --aa rand-m9-mstd0.5-inc1 \
--resplit --split-bn --jsd --dist-bn reduce
```

For the official ResNet-50 baseline (GAP), adjust `--gp avg`. To train without $\gamma$, adjust `--gamma None`.

❗ NOTE: Here we use 8 GPUs x 128 batch size per GPU = 1024 global batch size.
Train ConvNeXt-S with SimPool on ImageNet-1k for 100 epochs:
```
python3 -m torch.distributed.launch --nproc_per_node=8 train.py --model convnext_small --gp simpool --gamma 2.0 \
--data-dir /path/to/imagenet/ --output /path/to/output/ --experiment convnexts_supervised_simpool --batch-size 128 \
--sched cosine --epochs 100 --subset -1 --opt adamw -j 8 --warmup-lr 1e-6 --mixup .8 --cutmix 1.0 --model-ema \
--model-ema-decay 0.9999 --aa rand-m9-mstd0.5-inc1 --remode pixel --reprob 0.25 --lr 1e-3 --weight-decay .05 --drop-path .4
```

For the official ConvNeXt-S baseline (GAP), adjust `--gp avg`. To train without $\gamma$, adjust `--gamma None`.

❗ NOTE: Here we use 8 GPUs x 128 batch size per GPU = 1024 global batch size.
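The NOTE lines above all follow the same rule: the global batch size is the number of GPUs times the per-GPU `--batch-size`, so changing either changes the effective batch. A trivial, purely illustrative helper (the function name is ours, not part of the repo):

```python
# Hypothetical helper: effective (global) batch size in distributed training.
def global_batch_size(num_gpus: int, per_gpu_batch: int) -> int:
    return num_gpus * per_gpu_batch

print(global_batch_size(8, 74))   # ViT-S command above -> 592
print(global_batch_size(8, 128))  # ResNet-50 / ConvNeXt-S commands -> 1024
```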
- Use `--subset 260` to train on the ImageNet-20% dataset.
- When loading our weights using `--pretrained_weights`, take care of any inconsistencies in model keys!
- The default value of $\gamma$ is 1.25 for transformers and 2.0 for convolutional networks.
- In some cases, we observed that training without $\gamma$ is easier and yields slightly better metrics, but also lowers the attention map quality.
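A common source of model-key inconsistencies is a leftover prefix on checkpoint keys (for example, `module.` added by DistributedDataParallel). Stripping such a prefix before calling `load_state_dict` usually resolves it; the prefix name below is an assumption, so inspect your own checkpoint's keys first. A minimal sketch with plain dicts:

```python
def strip_prefix(state_dict: dict, prefix: str = "module.") -> dict:
    """Return a copy of state_dict with `prefix` removed from matching keys."""
    return {k[len(prefix):] if k.startswith(prefix) else k: v
            for k, v in state_dict.items()}

# Toy checkpoint whose keys carry a leftover "module." prefix:
ckpt = {"module.head.weight": 1, "module.head.bias": 2, "pos_embed": 3}
print(strip_prefix(ckpt))
# {'head.weight': 1, 'head.bias': 2, 'pos_embed': 3}
```

After remapping, `model.load_state_dict(remapped, strict=False)` reports any keys that still disagree, which makes remaining mismatches easy to spot.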