Attention is all you need
ViT reaches 84.86% top-1 accuracy on ImageNet with only 10 examples per class.
Using a pre-trained model is more cost-efficient and leads to better results
The cheaper strategy works as well as the more expensive one in the majority of scenarios