Hugging Face Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at a resolution of 224x224 and fine-tuned on ...
