NAS + CNN + Transformer = ViT-Res! The MIT team open-sources ViT-Res, with accuracy 8.6% higher than DeiT-Ti
Details are as follows:
Paper link: https://arxiv.org/abs/2109.00642
Project link: https://github.com/yilunliao/vit-search
01
02
2.1 Background on Vision Transformer
Tokenization
Position Embedding
MHSA
FFN
LN
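The components listed above (tokenization, position embedding, MHSA, FFN, LN) make up a standard ViT/DeiT encoder. A minimal NumPy sketch of one pre-norm encoder block, with random placeholder weights rather than the paper's trained model (dimensions and the ReLU-for-GELU substitution are simplifications):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def tokenize(img, patch, w_embed):
    # img: (H, W, C) -> non-overlapping patches -> linear projection
    H, W, C = img.shape
    p = img.reshape(H // patch, patch, W // patch, patch, C)
    p = p.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)
    return p @ w_embed                       # (num_tokens, d)

def mhsa(x, wq, wk, wv, wo, heads):
    # multi-head self-attention over the token sequence
    n, d = x.shape
    dh = d // heads
    q = (x @ wq).reshape(n, heads, dh).transpose(1, 0, 2)
    k = (x @ wk).reshape(n, heads, dh).transpose(1, 0, 2)
    v = (x @ wv).reshape(n, heads, dh).transpose(1, 0, 2)
    att = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))
    out = (att @ v).transpose(1, 0, 2).reshape(n, d)
    return out @ wo

def ffn(x, w1, w2):
    # ReLU stands in for GELU to keep the sketch short
    return np.maximum(x @ w1, 0.0) @ w2

d, patch, heads = 64, 16, 4
img = rng.standard_normal((224, 224, 3))
w_embed = rng.standard_normal((patch * patch * 3, d)) * 0.02
x = tokenize(img, patch, w_embed)                # (196, 64)
x = x + rng.standard_normal(x.shape) * 0.02      # learned position embedding
# one pre-norm block: x + MHSA(LN(x)); then x + FFN(LN(x))
ws = [rng.standard_normal((d, d)) * 0.02 for _ in range(4)]
x = x + mhsa(layer_norm(x), *ws, heads=heads)
w1 = rng.standard_normal((d, 4 * d)) * 0.02
w2 = rng.standard_normal((4 * d, d)) * 0.02
x = x + ffn(layer_norm(x), w1, w2)
print(x.shape)  # (196, 64)
```

Residual connections around MHSA and FFN are what ViT-Res extends across stages in the next subsection.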
2.2 Residual Spatial Reduction
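Residual spatial reduction shrinks the token grid between stages (quartering sequence length while widening channels) through a block with a skip connection. A hedged NumPy sketch, assuming average pooling on both branches and separate linear projections; the paper's actual block uses strided convolutions and normalization in the main path:

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_spatial_reduce(x, grid, w_main, w_short):
    # x: (grid*grid, d_in) tokens arranged on a square grid.
    # Both branches 2x2-average-pool the grid (sequence length / 4),
    # then separate projections widen channels; summing the two
    # branches gives the residual form.
    d_in = x.shape[1]
    g = x.reshape(grid, grid, d_in)
    pooled = g.reshape(grid // 2, 2, grid // 2, 2, d_in).mean(axis=(1, 3))
    pooled = pooled.reshape(-1, d_in)        # ((grid/2)^2, d_in)
    return pooled @ w_main + pooled @ w_short

d_in, d_out, grid = 64, 128, 14
x = rng.standard_normal((grid * grid, d_in))
w_main = rng.standard_normal((d_in, d_out)) * 0.02
w_short = rng.standard_normal((d_in, d_out)) * 0.02
y = residual_spatial_reduce(x, grid, w_main, w_short)
print(y.shape)  # (49, 128)
```

The multi-stage layout mirrors CNNs such as ResNet: early stages keep many cheap tokens, later stages keep fewer, wider ones.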
2.3 Weight-Sharing NAS with Multi-Architectural Sampling
Algorithm Overview
Search Space
Multi-Architectural Sampling for Super-Network Training
Evolutionary Search
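The pipeline above trains one weight-sharing super-network and then runs evolutionary search over sub-networks under a complexity budget. A toy stdlib sketch of that search loop, where the encoding (per-block width choices), the budget, and the fitness function are all hypothetical stand-ins for evaluating sub-networks with inherited super-network weights:

```python
import random

random.seed(0)
CHOICES = [64, 96, 128]     # hypothetical per-block width options
BLOCKS, BUDGET = 6, 600     # hypothetical FLOPs-like budget

def cost(arch):
    return sum(arch)

def fitness(arch):
    # stand-in for validation accuracy of the sub-network
    # evaluated with weights inherited from the super-network
    return sum(w ** 0.5 for w in arch)

def sample():
    # rejection-sample a random architecture under the budget
    while True:
        arch = [random.choice(CHOICES) for _ in range(BLOCKS)]
        if cost(arch) <= BUDGET:
            return arch

def mutate(arch, p=0.3):
    child = [random.choice(CHOICES) if random.random() < p else w
             for w in arch]
    return child if cost(child) <= BUDGET else arch

pop = [sample() for _ in range(20)]
for gen in range(10):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:5]                      # keep the top candidates
    pop = parents + [mutate(random.choice(parents)) for _ in range(15)]

best = max(pop, key=fitness)
print(best, round(fitness(best), 2))
```

Multi-architectural sampling changes only the super-network training step: several sub-networks are sampled per iteration and their gradients accumulated, which the authors report stabilizes weight sharing.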
2.4 Extra Techniques
Token Labeling with CutMix and Mixup
Convolution before Tokenization
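Token labeling assigns a soft label to every patch token in addition to the image-level label; the image-level mixing operations it builds on can be sketched in NumPy as follows (shapes and the 10-class one-hot labels are illustrative, not the paper's training setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x1, y1, x2, y2, lam):
    # convex combination of both images and their soft labels
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def cutmix(x1, y1, x2, y2, lam):
    # paste a rectangle of x2 into x1; mix labels by pasted area
    H, W, _ = x1.shape
    ch, cw = int(H * np.sqrt(1 - lam)), int(W * np.sqrt(1 - lam))
    top = rng.integers(0, H - ch + 1)
    left = rng.integers(0, W - cw + 1)
    out = x1.copy()
    out[top:top + ch, left:left + cw] = x2[top:top + ch, left:left + cw]
    keep = 1 - ch * cw / (H * W)   # fraction of x1 that survives
    return out, keep * y1 + (1 - keep) * y2

x1, x2 = rng.random((32, 32, 3)), rng.random((32, 32, 3))
y1, y2 = np.eye(10)[3], np.eye(10)[7]       # one-hot labels
xm, ym = mixup(x1, y1, x2, y2, lam=0.6)
xc, yc = cutmix(x1, y1, x2, y2, lam=0.6)
```

With CutMix, token labeling can supervise each patch token with the label of whichever source image that patch came from, which is why the two techniques combine naturally.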
03
3.1 Ablation Study
Multi-Stage Network with Residual Connection and Token Labeling
Weight-Sharing NAS with Multi-Architectural Sampling
3.2 Comparison with Related Works
04
About the Author
Research area: operator of the FightingCV public account, working on multimodal content understanding, with a focus on tasks that combine the vision and language modalities and on bringing Vision-Language models into real-world use.
Zhihu / WeChat official account: FightingCV