CVPR 2021: A Better Backbone than CNNs and Transformers? Berkeley & Google Propose BoTNet, Reaching 84.7% Top-1 Accuracy
Details are as follows:
论文链接:https://arxiv.org/abs/2101.11605
项目链接:https://github.com/lucidrains/bottleneck-transformer-pytorch
01
02
2.1. Connection to the Transformer
- Normalization: The Transformer uses Layer Normalization, whereas BoTNet adopts the Batch Normalization common in ResNet.
- Non-Linearities: The Transformer typically applies non-linear activations only inside the FFN, whereas BoTNet, like ResNet, uses three activations per block.
- Output projections: In the Transformer, the MHSA layer applies a linear output projection after the self-attention operation; in BoTNet there is no projection after self-attention.
- Optimizer: Transformers are usually trained with Adam, whereas BoTNet, like ResNet, is trained with SGD with momentum.
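To make these points concrete, below is a minimal PyTorch sketch of a BoT block, i.e. a ResNet bottleneck whose 3x3 convolution is replaced by MHSA. It is a simplified illustration rather than the authors' code or the linked lucidrains implementation: the class names `MHSA`/`BoTBlock`, the head count, and the fixed 14x14 spatial size are assumptions, and the real c5 stage's stride/downsampling handling is omitted.

```python
import torch
import torch.nn as nn

class MHSA(nn.Module):
    """Multi-head self-attention over a 2D feature map (sketch only).
    Per the list above: no output projection after attention; normalization
    and activations live outside this module (BatchNorm + ReLU in the block)."""
    def __init__(self, dim, heads=4, h=14, w=14):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1, bias=False)
        # learned 2D relative position embeddings, factorized over height/width;
        # the spatial size (h, w) is fixed at construction in this sketch
        self.rel_h = nn.Parameter(torch.randn(1, heads, dim // heads, h, 1) * 0.02)
        self.rel_w = nn.Parameter(torch.randn(1, heads, dim // heads, 1, w) * 0.02)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        q = q.view(b, self.heads, c // self.heads, h * w)
        k = k.view(b, self.heads, c // self.heads, h * w)
        v = v.view(b, self.heads, c // self.heads, h * w)
        # content-content attention logits
        logits = torch.einsum('bhdi,bhdj->bhij', q, k) * self.scale
        # content-position logits from the relative position embeddings
        pos = (self.rel_h + self.rel_w).view(1, self.heads, c // self.heads, h * w)
        logits = logits + torch.einsum('bhdi,bhdj->bhij', q, pos.expand(b, -1, -1, -1)) * self.scale
        attn = logits.softmax(dim=-1)
        out = torch.einsum('bhij,bhdj->bhdi', attn, v)
        return out.reshape(b, c, h, w)          # note: no linear projection here

class BoTBlock(nn.Module):
    """ResNet bottleneck with the 3x3 convolution swapped for MHSA:
    BatchNorm throughout and three ReLUs per block."""
    def __init__(self, in_dim, dim, heads=4, h=14, w=14):
        super().__init__()
        self.conv1 = nn.Conv2d(in_dim, dim, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(dim)
        self.mhsa = MHSA(dim, heads=heads, h=h, w=w)
        self.bn2 = nn.BatchNorm2d(dim)
        self.conv3 = nn.Conv2d(dim, dim * 4, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(dim * 4)
        self.shortcut = (nn.Sequential(nn.Conv2d(in_dim, dim * 4, 1, bias=False),
                                       nn.BatchNorm2d(dim * 4))
                         if in_dim != dim * 4 else nn.Identity())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))   # activation 1
        out = self.relu(self.bn2(self.mhsa(out)))  # activation 2
        out = self.bn3(self.conv3(out))
        return self.relu(out + self.shortcut(x))   # activation 3

if __name__ == "__main__":
    block = BoTBlock(in_dim=1024, dim=512, h=14, w=14)
    y = block(torch.randn(2, 1024, 14, 14))
    print(y.shape)   # torch.Size([2, 2048, 14, 14])
```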
2.2. Connection to DETR
2.3. Connection to Non-Local Neural Nets
- The MHSA layer uses multiple heads, a value projection, and position encodings, none of which appear in the NL block.
- The NL block uses a channel reduction factor of 2, while the BoT block keeps the factor of 4.
- The NL block is inserted into the ResNet backbone as an additional block, rather than replacing existing convolutional blocks as BoTNet does.
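For contrast with the BoT block sketched above, here is a minimal Non-Local block written in the same style. It is an assumed structure based on the original Non-Local Networks formulation, not code from the BoTNet paper; the class name and layer names are illustrative.

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Minimal Non-Local block for comparison (sketch after Wang et al.):
    single head, no position encoding, inner channels reduced by a factor
    of 2, and a residual add so the block is *inserted* on top of the
    backbone instead of replacing an existing bottleneck."""
    def __init__(self, dim):
        super().__init__()
        inner = dim // 2                         # channel reduction factor 2 (BoT keeps 4)
        self.theta = nn.Conv2d(dim, inner, 1)    # query embedding
        self.phi = nn.Conv2d(dim, inner, 1)      # key embedding
        self.g = nn.Conv2d(dim, inner, 1)        # value embedding
        self.out = nn.Conv2d(inner, dim, 1)      # project back to the backbone width
        self.bn = nn.BatchNorm2d(dim)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2)                          # (b, inner, hw)
        k = self.phi(x).flatten(2)
        v = self.g(x).flatten(2)
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)   # (b, hw, hw), no positional term
        y = (v @ attn.transpose(1, 2)).view(b, c // 2, h, w)
        return x + self.bn(self.out(y))                       # residual add: an inserted block
```

A block like this is typically inserted after an existing ResNet stage (e.g. c4), whereas BoTNet replaces the 3x3 convolutions inside the final-stage bottleneck blocks themselves.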
03
04
4.1. BoTNet improves over ResNet on COCO Instance Segmentation with Mask R-CNN
4.2. Scale Jitter helps BoTNet more than ResNet
4.3. Relative Position Encodings Boost Performance
4.4. BoTNet improves backbones in ResNet Family
4.5. BoTNet scales well with larger images
4.6. Comparison with Non-Local Neural Networks
4.7. Image Classification on ImageNet
05