NeurIPS 2021 | MBT: How to Fuse Multimodal Data? Google Proposes an Attention-Bottleneck Approach That Is Simple, Effective, and Compute-Efficient
Paper link: https://arxiv.org/abs/2107.00135
Project link: not open-sourced
01 Introduction
02 Method
2.1 The ViT and AST architectures
2.2 Multimodal Transformer
2.2.1 Fusion via Vanilla Self-Attention
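The "vanilla" baseline simply concatenates the token sequences of both modalities and runs standard self-attention over the joint sequence. Below is a minimal sketch, assuming PyTorch; the token counts, model width, and layer count are illustrative placeholders, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

d_model = 768
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True),
    num_layers=12,
)

rgb = torch.randn(2, 196, d_model)   # patch tokens from sampled RGB frames
spec = torch.randn(2, 98, d_model)   # patch tokens from the audio spectrogram

# "Vanilla" fusion: concatenate the two modalities into one sequence and let
# every token attend to every other token in every layer.
fused = encoder(torch.cat([rgb, spec], dim=1))
```

Because every token attends to every other token, the cost grows quadratically with the combined sequence length; this inefficiency is what the bottleneck variant below targets.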
2.2.2 Fusion with Modality-specific Parameters
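In this variant each modality keeps its own transformer weights and exchanges information through cross-modal attention. A hedged sketch, again assuming PyTorch; `CrossModalLayer` and all sizes are my naming and simplification, not the paper's code:

```python
import torch
import torch.nn as nn

class CrossModalLayer(nn.Module):
    """One modality's layer: queries come from its own tokens, while
    keys/values come from the tokens of both modalities."""
    def __init__(self, d_model=768, nhead=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x, context):
        x = x + self.attn(self.norm1(x), context, context, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

# Modality-specific parameters: the RGB layer never shares weights with the
# spectrogram layer, but each still attends across both modalities.
rgb_layer, spec_layer = CrossModalLayer(), CrossModalLayer()
rgb, spec = torch.randn(2, 196, 768), torch.randn(2, 98, 768)
both = torch.cat([rgb, spec], dim=1)
rgb, spec = rgb_layer(rgb, both), spec_layer(spec, both)
```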
2.2.3 Fusion via Attention Bottlenecks
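This is the paper's core idea: a small set of B bottleneck tokens mediates all cross-modal exchange. Each modality attends only to its own tokens plus the bottlenecks, and the two modality-specific bottleneck updates are averaged. A minimal sketch, assuming PyTorch; `BottleneckFusionLayer` is a hypothetical name and the sizes are illustrative (the paper uses a small B, e.g. B = 4):

```python
import torch
import torch.nn as nn

class BottleneckFusionLayer(nn.Module):
    def __init__(self, d_model=768, nhead=12):
        super().__init__()
        # One standard transformer layer per modality (separate weights).
        self.rgb_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.spec_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)

    def forward(self, rgb, spec, btl):
        B = btl.shape[1]
        # Each modality attends only to its own tokens plus the bottlenecks,
        # so all cross-modal information must flow through the B bottlenecks.
        rgb_out = self.rgb_layer(torch.cat([rgb, btl], dim=1))
        spec_out = self.spec_layer(torch.cat([spec, btl], dim=1))
        rgb, btl_rgb = rgb_out[:, :-B], rgb_out[:, -B:]
        spec, btl_spec = spec_out[:, :-B], spec_out[:, -B:]
        # Average the two modality-specific bottleneck updates.
        return rgb, spec, (btl_rgb + btl_spec) / 2

layer = BottleneckFusionLayer()
rgb, spec = torch.randn(2, 196, 768), torch.randn(2, 98, 768)
btl = torch.randn(2, 4, 768)  # B = 4 bottleneck tokens
rgb, spec, btl = layer(rgb, spec, btl)
```

Because B is tiny compared with the per-modality token counts, the model is forced to compress cross-modal information, and the expensive full cross-modal attention collapses to interactions with just B tokens.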
2.3 Where to Fuse: Early, Mid and Late
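The depth at which fusion starts is a single hyperparameter: layers below Lf process each modality independently, and layers from Lf onward apply bottleneck fusion. A sketch reusing the hypothetical `BottleneckFusionLayer` from the previous snippet; Lf = 0 recovers early fusion and Lf = num_layers recovers late fusion:

```python
import torch
import torch.nn as nn

num_layers, Lf = 12, 8  # mid fusion; the paper ablates this choice

unimodal_rgb = nn.ModuleList(
    nn.TransformerEncoderLayer(768, 12, batch_first=True) for _ in range(Lf))
unimodal_spec = nn.ModuleList(
    nn.TransformerEncoderLayer(768, 12, batch_first=True) for _ in range(Lf))
fusion = nn.ModuleList(BottleneckFusionLayer() for _ in range(num_layers - Lf))

def forward(rgb, spec, btl):
    for vl, al in zip(unimodal_rgb, unimodal_spec):
        rgb, spec = vl(rgb), al(spec)           # layers 0..Lf-1: no cross-modal flow
    for layer in fusion:
        rgb, spec, btl = layer(rgb, spec, btl)  # layers Lf..: bottleneck fusion
    return rgb, spec, btl

rgb, spec = torch.randn(2, 196, 768), torch.randn(2, 98, 768)
btl = torch.randn(2, 4, 768)
rgb, spec, btl = forward(rgb, spec, btl)
```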
2.4 Classification
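For classification, each modality carries its own CLS token, and the pre-softmax logits from the two streams are averaged. A short sketch, assuming PyTorch; using a single shared linear head across modalities is my simplification:

```python
import torch
import torch.nn as nn

num_classes, d_model = 527, 768  # e.g. AudioSet has 527 classes
head = nn.Linear(d_model, num_classes)

rgb_cls = torch.randn(2, d_model)   # CLS token from the visual stream
spec_cls = torch.randn(2, d_model)  # CLS token from the audio stream

# Average the per-modality pre-softmax logits.
logits = (head(rgb_cls) + head(spec_cls)) / 2
```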
03 Experiments
3.1 Fusion Strategies
3.2 Input Sampling and Dataset Size
3.3 Results
3.4 Visualisation
04 Conclusion