NeurIPS 2021 MBT: How should multimodal data be fused? Google proposes an attention-bottleneck method that is simple, effective, and saves computation

Paper link: https://arxiv.org/abs/2107.00135
Project link: not open-sourced

01 Introduction

Transformer models usually fuse modalities by concatenating all tokens into one sequence and letting every token attend to every other token, which is expressive but quadratically expensive in the total token count. This paper proposes the Multimodal Bottleneck Transformer (MBT): cross-modal attention is forced through a small set of latent "bottleneck" tokens, so each modality must condense the information it shares with the other. On audio-visual classification benchmarks this both improves accuracy and reduces computation.

02 Method

2.1 The ViT and AST architectures

MBT builds on ViT for video frames and AST for audio. Both follow the same recipe: cut the input (RGB frames or a log-mel spectrogram) into non-overlapping patches, flatten and linearly project each patch to a d-dimensional token, prepend a learnable CLS token, add positional embeddings, and feed the sequence through a stack of transformer encoder layers.
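A minimal PyTorch sketch of this tokenization step, not the authors' implementation; the module name, default sizes, and token counts below are illustrative assumptions (for AST, set in_chans=1 and treat the spectrogram as the image):

```python
import torch
import torch.nn as nn

class PatchTokenizer(nn.Module):
    """ViT/AST-style tokenizer: split the input into patches, linearly
    project each patch, prepend a CLS token, add positional embeddings."""
    def __init__(self, in_chans=3, patch=16, dim=768, img_size=224):
        super().__init__()
        # A strided conv is the usual way to implement "flatten + project".
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch, stride=patch)
        n_tokens = (img_size // patch) ** 2
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_tokens + 1, dim))

    def forward(self, x):                                  # x: (B, C, H, W)
        t = self.proj(x).flatten(2).transpose(1, 2)        # (B, N, dim)
        cls = self.cls.expand(t.size(0), -1, -1)
        return torch.cat([cls, t], dim=1) + self.pos       # (B, N+1, dim)

z_rgb = PatchTokenizer()(torch.randn(2, 3, 224, 224))      # -> (2, 197, 768)
```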
2.2 Multimodal Transformer

The paper compares three ways of letting the two token streams (video tokens z_rgb and spectrogram tokens z_spec) interact inside the transformer.
2.2.1 Fusion via Vanilla Self-Attention

The baseline is plain concatenation: join the two token sequences into one and run an ordinary transformer over it, so attention flows freely between all video and audio tokens. This is the most flexible option but also the most expensive, since self-attention cost grows quadratically with the total number of tokens.
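A sketch of this baseline using PyTorch's stock encoder (sequence lengths and sizes are illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=12)

z_rgb = torch.randn(2, 197, 768)    # video tokens (incl. CLS)
z_spec = torch.randn(2, 101, 768)   # spectrogram tokens (incl. CLS)

# One joint sequence: every token attends to every other token,
# so cost scales with (197 + 101)^2 rather than 197^2 + 101^2.
z = encoder(torch.cat([z_rgb, z_spec], dim=1))   # (2, 298, 768)
```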
2.2.2 Fusion with Modality-specific Parameters

A second option gives each modality its own transformer weights and exchanges information through cross-attention: in every layer, each stream's queries attend over the tokens of both modalities. This decouples the parameters per modality while still allowing dense pairwise attention across modalities, as in the sketch below.
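A minimal sketch of one such layer with nn.MultiheadAttention; the class name is mine, and layer norms and MLP sub-blocks are omitted for brevity:

```python
import torch
import torch.nn as nn

class CrossModalLayer(nn.Module):
    """One fusion layer with modality-specific parameters: each stream
    has its own attention weights, but its queries attend over the
    concatenation of both token sequences."""
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.attn_rgb = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_spec = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, z_rgb, z_spec):
        z_all = torch.cat([z_rgb, z_spec], dim=1)           # shared key/value pool
        out_rgb, _ = self.attn_rgb(z_rgb, z_all, z_all)     # video queries
        out_spec, _ = self.attn_spec(z_spec, z_all, z_all)  # audio queries
        return out_rgb, out_spec

z_rgb, z_spec = CrossModalLayer()(torch.randn(2, 197, 768),
                                  torch.randn(2, 101, 768))
```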
2.2.3 Fusion via Attention Bottlenecks

The core proposal: introduce a small set of B learnable fusion tokens z_fsn (B = 4 in the paper) and restrict cross-modal attention so that each modality only attends to its own tokens plus the bottlenecks. In each fusion layer the video stream first updates the bottlenecks, then the audio stream reads the updated bottlenecks and updates them in turn:

[z_rgb^{l+1} || ẑ_fsn^{l+1}] = Transformer([z_rgb^l || z_fsn^l]; θ_rgb)
[z_spec^{l+1} || z_fsn^{l+1}] = Transformer([z_spec^l || ẑ_fsn^{l+1}]; θ_spec)

Everything one modality tells the other must be squeezed through these B tokens, which keeps attention cost close to unimodal while forcing the model to distill the most relevant cross-modal information.
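A minimal sketch of one bottleneck fusion layer following the two update equations above (the class name and sizes are mine; the paper's code was not released at the time of writing):

```python
import torch
import torch.nn as nn

class BottleneckFusionLayer(nn.Module):
    """One MBT-style fusion layer: the two streams never attend to each
    other directly; all cross-modal traffic is squeezed through B shared
    'fusion bottleneck' tokens (B = 4 in the paper)."""
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.block_rgb = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.block_spec = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, z_rgb, z_spec, z_fsn):
        b = z_fsn.size(1)
        # Video stream attends to [video tokens || bottlenecks] ...
        out = self.block_rgb(torch.cat([z_rgb, z_fsn], dim=1))
        z_rgb, z_fsn = out[:, :-b], out[:, -b:]
        # ... then the audio stream reads and updates the new bottlenecks.
        out = self.block_spec(torch.cat([z_spec, z_fsn], dim=1))
        z_spec, z_fsn = out[:, :-b], out[:, -b:]
        return z_rgb, z_spec, z_fsn

z_fsn = torch.zeros(2, 4, 768)                       # B = 4 bottleneck tokens
z_rgb, z_spec, z_fsn = BottleneckFusionLayer()(
    torch.randn(2, 197, 768), torch.randn(2, 101, 768), z_fsn)
```

Because the bottleneck tokens are few, each stream's attention stays close to unimodal cost, which is where the computational saving comes from.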
2.3 Where to Fuse: Early, Mid and Late

Fusion need not start at the first layer. MBT keeps the first L_f layers purely unimodal and only exchanges information through the bottlenecks in the remaining layers: L_f = 0 corresponds to early fusion, L_f = L to late fusion, and anything in between to mid fusion. The paper finds mid fusion works best and defaults to L_f = 8 in a 12-layer encoder; a sketch of this schedule follows below.
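A sketch of the layer schedule, reusing BottleneckFusionLayer from the previous snippet (the function name and the use of stock encoder layers for the unimodal part are my assumptions):

```python
import torch
import torch.nn as nn

L, Lf = 12, 8                      # total layers, fusion starts after layer Lf
uni_rgb = nn.ModuleList(
    nn.TransformerEncoderLayer(768, 12, batch_first=True) for _ in range(Lf))
uni_spec = nn.ModuleList(
    nn.TransformerEncoderLayer(768, 12, batch_first=True) for _ in range(Lf))
fusion = nn.ModuleList(BottleneckFusionLayer() for _ in range(L - Lf))

def mbt_forward(z_rgb, z_spec, z_fsn):
    for f_rgb, f_spec in zip(uni_rgb, uni_spec):   # layers 1..Lf: no exchange
        z_rgb, z_spec = f_rgb(z_rgb), f_spec(z_spec)
    for f in fusion:                               # layers Lf+1..L: bottlenecked
        z_rgb, z_spec, z_fsn = f(z_rgb, z_spec, z_fsn)
    return z_rgb, z_spec
```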
2.4 Classification

Each modality keeps its own CLS token throughout the network. For classification, a linear head is applied to each CLS token and the resulting logits are averaged to give the final prediction.
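Continuing the sketches above, a minimal version of this step (whether the head is shared across modalities is a detail I have not verified, so one head per modality is used here):

```python
import torch
import torch.nn as nn

z_rgb, z_spec = torch.randn(2, 197, 768), torch.randn(2, 101, 768)  # encoder outputs
head_rgb, head_spec = nn.Linear(768, 527), nn.Linear(768, 527)      # 527 = AudioSet classes

# One linear classifier per modality on its CLS token; average the logits.
logits = (head_rgb(z_rgb[:, 0]) + head_spec(z_spec[:, 0])) / 2      # (2, 527)
```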
03 Experiments
3.1 Fusion Strategies

Comparing the three fusion strategies, bottleneck fusion matches or beats unrestricted pairwise cross-modal attention at a fraction of the compute, and a handful of bottleneck tokens (B = 4) is enough, with more tokens bringing no further gain. Mid fusion also outperforms both early and late fusion.
3.2 Input Sampling and Dataset Size

Sampling the audio and visual inputs asynchronously, from different temporal windows of the same clip, acts as a form of augmentation and improves over strictly synchronized sampling. The paper also studies how performance scales with the amount of training data.
3.3 Results

MBT sets new state-of-the-art results on several audio-visual classification benchmarks, including AudioSet, Epic-Kitchens-100 and VGGSound, and the fused audio-visual model outperforms its single-modality counterparts.
3.4 Visualisation

Attention visualisations show that, compared with vanilla cross-attention, the bottlenecked model's visual attention concentrates more tightly on the regions that actually produce the sound, suggesting the bottlenecks encourage more semantically focused fusion.
04 Conclusion

MBT shows that restricting cross-modal attention to a few bottleneck tokens is a simple and general way to fuse modalities in transformers, improving accuracy while cutting the cost of full pairwise attention.
END