Scaling graph neural networks: brute-force stacking of model depth is not the answer
The interdependence between nodes makes it difficult to decompose the loss function into independent, per-node contributions (a sketch of this point is given below).
In this article, we present a simple graph neural network architecture developed at Twitter that works efficiently on large graphs.
Most of the methods described in the existing literature are not suited to such large-scale settings.
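To make the first point concrete, here is a minimal sketch in our own notation (f_θ is a GNN with parameters θ, ℓ a per-node loss, 𝒩(i) the neighbours of node i; none of these symbols appear in the original article):

```latex
\mathcal{L}(\theta) \;=\; \sum_{i \in \mathcal{V}_{\mathrm{train}}}
\ell\Big( y_i,\; f_\theta\big( x_i,\ \{\, x_j : j \in \mathcal{N}(i) \,\} \big) \Big)
```

Although the loss is written as a sum over labelled nodes, each term depends on the features of that node's neighbours (and, for a deeper model, on its multi-hop neighbourhood), so the terms cannot be treated as independent samples the way they are in standard mini-batch SGD.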
[1] The recently introduced Open Graph Benchmark now offers large-scale graphs with millions of nodes. It will probably take some time for the community to switch to it.
[2] T. Kipf and M. Welling, Semi-supervised classification with graph convolutional networks (2017). In Proc. ICLR. This paper introduced the popular GCN architecture, which was derived as a simplification of the ChebNet model proposed by M. Defferrard et al., Convolutional neural networks on graphs with fast localized spectral filtering (2016). In Proc. NIPS.
[3] As the diffusion operator, Kipf and Welling used the graph adjacency matrix with self-loops (i.e., the node itself contributes to its feature update), but other choices are possible as well. The diffusion operation can be made feature-dependent, of the form A(X)X (i.e., it is still a linear combination of the node features, but the weights depend on the features themselves), as in the MoNet [4] or GAT [5] models, or completely nonlinear, 𝒜(X), as in message-passing neural networks (MPNN) [6]. For simplicity, we focus the discussion on the GCN model applied to node-wise classification (a code sketch of this operator is given after these notes).
[4] F. Monti et al., Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs (2017). In Proc. CVPR.
[5] P. Veličković et al., Graph Attention Networks (2018). In Proc. ICLR.
[6] J. Gilmer et al., Neural message passing for quantum chemistry (2017). In Proc. ICML.
[7] Here we assume for simplicity that the graph is sparse with the number of edges |ℰ|=𝒪(n).
[8] W. Hamilton et al., Inductive Representation Learning on Large Graphs (2017). In Proc. NeurIPS.
[9] The number of neighbours in such graphs tends to grow exponentially with the neighbourhood expansion.
[10] Sampling with replacement means that some neighbour nodes can appear more than once, in particular if the number of neighbours is smaller than k (see the sketch after these notes).
[11] W.-L. Chiang et al., Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks (2019). In Proc. KDD.
[12] H. Zeng et al., GraphSAINT: Graph Sampling Based Inductive Learning Method (2020). In Proc. ICLR.
[13] Y. Rong et al., DropEdge: Towards deep graph convolutional networks on node classification (2020). In Proc. ICLR. An idea similar to Dropout, where a random subset of edges is used during training (see the sketch after these notes).
[14] U. Alon and E. Yahav, On the bottleneck of graph neural networks and its practical implications (2020). arXiv:2006.05205. Identified the over-squashing phenomenon in graph neural networks, which is similar to the one observed in sequential recurrent models.
[15] F. Frasca et al., SIGN: Scalable Inception Graph Neural Networks (2020). ICML Workshop on Graph Representation Learning and Beyond.
[16] O. Shchur et al., Pitfalls of graph neural network evaluation (2018). Workshop on Relational Representation Learning. Shows that simple GNN models perform on par with more complex ones.
[17] F. Wu et al., Simplifying graph convolutional networks (2019). In Proc. ICML.
[18] While we stress that SIGN does not need sampling for computational efficiency, there are other reasons why graph subsampling is useful. J. Klicpera et al., Diffusion improves graph learning (2020), In Proc. NeurIPS, show that sampled diffusion matrices improve the performance of graph neural networks. We observed the same phenomenon in early SIGN experiments.
[19] G. Bouritsas et al., Improving graph neural network expressivity via subgraph isomorphism counting (2020). arXiv:2006.09252. Shows how provably powerful GNNs can be obtained by structural node encoding.
[20] F. Monti, K. Otness, M. M. Bronstein, MotifNet: a motif-based graph convolutional network for directed graphs (2018). arXiv:1802.01572. Uses motif-based diffusion operators.
[21] C. Szegedy et al., Going deeper with convolutions (2015). In Proc. CVPR. This paper proposed the inception module in the now-classical GoogLeNet architecture. To be fair, we were not the first to think of graph inception modules. Our collaborator Anees Kazi from TU Munich, who was a visiting student at Imperial College last year, introduced them first.
[22] Note that reaching higher-order neighbours is normally achieved by depth-wise stacking of graph convolutional layers that operate on direct neighbours; in our architecture, this is achieved directly in the first layer through powers of graph operators (see the sketch below).
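The following is a minimal NumPy sketch of the simplest diffusion operator mentioned in note [3]: the symmetrically normalised adjacency matrix with self-loops, applied to the node features. The variable names, the dense-matrix representation and the ReLU at the end are our own illustrative choices; a practical implementation would use sparse matrices.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN-style diffusion step: Y = relu(D^{-1/2} (A + I) D^{-1/2} X W)."""
    n = A.shape[0]
    A_hat = A + np.eye(n)                    # adjacency with self-loops
    d = A_hat.sum(axis=1)                    # degrees of the augmented graph
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # D^{-1/2}
    S = D_inv_sqrt @ A_hat @ D_inv_sqrt      # normalised diffusion operator
    return np.maximum(S @ X @ W, 0.0)        # linear diffusion + ReLU

# Toy usage: 4 nodes, 3 input features, 2 output features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = np.random.randn(4, 3)
W = np.random.randn(3, 2)
Y = gcn_layer(A, X, W)  # shape (4, 2)
```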
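Note [10] refers to GraphSAGE-style neighbourhood sampling with replacement [8]. A rough sketch, assuming an adjacency-list representation (the function name and arguments are hypothetical, not taken from any of the cited papers):

```python
import numpy as np

def sample_neighbours(adj_list, node, k, rng):
    """Sample exactly k neighbours of `node` with replacement; if the node
    has fewer than k neighbours, some of them necessarily repeat."""
    return rng.choice(adj_list[node], size=k, replace=True)

# Toy usage: node 0 has only two neighbours, so a sample of size 5
# must contain repeated nodes.
rng = np.random.default_rng(0)
adj_list = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
print(sample_neighbours(adj_list, 0, k=5, rng=rng))
```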
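Note [13] can be illustrated by the sketch below, which keeps each edge of a dense symmetric adjacency matrix with probability 1 − p at every training step (a practical implementation would subsample a sparse edge list instead; the helper is our own, not DropEdge's published code).

```python
import numpy as np

def drop_edge(A, p, rng):
    """Randomly drop a fraction p of the undirected edges of A."""
    keep = rng.random(A.shape) < (1.0 - p)   # Bernoulli keep-mask per entry
    keep = np.triu(keep, k=1)                # decide each undirected edge once
    keep = keep | keep.T                     # symmetrise the mask
    return A * keep

# Toy usage: drop roughly 30% of the edges of a triangle graph.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
A_dropped = drop_edge(A, p=0.3, rng=rng)
```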
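Finally, note [22] can be made concrete with the sketch below, which precomputes the feature matrix diffused by successive powers of a (normalised) graph operator S, i.e. [X, SX, S²X, …, SʳX], once and outside of training. The plain concatenation and the function name are our simplification; the published SIGN model [15] applies a separate learnable transformation to each diffused block before the downstream MLP.

```python
import numpy as np

def precompute_diffused_features(S, X, r):
    """Compute [X, S X, S^2 X, ..., S^r X] and concatenate along the
    feature dimension; this is done once, so training afterwards only
    needs node-wise mini-batches and no message passing."""
    feats = [X]
    for _ in range(r):
        feats.append(S @ feats[-1])          # S^k X from S^{k-1} X
    return np.concatenate(feats, axis=1)

# After this precomputation, a plain MLP on the concatenated features
# can be trained with standard mini-batch SGD over nodes.
```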
Source: AI科技评论.