Graph Inductive Biases in Transformers without Message Passing

2023-05-27

Liheng Ma, Chen Lin, Derek Lim, Adriana Romero-Soriano, Puneet K. Dokania, Mark Coates, Philip Torr, Ser-Nam Lim

Code

github.com/liamma/grit
Official, in paper, pytorch, ★ 128
github.com/linusbao/MoSE
pytorch, ★ 6

Abstract

Transformers for graph data are increasingly widely studied and successful in numerous learning tasks. Graph inductive biases are crucial for Graph Transformers, and previous works incorporate them using message-passing modules and/or positional encodings. However, Graph Transformers that use message-passing inherit known issues of message-passing, and differ significantly from Transformers used in other domains, thus making transfer of research advances more difficult. On the other hand, Graph Transformers without message-passing often perform poorly on smaller datasets, where inductive biases are more crucial. To bridge this gap, we propose the Graph Inductive bias Transformer (GRIT) -- a new Graph Transformer that incorporates graph inductive biases without using message passing. GRIT is based on several architectural changes that are each theoretically and empirically justified, including: learned relative positional encodings initialized with random walk probabilities, a flexible attention mechanism that updates node and node-pair representations, and injection of degree information in each layer. We prove that GRIT is expressive -- it can express shortest path distances and various graph propagation matrices. GRIT achieves state-of-the-art empirical performance across a variety of graph datasets, thus showing the power that Graph Transformers without message-passing can deliver.
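The abstract mentions relative positional encodings initialized with random walk probabilities. The following is a minimal sketch of how such node-pair features could be computed; it is not taken from the official code. It assumes the one-step transition matrix is the row-normalized adjacency matrix and that the encoding for a pair (i, j) stacks the first few matrix powers. The function name `random_walk_encodings` and the parameter `num_steps` are hypothetical.

```python
def random_walk_encodings(adj, num_steps):
    """Sketch: enc[i][j] = [P^0[i][j], P^1[i][j], ..., P^(num_steps-1)[i][j]],
    where P is the row-normalized adjacency matrix (an assumption here)."""
    n = len(adj)

    # One-step transition probabilities: divide each row by the node degree.
    trans = []
    for row in adj:
        deg = sum(row)
        trans.append([a / deg if deg else 0.0 for a in row])

    # P^0 is the identity matrix.
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    enc = [[[] for _ in range(n)] for _ in range(n)]
    for _ in range(num_steps):
        # Record the current power for every node pair.
        for i in range(n):
            for j in range(n):
                enc[i][j].append(power[i][j])
        # Advance to the next power: power <- power @ trans.
        power = [
            [sum(power[i][k] * trans[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return enc


# Path graph 0 - 1 - 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
enc = random_walk_encodings(adj, num_steps=3)
print(enc[0][2])  # [0.0, 0.0, 0.5]
```

In this toy example the first nonzero entry of `enc[0][2]` sits at index 2, which is the hop distance between nodes 0 and 2. This illustrates why random walk probabilities can carry shortest-path information, as the abstract claims for GRIT.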

Tasks

Graph Classification, Graph Regression, Inductive Bias, Node Classification

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
CIFAR10 100k	GRIT	Accuracy (%)	76.47	—	Unverified
MNIST	GRIT	Accuracy (%)	98.11	—	Unverified
