Perceiver IO: A General Architecture for Structured Inputs & Outputs
Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira
Code
- github.com/deepmind/deepmind-research/tree/master/perceiver (official, JAX, ★ 0)
- github.com/huggingface/transformers (PyTorch, ★ 158,292)
- github.com/krasserm/perceiver-io (PyTorch, ★ 519)
- github.com/SforAiDl/vformer (PyTorch, ★ 164)
- github.com/esceptico/perceiver-io (PyTorch, ★ 128)
- github.com/MindSpore-scientific-2/code-7/tree/main/SISDTA (MindSpore, ★ 0)
- github.com/2796gaurav/code_examples/tree/main/Perceiver (no framework listed, ★ 0)
- github.com/MindCode-4/code-2/tree/main/perceiver (MindSpore, ★ 0)
- github.com/lucidrains/perceiver-pytorch (PyTorch, ★ 0)
Abstract
A central goal of machine learning is the development of systems that can solve many problems in as many data domains as possible. Current architectures, however, cannot be applied beyond a small set of stereotyped settings, as they bake in domain & task assumptions or scale poorly to large inputs or outputs. In this work, we propose Perceiver IO, a general-purpose architecture that handles data from arbitrary settings while scaling linearly with the size of inputs and outputs. Our model augments the Perceiver with a flexible querying mechanism that enables outputs of various sizes and semantics, doing away with the need for task-specific architecture engineering. The same architecture achieves strong results on tasks spanning natural language and visual understanding, multi-task and multi-modal reasoning, and StarCraft II. As highlights, Perceiver IO outperforms a Transformer-based BERT baseline on the GLUE language benchmark despite removing input tokenization and achieves state-of-the-art performance on Sintel optical flow estimation with no explicit mechanisms for multiscale correspondence.
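The linear scaling and the querying mechanism described in the abstract can be illustrated with a minimal sketch. It assumes single-head attention with random (untrained) weights and omits the layer normalization, MLPs, and positional encodings a real model would use. The names `attention`, `init_weights`, and `perceiver_io_sketch` are hypothetical and are not taken from any of the repositories listed above. The point is only the shape bookkeeping: M inputs are read into N latents at cost proportional to M·N, the latents are processed at cost proportional to N², and O output queries read from the latents at cost proportional to O·N, so the output size is set by the number of queries alone.

```python
import numpy as np


def attention(queries, keys_values, w_q, w_k, w_v):
    """Single-head scaled dot-product attention (no LayerNorm or MLP)."""
    q = queries @ w_q                          # (n_q, d)
    k = keys_values @ w_k                      # (n_kv, d)
    v = keys_values @ w_v                      # (n_kv, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_q, n_kv)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                         # (n_q, d)


def init_weights(rng, q_dim, kv_dim, d):
    """Random projections standing in for learned parameters (assumption)."""
    return (
        rng.normal(0.0, 0.02, (q_dim, d)),
        rng.normal(0.0, 0.02, (kv_dim, d)),
        rng.normal(0.0, 0.02, (kv_dim, d)),
    )


def perceiver_io_sketch(inputs, output_queries, num_latents=8, d=16,
                        num_process_layers=2, seed=0):
    """Encode -> process -> decode with a fixed-size latent array."""
    rng = np.random.default_rng(seed)
    _, c = inputs.shape             # M inputs with C channels
    _, e = output_queries.shape     # O queries with E channels

    # Latent array: learned in a real model, random here.
    latents = rng.normal(0.0, 0.02, (num_latents, d))

    # Encode: latents attend to inputs, cost ~ M * N.
    latents = attention(latents, inputs, *init_weights(rng, d, c, d))

    # Process: latent self-attention, cost ~ N^2, independent of M and O.
    for _ in range(num_process_layers):
        latents = latents + attention(latents, latents,
                                      *init_weights(rng, d, d, d))

    # Decode: each output query attends to the latents, cost ~ O * N.
    return attention(output_queries, latents, *init_weights(rng, e, d, d))


# Usage: 1000 input elements, 5 output queries -> 5 output vectors.
x = np.random.default_rng(1).normal(size=(1000, 3))
queries = np.random.default_rng(2).normal(size=(5, 4))
print(perceiver_io_sketch(x, queries).shape)  # (5, 16)
```

Changing the number of rows in `queries` changes the number of output rows without touching the encoder or the latent stack, which is the sense in which the same architecture can serve outputs of different sizes.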
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| KITTI 2015 | Perceiver IO | Average End-Point Error | 4.98 | — | Unverified |
| Sintel-clean | Perceiver IO | Average End-Point Error | 1.81 | — | Unverified |
| Sintel-final | Perceiver IO | Average End-Point Error | 2.42 | — | Unverified |