Pix2seq: A Language Modeling Framework for Object Detection

Published 2021-09-22 · ICLR 2022 · Code Available

Ting Chen, Saurabh Saxena, Lala Li, David J. Fleet, Geoffrey Hinton

Abstract

We present Pix2Seq, a simple and generic framework for object detection. Unlike existing approaches that explicitly integrate prior knowledge about the task, we cast object detection as a language modeling task conditioned on the observed pixel inputs. Object descriptions (e.g., bounding boxes and class labels) are expressed as sequences of discrete tokens, and we train a neural network to perceive the image and generate the desired sequence. Our approach is based mainly on the intuition that if a neural network knows about where and what the objects are, we just need to teach it how to read them out. Beyond the use of task-specific data augmentations, our approach makes minimal assumptions about the task, yet it achieves competitive results on the challenging COCO dataset, compared to highly specialized and well optimized detection algorithms.
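The abstract's core idea, expressing bounding boxes and class labels as discrete tokens, can be illustrated with a short sketch. This is a hypothetical illustration, not the authors' code: the bin count, normalized-coordinate convention, and token ordering `[ymin, xmin, ymax, xmax, class]` are assumptions chosen for clarity.

```python
# Sketch of converting one object annotation into a discrete token
# sequence, in the spirit of Pix2seq. Assumptions (not from the paper
# listing above): 1000 quantization bins, coordinates normalized to
# [0, 1], and class tokens offset past the coordinate vocabulary.
NUM_BINS = 1000

def quantize(coord, num_bins=NUM_BINS):
    """Map a normalized coordinate in [0, 1] to a discrete bin index."""
    return min(int(coord * num_bins), num_bins - 1)

def box_to_tokens(box, class_id, num_bins=NUM_BINS):
    """Express one object as 5 tokens: [ymin, xmin, ymax, xmax, class]."""
    ymin, xmin, ymax, xmax = box
    coord_tokens = [quantize(c, num_bins) for c in (ymin, xmin, ymax, xmax)]
    class_token = num_bins + class_id  # class vocab follows coordinate bins
    return coord_tokens + [class_token]

# Example: an object of class 0 covering the top-left image quadrant.
tokens = box_to_tokens((0.0, 0.0, 0.5, 0.5), class_id=0)
print(tokens)  # [0, 0, 500, 500, 1000]
```

A decoder trained on such sequences can then "read out" objects autoregressively, one token at a time, conditioned on the image.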

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| COCO minival | Pix2seq (ViT-L) | box AP | 50 | — | Unverified |
| COCO minival | Pix2seq (R50-C4) | box AP | 47.3 | — | Unverified |
| COCO minival | Pix2seq (ViT-B) | box AP | 47.1 | — | Unverified |
| COCO minival | Pix2seq (R101-DC5) | box AP | 45 | — | Unverified |
| COCO minival | Pix2seq (R50-DC5) | box AP | 43.2 | — | Unverified |
| COCO minival | Pix2seq (R50) | box AP | 42.6 | — | Unverified |

Reproductions