ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

2022-04-19

Chunyuan Li, Haotian Liu, Liunian Harold Li, Pengchuan Zhang, Jyoti Aneja, Jianwei Yang, Ping Jin, Houdong Hu, Zicheng Liu, Yong Jae Lee, Jianfeng Gao

Code

Repository	Badges	Framework	Stars
github.com/computer-vision-in-the-wild/cvinw_readings	Official, In paper	none	1,363
github.com/Computer-Vision-in-the-Wild/Elevater_Toolkit_IC	Official, In paper	pytorch	77
github.com/microsoft/GLIP	-	pytorch	2,585
github.com/microsoft/unicl	-	pytorch	408
github.com/eric-ai-lab/pevit	-	pytorch	106
github.com/sincerass/mvlpt	-	pytorch	55
github.com/microsoft/klite	-	pytorch	53
github.com/rsCPSyEu/ovd_cod	-	pytorch	2

Abstract

Learning visual representations from natural language supervision has recently shown great promise in a number of pioneering works. In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks. However, it remains challenging to evaluate the transferability of these models due to the lack of easy-to-use evaluation toolkits and public benchmarks. To tackle this, we build ELEVATER (Evaluation of Language-augmented Visual Task-level Transfer), the first benchmark and toolkit for evaluating (pre-trained) language-augmented visual models. ELEVATER is composed of three components. (i) Datasets. As downstream evaluation suites, it consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge. (ii) Toolkit. An automatic hyper-parameter tuning toolkit is developed to facilitate model evaluation on downstream tasks. (iii) Metrics. A variety of evaluation metrics are used to measure sample-efficiency (zero-shot and few-shot) and parameter-efficiency (linear probing and full model fine-tuning). ELEVATER is a platform for Computer Vision in the Wild (CVinW), and is publicly released at https://computer-vision-in-the-wild.github.io/ELEVATER/
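
To make the sample-efficiency settings named in the abstract concrete, the following is a minimal sketch of zero-shot classification and K-shot subset sampling. It is not the ELEVATER toolkit's API: the function names (`zero_shot_predict`, `sample_k_shot`), the toy two-dimensional embeddings, and the class names are all hypothetical, and it assumes that zero-shot prediction is done by picking the class whose text embedding has the highest cosine similarity to the image embedding.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def zero_shot_predict(image_emb, class_text_embs):
    """Pick the class whose text embedding is closest to the image embedding."""
    return max(class_text_embs,
               key=lambda name: cosine(image_emb, class_text_embs[name]))


def sample_k_shot(labels, k, seed=0):
    """Return sorted indices of a K-shot subset: at most k examples per class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    chosen = []
    for label in sorted(by_class):
        idxs = by_class[label]
        chosen.extend(rng.sample(idxs, min(k, len(idxs))))
    return sorted(chosen)


# Toy, hand-made embeddings (hypothetical; a real model would produce these).
text_embs = {"cat": [1.0, 0.1], "dog": [0.1, 1.0]}
images = [[0.9, 0.2], [0.2, 0.8], [0.7, 0.1], [0.0, 1.0]]
labels = ["cat", "dog", "cat", "dog"]

# Zero-shot: no training examples, only class-name text embeddings.
preds = [zero_shot_predict(img, text_embs) for img in images]
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Few-shot: a 1-shot training subset with one example per class.
few_shot_idx = sample_k_shot(labels, k=1)
print(preds, accuracy, few_shot_idx)
```

In this sketch the few-shot subset would then be used to train either a linear classifier on frozen features (linear probing) or the whole model (full fine-tuning), which are the two parameter-efficiency settings the abstract lists.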

Tasks

Fairness, Few-Shot Image Classification, Few-Shot Object Detection, image-classification, Image Classification, object-detection, Object Detection, Zero-Shot Image Classification, Zero-Shot Object Detection

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
ELEVATER	GLIP-T	AP	62.6	—	Unverified
ODinW Full-shot 35 Tasks	GLIP-T	AP	62.6	—	Unverified
