Visual Question Answering (VQA)
Visual Question Answering (VQA) is a multimodal task at the intersection of computer vision and natural language processing. Given an image and a natural-language question about it, a model must produce an answer in natural language. The goal of VQA is to build systems that understand image content well enough to answer such questions.
Image Source: visualqa.org
Datasets: GQA Test2019, VQA v2 test-dev, VQA v2 test-std, OK-VQA, MSVD-QA, DocVQA test, MSRVTT-QA, InfographicVQA, GQA test-dev, VizWiz 2020 VQA, A-OKVQA, CLEVR
Benchmark Results
GQA Test2019

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | human | Accuracy | 89.3 | — | Unverified |
| 2 | DREAM+Unicoder-VL (MSRA) | Accuracy | 76.04 | — | Unverified |
| 3 | TRRNet (Ensemble) | Accuracy | 74.03 | — | Unverified |
| 4 | MIL-nbgao | Accuracy | 73.81 | — | Unverified |
| 5 | Kakao Brain | Accuracy | 73.33 | — | Unverified |
| 6 | Coarse-to-Fine Reasoning, Single Model | Accuracy | 72.14 | — | Unverified |
| 7 | 270 | Accuracy | 70.23 | — | Unverified |
| 8 | NSM ensemble (updated) | Accuracy | 67.55 | — | Unverified |
| 9 | VinVL-DPT | Accuracy | 64.92 | — | Unverified |
| 10 | VinVL+L | Accuracy | 64.85 | — | Unverified |

VQA v2 test-dev

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLI | Accuracy | 84.3 | — | Unverified |
| 2 | BEiT-3 | Accuracy | 84.19 | — | Unverified |
| 3 | VLMo | Accuracy | 82.78 | — | Unverified |
| 4 | ONE-PEACE | Accuracy | 82.6 | — | Unverified |
| 5 | mPLUG (Huge) | Accuracy | 82.43 | — | Unverified |
| 6 | CuMo-7B | Accuracy | 82.2 | — | Unverified |
| 7 | X2-VLM (large) | Accuracy | 81.9 | — | Unverified |
| 8 | MMU | Accuracy | 81.26 | — | Unverified |
| 9 | Lyrics | Accuracy | 81.2 | — | Unverified |
| 10 | InternVL-C | Accuracy | 81.2 | — | Unverified |

VQA v2 test-std

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | BEiT-3 | overall | 84.03 | — | Unverified |
| 2 | mPLUG-Huge | overall | 83.62 | — | Unverified |
| 3 | ONE-PEACE | overall | 82.52 | — | Unverified |
| 4 | X2-VLM (large) | overall | 81.8 | — | Unverified |
| 5 | VLMo | overall | 81.3 | — | Unverified |
| 6 | SimVLM | overall | 80.34 | — | Unverified |
| 7 | X2-VLM (base) | overall | 80.2 | — | Unverified |
| 8 | VAST | overall | 80.19 | — | Unverified |
| 9 | VALOR | overall | 78.62 | — | Unverified |
| 10 | Prompt Tuning | overall | 78.53 | — | Unverified |
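
VQA v2 results are scored with a consensus-based accuracy rather than plain exact match. Each question has 10 human answers, and a prediction counts as fully correct when at least 3 annotators gave the same answer. The sketch below is a minimal Python illustration of that rule. It averages min(1, matches / 3) over the 10 leave-one-annotator-out subsets, which is the averaging the official VQA evaluation is understood to use. The function names `normalize`, `vqa_accuracy`, and `dataset_accuracy` are hypothetical. Answer normalization is also heavily simplified compared with the official script, so scores from this sketch are not directly comparable to leaderboard numbers.

```python
from statistics import mean


def normalize(answer: str) -> str:
    """Simplified answer normalization (assumption): lowercase and collapse whitespace.

    The official VQA evaluation also handles punctuation, articles,
    number words, and contractions; none of that is done here.
    """
    return " ".join(answer.lower().split())


def vqa_accuracy(prediction: str, human_answers: list[str]) -> float:
    """Consensus accuracy for a single question, in [0, 1].

    For each annotator, compare the prediction against the *other*
    annotators' answers and score min(1, matches / 3); return the mean
    over all annotators (leave-one-out averaging).
    """
    pred = normalize(prediction)
    gts = [normalize(a) for a in human_answers]
    scores = []
    for i in range(len(gts)):
        others = gts[:i] + gts[i + 1:]  # drop annotator i
        matches = sum(1 for a in others if a == pred)
        scores.append(min(1.0, matches / 3))
    return mean(scores)


def dataset_accuracy(predictions: list[str], references: list[list[str]]) -> float:
    """Mean per-question consensus accuracy over a dataset, as a percentage."""
    return 100 * mean(vqa_accuracy(p, r) for p, r in zip(predictions, references))


if __name__ == "__main__":
    # 3 of 10 annotators answered "yes", the other 7 answered "no".
    print(vqa_accuracy("yes", ["yes"] * 3 + ["no"] * 7))
```

For example, if exactly 3 of 10 annotators answered "yes", the prediction "yes" scores 0.9 rather than 1.0. When one of those 3 annotators is held out, only 2 matching answers remain, so 3 of the 10 leave-one-out subsets contribute 2/3 instead of 1.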