Visual Reasoning
The ability to understand actions and reasoning associated with arbitrary visual images.
Papers
698 papers are associated with this task.
Datasets

- Winoground
- NLVR2 Dev
- NLVR2 Test
- CLEVRER
- Bongard-OpenWorld
- WinoGAViL
- VSR
- PHYRE-1B-Cross
- PHYRE-1B-Within
- VASR
- IRFL (Image Recognition of Figurative Language)
- NLVR
Benchmark Results

Winoground

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4o + CA | Text Score | 75.5 | — | Unverified |
| 2 | GPT-4V (CoT, pick b/w two options) | Text Score | 75.25 | — | Unverified |
| 3 | GPT-4V (pick b/w two options) | Text Score | 69.25 | — | Unverified |
| 4 | MMICL + CoCoT | Text Score | 64.25 | — | Unverified |
| 5 | GPT-4V + CoCoT | Text Score | 58.5 | — | Unverified |
| 6 | OpenFlamingo + CoCoT | Text Score | 58.25 | — | Unverified |
| 7 | GPT-4V | Text Score | 54.5 | — | Unverified |
| 8 | FIBER (EqSim) | Text Score | 51.5 | — | Unverified |
| 9 | FIBER (finetuned, Flickr30k) | Text Score | 51.25 | — | Unverified |
| 10 | MMICL + CCoT | Text Score | 51 | — | Unverified |
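For context on the metric above: Winoground's Text Score credits an example only when, for each of its two images, the model scores the matching caption above the mismatched one. A minimal sketch of the per-example check (the `sim` lookup and the score values are illustrative assumptions, not from any specific implementation):

```python
def winoground_text_score(sim):
    """Per-example Text Score for one Winoground item.

    `sim[(i, c)]` is the model's image-caption score for image i and
    caption c (i, c in {0, 1}); caption c matches image i when c == i.
    Returns 1.0 only if each image prefers its own caption.
    """
    return float(sim[(0, 0)] > sim[(0, 1)] and sim[(1, 1)] > sim[(1, 0)])

# Illustrative scores: image 0 is handled correctly, image 1 is not.
sim = {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.8, (1, 1): 0.4}
winoground_text_score(sim)  # → 0.0, since image 1 prefers the wrong caption
```

The reported leaderboard numbers are this check averaged over the dataset, expressed as a percentage.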

NLVR2 Dev

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | BEiT-3 | Accuracy | 91.51 | — | Unverified |
| 2 | X2-VLM (large) | Accuracy | 88.7 | — | Unverified |
| 3 | XFM (base) | Accuracy | 87.6 | — | Unverified |
| 4 | X2-VLM (base) | Accuracy | 86.2 | — | Unverified |
| 5 | CoCa | Accuracy | 86.1 | — | Unverified |
| 6 | VLMo | Accuracy | 85.64 | — | Unverified |
| 7 | VK-OOD | Accuracy | 84.6 | — | Unverified |
| 8 | SimVLM | Accuracy | 84.53 | — | Unverified |
| 9 | X-VLM (base) | Accuracy | 84.41 | — | Unverified |
| 10 | VK-OOD | Accuracy | 83.9 | — | Unverified |

NLVR2 Test

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | BEiT-3 | Accuracy | 92.58 | — | Unverified |
| 2 | X2-VLM (large) | Accuracy | 89.4 | — | Unverified |
| 3 | XFM (base) | Accuracy | 88.4 | — | Unverified |
| 4 | X2-VLM (base) | Accuracy | 87 | — | Unverified |
| 5 | CoCa | Accuracy | 87 | — | Unverified |
| 6 | VLMo | Accuracy | 86.86 | — | Unverified |
| 7 | SimVLM | Accuracy | 85.15 | — | Unverified |
| 8 | X-VLM (base) | Accuracy | 84.76 | — | Unverified |
| 9 | BLIP-129M | Accuracy | 83.09 | — | Unverified |
| 10 | ALBEF (14M) | Accuracy | 82.55 | — | Unverified |

CLEVRER

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AI Core | Average per question | 95.24 | — | Unverified |
| 2 | redherring | Average per question | 91.14 | — | Unverified |
| 3 | VRDP | Average per question | 90.24 | — | Unverified |
| 4 | Fighttttt | Average per question | 88.71 | — | Unverified |
| 5 | neural | Average per question | 88.27 | — | Unverified |
| 6 | NERV | Average per question | 88.05 | — | Unverified |
| 7 | DCL | Average per question | 75.52 | — | Unverified |
| 8 | troublesolver | Average per question | 73.3 | — | Unverified |
| 9 | v0.1 | Average per question | 73.1 | — | Unverified |
| 10 | First_test | Average per question | 69.65 | — | Unverified |

Bongard-OpenWorld

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Gemini-2.0 + CA | 2-Class Accuracy | 93.6 | — | Unverified |
| 2 | GPT-4o + CA | 2-Class Accuracy | 92.8 | — | Unverified |
| 3 | Human | 2-Class Accuracy | 91 | — | Unverified |
| 4 | SNAIL | 2-Class Accuracy | 64 | — | Unverified |
| 5 | InstructBLIP + GPT-4 | 2-Class Accuracy | 63.8 | — | Unverified |
| 6 | BLIP-2 + ChatGPT (Fine-tuned) | 2-Class Accuracy | 63.3 | — | Unverified |
| 7 | InstructBLIP + ChatGPT + Neuro-Symbolic | 2-Class Accuracy | 55.5 | — | Unverified |
| 8 | ChatCaptioner + ChatGPT | 2-Class Accuracy | 49.3 | — | Unverified |
| 9 | Otter | 2-Class Accuracy | 49.3 | — | Unverified |

WinoGAViL

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Humans | Jaccard Index | 90 | — | Unverified |
| 2 | ViLT (Zero-Shot) | Jaccard Index | 52 | — | Unverified |
| 3 | X-VLM (Zero-Shot) | Jaccard Index | 46 | — | Unverified |
| 4 | CLIP-ViT-B/32 (Zero-Shot) | Jaccard Index | 41 | — | Unverified |
| 5 | CLIP-ViT-L/14 (Zero-Shot) | Jaccard Index | 40 | — | Unverified |
| 6 | CLIP-RN50x64/14 (Zero-Shot) | Jaccard Index | 38 | — | Unverified |
| 7 | CLIP-RN50 (Zero-Shot) | Jaccard Index | 35 | — | Unverified |
| 8 | CLIP-ViL (Zero-Shot) | Jaccard Index | 15 | — | Unverified |
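WinoGAViL's Jaccard Index measures the overlap between the set of candidate images a model associates with a cue and the set chosen by human annotators (intersection over union of the two sets). A minimal sketch (the image identifiers are made up for illustration):

```python
def jaccard_index(pred, gold):
    """Jaccard index between a predicted and a gold set of image IDs.

    Returns |pred ∩ gold| / |pred ∪ gold|; defined as 1.0 when both
    sets are empty (full agreement on "no associations").
    """
    pred, gold = set(pred), set(gold)
    if not pred and not gold:
        return 1.0
    return len(pred & gold) / len(pred | gold)

# The model picks 3 images; 2 overlap with the 3 human-chosen images:
jaccard_index({"img1", "img2", "img4"}, {"img1", "img2", "img3"})
# → 2 / 4 = 0.5
```

Leaderboard values are this score averaged over examples and reported as a percentage.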

VSR

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | LXMERT | Accuracy | 70.1 | — | Unverified |
| 2 | ViLT | Accuracy | 69.3 | — | Unverified |
| 3 | CLIP (finetuned) | Accuracy | 65.1 | — | Unverified |
| 4 | CLIP (frozen) | Accuracy | 56 | — | Unverified |
| 5 | VisualBERT | Accuracy | 55.2 | — | Unverified |

PHYRE-1B-Cross

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | RPIN | AUCCESS | 42.2 | — | Unverified |
| 2 | Dec[Joint]1f | AUCCESS | 40.3 | — | Unverified |
| 3 | Dynamics-Aware DQN | AUCCESS | 39.9 | — | Unverified |
| 4 | DQN | AUCCESS | 36.8 | — | Unverified |

PHYRE-1B-Within

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | RPIN | AUCCESS | 85.2 | — | Unverified |
| 2 | Dynamics-Aware DQN | AUCCESS | 85.2 | — | Unverified |
| 3 | Dec[Joint]1f | AUCCESS | 80 | — | Unverified |
| 4 | DQN | AUCCESS | 77.6 | — | Unverified |
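The AUCCESS metric used for PHYRE aggregates an agent's success rate over a budget of up to 100 solution attempts per task, weighting success with few attempts more heavily (weights w_k = log(k+1) − log(k)). A sketch following the definition published with the benchmark; the list-of-percentages input format is an assumption for illustration:

```python
import math

def auccess(success_at_k):
    """AUCCESS over a 100-attempt budget.

    `success_at_k[k-1]` is the percentage of tasks solved within the
    first k attempts (k = 1..100). Returns the weighted average with
    weights w_k = log(k + 1) - log(k), which decay with k so that
    solving tasks in fewer attempts counts for more.
    """
    assert len(success_at_k) == 100
    weights = [math.log(k + 1) - math.log(k) for k in range(1, 101)]
    return sum(w * s for w, s in zip(weights, success_at_k)) / sum(weights)

# An agent that solves every task on its first attempt scores 100.0;
# one that never solves anything scores 0.0.
auccess([100.0] * 100)
```

This is why the cross-template split (novel task templates at test time) scores so much lower than the within-template split for the same models.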

VASR

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Humans | 1-of-100 Accuracy | 100 | — | Unverified |

NLVR

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VisualBERT | Accuracy (Dev) | 67.4 | — | Unverified |