Video Question Answering
Papers
460 papers in this task; the tables below list the top 10 entries per benchmark.
Datasets: NExT-QA, ActivityNet-QA, TVBench, MVBench, MSRVTT-QA, STAR Benchmark, OVBench, AGQA 2.0 balanced, How2QA, iVQA, MSRVTT-MC, IntentQA
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | LinVT-Qwen2-VL (7B) | Accuracy | 85.5 | — | Unverified |
| 2 | InternVL-2.5 (8B) | Accuracy | 85.5 | — | Unverified |
| 3 | VideoLLaMA3 (7B) | Accuracy | 84.5 | — | Unverified |
| 4 | PLM-8B | Accuracy | 84.1 | — | Unverified |
| 5 | BIMBA-LLaVA-Qwen2-7B | Accuracy | 83.73 | — | Unverified |
| 6 | PLM-3B | Accuracy | 83.4 | — | Unverified |
| 7 | LLaVA-Video | Accuracy | 83.2 | — | Unverified |
| 8 | NVILA (8B) | Accuracy | 82.2 | — | Unverified |
| 9 | Oryx-1.5 (7B) | Accuracy | 81.8 | — | Unverified |
| 10 | Qwen2-VL (7B) | Accuracy | 81.2 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-2 + CLIP-14 + CLIP-multilingual (Zero-Shot) | Accuracy | 61.2 | — | Unverified |
| 2 | GPT-2 + CLIP-32 (Zero-Shot) | Accuracy | 58.4 | — | Unverified |
| 3 | VideoCoCa | Accuracy | 56.1 | — | Unverified |
| 4 | Mirasol3B | Accuracy | 51.13 | — | Unverified |
| 5 | VAST | Accuracy | 50.4 | — | Unverified |
| 6 | COSA | Accuracy | 49.9 | — | Unverified |
| 7 | MA-LMM | Accuracy | 49.8 | — | Unverified |
| 8 | VideoChat2 | Accuracy | 49.1 | — | Unverified |
| 9 | VALOR | Accuracy | 48.6 | — | Unverified |
| 10 | UMT-L (ViT-L/16) | Accuracy | 47.9 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Seed1.5-VL thinking | Average Accuracy | 63.6 | — | Unverified |
| 2 | PLM-8B | Average Accuracy | 63.5 | — | Unverified |
| 3 | Seed1.5-VL | Average Accuracy | 61.5 | — | Unverified |
| 4 | V-JEPA 2 ViT-g 8B | Average Accuracy | 60.6 | — | Unverified |
| 5 | PLM-3B | Average Accuracy | 58.9 | — | Unverified |
| 6 | RRPO | Average Accuracy | 56.5 | — | Unverified |
| 7 | Tarsier-34B | Average Accuracy | 55.5 | — | Unverified |
| 8 | Tarsier2-7B | Average Accuracy | 54.7 | — | Unverified |
| 9 | Qwen2-VL-72B | Average Accuracy | 52.7 | — | Unverified |
| 10 | IXC-2.5 7B | Average Accuracy | 51.6 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | LinVT-Qwen2-VL (7B) | Avg. | 69.3 | — | Unverified |
| 2 | Tarsier (34B) | Avg. | 67.6 | — | Unverified |
| 3 | InternVideo2 | Avg. | 67.2 | — | Unverified |
| 4 | LongVU (7B) | Avg. | 66.9 | — | Unverified |
| 5 | Oryx (34B) | Avg. | 64.7 | — | Unverified |
| 6 | VideoLLaMA2 (72B) | Avg. | 62.0 | — | Unverified |
| 7 | VideoChat-T (7B) | Avg. | 59.9 | — | Unverified |
| 8 | mPLUG-Owl3 (7B) | Avg. | 59.5 | — | Unverified |
| 9 | PPLLaVA (7B) | Avg. | 59.2 | — | Unverified |
| 10 | VideoGPT+ | Avg. | 58.7 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Mirasol3B | Accuracy | 50.42 | — | Unverified |
| 2 | VAST | Accuracy | 50.1 | — | Unverified |
| 3 | COSA | Accuracy | 49.2 | — | Unverified |
| 4 | VALOR | Accuracy | 49.2 | — | Unverified |
| 5 | MA-LMM | Accuracy | 48.5 | — | Unverified |
| 6 | mPLUG-2 | Accuracy | 48.0 | — | Unverified |
| 7 | FrozenBiLM | Accuracy | 47.0 | — | Unverified |
| 8 | HBI | Accuracy | 46.2 | — | Unverified |
| 9 | EMCL-Net | Accuracy | 45.8 | — | Unverified |
| 10 | VindLU | Accuracy | 44.6 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VLAP (4 frames) | Average Accuracy | 67.1 | — | Unverified |
| 2 | LLaMA-VQA | Average Accuracy | 65.4 | — | Unverified |
| 3 | SeViLA | Average Accuracy | 64.9 | — | Unverified |
| 4 | InternVideo | Average Accuracy | 58.7 | — | Unverified |
| 5 | GF (sup) | Average Accuracy | 53.94 | — | Unverified |
| 6 | GF (uns) | Average Accuracy | 53.86 | — | Unverified |
| 7 | MIST | Average Accuracy | 51.13 | — | Unverified |
| 8 | Temp[ATP] | Average Accuracy | 48.37 | — | Unverified |
| 9 | AnyMAL-70B (0-shot) | Average Accuracy | 48.2 | — | Unverified |
| 10 | All-in-one | Average Accuracy | 47.5 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Seed1.5-VL | AVG | 60.0 | — | Unverified |
| 2 | VideoChat-Online (4B) | AVG | 54.9 | — | Unverified |
| 3 | Gemini-1.5-Flash | AVG | 50.7 | — | Unverified |
| 4 | Qwen2-VL (7B) | AVG | 49.7 | — | Unverified |
| 5 | LLaVA-OneVision (7B) | AVG | 49.5 | — | Unverified |
| 6 | InternVL2 (7B) | AVG | 48.7 | — | Unverified |
| 7 | InternVL2 (4B) | AVG | 44.1 | — | Unverified |
| 8 | LongVA (7B) | AVG | 43.6 | — | Unverified |
| 9 | LLaMA-VID (7B) | AVG | 41.9 | — | Unverified |
| 10 | MiniCPM-V 2.6 (7B) | AVG | 39.1 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GF (sup) - Faster RCNN | Average Accuracy | 55.08 | — | Unverified |
| 2 | MIST - CLIP | Average Accuracy | 54.39 | — | Unverified |
| 3 | GF (uns) - S3D | Average Accuracy | 53.33 | — | Unverified |
| 4 | SViTT | Average Accuracy | 52.7 | — | Unverified |
| 5 | MIST - AIO | Average Accuracy | 50.96 | — | Unverified |
| 6 | SHG-VQA (trained from scratch) | Average Accuracy | 49.2 | — | Unverified |
| 7 | AIO - ViT | Average Accuracy | 48.59 | — | Unverified |
| 8 | MMTF | Average Accuracy | 44.36 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Text + Text (no Multimodal Pretext Training) | Accuracy | 93.2 | — | Unverified |
| 2 | FrozenBiLM | Accuracy | 86.7 | — | Unverified |
| 3 | Just Ask | Accuracy | 84.4 | — | Unverified |
| 4 | SeViLA | Accuracy | 83.7 | — | Unverified |
| 5 | Hero w/ pre-training | Accuracy | 77.75 | — | Unverified |
| 6 | ATP | Accuracy | 65.1 | — | Unverified |
| 7 | FrozenBiLM (0-shot) | Accuracy | 58.4 | — | Unverified |
| 8 | Just Ask (0-shot) | Accuracy | 51.1 | — | Unverified |
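
The accuracy figures reported above are typically top-1 accuracy: the percentage of questions for which the model's answer matches the ground truth (exact match for multiple-choice settings). A minimal sketch of that computation, using hypothetical prediction and answer lists rather than any real benchmark data:

```python
def top1_accuracy(predictions, answers):
    """Return top-1 accuracy as a percentage (0-100)."""
    assert len(predictions) == len(answers), "one prediction per question"
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)

# Hypothetical multiple-choice outputs: 3 of 4 match the gold answers.
preds = ["B", "A", "D", "C"]
gold = ["B", "C", "D", "C"]
print(top1_accuracy(preds, gold))  # 75.0
```

Note that "Average Accuracy"/"Avg." entries aggregate per-category or per-subtask accuracies (as in multi-task benchmarks such as MVBench or STAR), so scores are not directly comparable across tables.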