SOTAVerified

Video Question Answering

Papers

Showing 1-10 of 460 papers

Title | Status | Hype
Decoupled Seg Tokens Make Stronger Reasoning Video Segmenter and Grounder | Code | 1
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs | Code | 2
How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | - | 0
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models | Code | 2
CogStream: Context-guided Streaming Video Question Answering | - | 0
CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models | Code | 2
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | Code | 7
Looking Beyond Visible Cues: Implicit Video Question Answering via Dual-Clue Reasoning | Code | 0
EgoVLM: Policy Optimization for Egocentric Video Understanding | Code | 0
VUDG: A Dataset for Video Understanding Domain Generalization | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | InternVL-2.5 (8B) | Accuracy | 85.5 | - | Unverified
2 | LinVT-Qwen2-VL (7B) | Accuracy | 85.5 | - | Unverified
3 | VideoLLaMA3 (7B) | Accuracy | 84.5 | - | Unverified
4 | PLM-8B | Accuracy | 84.1 | - | Unverified
5 | BIMBA-LLaVA-Qwen2-7B | Accuracy | 83.73 | - | Unverified
6 | PLM-3B | Accuracy | 83.4 | - | Unverified
7 | LLaVA-Video | Accuracy | 83.2 | - | Unverified
8 | NVILA (8B) | Accuracy | 82.2 | - | Unverified
9 | Oryx-1.5 (7B) | Accuracy | 81.8 | - | Unverified
10 | Qwen2-VL (7B) | Accuracy | 81.2 | - | Unverified
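On multiple-choice video QA benchmarks, an Accuracy figure like the ones above is usually the percentage of questions whose predicted answer matches the reference answer. This page does not name the benchmark or its scoring script, so the following is only a minimal sketch under that assumption. The function name `accuracy` and the case-insensitive matching of option letters are illustrative choices, not the official evaluation code of any model or benchmark listed here.

```python
def accuracy(predictions, answers):
    """Return the percentage of predictions that exactly match the reference answers.

    Assumption (not from the leaderboard): one prediction per question, aligned
    by index, with answers given as option letters such as "A".."E".
    Matching ignores surrounding whitespace and letter case.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("no questions to score")
    # Count questions where the normalized prediction equals the normalized answer.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


if __name__ == "__main__":
    preds = ["A", "c", "B", "D"]
    gold = ["A", "C", "D", "D"]
    print(accuracy(preds, gold))  # 75.0 (3 of 4 correct)
```

Real benchmarks may add steps this sketch omits, such as extracting an option letter from free-form model output or averaging over question categories, and can therefore produce different numbers from the same predictions.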