SOTAVerified

Video Question Answering

Papers

Showing 1–10 of 460 papers

Title | Status | Hype
----- | ------ | ----
Decoupled Seg Tokens Make Stronger Reasoning Video Segmenter and Grounder | Code | 1
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs | Code | 2
How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | | 0
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models | Code | 2
CogStream: Context-guided Streaming Video Question Answering | | 0
CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models | Code | 2
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | Code | 7
Looking Beyond Visible Cues: Implicit Video Question Answering via Dual-Clue Reasoning | Code | 0
EgoVLM: Policy Optimization for Egocentric Video Understanding | Code | 0
VUDG: A Dataset for Video Understanding Domain Generalization | | 0

Benchmark Results

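# | Model | Metric | Claimed | Verified | Status
- | ----- | ------ | ------- | -------- | ------
1 | Seed1.5-VL | AVG | 60 | - | Unverified
2 | VideoChat-Online (4B) | AVG | 54.9 | - | Unverified
3 | Gemini-1.5-Flash | AVG | 50.7 | - | Unverified
4 | Qwen2-VL (7B) | AVG | 49.7 | - | Unverified
5 | LLaVA-OneVision (7B) | AVG | 49.5 | - | Unverified
6 | InternVL2 (7B) | AVG | 48.7 | - | Unverified
7 | InternVL2 (4B) | AVG | 44.1 | - | Unverified
8 | LongVA (7B) | AVG | 43.6 | - | Unverified
9 | LLaMA-VID (7B) | AVG | 41.9 | - | Unverified
10 | MiniCPM-V 2.6 (7B) | AVG | 39.1 | - | Unverified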