SOTAVerified

Video Question Answering

Papers

Showing 1-10 of 460 papers

| Title | Status | Hype |
|---|---|---|
| Decoupled Seg Tokens Make Stronger Reasoning Video Segmenter and Grounder | Code | 1 |
| LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs | Code | 2 |
| How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | | 0 |
| video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models | Code | 2 |
| CogStream: Context-guided Streaming Video Question Answering | | 0 |
| CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models | Code | 2 |
| V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | Code | 7 |
| Looking Beyond Visible Cues: Implicit Video Question Answering via Dual-Clue Reasoning | Code | 0 |
| EgoVLM: Policy Optimization for Egocentric Video Understanding | Code | 0 |
| VUDG: A Dataset for Video Understanding Domain Generalization | | 0 |

Benchmark Results

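| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VideoChat2_HD_mistral | Accuracy | 83.4 | | Unverified |
| 2 | VideoChat2_mistral | Accuracy | 81.9 | | Unverified |
| 3 | Human | Accuracy | 78.5 | | Unverified |
| 4 | IntentQA | Accuracy | 57.6 | | Unverified |
| 5 | VGT | Accuracy | 51.3 | | Unverified |
| 6 | HQGA | Accuracy | 47.7 | | Unverified |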