SOTAVerified

Zero-Shot Video Question Answer

This task presents zero-shot question answering results on the TGIF-QA dataset for LLM-powered video conversational models.

Papers

Showing 11-20 of 85 papers

Title | Status | Hype
TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models | Code | 1
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | Code | 2
GPT-4o System Card | | 0
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Code | 2
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding | Code | 3
Video Instruction Tuning With Synthetic Data | | 0
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | Code | 11
Question-Answering Dense Video Events | Code | 0
LLaVA-OneVision: Easy Visual Task Transfer | Code | 0
MiniCPM-V: A GPT-4V Level MLLM on Your Phone | Code | 12

No leaderboard results yet.