SOTAVerified

Zero-Shot Video Question Answer

This task presents zero-shot question answering results on the TGIF-QA dataset for LLM-powered video conversational models.

Papers

Showing 1-25 of 85 papers

Title | Status | Hype
VideoMultiAgents: A Multi-Agent Framework for Video Question Answering | Code | 1
Qwen2.5-Omni Technical Report | Code | 7
Agentic Keyframe Search for Video Question Answering | Code | 1
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning | Code | 3
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering | Code | 1
ENTER: Event Based Interpretable Reasoning for VideoQA |  | 0
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token | Code | 4
VidCtx: Context-aware Video Question Answering with Image Models | Code | 0
LinVT: Empower Your Image-level Large Language Model to Understand Videos | Code | 2
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension | Code | 3
TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models | Code | 1
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | Code | 2
GPT-4o System Card |  | 0
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Code | 2
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding | Code | 3
Video Instruction Tuning With Synthetic Data |  | 0
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | Code | 11
Question-Answering Dense Video Events | Code | 0
LLaVA-OneVision: Easy Visual Task Transfer | Code | 0
MiniCPM-V: A GPT-4V Level MLLM on Your Phone | Code | 12
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | Code | 3
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models | Code | 7
Tarsier: Recipes for Training and Evaluating Large Video Description Models | Code | 4
Long Context Transfer from Language to Vision | Code | 4
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding |  | 0
Page 1 of 4

No leaderboard results yet.