Zero-Shot Video Question Answer

This task presents zero-shot question answering results on the TGIF-QA dataset for LLM-powered video conversational models.

Papers

Showing 1–25 of 85 papers

Title	Date	Tasks	Status	Hype
VideoMultiAgents: A Multi-Agent Framework for Video Question Answering	Apr 25, 2025	Caption Generation, EgoSchema	Code Available	1
Qwen2.5-Omni Technical Report	Mar 26, 2025	Automatic Speech Recognition (ASR), GSM8K	Code Available	7
Agentic Keyframe Search for Video Question Answering	Mar 20, 2025	EgoSchema, Question Answering	Code Available	1
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning	Mar 17, 2025	Grounded Video Question Answering, Question Answering	Code Available	3
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering	Mar 12, 2025	Video Question Answering, Zero-Shot Video Question Answer	Code Available	1
ENTER: Event Based Interpretable Reasoning for VideoQA	Jan 24, 2025	Code Generation, EgoSchema	Unverified	0
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token	Jan 7, 2025	GPU, Visual Question Answering (VQA)	Code Available	4
VidCtx: Context-aware Video Question Answering with Image Models	Dec 23, 2024	Large Language Model, Question Answering	Code Available	0
LinVT: Empower Your Image-level Large Language Model to Understand Videos	Dec 6, 2024	Language Modeling, Language Modelling	Code Available	2
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension	Nov 20, 2024	GPU, MME	Code Available	3
TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models	Nov 17, 2024	MVBench, Video-based Generative Performance Benchmarking	Code Available	1
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance	Nov 4, 2024	Caption Generation, Multiple-choice	Code Available	2
GPT-4o System Card	Oct 25, 2024	Multiple-choice, Spatial Reasoning	Unverified	0
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning	Oct 25, 2024	EgoSchema, Hallucination	Code Available	2
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding	Oct 22, 2024	Token Reduction, Video Question Answering	Code Available	3
Video Instruction Tuning With Synthetic Data	Oct 3, 2024	3D Question Answering (3D-QA)	Unverified	0
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution	Sep 18, 2024	Natural Language Visual Grounding	Code Available	11
Question-Answering Dense Video Events	Sep 6, 2024	Benchmarking, Question Answering	Code Available	0
LLaVA-OneVision: Easy Visual Task Transfer	Aug 6, 2024	3D Question Answering (3D-QA)	Code Available	0
MiniCPM-V: A GPT-4V Level MLLM on Your Phone	Aug 3, 2024	Hallucination, Multiple-choice	Code Available	12
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models	Jul 22, 2024	Language Modeling	Code Available	3
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models	Jul 10, 2024	Video Question Answering, Zero-Shot Video Question Answer	Code Available	7
Tarsier: Recipes for Training and Evaluating Large Video Description Models	Jun 30, 2024	Video Captioning, Video Description	Code Available	4
Long Context Transfer from Language to Vision	Jun 24, 2024	Language Modeling, Language Modelling	Code Available	4
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding	Jun 14, 2024	Activity Recognition, MMR total	Unverified	0

No leaderboard results yet.