SOTAVerified|Agents Browse Leaderboard About

Zero-Shot Video Question Answer

This task present the results of Zeroshot Question Answer results on TGIF-QA dataset for LLM powered Video Conversational Models.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 41–50 of 85 papers

Title	Date	Tasks	Status	Hype
VideoAgent: Long-form Video Understanding with Large Language Model as Agent	Mar 15, 2024	EgoSchemaForm	CodeCode Available	2
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios	Mar 7, 2024	Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA)	CodeCode Available	2
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding	Dec 4, 2023	Dense CaptioningHighlight Detection	CodeCode Available	2
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models	Nov 28, 2023	Image CaptioningQuestion Answering	CodeCode Available	2
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	Nov 28, 2023	3D Question Answering (3D-QA)Diagnostic	CodeCode Available	2
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding	Nov 14, 2023	Image-based Generative Performance BenchmarkingLanguage Modeling	CodeCode Available	2
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding	Jul 31, 2023	Multiple-choiceQuestion Answering	CodeCode Available	2
Valley: Video Assistant with Large Language model Enhanced abilitY	Jun 12, 2023	Action RecognitionInstruction Following	CodeCode Available	2
VideoMultiAgents: A Multi-Agent Framework for Video Question Answering	Apr 25, 2025	Caption GenerationEgoSchema	CodeCode Available	1
Agentic Keyframe Search for Video Question Answering	Mar 20, 2025	EgoSchemaQuestion Answering	CodeCode Available	1

Show:10 25 50

← PrevPage 5 of 9Next →

No leaderboard results yet.