Zero-Shot Video Question Answer

This task presents zero-shot question answering results on the TGIF-QA dataset for LLM-powered video conversational models.

Papers

Showing 26–50 of 85 papers

Title	Date	Tasks	Status	Hype	Score
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding	Oct 22, 2024	Token Reduction; Video Question Answering	Code Available	3	5
ViperGPT: Visual Inference via Python Execution for Reasoning	Mar 14, 2023	Code Generation; Video Question Answering	Code Available	3	5
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models	Jul 22, 2024	Language Modeling	Code Available	3	5
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models	Jun 8, 2023	Question Answering; VCGBench-Diverse	Code Available	3	5
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context	Mar 8, 2024	1 Image, 2*2 Stitching; Code Generation	Code Available	3	5
Video ReCap: Recursive Captioning of Hour-Long Videos	Feb 20, 2024	EgoSchema; Video Captioning	Code Available	3	5
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding	Nov 14, 2023	Image-based Generative Performance Benchmarking; Language Modeling	Code Available	2	5
VideoAgent: Long-form Video Understanding with Large Language Model as Agent	Mar 15, 2024	EgoSchema; Form	Code Available	2	5
Elysium: Exploring Object-level Perception in Videos via MLLM	Mar 25, 2024	Object; Object Tracking	Code Available	2	5
vid-TLDR: Training Free Token merging for Light-weight Video Transformer	Mar 20, 2024	Action Recognition; Computational Efficiency	Code Available	2	5
LinVT: Empower Your Image-level Large Language Model to Understand Videos	Dec 6, 2024	Language Modeling; Language Modelling	Code Available	2	5
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models	Nov 28, 2023	Image Captioning; Question Answering	Code Available	2	5
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos	May 29, 2024	EgoSchema; MME	Code Available	2	5
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding	Jul 31, 2023	Multiple-choice; Question Answering	Code Available	2	5
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	Nov 28, 2023	3D Question Answering (3D-QA); Diagnostic	Code Available	2	5
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs	Jun 13, 2024	Benchmarking; Question Answering	Code Available	2	5
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance	Nov 4, 2024	Caption Generation; Multiple-choice	Code Available	2	5
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM	Mar 27, 2024	Language Modeling; Language Modelling	Code Available	2	5
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding	Dec 4, 2023	Dense Captioning; Highlight Detection	Code Available	2	5
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning	Oct 25, 2024	EgoSchema; Hallucination	Code Available	2	5
Understanding Long Videos with Multimodal Language Models	Mar 25, 2024	Action Recognition; Fine-grained Action Recognition	Code Available	2	5
Valley: Video Assistant with Large Language model Enhanced abilitY	Jun 12, 2023	Action Recognition; Instruction Following	Code Available	2	5
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios	Mar 7, 2024	Audio-visual Question Answering; Audio-Visual Question Answering (AVQA)	Code Available	2	5
Language Repository for Long Video Understanding	Mar 21, 2024	EgoSchema; Question Answering	Code Available	1	5
Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos	Dec 16, 2023	Video Captioning; video narration captioning	Code Available	1	5

No leaderboard results yet.