SOTAVerified|Agents Browse Leaderboard About Blog

Zero-Shot Video Question Answer

This task present the results of Zeroshot Question Answer results on TGIF-QA dataset for LLM powered Video Conversational Models.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 26–50 of 85 papers

Title	Date	Tasks	Status	Hype
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding	Jun 14, 2024	Activity RecognitionMMR total	—Unverified	0
Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA	Jun 13, 2024	AllEgoSchema	CodeCode Available	1
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding	Jun 13, 2024	Dense Video CaptioningMVBench	CodeCode Available	3
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs	Jun 13, 2024	BenchmarkingQuestion Answering	CodeCode Available	2
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams	Jun 12, 2024	cross-modal alignmentLanguage Modelling	CodeCode Available	3
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs	Jun 11, 2024	Multiple-choiceQuestion Answering	CodeCode Available	5
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs	Jun 6, 2024	Language ModellingLarge Language Model	—Unverified	0
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos	May 29, 2024	EgoSchemaMME	CodeCode Available	2
Streaming Long Video Understanding with Large Language Models	May 25, 2024	Question AnsweringVideo Understanding	—Unverified	0
CinePile: A Long Video Question Answering Dataset and Benchmark	May 14, 2024	FormHuman-Object Interaction Detection	—Unverified	0
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning	Apr 25, 2024	Dense CaptioningMVBench	CodeCode Available	4
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering	Apr 9, 2024	EgoSchemaMultiple-choice	—Unverified	0
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens	Apr 4, 2024	Language ModelingLanguage Modelling	CodeCode Available	4
TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering	Apr 1, 2024	Question AnsweringVideo Question Answering	CodeCode Available	1
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM	Mar 27, 2024	Language ModelingLanguage Modelling	CodeCode Available	2
Understanding Long Videos with Multimodal Language Models	Mar 25, 2024	Action RecognitionFine-grained Action Recognition	CodeCode Available	2
Elysium: Exploring Object-level Perception in Videos via MLLM	Mar 25, 2024	ObjectObject Tracking	CodeCode Available	2
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding	Mar 22, 2024	Action ClassificationAction Recognition	CodeCode Available	7
Language Repository for Long Video Understanding	Mar 21, 2024	EgoSchemaQuestion Answering	CodeCode Available	1
vid-TLDR: Training Free Token merging for Light-weight Video Transformer	Mar 20, 2024	Action RecognitionComputational Efficiency	CodeCode Available	2
VideoAgent: Long-form Video Understanding with Large Language Model as Agent	Mar 15, 2024	EgoSchemaForm	CodeCode Available	2
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context	Mar 8, 2024	1 Image, 2*2 StitchingCode Generation	CodeCode Available	3
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios	Mar 7, 2024	Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA)	CodeCode Available	2
Video ReCap: Recursive Captioning of Hour-Long Videos	Feb 20, 2024	EgoSchemaVideo Captioning	CodeCode Available	3
Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering	Feb 16, 2024	Language ModelingLanguage Modelling	CodeCode Available	0

Show:10 25 50

← PrevPage 2 of 4Next →

No leaderboard results yet.