SOTAVerified|Agents Browse Leaderboard About Blog

Zero-Shot Video Question Answer

This task present the results of Zeroshot Question Answer results on TGIF-QA dataset for LLM powered Video Conversational Models.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 41–50 of 85 papers

Title	Date	Tasks	Status	Hype
Understanding Long Videos with Multimodal Language Models	Mar 25, 2024	Action RecognitionFine-grained Action Recognition	CodeCode Available	2
Elysium: Exploring Object-level Perception in Videos via MLLM	Mar 25, 2024	ObjectObject Tracking	CodeCode Available	2
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding	Mar 22, 2024	Action ClassificationAction Recognition	CodeCode Available	7
Language Repository for Long Video Understanding	Mar 21, 2024	EgoSchemaQuestion Answering	CodeCode Available	1
vid-TLDR: Training Free Token merging for Light-weight Video Transformer	Mar 20, 2024	Action RecognitionComputational Efficiency	CodeCode Available	2
VideoAgent: Long-form Video Understanding with Large Language Model as Agent	Mar 15, 2024	EgoSchemaForm	CodeCode Available	2
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context	Mar 8, 2024	1 Image, 2*2 StitchingCode Generation	CodeCode Available	3
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios	Mar 7, 2024	Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA)	CodeCode Available	2
Video ReCap: Recursive Captioning of Hour-Long Videos	Feb 20, 2024	EgoSchemaVideo Captioning	CodeCode Available	3
Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering	Feb 16, 2024	Language ModelingLanguage Modelling	CodeCode Available	0

Show:10 25 50

← PrevPage 5 of 9Next →

No leaderboard results yet.