SOTAVerified

Zero-Shot Video Question Answer

This task presents zero-shot question answering results on the TGIF-QA dataset for LLM-powered video conversational models.

Papers

Showing 41-50 of 85 papers

Title | Status | Hype
Understanding Long Videos with Multimodal Language Models | Code | 2
Elysium: Exploring Object-level Perception in Videos via MLLM | Code | 2
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Code | 7
Language Repository for Long Video Understanding | Code | 1
vid-TLDR: Training Free Token merging for Light-weight Video Transformer | Code | 2
VideoAgent: Long-form Video Understanding with Large Language Model as Agent | Code | 2
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | Code | 3
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios | Code | 2
Video ReCap: Recursive Captioning of Hour-Long Videos | Code | 3
Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering | Code | 0

No leaderboard results yet.