
Zero-Shot Video Question Answer

This page presents zero-shot question-answering results on the TGIF-QA dataset for LLM-powered video conversational models.

Papers

Showing 31-40 of 85 papers

| Title | Status | Hype |
| --- | --- | --- |
| Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | Code | 3 |
| Elysium: Exploring Object-level Perception in Videos via MLLM | Code | 2 |
| LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models | Code | 2 |
| TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | Code | 2 |
| LinVT: Empower Your Image-level Large Language Model to Understand Videos | Code | 2 |
| Valley: Video Assistant with Large Language model Enhanced abilitY | Code | 2 |
| VideoAgent: Long-form Video Understanding with Large Language Model as Agent | Code | 2 |
| An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM | Code | 2 |
| Understanding Long Videos with Multimodal Language Models | Code | 2 |
| CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios | Code | 2 |
Page 4 of 9

No leaderboard results yet.