
Zero-Shot Video Question Answering

This task presents zero-shot question-answering results on the TGIF-QA dataset for LLM-powered video conversational models.

Papers

Showing 61–70 of 85 papers

Title | Status | Hype
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | Code | 4
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | Code | 2
Mistral 7B | Code | 6
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | Code | 1
Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts | Code | 1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Code | 1
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation | Code | 1
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding | Code | 2
Valley: Video Assistant with Large Language model Enhanced abilitY | Code | 2
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | Code | 3

No leaderboard results yet.