SOTAVerified

Zero-Shot Video Question Answer

This task presents the results of zero-shot question answering on the TGIF-QA dataset for LLM-powered video conversational models.

Papers

Showing 51–60 of 85 papers

| Title | Status | Hype |
| --- | --- | --- |
| Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | Code | 4 |
| A Simple LLM Framework for Long-Range Video Question-Answering | Code | 1 |
| Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos | Code | 1 |
| Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens | — | 0 |
| VILA: On Pre-training for Visual Language Models | Code | 4 |
| TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | Code | 2 |
| Zero-Shot Video Question Answering with Procedural Programs | — | 0 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | Code | 2 |
| LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models | Code | 2 |
| Vamos: Versatile Action Models for Video Understanding | Code | 0 |
Page 6 of 9

No leaderboard results yet.