
Zero-Shot Video Question Answer

This task presents zero-shot question answering results on the TGIF-QA dataset for LLM-powered video conversational models.

Papers

Showing 11–20 of 85 papers (page 2 of 9)

Title | Status | Hype
Long Context Transfer from Language to Vision | Code | 4
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | Code | 4
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | Code | 4
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | Code | 4
VILA: On Pre-training for Visual Language Models | Code | 4
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | Code | 4
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding | Code | 4
VideoChat: Chat-Centric Video Understanding | Code | 4
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality | Code | 4
InternVideo: General Video Foundation Models via Generative and Discriminative Learning | Code | 4

No leaderboard results yet.