
Zero-Shot Video Question Answer

This task presents zero-shot question answering results on the TGIF-QA dataset for LLM-powered video conversational models.

Papers

Showing 11–20 of 85 papers

Title | Status | Hype
Tarsier: Recipes for Training and Evaluating Large Video Description Models | Code | 4
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding | Code | 4
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | Code | 4
Flamingo: a Visual Language Model for Few-Shot Learning | Code | 4
InternVideo: General Video Foundation Models via Generative and Discriminative Learning | Code | 4
Long Context Transfer from Language to Vision | Code | 4
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality | Code | 4
VideoChat: Chat-Centric Video Understanding | Code | 4
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | Code | 4
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | Code | 4
Page 2 of 9

No leaderboard results yet.