SOTAVerified

Zero-Shot Video Question Answer

This task presents zero-shot question answering results on the TGIF-QA dataset for LLM-powered video conversational models.

Papers

Showing 1-50 of 85 papers

Title | Status | Hype
MiniCPM-V: A GPT-4V Level MLLM on Your Phone | Code | 12
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | Code | 11
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Code | 7
Qwen2.5-Omni Technical Report | Code | 7
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models | Code | 7
Mistral 7B | Code | 6
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model | Code | 5
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Code | 5
Flamingo: a Visual Language Model for Few-Shot Learning | Code | 4
VILA: On Pre-training for Visual Language Models | Code | 4
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | Code | 4
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | Code | 4
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality | Code | 4
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding | Code | 4
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | Code | 4
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | Code | 4
Tarsier: Recipes for Training and Evaluating Large Video Description Models | Code | 4
VideoChat: Chat-Centric Video Understanding | Code | 4
InternVideo: General Video Foundation Models via Generative and Discriminative Learning | Code | 4
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token | Code | 4
Long Context Transfer from Language to Vision | Code | 4
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension | Code | 3
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Code | 3
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams | Code | 3
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning | Code | 3
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding | Code | 3
ViperGPT: Visual Inference via Python Execution for Reasoning | Code | 3
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | Code | 3
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | Code | 3
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | Code | 3
Video ReCap: Recursive Captioning of Hour-Long Videos | Code | 3
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | Code | 2
VideoAgent: Long-form Video Understanding with Large Language Model as Agent | Code | 2
Elysium: Exploring Object-level Perception in Videos via MLLM | Code | 2
vid-TLDR: Training Free Token merging for Light-weight Video Transformer | Code | 2
LinVT: Empower Your Image-level Large Language Model to Understand Videos | Code | 2
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models | Code | 2
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | Code | 2
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding | Code | 2
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | Code | 2
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs | Code | 2
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | Code | 2
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM | Code | 2
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | Code | 2
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Code | 2
Understanding Long Videos with Multimodal Language Models | Code | 2
Valley: Video Assistant with Large Language model Enhanced abilitY | Code | 2
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios | Code | 2
Language Repository for Long Video Understanding | Code | 1
Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos | Code | 1

No leaderboard results yet.