SOTAVerified

Zero-Shot Video Question Answer

This task presents zero-shot question answering results on the TGIF-QA dataset for LLM-powered video conversational models.

Papers

Showing 1-10 of 85 papers

Title | Status | Hype
MiniCPM-V: A GPT-4V Level MLLM on Your Phone | Code | 12
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | Code | 11
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Code | 7
Qwen2.5-Omni Technical Report | Code | 7
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models | Code | 7
Mistral 7B | Code | 6
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model | Code | 5
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Code | 5
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | Code | 4
Flamingo: a Visual Language Model for Few-Shot Learning | Code | 4

No leaderboard results yet.