SOTAVerified

Zero-Shot Video Question Answer

This task presents zero-shot question answering results on the TGIF-QA dataset for LLM-powered video conversational models.

Papers

Showing 31-40 of 85 papers

Title | Status | Hype
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Code | 5
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs | - | 0
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | Code | 2
Streaming Long Video Understanding with Large Language Models | - | 0
CinePile: A Long Video Question Answering Dataset and Benchmark | - | 0
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | Code | 4
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering | - | 0
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | Code | 4
TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering | Code | 1
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM | Code | 2

No leaderboard results yet.