SOTAVerified

Zero-Shot Video Question Answer

This task presents zero-shot question answering results on the TGIF-QA dataset for LLM-powered video conversational models.

Papers

Showing 51-85 of 85 papers

| Title | Status | Hype |
| --- | --- | --- |
| BIMBA: Selective-Scan Compression for Long-Range Video Question Answering | Code | 1 |
| TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models | Code | 1 |
| Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA | Code | 1 |
| TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering | Code | 1 |
| Language Repository for Long Video Understanding | Code | 1 |
| A Simple LLM Framework for Long-Range Video Question-Answering | Code | 1 |
| Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos | Code | 1 |
| BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | Code | 1 |
| Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts | Code | 1 |
| EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Code | 1 |
| OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation | Code | 1 |
| Self-Chained Image-Language Model for Video Localization and Question Answering | Code | 1 |
| Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | Code | 1 |
| ENTER: Event Based Interpretable Reasoning for VideoQA | | 0 |
| VidCtx: Context-aware Video Question Answering with Image Models | Code | 0 |
| GPT-4o System Card | | 0 |
| Video Instruction Tuning With Synthetic Data | | 0 |
| Question-Answering Dense Video Events | Code | 0 |
| LLaVA-OneVision: Easy Visual Task Transfer | Code | 0 |
| GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding | | 0 |
| Long Story Short: Story-level Video Understanding from 20K Short Films | | 0 |
| DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs | | 0 |
| Streaming Long Video Understanding with Large Language Models | | 0 |
| CinePile: A Long Video Question Answering Dataset and Benchmark | | 0 |
| MoReVQA: Exploring Modular Reasoning Models for Video Question Answering | | 0 |
| Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering | Code | 0 |
| Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens | | 0 |
| Zero-Shot Video Question Answering with Procedural Programs | | 0 |
| Vamos: Versatile Action Models for Video Understanding | Code | 0 |
| Verbs in Action: Improving verb understanding in video-language models | Code | 0 |
| VIPeR: Provably Efficient Algorithm for Offline RL with Neural Function Approximation | Code | 0 |
| 0/1 Deep Neural Networks via Block Coordinate Descent | | 0 |
| MVB: A Large-Scale Dataset for Baggage Re-Identification and Merged Siamese Networks | Code | 0 |
| ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering | Code | 0 |
| TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering | Code | 0 |

No leaderboard results yet.