SOTAVerified

Zero-Shot Video Question Answer

This task presents zero-shot question answering results on the TGIF-QA dataset for LLM-powered video conversational models.

Papers

Showing 21-30 of 85 papers

Title | Status | Hype
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | Code | 3
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models | Code | 7
Tarsier: Recipes for Training and Evaluating Large Video Description Models | Code | 4
Long Context Transfer from Language to Vision | Code | 4
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding | - | 0
Long Story Short: Story-level Video Understanding from 20K Short Films | - | 0
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs | Code | 2
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Code | 3
Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA | Code | 1
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams | Code | 3

No leaderboard results yet.