SOTAVerified

Zero-Shot Video Question Answer

This task presents zero-shot question answering results on the TGIF-QA dataset for LLM-powered video conversational models.

Papers

Showing 21–30 of 85 papers

Title | Status | Hype
Flamingo: a Visual Language Model for Few-Shot Learning | Code | 4
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning | Code | 3
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension | Code | 3
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding | Code | 3
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | Code | 3
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Code | 3
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams | Code | 3
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | Code | 3
Video ReCap: Recursive Captioning of Hour-Long Videos | Code | 3
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | Code | 3

No leaderboard results yet.