SOTAVerified

EgoSchema

Papers

Showing 1–40 of 40 papers

Title | Status | Hype
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams | Code | 3
EgoVLM: Policy Optimization for Egocentric Video Understanding | Code | 0
Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles | | 0
RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph | | 0
VideoMultiAgents: A Multi-Agent Framework for Video Question Answering | Code | 1
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model | Code | 2
LLaVAction: evaluating and training multi-modal large language models for action recognition | Code | 2
Agentic Keyframe Search for Video Question Answering | Code | 1
Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing | Code | 0
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary | Code | 4
M-LLM Based Video Frame Selection for Efficient Video Understanding | | 0
MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding | | 0
Understanding Long Videos via LLM-Powered Entity Relation Graphs | | 0
ENTER: Event Based Interpretable Reasoning for VideoQA | | 0
LongViTU: Instruction Tuning for Long-Form Video Understanding | | 0
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs | | 0
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition | Code | 3
Espresso: High Compression For Rich Extraction From Videos for Your Vision-Language Model | | 0
VideoSAVi: Self-Aligned Video Language Models without Human Supervision | | 0
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Code | 2
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs | Code | 1
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model | | 0
VDMA: Video Question Answering with Dynamically Generated Multi-Agents | | 0
HCQA @ Ego4D EgoSchema Challenge 2024 | Code | 1
DrVideo: Document Retrieval Based Long Video Understanding | | 0
Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA | Code | 1
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | Code | 2
TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment | Code | 1
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering | | 0
Language Repository for Long Video Understanding | Code | 1
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding | | 0
VideoAgent: Long-form Video Understanding with Large Language Model as Agent | Code | 2
Video ReCap: Recursive Captioning of Hour-Long Videos | Code | 3
Memory Consolidation Enables Long-Context Video Understanding | | 0
A Simple LLM Framework for Long-Range Video Question-Answering | Code | 1
Text-Conditioned Resampler For Long Form Video Understanding | | 0
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames | | 0
LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos | Code | 1
Vamos: Versatile Action Models for Video Understanding | Code | 0
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Code | 1

No leaderboard results yet.