EgoSchema

Papers

Showing 1–40 of 40 papers

Title	Date	Tasks	Status	Hype	Score
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary	Mar 12, 2025	EgoSchema, Retrieval	Code Available	4	5
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams	Jun 30, 2025	cross-modal alignment, EgoSchema	Code Available	3	5
Video ReCap: Recursive Captioning of Hour-Long Videos	Feb 20, 2024	EgoSchema, Video Captioning	Code Available	3	5
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition	Dec 12, 2024	EgoSchema	Code Available	3	5
VideoAgent: Long-form Video Understanding with Large Language Model as Agent	Mar 15, 2024	EgoSchema, Form	Code Available	2	5
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning	Oct 25, 2024	EgoSchema, Hallucination	Code Available	2	5
LLaVAction: evaluating and training multi-modal large language models for action recognition	Mar 24, 2025	Action Recognition, Action Understanding	Code Available	2	5
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model	Mar 27, 2025	EgoSchema, Language Modeling	Code Available	2	5
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos	May 29, 2024	EgoSchema, MME	Code Available	2	5
TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment	May 22, 2024	EgoSchema, Video Understanding	Code Available	1	5
Language Repository for Long Video Understanding	Mar 21, 2024	EgoSchema, Question Answering	Code Available	1	5
LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos	Dec 7, 2023	EgoSchema, Form	Code Available	1	5
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs	Sep 30, 2024	EgoSchema, Language Modelling	Code Available	1	5
VideoMultiAgents: A Multi-Agent Framework for Video Question Answering	Apr 25, 2025	Caption Generation, EgoSchema	Code Available	1	5
Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA	Jun 13, 2024	All, EgoSchema	Code Available	1	5
A Simple LLM Framework for Long-Range Video Question-Answering	Dec 28, 2023	EgoSchema, Language Modelling	Code Available	1	5
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding	Aug 17, 2023	Diagnostic, EgoSchema	Code Available	1	5
Agentic Keyframe Search for Video Question Answering	Mar 20, 2025	EgoSchema, Question Answering	Code Available	1	5
HCQA @ Ego4D EgoSchema Challenge 2024	Jun 22, 2024	Caption Generation	Code Available	1	5
Vamos: Versatile Action Models for Video Understanding	Nov 22, 2023	EgoSchema, Hard Attention	Code Available	0	5
EgoVLM: Policy Optimization for Egocentric Video Understanding	Jun 3, 2025	EgoSchema, Question Answering	Code Available	0	5
Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing	Mar 13, 2025	EgoSchema, Form	Code Available	0	5
Memory Consolidation Enables Long-Context Video Understanding	Feb 8, 2024	EgoSchema, Video Understanding	Unverified	0	0
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames	Dec 12, 2023	EgoSchema	Unverified	0	0
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs	Jan 8, 2025	EgoSchema, Object Tracking	Unverified	0	0
DrVideo: Document Retrieval Based Long Video Understanding	Jun 18, 2024	document understanding, EgoSchema	Unverified	0	0
ENTER: Event Based Interpretable Reasoning for VideoQA	Jan 24, 2025	Code Generation, EgoSchema	Unverified	0	0
Espresso: High Compression For Rich Extraction From Videos for Your Vision-Language Model	Dec 6, 2024	EgoSchema, Language Modeling	Unverified	0	0
Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles	May 22, 2025	EgoSchema, Few-Shot Learning	Unverified	0	0
LongViTU: Instruction Tuning for Long-Form Video Understanding	Jan 9, 2025	EgoSchema, Form	Unverified	0	0
MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding	Feb 5, 2025	Diversity, EgoSchema	Unverified	0	0
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model	Aug 1, 2024	EgoSchema, Language Modeling	Unverified	0	0
M-LLM Based Video Frame Selection for Efficient Video Understanding	Feb 27, 2025	EgoSchema, Language Modeling	Unverified	0	0
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering	Apr 9, 2024	EgoSchema, Multiple-choice	Unverified	0	0
RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph	May 6, 2025	EgoSchema, Retrieval	Unverified	0	0
Text-Conditioned Resampler For Long Form Video Understanding	Dec 19, 2023	EgoSchema, Form	Unverified	0	0
Understanding Long Videos via LLM-Powered Entity Relation Graphs	Jan 27, 2025	EgoSchema, Large Language Model	Unverified	0	0
VDMA: Video Question Answering with Dynamically Generated Multi-Agents	Jul 4, 2024	EgoSchema, Question Answering	Unverified	0	0
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding	Mar 18, 2024	EgoSchema, Video Understanding	Unverified	0	0
VideoSAVi: Self-Aligned Video Language Models without Human Supervision	Dec 1, 2024	EgoSchema, MVBench	Unverified	0	0

No leaderboard results yet.