| M-LLM Based Video Frame Selection for Efficient Video Understanding | Feb 27, 2025 | EgoSchemaLanguage Modeling | —Unverified | 0 |
| MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding | Feb 5, 2025 | DiversityEgoSchema | —Unverified | 0 |
| Understanding Long Videos via LLM-Powered Entity Relation Graphs | Jan 27, 2025 | EgoSchemaLarge Language Model | —Unverified | 0 |
| ENTER: Event Based Interpretable Reasoning for VideoQA | Jan 24, 2025 | Code GenerationEgoSchema | —Unverified | 0 |
| LongViTU: Instruction Tuning for Long-Form Video Understanding | Jan 9, 2025 | EgoSchemaForm | —Unverified | 0 |
| Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs | Jan 8, 2025 | EgoSchemaObject Tracking | —Unverified | 0 |
| Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition | Dec 12, 2024 | EgoSchema | CodeCode Available | 3 |
| Espresso: High Compression For Rich Extraction From Videos for Your Vision-Language Model | Dec 6, 2024 | EgoSchemaLanguage Modeling | —Unverified | 0 |
| VideoSAVi: Self-Aligned Video Language Models without Human Supervision | Dec 1, 2024 | EgoSchemaMVBench | —Unverified | 0 |
| TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Oct 25, 2024 | EgoSchemaHallucination | CodeCode Available | 2 |