| Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles | May 22, 2025 | EgoSchema, Few-Shot Learning | Unverified | 0 |
| RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph | May 6, 2025 | EgoSchema, Retrieval | Unverified | 0 |
| Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing | Mar 13, 2025 | EgoSchema, Form | Code Available | 0 |
| M-LLM Based Video Frame Selection for Efficient Video Understanding | Feb 27, 2025 | EgoSchema, Language Modeling | Unverified | 0 |
| MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding | Feb 5, 2025 | Diversity, EgoSchema | Unverified | 0 |
| Understanding Long Videos via LLM-Powered Entity Relation Graphs | Jan 27, 2025 | EgoSchema, Large Language Model | Unverified | 0 |
| ENTER: Event Based Interpretable Reasoning for VideoQA | Jan 24, 2025 | Code Generation, EgoSchema | Unverified | 0 |
| LongViTU: Instruction Tuning for Long-Form Video Understanding | Jan 9, 2025 | EgoSchema, Form | Unverified | 0 |
| Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs | Jan 8, 2025 | EgoSchema, Object Tracking | Unverified | 0 |
| Espresso: High Compression For Rich Extraction From Videos for Your Vision-Language Model | Dec 6, 2024 | EgoSchema, Language Modeling | Unverified | 0 |