| VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding | Mar 18, 2024 | EgoSchema, Video Understanding | Unverified | 0 |
| VideoAgent: Long-form Video Understanding with Large Language Model as Agent | Mar 15, 2024 | EgoSchema, Form | Code Available | 2 |
| Video ReCap: Recursive Captioning of Hour-Long Videos | Feb 20, 2024 | EgoSchema, Video Captioning | Code Available | 3 |
| Memory Consolidation Enables Long-Context Video Understanding | Feb 8, 2024 | EgoSchema, Video Understanding | Unverified | 0 |
| A Simple LLM Framework for Long-Range Video Question-Answering | Dec 28, 2023 | EgoSchema, Language Modelling | Code Available | 1 |
| Text-Conditioned Resampler For Long Form Video Understanding | Dec 19, 2023 | EgoSchema, Form | Unverified | 0 |
| A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames | Dec 12, 2023 | EgoSchema | Unverified | 0 |
| LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos | Dec 7, 2023 | EgoSchema, Form | Code Available | 1 |
| Vamos: Versatile Action Models for Video Understanding | Nov 22, 2023 | EgoSchema, Hard Attention | Code Available | 0 |
| EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Aug 17, 2023 | Diagnostic, EgoSchema | Code Available | 1 |