| EAGLE: Egocentric AGgregated Language-video Engine | Sep 26, 2024 | Action Recognition, Activity Recognition | —Unverified | 0 |
| EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | Dec 12, 2024 | Image Comprehension, Image Generation | —Unverified | 0 |
| EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM | Dec 5, 2024 | Image Manipulation, Language Modeling | —Unverified | 0 |
| EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model | Aug 21, 2024 | Computational Efficiency, Language Modeling | —Unverified | 0 |
| Efficient Indirect LLM Jailbreak via Multimodal-LLM Jailbreak | May 30, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios | Dec 5, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model | Dec 5, 2023 | Boundary Detection, Language Modeling | —Unverified | 0 |
| EventVL: Understand Event Streams via Multimodal Large Language Model | Jan 23, 2025 | Event-based vision, Language Modeling | —Unverified | 0 |
| Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Dec 6, 2024 | document understanding, Hallucination | —Unverified | 0 |
| FaceInsight: A Multimodal Large Language Model for Face Perception | Apr 22, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |