| InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models | Apr 14, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates | Apr 14, 2025 | Autonomous Navigation, Lane Detection | Unverified | 0 |
| Mavors: Multi-granularity Video Representation for Multimodal Large Language Model | Apr 14, 2025 | Computational Efficiency, Language Modeling | Unverified | 0 |
| Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment | Apr 10, 2025 | AI Agent, Attribute | Unverified | 0 |
| Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning | Apr 9, 2025 | Action Unit Detection, Age Estimation | Unverified | 0 |
| MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Apr 9, 2025 | Autonomous Driving, Language Modeling | Code Available | 0 |
| Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model | Apr 9, 2025 | Image Quality Assessment, Image Restoration | Unverified | 0 |
| Towards Visual Text Grounding of Multimodal Large Language Model | Apr 7, 2025 | Benchmarking, Language Modeling | Unverified | 0 |
| Universal Item Tokenization for Transferable Generative Recommendation | Apr 6, 2025 | General Knowledge, Large Language Model | Unverified | 0 |
| Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Apr 2, 2025 | Descriptive, Large Language Model | Code Available | 0 |
| Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources | Apr 1, 2025 | GPU, Large Language Model | Unverified | 0 |
| Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training | Mar 31, 2025 | GPU, Language Modeling | Unverified | 0 |
| Dynamic Pyramid Network for Efficient Multimodal Large Language Model | Mar 26, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation | Mar 23, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation | Mar 19, 2025 | Language Model Evaluation, Language Modeling | Unverified | 0 |
| LEGION: Learning to Ground and Explain for Synthetic Image Detection | Mar 19, 2025 | Artifact Detection, Image Manipulation | Unverified | 0 |
| SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability | Mar 18, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model | Mar 17, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| When neural implant meets multimodal LLM: A dual-loop system for neuromodulation and naturalistic neuralbehavioral research | Mar 16, 2025 | EEG, Large Language Model | Unverified | 0 |
| GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | Mar 16, 2025 | Change Detection, Image Captioning | Unverified | 0 |
| OmniDiff: A Comprehensive Benchmark for Fine-grained Image Difference Captioning | Mar 14, 2025 | Large Language Model, Multimodal Large Language Model | Unverified | 0 |
| CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance | Mar 13, 2025 | Large Language Model, Multimodal Large Language Model | Unverified | 0 |
| Hybrid Agents for Image Restoration | Mar 13, 2025 | Image Restoration, In-Context Learning | Unverified | 0 |
| Lightweight Multimodal Artificial Intelligence Framework for Maritime Multi-Scene Recognition | Mar 10, 2025 | Disaster Response, Large Language Model | Unverified | 0 |
| PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks | Mar 6, 2025 | document understanding, Language Modeling | Unverified | 0 |