| MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Apr 9, 2025 | Autonomous Driving, Language Modeling | Code Available | 0 |
| Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning | Apr 9, 2025 | Action Unit Detection, Age Estimation | Unverified | 0 |
| Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model | Apr 9, 2025 | Image Quality Assessment, Image Restoration | Unverified | 0 |
| Towards Visual Text Grounding of Multimodal Large Language Model | Apr 7, 2025 | Benchmarking, Language Modeling | Unverified | 0 |
| Universal Item Tokenization for Transferable Generative Recommendation | Apr 6, 2025 | General Knowledge, Large Language Model | Unverified | 0 |
| Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Apr 2, 2025 | Descriptive, Large Language Model | Code Available | 0 |
| Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources | Apr 1, 2025 | GPU, Large Language Model | Unverified | 0 |
| Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training | Mar 31, 2025 | GPU, Language Modeling | Unverified | 0 |
| Dynamic Pyramid Network for Efficient Multimodal Large Language Model | Mar 26, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation | Mar 23, 2025 | Language Modeling, Language Modelling | Unverified | 0 |