| MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling | May 21, 2025 | Emotion Recognition, Face Detection | Unverified | 0 |
| CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring | May 20, 2025 | Automated Essay Scoring, Diversity | Unverified | 0 |
| UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning | May 20, 2025 | Large Language Model, Multimodal Large Language Model | Unverified | 0 |
| UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation | May 20, 2025 | Image Generation, Language Modeling | Unverified | 0 |
| ORQA: A Benchmark and Foundation Model for Holistic Operating Room Modeling | May 19, 2025 | Graph Generation, Knowledge Distillation | Unverified | 0 |
| MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO | May 19, 2025 | Decoder, Image Generation | Code Available | 0 |
| Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering | May 17, 2025 | Document Ranking, Large Language Model | Unverified | 0 |
| Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning | May 10, 2025 | Image Augmentation, Large Language Model | Code Available | 0 |
| MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills | May 9, 2025 | Image Retouching, Large Language Model | Unverified | 0 |
| Is your multimodal large language model a good science tutor? | May 9, 2025 | Language Modeling, Language Modelling | Unverified | 0 |