| ChemMLLM: Chemical Multimodal Large Language Model | May 22, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding | May 22, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification | May 21, 2025 | Data Augmentation, Large Language Model | Unverified | 0 |
| Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval | May 21, 2025 | Attribute, Image Retrieval | Unverified | 0 |
| Web-Shepherd: Advancing PRMs for Reinforcing Web Agents | May 21, 2025 | Large Language Model, Multimodal Large Language Model | Code Available | 2 |
| MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling | May 21, 2025 | Emotion Recognition, Face Detection | Unverified | 0 |
| CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring | May 20, 2025 | Automated Essay Scoring, Diversity | Unverified | 0 |
| UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation | May 20, 2025 | Image Generation, Language Modeling | Unverified | 0 |
| UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning | May 20, 2025 | Large Language Model, Multimodal Large Language Model | Unverified | 0 |
| BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation | May 19, 2025 | Binary Classification, DeepFake Detection | Code Available | 1 |