| Speculative Decoding Reimagined for Multimodal Large Language Models | May 20, 2025 | Language Modeling | Code Available | 1 |
| U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding | May 20, 2025 | cross-modal alignment, Language Modeling | Code Available | 1 |
| R3: Robust Rubric-Agnostic Reward Models | May 19, 2025 | Language Modeling | Code Available | 1 |
| 3D Visual Illusion Depth Estimation | May 19, 2025 | Common Sense Reasoning, Depth Estimation | Code Available | 1 |
| Sample Efficient Reinforcement Learning via Large Vision Language Model Distillation | May 16, 2025 | Decision Making, Language Modeling | Code Available | 1 |
| Unifying Segment Anything in Microscopy with Multimodal Large Language Model | May 16, 2025 | Language Modeling | Code Available | 1 |
| ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts | May 15, 2025 | Continual Learning, Language Modeling | Code Available | 1 |
| Multi-Token Prediction Needs Registers | May 15, 2025 | Language Modeling | Code Available | 1 |
| Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving | May 13, 2025 | 3D visual grounding, Autonomous Driving | Code Available | 1 |
| Kalman Filter Enhanced GRPO for Reinforcement Learning-Based Language Model Reasoning | May 12, 2025 | Language Modeling | Code Available | 1 |