| CDR-Agent: Intelligent Selection and Execution of Clinical Decision Rules Using Large Language Model Agents | May 29, 2025 | Decision Making, Language Modeling | Code Available | 0 |
| VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation | May 29, 2025 | Caption Generation, Language Modeling | Code Available | 1 |
| Beam-Guided Knowledge Replay for Knowledge-Rich Image Captioning using Vision-Language Model | May 29, 2025 | Image Captioning, Language Modeling | Unverified | 0 |
| Disrupting Vision-Language Model-Driven Navigation Services via Adversarial Object Fusion | May 29, 2025 | Language Modeling | Unverified | 0 |
| Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement | May 29, 2025 | Language Modeling | Code Available | 0 |
| TrackVLA: Embodied Visual Tracking in the Wild | May 29, 2025 | Language Modeling | Unverified | 0 |
| Spoken Language Modeling with Duration-Penalized Self-Supervised Units | May 29, 2025 | Language Modeling | Code Available | 0 |
| VLM-RRT: Vision Language Model Guided RRT Search for Autonomous UAV Navigation | May 29, 2025 | Disaster Response, Language Modeling | Unverified | 0 |
| PhotoArtAgent: Intelligent Photo Retouching with Language Model-Based Artist Agents | May 29, 2025 | Language Modeling | Unverified | 0 |
| SeG-SR: Integrating Semantic Knowledge into Remote Sensing Image Super-Resolution via Vision-Language Model | May 29, 2025 | Image Super-Resolution, Language Modeling | Code Available | 0 |