| Towards Vision-Language-Garment Models For Web Knowledge Garment Understanding and Generation | Jun 5, 2025 | Zero-shot Generalization | Unverified | 0 |
| Generating Synthetic Stereo Datasets using 3D Gaussian Splatting and Expert Knowledge Transfer | Jun 5, 2025 | 3DGS, Dataset Generation | Unverified | 0 |
| OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis | Jun 4, 2025 | Action Generation, Decision Making | Code Available | 1 |
| Language-Guided Multi-Agent Learning in Simulations: A Unified Framework and Evaluation | Jun 1, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis? | May 30, 2025 | Diagnostic, Medical Image Analysis | Code Available | 1 |
| Beyond the LUMIR challenge: The pathway to foundational registration models | May 30, 2025 | Image Registration, Zero-shot Generalization | Code Available | 1 |
| Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression | May 26, 2025 | Zero-shot Generalization | Code Available | 2 |
| ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers | May 26, 2025 | cross-modal alignment, Position | Unverified | 0 |
| ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving | May 26, 2025 | Autonomous Driving, Bench2Drive | Code Available | 1 |
| WHISTRESS: Enriching Transcriptions with Sentence Stress Detection | May 25, 2025 | Sentence, Zero-shot Generalization | Unverified | 0 |