| ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers | May 26, 2025 | cross-modal alignmentPosition | —Unverified | 0 |
| Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps | May 24, 2025 | Scene UnderstandingSpatial Reasoning | —Unverified | 0 |
| Towards Dynamic 3D Reconstruction of Hand-Instrument Interaction in Ophthalmic Surgery | May 23, 2025 | 3D ReconstructionHand Pose Estimation | —Unverified | 0 |
| U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding | May 23, 2025 | BenchmarkingSpatial Reasoning | —Unverified | 0 |
| Bridging the Dynamic Perception Gap: Training-Free Draft Chain-of-Thought for Dynamic Multimodal Spatial Reasoning | May 22, 2025 | Spatial Reasoning | CodeCode Available | 0 |
| SEM: Enhancing Spatial Understanding for Robust Robot Manipulation | May 22, 2025 | 3D geometryRobot Manipulation | —Unverified | 0 |
| DetailMaster: Can Your Text-to-Image Model Handle Long Prompts? | May 22, 2025 | AttributeSpatial Reasoning | —Unverified | 0 |
| SPaRC: A Spatial Pathfinding Reasoning Challenge | May 22, 2025 | Spatial Reasoning | CodeCode Available | 0 |
| MEgoHand: Multimodal Egocentric Hand-Object Interaction Motion Generation | May 22, 2025 | Motion GenerationObject | —Unverified | 0 |
| VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought | May 22, 2025 | Spatial Reasoning | —Unverified | 0 |