| Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization | Apr 14, 2025 | Benchmarking, Earth Observation | Unverified | 0 |
| VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge | Apr 14, 2025 | Logical Reasoning, Multimodal Reasoning | Unverified | 0 |
| Perturbed State Space Feature Encoders for Optical Flow with Event Cameras | Apr 14, 2025 | Event-based Optical Flow, Optical Flow Estimation | Unverified | 0 |
| Embodied Chain of Action Reasoning with Multi-Modal Foundation Model for Humanoid Loco-manipulation | Apr 13, 2025 | Navigate, Object Rearrangement | Unverified | 0 |
| 3D CoCa: Contrastive Learners are 3D Captioners | Apr 13, 2025 | 3D Dense Captioning, Caption Generation | Code Available | 0 |
| VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search | Apr 12, 2025 | Spatial Reasoning | Unverified | 0 |
| AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations | Apr 10, 2025 | Spatial Reasoning, Visual Grounding | Unverified | 0 |
| Endowing Embodied Agents with Spatial Reasoning Capabilities for Vision-and-Language Navigation | Apr 9, 2025 | Hallucination, Spatial Reasoning | Unverified | 0 |
| How to Enable LLM with 3D Capacity? A Survey of Spatial Reasoning in LLM | Apr 8, 2025 | Autonomous Vehicles, Spatial Reasoning | Unverified | 0 |
| Towards Visual Text Grounding of Multimodal Large Language Model | Apr 7, 2025 | Benchmarking, Language Modeling | Unverified | 0 |