| Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization | Apr 14, 2025 | Benchmarking, Earth Observation | Unverified | 0 |
| Perturbed State Space Feature Encoders for Optical Flow with Event Cameras | Apr 14, 2025 | Event-based Optical Flow, Optical Flow Estimation | Unverified | 0 |
| VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge | Apr 14, 2025 | Logical Reasoning, Multimodal Reasoning | Unverified | 0 |
| Embodied Chain of Action Reasoning with Multi-Modal Foundation Model for Humanoid Loco-manipulation | Apr 13, 2025 | Navigate, Object Rearrangement | Unverified | 0 |
| 3D CoCa: Contrastive Learners are 3D Captioners | Apr 13, 2025 | 3D dense captioning, Caption Generation | Code Available | 0 |
| VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search | Apr 12, 2025 | Spatial Reasoning | Unverified | 0 |
| AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations | Apr 10, 2025 | Spatial Reasoning, Visual Grounding | Unverified | 0 |
| Endowing Embodied Agents with Spatial Reasoning Capabilities for Vision-and-Language Navigation | Apr 9, 2025 | Hallucination, Spatial Reasoning | Unverified | 0 |
| How to Enable LLM with 3D Capacity? A Survey of Spatial Reasoning in LLM | Apr 8, 2025 | Autonomous Vehicles, Spatial Reasoning | Unverified | 0 |
| Towards Visual Text Grounding of Multimodal Large Language Model | Apr 7, 2025 | Benchmarking, Language Modeling | Unverified | 0 |
| Advancing Egocentric Video Question Answering with Multimodal Large Language Models | Apr 6, 2025 | Object Recognition, Question Answering | Unverified | 0 |
| NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving | Apr 4, 2025 | 3d scene graph generation, Autonomous Driving | Unverified | 0 |
| Enabling Systematic Generalization in Abstract Spatial Reasoning through Meta-Learning for Compositionality | Apr 2, 2025 | Meta-Learning, Spatial Reasoning | Code Available | 0 |
| SpaceR: Reinforcing MLLMs in Video Spatial Reasoning | Apr 2, 2025 | MME, Spatial Reasoning | Code Available | 2 |
| Improved Visual-Spatial Reasoning via R1-Zero-Like Training | Apr 1, 2025 | GPU, Spatial Reasoning | Code Available | 1 |
| Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead | Mar 31, 2025 | Math, Spatial Reasoning | Code Available | 2 |
| From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D | Mar 29, 2025 | Spatial Reasoning | Code Available | 2 |
| Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks | Mar 27, 2025 | Imitation Learning, Mathematical Reasoning | Code Available | 2 |
| Video-R1: Reinforcing Video Reasoning in MLLMs | Mar 27, 2025 | MVBench, Reinforcement Learning (RL) | Code Available | 4 |
| RSRWKV: A Linear-Complexity 2D Attention Mechanism for Efficient Remote Sensing Vision Task | Mar 26, 2025 | Spatial Reasoning | Unverified | 0 |
| Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models | Mar 25, 2025 | Benchmarking, Image Captioning | Code Available | 1 |
| LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? | Mar 25, 2025 | Autonomous Navigation, Question Answering | Unverified | 0 |
| ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models | Mar 25, 2025 | 4D reconstruction, Autonomous Driving | Unverified | 0 |
| DataPlatter: Boosting Robotic Manipulation Generalization with Minimal Costly Data | Mar 25, 2025 | Robot Manipulation, Spatial Reasoning | Unverified | 0 |
| MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse | Mar 24, 2025 | Layout Generation, Reinforcement Learning (RL) | Code Available | 3 |