| Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | Oct 7, 2024 | Natural Language Visual GroundingNavigate | CodeCode Available | 3 |
| A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models | Jul 2, 2024 | Navigate | CodeCode Available | 3 |
| Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis | Jun 5, 2024 | MambaMedical Image Analysis | CodeCode Available | 3 |
| CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving | May 15, 2024 | Autonomous DrivingAutonomous Vehicles | CodeCode Available | 3 |
| AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench | Jul 3, 2025 | Navigate | CodeCode Available | 2 |
| DualMap: Online Open-Vocabulary Semantic Mapping for Natural Language Navigation in Dynamic Changing Scenes | Jun 2, 2025 | Natural Language QueriesNavigate | CodeCode Available | 2 |
| Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation | May 16, 2025 | 3D geometryNavigate | CodeCode Available | 2 |
| ForesightNav: Learning Scene Imagination for Efficient Exploration | Apr 22, 2025 | Efficient ExplorationNavigate | CodeCode Available | 2 |
| Enhance Then Search: An Augmentation-Search Strategy with Foundation Models for Cross-Domain Few-Shot Object Detection | Apr 6, 2025 | Cross-Domain Few-ShotCross-Domain Few-Shot Object Detection | CodeCode Available | 2 |
| MTGS: Multi-Traversal Gaussian Splatting | Mar 16, 2025 | NavigateNovel View Synthesis | CodeCode Available | 2 |