| Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation | Jun 14, 2024 | Navigate, Vision and Language Navigation | Code Available | 2 |
| GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices | Jun 12, 2024 | Navigate | Code Available | 2 |
| DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences | Jun 5, 2024 | Autonomous Driving, Autonomous Vehicles | Code Available | 2 |
| XRec: Large Language Models for Explainable Recommendation | Jun 4, 2024 | Collaborative Filtering, Decision Making | Code Available | 2 |
| Demystifying AI Platform Design for Distributed Inference of Next-Generation LLM models | Jun 3, 2024 | Chunking, Mamba | Code Available | 2 |
| Can Graph Learning Improve Planning in LLM-based Agents? | May 29, 2024 | Decision Making, Graph Learning | Code Available | 2 |
| LMVD: A Large-Scale Multimodal Vlog Dataset for Depression Detection in the Wild | May 9, 2024 | Depression Detection, Navigate | Code Available | 2 |
| PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games | Apr 26, 2024 | Decision Making, Language Modeling | Code Available | 2 |
| The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models | Apr 24, 2024 | Diversity, Navigate | Code Available | 2 |
| Can Vehicle Motion Planning Generalize to Realistic Long-tail Scenarios? | Apr 11, 2024 | Autonomous Driving, Motion Planning | Code Available | 2 |
| GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation | Apr 9, 2024 | Go to AnyThing, Navigate | Code Available | 2 |
| Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation | Apr 2, 2024 | Navigate, Vision and Language Navigation | Code Available | 2 |
| Volumetric Environment Representation for Vision-Language Navigation | Mar 21, 2024 | 3D geometry, Multi-Task Learning | Code Available | 2 |
| NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning | Mar 12, 2024 | Navigate, Vision and Language Navigation | Code Available | 2 |
| Large Language Models(LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey | Feb 27, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models | Feb 16, 2024 | Common Sense Reasoning, Navigate | Code Available | 2 |
| Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation | Jan 1, 2024 | General Knowledge, Navigate | Code Available | 2 |
| Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models | Dec 14, 2023 | Descriptive, Image Quality Assessment | Code Available | 2 |
| Holodeck: Language Guided Generation of 3D Embodied AI Environments | Dec 14, 2023 | Common Sense Reasoning, Language Modelling | Code Available | 2 |
| VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation | Dec 6, 2023 | Language Modelling, Navigate | Code Available | 2 |
| Towards Learning a Generalist Model for Embodied Navigation | Dec 4, 2023 | 3D Question Answering (3D-QA), Embodied Question Answering | Code Available | 2 |
| Tree of Attacks: Jailbreaking Black-Box LLMs Automatically | Dec 4, 2023 | Navigate | Code Available | 2 |
| Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey | Nov 21, 2023 | Navigate | Code Available | 2 |
| ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code | Nov 16, 2023 | Code Generation, Navigate | Code Available | 2 |
| PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization | Oct 25, 2023 | Navigate | Code Available | 2 |
| Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future | Sep 27, 2023 | Navigate | Code Available | 2 |
| VidChapters-7M: Video Chapters at Scale | Sep 25, 2023 | Dense Video Captioning, Navigate | Code Available | 2 |
| LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent | Sep 21, 2023 | 3D visual grounding, Language Modeling | Code Available | 2 |
| AerialVLN: Vision-and-Language Navigation for UAVs | Aug 13, 2023 | cross-modal alignment, Navigate | Code Available | 2 |
| WizMap: Scalable Interactive Visualization for Exploring Large Machine Learning Embeddings | Jun 15, 2023 | Navigate | Code Available | 2 |
| Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory | May 25, 2023 | Common Sense Reasoning, CPU | Code Available | 2 |
| ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments | Apr 6, 2023 | Autonomous Navigation, Navigate | Code Available | 2 |
| A vision-based autonomous UAV inspection framework for unknown tunnel construction sites with dynamic obstacles | Jan 20, 2023 | Navigate | Code Available | 2 |
| Melting Pot 2.0 | Nov 24, 2022 | Artificial Life, Navigate | Code Available | 2 |
| MuGER^2: Multi-Granularity Evidence Retrieval and Reasoning for Hybrid Question Answering | Oct 19, 2022 | Navigate, Question Answering | Code Available | 2 |
| Vision-aided UAV navigation and dynamic obstacle avoidance using gradient-based B-spline trajectory optimization | Sep 15, 2022 | Navigate | Code Available | 2 |
| WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents | Jul 4, 2022 | Decision Making, Imitation Learning | Code Available | 2 |
| DayDreamer: World Models for Physical Robot Learning | Jun 28, 2022 | Deep Reinforcement Learning, Navigate | Code Available | 2 |
| VectorMapNet: End-to-end Vectorized HD Map Learning | Jun 17, 2022 | 3D Lane Detection, Autonomous Driving | Code Available | 2 |
| Receding Moving Object Segmentation in 3D LiDAR Data Using Sparse 4D Convolutions | Jun 8, 2022 | Autonomous Vehicles, Navigate | Code Available | 2 |
| Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation | Feb 23, 2022 | Efficient Exploration, Navigate | Code Available | 2 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Dec 8, 2021 | Abstract Algebra, Anachronisms | Code Available | 2 |
| Panoptic nuScenes: A Large-Scale Benchmark for LiDAR Panoptic Segmentation and Tracking | Sep 8, 2021 | Benchmarking, Diversity | Code Available | 2 |
| The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind | Jun 25, 2025 | Multi-agent Reinforcement Learning, Navigate | Code Available | 1 |
| SEMNAV: A Semantic Segmentation-Driven Approach to Visual Semantic Navigation | Jun 2, 2025 | Domain Adaptation, Navigate | Code Available | 1 |
| Cross from Left to Right Brain: Adaptive Text Dreamer for Vision-and-Language Navigation | May 27, 2025 | Large Language Model, Logical Reasoning | Code Available | 1 |
| Large Language Models for Planning: A Comprehensive and Systematic Survey | May 26, 2025 | Logical Reasoning, Navigate | Code Available | 1 |
| Neural Brain: A Neuroscience-inspired Framework for Embodied Agents | May 12, 2025 | Navigate | Code Available | 1 |
| CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory | May 8, 2025 | Large Language Model, Navigate | Code Available | 1 |
| Future-Oriented Navigation: Dynamic Obstacle Avoidance with One-Shot Energy-Based Multimodal Motion Prediction | May 1, 2025 | Model Predictive Control, Motion Planning | Code Available | 1 |