Vision and Language Navigation

Papers


Showing 1–50 of 223 papers

Title	Date	Tasks	Status	Hype
Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities	Jul 17, 2025	Large Language Model, Vision and Language Navigation	Unverified	0
NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments	Jun 30, 2025	Decision Making, Vision and Language Navigation	Code Available	2
Grounded Vision-Language Navigation for UAVs with Open-Vocabulary Goal Understanding	Jun 12, 2025	Language Modeling, Language Modelling	Unverified	0
A Navigation Framework Utilizing Vision-Language Models	Jun 11, 2025	Navigate, Prompt Engineering	Code Available	0
Disrupting Vision-Language Model-Driven Navigation Services via Adversarial Object Fusion	May 29, 2025	Language Modeling, Language Modelling	Unverified	0
Cross from Left to Right Brain: Adaptive Text Dreamer for Vision-and-Language Navigation	May 27, 2025	Large Language Model, Logical Reasoning	Code Available	1
FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models	May 19, 2025	Disaster Response, Vision and Language Navigation	Code Available	2
Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation	May 16, 2025	3D geometry, Navigate	Code Available	2
CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory	May 8, 2025	Large Language Model, Navigate	Code Available	1
MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans	May 5, 2025	Vision and Language Navigation	Unverified	0
DOPE: Dual Object Perception-Enhancement Network for Vision-and-Language Navigation	Apr 30, 2025	Navigate, Object	Unverified	0
ST-Booster: An Iterative SpatioTemporal Perception Booster for Vision-and-Language Navigation in Continuous Environments	Apr 14, 2025	Navigate, Vision and Language Navigation	Unverified	0
Endowing Embodied Agents with Spatial Reasoning Capabilities for Vision-and-Language Navigation	Apr 9, 2025	Hallucination, Spatial Reasoning	Unverified	0
COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation	Mar 31, 2025	Memorization, Vision and Language Navigation	Unverified	0
Do Visual Imaginations Improve Vision-and-Language Navigation Agents?	Mar 20, 2025	Vision and Language Navigation	Unverified	0
FlexVLN: Flexible Adaptation for Diverse Vision-and-Language Navigation Tasks	Mar 18, 2025	Vision and Language Navigation	Unverified	0
HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard	Mar 18, 2025	Benchmarking, Human Dynamics	Unverified	0
Aerial Vision-and-Language Navigation with Grid-based View Selection and Map Construction	Mar 14, 2025	Navigate, Vision and Language Navigation	Unverified	0
Observation-Graph Interaction and Key-Detail Guidance for Vision and Language Navigation	Mar 14, 2025	cross-modal alignment, Navigate	Unverified	0
PanoGen++: Domain-Adapted Text-Guided Panoramic Environment Generation for Vision-and-Language Navigation	Mar 13, 2025	Image Inpainting, Image Outpainting	Unverified	0
SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation	Mar 13, 2025	Language Modeling, Language Modelling	Unverified	0
Ground-level Viewpoint Vision-and-Language Navigation in Continuous Environments	Feb 26, 2025	Instruction Following, Vision and Language Navigation	Unverified	0
NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM	Feb 16, 2025	Navigate, RAG	Code Available	2
TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation	Feb 11, 2025	Retrieval, Vision and Language Navigation	Unverified	0
General Scene Adaptation for Vision-and-Language Navigation	Jan 29, 2025	Diversity, Vision and Language Navigation	Code Available	2
Language and Planning in Robotic Navigation: A Multilingual Evaluation of State-of-the-Art Models	Jan 7, 2025	Instruction Following, Vision and Language Navigation	Unverified	0
NAVCON: A Cognitively Inspired and Linguistically Grounded Corpus for Vision and Language Navigation	Dec 17, 2024	Few-Shot Learning, Vision and Language Navigation	Unverified	0
RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation	Dec 11, 2024	3D Reconstruction, Diversity	Unverified	0
Agent Journey Beyond RGB: Unveiling Hybrid Semantic-Spatial Environmental Representations for Vision-and-Language Navigation	Dec 9, 2024	Object Localization, Vision and Language Navigation	Code Available	1
World-Consistent Data Generation for Vision-and-Language Navigation	Dec 9, 2024	Data Augmentation, Navigate	Unverified	0
NaVILA: Legged Robot Vision-Language-Action Model for Navigation	Dec 5, 2024	Navigate, Vision and Language Navigation	Unverified	0
Hijacking Vision-and-Language Navigation Agents with Adversarial Environmental Attacks	Dec 3, 2024	Adversarial Attack, Vision and Language Navigation	Unverified	0
Planning from Imagination: Episodic Simulation and Episodic Memory for Vision-and-Language Navigation	Nov 30, 2024	Navigate, Vision and Language Navigation	Unverified	0
g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks	Nov 26, 2024	Contrastive Learning, Question Answering	Code Available	1
UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation	Nov 25, 2024	3DGS, Navigate	Unverified	0
Fine-Grained Alignment in Vision-and-Language Navigation through Bayesian Optimization	Nov 22, 2024	Bayesian Optimization, Contrastive Learning	Unverified	0
NavAgent: Multi-scale Urban Street View Fusion For UAV Embodied Vision-and-Language Navigation	Nov 13, 2024	Navigate, Vision and Language Navigation	Unverified	0
Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning	Oct 11, 2024	Language Modeling, Language Modelling	Unverified	0
Zero-Shot Vision-and-Language Navigation with Collision Mitigation in Continuous Environment	Oct 7, 2024	Large Language Model, Vision and Language Navigation	Unverified	0
MiniVLN: Efficient Vision-and-Language Navigation by Progressive Knowledge Distillation	Sep 27, 2024	Knowledge Distillation, Vision and Language Navigation	Unverified	0
Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs	Sep 27, 2024	Decision Making, Navigate	Unverified	0
Spatially-Aware Speaker for Vision-and-Language Navigation Instruction Generation	Sep 9, 2024	Vision and Language Navigation	Code Available	0
Seeing is Believing? Enhancing Vision-Language Navigation using Visual Perturbations	Sep 9, 2024	Autonomous Navigation, Diversity	Unverified	0
FLAME: Learning to Navigate with Multimodal LLM in Urban Environments	Aug 20, 2024	Navigate, Vision and Language Navigation	Code Available	2
Narrowing the Gap between Vision and Action in Navigation	Aug 19, 2024	Decoder, Spatial Reasoning	Code Available	0
Loc4Plan: Locating Before Planning for Outdoor Vision and Language Navigation	Aug 9, 2024	Navigate, Position	Unverified	0
Navigating Beyond Instructions: Vision-and-Language Navigation in Obstructed Environments	Jul 31, 2024	graph construction, Navigate	Code Available	1
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models	Jul 17, 2024	Instruction Following, Vision and Language Navigation	Code Available	3
PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation	Jul 16, 2024	Navigate, Vision and Language Navigation	Code Available	1
Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models	Jul 9, 2024	Vision and Language Navigation	Code Available	3



Datasets: VLN Challenge, Touchdown Dataset, RxR, map2seq, Room2Room, robo-vln

Benchmark Results

VLN Challenge
#	Model	Metric	Claimed	Verified	Status
1	human	success	0.86	—	Unverified
2	Lily	success	0.79	—	Unverified
3	Airbert	success	0.78	—	Unverified
4	explore@40 beam-search	success	0.74	—	Unverified
5	Global Normalization	success	0.74	—	Unverified
6	VLN-Bert	success	0.73	—	Unverified
7	BEVBert	success	0.73	—	Unverified
8	GMap	success	0.73	—	Unverified
9	Global Normalization pre-explore	success	0.73	—	Unverified
10	FOAM-Beam Search	success	0.72	—	Unverified

Touchdown Dataset
#	Model	Metric	Claimed	Verified	Status
1	FLAME	Task Completion (TC)	40.2	—	Unverified
2	ORAR + junction type + heading delta	Task Completion (TC)	29.1	—	Unverified
3	ORAR	Task Completion (TC)	24.2	—	Unverified
4	ARC + L2STOP	Task Completion (TC)	16.68	—	Unverified
5	VLN Transformer +M-50 +style	Task Completion (TC)	16.2	—	Unverified
6	VLN Transformer	Task Completion (TC)	14.9	—	Unverified
7	ARC	Task Completion (TC)	14.13	—	Unverified
8	Retouch-RConcat	Task Completion (TC)	12.8	—	Unverified
9	Gated Attention (GA)	Task Completion (TC)	11.9	—	Unverified
10	RConcat	Task Completion (TC)	11.8	—	Unverified

RxR
#	Model	Metric	Claimed	Verified	Status
1	MARVAL	ndtw	66.76	—	Unverified
2	EnvEdit-PT	ndtw	64.61	—	Unverified
3	HAMT	ndtw	59.94	—	Unverified
4	CLEAR-CLIP	ndtw	53.69	—	Unverified
5	Monolingual Baseline	ndtw	41.05	—	Unverified
6	Multilingual Baseline	ndtw	36.81	—	Unverified

map2seq
#	Model	Metric	Claimed	Verified	Status
1	FLAME	Task Completion (TC)	52.44	—	Unverified
2	ORAR + junction type + heading delta	Task Completion (TC)	46.7	—	Unverified
3	ORAR	Task Completion (TC)	45.1	—	Unverified
4	Gated Attention	Task Completion (TC)	17	—	Unverified
5	RConcat	Task Completion (TC)	14.7	—	Unverified

Room2Room
#	Model	Metric	Claimed	Verified	Status
1	R2R+EnvDrop	spl	0.61	—	Unverified
2	RCM + SIL	spl	0.59	—	Unverified
3	Tactical Rewind - short	spl	0.41	—	Unverified

robo-vln
#	Model	Metric	Claimed	Verified	Status
1	Hierarchical Cross-Modal Agent	SPL (Success Weighted by Path Length)	0.4	—	Unverified