
Autonomous Driving

Autonomous driving is the task of driving a vehicle without human intervention.

Many state-of-the-art results for this task are listed on more general task pages, such as 3D Object Detection and Semantic Segmentation.

(Image credit: Exploring the Limitations of Behavior Cloning for Autonomous Driving)

Papers

Showing 501-550 of 6,092 papers

Title | Status | Hype
TUMTraffic-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes | Code | 1
Event-aided Semantic Scene Completion | Code | 1
SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset | Code | 1
TransRAD: Retentive Vision Transformer for Enhanced Radar Object Detection | Code | 1
SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation | Code | 1
Dream to Drive with Predictive Individual World Model | Code | 1
MetaOcc: Surround-View 4D Radar and Camera Fusion Framework for 3D Occupancy Prediction with Dual Training Strategies | Code | 1
3DLabelProp: Geometric-Driven Domain Generalization for LiDAR Semantic Segmentation in Autonomous Driving | Code | 1
A Survey of World Models for Autonomous Driving | Code | 1
DSTIGCN: Deformable Spatial-Temporal Interaction Graph Convolution Network for Pedestrian Trajectory Prediction | Code | 1
LEO: Boosting Mixture of Vision Encoders for Multimodal Large Language Models | Code | 1
AD-L-JEPA: Self-Supervised Spatial World Models with Joint Embedding Predictive Architecture for Autonomous Driving with LiDAR Data | Code | 1
Implicit Guidance and Explicit Representation of Semantic Information in Points Cloud: A Survey | Code | 1
RadarNeXt: Real-Time and Reliable 3D Object Detector Based On 4D mmWave Imaging Radar | Code | 1
Pseudo Visible Feature Fine-Grained Fusion for Thermal Object Detection | Code | 1
PIDLoc: Cross-View Pose Optimization Network Inspired by PID Controllers | Code | 1
OmniStereo: Real-time Omnidireactional Depth Estimation with Multiview Fisheye Cameras | Code | 1
TiGDistill-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning Distillation | Code | 1
DriveEditor: A Unified 3D Information-Guided Framework for Controllable Object Editing in Driving Scenes | Code | 1
Generating Traffic Scenarios via In-Context Learning to Learn Better Motion Planner | Code | 1
Pre-training a Density-Aware Pose Transformer for Robust LiDAR-based 3D Human Pose Estimation | Code | 1
DriveTester: A Unified Platform for Simulation-Based Autonomous Driving Testing | Code | 1
ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction | Code | 1
WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model | Code | 1
Towards Flexible 3D Perception: Object-Centric Occupancy Completion Augments 3D Object Detection | Code | 1
COOOL: Challenge Of Out-Of-Label A Novel Benchmark for Autonomous Driving | Code | 1
MVCTrack: Boosting 3D Point Cloud Tracking via Multimodal-Guided Virtual Cues | Code | 1
Trajectory-based Road Autolabeling with Lidar-Camera Fusion in Winter Conditions | Code | 1
SEED4D: A Synthetic Ego–Exo Dynamic 4D Data Generator, Driving Dataset and Benchmark | Code | 1
A Multi-Loss Strategy for Vehicle Trajectory Prediction: Combining Off-Road, Diversity, and Directional Consistency Losses | Code | 1
From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects | Code | 1
WHALES: A Multi-agent Scheduling Dataset for Enhanced Cooperation in Autonomous Driving | Code | 1
Robust 3D Semantic Occupancy Prediction with Calibration-free Spatial Transformation | Code | 1
ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction | Code | 1
Large-scale moral machine experiment on large language models | Code | 1
LSSInst: Improving Geometric Modeling in LSS-Based BEV Perception with Instance Representation | Code | 1
IGDrivSim: A Benchmark for the Imitation Gap in Autonomous Driving | Code | 1
Learning Multiple Initial Solutions to Optimization Problems | Code | 1
ROAD-Waymo: Action Awareness at Scale for Autonomous Driving | Code | 1
Polar R-CNN: End-to-End Lane Detection with Fewer Anchors | Code | 1
An Efficient Approach to Generate Safe Drivable Space by LiDAR-Camera-HDmap Fusion | Code | 1
SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition | Code | 1
Explainability of Point Cloud Neural Networks Using SMILE: Statistical Model-Agnostic Interpretability with Local Explanations | Code | 1
Real-time Stereo-based 3D Object Detection for Streaming Perception | Code | 1
TEOcc: Radar-camera Multi-modal Occupancy Prediction via Temporal Enhancement | Code | 1
CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes | Code | 1
LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond | Code | 1
PRFusion: Toward Effective and Robust Multi-Modal Place Recognition with Image and Point Cloud Fusion | Code | 1
Spatial-Temporal Multi-Cuts for Online Multiple-Camera Vehicle Tracking | Code | 1
Open3DTrack: Towards Open-Vocabulary 3D Multi-Object Tracking | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ReasonNet | Driving Score | 79.95 | - | Unverified
2 | InterFuser | Driving Score | 76.18 | - | Unverified
3 | TCP | Driving Score | 75.14 | - | Unverified
4 | TF++ WP | Driving Score | 66.32 | - | Unverified
5 | Learning From All Vehicles (LAV) | Driving Score | 61.85 | - | Unverified
6 | TransFuser | Driving Score | 61.18 | - | Unverified
7 | TransFuser (Reproduced) | Driving Score | 55.04 | - | Unverified
8 | TCP (Reproduced) | Driving Score | 47.91 | - | Unverified
9 | Latent TransFuser | Driving Score | 45.2 | - | Unverified
10 | GRIAD | Driving Score | 36.79 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Geometric Fusion | Route Completion (RC) | 69.17 | - | Unverified
2 | TransFuser | Route Completion (RC) | 56.36 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Geometric Fusion | Route Completion (RC) | 86.91 | - | Unverified
2 | TransFuser | Route Completion (RC) | 78.41 | - | Unverified
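The Driving Score and RC metrics are not defined on this page. Assuming they follow the CARLA Leaderboard convention (ReasonNet, InterFuser, TCP, LAV and TransFuser are CARLA driving agents), each route's score is its route completion (0-100) multiplied by an infraction penalty that starts at 1.0 and is scaled down by a fixed coefficient for every infraction, and the reported score is the mean over routes. The minimal sketch below illustrates that calculation. The infraction keys and function names are hypothetical, and the coefficients are the CARLA Leaderboard 1.0 values as we understand them, so check the official leaderboard documentation before reusing them.

```python
from math import prod

# Assumed CARLA Leaderboard 1.0 penalty coefficients (verify against the
# official docs). Keys are hypothetical labels chosen for this sketch.
PENALTIES = {
    "collision_pedestrian": 0.50,
    "collision_vehicle": 0.60,
    "collision_static": 0.65,
    "red_light": 0.70,
    "stop_sign": 0.80,
}


def infraction_penalty(infractions: dict[str, int]) -> float:
    """Multiply in one coefficient per infraction occurrence.

    An empty dict means a clean run and yields 1.0. Unknown infraction
    types raise KeyError rather than being silently ignored.
    """
    return prod(PENALTIES[kind] ** count for kind, count in infractions.items())


def driving_score(routes: list[tuple[float, dict[str, int]]]) -> float:
    """Mean over routes of route completion (0-100) times its infraction penalty."""
    per_route = [rc * infraction_penalty(inf) for rc, inf in routes]
    return sum(per_route) / len(per_route)


def route_completion(routes: list[tuple[float, dict[str, int]]]) -> float:
    """Mean route completion (0-100), ignoring infractions."""
    return sum(rc for rc, _ in routes) / len(routes)


if __name__ == "__main__":
    # Hypothetical evaluation: one clean route, one with a vehicle collision,
    # and one half-finished route that ran two red lights.
    routes = [
        (100.0, {}),
        (80.0, {"collision_vehicle": 1}),
        (50.0, {"red_light": 2}),
    ]
    print("RC:", round(route_completion(routes), 2))
    print("DS:", round(driving_score(routes), 2))
```

Because every coefficient is at most 1.0, a per-route score can never exceed that route's completion, so under this definition a model's driving score is bounded above by its mean route completion on the same routes.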