SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 8776-8800 of 474278 papers

Title | Status | Hype
Designing Tools with Control Confidence | Code | 0
Few Shot Semi-Supervised Learning for Abnormal Stop Detection from Sparse GPS Trajectories | Code | 0
PET Head Motion Estimation Using Supervised Deep Learning with Attention | Code | 0
The Harder The Better: Maintaining Supervised Fine-tuning Generalization with Less but Harder Data | Code | 0
KnowledgeSmith: Uncovering Knowledge Updating in LLMs with Model Editing and Unlearning | Code | 0
CAGE: Continuity-Aware edGE Network Unlocks Robust Floorplan Reconstruction | Code | 0
KonfAI: A Modular and Fully Configurable Framework for Deep Learning in Medical Imaging | Code | 0
Limited Preference Data? Learning Better Reward Model with Latent Space Synthesis | Code | 0
Biased-Attention Guided Risk Prediction for Safe Decision-Making at Unsignalized Intersections | Code | 0
One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration | | 0
Robot Learning: A Tutorial | | 0
Dual Learning with Dynamic Knowledge Distillation and Soft Alignment for Partially Relevant Video Retrieval | Code | 0
GeoVLM-R1: Reinforcement Fine-Tuning for Improved Remote Sensing Reasoning | | 0
BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions | | 0
An Adaptive Edge-Guided Dual-Network Framework for Fast QR Code Motion Deblurring | Code | 0
In the Eye of MLLM: Benchmarking Egocentric Video Intent Understanding with Gaze-Guided Prompting | | 0
InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts | | 0
SCAN: Self-Denoising Monte Carlo Annotation for Robust Process Reward Learning | | 0
Clean First, Align Later: Benchmarking Preference Data Cleaning for Reliable LLM Alignment | Code | 0
CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving | Code | 0
PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks | Code | 0
MMOT: The First Challenging Benchmark for Drone-based Multispectral Multi-Object Tracking | Code | 0
Probing Latent Knowledge Conflict for Faithful Retrieval-Augmented Generation | Code | 0
Towards Robust and Realible Multimodal Misinformation Recognition with Incomplete Modality | Code | 0
Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models | Code | 0
Page 352 of 18972