SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 15051-15100 of 474278 papers

Title | Status | Hype
Toward Safety-First Human-Like Decision Making for Autonomous Vehicles in Time-Varying Traffic Flow | - | 0
AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes | - | 0
SKOLR: Structured Koopman Operator Linear RNN for Time-Series Forecasting | - | 0
CLGNN: A Contrastive Learning-based GNN Model for Betweenness Centrality Prediction on Temporal Graphs | - | 0
DiffusionBlocks: Blockwise Training for Generative Models via Score-Based Diffusion | - | 0
IntelliLung: Advancing Safe Mechanical Ventilation using Offline RL with Hybrid Actions and Clinically Aligned Rewards | - | 0
ResNets Are Deeper Than You Think | - | 0
HiLight: A Hierarchical Reinforcement Learning Framework with Global Adversarial Guidance for Large-Scale Traffic Signal Control | - | 0
Is Selection All You Need in Differential Evolution? | - | 0
Sharp Generalization Bounds for Foundation Models with Asymmetric Randomized Low-Rank Adapters | - | 0
Object-Centric Neuro-Argumentative Learning | Code | 0
Unified Software Engineering agent as AI Software Engineer | - | 0
Universal Rates of ERM for Agnostic Learning | - | 0
Multi-Scale Finetuning for Encoder-based Time Series Foundation Models | - | 0
Unsupervised Skill Discovery through Skill Regions Differentiation | - | 0
A General Framework for Off-Policy Learning with Partially-Observed Reward | - | 0
Detecting immune cells with label-free two-photon autofluorescence and deep learning | - | 0
Zeroth-Order Optimization is Secretly Single-Step Policy Optimization | - | 0
Feasibility-Driven Trust Region Bayesian Optimization | - | 0
Reimagining Target-Aware Molecular Generation through Retrieval-Enhanced Aligned Diffusion | - | 0
The Perception of Phase Intercept Distortion and its Application in Data Augmentation | - | 0
Capacity Matters: a Proof-of-Concept for Transformer Memorization on Real-World Data | Code | 0
MOL: Joint Estimation of Micro-Expression, Optical Flow, and Landmark via Transformer-Graph-Style Convolution | Code | 1
GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies | Code | 1
Deep Learning Surrogates for Real-Time Gas Emission Inversion | - | 0
Unsupervised Imaging Inverse Problems with Diffusion Distribution Matching | Code | 1
3DGS-IEval-15K: A Large-scale Image Quality Evaluation Database for 3D Gaussian-Splatting | Code | 1
Iterative Camera-LiDAR Extrinsic Optimization via Surrogate Diffusion | Code | 0
Chaining Event Spans for Temporal Relation Grounding | Code | 0
Re-Initialization Token Learning for Tool-Augmented Large Language Models | Code | 0
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge | Code | 0
How Far Can LLMs Improve from Experience? Measuring Test-Time Learning Ability in LLMs with Human Comparison | Code | 0
AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs | Code | 0
GenerationPrograms: Fine-grained Attribution with Executable Programs | Code | 0
Optimizing Length Compression in Large Reasoning Models | Code | 1
Into the Unknown: Applying Inductive Spatial-Semantic Location Embeddings for Predicting Individuals' Mobility Beyond Visited Places | Code | 0
AST-Enhanced or AST-Overloaded? The Surprising Impact of Hybrid Graph Representations on Code Clone Detection | Code | 0
Leveraging External Factors in Household-Level Electrical Consumption Forecasting using Hypernetworks | Code | 0
Common Benchmarks Undervalue the Generalization Power of Programmatic Policies | Code | 0
Towards Robust Learning to Optimize with Theoretical Guarantees | Code | 0
A Scalable Hybrid Training Approach for Recurrent Spiking Neural Networks | Code | 0
Model compression using knowledge distillation with integrated gradients | - | 0
Scaling Intelligence: Designing Data Centers for Next-Gen Language Models | - | 0
The use of cross validation in the analysis of designed experiments | Code | 0
Abstract Meaning Representation for Hospital Discharge Summarization | Code | 0
Adapting Lightweight Vision Language Models for Radiological Visual Question Answering | Code | 0
PoseGRAF: Geometric-Reinforced Adaptive Fusion for Monocular 3D Human Pose Estimation | Code | 0
GRAM: A Generative Foundation Reward Model for Reward Generalization | Code | 1
QUEST: Quality-aware Semi-supervised Table Extraction for Business Documents | Code | 0
Dataset distillation for memorized data: Soft labels can leak held-out teacher knowledge | Code | 0
Page 302 of 9486