SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 18001-18050 of 474,278 papers

Title | Status | Hype
EuroLLM-9B: Technical Report | - | 0
EV-Flying: an Event-based Dataset for In-The-Wild Recognition of Flying Objects | - | 0
YOND: Practical Blind Raw Image Denoising Free from Camera-Specific Data Dependency | - | 0
RewardAnything: Generalizable Principle-Following Reward Models | Code | 1
PRJ: Perception-Retrieval-Judgement for Generated Images | - | 0
Recent Advances in Medical Image Classification | - | 0
DiffCAP: Diffusion-based Cumulative Adversarial Purification for Vision Language Models | - | 0
FedFACT: A Provable Framework for Controllable Group-Fairness Calibration in Federated Learning | - | 0
Model Splitting Enhanced Communication-Efficient Federated Learning for CSI Feedback | - | 0
R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning | Code | 0
Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning | - | 0
Multi-view Surface Reconstruction Using Normal and Reflectance Cues | Code | 2
INP-Former++: Advancing Universal Anomaly Detection via Intrinsic Normal Prototypes and Residual Learning | Code | 3
Learning from Noise: Enhancing DNNs for Event-Based Vision through Controlled Noise Injection | Code | 0
DrSR: LLM based Scientific Equation Discovery with Dual Reasoning from Data and Experience | - | 0
Diffusion Transformer-based Universal Dose Denoising for Pencil Beam Scanning Proton Therapy | - | 0
Classifying Dental Care Providers Through Machine Learning with Features Ranking | - | 0
macOSWorld: A Multilingual Interactive Benchmark for GUI Agents | Code | 1
GEM: Empowering LLM for both Embedding Generation and Language Understanding | - | 0
Hierarchical Text Classification Using Contrastive Learning Informed Path Guided Hierarchy | - | 0
Replay Can Provably Increase Forgetting | - | 0
RedRFT: A Light-Weight Benchmark for Reinforcement Fine-Tuning-Based Red Teaming | Code | 0
Backbone Augmented Training for Adaptations | - | 0
You Only Train Once | - | 0
A Poisson-Guided Decomposition Network for Extreme Low-Light Image Enhancement | - | 0
Mechanistic Decomposition of Sentence Representations | - | 0
MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP | - | 0
Building a Few-Shot Cross-Domain Multilingual NLU Model for Customer Care | - | 0
MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale | - | 0
Unpacking Let Alone: Human-Scale Models Generalize to a Rare Construction in Form but not Meaning | - | 0
Zero-Shot Open-Schema Entity Structure Discovery | - | 0
SQLens: An End-to-End Framework for Error Detection and Correction in Text-to-SQL | - | 0
DRE: An Effective Dual-Refined Method for Integrating Small and Large Language Models in Open-Domain Dialogue Evaluation | - | 0
Automated Skill Discovery for Language Agents through Exploration and Iterative Feedback | - | 0
Plugging Schema Graph into Multi-Table QA: A Human-Guided Framework for Reducing LLM Reliance | - | 0
Schema Generation for Large Knowledge Graphs Using Large Language Models | - | 0
Knowledge-guided Contextual Gene Set Analysis Using Large Language Models | - | 0
Bridging the Performance Gap Between Target-Free and Target-Based Reinforcement Learning With Iterated Q-Learning | - | 0
The Latent Space Hypothesis: Toward Universal Medical Representation Learning | - | 0
Quantum-Inspired Genetic Optimization for Patient Scheduling in Radiation Oncology | - | 0
Relational reasoning and inductive bias in transformers trained on a transitive inference task | - | 0
A Lyapunov Drift-Plus-Penalty Method Tailored for Reinforcement Learning with Queue Stability | - | 0
Short-Term Power Demand Forecasting for Diverse Consumer Types to Enhance Grid Planning and Synchronisation | - | 0
Deep learning for predicting hauling fleet production capacity under uncertainties in open pit mines using real and simulated data | - | 0
Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning | - | 0
KOALA++: Efficient Kalman-Based Optimization of Neural Networks with Gradient-Covariance Products | - | 0
RETRO SYNFLOW: Discrete Flow Matching for Accurate and Diverse Single-Step Retrosynthesis | - | 0
Selective Matching Losses -- Not All Scores Are Created Equal | - | 0
AUTOCT: Automating Interpretable Clinical Trial Prediction with LLM Agents | - | 0
Beyond Memorization: A Rigorous Evaluation Framework for Medical Knowledge Editing | Code | 0
Page 361 of 9,486