SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 19401–19450 of 474,278 papers

Title | Status | Hype
Actor-Critic based Online Data Mixing For Language Model Pre-Training | | 0
Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation | | 0
Probing Association Biases in LLM Moderation Over-Sensitivity | | 0
ChARM: Character-based Act-adaptive Reward Modeling for Advanced Role-Playing Language Agents | Code | 0
BIRD: Behavior Induction via Representation-structure Distillation | | 0
TSENOR: Highly-Efficient Algorithm for Finding Transposable N:M Sparse Masks | | 0
Information Structure in Mappings: An Approach to Learning, Representation, and Generalisation | | 0
VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL | | 0
Multi-output Classification using a Cross-talk Architecture for Compound Fault Diagnosis of Motors in Partially Labeled Condition | | 0
Large Language Model Meets Constraint Propagation | | 0
From Images to Signals: Are Large Vision Models Useful for Time Series Analysis? | | 0
MedPAIR: Measuring Physicians and AI Relevance Alignment in Medical Question Answering | | 0
Bridging Source and Target Domains via Link Prediction for Unsupervised Domain Adaptation on Graphs | | 0
Measure gradients, not activations! Enhancing neuronal activity in deep reinforcement learning | | 0
Simplifying Bayesian Optimization Via In-Context Direct Optimum Sampling | | 0
Adaptive Deadline and Batch Layered Synchronized Federated Learning | | 0
The Rich and the Simple: On the Implicit Bias of Adam and SGD | | 0
Towards disentangling the contributions of articulation and acoustics in multimodal phoneme recognition | | 0
Conformal Object Detection by Sequential Risk Control | | 0
One Task Vector is not Enough: A Large-Scale Study for In-Context Learning | | 0
SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving | | 0
Adaptive finite element type decomposition of Gaussian processes | | 0
MaskAdapt: Unsupervised Geometry-Aware Domain Adaptation Using Multimodal Contextual Learning and RGB-Depth Masking | | 0
Scaling up the think-aloud method | Code | 0
Primal-Dual Neural Algorithmic Reasoning | Code | 0
3DGEER: Exact and Efficient Volumetric Rendering with 3D Gaussians | Code | 1
NeuronTune: Towards Self-Guided Spurious Bias Mitigation | Code | 0
DeepTopoNet: A Framework for Subglacial Topography Estimation on the Greenland Ice Sheets | Code | 0
BeaverTalk: Oregon State University's IWSLT 2025 Simultaneous Speech Translation System | Code | 0
FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression | Code | 0
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding | Code | 0
CNN-LSTM Hybrid Model for AI-Driven Prediction of COVID-19 Severity from Spike Sequences and Clinical Data | Code | 0
The Surprising Soupability of Documents in State Space Models | | 0
Confidential Guardian: Cryptographically Prohibiting the Abuse of Model Abstention | Code | 0
GenIC: An LLM-Based Framework for Instance Completion in Knowledge Graphs | Code | 0
Large Language Models for Controllable Multi-property Multi-objective Molecule Optimization | Code | 0
Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs | Code | 0
Position: The Future of Bayesian Prediction Is Prior-Fitted | | 0
ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning | | 0
Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting | | 0
OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation | Code | 11
Test-Time Training Done Right | | 0
BiBLDR: Bidirectional Behavior Learning for Drug Repositioning | Code | 0
Transforming Podcast Preview Generation: From Expert Models to LLM-Based Systems | | 0
Searching Neural Architectures for Sensor Nodes on IoT Gateways | | 0
KGMark: A Diffusion Watermark for Knowledge Graphs | Code | 0
OMNIGUARD: An Efficient Approach for AI Safety Moderation Across Modalities | Code | 0
BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning | Code | 0
Retrieval Augmented Generation based Large Language Models for Causality Mining | Code | 0
Thompson Sampling in Online RLHF with General Function Approximation | | 0
Page 389 of 9486