SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 14551-14600 of 474278 papers

| Title | Status | Hype |
| --- | --- | --- |
| PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR | | 1 |
| Gabliteration: Adaptive Multi-Directional Neural Weight Modification for Selective Behavioral Alteration in Large Language Models | | 1 |
| SK-Adapter: Skeleton-Based Structural Control for Native 3D Generation | | 1 |
| DREAM: Where Visual Understanding Meets Text-to-Image Generation | | 1 |
| Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory | | 1 |
| WideSeek: Advancing Wide Research via Multi-Agent Scaling | | 1 |
| Does Object Binding Naturally Emerge in Large Pretrained Vision Transformers? | | 1 |
| Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-task Multi-Scale Network | | 1 |
| OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models | | 1 |
| EvasionBench: A Large-Scale Benchmark for Detecting Managerial Evasion in Earnings Call Q&A | | 1 |
| NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing | | 1 |
| Free(): Learning to Forget in Malloc-Only Reasoning Models | | 1 |
| Learning Goal-Oriented Vision-and-Language Navigation with Self-Improving Demonstrations at Scale | | 1 |
| HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions | | 1 |
| How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs | | 1 |
| KLASS: KL-Guided Fast Inference in Masked Diffusion Models | | 1 |
| Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models | | 1 |
| Toward Complex-Valued Neural Networks for Waveform Generation | | 1 |
| Tracking Capabilities for Safer Agents | | 1 |
| LatentMem: Customizing Latent Memory for Multi-Agent Systems | | 1 |
| MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding | | 1 |
| LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference | | 1 |
| Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision | | 1 |
| DSGym: A Holistic Framework for Evaluating and Training Data Science Agents | | 1 |
| Do Reasoning Models Enhance Embedding Models? | | 1 |
| One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment | | 1 |
| Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning | | 1 |
| -Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space | | 1 |
| Towards Reason-Informed Video Editing in Unified Models with Self-Reflective Learning | | 1 |
| V_1: Unifying Generation and Self-Verification for Parallel Reasoners | | 1 |
| Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models | | 1 |
| Retrieve and Segment: Are a Few Examples Enough to Bridge the Supervision Gap in Open-Vocabulary Segmentation? | | 1 |
| Multimodal Evaluation of Russian-language Architectures | | 1 |
| EditCtrl: Disentangled Local and Global Control for Real-Time Generative Video Editing | | 1 |
| Infherno: End-to-end Agent-based FHIR Resource Synthesis from Free-form Clinical Notes | | 1 |
| Kairos: Toward Adaptive and Parameter-Efficient Time Series Foundation Models | | 1 |
| CFG-Ctrl: Control-Based Classifier-Free Diffusion Guidance | | 1 |
| EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control | | 1 |
| $OneMillion-Bench: How Far are Language Agents from Human Experts? | | 1 |
| Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts | | 1 |
| ReLi3D: Relightable Multi-view 3D Reconstruction with Disentangled Illumination | | 1 |
| TokenTrim: Inference-Time Token Pruning for Autoregressive Long Video Generation | | 1 |
| Safety Alignment of LMs via Non-cooperative Games | | 1 |
| CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards | | 1 |
| A Mechanistic View on Video Generation as World Models: State and Dynamics | | 1 |
| FineRMoE: Dimension Expansion for Finer-Grained Expert with Its Upcycling Approach | | 1 |
| PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction | | 1 |
| DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving | | 1 |
| Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition | | 1 |
| RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning | | 1 |
Page 292 of 9486