SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 776-800 of 177,339 papers

Title | Status | Hype
Secrets of RLHF in Large Language Models Part II: Reward Modeling | Code | 5
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion | Code | 5
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers | Code | 5
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit | Code | 5
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models | Code | 5
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering | Code | 5
Free Process Rewards without Process Labels | Code | 5
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Code | 5
Executable Code Actions Elicit Better LLM Agents | Code | 5
InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation | Code | 5
PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation | Code | 5
ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth | Code | 5
Continuous Thought Machines | Code | 5
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models | Code | 5
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining | Code | 5
Efficient Streaming Language Models with Attention Sinks | Code | 5
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs | Code | 5
Group-in-Group Policy Optimization for LLM Agent Training | Code | 5
Sequencer: Deep LSTM for Image Classification | Code | 5
FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement | Code | 5
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents | Code | 5
EvoGit: Decentralized Code Evolution via Git-Based Multi-Agent Collaboration | Code | 5
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models | Code | 5
SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models | Code | 5
Matrix-Game: Interactive World Foundation Model | Code | 5