SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers, 248,104 code links, 4,818 tasks

Papers

Showing 951-1000 of 659,983 papers

Title | Status | Hype
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts | Code | 5
RLHF Workflow: From Reward Modeling to Online RLHF | Code | 5
Single-seed generation of Brownian paths and integrals for adaptive and high order SDE solvers | Code | 5
Evaluating Real-World Robot Manipulation Policies in Simulation | Code | 5
Granite Code Models: A Family of Open Foundation Models for Code Intelligence | Code | 5
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding | Code | 5
When LLMs Meet Cybersecurity: A Systematic Literature Review | Code | 5
Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation | Code | 5
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models | Code | 5
XFeat: Accelerated Features for Lightweight Image Matching | Code | 5
Make Your LLM Fully Utilize the Context | Code | 5
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving | Code | 5
NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results | Code | 5
MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit | Code | 5
Do "English" Named Entity Recognizers Work Well on Global Englishes? | Code | 5
Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs | Code | 5
Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean | Code | 5
Gaussian Opacity Fields: Efficient Adaptive Surface Reconstruction in Unbounded Scenes | Code | 5
Magic Clothing: Controllable Garment-Driven Image Synthesis | Code | 5
SQUAT: Stateful Quantization-Aware Training in Recurrent Spiking Neural Networks | Code | 5
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization | Code | 5
MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts | Code | 5
The Path To Autonomous Cyber Defense | Code | 5
LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models | Code | 5
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders | Code | 5
SpeechAlign: Aligning Speech Generation to Human Preferences | Code | 5
Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer | Code | 5
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators | Code | 5
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators | Code | 5
SpatialTracker: Tracking Any 2D Pixels in 3D Space | Code | 5
ReFT: Representation Finetuning for Language Models | Code | 5
Masked Completion via Structured Diffusion with White-Box Transformers | Code | 5
Long-context LLMs Struggle with Long In-context Learning | Code | 5
CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians | Code | 5
Measuring Taiwanese Mandarin Language Understanding | Code | 5
TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods | Code | 5
InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds | Code | 5
GauStudio: A Modular Framework for 3D Gaussian Splatting and Beyond | Code | 5
UniDepth: Universal Monocular Metric Depth Estimation | Code | 5
ChatDBG: Augmenting Debugging with Large Language Models | Code | 5
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text | Code | 5
MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images | Code | 5
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework | Code | 5
Evolutionary Optimization of Model Merging Recipes | Code | 5
FeatUp: A Model-Agnostic Framework for Features at Any Resolution | Code | 5
Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator | Code | 5
Fundamental Components of Deep Learning: A category-theoretic approach | Code | 5
WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? | Code | 5
Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation | Code | 5
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions | Code | 5
Page 20 of 13,200