SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

658,356 papers | 258,216 code links | 4,818 tasks

Papers

Showing 351-400 of 658,356 papers

Title | Status | Hype
ThunderKittens: Simple, Fast, and Adorable AI Kernels | Code | 7
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Code | 7
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity | Code | 7
A Scalable Approach to Clustering Embedding Projections | Code | 7
Real-Time Video Generation with Pyramid Attention Broadcast | Code | 7
Stable Audio Open | Code | 7
OpenThoughts: Data Recipes for Reasoning Models | Code | 7
Training AI to be Loyal | Code | 7
CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models | Code | 7
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers | Code | 7
MoBA: Mixture of Block Attention for Long-Context LLMs | Code | 7
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? | Code | 7
D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement | Code | 7
pySLAM: An Open-Source, Modular, and Extensible Framework for SLAM | Code | 7
Exploring Compressed Image Representation as a Perceptual Proxy: A Study | Code | 7
Practical Efficiency of Muon for Pretraining | Code | 7
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models | Code | 7
Low-code LLM: Graphical User Interface over Large Language Models | Code | 7
O1 Replication Journey: A Strategic Progress Report -- Part 1 | Code | 7
Large Concept Models: Language Modeling in a Sentence Representation Space | Code | 7
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance | Code | 7
3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting | Code | 7
Scalable MatMul-free Language Modeling | Code | 7
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | Code | 7
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models | Code | 7
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark | Code | 7
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty | Code | 7
The Prompt Report: A Systematic Survey of Prompting Techniques | Code | 7
Qwen2.5-Omni Technical Report | Code | 7
Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation | Code | 7
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems | Code | 7
Labeling supervised fine-tuning data with the scaling law | Code | 7
A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models | Code | 7
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | Code | 7
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines | Code | 7
TotalSegmentator MRI: Robust Sequence-independent Segmentation of Multiple Anatomic Structures in MRI | Code | 7
RouteLLM: Learning to Route LLMs with Preference Data | Code | 7
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation | Code | 7
YOLOv12: Attention-Centric Real-Time Object Detectors | Code | 7
Long-form music generation with latent diffusion | Code | 7
LLM-AutoDiff: Auto-Differentiate Any LLM Workflow | Code | 7
Global Structure-from-Motion Revisited | Code | 7
Revisiting Feature Prediction for Learning Visual Representations from Video | Code | 7
Fast Text-to-Audio Generation with Adversarial Post-Training | Code | 7
GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot | Code | 7
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | Code | 7
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention | Code | 7
Flow Matching Guide and Code | Code | 7
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads | Code | 7
ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI | Code | 7
Page 8 of 13,168