SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 351–400 of 474,278 papers

Title | Status | Hype
VACE: All-in-One Video Creation and Editing | Code | 7
Revisiting PCA for time series reduction in temporal dimension | Code | 7
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | Code | 7
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning | Code | 7
Flow-GRPO: Training Flow Matching Models via Online RL | Code | 7
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning | Code | 7
Pre^3: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation | Code | 7
DeepSeek-VL: Towards Real-World Vision-Language Understanding | Code | 7
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability | Code | 7
Grants4Companies: Applying Declarative Methods for Recommending and Reasoning About Business Grants in the Austrian Public Administration (System Description) | Code | 7
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models | Code | 7
PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods | Code | 7
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering | Code | 7
Dynamic Evaluation of Large Language Models by Meta Probing Agents | Code | 7
Better Synthetic Data by Retrieving and Transforming Existing Datasets | Code | 7
Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation | Code | 7
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models | Code | 7
AIOS Compiler: LLM as Interpreter for Natural Language Programming and Flow Programming of AI Agents | Code | 7
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP | Code | 7
ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness? | Code | 7
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models | Code | 7
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery | Code | 7
PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System | Code | 7
AutoTrain: No-code training for state-of-the-art models | Code | 7
ThunderKittens: Simple, Fast, and Adorable AI Kernels | Code | 7
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Code | 7
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity | Code | 7
A Scalable Approach to Clustering Embedding Projections | Code | 7
Real-Time Video Generation with Pyramid Attention Broadcast | Code | 7
Stable Audio Open | Code | 7
OpenThoughts: Data Recipes for Reasoning Models | Code | 7
Training AI to be Loyal | Code | 7
CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models | Code | 7
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers | Code | 7
MoBA: Mixture of Block Attention for Long-Context LLMs | Code | 7
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? | Code | 7
D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement | Code | 7
pySLAM: An Open-Source, Modular, and Extensible Framework for SLAM | Code | 7
Exploring Compressed Image Representation as a Perceptual Proxy: A Study | Code | 7
Practical Efficiency of Muon for Pretraining | Code | 7
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models | Code | 7
Low-code LLM: Graphical User Interface over Large Language Models | Code | 7
O1 Replication Journey: A Strategic Progress Report -- Part 1 | Code | 7
Large Concept Models: Language Modeling in a Sentence Representation Space | Code | 7
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance | Code | 7
3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting | Code | 7
Scalable MatMul-free Language Modeling | Code | 7
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | Code | 7
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models | Code | 7
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark | Code | 7
Page 8 of 9,486