SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

658,356 papers · 258,216 code links · 4,818 tasks

Papers

Showing 301-350 of 180,343 papers

Title | Status | Hype
Gradient-Based Multi-Objective Deep Learning: Algorithms, Theories, Applications, and Beyond | Code | 7
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding | Code | 7
Tulu 3: Pushing Frontiers in Open Language Model Post-Training | Code | 7
Measuring Massive Multitask Chinese Understanding | Code | 7
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors | Code | 7
FoundationStereo: Zero-Shot Stereo Matching | Code | 7
Mirage: A Multi-Level Superoptimizer for Tensor Programs | Code | 7
TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables | Code | 7
Visual Agentic Reinforcement Fine-Tuning | Code | 7
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection | Code | 7
Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback | Code | 7
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models | Code | 7
Measuring short-form factuality in large language models | Code | 7
RedPajama: an Open Dataset for Training Large Language Models | Code | 7
Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library | Code | 7
BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents | Code | 7
Easy Begun is Half Done: Spatial-Temporal Graph Modeling with ST-Curriculum Dropout | Code | 7
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning | Code | 7
On the Vulnerability of LLM/VLM-Controlled Robotics | Code | 7
OpenAssistant Conversations - Democratizing Large Language Model Alignment | Code | 7
Grounding Image Matching in 3D with MASt3R | Code | 7
PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides | Code | 7
VACE: All-in-One Video Creation and Editing | Code | 7
Revisiting PCA for time series reduction in temporal dimension | Code | 7
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | Code | 7
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning | Code | 7
Flow-GRPO: Training Flow Matching Models via Online RL | Code | 7
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning | Code | 7
Pre^3: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation | Code | 7
DeepSeek-VL: Towards Real-World Vision-Language Understanding | Code | 7
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability | Code | 7
Grants4Companies: Applying Declarative Methods for Recommending and Reasoning About Business Grants in the Austrian Public Administration (System Description) | Code | 7
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models | Code | 7
PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods | Code | 7
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering | Code | 7
Dynamic Evaluation of Large Language Models by Meta Probing Agents | Code | 7
Better Synthetic Data by Retrieving and Transforming Existing Datasets | Code | 7
Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation | Code | 7
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models | Code | 7
AIOS Compiler: LLM as Interpreter for Natural Language Programming and Flow Programming of AI Agents | Code | 7
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP | Code | 7
ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness? | Code | 7
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models | Code | 7
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery | Code | 7
PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System | Code | 7
AutoTrain: No-code training for state-of-the-art models | Code | 7
ThunderKittens: Simple, Fast, and Adorable AI Kernels | Code | 7
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Code | 7
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity | Code | 7
A Scalable Approach to Clustering Embedding Projections | Code | 7