SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 401–450 of 474,278 papers

Title | Status | Hype
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty | Code | 7
The Prompt Report: A Systematic Survey of Prompting Techniques | Code | 7
Qwen2.5-Omni Technical Report | Code | 7
Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation | Code | 7
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems | Code | 7
Labeling supervised fine-tuning data with the scaling law | Code | 7
A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models | Code | 7
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | Code | 7
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines | Code | 7
TotalSegmentator MRI: Robust Sequence-independent Segmentation of Multiple Anatomic Structures in MRI | Code | 7
RouteLLM: Learning to Route LLMs with Preference Data | Code | 7
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation | Code | 7
YOLOv12: Attention-Centric Real-Time Object Detectors | Code | 7
Long-form music generation with latent diffusion | Code | 7
LLM-AutoDiff: Auto-Differentiate Any LLM Workflow | Code | 7
Global Structure-from-Motion Revisited | Code | 7
Revisiting Feature Prediction for Learning Visual Representations from Video | Code | 7
Fast Text-to-Audio Generation with Adversarial Post-Training | Code | 7
GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot | Code | 7
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | Code | 7
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention | Code | 7
Flow Matching Guide and Code | Code | 7
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads | Code | 7
ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI | Code | 7
Improving Diffusion Models for Authentic Virtual Try-on in the Wild | Code | 7
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | Code | 7
The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search | Code | 7
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models | Code | 7
Skywork-R1V3 Technical Report | Code | 7
Interactive Prompt Debugging with Sequence Salience | Code | 7
gsplat: An Open-Source Library for Gaussian Splatting | Code | 7
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | Code | 7
EvoAgentX: An Automated Framework for Evolving Agentic Workflows | Code | 7
DataComp-LM: In search of the next generation of training sets for language models | Code | 7
VITA: Towards Open-Source Interactive Omni Multimodal LLM | Code | 7
Segment Anything in Medical Images and Videos: Benchmark and Deployment | Code | 7
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models | Code | 7
Cradle: Empowering Foundation Agents Towards General Computer Control | Code | 7
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments | Code | 7
Efficient Track Anything | Code | 7
Streamlining Ocean Dynamics Modeling with Fourier Neural Operators: A Multiobjective Hyperparameter and Architecture Optimization Approach | Code | 7
Embedding Atlas: Low-Friction, Interactive Embedding Visualization | Code | 7
A Library for Learning Neural Operators | Code | 7
Kimi k1.5: Scaling Reinforcement Learning with LLMs | Code | 7
AutoCodeRover: Autonomous Program Improvement | Code | 7
S*: Test Time Scaling for Code Generation | Code | 7
RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer | Code | 7
AI-Researcher: Autonomous Scientific Innovation | Code | 7
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models | Code | 7
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models | Code | 7
Page 9 of 9,486