SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers

Showing 2551–2600 of 659,983 papers

Title | Status | Hype
CoMotion: Concurrent Multi-person 3D Motion | Code | 3
Elucidating the Design Space of Multimodal Protein Language Models | Code | 3
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning | Code | 3
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL | Code | 3
REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers | Code | 3
DataDecide: How to Predict Best Pretraining Data with Small Experiments | Code | 3
Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning | Code | 3
REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites | Code | 3
DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks | Code | 3
Efficient Reasoning Models: A Survey | Code | 3
A Clean Slate for Offline Reinforcement Learning | Code | 3
Evaluation Report on MCP Servers | Code | 3
Ai2 Scholar QA: Organized Literature Synthesis with Attribution | Code | 3
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce | Code | 3
RAKG:Document-level Retrieval Augmented Knowledge Graph Construction | Code | 3
The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report | Code | 3
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers | Code | 3
Deep Reasoning Translation via Reinforcement Learning | Code | 3
GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents | Code | 3
Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution | Code | 3
TensorNEAT: A GPU-accelerated Library for NeuroEvolution of Augmenting Topologies | Code | 3
DocAgent: A Multi-Agent System for Automated Code Documentation Generation | Code | 3
MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications | Code | 3
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation | Code | 3
PixelFlow: Pixel-Space Generative Models with Flow | Code | 3
Detect Anything 3D in the Wild | Code | 3
Perception-R1: Pioneering Perception Policy with Reinforcement Learning | Code | 3
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory | Code | 3
Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction | Code | 3
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning | Code | 3
FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | Code | 3
SEA-LION: Southeast Asian Languages in One Network | Code | 3
GPU-accelerated Evolutionary Many-objective Optimization Using Tensorized NSGA-III | Code | 3
DDT: Decoupled Diffusion Transformer | Code | 3
PromptHMR: Promptable Human Mesh Recovery | Code | 3
Playing Non-Embedded Card-Based Games with Reinforcement Learning | Code | 3
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation | Code | 3
Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization | Code | 3
TrafficLLM: Enhancing Large Language Models for Network Traffic Analysis with Generic Traffic Representation | Code | 3
Scaling Analysis of Interleaved Speech-Text Language Models | Code | 3
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation | Code | 3
Affordable AI Assistants with Knowledge Graph of Thoughts | Code | 3
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving | Code | 3
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning | Code | 3
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation | Code | 3
End-to-End Driving with Online Trajectory Evaluation via BEV World Model | Code | 3
YourBench: Easy Custom Evaluation Sets for Everyone | Code | 3
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction | Code | 3
MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs | Code | 3
Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB | Code | 3