SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 5151-5175 of 661,570 papers

Title | Status | Hype
FunDiff: Diffusion Models over Function Spaces for Physics-Informed Generative Modeling | Code | 2
Speedy Deformable 3D Gaussian Splatting: Fast Rendering and Compression of Dynamic Scenes | Code | 2
Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction | Code | 2
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation | Code | 2
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation | Code | 2
HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization | Code | 2
Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs | Code | 2
Audio synthesizer inversion in symmetric parameter spaces with approximately equivariant flow matching | Code | 2
Generating Long Semantic IDs in Parallel for Recommendation | Code | 2
RecGPT: A Foundation Model for Sequential Recommendation | Code | 2
MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories | Code | 2
VideoMolmo: Spatio-Temporal Grounding Meets Pointing | Code | 2
Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos | Code | 2
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning | Code | 2
Kinetics: Rethinking Test-Time Scaling Laws | Code | 2
SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs | Code | 2
Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting | Code | 2
Exploring Diffusion Transformer Designs via Grafting | Code | 2
AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model | Code | 2
A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search | Code | 2
EMBER2024 -- A Benchmark Dataset for Holistic Evaluation of Malware Classifiers | Code | 2
Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets | Code | 2
Search Arena: Analyzing Search-Augmented LLMs | Code | 2
Contrastive Flow Matching | Code | 2
LeanExplore: A search engine for Lean 4 declarations | Code | 2