SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 5151 to 5200 of 661,570 papers

Title | Status | Hype
Vision Transformers Don't Need Trained Registers | Code | 2
Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction | Code | 2
CausalPFN: Amortized Causal Effect Estimation via In-Context Learning | Code | 2
HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization | Code | 2
FunDiff: Diffusion Models over Function Spaces for Physics-Informed Generative Modeling | Code | 2
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation | Code | 2
Audio synthesizer inversion in symmetric parameter spaces with approximately equivariant flow matching | Code | 2
Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs | Code | 2
RecGPT: A Foundation Model for Sequential Recommendation | Code | 2
Generating Long Semantic IDs in Parallel for Recommendation | Code | 2
Contrastive Flow Matching | Code | 2
Kinetics: Rethinking Test-Time Scaling Laws | Code | 2
Search Arena: Analyzing Search-Augmented LLMs | Code | 2
MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories | Code | 2
Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets | Code | 2
Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos | Code | 2
Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting | Code | 2
EMBER2024 -- A Benchmark Dataset for Holistic Evaluation of Malware Classifiers | Code | 2
Exploring Diffusion Transformer Designs via Grafting | Code | 2
VideoMolmo: Spatio-Temporal Grounding Meets Pointing | Code | 2
A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search | Code | 2
AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model | Code | 2
SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs | Code | 2
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning | Code | 2
Facial Appearance Capture at Home with Patch-Level Reflectance Prior | Code | 2
chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations | Code | 2
Multi-view Surface Reconstruction Using Normal and Reflectance Cues | Code | 2
Photoreal Scene Reconstruction from an Egocentric Device | Code | 2
LeanExplore: A search engine for Lean 4 declarations | Code | 2
Savage-Dickey density ratio estimation with normalizing flows for Bayesian model comparison | Code | 2
ORV: 4D Occupancy-centric Robot Video Generation | Code | 2
CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale | Code | 2
Simulate Any Radar: Attribute-Controllable Radar Simulation via Waveform Parameter Embedding | Code | 2
Towards In-the-wild 3D Plane Reconstruction from a Single Image | Code | 2
KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider | Code | 2
Revisiting End-to-End Learning with Slide-level Supervision in Computational Pathology | Code | 2
Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning | Code | 2
HyperSteer: Activation Steering at Scale with Hypernetworks | Code | 2
Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning | Code | 2
Compiler Optimization via LLM Reasoning for Efficient Model Serving | Code | 2
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency | Code | 2
GSCodec Studio: A Modular Framework for Gaussian Splat Compression | Code | 2
The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning | Code | 2
Synthesis of discrete-continuous quantum circuits with multimodal diffusion models | Code | 2
DualMap: Online Open-Vocabulary Semantic Mapping for Natural Language Navigation in Dynamic Changing Scenes | Code | 2
AceVFI: A Comprehensive Survey of Advances in Video Frame Interpolation | Code | 2
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion | Code | 2
MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation | Code | 2
CineMA: A Foundation Model for Cine Cardiac MRI | Code | 2
AnnaAgent: Dynamic Evolution Agent System with Multi-Session Memory for Realistic Seeker Simulation | Code | 2
Page 104 of 13,232