SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 5,276 to 5,300 of 661,570 papers

Title | Status | Hype
A Lightweight Hybrid Dual Channel Speech Enhancement System under Low-SNR Conditions | Code | 2
The Missing Point in Vision Transformers for Universal Image Segmentation | Code | 2
SAEs Are Good for Steering -- If You Select the Right Features | Code | 2
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond | Code | 2
One-shot Entropy Minimization | Code | 2
WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference | Code | 2
Training-Free Multi-Step Audio Source Separation | Code | 2
Shifting AI Efficiency From Model-Centric to Data-Centric Compression | Code | 2
I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | Code | 2
Jodi: Unification of Visual Generation and Understanding via Joint Modeling | Code | 2
VPGS-SLAM: Voxel-based Progressive 3D Gaussian SLAM in Large-Scale Scenes | Code | 2
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use | Code | 2
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems | Code | 2
Benchmarking Laparoscopic Surgical Image Restoration and Beyond | Code | 2
LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS | Code | 2
Using Large Language Models to Tackle Fundamental Challenges in Graph Learning: A Comprehensive Survey | Code | 2
Improved Immiscible Diffusion: Accelerate Diffusion Training by Reducing Its Miscibility | Code | 2
Spiking Transformers Need High Frequency Information | Code | 2
CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions | Code | 2
Geometry Aware Operator Transformer as an Efficient and Accurate Neural Surrogate for PDEs on Arbitrary Domains | Code | 2
DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding | Code | 2
VeriThinker: Learning to Verify Makes Reasoning Model Efficient | Code | 2
MetaBox-v2: A Unified Benchmark Platform for Meta-Black-Box Optimization | Code | 2
Managing FAIR Knowledge Graphs as Polyglot Data End Points: A Benchmark based on the rdf2pg Framework and Plant Biology Data | Code | 2
TopoPoint: Enhance Topology Reasoning via Endpoint Detection in Autonomous Driving | Code | 2
Page 212 of 26,463