SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 8701-8725 of 474,278 papers

Title | Status | Hype
MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning | Code | 0
Flip-Flop Consistency: Unsupervised Training for Robustness to Prompt Perturbations in LLMs | Code | 0
Reinforcement Learning for Unsupervised Domain Adaptation in Spatio-Temporal Echocardiography Segmentation | Code | 0
Pruning Overparameterized Multi-Task Networks for Degraded Web Image Restoration | Code | 0
EuroMineNet: A Multitemporal Sentinel-2 Benchmark for Spatiotemporal Mining Footprint Analysis in the European Union (2015-2024) | Code | 0
Where are the Whales: A Human-in-the-loop Detection Method for Identifying Whales in High-resolution Satellite Imagery | Code | 0
BioMedSearch: A Multi-Source Biomedical Retrieval Framework Based on LLMs | Code | 0
Text Anomaly Detection with Simplified Isolation Kernel | Code | 0
ERGO: Entropy-guided Resetting for Generation Optimization in Multi-turn Language Models |  | 0
LLMs Can Get "Brain Rot"! |  | 0
Putting on the Thinking Hats: A Survey on Chain of Thought Fine-tuning from the Perspective of Human Reasoning Mechanism | Code | 0
VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning |  | 0
LLM-guided Hierarchical Retrieval |  | 0
Two Heads Are Better Than One: Audio-Visual Speech Error Correction with Dual Hypotheses | Code | 0
Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning |  | 0
TempFlow-GRPO: When Timing Matters for GRPO in Flow Models |  | 0
FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games |  | 0
CE-GPPO: Coordinating Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning |  | 0
A Tale of LLMs and Induced Small Proxies: Scalable Agents for Knowledge Mining |  | 0
AutoPR: Let's Automate Your Academic Promotion! |  | 0
VLA-0: Building State-of-the-Art VLAs with Zero Modification |  | 0
Complementary Information Guided Occupancy Prediction via Multi-Level Representation Fusion | Code | 0
Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math |  | 0
FlashWorld: High-quality 3D Scene Generation within Seconds |  | 0
Trace Anything: Representing Any Video in 4D via Trajectory Fields |  | 0
Page 349 of 18,972