SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 2501-2525 of 661,570 papers

| Title | Status | Hype |
| --- | --- | --- |
| Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning | Code | 3 |
| Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward | Code | 3 |
| dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching | Code | 3 |
| Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking | Code | 3 |
| Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis | Code | 3 |
| SongEval: A Benchmark Dataset for Song Aesthetics Evaluation | Code | 3 |
| Visual Planning: Let's Think Only with Images | Code | 3 |
| Parallel Scaling Law for Language Models | Code | 3 |
| MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning | Code | 3 |
| MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation | Code | 3 |
| Generative AI for Autonomous Driving: Frontiers and Opportunities | Code | 3 |
| OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning | Code | 3 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | Code | 3 |
| OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain | Code | 3 |
| CompSLAM: Complementary Hierarchical Multi-Modal Localization and Mapping for Robot Autonomy in Underground Environments | Code | 3 |
| The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization | Code | 3 |
| LLMs Get Lost In Multi-Turn Conversation | Code | 3 |
| TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation | Code | 3 |
| A Common Interface for Automatic Differentiation | Code | 3 |
| SOAP: Style-Omniscient Animatable Portraits | Code | 3 |
| FastMap: Revisiting Dense and Scalable Structure from Motion | Code | 3 |
| OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation | Code | 3 |
| LiftFeat: 3D Geometry-Aware Local Feature Matching | Code | 3 |
| Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | Code | 3 |
| Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models | Code | 3 |