SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 5301-5350 of 661,570 papers

Title | Status | Hype
VeriThinker: Learning to Verify Makes Reasoning Model Efficient | Code | 2
ARPO:End-to-End Policy Optimization for GUI Agents with Experience Replay | Code | 2
SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development | Code | 2
SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding | Code | 2
Structure-Aligned Protein Language Model | Code | 2
DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution | Code | 2
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding | Code | 2
Extremely Simple Multimodal Outlier Synthesis for Out-of-Distribution Detection and Segmentation | Code | 2
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models | Code | 2
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design | Code | 2
SEED: Speaker Embedding Enhancement Diffusion Model | Code | 2
Training Long-Context LLMs Efficiently via Chunk-wise Optimization | Code | 2
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward | Code | 2
Ranked Entropy Minimization for Continual Test-Time Adaptation | Code | 2
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning | Code | 2
Seeing through Satellite Images at Street Views | Code | 2
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning | Code | 2
GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent | Code | 2
P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark | Code | 2
iPad: Iterative Proposal-centric End-to-End Autonomous Driving | Code | 2
Meta-Design Matters: A Self-Design Multi-Agent System | Code | 2
dKV-Cache: The Cache for Diffusion Language Models | Code | 2
Scaling Diffusion Transformers Efficiently via μP | Code | 2
PhyX: Does Your Model Have the "Wits" for Physical Reasoning? | Code | 2
MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models | Code | 2
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents | Code | 2
Graph Foundation Models: A Comprehensive Survey | Code | 2
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping | Code | 2
InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition | Code | 2
The P^3 dataset: Pixels, Points and Polygons for Multimodal Building Vectorization | Code | 2
ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning | Code | 2
RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning | Code | 2
Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization | Code | 2
Moonbeam: A MIDI Foundation Model Using Both Absolute and Relative Music Attributes | Code | 2
UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Large Language Models | Code | 2
Place Recognition: A Comprehensive Review, Current Challenges and Future Directions | Code | 2
TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis | Code | 2
Let LLMs Break Free from Overthinking via Self-Braking Tuning | Code | 2
VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank | Code | 2
Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers | Code | 2
Quartet: Native FP4 Training Can Be Optimal for Large Language Models | Code | 2
Code2Logic: Game-Code-Driven Data Synthesis for Enhancing VLMs General Reasoning | Code | 2
Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models | Code | 2
Learning Spatio-Temporal Dynamics for Trajectory Recovery via Time-Aware Transformer | Code | 2
CAD-Coder: An Open-Source Vision-Language Model for Computer-Aided Design Code Generation | Code | 2
KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation | Code | 2
UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens | Code | 2
PandaGuard: Systematic Evaluation of LLM Safety against Jailbreaking Attacks | Code | 2
Temporal Query Network for Efficient Multivariate Time Series Forecasting | Code | 2
Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space | Code | 2