SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers

Showing 2501-2550 of 659,983 papers

Title | Status | Hype
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning | Code | 3
Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward | Code | 3
dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching | Code | 3
Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis | Code | 3
SongEval: A Benchmark Dataset for Song Aesthetics Evaluation | Code | 3
Visual Planning: Let's Think Only with Images | Code | 3
Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking | Code | 3
MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation | Code | 3
Parallel Scaling Law for Language Models | Code | 3
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning | Code | 3
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning | Code | 3
Generative AI for Autonomous Driving: Frontiers and Opportunities | Code | 3
OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain | Code | 3
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | Code | 3
CompSLAM: Complementary Hierarchical Multi-Modal Localization and Mapping for Robot Autonomy in Underground Environments | Code | 3
LLMs Get Lost In Multi-Turn Conversation | Code | 3
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization | Code | 3
SOAP: Style-Omniscient Animatable Portraits | Code | 3
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation | Code | 3
A Common Interface for Automatic Differentiation | Code | 3
FastMap: Revisiting Dense and Scalable Structure from Motion | Code | 3
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation | Code | 3
LiftFeat: 3D Geometry-Aware Local Feature Matching | Code | 3
Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models | Code | 3
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning | Code | 3
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis | Code | 3
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | Code | 3
Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields | Code | 3
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions | Code | 3
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models | Code | 3
Nexus-Gen: A Unified Model for Image Understanding, Generation, and Editing | Code | 3
Reinforcement Learning for Reasoning in Large Language Models with One Training Example | Code | 3
PixelHacker: Image Inpainting with Structural and Semantic Consistency | Code | 3
ReasonIR: Training Retrievers for Reasoning Tasks | Code | 3
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video | Code | 3
Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs | Code | 3
MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion | Code | 3
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos | Code | 3
An Empirical Study on Prompt Compression for Large Language Models | Code | 3
Tina: Tiny Reasoning Models via LoRA | Code | 3
Grad: Guided Relation Diffusion Generation for Graph Augmentation in Graph Fraud Detection | Code | 3
Learning to Reason under Off-Policy Guidance | Code | 3
OmniAudio: Generating Spatial Audio from 360-Degree Video | Code | 3
TAPIP3D: Tracking Any Point in Persistent 3D Geometry | Code | 3
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D | Code | 3
Generative AI Act II: Test Time Scaling Drives Cognition Engineering | Code | 3
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models | Code | 3
Event-Enhanced Blurry Video Super-Resolution | Code | 3
IMAGGarment-1: Fine-Grained Garment Generation for Controllable Fashion Design | Code | 3
Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts | Code | 3