SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 4,101–4,125 of 661,570 papers

Title | Status | Hype
Human Psychometric Questionnaires Mischaracterize LLM Psychology: Evidence from Generation Behavior | | 0
Exact Generalisation Error Exposes Benchmarks Skew Graph Neural Networks Success (or Failure) | | 0
InPhyRe Discovers: Large Multimodal Models Struggle in Inductive Physical Reasoning | | 0
Diagonal Linear Networks and the Lasso Regularization Path | | 0
See, Think, Act: Teaching Multimodal Agents to Effectively Interact with GUI by Identifying Toggles | Code | 0
Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation | | 0
IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning | | 0
M3DLayout: A Multi-Source Dataset of 3D Indoor Layouts and Structured Descriptions for 3D Generation | | 0
In-Context Compositional Q-Learning for Offline Reinforcement Learning | | 0
Personalized Motion Guidance Framework for Athlete-Centric Coaching | | 0
LMOD+: A Comprehensive Multimodal Dataset and Benchmark for Developing and Evaluating Multimodal Large Language Models in Ophthalmology | | 0
Vector sketch animation generation with differentiable motion trajectories | | 0
Assessing LLM Reasoning Through Implicit Causal Chain Discovery in Climate Discourse | | 0
Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models | | 1
Scalable Energy-Based Models via Adversarial Training: Unifying Discrimination and Generation | | 0
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions | | 0
Learning Time-Varying Graphs from Incomplete Graph Signals | | 0
Automated Wicket-Taking Delivery Segmentation and Trajectory-Based Dismissal-Zone Analysis in Cricket Videos Using OCR-Guided YOLOv8 | | 0
Communication to Completion: Modeling Collaborative Workflows with Intelligent Multi-Agent Communication | | 0
SHAP Meets Tensor Networks: Provably Tractable Explanations with Parallelism | | 0
From Slides to Chatbots: Enhancing Large Language Models with University Course Materials | | 0
Frame Semantic Patterns for Identifying Underreporting of Notifiable Events in Healthcare: The Case of Gender-Based Violence | | 0
Towards One-step Causal Video Generation via Adversarial Self-Distillation | | 0
Generative Hints | | 0
Silenced Biases: The Dark Side LLMs Learned to Refuse | | 0