SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 2551–2600 of 659,983 papers

Title | Status | Hype
OccTENS: 3D Occupancy World Model via Temporal Next-Scale Prediction | | 0
Human Psychometric Questionnaires Mischaracterize LLM Psychology: Evidence from Generation Behavior | | 0
Exact Generalisation Error Exposes Benchmarks Skew Graph Neural Networks Success (or Failure) | | 0
InPhyRe Discovers: Large Multimodal Models Struggle in Inductive Physical Reasoning | | 0
Diagonal Linear Networks and the Lasso Regularization Path | | 0
See, Think, Act: Teaching Multimodal Agents to Effectively Interact with GUI by Identifying Toggles | Code | 0
Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation | | 0
IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning | | 0
M3DLayout: A Multi-Source Dataset of 3D Indoor Layouts and Structured Descriptions for 3D Generation | | 0
In-Context Compositional Q-Learning for Offline Reinforcement Learning | | 0
Personalized Motion Guidance Framework for Athlete-Centric Coaching | | 0
LMOD+: A Comprehensive Multimodal Dataset and Benchmark for Developing and Evaluating Multimodal Large Language Models in Ophthalmology | | 0
Vector sketch animation generation with differentiable motion trajectories | | 0
Assessing LLM Reasoning Through Implicit Causal Chain Discovery in Climate Discourse | | 0
Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models | | 1
Scalable Energy-Based Models via Adversarial Training: Unifying Discrimination and Generation | | 0
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions | | 0
Learning Time-Varying Graphs from Incomplete Graph Signals | | 0
Automated Wicket-Taking Delivery Segmentation and Trajectory-Based Dismissal-Zone Analysis in Cricket Videos Using OCR-Guided YOLOv8 | | 0
Communication to Completion: Modeling Collaborative Workflows with Intelligent Multi-Agent Communication | | 0
SHAP Meets Tensor Networks: Provably Tractable Explanations with Parallelism | | 0
From Slides to Chatbots: Enhancing Large Language Models with University Course Materials | | 0
Frame Semantic Patterns for Identifying Underreporting of Notifiable Events in Healthcare: The Case of Gender-Based Violence | | 0
Towards One-step Causal Video Generation via Adversarial Self-Distillation | | 0
Generative Hints | | 0
Silenced Biases: The Dark Side LLMs Learned to Refuse | | 0
Semi-supervised Shelter Mapping for WASH Accessibility Assessment in Rohingya Refugee Camps | | 0
Reduced Density Matrices Through Machine Learning | | 0
Safety-Preserving PTQ via Contrastive Alignment Loss | | 0
A robust methodology for long-term sustainability evaluation of Machine Learning models | | 0
Aligning Probabilistic Beliefs under Informative Missingness: LLM Steerability in Clinical Reasoning | | 0
MagicWorld: Towards Long-Horizon Stability for Interactive Video World Exploration | | 0
Neighbor GRPO: Contrastive ODE Policy Optimization Aligns Flow Models | | 0
WPT: World-to-Policy Transfer via Online World Model Distillation | | 0
IRIS-SLAM: Unified Geo-Instance Representations for Robust Semantic Localization and Mapping | | 0
Vision to Geometry: 3D Spatial Memory for Sequential Embodied MLLM Reasoning and Exploration | | 0
NRR-Core: Non-Resolution Reasoning as a Computational Framework for Contextual Identity and Ambiguity Preservation | | 0
RADAR: Retrieval-Augmented Detector with Adversarial Refinement for Robust Fake News Detection | | 0
A Comedy of Estimators: On KL Regularization in RL Training of LLMs | | 0
VL-RouterBench: A Benchmark for Vision-Language Model Routing | | 0
EpiQAL: Benchmarking Large Language Models in Epidemiological Question Answering for Enhanced Alignment and Reasoning | | 0
What Patients Really Ask: Exploring the Effect of False Assumptions in Patient Information Seeking | | 0
Generative Adversarial Networks for Resource State Generation | | 0
EVM-QuestBench: An Execution-Grounded Benchmark for Natural-Language Transaction Code Generation | | 0
APEX-SWE | | 0
PaperScout: An Autonomous Agent for Academic Paper Search with Process-Aware Sequence-Level Policy Optimization | | 0
GTS: Inference-Time Scaling of Latent Reasoning with a Learnable Gaussian Thought Sampler | | 0
Digital FAST: An AI-Driven Multimodal Framework for Rapid and Early Stroke Screening | | 0
YOLO26: An Analysis of NMS-Free End to End Framework for Real-Time Object Detection | | 0
Unifying Heterogeneous Degradations: Uncertainty-Aware Diffusion Bridge Model for All-in-One Image Restoration | | 0
Page 52 of 13200