SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 17001-17050 of 474278 papers

Title | Status | Hype
Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning | Code | 1
Video Unlearning via Low-Rank Refusal Vector | | 0
RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards | Code | 1
A Practical Guide to Tuning Spiking Neuronal Dynamics | Code | 0
Looking Beyond Visible Cues: Implicit Video Question Answering via Dual-Clue Reasoning | Code | 0
Vision Transformers Don't Need Trained Registers | Code | 2
Generative Modeling of Weights: Generalization or Memorization? | Code | 1
Diffusion models under low-noise regime | Code | 0
How Benchmark Prediction from Fewer Data Misses the Mark | Code | 0
TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems | Code | 0
Improving Fairness of Large Language Models in Multi-document Summarization | Code | 0
Solving Inequality Proofs with Large Language Models | Code | 1
Diffusion Counterfactual Generation with Semantic Abduction | Code | 0
Curriculum Learning With Counterfactual Group Relative Policy Advantage For Multi-Agent Reinforcement Learning | Code | 1
Domain Randomization for Object Detection in Manufacturing Applications using Synthetic Data: A Comprehensive Study | Code | 0
Spatio-Temporal State Space Model For Efficient Event-Based Optical Flow | Code | 1
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers | Code | 1
HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization | Code | 2
Reinforcement Pre-Training | | 0
ZeroVO: Visual Odometry with Minimal Assumptions | | 0
Quickest Causal Change Point Detection by Adaptive Intervention | | 0
Improving large language models with concept-aware fine-tuning | Code | 1
CyberV: Cybernetics for Test-time Scaling in Video Understanding | Code | 1
Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models | Code | 1
SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design | Code | 1
Gradients: When Markets Meet Fine-tuning -- A Distributed Approach to Model Optimisation | | 0
SpikeSMOKE: Spiking Neural Networks for Monocular 3D Object Detection with Cross-Scale Gated Coding | | 0
Reparameterized LLM Training via Orthogonal Equivalence Transformation | | 0
Realistic Urban Traffic Generator using Decentralized Federated Learning for the SUMO simulator | Code | 0
MiniCPM4: Ultra-Efficient LLMs on End Devices | Code | 9
Trend-Aware Fashion Recommendation with Visual Segmentation and Semantic Similarity | Code | 0
Generalization Analysis for Bayesian Optimal Experiment Design under Model Misspecification | | 0
LlamaRec-LKG-RAG: A Single-Pass, Learnable Knowledge Graph-RAG Framework for LLM-Based Ranking | Code | 0
An Intelligent Fault Self-Healing Mechanism for Cloud AI Systems via Integration of Large Language Models and Deep Reinforcement Learning | | 0
Serendipitous Recommendation with Multimodal LLM | | 0
SAM2Auto: Auto Annotation Using FLASH | | 0
Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations | | 0
PIG: Physically-based Multi-Material Interaction with 3D Gaussians | | 0
FMaMIL: Frequency-Driven Mamba Multi-Instance Learning for Weakly Supervised Lesion Segmentation in Medical Images | | 0
Aligning Text, Images, and 3D Structure Token-by-Token | | 0
A Temporal FRBR/FRBRoo-Based Model for Component-Level Versioning of Legal Norms | | 0
GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior | | 0
W4S4: WaLRUS Meets S4 for Long-Range Sequence Modeling | | 0
Statistical Hypothesis Testing for Auditing Robustness in Language Models | | 0
NeurIPS 2025 E2LM Competition : Early Training Evaluation of Language Models | | 0
Uncovering the Functional Roles of Nonlinearity in Memory | | 0
Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation | | 0
Deep Equivariant Multi-Agent Control Barrier Functions | | 0
Flow-Anything: Learning Real-World Optical Flow Estimation from Large-Scale Single-view Images | | 0
MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs | | 0
Page 341 of 9486