SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 9601–9625 of 474,278 papers

Title | Status | Hype
TDRM: Smooth Reward Models with Temporal Difference for LLM RL and Inference | Code | 0
Overview of SCIDOCA 2025 Shared Task on Citation Prediction, Discovery, and Placement | Code | 0
Wavelet-Assisted Mamba for Satellite-Derived Sea Surface Temperature Super-Resolution | Code | 0
Plan before Solving: Problem-Aware Strategy Routing for Mathematical Reasoning with LLMs | Code | 0
Mask Clustering-based Annotation Engine for Large-Scale Submeter Land Cover Mapping | Code | 0
ScatterAD: Temporal-Topological Scattering Mechanism for Time Series Anomaly Detection | Code | 0
SCOPE: Semantic Conditioning for Sim2Real Category-Level Object Pose Estimation in Robotics | Code | 0
ProxyAttn: Guided Sparse Attention via Representative Heads | Code | 0
DyMoDreamer: World Modeling with Dynamic Modulation | Code | 0
Uncertainty-Guided Expert-AI Collaboration for Efficient Soil Horizon Annotation | Code | 0
SemanticShield: LLM-Powered Audits Expose Shilling Attacks in Recommender Systems | Code | 0
PixelCraft: A Multi-Agent System for High-Fidelity Visual Reasoning on Structured Images | Code | 0
Neural Visibility of Point Sets | Code | 0
PVTAdpNet: Polyp Segmentation using Pyramid vision transformer with a novel Adapter block | Code | 0
HiDe: Rethinking The Zoom-IN method in High Resolution MLLMs via Hierarchical Decoupling | Code | 0
EarthMind: Leveraging Cross-Sensor Data for Advanced Earth Observation Interpretation with a Unified Multimodal LLM | | 0
Decentralized Dynamic Cooperation of Personalized Models for Federated Continual Learning | Code | 0
Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer | | 0
Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models | | 0
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use | | 0
SparseD: Sparse Attention for Diffusion Language Models | | 0
Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation | | 0
Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs | | 0
SVAC: Scaling Is All You Need For Referring Video Object Segmentation | Code | 0
CTTS: Collective Test-Time Scaling | Code | 0
Page 385 of 18972