SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 5351–5400 of 661,570 papers

Title | Status | Hype
CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming | Code | 2
AD-AGENT: A Multi-agent Framework for End-to-end Anomaly Detection | Code | 2
Temporal Query Network for Efficient Multivariate Time Series Forecasting | Code | 2
Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space | Code | 2
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space | Code | 2
Rethinking Features-Fused-Pyramid-Neck for Object Detection | Code | 2
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning | Code | 2
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix | Code | 2
CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning | Code | 2
Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos | Code | 2
Neurosymbolic Diffusion Models | Code | 2
DD-Ranking: Rethinking the Evaluation of Dataset Distillation | Code | 2
FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models | Code | 2
μPC: Scaling Predictive Coding to 100+ Layer Networks | Code | 2
Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation | Code | 2
AdaptThink: Reasoning Models Can Learn When to Think | Code | 2
RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning | Code | 2
Learnware of Language Models: Specialized Small Language Models Can Do Big | Code | 2
Degradation-Aware Feature Perturbation for All-in-One Image Restoration | Code | 2
Dynamic Graph Induced Contour-aware Heat Conduction Network for Event-based Object Detection | Code | 2
Panda: A pretrained forecast model for universal representation of chaotic dynamics | Code | 2
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization | Code | 2
Synthetic Data RL: Task Definition Is All You Need | Code | 2
GlobalGeoTree: A Multi-Granular Vision-Language Dataset for Global Tree Species Classification | Code | 2
SLOT: Sample-specific Language Model Optimization at Test-time | Code | 2
VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning | Code | 2
HISTAI: An Open-Source, Large-Scale Whole Slide Image Dataset for Computational Pathology | Code | 2
Demystifying and Enhancing the Efficiency of Large Language Model Based Search Agents | Code | 2
DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance | Code | 2
Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets | Code | 2
LifelongAgentBench: Evaluating LLM Agents as Lifelong Learners | Code | 2
AI-Driven Automation Can Become the Foundation of Next-Era Science of Science Research | Code | 2
Mergenetic: a Simple Evolutionary Model Merging Library | Code | 2
DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy | Code | 2
Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation | Code | 2
DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling | Code | 2
Think Twice Before You Act: Enhancing Agent Behavioral Safety with Thought Correction | Code | 2
SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning | Code | 2
Relational Graph Transformer | Code | 2
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | Code | 2
ForensicHub: A Unified Benchmark & Codebase for All-Domain Fake Image Detection and Localization | Code | 2
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning | Code | 2
Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs | Code | 2
PnPXAI: A Universal XAI Framework Providing Automatic Explanations Across Diverse Modalities and Models | Code | 2
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | Code | 2
MASS: Multi-Agent Simulation Scaling for Portfolio Construction | Code | 2
AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection | Code | 2
A Tutorial on Structural Identifiability of Epidemic Models Using StructuralIdentifiability.jl | Code | 2
Superposition Yields Robust Neural Scaling | Code | 2
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly | Code | 2