SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 5451-5500 of 661,570 papers

Title | Status | Hype
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models | Code | 2
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing | Code | 2
RM-R1: Reward Modeling as Reasoning | Code | 2
T2S: High-resolution Time Series Generation with Text-to-Series Diffusion Models | Code | 2
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves | Code | 2
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models | Code | 2
Efficient Multivariate Time Series Forecasting via Calibrated Language Models with Privileged Knowledge Distillation | Code | 2
MemEngine: A Unified and Modular Library for Developing Advanced Memory of LLM-based Agents | Code | 2
An Empirical Study of Qwen3 Quantization | Code | 2
SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations | Code | 2
PoseX: AI Defeats Physics Approaches on Protein-Ligand Cross Docking | Code | 2
A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency | Code | 2
CostFilter-AD: Enhancing Anomaly Detection through Matching Cost Filtering | Code | 2
Don't be lazy: CompleteP enables compute-efficient deep transformers | Code | 2
CAMELTrack: Context-Aware Multi-cue ExpLoitation for Online Multi-Object Tracking | Code | 2
MINERVA: Evaluating Complex Video Reasoning | Code | 2
Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook | Code | 2
LightEMMA: Lightweight End-to-End Multimodal Model for Autonomous Driving | Code | 2
Explainable AI in Spatial Analysis | Code | 2
One Net to Rule Them All: Domain Randomization in Quadcopter Racing Across Different Platforms | Code | 2
Noise Modeling in One Hour: Minimizing Preparation Efforts for Self-supervised Low-Light RAW Image Denoising | Code | 2
mAIstro: an open-source multi-agentic system for automated end-to-end development of radiomics and deep learning models for medical imaging | Code | 2
HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation | Code | 2
GPU Performance Portability needs Autotuning | Code | 2
RWKV-X: A Linear Complexity Hybrid Language Model | Code | 2
Visual Text Processing: A Comprehensive Review and Unified Evaluation | Code | 2
Multi-Agent Reinforcement Learning for Resources Allocation Optimization: A Survey | Code | 2
Sparse2DGS: Geometry-Prioritized Gaussian Splatting for Surface Reconstruction from Sparse Views | Code | 2
UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities | Code | 2
GauSS-MI: Gaussian Splatting Shannon Mutual Information for Active 3D Reconstruction | Code | 2
RuleKit 2: Faster and simpler rule learning | Code | 2
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax | Code | 2
Rulebook: bringing co-routines to reinforcement learning environments | Code | 2
STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction | Code | 2
Adaptive Dual-domain Learning for Underwater Image Enhancement | Code | 2
BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese | Code | 2
Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions | Code | 2
Towards Practical Second-Order Optimizers in Deep Learning: Insights from Fisher Information Analysis | Code | 2
SPD Learning for Covariance-Based Neuroimaging Analysis: Perspectives, Methods, and Challenges | Code | 2
SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models | Code | 2
DiMeR: Disentangled Mesh Reconstruction Model | Code | 2
FinBERT-QA: Financial Question Answering with pre-trained BERT Language Models | Code | 2
GotenNet: Rethinking Efficient 3D Equivariant Graph Neural Networks | Code | 2
LiDPM: Rethinking Point Diffusion for Lidar Scene Completion | Code | 2
CaRL: Learning Scalable Planning Policies with Simple Rewards | Code | 2
Process Reward Models That Think | Code | 2
AdaParse: An Adaptive Parallel PDF Parsing and Resource Scaling Engine | Code | 2
Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark | Code | 2
Dynamic Early Exit in Reasoning Models | Code | 2
CAPO: Cost-Aware Prompt Optimization | Code | 2