SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 3701-3750 of 177,340 papers

Title | Status | Hype
Diffusion-TS: Interpretable Diffusion for General Time Series Generation | Code | 3
TapeAgents: a Holistic Framework for Agent Development and Optimization | Code | 3
MixLinear: Extreme Low Resource Multivariate Time Series Forecasting with 0.1K Parameters | Code | 3
DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks | Code | 3
Adversarial Cheap Talk | Code | 3
Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image | Code | 3
EscherNet: A Generative Model for Scalable View Synthesis | Code | 3
3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering | Code | 3
Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation | Code | 3
Rethinking the Evaluation of Visible and Infrared Image Fusion | Code | 3
Training Verifiers to Solve Math Word Problems | Code | 3
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline | Code | 3
Generating Long Sequences with Sparse Transformers | Code | 3
Towards Generalizable Tumor Synthesis | Code | 3
Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning | Code | 3
Pipeline Parallelism with Controllable Memory | Code | 3
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline | Code | 3
L0: Reinforcement Learning to Become General Agents | Code | 3
MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection | Code | 3
ASFT: Aligned Supervised Fine-Tuning through Absolute Likelihood | Code | 3
AdaWorld: Learning Adaptable World Models with Latent Actions | Code | 3
SIMPL: A Simple and Efficient Multi-agent Motion Prediction Baseline for Autonomous Driving | Code | 3
cmaes : A Simple yet Practical Python Library for CMA-ES | Code | 3
Emu: Generative Pretraining in Multimodality | Code | 3
BlenderLLM: Training Large Language Models for Computer-Aided Design with Self-improvement | Code | 3
Automatically Interpreting Millions of Features in Large Language Models | Code | 3
GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks | Code | 3
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache | Code | 3
AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents | Code | 3
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents | Code | 3
HAC++: Towards 100X Compression of 3D Gaussian Splatting | Code | 3
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation | Code | 3
Deep Reasoning Translation via Reinforcement Learning | Code | 3
Segment Anything in 3D with Radiance Fields | Code | 3
Consistency Flow Matching: Defining Straight Flows with Velocity Consistency | Code | 3
PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data | Code | 3
Deep Learning-Based Object Pose Estimation: A Comprehensive Survey | Code | 3
MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion | Code | 3
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | Code | 3
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction | Code | 3
PE3R: Perception-Efficient 3D Reconstruction | Code | 3
The Mighty ToRR: A Benchmark for Table Reasoning and Robustness | Code | 3
Baichuan-Omni Technical Report | Code | 3
Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments | Code | 3
RLVR-World: Training World Models with Reinforcement Learning | Code | 3
Tool Learning with Large Language Models: A Survey | Code | 3
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing | Code | 3
Step-level Value Preference Optimization for Mathematical Reasoning | Code | 3
Middle Architecture Criteria | Code | 3
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones | Code | 3
Page 75 of 3547