SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 3601-3625 of 177340 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking LLMs via Uncertainty Quantification | Code | 3 |
| Olympus: A Universal Task Router for Computer Vision Tasks | Code | 3 |
| A guide to convolution arithmetic for deep learning | Code | 3 |
| ARC Prize 2024: Technical Report | Code | 3 |
| Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation | Code | 3 |
| LiDAR4D: Dynamic Neural Fields for Novel Space-time View LiDAR Synthesis | Code | 3 |
| Defeating Prompt Injections by Design | Code | 3 |
| SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks | Code | 3 |
| MeshXL: Neural Coordinate Field for Generative 3D Foundation Models | Code | 3 |
| Faithful Logical Reasoning via Symbolic Chain-of-Thought | Code | 3 |
| Multimodal Table Understanding | Code | 3 |
| KV-Edit: Training-Free Image Editing for Precise Background Preservation | Code | 3 |
| DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving | Code | 3 |
| VideoGen-Eval: Agent-based System for Video Generation Evaluation | Code | 3 |
| LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis | Code | 3 |
| JAFAR: Jack up Any Feature at Any Resolution | Code | 3 |
| VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation | Code | 3 |
| GENERator: A Long-Context Generative Genomic Foundation Model | Code | 3 |
| EVEv2: Improved Baselines for Encoder-Free Vision-Language Models | Code | 3 |
| SemiKong: Curating, Training, and Evaluating A Semiconductor Industry-Specific Large Language Model | Code | 3 |
| Half-Inverse Gradients for Physical Deep Learning | Code | 3 |
| pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction | Code | 3 |
| OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models | Code | 3 |
| DisCo: Disentangled Control for Realistic Human Dance Generation | Code | 3 |
| ∇²DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials | Code | 3 |
Page 145 of 7094