SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 2601-2625 of 661,570 papers

Title | Status | Hype
VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search | Code | 3
STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes | Code | 3
Differentiable Data Augmentation with Kornia | Code | 3
Supplementary Material for Efficient and Robust Automated Machine Learning | Code | 3
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks | Code | 3
Why Do Multi-Agent LLM Systems Fail? | Code | 3
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining | Code | 3
Token Merging: Your ViT But Faster | Code | 3
StableVideo: Text-driven Consistency-aware Diffusion Video Editing | Code | 3
Data-centric AI: Perspectives and Challenges | Code | 3
Declarative Machine Learning Systems | Code | 3
Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models | Code | 3
BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects | Code | 3
TorchBench: Benchmarking PyTorch with High API Surface Coverage | Code | 3
How Can Recommender Systems Benefit from Large Language Models: A Survey | Code | 3
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering | Code | 3
DeFlow: Decoder of Scene Flow Network in Autonomous Driving | Code | 3
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models | Code | 3
FaceXFormer: A Unified Transformer for Facial Analysis | Code | 3
Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset | Code | 3
Vaporetto: Efficient Japanese Tokenization Based on Improved Pointwise Linear Classification | Code | 3
HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors | Code | 3
A Note on the Prediction-Powered Bootstrap | Code | 3
S-Graphs 2.0 -- A Hierarchical-Semantic Optimization and Loop Closure for SLAM | Code | 3
AudioBench: A Universal Benchmark for Audio Large Language Models | Code | 3