SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 3351-3400 of 177340 papers

Title | Status | Hype
Rewrite the Stars | Code | 3
OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning | Code | 3
Test-Time Training Scaling Laws for Chemical Exploration in Drug Design | Code | 3
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models | Code | 3
Robust and Efficient Medical Imaging with Self-Supervision | Code | 3
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning | Code | 3
Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI | Code | 3
LangProBe: a Language Programs Benchmark | Code | 3
VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search | Code | 3
STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes | Code | 3
Differentiable Data Augmentation with Kornia | Code | 3
Supplementary Material for Efficient and Robust Automated Machine Learning | Code | 3
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks | Code | 3
Why Do Multi-Agent LLM Systems Fail? | Code | 3
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining | Code | 3
Token Merging: Your ViT But Faster | Code | 3
StableVideo: Text-driven Consistency-aware Diffusion Video Editing | Code | 3
Data-centric AI: Perspectives and Challenges | Code | 3
Declarative Machine Learning Systems | Code | 3
Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models | Code | 3
BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects | Code | 3
TorchBench: Benchmarking PyTorch with High API Surface Coverage | Code | 3
How Can Recommender Systems Benefit from Large Language Models: A Survey | Code | 3
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering | Code | 3
DeFlow: Decoder of Scene Flow Network in Autonomous Driving | Code | 3
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models | Code | 3
FaceXFormer: A Unified Transformer for Facial Analysis | Code | 3
Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset | Code | 3
Vaporetto: Efficient Japanese Tokenization Based on Improved Pointwise Linear Classification | Code | 3
HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors | Code | 3
A Note on the Prediction-Powered Bootstrap | Code | 3
S-Graphs 2.0 -- A Hierarchical-Semantic Optimization and Loop Closure for SLAM | Code | 3
AudioBench: A Universal Benchmark for Audio Large Language Models | Code | 3
Alias-Free Generative Adversarial Networks | Code | 3
HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting | Code | 3
Embodied CoT Distillation From LLM To Off-the-shelf Agents | Code | 3
MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields | Code | 3
GLiREL -- Generalist Model for Zero-Shot Relation Extraction | Code | 3
ZIM: Zero-Shot Image Matting for Anything | Code | 3
ivis Dimensionality Reduction Framework for Biomacromolecular Simulations | Code | 3
Vulnerability Detection with Code Language Models: How Far Are We? | Code | 3
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models | Code | 3
Vision-LSTM: xLSTM as Generic Vision Backbone | Code | 3
A Survey on Evaluation of Large Language Models | Code | 3
Movie Gen: A Cast of Media Foundation Models | Code | 3
Recent Advances on Machine Learning for Computational Fluid Dynamics: A Survey | Code | 3
Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community | Code | 3
Point Transformer V3: Simpler, Faster, Stronger | Code | 3
OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale | Code | 3
OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models | Code | 3
Page 68 of 3547