SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 2726-2750 of 661,570 papers

Title | Status | Hype
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation | Code | 3
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't | Code | 3
Vine Copulas as Differentiable Computational Graphs | Code | 3
Safe RLHF: Safe Reinforcement Learning from Human Feedback | Code | 3
Predicting from Strings: Language Model Embeddings for Bayesian Optimization | Code | 3
Discovering Language Model Behaviors with Model-Written Evaluations | Code | 3
A Survey of Camouflaged Object Detection and Beyond | Code | 3
MCTrack: A Unified 3D Multi-Object Tracking Framework for Autonomous Driving | Code | 3
Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents | Code | 3
PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition | Code | 3
A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond | Code | 3
MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo | Code | 3
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video | Code | 3
MyoSuite -- A contact-rich simulation suite for musculoskeletal motor control | Code | 3
Effects of charging and discharging capabilities on trade-offs between model accuracy and computational efficiency in pumped thermal electricity storage | Code | 3
Evolving from Single-modal to Multi-modal Facial Deepfake Detection: A Survey | Code | 3
Towards Kinetic Manipulation of the Latent Space | Code | 3
Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation | Code | 3
AA-CLIP: Enhancing Zero-shot Anomaly Detection via Anomaly-Aware CLIP | Code | 3
xLSTM-UNet can be an Effective 2D & 3D Medical Image Segmentation Backbone with Vision-LSTM (ViL) better than its Mamba Counterpart | Code | 3
Open-Source Skull Reconstruction with MONAI | Code | 3
MMedAgent: Learning to Use Medical Tools with Multi-modal Agent | Code | 3
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models | Code | 3
RelBench: A Benchmark for Deep Learning on Relational Databases | Code | 3
A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions | Code | 3
Page 110 of 26,463