SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 3301-3350 of 659,983 papers

Title | Status | Hype
ACE2: Accurately learning subseasonal to decadal atmospheric variability and forced responses | Code | 3
ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech | Code | 3
KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents | Code | 3
Scaling Analysis of Interleaved Speech-Text Language Models | Code | 3
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities | Code | 3
GPU-accelerated Evolutionary Many-objective Optimization Using Tensorized NSGA-III | Code | 3
Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models | Code | 3
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark | Code | 3
A Survey on LLM Test-Time Compute via Search: Tasks, LLM Profiling, Search Algorithms, and Relevant Frameworks | Code | 3
PyThaiNLP: Thai Natural Language Processing in Python | Code | 3
FRACTAL: An Ultra-Large-Scale Aerial Lidar Dataset for 3D Semantic Segmentation of Diverse Landscapes | Code | 3
A Survey of Large Language Models in Medicine: Progress, Application, and Challenge | Code | 3
MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation | Code | 3
Rule Based Rewards for Language Model Safety | Code | 3
Hyper-parameter tuning for text guided image editing | Code | 3
RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph | Code | 3
Efficient Large Language Models: A Survey | Code | 3
Navigating Eukaryotic Genome Annotation Pipelines: A Route Map to BRAKER, Galba, and TSEBRA | Code | 3
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World | Code | 3
SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series | Code | 3
BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models | Code | 3
Arctic-Text2SQL-R1: Simple Rewards, Strong Reasoning in Text-to-SQL | Code | 3
DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input | Code | 3
TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree | Code | 3
VidTwin: Video VAE with Decoupled Structure and Dynamics | Code | 3
Probabilistic Weather Forecasting with Hierarchical Graph Neural Networks | Code | 3
Dataset and Baseline System for Multi-lingual Extraction and Normalization of Temporal and Numerical Expressions | Code | 3
How Well Do Supervised 3D Models Transfer to Medical Imaging Tasks? | Code | 3
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale | Code | 3
OctoPack: Instruction Tuning Code Large Language Models | Code | 3
Learning Smooth Humanoid Locomotion through Lipschitz-Constrained Policies | Code | 3
Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape | Code | 3
MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts | Code | 3
On the use of deep learning for phase recovery | Code | 3
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models | Code | 3
NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer | Code | 3
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model | Code | 3
MAPIE: an open-source library for distribution-free uncertainty quantification | Code | 3
PhysX: Physical-Grounded 3D Asset Generation | Code | 3
Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation | Code | 3
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale | Code | 3
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages | Code | 3
DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes | Code | 3
Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning | Code | 3
LLM4CP: Adapting Large Language Models for Channel Prediction | Code | 3
Universal Actions for Enhanced Embodied Foundation Models | Code | 3
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding | Code | 3
DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting | Code | 3
OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection | Code | 3
Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting | Code | 3
Page 67 of 13200