SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 6051–6100 of 661,570 papers

Title | Status | Hype
voc2vec: A Foundation Model for Non-Verbal Vocalization | Code | 2
Robust Dynamic Facial Expression Recognition | Code | 2
AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind | Code | 2
Protein Large Language Models: A Comprehensive Survey | Code | 2
OccProphet: Pushing Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with Observer-Forecaster-Refiner Framework | Code | 2
KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation | Code | 2
PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning | Code | 2
VaViM and VaVAM: Autonomous Driving through Video Generative Modeling | Code | 2
Mantis: Lightweight Calibrated Foundation Model for User-Friendly Time Series Classification | Code | 2
A Training-free LLM-based Approach to General Chinese Character Error Correction | Code | 2
Boosting Global-Local Feature Matching via Anomaly Synthesis for Multi-Class Point Cloud Anomaly Detection | Code | 2
AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms | Code | 2
ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation | Code | 2
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators | Code | 2
MAGO-SP: Detection and Correction of Water-Fat Swaps in Magnitude-Only VIBE MRI | Code | 2
Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models | Code | 2
MedVAE: Efficient Automated Interpretation of Medical Images with Large-Scale Generalizable Autoencoders | Code | 2
dtaianomaly: A Python library for time series anomaly detection | Code | 2
HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States | Code | 2
GiGL: Large-Scale Graph Neural Networks at Snapchat | Code | 2
FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis | Code | 2
AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO | Code | 2
Fast and Accurate Blind Flexible Docking | Code | 2
Optimizing Model Selection for Compound AI Systems | Code | 2
OBELiX: A Curated Dataset of Crystal Structures and Experimentally Measured Ionic Conductivities for Lithium Solid-State Electrolytes | Code | 2
A Survey on Data Contamination for Large Language Models | Code | 2
Risk-mediated dynamic regulation of effective contacts de-synchronizes outbreaks in metapopulation epidemic models | Code | 2
Medical Image Classification with KAN-Integrated Transformers and Dilated Neighborhood Attention | Code | 2
Calibration and Option Pricing with Stochastic Volatility and Double Exponential Jumps | Code | 2
Repo2Run: Automated Building Executable Environment for Code Repository at Scale | Code | 2
Smaller But Better: Unifying Layout Generation with Smaller Large Language Models | Code | 2
SIFT: Grounding LLM Reasoning in Contexts via Stickers | Code | 2
MoM: Linear Sequence Modeling with Mixture-of-Memories | Code | 2
TESS 2: A Large-Scale Generalist Diffusion Language Model | Code | 2
Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework | Code | 2
Refining Sentence Embedding Model through Ranking Sentences Generation with Large Language Models | Code | 2
Event-Based Video Frame Interpolation With Cross-Modal Asymmetric Bidirectional Motion Fields | Code | 2
DataSciBench: An LLM Agent Benchmark for Data Science | Code | 2
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models | Code | 2
JL1-CD: A New Benchmark for Remote Sensing Change Detection and a Robust Multi-Teacher Knowledge Distillation Framework | Code | 2
Helix-mRNA: A Hybrid Foundation Model For Full Sequence mRNA Therapeutics | Code | 2
DAMamba: Vision State Space Model with Dynamic Adaptive Scan | Code | 2
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading | Code | 2
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation | Code | 2
Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation | Code | 2
A Machine Learning Approach That Beats Large Rubik's Cubes | Code | 2
Electron flow matching for generative reaction mechanism prediction obeying conservation laws | Code | 2
CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation | Code | 2
VUS: Effective and Efficient Accuracy Measures for Time-Series Anomaly Detection | Code | 2
MotifBench: A standardized protein design benchmark for motif-scaffolding problems | Code | 2
Page 122 of 13,232