SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 6051–6075 of 177340 papers

| Title | Status | Hype |
|---|---|---|
| Lenia - Biology of Artificial Life | Code | 2 |
| WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models | Code | 2 |
| SGPT: GPT Sentence Embeddings for Semantic Search | Code | 2 |
| AXIAL: Attention-based eXplainability for Interpretable Alzheimer's Localized Diagnosis using 2D CNNs on 3D MRI brain scans | Code | 2 |
| Model Uncertainty in Evolutionary Optimization and Bayesian Optimization: A Comparative Analysis | Code | 2 |
| AGILE: A Novel Reinforcement Learning Framework of LLM Agents | Code | 2 |
| Learning Local Equivariant Representations for Large-Scale Atomistic Dynamics | Code | 2 |
| How Can Time Series Analysis Benefit From Multiple Modalities? A Survey and Outlook | Code | 2 |
| Bayesian Neural Networks for One-to-Many Mapping in Image Enhancement | Code | 2 |
| BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion | Code | 2 |
| Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training | Code | 2 |
| MicroFlow: An Efficient Rust-Based Inference Engine for TinyML | Code | 2 |
| Advancing Time Series Classification with Multimodal Language Modeling | Code | 2 |
| Trajectory balance: Improved credit assignment in GFlowNets | Code | 2 |
| From Instance Training to Instruction Learning: Task Adapters Generation from Instructions | Code | 2 |
| RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models | Code | 2 |
| φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation | Code | 2 |
| Efficient Mixed Transformer for Single Image Super-Resolution | Code | 2 |
| CoMoGaussian: Continuous Motion-Aware Gaussian Splatting from Motion-Blurred Images | Code | 2 |
| PAL: Proxy-Guided Black-Box Attack on Large Language Models | Code | 2 |
| PyReason: Software for Open World Temporal Logic | Code | 2 |
| mDPO: Conditional Preference Optimization for Multimodal Large Language Models | Code | 2 |
| In-Context Matting | Code | 2 |
| NTIRE 2025 Challenge on Image Super-Resolution (×4): Methods and Results | Code | 2 |
| MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models | Code | 2 |