SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 7,126–7,150 of 177,340 papers

Title | Status | Hype
Discovering uncertainty: Gaussian constitutive neural networks with correlated weights | Code | 2
InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback | Code | 2
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices | Code | 2
CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs | Code | 2
Defending LLMs against Jailbreaking Attacks via Backtranslation | Code | 2
TabDDPM: Modelling Tabular Data with Diffusion Models | Code | 2
MCIBI++: Soft Mining Contextual Information Beyond Image for Semantic Segmentation | Code | 2
RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts | Code | 2
3D LiDAR Mapping in Dynamic Environments Using a 4D Implicit Neural Representation | Code | 2
Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation | Code | 2
Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding | Code | 2
RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion | Code | 2
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix | Code | 2
Multi-Agent Trajectory Prediction with Difficulty-Guided Feature Enhancement Network | Code | 2
SRGS: Super-Resolution 3D Gaussian Splatting | Code | 2
ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming | Code | 2
AdaNeRF: Adaptive Sampling for Real-time Rendering of Neural Radiance Fields | Code | 2
Language Models can Self-Lengthen to Generate Long Texts | Code | 2
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling | Code | 2
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection | Code | 2
Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing | Code | 2
MM-Retinal: Knowledge-Enhanced Foundational Pretraining with Fundus Image-Text Expertise | Code | 2
VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks | Code | 2
Lenia - Biology of Artificial Life | Code | 2
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models | Code | 2
Page 286 of 7,094