SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 9851-9875 of 474278 papers

Title | Status | Hype
WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition | Code | 2
Stable Neural Stochastic Differential Equations in Analyzing Irregular Time Series Data | Code | 2
Data Science with LLMs and Interpretable Models | Code | 2
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models | Code | 2
tinyBenchmarks: evaluating LLMs with fewer examples | Code | 2
Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion | Code | 2
D-Flow: Differentiating through Flows for Controlled Generation | Code | 2
Coercing LLMs to do and reveal (almost) anything | Code | 2
Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions | Code | 2
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching | Code | 2
Full-Atom Peptide Design with Geometric Latent Diffusion | Code | 2
Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding | Code | 2
ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents | Code | 2
FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models | Code | 2
Geometry-Informed Neural Networks | Code | 2
PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain | Code | 2
A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models | Code | 2
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems | Code | 2
Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning | Code | 2
Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent | Code | 2
VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks | Code | 2
GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis | Code | 2
RhythmFormer: Extracting Patterned rPPG Signals based on Periodic Sparse Attention | Code | 2
EMO-SUPERB: An In-depth Look at Speech Emotion Recognition | Code | 2
Transformer tricks: Precomputing the first layer | Code | 2