The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers


Showing 9851–9875 of 474278 papers

Title	Date	Tasks	Status	Hype
WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition	Feb 22, 2024	Image-level Supervised Instance Segmentation, object-detection	Code Available	2
Stable Neural Stochastic Differential Equations in Analyzing Irregular Time Series Data	Feb 22, 2024	Irregular Time Series, Missing Values	Code Available	2
Data Science with LLMs and Interpretable Models	Feb 22, 2024	Additive models, Question Answering	Code Available	2
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models	Feb 22, 2024	All, Mixture-of-Experts	Code Available	2
tinyBenchmarks: evaluating LLMs with fewer examples	Feb 22, 2024	MMLU, Multiple-choice	Code Available	2
Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion	Feb 22, 2024	Music Generation	Code Available	2
D-Flow: Differentiating through Flows for Controlled Generation	Feb 21, 2024		Code Available	2
Coercing LLMs to do and reveal (almost) anything	Feb 21, 2024		Code Available	2
Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions	Feb 21, 2024	Decision Making, Imitation Learning	Code Available	2
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching	Feb 21, 2024	Image Generation	Code Available	2
Full-Atom Peptide Design with Geometric Latent Diffusion	Feb 21, 2024		Code Available	2
Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding	Feb 21, 2024	Text Generation	Code Available	2
ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents	Feb 21, 2024	Active Learning, Position	Code Available	2
FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models	Feb 21, 2024	Question Answering	Code Available	2
Geometry-Informed Neural Networks	Feb 21, 2024	Diversity	Code Available	2
PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain	Feb 21, 2024	Autonomous Driving, Decision Making	Code Available	2
A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models	Feb 21, 2024		Code Available	2
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems	Feb 21, 2024	Logical Fallacies	Code Available	2
Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning	Feb 21, 2024	Instruction Following, Language Modeling	Code Available	2
Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent	Feb 21, 2024	Incremental Learning	Code Available	2
VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks	Feb 21, 2024	Computational Efficiency, Object	Code Available	2
GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis	Feb 21, 2024		Code Available	2
RhythmFormer: Extracting Patterned rPPG Signals based on Periodic Sparse Attention	Feb 20, 2024		Code Available	2
EMO-SUPERB: An In-depth Look at Speech Emotion Recognition	Feb 20, 2024	Emotion Recognition, Self-Supervised Learning	Code Available	2
Transformer tricks: Precomputing the first layer	Feb 20, 2024		Code Available	2