The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 15,351–15,400 of 474,278 papers

Title	Date	Tasks	Status	Hype
Rethinking Machine Unlearning in Image Generation Models	Jun 3, 2025	Benchmarking, Image Generation	Code Available	1
ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions	Jun 3, 2025	Benchmarking, Diversity	Code Available	1
TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression	Jun 3, 2025		Code Available	1
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation	Jun 3, 2025	Question Answering	Code Available	1
EPFL-Smart-Kitchen-30: Densely annotated cooking dataset with 3D kinematics to challenge video and language models	Jun 2, 2025	Action Recognition, Action Segmentation	Code Available	1
OD3: Optimization-free Dataset Distillation for Object Detection	Jun 2, 2025	Dataset Distillation, image-classification	Code Available	1
Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models	Jun 2, 2025	Instruction Following, Reinforcement Learning (RL)	Code Available	1
WHEN TO ACT, WHEN TO WAIT: Modeling Structural Trajectories for Intent Triggerability in Task-Oriented Dialogue	Jun 2, 2025	Task-Oriented Dialogue Systems	Code Available	1
Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation	Jun 2, 2025	Misinformation, Talking Head Generation	Code Available	1
Polishing Every Facet of the GEM: Testing Linguistic Competence of LLMs and Humans in Korean	Jun 2, 2025	Multiple-choice	Code Available	1
STORM-BORN: A Challenging Mathematical Derivations Dataset Curated via a Human-in-the-Loop Multi-Agent Framework	Jun 2, 2025	Math	Code Available	1
EfficientFER: EfficientNetv2 Based Deep Learning Approach for Facial Expression Recognition	Jun 2, 2025	Deep Learning, Emotion Recognition	Code Available	1
Exploring the Potential of LLMs as Personalized Assistants: Dataset, Evaluation, and Analysis	Jun 2, 2025		Code Available	1
scDataset: Scalable Data Loading for Deep Learning on Large-Scale Single-Cell Omics	Jun 2, 2025		Code Available	1
SEMNAV: A Semantic Segmentation-Driven Approach to Visual Semantic Navigation	Jun 2, 2025	Domain Adaptation, Navigate	Code Available	1
SAM-I2V: Upgrading SAM to Support Promptable Video Segmentation with Less than 0.2% Training Cost	Jun 2, 2025	Image Segmentation, Semantic Segmentation	Code Available	1
GLoSS: Generative Language Models with Semantic Search for Sequential Recommendation	Jun 2, 2025	Sequential Recommendation	Code Available	1
TimeGraph: Synthetic Benchmark Datasets for Robust Time-Series Causal Discovery	Jun 2, 2025	Causal Discovery, Dataset Generation	Code Available	1
Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability	Jun 2, 2025	Descriptive, Synthetic Data Generation	Code Available	1
IF-GUIDE: Influence Function-Guided Detoxification of LLMs	Jun 2, 2025		Code Available	1
AIMSCheck: Leveraging LLMs for AI-Assisted Review of Modern Slavery Statements Across Jurisdictions	Jun 2, 2025		Code Available	1
SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model	Jun 2, 2025	Mixture-of-Experts, Unsupervised Pre-training	Code Available	1
Crowdsourcing MUSHRA Tests in the Age of Generative Speech Technologies: A Comparative Analysis of Subjective and Objective Testing Methods	Jun 1, 2025		Code Available	1
LEMONADE: A Large Multilingual Expert-Annotated Abstractive Event Dataset for the Real World	Jun 1, 2025	document understanding, Entity Linking	Code Available	1
Protap: A Benchmark for Protein Modeling on Realistic Downstream Applications	Jun 1, 2025		Code Available	1
CODEMENV: Benchmarking Large Language Models on Code Migration	Jun 1, 2025	Benchmarking	Code Available	1
PFMBench: Protein Foundation Model Benchmark	Jun 1, 2025	model	Code Available	1
IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory	Jun 1, 2025	Semantic Similarity, Semantic Textual Similarity	Code Available	1
Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs	May 31, 2025		Code Available	1
Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities	May 31, 2025	ARC	Code Available	1
MIRROR: Cognitive Inner Monologue Between Conversational Turns for Persistent Reflection and Reasoning in Conversational LLMs	May 31, 2025		Code Available	1
Look mom, no experimental data! Learning to score protein-ligand interactions from simulations	May 31, 2025		Code Available	1
A Brain Graph Foundation Model: Pre-Training and Prompt-Tuning for Any Atlas and Disorder	May 31, 2025	Contrastive Learning, Meta-Learning	Code Available	1
An LLM Agent for Functional Bug Detection in Network Protocols	May 31, 2025		Code Available	1
AVROBUSTBENCH: Benchmarking the Robustness of Audio-Visual Recognition Models at Test-Time	May 31, 2025	Benchmarking, Test-time Adaptation	Code Available	1
PAKTON: A Multi-Agent Framework for Question Answering in Long Legal Agreements	May 31, 2025	Privacy Preserving, Question Answering	Code Available	1
dpmm: Differentially Private Marginal Models, a Library for Synthetic Tabular Data Generation	May 31, 2025	Synthetic Data Generation, Tabular Data Generation	Code Available	1
SEED: A Benchmark Dataset for Sequential Facial Attribute Editing with Diffusion Models	May 31, 2025	Attribute, Facial Editing	Code Available	1
DefenderBench: A Toolkit for Evaluating Language Agents in Cybersecurity Environments	May 31, 2025	Large Language Model	Code Available	1
Neuro2Semantic: A Transfer Learning Framework for Semantic Reconstruction of Continuous Language from Human Intracranial EEG	May 31, 2025	EEG, Text Generation	Code Available	1
Synergizing LLMs with Global Label Propagation for Multimodal Fake News Detection	May 31, 2025	Fake News Detection	Code Available	1
DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis?	May 30, 2025	Diagnostic, Medical Image Analysis	Code Available	1
Bench4KE: Benchmarking Automated Competency Question Generation	May 30, 2025	Benchmarking, Question Generation	Code Available	1
CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning	May 30, 2025	class-incremental learning, Class Incremental Learning	Code Available	1
Timing is Important: Risk-aware Fund Allocation based on Time-Series Forecasting	May 30, 2025	Time Series, Time Series Forecasting	Code Available	1
Can Slow-thinking LLMs Reason Over Time? Empirical Studies in Time Series Forecasting	May 30, 2025	Language Modeling, Language Modelling	Code Available	1
Chameleon: A MatMul-Free Temporal Convolutional Network Accelerator for End-to-End Few-Shot and Continual Learning from Sequential Data	May 30, 2025	Continual Learning, Few-Shot Learning	Code Available	1
A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings	May 30, 2025	Math	Code Available	1
Weakly-Supervised Affordance Grounding Guided by Part-Level Semantic Priors	May 30, 2025	Human-Object Interaction Detection, Semantic Segmentation	Code Available	1
VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software	May 30, 2025	Question Answering, Spatial Reasoning	Code Available	1