The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers


Showing 15,651–15,700 of 474,278 papers

Title	Date	Tasks	Status	Hype
Training-free LLM Merging for Multi-task Learning	Jun 14, 2025	Multiple-choice, Multi-Task Learning	Code Available	0
The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason	Jun 14, 2025	Diagnostic, Memorization	Unverified	0
Information fusion strategy integrating pre-trained language model and contrastive learning for materials knowledge mining	Jun 14, 2025	Contrastive Learning, Language Modeling	Unverified	0
Model Merging for Knowledge Editing	Jun 14, 2025	knowledge editing, model	Code Available	0
Domain Generalization for Person Re-identification: A Survey Towards Domain-Agnostic Person Matching	Jun 14, 2025	Domain Generalization, Person Re-Identification	Code Available	1
Robust LLM Unlearning with MUDMAN: Meta-Unlearning with Disruption Masking And Normalization	Jun 14, 2025	Meta-Learning, TAR	Code Available	0
TagRouter: Learning Route to LLMs through Tags for Open-Domain Text Generation Tasks	Jun 14, 2025	Language Modeling, Language Modelling	Code Available	1
ConsistencyChecker: Tree-based Evaluation of LLM Generalization Capabilities	Jun 14, 2025	Machine Translation	Code Available	0
Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark	Jun 14, 2025	Benchmarking, Graph Learning	Code Available	0
AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving	Jun 14, 2025		Code Available	7
A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications	Jun 14, 2025	Information Retrieval, Survey	Code Available	3
Trust-MARL: Trust-Based Multi-Agent Reinforcement Learning Framework for Cooperative On-Ramp Merging Control in Heterogeneous Traffic Flow	Jun 14, 2025	Multi-agent Reinforcement Learning	Unverified	0
GroupNL: Low-Resource and Robust CNN Design over Cloud and Device	Jun 14, 2025	GPU	Unverified	0
BSA: Ball Sparse Attention for Large-scale Geometries	Jun 14, 2025		Code Available	1
Mitigating Non-Target Speaker Bias in Guided Speaker Embedding	Jun 14, 2025	Speaker Verification	Unverified	0
Optimized Spectral Fault Receptive Fields for Diagnosis-Informed Prognosis	Jun 14, 2025	Fault Diagnosis, Prognosis	Unverified	0
Revisiting Clustering of Neural Bandits: Selective Reinitialization for Mitigating Loss of Plasticity	Jun 14, 2025	Change Detection, Clustering	Unverified	0
Restoring Gaussian Blurred Face Images for Deanonymization Attacks	Jun 14, 2025	Deblurring, Face Anonymization	Unverified	0
QGuard: Question-based Zero-shot Guard for Multi-modal LLM Safety	Jun 14, 2025		Unverified	0
Towards Fairness Assessment of Dutch Hate Speech Detection	Jun 14, 2025	counterfactual, Fairness	Unverified	0
Advances in LLMs with Focus on Reasoning, Adaptability, Efficiency and Ethics	Jun 14, 2025	Computational Efficiency, Ethics	Unverified	0
Style-based Composer Identification and Attribution of Symbolic Music Scores: a Systematic Survey	Jun 14, 2025	Authorship Attribution	Unverified	0
Component Based Quantum Machine Learning Explainability	Jun 14, 2025	Quantum Machine Learning	Unverified	0
ReFrame: Layer Caching for Accelerated Inference in Real-Time Rendering	Jun 14, 2025	Decoder, Denoising	Unverified	0
Semivalue-based data valuation is arbitrary and gameable	Jun 14, 2025	Data Valuation	Unverified	0
From Human to Machine Psychology: A Conceptual Framework for Understanding Well-Being in Large Language Model	Jun 14, 2025	Language Modeling, Language Modelling	Unverified	0
Improving Factuality for Dialogue Response Generation via Graph-Based Knowledge Augmentation	Jun 14, 2025	Response Generation	Unverified	0
Learning Best Paths in Quantum Networks	Jun 14, 2025	Benchmarking	Unverified	0
Deep Fusion of Ultra-Low-Resolution Thermal Camera and Gyroscope Data for Lighting-Robust and Compute-Efficient Rotational Odometry	Jun 14, 2025	Computational Efficiency, Sensor Fusion	Unverified	0
AntiGrounding: Lifting Robotic Actions into VLM Representation Space for Decision Making	Jun 14, 2025	Decision Making, Question Answering	Unverified	0
Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech	Jun 14, 2025	Grapheme-to-Phoneme Conversion, text-to-speech	Unverified	0
OscNet v1.5: Energy Efficient Hopfield Network on CMOS Oscillators for Image Classification	Jun 14, 2025	image-classification, Image Classification	Code Available	0
CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following	Jun 14, 2025	Beat Tracking, Genre classification	Unverified	0
MatchPlant: An Open-Source Pipeline for UAV-Based Single-Plant Detection and Data Extraction	Jun 14, 2025	object-detection, Object Detection	Code Available	0
MEraser: An Effective Fingerprint Erasure Approach for Large Language Models	Jun 14, 2025		Code Available	0
Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025	Jun 14, 2025		Code Available	0
Inference-Time Gaze Refinement for Micro-Expression Recognition: Enhancing Event-Based Eye Tracking with Motion-Aware Post-Processing	Jun 14, 2025	Gaze Estimation, Micro Expression Recognition	Code Available	0
ANIRA: An Architecture for Neural Network Inference in Real-Time Audio Applications	Jun 14, 2025	Benchmarking	Code Available	3
SplashNet: Split-and-Share Encoders for Accurate and Efficient Typing with Surface Electromyography	Jun 14, 2025		Code Available	0
Fairness Research For Machine Learning Should Integrate Societal Considerations	Jun 14, 2025	Fairness	Unverified	0
GSDNet: Revisiting Incomplete Multimodal-Diffusion from Graph Spectrum Perspective for Conversation Emotion Recognition	Jun 14, 2025	Emotion Recognition, Modality completion	Unverified	0
Information Suppression in Large Language Models: Auditing, Quantifying, and Characterizing Censorship in DeepSeek	Jun 14, 2025	Language Modeling, Language Modelling	Unverified	0
DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty	Jun 14, 2025	continuous-control, Continuous Control	Code Available	0
StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling	Jun 14, 2025	text-to-speech, Text to Speech	Unverified	0
Optimizing Blood Transfusions and Predicting Shortages in Resource-Constrained Areas	Jun 14, 2025	Prediction, regression	Unverified	0
Optimizing Federated Learning using Remote Embeddings for Graph Neural Networks	Jun 14, 2025	Federated Learning	Unverified	0
How Grounded is Wikipedia? A Study on Structured Evidential Support	Jun 14, 2025	Articles	Code Available	0
Feeling Machines: Ethics, Culture, and the Rise of Emotional AI	Jun 14, 2025	Ethics, Navigate	Unverified	0
IndoorWorld: Integrating Physical Task Solving and Social Simulation in A Heterogeneous Multi-Agent Environment	Jun 14, 2025	AI Agent	Unverified	0
InverTune: Removing Backdoors from Multimodal Contrastive Learning Models via Trigger Inversion and Activation Tuning	Jun 14, 2025	backdoor defense, Contrastive Learning	Unverified	0