The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 18001–18050 of 474278 papers

Title	Date	Tasks	Status	Hype
EuroLLM-9B: Technical Report	Jun 4, 2025	Language Modeling, Language Modelling	Unverified	0
EV-Flying: an Event-based Dataset for In-The-Wild Recognition of Flying Objects	Jun 4, 2025	Event-based vision, Object Recognition	Unverified	0
YOND: Practical Blind Raw Image Denoising Free from Camera-Specific Data Dependency	Jun 4, 2025	Denoising, Image Denoising	Unverified	0
RewardAnything: Generalizable Principle-Following Reward Models	Jun 4, 2025	Instruction Following, Large Language Model	Code Available	1
PRJ: Perception-Retrieval-Judgement for Generated Images	Jun 4, 2025	Descriptive, Retrieval	Unverified	0
Recent Advances in Medical Image Classification	Jun 4, 2025	Classification, Explainable artificial intelligence	Unverified	0
DiffCAP: Diffusion-based Cumulative Adversarial Purification for Vision Language Models	Jun 4, 2025	Adversarial Purification, Denoising	Unverified	0
FedFACT: A Provable Framework for Controllable Group-Fairness Calibration in Federated Learning	Jun 4, 2025	Fairness, Federated Learning	Unverified	0
Model Splitting Enhanced Communication-Efficient Federated Learning for CSI Feedback	Jun 4, 2025	Federated Learning	Unverified	0
R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning	Jun 4, 2025	RAG	Code Available	0
Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning	Jun 4, 2025	Safety Alignment	Unverified	0
Multi-view Surface Reconstruction Using Normal and Reflectance Cues	Jun 4, 2025	Surface Reconstruction	Code Available	2
INP-Former++: Advancing Universal Anomaly Detection via Intrinsic Normal Prototypes and Residual Learning	Jun 4, 2025	Anomaly Detection, Medical Diagnosis	Code Available	3
Learning from Noise: Enhancing DNNs for Event-Based Vision through Controlled Noise Injection	Jun 4, 2025	Event-based vision	Code Available	0
DrSR: LLM based Scientific Equation Discovery with Dual Reasoning from Data and Experience	Jun 4, 2025	Efficient Exploration, Equation Discovery	Unverified	0
Diffusion Transformer-based Universal Dose Denoising for Pencil Beam Scanning Proton Therapy	Jun 4, 2025	Denoising	Unverified	0
Classifying Dental Care Providers Through Machine Learning with Features Ranking	Jun 4, 2025	feature selection, Missing Values	Unverified	0
macOSWorld: A Multilingual Interactive Benchmark for GUI Agents	Jun 4, 2025	Benchmarking, Domain Adaptation	Code Available	1
GEM: Empowering LLM for both Embedding Generation and Language Understanding	Jun 4, 2025	Decoder, Large Language Model	Unverified	0
Hierarchical Text Classification Using Contrastive Learning Informed Path Guided Hierarchy	Jun 4, 2025	Contrastive Learning, text-classification	Unverified	0
Replay Can Provably Increase Forgetting	Jun 4, 2025	Continual Learning	Unverified	0
RedRFT: A Light-Weight Benchmark for Reinforcement Fine-Tuning-Based Red Teaming	Jun 4, 2025	Red Teaming	Code Available	0
Backbone Augmented Training for Adaptations	Jun 4, 2025	Image Generation, Text Generation	Unverified	0
You Only Train Once	Jun 4, 2025	Semantic Segmentation	Unverified	0
A Poisson-Guided Decomposition Network for Extreme Low-Light Image Enhancement	Jun 4, 2025	Color Constancy, Decoder	Unverified	0
Mechanistic Decomposition of Sentence Representations	Jun 4, 2025	Dictionary Learning, Sentence	Unverified	0
MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP	Jun 4, 2025	Benchmarking, Language Modelling	Unverified	0
Building a Few-Shot Cross-Domain Multilingual NLU Model for Customer Care	Jun 4, 2025	Intent Detection, Knowledge Distillation	Unverified	0
MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale	Jun 4, 2025	Benchmarking, Language Modeling	Unverified	0
Unpacking Let Alone: Human-Scale Models Generalize to a Rare Construction in Form but not Meaning	Jun 4, 2025	Form	Unverified	0
Zero-Shot Open-Schema Entity Structure Discovery	Jun 4, 2025	Attribute, graph construction	Unverified	0
SQLens: An End-to-End Framework for Error Detection and Correction in Text-to-SQL	Jun 4, 2025	Text to SQL, Text-To-SQL	Unverified	0
DRE: An Effective Dual-Refined Method for Integrating Small and Large Language Models in Open-Domain Dialogue Evaluation	Jun 4, 2025	Dialogue Evaluation, valid	Unverified	0
Automated Skill Discovery for Language Agents through Exploration and Iterative Feedback	Jun 4, 2025	Large Language Model	Unverified	0
Plugging Schema Graph into Multi-Table QA: A Human-Guided Framework for Reducing LLM Reliance	Jun 4, 2025	Question Answering, Semantic Similarity	Unverified	0
Schema Generation for Large Knowledge Graphs Using Large Language Models	Jun 4, 2025	Knowledge Graphs	Unverified	0
Knowledge-guided Contextual Gene Set Analysis Using Large Language Models	Jun 4, 2025	Benchmarking	Unverified	0
Bridging the Performance Gap Between Target-Free and Target-Based Reinforcement Learning With Iterated Q-Learning	Jun 4, 2025	Q-Learning	Unverified	0
The Latent Space Hypothesis: Toward Universal Medical Representation Learning	Jun 4, 2025	Continual Learning, Representation Learning	Unverified	0
Quantum-Inspired Genetic Optimization for Patient Scheduling in Radiation Oncology	Jun 4, 2025	Scheduling	Unverified	0
Relational reasoning and inductive bias in transformers trained on a transitive inference task	Jun 4, 2025	In-Context Learning, Inductive Bias	Unverified	0
A Lyapunov Drift-Plus-Penalty Method Tailored for Reinforcement Learning with Queue Stability	Jun 4, 2025	reinforcement-learning, Reinforcement Learning	Unverified	0
Short-Term Power Demand Forecasting for Diverse Consumer Types to Enhance Grid Planning and Synchronisation	Jun 4, 2025	Demand Forecasting, Earth Observation	Unverified	0
Deep learning for predicting hauling fleet production capacity under uncertainties in open pit mines using real and simulated data	Jun 4, 2025	Scheduling	Unverified	0
Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning	Jun 4, 2025	Contrastive Learning	Unverified	0
KOALA++: Efficient Kalman-Based Optimization of Neural Networks with Gradient-Covariance Products	Jun 4, 2025	image-classification, Image Classification	Unverified	0
RETRO SYNFLOW: Discrete Flow Matching for Accurate and Diverse Single-Step Retrosynthesis	Jun 4, 2025	Retrosynthesis, Single-step retrosynthesis	Unverified	0
Selective Matching Losses -- Not All Scores Are Created Equal	Jun 4, 2025	All, Sensitivity	Unverified	0
AUTOCT: Automating Interpretable Clinical Trial Prediction with LLM Agents	Jun 4, 2025	Drug Discovery, Prediction	Unverified	0
Beyond Memorization: A Rigorous Evaluation Framework for Medical Knowledge Editing	Jun 4, 2025	knowledge editing, Memorization	Code Available	0