The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 14201–14250 of 474278 papers

Title	Date	Tasks	Status	Hype
Lost in Translation? Converting RegExes for Log Parsing into Dynatrace Pattern Language	Jun 24, 2025	Log Parsing	Unverified	0
Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders	Jun 24, 2025	Memorization	Unverified	0
MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications	Jun 24, 2025		Code Available	1
PEVLM: Parallel Encoding for Vision-Language Models	Jun 24, 2025	Autonomous Driving, Video Understanding	Unverified	0
Spotting Out-of-Character Behavior: Atomic-Level Evaluation of Persona Fidelity in Open-Ended Generation	Jun 24, 2025	Text Generation	Code Available	0
KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality	Jun 24, 2025	Hallucination, Hallucination Evaluation	Code Available	1
Video Compression for Spatiotemporal Earth System Data	Jun 24, 2025	Earth Observation, Video Compression	Code Available	2
Towards an Introspective Dynamic Model of Globally Distributed Computing Infrastructures	Jun 24, 2025	Distributed Computing, Management	Unverified	0
EvDetMAV: Generalized MAV Detection from Moving Event Cameras	Jun 24, 2025		Code Available	1
Diffusion-based Task-oriented Semantic Communications with Model Inversion Attack	Jun 24, 2025	Semantic Communication, SSIM	Unverified	0
Can One Safety Loop Guard Them All? Agentic Guard Rails for Federated Computing	Jun 24, 2025	All, Privacy Preserving	Unverified	0
MILAAP: Mobile Link Allocation via Attention-based Prediction	Jun 24, 2025	Prediction, Scheduling	Unverified	0
MAIZX: A Carbon-Aware Framework for Optimizing Cloud Computing Emissions	Jun 24, 2025	Cloud Computing, Edge-computing	Unverified	0
A Principled Path to Fitted Distributional Evaluation	Jun 24, 2025	Atari Games, Off-policy evaluation	Unverified	0
Controlled Retrieval-augmented Context Evaluation for Long-form RAG	Jun 24, 2025	Diagnostic, Form	Unverified	0
Sampling Matters in Explanations: Towards Trustworthy Attribution Analysis Building Block in Visual Models through Maximizing Explanation Certainty	Jun 24, 2025	Image Attribution	Unverified	0
Higher-Order Neuromorphic Ising Machines -- Autoencoders and Fowler-Nordheim Annealers are all you need for Scalability	Jun 24, 2025	All, Combinatorial Optimization	Unverified	0
HARPT: A Corpus for Analyzing Consumers' Trust and Privacy Concerns in Mobile Health Apps	Jun 24, 2025	Data Augmentation	Unverified	0
SoK: Can Synthetic Images Replace Real Data? A Survey of Utility and Privacy of Synthetic Image Generation	Jun 24, 2025	Image Generation, Privacy Preserving	Unverified	0
Machine Learning with Privacy for Protected Attributes	Jun 24, 2025	Attribute	Unverified	0
Private Model Personalization Revisited	Jun 24, 2025	Binary Classification, Federated Learning	Unverified	0
SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images	Jun 24, 2025		Code Available	1
Consensus-Driven Uncertainty for Robotic Grasping based on RGB Perception	Jun 24, 2025	Object, Pose Estimation	Code Available	0
Self-Supervised Multimodal NeRF for Autonomous Driving	Jun 24, 2025	Autonomous Driving, NeRF	Code Available	1
PocketVina Enables Scalable and Highly Accurate Physically Valid Docking through Multi-Pocket Conditioning	Jun 24, 2025	Benchmarking, Drug Discovery	Code Available	2
CoVE: Compressed Vocabulary Expansion Makes Better LLM-based Recommender Systems	Jun 24, 2025	Recommendation Systems	Code Available	0
EBC-ZIP: Improving Blockwise Crowd Counting with Zero-Inflated Poisson Regression	Jun 24, 2025	Crowd Counting, Density Estimation	Code Available	1
One Prototype Is Enough: Single-Prototype Activation for Interpretable Image Classification	Jun 24, 2025	Classification, image-classification	Code Available	0
Identifying Physically Realizable Triggers for Backdoored Face Recognition Networks	Jun 24, 2025	Face Recognition	Unverified	0
WebGuard++: Interpretable Malicious URL Detection via Bidirectional Fusion of HTML Subgraphs and Multi-Scale Convolutional BERT	Jun 24, 2025	Contrastive Learning, Specificity	Unverified	0
Network Structures as an Attack Surface: Topology-Based Privacy Leakage in Federated Learning	Jun 24, 2025	Federated Learning	Unverified	0
KnowML: Improving Generalization of ML-NIDS with Attack Knowledge Graphs	Jun 24, 2025	Intrusion Detection, Knowledge Graphs	Unverified	0
PrivacyXray: Detecting Privacy Breaches in LLMs through Semantic Consistency and Probability Certainty	Jun 24, 2025	Semantic Similarity, Semantic Textual Similarity	Unverified	0
Recalling The Forgotten Class Memberships: Unlearned Models Can Be Noisy Labelers to Leak Privacy	Jun 24, 2025	Knowledge Distillation, Learning with noisy labels	Unverified	0
From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking	Jun 24, 2025	Code Generation, scientific discovery	Code Available	0
Assessing Risk of Stealing Proprietary Models for Medical Imaging Tasks	Jun 24, 2025	Diagnostic	Code Available	0
Machine-Learning-Assisted Photonic Device Development: A Multiscale Approach from Theory to Characterization	Jun 24, 2025	Active Learning, Bayesian Optimization	Unverified	0
Fast and Distributed Equivariant Graph Neural Networks by Virtual Node Learning	Jun 24, 2025	Graph Learning	Code Available	1
ToSA: Token Merging with Spatial Awareness	Jun 24, 2025	Embodied Question Answering, Question Answering	Code Available	0
An ab initio foundation model of wavefunctions that accurately describes chemical bond breaking	Jun 24, 2025		Code Available	2
Quantum Neural Networks for Propensity Score Estimation and Survival Analysis in Observational Biomedical Studies	Jun 24, 2025	Causal Inference, Selection bias	Unverified	0
Elucidated Rolling Diffusion Models for Probabilistic Weather Forecasting	Jun 24, 2025	Denoising, Weather Forecasting	Code Available	1
Context Attribution with Multi-Armed Bandit Optimization	Jun 24, 2025	Thompson Sampling	Unverified	0
DiaLLMs: EHR Enhanced Clinical Conversational System for Clinical Test Recommendation and Diagnosis Prediction	Jun 24, 2025	Prediction	Unverified	0
Persona-Assigned Large Language Models Exhibit Human-Like Motivated Reasoning	Jun 24, 2025	Misinformation	Code Available	0
Achieving Trustworthy Real-Time Decision Support Systems with Low-Latency Interpretable AI Models	Jun 24, 2025	Decision Making	Unverified	0
The Most Important Features in Generalized Additive Models Might Be Groups of Features	Jun 24, 2025	Additive models, Interpretable Machine Learning	Unverified	0
Robotics Under Construction: Challenges on Job Sites	Jun 24, 2025	Autonomous Navigation	Unverified	0
Position: Machine Learning Conferences Should Establish a "Refutations and Critiques" Track	Jun 24, 2025	Position	Unverified	0
MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection	Jun 24, 2025	CPU, Large Language Model	Unverified	0