The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 18151–18200 of 474278 papers

Title	Date	Tasks	Status	Hype
Pruning Everything, Everywhere, All at Once	Jun 4, 2025	All, Computational Efficiency	Code Available	0
UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation	Jun 4, 2025	cross-modal alignment, Lipreading	Unverified	0
An Efficient Task-Oriented Dialogue Policy: Evolutionary Reinforcement Learning Injected by Elite Individuals	Jun 4, 2025	Deep Reinforcement Learning, Evolutionary Algorithms	Unverified	0
ConsistentChat: Building Skeleton-Guided Consistent Dialogues for Large Language Models from Scratch	Jun 4, 2025	Dialogue Generation	Unverified	0
The Cost of Dynamic Reasoning: Demystifying AI Agents and Test-Time Scaling from an AI Infrastructure Perspective	Jun 4, 2025	AI Agent, Large Language Model	Unverified	0
CORE: Constraint-Aware One-Step Reinforcement Learning for Simulation-Guided Neural Network Accelerator Design	Jun 4, 2025	Reinforcement Learning (RL)	Unverified	0
Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification	Jun 4, 2025	Privacy Preserving	Unverified	0
Privacy and Security Threat for OpenAI GPTs	Jun 4, 2025	Chatbot	Unverified	0
Evaluating Apple Intelligence's Writing Tools for Privacy Against Large Language Model-Based Inference Attacks: Insights from Early Datasets	Jun 4, 2025	Language Modeling, Language Modelling	Unverified	0
Understanding and Meeting Practitioner Needs When Measuring Representational Harms Caused by LLM-Based Systems	Jun 4, 2025	Language Modeling, Language Modelling	Unverified	0
AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in LLM-Based Agents	Jun 4, 2025	Large Language Model, Prompt Engineering	Unverified	0
Facts are Harder Than Opinions -- A Multilingual, Comparative Analysis of LLM-Based Fact-Checking Reliability	Jun 4, 2025	Diversity, Fact Checking	Unverified	0
Crowd-SFT: Crowdsourcing for LLM Alignment	Jun 4, 2025	Fairness, Model Selection	Unverified	0
Preface to the Special Issue of the TAL Journal on Scholarly Document Processing	Jun 4, 2025	Information Retrieval, Navigate	Unverified	0
Does Prompt Design Impact Quality of Data Imputation by LLMs?	Jun 4, 2025	Binary Classification, Imputation	Unverified	0
Photoreal Scene Reconstruction from an Egocentric Device	Jun 4, 2025		Code Available	2
SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting	Jun 4, 2025	3DGS	Code Available	1
Assessing Intersectional Bias in Representations of Pre-Trained Image Recognition Models	Jun 4, 2025		Code Available	0
Training Cross-Morphology Embodied AI Agents: From Practical Challenges to Theoretical Foundations	Jun 4, 2025		Code Available	0
Watermarking Degrades Alignment in Language Models: Analysis and Mitigation	Jun 4, 2025	Text Generation	Code Available	0
TracLLM: A Generic Framework for Attributing Long Context LLMs	Jun 4, 2025	Denoising, RAG	Code Available	1
POLARIS: A High-contrast Polarimetric Imaging Benchmark Dataset for Exoplanetary Disk Representation Learning	Jun 4, 2025	Representation Learning	Code Available	0
Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective	Jun 4, 2025	Continual Learning	Code Available	0
Survey of Active Learning Hyperparameters: Insights from a Large-Scale Experimental Grid	Jun 4, 2025	Active Learning	Code Available	0
RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors	Jun 4, 2025	Adversarial Robustness	Code Available	0
TextAtari: 100K Frames Game Playing with Language Agents	Jun 4, 2025	Atari Games, Decision Making	Code Available	0
Understanding challenges to the interpretation of disaggregated evaluations of algorithmic fairness	Jun 4, 2025	Fairness, Selection bias	Unverified	0
An Expansion-Based Approach for Quantified Integer Programming	Jun 4, 2025		Code Available	0
Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate	Jun 4, 2025	Language Modeling, Language Modelling	Code Available	0
Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning	Jun 4, 2025	Federated Learning, parameter-efficient fine-tuning	Code Available	0
Diffusion Domain Teacher: Diffusion Guided Domain Adaptive Object Detector	Jun 4, 2025	Domain Adaptation, object-detection	Code Available	1
Matching Markets Meet LLMs: Algorithmic Reasoning with Ranked Preferences	Jun 4, 2025	Blocking, parameter-efficient fine-tuning	Unverified	0
ReXVQA: A Large-scale Visual Question Answering Benchmark for Generalist Chest X-ray Understanding	Jun 4, 2025	Negation, Negation Detection	Unverified	0
ViTSGMM: A Robust Semi-Supervised Image Recognition Network Using Sparse Labels	Jun 4, 2025	Semi-Supervised Image Classification	Code Available	0
CogniPair: From LLM Chatbots to Conscious AI Agents -- GNWT-Based Multi-Agent Digital Twins for Social Pairing -- Dating & Hiring Applications	Jun 4, 2025	Large Language Model	Unverified	0
VLMs Can Aggregate Scattered Training Patches	Jun 4, 2025	Data Poisoning	Code Available	1
Facial Appearance Capture at Home with Patch-Level Reflectance Prior	Jun 4, 2025		Code Available	2
TokAlign: Efficient Vocabulary Adaptation via Token Alignment	Jun 4, 2025	Sentence, Text Compression	Code Available	1
HtFLlib: A Comprehensive Heterogeneous Federated Learning Library and Benchmark	Jun 4, 2025	Federated Learning, Transfer Learning	Code Available	3
CHEER-Ekman: Fine-grained Embodied Emotion Classification	Jun 3, 2025		Code Available	0
Multi-level Mixture of Experts for Multimodal Entity Linking	Jun 3, 2025		Code Available	0
ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions	Jun 3, 2025	Benchmarking, Diversity	Code Available	1
Contrast & Compress: Learning Lightweight Embeddings for Short Trajectories	Jun 3, 2025	Autonomous Navigation, Contrastive Learning	Unverified	0
Investigating Quantum Feature Maps in Quantum Support Vector Machines for Lung Cancer Classification	Jun 3, 2025	Cancer Classification, Diagnostic	Unverified	0
Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework	Jun 3, 2025	Retrieval-augmented Generation	Unverified	0
Bridging Neural ODE and ResNet: A Formal Error Bound for Safety Verification	Jun 3, 2025		Code Available	0
RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions	Jun 3, 2025	Referring Expression, Synthetic Data Generation	Unverified	0
Mitigating Manipulation and Enhancing Persuasion: A Reflective Multi-Agent Approach for Legal Argument Generation	Jun 3, 2025	Hallucination	Unverified	0
DIAMOND: An LLM-Driven Agent for Context-Aware Baseball Highlight Summarization	Jun 3, 2025	Event Detection, Sports Analytics	Unverified	0
A Multi-Agent Framework for Mitigating Dialect Biases in Privacy Policy Question-Answering Systems	Jun 3, 2025	Question Answering	Unverified	0