The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 14051–14100 of 474278 papers

Title	Date	Tasks	Status	Hype
EAR: Erasing Concepts from Unified Autoregressive Models	Jun 25, 2025	Image Generation	Code Available	0
Industrial Energy Disaggregation with Digital Twin-generated Dataset and Efficient Data Augmentation	Jun 25, 2025	Data Augmentation, Non-Intrusive Load Monitoring	Code Available	0
Model Editing as a Double-Edged Sword: Steering Agent Ethical Behavior Toward Beneficence or Harm	Jun 25, 2025	Model Editing	Code Available	0
Benchmarking Unsupervised Strategies for Anomaly Detection in Multivariate Time Series	Jun 25, 2025	Anomaly Detection, Benchmarking	Code Available	0
The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind	Jun 25, 2025	Multi-agent Reinforcement Learning, Navigate	Code Available	1
Comparative Analysis of Deep Learning Models for Crop Disease Detection: A Transfer Learning Approach	Jun 25, 2025	Deep Learning, Management	Unverified	0
Hear No Evil: Detecting Gradient Leakage by Malicious Servers in Federated Learning	Jun 25, 2025	Federated Learning	Unverified	0
CCRS: A Zero-Shot LLM-as-a-Judge Framework for Comprehensive RAG Evaluation	Jun 25, 2025	RAG	Unverified	0
DeepQuark: deep-neural-network approach to multiquark bound states	Jun 25, 2025	Variational Monte Carlo	Unverified	0
From Codicology to Code: A Comparative Study of Transformer and YOLO-based Detectors for Layout Analysis in Historical Documents	Jun 25, 2025	Document Layout Analysis, object-detection	Unverified	0
TAPS: Tool-Augmented Personalisation via Structured Tagging	Jun 25, 2025		Code Available	0
Narrative Shift Detection: A Hybrid Approach of Dynamic Topic Models and Large Language Models	Jun 25, 2025	Articles, Change Point Detection	Code Available	0
SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models	Jun 25, 2025	Code Generation, In-Context Learning	Unverified	0
Knowledge-Aware Diverse Reranking for Cross-Source Question Answering	Jun 25, 2025	Question Answering, RAG	Unverified	0
Bridging Compositional and Distributional Semantics: A Survey on Latent Semantic Geometry via AutoEncoder	Jun 25, 2025	Representation Learning	Unverified	0
Collaborative Batch Size Optimization for Federated Learning	Jun 25, 2025	Federated Learning	Unverified	0
SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization	Jun 25, 2025	Code Generation, HumanEval	Unverified	0
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation	Jun 25, 2025	Code Generation, Denoising	Code Available	4
Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios	Jun 25, 2025	Autonomous Driving, Decision Making	Unverified	0
Towards Community-Driven Agents for Machine Learning Engineering	Jun 25, 2025	Language Modeling, Language Modelling	Code Available	0
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling	Jun 25, 2025	Language Modeling, Language Modelling	Code Available	2
H-FEX: A Symbolic Learning Method for Hamiltonian Systems	Jun 25, 2025	Symbolic Regression	Unverified	0
CogGen: A Learner-Centered Generative AI Architecture for Intelligent Tutoring with Programming Video	Jun 25, 2025	Knowledge Tracing, Video Segmentation	Unverified	0
Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization	Jun 25, 2025	Dense Video Captioning, Descriptive	Unverified	0
Enhancing Large Language Models through Structured Reasoning	Jun 25, 2025	Decision Making	Unverified	0
A Modular Multitask Reasoning Framework Integrating Spatio-temporal Models and LLMs	Jun 25, 2025	In-Context Learning, Natural Language Queries	Unverified	0
Mixtures of Neural Cellular Automata: A Stochastic Framework for Growth Modelling and Self-Organization	Jun 25, 2025	Image Segmentation, Semantic Segmentation	Unverified	0
Tabular Feature Discovery With Reasoning Type Exploration	Jun 25, 2025	Feature Engineering	Unverified	0
Automatic Demonstration Selection for LLM-based Tabular Data Classification	Jun 25, 2025	In-Context Learning, Language Modeling	Unverified	0
How to Retrieve Examples in In-context Learning to Improve Conversational Emotion Recognition using Large Language Models?	Jun 25, 2025	Emotion Recognition, In-Context Learning	Unverified	0
Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration	Jun 25, 2025	Imitation Learning, MuJoCo	Unverified	0
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition	Jun 25, 2025	Action Recognition, Decision Making	Unverified	0
When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs	Jun 25, 2025	Math	Unverified	0
Vulnerability Disclosure through Adaptive Black-Box Adversarial Attacks on NIDS	Jun 25, 2025	Change Point Detection, feature selection	Unverified	0
Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content	Jun 25, 2025	Articles, Continual Pretraining	Unverified	0
MEL: Multi-level Ensemble Learning for Resource-Constrained Environments	Jun 25, 2025	Ensemble Learning	Unverified	0
Causal discovery in deterministic discrete LTI-DAE systems	Jun 25, 2025	Causal Discovery	Unverified	0
Distilling A Universal Expert from Clustered Federated Learning	Jun 25, 2025	Federated Learning	Unverified	0
TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis	Jun 25, 2025	Earth Observation, Self-Supervised Learning	Unverified	0
WallStreetFeds: Client-Specific Tokens as Investment Vehicles in Federated Learning	Jun 25, 2025	Contribution Assessment, Federated Learning	Unverified	0
Self-Supervised Graph Learning via Spectral Bootstrapping and Laplacian-Based Augmentations	Jun 25, 2025	Graph Learning	Code Available	0
AI Assistants to Enhance and Exploit the PETSc Knowledge Base	Jun 25, 2025	RAG, Reranking	Unverified	0
Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings	Jun 25, 2025	Gunshot Detection	Unverified	0
Irec: A Metacognitive Scaffolding for Self-Regulated Learning through Just-in-Time Insight Recall: A Conceptual Framework and System Prototype	Jun 25, 2025	graph construction, Large Language Model	Unverified	0
Leveraging AI Graders for Missing Score Imputation to Achieve Accurate Ability Estimation in Constructed-Response Tests	Jun 25, 2025	Data Augmentation, Imputation	Unverified	0
AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control	Jun 25, 2025	Language Modeling, Language Modelling	Code Available	0
COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees	Jun 25, 2025	Conformal Prediction, Question Answering	Unverified	0
Causal Operator Discovery in Partial Differential Equations via Counterfactual Physics-Informed Neural Networks	Jun 25, 2025	counterfactual	Unverified	0
Efficient Federated Learning with Encrypted Data Sharing for Data-Heterogeneous Edge Devices	Jun 25, 2025	Federated Learning	Unverified	0
Generating and Customizing Robotic Arm Trajectories using Neural Networks	Jun 25, 2025		Code Available	0