The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers


Showing 18,751–18,800 of 474,278 papers

Title	Date	Tasks	Status	Hype
CODEMENV: Benchmarking Large Language Models on Code Migration	Jun 1, 2025	Benchmarking	Code Available	1
Higher-Order Responsibility	Jun 1, 2025	Decision Making, Ethics	Unverified	0
CoBRA: Quantifying Strategic Language Use and LLM Pragmatics	Jun 1, 2025		Code Available	0
PromptVFX: Text-Driven Fields for Open-World 3D Gaussian Animation	Jun 1, 2025		Code Available	0
Mamba Drafters for Speculative Decoding	Jun 1, 2025	Large Language Model, Mamba	Unverified	0
A Large Language Model-Supported Threat Modeling Framework for Transportation Cyber-Physical Systems	Jun 1, 2025	In-Context Learning, Language Modeling	Unverified	0
SafeGenes: Evaluating the Adversarial Robustness of Genomic Foundation Models	Jun 1, 2025	Adversarial Robustness	Unverified	0
FedRPCA: Enhancing Federated LoRA Aggregation Using Robust PCA	Jun 1, 2025	Federated Learning, Task Arithmetic	Unverified	0
IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory	Jun 1, 2025	Semantic Similarity, Semantic Textual Similarity	Code Available	1
Mispronunciation Detection Without L2 Pronunciation Dataset in Low-Resource Setting: A Case Study in Finland Swedish	Jun 1, 2025		Code Available	0
Graph Neural Networks for Jamming Source Localization	Jun 1, 2025	feature selection, graph construction	Code Available	0
EvoGit: Decentralized Code Evolution via Git-Based Multi-Agent Collaboration	Jun 1, 2025		Code Available	5
Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in LLMs Across Logical Transformations and Question Answering Tasks	Jun 1, 2025	In-Context Learning, Negation	Code Available	0
Will Agents Replace Us? Perceptions of Autonomous Multi-Agent AI	Jun 1, 2025	AI Agent	Code Available	0
ACCESS DENIED INC: The First Benchmark Environment for Sensitivity Awareness	Jun 1, 2025	Benchmarking, Management	Code Available	0
Learning DNF through Generalized Fourier Representations	Jun 1, 2025	Learning Theory	Unverified	0
No Soundness in the Real World: On the Challenges of the Verification of Deployed Neural Networks	Jun 1, 2025		Code Available	0
Unfolding Boxes with Local Constraints	Jun 1, 2025		Code Available	0
RARE: Retrieval-Aware Robustness Evaluation for Retrieval-Augmented Generation Systems	Jun 1, 2025	RAG, Retrieval	Code Available	0
CAPAA: Classifier-Agnostic Projector-Based Adversarial Attack	Jun 1, 2025	Adversarial Attack	Code Available	0
Regulatory Graphs and GenAI for Real-Time Transaction Monitoring and Compliance Explanation in Banking	Jun 1, 2025	Graph Neural Network, Retrieval-augmented Generation	Unverified	0
MedBookVQA: A Systematic and Comprehensive Medical Benchmark Derived from Open-Access Book	Jun 1, 2025	Benchmarking	Code Available	0
Quantization-based Bounds on the Wasserstein Metric	Jun 1, 2025	Computational Efficiency, Domain Adaptation	Unverified	0
LEMONADE: A Large Multilingual Expert-Annotated Abstractive Event Dataset for the Real World	Jun 1, 2025	document understanding, Entity Linking	Code Available	1
How Neural Networks Organize Concepts: Introducing Concept Trajectory Analysis for Deep Learning Interpretability	Jun 1, 2025	Bias Detection	Code Available	0
Adapting General-Purpose Embedding Models to Private Datasets Using Keyword-based Retrieval	May 31, 2025		Code Available	0
Concept-Centric Token Interpretation for Vector-Quantized Generative Models	May 31, 2025		Code Available	0
ChartGen: Scaling Chart Understanding Via Code-Guided Synthetic Chart Generation	May 31, 2025		Code Available	0
Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn	May 31, 2025	Continual Learning, OpenAI Gym	Unverified	0
iDPA: Instance Decoupled Prompt Attention for Incremental Medical Object Detection	May 31, 2025	Continual Learning, Medical Object Detection	Unverified	0
Revisiting LLMs as Zero-Shot Time-Series Forecasters: Small Noise Can Break Large Models	May 31, 2025	Sensitivity, Time Series	Code Available	0
LoRA as a Flexible Framework for Securing Large Vision Systems	May 31, 2025	Autonomous Driving, parameter-efficient fine-tuning	Unverified	0
Enhancing Multimodal Continual Instruction Tuning with BranchLoRA	May 31, 2025	Mixture-of-Experts	Unverified	0
ARIA: Training Language Agents with Intention-Driven Reward Aggregation	May 31, 2025	Decision Making, Reinforcement Learning (RL)	Unverified	0
Goal-Aware Identification and Rectification of Misinformation in Multi-Agent Systems	May 31, 2025	Language Modeling, Language Modelling	Code Available	0
An application of machine learning to the motion response prediction of floating assets	May 31, 2025	Decision Making	Unverified	0
CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval	May 31, 2025	Code Generation, Information Retrieval	Unverified	0
Organizational Adaptation to Generative AI in Cybersecurity: A Systematic Review	May 31, 2025	Large Language Model	Unverified	0
CineMA: A Foundation Model for Cine Cardiac MRI	May 31, 2025	Myocardium Segmentation	Code Available	2
The Coupling Effect of Sensing Targets on the Environment for 3GPP ISAC Channels: Observation, Modeling, and Validation	May 31, 2025	Blocking, Integrated sensing and communication	Unverified	0
Look mom, no experimental data! Learning to score protein-ligand interactions from simulations	May 31, 2025		Code Available	1
ABCDEFGH: An Adaptation-Based Convolutional Neural Network-CycleGAN Disease-Courses Evolution Framework Using Generative Models in Health Education	May 31, 2025	Diagnostic	Code Available	0
Latent Wavelet Diffusion: Enabling 4K Image Synthesis for Free	May 31, 2025	2k, 4k	Unverified	0
MR2US-Pro: Prostate MR to Ultrasound Image Translation and Registration Based on Diffusion Models	May 31, 2025	3D Reconstruction, Anatomy	Unverified	0
Image Restoration Learning via Noisy Supervision in the Fourier Domain	May 31, 2025	Image Restoration	Unverified	0
Joint Activity Detection and Channel Estimation for Massive Connectivity: Where Message Passing Meets Score-Based Generative Priors	May 31, 2025	Action Detection, Activity Detection	Unverified	0
Integrated Sensing, Computing and Semantic Communication for Vehicular Networks	May 31, 2025	Autonomous Vehicles, Semantic Communication	Unverified	0
A Family of Robust Generalized Adaptive Filters and Application for Time-series Prediction	May 31, 2025	Time Series, Time Series Prediction	Unverified	0
Power-of-Two (PoT) Weights in Large Language Models (LLMs)	May 31, 2025	Quantization	Unverified	0
Active Learning via Regression Beyond Realizability	May 31, 2025	Active Learning, regression	Unverified	0