The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 7,701–7,750 of 661,570 papers

Title	Date	Status	Hype
Ill-Conditioning in Dictionary-Based Dynamic-Equation Learning: A Systems Biology Case Study	Mar 11, 2026	Unverified	0
Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover	Mar 11, 2026	Unverified	0
On the Computational Hardness of Transformers	Mar 11, 2026	Unverified	0
LLM-Augmented Digital Twin for Policy Evaluation in Short-Video Platforms	Mar 11, 2026	Unverified	0
RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents	Mar 11, 2026	Unverified	0
DriveXQA: Cross-modal Visual Question Answering for Adverse Driving Scene Understanding	Mar 11, 2026	Unverified	0
FinRule-Bench: A Benchmark for Joint Reasoning over Financial Tables and Principles	Mar 11, 2026	Unverified	0
Evaluating Explainable AI Attribution Methods in Neural Machine Translation via Attention-Guided Knowledge Distillation	Mar 11, 2026	Unverified	0
Spatially Robust Inference with Predicted and Missing at Random Labels	Mar 11, 2026	Unverified	0
Learning to Assist: Physics-Grounded Human-Human Control via Multi-Agent Reinforcement Learning	Mar 11, 2026	Unverified	0
Novelty Adaptation Through Hybrid Large Language Model (LLM)-Symbolic Planning and LLM-guided Reinforcement Learning	Mar 11, 2026	Unverified	0
TimeSqueeze: Dynamic Patching for Efficient Time Series Forecasting	Mar 11, 2026	Unverified	0
Ensuring Safety in Automated Mechanical Ventilation through Offline Reinforcement Learning and Digital Twin Verification	Mar 11, 2026	Unverified	0
How do AI agents talk about science and research? An exploration of scientific discussions on Moltbook using BERTopic	Mar 11, 2026	Unverified	0
Vision-Based Hand Shadowing for Robotic Manipulation via Inverse Kinematics	Mar 11, 2026	Unverified	0
Beyond the Class Subspace: Teacher-Guided Training for Reliable Out-of-Distribution Detection in Single-Domain Models	Mar 11, 2026	Unverified	0
Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI	Mar 11, 2026	Unverified	0
Hierarchical Granularity Alignment and State Space Modeling for Robust Multimodal AU Detection in the Wild	Mar 11, 2026	Unverified	0
The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning	Mar 11, 2026	Unverified	0
Towards Trustworthy Selective Generation: Reliability-Guided Diffusion for Ultra-Low-Field to High-Field MRI Synthesis	Mar 11, 2026	Unverified	0
POrTAL: Plan-Orchestrated Tree Assembly for Lookahead	Mar 11, 2026	Unverified	0
Busemann Functions in the Wasserstein Space: Existence, Closed-Forms, and Applications to Slicing	Mar 11, 2026	Unverified	0
Temporal Text Classification with Large Language Models	Mar 11, 2026	Unverified	0
abx_amr_simulator: A simulation environment for antibiotic prescribing policy optimization under antimicrobial resistance	Mar 11, 2026	Unverified	0
Continued Pretraining for Low-Resource Swahili ASR: Achieving State-of-the-Art Performance with Minimal Labeled Data	Mar 11, 2026	Unverified	0
On the Robustness of Langevin Dynamics to Score Function Error	Mar 11, 2026	Unverified	0
TRACE: AI-Assisted Assessment of Collaborative Projects in Computer Science Education	Mar 11, 2026	Unverified	0
Single molecule localization microscopy challenge: a biologically inspired benchmark for long-sequence modeling	Mar 11, 2026	Unverified	0
Multilingual Financial Fraud Detection Using Machine Learning and Transformer Models: A Bangla-English Study	Mar 11, 2026	Unverified	0
ThReadMed-QA: A Multi-Turn Medical Dialogue Benchmark from Real Patient Questions	Mar 11, 2026	Unverified	0
Geodesic Semantic Search: Learning Local Riemannian Metrics for Citation Graph Retrieval	Mar 11, 2026	Code Available	0
Synthetic Data Generation for Brain-Computer Interfaces: Overview, Benchmarking, and Future Directions	Mar 11, 2026	Code Available	0
Multi-objective Genetic Programming with Multi-view Multi-level Feature for Enhanced Protein Secondary Structure Prediction	Mar 11, 2026	Code Available	0
UNet-AF: An alias-free UNet for image restoration	Mar 11, 2026	Code Available	0
ResearchGym: Evaluating Language Model Agents on Real-World AI Research	Mar 11, 2026	Unverified	1
Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper	Mar 11, 2026	Unverified	1
A Survey of Reasoning in Autonomous Driving Systems: Open Challenges and Emerging Paradigms	Mar 11, 2026	Unverified	0
SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents	Mar 11, 2026	Unverified	0
Communication Enables Cooperation in LLM Agents: A Comparison with Curriculum-Based Approaches	Mar 11, 2026	Unverified	0
Copula-ResLogit: A Deep-Copula Framework for Unobserved Confounding Effects	Mar 11, 2026	Unverified	0
UrbanAlign: Post-hoc Semantic Calibration for VLM-Human Preference Alignment	Mar 11, 2026	Unverified	0
Beyond Relevance: On the Relationship Between Retrieval and RAG Information Coverage	Mar 11, 2026	Unverified	0
Evaluation of LLMs in retrieving food and nutritional context for RAG systems	Mar 11, 2026	Unverified	0
Quantum entanglement provides a competitive advantage in adversarial games	Mar 11, 2026	Unverified	0
A New Tensor Network: Tubal Tensor Train and Its Applications	Mar 11, 2026	Unverified	0
MUNIChus: Multilingual News Image Captioning Benchmark	Mar 11, 2026	Unverified	0
Revisiting Value Iteration: Unified Analysis of Discounted and Average-Reward Cases	Mar 11, 2026	Unverified	0
Novel Architecture of RPA In Oral Cancer Lesion Detection	Mar 11, 2026	Unverified	0
Interventional Time Series Priors for Causal Foundation Models	Mar 11, 2026	Unverified	0
The Attack and Defense Landscape of Agentic AI: A Comprehensive Survey	Mar 11, 2026	Unverified	0