The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers


Showing 5301–5350 of 661570 papers

Title	Date	Status	Hype
Think Before You Lie: How Reasoning Leads to Honesty	Mar 16, 2026	Unverified	0
When Tables Go Crazy: Evaluating Multimodal Models on French Financial Documents	Mar 16, 2026	Unverified	0
SToRM: Supervised Token Reduction for Multi-modal LLMs toward efficient end-to-end autonomous driving	Mar 16, 2026	Unverified	0
Probing and Bridging Geometry-Interaction Cues for Affordance Reasoning in Vision Foundation Models	Mar 16, 2026	Unverified	0
Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning	Mar 16, 2026	Unverified	0
Under the Influence: Quantifying Persuasion and Vigilance in Large Language Models	Mar 16, 2026	Unverified	0
Toward Personalized LLM-Powered Agents: Foundations, Evaluation, and Future Directions	Mar 16, 2026	Unverified	0
TARAZ: Persian Short-Answer Question Benchmark for Cultural Evaluation of Language Models	Mar 16, 2026	Unverified	0
BLINK: Behavioral Latent Modeling of NK Cell Cytotoxicity	Mar 16, 2026	Unverified	0
Latent-Mark: An Audio Watermark Robust to Neural Resynthesis	Mar 16, 2026	Unverified	0
On the Statistical Optimality of Optimal Decision Trees	Mar 16, 2026	Unverified	0
Preserving Continuous Symmetry in Discrete Spaces: Geometric-Aware Quantization for SO(3)-Equivariant GNNs	Mar 16, 2026	Unverified	0
A Multilingual Human Annotated Corpus of Original and Easy-to-Read Texts to Support Access to Democratic Participatory Processes	Mar 16, 2026	Unverified	0
NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks	Mar 16, 2026	Unverified	0
Distributional Regression with Tabular Foundation Models: Evaluating Probabilistic Predictions via Proper Scoring Rules	Mar 16, 2026	Unverified	0
Structural Causal Bottleneck Models	Mar 16, 2026	Unverified	0
ForgeDreamer: Industrial Text-to-3D Generation with Multi-Expert LoRA and Cross-View Hypergraph	Mar 16, 2026	Unverified	0
Open-World Motion Forecasting	Mar 16, 2026	Unverified	0
GLM-OCR Technical Report	Mar 16, 2026	Unverified	0
MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants	Mar 16, 2026	Unverified	1
TOSSS: a CVE-based Software Security Benchmark for Large Language Models	Mar 16, 2026	Unverified	0
Detecting Intrinsic and Instrumental Self-Preservation in Autonomous Agents: The Unified Continuation-Interest Protocol	Mar 16, 2026	Unverified	0
Shape-of-You: Fused Gromov-Wasserstein Optimal Transport for Semantic Correspondence in-the-Wild	Mar 16, 2026	Unverified	0
BackdoorIDS: Zero-shot Backdoor Detection for Pretrained Vision Encoder	Mar 16, 2026	Unverified	0
Truth as a Compression Artifact in Language Model Training	Mar 16, 2026	Unverified	0
Mitigating the Multiplicity Burden: The Role of Calibration in Reducing Predictive Multiplicity of Classifiers	Mar 16, 2026	Unverified	0
Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models	Mar 16, 2026	Unverified	0
The DIME Architecture: A Unified Operational Algorithm for Neural Representation, Dynamics, Control and Integration	Mar 16, 2026	Unverified	0
Optimizing Task Completion Time Updates Using POMDPs	Mar 16, 2026	Unverified	0
MetaKE: Meta-learning Aligned Knowledge Editing via Bi-level Optimization	Mar 16, 2026	Unverified	0
The COTe score: A decomposable framework for evaluating Document Layout Analysis models	Mar 16, 2026	Unverified	0
A Fractional Fox H-Function Kernel for Support Vector Machines: Robust Classification via Weighted Transmutation Operators	Mar 16, 2026	Unverified	0
Exact Federated Continual Unlearning for Ridge Heads on Frozen Foundation Models	Mar 16, 2026	Unverified	0
BadLLM-TG: A Backdoor Defender powered by LLM Trigger Generator	Mar 16, 2026	Unverified	0
Beyond Final Answers: CRYSTAL Benchmark for Transparent Multimodal Reasoning Evaluation	Mar 16, 2026	Unverified	0
Semantic Invariance in Agentic AI	Mar 16, 2026	Unverified	0
MVHOI: Bridge Multi-view Condition to Complex Human-Object Interaction Video Reenactment via 3D Foundation Model	Mar 16, 2026	Unverified	0
Robust Building Damage Detection in Cross-Disaster Settings Using Domain Adaptation	Mar 16, 2026	Unverified	0
Scaling Autoregressive Models for Lattice Thermodynamics	Mar 16, 2026	Unverified	0
AURORA-KITTI: Any-Weather Depth Completion and Denoising in the Wild	Mar 16, 2026	Unverified	0
Beyond Local Code Optimization: Multi-Agent Reasoning for Software System Optimization	Mar 16, 2026	Unverified	0
Towards Next-Generation LLM Training: From the Data-Centric Perspective	Mar 16, 2026	Unverified	0
Training-Free Generation of Protein Sequences from Small Family Alignments via Stochastic Attention	Mar 16, 2026	Unverified	0
Multimodal Deep Learning for Early Prediction of Patient Deterioration in the ICU: Integrating Time-Series EHR Data with Clinical Notes	Mar 16, 2026	Unverified	0
GameUIAgent: An LLM-Powered Framework for Automated Game UI Design with Structured Intermediate Representation	Mar 16, 2026	Unverified	0
Enhancing Hands in 3D Whole-Body Pose Estimation with Conditional Hands Modulator	Mar 16, 2026	Unverified	0
Automated Diabetic Screening via Anterior Segment Ocular Imaging: A Deep Learning and Explainable AI Approach	Mar 16, 2026	Unverified	0
A Skill-augmented Agentic Framework and Benchmark for Multi-Video Understanding	Mar 16, 2026	Unverified	0
Gauge-Equivariant Intrinsic Neural Operators for Geometry-Consistent Learning of Elliptic PDE Maps	Mar 16, 2026	Unverified	0
Efficient Event Camera Volume System	Mar 16, 2026	Unverified	0