The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 5301–5325 of 661,570 papers

Title	Date	Status	Hype
Think Before You Lie: How Reasoning Leads to Honesty	Mar 16, 2026	Unverified	0
When Tables Go Crazy: Evaluating Multimodal Models on French Financial Documents	Mar 16, 2026	Unverified	0
SToRM: Supervised Token Reduction for Multi-modal LLMs toward efficient end-to-end autonomous driving	Mar 16, 2026	Unverified	0
Probing and Bridging Geometry-Interaction Cues for Affordance Reasoning in Vision Foundation Models	Mar 16, 2026	Unverified	0
Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning	Mar 16, 2026	Unverified	0
Under the Influence: Quantifying Persuasion and Vigilance in Large Language Models	Mar 16, 2026	Unverified	0
Toward Personalized LLM-Powered Agents: Foundations, Evaluation, and Future Directions	Mar 16, 2026	Unverified	0
TARAZ: Persian Short-Answer Question Benchmark for Cultural Evaluation of Language Models	Mar 16, 2026	Unverified	0
BLINK: Behavioral Latent Modeling of NK Cell Cytotoxicity	Mar 16, 2026	Unverified	0
Latent-Mark: An Audio Watermark Robust to Neural Resynthesis	Mar 16, 2026	Unverified	0
On the Statistical Optimality of Optimal Decision Trees	Mar 16, 2026	Unverified	0
Preserving Continuous Symmetry in Discrete Spaces: Geometric-Aware Quantization for SO(3)-Equivariant GNNs	Mar 16, 2026	Unverified	0
A Multilingual Human Annotated Corpus of Original and Easy-to-Read Texts to Support Access to Democratic Participatory Processes	Mar 16, 2026	Unverified	0
NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks	Mar 16, 2026	Unverified	0
Distributional Regression with Tabular Foundation Models: Evaluating Probabilistic Predictions via Proper Scoring Rules	Mar 16, 2026	Unverified	0
Structural Causal Bottleneck Models	Mar 16, 2026	Unverified	0
ForgeDreamer: Industrial Text-to-3D Generation with Multi-Expert LoRA and Cross-View Hypergraph	Mar 16, 2026	Unverified	0
Open-World Motion Forecasting	Mar 16, 2026	Unverified	0
GLM-OCR Technical Report	Mar 16, 2026	Unverified	0
MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants	Mar 16, 2026	Unverified	1
TOSSS: a CVE-based Software Security Benchmark for Large Language Models	Mar 16, 2026	Unverified	0
Detecting Intrinsic and Instrumental Self-Preservation in Autonomous Agents: The Unified Continuation-Interest Protocol	Mar 16, 2026	Unverified	0
Shape-of-You: Fused Gromov-Wasserstein Optimal Transport for Semantic Correspondence in-the-Wild	Mar 16, 2026	Unverified	0
BackdoorIDS: Zero-shot Backdoor Detection for Pretrained Vision Encoder	Mar 16, 2026	Unverified	0
Truth as a Compression Artifact in Language Model Training	Mar 16, 2026	Unverified	0