The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers


Showing 4,551–4,575 of 661,570 papers

Title	Date	Status	Hype
Who Benchmarks the Benchmarks? A Case Study of LLM Evaluation in Icelandic	Mar 17, 2026	Unverified	0
SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation	Mar 17, 2026	Unverified	3
Noise-Response Calibration: A Causal Intervention Protocol for LLM-Judges	Mar 17, 2026	Unverified	0
Arabic Morphosyntactic Tagging and Dependency Parsing with Large Language Models	Mar 17, 2026	Unverified	0
Cascade-Aware Multi-Agent Routing: Spatio-Temporal Sidecars and Geometry-Switching	Mar 17, 2026	Unverified	0
ANTS: Adaptive Negative Textual Space Shaping for OOD Detection via Test-Time MLLM Understanding and Reasoning	Mar 17, 2026	Code Available	0
A Practical Algorithm for Feature-Rich, Non-Stationary Bandit Problems	Mar 17, 2026	Unverified	0
TurnWise: The Gap between Single- and Multi-turn Language Model Capabilities	Mar 17, 2026	Unverified	0
Conservative Continuous-Time Treatment Optimization	Mar 17, 2026	Unverified	0
Designing for Disagreement: Front-End Guardrails for Assistance Allocation in LLM-Enabled Robots	Mar 17, 2026	Unverified	0
Tarab: A Multi-Dialect Corpus of Arabic Lyrics and Poetry	Mar 17, 2026	Unverified	0
Evaluating Ill-Defined Tasks in Large Language Models	Mar 17, 2026	Unverified	0
Why the Valuable Capabilities of LLMs Are Precisely the Unexplainable Ones	Mar 17, 2026	Unverified	0
Controlling Fish Schools via Reinforcement Learning of Virtual Fish Movement	Mar 17, 2026	Unverified	0
Ontological foundations for contrastive explanatory narration of robot plans	Mar 17, 2026	Unverified	0
VQKV: High-Fidelity and High-Ratio Cache Compression via Vector-Quantization	Mar 17, 2026	Unverified	0
TempCore: Are Video QA Benchmarks Temporally Grounded? A Frame Selection Sensitivity Analysis and Benchmark	Mar 17, 2026	Unverified	0
Good Arguments Against the People Pleasers: How Reasoning Mitigates (Yet Masks) LLM Sycophancy	Mar 17, 2026	Unverified	0
What DINO saw: ALiBi positional encoding reduces positional bias in Vision Transformers	Mar 17, 2026	Unverified	0
BenchPreS: A Benchmark for Context-Aware Personalized Preference Selectivity of Persistent-Memory LLMs	Mar 17, 2026	Unverified	0
From Passive to Persuasive: Localized Activation Injection for Empathy and Negotiation	Mar 17, 2026	Unverified	0
LLM-Guided Reinforcement Learning for Audio-Visual Speech Enhancement	Mar 17, 2026	Unverified	0
Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning	Mar 17, 2026	Unverified	0
When Machine Learning Gets Personal: Evaluating Prediction and Explanation	Mar 17, 2026	Unverified	0
Feature Attribution in 5G Intrusion Detection: A Statistical vs. Logic-Based Comparison	Mar 17, 2026	Unverified	0