The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 7,476–7,500 of 474,278 papers

Title	Date	Status
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains	Nov 12, 2025	Unverified
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs	Nov 12, 2025	Unverified
Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework	Nov 12, 2025	Unverified
AgentFlux: Decoupled Fine-Tuning & Inference for On-Device Agentic Systems	Nov 12, 2025	Unverified
SPEED-Q: Staged Processing with Enhanced Distillation towards Efficient Low-bit On-device VLM Quantization	Nov 12, 2025	Code Available
Soiling detection for Advanced Driver Assistance Systems	Nov 12, 2025	Code Available
Neural B-frame Video Compression with Bi-directional Reference Harmonization	Nov 12, 2025	Code Available
Causally-Grounded Dual-Path Attention Intervention for Object Hallucination Mitigation in LVLMs	Nov 12, 2025	Code Available
MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal Critique	Nov 12, 2025	Code Available
Iterated Population Based Training with Task-Agnostic Restarts	Nov 12, 2025	Code Available
Augment to Augment: Diverse Augmentations Enable Competitive Ultra-Low-Field MRI Enhancement	Nov 12, 2025	Code Available
Learning by Neighbor-Aware Semantics, Deciding by Open-form Flows: Towards Robust Zero-Shot Skeleton Action Recognition	Nov 12, 2025	Code Available
FactReasoner: A Probabilistic Approach to Long-Form Factuality Assessment for Large Language Models	Nov 12, 2025	Code Available
Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA	Nov 12, 2025	Code Available
EcomMMMU: Strategic Utilization of Visuals for Robust Multimodal E-commerce Models	Nov 12, 2025	Code Available
PsychCounsel-Bench: Evaluating the Psychology Intelligence of Large Language Models	Nov 12, 2025	Code Available
PALMS+: Modular Image-Based Floor Plan Localization Leveraging Depth Foundation Model	Nov 12, 2025	Code Available
DI3CL: Contrastive Learning With Dynamic Instances and Contour Consistency for SAR Land-Cover Classification Foundation Model	Nov 12, 2025	Code Available
DG-DETR: Toward Domain Generalized Detection Transformer	Nov 12, 2025	Code Available
evMLP: An Efficient Event-Driven MLP Architecture for Vision	Nov 12, 2025	Code Available
Trustworthy Pedestrian Trajectory Prediction via Pattern-Aware Interaction Modeling	Nov 12, 2025	Code Available
Rethinking Pan-sharpening: A New Training Process for Full-Resolution Generalization	Nov 12, 2025	Code Available
Mitigating Hallucinations in Large Language Models via Causal Reasoning	Nov 12, 2025	Code Available
RadHARSimulator V2: Video to Doppler Generator	Nov 12, 2025	Code Available
Diversifying Counterattacks: Orthogonal Exploration for Robust CLIP Inference	Nov 12, 2025	Code Available