The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers


Showing 7051–7100 of 661,570 papers

Title	Date	Status	Hype
SectEval: Evaluating the Latent Sectarian Preferences of Large Language Models	Mar 13, 2026	Code Available	0
Hierarchical Reference Sets for Robust Unsupervised Detection of Scattered and Clustered Outliers	Mar 13, 2026	Code Available	0
Thinking in Streaming Video	Mar 13, 2026	Code Available	0
Multi-Agent Guided Policy Optimization	Mar 13, 2026	Code Available	0
Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning	Mar 13, 2026	Code Available	0
Referee: Reference-aware Audiovisual Deepfake Detection	Mar 13, 2026	Code Available	0
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation	Mar 13, 2026	Unverified	2
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs	Mar 13, 2026	Unverified	2
XSkill: Continual Learning from Experience and Skills in Multimodal Agents	Mar 13, 2026	Unverified	2
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks	Mar 13, 2026	Unverified	4
Finite Difference Flow Optimization for RL Post-Training of Text-to-Image Models	Mar 13, 2026	Unverified	0
JigsawComm: Joint Semantic Feature Encoding and Transmission for Communication-Efficient Cooperative Perception	Mar 13, 2026	Unverified	0
Prompt-Driven Lightweight Foundation Model for Instance Segmentation-Based Fault Detection in Freight Trains	Mar 13, 2026	Code Available	0
Adaptive Vision-Language Model Routing for Computer Use Agents	Mar 13, 2026	Code Available	0
Altered Thoughts, Altered Actions: Probing Chain-of-Thought Vulnerabilities in VLA Robotic Manipulation	Mar 13, 2026	Unverified	0
Deep Learning Based Estimation of Blood Glucose Levels from Multidirectional Scleral Blood Vessel Imaging	Mar 13, 2026	Unverified	0
MRGeo: Robust Cross-View Geo-Localization of Corrupted Images via Spatial and Channel Feature Enhancement	Mar 13, 2026	Unverified	0
Pyramid MoA: A Probabilistic Framework for Cost-Optimized Anytime Inference	Mar 13, 2026	Unverified	0
GLEAM: A Multimodal Imaging Dataset and HAMM for Glaucoma Classification	Mar 13, 2026	Unverified	0
Contextual Graph Representations for Task-Driven 3D Perception and Planning	Mar 12, 2026	Unverified	0
PLDR-LLMs Reason At Self-Organized Criticality	Mar 12, 2026	Unverified	0
ProMAS: Proactive Error Forecasting for Multi-Agent Systems Using Markov Transition Dynamics	Mar 12, 2026	Unverified	0
Agreement Between Large Language Models, Human Reviewers, and Authors in Evaluating STROBE Checklists for Observational Studies in Rheumatology	Mar 12, 2026	Unverified	0
MST-Direct: Matching via Sinkhorn Transport for Multivariate Geostatistical Simulation with Complex Non-Linear Dependencies	Mar 12, 2026	Unverified	0
Adapting Methods for Domain-Specific Japanese Small LMs: Scale, Architecture, and Quantization	Mar 12, 2026	Unverified	0
Gaussian Process Regression-based Knowledge Distillation Framework for Simultaneous Prediction of Physical and Mechanical Properties of Epoxy Polymers	Mar 12, 2026	Unverified	0
An Intent of Collaboration: On Agencies between Designers and Emerging (Intelligent) Technologies	Mar 12, 2026	Unverified	0
Quantum-Secure-By-Construction (QSC): A Paradigm Shift For Post-Quantum Agentic Intelligence	Mar 12, 2026	Unverified	0
A Dynamic Survey of Fuzzy, Intuitionistic Fuzzy, Neutrosophic, Plithogenic, and Extensional Sets	Mar 12, 2026	Unverified	0
Compiled Memory: Not More Information, but More Precise Instructions for Language Agents	Mar 12, 2026	Unverified	0
Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization	Mar 12, 2026	Unverified	0
OMNIA: Closing the Loop by Leveraging LLMs for Knowledge Graph Completion	Mar 12, 2026	Unverified	0
Diabetic Retinopathy Grading with CLIP-based Ranking-Aware Adaptation: A Comparative Study on Fundus Image	Mar 12, 2026	Unverified	0
Schema First Tool APIs for LLM Agents: A Controlled Study of Tool Misuse, Recovery, and Budgeted Performance	Mar 12, 2026	Unverified	0
Accelerating Suffix Jailbreak attacks with Prefix-Shared KV-cache	Mar 12, 2026	Unverified	0
Bridging the Visual-to-Physical Gap: Physically Aligned Representations for Fall Risk Analysis	Mar 12, 2026	Unverified	0
WAT: Online Video Understanding Needs Watching Before Thinking	Mar 12, 2026	Unverified	0
Neuro-Symbolic Generation and Validation of Memory-Aware Formal Function Specifications	Mar 12, 2026	Unverified	0
Bridging Protocol and Production: Design Patterns for Deploying AI Agents with Model Context Protocol	Mar 12, 2026	Unverified	0
GPrune-LLM: Generalization-Aware Structured Pruning for Large Language Models	Mar 12, 2026	Unverified	0
Diffusion Models Generalize but Not in the Way You Might Think	Mar 12, 2026	Unverified	0
Generalization and Memorization in Rectified Flow	Mar 12, 2026	Unverified	0
Projection Guided Personalized Federated Learning for Low Dose CT Denoising	Mar 12, 2026	Unverified	0
Distance-aware Soft Prompt Learning for Multimodal Valence-Arousal Estimation	Mar 12, 2026	Unverified	0
Anchor Forcing: Anchor Memory and Tri-Region RoPE for Interactive Streaming Video Diffusion	Mar 12, 2026	Code Available	0
Nuanced Emotion Recognition Based on a Segment-based MLLM Framework Leveraging Qwen3-Omni for AH Detection	Mar 12, 2026	Code Available	0
CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges	Mar 12, 2026	Unverified	1
Distribution estimation via Flow Matching with Lipschitz guarantees	Mar 12, 2026	Unverified	0
Impact of Markov Decision Process Design on Sim-to-Real Reinforcement Learning	Mar 12, 2026	Unverified	0
MIMIC: Multimodal Inversion for Model Interpretation and Conceptualization	Mar 12, 2026	Unverified	0