The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers


Showing 2,951–3,000 of 659,983 papers

Title	Date	Status	Hype
TALO: Pushing 3D Vision Foundation Models Towards Globally Consistent Online Reconstruction	Mar 18, 2026	Code Available	0
Meta-Reinforcement Learning with Self-Reflection for Agentic Search	Mar 18, 2026	Code Available	0
Rationale Matters: Learning Transferable Rubrics via Proxy-Guided Critique for VLM Reward Models	Mar 18, 2026	Code Available	0
Towards Motion-aware Referring Image Segmentation	Mar 18, 2026	Code Available	0
UniSAFE: A Comprehensive Benchmark for Safety Evaluation of Unified Multimodal Models	Mar 18, 2026	Code Available	0
Procedural Generation of Algorithm Discovery Tasks in Machine Learning	Mar 18, 2026	Code Available	0
Fine-Grained Post-Training Quantization for Large Vision Language Models with Quantization-Aware Integrated Gradients	Mar 18, 2026	Code Available	0
Self-Attention And Beyond the Infinite: Towards Linear Transformers with Infinite Self-Attention	Mar 18, 2026	Code Available	0
Learning to See and Act: Task-Aware Virtual View Exploration for Robotic Manipulation	Mar 18, 2026	Unverified	1
Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery	Mar 18, 2026	Unverified	4
Complementary Reinforcement Learning	Mar 18, 2026	Unverified	1
Stereo World Model: Camera-Guided Stereo Video Generation	Mar 18, 2026	Unverified	1
Tree Search for LLM Agent Reinforcement Learning	Mar 18, 2026	Unverified	3
Generative Refocusing: Flexible Defocus Control from a Single Image	Mar 18, 2026	Unverified	3
Learning Goal-Oriented Vision-and-Language Navigation with Self-Improving Demonstrations at Scale	Mar 18, 2026	Unverified	1
FoMo X: Modular Explainability Signals for Outlier Detection Foundation Models	Mar 18, 2026	Unverified	0
Parameter-Efficient Modality-Balanced Symmetric Fusion for Multimodal Remote Sensing Semantic Segmentation	Mar 18, 2026	Code Available	0
Between the Layers Lies the Truth: Uncertainty Estimation in LLMs Using Intra-Layer Local Information Scores	Mar 17, 2026	Unverified	0
Scaling Attention via Feature Sparsity	Mar 17, 2026	Unverified	0
Latent Semantic Manifolds in Large Language Models	Mar 17, 2026	Unverified	0
Research on Individual Trait Clustering and Development Pathway Adaptation Based on the K-means Algorithm	Mar 17, 2026	Unverified	0
Sample Transform Cost-Based Training-Free Hallucination Detector for Large Language Models	Mar 17, 2026	Unverified	0
Mitigating Premature Discretization with Progressive Quantization for Robust Vector Tokenization	Mar 17, 2026	Unverified	0
Mix-and-Match Pruning: Globally Guided Layer-Wise Sparsification of DNNs	Mar 17, 2026	Unverified	0
Learning Communication Between Heterogeneous Agents in Multi-Agent Reinforcement Learning for Autonomous Cyber Defence	Mar 17, 2026	Unverified	0
Efficient AI-Driven Multi-Section Whole Slide Image Analysis for Biochemical Recurrence Prediction in Prostate Cancer	Mar 17, 2026	Unverified	0
Solomonoff induction	Mar 17, 2026	Unverified	0
Understanding Pruning Regimes in Vision-Language Models Through Domain-Aware Layer Selection	Mar 17, 2026	Unverified	0
Me, Myself, and π : Evaluating and Explaining LLM Introspection	Mar 17, 2026	Unverified	0
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis	Mar 17, 2026	Unverified	0
A General Deep Learning Framework for Wireless Resource Allocation under Discrete Constraints	Mar 17, 2026	Unverified	0
Prompt-tuning with Attribute Guidance for Low-resource Entity Matching	Mar 17, 2026	Unverified	0
Target Concept Tuning Improves Extreme Weather Forecasting	Mar 17, 2026	Unverified	0
An FPGA-Based SoC Architecture with a RISC-V Controller for Energy-Efficient Temporal-Coding Spiking Neural Networks	Mar 17, 2026	Unverified	0
NANOZK: Layerwise Zero-Knowledge Proofs for Verifiable Large Language Model Inference	Mar 17, 2026	Unverified	0
DEAF: A Benchmark for Diagnostic Evaluation of Acoustic Faithfulness in Audio Language Models	Mar 17, 2026	Unverified	0
Auditing the Auditors: Does Community-based Moderation Get It Right?	Mar 17, 2026	Unverified	0
MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal Scaling of Diffusion Language Models	Mar 17, 2026	Unverified	1
TCATSeg: A Tooth Center-Wise Attention Network for 3D Dental Model Semantic Segmentation	Mar 17, 2026	Unverified	0
Beyond Accuracy: Evaluating Forecasting Models by Multi-Echelon Inventory Cost	Mar 17, 2026	Unverified	0
Closed-Loop Action Chunks with Dynamic Corrections for Training-Free Diffusion Policy	Mar 17, 2026	Code Available	0
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild	Mar 17, 2026	Code Available	0
LLM NL2SQL Robustness: Surface Noise vs. Linguistic Variation in Traditional and Agentic Settings	Mar 17, 2026	Unverified	0
Learning the Intrinsic Dimensionality of Fermi-Pasta-Ulam-Tsingou Trajectories: A Nonlinear Approach using a Deep Autoencoder Model	Mar 17, 2026	Unverified	0
Learning through Creation: A Hash-Free Framework for On-the-Fly Category Discovery	Mar 17, 2026	Code Available	0
Who Benchmarks the Benchmarks? A Case Study of LLM Evaluation in Icelandic	Mar 17, 2026	Unverified	0
SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation	Mar 17, 2026	Unverified	3
Noise-Response Calibration: A Causal Intervention Protocol for LLM-Judges	Mar 17, 2026	Unverified	0
Arabic Morphosyntactic Tagging and Dependency Parsing with Large Language Models	Mar 17, 2026	Unverified	0
Cascade-Aware Multi-Agent Routing: Spatio-Temporal Sidecars and Geometry-Switching	Mar 17, 2026	Unverified	0