The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 17901–17950 of 474278 papers

Title	Date	Tasks	Status	Hype
Identifying and Understanding Cross-Class Features in Adversarial Training	Jun 5, 2025	Robust classification	Code Available	0
From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos	Jun 5, 2025	Action Classification, Composed Video Retrieval (CoVR)	Code Available	0
A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values	Jun 5, 2025	Benchmarking	Unverified	0
Tight analyses of first-order methods with error feedback	Jun 5, 2025		Code Available	0
Inference economics of language models	Jun 5, 2025		Code Available	0
User Altruism in Recommendation Systems	Jun 5, 2025	Recommendation Systems	Code Available	0
Olfactory Inertial Odometry: Sensor Calibration and Drift Compensation	Jun 5, 2025	Navigate	Unverified	0
When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models	Jun 5, 2025	Hallucination, Misinformation	Unverified	0
FlashDMoE: Fast Distributed MoE in a Single Kernel	Jun 5, 2025	16k, CPU	Code Available	3
Progressive Tempering Sampler with Diffusion	Jun 5, 2025		Code Available	1
OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View	Jun 5, 2025	3D Reconstruction	Code Available	1
EMBER2024 -- A Benchmark Dataset for Holistic Evaluation of Malware Classifiers	Jun 5, 2025	Malware Analysis, Malware Classification	Code Available	2
Learning Monotonic Probabilities with a Generative Cost Model	Jun 4, 2025		Code Available	0
Video, How Do Your Tokens Merge?	Jun 4, 2025		Code Available	0
Pre^3: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation	Jun 4, 2025		Code Available	7
Debate, Reflect, and Distill: Multi-Agent Feedback with Tree-Structured Preference Optimization for Efficient Language Model Enhancement	Jun 4, 2025	Knowledge Distillation, Language Modeling	Unverified	0
Go-Browse: Training Web Agents with Structured Exploration	Jun 4, 2025	Efficient Exploration, Language Modeling	Unverified	0
Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation	Jun 4, 2025	Small Language Model, text-classification	Code Available	1
OSGNet @ Ego4D Episodic Memory Challenge 2025	Jun 4, 2025	Moment Queries, Natural Language Queries	Code Available	1
Self-Composing Policies for Scalable Continual Reinforcement Learning	Jun 4, 2025	continuous-control, Continuous Control	Unverified	0
RedDebate: Safer Responses through Multi-Agent Red Teaming Debates	Jun 4, 2025	Red Teaming	Code Available	0
MANBench: Is Your Multimodal Model Smarter than Human?	Jun 4, 2025	model	Code Available	0
Delta-KNN: Improving Demonstration Selection in In-Context Learning for Alzheimer's Disease Detection	Jun 4, 2025	Alzheimer's Disease Detection, In-Context Learning	Unverified	0
AI Agents for Conversational Patient Triage: Preliminary Simulation-Based Evaluation with Real-World EHR Data	Jun 4, 2025	AI Agent	Unverified	0
Impact of Hill coefficient and time delay on a perceptual decision-making model	Jun 4, 2025	Decision Making	Unverified	0
Fifteen Years of Child-Centered Long-Form Recordings: Promises, Resources, and Remaining Challenges to Validity	Jun 4, 2025	Form	Unverified	0
Challenges in Automated Processing of Speech from Child Wearables: The Case of Voice Type Classifier	Jun 4, 2025	Blocking	Unverified	0
Improving Child Speech Recognition and Reading Mistake Detection by Using Prompts	Jun 4, 2025	Mistake Detection, speech-recognition	Unverified	0
MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos	Jun 4, 2025	Multimodal Reasoning	Unverified	0
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning	Jun 4, 2025	Object, Referring Expression	Unverified	0
Long or short CoT? Investigating Instance-level Switch of Large Reasoning Models	Jun 4, 2025	Computational Efficiency	Unverified	0
AD-EE: Early Exiting for Fast and Reliable Vision-Language Models in Autonomous Driving	Jun 4, 2025	Autonomous Driving, Causal Inference	Unverified	0
SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models	Jun 4, 2025	Form, Text Generation	Code Available	1
LeanExplore: A search engine for Lean 4 declarations	Jun 4, 2025	Automated Theorem Proving	Code Available	2
Similarity-based fuzzy clustering scientific articles: potentials and challenges from mathematical and computational perspectives	Jun 4, 2025	Articles, GPU	Unverified	0
Frame-Level Real-Time Assessment of Stroke Rehabilitation Exercises from Video-Level Labeled Data: Task-Specific vs. Foundation Models	Jun 4, 2025	Pseudo Label	Unverified	0
A Comprehensive Study on Medical Image Segmentation using Deep Neural Networks	Jun 4, 2025	Ethics, Explainable artificial intelligence	Unverified	0
MudiNet: Task-guided Disentangled Representation Learning for 5G Indoor Multipath-assisted Positioning	Jun 4, 2025	Representation Learning, Variational Inference	Unverified	0
SVD-Based Graph Fractional Fourier Transform on Directed Graphs and Its Application	Jun 4, 2025	Denoising	Unverified	0
Spatiotemporal Prediction of Electric Vehicle Charging Load Based on Large Language Models	Jun 4, 2025	Scheduling	Unverified	0
High-Speed Ultra-Energy-Efficient Memristor-Based Massive MIMO SIC Detector Circuit with Hybrid Analog-Digital Computing Architecture	Jun 4, 2025	GPU	Unverified	0
Learning Fair And Effective Points-Based Rewards Programs	Jun 4, 2025	Fairness	Unverified	0
A note on metapopulation models	Jun 4, 2025	Epidemiology	Unverified	0
Generalized Lotka-Volterra systems with quenched random interactions and saturating functional response	Jun 4, 2025	Diversity	Unverified	0
HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset	Jun 4, 2025	Speech Synthesis, text-to-speech	Unverified	0
Phi-Omni-ST: A multimodal language model for direct speech-to-speech translation	Jun 4, 2025	Language Modeling, Language Modelling	Unverified	0
The mutual exclusivity bias of bilingual visually grounded speech models	Jun 4, 2025		Code Available	0
Latent Guided Sampling for Combinatorial Optimization	Jun 4, 2025	Combinatorial Optimization, Drug Discovery	Code Available	0
Identification of RIS-Assisted Paths for Wireless Integrated Sensing and Communication	Jun 4, 2025	Integrated sensing and communication	Unverified	0
ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning	Jun 4, 2025	Image Generation, Visual Reasoning	Code Available	0