The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers


Showing 15,301–15,350 of 474,278 papers

Title	Date	Tasks	Status	Hype
Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay	Jun 5, 2025	Reinforcement Learning (RL)	Code Available	1
Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games	Jun 5, 2025	Action Generation, Asynchronous Group Communication	Code Available	1
FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation	Jun 5, 2025	Denoising, Video Generation	Code Available	1
Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models	Jun 5, 2025	Diagnostic, Hallucination	Code Available	1
Advancing Tool-Augmented Large Language Models via Meta-Verification and Reflection Learning	Jun 5, 2025	Imitation Learning	Code Available	1
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations	Jun 5, 2025	4k, Spatial Reasoning	Code Available	1
OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View	Jun 5, 2025	3D Reconstruction	Code Available	1
Agentomics-ML: Autonomous Machine Learning Experimentation Agent for Genomic and Transcriptomic Data	Jun 5, 2025	Drug Discovery, Large Language Model	Code Available	1
Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts	Jun 5, 2025	GPU, Scheduling	Code Available	1
Progressive Tempering Sampler with Diffusion	Jun 5, 2025		Code Available	1
MineInsight: A Multi-sensor Dataset for Humanitarian Demining Robotics in Off-Road Environments	Jun 5, 2025	Humanitarian, Landmine	Code Available	1
macOSWorld: A Multilingual Interactive Benchmark for GUI Agents	Jun 4, 2025	Benchmarking, Domain Adaptation	Code Available	1
Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning	Jun 4, 2025	Retrieval-augmented Generation	Code Available	1
OSGNet @ Ego4D Episodic Memory Challenge 2025	Jun 4, 2025	Moment Queries, Natural Language Queries	Code Available	1
SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting	Jun 4, 2025	3DGS	Code Available	1
El0ps: An Exact L0-regularized Problems Solver	Jun 4, 2025		Code Available	1
TokAlign: Efficient Vocabulary Adaptation via Token Alignment	Jun 4, 2025	Sentence, Text Compression	Code Available	1
OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis	Jun 4, 2025	Action Generation, Decision Making	Code Available	1
Target Semantics Clustering via Text Representations for Robust Universal Domain Adaptation	Jun 4, 2025	Domain Adaptation, Universal Domain Adaptation	Code Available	1
VLMs Can Aggregate Scattered Training Patches	Jun 4, 2025	Data Poisoning	Code Available	1
A Generic Branch-and-Bound Algorithm for ℓ_0-Penalized Problems with Supplementary Material	Jun 4, 2025		Code Available	1
Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation	Jun 4, 2025	Small Language Model, text-classification	Code Available	1
Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models	Jun 4, 2025	Math	Code Available	1
TracLLM: A Generic Framework for Attributing Long Context LLMs	Jun 4, 2025	Denoising, RAG	Code Available	1
RewardAnything: Generalizable Principle-Following Reward Models	Jun 4, 2025	Instruction Following, Large Language Model	Code Available	1
SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models	Jun 4, 2025	Form, Text Generation	Code Available	1
Even Faster Hyperbolic Random Forests: A Beltrami-Klein Wrapper Approach	Jun 4, 2025		Code Available	1
AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism	Jun 4, 2025		Code Available	1
LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation	Jun 4, 2025	Multiple-choice	Code Available	1
POSS: Position Specialist Generates Better Draft for Speculative Decoding	Jun 4, 2025	Language Modeling, Language Modelling	Code Available	1
Diffusion Domain Teacher: Diffusion Guided Domain Adaptive Object Detector	Jun 4, 2025	Domain Adaptation, object-detection	Code Available	1
ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices	Jun 4, 2025	Position	Code Available	1
Zero-Shot Temporal Interaction Localization for Egocentric Videos	Jun 4, 2025	Action Localization, Human-Object Interaction Detection	Code Available	1
ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions	Jun 3, 2025	Benchmarking, Diversity	Code Available	1
Rethinking Machine Unlearning in Image Generation Models	Jun 3, 2025	Benchmarking, Image Generation	Code Available	1
Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm	Jun 3, 2025	Action Detection, Activity Detection	Code Available	1
FlySearch: Exploring how vision-language models explore	Jun 3, 2025	Hallucination, Task Planning	Code Available	1
Zero-Shot Tree Detection and Segmentation from Aerial Forest Imagery	Jun 3, 2025	Image Segmentation, Segmentation	Code Available	1
UniSite: The First Cross-Structure Dataset and Learning Framework for End-to-End Ligand Binding Site Detection	Jun 3, 2025	Drug Design, Prediction	Code Available	1
SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios	Jun 3, 2025	Motion Generation, Video Generation	Code Available	1
ThinkTank: A Framework for Generalizing Domain-Specific AI Agent Systems into Universal Collaborative Intelligence Platforms	Jun 3, 2025	AI Agent, Retrieval-augmented Generation	Code Available	1
NetPress: Dynamically Generated LLM Benchmarks for Network Applications	Jun 3, 2025	Benchmarking	Code Available	1
Adversarial Attacks on Robotic Vision Language Action Models	Jun 3, 2025	Vision-Language-Action	Code Available	1
GeneA-SLAM2: Dynamic SLAM with AutoEncoder-Preprocessed Genetic Keypoints Resampling and Depth Variance-Guided Dynamic Region Removal	Jun 3, 2025	object-detection, Object Detection	Code Available	1
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation	Jun 3, 2025	Question Answering	Code Available	1
PhysGaia: A Physics-Aware Dataset of Multi-Body Interactions for Dynamic Novel View Synthesis	Jun 3, 2025	Novel View Synthesis, Scene Understanding	Code Available	1
Dense Match Summarization for Faster Two-view Estimation	Jun 3, 2025		Code Available	1
Simple, Good, Fast: Self-Supervised World Models Free of Baggage	Jun 3, 2025	Data Augmentation, Representation Learning	Code Available	1
Adaptive Differential Denoising for Respiratory Sounds Classification	Jun 3, 2025	Audio Classification, Classification	Code Available	1
Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning	Jun 3, 2025		Code Available	1