The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 17,751–17,800 of 474,278 papers

Title	Date	Tasks	Status	Hype
ICPC-Eval: Probing the Frontiers of LLM Reasoning with Competitive Programming Contests	Jun 5, 2025	Code Generation	Unverified	0
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training	Jun 5, 2025	Language Modeling, Language Modelling	Unverified	0
MMRefine: Unveiling the Obstacles to Robust Refinement in Multimodal Large Language Models	Jun 5, 2025		Code Available	0
Kernel k-Medoids as General Vector Quantization	Jun 5, 2025	Data Compression, Density Estimation	Unverified	0
DeePoly: A High-Order Accuracy Scientific Machine Learning Framework for Function Approximation and Solving PDEs	Jun 5, 2025	Computational Efficiency	Code Available	1
Learning normalized image densities via dual score matching	Jun 5, 2025	Denoising	Code Available	0
SeedEdit 3.0: Fast and High-Quality Generative Image Editing	Jun 5, 2025	Instruction Following	Unverified	0
OpenGT: A Comprehensive Benchmark For Graph Transformers	Jun 5, 2025	Fairness	Code Available	1
EMO-Debias: Benchmarking Gender Debiasing Techniques in Multi-Label Speech Emotion Recognition	Jun 5, 2025	Benchmarking, Emotion Recognition	Unverified	0
Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design	Jun 5, 2025	All	Unverified	0
Mitigating Degree Bias Adaptively with Hard-to-Learn Nodes in Graph Contrastive Learning	Jun 5, 2025	Contrastive Learning, Node Classification	Unverified	0
Influence Functions for Edge Edits in Non-Convex Graph Neural Networks	Jun 5, 2025	Prediction	Unverified	0
FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing	Jun 5, 2025	Text-to-Video Editing, Video Editing	Unverified	0
Are LLMs Reliable Translators of Logical Reasoning Across Lexically Diversified Contexts?	Jun 5, 2025	Formal Logic, In-Context Learning	Code Available	0
Reliably detecting model failures in deployment without labels	Jun 5, 2025		Code Available	0
Towards Vision-Language-Garment Models For Web Knowledge Garment Understanding and Generation	Jun 5, 2025	Zero-shot Generalization	Unverified	0
Information Locality as an Inductive Bias for Neural Language Models	Jun 5, 2025	Inductive Bias	Code Available	0
Please Translate Again: Two Simple Experiments on Whether Human-Like Reasoning Helps Translation	Jun 5, 2025	Translation	Unverified	0
SSA-COMET: Do LLMs Outperform Learned Metrics in Evaluating MT for Under-Resourced African Languages?	Jun 5, 2025	Machine Translation, Sentence	Unverified	0
MuSciClaims: Multimodal Scientific Claim Verification	Jun 5, 2025	Articles, Claim Verification	Unverified	0
Revisiting Test-Time Scaling: A Survey and a Diversity-Aware Method for Efficient Reasoning	Jun 5, 2025	Diversity, Mathematical Reasoning	Unverified	0
TaDA: Training-free recipe for Decoding with Adaptive KV Cache Compression and Mean-centering	Jun 5, 2025	Quantization	Unverified	0
Evaluating Vision-Language and Large Language Models for Automated Student Assessment in Indonesian Classrooms	Jun 5, 2025	Multiple-choice	Unverified	0
ConECT Dataset: Overcoming Data Scarcity in Context-Aware E-Commerce MT	Jun 5, 2025	Language Modeling, Language Modelling	Unverified	0
SCOP: Evaluating the Comprehension Process of Large Language Models from a Cognitive View	Jun 5, 2025	Reading Comprehension	Unverified	0
Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers	Jun 5, 2025	GSM8K, Math	Unverified	0
Does It Make Sense to Speak of Introspection in Large Language Models?	Jun 5, 2025	valid	Unverified	0
RIVAL: Reinforcement Learning with Iterative and Adversarial Optimization for Machine Translation	Jun 5, 2025	Machine Translation, Translation	Unverified	0
IIITH-BUT system for IWSLT 2025 low-resource Bhojpuri to Hindi speech translation	Jun 5, 2025	Data Augmentation, Translation	Unverified	0
Evaluating the Effectiveness of Linguistic Knowledge in Pretrained Language Models: A Case Study of Universal Dependencies	Jun 5, 2025	Paraphrase Identification	Unverified	0
Improving Low-Resource Morphological Inflection via Self-Supervised Objectives	Jun 5, 2025	Decoder, Language Modeling	Unverified	0
Seeing the Invisible: Machine learning-Based QPI Kernel Extraction via Latent Alignment	Jun 5, 2025	Representation Learning	Unverified	0
TIMING: Temporality-Aware Integrated Gradients for Time Series Explanation	Jun 5, 2025	Explainable artificial intelligence, Explainable Artificial Intelligence (XAI)	Code Available	1
AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model	Jun 5, 2025	Decoder, Image Generation	Code Available	2
Exploring Diffusion Transformer Designs via Grafting	Jun 5, 2025		Code Available	2
ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development	Jun 5, 2025	Large Language Model	Code Available	7
A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search	Jun 5, 2025	Imitation Learning	Code Available	2
Advancing Tool-Augmented Large Language Models via Meta-Verification and Reflection Learning	Jun 5, 2025	Imitation Learning	Code Available	1
Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts	Jun 5, 2025	GPU, Scheduling	Code Available	1
Kinetics: Rethinking Test-Time Scaling Laws	Jun 5, 2025		Code Available	2
LeanPO: Lean Preference Optimization for Likelihood Alignment in Video-LLMs	Jun 5, 2025		Code Available	0
Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos	Jun 5, 2025	GPU, Semantic Segmentation	Code Available	2
TreeRPO: Tree Relative Policy Optimization	Jun 5, 2025	Math	Code Available	0
iN2V: Bringing Transductive Node Embeddings to Inductive Graphs	Jun 5, 2025	Node Classification, Representation Learning	Code Available	0
Practical Manipulation Model for Robust Deepfake Detection	Jun 5, 2025	DeepFake Detection, Face Swapping	Code Available	0
HoliSafe: Holistic Safety Benchmarking and Modeling with Safety Meta Token for Vision-Language Model	Jun 5, 2025	Benchmarking, Language Modeling	Unverified	0
MineInsight: A Multi-sensor Dataset for Humanitarian Demining Robotics in Off-Road Environments	Jun 5, 2025	Humanitarian, Landmine	Code Available	1
Survey on the Evaluation of Generative Models in Music	Jun 5, 2025	Survey	Unverified	0
Learning Beyond Experience: Generalizing to Unseen State Space with Reservoir Computing	Jun 5, 2025	Domain Generalization	Code Available	0
MARS: Radio Map Super-resolution and Reconstruction Method under Sparse Channel Measurements	Jun 5, 2025	SSIM, Super-Resolution	Unverified	0