The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 15751–15800 of 474278 papers

Title	Date	Tasks	Status	Hype
Backdoor Cleaning without External Guidance in MLLM Fine-tuning	May 22, 2025		Code Available	1
KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning	May 22, 2025	Mathematical Reasoning, reinforcement-learning	Code Available	1
Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning	May 22, 2025		Code Available	1
A Comprehensive Evaluation of Contemporary ML-Based Solvers for Combinatorial Optimization	May 22, 2025	Combinatorial Optimization, Language Modeling	Code Available	1
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework	May 22, 2025	Multiple-choice, Visual Question Answering (VQA)	Code Available	1
MPO: Multilingual Safety Alignment via Reward Gap Optimization	May 22, 2025	Safety Alignment	Code Available	1
CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark	May 22, 2025	GPU, Translation	Code Available	1
Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning	May 22, 2025	Form, Question Answering	Code Available	1
From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Pedagogical Visualization	May 22, 2025	Visual Reasoning	Code Available	1
UFT: Unifying Supervised and Reinforcement Fine-Tuning	May 22, 2025		Code Available	1
RE-TRIP : Reflectivity Instance Augmented Triangle Descriptor for 3D Place Recognition	May 22, 2025	3D Place Recognition, Instance Segmentation	Code Available	1
Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space	May 22, 2025	Video Reconstruction	Code Available	1
Do Large Language Models Excel in Complex Logical Reasoning with Formal Language?	May 22, 2025	Logical Reasoning	Code Available	1
ICYM2I: The illusion of multimodal informativeness under missingness	May 22, 2025	Informativeness	Code Available	1
AdvReal: Adversarial Patch Generation Framework with Application to Adversarial Safety Evaluation of Object Detection Systems	May 22, 2025	Autonomous Vehicles, object-detection	Code Available	1
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark	May 22, 2025	document understanding, Multimodal Reasoning	Code Available	1
V2V: Scaling Event-Based Vision through Efficient Video-to-Voxel Simulation	May 22, 2025	Event-based vision, Optical Flow Estimation	Code Available	1
Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs	May 21, 2025	Knowledge Distillation, Knowledge Graphs	Code Available	1
The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation	May 21, 2025	Answer Generation, In-Context Learning	Code Available	1
GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents	May 21, 2025	Answer Generation, Reinforcement Learning (RL)	Code Available	1
CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation	May 21, 2025	Video Generation	Code Available	1
SAMA-UNet: Enhancing Medical Image Segmentation with Self-Adaptive Mamba-Like Attention and Causal-Resonance Learning	May 21, 2025	Image Segmentation, Mamba	Code Available	1
From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning	May 21, 2025	Question Answering, Reinforcement Learning (RL)	Code Available	1
Streamline Without Sacrifice -- Squeeze out Computation Redundancy in LMM	May 21, 2025	Decoder, Token Reduction	Code Available	1
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning	May 21, 2025	Math	Code Available	1
Steering Generative Models with Experimental Data for Protein Fitness Optimization	May 21, 2025	Bayesian Optimization, Thompson Sampling	Code Available	1
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!	May 21, 2025		Code Available	1
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning	May 21, 2025	General Reinforcement Learning, Logical Reasoning	Code Available	1
Angle Domain Guidance: Latent Diffusion Requires Rotation Rather Than Extrapolation	May 21, 2025	Image Generation	Code Available	1
How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior	May 21, 2025	Large Language Model, Management	Code Available	1
ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges	May 21, 2025	Math, valid	Code Available	1
Sonnet: Spectral Operator Neural Network for Multivariable Time Series Forecasting	May 21, 2025	Time Series, Time Series Forecasting	Code Available	1
Continuous Representation Methods, Theories, and Applications: An Overview and Perspectives	May 21, 2025	Image Restoration, Novel View Synthesis	Code Available	1
Intentional Gesture: Deliver Your Intentions with Gestures for Speech	May 21, 2025	Gesture Generation	Code Available	1
Learning to Reason via Mixture-of-Thought for Logical Reasoning	May 21, 2025	Logical Reasoning, Natural Language Inference	Code Available	1
The Devil is in Fine-tuning and Long-tailed Problems:A New Benchmark for Scene Text Detection	May 21, 2025	Scene Text Detection, Self-Supervised Learning	Code Available	1
X-GRM: Large Gaussian Reconstruction Model for Sparse-view X-rays to Computed Tomography	May 21, 2025	CT Reconstruction	Code Available	1
PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration	May 21, 2025	Large Language Model, scientific discovery	Code Available	1
Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey	May 21, 2025	Fairness, Survey	Code Available	1
DeepKD: A Deeply Decoupled and Denoised Knowledge Distillation Trainer	May 21, 2025	Denoising, Knowledge Distillation	Code Available	1
Pre-training Large Memory Language Models with Internal and External Knowledge	May 21, 2025	Memorization	Code Available	1
ThinkRec: Thinking-based recommendation via LLM	May 21, 2025	Text Generation	Code Available	1
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models	May 21, 2025	Bayesian Optimization, Speech Synthesis	Code Available	1
Multimodal Conditional Information Bottleneck for Generalizable AI-Generated Image Detection	May 21, 2025		Code Available	1
Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries	May 21, 2025	RAG, Retrieval-augmented Generation	Code Available	1
HopWeaver: Synthesizing Authentic Multi-Hop Questions Across Text Corpora	May 21, 2025	Multi-hop Question Answering, Question Answering	Code Available	1
RLBenchNet: The Right Network for the Right Reinforcement Learning Task	May 21, 2025	continuous-control, Continuous Control	Code Available	1
Training Step-Level Reasoning Verifiers with Formal Verification Tools	May 21, 2025	Formal Logic, Math	Code Available	1
UWSAM: Segment Anything Model Guided Underwater Instance Segmentation and A Large-scale Benchmark Dataset	May 21, 2025	Instance Segmentation, Knowledge Distillation	Code Available	1
Stronger ViTs With Octic Equivariance	May 21, 2025	Inductive Bias, Self-Supervised Learning	Code Available	1