The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers


Showing 6201–6250 of 661,570 papers

Title	Date	Tasks	Status	Hype
Sparse Autoencoders for Hypothesis Generation	Feb 5, 2025		Code Available	2
The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering	Feb 5, 2025	Hallucination	Code Available	2
Seeing World Dynamics in a Nutshell	Feb 5, 2025	Video Reconstruction	Code Available	2
CTR-Driven Advertising Image Generation with Multimodal Large Language Models	Feb 5, 2025	Image Generation, Reinforcement Learning (RL)	Code Available	2
Honegumi: An Interface for Accelerating the Adoption of Bayesian Optimization in the Experimental Sciences	Feb 4, 2025	Bayesian Optimization, Experimental Design	Code Available	2
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search	Feb 4, 2025		Code Available	2
STAIR: Improving Safety Alignment with Introspective Reasoning	Feb 4, 2025	Safety Alignment	Code Available	2
Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs	Feb 4, 2025	Code Generation, Language Modeling	Code Available	2
CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance	Feb 4, 2025	Code Generation, Text Generation	Code Available	2
On the Guidance of Flow Matching	Feb 4, 2025	Decision Making, Image Generation	Code Available	2
Reviving The Classics: Active Reward Modeling in Large Language Model Alignment	Feb 4, 2025	Computational Efficiency, Experimental Design	Code Available	2
Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation	Feb 4, 2025	Denoising, Domain Generalization	Code Available	2
Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization	Feb 3, 2025	model	Code Available	2
Compressed Image Generation with Denoising Diffusion Codebook Models	Feb 3, 2025	Conditional Image Generation, Denoising	Code Available	2
Efficient Diffusion Models: A Survey	Feb 3, 2025	Survey	Code Available	2
Towards Robust and Generalizable Lensless Imaging with Modular Learned Reconstruction	Feb 3, 2025	Transfer Learning	Code Available	2
Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding	Feb 3, 2025	Quantization	Code Available	2
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles	Feb 3, 2025	ARC, Multimodal Reasoning	Code Available	2
LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer	Feb 3, 2025		Code Available	2
Preference Leakage: A Contamination Problem in LLM-as-a-judge	Feb 3, 2025		Code Available	2
When Do LLMs Help With Node Classification? A Comprehensive Analysis	Feb 2, 2025	Node Classification	Code Available	2
LEAD: Large Foundation Model for EEG-Based Alzheimer's Disease Detection	Feb 2, 2025	Alzheimer's Disease Detection, EEG	Code Available	2
FlexCloud: Direct, Modular Georeferencing and Drift-Correction of Point Cloud Maps	Feb 1, 2025	Autonomous Driving, motion prediction	Code Available	2
Segment Anything for Histopathology	Feb 1, 2025	Image Segmentation, Instance Segmentation	Code Available	2
MetaOpenFOAM 2.0: Large Language Model Driven Chain of Thought for Automating CFD Simulation and Post-Processing	Feb 1, 2025	Language Modeling, Language Modelling	Code Available	2
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models	Feb 1, 2025	Math	Code Available	2
PyMOLfold: Interactive Protein and Ligand Structure Prediction in PyMOL	Feb 1, 2025	Prediction, Protein Folding	Code Available	2
GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling	Jan 31, 2025	Denoising, Gesture Generation	Code Available	2
RaySplats: Ray Tracing based Gaussian Splatting	Jan 31, 2025	3DGS	Code Available	2
Advancing Dense Endoscopic Reconstruction with Gaussian Splatting-driven Surface Normal-aware Tracking and Mapping	Jan 31, 2025	3DGS, Novel View Synthesis	Code Available	2
Efficient Reasoning with Hidden Thinking	Jan 31, 2025	Decoder, Multimodal Reasoning	Code Available	2
Visual Autoregressive Modeling for Image Super-Resolution	Jan 31, 2025	Image Super-Resolution, Quantization	Code Available	2
mFollowIR: a Multilingual Benchmark for Instruction Following in Retrieval	Jan 31, 2025	Instruction Following, Retrieval	Code Available	2
An Adversarial Approach to Register Extreme Resolution Tissue Cleared 3D Brain Images	Jan 31, 2025	Image Registration	Code Available	2
TRADES: Generating Realistic Market Simulations with Diffusion Models	Jan 31, 2025	Denoising	Code Available	2
STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving	Jan 31, 2025	Automated Theorem Proving	Code Available	2
AIN: The Arabic INclusive Large Multimodal Model	Jan 31, 2025	document understanding, model	Code Available	2
Diverse Preference Optimization	Jan 30, 2025	Diversity	Code Available	2
Track-On: Transformer-based Online Point Tracking with Memory	Jan 30, 2025	Point Tracking	Code Available	2
Free-T2M: Frequency Enhanced Text-to-Motion Diffusion Model With Consistency Loss	Jan 30, 2025	Denoising, Motion Generation	Code Available	2
GuardReasoner: Towards Reasoning-based LLM Safeguards	Jan 30, 2025		Code Available	2
Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation	Jan 29, 2025	Red Teaming, Safety Alignment	Code Available	2
General Scene Adaptation for Vision-and-Language Navigation	Jan 29, 2025	Diversity, Vision and Language Navigation	Code Available	2
Closing the Gap Between Synthetic and Ground Truth Time Series Distributions via Neural Mapping	Jan 29, 2025	Time Series, Time Series Classification	Code Available	2
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate	Jan 29, 2025	Instruction Following, Math	Code Available	2
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders	Jan 29, 2025	Adversarial Attack, Denoising	Code Available	2
MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs	Jan 29, 2025	All, Instruction Following	Code Available	2
CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs	Jan 28, 2025	Hallucination	Code Available	2
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders	Jan 28, 2025	Language Modeling, Language Modelling	Code Available	2
SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model	Jan 28, 2025	Benchmarking, Language Modeling	Code Available	2