The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 6,226–6,250 of 474,278 papers

Title	Date	Tasks	Status	Hype
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models	Feb 1, 2025	Math	Code Available	2
MetaOpenFOAM 2.0: Large Language Model Driven Chain of Thought for Automating CFD Simulation and Post-Processing	Feb 1, 2025	Language Modeling, Language Modelling	Code Available	2
RaySplats: Ray Tracing based Gaussian Splatting	Jan 31, 2025	3DGS	Code Available	2
TRADES: Generating Realistic Market Simulations with Diffusion Models	Jan 31, 2025	Denoising	Code Available	2
mFollowIR: a Multilingual Benchmark for Instruction Following in Retrieval	Jan 31, 2025	Instruction Following, Retrieval	Code Available	2
Visual Autoregressive Modeling for Image Super-Resolution	Jan 31, 2025	Image Super-Resolution, Quantization	Code Available	2
STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving	Jan 31, 2025	Automated Theorem Proving	Code Available	2
GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling	Jan 31, 2025	Denoising, Gesture Generation	Code Available	2
AIN: The Arabic INclusive Large Multimodal Model	Jan 31, 2025	document understanding, model	Code Available	2
An Adversarial Approach to Register Extreme Resolution Tissue Cleared 3D Brain Images	Jan 31, 2025	Image Registration	Code Available	2
Advancing Dense Endoscopic Reconstruction with Gaussian Splatting-driven Surface Normal-aware Tracking and Mapping	Jan 31, 2025	3DGS, Novel View Synthesis	Code Available	2
Efficient Reasoning with Hidden Thinking	Jan 31, 2025	Decoder, Multimodal Reasoning	Code Available	2
Diverse Preference Optimization	Jan 30, 2025	Diversity	Code Available	2
Free-T2M: Frequency Enhanced Text-to-Motion Diffusion Model With Consistency Loss	Jan 30, 2025	Denoising, Motion Generation	Code Available	2
Track-On: Transformer-based Online Point Tracking with Memory	Jan 30, 2025	Point Tracking	Code Available	2
GuardReasoner: Towards Reasoning-based LLM Safeguards	Jan 30, 2025		Code Available	2
General Scene Adaptation for Vision-and-Language Navigation	Jan 29, 2025	Diversity, Vision and Language Navigation	Code Available	2
Closing the Gap Between Synthetic and Ground Truth Time Series Distributions via Neural Mapping	Jan 29, 2025	Time Series, Time Series Classification	Code Available	2
MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs	Jan 29, 2025	All, Instruction Following	Code Available	2
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate	Jan 29, 2025	Instruction Following, Math	Code Available	2
Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation	Jan 29, 2025	Red Teaming, Safety Alignment	Code Available	2
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders	Jan 29, 2025	Adversarial Attack, Denoising	Code Available	2
SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model	Jan 28, 2025	Benchmarking, Language Modeling	Code Available	2
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders	Jan 28, 2025	Language Modeling, Language Modelling	Code Available	2
CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs	Jan 28, 2025	Hallucination	Code Available	2