The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 15,051–15,100 of 474,278 papers

Title	Date	Tasks	Status	Hype
Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?	Jun 20, 2025	Book summarization, Long-Context Understanding	Code Available	1
Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings	Jun 20, 2025	Graph Neural Network	Code Available	1
A Large-Scale Real-World Evaluation of LLM-Based Virtual Teaching Assistant	Jun 20, 2025		Code Available	1
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation	Jun 20, 2025	Representation Learning	Code Available	1
R3eVision: A Survey on Robust Rendering, Restoration, and Enhancement for 3D Low-Level Vision	Jun 19, 2025	3DGS, 3D Reconstruction	Code Available	1
Large Language Models are Near-Optimal Decision-Makers with a Non-Human Learning Behavior	Jun 19, 2025	Decision Making	Code Available	1
Adversarial Attacks and Detection in Visual Place Recognition for Safer Robot Navigation	Jun 19, 2025	Adversarial Attack, Robot Navigation	Code Available	1
DiffO: Single-step Diffusion for Image Compression at Ultra-Low Bitrates	Jun 19, 2025	Denoising, Image Compression	Code Available	1
Dense 3D Displacement Estimation for Landslide Monitoring via Fusion of TLS Point Clouds and Embedded RGB Images	Jun 19, 2025	3D geometry	Code Available	1
The Condition Number as a Scale-Invariant Proxy for Information Encoding in Neural Units	Jun 19, 2025	Large Language Model, Multimodal Large Language Model	Code Available	1
Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models	Jun 19, 2025	Large Language Model, Safety Alignment	Code Available	1
EndoMUST: Monocular Depth Estimation for Robotic Endoscopy via End-to-end Multi-step Self-supervised Training	Jun 19, 2025	Depth Estimation, Intrinsic Image Decomposition	Code Available	1
Probing the Robustness of Large Language Models Safety to Latent Perturbations	Jun 19, 2025	Diagnostic, Safety Alignment	Code Available	1
LMR-BENCH: Evaluating LLM Agent's Ability on Reproducing Language Modeling Research	Jun 19, 2025	Language Modeling, Language Modelling	Code Available	1
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems	Jun 19, 2025	Benchmarking, Descriptive	Code Available	1
OJBench: A Competition Level Code Benchmark For Large Language Models	Jun 19, 2025	Math	Code Available	1
On using AI for EEG-based BCI applications: problems, current challenges and future trends	Jun 19, 2025	EEG	Code Available	1
StoryWriter: A Multi-Agent Framework for Long Story Generation	Jun 19, 2025	Story Generation	Code Available	1
Diffusion-based Counterfactual Augmentation: Towards Robust and Interpretable Knee Osteoarthritis Grading	Jun 18, 2025	Clinical Knowledge, counterfactual	Code Available	1
Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model	Jun 18, 2025	Image Generation	Code Available	1
All is Not Lost: LLM Recovery without Checkpoints	Jun 18, 2025	All, Scheduling	Code Available	1
GRAM: A Generative Foundation Reward Model for Reward Generalization	Jun 17, 2025		Code Available	1
Equivariance Everywhere All At Once: A Recipe for Graph Foundation Models	Jun 17, 2025	All, Node Classification	Code Available	1
Refining music sample identification with a self-supervised graph neural network	Jun 17, 2025	Contrastive Learning, Graph Neural Network	Code Available	1
Sampling from Your Language Model One Byte at a Time	Jun 17, 2025	Code Generation, Language Modeling	Code Available	1
TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization	Jun 17, 2025		Code Available	1
A Variational Framework for Improving Naturalness in Generative Spoken Language Models	Jun 17, 2025		Code Available	1
Optimizing Length Compression in Large Reasoning Models	Jun 17, 2025		Code Available	1
Unsupervised Imaging Inverse Problems with Diffusion Distribution Matching	Jun 17, 2025	Blind Super-Resolution, Deblurring	Code Available	1
Déjà Vu: Efficient Video-Language Query Engine with Learning-based Inter-Frame Computation Reuse	Jun 17, 2025		Code Available	1
MOL: Joint Estimation of Micro-Expression, Optical Flow, and Landmark via Transformer-Graph-Style Convolution	Jun 17, 2025	Facial Landmark Detection, Micro Expression Recognition	Code Available	1
GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies	Jun 17, 2025	Benchmarking	Code Available	1
3DGS-IEval-15K: A Large-scale Image Quality Evaluation Database for 3D Gaussian-Splatting	Jun 17, 2025	3DGS, Image Quality Assessment	Code Available	1
AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents	Jun 17, 2025		Code Available	1
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team	Jun 17, 2025	Code Generation, GSM8K	Code Available	1
RMIT-ADM+S at the SIGIR 2025 LiveRAG Challenge	Jun 17, 2025	Answer Generation, Language Modeling	Code Available	1
SeqPE: Transformer with Sequential Position Encoding	Jun 16, 2025	image-classification, Image Classification	Code Available	1
COME: Adding Scene-Centric Forecasting Control to Occupancy World Model	Jun 16, 2025	Autonomous Driving, Representation Learning	Code Available	1
PeakWeather: MeteoSwiss Weather Station Measurements for Spatiotemporal Deep Learning	Jun 16, 2025	Deep Learning, Graph structure learning	Code Available	1
Self-Supervised Enhancement for Depth from a Lightweight ToF Sensor with Monocular Images	Jun 16, 2025	Depth Estimation, Self-Supervised Learning	Code Available	1
TR2M: Transferring Monocular Relative Depth to Metric Depth with Language Descriptions and Scale-Oriented Contrast	Jun 16, 2025	Contrastive Learning, Depth Estimation	Code Available	1
The Price of Freedom: Exploring Expressivity and Runtime Tradeoffs in Equivariant Tensor Products	Jun 16, 2025	Benchmarking	Code Available	1
Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs	Jun 16, 2025	Machine Unlearning	Code Available	1
Tady: A Neural Disassembler without Structural Constraint Violations	Jun 16, 2025		Code Available	1
Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers	Jun 16, 2025	Fact Checking, Fact Verification	Code Available	1
Steering LLM Thinking with Budget Guidance	Jun 16, 2025	Math	Code Available	1
SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement	Jun 16, 2025	document understanding, Question Answering	Code Available	1
Rectifying Privacy and Efficacy Measurements in Machine Unlearning: A New Inference Attack Perspective	Jun 16, 2025	Inference Attack, Machine Unlearning	Code Available	1
RealHiTBench: A Comprehensive Realistic Hierarchical Table Benchmark for Evaluating LLM-Based Table Analysis	Jun 16, 2025		Code Available	1
Curriculum Learning for Biological Sequence Prediction: The Case of De Novo Peptide Sequencing	Jun 16, 2025	de novo peptide sequencing	Code Available	1