The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 17,601–17,650 of 474,278 papers

Title	Date	Tasks	Status	Hype
Co-MTP: A Cooperative Trajectory Prediction Framework with Multi-Temporal Fusion for Autonomous Driving	Feb 23, 2025	Autonomous Driving, Prediction	Code Available	1
Cross-domain Few-shot Object Detection with Multi-modal Textual Enrichment	Feb 23, 2025	Cross-Domain Few-Shot, Cross-Domain Few-Shot Object Detection	Code Available	1
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models	Feb 23, 2025	Code Generation, HumanEval	Code Available	1
BiDeV: Bilateral Defusing Verification for Complex Claim Fact-Checking	Feb 22, 2025	Fact Checking	Code Available	1
MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra	Feb 22, 2025	molecular representation	Code Available	1
CipherFace: A Fully Homomorphic Encryption-Driven Framework for Secure Cloud-Based Facial Recognition	Feb 22, 2025		Code Available	1
TimePFN: Effective Multivariate Time Series Forecasting with Synthetic Data	Feb 22, 2025	Bayesian Inference, Few-Shot Learning	Code Available	1
Linear Attention for Efficient Bidirectional Sequence Modeling	Feb 22, 2025	State Space Models	Code Available	1
Mapping 1,000+ Language Models via the Log-Likelihood Vector	Feb 22, 2025	Text Generation	Code Available	1
Understanding the Emergence of Multimodal Representation Alignment	Feb 22, 2025	Representation Learning	Code Available	1
Int2Int: a framework for mathematics with transformers	Feb 22, 2025		Code Available	1
Weakly Supervised Video Scene Graph Generation via Natural Language Supervision	Feb 21, 2025	Graph Generation, Image Captioning	Code Available	1
CondiQuant: Condition Number Based Low-Bit Quantization for Image Super-Resolution	Feb 21, 2025	Image Super-Resolution, Quantization	Code Available	1
TurboFuzzLLM: Turbocharging Mutation-based Fuzzing for Effectively Jailbreaking Large Language Models in Practice	Feb 21, 2025		Code Available	1
CoT-ICL Lab: A Petri Dish for Studying Chain-of-Thought Learning from In-Context Demonstrations	Feb 21, 2025	Decoder, Diversity	Code Available	1
Lung-DDPM: Semantic Layout-guided Diffusion Models for Thoracic CT Image Synthesis	Feb 21, 2025	Denoising, Image Generation	Code Available	1
KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse	Feb 21, 2025	Question Answering	Code Available	1
Scaling Sparse and Dense Retrieval in Decoder-Only LLMs	Feb 21, 2025	Decoder, Knowledge Distillation	Code Available	1
M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment	Feb 21, 2025	Image Quality Assessment	Code Available	1
SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training	Feb 21, 2025	3D Hand Pose Estimation, Contrastive Learning	Code Available	1
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind	Feb 21, 2025	Math, Mathematical Problem-Solving	Code Available	1
Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing	Feb 21, 2025		Code Available	1
Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing	Feb 21, 2025	Text Generation	Code Available	1
Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning	Feb 21, 2025	Arithmetic Reasoning	Code Available	1
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs	Feb 21, 2025	Benchmarking	Code Available	1
R-LoRA: Random Initialization of Multi-Head LoRA for Multi-Task Learning	Feb 21, 2025	Multi-Task Learning, parameter-efficient fine-tuning	Code Available	1
TabMixer: advancing tabular data analysis with an enhanced MLP-mixer approach	Feb 21, 2025	Computational Efficiency, Deep Learning	Code Available	1
Scale-Free Graph-Language Models	Feb 21, 2025	Graph Generation	Code Available	1
Leader-Follower Formation Tracking Control of Quadrotor UAVs Using Bearing Measurements	Feb 21, 2025	Collision Avoidance	Code Available	1
Self-Taught Agentic Long Context Understanding	Feb 21, 2025	Long-Context Understanding	Code Available	1
ARS: Automatic Routing Solver with Large Language Models	Feb 21, 2025	Language Modeling, Language Modelling	Code Available	1
Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language Models	Feb 21, 2025		Code Available	1
FormalSpecCpp: A Dataset of C++ Formal Specifications created using LLMs	Feb 21, 2025		Code Available	1
Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion	Feb 20, 2025	Depth Estimation, Monocular Depth Estimation	Code Available	1
LabTOP: A Unified Model for Lab Test Outcome Prediction on Electronic Health Records	Feb 20, 2025	Language Modeling, Language Modelling	Code Available	1
How Far are LLMs from Being Our Digital Twins? A Benchmark for Persona-Based Behavior Chain Simulation	Feb 20, 2025	Decision Making	Code Available	1
Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design	Feb 20, 2025	Denoising, Evolutionary Algorithms	Code Available	1
A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models	Feb 20, 2025	Domain Adaptation	Code Available	1
Plan-over-Graph: Towards Parallelable LLM Agent Schedule	Feb 20, 2025	Task Planning	Code Available	1
EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration	Feb 20, 2025	Decoder, Recommendation Systems	Code Available	1
Adaptive Convolution for CNN-based Speech Enhancement Models	Feb 20, 2025	Decoder, Speech Enhancement	Code Available	1
PEARL: Towards Permutation-Resilient LLMs	Feb 20, 2025	In-Context Learning	Code Available	1
Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts	Feb 20, 2025		Code Available	1
Towards Routing and Edge Computing in Satellite-Terrestrial Networks: A Column Generation Approach	Feb 20, 2025	Edge-computing	Code Available	1
InductionBench: LLMs Fail in the Simplest Complexity Class	Feb 20, 2025	scientific discovery	Code Available	1
FacaDiffy: Inpainting Unseen Facade Parts Using Diffusion Models	Feb 20, 2025		Code Available	1
Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information	Feb 20, 2025	Question Answering	Code Available	1
STeCa: Step-level Trajectory Calibration for LLM Agent Learning	Feb 20, 2025	Decision Making, Language Modeling	Code Available	1
Generating π-Functional Molecules Using STGG+ with Active Learning	Feb 20, 2025	Active Learning, reinforcement-learning	Code Available	1
Dynamic Low-Rank Sparse Adaptation for Large Language Models	Feb 20, 2025	CPU, GPU	Code Available	1