The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers


Showing 18001–18050 of 474278 papers

Title	Date	Tasks	Status	Hype
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference	Feb 6, 2025	Mixture-of-Experts	Code Available	1
Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization	Feb 6, 2025		Code Available	1
Active Task Disambiguation with LLMs	Feb 6, 2025	Experimental Design, Question Selection	Code Available	1
UltraIF: Advancing Instruction Following from the Wild	Feb 6, 2025	Instruction Following	Code Available	1
Robotouille: An Asynchronous Planning Benchmark for LLM Agents	Feb 6, 2025	Language Modeling, Language Modelling	Code Available	1
HOG-Diff: Higher-Order Guided Diffusion for Graph Generation	Feb 6, 2025	Graph Generation, Image Generation	Code Available	1
MedGNN: Towards Multi-resolution Spatiotemporal Graph Learning for Medical Time Series Classification	Feb 6, 2025	Electrocardiography (ECG), Graph Learning	Code Available	1
Temporal Distribution Shift in Real-World Pharmaceutical Data: Implications for Uncertainty Quantification in QSAR Models	Feb 6, 2025	Drug Discovery, Uncertainty Quantification	Code Available	1
Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions	Feb 6, 2025	Safety Alignment	Code Available	1
Large Language Models for Multi-Robot Systems: A Survey	Feb 6, 2025	Action Generation, Benchmarking	Code Available	1
Content-Rich AIGC Video Quality Assessment via Intricate Text Alignment and Motion-Aware Consistency	Feb 6, 2025	Video Generation, Video Quality Assessment	Code Available	1
Syntriever: How to Train Your Retriever with Synthetic Data from LLMs	Feb 6, 2025	Information Retrieval	Code Available	1
Fine, I'll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging	Feb 6, 2025		Code Available	1
ADIFF: Explaining audio difference using natural language	Feb 6, 2025	AudioCaps, Audio captioning	Code Available	1
AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference	Feb 6, 2025		Code Available	1
MRAMG-Bench: A Comprehensive Benchmark for Advancing Multimodal Retrieval-Augmented Multimodal Generation	Feb 6, 2025	Answer Generation, multimodal generation	Code Available	1
TorchResist: Open-Source Differentiable Resist Simulator	Feb 6, 2025		Code Available	1
STURM-Flood: a curated dataset for deep learning-based flood extent mapping leveraging Sentinel-1 and Sentinel-2 imagery	Feb 6, 2025	Management	Code Available	1
SyMANTIC: An Efficient Symbolic Regression Method for Interpretable and Parsimonious Model Discovery in Science and Beyond	Feb 5, 2025	feature selection, GPU	Code Available	1
Understanding and Enhancing the Transferability of Jailbreaking Attacks	Feb 5, 2025	Intent Recognition, Red Teaming	Code Available	1
Intent Representation Learning with Large Language Model for Recommendation	Feb 5, 2025	Language Modeling, Language Modelling	Code Available	1
SimMark: A Robust Sentence-Level Similarity-Based Watermarking Algorithm for Large Language Models	Feb 5, 2025	Sentence, Sentence Embeddings	Code Available	1
SpaceGNN: Multi-Space Graph Neural Network for Node Anomaly Detection with Extremely Limited Labels	Feb 5, 2025	Anomaly Detection, Data Augmentation	Code Available	1
Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications	Feb 5, 2025	In-Context Learning, Language Modeling	Code Available	1
Do Large Language Model Benchmarks Test Reliability?	Feb 5, 2025	Language Modeling, Language Modelling	Code Available	1
Fine-Tuning Strategies for Continual Online EEG Motor Imagery Decoding: Insights from a Large-Scale Longitudinal Study	Feb 5, 2025	Decoder, Domain Adaptation	Code Available	1
PICBench: Benchmarking LLMs for Photonic Integrated Circuits Design	Feb 5, 2025	Benchmarking, Prompt Engineering	Code Available	1
A Mixture-Based Framework for Guiding Diffusion Models	Feb 5, 2025	Denoising	Code Available	1
Kozax: Flexible and Scalable Genetic Programming in JAX	Feb 5, 2025	GPU	Code Available	1
Gompertz Linear Units: Leveraging Asymmetry for Enhanced Learning Dynamics	Feb 5, 2025	image-classification, Image Classification	Code Available	1
Discrete GCBF Proximal Policy Optimization for Multi-agent Safe Optimal Control	Feb 5, 2025		Code Available	1
A Multi-Task Learning Approach to Linear Multivariate Forecasting	Feb 5, 2025	Multi-Task Learning	Code Available	1
All-in-One Image Compression and Restoration	Feb 5, 2025	All, Image Compression	Code Available	1
CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally	Feb 5, 2025	Attribute, cross-modal alignment	Code Available	1
Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection	Feb 5, 2025	Audio Deepfake Detection, DeepFake Detection	Code Available	1
Interactive Symbolic Regression through Offline Reinforcement Learning: A Co-Design Framework	Feb 5, 2025	Equation Discovery, regression	Code Available	1
iVISPAR -- An Interactive Visual-Spatial Reasoning Benchmark for VLMs	Feb 5, 2025	Spatial Reasoning	Code Available	1
SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models	Feb 5, 2025	valid	Code Available	1
CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing	Feb 4, 2025	Collaborative Inference, Language Modeling	Code Available	1
Mind the Gap: Evaluating Patch Embeddings from General-Purpose and Histopathology Foundation Models for Cell Segmentation and Classification	Feb 4, 2025	Cell Segmentation, Decoder	Code Available	1
SurvHive: a package to consistently access multiple survival-analysis packages	Feb 4, 2025	Survival Analysis	Code Available	1
Transformers Boost the Performance of Decision Trees on Tabular Data across Sample Sizes	Feb 4, 2025	In-Context Learning, Natural Language Understanding	Code Available	1
DAMO: Data- and Model-aware Alignment of Multi-modal LLMs	Feb 4, 2025	Hallucination	Code Available	1
Adaptive Self-improvement LLM Agentic System for ML Library Development	Feb 4, 2025		Code Available	1
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives	Feb 4, 2025	Video Understanding	Code Available	1
Accurate Pocket Identification for Binding-Site-Agnostic Docking	Feb 4, 2025	Blind Docking, Drug Design	Code Available	1
T-SCEND: Test-time Scalable MCTS-enhanced Diffusion Model	Feb 4, 2025	Contrastive Learning, Denoising	Code Available	1
Unified Spatial-Temporal Edge-Enhanced Graph Networks for Pedestrian Trajectory Prediction	Feb 4, 2025	Pedestrian Trajectory Prediction, Trajectory Prediction	Code Available	1
SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset	Feb 4, 2025	3D Object Detection, Autonomous Driving	Code Available	1
EFKAN: A KAN-Integrated Neural Operator For Efficient Magnetotelluric Forward Modeling	Feb 4, 2025		Code Available	1