The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 9,926–9,950 of 474,278 papers

Title	Date	Tasks	Status	Hype
Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)	Feb 16, 2024	Model Editing	Code Available	2
RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model	Feb 16, 2024	Autonomous Driving, Decision Making	Code Available	2
TimeSeriesBench: An Industrial-Grade Benchmark for Time Series Anomaly Detection Models	Feb 16, 2024	Anomaly Detection, Time Series	Code Available	2
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation	Feb 16, 2024	Video Generation	Code Available	2
Linear Transformers with Learnable Kernel Functions are Better In-Context Models	Feb 16, 2024	In-Context Learning, Language Modeling	Code Available	2
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization	Feb 15, 2024	Denoising	Code Available	2
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent	Feb 15, 2024	All, Decision Making	Code Available	2
SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention	Feb 15, 2024	Time Series, Time Series Forecasting	Code Available	2
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator	Feb 15, 2024	Benchmarking, Diagnostic	Code Available	2
X-maps: Direct Depth Lookup for Event-based Structured Light Systems	Feb 15, 2024	Depth Estimation, Disparity Estimation	Code Available	2
PAL: Proxy-Guided Black-Box Attack on Large Language Models	Feb 15, 2024		Code Available	2
Detecting CSV File Dialects by Table Uniformity Measurement and Data Type Inference	Feb 15, 2024	CSV dialect detection	Code Available	2
BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains	Feb 15, 2024	Few-Shot Learning, Medical Question Answering	Code Available	2
A StrongREJECT for Empty Jailbreaks	Feb 15, 2024	MMLU	Code Available	2
Chain-of-Thought Reasoning Without Prompting	Feb 15, 2024	Prompt Engineering	Code Available	2
ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback	Feb 15, 2024	Computational chemistry, Graph Neural Network	Code Available	2
Recovering the Pre-Fine-Tuning Weights of Generative Models	Feb 15, 2024	Pre-Fine-Tuning Weight Recovery	Code Available	2
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference	Feb 15, 2024	GPU, Quantization	Code Available	2
MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models	Feb 14, 2024	Benchmarking, Diversity	Code Available	2
LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset	Feb 14, 2024	Drug Discovery	Code Available	2
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents	Feb 14, 2024	Language Modeling, Language Modelling	Code Available	2
YOLOv8-AM: YOLOv8 Based on Effective Attention Mechanisms for Pediatric Wrist Fracture Detection	Feb 14, 2024	Fracture detection, medical image detection	Code Available	2
Less is More: Fewer Interpretable Region via Submodular Subset Selection	Feb 14, 2024	Error Understanding, Image Attribution	Code Available	2
PC-NeRF: Parent-Child Neural Radiance Fields Using Sparse LiDAR Frames in Autonomous Driving Environments	Feb 14, 2024	3D Reconstruction, 3D Scene Reconstruction	Code Available	2
Extreme Video Compression with Pre-trained Diffusion Models	Feb 14, 2024	Decoder, Image Compression	Code Available	2