The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers


Showing 9901–9950 of 661,570 papers

Title	Date	Tasks	Status	Hype
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning	Feb 18, 2024	Hallucination, Instruction Following	Code Available	2
MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization	Feb 18, 2024	Code Generation, Data Visualization	Code Available	2
Continual Learning on Graphs: Challenges, Solutions, and Opportunities	Feb 18, 2024	Continual Learning, Graph Learning	Code Available	2
Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark	Feb 18, 2024	Benchmarking	Code Available	2
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning	Feb 18, 2024	Language Modeling, Language Modelling	Code Available	2
MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object Detection	Feb 18, 2024	3D Object Detection, Dataset Generation	Code Available	2
3D Point Cloud Compression with Recurrent Neural Network and Image Compression Methods	Feb 18, 2024	Data Compression, Image Compression	Code Available	2
Combinatorial Client-Master Multiagent Deep Reinforcement Learning for Task Offloading in Mobile Edge Computing	Feb 18, 2024	Deep Reinforcement Learning, Edge-computing	Code Available	2
Neighborhood-Enhanced Supervised Contrastive Learning for Collaborative Filtering	Feb 18, 2024	Collaborative Filtering, Contrastive Learning	Code Available	2
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents	Feb 17, 2024	Backdoor Attack, backdoor defense	Code Available	2
CoLLaVO: Crayon Large Language and Vision mOdel	Feb 17, 2024	Large Language Model, model	Code Available	2
Optimizing tiny colorless feedback delay networks	Feb 17, 2024		Code Available	2
EEG2Rep: Enhancing Self-supervised EEG Representation Through Informative Masked Inputs	Feb 17, 2024	EEG, EEG Signal Classification	Code Available	2
PEDANTS: Cheap but Effective and Interpretable Answer Equivalence	Feb 17, 2024	Benchmarking, Form	Code Available	2
Beyond Generalization: A Survey of Out-Of-Distribution Adaptation on Graphs	Feb 17, 2024		Code Available	2
Centroid-Based Efficient Minimum Bayes Risk Decoding	Feb 17, 2024	de-en, Translation	Code Available	2
An end-to-end attention-based approach for learning on graphs	Feb 16, 2024	Graph Classification, Graph Regression	Code Available	2
RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model	Feb 16, 2024	Autonomous Driving, Decision Making	Code Available	2
Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs	Feb 16, 2024	Quantization	Code Available	2
When is Tree Search Useful for LLM Planning? It Depends on the Discriminator	Feb 16, 2024	Mathematical Reasoning, Re-Ranking	Code Available	2
Large Language Models as Zero-shot Dialogue State Tracker through Function Calling	Feb 16, 2024	Avg, Dialogue State Tracking	Code Available	2
Do Llamas Work in English? On the Latent Language of Multilingual Transformers	Feb 16, 2024		Code Available	2
Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)	Feb 16, 2024	Model Editing	Code Available	2
ASGEA: Exploiting Logic Rules from Align-Subgraphs for Entity Alignment	Feb 16, 2024	Entity Alignment, Graph Neural Network	Code Available	2
TimeSeriesBench: An Industrial-Grade Benchmark for Time Series Anomaly Detection Models	Feb 16, 2024	Anomaly Detection, Time Series	Code Available	2
Distillation Enhanced Generative Retrieval	Feb 16, 2024	Retrieval, Text Retrieval	Code Available	2
Incremental Sequence Labeling: A Tale of Two Shifts	Feb 16, 2024	Incremental Learning, Knowledge Distillation	Code Available	2
Linear Transformers with Learnable Kernel Functions are Better In-Context Models	Feb 16, 2024	In-Context Learning, Language Modeling	Code Available	2
OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models	Feb 16, 2024	Common Sense Reasoning, Navigate	Code Available	2
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation	Feb 16, 2024	Video Generation	Code Available	2
Recovering the Pre-Fine-Tuning Weights of Generative Models	Feb 15, 2024	Pre-Fine-Tuning Weight Recovery	Code Available	2
Chain-of-Thought Reasoning Without Prompting	Feb 15, 2024	Prompt Engineering	Code Available	2
X-maps: Direct Depth Lookup for Event-based Structured Light Systems	Feb 15, 2024	Depth Estimation, Disparity Estimation	Code Available	2
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator	Feb 15, 2024	Benchmarking, Diagnostic	Code Available	2
PAL: Proxy-Guided Black-Box Attack on Large Language Models	Feb 15, 2024		Code Available	2
SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention	Feb 15, 2024	Time Series, Time Series Forecasting	Code Available	2
ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback	Feb 15, 2024	Computational chemistry, Graph Neural Network	Code Available	2
BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains	Feb 15, 2024	Few-Shot Learning, Medical Question Answering	Code Available	2
Detecting CSV File Dialects by Table Uniformity Measurement and Data Type Inference	Feb 15, 2024	CSV dialect detection	Code Available	2
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization	Feb 15, 2024	Denoising	Code Available	2
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference	Feb 15, 2024	GPU, Quantization	Code Available	2
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent	Feb 15, 2024	All, Decision Making	Code Available	2
A StrongREJECT for Empty Jailbreaks	Feb 15, 2024	MMLU	Code Available	2
MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models	Feb 14, 2024	Benchmarking, Diversity	Code Available	2
PC-NeRF: Parent-Child Neural Radiance Fields Using Sparse LiDAR Frames in Autonomous Driving Environments	Feb 14, 2024	3D Reconstruction, 3D Scene Reconstruction	Code Available	2
LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset	Feb 14, 2024	Drug Discovery	Code Available	2
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference	Feb 14, 2024		Code Available	2
Universal Machine Learning Kohn-Sham Hamiltonian for Materials	Feb 14, 2024		Code Available	2
Personalized Large Language Models	Feb 14, 2024	Emotion Recognition, Hate Speech Detection	Code Available	2
Less is More: Fewer Interpretable Region via Submodular Subset Selection	Feb 14, 2024	Error Understanding, Image Attribution	Code Available	2