The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 8,201–8,225 of 474,278 papers

Title	Date	Tasks	Status	Hype
Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph	Jun 21, 2024	Benchmarking, Text Generation	Code Available	2
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression	Jun 21, 2024	GPU, Language Modeling	Code Available	2
SelfReg-UNet: Self-Regularized UNet for Medical Image Segmentation	Jun 21, 2024	Decoder, Image Segmentation	Code Available	2
Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark	Jun 21, 2024	Anomaly Detection, Out-of-Distribution Detection	Code Available	2
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models	Jun 21, 2024	Spatial Reasoning	Code Available	2
GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation	Jun 21, 2024	3D Generation, GPU	Code Available	2
FIRST: Faster Improved Listwise Reranking with Single Token Decoding	Jun 21, 2024	Information Retrieval, Language Modeling	Code Available	2
DExter: Learning and Controlling Performance Expression with Diffusion Models	Jun 21, 2024	Music Performance Rendering	Code Available	2
EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms	Jun 20, 2024	Evolutionary Algorithms	Code Available	2
MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading	Jun 20, 2024	Algorithmic Trading, Decision Making	Code Available	2
LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning	Jun 20, 2024	Autonomous Navigation, Heuristic Search	Code Available	2
CodeRAG-Bench: Can Retrieval Augment Code Generation?	Jun 20, 2024	Code Generation, RAG	Code Available	2
Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study	Jun 20, 2024	In-Context Learning, Knowledge Distillation	Code Available	2
Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework	Jun 20, 2024	Hallucination, Question Answering	Code Available	2
How far are today's time-series models from real-world weather forecasting applications?	Jun 20, 2024	Benchmarking, Time Series	Code Available	2
HoTPP Benchmark: Are We Good at the Long Horizon Events Forecasting?	Jun 20, 2024	Benchmarking, Point Processes	Code Available	2
Asynchronous Large Language Model Enhanced Planner for Autonomous Driving	Jun 20, 2024	Autonomous Driving, Language Modeling	Code Available	2
LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection	Jun 20, 2024	Computational Efficiency, Object	Code Available	2
TAGLAS: An atlas of text-attributed graph datasets in the era of large graph and language models	Jun 20, 2024	Graph Question Answering, Node Classification	Code Available	2
CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information	Jun 20, 2024	Vision and Language Navigation	Code Available	2
Feature Fusion Based on Mutual-Cross-Attention Mechanism for EEG Emotion Recognition	Jun 20, 2024	Diagnostic, EEG	Code Available	2
RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design	Jun 19, 2024	Diversity	Code Available	2
InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales	Jun 19, 2024	Denoising, In-Context Learning	Code Available	2
WATT: Weight Average Test-Time Adaptation of CLIP	Jun 19, 2024	image-classification, Image Classification	Code Available	2
Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases	Jun 19, 2024	8k, Hallucination	Code Available	2