The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers


Showing 8,201–8,250 of 661,570 papers

Title	Date	Tasks	Status	Hype
TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning	Jun 21, 2024	Fairness, Geographic Question Answering	Code Available	2
FIRST: Faster Improved Listwise Reranking with Single Token Decoding	Jun 21, 2024	Information Retrieval, Language Modeling	Code Available	2
RouteFinder: Towards Foundation Models for Vehicle Routing Problems	Jun 21, 2024	Attribute, Multi-Task Learning	Code Available	2
SelfReg-UNet: Self-Regularized UNet for Medical Image Segmentation	Jun 21, 2024	Decoder, Image Segmentation	Code Available	2
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models	Jun 21, 2024	Spatial Reasoning	Code Available	2
GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation	Jun 21, 2024	3D Generation, GPU	Code Available	2
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression	Jun 21, 2024	GPU, Language Modeling	Code Available	2
Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark	Jun 21, 2024	Anomaly Detection, Out-of-Distribution Detection	Code Available	2
LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection	Jun 20, 2024	Computational Efficiency, Object	Code Available	2
Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework	Jun 20, 2024	Hallucination, Question Answering	Code Available	2
LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning	Jun 20, 2024	Autonomous Navigation, Heuristic Search	Code Available	2
CodeRAG-Bench: Can Retrieval Augment Code Generation?	Jun 20, 2024	Code Generation, RAG	Code Available	2
Feature Fusion Based on Mutual-Cross-Attention Mechanism for EEG Emotion Recognition	Jun 20, 2024	Diagnostic, EEG	Code Available	2
CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information	Jun 20, 2024	Vision and Language Navigation	Code Available	2
EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms	Jun 20, 2024	Evolutionary Algorithms	Code Available	2
HoTPP Benchmark: Are We Good at the Long Horizon Events Forecasting?	Jun 20, 2024	Benchmarking, Point Processes	Code Available	2
How far are today's time-series models from real-world weather forecasting applications?	Jun 20, 2024	Benchmarking, Time Series	Code Available	2
MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading	Jun 20, 2024	Algorithmic Trading, Decision Making	Code Available	2
TAGLAS: An atlas of text-attributed graph datasets in the era of large graph and language models	Jun 20, 2024	Graph Question Answering, Node Classification	Code Available	2
Asynchronous Large Language Model Enhanced Planner for Autonomous Driving	Jun 20, 2024	Autonomous Driving, Language Modeling	Code Available	2
Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study	Jun 20, 2024	In-Context Learning, Knowledge Distillation	Code Available	2
Adaptable Logical Control for Large Language Models	Jun 19, 2024	Math, Text Generation	Code Available	2
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words	Jun 19, 2024	Dialogue Understanding	Code Available	2
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World	Jun 19, 2024	Diagnostic, Multiple-choice	Code Available	2
Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases	Jun 19, 2024	8k, Hallucination	Code Available	2
GraphKAN: Enhancing Feature Extraction with Graph Kolmogorov Arnold Networks	Jun 19, 2024	Kolmogorov-Arnold Networks	Code Available	2
A large-scale multicenter breast cancer DCE-MRI benchmark dataset with expert segmentations	Jun 19, 2024	Benchmarking	Code Available	2
InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales	Jun 19, 2024	Denoising, In-Context Learning	Code Available	2
Encoder vs Decoder: Comparative Analysis of Encoder and Decoder Language Models on Multilingual NLU Tasks	Jun 19, 2024	Decoder, Language Modeling	Code Available	2
WATT: Weight Average Test-Time Adaptation of CLIP	Jun 19, 2024	image-classification, Image Classification	Code Available	2
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images	Jun 19, 2024	Object Recognition, Scene Understanding	Code Available	2
RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design	Jun 19, 2024	Diversity	Code Available	2
Dissecting Adversarial Robustness of Multimodal LM Agents	Jun 18, 2024	Adversarial Robustness, Adversarial Text	Code Available	2
Can Go AIs be adversarially robust?	Jun 18, 2024	Diversity	Code Available	2
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	Jun 18, 2024	Arithmetic Reasoning, Math	Code Available	2
Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment	Jun 18, 2024	Denoising	Code Available	2
Universal Score-based Speech Enhancement with High Content Preservation	Jun 18, 2024	Speech Enhancement	Code Available	2
Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling	Jun 18, 2024	Arithmetic Reasoning, Language Modeling	Code Available	2
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization	Jun 18, 2024	Landmark-based Lipreading, Lipreading	Code Available	2
Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM	Jun 18, 2024	Anomaly Detection, Anomaly Localization	Code Available	2
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI	Jun 18, 2024	Benchmarking, scientific discovery	Code Available	2
AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention	Jun 18, 2024	Object, Response Generation	Code Available	2
Automated MRI Quality Assessment of Brain T1-weighted MRI in Clinical Data Warehouses: A Transfer Learning Approach Relying on Artefact Simulation	Jun 18, 2024	Transfer Learning	Code Available	2
GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models	Jun 18, 2024	Benchmarking, Depth Estimation	Code Available	2
Coding Speech through Vocal Tract Kinematics	Jun 18, 2024	Voice Conversion	Code Available	2
AgentReview: Exploring Peer Review Dynamics with LLM Agents	Jun 18, 2024	Language Modeling, Language Modelling	Code Available	2
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction	Jun 18, 2024	Language Modeling, Language Modelling	Code Available	2
AEM: Attention Entropy Maximization for Multiple Instance Learning based Whole Slide Image Classification	Jun 18, 2024	Diversity, image-classification	Code Available	2
ChangeViT: Unleashing Plain Vision Transformers for Change Detection	Jun 18, 2024	Change Detection	Code Available	2
TroL: Traversal of Layers for Large Language and Vision Models	Jun 18, 2024	Visual Question Answering	Code Available	2