The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 3,451–3,475 of 661,570 papers

Title	Date	Tasks	Status	Hype
A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions	Jun 9, 2024	3D visual grounding, Survey	Code Available	3
TopoBench: A Framework for Benchmarking Topological Deep Learning	Jun 9, 2024	Benchmarking, Deep Learning	Code Available	3
Probabilistic Weather Forecasting with Hierarchical Graph Neural Networks	Jun 7, 2024	graph construction, Weather Forecasting	Code Available	3
VISTA3D: Versatile Imaging SegmenTation and Annotation model for 3D Computed Tomography	Jun 7, 2024	Computed Tomography (CT), Image Segmentation	Code Available	3
FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models	Jun 7, 2024	Federated Learning	Code Available	3
CRAG -- Comprehensive RAG Benchmark	Jun 7, 2024	Hallucination, Language Modelling	Code Available	3
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild	Jun 7, 2024	Benchmarking, Chatbot	Code Available	3
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents	Jun 7, 2024	Natural Language Understanding	Code Available	3
Multi-Head RAG: Solving Multi-Aspect Problems with LLMs	Jun 7, 2024	Benchmarking, Decoder	Code Available	3
Improving Alignment and Robustness with Circuit Breakers	Jun 6, 2024	Adversarial Robustness	Code Available	3
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion	Jun 6, 2024	3D Generation	Code Available	3
VideoTetris: Towards Compositional Text-to-Video Generation	Jun 6, 2024	Denoising, Text-to-Video Generation	Code Available	3
Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image	Jun 6, 2024	3D Scene Reconstruction, Depth Estimation	Code Available	3
Vision-LSTM: xLSTM as Generic Vision Backbone	Jun 6, 2024		Code Available	3
Are We Done with MMLU?	Jun 6, 2024	MMLU, Virology	Code Available	3
MLVU: Benchmarking Multi-task Long Video Understanding	Jun 6, 2024	Benchmarking, Video Understanding	Code Available	3
Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization	Jun 6, 2024	Denoising, Image Generation	Code Available	3
Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models	Jun 5, 2024	Data Integration, graph construction	Code Available	3
FusionBench: A Comprehensive Benchmark of Deep Model Fusion	Jun 5, 2024	image-classification, Image Classification	Code Available	3
Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis	Jun 5, 2024	Mamba, Medical Image Analysis	Code Available	3
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation	Jun 4, 2024	2D Object Detection, 3D Instance Segmentation	Code Available	3
Improved Modelling of Federated Datasets using Mixtures-of-Dirichlet-Multinomials	Jun 4, 2024	Federated Learning	Code Available	3
FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models	Jun 4, 2024	Text Generation, Transfer Learning	Code Available	3
Description Boosting for Zero-Shot Entity and Relation Classification	Jun 4, 2024	Relation, Relation Classification	Code Available	3
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark	Jun 3, 2024	MMLU, Multi-task Language Understanding	Code Available	3