| Towards Graph Foundation Models: A Study on the Generalization of Positional and Structural Encodings | Dec 10, 2024 | Benchmarking, Graph Learning | —Unverified | 0 | 0 |
| Methods and open-source toolkit for analyzing and visualizing challenge results | Oct 11, 2019 | Benchmarking | —Unverified | 0 | 0 |
| Methods and Trends in Detecting Generated Images: A Comprehensive Review | Feb 21, 2025 | Benchmarking, DeepFake Detection | —Unverified | 0 | 0 |
| Metrics for Benchmarking and Uncertainty Quantification: Quality, Applicability, and a Path to Best Practices for Machine Learning in Chemistry | Sep 30, 2020 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 | 0 |
| Bench-Marking Information Extraction in Semi-Structured Historical Handwritten Records | Jul 17, 2018 | Benchmarking, Handwritten Text Recognition | —Unverified | 0 | 0 |
| Benchmarking Inference Performance of Deep Learning Models on Analog Devices | Nov 24, 2020 | Benchmarking, Deep Learning | —Unverified | 0 | 0 |
| MHQA: A Diverse, Knowledge Intensive Mental Health Question Answering Challenge for Language Models | Feb 21, 2025 | Benchmarking, Diagnostic | —Unverified | 0 | 0 |
| MHTS: Multi-Hop Tree Structure Framework for Generating Difficulty-Controllable QA Datasets for RAG Evaluation | Mar 29, 2025 | Answer Generation, Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Individual Tree Mapping with Sub-meter Imagery | Nov 14, 2023 | Benchmarking, Segmentation | —Unverified | 0 | 0 |
| Microtask crowdsourcing for disease mention annotation in PubMed abstracts | Aug 8, 2014 | Benchmarking | —Unverified | 0 | 0 |
| Microvasculature Segmentation in Human BioMolecular Atlas Program (HuBMAP) | Aug 6, 2023 | Benchmarking, Image Segmentation | —Unverified | 0 | 0 |
| Benchmarking Image Transformers for Prostate Cancer Detection from Ultrasound Data | Mar 27, 2024 | Benchmarking, Cancer Classification | —Unverified | 0 | 0 |
| Benchmarking Image Sensors Under Adverse Weather Conditions for Autonomous Driving | Dec 6, 2019 | Autonomous Driving, Benchmarking | —Unverified | 0 | 0 |
| MileBench: Benchmarking MLLMs in Long Context | Apr 29, 2024 | Benchmarking, Diagnostic | —Unverified | 0 | 0 |
| Addressing the Real-world Class Imbalance Problem in Dermatology | Oct 9, 2020 | Benchmarking, Few-Shot Learning | —Unverified | 0 | 0 |
| MiLQ: Benchmarking IR Models for Bilingual Web Search with Mixed Language Queries | May 22, 2025 | Benchmarking, Information Retrieval | —Unverified | 0 | 0 |
| Benchmarking Image Embeddings for E-Commerce: Evaluating Off-the Shelf Foundation Models, Fine-Tuning Strategies and Practical Trade-offs | Apr 10, 2025 | Benchmarking, Contrastive Learning | —Unverified | 0 | 0 |
| Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge | Jun 26, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification | Feb 6, 2024 | Benchmarking, Multiple-choice | —Unverified | 0 | 0 |
| Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets | Oct 12, 2021 | Benchmarking | —Unverified | 0 | 0 |
| Mind the Retrosynthesis Gap: Bridging the divide between Single-step and Multi-step Retrosynthesis Prediction | Dec 12, 2022 | Benchmarking, Multi-step retrosynthesis | —Unverified | 0 | 0 |
| What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs | May 15, 2025 | All, Benchmarking | —Unverified | 0 | 0 |
| Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning | Dec 18, 2024 | Benchmarking, Position | —Unverified | 0 | 0 |
| Benchmarking Human Face Similarity Using Identical Twins | Aug 25, 2022 | Benchmarking | —Unverified | 0 | 0 |
| Towards Ideal Temporal Graph Neural Networks: Evaluations and Conclusions after 10,000 GPU Hours | Dec 28, 2024 | Benchmarking, GPU | —Unverified | 0 | 0 |