| MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our Behalf | Feb 5, 2025 | Benchmarking, Scheduling | —Unverified | 0 | 0 |
| Towards Explainability and Fairness in Swiss Judgement Prediction: Benchmarking on a Multilingual Dataset | Feb 26, 2024 | Benchmarking, Cross-Lingual Transfer | —Unverified | 0 | 0 |
| MegaCOIN: Enhancing Medium-Grained Color Perception for Vision-Language Models | Dec 5, 2024 | Benchmarking, Domain Generalization | —Unverified | 0 | 0 |
| Benchmarking Large Language Model Capabilities for Conditional Generation | Jun 29, 2023 | Benchmarking, Few-Shot Learning | —Unverified | 0 | 0 |
| Benchmarking Language Models for Cyberbullying Identification and Classification from Social-media Texts | Jun 1, 2022 | Benchmarking, Binary Classification | —Unverified | 0 | 0 |
| MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks | Nov 13, 2023 | Benchmarking | —Unverified | 0 | 0 |
| MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP | Jun 4, 2025 | Benchmarking, Language Modelling | —Unverified | 0 | 0 |
| Benchmarking Lane-changing Decision-making for Deep Reinforcement Learning | Sep 22, 2021 | Autonomous Driving, Benchmarking | —Unverified | 0 | 0 |
| MeltpoolNet: Melt pool Characteristic Prediction in Metal Additive Manufacturing Using Machine Learning | Jan 26, 2022 | Articles, Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Knowledge-Enhanced Commonsense Question Answering via Knowledge-to-Text Transformation | Jan 4, 2021 | Benchmarking, Question Answering | —Unverified | 0 | 0 |
| MERGE -- A Bimodal Audio-Lyrics Dataset for Static Music Emotion Recognition | Jul 8, 2024 | Benchmarking, Deep Learning | —Unverified | 0 | 0 |
| Towards Explainable Network Intrusion Detection using Large Language Models | Aug 8, 2024 | Benchmarking, Intrusion Detection | —Unverified | 0 | 0 |
| Benchmarking KAZE and MCM for Multiclass Classification | May 20, 2015 | Benchmarking, Classification | —Unverified | 0 | 0 |
| What cleaves? Is proteasomal cleavage prediction reaching a ceiling? | Oct 24, 2022 | Benchmarking, Denoising | —Unverified | 0 | 0 |
| Benchmarking Joint Lexical and Syntactic Analysis on Multiword-Rich Data | Apr 1, 2017 | Benchmarking, Dependency Parsing | —Unverified | 0 | 0 |
| Benchmarking Joint Face Spoofing and Forgery Detection with Visual and Physiological Cues | Aug 10, 2022 | Benchmarking, DeepFake Detection | —Unverified | 0 | 0 |
| Metaethical Perspectives on 'Benchmarking' AI Ethics | Apr 11, 2022 | Benchmarking, Ethics | —Unverified | 0 | 0 |
| Towards Fair Machine Learning Software: Understanding and Addressing Model Bias Through Counterfactual Thinking | Feb 16, 2023 | Benchmarking, counterfactual | —Unverified | 0 | 0 |
| Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction | Aug 29, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| A deep convolutional neural network model for rapid prediction of fluvial flood inundation | Jun 20, 2020 | Benchmarking, Computational Efficiency | —Unverified | 0 | 0 |
| Meta learning to classify intent and slot labels with noisy few shot examples | Nov 30, 2020 | Benchmarking, intent-classification | —Unverified | 0 | 0 |
| Benchmarking Invertible Architectures on Inverse Problems | Jan 26, 2021 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking inverse statistical approaches for protein structure and design with exactly solvable models | Nov 15, 2016 | Benchmarking | —Unverified | 0 | 0 |
| Metastatic Cancer Outcome Prediction with Injective Multiple Instance Pooling | Mar 9, 2022 | Benchmarking, Management | —Unverified | 0 | 0 |
| Benchmarking in Optimization: Best Practice and Open Issues | Jul 7, 2020 | Benchmarking | —Unverified | 0 | 0 |
| Towards Graph Foundation Models: A Study on the Generalization of Positional and Structural Encodings | Dec 10, 2024 | Benchmarking, Graph Learning | —Unverified | 0 | 0 |
| Methods and open-source toolkit for analyzing and visualizing challenge results | Oct 11, 2019 | Benchmarking | —Unverified | 0 | 0 |
| Methods and Trends in Detecting Generated Images: A Comprehensive Review | Feb 21, 2025 | Benchmarking, DeepFake Detection | —Unverified | 0 | 0 |
| Metrics for Benchmarking and Uncertainty Quantification: Quality, Applicability, and a Path to Best Practices for Machine Learning in Chemistry | Sep 30, 2020 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 | 0 |
| Bench-Marking Information Extraction in Semi-Structured Historical Handwritten Records | Jul 17, 2018 | Benchmarking, Handwritten Text Recognition | —Unverified | 0 | 0 |
| Benchmarking Inference Performance of Deep Learning Models on Analog Devices | Nov 24, 2020 | Benchmarking, Deep Learning | —Unverified | 0 | 0 |
| MHQA: A Diverse, Knowledge Intensive Mental Health Question Answering Challenge for Language Models | Feb 21, 2025 | Benchmarking, Diagnostic | —Unverified | 0 | 0 |
| MHTS: Multi-Hop Tree Structure Framework for Generating Difficulty-Controllable QA Datasets for RAG Evaluation | Mar 29, 2025 | Answer Generation, Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Individual Tree Mapping with Sub-meter Imagery | Nov 14, 2023 | Benchmarking, Segmentation | —Unverified | 0 | 0 |
| Microtask crowdsourcing for disease mention annotation in PubMed abstracts | Aug 8, 2014 | Benchmarking | —Unverified | 0 | 0 |
| Microvasculature Segmentation in Human BioMolecular Atlas Program (HuBMAP) | Aug 6, 2023 | Benchmarking, Image Segmentation | —Unverified | 0 | 0 |
| Benchmarking Image Transformers for Prostate Cancer Detection from Ultrasound Data | Mar 27, 2024 | Benchmarking, Cancer Classification | —Unverified | 0 | 0 |
| Benchmarking Image Sensors Under Adverse Weather Conditions for Autonomous Driving | Dec 6, 2019 | Autonomous Driving, Benchmarking | —Unverified | 0 | 0 |
| MileBench: Benchmarking MLLMs in Long Context | Apr 29, 2024 | Benchmarking, Diagnostic | —Unverified | 0 | 0 |
| Addressing the Real-world Class Imbalance Problem in Dermatology | Oct 9, 2020 | Benchmarking, Few-Shot Learning | —Unverified | 0 | 0 |
| MiLQ: Benchmarking IR Models for Bilingual Web Search with Mixed Language Queries | May 22, 2025 | Benchmarking, Information Retrieval | —Unverified | 0 | 0 |
| Benchmarking Image Embeddings for E-Commerce: Evaluating Off-the Shelf Foundation Models, Fine-Tuning Strategies and Practical Trade-offs | Apr 10, 2025 | Benchmarking, Contrastive Learning | —Unverified | 0 | 0 |
| Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge | Jun 26, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification | Feb 6, 2024 | Benchmarking, Multiple-choice | —Unverified | 0 | 0 |
| Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets | Oct 12, 2021 | Benchmarking | —Unverified | 0 | 0 |
| Mind the Retrosynthesis Gap: Bridging the divide between Single-step and Multi-step Retrosynthesis Prediction | Dec 12, 2022 | Benchmarking, Multi-step retrosynthesis | —Unverified | 0 | 0 |
| What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs | May 15, 2025 | All, Benchmarking | —Unverified | 0 | 0 |
| Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning | Dec 18, 2024 | Benchmarking, Position | —Unverified | 0 | 0 |
| Benchmarking Human Face Similarity Using Identical Twins | Aug 25, 2022 | Benchmarking | —Unverified | 0 | 0 |
| Towards Ideal Temporal Graph Neural Networks: Evaluations and Conclusions after 10,000 GPU Hours | Dec 28, 2024 | Benchmarking, GPU | —Unverified | 0 | 0 |