| From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering | May 11, 2025 | Benchmarking, General Knowledge | Code Available | 0 | 5 |
| Dermatological Diagnosis Explainability Benchmark for Convolutional Neural Networks | Feb 23, 2023 | Benchmarking, Medical Diagnosis | Code Available | 0 | 5 |
| Benchmarking Human and Automated Prompting in the Segment Anything Model | Oct 29, 2024 | Benchmarking, Image Segmentation | Code Available | 0 | 5 |
| Depth Functions for Partial Orders with a Descriptive Analysis of Machine Learning Algorithms | Apr 19, 2023 | Benchmarking, Descriptive | Code Available | 0 | 5 |
| Benchmarking histopathology foundation models in a multi-center dataset for skin cancer subtyping | Jun 23, 2025 | Benchmarking, Diversity | Code Available | 0 | 5 |
| From MNIST to ImageNet and Back: Benchmarking Continual Curriculum Learning | Mar 16, 2023 | Benchmarking, Continual Learning | Code Available | 0 | 5 |
| Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma | Oct 4, 2023 | Benchmarking, Segmentation | Code Available | 0 | 5 |
| Benchmarking HillVallEA for the GECCO 2019 Competition on Multimodal Optimization | Jul 25, 2019 | Benchmarking | Code Available | 0 | 5 |
| Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark | Jun 14, 2025 | Benchmarking, Graph Learning | Code Available | 0 | 5 |
| Benchmarking Hierarchical Script Knowledge | Jun 1, 2019 | Benchmarking | Code Available | 0 | 5 |
| FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering | May 27, 2025 | Benchmarking, Question Answering | Code Available | 0 | 5 |
| Delta-Influence: Unlearning Poisons via Influence Functions | Nov 20, 2024 | Attribute, Benchmarking | Code Available | 0 | 5 |
| Forecasting time series with constraints | Feb 14, 2025 | Additive models, Benchmarking | Code Available | 0 | 5 |
| FHBench: Towards Efficient and Personalized Federated Learning for Multimodal Healthcare | Apr 15, 2025 | Benchmarking, Diagnostic | Code Available | 0 | 5 |
| Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming | Jul 17, 2019 | Autonomous Driving, Benchmarking | Code Available | 0 | 5 |
| Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling | Nov 21, 2024 | Articles, Benchmarking | Code Available | 0 | 5 |
| Aesthetic Image Captioning From Weakly-Labelled Photographs | Aug 29, 2019 | Aesthetic Image Captioning, Benchmarking | Code Available | 0 | 5 |
| Defense-friendly Images in Adversarial Attacks: Dataset and Metrics for Perturbation Difficulty | Nov 5, 2020 | Adversarial Attack, Benchmarking | Code Available | 0 | 5 |
| DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Jun 13, 2024 | Benchmarking, Hallucination | Code Available | 0 | 5 |
| Forecasting Across Time Series Databases using Recurrent Neural Networks on Groups of Similar Series: A Clustering Approach | Oct 9, 2017 | Benchmarking, Clustering | Code Available | 0 | 5 |
| FORLORN: A Framework for Comparing Offline Methods and Reinforcement Learning for Optimization of RAN Parameters | Sep 8, 2022 | Benchmarking, continuous-control | Code Available | 0 | 5 |
| Fluorescence Reference Target Quantitative Analysis Library | Apr 22, 2025 | Benchmarking | Code Available | 0 | 5 |
| Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0 | Aug 23, 2023 | Benchmarking, regression | Code Available | 0 | 5 |
| Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Mar 6, 2024 | Benchmarking, Hallucination | Code Available | 0 | 5 |
| Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification | Jan 14, 2025 | Benchmarking, Graph Representation Learning | Code Available | 0 | 5 |