| Benchmarking and Analyzing In-context Learning, Fine-tuning and Supervised Learning for Biomedical Knowledge Curation: a focused study on chemical entities of biological interest | Dec 20, 2023 | Benchmarking, In-Context Learning | —Unverified | 0 |
| Evaluating Generative Models for Tabular Data: Novel Metrics and Benchmarking | Apr 29, 2025 | Benchmarking, Intrusion Detection | —Unverified | 0 |
| Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation | Feb 10, 2025 | Benchmarking | —Unverified | 0 |
| Can we hop in general? A discussion of benchmark selection and design using the Hopper environment | Oct 11, 2024 | Benchmarking, Reinforcement Learning (RL) | —Unverified | 0 |
| Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs | Feb 16, 2025 | Benchmarking | —Unverified | 0 |
| Benchmarking and Analyzing Generative Data for Visual Recognition | Jul 25, 2023 | Benchmarking, Retrieval | —Unverified | 0 |
| A dataset for benchmarking vision-based localization at intersections | Nov 4, 2018 | Benchmarking | —Unverified | 0 |
| Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning | Oct 15, 2023 | Benchmarking, Spatial Reasoning | —Unverified | 0 |
| Can time series forecasting be automated? A benchmark and analysis | Jul 23, 2024 | Benchmarking, Decision Making | —Unverified | 0 |
| Can Machines “Learn” Halide Perovskite Crystal Formation without Accurate Physicochemical Features? | May 26, 2020 | Benchmarking | —Unverified | 0 |