| The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests | Sep 22, 2024 | Benchmarking | —Unverified | 0 |
| The ACL RD-TEC: A Dataset for Benchmarking Terminology Extraction and Classification in Computational Linguistics | Aug 1, 2014 | Benchmarking, General Classification | —Unverified | 0 |
| The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking | Apr 22, 2024 | Benchmarking, Misinformation | —Unverified | 0 |
| The Algonauts Project: A Platform for Communication between the Sciences of Biological and Artificial Intelligence | May 14, 2019 | Benchmarking, speech-recognition | —Unverified | 0 |
| Language Models as a Service: Overview of a New Paradigm and its Challenges | Sep 28, 2023 | Benchmarking | —Unverified | 0 |
| The Benchmark Lottery | Jul 14, 2021 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 |
| The Brain Tumor Segmentation (BraTS-METS) Challenge 2023: Brain Metastasis Segmentation on Pre-treatment MRI | Jun 1, 2023 | Benchmarking, Brain Tumor Segmentation | —Unverified | 0 |
| The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal | Sep 12, 2024 | Benchmarking, Language Modeling | —Unverified | 0 |
| The Convergent Ethics of AI? Analyzing Moral Foundation Priorities in Large Language Models with a Multi-Framework Approach | Apr 27, 2025 | Benchmarking, Decision Making | —Unverified | 0 |
| The Curious Case of Integrator Reach Sets, Part I: Basic Theory | Feb 23, 2021 | Benchmarking | —Unverified | 0 |
| The Design and Implementation of a Scalable DL Benchmarking Platform | Nov 19, 2019 | Benchmarking | —Unverified | 0 |
| The Disagreement Problem in Faithfulness Metrics | Nov 13, 2023 | Benchmarking, Explainable artificial intelligence | —Unverified | 0 |
| The DLV System for Knowledge Representation and Reasoning | Nov 4, 2002 | Benchmarking | —Unverified | 0 |
| The Dota 2 Bot Competition | Mar 4, 2021 | Benchmarking, Dota 2 | —Unverified | 0 |
| The Effect of Domain and Diacritics in Yoruba–English Neural Machine Translation | Aug 1, 2021 | Benchmarking, Machine Translation | —Unverified | 0 |
| The EuroCity Persons Dataset: A Novel Benchmark for Object Detection | May 18, 2018 | Benchmarking, Object | —Unverified | 0 |
| The Evolutionary Computation Methods No One Should Use | Jan 5, 2023 | Benchmarking | —Unverified | 0 |
| The Expressive Power of Word Embeddings | Jan 15, 2013 | Benchmarking, Sentence | —Unverified | 0 |
| The Extractive-Abstractive Axis: Measuring Content "Borrowing" in Generative Language Models | Jul 20, 2023 | Benchmarking | —Unverified | 0 |
| The FaceChannelS: Strike of the Sequences for the AffWild 2 Challenge | Oct 4, 2020 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 |
| The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input | Jan 6, 2025 | Benchmarking, Form | —Unverified | 0 |
| The Forchheim Image Database for Camera Identification in the Wild | Nov 4, 2020 | Benchmarking, Fact Checking | —Unverified | 0 |
| The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech | Apr 17, 2021 | Benchmarking | —Unverified | 0 |
| The Impact of Genomic Variation on Function (IGVF) Consortium | Jul 24, 2023 | Benchmarking | —Unverified | 0 |
| The iNaturalist Sounds Dataset | May 31, 2025 | Benchmarking | —Unverified | 0 |