| The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests | Sep 22, 2024 | Benchmarking | —Unverified | 0 |
| The ACL RD-TEC: A Dataset for Benchmarking Terminology Extraction and Classification in Computational Linguistics | Aug 1, 2014 | Benchmarking, General Classification | —Unverified | 0 |
| The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking | Apr 22, 2024 | Benchmarking, Misinformation | —Unverified | 0 |
| The Algonauts Project: A Platform for Communication between the Sciences of Biological and Artificial Intelligence | May 14, 2019 | Benchmarking, speech-recognition | —Unverified | 0 |
| Language Models as a Service: Overview of a New Paradigm and its Challenges | Sep 28, 2023 | Benchmarking | —Unverified | 0 |
| The Benchmark Lottery | Jul 14, 2021 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 |
| The Brain Tumor Segmentation (BraTS-METS) Challenge 2023: Brain Metastasis Segmentation on Pre-treatment MRI | Jun 1, 2023 | Benchmarking, Brain Tumor Segmentation | —Unverified | 0 |
| The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal | Sep 12, 2024 | Benchmarking, Language Modeling | —Unverified | 0 |
| The Convergent Ethics of AI? Analyzing Moral Foundation Priorities in Large Language Models with a Multi-Framework Approach | Apr 27, 2025 | Benchmarking, Decision Making | —Unverified | 0 |
| The Curious Case of Integrator Reach Sets, Part I: Basic Theory | Feb 23, 2021 | Benchmarking | —Unverified | 0 |
| The Design and Implementation of a Scalable DL Benchmarking Platform | Nov 19, 2019 | Benchmarking | —Unverified | 0 |
| The Disagreement Problem in Faithfulness Metrics | Nov 13, 2023 | Benchmarking, Explainable artificial intelligence | —Unverified | 0 |
| The DLV System for Knowledge Representation and Reasoning | Nov 4, 2002 | Benchmarking | —Unverified | 0 |
| The Dota 2 Bot Competition | Mar 4, 2021 | Benchmarking, Dota 2 | —Unverified | 0 |
| The Effect of Domain and Diacritics in Yoruba–English Neural Machine Translation | Aug 1, 2021 | Benchmarking, Machine Translation | —Unverified | 0 |
| The EuroCity Persons Dataset: A Novel Benchmark for Object Detection | May 18, 2018 | Benchmarking, Object | —Unverified | 0 |
| The Evolutionary Computation Methods No One Should Use | Jan 5, 2023 | Benchmarking | —Unverified | 0 |
| The Expressive Power of Word Embeddings | Jan 15, 2013 | Benchmarking, Sentence | —Unverified | 0 |
| The Extractive-Abstractive Axis: Measuring Content "Borrowing" in Generative Language Models | Jul 20, 2023 | Benchmarking | —Unverified | 0 |
| The FaceChannelS: Strike of the Sequences for the AffWild 2 Challenge | Oct 4, 2020 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 |
| The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input | Jan 6, 2025 | Benchmarking, Form | —Unverified | 0 |
| The Forchheim Image Database for Camera Identification in the Wild | Nov 4, 2020 | Benchmarking, Fact Checking | —Unverified | 0 |
| The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech | Apr 17, 2021 | Benchmarking | —Unverified | 0 |
| The Impact of Genomic Variation on Function (IGVF) Consortium | Jul 24, 2023 | Benchmarking | —Unverified | 0 |
| The iNaturalist Sounds Dataset | May 31, 2025 | Benchmarking | —Unverified | 0 |
| The Interactive Effects of Operators and Parameters to GA Performance Under Different Problem Sizes | Aug 1, 2015 | Benchmarking | —Unverified | 0 |
| The JPEG Pleno Learning-based Point Cloud Coding Standard: Serving Man and Machine | Sep 12, 2024 | Autonomous Driving, Benchmarking | —Unverified | 0 |
| The Jungle of Generative Drug Discovery: Traps, Treasures, and Ways Out | Dec 24, 2024 | Benchmarking, Deep Learning | —Unverified | 0 |
| The Karp Dataset | Jan 24, 2025 | Benchmarking, Mathematical Reasoning | —Unverified | 0 |
| The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs | Oct 2, 2024 | Benchmarking, Hallucination | —Unverified | 0 |
| The Leaderboard Illusion | Apr 29, 2025 | Benchmarking, Chatbot | —Unverified | 0 |
| The Liouville Generator for Producing Integrable Expressions | Jun 17, 2024 | Benchmarking | —Unverified | 0 |
| The Low Emission Oil&Gas Open (LEOGO) Reference Platform of an Off-Grid Energy System for Renewable Integration Studies | Aug 16, 2022 | Benchmarking, Management | —Unverified | 0 |
| The Moral Mind(s) of Large Language Models | Nov 19, 2024 | Benchmarking, Decision Making | —Unverified | 0 |
| The Multi-speaker Multi-style Voice Cloning Challenge 2021 | Apr 5, 2021 | Benchmarking, Voice Cloning | —Unverified | 0 |
| The Neural Painter: Multi-Turn Image Generation | Jun 16, 2018 | Benchmarking, Conditional Image Generation | —Unverified | 0 |
| The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects | Jun 1, 2023 | Benchmarking, Object | —Unverified | 0 |
| Theory of Mind in Large Language Models: Examining Performance of 11 State-of-the-Art models vs. Children Aged 7-10 on Advanced Tests | Oct 31, 2023 | Benchmarking | —Unverified | 0 |
| The Oxford Spires Dataset: Benchmarking Large-Scale LiDAR-Visual Localisation, Reconstruction and Radiance Field Methods | Nov 15, 2024 | 3D Reconstruction, Benchmarking | —Unverified | 0 |
| The Paradox of Success in Evolutionary and Bioinspired Optimization: Revisiting Critical Issues, Key Studies, and Methodological Pathways | Jan 13, 2025 | Benchmarking, Metaheuristic Optimization | —Unverified | 0 |
| The ParClusterers Benchmark Suite (PCBS): A Fine-Grained Analysis of Scalable Graph Clustering | Nov 15, 2024 | Benchmarking, Clustering | —Unverified | 0 |
| The Partial Response Network: a neural network nomogram | Aug 16, 2019 | Additive models, Benchmarking | —Unverified | 0 |
| The Pitfalls of Benchmarking in Algorithm Selection: What We Are Getting Wrong | May 12, 2025 | Benchmarking | —Unverified | 0 |
| The Protein Engineering Tournament: An Open Science Benchmark for Protein Modeling and Design | Sep 18, 2023 | Benchmarking | —Unverified | 0 |
| Thermal Image-based Fault Diagnosis in Induction Machines via Self-Organized Operational Neural Networks | Dec 8, 2024 | Benchmarking, Diagnostic | —Unverified | 0 |
| The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor Search | Jul 17, 2019 | Benchmarking, Diversity | —Unverified | 0 |
| The Russian practice of applying cluster approach in regional development | Jun 8, 2021 | Benchmarking | —Unverified | 0 |
| The Seeker's Dilemma: Realistic Formulation and Benchmarking for Hardware Trojan Detection | Feb 27, 2024 | Benchmarking | —Unverified | 0 |
| The Sparsity Roofline: Understanding the Hardware Limits of Sparse Neural Networks | Sep 30, 2023 | Benchmarking | —Unverified | 0 |
| The Trap of Presumed Equivalence: Artificial General Intelligence Should Not Be Assessed on the Scale of Human Intelligence | Oct 14, 2024 | Benchmarking | —Unverified | 0 |