| How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers | Oct 19, 2020 | Benchmarking, Graph Mining | —Unverified | 0 | 0 |
| How Propense Are Large Language Models at Producing Code Smells? A Benchmarking Study | Dec 25, 2024 | Benchmarking, Code Generation | —Unverified | 0 | 0 |
| Benchmarking Ultra-High-Definition Image Super-Resolution | Jan 1, 2021 | 4k, 8k | —Unverified | 0 | 0 |
| The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input | Jan 6, 2025 | Benchmarking, Form | —Unverified | 0 | 0 |
| Benchmarking Twitter Sentiment Analysis Tools | May 1, 2014 | Benchmarking, Decision Making | —Unverified | 0 | 0 |
| The Forchheim Image Database for Camera Identification in the Wild | Nov 4, 2020 | Benchmarking, Fact Checking | —Unverified | 0 | 0 |
| MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models | Jun 11, 2024 | Benchmarking, Fairness | —Unverified | 0 | 0 |
| How Universal are Universal Dependencies? Exploiting Syntax for Multilingual Clause-level Sentiment Detection | May 1, 2020 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 | 0 |
| Benchmarking Transformers-based models on French Spoken Language Understanding tasks | Jul 19, 2022 | Benchmarking, Spoken Language Understanding | —Unverified | 0 | 0 |
| How well it works: Benchmarking performance of GPT models on medical natural language processing tasks | Jun 12, 2024 | Benchmarking | —Unverified | 0 | 0 |
| You Only Crash Once v2: Perceptually Consistent Strong Features for One-Stage Domain Adaptive Detection of Space Terrain | Jan 23, 2025 | Benchmarking, Domain Adaptation | —Unverified | 0 | 0 |
| The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech | Apr 17, 2021 | Benchmarking | —Unverified | 0 | 0 |
| The Impact of Genomic Variation on Function (IGVF) Consortium | Jul 24, 2023 | Benchmarking | —Unverified | 0 | 0 |
| A General Taylor Framework for Unifying and Revisiting Attribution Methods | May 28, 2021 | Benchmarking, Decision Making | —Unverified | 0 | 0 |
| HULK: An Energy Efficiency Benchmark Platform for Responsible Natural Language Processing | Feb 14, 2020 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Transformer-based Language Models for Arabic Sentiment and Sarcasm Detection | Apr 1, 2021 | Benchmarking, Sarcasm Detection | —Unverified | 0 | 0 |
| Benchmarking Toxic Molecule Classification using Graph Neural Networks and Few Shot Learning | Nov 22, 2023 | Benchmarking, Drug Discovery | —Unverified | 0 | 0 |
| Human Body Shape Classification Based on a Single Image | May 29, 2023 | Benchmarking, Classification | —Unverified | 0 | 0 |
| Benchmarking Time Series Forecasting Models: From Statistical Techniques to Foundation Models in Real-World Applications | Feb 5, 2025 | Benchmarking, Feature Engineering | —Unverified | 0 | 0 |
| Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation | Mar 5, 2024 | Benchmarking, In-Context Learning | —Unverified | 0 | 0 |
| A generalized kinetic framework applied to whole-cell catalysis in biofilm flow reactors clarifies performance enhancements | Apr 10, 2019 | Benchmarking | —Unverified | 0 | 0 |
| HyBiomass: Global Hyperspectral Imagery Benchmark Dataset for Evaluating Geospatial Foundation Models in Forest Aboveground Biomass Estimation | Jun 12, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Hybrid data driven/thermal simulation model for comfort assessment | Sep 4, 2023 | Benchmarking | —Unverified | 0 | 0 |
| Hybrid Long Document Summarization using C2F-FAR and ChatGPT: A Practical Study | Jun 1, 2023 | Articles, Benchmarking | —Unverified | 0 | 0 |
| The iNaturalist Sounds Dataset | May 31, 2025 | Benchmarking | —Unverified | 0 | 0 |