| Estimating Task Completion Times for Network Rollouts using Statistical Models within Partitioning-based Regression Methods | Nov 20, 2022 | Benchmarking, regression | —Unverified | 0 | 0 |
| Estimating the Effect of Crosstalk Error on Circuit Fidelity Using Noisy Intermediate-Scale Quantum Devices | Feb 10, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Estimating transmission from genetic and epidemiological data: a metric to compare transmission trees | Sep 28, 2016 | Benchmarking | —Unverified | 0 | 0 |
| EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding | May 26, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Europarl-ASR: A Large Corpus of Parliamentary Debates for Streaming ASR Benchmarking and Speech Data Filtering/Verbatimization | Aug 30, 2021 | Benchmarking, Data Augmentation | —Unverified | 0 | 0 |
| Challenges and Advancements in Modeling Shock Fronts with Physics-Informed Neural Networks: A Review and Benchmarking Study | Mar 14, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Tackling Visual Control via Multi-View Exploration Maximization | Nov 28, 2022 | Benchmarking, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| Challenge Results Are Not Reproducible | Jul 14, 2023 | Benchmarking, Image Segmentation | —Unverified | 0 | 0 |
| ChakmaNMT: A Low-resource Machine Translation On Chakma Language | Oct 14, 2024 | Benchmarking, Machine Translation | —Unverified | 0 | 0 |
| Evalita-LLM: Benchmarking Large Language Models on Italian | Feb 4, 2025 | Benchmarking, Multiple-choice | —Unverified | 0 | 0 |
| Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning | Jan 8, 2024 | Benchmarking, CoLA | —Unverified | 0 | 0 |
| TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding | Jan 16, 2024 | Action Recognition, Benchmarking | —Unverified | 0 | 0 |
| Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI | Jun 26, 2024 | Benchmarking, Crop Type Mapping | —Unverified | 0 | 0 |
| C-FedRAG: A Confidential Federated Retrieval-Augmented Generation System | Dec 17, 2024 | Benchmarking, RAG | —Unverified | 0 | 0 |
| Evaluating Cultural and Social Awareness of LLM Web Agents | Oct 30, 2024 | Benchmarking, Navigate | —Unverified | 0 | 0 |
| Evaluating Deep Clustering Algorithms on Non-Categorical 3D CAD Models | Apr 29, 2024 | Benchmarking, Clustering | —Unverified | 0 | 0 |
| Tactile MNIST: Benchmarking Active Tactile Perception | Jun 3, 2025 | Benchmarking, Scene Understanding | —Unverified | 0 | 0 |
| Evaluating Financial Sentiment Analysis with Annotators Instruction Assisted Prompting: Enhancing Contextual Interpretation and Stock Prediction Accuracy | May 9, 2025 | Benchmarking, Sentiment Analysis | —Unverified | 0 | 0 |
| Evaluating Generative AI-Enhanced Content: A Conceptual Framework Using Qualitative, Quantitative, and Mixed-Methods Approaches | Nov 26, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Evaluating Generative Models for Tabular Data: Novel Metrics and Benchmarking | Apr 29, 2025 | Benchmarking, Intrusion Detection | —Unverified | 0 | 0 |
| CETBench: A Novel Dataset constructed via Transformations over Programs for Benchmarking LLMs for Code-Equivalence Checking | Jun 4, 2025 | Benchmarking, Code Generation | —Unverified | 0 | 0 |
| Certifying almost all quantum states with few single-qubit measurements | Apr 10, 2024 | All, Benchmarking | —Unverified | 0 | 0 |
| Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study | Aug 26, 2024 | 8k, Benchmarking | —Unverified | 0 | 0 |
| Certified Adversarial Defenses Meet Out-of-Distribution Corruptions: Benchmarking Robustness and Simple Baselines | Dec 1, 2021 | Adversarial Robustness, Benchmarking | —Unverified | 0 | 0 |
| A Latent Fingerprint in the Wild Database | Apr 3, 2023 | Benchmarking | —Unverified | 0 | 0 |
| CellCycleGAN: Spatiotemporal Microscopy Image Synthesis of Cell Populations using Statistical Shape Models and Conditional GANs | Oct 22, 2020 | Benchmarking, Cell Segmentation | —Unverified | 0 | 0 |
| Evaluating Music Recommender Systems for Groups | Jul 31, 2017 | Benchmarking, Recommendation Systems | —Unverified | 0 | 0 |
| Evaluating Nuanced Bias in Large Language Model Free Response Answers | Jul 11, 2024 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark | Jul 1, 2019 | Benchmarking, Object Tracking | —Unverified | 0 | 0 |
| Evaluating Robustness of LLMs on Crisis-Related Microblogs across Events, Information Types, and Linguistic Features | Dec 8, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning | Oct 15, 2023 | Benchmarking, Spatial Reasoning | —Unverified | 0 | 0 |
| A Large-scale Study on Training Sample Memorization in Generative Modeling | Jan 1, 2021 | Benchmarking, Memorization | —Unverified | 0 | 0 |
| A large-scale, physically-based synthetic dataset for satellite pose estimation | Jun 15, 2025 | Benchmarking, Dataset Generation | —Unverified | 0 | 0 |
| Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics | Mar 3, 2025 | Benchmarking, Spoken Dialogue Systems | —Unverified | 0 | 0 |
| Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance | Mar 27, 2025 | Benchmarking, Image Generation | —Unverified | 0 | 0 |
| A Benchmarking Protocol for Pansharpening: Dataset, Preprocessing, and Quality Assessment | Jun 7, 2021 | Benchmarking, Pansharpening | —Unverified | 0 | 0 |
| Evaluating the Generation of Spatial Relations in Text and Image Generative Models | Nov 12, 2024 | Benchmarking, Image Generation | —Unverified | 0 | 0 |
| Evaluating the Performance of Large Language Models via Debates | Jun 16, 2024 | Benchmarking | —Unverified | 0 | 0 |
| A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning | Jun 17, 2025 | Benchmarking, Self-Supervised Learning | —Unverified | 0 | 0 |
| TARGET: Benchmarking Table Retrieval for Generative Tasks | May 14, 2025 | Benchmarking, Representation Learning | —Unverified | 0 | 0 |
| Efficient Demand Response Location Targeting for Price Spike Mitigation by Exploiting Price-demand Relationship | Nov 27, 2022 | Benchmarking | —Unverified | 0 | 0 |
| Evaluating Visual Conversational Agents via Cooperative Human-AI Games | Aug 17, 2017 | Benchmarking | —Unverified | 0 | 0 |
| Evaluation and Ensembling of Methods for Reverse Engineering of Brain Connectivity from Imaging Data | Mar 15, 2016 | Benchmarking, Causal Discovery | —Unverified | 0 | 0 |
| Evaluation Methodology for Attacks Against Confidence Thresholding Models | May 1, 2019 | Adversarial Robustness, Benchmarking | —Unverified | 0 | 0 |
| Evaluation Methods and Measures for Causal Learning Algorithms | Feb 7, 2022 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 | 0 |
| Evaluation of Algorithms for Multi-Modality Whole Heart Segmentation: An Open-Access Grand Challenge | Feb 21, 2019 | Anatomy, Benchmarking | —Unverified | 0 | 0 |
| Evaluation of Architectural Synthesis Using Generative AI | Mar 4, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Evaluation of Human-AI Teams for Learned and Rule-Based Agents in Hanabi | Jul 15, 2021 | Benchmarking, Deep Reinforcement Learning | —Unverified | 0 | 0 |
| CayleyPy RL: Pathfinding and Reinforcement Learning on Cayley Graphs | Feb 25, 2025 | Benchmarking, reinforcement-learning | —Unverified | 0 | 0 |
| Evaluation of Popular XAI Applied to Clinical Prediction Models: Can They be Trusted? | Jun 21, 2023 | Benchmarking, Explainable artificial intelligence | —Unverified | 0 | 0 |