| Evaluating Music Recommender Systems for Groups | Jul 31, 2017 | Benchmarking, Recommendation Systems | —Unverified | 0 |
| Evaluating Nuanced Bias in Large Language Model Free Response Answers | Jul 11, 2024 | Benchmarking, Language Modeling | —Unverified | 0 |
| Evaluating Robustness of LLMs on Crisis-Related Microblogs across Events, Information Types, and Linguistic Features | Dec 8, 2024 | Benchmarking | —Unverified | 0 |
| Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning | Oct 15, 2023 | Benchmarking, Spatial Reasoning | —Unverified | 0 |
| Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance | Mar 27, 2025 | Benchmarking, Image Generation | —Unverified | 0 |
| Evaluating the Generation of Spatial Relations in Text and Image Generative Models | Nov 12, 2024 | Benchmarking, Image Generation | —Unverified | 0 |
| Evaluating the Performance of Large Language Models via Debates | Jun 16, 2024 | Benchmarking | —Unverified | 0 |
| Evaluating Visual Conversational Agents via Cooperative Human-AI Games | Aug 17, 2017 | Benchmarking | —Unverified | 0 |
| Evaluation and Ensembling of Methods for Reverse Engineering of Brain Connectivity from Imaging Data | Mar 15, 2016 | Benchmarking, Causal Discovery | —Unverified | 0 |
| Evaluation Methodology for Attacks Against Confidence Thresholding Models | May 1, 2019 | Adversarial Robustness, Benchmarking | —Unverified | 0 |