| Benchmarking Large Language Model Uncertainty for Prompt Optimization | Sep 16, 2024 | BenchmarkingDiversity | CodeCode Available | 0 | 5 |
| Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems | Oct 8, 2023 | Benchmarking | CodeCode Available | 0 | 5 |
| Flexible Generation of Preference Data for Recommendation Analysis | Jul 23, 2024 | BenchmarkingRecommendation Systems | CodeCode Available | 0 | 5 |
| GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search | Jan 26, 2025 | BenchmarkingDiversity | CodeCode Available | 0 | 5 |
| Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction | May 23, 2023 | Aspect-Based Sentiment AnalysisAspect-Based Sentiment Analysis (ABSA) | CodeCode Available | 0 | 5 |
| Arena-Rosnav 2.0: A Development and Benchmarking Platform for Robot Navigation in Highly Dynamic Environments | Feb 20, 2023 | BenchmarkingRobot Navigation | CodeCode Available | 0 | 5 |
| Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation | Jul 17, 2020 | BenchmarkingDisentanglement | CodeCode Available | 0 | 5 |
| Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks | Nov 15, 2023 | BenchmarkingNetwork Pruning | CodeCode Available | 0 | 5 |
| Generalization and Regularization in DQN | Sep 29, 2018 | Atari GamesBenchmarking | CodeCode Available | 0 | 5 |
| Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider | Apr 26, 2025 | BenchmarkingGPU | CodeCode Available | 0 | 5 |