| Scenarios and Approaches for Situated Natural Language Explanations | Jun 7, 2024 | Benchmarking, In-Context Learning | Unverified | 0 |
| Behavior Structformer: Learning Players Representations with Structured Tokenization | Jun 7, 2024 | Benchmarking | Unverified | 0 |
| VisionAD, a software package of performant anomaly detection algorithms, and Proportion Localised, an interpretable metric | Jun 7, 2024 | Anomaly Detection, Benchmarking | Code Available | 0 |
| Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation | Jun 7, 2024 | Benchmarking | Unverified | 0 |
| Better Late Than Never: Formulating and Benchmarking Recommendation Editing | Jun 6, 2024 | Benchmarking, Recommendation Systems | Code Available | 0 |
| Benchmarking AlphaFold3's protein-protein complex accuracy and machine learning prediction reliability for binding free energy changes upon mutation | Jun 6, 2024 | Benchmarking, Drug Discovery | Unverified | 0 |
| Performance of large language models in numerical vs. semantic medical knowledge: Benchmarking on evidence-based Q&As | Jun 6, 2024 | Articles, Benchmarking | Unverified | 0 |
| NATURAL PLAN: Benchmarking LLMs on Natural Language Planning | Jun 6, 2024 | Benchmarking, Scheduling | Unverified | 0 |
| BEADs: Bias Evaluation Across Domains | Jun 6, 2024 | Benchmarking, Bias Detection | Unverified | 0 |
| Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices | Jun 6, 2024 | Benchmarking, RAG | Unverified | 0 |