| Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces | May 23, 2023 | Benchmarking | Code Available | 1 |
| R2H: Building Multimodal Navigation Helpers that Respond to Help Requests | May 23, 2023 | Benchmarking, Language Modeling | Unverified | 0 |
| When the Music Stops: Tip-of-the-Tongue Retrieval for Music | May 23, 2023 | Benchmarking, Language Modeling | Code Available | 0 |
| Benchmarking Machine Translation with Cultural Awareness | May 23, 2023 | Benchmarking, In-Context Learning | Code Available | 0 |
| Robust Model-Based Optimization for Challenging Fitness Landscapes | May 23, 2023 | Benchmarking, model | Code Available | 0 |
| Exploring Large Language Models for Classical Philology | May 23, 2023 | Benchmarking, Decoder | Code Available | 1 |
| Multilingual Large Language Models Are Not (Yet) Code-Switchers | May 23, 2023 | Benchmarking, Language Identification | Unverified | 0 |
| How Fragile is Relation Extraction under Entity Replacements? | May 22, 2023 | Benchmarking, Causal Inference | Code Available | 0 |
| Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | May 22, 2023 | Benchmarking, Hallucination | Code Available | 1 |
| A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches | May 22, 2023 | Benchmarking, Classification | Code Available | 0 |