| AnyTOD: A Programmable Task-Oriented Dialog System | Dec 20, 2022 | Benchmarking, Language Modeling | Unverified | 0 |
| Benchmarking Spatial Relationships in Text-to-Image Generation | Dec 20, 2022 | Benchmarking, Image Generation | Code Available | 1 |
| Trial-Based Dominance Enables Non-Parametric Tests to Compare both the Speed and Accuracy of Stochastic Optimizers | Dec 19, 2022 | Benchmarking, Stochastic Optimization | Unverified | 0 |
| GiCCS: A German in-Context Conversational Similarity Benchmark | Dec 16, 2022 | Benchmarking, Semantic Textual Similarity | Unverified | 0 |
| Biomedical image analysis competitions: The state of current participation practice | Dec 16, 2022 | Benchmarking, Survey | Unverified | 0 |
| Automatic vehicle trajectory data reconstruction at scale | Dec 15, 2022 | Benchmarking, vehicle detection | Unverified | 0 |
| Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Dec 15, 2022 | Benchmarking, Image Captioning | Code Available | 1 |
| Benchmarking Large Language Models for Automated Verilog RTL Code Generation | Dec 13, 2022 | Benchmarking, Code Generation | Code Available | 1 |
| Mind the Retrosynthesis Gap: Bridging the divide between Single-step and Multi-step Retrosynthesis Prediction | Dec 12, 2022 | Benchmarking, Multi-step retrosynthesis | Unverified | 0 |
| PyPop7: A Pure-Python Library for Population-Based Black-Box Optimization | Dec 12, 2022 | Benchmarking, Evolutionary Algorithms | Code Available | 2 |