| Anchor Points: Benchmarking Models with Much Fewer Examples | Sep 14, 2023 | Benchmarking, Language Modeling | Code Available | 0 |
| M3Dsynth: A dataset of medical 3D images with AI-generated local manipulations | Sep 14, 2023 | Benchmarking, Computed Tomography (CT) | Code Available | 0 |
| Leveraging Contextual Information for Effective Entity Salience Detection | Sep 14, 2023 | Articles, Benchmarking | Unverified | 0 |
| Benchmarking machine learning models for quantum state classification | Sep 14, 2023 | Benchmarking, Classification | Unverified | 0 |
| VerilogEval: Evaluating Large Language Models for Verilog Code Generation | Sep 14, 2023 | Benchmarking, Code Generation | Code Available | 2 |
| So you think you can track? | Sep 13, 2023 | Benchmarking, Object | Unverified | 0 |
| Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish | Sep 13, 2023 | Benchmarking, Translation | Code Available | 0 |
| An Image Dataset for Benchmarking Recommender Systems with Raw Pixels | Sep 13, 2023 | Benchmarking, Recommendation Systems | Code Available | 1 |
| AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving | Sep 12, 2023 | Autonomous Driving, Benchmarking | Unverified | 0 |
| Unveiling the potential of large language models in generating semantic and cross-language clones | Sep 12, 2023 | Benchmarking, Code Generation | Unverified | 0 |