| Needle In A Haystack, Fast: Benchmarking Image Perceptual Similarity Metrics At Scale | Jun 1, 2022 | Benchmarking | CodeCode Available | 1 |
| Working Memory Capacity of ChatGPT: An Empirical Study | Apr 30, 2023 | BenchmarkingLanguage Modeling | CodeCode Available | 1 |
| CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Nov 10, 2023 | BenchmarkingCloud Computing | CodeCode Available | 1 |
| Clinical Prompt Learning with Frozen Language Models | May 11, 2022 | BenchmarkingGPU | CodeCode Available | 1 |
| Neural Methods for Logical Reasoning Over Knowledge Graphs | Sep 28, 2022 | BenchmarkingKnowledge Graphs | CodeCode Available | 1 |
| CLoG: Benchmarking Continual Learning of Image Generation Models | Jun 7, 2024 | BenchmarkingContinual Learning | CodeCode Available | 1 |
| ClearPose: Large-scale Transparent Object Dataset and Benchmark | Mar 8, 2022 | BenchmarkingDepth Completion | CodeCode Available | 1 |
| Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models? | Aug 14, 2023 | BenchmarkingDrug Design | CodeCode Available | 1 |
| Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization | Nov 15, 2023 | BenchmarkingInstruction Following | CodeCode Available | 1 |
| A Comparison of Image Denoising Methods | Apr 18, 2023 | BenchmarkingDenoising | CodeCode Available | 1 |