| CoqPilot, a plugin for LLM-based generation of proofs | Oct 25, 2024 | Benchmarking | CodeCode Available | 2 |
| CoIR: A Comprehensive Benchmark for Code Information Retrieval Models | Jul 3, 2024 | BenchmarkingCode Search | CodeCode Available | 2 |
| Commit0: Library Generation from Scratch | Dec 2, 2024 | BenchmarkingCode Generation | CodeCode Available | 2 |
| CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation | Oct 30, 2024 | BenchmarkingPassage Retrieval | CodeCode Available | 2 |
| Datasets and Benchmarks for Offline Safe Reinforcement Learning | Jun 15, 2023 | Autonomous DrivingBenchmarking | CodeCode Available | 2 |
| Class-incremental Learning for Time Series: Benchmark and Evaluation | Feb 19, 2024 | Activity RecognitionBenchmarking | CodeCode Available | 2 |
| ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling | Jul 4, 2023 | BenchmarkingWeather Forecasting | CodeCode Available | 2 |
| Panoptic Scene Graph Generation | Jul 22, 2022 | BenchmarkingPanoptic Scene Graph Generation | CodeCode Available | 2 |
| CausalGym: Benchmarking causal interpretability methods on linguistic tasks | Feb 19, 2024 | BenchmarkingInterpretability Techniques for Deep Learning | CodeCode Available | 2 |
| MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation | Aug 17, 2022 | BenchmarkingCode Generation | CodeCode Available | 2 |