| No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | Apr 4, 2024 | BenchmarkingImage Generation | CodeCode Available | 2 | 5 |
| Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations | Jun 9, 2022 | Benchmarkingcontinuous-control | CodeCode Available | 2 | 5 |
| Class-incremental Learning for Time Series: Benchmark and Evaluation | Feb 19, 2024 | Activity RecognitionBenchmarking | CodeCode Available | 2 | 5 |
| Commit0: Library Generation from Scratch | Dec 2, 2024 | BenchmarkingCode Generation | CodeCode Available | 2 | 5 |
| Building Normalizing Flows with Stochastic Interpolants | Sep 30, 2022 | BenchmarkingDensity Estimation | CodeCode Available | 2 | 5 |
| BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese | Apr 27, 2025 | BenchmarkingProper Noun | CodeCode Available | 2 | 5 |
| BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics | Jun 13, 2024 | Benchmarking | CodeCode Available | 2 | 5 |
| A Toolkit for Reliable Benchmarking and Research in Multi-Objective Reinforcement Learning | Sep 26, 2023 | BenchmarkingMulti-Objective Reinforcement Learning | CodeCode Available | 2 | 5 |
| Benchmarking Robustness of 3D Point Cloud Recognition Against Common Corruptions | Jan 28, 2022 | 3D Point Cloud Classification3D Point Cloud Data Augmentation | CodeCode Available | 2 | 5 |
| Benchmarking Complex Instruction-Following with Multiple Constraints Composition | Jul 4, 2024 | BenchmarkingInstruction Following | CodeCode Available | 2 | 5 |