| CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X | Mar 30, 2023 | Benchmarking, Code Generation | Code Available | 5 |
| OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics | Jun 14, 2025 | Benchmarking | Code Available | 4 |
| TerraTorch: The Geospatial Foundation Models Toolkit | Mar 26, 2025 | Benchmarking, Decoder | Code Available | 4 |
| Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models | Mar 20, 2025 | Benchmarking, Reinforcement Learning (RL) | Code Available | 4 |
| Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation | Feb 23, 2025 | Benchmarking | Code Available | 4 |
| Building reliable sim driving agents by scaling self-play | Feb 20, 2025 | Autonomous Vehicles, Benchmarking | Code Available | 4 |
| A deep learning framework for efficient pathology image analysis | Feb 18, 2025 | Benchmarking, Deep Learning | Code Available | 4 |
| Accelerating Data Processing and Benchmarking of AI Models for Pathology | Feb 10, 2025 | Benchmarking | Code Available | 4 |
| Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound | Feb 7, 2025 | Benchmarking | Code Available | 4 |
| Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation | Feb 4, 2025 | Benchmarking, Information Retrieval | Code Available | 4 |