| DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects | Oct 3, 2024 | Benchmarking, Imitation Learning | Code Available | 1 |
| LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services | Oct 3, 2024 | Benchmarking, GPU | Code Available | 1 |
| StringLLM: Understanding the String Processing Capability of Large Language Models | Oct 2, 2024 | Benchmarking | Code Available | 1 |
| MONICA: Benchmarking on Long-tailed Medical Image Classification | Oct 2, 2024 | Benchmarking, Classification | Code Available | 1 |
| MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework | Oct 2, 2024 | Benchmarking, Instruction Following | Code Available | 1 |
| Exploring QUIC Dynamics: A Large-Scale Dataset for Encrypted Traffic Analysis | Sep 30, 2024 | Benchmarking, Intrusion Detection | Code Available | 1 |
| ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning | Sep 27, 2024 | AutoML, Benchmarking | Code Available | 1 |
| MALPOLON: A Framework for Deep Species Distribution Modeling | Sep 26, 2024 | Benchmarking, GPU | Code Available | 1 |
| HazeSpace2M: A Dataset for Haze Aware Single Image Dehazing | Sep 25, 2024 | Benchmarking, Image Dehazing | Code Available | 1 |
| RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code | Sep 23, 2024 | Benchmarking, Code Generation | Code Available | 1 |