| SustainDC: Benchmarking for Sustainable Data Center Control | Aug 14, 2024 | Benchmarking, Management | Code Available | 2 |
| COALA: A Practical and Vision-Centric Federated Learning Platform | Jul 23, 2024 | Benchmarking, Continual Learning | Code Available | 2 |
| MOMAland: A Set of Benchmarks for Multi-Objective Multi-Agent Reinforcement Learning | Jul 23, 2024 | Benchmarking, Decision Making | Code Available | 2 |
| Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models | Jul 17, 2024 | Benchmarking, Red Teaming | Code Available | 2 |
| GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection | Jul 16, 2024 | Benchmarking, Loop Closure Detection | Code Available | 2 |
| WayveScenes101: A Dataset and Benchmark for Novel View Synthesis in Autonomous Driving | Jul 11, 2024 | Autonomous Driving, Benchmarking | Code Available | 2 |
| InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior | Jul 10, 2024 | Benchmarking, Decoder | Code Available | 2 |
| HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance | Jul 9, 2024 | Benchmarking, Conditional Image Generation | Code Available | 2 |
| SH17: A Dataset for Human Safety and Personal Protective Equipment Detection in Manufacturing Industry | Jul 5, 2024 | Benchmarking, object-detection | Code Available | 2 |
| Benchmarking Complex Instruction-Following with Multiple Constraints Composition | Jul 4, 2024 | Benchmarking, Instruction Following | Code Available | 2 |
| Craftium: An Extensible Framework for Creating Reinforcement Learning Environments | Jul 4, 2024 | Benchmarking, Minecraft | Code Available | 2 |
| CoIR: A Comprehensive Benchmark for Code Information Retrieval Models | Jul 3, 2024 | Benchmarking, Code Search | Code Available | 2 |
| FairMedFM: Fairness Benchmarking for Medical Imaging Foundation Models | Jul 1, 2024 | Benchmarking, Fairness | Code Available | 2 |
| Benchmarking Predictive Coding Networks -- Made Simple | Jul 1, 2024 | Benchmarking | Code Available | 2 |
| MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations | Jul 1, 2024 | Benchmarking, document understanding | Code Available | 2 |
| UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models | Jun 27, 2024 | Attribute, Benchmarking | Code Available | 2 |
| MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Jun 26, 2024 | Benchmarking, Math | Code Available | 2 |
| GenRL: Multimodal-foundation world models for generalization in embodied agents | Jun 26, 2024 | Benchmarking, Reinforcement Learning (RL) | Code Available | 2 |
| Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA | Jun 25, 2024 | Benchmarking, Long-Context Understanding | Code Available | 2 |
| From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking | Jun 24, 2024 | Benchmarking, NeRF | Code Available | 2 |
| DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation | Jun 24, 2024 | Benchmarking, Image Generation | Code Available | 2 |
| FaceScore: Benchmarking and Enhancing Face Quality in Human Generation | Jun 24, 2024 | Benchmarking, Denoising | Code Available | 2 |
| Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking | Jun 23, 2024 | Benchmarking | Code Available | 2 |
| GenoTEX: An LLM Agent Benchmark for Automated Gene Expression Data Analysis | Jun 21, 2024 | AI Agent, AutoML | Code Available | 2 |
| Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph | Jun 21, 2024 | Benchmarking, Text Generation | Code Available | 2 |