| LCA-on-the-Line: Benchmarking Out-of-Distribution Generalization with Class Taxonomies | Jul 22, 2024 | Benchmarking, Out-of-Distribution Generalization | Code Available | 1 |
| POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding | Jul 20, 2024 | Benchmarking, Heuristic Search | Code Available | 1 |
| Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations | Jul 19, 2024 | Benchmarking, Fairness | Code Available | 1 |
| Restore Anything Model via Efficient Degradation Adaptation | Jul 18, 2024 | 5-Degradation Blind All-in-One Image Restoration, Benchmarking | Code Available | 1 |
| SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation On Diverse Modalities | Jul 16, 2024 | Benchmarking, Domain Adaptation | Code Available | 1 |
| Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models | Jul 16, 2024 | Benchmarking, Code Generation | Code Available | 1 |
| Separable Operator Networks | Jul 15, 2024 | Benchmarking, GPU | Code Available | 1 |
| When Heterophily Meets Heterogeneity: Challenges and a New Large-Scale Graph Benchmark | Jul 15, 2024 | Benchmarking, Graph Learning | Code Available | 1 |
| CIBench: Evaluating Your LLMs with a Code Interpreter Plugin | Jul 15, 2024 | Benchmarking | Code Available | 1 |
| OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling | Jul 13, 2024 | Benchmarking, Math | Code Available | 1 |