| Coarse-to-Fine Q-attention with Learned Path Ranking | Apr 4, 2022 | Benchmarking | CodeCode Available | 1 |
| CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Apr 6, 2025 | BenchmarkingCombinatorial Optimization | CodeCode Available | 1 |
| A Comparative Attention Framework for Better Few-Shot Object Detection on Aerial Images | Oct 25, 2022 | BenchmarkingFew-Shot Object Detection | CodeCode Available | 1 |
| ArtFID: Quantitative Evaluation of Neural Style Transfer | Jul 25, 2022 | BenchmarkingMeta-Learning | CodeCode Available | 1 |
| Benchmarking LLMs for Political Science: A United Nations Perspective | Feb 19, 2025 | BenchmarkingDecision Making | CodeCode Available | 1 |
| COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Mar 15, 2019 | Benchmarking | CodeCode Available | 1 |
| CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics | May 6, 2025 | Benchmarking | CodeCode Available | 1 |
| CriticBench: Benchmarking LLMs for Critique-Correct Reasoning | Feb 22, 2024 | Benchmarking | CodeCode Available | 1 |
| DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios | Oct 31, 2024 | BenchmarkingLLM-generated Text Detection | CodeCode Available | 1 |
| ClearPose: Large-scale Transparent Object Dataset and Benchmark | Mar 8, 2022 | BenchmarkingDepth Completion | CodeCode Available | 1 |