| Coarse-to-Fine Q-attention with Learned Path Ranking | Apr 4, 2022 | Benchmarking | Code Available | 1 | 5 |
| Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | May 22, 2023 | Benchmarking, Hallucination | Code Available | 1 | 5 |
| CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Dec 7, 2022 | Benchmarking | Code Available | 1 | 5 |
| KPI-EDGAR: A Novel Dataset and Accompanying Metric for Relation Extraction from Financial Documents | Oct 17, 2022 | Benchmarking, Joint Entity and Relation Extraction | Code Available | 1 | 5 |
| Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations | Mar 21, 2024 | Benchmarking, Memorization | Code Available | 1 | 5 |
| COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Mar 15, 2019 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Meaning Representations in Neural Semantic Parsing | Nov 1, 2020 | Benchmarking, Semantic Parsing | Code Available | 1 | 5 |
| Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study | Dec 30, 2021 | Attribute, Benchmarking | Code Available | 1 | 5 |
| Benchmarking Classical and Learning-Based Multibeam Point Cloud Registration | May 10, 2024 | Benchmarking, Point Cloud Registration | Code Available | 1 | 5 |
| Benchmarking Meta-embeddings: What Works and What Does Not | Nov 1, 2021 | Benchmarking, Embeddings Evaluation | Code Available | 1 | 5 |