| Benchmarking LLMs' Swarm intelligence | May 7, 2025 | Benchmarking | CodeCode Available | 1 |
| Combinatorial Optimization with Policy Adaptation using Latent Space Search | Nov 13, 2023 | BenchmarkingCombinatorial Optimization | CodeCode Available | 1 |
| Data Generating Process to Evaluate Causal Discovery Techniques for Time Series Data | Apr 16, 2021 | BenchmarkingCausal Discovery | CodeCode Available | 1 |
| Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data | Feb 27, 2024 | Benchmarking | CodeCode Available | 1 |
| Benchmarking Low-Shot Robustness to Natural Distribution Shifts | Apr 21, 2023 | Benchmarking | CodeCode Available | 1 |
| Are we really making much progress? Revisiting, benchmarking, and refining heterogeneous graph neural networks | Dec 30, 2021 | BenchmarkingHeterogeneous Node Classification | CodeCode Available | 1 |
| From Claims to Evidence: A Unified Framework and Critical Analysis of CNN vs. Transformer vs. Mamba in Medical Image Segmentation | Mar 3, 2025 | BenchmarkingComputational Efficiency | CodeCode Available | 1 |
| Are We There Yet? Evaluating State-of-the-Art Neural Network based Geoparsers Using EUPEG as a Benchmarking Platform | Jul 15, 2020 | ArticlesBenchmarking | CodeCode Available | 1 |
| Deep Learning-Based Synchronization for Uplink NB-IoT | May 22, 2022 | BenchmarkingDeep Learning | CodeCode Available | 1 |
| AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents | Apr 9, 2024 | Benchmarking | CodeCode Available | 1 |
| DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 | Mar 20, 2023 | BenchmarkingDe-identification | CodeCode Available | 1 |
| Benchmarking Large Language Models for News Summarization | Jan 31, 2023 | BenchmarkingNews Summarization | CodeCode Available | 1 |
| Benchmarking machine learning models on multi-centre eICU critical care dataset | Oct 2, 2019 | BenchmarkingBIG-bench Machine Learning | CodeCode Available | 1 |
| 3D Common Corruptions and Data Augmentation | Mar 2, 2022 | BenchmarkingData Augmentation | CodeCode Available | 1 |
| Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks | Aug 18, 2019 | BenchmarkingImage Classification | CodeCode Available | 1 |
| Depth-Driven Geometric Prompt Learning for Laparoscopic Liver Landmark Detection | Jun 25, 2024 | BenchmarkingPrompt Learning | CodeCode Available | 1 |
| Benchmarking Multi-Scene Fire and Smoke Detection | Oct 22, 2024 | Benchmarking | CodeCode Available | 1 |
| CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Dec 7, 2022 | Benchmarking | CodeCode Available | 1 |
| Benchmarking Meaning Representations in Neural Semantic Parsing | Nov 1, 2020 | BenchmarkingSemantic Parsing | CodeCode Available | 1 |
| ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning | Sep 27, 2024 | AutoMLBenchmarking | CodeCode Available | 1 |
| Benchmarking Meta-embeddings: What Works and What Does Not | Nov 1, 2021 | BenchmarkingEmbeddings Evaluation | CodeCode Available | 1 |
| AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios | Oct 25, 2024 | BenchmarkingDiversity | CodeCode Available | 1 |
| Benchmarking Micro-action Recognition: Dataset, Methods, and Applications | Mar 8, 2024 | Action RecognitionBenchmarking | CodeCode Available | 1 |
| DFGC 2022: The Second DeepFake Game Competition | Jun 30, 2022 | BenchmarkingFace Swapping | CodeCode Available | 1 |
| CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation | Feb 26, 2025 | BenchmarkingCode Generation | CodeCode Available | 1 |