| ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis | Mar 9, 2021 | BenchmarkingClassification | CodeCode Available | 1 | 5 |
| Continual Learning with Foundation Models: An Empirical Study of Latent Replay | Apr 30, 2022 | BenchmarkingContinual Learning | CodeCode Available | 1 | 5 |
| Are We There Yet? Evaluating State-of-the-Art Neural Network based Geoparsers Using EUPEG as a Benchmarking Platform | Jul 15, 2020 | ArticlesBenchmarking | CodeCode Available | 1 | 5 |
| Are we really making much progress? Revisiting, benchmarking, and refining heterogeneous graph neural networks | Dec 30, 2021 | BenchmarkingHeterogeneous Node Classification | CodeCode Available | 1 | 5 |
| From Claims to Evidence: A Unified Framework and Critical Analysis of CNN vs. Transformer vs. Mamba in Medical Image Segmentation | Mar 3, 2025 | BenchmarkingComputational Efficiency | CodeCode Available | 1 | 5 |
| FM-Planner: Foundation Model Guided Path Planning for Autonomous Drone Navigation | May 27, 2025 | BenchmarkingDecision Making | CodeCode Available | 1 | 5 |
| AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios | May 22, 2025 | BenchmarkingInstruction Following | CodeCode Available | 1 | 5 |
| FM-TS: Flow Matching for Time Series Generation | Nov 12, 2024 | BenchmarkingImputation | CodeCode Available | 1 | 5 |
| FNBench: Benchmarking Robust Federated Learning against Noisy Labels | May 10, 2025 | BenchmarkingFederated Learning | CodeCode Available | 1 | 5 |
| Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs | Nov 29, 2023 | Benchmarking | CodeCode Available | 1 | 5 |