| FragXsiteDTI: Revealing Responsible Segments in Drug-Target Interaction with Transformer-Driven Interpretation | Nov 4, 2023 | Benchmarking, Drug Discovery | Code Available | 1 | 5 |
| AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios | Oct 25, 2024 | Benchmarking, Diversity | Code Available | 1 | 5 |
| FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions | Sep 10, 2023 | 3D Human Pose Estimation, 3D Pose Estimation | Code Available | 1 | 5 |
| ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning | Sep 27, 2024 | AutoML, Benchmarking | Code Available | 1 | 5 |
| Benchmarking Recommendation, Classification, and Tracing Based on Hugging Face Knowledge Graph | May 23, 2025 | Benchmarking, Management | Code Available | 1 | 5 |
| 3D Common Corruptions and Data Augmentation | Mar 2, 2022 | Benchmarking, Data Augmentation | Code Available | 1 | 5 |
| Continual Learning with Foundation Models: An Empirical Study of Latent Replay | Apr 30, 2022 | Benchmarking, Continual Learning | Code Available | 1 | 5 |
| AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents | Apr 9, 2024 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Quantized Neural Networks on FPGAs with FINN | Feb 2, 2021 | Benchmarking, Quantization | Code Available | 1 | 5 |
| Foundation Model of Electronic Medical Records for Adaptive Risk Estimation | Feb 10, 2025 | Benchmarking | Code Available | 1 | 5 |
| fseval: A Benchmarking Framework for Feature Selection and Feature Ranking Algorithms | Nov 23, 2022 | Automated Feature Engineering, Benchmarking | Code Available | 1 | 5 |
| Are We There Yet? Evaluating State-of-the-Art Neural Network based Geoparsers Using EUPEG as a Benchmarking Platform | Jul 15, 2020 | Articles, Benchmarking | Code Available | 1 | 5 |
| Are we really making much progress? Revisiting, benchmarking, and refining heterogeneous graph neural networks | Dec 30, 2021 | Benchmarking, Heterogeneous Node Classification | Code Available | 1 | 5 |
| From Claims to Evidence: A Unified Framework and Critical Analysis of CNN vs. Transformer vs. Mamba in Medical Image Segmentation | Mar 3, 2025 | Benchmarking, Computational Efficiency | Code Available | 1 | 5 |
| AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios | May 22, 2025 | Benchmarking, Instruction Following | Code Available | 1 | 5 |
| Benchmarking emergency department triage prediction models with machine learning and large public electronic health records | Nov 22, 2021 | Benchmarking | Code Available | 1 | 5 |
| Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs | Nov 29, 2023 | Benchmarking | Code Available | 1 | 5 |
| Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering | May 25, 2025 | Anatomy, Benchmarking | Code Available | 1 | 5 |
| ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis | Mar 9, 2021 | Benchmarking, Classification | Code Available | 1 | 5 |
| 3D AffordanceNet: A Benchmark for Visual Object Affordance Understanding | Mar 30, 2021 | Affordance Detection, Benchmarking | Code Available | 1 | 5 |
| Benchmarking Reinforcement Learning Techniques for Autonomous Navigation | Oct 10, 2022 | Autonomous Navigation, Benchmarking | Code Available | 1 | 5 |
| Formalizing Multimedia Recommendation through Multimodal Deep Learning | Sep 11, 2023 | Benchmarking, Deep Learning | Code Available | 1 | 5 |
| FTNet: Feature Transverse Network for Thermal Image Semantic Segmentation | Oct 26, 2021 | Benchmarking, Scene Segmentation | Code Available | 1 | 5 |
| Flames: Benchmarking Value Alignment of LLMs in Chinese | Nov 12, 2023 | Benchmarking, Fairness | Code Available | 1 | 5 |
| FM-Planner: Foundation Model Guided Path Planning for Autonomous Drone Navigation | May 27, 2025 | Benchmarking, Decision Making | Code Available | 1 | 5 |