| Probing Acoustic Representations for Phonetic Properties | Oct 25, 2020 | Benchmarking, speech-recognition | Code Available | 0 |
| Probing Conceptual Understanding of Large Visual-Language Models | Apr 7, 2023 | Benchmarking | Code Available | 0 |
| Probing Critical Learning Dynamics of PLMs for Hate Speech Detection | Feb 3, 2024 | Benchmarking, Hate Speech Detection | Code Available | 0 |
| Using Color To Identify Insider Threats | Nov 25, 2021 | Benchmarking | Code Available | 0 |
| An Exploration of Exploration: Measuring the ability of lexicase selection to find obscure pathways to optimality | Jul 20, 2021 | Benchmarking, Diagnostic | Code Available | 0 |
| Towards Computational Performance Engineering for Unsupervised Concept Drift Detection -- Complexities, Benchmarking, Performance Analysis | Apr 17, 2023 | Benchmarking, Drift Detection | Code Available | 0 |
| Transfer Learning between Motor Imagery Datasets using Deep Learning -- Validation of Framework and Comparison of Datasets | Sep 4, 2023 | Benchmarking, Motor Imagery | Code Available | 0 |
| Synth4bench: a framework for generating synthetic genomics data for the evaluation of tumor-only somatic variant calling algorithms | Mar 8, 2024 | Benchmarking, Synthetic Data Generation | Code Available | 0 |
| Process Extraction from Text: Benchmarking the State of the Art and Paving the Way for Future Challenges | Oct 7, 2021 | Benchmarking, Model extraction | Code Available | 0 |
| Transfer Learning for Prosthetics Using Imitation Learning | Jan 15, 2019 | Benchmarking, Imitation Learning | Code Available | 0 |
| Benchmarking datasets for Anomaly-based Network Intrusion Detection: KDD CUP 99 alternatives | Nov 13, 2018 | Benchmarking, Intrusion Detection | Code Available | 0 |
| Synthetic Datasets for Machine Learning on Spatio-Temporal Graphs using PDEs | Feb 6, 2025 | Benchmarking, Epidemiology | Code Available | 0 |
| Synthetic location trajectory generation using categorical diffusion models | Feb 19, 2024 | Benchmarking, Decision Making | Code Available | 0 |
| Synthetic Porous Microstructures: Automatic Design, Simulation, and Permeability Analysis | Feb 20, 2025 | Benchmarking | Code Available | 0 |
| Synthetic Time Series Forecasting with Transformer Architectures: Extensive Simulation Benchmarks | May 26, 2025 | Benchmarking, Decision Making Under Uncertainty | Code Available | 0 |
| An Experimental Study of the Transferability of Spectral Graph Networks | Dec 18, 2020 | Benchmarking, General Classification | Code Available | 0 |
| Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task | Mar 6, 2024 | Benchmarking | Code Available | 0 |
| Can LLMs replace Neil deGrasse Tyson? Evaluating the Reliability of LLMs as Science Communicators | Sep 21, 2024 | Benchmarking | Code Available | 0 |
| Benchmarking Data Heterogeneity Evaluation Approaches for Personalized Federated Learning | Oct 9, 2024 | Benchmarking, Fairness | Code Available | 0 |
| Comparing Machine Learning Algorithms by Union-Free Generic Depth | Dec 20, 2023 | Benchmarking | Code Available | 0 |
| SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists | Aug 30, 2024 | Benchmarking, Sentiment Analysis | Code Available | 0 |
| Transformation-Interaction-Rational Representation for Symbolic Regression | Apr 25, 2022 | Benchmarking, Form | Code Available | 0 |
| Towards Enhancing Fault Tolerance in Neural Networks | Jul 6, 2019 | Benchmarking | Code Available | 0 |
| Robust Model-Based Optimization for Challenging Fitness Landscapes | May 23, 2023 | Benchmarking, model | Code Available | 0 |
| Air Learning: A Deep Reinforcement Learning Gym for Autonomous Aerial Robot Visual Navigation | Jun 2, 2019 | Benchmarking, Deep Reinforcement Learning | Code Available | 0 |
| Transformers for Green Semantic Communication: Less Energy, More Semantics | Oct 11, 2023 | Benchmarking, CPU | Code Available | 0 |
| Benchmarking Data Efficiency in Δ-ML and Multifidelity Models for Quantum Chemistry | Oct 15, 2024 | Benchmarking | Code Available | 0 |
| ViP: Video Platform for PyTorch | Oct 7, 2019 | Benchmarking, Video Understanding | Code Available | 0 |
| PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language | May 15, 2025 | Benchmarking, Optical Character Recognition | Code Available | 0 |
| Comparative Study Between Distance Measures On Supervised Optimum-Path Forest Classification | Feb 8, 2022 | Anomaly Detection, Benchmarking | Code Available | 0 |
| Towards Efficient Synchronous Federated Training: A Survey on System Optimization Strategies | Sep 9, 2021 | Benchmarking, Federated Learning | Code Available | 0 |
| Which Model to Trust: Assessing the Influence of Models on the Performance of Reinforcement Learning Algorithms for Continuous Control Tasks | Oct 25, 2021 | Benchmarking, continuous-control | Code Available | 0 |
| Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation | Dec 4, 2020 | Benchmarking, Machine Translation | Code Available | 0 |
| Comparative Analysis: Violence Recognition from Videos using Transfer Learning | Aug 26, 2024 | Action Recognition, Benchmarking | Code Available | 0 |
| Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study | Sep 3, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis | Mar 18, 2025 | Benchmarking, Drug Response Prediction | Code Available | 0 |
| Compact Trilinear Interaction for Visual Question Answering | Sep 26, 2019 | Benchmarking, Knowledge Distillation | Code Available | 0 |
| Benchmarking Classic and Learned Navigation in Complex 3D Environments | Jan 30, 2019 | Benchmarking | Code Available | 0 |
| An Experimental Evaluation of Imputation Models for Spatial-Temporal Traffic Data | Dec 6, 2024 | Benchmarking, Imputation | Code Available | 0 |
| Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models | Jun 15, 2024 | Benchmarking, Data Augmentation | Code Available | 0 |
| VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models | Feb 23, 2025 | Benchmarking, Spatial Reasoning | Code Available | 0 |
| ColorGrid: A Multi-Agent Non-Stationary Environment for Goal Inference and Assistance | Jan 17, 2025 | Benchmarking, Multi-agent Reinforcement Learning | Code Available | 0 |
| CODES: Benchmarking Coupled ODE Surrogates | Oct 28, 2024 | Benchmarking, Uncertainty Quantification | Code Available | 0 |
| CodeS: Towards Code Model Generalization Under Distribution Shift | Jun 11, 2022 | Benchmarking, Code Classification | Code Available | 0 |
| Code Ownership in Open-Source AI Software Security | Dec 18, 2023 | Benchmarking | Code Available | 0 |
| Benchmarking ChatGPT on Algorithmic Reasoning | Apr 4, 2024 | Benchmarking | Code Available | 0 |
| COCO: Performance Assessment | May 11, 2016 | Benchmarking | Code Available | 0 |
| Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2) | Apr 5, 2024 | Benchmarking | Code Available | 0 |
| Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology | Apr 24, 2023 | Benchmarking, Decision Making | Code Available | 0 |
| Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs | May 21, 2025 | Benchmarking, Question Answering | Code Available | 0 |