| Causality-enhanced Decision-Making for Autonomous Mobile Robots in Dynamic Environments | Apr 16, 2025 | Benchmarking, Causal Inference | Code Available | 0 | 5 |
| Benchmarking Apache Spark and Hadoop MapReduce on Big Data Classification | Sep 21, 2022 | Benchmarking, Management | Code Available | 0 | 5 |
| Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs | May 26, 2025 | Benchmarking, Fault localization | Code Available | 0 | 5 |
| PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series | Nov 21, 2024 | Anomaly Detection, Benchmarking | Code Available | 0 | 5 |
| DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs | Apr 10, 2024 | Benchmarking, knowledge editing | Code Available | 0 | 5 |
| Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning | May 19, 2025 | Benchmarking | Code Available | 0 | 5 |
| Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs | May 29, 2025 | Benchmarking, Fairness | Code Available | 0 | 5 |
| ISImed: A Framework for Self-Supervised Learning using Intrinsic Spatial Information in Medical Images | Oct 22, 2024 | Benchmarking, Self-Supervised Learning | Code Available | 0 | 5 |
| Anchor Points: Benchmarking Models with Much Fewer Examples | Sep 14, 2023 | Benchmarking, Language Modeling | Code Available | 0 | 5 |
| Benchmarking a transformer-FREE model for ad-hoc retrieval | Apr 1, 2021 | Benchmarking, CPU | Code Available | 0 | 5 |
| An Auditing Test To Detect Behavioral Shift in Language Models | Oct 25, 2024 | Benchmarking, Change Detection | Code Available | 0 | 5 |
| IoT Data Trust Evaluation via Machine Learning | Aug 15, 2023 | Benchmarking, Time Series | Code Available | 0 | 5 |
| VitaGraph: Building a Knowledge Graph for Biologically Relevant Learning Tasks | May 16, 2025 | Benchmarking, Link Prediction | Code Available | 0 | 5 |
| IPC: A Benchmark Data Set for Learning with Graph-Structured Data | May 15, 2019 | Benchmarking, Graph Classification | Code Available | 0 | 5 |
| Capsule Vision 2024 Challenge: Multi-Class Abnormality Classification for Video Capsule Endoscopy | Aug 9, 2024 | Benchmarking, Medical Image Analysis | Code Available | 0 | 5 |
| Learning collective multi-cellular dynamics from temporal scRNA-seq via a transformer-enhanced Neural SDE | May 22, 2025 | Benchmarking, Time Series | Code Available | 0 | 5 |
| InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions | Oct 18, 2023 | Benchmarking, Visual Grounding | Code Available | 0 | 5 |
| An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science | Feb 23, 2025 | Benchmarking, Code Generation | Code Available | 0 | 5 |
| Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance | Sep 22, 2024 | AutoML, Benchmarking | Code Available | 0 | 5 |
| Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study | Feb 11, 2024 | Anomaly Detection, Benchmarking | Code Available | 0 | 5 |
| Inverse Contextual Bandits: Learning How Behavior Evolves over Time | Jul 13, 2021 | Benchmarking, Decision Making | Code Available | 0 | 5 |
| CityNet: A Comprehensive Multi-Modal Urban Dataset for Advanced Research in Urban Computing | Jun 30, 2021 | Benchmarking, Transfer Learning | Code Available | 0 | 5 |
| City-Scale Road Audit System using Deep Learning | Nov 26, 2018 | Benchmarking, Deep Learning | Code Available | 0 | 5 |
| Cityscape-Adverse: Benchmarking Robustness of Semantic Segmentation with Realistic Scene Modifications via Diffusion-Based Image Editing | Nov 1, 2024 | Benchmarking, Semantic Segmentation | Code Available | 0 | 5 |
| IOLBENCH: Benchmarking LLMs on Linguistic Reasoning | Jan 8, 2025 | Benchmarking | Code Available | 0 | 5 |