| Benchmarking Deep Learning and Vision Foundation Models for Atypical vs. Normal Mitosis Classification with Cross-Dataset Evaluation | Jun 26, 2025 | Benchmarking, Transfer Learning | Code Available | 0 |
| mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale | Jun 26, 2025 | Anomaly Detection, Benchmarking | Code Available | 0 |
| Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge | Jun 26, 2025 | Benchmarking | Unverified | 0 |
| FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation | Jun 26, 2025 | Attribute, Benchmarking | Unverified | 0 |
| FixCLR: Negative-Class Contrastive Learning for Semi-Supervised Domain Generalization | Jun 25, 2025 | Benchmarking, Contrastive Learning | Unverified | 0 |
| Benchmarking Unsupervised Strategies for Anomaly Detection in Multivariate Time Series | Jun 25, 2025 | Anomaly Detection, Benchmarking | Code Available | 0 |
| Multimodal Information Retrieval for Open World with Edit Distance Weak Supervision | Jun 25, 2025 | Benchmarking, Information Retrieval | Unverified | 0 |
| scMamba: A Scalable Foundation Model for Single-Cell Multi-Omics Integration Beyond Highly Variable Feature Selection | Jun 25, 2025 | Benchmarking, Contrastive Learning | Unverified | 0 |
| A Survey of Predictive Maintenance Methods: An Analysis of Prognostics via Classification and Regression | Jun 25, 2025 | Benchmarking, Management | Unverified | 0 |
| HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction | Jun 25, 2025 | Benchmarking, Person Identification | Code Available | 0 |
| AI-Driven MRI-based Brain Tumour Segmentation Benchmarking | Jun 25, 2025 | Benchmarking, Image Segmentation | Unverified | 0 |
| BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos | Jun 25, 2025 | Artifact Detection, Benchmarking | Unverified | 0 |
| inMOTIFin: a lightweight end-to-end simulation software for regulatory sequences | Jun 25, 2025 | Benchmarking | Code Available | 0 |
| MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans | Jun 25, 2025 | Action Detection, Benchmarking | Unverified | 0 |
| Quantitative Benchmarking of Anomaly Detection Methods in Digital Pathology | Jun 24, 2025 | Anomaly Detection, Artifact Detection | Unverified | 0 |
| MDR-DeePC: Model-Inspired Distributionally Robust Data-Enabled Predictive Control | Jun 24, 2025 | Benchmarking | Unverified | 0 |
| QHackBench: Benchmarking Large Language Models for Quantum Code Generation Using PennyLane Hackathon Challenges | Jun 24, 2025 | Benchmarking, Code Generation | Unverified | 0 |
| Staining normalization in histopathology: Method benchmarking using multicenter dataset | Jun 23, 2025 | Benchmarking | Unverified | 0 |
| Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions | Jun 23, 2025 | Benchmarking, Sensitivity | Unverified | 0 |
| Generalizing Vision-Language Models to Novel Domains: A Comprehensive Survey | Jun 23, 2025 | Benchmarking, Survey | Unverified | 0 |
| Benchmarking Music Generation Models and Metrics via Human Preference Studies | Jun 23, 2025 | Benchmarking, Music Generation | Unverified | 0 |
| Survey of HPC in US Research Institutions | Jun 23, 2025 | Benchmarking, GPU | Unverified | 0 |
| Benchmarking histopathology foundation models in a multi-center dataset for skin cancer subtyping | Jun 23, 2025 | Benchmarking, Diversity | Code Available | 0 |
| Statistical Multicriteria Evaluation of LLM-Generated Text | Jun 22, 2025 | Benchmarking, Diversity | Code Available | 0 |
| On the Robustness of Human-Object Interaction Detection against Distribution Shift | Jun 22, 2025 | Benchmarking, Data Augmentation | Unverified | 0 |