| Mind the XAI Gap: A Human-Centered LLM Framework for Democratizing Explainable AI | Jun 13, 2025 | Benchmarking, In-Context Learning | Code Available | 0 |
| crossMoDA Challenge: Evolution of Cross-Modality Domain Adaptation Techniques for Vestibular Schwannoma and Cochlea Segmentation from 2021 to 2023 | Jun 13, 2025 | Benchmarking, Domain Adaptation | Unverified | 0 |
| Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables | Jun 13, 2025 | Benchmarking, Descriptive | Unverified | 0 |
| OIBench: Benchmarking Strong Reasoning Models with Olympiad in Informatics | Jun 12, 2025 | Benchmarking | Unverified | 0 |
| HyBiomass: Global Hyperspectral Imagery Benchmark Dataset for Evaluating Geospatial Foundation Models in Forest Aboveground Biomass Estimation | Jun 12, 2025 | Benchmarking | Unverified | 0 |
| Primender Sequence: A Novel Mathematical Construct for Testing Symbolic Inference and AI Reasoning | Jun 12, 2025 | Benchmarking | Unverified | 0 |
| Sum Rate Maximization for Pinching Antennas Assisted RSMA System With Multiple Waveguides | Jun 12, 2025 | Benchmarking | Unverified | 0 |
| FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models | Jun 11, 2025 | Benchmarking, Federated Learning | Unverified | 0 |
| HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios | Jun 11, 2025 | Action Recognition, Action Segmentation | Code Available | 0 |
| ScholarSearch: Benchmarking Scholar Searching Ability of LLMs | Jun 11, 2025 | Benchmarking, Information Retrieval | Unverified | 0 |
| ICE-ID: A Novel Historical Census Data Benchmark Comparing NARS against LLMs, & a ML Ensemble on Longitudinal Identity Resolution | Jun 11, 2025 | Benchmarking | Unverified | 0 |
| Bench to the Future: A Pastcasting Benchmark for Forecasting Agents | Jun 11, 2025 | Benchmarking | Unverified | 0 |
| Reasoning as a Resource: Optimizing Fast and Slow Thinking in Code Generation Models | Jun 11, 2025 | Benchmarking, Code Generation | Unverified | 0 |
| GRAIL: A Benchmark for GRaph ActIve Learning in Dynamic Sensing Environments | Jun 11, 2025 | Active Learning, Benchmarking | Unverified | 0 |
| A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild | Jun 11, 2025 | Age Estimation, Benchmarking | Code Available | 0 |
| Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens | Jun 10, 2025 | Benchmarking, Mathematical Reasoning | Unverified | 0 |
| Graph Attention-based Decentralized Actor-Critic for Dual-Objective Control of Multi-UAV Swarms | Jun 10, 2025 | Benchmarking, Graph Attention | Unverified | 0 |
| AraReasoner: Evaluating Reasoning-Based LLMs for Arabic NLP | Jun 10, 2025 | Benchmarking, Sentiment Analysis | Unverified | 0 |
| Solving excited states for long-range interacting trapped ions with neural networks | Jun 10, 2025 | Benchmarking | Unverified | 0 |
| Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech | Jun 9, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework | Jun 9, 2025 | Benchmarking, Fairness | Unverified | 0 |
| GradEscape: A Gradient-Based Evader Against AI-Generated Text Detectors | Jun 9, 2025 | Benchmarking, Model extraction | Unverified | 0 |
| Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting | Jun 9, 2025 | Benchmarking, Decision Making | Unverified | 0 |
| The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning | Jun 9, 2025 | Active Learning, Benchmarking | Code Available | 0 |
| Generative Models at the Frontier of Compression: A Survey on Generative Face Video Coding | Jun 9, 2025 | Benchmarking, Video Compression | Unverified | 0 |