| HyBiomass: Global Hyperspectral Imagery Benchmark Dataset for Evaluating Geospatial Foundation Models in Forest Aboveground Biomass Estimation | Jun 12, 2025 | Benchmarking | —Unverified | 0 |
| Primender Sequence: A Novel Mathematical Construct for Testing Symbolic Inference and AI Reasoning | Jun 12, 2025 | Benchmarking | —Unverified | 0 |
| ScholarSearch: Benchmarking Scholar Searching Ability of LLMs | Jun 11, 2025 | Benchmarking, Information Retrieval | —Unverified | 0 |
| HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios | Jun 11, 2025 | Action Recognition, Action Segmentation | Code Available | 0 |
| FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models | Jun 11, 2025 | Benchmarking, Federated Learning | —Unverified | 0 |
| Bench to the Future: A Pastcasting Benchmark for Forecasting Agents | Jun 11, 2025 | Benchmarking | —Unverified | 0 |
| ICE-ID: A Novel Historical Census Data Benchmark Comparing NARS against LLMs, & a ML Ensemble on Longitudinal Identity Resolution | Jun 11, 2025 | Benchmarking | —Unverified | 0 |
| Reasoning as a Resource: Optimizing Fast and Slow Thinking in Code Generation Models | Jun 11, 2025 | Benchmarking, Code Generation | —Unverified | 0 |
| GRAIL: A Benchmark for GRaph ActIve Learning in Dynamic Sensing Environments | Jun 11, 2025 | Active Learning, Benchmarking | —Unverified | 0 |
| A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild | Jun 11, 2025 | Age Estimation, Benchmarking | Code Available | 0 |