| IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments | Jun 11, 2025 | Benchmarking | Code Available | 2 |
| FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models | Jun 11, 2025 | Benchmarking, Federated Learning | Unverified | 0 |
| Attention, Please! Revisiting Attentive Probing for Masked Image Modeling | Jun 11, 2025 | Benchmarking, Computational Efficiency | Code Available | 1 |
| A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild | Jun 11, 2025 | Age Estimation, Benchmarking | Code Available | 0 |
| GRAIL: A Benchmark for GRaph ActIve Learning in Dynamic Sensing Environments | Jun 11, 2025 | Active Learning, Benchmarking | Unverified | 0 |
| Graph Attention-based Decentralized Actor-Critic for Dual-Objective Control of Multi-UAV Swarms | Jun 10, 2025 | Benchmarking, Graph Attention | Unverified | 0 |
| scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data | Jun 10, 2025 | Benchmarking, Data Augmentation | Code Available | 1 |
| CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling | Jun 10, 2025 | Benchmarking | Code Available | 1 |
| AraReasoner: Evaluating Reasoning-Based LLMs for Arabic NLP | Jun 10, 2025 | Benchmarking, Sentiment Analysis | Unverified | 0 |
| Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens | Jun 10, 2025 | Benchmarking, Mathematical Reasoning | Unverified | 0 |