| Solving excited states for long-range interacting trapped ions with neural networks | Jun 10, 2025 | Benchmarking | Unverified | 0 |
| Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech | Jun 9, 2025 | Automatic Speech Recognition (ASR) | Unverified | 0 |
| The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning | Jun 9, 2025 | Active Learning, Benchmarking | Code Available | 0 |
| Generative Models at the Frontier of Compression: A Survey on Generative Face Video Coding | Jun 9, 2025 | Benchmarking, Video Compression | Unverified | 0 |
| REMoH: A Reflective Evolution of Multi-objective Heuristics approach via Large Language Models | Jun 9, 2025 | Benchmarking, Decision Making | Unverified | 0 |
| SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis | Jun 9, 2025 | Action Classification, Benchmarking | Unverified | 0 |
| HuSc3D: Human Sculpture dataset for 3D object reconstruction | Jun 9, 2025 | 3D Object Reconstruction, 3D Reconstruction | Code Available | 0 |
| RADAR: Benchmarking Language Models on Imperfect Tabular Data | Jun 9, 2025 | Benchmarking, Missing Values | Code Available | 1 |
| Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework | Jun 9, 2025 | Benchmarking, Fairness | Unverified | 0 |
| SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents | Jun 9, 2025 | Benchmarking, Synthetic Data Generation | Unverified | 0 |