| Statistical Multicriteria Evaluation of LLM-Generated Text | Jun 22, 2025 | Benchmarking, Diversity | Code Available | 0 |
| Leveling the Playing Field: Carefully Comparing Classical and Learned Controllers for Quadrotor Trajectory Tracking | Jun 21, 2025 | Benchmarking, Reinforcement Learning (RL) | Unverified | 0 |
| Universal Music Representations? Evaluating Foundation Models on World Music Corpora | Jun 20, 2025 | Benchmarking, Few-Shot Learning | Code Available | 0 |
| A Comparative Analysis of Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) as Dimensionality Reduction Techniques | Jun 20, 2025 | Benchmarking, Dimensionality Reduction | Unverified | 0 |
| OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents | Jun 19, 2025 | Benchmarking | Unverified | 0 |
| Spotting tell-tale visual artifacts in face swapping videos: strengths and pitfalls of CNN detectors | Jun 19, 2025 | Benchmarking, Face Swapping | Unverified | 0 |
| Finance Language Model Evaluation (FLaME) | Jun 18, 2025 | Benchmarking, Language Model Evaluation | Unverified | 0 |
| PGLib-CO2: A Power Grid Library for Computing and Optimizing Carbon Emissions | Jun 17, 2025 | Benchmarking | Unverified | 0 |
| A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning | Jun 17, 2025 | Benchmarking, Self-Supervised Learning | Unverified | 0 |
| ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge | Jun 17, 2025 | Benchmarking, Retrieval | Code Available | 0 |