| Reinforcement Learning with Graph Attention for Routing and Wavelength Assignment with Lightpath Reuse | Feb 20, 2025 | Benchmarking, Graph Attention | Unverified | 0 |
| Line Goes Up? Inherent Limitations of Benchmarks for Evaluating Large Language Models | Feb 20, 2025 | Benchmarking | Unverified | 0 |
| Building reliable sim driving agents by scaling self-play | Feb 20, 2025 | Autonomous Vehicles, Benchmarking | Code Available | 4 |
| Position: There are no Champions in Long-Term Time Series Forecasting | Feb 19, 2025 | Benchmarking, Position | Unverified | 0 |
| Benchmarking Self-Supervised Learning Methods for Accelerated MRI Reconstruction | Feb 19, 2025 | Benchmarking, MRI Reconstruction | Code Available | 0 |
| Benchmarking LLMs for Political Science: A United Nations Perspective | Feb 19, 2025 | Benchmarking, Decision Making | Code Available | 1 |
| A Baseline Method for Removing Invisible Image Watermarks using Deep Image Prior | Feb 19, 2025 | Benchmarking, Misinformation | Unverified | 0 |
| GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking | Feb 19, 2025 | Benchmarking | Unverified | 0 |
| Benchmarking of Different YOLO Models for CAPTCHAs Detection and Classification | Feb 19, 2025 | Benchmarking | Unverified | 0 |
| VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare | Feb 19, 2025 | Benchmarking, Diversity | Unverified | 0 |