| M4-SAR: A Multi-Resolution, Multi-Polarization, Multi-Scene, Multi-Source Dataset and Benchmark for Optical-SAR Fusion Object Detection | May 16, 2025 | Benchmarkingobject-detection | CodeCode Available | 1 |
| MatTools: Benchmarking Large Language Models for Materials Science Tools | May 16, 2025 | BenchmarkingQuestion Answering | CodeCode Available | 1 |
| Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests | May 15, 2025 | BenchmarkingDeep Reinforcement Learning | CodeCode Available | 1 |
| Words That Unite The World: A Unified Framework for Deciphering Central Bank Communications Globally | May 15, 2025 | BenchmarkingSentence | CodeCode Available | 1 |
| OpenLKA: An Open Dataset of Lane Keeping Assist from Recent Car Models under Real-world Driving Conditions | May 14, 2025 | Autonomous DrivingBenchmarking | CodeCode Available | 1 |
| Towards scalable surrogate models based on Neural Fields for large scale aerodynamic simulations | May 14, 2025 | Benchmarking | CodeCode Available | 1 |
| Benchmarking AI scientists in omics data-driven biological research | May 13, 2025 | BenchmarkingMultiple-choice | CodeCode Available | 1 |
| JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes | May 10, 2025 | BenchmarkingGPU | CodeCode Available | 1 |
| FNBench: Benchmarking Robust Federated Learning against Noisy Labels | May 10, 2025 | BenchmarkingFederated Learning | CodeCode Available | 1 |
| Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | May 8, 2025 | BenchmarkingPrompt Engineering | CodeCode Available | 1 |