| GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies | Jun 17, 2025 | Benchmarking | Code Available | 1 | 5 |
| Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive? | Jun 15, 2023 | Autonomous Driving, Autonomous Vehicles | Code Available | 1 | 5 |
| A Reinforcement Learning Environment for Multi-Service UAV-enabled Wireless Systems | May 11, 2021 | Benchmarking, Edge-computing | Code Available | 1 | 5 |
| CharacterBench: Benchmarking Character Customization of Large Language Models | Dec 16, 2024 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Simulation-Based Inference | Jan 12, 2021 | Benchmarking | Code Available | 1 | 5 |
| CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling | Jan 21, 2024 | Benchmarking | Code Available | 1 | 5 |
| Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate | Aug 12, 2021 | Benchmarking | Code Available | 1 | 5 |
| Histo-Genomic Knowledge Distillation For Cancer Prognosis From Histopathology Whole Slide Images | Mar 15, 2024 | Benchmarking, Knowledge Distillation | Code Available | 1 | 5 |
| Benchmarking Language Models for Code Syntax Understanding | Oct 26, 2022 | Benchmarking | Code Available | 1 | 5 |
| TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction | Nov 16, 2023 | Benchmarking, Event Extraction | Code Available | 1 | 5 |