| CODEMENV: Benchmarking Large Language Models on Code Migration | Jun 1, 2025 | Benchmarking | CodeCode Available | 1 |
| Curious Hierarchical Actor-Critic Reinforcement Learning | May 7, 2020 | BenchmarkingHierarchical Reinforcement Learning | CodeCode Available | 1 |
| Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering | May 25, 2025 | AnatomyBenchmarking | CodeCode Available | 1 |
| D2S: Document-to-Slide Generation Via Query-Based Text Summarization | May 8, 2021 | BenchmarkingLong Form Question Answering | CodeCode Available | 1 |
| Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Dec 15, 2022 | BenchmarkingImage Captioning | CodeCode Available | 1 |
| Benchmarking LLMs' Swarm intelligence | May 7, 2025 | Benchmarking | CodeCode Available | 1 |
| Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions | Feb 28, 2024 | BenchmarkingMultiple-choice | CodeCode Available | 1 |
| Benchmarking Multimodal Variational Autoencoders: CdSprites+ Dataset and Toolkit | Sep 7, 2022 | Benchmarking | CodeCode Available | 1 |
| Benchmarking Multi-Scene Fire and Smoke Detection | Oct 22, 2024 | Benchmarking | CodeCode Available | 1 |
| Benchmarking Large Language Models on Controllable Generation under Diversified Instructions | Jan 1, 2024 | BenchmarkingInstruction Following | CodeCode Available | 1 |