| Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges | Mar 11, 2025 | Benchmarking | Code Available | 0 |
| Reference Matters: Benchmarking Factual Error Correction for Dialogue Summarization with Fine-grained Evaluation Framework | Jun 8, 2023 | Benchmarking | Code Available | 0 |
| KArSL: Arabic Sign Language Database | Jan 1, 2021 | Benchmarking, Sign Language Recognition | Code Available | 0 |
| Can AI Validate Science? Benchmarking LLMs for Accurate Scientific Claim Evidence Reasoning | Jun 9, 2025 | Benchmarking, Diagnostic | Code Available | 0 |
| JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models | Jun 10, 2024 | Benchmarking, Code Generation | Code Available | 0 |
| TGB-Seq Benchmark: Challenging Temporal GNNs with Complex Sequential Dynamics | Feb 5, 2025 | Benchmarking, Link Prediction | Code Available | 0 |
| Refining Joint Text and Source Code Embeddings for Retrieval Task with Parameter-Efficient Fine-Tuning | May 7, 2024 | Benchmarking, Contrastive Learning | Code Available | 0 |
| KamNet: An Integrated Spatiotemporal Deep Neural Network for Rare Event Search in KamLAND-Zen | Mar 3, 2022 | Benchmarking | Code Available | 0 |
| Joint Multi-Scale Tone Mapping and Denoising for HDR Image Enhancement | Mar 16, 2023 | Benchmarking, Demosaicking | Code Available | 0 |
| Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models | Jul 13, 2025 | Attribute, Benchmarking | Code Available | 0 |