| NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates | Oct 28, 2024 | Benchmarking | Code Available | 0 |
| A comparison of translation performance between DeepL and Supertext | Feb 4, 2025 | Benchmarking, Machine Translation | Code Available | 0 |
| Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework | Feb 20, 2025 | Benchmarking, Question Answering | Code Available | 0 |
| Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program | Apr 9, 2025 | Benchmarking | Code Available | 0 |
| Benchmarking Machine Translation with Cultural Awareness | May 23, 2023 | Benchmarking, In-Context Learning | Code Available | 0 |
| Benchmarking Multilabel Topic Classification in the Kyrgyz Language | Aug 30, 2023 | Benchmarking, Classification | Code Available | 0 |
| Unsupervised Tracklet Person Re-Identification | Mar 1, 2019 | Benchmarking, Domain Adaptation | Code Available | 0 |
| Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning | Nov 15, 2019 | Benchmarking, Diversity | Code Available | 0 |
| TMPNN: High-Order Polynomial Regression Based on Taylor Map Factorization | Jul 30, 2023 | Benchmarking, Multi-target regression | Code Available | 0 |
| Nmbr9 as a Constraint Programming Challenge | Jan 13, 2020 | Benchmarking, Board Games | Code Available | 0 |