| GPT as Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities | Jan 11, 2023 | Multiple-choice | Code Available | 1 | 5 |
| Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs | Mar 12, 2024 | Knowledge Graphs, Multiple-choice | Code Available | 1 | 5 |
| Annealed Winner-Takes-All for Motion Forecasting | Sep 17, 2024 | All, Autonomous Driving | Code Available | 1 | 5 |
| CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models | Nov 27, 2024 | Benchmarking, Earth Observation | Code Available | 1 | 5 |
| An Open Source Data Contamination Report for Large Language Models | Oct 26, 2023 | HellaSwag, Language Modeling | Code Available | 1 | 5 |
| Constructing Narrative Event Evolutionary Graph for Script Event Prediction | May 14, 2018 | Graph Neural Network, Multiple-choice | Code Available | 1 | 5 |
| Ranked Voting based Self-Consistency of Large Language Models | May 16, 2025 | Multiple-choice, Open-Ended Question Answering | Code Available | 1 | 5 |
| Fine-tuning Multimodal Large Language Models for Product Bundling | Jul 16, 2024 | In-Context Learning, Multiple-choice | Code Available | 1 | 5 |
| CUPCase: Clinically Uncommon Patient Cases and Diagnoses Dataset | Mar 8, 2025 | Multiple-choice | Code Available | 1 | 5 |
| CC-Riddle: A Question Answering Dataset of Chinese Character Riddles | Jun 28, 2022 | General Knowledge, Language Modelling | Code Available | 1 | 5 |