| TAXI: Evaluating Categorical Knowledge Editing for Language Models | Apr 23, 2024 | knowledge editingMultiple-choice | CodeCode Available | 0 |
| WiCkeD: A Simple Method to Make Multiple Choice Benchmarks More Challenging | Feb 25, 2025 | MMLUMultiple-choice | CodeCode Available | 0 |
| What Makes Reading Comprehension Questions Easier? | Aug 28, 2018 | Machine Reading ComprehensionMultiple-choice | CodeCode Available | 0 |
| Downstream Trade-offs of a Family of Text Watermarks | Nov 16, 2023 | FormLanguage Modelling | CodeCode Available | 0 |
| Teach2Eval: An Indirect Evaluation Method for LLM by Judging How It Teaches | May 18, 2025 | FairnessMemorization | CodeCode Available | 0 |
| A multimodal dataset for understanding the impact of mobile phones on remote online virtual education | Dec 13, 2024 | EEGHead Pose Estimation | CodeCode Available | 0 |
| Utilize the Flow before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning | Oct 9, 2024 | HallucinationMultiple-choice | CodeCode Available | 0 |
| Can Model Uncertainty Function as a Proxy for Multiple-Choice Question Item Difficulty? | Jul 7, 2024 | Multiple-choice | CodeCode Available | 0 |
| Differentiating Choices via Commonality for Multiple-Choice Question Answering | Aug 21, 2024 | Multiple-choiceMultiple Choice Question Answering (MCQA) | CodeCode Available | 0 |
| Utilizing Background Knowledge for Robust Reasoning over Traffic Situations | Dec 4, 2022 | Knowledge GraphsMultiple-choice | CodeCode Available | 0 |