| Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla | Jul 18, 2023 | Multiple-choice, Question Answering | —Unverified | 0 |
| Benchmarking Bias in Large Language Models during Role-Playing | Nov 1, 2024 | Benchmarking, Fairness | —Unverified | 0 |
| Document-level Event Factuality Identification via Machine Reading Comprehension Frameworks with Transfer Learning | Oct 1, 2022 | Data Augmentation, Machine Reading Comprehension | —Unverified | 0 |
| DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain | Apr 18, 2025 | Multiple-choice | —Unverified | 0 |
| A Corpus of Text Data and Gaze Fixations from Autistic and Non-Autistic Adults | May 1, 2016 | Multiple-choice, POS | —Unverified | 0 |
| DiverseNet: When One Right Answer is not Enough | Aug 24, 2020 | Multiple-choice, Structured Prediction | —Unverified | 0 |
| Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets | Apr 24, 2017 | Multiple-choice, Question Answering | —Unverified | 0 |
| Language Models (Mostly) Know What They Know | Jul 11, 2022 | Multiple-choice | —Unverified | 0 |
| LAR-ECHR: A New Legal Argument Reasoning Task and Dataset for Cases of the European Court of Human Rights | Oct 17, 2024 | Legal Reasoning, Multiple-choice | —Unverified | 0 |
| Distributional semantics beyond words: Supervised learning of analogy and paraphrase | Oct 18, 2013 | Multiple-choice, Task 2 | —Unverified | 0 |