| Can AI Master Construction Management (CM)? Benchmarking State-of-the-Art Large Language Models on CM Certification Exams | Apr 4, 2025 | Benchmarking, Management | —Unverified | 0 |
| EQUATOR: A Deterministic Framework for Evaluating LLM Reasoning with Open-Ended Questions. # v1.0.0-beta | Dec 31, 2024 | Multiple-choice, Question Answering | —Unverified | 0 |
| Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation | Jan 1, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| ExplanationLP: Abductive Reasoning for Explainable Science Question Answering | Oct 25, 2020 | Answer Selection, ARC | —Unverified | 0 |
| Can ChatGPT pass the Vietnamese National High School Graduation Examination? | Jun 15, 2023 | Language Modeling, Language Modelling | —Unverified | 0 |
| Answering questions by learning to rank -- Learning to rank by answering questions | Sep 2, 2019 | ARC, Learning-To-Rank | —Unverified | 0 |
| Explore then Determine: A GNN-LLM Synergy Framework for Reasoning over Knowledge Graph | Jun 3, 2024 | Knowledge Graphs, Multiple-choice | —Unverified | 0 |
| Can Crowdsourcing be used for Effective Annotation of Arabic? | May 1, 2014 | Entity Resolution, Multiple-choice | —Unverified | 0 |
| Generalised Winograd Schema and its Contextuality | Aug 31, 2023 | coreference-resolution, Coreference Resolution | —Unverified | 0 |
| Enhancing Multiple-Choice Question Answering with Causal Knowledge | Jun 1, 2021 | Multiple-choice, Question Answering | —Unverified | 0 |