| Self-Evaluation Improves Selective Generation in Large Language Models | Dec 14, 2023 | Multiple-choice, TruthfulQA | Unverified | 0 |
| A Foundational Multimodal Vision Language AI Assistant for Human Pathology | Dec 13, 2023 | Decision Making, Diagnostic | Unverified | 0 |
| A Comparative Study of AI-Generated (GPT-4) and Human-crafted MCQs in Programming Education | Dec 5, 2023 | Multiple-choice | Unverified | 0 |
| Unleashing the Potential of Large Language Model: Zero-shot VQA for Flood Disaster Scenario | Dec 4, 2023 | Language Modelling | Unverified | 0 |
| Explanatory Argument Extraction of Correct Answers in Resident Medical Exams | Dec 1, 2023 | Multiple-choice | Code Available | 0 |
| Evaluating the Rationale Understanding of Critical Reasoning in Logical Reading Comprehension | Nov 30, 2023 | Multiple-choice, Reading Comprehension | Unverified | 0 |
| CLOMO: Counterfactual Logical Modification with Large Language Models | Nov 29, 2023 | Counterfactual, Counterfactual Reasoning | Code Available | 0 |
| ConceptPsy: A Benchmark Suite with Conceptual Comprehensiveness in Psychology | Nov 16, 2023 | MMLU, Multiple-choice | Unverified | 0 |
| Investigating Data Contamination in Modern Benchmarks for Large Language Models | Nov 16, 2023 | Common Sense Reasoning, MMLU | Unverified | 0 |
| Downstream Trade-offs of a Family of Text Watermarks | Nov 16, 2023 | Form, Language Modelling | Code Available | 0 |