| On the Reasoning Capacity of AI Models and How to Quantify It | Jan 23, 2025 | Memorization, MMLU | Unverified | 0 |
| The AI Penalization Effect: People Reduce Compensation for Workers Who Use AI | Jan 22, 2025 | Multiple-choice | Unverified | 0 |
| Patent Figure Classification using Large Vision-language Models | Jan 22, 2025 | Classification, Few-Shot Learning | Code Available | 0 |
| Generating Plausible Distractors for Multiple-Choice Questions via Student Choice Prediction | Jan 21, 2025 | Distractor Generation, Misconceptions | Unverified | 0 |
| MedS^3: Towards Medical Small Language Models with Self-Evolved Slow Thinking | Jan 21, 2025 | Multiple-choice | Code Available | 2 |
| Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No! | Jan 18, 2025 | Multiple-choice, Question Answering | Unverified | 0 |
| FaceXBench: Evaluating Multimodal LLMs on Face Understanding | Jan 17, 2025 | Fairness, Multiple-choice | Code Available | 1 |
| Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong | Jan 16, 2025 | Multiple-choice | Unverified | 0 |
| Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework | Jan 16, 2025 | Multiple-choice, Question Generation | Unverified | 0 |
| Vision-Language Models Do Not Understand Negation | Jan 16, 2025 | Multiple-choice, Negation | Unverified | 0 |