| Latxa: An Open Language Model and Evaluation Suite for Basque | Mar 29, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Assessing the Chemical Intelligence of Large Language Models | May 12, 2025 | Multiple-choice | Code Available | 1 | 5 |
| Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework | May 22, 2025 | Multiple-choice, Visual Question Answering (VQA) | Code Available | 1 | 5 |
| Leveraging Large Language Models for Learning Complex Legal Concepts through Storytelling | Feb 26, 2024 | Multiple-choice | Code Available | 1 | 5 |
| LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models | Aug 20, 2023 | Multiple-choice, Question Answering | Code Available | 1 | 5 |
| LifeQA: A Real-life Dataset for Video Question Answering | May 1, 2020 | Multiple-choice, Question Answering | Code Available | 1 | 5 |
| A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning | Oct 1, 2024 | Common Sense Reasoning, DeepFake Detection | Code Available | 1 | 5 |
| FarsTail: A Persian Natural Language Inference Dataset | Sep 18, 2020 | Multiple-choice, Natural Language Inference | Code Available | 1 | 5 |
| FaceXBench: Evaluating Multimodal LLMs on Face Understanding | Jan 17, 2025 | Fairness, Multiple-choice | Code Available | 1 | 5 |
| Fake Alignment: Are LLMs Really Aligned Well? | Nov 10, 2023 | Multiple-choice | Code Available | 1 | 5 |