| HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models | Jul 17, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| Have Large Language Models Developed a Personality?: Applicability of Self-Assessment Tests in Measuring Personality in LLMs | May 24, 2023 | Multiple-choice | —Unverified | 0 | 0 |
| Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? | Jun 6, 2024 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| Analyzing the Performance of ChatGPT in Cardiology and Vascular Pathologies | Apr 15, 2023 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information | May 9, 2025 | Benchmarking, Form | —Unverified | 0 | 0 |
| HFL-RC System at SemEval-2018 Task 11: Hybrid Multi-Aspects Model for Commonsense Reading Comprehension | Mar 15, 2018 | Multiple-choice, Reading Comprehension | —Unverified | 0 | 0 |
| Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation | Jan 12, 2025 | Attribute, Multiple-choice | —Unverified | 0 | 0 |
| HindiLLM: Large Language Model for Hindi | Dec 29, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Analyzing Multiple-Choice Reading and Listening Comprehension Tests | Jul 3, 2023 | Multiple-choice, Reading Comprehension | —Unverified | 0 | 0 |
| How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | Apr 25, 2024 | 4k, Language Modeling | —Unverified | 0 | 0 |