| When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards | Feb 1, 2024 | Answer Selection, Language Modeling | Code Available | 0 |
| An Information-Theoretic Approach to Analyze NLP Classification Tasks | Feb 1, 2024 | Multiple-choice, Reading Comprehension | Code Available | 0 |
| Evaluating LLM -- Generated Multimodal Diagnosis from Medical Images and Symptom Analysis | Jan 28, 2024 | Knowledge Graphs, Medical Diagnosis | Unverified | 0 |
| Towards Collective Superintelligence: Amplifying Group IQ using Conversational Swarms | Jan 25, 2024 | Decision Making, Multiple-choice | Unverified | 0 |
| Instruction Fine-Tuning: Does Prompt Loss Matter? | Jan 24, 2024 | Multiple-choice, token-classification | Unverified | 0 |
| What Large Language Models Know and What People Think They Know | Jan 24, 2024 | Articles, Decision Making | Unverified | 0 |
| Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings | Jan 15, 2024 | Knowledge Graph Embeddings, Knowledge Graphs | Code Available | 0 |
| A Study on Large Language Models' Limitations in Multiple-Choice Question Answering | Jan 15, 2024 | Multiple-choice, Question Answering | Code Available | 0 |
| Assessing Large Language Models in Mechanical Engineering Education: A Study on Mechanics-Focused Conceptual Understanding | Jan 13, 2024 | Multiple-choice, Prompt Engineering | Unverified | 0 |
| Automated Answer Validation using Text Similarity | Jan 13, 2024 | Information Retrieval, Multiple-choice | Unverified | 0 |