| Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation | Sep 23, 2024 | Multiple-choice, Question Answering | Unverified | 0 |
| Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions | Sep 22, 2024 | Band Gap, In-Context Learning | Unverified | 0 |
| QMOS: Enhancing LLMs for Telecommunication with Question Masked loss and Option Shuffling | Sep 21, 2024 | Multiple-choice, Prompt Engineering | Code Available | 0 |
| First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge | Sep 20, 2024 | Multiple-choice, Question Answering | Unverified | 0 |
| Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination | Sep 19, 2024 | General Knowledge, MMLU | Unverified | 0 |
| Efficient Knowledge Distillation: Empowering Small Language Models with Teacher Model Insights | Sep 19, 2024 | Decision Making, Knowledge Distillation | Unverified | 0 |
| Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models | Sep 19, 2024 | Ethics, Multiple-choice | Code Available | 0 |
| LLM-as-a-Judge & Reward Model: What They Can and Cannot Do | Sep 17, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia | Sep 13, 2024 | Math, Multiple-choice | Unverified | 0 |
| Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement | Sep 10, 2024 | Multiple-choice, Sentence | Unverified | 0 |