| Introducing Flexible Monotone Multiple Choice Item Response Theory Models and Bit Scales | Oct 2, 2024 | Multiple-choice | Code Available | 0 |
| MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework | Oct 2, 2024 | Benchmarking, Instruction Following | Code Available | 1 |
| DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models | Oct 2, 2024 | Multiple-choice, parameter-efficient fine-tuning | Code Available | 0 |
| Language Enhanced Model for Eye (LEME): An Open-Source Ophthalmology-Specific Large Language Model | Oct 1, 2024 | All, Language Modeling | Unverified | 0 |
| A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning | Oct 1, 2024 | Common Sense Reasoning, DeepFake Detection | Code Available | 1 |
| Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling | Sep 30, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs | Sep 30, 2024 | Benchmarking, Multiple-choice | Unverified | 0 |
| Mitigating Selection Bias with Node Pruning and Auxiliary Options | Sep 27, 2024 | Multiple-choice, Selection bias | Unverified | 0 |
| DisGeM: Distractor Generation for Multiple Choice Questions with Span Masking | Sep 26, 2024 | Distractor Generation, Multiple-choice | Code Available | 0 |
| DARE: Diverse Visual Question Answering with Robustness Evaluation | Sep 26, 2024 | image-classification, Image Classification | Unverified | 0 |