| LEAVS: An LLM-based Labeler for Abdominal CT Supervision | Mar 17, 2025 | Anatomy, Large Language Model | Code Available | 0 |
| MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research | Mar 17, 2025 | Articles, Benchmarking | Code Available | 1 |
| Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data | Mar 13, 2025 | Large Language Model, Math | —Unverified | 0 |
| It is Too Many Options: Pitfalls of Multiple-Choice Questions in Generative AI and Medical Education | Mar 13, 2025 | Multiple-choice | —Unverified | 0 |
| The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory | Mar 13, 2025 | Math, Multiple-choice | —Unverified | 0 |
| SeqSAM: Autoregressive Multiple Hypothesis Prediction for Medical Image Segmentation using SAM | Mar 12, 2025 | Image Segmentation, Medical Image Segmentation | Code Available | 0 |
| Mellow: a small audio language model for reasoning | Mar 11, 2025 | Audio captioning, Language Modeling | Code Available | 2 |
| Identity Lock: Locking API Fine-tuned LLMs With Identity-based Wake Words | Mar 10, 2025 | Multiple-choice | —Unverified | 0 |
| VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models | Mar 10, 2025 | Image Description, Multiple-choice | Code Available | 0 |
| Social Bias Benchmark for Generation: A Comparison of Generation and QA-Based Evaluations | Mar 10, 2025 | Form, Multiple-choice | —Unverified | 0 |