| Enterprise Large Language Model Evaluation Benchmark | Jun 25, 2025 | Language Model Evaluation, Language Modeling | Unverified | 0 |
| Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content | Jun 25, 2025 | Articles, Continual Pretraining | Unverified | 0 |
| Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training | Jun 18, 2025 | MedQA, MMLU | Unverified | 0 |
| Slimming Down LLMs Without Losing Their Minds | Jun 12, 2025 | Computational Efficiency, GSM8K | Unverified | 0 |
| Surface Fairness, Deep Bias: A Comparative Study of Bias in Language Models | Jun 12, 2025 | Fairness, MMLU | Unverified | 0 |
| MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing | Jun 9, 2025 | GPU, Mixture-of-Experts | Unverified | 0 |
| From Threat to Tool: Leveraging Refusal-Aware Injection Attacks for Safety Alignment | Jun 7, 2025 | ARC, MMLU | Unverified | 0 |
| Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers | Jun 5, 2025 | GSM8K, Math | Unverified | 0 |
| GEM: Empowering LLM for both Embedding Generation and Language Understanding | Jun 4, 2025 | Decoder, Large Language Model | Unverified | 0 |
| Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs | May 31, 2025 | MMLU | Code Available | 0 |