| SSR: Alignment-Aware Modality Connector for Speech Language Models | Sep 30, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Correctness Coverage Evaluation for Medical Multiple-Choice Question Answering Based on the Enhanced Conformal Prediction Framework | Mar 7, 2025 | Conformal Prediction, Medical Question Answering | —Unverified | 0 | 0 |
| Step Guided Reasoning: Improving Mathematical Reasoning using Guidance Generation and Step Reasoning | Oct 18, 2024 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| SuperBPE: Space Travel for Language Models | Mar 17, 2025 | Inductive Bias, MMLU | —Unverified | 0 | 0 |
| Surface Fairness, Deep Bias: A Comparative Study of Bias in Language Models | Jun 12, 2025 | Fairness, MMLU | —Unverified | 0 | 0 |
| SUTRA: Scalable Multilingual Language Model Architecture | May 7, 2024 | Computational Efficiency, Hallucination | —Unverified | 0 | 0 |
| Swallowing the Poison Pills: Insights from Vulnerability Disparity Among LLMs | Feb 23, 2025 | Data Poisoning, Diagnostic | —Unverified | 0 | 0 |
| Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning | Mar 7, 2025 | GPU, Math | —Unverified | 0 | 0 |
| Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models | Oct 9, 2023 | MMLU | —Unverified | 0 | 0 |
| TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise | Oct 29, 2023 | Data Augmentation, Language Modeling | —Unverified | 0 | 0 |
| The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback | Oct 31, 2023 | GSM8K, MMLU | —Unverified | 0 | 0 |
| The Base-Rate Effect on LLM Benchmark Performance: Disambiguating Test-Taking Strategies from Benchmark Performance | Jun 17, 2024 | counterfactual, MMLU | —Unverified | 0 | 0 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | Mar 4, 2024 | 1 Image, 2*2 Stitching, Arithmetic Reasoning | —Unverified | 0 | 0 |
| The Poison of Alignment | Aug 25, 2023 | MMLU | —Unverified | 0 | 0 |
| The Vulnerability of Language Model Benchmarks: Do They Accurately Reflect True LLM Performance? | Dec 2, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Uncovering Latent Chain of Thought Vectors in Language Models | Sep 21, 2024 | ARC, GSM8K | —Unverified | 0 | 0 |
| Tokenization Standards for Linguistic Integrity: Turkish as a Benchmark | Feb 10, 2025 | MMLU, Morphological Analysis | —Unverified | 0 | 0 |
| Towards Multilingual LLM Evaluation for European Languages | Oct 11, 2024 | ARC, GSM8K | —Unverified | 0 | 0 |
| Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception | Feb 17, 2025 | MMLU, Natural Questions | —Unverified | 0 | 0 |
| Towards Uncertainty-Aware Language Agent | Jan 25, 2024 | MMLU, StrategyQA | —Unverified | 0 | 0 |
| Transcending Scaling Laws with 0.1% Extra Compute | Oct 20, 2022 | Arithmetic Reasoning, Cross-Lingual Question Answering | —Unverified | 0 | 0 |
| Transferable text data distillation by trajectory matching | Apr 14, 2025 | ARC, Large Language Model | —Unverified | 0 | 0 |
| Triangulating LLM Progress through Benchmarks, Games, and Cognitive Tests | Feb 20, 2025 | Logical Reasoning, MMLU | —Unverified | 0 | 0 |
| Understanding Finetuning for Factual Knowledge Extraction | Jun 20, 2024 | MMLU, Question Answering | —Unverified | 0 | 0 |
| Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size | Mar 6, 2025 | MMLU, Quantization | —Unverified | 0 | 0 |