| PRobELM: Plausibility Ranking Evaluation for Language Models | Apr 4, 2024 | Question Answering, TruthfulQA | —Unverified | 0 |
| Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs | Sep 30, 2024 | ARC, Diversity | —Unverified | 0 |
| Reducing LLM Hallucinations using Epistemic Neural Networks | Dec 25, 2023 | TruthfulQA | —Unverified | 0 |
| Student Data Paradox and Curious Case of Single Student-Tutor Model: Regressive Side Effects of Training LLMs for Personalized Learning | Apr 23, 2024 | ARC, Common Sense Reasoning | —Unverified | 0 |
| Sample, Don't Search: Rethinking Test-Time Alignment for Language Models | Apr 4, 2025 | GSM8K, Mathematical Reasoning | —Unverified | 0 |
| Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models | Sep 7, 2024 | MMLU, TruthfulQA | —Unverified | 0 |
| Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models | Feb 12, 2025 | Mathematical Reasoning, MMLU | —Unverified | 0 |
| Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation | Feb 14, 2024 | TruthfulQA | —Unverified | 0 |
| Self-Evaluation Improves Selective Generation in Large Language Models | Dec 14, 2023 | Multiple-choice, TruthfulQA | —Unverified | 0 |
| Semantic Consistency for Assuring Reliability of Large Language Models | Aug 17, 2023 | Question Answering, Text Generation | —Unverified | 0 |