| Efficiently Deploying LLMs with Controlled Risk | Oct 3, 2024 | MMLUTruthfulQA | —Unverified | 0 |
| Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs | Sep 30, 2024 | ARCDiversity | —Unverified | 0 |
| Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models | Sep 7, 2024 | MMLUTruthfulQA | —Unverified | 0 |
| Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused | Aug 16, 2024 | HallucinationTruthfulQA | —Unverified | 0 |
| LokiLM: Technical Report | Jul 10, 2024 | Knowledge DistillationLanguage Modeling | —Unverified | 0 |
| metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models | Jul 4, 2024 | ARCGSM8K | CodeCode Available | 0 |
| VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Jun 25, 2024 | ARCBenchmarking | CodeCode Available | 0 |
| Steering Without Side Effects: Improving Post-Deployment Control of Language Models | Jun 21, 2024 | Red TeamingTruthfulQA | CodeCode Available | 0 |
| Enhancing Language Model Factuality via Activation-Based Confidence Calibration and Guided Decoding | Jun 19, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 0 |
| LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models | May 31, 2024 | TriviaQATruthfulQA | CodeCode Available | 0 |