| VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Jun 25, 2024 | ARCBenchmarking | CodeCode Available | 0 |
| Steering Without Side Effects: Improving Post-Deployment Control of Language Models | Jun 21, 2024 | Red TeamingTruthfulQA | CodeCode Available | 0 |
| Enhancing Language Model Factuality via Activation-Based Confidence Calibration and Guided Decoding | Jun 19, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 0 |
| LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models | May 31, 2024 | TriviaQATruthfulQA | CodeCode Available | 0 |
| Multi-Reference Preference Optimization for Large Language Models | May 26, 2024 | GSM8KTruthfulQA | —Unverified | 0 |
| Machine Unlearning in Large Language Models | May 24, 2024 | Machine UnlearningTruthfulQA | CodeCode Available | 1 |
| Instruction Tuning With Loss Over Instructions | May 23, 2024 | HumanEvalMMLU | CodeCode Available | 1 |
| RLHF Workflow: From Reward Modeling to Online RLHF | May 13, 2024 | ChatbotHumanEval | CodeCode Available | 5 |
| Harmonic LLMs are Trustworthy | Apr 30, 2024 | HallucinationTruthfulQA | —Unverified | 0 |
| Student Data Paradox and Curious Case of Single Student-Tutor Model: Regressive Side Effects of Training LLMs for Personalized Learning | Apr 23, 2024 | ARCCommon Sense Reasoning | —Unverified | 0 |