| A test suite of prompt injection attacks for LLM-based machine translation | Oct 7, 2024 | Machine Translation, Translation | Code Available | 0 | 5 |
| Measuring Reliability of Large Language Models through Semantic Consistency | Nov 10, 2022 | Text Generation, TruthfulQA | Code Available | 0 | 5 |
| Instruction Tuning with Human Curriculum | Oct 14, 2023 | ARC, MMLU | Code Available | 0 | 5 |
| Steering Without Side Effects: Improving Post-Deployment Control of Language Models | Jun 21, 2024 | Red Teaming, TruthfulQA | Code Available | 0 | 5 |
| Multi-Agent Reinforcement Learning with Focal Diversity Optimization | Feb 6, 2025 | Diversity, Multi-agent Reinforcement Learning | Code Available | 0 | 5 |
| metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models | Jul 4, 2024 | ARC, GSM8K | Code Available | 0 | 5 |
| NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models | Oct 11, 2024 | Multiple-choice, TruthfulQA | Code Available | 0 | 5 |
| Truth Knows No Language: Evaluating Truthfulness Beyond English | Feb 13, 2025 | Informativeness, Machine Translation | Code Available | 0 | 5 |
| Truth Neurons | May 18, 2025 | TruthfulQA | Code Available | 0 | 5 |
| DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Mar 4, 2025 | GSM8K, Logical Reasoning | Code Available | 0 | 5 |
| Unsupervised Elicitation of Language Models | Jun 11, 2025 | GSM8K, TruthfulQA | Code Available | 0 | 5 |
| VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Jun 25, 2024 | ARC, Benchmarking | Code Available | 0 | 5 |
| Teaching language models to support answers with verified quotes | Mar 21, 2022 | Fact Checking, Natural Questions | Unverified | 0 | 0 |
| Towards Multilingual LLM Evaluation for European Languages | Oct 11, 2024 | ARC, GSM8K | Unverified | 0 | 0 |
| TruthFlow: Truthful LLM Generation via Representation Flow Correction | Feb 6, 2025 | Hallucination, TruthfulQA | Unverified | 0 | 0 |
| Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages | Dec 1, 2024 | ARC, Multiple-choice | Unverified | 0 | 0 |
| Uncertainty-aware Language Modeling for Selective Question Answering | Nov 26, 2023 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| When Persuasion Overrides Truth in Multi-Agent LLM Debates: Introducing a Confidence-Weighted Persuasion Override Rate (CW-POR) | Apr 1, 2025 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models | Feb 20, 2025 | HellaSwag, Memorization | Unverified | 0 | 0 |
| Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts | Oct 11, 2024 | Holdout Set, Misconceptions | Unverified | 0 | 0 |
| Cost-Saving LLM Cascades with Early Abstention | Feb 13, 2025 | GSM8K, MMLU | Unverified | 0 | 0 |
| LLMAuditor: A Framework for Auditing Large Language Models Using Human-in-the-Loop | Feb 14, 2024 | Hallucination, TruthfulQA | Unverified | 0 | 0 |
| DYNAMAX: Dynamic computing for Transformers and Mamba based architectures | Apr 29, 2025 | Mamba, TriviaQA | Unverified | 0 | 0 |
| Efficiently Deploying LLMs with Controlled Risk | Oct 3, 2024 | MMLU, TruthfulQA | Unverified | 0 | 0 |
| Efficient MAP Estimation of LLM Judgment Performance with Prior Transfer | Apr 17, 2025 | Conformal Prediction, TruthfulQA | Unverified | 0 | 0 |