| CHAIR -- Classifier of Hallucination as Improver | Jan 5, 2025 | Hallucination, MMLU | Code Available | 0 | 5 |
| A test suite of prompt injection attacks for LLM-based machine translation | Oct 7, 2024 | Machine Translation, Translation | Code Available | 0 | 5 |
| VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Jun 25, 2024 | ARC, Benchmarking | Code Available | 0 | 5 |
| When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models | Apr 14, 2024 | TruthfulQA | Code Available | 0 | 5 |
| Self-Evaluation Improves Selective Generation in Large Language Models | Dec 14, 2023 | Multiple-choice, TruthfulQA | Unverified | 0 | 0 |
| Semantic Consistency for Assuring Reliability of Large Language Models | Aug 17, 2023 | Question Answering, Text Generation | Unverified | 0 | 0 |
| Shadows in the Attention: Contextual Perturbation and Representation Drift in the Dynamics of Hallucination in LLMs | May 22, 2025 | Hallucination, TruthfulQA | Unverified | 0 | 0 |
| SkillAggregation: Reference-free LLM-Dependent Aggregation | Oct 14, 2024 | Chatbot, Hallucination | Unverified | 0 | 0 |
| Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | Apr 4, 2025 | Benchmarking, GSM8K | Unverified | 0 | 0 |
| Teaching language models to support answers with verified quotes | Mar 21, 2022 | Fact Checking, Natural Questions | Unverified | 0 | 0 |