| RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling | Oct 16, 2023 | Hallucination, Language Modeling | Code Available | 1 |
| Metric Ensembles For Hallucination Detection | Oct 16, 2023 | Abstractive Text Summarization, Hallucination | Unverified | 0 |
| Factored Verification: Detecting and Reducing Hallucination in Summaries of Academic Papers | Oct 16, 2023 | 16k, Hallucination | Code Available | 1 |
| Assessing the Reliability of Large Language Model Knowledge | Oct 15, 2023 | Hallucination, Knowledge Probing | Code Available | 0 |
| Configuration Validation with Large Language Models | Oct 15, 2023 | Code Generation, Few-Shot Learning | Unverified | 0 |
| "Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters | Oct 13, 2023 | Benchmarking, Fairness | Code Available | 1 |
| Improving Large Language Models in Event Relation Logical Prediction | Oct 13, 2023 | counterfactual, Event Relation Extraction | Code Available | 1 |
| KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection | Oct 13, 2023 | Abstractive Text Summarization, Hallucination | Code Available | 1 |
| From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models | Oct 13, 2023 | Hallucination, Image Captioning | Code Available | 2 |
| GraphextQA: A Benchmark for Evaluating Graph-Enhanced Large Language Models | Oct 12, 2023 | Answer Generation, Hallucination | Code Available | 0 |
| GameGPT: Multi-agent Collaborative Framework for Game Development | Oct 12, 2023 | Code Generation, Hallucination | Unverified | 0 |
| Enhancing Text-based Knowledge Graph Completion with Zero-Shot Large Language Models: A Focus on Semantic Enhancement | Oct 12, 2023 | Contrastive Learning, Data Augmentation | Code Available | 1 |
| Ferret: Refer and Ground Anything Anywhere at Any Granularity | Oct 11, 2023 | Hallucination, Language Modeling | Code Available | 5 |
| OpsEval: A Comprehensive IT Operations Benchmark Suite for Large Language Models | Oct 11, 2023 | Hallucination, In-Context Learning | Code Available | 1 |
| A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection | Oct 10, 2023 | Hallucination, Sentence | Code Available | 0 |
| Teaching Language Models to Hallucinate Less with Synthetic Tasks | Oct 10, 2023 | Abstractive Text Summarization, Hallucination | Unverified | 0 |
| Towards Mitigating Hallucination in Large Language Models via Self-Reflection | Oct 10, 2023 | Answer Generation, Hallucination | Unverified | 0 |
| Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models | Oct 9, 2023 | Hallucination, Object | Unverified | 0 |
| The Troubling Emergence of Hallucination in Large Language Models -- An Extensive Definition, Quantification, and Prescriptive Remediations | Oct 8, 2023 | Hallucination | Unverified | 0 |
| Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning | Oct 7, 2023 | Hallucination, In-Context Learning | Unverified | 0 |
| Chain of Natural Language Inference for Reducing Large Language Model Ungrounded Hallucinations | Oct 6, 2023 | Hallucination, Language Modeling | Code Available | 1 |
| Evaluating Hallucinations in Chinese Large Language Models | Oct 5, 2023 | Hallucination, Question Answering | Code Available | 3 |
| FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation | Oct 5, 2023 | Hallucination, World Knowledge | Code Available | 2 |
| MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation | Oct 5, 2023 | Benchmarking, Decision Making | Code Available | 2 |
| AGIR: Automating Cyber Threat Intelligence Reporting with Natural Language Generation | Oct 4, 2023 | Hallucination, Text Generation | Code Available | 1 |