| FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation | Oct 5, 2023 | Hallucination, World Knowledge | Code Available | 2 |
| MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation | Oct 5, 2023 | Benchmarking, Decision Making | Code Available | 2 |
| MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning | Sep 14, 2023 | Hallucination, In-Context Learning | Code Available | 2 |
| Benchmarking Large Language Models in Retrieval-Augmented Generation | Sep 4, 2023 | Benchmarking, counterfactual | Code Available | 2 |
| MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models | Aug 17, 2023 | Decision Making, Hallucination | Code Available | 2 |
| TinyLVLM-eHub: Towards Comprehensive and Efficient Evaluation for Large Vision-Language Models | Aug 7, 2023 | Hallucination, Object Hallucination | Code Available | 2 |
| Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies | Aug 6, 2023 | Hallucination | Code Available | 2 |
| Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph | Jul 15, 2023 | Hallucination, Knowledge Graphs | Code Available | 2 |
| Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning | Jun 26, 2023 | Hallucination, Visual Question Answering | Code Available | 2 |
| ToolQA: A Dataset for LLM Question Answering with External Tools | Jun 23, 2023 | Hallucination, Question Answering | Code Available | 2 |