| Title | Date | Tasks | Code |
|---|---|---|---|
| Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses | Feb 27, 2024 | Hallucination | Code Available |
| GROUNDHOG: Grounding Large Language Models to Holistic Segmentation | Feb 26, 2024 | Causal Language Modeling, Generalized Referring Expression Segmentation | Unverified |
| Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models | Feb 26, 2024 | Decision Making, Hallucination | Unverified |
| AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation | Feb 25, 2024 | Face Generation, Hallucination | Unverified |
| HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs | Feb 25, 2024 | Benchmarking, Chatbot | Code Available |
| Rethinking Software Engineering in the Foundation Model Era: A Curated Catalogue of Challenges in the Development of Trustworthy FMware | Feb 25, 2024 | Hallucination | Unverified |
| Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models | Feb 24, 2024 | Hallucination, Hallucination Evaluation | Unverified |
| CARBD-Ko: A Contextually Annotated Review Benchmark Dataset for Aspect-Level Sentiment Classification in Korean | Feb 23, 2024 | Classification, Hallucination | Unverified |
| UFO: a Unified and Flexible Framework for Evaluating Factuality of Large Language Models | Feb 22, 2024 | Hallucination, Retrieval | Code Available |
| Does the Generator Mind its Contexts? An Analysis of Generative Model Faithfulness under Context Transfer | Feb 22, 2024 | Generative Question Answering, Hallucination | Unverified |