SOTAVerified

TruthfulQA

Papers

Showing 1–50 of 80 papers

Title | Status | Hype
RLHF Workflow: From Reward Modeling to Online RLHF | Code | 5
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models | Code | 2
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation | Code | 2
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | Code | 2
TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space | Code | 2
Tuning Language Models by Proxy | Code | 2
Instruction Tuning With Loss Over Instructions | Code | 1
RAIN: Your Language Models Can Align Themselves without Finetuning | Code | 1
Machine Unlearning in Large Language Models | Code | 1
Integrative Decoding: Improve Factuality via Implicit Self-consistency | Code | 1
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment | Code | 1
Alleviating Hallucinations of Large Language Models through Induced Hallucinations | Code | 1
Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics | Code | 1
Non-Linear Inference Time Intervention: Improving LLM Truthfulness | Code | 1
Tool-Augmented Reward Modeling | Code | 1
Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning | Code | 1
TruthfulQA: Measuring How Models Mimic Human Falsehoods | Code | 1
When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models | Code | 0
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics | Code | 0
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback | Code | 0
(WhyPHI) Fine-Tuning PHI-3 for Multiple-Choice Question Answering: Methodology, Results, and Challenges | Code | 0
LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models | Code | 0
Enhancing Language Model Factuality via Activation-Based Confidence Calibration and Guided Decoding | Code | 0
SaGE: Evaluating Moral Consistency in Large Language Models | Code | 0
CHAIR -- Classifier of Hallucination as Improver | Code | 0
A test suite of prompt injection attacks for LLM-based machine translation | Code | 0
Measuring Reliability of Large Language Models through Semantic Consistency | Code | 0
Instruction Tuning with Human Curriculum | Code | 0
Steering Without Side Effects: Improving Post-Deployment Control of Language Models | Code | 0
Multi-Agent Reinforcement Learning with Focal Diversity Optimization | Code | 0
metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models | Code | 0
NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models | Code | 0
Truth Knows No Language: Evaluating Truthfulness Beyond English | Code | 0
Truth Neurons | Code | 0
DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Code | 0
Unsupervised Elicitation of Language Models | Code | 0
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Code | 0
Teaching language models to support answers with verified quotes | | 0
Towards Multilingual LLM Evaluation for European Languages | | 0
TruthFlow: Truthful LLM Generation via Representation Flow Correction | | 0
Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages | | 0
Uncertainty-aware Language Modeling for Selective Question Answering | | 0
When Persuasion Overrides Truth in Multi-Agent LLM Debates: Introducing a Confidence-Weighted Persuasion Override Rate (CW-POR) | | 0
Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models | | 0
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts | | 0
Cost-Saving LLM Cascades with Early Abstention | | 0
LLMAuditor: A Framework for Auditing Large Language Models Using Human-in-the-Loop | | 0
DYNAMAX: Dynamic computing for Transformers and Mamba based architectures | | 0
Efficiently Deploying LLMs with Controlled Risk | | 0
Efficient MAP Estimation of LLM Judgment Performance with Prior Transfer | | 0
Page 1 of 2

No leaderboard results yet.