TruthfulQA

Papers

Showing 1–50 of 80 papers

Title	Date	Tasks	Status	Hype
RLHF Workflow: From Reward Modeling to Online RLHF	May 13, 2024	Chatbot, HumanEval	Code Available	5
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model	Jun 6, 2023	Language Modeling, Language Modelling	Code Available	2
Tuning Language Models by Proxy	Jan 16, 2024	Domain Adaptation, Math	Code Available	2
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models	Sep 7, 2023	TruthfulQA	Code Available	2
TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space	Feb 27, 2024	Contrastive Learning, Hallucination	Code Available	2
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation	Mar 3, 2024	Hallucination, TruthfulQA	Code Available	2
Integrative Decoding: Improve Factuality via Implicit Self-consistency	Oct 2, 2024	TruthfulQA	Code Available	1
Alleviating Hallucinations of Large Language Models through Induced Hallucinations	Dec 25, 2023	Hallucination, Hallucination Evaluation	Code Available	1
Machine Unlearning in Large Language Models	May 24, 2024	Machine Unlearning, TruthfulQA	Code Available	1
Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning	Dec 29, 2023	TruthfulQA	Code Available	1
Non-Linear Inference Time Intervention: Improving LLM Truthfulness	Mar 27, 2024	Large Language Model, Multiple-choice	Code Available	1
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment	Aug 18, 2023	MMLU, Red Teaming	Code Available	1
RAIN: Your Language Models Can Align Themselves without Finetuning	Sep 13, 2023	Adversarial Attack, TruthfulQA	Code Available	1
Tool-Augmented Reward Modeling	Oct 2, 2023	TruthfulQA	Code Available	1
Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics	Sep 13, 2023	Ethics, TruthfulQA	Code Available	1
TruthfulQA: Measuring How Models Mimic Human Falsehoods	Sep 8, 2021	Language Modeling, Language Modelling	Code Available	1
Instruction Tuning With Loss Over Instructions	May 23, 2024	HumanEval, MMLU	Code Available	1
Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused	Aug 16, 2024	Hallucination, TruthfulQA	Unverified	0
Maintaining Informative Coherence: Migrating Hallucinations in Large Language Models via Absorbing Markov Chains	Oct 27, 2024	Text Generation, TruthfulQA	Unverified	0
Mitigating Adversarial Attacks in LLMs through Defensive Suffix Generation	Dec 18, 2024	TruthfulQA	Unverified	0
Model Unlearning via Sparse Autoencoder Subspace Guided Projections	May 30, 2025	Adversarial Robustness, feature selection	Unverified	0
Monty Hall and Optimized Conformal Prediction to Improve Decision-Making with LLMs	Dec 31, 2024	Conformal Prediction, Decision Making	Unverified	0
More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment	Apr 3, 2025	ARC, HellaSwag	Unverified	0
Multi-Reference Preference Optimization for Large Language Models	May 26, 2024	GSM8K, TruthfulQA	Unverified	0
Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models	Feb 20, 2025	HellaSwag, Memorization	Unverified	0
On The Truthfulness of 'Surprisingly Likely' Responses of Large Language Models	Nov 13, 2023	Language Modeling, Language Modelling	Unverified	0
PRobELM: Plausibility Ranking Evaluation for Language Models	Apr 4, 2024	Question Answering, TruthfulQA	Unverified	0
Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs	Sep 30, 2024	ARC, Diversity	Unverified	0
Efficiently Deploying LLMs with Controlled Risk	Oct 3, 2024	MMLU, TruthfulQA	Unverified	0
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts	Oct 11, 2024	Holdout Set, Misconceptions	Unverified	0
Cost-Saving LLM Cascades with Early Abstention	Feb 13, 2025	GSM8K, MMLU	Unverified	0
LLMAuditor: A Framework for Auditing Large Language Models Using Human-in-the-Loop	Feb 14, 2024	Hallucination, TruthfulQA	Unverified	0
DYNAMAX: Dynamic computing for Transformers and Mamba based architectures	Apr 29, 2025	Mamba, TriviaQA	Unverified	0
A Debate-Driven Experiment on LLM Hallucinations and Accuracy	Oct 25, 2024	Fact Checking, Hallucination	Unverified	0
Efficient MAP Estimation of LLM Judgment Performance with Prior Transfer	Apr 17, 2025	Conformal Prediction, TruthfulQA	Unverified	0
Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2	May 9, 2025	ARC, Belebele	Unverified	0
Evaluating Consistencies in LLM responses through a Semantic Clustering of Question Answering	Oct 20, 2024	Language Modelling, Large Language Model	Unverified	0
GRATH: Gradual Self-Truthifying for Large Language Models	Jan 22, 2024	TruthfulQA	Unverified	0
Harmonic LLMs are Trustworthy	Apr 30, 2024	Hallucination, TruthfulQA	Unverified	0
Instruction Tuning with Human Curriculum	Oct 14, 2023	ARC, MMLU	Unverified	0
Investigating Data Contamination in Modern Benchmarks for Large Language Models	Nov 16, 2023	Common Sense Reasoning, MMLU	Unverified	0
Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning	Oct 16, 2024	Contrastive Learning, graph construction	Unverified	0
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback	May 24, 2023	TriviaQA, TruthfulQA	Unverified	0
Layer Importance and Hallucination Analysis in Large Language Models via Enhanced Activation Variance-Sparsity	Nov 15, 2024	Contrastive Learning, Hallucination	Unverified	0
LokiLM: Technical Report	Jul 10, 2024	Knowledge Distillation, Language Modeling	Unverified	0
Student Data Paradox and Curious Case of Single Student-Tutor Model: Regressive Side Effects of Training LLMs for Personalized Learning	Apr 23, 2024	ARC, Common Sense Reasoning	Unverified	0
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models	Apr 4, 2025	GSM8K, Mathematical Reasoning	Unverified	0
Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models	Sep 7, 2024	MMLU, TruthfulQA	Unverified	0
Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models	Feb 12, 2025	Mathematical Reasoning, MMLU	Unverified	0
Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation	Feb 14, 2024	TruthfulQA	Unverified	0

No leaderboard results yet.