TruthfulQA

Papers


Showing 26–50 of 80 papers

Title	Date	Tasks	Status
Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency	Apr 4, 2025	Benchmarking, GSM8K	Unverified
More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment	Apr 3, 2025	ARC, HellaSwag	Unverified
When Persuasion Overrides Truth in Multi-Agent LLM Debates: Introducing a Confidence-Weighted Persuasion Override Rate (CW-POR)	Apr 1, 2025	Language Modeling, Language Modelling	Unverified
DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability	Mar 4, 2025	GSM8K, Logical Reasoning	Code available
Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models	Feb 20, 2025	HellaSwag, Memorization	Unverified
Truth Knows No Language: Evaluating Truthfulness Beyond English	Feb 13, 2025	Informativeness, Machine Translation	Code available
Cost-Saving LLM Cascades with Early Abstention	Feb 13, 2025	GSM8K, MMLU	Unverified
Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models	Feb 12, 2025	Mathematical Reasoning, MMLU	Unverified
Multi-Agent Reinforcement Learning with Focal Diversity Optimization	Feb 6, 2025	Diversity, Multi-agent Reinforcement Learning	Code available
TruthFlow: Truthful LLM Generation via Representation Flow Correction	Feb 6, 2025	Hallucination, TruthfulQA	Unverified
CHAIR -- Classifier of Hallucination as Improver	Jan 5, 2025	Hallucination, MMLU	Code available
(WhyPHI) Fine-Tuning PHI-3 for Multiple-Choice Question Answering: Methodology, Results, and Challenges	Jan 3, 2025	Multiple-choice, Question Answering	Code available
Monty Hall and Optimized Conformal Prediction to Improve Decision-Making with LLMs	Dec 31, 2024	Conformal Prediction, Decision Making	Unverified
Mitigating Adversarial Attacks in LLMs through Defensive Suffix Generation	Dec 18, 2024	TruthfulQA	Unverified
Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages	Dec 1, 2024	ARC, Multiple-choice	Unverified
Layer Importance and Hallucination Analysis in Large Language Models via Enhanced Activation Variance-Sparsity	Nov 15, 2024	Contrastive Learning, Hallucination	Unverified
Maintaining Informative Coherence: Migrating Hallucinations in Large Language Models via Absorbing Markov Chains	Oct 27, 2024	Text Generation, TruthfulQA	Unverified
A Debate-Driven Experiment on LLM Hallucinations and Accuracy	Oct 25, 2024	Fact Checking, Hallucination	Unverified
Evaluating Consistencies in LLM responses through a Semantic Clustering of Question Answering	Oct 20, 2024	Language Modelling, Large Language Model	Unverified
Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning	Oct 16, 2024	Contrastive Learning, Graph Construction	Unverified
SkillAggregation: Reference-free LLM-Dependent Aggregation	Oct 14, 2024	Chatbot, Hallucination	Unverified
NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models	Oct 11, 2024	Multiple-choice, TruthfulQA	Code available
Towards Multilingual LLM Evaluation for European Languages	Oct 11, 2024	ARC, GSM8K	Unverified
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts	Oct 11, 2024	Holdout Set, Misconceptions	Unverified
A test suite of prompt injection attacks for LLM-based machine translation	Oct 7, 2024	Machine Translation, Translation	Code available



No leaderboard results yet.