| Title | Date | Tasks | Code | Count |
| --- | --- | --- | --- | --- |
| A Debate-Driven Experiment on LLM Hallucinations and Accuracy | Oct 25, 2024 | Fact Checking, Hallucination | Unverified | 0 |
| Evaluating Consistencies in LLM responses through a Semantic Clustering of Question Answering | Oct 20, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning | Oct 16, 2024 | Contrastive Learning, Graph Construction | Unverified | 0 |
| SkillAggregation: Reference-free LLM-Dependent Aggregation | Oct 14, 2024 | Chatbot, Hallucination | Unverified | 0 |
| Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts | Oct 11, 2024 | Holdout Set, Misconceptions | Unverified | 0 |
| NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models | Oct 11, 2024 | Multiple-choice, TruthfulQA | Code Available | 0 |
| Towards Multilingual LLM Evaluation for European Languages | Oct 11, 2024 | ARC, GSM8K | Unverified | 0 |
| A test suite of prompt injection attacks for LLM-based machine translation | Oct 7, 2024 | Machine Translation, Translation | Code Available | 0 |
| Efficiently Deploying LLMs with Controlled Risk | Oct 3, 2024 | MMLU, TruthfulQA | Unverified | 0 |
| Integrative Decoding: Improve Factuality via Implicit Self-consistency | Oct 2, 2024 | TruthfulQA | Code Available | 1 |
| Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs | Sep 30, 2024 | ARC, Diversity | Unverified | 0 |
| Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models | Sep 7, 2024 | MMLU, TruthfulQA | Unverified | 0 |
| Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused | Aug 16, 2024 | Hallucination, TruthfulQA | Unverified | 0 |
| LokiLM: Technical Report | Jul 10, 2024 | Knowledge Distillation, Language Modeling | Unverified | 0 |
| metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models | Jul 4, 2024 | ARC, GSM8K | Code Available | 0 |
| VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Jun 25, 2024 | ARC, Benchmarking | Code Available | 0 |
| Steering Without Side Effects: Improving Post-Deployment Control of Language Models | Jun 21, 2024 | Red Teaming, TruthfulQA | Code Available | 0 |
| Enhancing Language Model Factuality via Activation-Based Confidence Calibration and Guided Decoding | Jun 19, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models | May 31, 2024 | TriviaQA, TruthfulQA | Code Available | 0 |
| Multi-Reference Preference Optimization for Large Language Models | May 26, 2024 | GSM8K, TruthfulQA | Unverified | 0 |
| Machine Unlearning in Large Language Models | May 24, 2024 | Machine Unlearning, TruthfulQA | Code Available | 1 |
| Instruction Tuning With Loss Over Instructions | May 23, 2024 | HumanEval, MMLU | Code Available | 1 |
| RLHF Workflow: From Reward Modeling to Online RLHF | May 13, 2024 | Chatbot, HumanEval | Code Available | 5 |
| Harmonic LLMs are Trustworthy | Apr 30, 2024 | Hallucination, TruthfulQA | Unverified | 0 |
| Student Data Paradox and Curious Case of Single Student-Tutor Model: Regressive Side Effects of Training LLMs for Personalized Learning | Apr 23, 2024 | ARC, Common Sense Reasoning | Unverified | 0 |