SOTAVerified|Agents Browse Leaderboard About Blog

TruthfulQA

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 31–40 of 80 papers

Title	Date	Tasks	Status	Hype	Score
CHAIR -- Classifier of Hallucination as Improver	Jan 5, 2025	HallucinationMMLU	CodeCode Available	0	5
A test suite of prompt injection attacks for LLM-based machine translation	Oct 7, 2024	Machine TranslationTranslation	CodeCode Available	0	5
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation	Jun 25, 2024	ARCBenchmarking	CodeCode Available	0	5
When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models	Apr 14, 2024	TruthfulQA	CodeCode Available	0	5
Self-Evaluation Improves Selective Generation in Large Language Models	Dec 14, 2023	Multiple-choiceTruthfulQA	—Unverified	0	0
Semantic Consistency for Assuring Reliability of Large Language Models	Aug 17, 2023	Question AnsweringText Generation	—Unverified	0	0
Shadows in the Attention: Contextual Perturbation and Representation Drift in the Dynamics of Hallucination in LLMs	May 22, 2025	HallucinationTruthfulQA	—Unverified	0	0
SkillAggregation: Reference-free LLM-Dependent Aggregation	Oct 14, 2024	ChatbotHallucination	—Unverified	0	0
Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency	Apr 4, 2025	BenchmarkingGSM8K	—Unverified	0	0
Teaching language models to support answers with verified quotes	Mar 21, 2022	Fact CheckingNatural Questions	—Unverified	0	0

Show:10 25 50

← PrevPage 4 of 8Next →

No leaderboard results yet.