
TruthfulQA

Papers

Showing 51-75 of 80 papers

Title | Status | Hype
Self-Evaluation Improves Selective Generation in Large Language Models | | 0
Semantic Consistency for Assuring Reliability of Large Language Models | | 0
Shadows in the Attention: Contextual Perturbation and Representation Drift in the Dynamics of Hallucination in LLMs | | 0
SkillAggregation: Reference-free LLM-Dependent Aggregation | | 0
Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | | 0
Teaching language models to support answers with verified quotes | | 0
Towards Multilingual LLM Evaluation for European Languages | | 0
TruthFlow: Truthful LLM Generation via Representation Flow Correction | | 0
Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages | | 0
Uncertainty-aware Language Modeling for Selective Question Answering | | 0
Unsupervised Elicitation of Language Models | | 0
When Persuasion Overrides Truth in Multi-Agent LLM Debates: Introducing a Confidence-Weighted Persuasion Override Rate (CW-POR) | | 0
Reducing LLM Hallucinations using Epistemic Neural Networks | | 0
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics | Code | 0
NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models | Code | 0
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Code | 0
metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models | Code | 0
Multi-Agent Reinforcement Learning with Focal Diversity Optimization | Code | 0
SaGE: Evaluating Moral Consistency in Large Language Models | Code | 0
LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models | Code | 0
When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models | Code | 0
(WhyPHI) Fine-Tuning PHI-3 for Multiple-Choice Question Answering: Methodology, Results, and Challenges | Code | 0
Enhancing Language Model Factuality via Activation-Based Confidence Calibration and Guided Decoding | Code | 0
DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Code | 0
Truth Knows No Language: Evaluating Truthfulness Beyond English | Code | 0
Page 3 of 4

No leaderboard results yet.