SOTAVerified

TruthfulQA

Papers

Showing 1–50 of 80 papers

Title | Status | Hype
RLHF Workflow: From Reward Modeling to Online RLHF | Code | 5
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models | Code | 2
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation | Code | 2
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | Code | 2
TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space | Code | 2
Tuning Language Models by Proxy | Code | 2
Instruction Tuning With Loss Over Instructions | Code | 1
RAIN: Your Language Models Can Align Themselves without Finetuning | Code | 1
Machine Unlearning in Large Language Models | Code | 1
Integrative Decoding: Improve Factuality via Implicit Self-consistency | Code | 1
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment | Code | 1
Alleviating Hallucinations of Large Language Models through Induced Hallucinations | Code | 1
Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics | Code | 1
Non-Linear Inference Time Intervention: Improving LLM Truthfulness | Code | 1
Tool-Augmented Reward Modeling | Code | 1
Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning | Code | 1
TruthfulQA: Measuring How Models Mimic Human Falsehoods | Code | 1
When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models | Code | 0
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics | Code | 0
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback | Code | 0
(WhyPHI) Fine-Tuning PHI-3 for Multiple-Choice Question Answering: Methodology, Results, and Challenges | Code | 0
LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models | Code | 0
Enhancing Language Model Factuality via Activation-Based Confidence Calibration and Guided Decoding | Code | 0
SaGE: Evaluating Moral Consistency in Large Language Models | Code | 0
CHAIR -- Classifier of Hallucination as Improver | Code | 0
A test suite of prompt injection attacks for LLM-based machine translation | Code | 0
Measuring Reliability of Large Language Models through Semantic Consistency | Code | 0
Instruction Tuning with Human Curriculum | Code | 0
Steering Without Side Effects: Improving Post-Deployment Control of Language Models | Code | 0
Multi-Agent Reinforcement Learning with Focal Diversity Optimization | Code | 0
metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models | Code | 0
NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models | Code | 0
Truth Knows No Language: Evaluating Truthfulness Beyond English | Code | 0
Truth Neurons | Code | 0
DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Code | 0
Unsupervised Elicitation of Language Models | Code | 0
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Code | 0
Teaching language models to support answers with verified quotes | | 0
Towards Multilingual LLM Evaluation for European Languages | | 0
TruthFlow: Truthful LLM Generation via Representation Flow Correction | | 0
Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages | | 0
Uncertainty-aware Language Modeling for Selective Question Answering | | 0
When Persuasion Overrides Truth in Multi-Agent LLM Debates: Introducing a Confidence-Weighted Persuasion Override Rate (CW-POR) | | 0
Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models | | 0
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts | | 0
Cost-Saving LLM Cascades with Early Abstention | | 0
LLMAuditor: A Framework for Auditing Large Language Models Using Human-in-the-Loop | | 0
DYNAMAX: Dynamic computing for Transformers and Mamba based architectures | | 0
Efficiently Deploying LLMs with Controlled Risk | | 0
Efficient MAP Estimation of LLM Judgment Performance with Prior Transfer | | 0
Page 1 of 2

No leaderboard results yet.