SOTAVerified

TruthfulQA

Papers

Showing 1–50 of 80 papers

| Title | Status | Hype |
| --- | --- | --- |
| RLHF Workflow: From Reward Modeling to Online RLHF | Code | 5 |
| Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | Code | 2 |
| TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space | Code | 2 |
| Tuning Language Models by Proxy | Code | 2 |
| DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models | Code | 2 |
| In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation | Code | 2 |
| Tool-Augmented Reward Modeling | Code | 1 |
| Instruction Tuning With Loss Over Instructions | Code | 1 |
| RAIN: Your Language Models Can Align Themselves without Finetuning | Code | 1 |
| Integrative Decoding: Improve Factuality via Implicit Self-consistency | Code | 1 |
| TruthfulQA: Measuring How Models Mimic Human Falsehoods | Code | 1 |
| Non-Linear Inference Time Intervention: Improving LLM Truthfulness | Code | 1 |
| Alleviating Hallucinations of Large Language Models through Induced Hallucinations | Code | 1 |
| Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics | Code | 1 |
| Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning | Code | 1 |
| Machine Unlearning in Large Language Models | Code | 1 |
| Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment | Code | 1 |
| Mitigating Adversarial Attacks in LLMs through Defensive Suffix Generation | | 0 |
| Model Unlearning via Sparse Autoencoder Subspace Guided Projections | | 0 |
| Monty Hall and Optimized Conformal Prediction to Improve Decision-Making with LLMs | | 0 |
| More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment | | 0 |
| Multi-Reference Preference Optimization for Large Language Models | | 0 |
| Efficiently Deploying LLMs with Controlled Risk | | 0 |
| Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts | | 0 |
| Cost-Saving LLM Cascades with Early Abstention | | 0 |
| LLMAuditor: A Framework for Auditing Large Language Models Using Human-in-the-Loop | | 0 |
| DYNAMAX: Dynamic computing for Transformers and Mamba based architectures | | 0 |
| A Debate-Driven Experiment on LLM Hallucinations and Accuracy | | 0 |
| Efficient MAP Estimation of LLM Judgment Performance with Prior Transfer | | 0 |
| Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | | 0 |
| Evaluating Consistencies in LLM responses through a Semantic Clustering of Question Answering | | 0 |
| GRATH: Gradual Self-Truthifying for Large Language Models | | 0 |
| Harmonic LLMs are Trustworthy | | 0 |
| Investigating Data Contamination in Modern Benchmarks for Large Language Models | | 0 |
| Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning | | 0 |
| Layer Importance and Hallucination Analysis in Large Language Models via Enhanced Activation Variance-Sparsity | | 0 |
| LokiLM: Technical Report | | 0 |
| Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused | | 0 |
| Maintaining Informative Coherence: Migrating Hallucinations in Large Language Models via Absorbing Markov Chains | | 0 |
| On The Truthfulness of 'Surprisingly Likely' Responses of Large Language Models | | 0 |
| PRobELM: Plausibility Ranking Evaluation for Language Models | | 0 |
| Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs | | 0 |
| Reducing LLM Hallucinations using Epistemic Neural Networks | | 0 |
| Student Data Paradox and Curious Case of Single Student-Tutor Model: Regressive Side Effects of Training LLMs for Personalized Learning | | 0 |
| Sample, Don't Search: Rethinking Test-Time Alignment for Language Models | | 0 |
| Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models | | 0 |
| Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models | | 0 |
| Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation | | 0 |
| Self-Evaluation Improves Selective Generation in Large Language Models | | 0 |
| Semantic Consistency for Assuring Reliability of Large Language Models | | 0 |

No leaderboard results yet.