SOTAVerified

TruthfulQA

Papers

Showing 1–50 of 80 papers

Title | Status | Hype
RLHF Workflow: From Reward Modeling to Online RLHF | Code | 5
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | Code | 2
Tuning Language Models by Proxy | Code | 2
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models | Code | 2
TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space | Code | 2
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation | Code | 2
Integrative Decoding: Improve Factuality via Implicit Self-consistency | Code | 1
Alleviating Hallucinations of Large Language Models through Induced Hallucinations | Code | 1
Machine Unlearning in Large Language Models | Code | 1
Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning | Code | 1
Non-Linear Inference Time Intervention: Improving LLM Truthfulness | Code | 1
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment | Code | 1
RAIN: Your Language Models Can Align Themselves without Finetuning | Code | 1
Tool-Augmented Reward Modeling | Code | 1
Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics | Code | 1
TruthfulQA: Measuring How Models Mimic Human Falsehoods | Code | 1
Instruction Tuning With Loss Over Instructions | Code | 1
Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused | - | 0
Maintaining Informative Coherence: Migrating Hallucinations in Large Language Models via Absorbing Markov Chains | - | 0
Mitigating Adversarial Attacks in LLMs through Defensive Suffix Generation | - | 0
Model Unlearning via Sparse Autoencoder Subspace Guided Projections | - | 0
Monty Hall and Optimized Conformal Prediction to Improve Decision-Making with LLMs | - | 0
More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment | - | 0
Multi-Reference Preference Optimization for Large Language Models | - | 0
Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models | - | 0
On The Truthfulness of 'Surprisingly Likely' Responses of Large Language Models | - | 0
PRobELM: Plausibility Ranking Evaluation for Language Models | - | 0
Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs | - | 0
Efficiently Deploying LLMs with Controlled Risk | - | 0
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts | - | 0
Cost-Saving LLM Cascades with Early Abstention | - | 0
LLMAuditor: A Framework for Auditing Large Language Models Using Human-in-the-Loop | - | 0
DYNAMAX: Dynamic computing for Transformers and Mamba based architectures | - | 0
A Debate-Driven Experiment on LLM Hallucinations and Accuracy | - | 0
Efficient MAP Estimation of LLM Judgment Performance with Prior Transfer | - | 0
Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | - | 0
Evaluating Consistencies in LLM responses through a Semantic Clustering of Question Answering | - | 0
GRATH: Gradual Self-Truthifying for Large Language Models | - | 0
Harmonic LLMs are Trustworthy | - | 0
Instruction Tuning with Human Curriculum | - | 0
Investigating Data Contamination in Modern Benchmarks for Large Language Models | - | 0
Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning | - | 0
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback | - | 0
Layer Importance and Hallucination Analysis in Large Language Models via Enhanced Activation Variance-Sparsity | - | 0
LokiLM: Technical Report | - | 0
Student Data Paradox and Curious Case of Single Student-Tutor Model: Regressive Side Effects of Training LLMs for Personalized Learning | - | 0
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models | - | 0
Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models | - | 0
Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models | - | 0
Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation | - | 0
Page 1 of 2

No leaderboard results yet.