SOTAVerified

TruthfulQA

Papers

Showing 51–75 of 80 papers

Title | Status | Hype
Efficiently Deploying LLMs with Controlled Risk | - | 0
Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs | - | 0
Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models | - | 0
Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused | - | 0
LokiLM: Technical Report | - | 0
metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models | Code | 0
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Code | 0
Steering Without Side Effects: Improving Post-Deployment Control of Language Models | Code | 0
Enhancing Language Model Factuality via Activation-Based Confidence Calibration and Guided Decoding | Code | 0
LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models | Code | 0
Multi-Reference Preference Optimization for Large Language Models | - | 0
Harmonic LLMs are Trustworthy | - | 0
Student Data Paradox and Curious Case of Single Student-Tutor Model: Regressive Side Effects of Training LLMs for Personalized Learning | - | 0
When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models | Code | 0
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics | Code | 0
PRobELM: Plausibility Ranking Evaluation for Language Models | - | 0
SaGE: Evaluating Moral Consistency in Large Language Models | Code | 0
Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation | - | 0
LLMAuditor: A Framework for Auditing Large Language Models Using Human-in-the-Loop | - | 0
GRATH: Gradual Self-Truthifying for Large Language Models | - | 0
Reducing LLM Hallucinations using Epistemic Neural Networks | - | 0
Self-Evaluation Improves Selective Generation in Large Language Models | - | 0
Uncertainty-aware Language Modeling for Selective Question Answering | - | 0
Investigating Data Contamination in Modern Benchmarks for Large Language Models | - | 0
On The Truthfulness of 'Surprisingly Likely' Responses of Large Language Models | - | 0

No leaderboard results yet.