SOTAVerified

TruthfulQA

Papers

Showing 61–70 of 80 papers

Title | Status | Hype
Unsupervised Elicitation of Language Models |  | 0
When Persuasion Overrides Truth in Multi-Agent LLM Debates: Introducing a Confidence-Weighted Persuasion Override Rate (CW-POR) |  | 0
Reducing LLM Hallucinations using Epistemic Neural Networks |  | 0
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics | Code | 0
NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models | Code | 0
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Code | 0
metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models | Code | 0
Multi-Agent Reinforcement Learning with Focal Diversity Optimization | Code | 0
SaGE: Evaluating Moral Consistency in Large Language Models | Code | 0
LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models | Code | 0

No leaderboard results yet.