Misconceptions

Measures whether a model can distinguish popular misconceptions from true statements.

Example:

        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T

Source: BIG-bench
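The record layout shown in the example (one `input`, two `choice` lines, one `answer`) is enough to score a model by plain accuracy. The sketch below assumes that layout and nothing more; `Example`, `parse_examples`, `accuracy`, and the `predict` callback are hypothetical names introduced for illustration, not part of BIG-bench or of this site.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Example:
    statement: str
    choices: List[str] = field(default_factory=list)
    answer: str = ""


# The two records from the example above, in the same key: value layout.
SAMPLE = """
input: The daddy longlegs spider is the most venomous spider in the world.
choice: T
choice: F
answer: F

input: Karl Benz is correctly credited with the invention of the first modern automobile.
choice: T
choice: F
answer: T
"""


def parse_examples(text: str) -> List[Example]:
    """Parse blank-line-separated records of input/choice/answer lines."""
    examples = []
    for block in re.split(r"\n\s*\n", text.strip()):
        ex = Example(statement="")
        for line in block.splitlines():
            # Split on the first colon only, so colons in the statement survive.
            key, _, value = line.strip().partition(":")
            value = value.strip()
            if key == "input":
                ex.statement = value
            elif key == "choice":
                ex.choices.append(value)
            elif key == "answer":
                ex.answer = value
        examples.append(ex)
    return examples


def accuracy(examples: List[Example], predict: Callable[[str], str]) -> float:
    """Fraction of examples where predict(statement) equals the gold answer."""
    correct = sum(predict(ex.statement) == ex.answer for ex in examples)
    return correct / len(examples)


if __name__ == "__main__":
    examples = parse_examples(SAMPLE)
    # Hypothetical baseline: always answer "F" (treat every claim as a myth).
    print(accuracy(examples, lambda statement: "F"))
```

A constant "F" baseline gets one of these two examples right, which is why a trivial predictor is a useful reference point for any reported score on a T/F task.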

Papers

Showing 1–25 of 161 papers

Title	Date	Tasks	Status	Hype
Training Compute-Optimal Large Language Models	Mar 29, 2022	Anachronisms, Analogical Similarity	Code Available	6
Factuality Enhanced Language Models for Open-Ended Text Generation	Jun 9, 2022	Misconceptions, Sentence	Code Available	5
Scaling Language Models: Methods, Analysis & Insights from Training Gopher	Dec 8, 2021	Abstract Algebra, Anachronisms	Code Available	2
The pitfalls of next-token prediction	Mar 11, 2024	Mamba, Misconceptions	Code Available	2
Parting with Misconceptions about Learning-based Vehicle Motion Planning	Jun 13, 2023	Misconceptions, Motion Planning	Code Available	2
Towards Democratizing Joint-Embedding Self-Supervised Learning	Mar 3, 2023	Data Augmentation, Misconceptions	Code Available	2
Towards Process-Oriented, Modular, and Versatile Question Generation that Meets Educational Needs	Apr 30, 2022	Misconceptions, Question Generation	Code Available	1
TruthfulQA: Measuring How Models Mimic Human Falsehoods	Sep 8, 2021	Language Modeling, Language Modelling	Code Available	1
Unveiling Contrastive Learning's Capability of Neighborhood Aggregation for Collaborative Filtering	Apr 14, 2025	Collaborative Filtering, Contrastive Learning	Code Available	1
Re-Examining Linear Embeddings for High-Dimensional Bayesian Optimization	Jan 31, 2020	Bayesian Optimization, Misconceptions	Code Available	1
Laplace Redux -- Effortless Bayesian Deep Learning	Jun 28, 2021	Deep Learning, Misconceptions	Code Available	1
A Tutorial on VAEs: From Bayes' Rule to Lossless Compression	Jun 18, 2020	Misconceptions	Code Available	1
Noise-powered Multi-modal Knowledge Graph Representation Framework	Mar 11, 2024	Entity Alignment, Knowledge Graph Completion	Code Available	1
Improving the Validity of Automatically Generated Feedback via Reinforcement Learning	Mar 2, 2024	Math, Misconceptions	Code Available	1
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement	Aug 23, 2024	Disentanglement, Knowledge Tracing	Code Available	1
Emergent Communication under Competition	Jan 25, 2021	Misconceptions	Code Available	1
Back to the Drawing Board: A Critical Evaluation of Poisoning Attacks on Production Federated Learning	Aug 23, 2021	Federated Learning, Misconceptions	Code Available	1
Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs	Sep 24, 2024	Knowledge Tracing, Misconceptions	Code Available	1
Inability of a graph neural network heuristic to outperform greedy algorithms in solving combinatorial optimization problems like Max-Cut	Oct 2, 2022	Combinatorial Optimization, Graph Neural Network	Code Available	1
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines	Jun 8, 2020	Misconceptions	Code Available	1
Analyzing Factors Influencing Driver Willingness to Accept Advanced Driver Assistance Systems	Feb 23, 2025	Misconceptions	Unverified	0
Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology	Nov 5, 2024	Math, Misconceptions	Unverified	0
A clarification of misconceptions, myths and desired status of artificial intelligence	Aug 3, 2020	BIG-bench Machine Learning, Misconceptions	Unverified	0
Challenges and Trends in User Trust Discourse in AI	May 5, 2023	Misconceptions	Unverified	0
A Thematic Framework for Analyzing Large-scale Self-reported Social Media Data on Opioid Use Disorder Treatment Using Buprenorphine Product	Oct 2, 2024	Misconceptions	Unverified	0

No leaderboard results yet.