SOTAVerified|Agents Browse Leaderboard About Blog

Misconceptions

Measures whether a model can discern popular misconceptions from the truth.

Example:

        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T

Source: BIG-bench

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 51–75 of 161 papers

Title	Date	Tasks	Status	Hype
A Closer Look at Classification Evaluation Metrics and a Critical Reflection of Common Evaluation Practice	Apr 25, 2024	Misconceptions	CodeCode Available	0
Student Data Paradox and Curious Case of Single Student-Tutor Model: Regressive Side Effects of Training LLMs for Personalized Learning	Apr 23, 2024	ARCCommon Sense Reasoning	—Unverified	0
Differential contributions of machine learning and statistical analysis to language and cognitive sciences	Apr 22, 2024	Misconceptions	—Unverified	0
Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank	Apr 19, 2024	Distractor GenerationMath	—Unverified	0
Axiomatic modeling of fixed proportion technologies	Apr 18, 2024	Misconceptions	—Unverified	0
Breaking Boundaries: A Chronology with Future Directions of Women in Exercise Physiology Research, Centred on Pregnancy	Apr 12, 2024	Misconceptions	—Unverified	0
SoK: On Gradient Leakage in Federated Learning	Apr 8, 2024	Federated LearningMisconceptions	—Unverified	0
Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models	Apr 2, 2024	Distractor GenerationIn-Context Learning	CodeCode Available	0
Prompting the E-Brushes: Users as Authors in Generative AI	Mar 25, 2024	Misconceptions	—Unverified	0
Noise-powered Multi-modal Knowledge Graph Representation Framework	Mar 11, 2024	Entity AlignmentKnowledge Graph Completion	CodeCode Available	1
The pitfalls of next-token prediction	Mar 11, 2024	MambaMisconceptions	CodeCode Available	2
WatChat: Explaining perplexing programs by debugging mental models	Mar 8, 2024	counterfactualLanguage Modelling	CodeCode Available	0
Improving the Validity of Automatically Generated Feedback via Reinforcement Learning	Mar 2, 2024	MathMisconceptions	CodeCode Available	1
The Essential Role of Causality in Foundation World Models for Embodied AI	Feb 6, 2024	Misconceptions	—Unverified	0
Clarify: Improving Model Robustness With Natural Language Corrections	Feb 6, 2024	Misconceptionsmodel	CodeCode Available	0
Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias	Jan 26, 2024	Decision MakingDiagnostic	—Unverified	0
Distortions in Judged Spatial Relations in Large Language Models	Jan 8, 2024	MisconceptionsSpatial Reasoning	—Unverified	0
Finnish 5th and 6th graders' misconceptions about Artificial Intelligence	Nov 28, 2023	Misconceptions	—Unverified	0
Uncertainty Quantification in Machine Learning for Biosignal Applications -- A Review	Nov 15, 2023	DiagnosticEEG	—Unverified	0
Fine-tuning Language Models for Factuality	Nov 14, 2023	Fact CheckingMisconceptions	—Unverified	0
Large Language Models for In-Context Student Modeling: Synthesizing Student's Behavior in Visual Programming	Oct 15, 2023	Misconceptions	CodeCode Available	0
Unraveling the Single Tangent Space Fallacy: An Analysis and Clarification for Applying Riemannian Geometry in Robot Learning	Oct 11, 2023	Misconceptions	—Unverified	0
Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions	Oct 3, 2023	MathMathematical Reasoning	—Unverified	0
Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions	Oct 3, 2023	MisconceptionsMultiple-choice	CodeCode Available	0
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning	Sep 16, 2023	Date UnderstandingGSM8K	CodeCode Available	0

Show:10 25 50

← PrevPage 3 of 7Next →

No leaderboard results yet.