Misconceptions

Measures whether a model can discern popular misconceptions from the truth.

Example:

        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T

Source: BIG-bench
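
The true/false format above lends itself to a simple accuracy score. The following is a minimal sketch of how such items might be parsed and scored, assuming records are separated by blank lines exactly as in the example. The names `parse_items`, `accuracy`, and `always_false` are hypothetical helpers introduced for illustration; this is not BIG-bench's own evaluation harness.

```python
from typing import Callable, Dict, List

# The two example items from this page, in the same "key: value" layout.
EXAMPLES = """\
input: The daddy longlegs spider is the most venomous spider in the world.
choice: T
choice: F
answer: F

input: Karl Benz is correctly credited with the invention of the first modern automobile.
choice: T
choice: F
answer: T
"""


def parse_items(text: str) -> List[Dict]:
    """Split blank-line-separated records into dicts (assumed layout)."""
    items = []
    for block in text.strip().split("\n\n"):
        item = {"input": "", "choices": [], "answer": ""}
        for line in block.splitlines():
            key, _, value = line.strip().partition(": ")
            if key == "input":
                item["input"] = value
            elif key == "choice":
                item["choices"].append(value)
            elif key == "answer":
                item["answer"] = value
        items.append(item)
    return items


def accuracy(items: List[Dict], predict: Callable[[str, List[str]], str]) -> float:
    """Fraction of items where the predicted choice equals the answer."""
    if not items:
        return 0.0
    correct = sum(
        1 for item in items
        if predict(item["input"], item["choices"]) == item["answer"]
    )
    return correct / len(items)


def always_false(statement: str, choices: List[str]) -> str:
    """Placeholder baseline: labels every statement a misconception."""
    return "F"


items = parse_items(EXAMPLES)
print(f"{len(items)} items, baseline accuracy = {accuracy(items, always_false):.2f}")
```

A real model would replace `always_false` with a function that returns one of the listed choices for each statement. The constant baseline is useful mainly as a sanity check on the scoring code.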

Papers

Showing 101–125 of 161 papers

Title	Date	Tasks	Status
Uncertainty Quantification in Machine Learning for Biosignal Applications -- A Review	Nov 15, 2023	Diagnostic, EEG	Unverified
Math Multiple Choice Question Generation via Human-Large Language Model Collaboration	May 1, 2024	Language Modeling, Language Modelling	Unverified
Metagenomic Analysis using Phylogenetic Placement -- A Review of the First Decade	Feb 7, 2022	Misconceptions	Unverified
Understanding the Lexical Simplification Needs of Non-Native Speakers of English	Dec 1, 2016	Complex Word Identification, Lexical Simplification	Unverified
Neural topology optimization: the good, the bad, and the ugly	Jul 19, 2024	GPU, Misconceptions	Unverified
Unraveling the Single Tangent Space Fallacy: An Analysis and Clarification for Applying Riemannian Geometry in Robot Learning	Oct 11, 2023	Misconceptions	Unverified
Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions	Oct 3, 2023	Math, Mathematical Reasoning	Unverified
On the lifting and reconstruction of nonlinear systems with multiple invariant sets	Apr 24, 2023	Misconceptions	Unverified
Axiomatic modeling of fixed proportion technologies	Apr 18, 2024	Misconceptions	Unverified
Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology	Nov 5, 2024	Math, Misconceptions	Unverified
Knowledge, beliefs, attitudes and perceived risk about COVID-19 vaccine and determinants of COVID-19 vaccine acceptance in Bangladesh	Mar 28, 2021	Misconceptions, regression	Unverified
Problems in AI, their roots in philosophy, and implications for science and society	Jul 22, 2024	Misconceptions, Philosophy	Unverified
Prompting the E-Brushes: Users as Authors in Generative AI	Mar 25, 2024	Misconceptions	Unverified
Quantum Technology for Economists	Dec 8, 2020	Misconceptions	Unverified
Rectified Max-Value Entropy Search for Bayesian Optimization	Feb 28, 2022	Bayesian Optimization, Misconceptions	Unverified
A Thematic Framework for Analyzing Large-scale Self-reported Social Media Data on Opioid Use Disorder Treatment Using Buprenorphine Product	Oct 2, 2024	Misconceptions	Unverified
Refining Skewed Perceptions in Vision-Language Models through Visual Representations	May 22, 2024	Misconceptions	Unverified
Student Data Paradox and Curious Case of Single Student-Tutor Model: Regressive Side Effects of Training LLMs for Personalized Learning	Apr 23, 2024	ARC, Common Sense Reasoning	Unverified
Using language models in the implicit automated assessment of mathematical short answer items	Aug 21, 2023	Misconceptions	Unverified
Reply to Garcia et al.: Common mistakes in measuring frequency dependent word characteristics	May 25, 2015	Misconceptions	Unverified
Automated Identification of Logical Errors in Programs: Advancing Scalable Analysis of Student Misconceptions	May 16, 2025	Misconceptions	Unverified
Response to Moffat's Comment on "Towards Meaningful Statements in IR Evaluation: Mapping Evaluation Measures to Interval Scales"	Dec 22, 2022	Misconceptions	Unverified
Response to: Significance and stability of deep learning-based identification of subtypes within major psychiatric disorders. Molecular Psychiatry (2022)	Jun 10, 2022	BIG-bench Machine Learning, Misconceptions	Unverified
Retrieval-augmented systems can be dangerous medical communicators	Feb 18, 2025	Misconceptions, Retrieval	Unverified
Using Search Queries to Understand Health Information Needs in Africa	Jun 14, 2018	Misconceptions	Unverified


No leaderboard results yet.