Misconceptions

Measures whether a model can distinguish popular misconceptions from true statements. Each item is a statement to be labeled true (T) or false (F).

Example:

        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T

Source: BIG-bench
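
The example format above can be scored with a few lines of code. The following is a minimal sketch, not the official BIG-bench harness: it assumes records are laid out exactly as in the example (one `input`, two `choice` lines, one `answer`, separated by blank lines). The names `parse_examples`, `accuracy`, and `always_false` are hypothetical helpers introduced for illustration.

```python
from __future__ import annotations

from typing import Callable

# The two example records from this page, in the same layout.
RAW = """\
input: The daddy longlegs spider is the most venomous spider in the world.
choice: T
choice: F
answer: F

input: Karl Benz is correctly credited with the invention of the first modern automobile.
choice: T
choice: F
answer: T
"""


def parse_examples(text: str) -> list[dict]:
    """Split blank-line-separated records into dicts (assumed layout)."""
    examples = []
    for block in text.strip().split("\n\n"):
        record = {"input": "", "choices": [], "answer": ""}
        for line in block.splitlines():
            key, _, value = line.strip().partition(": ")
            if key == "choice":
                record["choices"].append(value)
            elif key in ("input", "answer"):
                record[key] = value
        examples.append(record)
    return examples


def accuracy(examples: list[dict], predict: Callable[[str, list], str]) -> float:
    """Fraction of examples where the predicted label equals the answer."""
    correct = sum(predict(ex["input"], ex["choices"]) == ex["answer"] for ex in examples)
    return correct / len(examples)


def always_false(statement: str, choices: list) -> str:
    """Trivial baseline standing in for a real model: always answers F."""
    return "F"


examples = parse_examples(RAW)
score = accuracy(examples, always_false)
```

A real evaluation would replace `always_false` with a function that queries the model under test and maps its output to one of the listed choices.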

Papers

Showing 26–50 of 161 papers

Title	Date	Tasks	Status
Harnessing Structured Knowledge: A Concept Map-Based Approach for High-Quality Multiple Choice Question Generation with Effective Distractors	May 2, 2025	High School Physics, Misconceptions	Code Available
LLM Library Learning Fails: A LEGO-Prover Case Study	Apr 3, 2025	Mathematical Reasoning, Misconceptions	Unverified
What is AI, what is it not, how we use it in physics and how it impacts... you	Apr 2, 2025	Anomaly Detection, Misconceptions	Unverified
From Intuition to Understanding: Using AI Peers to Overcome Physics Misconceptions	Apr 1, 2025	Misconceptions	Unverified
Clarifying Misconceptions in COVID-19 Vaccine Sentiment and Stance Analysis and Their Implications for Vaccine Hesitancy Mitigation: A Systematic Review	Mar 23, 2025	Misconceptions, Sentiment Analysis	Unverified
How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation	Mar 12, 2025	counterfactual, Misconceptions	Code Available
Paths and Ambient Spaces in Neural Loss Landscapes	Mar 5, 2025	Misconceptions	Code Available
Emergent Abilities in Large Language Models: A Survey	Feb 28, 2025	In-Context Learning, Misconceptions	Unverified
Analyzing Factors Influencing Driver Willingness to Accept Advanced Driver Assistance Systems	Feb 23, 2025	Misconceptions	Unverified
The Imitation Game for Educational AI	Feb 21, 2025	Distractor Generation, Misconceptions	Unverified
Retrieval-augmented systems can be dangerous medical communicators	Feb 18, 2025	Misconceptions, Retrieval	Unverified
Foundation Models in Computational Pathology: A Review of Challenges, Opportunities, and Impact	Feb 12, 2025	Misconceptions	Unverified
Knowledge Tracing in Programming Education Integrating Students' Questions	Jan 22, 2025	Knowledge Tracing, Misconceptions	Unverified
Generating Plausible Distractors for Multiple-Choice Questions via Student Choice Prediction	Jan 21, 2025	Distractor Generation, Misconceptions	Unverified
Generative AI in Education: From Foundational Insights to the Socratic Playground for Learning	Jan 12, 2025	Misconceptions	Unverified
Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension	Jan 2, 2025	Misconceptions	Unverified
A Graphical Approach to State Variable Selection in Off-policy Learning	Jan 1, 2025	Causal Inference, Dimensionality Reduction	Unverified
Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation	Dec 14, 2024	Misconceptions	Unverified
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor	Dec 8, 2024	Misconceptions, Multiple-choice	Code Available
Developer Perspectives on Licensing and Copyright Issues Arising from Generative AI for Software Development	Nov 16, 2024	Misconceptions, Survey	Unverified
Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology	Nov 5, 2024	Math, Misconceptions	Unverified
A Study on Characterization of Near-Field Sub-Regions For Phased-Array Antennas	Oct 23, 2024	Misconceptions	Unverified
LLM-based Cognitive Models of Students with Misconceptions	Oct 16, 2024	Misconceptions	Unverified
The Future of Learning in the Age of Generative AI: Automated Question Generation and Assessment with Large Language Models	Oct 12, 2024	Misconceptions, Multiple-choice	Unverified
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts	Oct 11, 2024	Holdout Set, Misconceptions	Unverified

No leaderboard results yet.