SOTAVerified

Misconceptions

Measures whether a model can discern popular misconceptions from the truth.

Example:

        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T

Source: BIG-bench
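
As an illustration of how items in the format above might be scored, here is a minimal Python sketch. It assumes the plain-text `input:` / `choice:` / `answer:` layout shown in the example. The helper names `parse_items`, `accuracy`, and `always_false` are hypothetical and not part of BIG-bench, and the constant "always F" baseline stands in for a real model.

```python
from typing import Callable, Dict, List

# The two example items from above, in the same plain-text layout.
EXAMPLES = """\
input: The daddy longlegs spider is the most venomous spider in the world.
choice: T
choice: F
answer: F

input: Karl Benz is correctly credited with the invention of the first modern automobile.
choice: T
choice: F
answer: T
"""


def parse_items(text: str) -> List[Dict]:
    """Split blank-line-separated blocks into dicts with input, choices, answer."""
    items = []
    for block in text.strip().split("\n\n"):
        item = {"input": "", "choices": [], "answer": ""}
        for line in block.splitlines():
            key, _, value = line.strip().partition(": ")
            if key == "choice":
                item["choices"].append(value)
            elif key in ("input", "answer"):
                item[key] = value
        items.append(item)
    return items


def accuracy(items: List[Dict], predict: Callable[[str, List[str]], str]) -> float:
    """Fraction of items where the predicted choice equals the labeled answer."""
    correct = sum(predict(it["input"], it["choices"]) == it["answer"] for it in items)
    return correct / len(items)


def always_false(statement: str, choices: List[str]) -> str:
    """Hypothetical baseline: treat every statement as a misconception."""
    return "F"


items = parse_items(EXAMPLES)
print(accuracy(items, always_false))  # 0.5 on the two items above
```

A real evaluation would replace `always_false` with a function that queries a model and maps its output to one of the listed choices.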

Papers

Showing 76–100 of 161 papers

Title | Status | Hype
Prompting the E-Brushes: Users as Authors in Generative AI | | 0
WatChat: Explaining perplexing programs by debugging mental models | Code | 0
Clarify: Improving Model Robustness With Natural Language Corrections | Code | 0
The Essential Role of Causality in Foundation World Models for Embodied AI | | 0
Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias | | 0
Distortions in Judged Spatial Relations in Large Language Models | | 0
Finnish 5th and 6th graders' misconceptions about Artificial Intelligence | | 0
Uncertainty Quantification in Machine Learning for Biosignal Applications -- A Review | | 0
Fine-tuning Language Models for Factuality | | 0
Large Language Models for In-Context Student Modeling: Synthesizing Student's Behavior in Visual Programming | Code | 0
Unraveling the Single Tangent Space Fallacy: An Analysis and Clarification for Applying Riemannian Geometry in Robot Learning | | 0
Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions | | 0
Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions | Code | 0
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning | Code | 0
Towards a Rigorous Analysis of Mutual Information in Contrastive Learning | | 0
Using language models in the implicit automated assessment of mathematical short answer items | | 0
Characterizing Information Seeking Events in Health-Related Social Discourse | | 0
Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning | Code | 0
Scalable and Equitable Math Problem Solving Strategy Prediction in Big Educational Data | Code | 0
Reliability Check: An Analysis of GPT-3's Response to Sensitive Topics and Prompt Wording | Code | 0
Dear XAI Community, We Need to Talk! Fundamental Misconceptions in Current XAI Research | | 0
Justices for Information Bottleneck Theory | | 0
Clarifying System 1 & 2 through the Common Model of Cognition | | 0
Disproving XAI Myths with Formal Methods -- Initial Results | | 0
Human-centered trust framework: An HCI perspective | | 0

No leaderboard results yet.