SOTAVerified

Misconceptions

Measures whether a model can discern popular misconceptions from the truth.

Example:

        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T

Source: BIG-bench
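
The input/choice/answer layout of the examples above lends itself to a simple accuracy check. The following is a minimal sketch, not BIG-bench's own evaluation code: the helper names `parse_examples`, `accuracy`, and `always_false` are hypothetical, and it assumes that records are separated by blank lines and that each line has the form `key: value`.

```python
from typing import Callable, Dict, List

# The two example records from this page, in the same key: value layout.
EXAMPLES = """\
input: The daddy longlegs spider is the most venomous spider in the world.
choice: T
choice: F
answer: F

input: Karl Benz is correctly credited with the invention of the first modern automobile.
choice: T
choice: F
answer: T
"""


def parse_examples(text: str) -> List[Dict]:
    """Parse blank-line-separated records (assumed layout) into dicts."""
    records = []
    for block in text.strip().split("\n\n"):
        record = {"input": "", "choices": [], "answer": ""}
        for line in block.splitlines():
            # Split only at the first ": " so statements may contain colons.
            key, _, value = line.strip().partition(": ")
            if key == "choice":
                record["choices"].append(value)
            elif key in ("input", "answer"):
                record[key] = value
        records.append(record)
    return records


def accuracy(records: List[Dict], predict: Callable[[str, List[str]], str]) -> float:
    """Fraction of records where the predicted choice equals the answer."""
    if not records:
        return 0.0
    correct = sum(predict(r["input"], r["choices"]) == r["answer"] for r in records)
    return correct / len(records)


def always_false(statement: str, choices: List[str]) -> str:
    """Hypothetical stand-in for a model: labels every statement a misconception."""
    return "F"


records = parse_examples(EXAMPLES)
score = accuracy(records, always_false)
```

Replacing `always_false` with a function that queries a real model would give that model's accuracy on the same records. A constant predictor like this one is a useful baseline, since a model has to beat it to show that it separates misconceptions from true statements.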

Papers

Showing 26–50 of 161 papers

Title | Status | Hype
Design Challenges and Misconceptions in Neural Sequence Labeling | Code | 0
Scalable and Equitable Math Problem Solving Strategy Prediction in Big Educational Data | Code | 0
Pay attention to your loss: understanding misconceptions about 1-Lipschitz neural networks | Code | 0
Not All Claims are Created Equal: Choosing the Right Statistical Approach to Assess Hypotheses | Code | 0
Large Language Models for In-Context Student Modeling: Synthesizing Student's Behavior in Visual Programming | Code | 0
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor | Code | 0
How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions | Code | 0
Harnessing Structured Knowledge: A Concept Map-Based Approach for High-Quality Multiple Choice Question Generation with Effective Distractors | Code | 0
A Structured Unplugged Approach for Foundational AI Literacy in Primary Education | Code | 0
Hindsight and Sequential Rationality of Correlated Play | Code | 0
Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Code | 0
Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning | Code | 0
End-to-End Annotator Bias Approximation on Crowdsourced Single-Label Sentiment Analysis | Code | 0
Community detection in networks: A user guide | Code | 0
A Closer Look at Classification Evaluation Metrics and a Critical Reflection of Common Evaluation Practice | Code | 0
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning | Code | 0
How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation | Code | 0
A Weakly-Supervised Iterative Graph-Based Approach to Retrieve COVID-19 Misinformation Topics | Code | 0
Collecting the Public Perception of AI and Robot Rights | Code | 0
MalAlgoQA: Pedagogical Evaluation of Counterfactual Reasoning in Large Language Models and Implications for AI in Education | Code | 0
Deep Curvature Suite | Code | 0
DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Code | 0
From Solution Synthesis to Student Attempt Synthesis for Block-Based Visual Programming Tasks | Code | 0
Reliability Check: An Analysis of GPT-3's Response to Sensitive Topics and Prompt Wording | Code | 0
Paths and Ambient Spaces in Neural Loss Landscapes | Code | 0

No leaderboard results yet.