SOTAVerified

Misconceptions

Measures whether a model can discern popular misconceptions from the truth.

Example:

        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T

Source: BIG-bench
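As a rough illustration of how items in this format could be scored, here is a minimal sketch. It assumes the plain-text layout shown in the example above (`input:`, `choice:`, `answer:` lines, with a blank line between items). The names `parse_items`, `accuracy`, and `always_false` are hypothetical helpers written for this sketch; they are not part of BIG-bench or any library.

```python
from typing import Callable, Dict, List

# The two example items from this page, in the same line-oriented layout.
EXAMPLE = """\
input: The daddy longlegs spider is the most venomous spider in the world.
choice: T
choice: F
answer: F

input: Karl Benz is correctly credited with the invention of the first modern automobile.
choice: T
choice: F
answer: T
"""


def parse_items(text: str) -> List[Dict]:
    """Parse blank-line-separated items into dicts (assumed layout, see lead-in)."""
    items = []
    for block in text.strip().split("\n\n"):
        item = {"input": "", "choices": [], "answer": ""}
        for line in block.splitlines():
            # Split on the first ": " only, so colons inside a statement survive.
            key, _, value = line.strip().partition(": ")
            if key == "input":
                item["input"] = value
            elif key == "choice":
                item["choices"].append(value)
            elif key == "answer":
                item["answer"] = value
        items.append(item)
    return items


def accuracy(items: List[Dict], predict: Callable[[str, List[str]], str]) -> float:
    """Fraction of items where the predicted choice equals the reference answer."""
    if not items:
        return 0.0
    correct = sum(
        1 for item in items if predict(item["input"], item["choices"]) == item["answer"]
    )
    return correct / len(items)


def always_false(statement: str, choices: List[str]) -> str:
    """Trivial baseline: label every statement a misconception."""
    return "F"


items = parse_items(EXAMPLE)
print(accuracy(items, always_false))  # prints 0.5 on the two items above
```

Replacing `always_false` with a function that queries a model and returns one of the listed choices would give that model's accuracy on the same items.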

Papers

Showing 51-100 of 161 papers

Title | Status | Hype
Listening to Patients: A Framework of Detecting and Mitigating Patient Misreport for Medical Dialogue Generation | - | 0
Contrastive Explanations That Anticipate Human Misconceptions Can Improve Human Decision-Making Skills | - | 0
A Thematic Framework for Analyzing Large-scale Self-reported Social Media Data on Opioid Use Disorder Treatment Using Buprenorphine Product | - | 0
Classifier-Free Guidance is a Predictor-Corrector | - | 0
Problems in AI, their roots in philosophy, and implications for science and society | - | 0
Neural topology optimization: the good, the bad, and the ugly | - | 0
When big data actually are low-rank, or entrywise approximation of certain function-generated matrices | Code | 0
MalAlgoQA: Pedagogical Evaluation of Counterfactual Reasoning in Large Language Models and Implications for AI in Education | Code | 0
Formalising Anti-Discrimination Law in Automated Decision Systems | - | 0
DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Code | 0
Student Answer Forecasting: Transformer-Driven Answer Choice Prediction for Language Learning | Code | 0
Refining Skewed Perceptions in Vision-Language Models through Visual Representations | - | 0
An Initial Introduction to Cooperative Multi-Agent Reinforcement Learning | - | 0
Toward In-Context Teaching: Adapting Examples to Students' Misconceptions | - | 0
Common pitfalls to avoid while using multiobjective optimization in machine learning | - | 0
Can a Hallucinating Model help in Reducing Human "Hallucination"? | - | 0
Math Multiple Choice Question Generation via Human-Large Language Model Collaboration | - | 0
A Closer Look at Classification Evaluation Metrics and a Critical Reflection of Common Evaluation Practice | Code | 0
Student Data Paradox and Curious Case of Single Student-Tutor Model: Regressive Side Effects of Training LLMs for Personalized Learning | - | 0
Differential contributions of machine learning and statistical analysis to language and cognitive sciences | - | 0
Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank | - | 0
Axiomatic modeling of fixed proportion technologies | - | 0
Breaking Boundaries: A Chronology with Future Directions of Women in Exercise Physiology Research, Centred on Pregnancy | - | 0
SoK: On Gradient Leakage in Federated Learning | - | 0
Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Code | 0
Prompting the E-Brushes: Users as Authors in Generative AI | - | 0
WatChat: Explaining perplexing programs by debugging mental models | Code | 0
Clarify: Improving Model Robustness With Natural Language Corrections | Code | 0
The Essential Role of Causality in Foundation World Models for Embodied AI | - | 0
Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias | - | 0
Distortions in Judged Spatial Relations in Large Language Models | - | 0
Finnish 5th and 6th graders' misconceptions about Artificial Intelligence | - | 0
Uncertainty Quantification in Machine Learning for Biosignal Applications -- A Review | - | 0
Fine-tuning Language Models for Factuality | - | 0
Large Language Models for In-Context Student Modeling: Synthesizing Student's Behavior in Visual Programming | Code | 0
Unraveling the Single Tangent Space Fallacy: An Analysis and Clarification for Applying Riemannian Geometry in Robot Learning | - | 0
Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions | - | 0
Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions | Code | 0
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning | Code | 0
Towards a Rigorous Analysis of Mutual Information in Contrastive Learning | - | 0
Using language models in the implicit automated assessment of mathematical short answer items | - | 0
Characterizing Information Seeking Events in Health-Related Social Discourse | - | 0
Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning | Code | 0
Scalable and Equitable Math Problem Solving Strategy Prediction in Big Educational Data | Code | 0
Reliability Check: An Analysis of GPT-3's Response to Sensitive Topics and Prompt Wording | Code | 0
Dear XAI Community, We Need to Talk! Fundamental Misconceptions in Current XAI Research | - | 0
Justices for Information Bottleneck Theory | - | 0
Clarifying System 1 & 2 through the Common Model of Cognition | - | 0
Disproving XAI Myths with Formal Methods -- Initial Results | - | 0
Human-centered trust framework: An HCI perspective | - | 0

No leaderboard results yet.