
Misconceptions

Measures whether a model can discern popular misconceptions from the truth.

Example:

        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T

Source: BIG-bench
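The page does not spell out a scoring rule. For a two-option true/false task like this one, plain accuracy over the choices is a natural reading. Below is a minimal Python sketch of that idea. It assumes each item is held in memory as a statement, its allowed choices, and the correct answer. The `Example` type and the `accuracy` and `always_true` names are hypothetical, and BIG-bench's own task files and metrics may be structured differently.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Example:
    """One true/false item: a statement, its allowed choices, and the correct choice."""
    input: str
    choices: tuple[str, ...]
    answer: str


# The two items shown above, transcribed into the hypothetical Example type.
EXAMPLES = [
    Example("The daddy longlegs spider is the most venomous spider in the world.",
            ("T", "F"), "F"),
    Example("Karl Benz is correctly credited with the invention of the first modern automobile.",
            ("T", "F"), "T"),
]


def accuracy(predict: Callable[[str, Sequence[str]], str],
             examples: Sequence[Example]) -> float:
    """Fraction of examples where the predicted choice equals the reference answer."""
    if not examples:
        raise ValueError("need at least one example")
    correct = 0
    for ex in examples:
        choice = predict(ex.input, ex.choices)
        # Reject outputs that are not one of the offered choices.
        if choice not in ex.choices:
            raise ValueError(f"prediction {choice!r} is not one of {ex.choices}")
        correct += choice == ex.answer
    return correct / len(examples)


def always_true(statement: str, choices: Sequence[str]) -> str:
    """Trivial baseline that labels every statement as true."""
    return "T"


print(accuracy(always_true, EXAMPLES))  # 0.5
```

On the two items above, a constant baseline like `always_true` gets one right and one wrong. This shows why a model has to actually separate misconceptions from facts, not just favour one label.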

Papers

Showing 51–75 of 161 papers

Title | Status | Hype
Listening to Patients: A Framework of Detecting and Mitigating Patient Misreport for Medical Dialogue Generation |  | 0
Contrastive Explanations That Anticipate Human Misconceptions Can Improve Human Decision-Making Skills |  | 0
A Thematic Framework for Analyzing Large-scale Self-reported Social Media Data on Opioid Use Disorder Treatment Using Buprenorphine Product |  | 0
Classifier-Free Guidance is a Predictor-Corrector |  | 0
Problems in AI, their roots in philosophy, and implications for science and society |  | 0
Neural topology optimization: the good, the bad, and the ugly |  | 0
When big data actually are low-rank, or entrywise approximation of certain function-generated matrices | Code | 0
MalAlgoQA: Pedagogical Evaluation of Counterfactual Reasoning in Large Language Models and Implications for AI in Education | Code | 0
Formalising Anti-Discrimination Law in Automated Decision Systems |  | 0
DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Code | 0
Student Answer Forecasting: Transformer-Driven Answer Choice Prediction for Language Learning | Code | 0
Refining Skewed Perceptions in Vision-Language Models through Visual Representations |  | 0
An Initial Introduction to Cooperative Multi-Agent Reinforcement Learning |  | 0
Toward In-Context Teaching: Adapting Examples to Students' Misconceptions |  | 0
Common pitfalls to avoid while using multiobjective optimization in machine learning |  | 0
Can a Hallucinating Model help in Reducing Human "Hallucination"? |  | 0
Math Multiple Choice Question Generation via Human-Large Language Model Collaboration |  | 0
A Closer Look at Classification Evaluation Metrics and a Critical Reflection of Common Evaluation Practice | Code | 0
Student Data Paradox and Curious Case of Single Student-Tutor Model: Regressive Side Effects of Training LLMs for Personalized Learning |  | 0
Differential contributions of machine learning and statistical analysis to language and cognitive sciences |  | 0
Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank |  | 0
Axiomatic modeling of fixed proportion technologies |  | 0
Breaking Boundaries: A Chronology with Future Directions of Women in Exercise Physiology Research, Centred on Pregnancy |  | 0
SoK: On Gradient Leakage in Federated Learning |  | 0
Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Code | 0

No leaderboard results yet.