SOTAVerified

Misconceptions

Measures whether a model can distinguish popular misconceptions from true statements.

Example:

        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T

Source: BIG-bench
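
The example items above pair a statement with a T/F gold answer, so one natural way to score a model is plain accuracy over the items. The following is a minimal sketch under that assumption, not BIG-bench's actual evaluation code; `ITEMS`, `accuracy`, and `always_true` are hypothetical names introduced here for illustration.

```python
from typing import Callable, List, Tuple

# The two example items from this page: (statement, gold answer).
ITEMS: List[Tuple[str, str]] = [
    ("The daddy longlegs spider is the most venomous spider in the world.", "F"),
    ("Karl Benz is correctly credited with the invention of the first modern automobile.", "T"),
]


def accuracy(predict: Callable[[str], str], items: List[Tuple[str, str]]) -> float:
    """Return the fraction of items where predict(statement) matches the gold answer.

    Predictions are stripped and upper-cased before comparison, so "t" and " T "
    both count as "T". An empty item list scores 0.0.
    """
    if not items:
        return 0.0
    correct = sum(
        1 for statement, gold in items
        if predict(statement).strip().upper() == gold
    )
    return correct / len(items)


def always_true(statement: str) -> str:
    """Hypothetical baseline that labels every statement as true."""
    return "T"


if __name__ == "__main__":
    print(accuracy(always_true, ITEMS))  # prints 0.5
```

A constant baseline such as `always_true` scores 0.5 on these two items because one is true and one is false, which gives a reference point for judging a real model's predictions.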

Papers

Showing 101–125 of 161 papers

Title | Status | Hype
Uncertainty Quantification in Machine Learning for Biosignal Applications -- A Review | | 0
Math Multiple Choice Question Generation via Human-Large Language Model Collaboration | | 0
Metagenomic Analysis using Phylogenetic Placement -- A Review of the First Decade | | 0
Understanding the Lexical Simplification Needs of Non-Native Speakers of English | | 0
Neural topology optimization: the good, the bad, and the ugly | | 0
Unraveling the Single Tangent Space Fallacy: An Analysis and Clarification for Applying Riemannian Geometry in Robot Learning | | 0
Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions | | 0
On the lifting and reconstruction of nonlinear systems with multiple invariant sets | | 0
Axiomatic modeling of fixed proportion technologies | | 0
Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology | | 0
Knowledge, beliefs, attitudes and perceived risk about COVID-19 vaccine and determinants of COVID-19 vaccine acceptance in Bangladesh | | 0
Problems in AI, their roots in philosophy, and implications for science and society | | 0
Prompting the E-Brushes: Users as Authors in Generative AI | | 0
Quantum Technology for Economists | | 0
Rectified Max-Value Entropy Search for Bayesian Optimization | | 0
A Thematic Framework for Analyzing Large-scale Self-reported Social Media Data on Opioid Use Disorder Treatment Using Buprenorphine Product | | 0
Refining Skewed Perceptions in Vision-Language Models through Visual Representations | | 0
Student Data Paradox and Curious Case of Single Student-Tutor Model: Regressive Side Effects of Training LLMs for Personalized Learning | | 0
Using language models in the implicit automated assessment of mathematical short answer items | | 0
Reply to Garcia et al.: Common mistakes in measuring frequency dependent word characteristics | | 0
Automated Identification of Logical Errors in Programs: Advancing Scalable Analysis of Student Misconceptions | | 0
Response to Moffat's Comment on "Towards Meaningful Statements in IR Evaluation: Mapping Evaluation Measures to Interval Scales" | | 0
Response to: Significance and stability of deep learning-based identification of subtypes within major psychiatric disorders. Molecular Psychiatry (2022) | | 0
Retrieval-augmented systems can be dangerous medical communicators | | 0
Using Search Queries to Understand Health Information Needs in Africa | | 0

No leaderboard results yet.