SOTAVerified

Misconceptions

Measures whether a model can discern popular misconceptions from the truth.

Example:

        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T
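
Items in this format can be parsed and scored mechanically. The following is a minimal sketch, assuming the items are available as plain text in the input/choice/answer layout shown above. The names `Item`, `parse_items`, `accuracy`, and the always-"F" baseline are illustrative assumptions, not part of BIG-bench or of this site.

```python
from dataclasses import dataclass


@dataclass
class Item:
    statement: str  # text after "input:"
    choices: list   # labels after each "choice:"
    answer: str     # label after "answer:"


def parse_items(text):
    """Parse 'input: / choice: / answer:' lines into Item records."""
    items, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("input:"):
            current = Item(line[len("input:"):].strip(), [], "")
            items.append(current)
        elif line.startswith("choice:") and current is not None:
            current.choices.append(line[len("choice:"):].strip())
        elif line.startswith("answer:") and current is not None:
            current.answer = line[len("answer:"):].strip()
    return items


def accuracy(items, predict):
    """Fraction of items where predict(statement, choices) equals the answer."""
    if not items:
        return 0.0
    correct = sum(predict(it.statement, it.choices) == it.answer for it in items)
    return correct / len(items)


EXAMPLE = """
input: The daddy longlegs spider is the most venomous spider in the world.
choice: T
choice: F
answer: F

input: Karl Benz is correctly credited with the invention of the first modern automobile.
choice: T
choice: F
answer: T
"""

items = parse_items(EXAMPLE)


def always_false(statement, choices):
    # Hypothetical baseline that always answers "F"; a real model call would go here.
    return "F"


print(accuracy(items, always_false))
```

Because the two example items have opposite answers, any constant predictor scores 50% on them, which is the chance baseline a model would need to beat on a balanced true/false set.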

Source: BIG-bench

Papers

Showing 76–100 of 161 papers

Title | Status | Hype
Generating Plausible Distractors for Multiple-Choice Questions via Student Choice Prediction | | 0
Generative AI in Education: From Foundational Insights to the Socratic Playground for Learning | | 0
Analyzing Factors Influencing Driver Willingness to Accept Advanced Driver Assistance Systems | | 0
Toward Semi-Automatic Misconception Discovery Using Code Embeddings | | 0
LLM-based Cognitive Models of Students with Misconceptions | | 0
A Graphical Approach to State Variable Selection in Off-policy Learning | | 0
How Useful are Gradients for OOD Detection Really? | | 0
Human-centered trust framework: An HCI perspective | | 0
Humans can learn to detect AI-generated texts, or at least learn when they can't | | 0
Identifying science concepts and student misconceptions in an interactive essay writing tutor | | 0
Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank | | 0
Biometric recognition: why not massively adopted yet? | | 0
Improving Unsupervised Video Object Segmentation with Motion-Appearance Synergy | | 0
Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing | | 0
Justices for Information Bottleneck Theory | | 0
Knowledge Tracing in Programming Education Integrating Students' Questions | | 0
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts | | 0
Laplace Redux - Effortless Bayesian Deep Learning | | 0
A close-up comparison of the misclassification error distance and the adjusted Rand index for external clustering evaluation | | 0
Learnable: Theory vs Applications | | 0
A clarification of misconceptions, myths and desired status of artificial intelligence | | 0
Limitations of Deep Neural Networks: a discussion of G. Marcus' critical appraisal of deep learning | | 0
Listening to Patients: A Framework of Detecting and Mitigating Patient Misreport for Medical Dialogue Generation | | 0
LLM Library Learning Fails: A LEGO-Prover Case Study | | 0
Machine Learning Students Overfit to Overfitting | | 0

No leaderboard results yet.