SOTAVerified

Misconceptions

Measures whether a model can discern popular misconceptions from the truth.

Example:

        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T
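As a rough illustration of how items in this format could be scored, the sketch below parses the two examples above and computes accuracy for a placeholder predictor. The names `parse_items`, `accuracy`, and `always_false` are hypothetical and are not part of BIG-bench; a real evaluation would replace `always_false` with a call to the model under test.

```python
from typing import Callable, Dict, List

# The two example items shown above, copied verbatim.
EXAMPLE_TEXT = """\
input: The daddy longlegs spider is the most venomous spider in the world.
choice: T
choice: F
answer: F

input: Karl Benz is correctly credited with the invention of the first modern automobile.
choice: T
choice: F
answer: T
"""


def parse_items(text: str) -> List[Dict]:
    """Parse 'input / choice / answer' blocks into a list of dicts (hypothetical helper)."""
    items: List[Dict] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first colon only, so colons inside the statement survive.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "input":
            current = {"input": value, "choices": [], "answer": None}
            items.append(current)
        elif current is None:
            continue  # ignore stray lines before the first item
        elif key == "choice":
            current["choices"].append(value)
        elif key == "answer":
            current["answer"] = value
    return items


def accuracy(items: List[Dict], predict: Callable[[str, List[str]], str]) -> float:
    """Fraction of items where the predicted choice equals the gold answer."""
    if not items:
        return 0.0
    correct = sum(
        1 for item in items if predict(item["input"], item["choices"]) == item["answer"]
    )
    return correct / len(items)


def always_false(statement: str, choices: List[str]) -> str:
    """Placeholder predictor (an assumption for this sketch): labels every statement 'F'."""
    return "F"


items = parse_items(EXAMPLE_TEXT)
score = accuracy(items, always_false)
print(f"{len(items)} items, accuracy = {score:.2f}")
```

Because one of the two examples is true and the other false, a constant predictor gets exactly half of them right, which is why a balanced mix of true and false statements is useful for this kind of task.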

Source: BIG-bench

Papers

Showing 51–100 of 161 papers

Title | Status | Hype
Emergent Abilities in Large Language Models: A Survey | | 0
Enforcing Interpretability and its Statistical Impacts: Trade-offs between Accuracy and Interpretability | | 0
Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias | | 0
Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology | | 0
Knowledge, beliefs, attitudes and perceived risk about COVID-19 vaccine and determinants of COVID-19 vaccine acceptance in Bangladesh | | 0
Fine-tuning Language Models for Factuality | | 0
Finnish 5th and 6th graders' misconceptions about Artificial Intelligence | | 0
Formalising Anti-Discrimination Law in Automated Decision Systems | | 0
Foundation Models in Computational Pathology: A Review of Challenges, Opportunities, and Impact | | 0
On Proximity and Structural Role-based Embeddings in Networks: Misconceptions, Techniques, and Applications | | 0
From Intuition to Understanding: Using AI Peers to Overcome Physics Misconceptions | | 0
From Random to Regular: Variation in the Patterning of Retinal Mosaics | | 0
A close-up comparison of the misclassification error distance and the adjusted Rand index for external clustering evaluation | | 0
Generating Plausible Distractors for Multiple-Choice Questions via Student Choice Prediction | | 0
Generative AI in Education: From Foundational Insights to the Socratic Playground for Learning | | 0
Axiomatic modeling of fixed proportion technologies | | 0
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts | | 0
Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing | | 0
How Useful are Gradients for OOD Detection Really? | | 0
Human-centered trust framework: An HCI perspective | | 0
Humans can learn to detect AI-generated texts, or at least learn when they can't | | 0
Identifying science concepts and student misconceptions in an interactive essay writing tutor | | 0
Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank | | 0
Improving Unsupervised Video Object Segmentation with Motion-Appearance Synergy | | 0
Justices for Information Bottleneck Theory | | 0
Knowledge Tracing in Programming Education Integrating Students' Questions | | 0
Laplace Redux - Effortless Bayesian Deep Learning | | 0
Biometric recognition: why not massively adopted yet? | | 0
Learnable: Theory vs Applications | | 0
Breaking Boundaries: A Chronology with Future Directions of Women in Exercise Physiology Research, Centred on Pregnancy | | 0
Limitations of Deep Neural Networks: a discussion of G. Marcus' critical appraisal of deep learning | | 0
Listening to Patients: A Framework of Detecting and Mitigating Patient Misreport for Medical Dialogue Generation | | 0
LLM Library Learning Fails: A LEGO-Prover Case Study | | 0
Machine Learning Students Overfit to Overfitting | | 0
Can a Hallucinating Model help in Reducing Human "Hallucination"? | | 0
Math Multiple Choice Question Generation via Human-Large Language Model Collaboration | | 0
Metagenomic Analysis using Phylogenetic Placement -- A Review of the First Decade | | 0
A Graphical Approach to State Variable Selection in Off-policy Learning | | 0
Neural topology optimization: the good, the bad, and the ugly | | 0
Challenges and Trends in User Trust Discourse in AI | | 0
Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions | | 0
On the lifting and reconstruction of nonlinear systems with multiple invariant sets | | 0
Characterizing Information Seeking Events in Health-Related Social Discourse | | 0
Problems in AI, their roots in philosophy, and implications for science and society | | 0
Prompting the E-Brushes: Users as Authors in Generative AI | | 0
Quantum Technology for Economists | | 0
Rectified Max-Value Entropy Search for Bayesian Optimization | | 0
Refining Skewed Perceptions in Vision-Language Models through Visual Representations | | 0
Student Data Paradox and Curious Case of Single Student-Tutor Model: Regressive Side Effects of Training LLMs for Personalized Learning | | 0
Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation | | 0

No leaderboard results yet.