SOTAVerified

Misconceptions

Measures whether a model can discern popular misconceptions from the truth.

Example:

        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T

Source: BIG-bench
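The two-choice format shown above can be sketched as a simple exact-match evaluation. This is a minimal illustration only: the record layout and the `predict` callback are assumptions for the sketch, not the BIG-bench API.

```python
# Minimal sketch (assumed format) of scoring a model on the two-choice
# Misconceptions examples above. `predict` is a hypothetical stand-in
# for a real model call that returns one of the offered choices.

EXAMPLES = [
    {"input": "The daddy longlegs spider is the most venomous spider in the world.",
     "choices": ["T", "F"], "answer": "F"},
    {"input": "Karl Benz is correctly credited with the invention of the first modern automobile.",
     "choices": ["T", "F"], "answer": "T"},
]

def accuracy(predict, examples):
    """Fraction of examples where the model's chosen option matches the answer."""
    correct = sum(predict(ex["input"], ex["choices"]) == ex["answer"]
                  for ex in examples)
    return correct / len(examples)

# A trivial baseline that always answers "T" gets 1 of 2 on these examples.
print(accuracy(lambda text, choices: "T", EXAMPLES))  # 0.5
```

Because the answers here are balanced between T and F, any constant-answer baseline scores 0.5, which is why the task measures discernment rather than guessing.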

Papers

Showing 51–100 of 161 papers

Title | Status | Hype
WatChat: Explaining perplexing programs by debugging mental models | Code | 0
When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research | Code | 0
When big data actually are low-rank, or entrywise approximation of certain function-generated matrices | Code | 0
Zero Shot Learning for Code Education: Rubric Sampling with Deep Learning Inference | Code | 0
The Oversmoothing Fallacy: A Misguided Narrative in GNN Research | | 0
Dynamics and triggers of misinformation on vaccines | | 0
An Initial Introduction to Cooperative Multi-Agent Reinforcement Learning | | 0
Emergent Abilities in Large Language Models: A Survey | | 0
Characterizing Information Seeking Events in Health-Related Social Discourse | | 0
The kernel perspective on dynamic mode decomposition | | 0
Enforcing Interpretability and its Statistical Impacts: Trade-offs between Accuracy and Interpretability | | 0
Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias | | 0
Challenges and Trends in User Trust Discourse in AI | | 0
The Singularity Controversy, Part I: Lessons Learned and Open Questions: Conclusions from the Battle on the Legitimacy of the Debate | | 0
Toward In-Context Teaching: Adapting Examples to Students' Misconceptions | | 0
Can a Hallucinating Model help in Reducing Human "Hallucination"? | | 0
Breaking Boundaries: A Chronology with Future Directions of Women in Exercise Physiology Research, Centred on Pregnancy | | 0
Fine-tuning Language Models for Factuality | | 0
Finnish 5th and 6th graders' misconceptions about Artificial Intelligence | | 0
Formalising Anti-Discrimination Law in Automated Decision Systems | | 0
Foundation Models in Computational Pathology: A Review of Challenges, Opportunities, and Impact | | 0
On Proximity and Structural Role-based Embeddings in Networks: Misconceptions, Techniques, and Applications | | 0
From Intuition to Understanding: Using AI Peers to Overcome Physics Misconceptions | | 0
From Random to Regular: Variation in the Patterning of Retinal Mosaics | | 0
Towards a Rigorous Analysis of Mutual Information in Contrastive Learning | | 0
Generating Plausible Distractors for Multiple-Choice Questions via Student Choice Prediction | | 0
Generative AI in Education: From Foundational Insights to the Socratic Playground for Learning | | 0
Analyzing Factors Influencing Driver Willingness to Accept Advanced Driver Assistance Systems | | 0
Toward Semi-Automatic Misconception Discovery Using Code Embeddings | | 0
LLM-based Cognitive Models of Students with Misconceptions | | 0
A Graphical Approach to State Variable Selection in Off-policy Learning | | 0
How Useful are Gradients for OOD Detection Really? | | 0
Human-centered trust framework: An HCI perspective | | 0
Humans can learn to detect AI-generated texts, or at least learn when they can't | | 0
Identifying science concepts and student misconceptions in an interactive essay writing tutor | | 0
Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank | | 0
Biometric recognition: why not massively adopted yet? | | 0
Improving Unsupervised Video Object Segmentation with Motion-Appearance Synergy | | 0
Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing | | 0
Justices for Information Bottleneck Theory | | 0
Knowledge Tracing in Programming Education Integrating Students' Questions | | 0
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts | | 0
Laplace Redux - Effortless Bayesian Deep Learning | | 0
A close-up comparison of the misclassification error distance and the adjusted Rand index for external clustering evaluation | | 0
Learnable: Theory vs Applications | | 0
A clarification of misconceptions, myths and desired status of artificial intelligence | | 0
Limitations of Deep Neural Networks: a discussion of G. Marcus' critical appraisal of deep learning | | 0
Listening to Patients: A Framework of Detecting and Mitigating Patient Misreport for Medical Dialogue Generation | | 0
LLM Library Learning Fails: A LEGO-Prover Case Study | | 0
Machine Learning Students Overfit to Overfitting | | 0
Page 2 of 4

No leaderboard results yet.