SOTAVerified

Misconceptions

This task measures whether a model can distinguish popular misconceptions from true statements.

Example:

        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T

Source: BIG-bench
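
The example format above (an `input:` statement, `choice:` lines, and an `answer:` line) can be read and scored with a few lines of code. The following is a minimal sketch, not the BIG-bench implementation: the names `Item`, `parse_items`, `accuracy`, and `always_false` are hypothetical, and plain accuracy is assumed as the metric for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Item:
    statement: str      # text after "input:"
    choices: list[str]  # values after each "choice:"
    answer: str         # value after "answer:"


def parse_items(text: str) -> list[Item]:
    """Parse blocks of 'input:', 'choice:', 'answer:' lines into Items."""
    items: list[Item] = []
    statement, choices = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("input:"):
            statement, choices = line[len("input:"):].strip(), []
        elif line.startswith("choice:"):
            choices.append(line[len("choice:"):].strip())
        elif line.startswith("answer:") and statement is not None:
            items.append(Item(statement, choices, line[len("answer:"):].strip()))
            statement, choices = None, []
    return items


def accuracy(items: list[Item], predict: Callable[[Item], str]) -> float:
    """Fraction of items where predict(item) equals the labeled answer."""
    if not items:
        return 0.0
    return sum(predict(item) == item.answer for item in items) / len(items)


EXAMPLE = """
input: The daddy longlegs spider is the most venomous spider in the world.
choice: T
choice: F
answer: F

input: Karl Benz is correctly credited with the invention of the first modern automobile.
choice: T
choice: F
answer: T
"""


def always_false(item: Item) -> str:
    # Trivial baseline (an assumption for illustration): label everything "F".
    return "F"


items = parse_items(EXAMPLE)
print(accuracy(items, always_false))  # 0.5 on the two examples above
```

A real evaluation would replace `always_false` with a function that queries a model and maps its output to one of the listed choices.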

Papers

Showing 51–100 of 161 papers

Title | Status | Hype
A Closer Look at Classification Evaluation Metrics and a Critical Reflection of Common Evaluation Practice | Code | 0
Student Data Paradox and Curious Case of Single Student-Tutor Model: Regressive Side Effects of Training LLMs for Personalized Learning | - | 0
Differential contributions of machine learning and statistical analysis to language and cognitive sciences | - | 0
Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank | - | 0
Axiomatic modeling of fixed proportion technologies | - | 0
Breaking Boundaries: A Chronology with Future Directions of Women in Exercise Physiology Research, Centred on Pregnancy | - | 0
SoK: On Gradient Leakage in Federated Learning | - | 0
Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Code | 0
Prompting the E-Brushes: Users as Authors in Generative AI | - | 0
Noise-powered Multi-modal Knowledge Graph Representation Framework | Code | 1
The pitfalls of next-token prediction | Code | 2
WatChat: Explaining perplexing programs by debugging mental models | Code | 0
Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Code | 1
The Essential Role of Causality in Foundation World Models for Embodied AI | - | 0
Clarify: Improving Model Robustness With Natural Language Corrections | Code | 0
Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias | - | 0
Distortions in Judged Spatial Relations in Large Language Models | - | 0
Finnish 5th and 6th graders' misconceptions about Artificial Intelligence | - | 0
Uncertainty Quantification in Machine Learning for Biosignal Applications -- A Review | - | 0
Fine-tuning Language Models for Factuality | - | 0
Large Language Models for In-Context Student Modeling: Synthesizing Student's Behavior in Visual Programming | Code | 0
Unraveling the Single Tangent Space Fallacy: An Analysis and Clarification for Applying Riemannian Geometry in Robot Learning | - | 0
Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions | Code | 0
Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions | - | 0
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning | Code | 0
Towards a Rigorous Analysis of Mutual Information in Contrastive Learning | - | 0
Using language models in the implicit automated assessment of mathematical short answer items | - | 0
Characterizing Information Seeking Events in Health-Related Social Discourse | - | 0
Scalable and Equitable Math Problem Solving Strategy Prediction in Big Educational Data | Code | 0
Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning | Code | 0
Parting with Misconceptions about Learning-based Vehicle Motion Planning | Code | 2
Reliability Check: An Analysis of GPT-3's Response to Sensitive Topics and Prompt Wording | Code | 0
Dear XAI Community, We Need to Talk! Fundamental Misconceptions in Current XAI Research | - | 0
Justices for Information Bottleneck Theory | - | 0
Clarifying System 1 & 2 through the Common Model of Cognition | - | 0
Disproving XAI Myths with Formal Methods -- Initial Results | - | 0
Challenges and Trends in User Trust Discourse in AI | - | 0
Human-centered trust framework: An HCI perspective | - | 0
On the lifting and reconstruction of nonlinear systems with multiple invariant sets | - | 0
Demystifying Misconceptions in Social Bots Research | - | 0
Towards Democratizing Joint-Embedding Self-Supervised Learning | Code | 2
Succinct Representations for Concepts | - | 0
Response to Moffat's Comment on "Towards Meaningful Statements in IR Evaluation: Mapping Evaluation Measures to Interval Scales" | - | 0
Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics | - | 0
Improving Unsupervised Video Object Segmentation with Motion-Appearance Synergy | - | 0
The Monitor Model and its Misconceptions: A Clarification | - | 0
Inability of a graph neural network heuristic to outperform greedy algorithms in solving combinatorial optimization problems like Max-Cut | Code | 1
Machine Learning Students Overfit to Overfitting | - | 0
Dynamics and triggers of misinformation on vaccines | - | 0
Response to: Significance and stability of deep learning-based identification of subtypes within major psychiatric disorders. Molecular Psychiatry (2022) | - | 0

No leaderboard results yet.