
Misconceptions

Measures whether a model can discern popular misconceptions from the truth.

Example:

        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T

Source: BIG-bench
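To make the true/false format above concrete, here is a minimal Python sketch of how items like these could be represented and scored by accuracy. It assumes each item is a statement, the choices `T`/`F`, and one gold answer. The names `Example`, `accuracy`, `choose_by_score`, and `always_true` are hypothetical and are not part of BIG-bench's API. The `choose_by_score` wrapper assumes a model exposes some per-choice score (for example a log-likelihood), so it is an illustrative assumption rather than the benchmark's official scoring procedure.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    """One true/false item: a statement, the allowed choices, and the gold answer."""
    statement: str
    choices: List[str]
    answer: str


# The two items shown above, transcribed into the hypothetical Example type.
EXAMPLES = [
    Example("The daddy longlegs spider is the most venomous spider in the world.",
            ["T", "F"], "F"),
    Example("Karl Benz is correctly credited with the invention of the first modern automobile.",
            ["T", "F"], "T"),
]

Predictor = Callable[[str, List[str]], str]


def accuracy(examples: List[Example], predict: Predictor) -> float:
    """Fraction of examples where predict(statement, choices) equals the gold answer."""
    if not examples:
        raise ValueError("no examples to score")
    correct = 0
    for ex in examples:
        pred = predict(ex.statement, ex.choices)
        if pred not in ex.choices:
            raise ValueError(f"prediction {pred!r} is not one of {ex.choices}")
        if pred == ex.answer:
            correct += 1
    return correct / len(examples)


def choose_by_score(score: Callable[[str, str], float]) -> Predictor:
    """Turn a per-choice scoring function (hypothetical, e.g. a model's
    log-likelihood of the choice given the statement) into a predictor that
    returns the highest-scoring choice; ties go to the first choice listed."""
    def predict(statement: str, choices: List[str]) -> str:
        return max(choices, key=lambda c: score(statement, c))
    return predict


def always_true(statement: str, choices: List[str]) -> str:
    """Trivial baseline that answers "T" for every statement."""
    return "T"


if __name__ == "__main__":
    print(accuracy(EXAMPLES, always_true))  # 0.5: right on the second item only
```

Because the answer key mixes `T` and `F`, a constant baseline such as `always_true` scores only 0.5 on these two items, which is why accuracy on a balanced set says something about whether a model actually separates misconceptions from true statements.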

Papers

Showing 51–75 of 161 papers

Title | Status | Hype
A Closer Look at Classification Evaluation Metrics and a Critical Reflection of Common Evaluation Practice | Code | 0
Student Data Paradox and Curious Case of Single Student-Tutor Model: Regressive Side Effects of Training LLMs for Personalized Learning | | 0
Differential contributions of machine learning and statistical analysis to language and cognitive sciences | | 0
Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank | | 0
Axiomatic modeling of fixed proportion technologies | | 0
Breaking Boundaries: A Chronology with Future Directions of Women in Exercise Physiology Research, Centred on Pregnancy | | 0
SoK: On Gradient Leakage in Federated Learning | | 0
Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Code | 0
Prompting the E-Brushes: Users as Authors in Generative AI | | 0
Noise-powered Multi-modal Knowledge Graph Representation Framework | Code | 1
The pitfalls of next-token prediction | Code | 2
WatChat: Explaining perplexing programs by debugging mental models | Code | 0
Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Code | 1
The Essential Role of Causality in Foundation World Models for Embodied AI | | 0
Clarify: Improving Model Robustness With Natural Language Corrections | Code | 0
Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias | | 0
Distortions in Judged Spatial Relations in Large Language Models | | 0
Finnish 5th and 6th graders' misconceptions about Artificial Intelligence | | 0
Uncertainty Quantification in Machine Learning for Biosignal Applications -- A Review | | 0
Fine-tuning Language Models for Factuality | | 0
Large Language Models for In-Context Student Modeling: Synthesizing Student's Behavior in Visual Programming | Code | 0
Unraveling the Single Tangent Space Fallacy: An Analysis and Clarification for Applying Riemannian Geometry in Robot Learning | | 0
Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions | | 0
Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions | Code | 0
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning | Code | 0

No leaderboard results yet.