SOTAVerified

Misconceptions

Measures whether a model can discern popular misconceptions from the truth.

Example:

        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T

Source: BIG-bench
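Items like the two above can be scored with a simple exact-match loop. The sketch below is illustrative only: the field names (`input`, `choices`, `answer`) mirror the example above and are not the exact BIG-bench task schema, and `always_agree` is a hypothetical baseline, not a real model.

```python
# Minimal sketch of scoring binary-choice misconception items.
# Field names mirror the example above; they are assumptions, not
# the exact BIG-bench JSON schema.

items = [
    {
        "input": "The daddy longlegs spider is the most venomous spider in the world.",
        "choices": ["T", "F"],
        "answer": "F",
    },
    {
        "input": "Karl Benz is correctly credited with the invention of the first modern automobile.",
        "choices": ["T", "F"],
        "answer": "T",
    },
]

def accuracy(predict, items):
    """Fraction of items where the predicted label matches the answer."""
    correct = sum(1 for item in items if predict(item) == item["answer"])
    return correct / len(items)

# A trivial baseline that agrees with every claim: on items built from
# popular misconceptions it should hover near chance.
def always_agree(item):
    return "T"

print(accuracy(always_agree, items))  # 0.5 on this two-item sample
```

A real evaluation would replace `always_agree` with a call to the model under test, e.g. comparing the model's likelihood of "T" versus "F" given the input claim.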

Papers

Showing 1–50 of 161 papers (page 1 of 4)

Title | Status | Hype
Training Compute-Optimal Large Language Models | Code | 6
Factuality Enhanced Language Models for Open-Ended Text Generation | Code | 5
Towards Democratizing Joint-Embedding Self-Supervised Learning | Code | 2
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2
Parting with Misconceptions about Learning-based Vehicle Motion Planning | Code | 2
The pitfalls of next-token prediction | Code | 2
Unveiling Contrastive Learning's Capability of Neighborhood Aggregation for Collaborative Filtering | Code | 1
A Tutorial on VAEs: From Bayes' Rule to Lossless Compression | Code | 1
Re-Examining Linear Embeddings for High-Dimensional Bayesian Optimization | Code | 1
Emergent Communication under Competition | Code | 1
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines | Code | 1
Back to the Drawing Board: A Critical Evaluation of Poisoning Attacks on Production Federated Learning | Code | 1
Noise-powered Multi-modal Knowledge Graph Representation Framework | Code | 1
Laplace Redux – Effortless Bayesian Deep Learning | Code | 1
TruthfulQA: Measuring How Models Mimic Human Falsehoods | Code | 1
Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Code | 1
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement | Code | 1
Inability of a graph neural network heuristic to outperform greedy algorithms in solving combinatorial optimization problems like Max-Cut | Code | 1
Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs | Code | 1
Towards Process-Oriented, Modular, and Versatile Question Generation that Meets Educational Needs | Code | 1
A Variational Inequality Perspective on Generative Adversarial Networks | Code | 0
Clarify: Improving Model Robustness With Natural Language Corrections | Code | 0
Student Answer Forecasting: Transformer-Driven Answer Choice Prediction for Language Learning | Code | 0
Resolving conceptual issues in Modern Coexistence Theory | Code | 0
Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions | Code | 0
Design Challenges and Misconceptions in Neural Sequence Labeling | Code | 0
Scalable and Equitable Math Problem Solving Strategy Prediction in Big Educational Data | Code | 0
Pay attention to your loss: understanding misconceptions about 1-Lipschitz neural networks | Code | 0
Not All Claims are Created Equal: Choosing the Right Statistical Approach to Assess Hypotheses | Code | 0
Large Language Models for In-Context Student Modeling: Synthesizing Student's Behavior in Visual Programming | Code | 0
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor | Code | 0
How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions | Code | 0
Harnessing Structured Knowledge: A Concept Map-Based Approach for High-Quality Multiple Choice Question Generation with Effective Distractors | Code | 0
A Structured Unplugged Approach for Foundational AI Literacy in Primary Education | Code | 0
Hindsight and Sequential Rationality of Correlated Play | Code | 0
Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Code | 0
Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning | Code | 0
End-to-End Annotator Bias Approximation on Crowdsourced Single-Label Sentiment Analysis | Code | 0
Community detection in networks: A user guide | Code | 0
A Closer Look at Classification Evaluation Metrics and a Critical Reflection of Common Evaluation Practice | Code | 0
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning | Code | 0
How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation | Code | 0
A Weakly-Supervised Iterative Graph-Based Approach to Retrieve COVID-19 Misinformation Topics | Code | 0
Collecting the Public Perception of AI and Robot Rights | Code | 0
MalAlgoQA: Pedagogical Evaluation of Counterfactual Reasoning in Large Language Models and Implications for AI in Education | Code | 0
Deep Curvature Suite | Code | 0
DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Code | 0
From Solution Synthesis to Student Attempt Synthesis for Block-Based Visual Programming Tasks | Code | 0
Reliability Check: An Analysis of GPT-3's Response to Sensitive Topics and Prompt Wording | Code | 0
Paths and Ambient Spaces in Neural Loss Landscapes | Code | 0

No leaderboard results yet.