SOTAVerified

Misconceptions

Measures whether a model can discern popular misconceptions from the truth.

Example:

        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T

Source: BIG-bench
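The true/false layout shown in the example can be scored with plain accuracy. The following is a minimal sketch, assuming items are separated by blank lines and use the `input:` / `choice:` / `answer:` prefixes exactly as shown above. The helper names `parse_items` and `accuracy` are hypothetical, and this is not BIG-bench's own loader or metric code.

```python
# Sketch only: parses the plain-text example layout shown on this page.
# parse_items and accuracy are hypothetical helpers, not part of BIG-bench.

EXAMPLE = """\
input: The daddy longlegs spider is the most venomous spider in the world.
choice: T
choice: F
answer: F

input: Karl Benz is correctly credited with the invention of the first modern automobile.
choice: T
choice: F
answer: T
"""


def parse_items(text):
    """Split blank-line-separated records into dicts with input, choices, answer."""
    items = []
    for block in text.strip().split("\n\n"):
        item = {"input": "", "choices": [], "answer": ""}
        for line in block.splitlines():
            # Split on the first ": " so colons inside the statement are preserved.
            key, _, value = line.strip().partition(": ")
            if key == "choice":
                item["choices"].append(value)
            elif key in ("input", "answer"):
                item[key] = value
        items.append(item)
    return items


def accuracy(items, predictions):
    """Fraction of predictions that exactly match the labelled answer."""
    if not items or len(items) != len(predictions):
        raise ValueError("need one prediction per item and at least one item")
    correct = sum(1 for item, pred in zip(items, predictions) if pred == item["answer"])
    return correct / len(items)


if __name__ == "__main__":
    items = parse_items(EXAMPLE)
    # A trivial baseline that always answers "F" gets one of the two examples right.
    print(accuracy(items, ["F"] * len(items)))
```

On a two-choice task like this, a constant-answer baseline is a useful reference point when reading model scores.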

Papers

Showing 1-50 of 161 papers

| Title | Status | Hype |
| --- | --- | --- |
| Training Compute-Optimal Large Language Models | Code | 6 |
| Factuality Enhanced Language Models for Open-Ended Text Generation | Code | 5 |
| The pitfalls of next-token prediction | Code | 2 |
| Parting with Misconceptions about Learning-based Vehicle Motion Planning | Code | 2 |
| Towards Democratizing Joint-Embedding Self-Supervised Learning | Code | 2 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2 |
| Unveiling Contrastive Learning's Capability of Neighborhood Aggregation for Collaborative Filtering | Code | 1 |
| Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs | Code | 1 |
| Enhancing Knowledge Tracing with Concept Map and Response Disentanglement | Code | 1 |
| Noise-powered Multi-modal Knowledge Graph Representation Framework | Code | 1 |
| Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Code | 1 |
| Inability of a graph neural network heuristic to outperform greedy algorithms in solving combinatorial optimization problems like Max-Cut | Code | 1 |
| Towards Process-Oriented, Modular, and Versatile Question Generation that Meets Educational Needs | Code | 1 |
| TruthfulQA: Measuring How Models Mimic Human Falsehoods | Code | 1 |
| Back to the Drawing Board: A Critical Evaluation of Poisoning Attacks on Production Federated Learning | Code | 1 |
| Laplace Redux -- Effortless Bayesian Deep Learning | Code | 1 |
| Emergent Communication under Competition | Code | 1 |
| A Tutorial on VAEs: From Bayes' Rule to Lossless Compression | Code | 1 |
| On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines | Code | 1 |
| Re-Examining Linear Embeddings for High-Dimensional Bayesian Optimization | Code | 1 |
| The Oversmoothing Fallacy: A Misguided Narrative in GNN Research |  | 0 |
| A Structured Unplugged Approach for Foundational AI Literacy in Primary Education | Code | 0 |
| When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research | Code | 0 |
| Automated Identification of Logical Errors in Programs: Advancing Scalable Analysis of Student Misconceptions |  | 0 |
| Humans can learn to detect AI-generated texts, or at least learn when they can't |  | 0 |
| Harnessing Structured Knowledge: A Concept Map-Based Approach for High-Quality Multiple Choice Question Generation with Effective Distractors | Code | 0 |
| LLM Library Learning Fails: A LEGO-Prover Case Study |  | 0 |
| What is AI, what is it not, how we use it in physics and how it impacts... you |  | 0 |
| From Intuition to Understanding: Using AI Peers to Overcome Physics Misconceptions |  | 0 |
| Clarifying Misconceptions in COVID-19 Vaccine Sentiment and Stance Analysis and Their Implications for Vaccine Hesitancy Mitigation: A Systematic Review |  | 0 |
| How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation | Code | 0 |
| Paths and Ambient Spaces in Neural Loss Landscapes | Code | 0 |
| Emergent Abilities in Large Language Models: A Survey |  | 0 |
| Analyzing Factors Influencing Driver Willingness to Accept Advanced Driver Assistance Systems |  | 0 |
| The Imitation Game for Educational AI |  | 0 |
| Retrieval-augmented systems can be dangerous medical communicators |  | 0 |
| Foundation Models in Computational Pathology: A Review of Challenges, Opportunities, and Impact |  | 0 |
| Knowledge Tracing in Programming Education Integrating Students' Questions |  | 0 |
| Generating Plausible Distractors for Multiple-Choice Questions via Student Choice Prediction |  | 0 |
| Generative AI in Education: From Foundational Insights to the Socratic Playground for Learning |  | 0 |
| Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension |  | 0 |
| A Graphical Approach to State Variable Selection in Off-policy Learning |  | 0 |
| Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation |  | 0 |
| Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor | Code | 0 |
| Developer Perspectives on Licensing and Copyright Issues Arising from Generative AI for Software Development |  | 0 |
| Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology |  | 0 |
| A Study on Characterization of Near-Field Sub-Regions For Phased-Array Antennas |  | 0 |
| LLM-based Cognitive Models of Students with Misconceptions |  | 0 |
| The Future of Learning in the Age of Generative AI: Automated Question Generation and Assessment with Large Language Models |  | 0 |
| Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts |  | 0 |

No leaderboard results yet.