
Misconceptions

Measures whether a model can discern popular misconceptions from the truth.

Example:

        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T

Source: BIG-bench
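The T/F format above can be scored as a two-way multiple-choice task. The following is a minimal Python sketch built on two assumptions. First, examples are stored the way BIG-bench JSON tasks store them: an `input` string plus a `target_scores` map that gives each choice 1 if it is correct and 0 otherwise. Second, grading follows the spirit of BIG-bench's `multiple_choice_grade` metric. The model scores each choice, the top-scoring choice is taken as its answer, and the grade is the mean target score of those picks. The `Scorer` interface and the `always_false` baseline are hypothetical stand-ins for a real language model's per-choice log-likelihoods.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory form of the two examples above, loosely modeled on
# BIG-bench's JSON task format: each choice maps to a target score
# (1 for the correct choice, 0 otherwise). Field names are an assumption.
EXAMPLES: List[Dict] = [
    {
        "input": "The daddy longlegs spider is the most venomous spider in the world.",
        "target_scores": {"T": 0, "F": 1},
    },
    {
        "input": "Karl Benz is correctly credited with the invention of the first modern automobile.",
        "target_scores": {"T": 1, "F": 0},
    },
]

# Hypothetical scorer interface: returns a log-likelihood-like score
# (higher means more likely) for a choice given the input statement.
Scorer = Callable[[str, str], float]


def multiple_choice_grade(examples: List[Dict], score: Scorer) -> float:
    """Mean target score of the highest-scoring choice for each example."""
    total = 0.0
    for ex in examples:
        choices = list(ex["target_scores"])
        # Take the choice the model rates as most likely as its answer.
        best = max(choices, key=lambda c: score(ex["input"], c))
        total += ex["target_scores"][best]
    return total / len(examples)


def always_false(statement: str, choice: str) -> float:
    # Toy baseline (hypothetical): prefers "F" regardless of the statement.
    return 0.0 if choice == "F" else -1.0


if __name__ == "__main__":
    print(multiple_choice_grade(EXAMPLES, always_false))  # 0.5
```

Because the only choices are T and F, a constant-answer baseline scores at the class balance of the examples. Here that is 0.5 on these two examples, which gives a floor to compare reported scores against.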

Papers

Showing 1–25 of 161 papers

Title | Status | Hype
Training Compute-Optimal Large Language Models | Code | 6
Factuality Enhanced Language Models for Open-Ended Text Generation | Code | 5
Parting with Misconceptions about Learning-based Vehicle Motion Planning | Code | 2
Towards Democratizing Joint-Embedding Self-Supervised Learning | Code | 2
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2
The pitfalls of next-token prediction | Code | 2
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement | Code | 1
TruthfulQA: Measuring How Models Mimic Human Falsehoods | Code | 1
Unveiling Contrastive Learning's Capability of Neighborhood Aggregation for Collaborative Filtering | Code | 1
Emergent Communication under Competition | Code | 1
Noise-powered Multi-modal Knowledge Graph Representation Framework | Code | 1
A Tutorial on VAEs: From Bayes' Rule to Lossless Compression | Code | 1
Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs | Code | 1
Towards Process-Oriented, Modular, and Versatile Question Generation that Meets Educational Needs | Code | 1
Back to the Drawing Board: A Critical Evaluation of Poisoning Attacks on Production Federated Learning | Code | 1
Inability of a graph neural network heuristic to outperform greedy algorithms in solving combinatorial optimization problems like Max-Cut | Code | 1
Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Code | 1
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines | Code | 1
Laplace Redux -- Effortless Bayesian Deep Learning | Code | 1
Re-Examining Linear Embeddings for High-Dimensional Bayesian Optimization | Code | 1
A Variational Inequality Perspective on Generative Adversarial Networks | Code | 0
Harnessing Structured Knowledge: A Concept Map-Based Approach for High-Quality Multiple Choice Question Generation with Effective Distractors | Code | 0
Hindsight and Sequential Rationality of Correlated Play | Code | 0
Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions | Code | 0
From Solution Synthesis to Student Attempt Synthesis for Block-Based Visual Programming Tasks | Code | 0

No leaderboard results yet.