SOTAVerified

Misconceptions

Measures whether a model can distinguish popular misconceptions from the truth.

Example:

        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T

Source: BIG-bench
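Items in this format can be scored with a simple exact-match loop over the labeled choices. The sketch below uses the two example items from above; the `always_true` baseline is a hypothetical stand-in for a real model call, not part of the benchmark.

```python
# Minimal sketch of scoring a model on Misconceptions-style T/F items.
# The two items are the examples shown above; `model` is any callable
# that maps (statement, choices) -> chosen label.

ITEMS = [
    {
        "input": "The daddy longlegs spider is the most venomous spider in the world.",
        "choices": ["T", "F"],
        "answer": "F",
    },
    {
        "input": "Karl Benz is correctly credited with the invention of the first modern automobile.",
        "choices": ["T", "F"],
        "answer": "T",
    },
]

def accuracy(model, items):
    """Fraction of items where the model's chosen label matches the gold answer."""
    correct = sum(model(it["input"], it["choices"]) == it["answer"] for it in items)
    return correct / len(items)

# Hypothetical baseline that calls every statement true; it gets
# exactly the one truly-true example right, so accuracy is 0.5.
always_true = lambda statement, choices: "T"
print(accuracy(always_true, ITEMS))  # → 0.5
```

Because every item shares the same two choices, a constant baseline scores whatever fraction of items carries that label, which is the floor a real model has to beat.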

Papers

Showing 1–25 of 161 papers

Title | Status | Hype
Training Compute-Optimal Large Language Models | Code | 6
Factuality Enhanced Language Models for Open-Ended Text Generation | Code | 5
The pitfalls of next-token prediction | Code | 2
Parting with Misconceptions about Learning-based Vehicle Motion Planning | Code | 2
Towards Democratizing Joint-Embedding Self-Supervised Learning | Code | 2
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2
Unveiling Contrastive Learning's Capability of Neighborhood Aggregation for Collaborative Filtering | Code | 1
Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs | Code | 1
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement | Code | 1
Noise-powered Multi-modal Knowledge Graph Representation Framework | Code | 1
Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Code | 1
Inability of a graph neural network heuristic to outperform greedy algorithms in solving combinatorial optimization problems like Max-Cut | Code | 1
Towards Process-Oriented, Modular, and Versatile Question Generation that Meets Educational Needs | Code | 1
TruthfulQA: Measuring How Models Mimic Human Falsehoods | Code | 1
Back to the Drawing Board: A Critical Evaluation of Poisoning Attacks on Production Federated Learning | Code | 1
Laplace Redux -- Effortless Bayesian Deep Learning | Code | 1
Emergent Communication under Competition | Code | 1
A Tutorial on VAEs: From Bayes' Rule to Lossless Compression | Code | 1
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines | Code | 1
Re-Examining Linear Embeddings for High-Dimensional Bayesian Optimization | Code | 1
The Oversmoothing Fallacy: A Misguided Narrative in GNN Research | — | 0
A Structured Unplugged Approach for Foundational AI Literacy in Primary Education | Code | 0
When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research | Code | 0
Automated Identification of Logical Errors in Programs: Advancing Scalable Analysis of Student Misconceptions | — | 0
Humans can learn to detect AI-generated texts, or at least learn when they can't | — | 0
Page 1 of 7

No leaderboard results yet.