SOTAVerified

Misconceptions

Measures whether a model can discern popular misconceptions from the truth.

Example:

        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T

Source: BIG-bench
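
Scoring on a task of this shape comes down to comparing a model's T/F choice with the reference answer. The following is a minimal sketch of that comparison, not BIG-bench's own evaluation code. The dict layout, the `accuracy` helper, and the `always_true` baseline are hypothetical names introduced for illustration. Only the two example statements and their answers come from this page.

```python
from typing import Callable, Dict, List

# The two examples shown above, stored in a hypothetical dict layout
# (an assumption for illustration; not the benchmark's actual file format).
EXAMPLES: List[Dict[str, str]] = [
    {
        "input": "The daddy longlegs spider is the most venomous spider in the world.",
        "answer": "F",
    },
    {
        "input": "Karl Benz is correctly credited with the invention of the first modern automobile.",
        "answer": "T",
    },
]

CHOICES = ("T", "F")


def accuracy(predict: Callable[[str], str], examples: List[Dict[str, str]]) -> float:
    """Return the fraction of examples where `predict` matches the reference answer."""
    if not examples:
        raise ValueError("at least one example is required")
    correct = 0
    for example in examples:
        guess = predict(example["input"]).strip().upper()
        # A reply outside the allowed choices is simply counted as wrong.
        if guess in CHOICES and guess == example["answer"]:
            correct += 1
    return correct / len(examples)


def always_true(statement: str) -> str:
    """Hypothetical baseline that accepts every statement as true."""
    return "T"


if __name__ == "__main__":
    print(accuracy(always_true, EXAMPLES))
```

A constant baseline such as `always_true` gives a floor to compare a real model against; on a set with balanced T and F answers it scores about one half.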

Papers

Showing 1-50 of 161 papers

| Title | Status | Hype |
|---|---|---|
| Training Compute-Optimal Large Language Models | Code | 6 |
| Factuality Enhanced Language Models for Open-Ended Text Generation | Code | 5 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2 |
| The pitfalls of next-token prediction | Code | 2 |
| Towards Democratizing Joint-Embedding Self-Supervised Learning | Code | 2 |
| Parting with Misconceptions about Learning-based Vehicle Motion Planning | Code | 2 |
| Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs | Code | 1 |
| Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Code | 1 |
| A Tutorial on VAEs: From Bayes' Rule to Lossless Compression | Code | 1 |
| TruthfulQA: Measuring How Models Mimic Human Falsehoods | Code | 1 |
| Unveiling Contrastive Learning's Capability of Neighborhood Aggregation for Collaborative Filtering | Code | 1 |
| Emergent Communication under Competition | Code | 1 |
| Re-Examining Linear Embeddings for High-Dimensional Bayesian Optimization | Code | 1 |
| Laplace Redux -- Effortless Bayesian Deep Learning | Code | 1 |
| Enhancing Knowledge Tracing with Concept Map and Response Disentanglement | Code | 1 |
| On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines | Code | 1 |
| Back to the Drawing Board: A Critical Evaluation of Poisoning Attacks on Production Federated Learning | Code | 1 |
| Noise-powered Multi-modal Knowledge Graph Representation Framework | Code | 1 |
| Towards Process-Oriented, Modular, and Versatile Question Generation that Meets Educational Needs | Code | 1 |
| Inability of a graph neural network heuristic to outperform greedy algorithms in solving combinatorial optimization problems like Max-Cut | Code | 1 |
| Analyzing Factors Influencing Driver Willingness to Accept Advanced Driver Assistance Systems | - | 0 |
| Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology | - | 0 |
| A clarification of misconceptions, myths and desired status of artificial intelligence | - | 0 |
| Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation | - | 0 |
| Characterizing Information Seeking Events in Health-Related Social Discourse | - | 0 |
| Challenges and Trends in User Trust Discourse in AI | - | 0 |
| A Graphical Approach to State Variable Selection in Off-policy Learning | - | 0 |
| Clarifying Misconceptions in COVID-19 Vaccine Sentiment and Stance Analysis and Their Implications for Vaccine Hesitancy Mitigation: A Systematic Review | - | 0 |
| Clarifying System 1 & 2 through the Common Model of Cognition | - | 0 |
| Classifier-Free Guidance is a Predictor-Corrector | - | 0 |
| Dynamics and triggers of misinformation on vaccines | - | 0 |
| Emergent Abilities in Large Language Models: A Survey | - | 0 |
| Enforcing Interpretability and its Statistical Impacts: Trade-offs between Accuracy and Interpretability | - | 0 |
| Can a Hallucinating Model help in Reducing Human "Hallucination"? | - | 0 |
| Breaking Boundaries: A Chronology with Future Directions of Women in Exercise Physiology Research, Centred on Pregnancy | - | 0 |
| A Study on Characterization of Near-Field Sub-Regions For Phased-Array Antennas | - | 0 |
| Biometric recognition: why not massively adopted yet? | - | 0 |
| Dear XAI Community, We Need to Talk! Fundamental Misconceptions in Current XAI Research | - | 0 |
| Data-Mining Textual Responses to Uncover Misconception Patterns | - | 0 |
| Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing | - | 0 |
| Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension | - | 0 |
| Deep Discourse Analysis for Generating Personalized Feedback in Intelligent Tutor Systems | - | 0 |
| Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics | - | 0 |
| Demystifying Misconceptions in Social Bots Research | - | 0 |
| Demystifying Ten Big Ideas and Rules Every Fire Scientist & Engineer Should Know About Blackbox, Whitebox & Causal Artificial Intelligence | - | 0 |
| Depression Status Estimation by Deep Learning based Hybrid Multi-Modal Fusion Model | - | 0 |
| A Thematic Framework for Analyzing Large-scale Self-reported Social Media Data on Opioid Use Disorder Treatment Using Buprenorphine Product | - | 0 |
| Developer Perspectives on Licensing and Copyright Issues Arising from Generative AI for Software Development | - | 0 |
| Instructions and Guide for Diagnostic Questions: The NeurIPS 2020 Education Challenge | - | 0 |
| A close-up comparison of the misclassification error distance and the adjusted Rand index for external clustering evaluation | - | 0 |

No leaderboard results yet.