SOTAVerified

Common Sense Reasoning

Common sense reasoning tasks are designed to require more than pattern recognition: the model must draw on "common sense" or world knowledge to make inferences.

Papers

Showing 651–700 of 939 papers

Title | Status | Hype
A Logic-based Approach for Recognizing Textual Entailment Supported by Ontological Background Knowledge | | 0
A Machine Consciousness architecture based on Deep Learning and Gaussian Processes | | 0
A mathematical theory of super-resolution and two-point resolution | | 0
Ambiguss, a game for building a Sense Annotated Corpus for French | | 0
A Multi-Attention based Neural Network with External Knowledge for Story Ending Predicting Task | | 0
A Multimodal Social Agent | | 0
Analogical Proportions | | 0
An Aposteriorical Clusterability Criterion for k-Means++ and Simplicity of Clustering | | 0
An Application of Pseudo-Log-Likelihoods to Natural Language Scoring | | 0
An End-to-End Multi-task Learning Model for Fact Checking | | 0
An Hymn of an even Deeper Sentiment Analysis | | 0
An Improved Neural Baseline for Temporal Relation Extraction | | 0
An Informational Space Based Semantic Analysis for Scientific Texts | | 0
An Interpretable Neural Network with Topical Information for Relevant Emotion Ranking | | 0
A note on bequest preferences in utility maximisation for modern tontines | | 0
An Overview Of Temporal Commonsense Reasoning and Acquisition | | 0
A Perspective on Large Language Models, Intelligent Machines, and Knowledge Acquisition | | 0
A Pluggable Common Sense-Enhanced Framework for Knowledge Graph Completion | | 0
A primer on getting neologisms from foreign languages to under-resourced languages | | 0
A Reliable Common-Sense Reasoning Socialbot Built Using LLMs and Goal-Directed ASP | | 0
Are Social Sentiments Inherent in LLMs? An Empirical Study on Extraction of Inter-demographic Sentiments | | 0
A Review on Objective-Driven Artificial Intelligence | | 0
Artificial General Intelligence (AGI)-Native Wireless Systems: A Journey Beyond 6G | | 0
A Rule-Based Approach to Aspect Extraction from Product Reviews | | 0
A Service-Oriented Architecture for Assisting the Authoring of Semantic Crowd Maps | | 0
Ask Me What You Need: Product Retrieval using Knowledge from GPT-3 | | 0
Aspect Extraction from Product Reviews Using Category Hierarchy Information | | 0
Assessment of cognitive characteristics in intelligent systems and predictive ability | | 0
Assisting human experts in the interpretation of their visual process: A case study on assessing copper surface adhesive potency | | 0
A Statistical View on Synthetic Aperture Imaging for Occlusion Removal | | 0
A Strong Lexical Matching Method for the Machine Comprehension Test | | 0
A Study on Neuro-Symbolic Artificial Intelligence: Healthcare Perspectives | | 0
A survey of Identification and mitigation of Machine Learning algorithmic biases in Image Analysis | | 0
A Survey on Semantics in Automated Data Science | | 0
A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions | | 0
A Systematic Survey of Text Worlds as Embodied Natural Language Environments | | 0
A Systematic Survey of Text Worlds as Embodied Natural Language Environments | | 0
A Theory of Human-Like Few-Shot Learning | | 0
ATLAS: Learning to Optimally Memorize the Context at Test Time | | 0
A Tool for Extracting Conversational Implicatures | | 0
Attentioned Convolutional LSTM InpaintingNetwork for Anomaly Detection in Videos | | 0
Stereotype Detection in LLMs: A Multiclass, Explainable, and Benchmark-Driven Approach | | 0
Audit-LLM: Multi-Agent Collaboration for Log-based Insider Threat Detection | | 0
Augmented Translation: A New Approach to Combining Human and Machine Capabilities | | 0
Augmenting Autotelic Agents with Large Language Models | | 0
A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset | | 0
AUTO-DISCERN: Autonomous Driving Using Common Sense Reasoning | | 0
Automatic Adaptation Rule Optimization via Large Language Models | | 0
Automatic Enrichment of WordNet with Common-Sense Knowledge | | 0
Automatic Evaluation of Commonsense Knowledge for Refining Japanese ConceptNet | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ST-MoE-32B 269B (fine-tuned) | Accuracy | 96.1 | | Unverified
2 | Unicorn 11B (fine-tuned) | Accuracy | 91.3 | | Unverified
3 | CompassMTL 567M with Tailor | Accuracy | 90.5 | | Unverified
4 | CompassMTL 567M | Accuracy | 89.6 | | Unverified
5 | UnifiedQA 11B (fine-tuned) | Accuracy | 89.4 | | Unverified
6 | Claude 3 Opus (5-shot) | Accuracy | 88.5 | | Unverified
7 | GPT-4 (5-shot) | Accuracy | 87.5 | | Unverified
8 | ExDeBERTa 567M | Accuracy | 87 | | Unverified
9 | LLaMA-2 13B + MixLoRA | Accuracy | 86.3 | | Unverified
10 | LLaMA3 8B+MoSLoRA | Accuracy | 85.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 (few-shot, k=25) | Accuracy | 96.4 | | Unverified
2 | PaLM 2 (few-shot, CoT, SC) | Accuracy | 95.1 | | Unverified
3 | Shivaay (4B, few-shot, k=8) | Accuracy | 91.04 | | Unverified
4 | StupidLLM | Accuracy | 91.03 | | Unverified
5 | Claude 2 (few-shot, k=5) | Accuracy | 91 | | Unverified
6 | Claude 1.3 (few-shot, k=5) | Accuracy | 90 | | Unverified
7 | PaLM 540B (Self Improvement, Self Consistency) | Accuracy | 89.8 | | Unverified
8 | PaLM 540B (Self Consistency) | Accuracy | 88.7 | | Unverified
9 | PaLM 540B (Self Improvement, CoT Prompting) | Accuracy | 88.3 | | Unverified
10 | PaLM 540B (Self Improvement, Standard-Prompting) | Accuracy | 87.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ST-MoE-32B 269B (fine-tuned) | Accuracy | 95.2 | | Unverified
2 | LLaMA 3 8B+MoSLoRA (fine-tuned) | Accuracy | 90.5 | | Unverified
3 | PaLM 2-L (1-shot) | Accuracy | 89.7 | | Unverified
4 | PaLM 2-M (1-shot) | Accuracy | 88 | | Unverified
5 | LLaMA-3 8B + MixLoRA | Accuracy | 86.5 | | Unverified
6 | Camelidae-8×34B | Accuracy | 86.2 | | Unverified
7 | PaLM 2-S (1-shot) | Accuracy | 85.6 | | Unverified
8 | LLaMA 65B + CFG (0-shot) | Accuracy | 84.2 | | Unverified
9 | GAL 120B (0-shot) | Accuracy | 83.8 | | Unverified
10 | LLaMA-2 13B + MixLoRA | Accuracy | 83.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Turing NLR v5 XXL 5.4B (fine-tuned) | EM | 95.9 | | Unverified
2 | ST-MoE-32B 269B (fine-tuned) | EM | 95.1 | | Unverified
3 | T5-11B | F1 | 94.1 | | Unverified
4 | DeBERTa-1.5B | EM | 94.1 | | Unverified
5 | PaLM 540B (finetuned) | EM | 94 | | Unverified
6 | Vega v2 6B (fine-tuned) | EM | 93.9 | | Unverified
7 | PaLM 2-L (one-shot) | F1 | 93.8 | | Unverified
8 | T5-XXL 11B (fine-tuned) | EM | 93.4 | | Unverified
9 | PaLM 2-M (one-shot) | F1 | 92.4 | | Unverified
10 | PaLM 2-S (one-shot) | F1 | 92.1 | | Unverified