SOTAVerified

Common Sense Reasoning

Common sense reasoning tasks are designed to require more than pattern recognition: to answer correctly, a model must draw on "common sense" or world knowledge to make inferences.
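Most of the benchmarks tracked below are multiple-choice: the model scores each candidate answer (typically by per-choice log-likelihood) and the highest-scoring choice is taken as its prediction. A minimal sketch, with hypothetical scores standing in for a real model's outputs:

```python
# Minimal sketch of multiple-choice evaluation. The scores below are
# hypothetical stand-ins for a language model's per-choice log-likelihoods.
def pick_answer(choice_scores):
    """Return the index of the highest-scoring answer choice."""
    return max(range(len(choice_scores)), key=lambda i: choice_scores[i])

def accuracy(predictions, gold):
    """Fraction of items where the predicted choice matches the gold label."""
    assert len(predictions) == len(gold)
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

# Toy example: three items, two answered correctly.
scores = [[-1.2, -0.3], [-0.5, -2.0], [-0.9, -0.8]]
gold = [1, 0, 0]
preds = [pick_answer(s) for s in scores]
print(round(accuracy(preds, gold), 3))  # prints 0.667
```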

Papers

Showing 901–939 of 939 papers

| Title | Status | Hype |
|---|---|---|
| IPPON: Common Sense Guided Informative Path Planning for Object Goal Navigation | | 0 |
| Irony Detection for Dutch: a Venture into the Implicit | | 0 |
| Is a 204 cm Man Tall or Small? Acquisition of Numerical Common Sense from the Web | | 0 |
| Is “My Favorite New Movie” My Favorite Movie? Probing the Understanding of Recursive Noun Phrases | | 0 |
| Is the Elephant Flying? Resolving Ambiguities in Text-to-Image Generative Models | | 0 |
| ITCMA: A Generative Agent Based on a Computational Consciousness Structure | | 0 |
| It’s Commonsense, isn’t it? Demystifying Human Evaluations in Commonsense-Enhanced NLG Systems | | 0 |
| JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents | | 0 |
| JEPA4Rec: Learning Effective Language Representations for Sequential Recommendation via Joint Embedding Predictive Architecture | | 0 |
| JFCKB: Japanese Feature Change Knowledge Base | | 0 |
| KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models | | 0 |
| KARNA at COIN Shared Task 1: Bidirectional Encoder Representations from Transformers with relational knowledge for machine comprehension with common sense | | 0 |
| Kernel Choice Matters for Boundary Inference Using Local Polynomial Density: With Application to Manipulation Testing | | 0 |
| Knowledge Acquisition Strategies for Goal-Oriented Dialog Systems | | 0 |
| Knowledge-Aware Conversation Derailment Forecasting Using Graph Convolutional Networks | | 0 |
| Knowledge Aware Semantic Concept Expansion for Image-Text Matching | | 0 |
| Knowledge Bases in Support of Large Language Models for Processing Web News | | 0 |
| 0/1 Deep Neural Networks via Block Coordinate Descent | | 0 |
| Knowledge-Guided Recurrent Neural Network Learning for Task-Oriented Action Prediction | | 0 |
| Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | | 0 |
| KnowRA: Knowledge Retrieval Augmented Method for Document-level Relation Extraction with Comprehensive Reasoning Abilities | | 0 |
| Know your exceptions: Towards an Ontology of Exceptions in Knowledge Representation | | 0 |
| KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge | | 0 |
| K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition | | 0 |
| K-XLNet: A General Method for Combining Explicit Knowledge with Language Model Pretraining | | 0 |
| Language Models as Fact Checkers? | | 0 |
| Large Language Models are Effective Priors for Causal Graph Discovery | | 0 |
| Large Language Models are Zero-Shot Recognizers for Activities of Daily Living | | 0 |
| Large Language Models as Common-Sense Heuristics | | 0 |
| Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection | | 0 |
| Large Language Models Can Self-Improve | | 0 |
| Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks | | 0 |
| Large-Scale Acquisition of Commonsense Knowledge via a Quiz Game on a Dialogue System | | 0 |
| Latest News in Computational Argumentation: Surfing on the Deep Learning Wave, Scuba Diving in the Abyss of Fundamental Questions | | 0 |
| LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models | | 0 |
| Learning-based Practical Smartphone Eavesdropping with Built-in Accelerometer | | 0 |
| Learning Common Sense Through Visual Abstraction | | 0 |
| Learning Continuous 3D Reconstructions for Geometrically Aware Grasping | | 0 |
| Learning Fine-Grained Knowledge about Contingent Relations between Everyday Events | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ST-MoE-32B 269B (fine-tuned) | Accuracy | 96.1 | | Unverified |
| 2 | Unicorn 11B (fine-tuned) | Accuracy | 91.3 | | Unverified |
| 3 | CompassMTL 567M with Tailor | Accuracy | 90.5 | | Unverified |
| 4 | CompassMTL 567M | Accuracy | 89.6 | | Unverified |
| 5 | UnifiedQA 11B (fine-tuned) | Accuracy | 89.4 | | Unverified |
| 6 | Claude 3 Opus (5-shot) | Accuracy | 88.5 | | Unverified |
| 7 | GPT-4 (5-shot) | Accuracy | 87.5 | | Unverified |
| 8 | ExDeBERTa 567M | Accuracy | 87 | | Unverified |
| 9 | LLaMA-2 13B + MixLoRA | Accuracy | 86.3 | | Unverified |
| 10 | LLaMA3 8B+MoSLoRA | Accuracy | 85.8 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 (few-shot, k=25) | Accuracy | 96.4 | | Unverified |
| 2 | PaLM 2 (few-shot, CoT, SC) | Accuracy | 95.1 | | Unverified |
| 3 | Shivaay (4B, few-shot, k=8) | Accuracy | 91.04 | | Unverified |
| 4 | StupidLLM | Accuracy | 91.03 | | Unverified |
| 5 | Claude 2 (few-shot, k=5) | Accuracy | 91 | | Unverified |
| 6 | Claude 1.3 (few-shot, k=5) | Accuracy | 90 | | Unverified |
| 7 | PaLM 540B (Self Improvement, Self Consistency) | Accuracy | 89.8 | | Unverified |
| 8 | PaLM 540B (Self Consistency) | Accuracy | 88.7 | | Unverified |
| 9 | PaLM 540B (Self Improvement, CoT Prompting) | Accuracy | 88.3 | | Unverified |
| 10 | PaLM 540B (Self Improvement, Standard-Prompting) | Accuracy | 87.2 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ST-MoE-32B 269B (fine-tuned) | Accuracy | 95.2 | | Unverified |
| 2 | LLaMA 3 8B+MoSLoRA (fine-tuned) | Accuracy | 90.5 | | Unverified |
| 3 | PaLM 2-L (1-shot) | Accuracy | 89.7 | | Unverified |
| 4 | PaLM 2-M (1-shot) | Accuracy | 88 | | Unverified |
| 5 | LLaMA-3 8B + MixLoRA | Accuracy | 86.5 | | Unverified |
| 6 | Camelidae-8×34B | Accuracy | 86.2 | | Unverified |
| 7 | PaLM 2-S (1-shot) | Accuracy | 85.6 | | Unverified |
| 8 | LLaMA 65B + CFG (0-shot) | Accuracy | 84.2 | | Unverified |
| 9 | GAL 120B (0-shot) | Accuracy | 83.8 | | Unverified |
| 10 | LLaMA-2 13B + MixLoRA | Accuracy | 83.5 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Turing NLR v5 XXL 5.4B (fine-tuned) | EM | 95.9 | | Unverified |
| 2 | ST-MoE-32B 269B (fine-tuned) | EM | 95.1 | | Unverified |
| 3 | T5-11B | F1 | 94.1 | | Unverified |
| 4 | DeBERTa-1.5B | EM | 94.1 | | Unverified |
| 5 | PaLM 540B (fine-tuned) | EM | 94 | | Unverified |
| 6 | Vega v2 6B (fine-tuned) | EM | 93.9 | | Unverified |
| 7 | PaLM 2-L (one-shot) | F1 | 93.8 | | Unverified |
| 8 | T5-XXL 11B (fine-tuned) | EM | 93.4 | | Unverified |
| 9 | PaLM 2-M (one-shot) | F1 | 92.4 | | Unverified |
| 10 | PaLM 2-S (one-shot) | F1 | 92.1 | | Unverified |
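The last table reports EM (Exact Match) and F1 rather than accuracy, the standard pair of metrics for extractive QA. A minimal sketch of how these are typically computed, with simplified normalization (lowercasing and whitespace tokenization; real evaluation scripts also strip punctuation and articles):

```python
# Sketch of the Exact Match (EM) and token-level F1 metrics used in the
# benchmark table above. Normalization here is deliberately simplified.
from collections import Counter

def exact_match(prediction: str, gold: str) -> bool:
    """True iff the (normalized) prediction equals the gold answer exactly."""
    return prediction.strip().lower() == gold.strip().lower()

def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-overlap precision and recall."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset overlap
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the cat", "The cat"))              # prints True
print(round(token_f1("the black cat", "the cat"), 3))  # prints 0.8
```

F1 gives partial credit (here 0.8 for one spurious token), which is why F1 leaderboard numbers tend to sit slightly above EM numbers for the same model.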