SOTAVerified

Common Sense Reasoning

Common sense reasoning tasks are designed to require more than pattern recognition: the model must draw on "common sense" or world knowledge to make inferences.

Papers

Showing 251–300 of 939 papers

Title | Status | Hype
Rethinking Annotation for Object Detection: Is Annotating Small-size Instances Worth Its Cost? | - | 0
A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions | - | 0
A surprisal oracle for when every layer counts | Code | 0
Let's Think Var-by-Var: Large Language Models Enable Ad Hoc Probabilistic Reasoning | - | 0
MALT: Improving Reasoning with Multi-Agent LLM Training | - | 0
Online Knowledge Integration for 3D Semantic Mapping: A Survey | - | 0
HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator | - | 0
Generating Out-Of-Distribution Scenarios Using Language Models | - | 0
Interactive and Expressive Code-Augmented Planning with Large Language Models | - | 0
GLOVER: Generalizable Open-Vocabulary Affordance Reasoning for Task-Oriented Grasping | - | 0
Improving Tool Retrieval by Leveraging Large Language Models for Query Generation | - | 0
Knowledge Bases in Support of Large Language Models for Processing Web News | - | 0
CLaSP: Learning Concepts for Time-Series Signals from Natural Language Supervision | - | 0
A little less conversation, a little more action, please: Investigating the physical common-sense of LLMs in a 3D embodied environment | Code | 0
Diffusion as Reasoning: Enhancing Object Goal Navigation with LLM-Biased Diffusion Model | - | 0
IPPON: Common Sense Guided Informative Path Planning for Object Goal Navigation | - | 0
From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems | - | 0
Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving | - | 0
VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model | - | 0
Offline Inverse Constrained Reinforcement Learning for Safe-Critical Decision Making in Healthcare | - | 0
FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction | - | 0
Learning Low-Level Causal Relations using a Simulated Robotic Arm | Code | 0
Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs | - | 0
ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution | - | 0
A Pluggable Common Sense-Enhanced Framework for Knowledge Graph Completion | - | 0
Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning | - | 0
LLM-Augmented Symbolic Reinforcement Learning with Landmark-Based Task Decomposition | - | 0
Can Models Learn Skill Composition from Examples? | - | 0
Rehearsing Answers to Probable Questions with Perspective-Taking | - | 0
Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models | - | 0
KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models | - | 0
CauseJudger: Identifying the Cause with LLMs for Abductive Logical Reasoning | - | 0
Structured Event Reasoning with Large Language Models | - | 0
Path-Consistency: Prefix Enhancement for Efficient Inference in LLM | - | 0
Knowledge-Aware Conversation Derailment Forecasting Using Graph Convolutional Networks | - | 0
Implicit Sentiment Analysis Based on Chain of Thought Prompting | - | 0
LLM-enhanced Scene Graph Learning for Household Rearrangement | - | 0
Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs | - | 0
A Perspective on Large Language Models, Intelligent Machines, and Knowledge Acquisition | - | 0
Audit-LLM: Multi-Agent Collaboration for Log-based Insider Threat Detection | - | 0
Are Social Sentiments Inherent in LLMs? An Empirical Study on Extraction of Inter-demographic Sentiments | - | 0
From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation | Code | 0
Affective Computing in the Era of Large Language Models: A Survey from the NLP Perspective | - | 0
Using Large Language Models for the Interpretation of Building Regulations | - | 0
A Reliable Common-Sense Reasoning Socialbot Built Using LLMs and Goal-Directed ASP | - | 0
Multi-turn Response Selection with Commonsense-enhanced Language Models | - | 0
Using GPT-4 to guide causal machine learning | - | 0
Robots Can Multitask Too: Integrating a Memory Architecture and LLMs for Enhanced Cross-Task Robot Action Generation | - | 0
Reconstruct the Pruned Model without Any Retraining | - | 0
NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ST-MoE-32B 269B (fine-tuned) | Accuracy | 96.1 | - | Unverified
2 | Unicorn 11B (fine-tuned) | Accuracy | 91.3 | - | Unverified
3 | CompassMTL 567M with Tailor | Accuracy | 90.5 | - | Unverified
4 | CompassMTL 567M | Accuracy | 89.6 | - | Unverified
5 | UnifiedQA 11B (fine-tuned) | Accuracy | 89.4 | - | Unverified
6 | Claude 3 Opus (5-shot) | Accuracy | 88.5 | - | Unverified
7 | GPT-4 (5-shot) | Accuracy | 87.5 | - | Unverified
8 | ExDeBERTa 567M | Accuracy | 87 | - | Unverified
9 | LLaMA-2 13B + MixLoRA | Accuracy | 86.3 | - | Unverified
10 | LLaMA3 8B+MoSLoRA | Accuracy | 85.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 (few-shot, k=25) | Accuracy | 96.4 | - | Unverified
2 | PaLM 2 (few-shot, CoT, SC) | Accuracy | 95.1 | - | Unverified
3 | Shivaay (4B, few-shot, k=8) | Accuracy | 91.04 | - | Unverified
4 | StupidLLM | Accuracy | 91.03 | - | Unverified
5 | Claude 2 (few-shot, k=5) | Accuracy | 91 | - | Unverified
6 | Claude 1.3 (few-shot, k=5) | Accuracy | 90 | - | Unverified
7 | PaLM 540B (Self Improvement, Self Consistency) | Accuracy | 89.8 | - | Unverified
8 | PaLM 540B (Self Consistency) | Accuracy | 88.7 | - | Unverified
9 | PaLM 540B (Self Improvement, CoT Prompting) | Accuracy | 88.3 | - | Unverified
10 | PaLM 540B (Self Improvement, Standard-Prompting) | Accuracy | 87.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ST-MoE-32B 269B (fine-tuned) | Accuracy | 95.2 | - | Unverified
2 | LLaMA 3 8B+MoSLoRA (fine-tuned) | Accuracy | 90.5 | - | Unverified
3 | PaLM 2-L (1-shot) | Accuracy | 89.7 | - | Unverified
4 | PaLM 2-M (1-shot) | Accuracy | 88 | - | Unverified
5 | LLaMA-3 8B + MixLoRA | Accuracy | 86.5 | - | Unverified
6 | Camelidae-8×34B | Accuracy | 86.2 | - | Unverified
7 | PaLM 2-S (1-shot) | Accuracy | 85.6 | - | Unverified
8 | LLaMA 65B + CFG (0-shot) | Accuracy | 84.2 | - | Unverified
9 | GAL 120B (0-shot) | Accuracy | 83.8 | - | Unverified
10 | LLaMA-2 13B + MixLoRA | Accuracy | 83.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Turing NLR v5 XXL 5.4B (fine-tuned) | EM | 95.9 | - | Unverified
2 | ST-MoE-32B 269B (fine-tuned) | EM | 95.1 | - | Unverified
3 | T5-11B | F1 | 94.1 | - | Unverified
4 | DeBERTa-1.5B | EM | 94.1 | - | Unverified
5 | PaLM 540B (finetuned) | EM | 94 | - | Unverified
6 | Vega v2 6B (fine-tuned) | EM | 93.9 | - | Unverified
7 | PaLM 2-L (one-shot) | F1 | 93.8 | - | Unverified
8 | T5-XXL 11B (fine-tuned) | EM | 93.4 | - | Unverified
9 | PaLM 2-M (one-shot) | F1 | 92.4 | - | Unverified
10 | PaLM 2-S (one-shot) | F1 | 92.1 | - | Unverified