SOTAVerified

Common Sense Reasoning

Common sense reasoning tasks are designed to require more than pattern recognition: the model should draw on "common sense" or world knowledge to make inferences.

Papers

Showing 101–150 of 939 papers

Title | Status | Hype
VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model | - | 0
FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction | - | 0
Offline Inverse Constrained Reinforcement Learning for Safe-Critical Decision Making in Healthcare | - | 0
Learning Low-Level Causal Relations using a Simulated Robotic Arm | Code | 0
Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs | - | 0
ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution | - | 0
PrefixQuant: Eliminating Outliers by Prefixed Tokens for Large Language Models Quantization | Code | 2
A Pluggable Common Sense-Enhanced Framework for Knowledge Graph Completion | - | 0
Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning | - | 0
LLM-Augmented Symbolic Reinforcement Learning with Landmark-Based Task Decomposition | - | 0
A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning | Code | 1
Revisiting Essential and Nonessential Settings of Evidential Deep Learning | Code | 1
Can Models Learn Skill Composition from Examples? | - | 0
Rehearsing Answers to Probable Questions with Perspective-Taking | - | 0
Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models | - | 0
CauseJudger: Identifying the Cause with LLMs for Abductive Logical Reasoning | - | 0
KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models | - | 0
Structured Event Reasoning with Large Language Models | - | 0
Path-Consistency: Prefix Enhancement for Efficient Inference in LLM | - | 0
Knowledge-Aware Conversation Derailment Forecasting Using Graph Convolutional Networks | - | 0
LLM-enhanced Scene Graph Learning for Household Rearrangement | - | 0
Implicit Sentiment Analysis Based on Chain of Thought Prompting | - | 0
Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs | - | 0
A Perspective on Large Language Models, Intelligent Machines, and Knowledge Acquisition | - | 0
Audit-LLM: Multi-Agent Collaboration for Log-based Insider Threat Detection | - | 0
Are Social Sentiments Inherent in LLMs? An Empirical Study on Extraction of Inter-demographic Sentiments | - | 0
From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation | Code | 0
Affective Computing in the Era of Large Language Models: A Survey from the NLP Perspective | - | 0
Using Large Language Models for the Interpretation of Building Regulations | - | 0
Multi-turn Response Selection with Commonsense-enhanced Language Models | - | 0
Using GPT-4 to guide causal machine learning | - | 0
A Reliable Common-Sense Reasoning Socialbot Built Using LLMs and Goal-Directed ASP | - | 0
Robots Can Multitask Too: Integrating a Memory Architecture and LLMs for Enhanced Cross-Task Robot Action Generation | - | 0
Reconstruct the Pruned Model without Any Retraining | - | 0
Addressing Image Hallucination in Text-to-Image Generation through Factual Image Retrieval | - | 0
NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models | - | 0
AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies | - | 0
Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing | Code | 1
Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs | - | 0
Improving Sample Efficiency of Reinforcement Learning with Background Knowledge from Large Language Models | Code | 0
Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models | - | 0
Automatic Adaptation Rule Optimization via Large Language Models | - | 0
Large Language Models are Zero-Shot Recognizers for Activities of Daily Living | - | 0
Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving | - | 0
RegMix: Data Mixture as Regression for Language Model Pre-training | Code | 2
Human-Object Interaction from Human-Level Instructions | - | 0
Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models | Code | 1
MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception | - | 0
Human-AI collectives produce the most accurate differential diagnoses | Code | 0
Improving Visual Commonsense in Language Models via Multiple Image Generation | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ST-MoE-32B 269B (fine-tuned) | Accuracy | 96.1 | - | Unverified
2 | Unicorn 11B (fine-tuned) | Accuracy | 91.3 | - | Unverified
3 | CompassMTL 567M with Tailor | Accuracy | 90.5 | - | Unverified
4 | CompassMTL 567M | Accuracy | 89.6 | - | Unverified
5 | UnifiedQA 11B (fine-tuned) | Accuracy | 89.4 | - | Unverified
6 | Claude 3 Opus (5-shot) | Accuracy | 88.5 | - | Unverified
7 | GPT-4 (5-shot) | Accuracy | 87.5 | - | Unverified
8 | ExDeBERTa 567M | Accuracy | 87 | - | Unverified
9 | LLaMA-2 13B + MixLoRA | Accuracy | 86.3 | - | Unverified
10 | LLaMA3 8B+MoSLoRA | Accuracy | 85.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 (few-shot, k=25) | Accuracy | 96.4 | - | Unverified
2 | PaLM 2 (few-shot, CoT, SC) | Accuracy | 95.1 | - | Unverified
3 | Shivaay (4B, few-shot, k=8) | Accuracy | 91.04 | - | Unverified
4 | StupidLLM | Accuracy | 91.03 | - | Unverified
5 | Claude 2 (few-shot, k=5) | Accuracy | 91 | - | Unverified
6 | Claude 1.3 (few-shot, k=5) | Accuracy | 90 | - | Unverified
7 | PaLM 540B (Self Improvement, Self Consistency) | Accuracy | 89.8 | - | Unverified
8 | PaLM 540B (Self Consistency) | Accuracy | 88.7 | - | Unverified
9 | PaLM 540B (Self Improvement, CoT Prompting) | Accuracy | 88.3 | - | Unverified
10 | PaLM 540B (Self Improvement, Standard-Prompting) | Accuracy | 87.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ST-MoE-32B 269B (fine-tuned) | Accuracy | 95.2 | - | Unverified
2 | LLaMA 3 8B+MoSLoRA (fine-tuned) | Accuracy | 90.5 | - | Unverified
3 | PaLM 2-L (1-shot) | Accuracy | 89.7 | - | Unverified
4 | PaLM 2-M (1-shot) | Accuracy | 88 | - | Unverified
5 | LLaMA-3 8B + MixLoRA | Accuracy | 86.5 | - | Unverified
6 | Camelidae-8×34B | Accuracy | 86.2 | - | Unverified
7 | PaLM 2-S (1-shot) | Accuracy | 85.6 | - | Unverified
8 | LLaMA 65B + CFG (0-shot) | Accuracy | 84.2 | - | Unverified
9 | GAL 120B (0-shot) | Accuracy | 83.8 | - | Unverified
10 | LLaMA-2 13B + MixLoRA | Accuracy | 83.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Turing NLR v5 XXL 5.4B (fine-tuned) | EM | 95.9 | - | Unverified
2 | ST-MoE-32B 269B (fine-tuned) | EM | 95.1 | - | Unverified
3 | T5-11B | F1 | 94.1 | - | Unverified
4 | DeBERTa-1.5B | EM | 94.1 | - | Unverified
5 | PaLM 540B (finetuned) | EM | 94 | - | Unverified
6 | Vega v2 6B (fine-tuned) | EM | 93.9 | - | Unverified
7 | PaLM 2-L (one-shot) | F1 | 93.8 | - | Unverified
8 | T5-XXL 11B (fine-tuned) | EM | 93.4 | - | Unverified
9 | PaLM 2-M (one-shot) | F1 | 92.4 | - | Unverified
10 | PaLM 2-S (one-shot) | F1 | 92.1 | - | Unverified