Common Sense Reasoning

Common sense reasoning tasks are designed so that pattern recognition alone is not enough: the model must draw on "common sense" or world knowledge to make inferences.

Papers

Showing 1–10 of 939 papers

Title	Date	Tasks	Status	Hype
Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes	Jul 17, 2025	Common Sense Reasoning, World Knowledge	Unverified	0
LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization	Jul 6, 2025	Common Sense Reasoning, parameter-efficient fine-tuning	Code Available	0
CheckManual: A New Challenge and Benchmark for Manual-based Appliance Manipulation	Jun 11, 2025	Common Sense Reasoning, Question Answering	Unverified	0
EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits	Jun 11, 2025	Artifact Detection, Caption Generation	Unverified	0
Prime the search: Using large language models for guiding geometric task and motion planning by warm-starting tree search	Jun 8, 2025	Common Sense Reasoning, Motion Planning	Code Available	0
AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment	Jun 4, 2025	Common Sense Reasoning	Code Available	0
ATLAS: Learning to Optimally Memorize the Context at Test Time	May 29, 2025	Common Sense Reasoning, Language Modeling	Unverified	0
Spatial Knowledge Graph-Guided Multimodal Synthesis	May 28, 2025	Common Sense Reasoning, Knowledge Graphs	Unverified	0
CaseEdit: Enhancing Localized Commonsense Reasoning via Null-Space Constrained Knowledge Editing in Small Parameter Language Models	May 26, 2025	Common Sense Reasoning, Computational Efficiency	Unverified	0
Align-GRAG: Reasoning-Guided Dual Alignment for Graph Retrieval-Augmented Generation	May 22, 2025	Common Sense Reasoning, Information Retrieval	Unverified	0

Datasets: WinoGrande, arc_challenge, arc_easy, ReCoRD, CommonsenseQA, PARus, RuCoS, RWSD, BIG-bench (Causal Judgment), BIG-bench (Date Understanding), BIG-bench (Disambiguation QA), BIG-bench (Sports Understanding)

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Human Benchmark	Average F1	0.93	—	Unverified
2	Golden Transformer	Average F1	0.92	—	Unverified
3	YaLM 1.0B few-shot	Average F1	0.86	—	Unverified
4	ruT5-large-finetune	Average F1	0.81	—	Unverified
5	ruT5-base-finetune	Average F1	0.79	—	Unverified
6	ruBert-base finetune	Average F1	0.74	—	Unverified
7	ruRoberta-large finetune	Average F1	0.73	—	Unverified
8	ruBert-large finetune	Average F1	0.68	—	Unverified
9	RuGPT3XL few-shot	Average F1	0.67	—	Unverified
10	MT5 Large	Average F1	0.57	—	Unverified