Logical Reasoning

Papers


Showing 151–200 of 747 papers

Title	Date	Tasks	Status	Hype	Score
Natural Language Reasoning, A Survey	Mar 26, 2023	Logical Reasoning, Mathematical Reasoning	Code Available	1	5
Counterfactual reasoning: Testing language models' understanding of hypothetical scenarios	May 26, 2023	counterfactual, Counterfactual Reasoning	Code Available	1	5
Conditional and Modal Reasoning in Large Language Models	Jan 30, 2024	Logical Reasoning	Code Available	1	5
ExAIS: Executable AI Semantics	Feb 20, 2022	Logical Reasoning, valid	Code Available	1	5
Cross from Left to Right Brain: Adaptive Text Dreamer for Vision-and-Language Navigation	May 27, 2025	Large Language Model, Logical Reasoning	Code Available	1	5
Complex Logical Reasoning over Knowledge Graphs using Large Language Models	May 2, 2023	Knowledge Graphs, Logical Reasoning	Code Available	1	5
AdaLoGN: Adaptive Logic Graph Network for Reasoning-Based Machine Reading Comprehension	Mar 16, 2022	Logical Reasoning, Machine Reading Comprehension	Code Available	1	5
DetermLR: Augmenting LLM-based Logical Reasoning from Indeterminacy to Determinacy	Oct 28, 2023	Logical Reasoning	Code Available	1	5
Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration	Apr 17, 2025	Geometry Problem Solving, Large Language Model	Code Available	1	5
COLLIE: Systematic Construction of Constrained Text Generation Tasks	Jul 17, 2023	Logical Reasoning, Sentence	Code Available	1	5
GLoRE: Evaluating Logical Reasoning of Large Language Models	Oct 13, 2023	Logical Reasoning, Natural Language Understanding	Code Available	1	5
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models	Oct 7, 2024	GSM8K, Logical Reasoning	Code Available	1	5
End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking	Feb 11, 2022	Logical Reasoning	Code Available	1	5
Measuring Systematic Generalization in Neural Proof Generation with Transformers	Sep 30, 2020	Automated Theorem Proving, Logical Reasoning	Code Available	1	5
OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models	Feb 21, 2024	General Knowledge, Logical Reasoning	Code Available	1	5
On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning	Dec 15, 2022	Instruction Following, Language Modeling	Code Available	1	5
BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs	May 18, 2025	Logical Reasoning	Code Available	1	5
MERIt: Meta-Path Guided Contrastive Learning for Logical Reasoning	Mar 1, 2022	Contrastive Learning, counterfactual	Code Available	1	5
Deductive Verification of Chain-of-Thought Reasoning	Jun 6, 2023	Logical Reasoning	Code Available	1	5
Domain Specific Question Answering Over Knowledge Graphs Using Logical Programming and Large Language Models	Mar 3, 2023	Knowledge Graphs, Logical Reasoning	Code Available	1	5
A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners	Jun 16, 2024	Logical Reasoning	Code Available	1	5
ElecBench: a Power Dispatch Evaluation Benchmark for Large Language Models	Jul 7, 2024	Fairness, General Knowledge	Code Available	1	5
Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs	Oct 22, 2020	Complex Query Answering, Knowledge Graphs	Code Available	1	5
IDOL: Indicator-oriented Logic Pre-training for Logical Reasoning	Jun 27, 2023	Logical Reasoning, Machine Reading Comprehension	Code Available	1	5
ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression	Dec 4, 2024	2k, Logical Reasoning	Code Available	1	5
Explicit Planning Helps Language Models in Logical Reasoning	Mar 28, 2023	Logical Reasoning, Multiple-choice	Code Available	1	5
Do PLMs Know and Understand Ontological Knowledge?	Sep 12, 2023	Logical Reasoning, Memorization	Code Available	1	5
AI Descartes: Combining Data and Theory for Derivable Scientific Discovery	Sep 3, 2021	Automated Theorem Proving, BIG-bench Machine Learning	Code Available	1	5
Enhancing Multilingual Language Model with Massive Multilingual Knowledge Triples	Nov 22, 2021	Knowledge Graphs, Language Modeling	Code Available	1	5
Discriminative Reasoning for Document-level Relation Extraction	Jun 3, 2021	Document-level Relation Extraction, Logical Reasoning	Code Available	1	5
Mind Reasoning Manners: Enhancing Type Perception for Generalized Zero-shot Logical Reasoning over Text	Jan 8, 2023	Contrastive Learning, Logical Reasoning	Code Available	1	5
Neural Collaborative Reasoning	May 16, 2020	Collaborative Filtering, Decision Making	Code Available	1	5
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning	May 21, 2025	General Reinforcement Learning, Logical Reasoning	Code Available	1	5
Classifying Conspiratorial Narratives At Scale: False Alarms and Erroneous Connections	Mar 29, 2024	Logical Reasoning	Code Available	0	5
Assessing the Alignment of FOL Closeness Metrics with Human Judgement	Jan 15, 2025	Logical Reasoning, Sensitivity	Code Available	0	5
LogiQA 2.0—An Improved Dataset for Logical Reasoning in Natural Language Understanding	Jun 6, 2023	Logical Reasoning, Logical Reasoning Reading Comprehension	Code Available	0	5
LR-IAD:Mask-Free Industrial Anomaly Detection with Logical Reasoning	Apr 28, 2025	Anomaly Detection, Logical Reasoning	Code Available	0	5
ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding	May 25, 2025	Chart Understanding, Logical Reasoning	Code Available	0	5
A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning	Nov 14, 2023	Logical Fallacies, Logical Reasoning	Code Available	0	5
LR-XFL: Logical Reasoning-based Explainable Federated Learning	Aug 24, 2023	Federated Learning, Logical Reasoning	Code Available	0	5
Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks	Jul 5, 2016	Logical Reasoning	Code Available	0	5
Assessing Logical Reasoning Capabilities of Encoder-Only Transformer Models	Dec 18, 2023	Logical Reasoning	Code Available	0	5
Assessing Logical Puzzle Solving in Large Language Models: Insights from a Minesweeper Case Study	Nov 13, 2023	Logical Reasoning, Prompt Engineering	Code Available	0	5
Aligning Knowledge Graphs Provided by Humans and Generated from Neural Networks in Specific Tasks	Apr 23, 2024	Knowledge Graphs, Logical Reasoning	Code Available	0	5
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models	Sep 26, 2024	Logical Reasoning	Code Available	0	5
LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning	Sep 19, 2024	GSM8K, Logical Reasoning	Code Available	0	5
Logical Reasoning with Span-Level Predictions for Interpretable and Robust NLI Models	May 23, 2022	Logical Reasoning, Natural Language Inference	Code Available	0	5
A Closer Look at Logical Reasoning with LLMs: The Choice of Tool Matters	Jun 1, 2024	Logical Reasoning, Translation	Code Available	0	5
Logical Tasks for Measuring Extrapolation and Rule Comprehension	Nov 14, 2022	Inductive Bias, Logical Reasoning	Code Available	0	5
Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework	Dec 22, 2024	Logical Reasoning	Code Available	0	5



Datasets: LingOly, BIG-bench (Formal Fallacies Syllogisms Negation), BIG-bench (Penguins In A Table), BIG-bench (Reasoning About Colored Objects), BIG-bench (Temporal Sequences), BIG-bench (Logic Grid Puzzle), BIG-bench (StrategyQA), RuWorldTree, Winograd Automatic, BIG-bench (Logical Fallacy Detection)

Benchmark Results

LingOly
#	Model	Metric	Claimed	Verified	Status
1	Claude Opus	Delta_NoContext	28.8	—	Unverified
2	GPT-4o	Delta_NoContext	25.1	—	Unverified
3	Gemini 1.5 Pro	Delta_NoContext	23.4	—	Unverified
4	GPT-4	Delta_NoContext	21.5	—	Unverified
5	Command R+	Delta_NoContext	11.6	—	Unverified
6	GPT-3.5	Delta_NoContext	11.2	—	Unverified
7	Mixtral 8x7B	Delta_NoContext	6.4	—	Unverified
8	Llama 3 8B	Delta_NoContext	4.9	—	Unverified
9	Llama 3 70B	Delta_NoContext	2.9	—	Unverified
10	Gemma 7B	Delta_NoContext	2.2	—	Unverified

BIG-bench (Formal Fallacies Syllogisms Negation)
#	Model	Metric	Claimed	Verified	Status
1	PaLM 2 (few-shot, k=3, Direct)	Accuracy	64.8	—	Unverified
2	PaLM 2 (few-shot, k=3, CoT)	Accuracy	57.2	—	Unverified
3	OPT 66B (few-shot, k=3)	Accuracy	54	—	Unverified
4	PaLM 540B (few-shot, k=3)	Accuracy	53.6	—	Unverified
5	GPT-NeoX 20B (few-shot, k=3)	Accuracy	52.8	—	Unverified
6	BLOOM 176B (few-shot, k=3)	Accuracy	52.8	—	Unverified
7	Chinchilla-70B (few-shot, k=5)	Accuracy	52.1	—	Unverified
8	Bloomberg GPT 50B (few-shot, k=3)	Accuracy	50.8	—	Unverified
9	Gopher-280B (few-shot, k=5)	Accuracy	50.7	—	Unverified

BIG-bench (Penguins In A Table)
#	Model	Metric	Claimed	Verified	Status
1	PaLM 2 (few-shot, k=3, CoT)	Accuracy	84.9	—	Unverified
2	PaLM 2 (few-shot, k=3, Direct)	Accuracy	65.8	—	Unverified
3	Chinchilla-70B (few-shot, k=5)	Accuracy	48.7	—	Unverified
4	PaLM 540B (few-shot, k=3)	Accuracy	44.5	—	Unverified
5	Gopher-280B (few-shot, k=5)	Accuracy	40.6	—	Unverified
6	BLOOM 176B (few-shot, k=3)	Accuracy	40.41	—	Unverified
7	Bloomberg GPT (few-shot, k=3)	Accuracy	37.67	—	Unverified
8	GPT-NeoX (few-shot, k=3)	Accuracy	33.56	—	Unverified
9	OPT 66B (few-shot, k=3)	Accuracy	28.08	—	Unverified

BIG-bench (Reasoning About Colored Objects)
#	Model	Metric	Claimed	Verified	Status
1	PaLM 2 (few-shot, k=3, CoT)	Accuracy	91.2	—	Unverified
2	PaLM 2 (few-shot, k=3, Direct)	Accuracy	61.2	—	Unverified
3	Chinchilla-70B (few-shot, k=5)	Accuracy	59.7	—	Unverified
4	Gopher-280B (few-shot, k=5)	Accuracy	49.2	—	Unverified
5	PaLM 540B (few-shot, k=3)	Accuracy	38	—	Unverified
6	BLOOM 176B (few-shot, k=3)	Accuracy	36.8	—	Unverified
7	Bloomberg GPT (few-shot, k=3)	Accuracy	34.8	—	Unverified
8	OPT 66B (few-shot, k=3)	Accuracy	31.2	—	Unverified
9	GPT-NeoX (few-shot, k=3)	Accuracy	26	—	Unverified

BIG-bench (Temporal Sequences)
#	Model	Metric	Claimed	Verified	Status
1	PaLM 2 (few-shot, k=3, CoT)	Accuracy	100	—	Unverified
2	PaLM 2 (few-shot, k=3, Direct)	Accuracy	96.4	—	Unverified
3	PaLM 540B (few-shot, k=3)	Accuracy	39.6	—	Unverified
4	BLOOM 176B (few-shot, k=3)	Accuracy	36.8	—	Unverified
5	Chinchilla-70B (few-shot, k=5)	Accuracy	32	—	Unverified
6	Bloomberg GPT (few-shot, k=3)	Accuracy	29.2	—	Unverified
7	OPT 66B (few-shot, k=3)	Accuracy	23.6	—	Unverified
8	GPT-NeoX (few-shot, k=3)	Accuracy	21.2	—	Unverified
9	Gopher-280B (few-shot, k=5)	Accuracy	19	—	Unverified

BIG-bench (Logic Grid Puzzle)
#	Model	Metric	Claimed	Verified	Status
1	Chinchilla-70B (few-shot, k=5)	Accuracy	44	—	Unverified
2	PaLM-540B (few-shot, k=5)	Accuracy	42.4	—	Unverified
3	PaLM-62B (few-shot, k=5)	Accuracy	36.5	—	Unverified
4	Gopher-280B (few-shot, k=5)	Accuracy	35.1	—	Unverified

BIG-bench (StrategyQA)
#	Model	Metric	Claimed	Verified	Status
1	PaLM-540B (few-shot, k=5)	Accuracy	73.9	—	Unverified
2	Chinchilla-70B (few-shot, k=5)	Accuracy	68.3	—	Unverified
3	PaLM-62B (few-shot, k=5)	Accuracy	65.4	—	Unverified
4	Gopher-280B (few-shot, k=5)	Accuracy	61	—	Unverified

RuWorldTree
#	Model	Metric	Claimed	Verified	Status
1	Human benchmark	Accuracy	83.7	—	Unverified
2	RuGPT-3 Large	Accuracy	40.7	—	Unverified
3	RuGPT-3 Medium	Accuracy	38	—	Unverified
4	RuGPT-3 Small	Accuracy	34	—	Unverified

Winograd Automatic
#	Model	Metric	Claimed	Verified	Status
1	Human benchmark	Accuracy	87	—	Unverified
2	RuGPT-3 Small	Accuracy	57.9	—	Unverified
3	RuGPT-3 Medium	Accuracy	57.2	—	Unverified
4	RuGPT-3 Large	Accuracy	55.5	—	Unverified

BIG-bench (Logical Fallacy Detection)
#	Model	Metric	Claimed	Verified	Status
1	Chinchilla-70B (few-shot, k=5)	Accuracy	72.1	—	Unverified
2	Gopher-280B (few-shot, k=5)	Accuracy	58.9	—	Unverified