Logical Reasoning

Papers


Showing 351–400 of 747 papers

Title	Date	Tasks	Status
SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs	Feb 5, 2025	Knowledge Graphs, Logical Reasoning	Unverified
Symbol Correctness in Deep Neural Networks Containing Symbolic Layers	Feb 6, 2024	Logical Reasoning, Transfer Learning	Unverified
Symbolic-AI-Fusion Deep Learning (SAIF-DL): Encoding Knowledge into Training with Answer Set Programming Loss Penalties by a Novel Loss Function Approach	Nov 13, 2024	Logical Reasoning	Unverified
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models	Feb 20, 2024	Instruction Following, Logical Reasoning	Unverified
System Prompt Poisoning: Persistent Attacks on Large Language Models Beyond User Injection	May 10, 2025	Logical Reasoning, RAG	Unverified
Table-based Fact Verification with Self-adaptive Mixture of Experts	Nov 16, 2021	Fact Verification, Logical Reasoning	Unverified
Teaching Pretrained Models with Commonsense Reasoning: A Preliminary KB-Based Approach	Sep 20, 2019	Few-Shot Learning, Logical Reasoning	Unverified
TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving	Jun 12, 2025	Logical Reasoning, Mathematical Problem-Solving	Unverified
TensorLog: Deep Learning Meets Probabilistic DBs	Jul 17, 2017	Deep Learning, Logical Reasoning	Unverified
Testing and Evaluation of Large Language Models: Correctness, Non-Toxicity, and Fairness	Aug 31, 2024	Fairness, Language Modeling	Unverified
Testing Uncertainty of Large Language Models for Physics Knowledge and Reasoning	Nov 18, 2024	Logical Reasoning, Multiple-choice	Unverified
The Dark Side of Explanations: Poisoning Recommender Systems with Counterfactual Examples	Apr 30, 2023	counterfactual, Counterfactual Explanation	Unverified
The General Theory of General Intelligence: A Pragmatic Patternist Perspective	Mar 28, 2021	Clustering, Ethics	Unverified
The Good, The Bad, and Why: Unveiling Emotions in Generative AI	Dec 18, 2023	Logical Reasoning	Unverified
The Multilingual Mind : A Survey of Multilingual Reasoning in Language Models	Feb 13, 2025	Logical Reasoning, Survey	Unverified
The neural correlates of logical-mathematical symbol systems processing resemble that of spatial cognition more than natural language processing	Jun 20, 2024	Logical Reasoning	Unverified
The potential of large language models for improving probability learning: A study on ChatGPT3.5 and first-year computer engineering students	Oct 9, 2023	Language Modelling, Logical Reasoning	Unverified
The RatioLog Project: Rational Extensions of Logical Reasoning	Mar 20, 2015	BIG-bench Machine Learning, Common Sense Reasoning	Unverified
The Society of HiveMind: Multi-Agent Optimization of Foundation Model Swarms to Unlock the Potential of Collective Intelligence	Mar 7, 2025	Logical Reasoning, World Knowledge	Unverified
The theory of quantitative trading	Dec 27, 2021	Articles, Logical Reasoning	Unverified
Think Beyond Size: Adaptive Prompting for More Effective Reasoning	Oct 10, 2024	Arithmetic Reasoning, Computational Efficiency	Unverified
Thinking Like an Expert:Multimodal Hypergraph-of-Thought (HoT) Reasoning to boost Foundation Modals	Aug 11, 2023	Graph Learning, Logical Reasoning	Unverified
Time-aware Self-Attention Meets Logic Reasoning in Recommender Systems	Aug 29, 2022	Logical Reasoning, Recommendation Systems	Unverified
TimeLogic: A Temporal Logic Benchmark for Video QA	Jan 13, 2025	2k, Action Segmentation	Unverified
Knowledge-based and Data-driven Reasoning and Learning for Ad Hoc Teamwork	Aug 24, 2022	Decision Making, Incremental Learning	Unverified
Towards a Theory of Intentions for Human-Robot Collaboration	Jul 31, 2019	Computational Efficiency, Logical Reasoning	Unverified
Towards Better Response Times and Higher-Quality Queries in Interactive Knowledge Base Debugging	Sep 8, 2016	Active Learning, Logical Reasoning	Unverified
Towards Competent AI for Fundamental Analysis in Finance: A Benchmark Dataset and Evaluation	May 22, 2025	Financial Analysis, Logical Reasoning	Unverified
SarcasmBench: Towards Evaluating Large Language Models on Sarcasm Understanding	Aug 21, 2024	Logical Reasoning, Mathematical Reasoning	Unverified
Towards Generalist Prompting for Large Language Models by Mental Models	Feb 28, 2024	Logical Reasoning	Unverified
Towards Human-Compatible XAI: Explaining Data Differentials with Concept Induction over Background Knowledge	Sep 27, 2022	Explainable Artificial Intelligence (XAI), Logical Reasoning	Unverified
Towards Ideal Semantics for Analyzing Stream Reasoning	May 20, 2015	Logical Reasoning	Unverified
Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models	Oct 2, 2023	Knowledge Distillation, Language Modelling	Unverified
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models	Mar 12, 2025	Logical Reasoning, Survey	Unverified
Towards Superior Quantization Accuracy: A Layer-sensitive Approach	Mar 9, 2025	Logical Reasoning, Model Compression	Unverified
Towards Unifying Logical Entailment and Statistical Estimation	Feb 27, 2022	Formal Logic, Logical Reasoning	Unverified
Towards Unifying Perceptual Reasoning and Logical Reasoning	Jun 27, 2022	Bayesian Inference, Logical Reasoning	Unverified
To What Extent Do Natural Language Understanding Datasets Correlate to Logical Reasoning? A Method for Diagnosing Logical Reasoning.	Oct 1, 2022	Diagnostic, Logical Reasoning	Unverified
Town Hall Debate Prompting: Enhancing Logical Reasoning in LLMs through Multi-Persona Interaction	Jan 28, 2025	Logical Reasoning, Multiple-choice	Unverified
Beyond Single-Task: Robust Multi-Task Length Generalization for LLMs	Feb 17, 2025	In-Context Learning, Logical Reasoning	Unverified
Transformer-based Language Models for Reasoning in the Description Logic ALCQ	Oct 12, 2024	Logical Reasoning	Unverified
Triangulating LLM Progress through Benchmarks, Games, and Cognitive Tests	Feb 20, 2025	Logical Reasoning, MMLU	Unverified
Truth Table Deep Convolutional Neural Network, A New SAT-Encodable Architecture - Application To Complete Robustness	Sep 29, 2021	Explainable Artificial Intelligence (XAI), Explanation Generation	Unverified
A Scalable, Interpretable, Verifiable & Differentiable Logic Gate Convolutional Neural Network Architecture From Truth Tables	Aug 18, 2022	Fairness, Logical Reasoning	Unverified
TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games	Jun 11, 2025	Logical Reasoning, Math	Unverified
Type-dependent Prompt CycleQAG : Cycle Consistency for Multi-hop Question Generation	Oct 1, 2022	Answer Generation, Logical Reasoning	Unverified
Unifying Neural Learning and Symbolic Reasoning for Spinal Medical Report Generation	Apr 28, 2020	Decision Making, Generative Adversarial Network	Unverified
Unifying Structure Reasoning and Language Model Pre-training for Complex Reasoning	Jan 21, 2023	Language Modeling, Language Modelling	Unverified
Unleash LLMs Potential for Recommendation by Coordinating Twin-Tower Dynamic Semantic Token Generator	Sep 14, 2024	Logical Reasoning, Recommendation Systems	Unverified
Unveiling Scoring Processes: Dissecting the Differences between LLMs and Human Graders in Automatic Scoring	Jul 4, 2024	Logical Reasoning	Unverified


Datasets: LingOly, BIG-bench (Formal Fallacies Syllogisms Negation), BIG-bench (Penguins In A Table), BIG-bench (Reasoning About Colored Objects), BIG-bench (Temporal Sequences), BIG-bench (Logic Grid Puzzle), BIG-bench (StrategyQA), RuWorldTree, Winograd Automatic, BIG-bench (Logical Fallacy Detection)

Benchmark Results

Dataset: LingOly
#	Model	Metric	Claimed	Verified	Status
1	Claude Opus	Delta_NoContext	28.8	—	Unverified
2	GPT-4o	Delta_NoContext	25.1	—	Unverified
3	Gemini 1.5 Pro	Delta_NoContext	23.4	—	Unverified
4	GPT-4	Delta_NoContext	21.5	—	Unverified
5	Command R+	Delta_NoContext	11.6	—	Unverified
6	GPT-3.5	Delta_NoContext	11.2	—	Unverified
7	Mixtral 8x7B	Delta_NoContext	6.4	—	Unverified
8	Llama 3 8B	Delta_NoContext	4.9	—	Unverified
9	Llama 3 70B	Delta_NoContext	2.9	—	Unverified
10	Gemma 7B	Delta_NoContext	2.2	—	Unverified

Dataset: BIG-bench (Formal Fallacies Syllogisms Negation)
#	Model	Metric	Claimed	Verified	Status
1	PaLM 2 (few-shot, k=3, Direct)	Accuracy	64.8	—	Unverified
2	PaLM 2 (few-shot, k=3, CoT)	Accuracy	57.2	—	Unverified
3	OPT 66B (few-shot, k=3)	Accuracy	54	—	Unverified
4	PaLM 540B (few-shot, k=3)	Accuracy	53.6	—	Unverified
5	GPT-NeoX 20B (few-shot, k=3)	Accuracy	52.8	—	Unverified
6	BLOOM 176B (few-shot, k=3)	Accuracy	52.8	—	Unverified
7	Chinchilla-70B (few-shot, k=5)	Accuracy	52.1	—	Unverified
8	Bloomberg GPT 50B (few-shot, k=3)	Accuracy	50.8	—	Unverified
9	Gopher-280B (few-shot, k=5)	Accuracy	50.7	—	Unverified

Dataset: BIG-bench (Penguins In A Table)
#	Model	Metric	Claimed	Verified	Status
1	PaLM 2 (few-shot, k=3, CoT)	Accuracy	84.9	—	Unverified
2	PaLM 2 (few-shot, k=3, Direct)	Accuracy	65.8	—	Unverified
3	Chinchilla-70B (few-shot, k=5)	Accuracy	48.7	—	Unverified
4	PaLM 540B (few-shot, k=3)	Accuracy	44.5	—	Unverified
5	Gopher-280B (few-shot, k=5)	Accuracy	40.6	—	Unverified
6	BLOOM 176B (few-shot, k=3)	Accuracy	40.41	—	Unverified
7	Bloomberg GPT (few-shot, k=3)	Accuracy	37.67	—	Unverified
8	GPT-NeoX (few-shot, k=3)	Accuracy	33.56	—	Unverified
9	OPT 66B (few-shot, k=3)	Accuracy	28.08	—	Unverified

Dataset: BIG-bench (Reasoning About Colored Objects)
#	Model	Metric	Claimed	Verified	Status
1	PaLM 2 (few-shot, k=3, CoT)	Accuracy	91.2	—	Unverified
2	PaLM 2 (few-shot, k=3, Direct)	Accuracy	61.2	—	Unverified
3	Chinchilla-70B (few-shot, k=5)	Accuracy	59.7	—	Unverified
4	Gopher-280B (few-shot, k=5)	Accuracy	49.2	—	Unverified
5	PaLM 540B (few-shot, k=3)	Accuracy	38	—	Unverified
6	BLOOM 176B (few-shot, k=3)	Accuracy	36.8	—	Unverified
7	Bloomberg GPT (few-shot, k=3)	Accuracy	34.8	—	Unverified
8	OPT 66B (few-shot, k=3)	Accuracy	31.2	—	Unverified
9	GPT-NeoX (few-shot, k=3)	Accuracy	26	—	Unverified

Dataset: BIG-bench (Temporal Sequences)
#	Model	Metric	Claimed	Verified	Status
1	PaLM 2 (few-shot, k=3, CoT)	Accuracy	100	—	Unverified
2	PaLM 2 (few-shot, k=3, Direct)	Accuracy	96.4	—	Unverified
3	PaLM 540B (few-shot, k=3)	Accuracy	39.6	—	Unverified
4	BLOOM 176B (few-shot, k=3)	Accuracy	36.8	—	Unverified
5	Chinchilla-70B (few-shot, k=5)	Accuracy	32	—	Unverified
6	Bloomberg GPT (few-shot, k=3)	Accuracy	29.2	—	Unverified
7	OPT 66B (few-shot, k=3)	Accuracy	23.6	—	Unverified
8	GPT-NeoX (few-shot, k=3)	Accuracy	21.2	—	Unverified
9	Gopher-280B (few-shot, k=5)	Accuracy	19	—	Unverified

Dataset: BIG-bench (Logic Grid Puzzle)
#	Model	Metric	Claimed	Verified	Status
1	Chinchilla-70B (few-shot, k=5)	Accuracy	44	—	Unverified
2	PaLM-540B (few-shot, k=5)	Accuracy	42.4	—	Unverified
3	PaLM-62B (few-shot, k=5)	Accuracy	36.5	—	Unverified
4	Gopher-280B (few-shot, k=5)	Accuracy	35.1	—	Unverified

Dataset: BIG-bench (StrategyQA)
#	Model	Metric	Claimed	Verified	Status
1	PaLM-540B (few-shot, k=5)	Accuracy	73.9	—	Unverified
2	Chinchilla-70B (few-shot, k=5)	Accuracy	68.3	—	Unverified
3	PaLM-62B (few-shot, k=5)	Accuracy	65.4	—	Unverified
4	Gopher-280B (few-shot, k=5)	Accuracy	61	—	Unverified

Dataset: RuWorldTree
#	Model	Metric	Claimed	Verified	Status
1	Human benchmark	Accuracy	83.7	—	Unverified
2	RuGPT-3 Large	Accuracy	40.7	—	Unverified
3	RuGPT-3 Medium	Accuracy	38	—	Unverified
4	RuGPT-3 Small	Accuracy	34	—	Unverified

Dataset: Winograd Automatic
#	Model	Metric	Claimed	Verified	Status
1	Human benchmark	Accuracy	87	—	Unverified
2	RuGPT-3 Small	Accuracy	57.9	—	Unverified
3	RuGPT-3 Medium	Accuracy	57.2	—	Unverified
4	RuGPT-3 Large	Accuracy	55.5	—	Unverified

Dataset: BIG-bench (Logical Fallacy Detection)
#	Model	Metric	Claimed	Verified	Status
1	Chinchilla-70B (few-shot, k=5)	Accuracy	72.1	—	Unverified
2	Gopher-280B (few-shot, k=5)	Accuracy	58.9	—	Unverified