Logical Reasoning

Papers


Showing 101–150 of 747 papers

Title	Date	Tasks	Status	Hype
Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions	Feb 25, 2025	Inductive Bias, Logical Reasoning	Unverified	0
Autoregressive Image Generation Guided by Chains of Thought	Feb 24, 2025	Image Generation, Logical Reasoning	Unverified	0
Quantifying Logical Consistency in Transformers via Query-Key Alignment	Feb 24, 2025	Logical Reasoning, valid	Unverified	0
AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models	Feb 24, 2025	Logical Reasoning, Multiple-choice	Code Available	1
Logic Haystacks: Probing LLMs Long-Context Logical Reasoning (Without Easily Identifiable Unrelated Padding)	Feb 24, 2025	Logical Reasoning, Retrieval	Unverified	0
R1-Onevision: An Open-Source Multimodal Large Language Model Capable of Deep Reasoning	Feb 24, 2025	Language Modeling, Language Modelling	Code Available	4
Intermediate Languages Matter: Formal Choice Drives Neurosymbolic LLM Reasoning	Feb 24, 2025	In-Context Learning, Logical Reasoning	Unverified	0
From System 1 to System 2: A Survey of Reasoning Large Language Models	Feb 24, 2025	Logical Reasoning	Code Available	5
Empowering LLMs with Logical Reasoning: A Comprehensive Survey	Feb 21, 2025	Logical Reasoning, Negation	Unverified	0
Identifying Features that Shape Perceived Consciousness in Large Language Model-based AI: A Quantitative Study of Human Responses	Feb 21, 2025	Language Modeling, Language Modelling	Unverified	0
On the logical skills of large language models: evaluations using arbitrarily complex first-order logic problems	Feb 20, 2025	Logical Reasoning	Code Available	0
Triangulating LLM Progress through Benchmarks, Games, and Cognitive Tests	Feb 20, 2025	Logical Reasoning, MMLU	Unverified	0
A Mousetrap: Fooling Large Reasoning Models for Jailbreak with Chain of Iterative Chaos	Feb 19, 2025	Logical Reasoning	Unverified	0
SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin	Feb 19, 2025	GPU, Logical Reasoning	Unverified	0
HopRAG: Multi-Hop Reasoning for Logic-Aware Retrieval-Augmented Generation	Feb 18, 2025	Logical Reasoning, RAG	Unverified	0
Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights	Feb 18, 2025	Arithmetic Reasoning, Common Sense Reasoning	Unverified	0
Integrating Expert Knowledge into Logical Programs via LLMs	Feb 17, 2025	Benchmarking, Logical Reasoning	Code Available	0
Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment	Feb 17, 2025	Hallucination, Logical Reasoning	Code Available	2
Beyond Single-Task: Robust Multi-Task Length Generalization for LLMs	Feb 17, 2025	In-Context Learning, Logical Reasoning	Unverified	0
Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models	Feb 16, 2025	Language Modeling, Language Modelling	Code Available	1
Quantifying the Capability Boundary of DeepSeek Models: An Application-Driven Performance Analysis	Feb 16, 2025	Logical Reasoning, Model Selection	Unverified	0
Dialogue-based Explanations for Logical Reasoning using Structured Argumentation	Feb 16, 2025	Logical Reasoning	Unverified	0
The Multilingual Mind: A Survey of Multilingual Reasoning in Language Models	Feb 13, 2025	Logical Reasoning, Survey	Unverified	0
Logical Reasoning in Large Language Models: A Survey	Feb 13, 2025	Logical Reasoning, Survey	Unverified	0
Logical Lease Litigation: Prolog and LLMs for Rental Law Compliance in New York	Feb 13, 2025	Legal Reasoning, Logical Reasoning	Unverified	0
Logical forms complement probability in understanding language model (and human) performance	Feb 13, 2025	Language Modeling, Language Modelling	Unverified	0
DMWM: Dual-Mind World Model with Long-Term Imagination	Feb 11, 2025	Logical Reasoning	Unverified	0
Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation	Feb 10, 2025	Logical Reasoning	Code Available	1
Structural Reformation of Large Language Model Neuron Encapsulation for Divergent Information Aggregation	Feb 10, 2025	Decision Making, Language Modeling	Unverified	0
S^2-MAD: Breaking the Token Barrier to Enhance Multi-Agent Debate Efficiency	Feb 7, 2025	Logical Reasoning	Unverified	0
SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs	Feb 5, 2025	Knowledge Graphs, Logical Reasoning	Unverified	0
Automating Mathematical Proof Generation Using Large Language Model Agents and Knowledge Graphs	Feb 4, 2025	Formal Logic, Knowledge Graphs	Unverified	0
Standard Neural Computation Alone Is Insufficient for Logical Intelligence	Feb 4, 2025	Inductive Learning, Logical Reasoning	Unverified	0
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning	Feb 3, 2025	Logical Reasoning	Unverified	0
Enhancing Large Language Model Efficiency via Symbolic Compression: A Formal Approach Towards Interpretability	Jan 30, 2025	Code Generation, Language Modeling	Unverified	0
Town Hall Debate Prompting: Enhancing Logical Reasoning in LLMs through Multi-Persona Interaction	Jan 28, 2025	Logical Reasoning, Multiple-choice	Unverified	0
Instantiation-based Formalization of Logical Reasoning Tasks using Language Models and Logical Solvers	Jan 28, 2025	Logical Reasoning	Unverified	0
DBRouting: Routing End User Queries to Databases for Answerability	Jan 27, 2025	Logical Reasoning, Semantic Parsing	Unverified	0
SedarEval: Automated Evaluation using Self-Adaptive Rubrics	Jan 26, 2025	Logical Reasoning	Code Available	0
A Causality-aware Paradigm for Evaluating Creativity of Multimodal Large Language Models	Jan 25, 2025	Logical Reasoning	Unverified	0
JustLogic: A Comprehensive Benchmark for Evaluating Deductive Reasoning in Large Language Models	Jan 24, 2025	Logical Reasoning	Code Available	0
VERUS-LM: a Versatile Framework for Combining LLMs with Symbolic Reasoning	Jan 24, 2025	Logical Reasoning	Unverified	0
PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation	Jan 20, 2025	Language Modeling, Language Modelling	Code Available	7
Assessing the Alignment of FOL Closeness Metrics with Human Judgement	Jan 15, 2025	Logical Reasoning, Sensitivity	Code Available	0
LeapVAD: A Leap in Autonomous Driving via Cognitive Perception and Dual-Process Thinking	Jan 14, 2025	Autonomous Driving, Decision Making	Code Available	2
Reasoning with Graphs: Structuring Implicit Knowledge to Enhance LLMs Reasoning	Jan 14, 2025	Logical Reasoning, Multi-hop Question Answering	Unverified	0
TimeLogic: A Temporal Logic Benchmark for Video QA	Jan 13, 2025	2k, Action Segmentation	Unverified	0
Neural Probabilistic Circuits: Enabling Compositional and Interpretable Predictions through Logical Reasoning	Jan 13, 2025	Attribute, counterfactual	Unverified	0
Multimodal-to-Text Prompt Engineering in Large Language Models Using Feature Embeddings for GNSS Interference Characterization	Jan 9, 2025	Information Retrieval, Logical Reasoning	Unverified	0
Enhancing Transformers for Generalizable First-Order Logical Entailment	Jan 1, 2025	Logical Reasoning, Out-of-Distribution Generalization	Unverified	0


Datasets: LingOly, BIG-bench (Formal Fallacies Syllogisms Negation), BIG-bench (Penguins In A Table), BIG-bench (Reasoning About Colored Objects), BIG-bench (Temporal Sequences), BIG-bench (Logic Grid Puzzle), BIG-bench (StrategyQA), RuWorldTree, Winograd Automatic, BIG-bench (Logical Fallacy Detection)

Benchmark Results

LingOly
#	Model	Metric	Claimed	Verified	Status
1	Claude Opus	Delta_NoContext	28.8	—	Unverified
2	GPT-4o	Delta_NoContext	25.1	—	Unverified
3	Gemini 1.5 Pro	Delta_NoContext	23.4	—	Unverified
4	GPT-4	Delta_NoContext	21.5	—	Unverified
5	Command R+	Delta_NoContext	11.6	—	Unverified
6	GPT-3.5	Delta_NoContext	11.2	—	Unverified
7	Mixtral 8x7B	Delta_NoContext	6.4	—	Unverified
8	Llama 3 8B	Delta_NoContext	4.9	—	Unverified
9	Llama 3 70B	Delta_NoContext	2.9	—	Unverified
10	Gemma 7B	Delta_NoContext	2.2	—	Unverified

BIG-bench (Formal Fallacies Syllogisms Negation)
#	Model	Metric	Claimed	Verified	Status
1	PaLM 2 (few-shot, k=3, Direct)	Accuracy	64.8	—	Unverified
2	PaLM 2 (few-shot, k=3, CoT)	Accuracy	57.2	—	Unverified
3	OPT 66B (few-shot, k=3)	Accuracy	54	—	Unverified
4	PaLM 540B (few-shot, k=3)	Accuracy	53.6	—	Unverified
5	GPT-NeoX 20B (few-shot, k=3)	Accuracy	52.8	—	Unverified
6	BLOOM 176B (few-shot, k=3)	Accuracy	52.8	—	Unverified
7	Chinchilla-70B (few-shot, k=5)	Accuracy	52.1	—	Unverified
8	Bloomberg GPT 50B (few-shot, k=3)	Accuracy	50.8	—	Unverified
9	Gopher-280B (few-shot, k=5)	Accuracy	50.7	—	Unverified

BIG-bench (Penguins In A Table)
#	Model	Metric	Claimed	Verified	Status
1	PaLM 2 (few-shot, k=3, CoT)	Accuracy	84.9	—	Unverified
2	PaLM 2 (few-shot, k=3, Direct)	Accuracy	65.8	—	Unverified
3	Chinchilla-70B (few-shot, k=5)	Accuracy	48.7	—	Unverified
4	PaLM 540B (few-shot, k=3)	Accuracy	44.5	—	Unverified
5	Gopher-280B (few-shot, k=5)	Accuracy	40.6	—	Unverified
6	BLOOM 176B (few-shot, k=3)	Accuracy	40.41	—	Unverified
7	Bloomberg GPT (few-shot, k=3)	Accuracy	37.67	—	Unverified
8	GPT-NeoX (few-shot, k=3)	Accuracy	33.56	—	Unverified
9	OPT 66B (few-shot, k=3)	Accuracy	28.08	—	Unverified

BIG-bench (Reasoning About Colored Objects)
#	Model	Metric	Claimed	Verified	Status
1	PaLM 2 (few-shot, k=3, CoT)	Accuracy	91.2	—	Unverified
2	PaLM 2 (few-shot, k=3, Direct)	Accuracy	61.2	—	Unverified
3	Chinchilla-70B (few-shot, k=5)	Accuracy	59.7	—	Unverified
4	Gopher-280B (few-shot, k=5)	Accuracy	49.2	—	Unverified
5	PaLM 540B (few-shot, k=3)	Accuracy	38	—	Unverified
6	BLOOM 176B (few-shot, k=3)	Accuracy	36.8	—	Unverified
7	Bloomberg GPT (few-shot, k=3)	Accuracy	34.8	—	Unverified
8	OPT 66B (few-shot, k=3)	Accuracy	31.2	—	Unverified
9	GPT-NeoX (few-shot, k=3)	Accuracy	26	—	Unverified

BIG-bench (Temporal Sequences)
#	Model	Metric	Claimed	Verified	Status
1	PaLM 2 (few-shot, k=3, CoT)	Accuracy	100	—	Unverified
2	PaLM 2 (few-shot, k=3, Direct)	Accuracy	96.4	—	Unverified
3	PaLM 540B (few-shot, k=3)	Accuracy	39.6	—	Unverified
4	BLOOM 176B (few-shot, k=3)	Accuracy	36.8	—	Unverified
5	Chinchilla-70B (few-shot, k=5)	Accuracy	32	—	Unverified
6	Bloomberg GPT (few-shot, k=3)	Accuracy	29.2	—	Unverified
7	OPT 66B (few-shot, k=3)	Accuracy	23.6	—	Unverified
8	GPT-NeoX (few-shot, k=3)	Accuracy	21.2	—	Unverified
9	Gopher-280B (few-shot, k=5)	Accuracy	19	—	Unverified

BIG-bench (Logic Grid Puzzle)
#	Model	Metric	Claimed	Verified	Status
1	Chinchilla-70B (few-shot, k=5)	Accuracy	44	—	Unverified
2	PaLM-540B (few-shot, k=5)	Accuracy	42.4	—	Unverified
3	PaLM-62B (few-shot, k=5)	Accuracy	36.5	—	Unverified
4	Gopher-280B (few-shot, k=5)	Accuracy	35.1	—	Unverified

BIG-bench (StrategyQA)
#	Model	Metric	Claimed	Verified	Status
1	PaLM-540B (few-shot, k=5)	Accuracy	73.9	—	Unverified
2	Chinchilla-70B (few-shot, k=5)	Accuracy	68.3	—	Unverified
3	PaLM-62B (few-shot, k=5)	Accuracy	65.4	—	Unverified
4	Gopher-280B (few-shot, k=5)	Accuracy	61	—	Unverified

RuWorldTree
#	Model	Metric	Claimed	Verified	Status
1	Human benchmark	Accuracy	83.7	—	Unverified
2	RuGPT-3 Large	Accuracy	40.7	—	Unverified
3	RuGPT-3 Medium	Accuracy	38	—	Unverified
4	RuGPT-3 Small	Accuracy	34	—	Unverified

Winograd Automatic
#	Model	Metric	Claimed	Verified	Status
1	Human benchmark	Accuracy	87	—	Unverified
2	RuGPT-3 Small	Accuracy	57.9	—	Unverified
3	RuGPT-3 Medium	Accuracy	57.2	—	Unverified
4	RuGPT-3 Large	Accuracy	55.5	—	Unverified

BIG-bench (Logical Fallacy Detection)
#	Model	Metric	Claimed	Verified	Status
1	Chinchilla-70B (few-shot, k=5)	Accuracy	72.1	—	Unverified
2	Gopher-280B (few-shot, k=5)	Accuracy	58.9	—	Unverified