Logical Reasoning

Papers


Showing 401–450 of 747 papers

Title	Date	Tasks	Status	Hype
Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation	Dec 5, 2023	Logical Reasoning	Code Available	2
Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games	Dec 1, 2023	AI Agent, In-Context Learning	Unverified	0
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI	Nov 27, 2023	Complex Query Answering, Logical Reasoning	Code Available	5
Generation of Explanations for Logic Reasoning	Nov 22, 2023	Logical Reasoning, Philosophy	Unverified	0
Enhancing Logical Reasoning in Large Language Models to Facilitate Legal Applications	Nov 22, 2023	Fairness, Legal Reasoning	Unverified	0
De-fine: Decomposing and Refining Visual Programs with Auto-Feedback	Nov 21, 2023	Logical Reasoning	Unverified	0
WatME: Towards Lossless Watermarking Through Lexical Redundancy	Nov 16, 2023	Instruction Following, Language Modelling	Unverified	0
FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models	Nov 16, 2023	Instruction Following, Logical Reasoning	Unverified	0
Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs	Nov 16, 2023	Arithmetic Reasoning, GSM8K	Code Available	1
A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning	Nov 14, 2023	Logical Fallacies, Logical Reasoning	Code Available	0
Assessing Logical Puzzle Solving in Large Language Models: Insights from a Minesweeper Case Study	Nov 13, 2023	Logical Reasoning, Prompt Engineering	Code Available	0
From Complex to Simple: Unraveling the Cognitive Tree for Reasoning with Small Language Models	Nov 12, 2023	Language Modelling, Logical Reasoning	Unverified	0
Are LLMs Rigorous Logical Reasoner? Empowering Natural Language Proof Generation with Contrastive Stepwise Decoding	Nov 12, 2023	Language Modeling, Language Modelling	Unverified	0
Let's Reinforce Step by Step	Nov 10, 2023	GSM8K, Logical Reasoning	Unverified	0
Language Models can be Logical Solvers	Nov 10, 2023	Decision Making, Language Modeling	Unverified	0
Chain of Images for Intuitively Reasoning	Nov 9, 2023	Common Sense Reasoning, Language Modelling	Code Available	1
COOL: A Constraint Object-Oriented Logic Programming Language and its Neural-Symbolic Compilation System	Nov 7, 2023	Logical Reasoning	Unverified	0
Evaluating the Potential of Leading Large Language Models in Reasoning Biology Questions	Nov 5, 2023	Logical Reasoning, Multiple-choice	Unverified	0
Rule Learning as Machine Translation using the Atomic Knowledge Bank	Nov 5, 2023	Logical Reasoning, Machine Translation	Code Available	0
LLM4Drive: A Survey of Large Language Models for Autonomous Driving	Nov 2, 2023	Autonomous Driving, Few-Shot Learning	Code Available	3
Noisy Exemplars Make Large Language Models More Robust: A Domain-Agnostic Behavioral Analysis	Nov 1, 2023	Logical Reasoning, Prompt Engineering	Code Available	0
Dynamics of Instruction Tuning: Each Ability of Large Language Models Has Its Own Growth Pace	Oct 30, 2023	Code Generation, Logical Reasoning	Code Available	1
DetermLR: Augmenting LLM-based Logical Reasoning from Indeterminacy to Determinacy	Oct 28, 2023	Logical Reasoning	Code Available	1
Generating by Understanding: Neural Visual Generation with Logical Symbol Groundings	Oct 26, 2023	Disentanglement, Logical Reasoning	Code Available	0
POE: Process of Elimination for Multiple Choice Reasoning	Oct 24, 2023	In-Context Learning, Logical Reasoning	Code Available	0
Breaking the Language Barrier: Improving Cross-Lingual Reasoning with Structured Self-Attention	Oct 23, 2023	Logical Reasoning	Code Available	0
DetectGPT-SC: Improving Detection of Text Generated by Large Language Models through Self-Consistency with Masked Predictions	Oct 23, 2023	Logical Reasoning, Text Generation	Unverified	0
Assessing Step-by-Step Reasoning against Lexical Negation: A Case Study on Syllogism	Oct 23, 2023	Logical Reasoning, Negation	Unverified	0
Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts	Oct 23, 2023	Logical Reasoning, Math	Code Available	1
LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers	Oct 23, 2023	Logical Reasoning	Code Available	1
Retrieval-Augmented Neural Response Generation Using Logical Reasoning and Relevance Scoring	Oct 20, 2023	Logical Reasoning, Response Generation	Unverified	0
Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World	Oct 16, 2023	Few-Shot Learning, Form	Code Available	1
Assessing and Enhancing the Robustness of Large Language Models with Task Structure Variations for Logical Reasoning	Oct 13, 2023	Data Augmentation, Logical Reasoning	Code Available	1
Improving Large Language Models in Event Relation Logical Prediction	Oct 13, 2023	counterfactual, Event Relation Extraction	Code Available	1
GLoRE: Evaluating Logical Reasoning of Large Language Models	Oct 13, 2023	Logical Reasoning, Natural Language Understanding	Code Available	1
The potential of large language models for improving probability learning: A study on ChatGPT3.5 and first-year computer engineering students	Oct 9, 2023	Language Modelling, Logical Reasoning	Unverified	0
Empower Nested Boolean Logic via Self-Supervised Curriculum Learning	Oct 9, 2023	Logical Reasoning, Self-Supervised Learning	Code Available	0
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers	Oct 5, 2023	Decoder, Logical Reasoning	Code Available	0
Instances Need More Care: Rewriting Prompts for Instances with LLMs in the Loop Yields Better Zero-Shot Performance	Oct 3, 2023	Code Generation, Logical Reasoning	Code Available	0
Learning Reliable Logical Rules with SATNet	Oct 3, 2023	Logical Reasoning	Unverified	0
Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models	Oct 2, 2023	Knowledge Distillation, Language Modelling	Unverified	0
DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks	Sep 29, 2023	Logical Reasoning	Unverified	0
Physics of Language Models: Part 3.2, Knowledge Manipulation	Sep 25, 2023	Attribute, Language Modelling	Unverified	0
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning	Sep 16, 2023	Date Understanding, GSM8K	Code Available	0
Do PLMs Know and Understand Ontological Knowledge?	Sep 12, 2023	Logical Reasoning, Memorization	Code Available	1
HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models	Sep 6, 2023	General Knowledge, Logical Reasoning	Code Available	1
On the Potential of CLIP for Compositional Logical Reasoning	Aug 30, 2023	Logical Reasoning, Visual Reasoning	Unverified	0
LR-XFL: Logical Reasoning-based Explainable Federated Learning	Aug 24, 2023	Federated Learning, Logical Reasoning	Code Available	0
Human Comprehensible Active Learning of Genome-Scale Metabolic Networks	Aug 24, 2023	Active Learning, Experimental Design	Unverified	0
LatEval: An Interactive LLMs Evaluation Benchmark with Incomplete Information from Lateral Thinking Puzzles	Aug 21, 2023	Logical Reasoning	Code Available	1


Datasets: LingOly, BIG-bench (Formal Fallacies Syllogisms Negation), BIG-bench (Penguins In A Table), BIG-bench (Reasoning About Colored Objects), BIG-bench (Temporal Sequences), BIG-bench (Logic Grid Puzzle), BIG-bench (StrategyQA), RuWorldTree, Winograd Automatic, BIG-bench (Logical Fallacy Detection)

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Claude Opus	Delta_NoContext	28.8	—	Unverified
2	GPT-4o	Delta_NoContext	25.1	—	Unverified
3	Gemini 1.5 Pro	Delta_NoContext	23.4	—	Unverified
4	GPT-4	Delta_NoContext	21.5	—	Unverified
5	Command R+	Delta_NoContext	11.6	—	Unverified
6	GPT-3.5	Delta_NoContext	11.2	—	Unverified
7	Mixtral 8x7B	Delta_NoContext	6.4	—	Unverified
8	Llama 3 8B	Delta_NoContext	4.9	—	Unverified
9	Llama 3 70B	Delta_NoContext	2.9	—	Unverified
10	Gemma 7B	Delta_NoContext	2.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	PaLM 2 (few-shot, k=3, Direct)	Accuracy	64.8	—	Unverified
2	PaLM 2 (few-shot, k=3, CoT)	Accuracy	57.2	—	Unverified
3	OPT 66B (few-shot, k=3)	Accuracy	54	—	Unverified
4	PaLM 540B (few-shot, k=3)	Accuracy	53.6	—	Unverified
5	GPT-NeoX 20B (few-shot, k=3)	Accuracy	52.8	—	Unverified
6	BLOOM 176B (few-shot, k=3)	Accuracy	52.8	—	Unverified
7	Chinchilla-70B (few-shot, k=5)	Accuracy	52.1	—	Unverified
8	Bloomberg GPT 50B (few-shot, k=3)	Accuracy	50.8	—	Unverified
9	Gopher-280B (few-shot, k=5)	Accuracy	50.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	PaLM 2 (few-shot, k=3, CoT)	Accuracy	84.9	—	Unverified
2	PaLM 2 (few-shot, k=3, Direct)	Accuracy	65.8	—	Unverified
3	Chinchilla-70B (few-shot, k=5)	Accuracy	48.7	—	Unverified
4	PaLM 540B (few-shot, k=3)	Accuracy	44.5	—	Unverified
5	Gopher-280B (few-shot, k=5)	Accuracy	40.6	—	Unverified
6	BLOOM 176B (few-shot, k=3)	Accuracy	40.41	—	Unverified
7	Bloomberg GPT (few-shot, k=3)	Accuracy	37.67	—	Unverified
8	GPT-NeoX (few-shot, k=3)	Accuracy	33.56	—	Unverified
9	OPT 66B (few-shot, k=3)	Accuracy	28.08	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	PaLM 2 (few-shot, k=3, CoT)	Accuracy	91.2	—	Unverified
2	PaLM 2 (few-shot, k=3, Direct)	Accuracy	61.2	—	Unverified
3	Chinchilla-70B (few-shot, k=5)	Accuracy	59.7	—	Unverified
4	Gopher-280B (few-shot, k=5)	Accuracy	49.2	—	Unverified
5	PaLM 540B (few-shot, k=3)	Accuracy	38	—	Unverified
6	BLOOM 176B (few-shot, k=3)	Accuracy	36.8	—	Unverified
7	Bloomberg GPT (few-shot, k=3)	Accuracy	34.8	—	Unverified
8	OPT 66B (few-shot, k=3)	Accuracy	31.2	—	Unverified
9	GPT-NeoX (few-shot, k=3)	Accuracy	26	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	PaLM 2 (few-shot, k=3, CoT)	Accuracy	100	—	Unverified
2	PaLM 2 (few-shot, k=3, Direct)	Accuracy	96.4	—	Unverified
3	PaLM 540B (few-shot, k=3)	Accuracy	39.6	—	Unverified
4	BLOOM 176B (few-shot, k=3)	Accuracy	36.8	—	Unverified
5	Chinchilla-70B (few-shot, k=5)	Accuracy	32	—	Unverified
6	Bloomberg GPT (few-shot, k=3)	Accuracy	29.2	—	Unverified
7	OPT 66B (few-shot, k=3)	Accuracy	23.6	—	Unverified
8	GPT-NeoX (few-shot, k=3)	Accuracy	21.2	—	Unverified
9	Gopher-280B (few-shot, k=5)	Accuracy	19	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Chinchilla-70B (few-shot, k=5)	Accuracy	44	—	Unverified
2	PaLM-540B (few-shot, k=5)	Accuracy	42.4	—	Unverified
3	PaLM-62B (few-shot, k=5)	Accuracy	36.5	—	Unverified
4	Gopher-280B (few-shot, k=5)	Accuracy	35.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	PaLM-540B (few-shot, k=5)	Accuracy	73.9	—	Unverified
2	Chinchilla-70B (few-shot, k=5)	Accuracy	68.3	—	Unverified
3	PaLM-62B (few-shot, k=5)	Accuracy	65.4	—	Unverified
4	Gopher-280B (few-shot, k=5)	Accuracy	61	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Human benchmark	Accuracy	83.7	—	Unverified
2	RuGPT-3 Large	Accuracy	40.7	—	Unverified
3	RuGPT-3 Medium	Accuracy	38	—	Unverified
4	RuGPT-3 Small	Accuracy	34	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Human benchmark	Accuracy	87	—	Unverified
2	RuGPT-3 Small	Accuracy	57.9	—	Unverified
3	RuGPT-3 Medium	Accuracy	57.2	—	Unverified
4	RuGPT-3 Large	Accuracy	55.5	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Chinchilla-70B (few-shot, k=5)	Accuracy	72.1	—	Unverified
2	Gopher-280B (few-shot, k=5)	Accuracy	58.9	—	Unverified