SOTAVerified

Logical Reasoning

Papers

Showing 451–500 of 747 papers

| Title | Status | Hype |
|---|---|---|
| A Simple Generative Model of Logical Reasoning and Statistical Learning | | 0 |
| A (Simplified) Supreme Being Necessarily Exists, says the Computer: Computationally Explored Variants of Gödel's Ontological Argument | | 0 |
| Assessing SATNet's Ability to Solve the Symbol Grounding Problem | | 0 |
| Assessing Step-by-Step Reasoning against Lexical Negation: A Case Study on Syllogism | | 0 |
| Assessing the Reasoning Abilities of ChatGPT in the Context of Claim Verification | | 0 |
| A Study on Neuro-Symbolic Artificial Intelligence: Healthcare Perspectives | | 0 |
| A Survey of Knowledge Enhanced Pre-trained Models | | 0 |
| A Survey on State-of-the-art Techniques for Knowledge Graphs Construction and Challenges ahead | | 0 |
| A Synergistic Approach In Network Intrusion Detection By Neurosymbolic AI | | 0 |
| A Systematic Assessment of OpenAI o1-Preview for Higher Order Thinking in Education | | 0 |
| A Theoretical Solution of the Mind-Body Problem: An Operationalized Proof that no Purely Physical System Can Exhibit all the Properties of Human Consciousness | | 0 |
| Attribution-Scores and Causal Counterfactuals as Explanations in Artificial Intelligence | | 0 |
| Automated scholarly paper review: Concepts, technologies, and challenges | | 0 |
| Automated Theorem Provers Help Improve Large Language Model Reasoning | | 0 |
| Automating Mathematical Proof Generation Using Large Language Model Agents and Knowledge Graphs | | 0 |
| Autoregressive Image Generation Guided by Chains of Thought | | 0 |
| Axiom Learning and Belief Tracing for Transparent Decision Making in Robotics | | 0 |
| Balancing Exploration and Exploitation in LLM using Soft RLLF for Enhanced Negation Understanding | | 0 |
| Bayesian Entailment Hypothesis: How Brains Implement Monotonic and Non-monotonic Reasoning | | 0 |
| Bayes Meets Entailment and Prediction: Commonsense Reasoning with Non-monotonicity, Paraconsistency and Predictive Accuracy | | 0 |
| Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study | | 0 |
| Beyond LLMs: Advancing the Landscape of Complex Reasoning | | 0 |
| Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models | | 0 |
| Bi-Chainer: Automated Large Language Models Reasoning with Bidirectional Chaining | | 0 |
| BloombergGPT: A Large Language Model for Finance | | 0 |
| Boosting Deductive Reasoning with Step Signals In RLHF | | 0 |
| Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought | | 0 |
| Brainstorming Brings Power to Large Language Models of Knowledge Reasoning | | 0 |
| Bridging Technology and Humanities: Evaluating the Impact of Large Language Models on Social Sciences Research with DeepSeek-R1 | | 0 |
| BTPK-based interpretable method for NER tasks based on Talmudic Public Announcement Logic | | 0 |
| Building Trustworthy AI: Transparent AI Systems via Large Language Models, Ontologies, and Logical Reasoning (TranspNet) | | 0 |
| Can BERT Conduct Logical Reasoning? On the Difficulty of Learning to Reason from Data | | 0 |
| Can Large Language Models Reason? A Characterization via 3-SAT | | 0 |
| Can OpenAI o1 outperform humans in higher-order cognitive thinking? | | 0 |
| Cantor: Inspiring Multimodal Chain-of-Thought of MLLM | | 0 |
| Can Transformers Reason Logically? A Study in SAT Solving | | 0 |
| CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues | | 0 |
| CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making | | 0 |
| Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogism | | 0 |
| CausalR: Causal Reasoning over Natural Language Rulebases | | 0 |
| CauseJudger: Identifying the Cause with LLMs for Abductive Logical Reasoning | | 0 |
| ChatABL: Abductive Learning via Natural Language Interaction with ChatGPT | | 0 |
| ChatGPT is a Remarkable Tool -- For Experts | | 0 |
| City-LEO: Toward Transparent City Management Using LLM with End-to-End Optimization | | 0 |
| CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge | | 0 |
| CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning | | 0 |
| Cognitive Argumentation and the Suppression Task | | 0 |
| Combining Commonsense Reasoning and Knowledge Acquisition to Guide Deep Learning in Robotics | | 0 |
| Combining Domain-Specific Models and LLMs for Automated Disease Phenotyping from Survey Data | | 0 |
| Compositional Attention Networks for Interpretability in Natural Language Question Answering | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Claude Opus | Delta_NoContext | 28.8 | | Unverified |
| 2 | GPT-4o | Delta_NoContext | 25.1 | | Unverified |
| 3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 | | Unverified |
| 4 | GPT-4 | Delta_NoContext | 21.5 | | Unverified |
| 5 | Command R+ | Delta_NoContext | 11.6 | | Unverified |
| 6 | GPT-3.5 | Delta_NoContext | 11.2 | | Unverified |
| 7 | Mixtral 8x7B | Delta_NoContext | 6.4 | | Unverified |
| 8 | Llama 3 8B | Delta_NoContext | 4.9 | | Unverified |
| 9 | Llama 3 70B | Delta_NoContext | 2.9 | | Unverified |
| 10 | Gemma 7B | Delta_NoContext | 2.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 | | Unverified |
| 3 | OPT 66B (few-shot, k=3) | Accuracy | 54 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 | | Unverified |
| 5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | | Unverified |
| 8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 | | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 | | Unverified |
| 9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 | | Unverified |
| 5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 | | Unverified |
| 8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 | | Unverified |
| 9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 | | Unverified |
| 3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 | | Unverified |
| 4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 | | Unverified |
| 6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 | | Unverified |
| 7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | | Unverified |
| 2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 | | Unverified |
| 2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Human benchmark | Accuracy | 83.7 | | Unverified |
| 2 | RuGPT-3 Large | Accuracy | 40.7 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 38 | | Unverified |
| 4 | RuGPT-3 Small | Accuracy | 34 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Human benchmark | Accuracy | 87 | | Unverified |
| 2 | RuGPT-3 Small | Accuracy | 57.9 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 57.2 | | Unverified |
| 4 | RuGPT-3 Large | Accuracy | 55.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 | | Unverified |