SOTAVerified

Logical Reasoning

Papers

Showing 701-747 of 747 papers

| Title | Status | Hype |
|---|---|---|
| Neural Logic Networks | | 0 |
| MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension | Code | 0 |
| Graph Neural Reasoning May Fail in Certifying Boolean Unsatisfiability | | 0 |
| Graph Neural Networks for Reasoning 2-Quantified Boolean Formulas | | 0 |
| Non-monotonic Logical Reasoning Guiding Deep Learning for Explainable Visual Question Answering | | 0 |
| Teaching Pretrained Models with Commonsense Reasoning: A Preliminary KB-Based Approach | | 0 |
| Logic and the 2-Simplicial Transformer | | 0 |
| Towards a Theory of Intentions for Human-Robot Collaboration | | 0 |
| Semantic RL with Action Grammars: Data-Efficient Learning of Hierarchical Task Abstractions | Code | 0 |
| SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver | Code | 0 |
| Controlled Natural Languages and Default Reasoning | | 0 |
| Declarative Question Answering over Knowledge Bases containing Natural Language Text with Answer Set Programming | Code | 0 |
| How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos | | 0 |
| Compositional Attention Networks for Interpretability in Natural Language Question Answering | | 0 |
| Ontology Reasoning with Deep Neural Networks | Code | 0 |
| Argumentation Synthesis following Rhetorical Strategies | | 0 |
| Modeling Human Decision-making: An Overview of the Brussels Quantum Approach | | 0 |
| Data Science with Vadalog: Bridging Machine Learning and Reasoning | | 0 |
| Neural Tensor Networks with Diagonal Slice Matrices | | 0 |
| DeepLogic: Towards End-to-End Differentiable Logical Reasoning | Code | 0 |
| Consistent CCG Parsing over Multiple Sentences for Improved Logical Reasoning | | 0 |
| First Experiments with a Flexible Infrastructure for Normative Reasoning | | 0 |
| A Dataset and Architecture for Visual Reasoning with a Working Memory | Code | 0 |
| GOTaxon: Representing the evolution of biological functions in the Gene Ontology | Code | 0 |
| A New Algorithmic Decision for Categorical Syllogisms via Caroll's Diagrams | | 0 |
| A Theoretical Solution of the Mind-Body Problem: An Operationalized Proof that no Purely Physical System Can Exhibit all the Properties of Human Consciousness | | 0 |
| Is writing style predictive of scientific fraud? | | 0 |
| TensorLog: Deep Learning Meets Probabilistic DBs | | 0 |
| Is writing style predictive of scientific fraud? | | 0 |
| Towards Better Response Times and Higher-Quality Queries in Interactive Knowledge Base Debugging | | 0 |
| Compositional Distributional Cognition | | 0 |
| Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks | Code | 0 |
| Neural Networks and Continuous Time | | 0 |
| Mapping Ontologies Using Ontologies: Cross-lingual Semantic Role Information Transfer | | 0 |
| Reasoning in Vector Space: An Exploratory Study of Question Answering | | 0 |
| Object-Oriented Dynamic Networks | | 0 |
| Mixed Logical and Probabilistic Reasoning for Planning and Explanation Generation in Robotics | | 0 |
| Towards Ideal Semantics for Analyzing Stream Reasoning | | 0 |
| A New Fundamental Evidence of Non-Classical Structure in the Combination of Natural Concepts | | 0 |
| The RatioLog Project: Rational Extensions of Logical Reasoning | | 0 |
| Quantum Structure of Negation and Conjunction in Human Thought | | 0 |
| Quantum Structure in Cognition and the Foundations of Human Reasoning | | 0 |
| Inferring User Preferences by Probabilistic Logical Reasoning over Social Networks | | 0 |
| Learning Distributed Word Representations for Natural Logic Reasoning | | 0 |
| New Directions in Vector Space Models of Meaning | | 0 |
| Can recursive neural tensor networks learn logical reasoning? | Code | 0 |
| Lp : A Logic for Statistical Information | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Claude Opus | Delta_NoContext | 28.8 | | Unverified |
| 2 | GPT-4o | Delta_NoContext | 25.1 | | Unverified |
| 3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 | | Unverified |
| 4 | GPT-4 | Delta_NoContext | 21.5 | | Unverified |
| 5 | Command R+ | Delta_NoContext | 11.6 | | Unverified |
| 6 | GPT-3.5 | Delta_NoContext | 11.2 | | Unverified |
| 7 | Mixtral 8x7B | Delta_NoContext | 6.4 | | Unverified |
| 8 | Llama 3 8B | Delta_NoContext | 4.9 | | Unverified |
| 9 | Llama 3 70B | Delta_NoContext | 2.9 | | Unverified |
| 10 | Gemma 7B | Delta_NoContext | 2.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 | | Unverified |
| 3 | OPT 66B (few-shot, k=3) | Accuracy | 54 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 | | Unverified |
| 5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | | Unverified |
| 8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 | | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 | | Unverified |
| 9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 | | Unverified |
| 5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 | | Unverified |
| 8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 | | Unverified |
| 9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 | | Unverified |
| 3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 | | Unverified |
| 4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 | | Unverified |
| 6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 | | Unverified |
| 7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | | Unverified |
| 2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 | | Unverified |
| 2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Human benchmark | Accuracy | 83.7 | | Unverified |
| 2 | RuGPT-3 Large | Accuracy | 40.7 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 38 | | Unverified |
| 4 | RuGPT-3 Small | Accuracy | 34 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Human benchmark | Accuracy | 87 | | Unverified |
| 2 | RuGPT-3 Small | Accuracy | 57.9 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 57.2 | | Unverified |
| 4 | RuGPT-3 Large | Accuracy | 55.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 | | Unverified |