SOTAVerified

Logical Reasoning

Papers

Showing 651–700 of 747 papers

| Title | Status | Hype |
| --- | --- | --- |
| Diagnosing the First-Order Logical Reasoning Ability Through LogicNLI | | 0 |
| SQALER: Scaling Question Answering by Decoupling Multi-Hop and Logical Reasoning | | 0 |
| Logical Assessment Formula and Its Principles for Evaluations with Inaccurate Ground-Truth Labels | | 0 |
| One-Step Abductive Multi-Target Learning with Diverse Noisy Samples and Its Application to Tumour Segmentation for Breast Cancer | Code | 0 |
| A Survey on State-of-the-art Techniques for Knowledge Graphs Construction and Challenges ahead | | 0 |
| A Survey of Knowledge Enhanced Pre-trained Models | | 0 |
| NAIL: A Challenging Benchmark for Naïve Logical Reasoning | | 0 |
| Logic Pre-Training of Language Models | | 0 |
| Truth Table Deep Convolutional Neural Network, A New SAT-Encodable Architecture - Application To Complete Robustness | | 0 |
| Efficient Training and Inference of Hypergraph Reasoning Networks | | 0 |
| Weakly Supervised Explainable Phrasal Reasoning with Neural Fuzzy Logic | Code | 0 |
| What Makes Reading Comprehension Questions Difficult? Investigating Variation in Passage Sources and Question Types | | 0 |
| Counterfactual Adversarial Learning with Representation Interpolation | Code | 0 |
| Sinoledge: A Knowledge Engine based on Logical Reasoning and Distributed Micro Services | | 0 |
| From Statistical Relational to Neurosymbolic Artificial Intelligence: a Survey | | 0 |
| Exploring Generalization Ability of Pretrained Language Models on Arithmetic and Logical Reasoning | | 0 |
| Knowledge Informed Semantic Parsing for Conversational Question Answering | | 0 |
| Improving Coherence and Consistency in Neural Sequence Models with Dual-System, Neuro-Symbolic Reasoning | | 0 |
| Reasoning with Transformer-based Models: Deep Learning, but Shallow Reasoning | Code | 0 |
| Techniques for Symbol Grounding with SATNet | Code | 0 |
| Volta at SemEval-2021 Task 9: Statement Verification and Evidence Finding with Tables using TAPAS and Transfer Learning | Code | 0 |
| Probabilistic Sufficient Explanations | Code | 0 |
| The General Theory of General Intelligence: A Pragmatic Patternist Perspective | | 0 |
| Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution | | 0 |
| Context Transformer with Stacked Pointer Networks for Conversational Question Answering over Knowledge Graphs | Code | 0 |
| Neural Sequence-to-grid Module for Learning Symbolic Rules | Code | 0 |
| Bayes Meets Entailment and Prediction: Commonsense Reasoning with Non-monotonicity, Paraconsistency and Predictive Accuracy | | 0 |
| A Closer Look at the Robustness of Vision-and-Language Pre-trained Models | | 0 |
| Neurosymbolic AI: The 3rd Wave | | 0 |
| Handling Noisy Labels via One-Step Abductive Multi-Target Learning and Its Application to Helicobacter Pylori Segmentation | | 0 |
| Neural Software Analysis | Code | 0 |
| Axiom Learning and Belief Tracing for Transparent Decision Making in Robotics | | 0 |
| Few-shot Visual Reasoning with Meta-analogical Contrastive Learning | | 0 |
| Learning Syllogism with Euler Neural-Networks | | 0 |
| Medical idioms for clinical Bayesian network development | | 0 |
| Multi-source Meta Transfer for Low Resource Multiple-Choice Question Answering | | 0 |
| Matrix Shuffle-Exchange Networks for Hard 2D Tasks | Code | 0 |
| A Probabilistic Model for Discriminative and Neuro-Symbolic Semi-Supervised Learning | | 0 |
| Mathematical Reasoning via Self-supervised Skip-tree Training | | 0 |
| Bayesian Entailment Hypothesis: How Brains Implement Monotonic and Non-monotonic Reasoning | | 0 |
| Unifying Neural Learning and Symbolic Reasoning for Spinal Medical Report Generation | | 0 |
| Multi-Step Inference for Reasoning Over Paragraphs | | 0 |
| Extending Automated Deduction for Commonsense Reasoning | | 0 |
| From Statistical Relational to Neuro-Symbolic Artificial Intelligence | | 0 |
| Improving Certified Robustness via Statistical Learning with Logical Reasoning | Code | 0 |
| Cognitive Argumentation and the Suppression Task | | 0 |
| HypoML: Visual Analysis for Hypothesis-based Evaluation of Machine Learning Models | | 0 |
| A (Simplified) Supreme Being Necessarily Exists, says the Computer: Computationally Explored Variants of Gödel's Ontological Argument | | 0 |
| Generating Programmatic Referring Expressions via Program Synthesis | Code | 0 |
| Bridging Machine Learning and Logical Reasoning by Abductive Learning | Code | 0 |

Benchmark Results

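| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Claude Opus | Delta_NoContext | 28.8 | | Unverified |
| 2 | GPT-4o | Delta_NoContext | 25.1 | | Unverified |
| 3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 | | Unverified |
| 4 | GPT-4 | Delta_NoContext | 21.5 | | Unverified |
| 5 | Command R+ | Delta_NoContext | 11.6 | | Unverified |
| 6 | GPT-3.5 | Delta_NoContext | 11.2 | | Unverified |
| 7 | Mixtral 8x7B | Delta_NoContext | 6.4 | | Unverified |
| 8 | Llama 3 8B | Delta_NoContext | 4.9 | | Unverified |
| 9 | Llama 3 70B | Delta_NoContext | 2.9 | | Unverified |
| 10 | Gemma 7B | Delta_NoContext | 2.2 | | Unverified |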

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 | | Unverified |
| 3 | OPT 66B (few-shot, k=3) | Accuracy | 54 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 | | Unverified |
| 5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | | Unverified |
| 8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 | | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 | | Unverified |
| 9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 | | Unverified |
| 5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 | | Unverified |
| 8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 | | Unverified |
| 9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 | | Unverified |
| 3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 | | Unverified |
| 4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 | | Unverified |
| 6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 | | Unverified |
| 7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 | | Unverified |
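
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | | Unverified |
| 2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 | | Unverified |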
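
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 | | Unverified |
| 2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 | | Unverified |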
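
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Human benchmark | Accuracy | 83.7 | | Unverified |
| 2 | RuGPT-3 Large | Accuracy | 40.7 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 38 | | Unverified |
| 4 | RuGPT-3 Small | Accuracy | 34 | | Unverified |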
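
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Human benchmark | Accuracy | 87 | | Unverified |
| 2 | RuGPT-3 Small | Accuracy | 57.9 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 57.2 | | Unverified |
| 4 | RuGPT-3 Large | Accuracy | 55.5 | | Unverified |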

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 | | Unverified |