SOTAVerified

Logical Reasoning

Papers

Showing 601–650 of 747 papers

| Title | Status | Hype |
|---|---|---|
| Reasoning Like Program Executors | | 0 |
| Combining Commonsense Reasoning and Knowledge Acquisition to Guide Deep Learning in Robotics | | 0 |
| BTPK-based interpretable method for NER tasks based on Talmudic Public Announcement Logic | | 0 |
| Scales and Hedges in a Logic with Analogous Semantics | | 0 |
| Emergent Symbols through Binding in External Memory | | 0 |
| FaiRR: Faithful and Robust Deductive Reasoning over Natural Language | | 0 |
| Can BERT Conduct Logical Reasoning? On the Difficulty of Learning to Reason from Data | | 0 |
| MANGO: Enhancing the Robustness of VQA Models via Adversarial Noise Generation | | 0 |
| Quantifying Adaptability in Pre-trained Language Models with 500 Tasks | | 0 |
| Does Entity Abstraction Help Generative Transformers Reason? | | 0 |
| Modeling Associative Reasoning Processes | | 0 |
| Explainability Is in the Mind of the Beholder: Establishing the Foundations of Explainable Artificial Intelligence | | 0 |
| The theory of quantitative trading | | 0 |
| Graph Collaborative Reasoning | | 0 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2 |
| Quantifying Adaptability in Pre-trained Language Models with 500 Tasks | Code | 1 |
| LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI | | 0 |
| Scallop: From Probabilistic Deductive Databases to Scalable Differentiable Reasoning | | 0 |
| Two-stage Rule-induction Visual Reasoning on RPMs with an Application to Video Prediction | | 0 |
| Enhancing Multilingual Language Model with Massive Multilingual Knowledge Triples | Code | 1 |
| What Makes Machine Reading Comprehension Questions Difficult? Investigating Variation in Passage Sources and Question Types | | 0 |
| Table-based Fact Verification with Self-adaptive Mixture of Experts | | 0 |
| Reasoning Like Program Executors | | 0 |
| Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text | | 0 |
| CausalR: Causal Reasoning over Natural Language Rulebases | | 0 |
| AbductionRules: Training Transformers to Explain Unexpected Inputs | | 0 |
| Automated scholarly paper review: Concepts, technologies, and challenges | | 0 |
| Diagnosing the First-Order Logical Reasoning Ability Through LogicNLI | | 0 |
| SQALER: Scaling Question Answering by Decoupling Multi-Hop and Logical Reasoning | | 0 |
| Probabilistic Entity Representation Model for Reasoning over Knowledge Graphs | Code | 1 |
| Logical Assessment Formula and Its Principles for Evaluations with Inaccurate Ground-Truth Labels | | 0 |
| One-Step Abductive Multi-Target Learning with Diverse Noisy Samples and Its Application to Tumour Segmentation for Breast Cancer | Code | 0 |
| A Survey on State-of-the-art Techniques for Knowledge Graphs Construction and Challenges ahead | | 0 |
| ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers | Code | 1 |
| A Survey of Knowledge Enhanced Pre-trained Models | | 0 |
| Truth Table Deep Convolutional Neural Network, A New SAT-Encodable Architecture - Application To Complete Robustness | | 0 |
| Logic Pre-Training of Language Models | | 0 |
| NAIL: A Challenging Benchmark for Naïve Logical Reasoning | | 0 |
| Efficient Training and Inference of Hypergraph Reasoning Networks | | 0 |
| Weakly Supervised Explainable Phrasal Reasoning with Neural Fuzzy Logic | Code | 0 |
| What Makes Reading Comprehension Questions Difficult? Investigating Variation in Passage Sources and Question Types | | 0 |
| Counterfactual Adversarial Learning with Representation Interpolation | Code | 0 |
| AI Descartes: Combining Data and Theory for Derivable Scientific Discovery | Code | 1 |
| Sinoledge: A Knowledge Engine based on Logical Reasoning and Distributed Micro Services | | 0 |
| From Statistical Relational to Neurosymbolic Artificial Intelligence: a Survey | | 0 |
| Exploring Generalization Ability of Pretrained Language Models on Arithmetic and Logical Reasoning | | 0 |
| From LSAT: The Progress and Challenges of Complex Reasoning | Code | 1 |
| Knowledge Informed Semantic Parsing for Conversational Question Answering | | 0 |
| Improving Coherence and Consistency in Neural Sequence Models with Dual-System, Neuro-Symbolic Reasoning | | 0 |
| Reasoning with Transformer-based Models: Deep Learning, but Shallow Reasoning | Code | 0 |
Page 13 of 15

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Claude Opus | Delta_NoContext | 28.8 | | Unverified |
| 2 | GPT-4o | Delta_NoContext | 25.1 | | Unverified |
| 3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 | | Unverified |
| 4 | GPT-4 | Delta_NoContext | 21.5 | | Unverified |
| 5 | Command R+ | Delta_NoContext | 11.6 | | Unverified |
| 6 | GPT-3.5 | Delta_NoContext | 11.2 | | Unverified |
| 7 | Mixtral 8x7B | Delta_NoContext | 6.4 | | Unverified |
| 8 | Llama 3 8B | Delta_NoContext | 4.9 | | Unverified |
| 9 | Llama 3 70B | Delta_NoContext | 2.9 | | Unverified |
| 10 | Gemma 7B | Delta_NoContext | 2.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 | | Unverified |
| 3 | OPT 66B (few-shot, k=3) | Accuracy | 54 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 | | Unverified |
| 5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | | Unverified |
| 8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 | | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 | | Unverified |
| 9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 | | Unverified |
| 5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 | | Unverified |
| 8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 | | Unverified |
| 9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 | | Unverified |
| 3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 | | Unverified |
| 4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 | | Unverified |
| 6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 | | Unverified |
| 7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | | Unverified |
| 2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 | | Unverified |
| 2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Human benchmark | Accuracy | 83.7 | | Unverified |
| 2 | RuGPT-3 Large | Accuracy | 40.7 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 38 | | Unverified |
| 4 | RuGPT-3 Small | Accuracy | 34 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Human benchmark | Accuracy | 87 | | Unverified |
| 2 | RuGPT-3 Small | Accuracy | 57.9 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 57.2 | | Unverified |
| 4 | RuGPT-3 Large | Accuracy | 55.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 | | Unverified |