SOTAVerified

Logical Reasoning

Papers

Showing 601-650 of 747 papers

Title | Status | Hype
Reduced Implication-bias Logic Loss for Neuro-Symbolic Learning | | 0
Language models show human-like content effects on reasoning tasks | Code | 0
Emotion Recognition in Conversation using Probabilistic Soft Logic | | 0
Discourse-Aware Graph Networks for Textual Logical Reasoning | | 0
AnaLog: Testing Analytical and Deductive Logic Learnability in Language Models | | 0
Learning Symmetric Rules with SATNet | Code | 0
Towards Unifying Perceptual Reasoning and Logical Reasoning | | 0
TAR: Neural Logical Reasoning across TBox and ABox | | 0
Reasoning over Logically Interacted Conditions for Question Answering | | 0
RobustLR: Evaluating Robustness to Logical Perturbation in Deductive Reasoning | Code | 0
FLEX: Feature-Logic Embedding Framework for CompleX Knowledge Graph Reasoning | Code | 0
Logical Reasoning with Span-Level Predictions for Interpretable and Robust NLI Models | Code | 0
Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning | | 0
LogiGAN: Learning Logical Reasoning via Adversarial Pre-training | | 0
Graph Neural Networks for Propositional Model Counting | | 0
Table-based Fact Verification with Self-adaptive Mixture of Experts | Code | 0
Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog | | 0
Enhancing Neural Mathematical Reasoning by Abductive Combination with Symbolic Library | | 0
A Densely Connected Criss-Cross Attention Network for Document-level Relation Extraction | | 0
A Neural-Symbolic Approach to Natural Language Understanding | Code | 0
What Makes Reading Comprehension Questions Difficult? | Code | 0
Towards Unifying Logical Entailment and Statistical Estimation | | 0
MUC-driven Feature Importance Measurement and Adversarial Analysis for Random Forest | | 0
JAMES: Normalizing Job Titles with Multi-Aspect Graph Embeddings and Reasoning | | 0
Logical Reasoning for Task Oriented Dialogue Systems | | 0
Neural Logic Analogy Learning | | 0
Reasoning Like Program Executors | | 0
Combining Commonsense Reasoning and Knowledge Acquisition to Guide Deep Learning in Robotics | | 0
BTPK-based interpretable method for NER tasks based on Talmudic Public Announcement Logic | | 0
Scales and Hedges in a Logic with Analogous Semantics | | 0
Emergent Symbols through Binding in External Memory | | 0
Quantifying Adaptability in Pre-trained Language Models with 500 Tasks | | 0
MANGO: Enhancing the Robustness of VQA Models via Adversarial Noise Generation | | 0
Can BERT Conduct Logical Reasoning? On the Difficulty of Learning to Reason from Data | | 0
FaiRR: Faithful and Robust Deductive Reasoning over Natural Language | | 0
Does Entity Abstraction Help Generative Transformers Reason? | | 0
Modeling Associative Reasoning Processes | | 0
Explainability Is in the Mind of the Beholder: Establishing the Foundations of Explainable Artificial Intelligence | | 0
Graph Collaborative Reasoning | | 0
The theory of quantitative trading | | 0
LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI | | 0
Scallop: From Probabilistic Deductive Databases to Scalable Differentiable Reasoning | | 0
Two-stage Rule-induction Visual Reasoning on RPMs with an Application to Video Prediction | | 0
What Makes Machine Reading Comprehension Questions Difficult? Investigating Variation in Passage Sources and Question Types | | 0
CausalR: Causal Reasoning over Natural Language Rulebases | | 0
Table-based Fact Verification with Self-adaptive Mixture of Experts | | 0
AbductionRules: Training Transformers to Explain Unexpected Inputs | | 0
Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text | | 0
Reasoning Like Program Executors | | 0
Automated scholarly paper review: Concepts, technologies, and challenges | | 0
Page 13 of 15

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Claude Opus | Delta_NoContext | 28.8 | | Unverified
2 | GPT-4o | Delta_NoContext | 25.1 | | Unverified
3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 | | Unverified
4 | GPT-4 | Delta_NoContext | 21.5 | | Unverified
5 | Command R+ | Delta_NoContext | 11.6 | | Unverified
6 | GPT-3.5 | Delta_NoContext | 11.2 | | Unverified
7 | Mixtral 8x7B | Delta_NoContext | 6.4 | | Unverified
8 | Llama 3 8B | Delta_NoContext | 4.9 | | Unverified
9 | Llama 3 70B | Delta_NoContext | 2.9 | | Unverified
10 | Gemma 7B | Delta_NoContext | 2.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 | | Unverified
2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 | | Unverified
3 | OPT 66B (few-shot, k=3) | Accuracy | 54 | | Unverified
4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 | | Unverified
5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 | | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 | | Unverified
7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | | Unverified
8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 | | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 | | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 | | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | | Unverified
4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 | | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 | | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 | | Unverified
7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 | | Unverified
8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 | | Unverified
9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 | | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 | | Unverified
5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 | | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified
7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 | | Unverified
8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 | | Unverified
9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 | | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 | | Unverified
3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 | | Unverified
4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified
5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 | | Unverified
6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 | | Unverified
7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 | | Unverified
8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 | | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | | Unverified
2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 | | Unverified
3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 | | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 | | Unverified
2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | | Unverified
3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 | | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Human benchmark | Accuracy | 83.7 | | Unverified
2 | RuGPT-3 Large | Accuracy | 40.7 | | Unverified
3 | RuGPT-3 Medium | Accuracy | 38 | | Unverified
4 | RuGPT-3 Small | Accuracy | 34 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Human benchmark | Accuracy | 87 | | Unverified
2 | RuGPT-3 Small | Accuracy | 57.9 | | Unverified
3 | RuGPT-3 Medium | Accuracy | 57.2 | | Unverified
4 | RuGPT-3 Large | Accuracy | 55.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 | | Unverified