SOTAVerified

Logical Reasoning

Papers

Showing 551-600 of 747 papers

Title | Status | Hype
Type-dependent Prompt CycleQAG: Cycle Consistency for Multi-hop Question Generation | - | 0
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning | Code | 1
Neural Methods for Logical Reasoning Over Knowledge Graphs | Code | 1
Towards Human-Compatible XAI: Explaining Data Differentials with Concept Induction over Background Knowledge | - | 0
FOLIO: Natural Language Reasoning with First-Order Logic | Code | 1
Time-aware Self-Attention Meets Logic Reasoning in Recommender Systems | - | 0
Knowledge-based and Data-driven Reasoning and Learning for Ad Hoc Teamwork | - | 0
A Scalable, Interpretable, Verifiable & Differentiable Logic Gate Convolutional Neural Network Architecture From Truth Tables | - | 0
Reduced Implication-bias Logic Loss for Neuro-Symbolic Learning | - | 0
Emotion Recognition in Conversation using Probabilistic Soft Logic | - | 0
Language models show human-like content effects on reasoning tasks | Code | 0
Discourse-Aware Graph Networks for Textual Logical Reasoning | - | 0
AnaLog: Testing Analytical and Deductive Logic Learnability in Language Models | - | 0
Learning Symmetric Rules with SATNet | Code | 0
Towards Unifying Perceptual Reasoning and Logical Reasoning | - | 0
Semantic Probabilistic Layers for Neuro-Symbolic Learning | Code | 1
TAR: Neural Logical Reasoning across TBox and ABox | - | 0
TFLEX: Temporal Feature-Logic Embedding Framework for Complex Reasoning over Temporal Knowledge Graph | Code | 1
Reasoning over Logically Interacted Conditions for Question Answering | - | 0
RobustLR: Evaluating Robustness to Logical Perturbation in Deductive Reasoning | Code | 0
Large Language Models are Zero-Shot Reasoners | Code | 2
On the Paradox of Learning to Reason from Data | Code | 1
Logical Reasoning with Span-Level Predictions for Interpretable and Robust NLI Models | Code | 0
FLEX: Feature-Logic Embedding Framework for CompleX Knowledge Graph Reasoning | Code | 0
Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning | - | 0
LogiGAN: Learning Logical Reasoning via Adversarial Pre-training | - | 0
Graph Neural Networks for Propositional Model Counting | - | 0
Logiformer: A Two-Branch Graph Transformer Network for Interpretable Logical Reasoning | Code | 1
Table-based Fact Verification with Self-adaptive Mixture of Experts | Code | 0
Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog | - | 0
PaLM: Scaling Language Modeling with Pathways | Code | 2
Training Compute-Optimal Large Language Models | Code | 6
Enhancing Neural Mathematical Reasoning by Abductive Combination with Symbolic Library | - | 0
A Densely Connected Criss-Cross Attention Network for Document-level Relation Extraction | - | 0
AbductionRules: Training Transformers to Explain Unexpected Inputs | Code | 1
A Neural-Symbolic Approach to Natural Language Understanding | Code | 0
ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning | Code | 2
FaiRR: Faithful and Robust Deductive Reasoning over Natural Language | Code | 1
AdaLoGN: Adaptive Logic Graph Network for Reasoning-Based Machine Reading Comprehension | Code | 1
What Makes Reading Comprehension Questions Difficult? | Code | 0
A Neuro-vector-symbolic Architecture for Solving Raven's Progressive Matrices | Code | 1
MERIt: Meta-Path Guided Contrastive Learning for Logical Reasoning | Code | 1
Towards Unifying Logical Entailment and Statistical Estimation | - | 0
MUC-driven Feature Importance Measurement and Adversarial Analysis for Random Forest | - | 0
JAMES: Normalizing Job Titles with Multi-Aspect Graph Embeddings and Reasoning | - | 0
ExAIS: Executable AI Semantics | Code | 1
End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking | Code | 1
Logical Reasoning for Task Oriented Dialogue Systems | - | 0
VAEL: Bridging Variational Autoencoders and Probabilistic Logic Programming | Code | 1
Neural Logic Analogy Learning | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Claude Opus | Delta_NoContext | 28.8 | - | Unverified
2 | GPT-4o | Delta_NoContext | 25.1 | - | Unverified
3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 | - | Unverified
4 | GPT-4 | Delta_NoContext | 21.5 | - | Unverified
5 | Command R+ | Delta_NoContext | 11.6 | - | Unverified
6 | GPT-3.5 | Delta_NoContext | 11.2 | - | Unverified
7 | Mixtral 8x7B | Delta_NoContext | 6.4 | - | Unverified
8 | Llama 3 8B | Delta_NoContext | 4.9 | - | Unverified
9 | Llama 3 70B | Delta_NoContext | 2.9 | - | Unverified
10 | Gemma 7B | Delta_NoContext | 2.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 | - | Unverified
2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 | - | Unverified
3 | OPT 66B (few-shot, k=3) | Accuracy | 54 | - | Unverified
4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 | - | Unverified
5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 | - | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 | - | Unverified
7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | - | Unverified
8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 | - | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 | - | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 | - | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | - | Unverified
4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 | - | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 | - | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 | - | Unverified
7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 | - | Unverified
8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 | - | Unverified
9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | - | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 | - | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | - | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 | - | Unverified
5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 | - | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | - | Unverified
7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 | - | Unverified
8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 | - | Unverified
9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 | - | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 | - | Unverified
3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 | - | Unverified
4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | - | Unverified
5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 | - | Unverified
6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 | - | Unverified
7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 | - | Unverified
8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 | - | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | - | Unverified
2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 | - | Unverified
3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 | - | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 | - | Unverified
2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | - | Unverified
3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 | - | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Human benchmark | Accuracy | 83.7 | - | Unverified
2 | RuGPT-3 Large | Accuracy | 40.7 | - | Unverified
3 | RuGPT-3 Medium | Accuracy | 38 | - | Unverified
4 | RuGPT-3 Small | Accuracy | 34 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Human benchmark | Accuracy | 87 | - | Unverified
2 | RuGPT-3 Small | Accuracy | 57.9 | - | Unverified
3 | RuGPT-3 Medium | Accuracy | 57.2 | - | Unverified
4 | RuGPT-3 Large | Accuracy | 55.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | - | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 | - | Unverified