SOTAVerified

Logical Reasoning

Papers

Showing 501-550 of 747 papers

| Title | Status | Hype |
|---|---|---|
| Tackling Universal Properties of Minimal Trap Spaces of Boolean Networks | Code | 0 |
| A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text | Code | 0 |
| Complex Logical Reasoning over Knowledge Graphs using Large Language Models | Code | 1 |
| The Dark Side of Explanations: Poisoning Recommender Systems with Counterfactual Examples | | 0 |
| Sequential Recommendation with Probabilistic Logical Reasoning | Code | 0 |
| ChatABL: Abductive Learning via Natural Language Interaction with ChatGPT | | 0 |
| Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models | Code | 3 |
| LeafAI: query generator for clinical cohort discovery rivaling a human programmer | | 0 |
| Scallop: A Language for Neurosymbolic Programming | | 0 |
| Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4 | Code | 1 |
| Deep Manifold Learning for Reading Comprehension and Logical Reasoning Tasks with Polytuplet Loss | Code | 0 |
| BloombergGPT: A Large Language Model for Finance | | 0 |
| Explicit Planning Helps Language Models in Logical Reasoning | Code | 1 |
| Natural Language Reasoning, A Survey | Code | 1 |
| Neural Graph Reasoning: Complex Logical Query Answering Meets Graph Databases | Code | 1 |
| Logical Reasoning over Natural Language as Knowledge Representation: A Survey | Code | 0 |
| Weakly Supervised Knowledge Transfer with Probabilistic Logical Reasoning for Object Detection | Code | 0 |
| Attribution-Scores and Causal Counterfactuals as Explanations in Artificial Intelligence | | 0 |
| Domain Specific Question Answering Over Knowledge Graphs Using Logical Programming and Large Language Models | Code | 1 |
| A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT | | 0 |
| ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models | Code | 1 |
| A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity | Code | 1 |
| Double Equivariance for Inductive Link Prediction for Both New Nodes and New Relation Types | Code | 0 |
| Unifying Structure Reasoning and Language Model Pre-training for Complex Reasoning | | 0 |
| Logical Message Passing Networks with One-hop Inference on Atomic Formulas | Code | 1 |
| A separation logic for sequences in pointer programs and its decidability | | 0 |
| CogReact: A Reinforced Framework to Model Human Cognitive Reaction Modulated by Dynamic Intervention | | 0 |
| Mind Reasoning Manners: Enhancing Type Perception for Generalized Zero-shot Logical Reasoning over Text | Code | 1 |
| LAMBADA: Backward Chaining for Automated Reasoning in Natural Language | | 0 |
| Large Language Models are Better Reasoners with Self-Verification | Code | 1 |
| Reasoning with Language Model Prompting: A Survey | Code | 3 |
| APOLLO: A Simple Approach for Adaptive Pretraining of Language Models for Logical Reasoning | | 0 |
| On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning | Code | 1 |
| Towards High-Order Complementary Recommendation via Logical Reasoning Network | Code | 0 |
| Counterfactual reasoning: Do language models need world knowledge for causal understanding? | Code | 1 |
| UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression | Code | 1 |
| Weisfeiler and Leman Go Relational | Code | 0 |
| Neuro-Symbolic Spatio-Temporal Reasoning | | 0 |
| NQE: N-ary Query Embedding for Complex Query Answering over Hyper-Relational Knowledge Graphs | Code | 1 |
| Logical Tasks for Measuring Extrapolation and Rule Comprehension | Code | 0 |
| Evident: a Development Methodology and a Knowledge Base Topology for Data Mining, Machine Learning and General Knowledge Management | | 0 |
| Zero-Shot Classification by Logical Reasoning on Natural Language Explanations | Code | 0 |
| GammaE: Gamma Embeddings for Logical Queries on Knowledge Graphs | Code | 0 |
| TAPE: Assessing Few-shot Russian Language Understanding | Code | 0 |
| MetaLogic: Logical Reasoning Explanations with Fine-Grained Structure | Code | 0 |
| Investigating the Robustness of Natural Language Generation from Logical Forms via Counterfactual Samples | Code | 0 |
| Inductive Logical Query Answering in Knowledge Graphs | Code | 0 |
| Join-Chain Network: A Logical Reasoning View of the Multi-head Attention in Transformer | | 0 |
| Document-level Biomedical Relation Extraction Based on Multi-Dimensional Fusion Information and Multi-Granularity Logical Reasoning | Code | 0 |
| To What Extent Do Natural Language Understanding Datasets Correlate to Logical Reasoning? A Method for Diagnosing Logical Reasoning. | | 0 |
Page 11 of 15

Benchmark Results

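| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Claude Opus | Delta_NoContext | 28.8 | | Unverified |
| 2 | GPT-4o | Delta_NoContext | 25.1 | | Unverified |
| 3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 | | Unverified |
| 4 | GPT-4 | Delta_NoContext | 21.5 | | Unverified |
| 5 | Command R+ | Delta_NoContext | 11.6 | | Unverified |
| 6 | GPT-3.5 | Delta_NoContext | 11.2 | | Unverified |
| 7 | Mixtral 8x7B | Delta_NoContext | 6.4 | | Unverified |
| 8 | Llama 3 8B | Delta_NoContext | 4.9 | | Unverified |
| 9 | Llama 3 70B | Delta_NoContext | 2.9 | | Unverified |
| 10 | Gemma 7B | Delta_NoContext | 2.2 | | Unverified |
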
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 | | Unverified |
| 3 | OPT 66B (few-shot, k=3) | Accuracy | 54 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 | | Unverified |
| 5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | | Unverified |
| 8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 | | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 | | Unverified |
| 9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 | | Unverified |
| 5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 | | Unverified |
| 8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 | | Unverified |
| 9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 | | Unverified |
| 3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 | | Unverified |
| 4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 | | Unverified |
| 6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 | | Unverified |
| 7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 | | Unverified |

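| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | | Unverified |
| 2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 | | Unverified |
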
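| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 | | Unverified |
| 2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 | | Unverified |
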
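| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Human benchmark | Accuracy | 83.7 | | Unverified |
| 2 | RuGPT-3 Large | Accuracy | 40.7 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 38 | | Unverified |
| 4 | RuGPT-3 Small | Accuracy | 34 | | Unverified |
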
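| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Human benchmark | Accuracy | 87 | | Unverified |
| 2 | RuGPT-3 Small | Accuracy | 57.9 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 57.2 | | Unverified |
| 4 | RuGPT-3 Large | Accuracy | 55.5 | | Unverified |
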
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 | | Unverified |