SOTAVerified

Logical Reasoning

Papers

Showing 351–400 of 747 papers

| Title | Status | Hype |
|---|---|---|
| Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology | | 0 |
| Do Large Language Models Understand Logic or Just Mimick Context? | | 0 |
| Dynamic In-Context Learning from Nearest Neighbors for Bundle Generation | | 0 |
| Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack | | 0 |
| Efficient Training and Inference of Hypergraph Reasoning Networks | | 0 |
| Emergent Symbols through Binding in External Memory | | 0 |
| Emotion Recognition in Conversation using Probabilistic Soft Logic | | 0 |
| Empowering LLMs with Logical Reasoning: A Comprehensive Survey | | 0 |
| Enhanced User Interaction in Operating Systems through Machine Learning Language Models | | 0 |
| Enhancing Large Language Model Efficiency via Symbolic Compression: A Formal Approach Towards Interpretability | | 0 |
| Enhancing Logical Reasoning in Large Language Models to Facilitate Legal Applications | | 0 |
| Enhancing Neural Mathematical Reasoning by Abductive Combination with Symbolic Library | | 0 |
| Enhancing Retrieval Systems with Inference-Time Logical Reasoning | | 0 |
| Enhancing Transformers for Generalizable First-Order Logical Entailment | | 0 |
| Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles | | 0 |
| Evaluating Large Language Models with NeuBAROCO: Syllogistic Reasoning Ability and Human-like Biases | | 0 |
| Evaluating the Potential of Leading Large Language Models in Reasoning Biology Questions | | 0 |
| Evident: a Development Methodology and a Knowledge Base Topology for Data Mining, Machine Learning and General Knowledge Management | | 0 |
| Explainability Is in the Mind of the Beholder: Establishing the Foundations of Explainable Artificial Intelligence | | 0 |
| Explicitly Encoding Structural Symmetry is Key to Length Generalization in Arithmetic Tasks | | 0 |
| Exploiting LLMs' Reasoning Capability to Infer Implicit Concepts in Legal Information Retrieval | | 0 |
| Exploring & Exploiting High-Order Graph Structure for Sparse Knowledge Graph Completion | | 0 |
| Exploring Generalization Ability of Pretrained Language Models on Arithmetic and Logical Reasoning | | 0 |
| Extending Automated Deduction for Commonsense Reasoning | | 0 |
| FaiRR: Faithful and Robust Deductive Reasoning over Natural Language | | 0 |
| Federated Neural Graph Databases | | 0 |
| Federated In-Context LLM Agent Learning | | 0 |
| FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models | | 0 |
| Few-shot Visual Reasoning with Meta-analogical Contrastive Learning | | 0 |
| First Experiments with a Flexible Infrastructure for Normative Reasoning | | 0 |
| FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts | | 0 |
| FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models | | 0 |
| ∀uto∃∨∧L: Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks | | 0 |
| Formal Language Knowledge Corpus for Retrieval Augmented Generation | | 0 |
| Formal Logic-guided Robust Federated Learning against Poisoning Attacks | | 0 |
| From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models | | 0 |
| From Complex to Simple: Unraveling the Cognitive Tree for Reasoning with Small Language Models | | 0 |
| From Statistical Relational to Neurosymbolic Artificial Intelligence: a Survey | | 0 |
| From Statistical Relational to Neuro-Symbolic Artificial Intelligence | | 0 |
| Fuzzy Datalog^∃ over Arbitrary t-Norms | | 0 |
| Generation of Explanations for Logic Reasoning | | 0 |
| (G)I-DLE: Generative Inference via Distribution-preserving Logit Exclusion with KL Divergence Minimization for Constrained Decoding | | 0 |
| Graph Collaborative Reasoning | | 0 |
| GraphIC: A Graph-Based In-Context Example Retrieval Model for Multi-Step Reasoning | | 0 |
| Graph Neural Networks for Propositional Model Counting | | 0 |
| Graph Neural Networks for Reasoning 2-Quantified Boolean Formulas | | 0 |
| Graph Neural Reasoning May Fail in Certifying Boolean Unsatisfiability | | 0 |
| Guidance is All You Need: Temperature-Guided Reasoning in Large Language Models | | 0 |
| Handling Noisy Labels via One-Step Abductive Multi-Target Learning and Its Application to Helicobacter Pylori Segmentation | | 0 |
| Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Claude Opus | Delta_NoContext | 28.8 | | Unverified |
| 2 | GPT-4o | Delta_NoContext | 25.1 | | Unverified |
| 3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 | | Unverified |
| 4 | GPT-4 | Delta_NoContext | 21.5 | | Unverified |
| 5 | Command R+ | Delta_NoContext | 11.6 | | Unverified |
| 6 | GPT-3.5 | Delta_NoContext | 11.2 | | Unverified |
| 7 | Mixtral 8x7B | Delta_NoContext | 6.4 | | Unverified |
| 8 | Llama 3 8B | Delta_NoContext | 4.9 | | Unverified |
| 9 | Llama 3 70B | Delta_NoContext | 2.9 | | Unverified |
| 10 | Gemma 7B | Delta_NoContext | 2.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 | | Unverified |
| 3 | OPT 66B (few-shot, k=3) | Accuracy | 54 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 | | Unverified |
| 5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | | Unverified |
| 8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 | | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 | | Unverified |
| 9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 | | Unverified |
| 5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 | | Unverified |
| 8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 | | Unverified |
| 9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 | | Unverified |
| 3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 | | Unverified |
| 4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 | | Unverified |
| 6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 | | Unverified |
| 7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | | Unverified |
| 2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 | | Unverified |
| 2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Human benchmark | Accuracy | 83.7 | | Unverified |
| 2 | RuGPT-3 Large | Accuracy | 40.7 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 38 | | Unverified |
| 4 | RuGPT-3 Small | Accuracy | 34 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Human benchmark | Accuracy | 87 | | Unverified |
| 2 | RuGPT-3 Small | Accuracy | 57.9 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 57.2 | | Unverified |
| 4 | RuGPT-3 Large | Accuracy | 55.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 | | Unverified |