SOTAVerified

Logical Reasoning

Papers

Showing 551-600 of 747 papers

| Title | Status | Hype |
| --- | --- | --- |
| Exploring & Exploiting High-Order Graph Structure for Sparse Knowledge Graph Completion | | 0 |
| Exploring Generalization Ability of Pretrained Language Models on Arithmetic and Logical Reasoning | | 0 |
| Extending Automated Deduction for Commonsense Reasoning | | 0 |
| FaiRR: Faithful and Robust Deductive Reasoning over Natural Language | | 0 |
| Federated Neural Graph Databases | | 0 |
| Federated In-Context LLM Agent Learning | | 0 |
| FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models | | 0 |
| Few-shot Visual Reasoning with Meta-analogical Contrastive Learning | | 0 |
| First Experiments with a Flexible Infrastructure for Normative Reasoning | | 0 |
| FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts | | 0 |
| FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models | | 0 |
| ∀uto∃∨∧L: Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks | | 0 |
| Formal Language Knowledge Corpus for Retrieval Augmented Generation | | 0 |
| Formal Logic-guided Robust Federated Learning against Poisoning Attacks | | 0 |
| From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models | | 0 |
| From Complex to Simple: Unraveling the Cognitive Tree for Reasoning with Small Language Models | | 0 |
| From Statistical Relational to Neurosymbolic Artificial Intelligence: a Survey | | 0 |
| From Statistical Relational to Neuro-Symbolic Artificial Intelligence | | 0 |
| Fuzzy Datalog^∃ over Arbitrary t-Norms | | 0 |
| Generation of Explanations for Logic Reasoning | | 0 |
| (G)I-DLE: Generative Inference via Distribution-preserving Logit Exclusion with KL Divergence Minimization for Constrained Decoding | | 0 |
| Graph Collaborative Reasoning | | 0 |
| GraphIC: A Graph-Based In-Context Example Retrieval Model for Multi-Step Reasoning | | 0 |
| Graph Neural Networks for Propositional Model Counting | | 0 |
| Graph Neural Networks for Reasoning 2-Quantified Boolean Formulas | | 0 |
| Graph Neural Reasoning May Fail in Certifying Boolean Unsatisfiability | | 0 |
| Guidance is All You Need: Temperature-Guided Reasoning in Large Language Models | | 0 |
| Handling Noisy Labels via One-Step Abductive Multi-Target Learning and Its Application to Helicobacter Pylori Segmentation | | 0 |
| Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition | | 0 |
| HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator | | 0 |
| HelpSteer3: Human-Annotated Feedback and Edit Data to Empower Inference-Time Scaling in Open-Ended General-Domain Tasks | | 0 |
| HF4Rec: Human-Like Feedback-Driven Optimization Framework for Explainable Recommendation | | 0 |
| HopRAG: Multi-Hop Reasoning for Logic-Aware Retrieval-Augmented Generation | | 0 |
| HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs | | 0 |
| How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos | | 0 |
| How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis | | 0 |
| How Truncating Weights Improves Reasoning in Language Models | | 0 |
| Human Comprehensible Active Learning of Genome-Scale Metabolic Networks | | 0 |
| Human-in-the-Loop through Chain-of-Thought | | 0 |
| HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking | | 0 |
| HypoML: Visual Analysis for Hypothesis-based Evaluation of Machine Learning Models | | 0 |
| Identifying Features that Shape Perceived Consciousness in Large Language Model-based AI: A Quantitative Study of Human Responses | | 0 |
| I-Design: Personalized LLM Interior Designer | | 0 |
| Imperative Learning: A Self-supervised Neuro-Symbolic Learning Framework for Robot Autonomy | | 0 |
| Improving Coherence and Consistency in Neural Sequence Models with Dual-System, Neuro-Symbolic Reasoning | | 0 |
| Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning | | 0 |
| Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks | | 0 |
| Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights | | 0 |
| Inferring User Preferences by Probabilistic Logical Reasoning over Social Networks | | 0 |
| Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models | | 0 |

Benchmark Results

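| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Claude Opus | Delta_NoContext | 28.8 | | Unverified |
| 2 | GPT-4o | Delta_NoContext | 25.1 | | Unverified |
| 3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 | | Unverified |
| 4 | GPT-4 | Delta_NoContext | 21.5 | | Unverified |
| 5 | Command R+ | Delta_NoContext | 11.6 | | Unverified |
| 6 | GPT-3.5 | Delta_NoContext | 11.2 | | Unverified |
| 7 | Mixtral 8x7B | Delta_NoContext | 6.4 | | Unverified |
| 8 | Llama 3 8B | Delta_NoContext | 4.9 | | Unverified |
| 9 | Llama 3 70B | Delta_NoContext | 2.9 | | Unverified |
| 10 | Gemma 7B | Delta_NoContext | 2.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 | | Unverified |
| 3 | OPT 66B (few-shot, k=3) | Accuracy | 54 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 | | Unverified |
| 5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | | Unverified |
| 8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 | | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 | | Unverified |
| 9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 | | Unverified |
| 5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 | | Unverified |
| 8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 | | Unverified |
| 9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 | | Unverified |
| 3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 | | Unverified |
| 4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 | | Unverified |
| 6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 | | Unverified |
| 7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | | Unverified |
| 2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 | | Unverified |
| 2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Human benchmark | Accuracy | 83.7 | | Unverified |
| 2 | RuGPT-3 Large | Accuracy | 40.7 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 38 | | Unverified |
| 4 | RuGPT-3 Small | Accuracy | 34 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Human benchmark | Accuracy | 87 | | Unverified |
| 2 | RuGPT-3 Small | Accuracy | 57.9 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 57.2 | | Unverified |
| 4 | RuGPT-3 Large | Accuracy | 55.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 | | Unverified |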