SOTAVerified

Logical Reasoning

Papers

Showing 151-200 of 747 papers

Title | Status | Hype
Counterfactual reasoning: Do language models need world knowledge for causal understanding? | Code | 1
Counterfactual reasoning: Testing language models' understanding of hypothetical scenarios | Code | 1
Conditional and Modal Reasoning in Large Language Models | Code | 1
ExAIS: Executable AI Semantics | Code | 1
Cross from Left to Right Brain: Adaptive Text Dreamer for Vision-and-Language Navigation | Code | 1
Complex Logical Reasoning over Knowledge Graphs using Large Language Models | Code | 1
AdaLoGN: Adaptive Logic Graph Network for Reasoning-Based Machine Reading Comprehension | Code | 1
DetermLR: Augmenting LLM-based Logical Reasoning from Indeterminacy to Determinacy | Code | 1
Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration | Code | 1
Modeling Hierarchical Reasoning Chains by Linking Discourse Units and Key Phrases for Reading Comprehension | Code | 1
GLoRE: Evaluating Logical Reasoning of Large Language Models | Code | 1
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models | Code | 1
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation | Code | 1
HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models | Code | 1
COLLIE: Systematic Construction of Constrained Text Generation Tasks | Code | 1
NQE: N-ary Query Embedding for Complex Query Answering over Hyper-Relational Knowledge Graphs | Code | 1
End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking | Code | 1
ElecBench: a Power Dispatch Evaluation Benchmark for Large Language Models | Code | 1
ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression | Code | 1
Domain Specific Question Answering Over Knowledge Graphs Using Logical Programming and Large Language Models | Code | 1
A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners | Code | 1
Explicit Planning Helps Language Models in Logical Reasoning | Code | 1
Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs | Code | 1
IDOL: Indicator-oriented Logic Pre-training for Logical Reasoning | Code | 1
Do PLMs Know and Understand Ontological Knowledge? | Code | 1
Measuring Systematic Generalization in Neural Proof Generation with Transformers | Code | 1
Quantum Embedding of Knowledge for Reasoning | Code | 1
AI Descartes: Combining Data and Theory for Derivable Scientific Discovery | Code | 1
Natural Language Inference in Context -- Investigating Contextual Reasoning over Long Texts | Code | 1
Neuro-symbolic Learning Yielding Logical Constraints | Code | 1
Enhancing Multilingual Language Model with Massive Multilingual Knowledge Triples | Code | 1
R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning | Code | 1
SIRE: Separate Intra- and Inter-sentential Reasoning for Document-level Relation Extraction | Code | 1
Classifying Conspiratorial Narratives At Scale: False Alarms and Erroneous Connections | Code | 0
Assessing the Alignment of FOL Closeness Metrics with Human Judgement | Code | 0
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models | Code | 0
ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding | Code | 0
A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning | Code | 0
LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning | Code | 0
LogiGAN: Learning Logical Reasoning via Adversarial Pre-training | Code | 0
Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks | Code | 0
Assessing Logical Reasoning Capabilities of Encoder-Only Transformer Models | Code | 0
Assessing Logical Puzzle Solving in Large Language Models: Insights from a Minesweeper Case Study | Code | 0
Aligning Knowledge Graphs Provided by Humans and Generated from Neural Networks in Specific Tasks | Code | 0
Exploring Self-supervised Logic-enhanced Training for Large Language Models | Code | 0
LogiQA 2.0—An Improved Dataset for Logical Reasoning in Natural Language Understanding | Code | 0
Logical Reasoning with Span-Level Predictions for Interpretable and Robust NLI Models | Code | 0
Logical Reasoning over Natural Language as Knowledge Representation: A Survey | Code | 0
Logical Tasks for Measuring Extrapolation and Rule Comprehension | Code | 0
A Closer Look at Logical Reasoning with LLMs: The Choice of Tool Matters | Code | 0
Page 4 of 15

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Claude Opus | Delta_NoContext | 28.8 | - | Unverified
2 | GPT-4o | Delta_NoContext | 25.1 | - | Unverified
3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 | - | Unverified
4 | GPT-4 | Delta_NoContext | 21.5 | - | Unverified
5 | Command R+ | Delta_NoContext | 11.6 | - | Unverified
6 | GPT-3.5 | Delta_NoContext | 11.2 | - | Unverified
7 | Mixtral 8x7B | Delta_NoContext | 6.4 | - | Unverified
8 | Llama 3 8B | Delta_NoContext | 4.9 | - | Unverified
9 | Llama 3 70B | Delta_NoContext | 2.9 | - | Unverified
10 | Gemma 7B | Delta_NoContext | 2.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 | - | Unverified
2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 | - | Unverified
3 | OPT 66B (few-shot, k=3) | Accuracy | 54 | - | Unverified
4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 | - | Unverified
5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 | - | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 | - | Unverified
7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | - | Unverified
8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 | - | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 | - | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 | - | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | - | Unverified
4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 | - | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 | - | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 | - | Unverified
7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 | - | Unverified
8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 | - | Unverified
9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | - | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 | - | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | - | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 | - | Unverified
5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 | - | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | - | Unverified
7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 | - | Unverified
8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 | - | Unverified
9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 | - | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 | - | Unverified
3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 | - | Unverified
4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | - | Unverified
5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 | - | Unverified
6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 | - | Unverified
7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 | - | Unverified
8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 | - | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | - | Unverified
2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 | - | Unverified
3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 | - | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 | - | Unverified
2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | - | Unverified
3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 | - | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Human benchmark | Accuracy | 83.7 | - | Unverified
2 | RuGPT-3 Large | Accuracy | 40.7 | - | Unverified
3 | RuGPT-3 Medium | Accuracy | 38 | - | Unverified
4 | RuGPT-3 Small | Accuracy | 34 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Human benchmark | Accuracy | 87 | - | Unverified
2 | RuGPT-3 Small | Accuracy | 57.9 | - | Unverified
3 | RuGPT-3 Medium | Accuracy | 57.2 | - | Unverified
4 | RuGPT-3 Large | Accuracy | 55.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | - | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 | - | Unverified