SOTAVerified

Logical Reasoning

Papers

Showing 601–650 of 747 papers

Title | Status | Hype
Multimodal-to-Text Prompt Engineering in Large Language Models Using Feature Embeddings for GNSS Interference Characterization | | 0
Exploring Self-supervised Logic-enhanced Training for Large Language Models | Code | 0
Logical Tasks for Measuring Extrapolation and Rule Comprehension | Code | 0
Logical Reasoning with Span-Level Predictions for Interpretable and Robust NLI Models | Code | 0
Logical Reasoning over Natural Language as Knowledge Representation: A Survey | Code | 0
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models | Code | 0
Atomic Inference for NLI with Generated Facts as Atoms | Code | 0
Bridging Machine Learning and Logical Reasoning by Abductive Learning | Code | 0
LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning | Code | 0
LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages | Code | 0
Synthesizing a Progression of Subtasks for Block-Based Visual Programming Tasks | Code | 0
Large Language Models are Limited in Out-of-Context Knowledge Reasoning | Code | 0
Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models | Code | 0
Understanding Inter-Session Intentions via Complex Logical Reasoning | Code | 0
LogiQA 2.0—An Improved Dataset for Logical Reasoning in Natural Language Understanding | Code | 0
Leveraging LLMs for Hypothetical Deduction in Logical Inference: A Neuro-Symbolic Approach | Code | 0
Which Programming Language and What Features at Pre-training Stage Affect Downstream Logical Inference Performance? | Code | 0
Towards High-Order Complementary Recommendation via Logical Reasoning Network | Code | 0
LR-IAD:Mask-Free Industrial Anomaly Detection with Logical Reasoning | Code | 0
LR-XFL: Logical Reasoning-based Explainable Federated Learning | Code | 0
Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? | Code | 0
Learning the meanings of function words from grounded language using a visual question answering model | Code | 0
Table-based Fact Verification with Self-adaptive Mixture of Experts | Code | 0
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers | Code | 0
Tackling Universal Properties of Minimal Trap Spaces of Boolean Networks | Code | 0
A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning | Code | 0
TAPE: Assessing Few-shot Russian Language Understanding | Code | 0
Declarative Question Answering over Knowledge Bases containing Natural Language Text with Answer Set Programming | Code | 0
Learning Symmetric Rules with SATNet | Code | 0
Learning for Long-Horizon Planning via Neuro-Symbolic Abductive Imitation | Code | 0
Large Language Models Are Cross-Lingual Knowledge-Free Reasoners | Code | 0
Teaching Probabilistic Logical Reasoning to Transformers | Code | 0
Reasoning with Transformer-based Models: Deep Learning, but Shallow Reasoning | Code | 0
MedLogic-AQA: Enhancing Medical Question Answering with Abstractive Models Focusing on Logical Structures | Code | 0
Deciphering Raw Data in Neuro-Symbolic Learning with Provable Guarantees | Code | 0
Language models show human-like content effects on reasoning tasks | Code | 0
MetaLogic: Logical Reasoning Explanations with Fine-Grained Structure | Code | 0
Meta-Reasoning: Semantics-Symbol Deconstruction for Large Language Models | Code | 0
CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs | Code | 0
Techniques for Symbol Grounding with SATNet | Code | 0
Language Model Guided Interpretable Video Action Reasoning | Code | 0
Breaking the Language Barrier: Improving Cross-Lingual Reasoning with Structured Self-Attention | Code | 0
A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text | Code | 0
Weakly Supervised Explainable Phrasal Reasoning with Neural Fuzzy Logic | Code | 0
MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension | Code | 0
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0
Revisiting Document-Level Relation Extraction with Context-Guided Link Prediction | Code | 0
JustLogic: A Comprehensive Benchmark for Evaluating Deductive Reasoning in Large Language Models | Code | 0
Investigating the Robustness of Natural Language Generation from Logical Forms via Counterfactual Samples | Code | 0
Unlocking Temporal Question Answering for Large Language Models with Tailor-Made Reasoning Logic | Code | 0
Page 13 of 15

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Claude Opus | Delta_NoContext | 28.8 | | Unverified
2 | GPT-4o | Delta_NoContext | 25.1 | | Unverified
3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 | | Unverified
4 | GPT-4 | Delta_NoContext | 21.5 | | Unverified
5 | Command R+ | Delta_NoContext | 11.6 | | Unverified
6 | GPT-3.5 | Delta_NoContext | 11.2 | | Unverified
7 | Mixtral 8x7B | Delta_NoContext | 6.4 | | Unverified
8 | Llama 3 8B | Delta_NoContext | 4.9 | | Unverified
9 | Llama 3 70B | Delta_NoContext | 2.9 | | Unverified
10 | Gemma 7B | Delta_NoContext | 2.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 | | Unverified
2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 | | Unverified
3 | OPT 66B (few-shot, k=3) | Accuracy | 54 | | Unverified
4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 | | Unverified
5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 | | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 | | Unverified
7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | | Unverified
8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 | | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 | | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 | | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | | Unverified
4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 | | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 | | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 | | Unverified
7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 | | Unverified
8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 | | Unverified
9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 | | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 | | Unverified
5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 | | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified
7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 | | Unverified
8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 | | Unverified
9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 | | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 | | Unverified
3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 | | Unverified
4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified
5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 | | Unverified
6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 | | Unverified
7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 | | Unverified
8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 | | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | | Unverified
2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 | | Unverified
3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 | | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 | | Unverified
2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | | Unverified
3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 | | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Human benchmark | Accuracy | 83.7 | | Unverified
2 | RuGPT-3 Large | Accuracy | 40.7 | | Unverified
3 | RuGPT-3 Medium | Accuracy | 38 | | Unverified
4 | RuGPT-3 Small | Accuracy | 34 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Human benchmark | Accuracy | 87 | | Unverified
2 | RuGPT-3 Small | Accuracy | 57.9 | | Unverified
3 | RuGPT-3 Medium | Accuracy | 57.2 | | Unverified
4 | RuGPT-3 Large | Accuracy | 55.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 | | Unverified