SOTAVerified

Logical Reasoning

Papers

Showing 251-300 of 747 papers

Title | Status | Hype
MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension | Code | 0
FLEX: Feature-Logic Embedding Framework for CompleX Knowledge Graph Reasoning | Code | 0
Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models | Code | 0
Noisy Exemplars Make Large Language Models More Robust: A Domain-Agnostic Behavioral Analysis | Code | 0
DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Code | 0
MedLogic-AQA: Enhancing Medical Question Answering with Abstractive Models Focusing on Logical Structures | Code | 0
DeepLogic: Towards End-to-End Differentiable Logical Reasoning | Code | 0
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers | Code | 0
A Dataset and Architecture for Visual Reasoning with a Working Memory | Code | 0
MetaLogic: Logical Reasoning Explanations with Fine-Grained Structure | Code | 0
Declarative Question Answering over Knowledge Bases containing Natural Language Text with Answer Set Programming | Code | 0
Deciphering Raw Data in Neuro-Symbolic Learning with Provable Guarantees | Code | 0
Adaptive Rectification Sampling for Test-Time Compute Scaling | Code | 0
LR-IAD: Mask-Free Industrial Anomaly Detection with Logical Reasoning | Code | 0
BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data | Code | 0
LR-XFL: Logical Reasoning-based Explainable Federated Learning | Code | 0
LogiGAN: Learning Logical Reasoning via Adversarial Pre-training | Code | 0
LogiQA 2.0—An Improved Dataset for Logical Reasoning in Natural Language Understanding | Code | 0
Meta-Reasoning: Semantics-Symbol Deconstruction for Large Language Models | Code | 0
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models | Code | 0
CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs | Code | 0
Exploring Self-supervised Logic-enhanced Training for Large Language Models | Code | 0
Counterfactual Adversarial Learning with Representation Interpolation | Code | 0
Logical Tasks for Measuring Extrapolation and Rule Comprehension | Code | 0
Logical Reasoning over Natural Language as Knowledge Representation: A Survey | Code | 0
A Neural-Symbolic Approach to Natural Language Understanding | Code | 0
How susceptible are LLMs to Logical Fallacies? | Code | 0
Logical Reasoning with Span-Level Predictions for Interpretable and Robust NLI Models | Code | 0
LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages | Code | 0
HLM-Cite: Hybrid Language Model Workflow for Text-based Scientific Citation Prediction | Code | 0
Context Transformer with Stacked Pointer Networks for Conversational Question Answering over Knowledge Graphs | Code | 0
Large Language Models are Limited in Out-of-Context Knowledge Reasoning | Code | 0
Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? | Code | 0
Leveraging LLMs for Hypothetical Deduction in Logical Inference: A Neuro-Symbolic Approach | Code | 0
LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning | Code | 0
Learning for Long-Horizon Planning via Neuro-Symbolic Abductive Imitation | Code | 0
Learning Symmetric Rules with SATNet | Code | 0
Learning the meanings of function words from grounded language using a visual question answering model | Code | 0
A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text | Code | 0
Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models | Code | 0
Conditional Logical Message Passing Transformer for Complex Query Answering | Code | 0
Semantic RL with Action Grammars: Data-Efficient Learning of Hierarchical Task Abstractions | Code | 0
GOTaxon: Representing the evolution of biological functions in the Gene Ontology | Code | 0
Hint-before-Solving Prompting: Guiding LLMs to Effectively Utilize Encoded Knowledge | Code | 0
Large Language Models Are Cross-Lingual Knowledge-Free Reasoners | Code | 0
Language Model Guided Interpretable Video Action Reasoning | Code | 0
Generating Programmatic Referring Expressions via Program Synthesis | Code | 0
Atomic Inference for NLI with Generated Facts as Atoms | Code | 0
Language models show human-like content effects on reasoning tasks | Code | 0
Generating by Understanding: Neural Visual Generation with Logical Symbol Groundings | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Claude Opus | Delta_NoContext | 28.8 |  | Unverified
2 | GPT-4o | Delta_NoContext | 25.1 |  | Unverified
3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 |  | Unverified
4 | GPT-4 | Delta_NoContext | 21.5 |  | Unverified
5 | Command R+ | Delta_NoContext | 11.6 |  | Unverified
6 | GPT-3.5 | Delta_NoContext | 11.2 |  | Unverified
7 | Mixtral 8x7B | Delta_NoContext | 6.4 |  | Unverified
8 | Llama 3 8B | Delta_NoContext | 4.9 |  | Unverified
9 | Llama 3 70B | Delta_NoContext | 2.9 |  | Unverified
10 | Gemma 7B | Delta_NoContext | 2.2 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 |  | Unverified
2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 |  | Unverified
3 | OPT 66B (few-shot, k=3) | Accuracy | 54 |  | Unverified
4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 |  | Unverified
5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 |  | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 |  | Unverified
7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 |  | Unverified
8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 |  | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 |  | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 |  | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 |  | Unverified
4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 |  | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 |  | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 |  | Unverified
7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 |  | Unverified
8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 |  | Unverified
9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 |  | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 |  | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 |  | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 |  | Unverified
5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 |  | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 |  | Unverified
7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 |  | Unverified
8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 |  | Unverified
9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 |  | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 |  | Unverified
3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 |  | Unverified
4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 |  | Unverified
5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 |  | Unverified
6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 |  | Unverified
7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 |  | Unverified
8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 |  | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 |  | Unverified
2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 |  | Unverified
3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 |  | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 |  | Unverified
2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 |  | Unverified
3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 |  | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Human benchmark | Accuracy | 83.7 |  | Unverified
2 | RuGPT-3 Large | Accuracy | 40.7 |  | Unverified
3 | RuGPT-3 Medium | Accuracy | 38 |  | Unverified
4 | RuGPT-3 Small | Accuracy | 34 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Human benchmark | Accuracy | 87 |  | Unverified
2 | RuGPT-3 Small | Accuracy | 57.9 |  | Unverified
3 | RuGPT-3 Medium | Accuracy | 57.2 |  | Unverified
4 | RuGPT-3 Large | Accuracy | 55.5 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 |  | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 |  | Unverified