SOTAVerified

Logical Reasoning

Papers

Showing 301-350 of 747 papers

| Title | Status | Hype |
| --- | --- | --- |
| Hint-before-Solving Prompting: Guiding LLMs to Effectively Utilize Encoded Knowledge | Code | 0 |
| HLM-Cite: Hybrid Language Model Workflow for Text-based Scientific Citation Prediction | Code | 0 |
| LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning | Code | 0 |
| Context Transformer with Stacked Pointer Networks for Conversational Question Answering over Knowledge Graphs | Code | 0 |
| Exploring Self-supervised Logic-enhanced Training for Large Language Models | Code | 0 |
| Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models | Code | 0 |
| Logical Reasoning with Span-Level Predictions for Interpretable and Robust NLI Models | Code | 0 |
| Logical Tasks for Measuring Extrapolation and Rule Comprehension | Code | 0 |
| Atomic Inference for NLI with Generated Facts as Atoms | Code | 0 |
| Logical Reasoning over Natural Language as Knowledge Representation: A Survey | Code | 0 |
| A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text | Code | 0 |
| Conditional Logical Message Passing Transformer for Complex Query Answering | Code | 0 |
| LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages | Code | 0 |
| Semantic RL with Action Grammars: Data-Efficient Learning of Hierarchical Task Abstractions | Code | 0 |
| GOTaxon: Representing the evolution of biological functions in the Gene Ontology | Code | 0 |
| Large Language Models are Limited in Out-of-Context Knowledge Reasoning | Code | 0 |
| Leveraging LLMs for Hypothetical Deduction in Logical Inference: A Neuro-Symbolic Approach | Code | 0 |
| Generating Programmatic Referring Expressions via Program Synthesis | Code | 0 |
| Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? | Code | 0 |
| Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models | Code | 0 |
| Generating by Understanding: Neural Visual Generation with Logical Symbol Groundings | Code | 0 |
| GammaE: Gamma Embeddings for Logical Queries on Knowledge Graphs | Code | 0 |
| Learning the meanings of function words from grounded language using a visual question answering model | Code | 0 |
| Learning Symmetric Rules with SATNet | Code | 0 |
| Learning for Long-Horizon Planning via Neuro-Symbolic Abductive Imitation | Code | 0 |
| A Structured Unplugged Approach for Foundational AI Literacy in Primary Education | Code | 0 |
| From Babbling to Fluency: Evaluating the Evolution of Language Models in Terms of Human Language Acquisition | Code | 0 |
| Investigating the Robustness of Natural Language Generation from Logical Forms via Counterfactual Samples | Code | 0 |
| Language Model Guided Interpretable Video Action Reasoning | Code | 0 |
| Climate Finance Bench | Code | 0 |
| Language models show human-like content effects on reasoning tasks | Code | 0 |
| Adaptive Rectification Sampling for Test-Time Compute Scaling | Code | 0 |
| Large Language Models Are Cross-Lingual Knowledge-Free Reasoners | Code | 0 |
| One-Step Abductive Multi-Target Learning with Diverse Noisy Samples and Its Application to Tumour Segmentation for Breast Cancer | Code | 0 |
| FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts | | 0 |
| City-LEO: Toward Transparent City Management Using LLM with End-to-End Optimization | | 0 |
| First Experiments with a Flexible Infrastructure for Normative Reasoning | | 0 |
| Few-shot Visual Reasoning with Meta-analogical Contrastive Learning | | 0 |
| FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models | | 0 |
| Federated In-Context LLM Agent Learning | | 0 |
| ChatGPT is a Remarkable Tool -- For Experts | | 0 |
| Assessing the Reasoning Abilities of ChatGPT in the Context of Claim Verification | | 0 |
| Federated Neural Graph Databases | | 0 |
| FaiRR: Faithful and Robust Deductive Reasoning over Natural Language | | 0 |
| ChatABL: Abductive Learning via Natural Language Interaction with ChatGPT | | 0 |
| Assessing Step-by-Step Reasoning against Lexical Negation: A Case Study on Syllogism | | 0 |
| Extending Automated Deduction for Commonsense Reasoning | | 0 |
| Assessing SATNet's Ability to Solve the Symbol Grounding Problem | | 0 |
| A Mousetrap: Fooling Large Reasoning Models for Jailbreak with Chain of Iterative Chaos | | 0 |
| Exploring Generalization Ability of Pretrained Language Models on Arithmetic and Logical Reasoning | | 0 |
Page 7 of 15

Benchmark Results

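| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Claude Opus | Delta_NoContext | 28.8 | | Unverified |
| 2 | GPT-4o | Delta_NoContext | 25.1 | | Unverified |
| 3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 | | Unverified |
| 4 | GPT-4 | Delta_NoContext | 21.5 | | Unverified |
| 5 | Command R+ | Delta_NoContext | 11.6 | | Unverified |
| 6 | GPT-3.5 | Delta_NoContext | 11.2 | | Unverified |
| 7 | Mixtral 8x7B | Delta_NoContext | 6.4 | | Unverified |
| 8 | Llama 3 8B | Delta_NoContext | 4.9 | | Unverified |
| 9 | Llama 3 70B | Delta_NoContext | 2.9 | | Unverified |
| 10 | Gemma 7B | Delta_NoContext | 2.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 | | Unverified |
| 3 | OPT 66B (few-shot, k=3) | Accuracy | 54 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 | | Unverified |
| 5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | | Unverified |
| 8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 | | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 | | Unverified |
| 9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 | | Unverified |
| 5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 | | Unverified |
| 8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 | | Unverified |
| 9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 | | Unverified |
| 3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 | | Unverified |
| 4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 | | Unverified |
| 6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 | | Unverified |
| 7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | | Unverified |
| 2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 | | Unverified |
| 2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Human benchmark | Accuracy | 83.7 | | Unverified |
| 2 | RuGPT-3 Large | Accuracy | 40.7 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 38 | | Unverified |
| 4 | RuGPT-3 Small | Accuracy | 34 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Human benchmark | Accuracy | 87 | | Unverified |
| 2 | RuGPT-3 Small | Accuracy | 57.9 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 57.2 | | Unverified |
| 4 | RuGPT-3 Large | Accuracy | 55.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 | | Unverified |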