SOTAVerified

Logical Reasoning

Papers

Showing 201–250 of 747 papers

| Title | Status | Hype |
| --- | --- | --- |
| Neural Software Analysis | Code | 0 |
| Evaluating Creativity and Deception in Large Language Models: A Simulation Framework for Multi-Agent Balderdash | Code | 0 |
| Evaluating ChatGPT-4 Vision on Brazil's National Undergraduate Computer Science Exam | Code | 0 |
| MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Code | 0 |
| Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models | Code | 0 |
| MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension | Code | 0 |
| A Closer Look at Logical Reasoning with LLMs: The Choice of Tool Matters | Code | 0 |
| Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data | Code | 0 |
| Meta-Reasoning: Semantics-Symbol Deconstruction for Large Language Models | Code | 0 |
| Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework | Code | 0 |
| MedLogic-AQA: Enhancing Medical Question Answering with Abstractive Models Focusing on Logical Structures | Code | 0 |
| Can recursive neural tensor networks learn logical reasoning? | Code | 0 |
| Improving Certified Robustness via Statistical Learning with Logical Reasoning | Code | 0 |
| Empower Nested Boolean Logic via Self-Supervised Curriculum Learning | Code | 0 |
| MetaLogic: Logical Reasoning Explanations with Fine-Grained Structure | Code | 0 |
| Empowering Few-Shot Recommender Systems with Large Language Models -- Enhanced Representations | Code | 0 |
| LogiQA 2.0—An Improved Dataset for Logical Reasoning in Natural Language Understanding | Code | 0 |
| LogiGAN: Learning Logical Reasoning via Adversarial Pre-training | Code | 0 |
| LR-IAD: Mask-Free Industrial Anomaly Detection with Logical Reasoning | Code | 0 |
| LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning | Code | 0 |
| Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models | Code | 0 |
| EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning | Code | 0 |
| LR-XFL: Logical Reasoning-based Explainable Federated Learning | Code | 0 |
| DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks | Code | 0 |
| Are LLMs Reliable Translators of Logical Reasoning Across Lexically Diversified Contexts? | Code | 0 |
| Bridging Machine Learning and Logical Reasoning by Abductive Learning | Code | 0 |
| Breaking the Language Barrier: Improving Cross-Lingual Reasoning with Structured Self-Attention | Code | 0 |
| Dual Thinking and Logical Processing -- Are Multi-modal Large Language Models Closing the Gap with Human Vision ? | Code | 0 |
| Exploring Self-supervised Logic-enhanced Training for Large Language Models | Code | 0 |
| Double Equivariance for Inductive Link Prediction for Both New Nodes and New Relation Types | Code | 0 |
| POE: Process of Elimination for Multiple Choice Reasoning | Code | 0 |
| Atomic Inference for NLI with Generated Facts as Atoms | Code | 0 |
| Logical Reasoning over Natural Language as Knowledge Representation: A Survey | Code | 0 |
| Logical Reasoning with Span-Level Predictions for Interpretable and Robust NLI Models | Code | 0 |
| Document-level Biomedical Relation Extraction Based on Multi-Dimensional Fusion Information and Multi-Granularity Logical Reasoning | Code | 0 |
| BloombergGPT: A Large Language Model for Finance | Code | 0 |
| Logical Tasks for Measuring Extrapolation and Rule Comprehension | Code | 0 |
| Exploring Reasoning Biases in Large Language Models Through Syllogism: Insights from the NeuBAROCO Dataset | Code | 0 |
| Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study | Code | 0 |
| Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities | Code | 0 |
| Large Language Models are Limited in Out-of-Context Knowledge Reasoning | Code | 0 |
| Leveraging LLMs for Hypothetical Deduction in Logical Inference: A Neuro-Symbolic Approach | Code | 0 |
| Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models | Code | 0 |
| Learning Symmetric Rules with SATNet | Code | 0 |
| Learning for Long-Horizon Planning via Neuro-Symbolic Abductive Imitation | Code | 0 |
| Learning the meanings of function words from grounded language using a visual question answering model | Code | 0 |
| Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? | Code | 0 |
| LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages | Code | 0 |
| DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Code | 0 |
| DeepLogic: Towards End-to-End Differentiable Logical Reasoning | Code | 0 |

Benchmark Results

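| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Claude Opus | Delta_NoContext | 28.8 | | Unverified |
| 2 | GPT-4o | Delta_NoContext | 25.1 | | Unverified |
| 3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 | | Unverified |
| 4 | GPT-4 | Delta_NoContext | 21.5 | | Unverified |
| 5 | Command R+ | Delta_NoContext | 11.6 | | Unverified |
| 6 | GPT-3.5 | Delta_NoContext | 11.2 | | Unverified |
| 7 | Mixtral 8x7B | Delta_NoContext | 6.4 | | Unverified |
| 8 | Llama 3 8B | Delta_NoContext | 4.9 | | Unverified |
| 9 | Llama 3 70B | Delta_NoContext | 2.9 | | Unverified |
| 10 | Gemma 7B | Delta_NoContext | 2.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 | | Unverified |
| 3 | OPT 66B (few-shot, k=3) | Accuracy | 54 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 | | Unverified |
| 5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | | Unverified |
| 8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 | | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 | | Unverified |
| 9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 | | Unverified |
| 5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 | | Unverified |
| 8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 | | Unverified |
| 9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 | | Unverified |
| 3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 | | Unverified |
| 4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 | | Unverified |
| 6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 | | Unverified |
| 7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | | Unverified |
| 2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 | | Unverified |
| 2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Human benchmark | Accuracy | 83.7 | | Unverified |
| 2 | RuGPT-3 Large | Accuracy | 40.7 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 38 | | Unverified |
| 4 | RuGPT-3 Small | Accuracy | 34 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Human benchmark | Accuracy | 87 | | Unverified |
| 2 | RuGPT-3 Small | Accuracy | 57.9 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 57.2 | | Unverified |
| 4 | RuGPT-3 Large | Accuracy | 55.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 | | Unverified |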