SOTAVerified

Logical Reasoning

Papers

Showing 451–500 of 747 papers

Title | Status | Hype
Deciphering Raw Data in Neuro-Symbolic Learning with Provable Guarantees | Code | 0
How susceptible are LLMs to Logical Fallacies? | Code | 0
Evolving Scientific Discovery by Unifying Data and Background Knowledge with AI Hilbert | Code | 1
Learning the meanings of function words from grounded language using a visual question answering model | Code | 0
Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought |  | 0
Learning Deductive Reasoning from Synthetic Corpus based on Formal Logic | Code | 1
Thinking Like an Expert:Multimodal Hypergraph-of-Thought (HoT) Reasoning to boost Foundation Modals |  | 0
Cumulative Reasoning with Large Language Models | Code | 2
Structural Embeddings of Tools for Large Language Models |  | 0
COLLIE: Systematic Construction of Constrained Text Generation Tasks | Code | 1
EFO_k-CQA: Towards Knowledge Graph Complex Query Answering beyond Set Operation | Code | 1
Is ChatGPT a Good Personality Recognizer? A Preliminary Study |  | 0
What is the Title of this Paper? Solving logic puzzles using algorithms |  | 0
Meta-Reasoning: Semantics-Symbol Deconstruction for Large Language Models | Code | 0
Counterfactual Collaborative Reasoning |  | 0
Exploring & Exploiting High-Order Graph Structure for Sparse Knowledge Graph Completion |  | 0
IDOL: Indicator-oriented Logic Pre-training for Logical Reasoning | Code | 1
Evaluating Large Language Models with NeuBAROCO: Syllogistic Reasoning Ability and Human-like Biases |  | 0
Modeling Hierarchical Reasoning Chains by Linking Discourse Units and Key Phrases for Reading Comprehension | Code | 1
Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | Code | 1
Language to Rewards for Robotic Skill Synthesis |  | 0
V-LoL: A Diagnostic Dataset for Visual Logical Learning | Code | 0
Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence |  | 0
Human-in-the-Loop through Chain-of-Thought |  | 0
LogiQA 2.0—An Improved Dataset for Logical Reasoning in Natural Language Understanding | Code | 0
Deductive Verification of Chain-of-Thought Reasoning | Code | 1
Certified Deductive Reasoning with Language Models | Code | 1
ChatGPT is a Remarkable Tool -- For Experts |  | 0
Knowledge-based Reasoning and Learning under Partial Observability in Ad Hoc Teamwork |  | 0
InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion | Code | 0
Synthesizing a Progression of Subtasks for Block-Based Visual Programming Tasks | Code | 0
Counterfactual reasoning: Testing language models' understanding of hypothetical scenarios | Code | 1
Not wacky vs. definitely wacky: A study of scalar adverbs in pretrained language models |  | 0
Unlocking Temporal Question Answering for Large Language Models with Tailor-Made Reasoning Logic | Code | 0
Deduction under Perturbed Evidence: Probing Student Simulation Capabilities of Large Language Models |  | 0
Exploring Self-supervised Logic-enhanced Training for Large Language Models | Code | 0
Query Structure Modeling for Inductive Logical Reasoning Over Knowledge Graphs | Code | 0
Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization |  | 0
Atomic Inference for NLI with Generated Facts as Atoms | Code | 0
Teaching Probabilistic Logical Reasoning to Transformers | Code | 0
Abstract Meaning Representation-Based Logic-Driven Data Augmentation for Logical Reasoning | Code | 1
LogiCoT: Logical Chain-of-Thought Instruction-Tuning | Code | 1
Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning | Code | 2
Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs |  | 0
A Simple Generative Model of Logical Reasoning and Statistical Learning |  | 0
Knowledge Authoring for Rules and Actions |  | 0
Scalable Coupling of Deep Learning with Logical Reasoning | Code | 0
Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting | Code | 1
Wasserstein-Fisher-Rao Embedding: Logical Query Embeddings with Local Comparison and Global Transport | Code | 1
Improved Logical Reasoning of Language Models via Differentiable Symbolic Programming | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Claude Opus | Delta_NoContext | 28.8 |  | Unverified
2 | GPT-4o | Delta_NoContext | 25.1 |  | Unverified
3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 |  | Unverified
4 | GPT-4 | Delta_NoContext | 21.5 |  | Unverified
5 | Command R+ | Delta_NoContext | 11.6 |  | Unverified
6 | GPT-3.5 | Delta_NoContext | 11.2 |  | Unverified
7 | Mixtral 8x7B | Delta_NoContext | 6.4 |  | Unverified
8 | Llama 3 8B | Delta_NoContext | 4.9 |  | Unverified
9 | Llama 3 70B | Delta_NoContext | 2.9 |  | Unverified
10 | Gemma 7B | Delta_NoContext | 2.2 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 |  | Unverified
2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 |  | Unverified
3 | OPT 66B (few-shot, k=3) | Accuracy | 54 |  | Unverified
4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 |  | Unverified
5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 |  | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 |  | Unverified
7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 |  | Unverified
8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 |  | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 |  | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 |  | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 |  | Unverified
4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 |  | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 |  | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 |  | Unverified
7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 |  | Unverified
8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 |  | Unverified
9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 |  | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 |  | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 |  | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 |  | Unverified
5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 |  | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 |  | Unverified
7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 |  | Unverified
8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 |  | Unverified
9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 |  | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 |  | Unverified
3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 |  | Unverified
4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 |  | Unverified
5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 |  | Unverified
6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 |  | Unverified
7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 |  | Unverified
8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 |  | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 |  | Unverified
2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 |  | Unverified
3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 |  | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 |  | Unverified
2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 |  | Unverified
3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 |  | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Human benchmark | Accuracy | 83.7 |  | Unverified
2 | RuGPT-3 Large | Accuracy | 40.7 |  | Unverified
3 | RuGPT-3 Medium | Accuracy | 38 |  | Unverified
4 | RuGPT-3 Small | Accuracy | 34 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Human benchmark | Accuracy | 87 |  | Unverified
2 | RuGPT-3 Small | Accuracy | 57.9 |  | Unverified
3 | RuGPT-3 Medium | Accuracy | 57.2 |  | Unverified
4 | RuGPT-3 Large | Accuracy | 55.5 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 |  | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 |  | Unverified