SOTAVerified

Logical Reasoning

Papers

Showing 501-550 of 747 papers

| Title | Status | Hype |
| --- | --- | --- |
| Compositional Distributional Cognition | | 0 |
| Consistent CCG Parsing over Multiple Sentences for Improved Logical Reasoning | | 0 |
| Context-Awareness and Interpretability of Rare Occurrences for Discovery and Formalization of Critical Failure Modes | | 0 |
| Continuous Chain of Thought Enables Parallel Exploration and Reasoning | | 0 |
| Controlled Natural Languages and Default Reasoning | | 0 |
| COOL: A Constraint Object-Oriented Logic Programming Language and its Neural-Symbolic Compilation System | | 0 |
| Counterfactual Collaborative Reasoning | | 0 |
| CP-Router: An Uncertainty-Aware Router Between LLM and LRM | | 0 |
| Curriculum Abductive Learning | | 0 |
| d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning | | 0 |
| Data Science with Vadalog: Bridging Machine Learning and Reasoning | | 0 |
| DB-Explore: Automated Database Exploration and Instruction Synthesis for Text-to-SQL | | 0 |
| DBRouting: Routing End User Queries to Databases for Answerability | | 0 |
| Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs | | 0 |
| Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games | | 0 |
| Deduction under Perturbed Evidence: Probing Student Simulation Capabilities of Large Language Models | | 0 |
| De-fine: Decomposing and Refining Visual Programs with Auto-Feedback | | 0 |
| Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model | | 0 |
| DetectGPT-SC: Improving Detection of Text Generated by Large Language Models through Self-Consistency with Masked Predictions | | 0 |
| Detection-based Intermediate Supervision for Visual Question Answering | | 0 |
| Diagnosing the First-Order Logical Reasoning Ability Through LogicNLI | | 0 |
| Dialogue-based Explanations for Logical Reasoning using Structured Argumentation | | 0 |
| Discourse-Aware Graph Networks for Textual Logical Reasoning | | 0 |
| Discrete JEPA: Learning Discrete Token Representations without Reconstruction | | 0 |
| Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning | | 0 |
| DMWM: Dual-Mind World Model with Long-Term Imagination | | 0 |
| Does Entity Abstraction Help Generative Transformers Reason? | | 0 |
| Do Large Language Models Mirror Cognitive Language Processing? | | 0 |
| Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology | | 0 |
| Do Large Language Models Understand Logic or Just Mimick Context? | | 0 |
| Dynamic In-Context Learning from Nearest Neighbors for Bundle Generation | | 0 |
| DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks | | 0 |
| Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack | | 0 |
| Efficient Training and Inference of Hypergraph Reasoning Networks | | 0 |
| Emergent Symbols through Binding in External Memory | | 0 |
| Emotion Recognition in Conversation using Probabilistic Soft Logic | | 0 |
| Empowering LLMs with Logical Reasoning: A Comprehensive Survey | | 0 |
| Enhanced User Interaction in Operating Systems through Machine Learning Language Models | | 0 |
| Enhancing Large Language Model Efficiency via Symbolic Compression: A Formal Approach Towards Interpretability | | 0 |
| Enhancing Logical Reasoning in Large Language Models to Facilitate Legal Applications | | 0 |
| Enhancing Neural Mathematical Reasoning by Abductive Combination with Symbolic Library | | 0 |
| Enhancing Retrieval Systems with Inference-Time Logical Reasoning | | 0 |
| Enhancing Transformers for Generalizable First-Order Logical Entailment | | 0 |
| Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles | | 0 |
| Evaluating Large Language Models with NeuBAROCO: Syllogistic Reasoning Ability and Human-like Biases | | 0 |
| Evaluating the Potential of Leading Large Language Models in Reasoning Biology Questions | | 0 |
| Evident: a Development Methodology and a Knowledge Base Topology for Data Mining, Machine Learning and General Knowledge Management | | 0 |
| Explainability Is in the Mind of the Beholder: Establishing the Foundations of Explainable Artificial Intelligence | | 0 |
| Explicitly Encoding Structural Symmetry is Key to Length Generalization in Arithmetic Tasks | | 0 |
| Exploiting LLMs' Reasoning Capability to Infer Implicit Concepts in Legal Information Retrieval | | 0 |
Page 11 of 15

Benchmark Results

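| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Claude Opus | Delta_NoContext | 28.8 | | Unverified |
| 2 | GPT-4o | Delta_NoContext | 25.1 | | Unverified |
| 3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 | | Unverified |
| 4 | GPT-4 | Delta_NoContext | 21.5 | | Unverified |
| 5 | Command R+ | Delta_NoContext | 11.6 | | Unverified |
| 6 | GPT-3.5 | Delta_NoContext | 11.2 | | Unverified |
| 7 | Mixtral 8x7B | Delta_NoContext | 6.4 | | Unverified |
| 8 | Llama 3 8B | Delta_NoContext | 4.9 | | Unverified |
| 9 | Llama 3 70B | Delta_NoContext | 2.9 | | Unverified |
| 10 | Gemma 7B | Delta_NoContext | 2.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 | | Unverified |
| 3 | OPT 66B (few-shot, k=3) | Accuracy | 54 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 | | Unverified |
| 5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | | Unverified |
| 8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 | | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 | | Unverified |
| 9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 | | Unverified |
| 5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 | | Unverified |
| 8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 | | Unverified |
| 9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 | | Unverified |
| 3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 | | Unverified |
| 4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 | | Unverified |
| 6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 | | Unverified |
| 7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | | Unverified |
| 2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 | | Unverified |
| 2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Human benchmark | Accuracy | 83.7 | | Unverified |
| 2 | RuGPT-3 Large | Accuracy | 40.7 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 38 | | Unverified |
| 4 | RuGPT-3 Small | Accuracy | 34 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Human benchmark | Accuracy | 87 | | Unverified |
| 2 | RuGPT-3 Small | Accuracy | 57.9 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 57.2 | | Unverified |
| 4 | RuGPT-3 Large | Accuracy | 55.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 | | Unverified |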