SOTAVerified

Logical Reasoning

Papers

Showing 301-350 of 747 papers

Title | Status | Hype
Learning Syllogism with Euler Neural-Networks |  | 0
Bayesian Entailment Hypothesis: How Brains Implement Monotonic and Non-monotonic Reasoning |  | 0
TAR: Neural Logical Reasoning across TBox and ABox |  | 0
Learning Reliable Logical Rules with SATNet |  | 0
Learning Symbolic Persistent Macro-Actions for POMDP Solving Over Time |  | 0
"Let's Argue Both Sides": Argument Generation Can Force Small Models to Utilize Previously Inaccessible Reasoning Capabilities |  | 0
A New Fundamental Evidence of Non-Classical Structure in the Combination of Natural Concepts |  | 0
Balancing Exploration and Exploitation in LLM using Soft RLLF for Enhanced Negation Understanding |  | 0
Learning Distributed Word Representations for Natural Logic Reasoning |  | 0
Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games |  | 0
Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation |  | 0
An Explainable Fast Deep Neural Network for Emotion Recognition |  | 0
Learning Guided Automated Reasoning: A Brief Survey |  | 0
Axiom Learning and Belief Tracing for Transparent Decision Making in Robotics |  | 0
Interactive Visual Assessment for Text-to-Image Generation Models |  | 0
DB-Explore: Automated Database Exploration and Instruction Synthesis for Text-to-SQL |  | 0
Instantiation-based Formalization of Logical Reasoning Tasks using Language Models and Logical Solvers |  | 0
Autoregressive Image Generation Guided by Chains of Thought |  | 0
Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models |  | 0
Data Science with Vadalog: Bridging Machine Learning and Reasoning |  | 0
LeafAI: query generator for clinical cohort discovery rivaling a human programmer |  | 0
Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing |  | 0
Let's Reinforce Step by Step |  | 0
Inferring User Preferences by Probabilistic Logical Reasoning over Social Networks |  | 0
Interleaved Reasoning for Large Language Models via Reinforcement Learning |  | 0
DBRouting: Routing End User Queries to Databases for Answerability |  | 0
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning |  | 0
Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs |  | 0
Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? |  | 0
Is ChatGPT a Good Personality Recognizer? A Preliminary Study |  | 0
Is writing style predictive of scientific fraud? |  | 0
Is writing style predictive of scientific fraud? |  | 0
JAMES: Normalizing Job Titles with Multi-Aspect Graph Embeddings and Reasoning |  | 0
Join-Chain Network: A Logical Reasoning View of the Multi-head Attention in Transformer |  | 0
Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights |  | 0
Judgment of Thoughts: Courtroom of the Binary Logical Reasoning in Large Language Models |  | 0
Deduction under Perturbed Evidence: Probing Student Simulation Capabilities of Large Language Models |  | 0
KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models |  | 0
Curriculum Abductive Learning |  | 0
KnowGraph: Knowledge-Enabled Anomaly Detection via Logical Reasoning on Graph Data |  | 0
Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks |  | 0
Knowledge Authoring with Factual English, Rules, and Actions |  | 0
Automating Mathematical Proof Generation Using Large Language Model Agents and Knowledge Graphs |  | 0
Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning |  | 0
Large Language Models (LLMs) as Traffic Control Systems at Urban Intersections: A New Paradigm |  | 0
Knowledge Informed Semantic Parsing for Conversational Question Answering |  | 0
Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions |  | 0
Improving Coherence and Consistency in Neural Sequence Models with Dual-System, Neuro-Symbolic Reasoning |  | 0
LAD-Reasoner: Tiny Multimodal Models are Good Reasoners for Logical Anomaly Detection |  | 0
Automated Theorem Provers Help Improve Large Language Model Reasoning |  | 0
Page 7 of 15

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Claude Opus | Delta_NoContext | 28.8 |  | Unverified
2 | GPT-4o | Delta_NoContext | 25.1 |  | Unverified
3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 |  | Unverified
4 | GPT-4 | Delta_NoContext | 21.5 |  | Unverified
5 | Command R+ | Delta_NoContext | 11.6 |  | Unverified
6 | GPT-3.5 | Delta_NoContext | 11.2 |  | Unverified
7 | Mixtral 8x7B | Delta_NoContext | 6.4 |  | Unverified
8 | Llama 3 8B | Delta_NoContext | 4.9 |  | Unverified
9 | Llama 3 70B | Delta_NoContext | 2.9 |  | Unverified
10 | Gemma 7B | Delta_NoContext | 2.2 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 |  | Unverified
2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 |  | Unverified
3 | OPT 66B (few-shot, k=3) | Accuracy | 54 |  | Unverified
4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 |  | Unverified
5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 |  | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 |  | Unverified
7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 |  | Unverified
8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 |  | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 |  | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 |  | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 |  | Unverified
4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 |  | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 |  | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 |  | Unverified
7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 |  | Unverified
8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 |  | Unverified
9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 |  | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 |  | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 |  | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 |  | Unverified
5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 |  | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 |  | Unverified
7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 |  | Unverified
8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 |  | Unverified
9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 |  | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 |  | Unverified
3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 |  | Unverified
4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 |  | Unverified
5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 |  | Unverified
6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 |  | Unverified
7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 |  | Unverified
8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 |  | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 |  | Unverified
2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 |  | Unverified
3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 |  | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 |  | Unverified
2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 |  | Unverified
3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 |  | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Human benchmark | Accuracy | 83.7 |  | Unverified
2 | RuGPT-3 Large | Accuracy | 40.7 |  | Unverified
3 | RuGPT-3 Medium | Accuracy | 38 |  | Unverified
4 | RuGPT-3 Small | Accuracy | 34 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Human benchmark | Accuracy | 87 |  | Unverified
2 | RuGPT-3 Small | Accuracy | 57.9 |  | Unverified
3 | RuGPT-3 Medium | Accuracy | 57.2 |  | Unverified
4 | RuGPT-3 Large | Accuracy | 55.5 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 |  | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 |  | Unverified