SOTAVerified

Logical Reasoning

Papers

Showing 601–650 of 747 papers

Title | Status | Hype
Instantiation-based Formalization of Logical Reasoning Tasks using Language Models and Logical Solvers | | 0
Interactive Visual Assessment for Text-to-Image Generation Models | | 0
Interleaved Reasoning for Large Language Models via Reinforcement Learning | | 0
Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation | | 0
Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? | | 0
Is ChatGPT a Good Personality Recognizer? A Preliminary Study | | 0
Is writing style predictive of scientific fraud? | | 0
Is writing style predictive of scientific fraud? | | 0
JAMES: Normalizing Job Titles with Multi-Aspect Graph Embeddings and Reasoning | | 0
Join-Chain Network: A Logical Reasoning View of the Multi-head Attention in Transformer | | 0
TAR: Neural Logical Reasoning across TBox and ABox | | 0
Judgment of Thoughts: Courtroom of the Binary Logical Reasoning in Large Language Models | | 0
KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models | | 0
KGCompiler: Deep Learning Compilation Optimization for Knowledge Graph Complex Logical Query Answering | | 0
KnowGraph: Knowledge-Enabled Anomaly Detection via Logical Reasoning on Graph Data | | 0
Knowledge Authoring for Rules and Actions | | 0
Knowledge Authoring with Factual English, Rules, and Actions | | 0
Knowledge-based Reasoning and Learning under Partial Observability in Ad Hoc Teamwork | | 0
Knowledge Informed Semantic Parsing for Conversational Question Answering | | 0
KnowRA: Knowledge Retrieval Augmented Method for Document-level Relation Extraction with Comprehensive Reasoning Abilities | | 0
LAD-Reasoner: Tiny Multimodal Models are Good Reasoners for Logical Anomaly Detection | | 0
LAMBADA: Backward Chaining for Automated Reasoning in Natural Language | | 0
LaMOuR: Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement Learning | | 0
Mathematical Reasoning via Self-supervised Skip-tree Training | | 0
Language Models can be Logical Solvers | | 0
Language to Rewards for Robotic Skill Synthesis | | 0
Large Language Model Enhanced Multi-Agent Systems for 6G Communications | | 0
Large Language Models are Complex Table Parsers | | 0
Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning | | 0
Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence | | 0
Large Language Models (LLMs) as Traffic Control Systems at Urban Intersections: A New Paradigm | | 0
Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions | | 0
Large Language User Interfaces: Voice Interactive User Interfaces powered by LLMs | | 0
Latent Feature Mining for Predictive Model Enhancement with Large Language Models | | 0
LeafAI: query generator for clinical cohort discovery rivaling a human programmer | | 0
Learning Distributed Word Representations for Natural Logic Reasoning | | 0
Learning Guided Automated Reasoning: A Brief Survey | | 0
Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing | | 0
Learning Reliable Logical Rules with SATNet | | 0
Learning Syllogism with Euler Neural-Networks | | 0
Learning Symbolic Persistent Macro-Actions for POMDP Solving Over Time | | 0
"Let's Argue Both Sides": Argument Generation Can Force Small Models to Utilize Previously Inaccessible Reasoning Capabilities | | 0
Let's Reinforce Step by Step | | 0
Leveraging Large Language Models with Chain-of-Thought and Prompt Engineering for Traffic Crash Severity Analysis and Inference | | 0
LGR2: Language Guided Reward Relabeling for Accelerating Hierarchical Reinforcement Learning | | 0
Lifelong Personalized Low-Rank Adaptation of Large Language Models for Recommendation | | 0
LLM-Aided Efficient Hardware Design Automation | | 0
LLM-ARC: Enhancing LLMs with an Automated Reasoning Critic | | 0
LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments | | 0
LLMI3D: Empowering LLM with 3D Perception from a Single 2D Image | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Claude Opus | Delta_NoContext | 28.8 | | Unverified
2 | GPT-4o | Delta_NoContext | 25.1 | | Unverified
3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 | | Unverified
4 | GPT-4 | Delta_NoContext | 21.5 | | Unverified
5 | Command R+ | Delta_NoContext | 11.6 | | Unverified
6 | GPT-3.5 | Delta_NoContext | 11.2 | | Unverified
7 | Mixtral 8x7B | Delta_NoContext | 6.4 | | Unverified
8 | Llama 3 8B | Delta_NoContext | 4.9 | | Unverified
9 | Llama 3 70B | Delta_NoContext | 2.9 | | Unverified
10 | Gemma 7B | Delta_NoContext | 2.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 | | Unverified
2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 | | Unverified
3 | OPT 66B (few-shot, k=3) | Accuracy | 54 | | Unverified
4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 | | Unverified
5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 | | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 | | Unverified
7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | | Unverified
8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 | | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 | | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 | | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | | Unverified
4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 | | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 | | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 | | Unverified
7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 | | Unverified
8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 | | Unverified
9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 | | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 | | Unverified
5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 | | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified
7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 | | Unverified
8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 | | Unverified
9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 | | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 | | Unverified
3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 | | Unverified
4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified
5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 | | Unverified
6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 | | Unverified
7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 | | Unverified
8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 | | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | | Unverified
2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 | | Unverified
3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 | | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 | | Unverified
2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | | Unverified
3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 | | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Human benchmark | Accuracy | 83.7 | | Unverified
2 | RuGPT-3 Large | Accuracy | 40.7 | | Unverified
3 | RuGPT-3 Medium | Accuracy | 38 | | Unverified
4 | RuGPT-3 Small | Accuracy | 34 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Human benchmark | Accuracy | 87 | | Unverified
2 | RuGPT-3 Small | Accuracy | 57.9 | | Unverified
3 | RuGPT-3 Medium | Accuracy | 57.2 | | Unverified
4 | RuGPT-3 Large | Accuracy | 55.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 | | Unverified