SOTAVerified

Logical Reasoning

Papers

Showing 651-700 of 747 papers

| Title | Status | Hype |
| --- | --- | --- |
| RobustLR: Evaluating Robustness to Logical Perturbation in Deductive Reasoning | Code | 0 |
| Weakly Supervised Knowledge Transfer with Probabilistic Logical Reasoning for Object Detection | Code | 0 |
| MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Code | 0 |
| Counterfactual Adversarial Learning with Representation Interpolation | Code | 0 |
| Rule Learning as Machine Translation using the Atomic Knowledge Bank | Code | 0 |
| Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models | Code | 0 |
| BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data | Code | 0 |
| TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning | Code | 0 |
| SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver | Code | 0 |
| Scalable Coupling of Deep Learning with Logical Reasoning | Code | 0 |
| A Structured Unplugged Approach for Foundational AI Literacy in Primary Education | Code | 0 |
| Context Transformer with Stacked Pointer Networks for Conversational Question Answering over Knowledge Graphs | Code | 0 |
| Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification | Code | 0 |
| Integrating Expert Knowledge into Logical Programs via LLMs | Code | 0 |
| Conditional Logical Message Passing Transformer for Complex Query Answering | Code | 0 |
| Instances Need More Care: Rewriting Prompts for Instances with LLMs in the Loop Yields Better Zero-Shot Performance | Code | 0 |
| Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars | Code | 0 |
| Inductive Logical Query Answering in Knowledge Graphs | Code | 0 |
| InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion | Code | 0 |
| Assessing Logical Puzzle Solving in Large Language Models: Insights from a Minesweeper Case Study | Code | 0 |
| Aligning Knowledge Graphs Provided by Humans and Generated from Neural Networks in Specific Tasks | Code | 0 |
| Improving Multi-hop Logical Reasoning in Knowledge Graphs with Context-Aware Query Representation Learning | Code | 0 |
| How susceptible are LLMs to Logical Fallacies? | Code | 0 |
| HLM-Cite: Hybrid Language Model Workflow for Text-based Scientific Citation Prediction | Code | 0 |
| Science Checker Reloaded: A Bidirectional Paradigm for Transparency and Logical Reasoning | Code | 0 |
| A Dataset and Architecture for Visual Reasoning with a Working Memory | Code | 0 |
| Neural Sequence-to-grid Module for Learning Symbolic Rules | Code | 0 |
| Neural Software Analysis | Code | 0 |
| SedarEval: Automated Evaluation using Self-Adaptive Rubrics | Code | 0 |
| Assessing the Alignment of FOL Closeness Metrics with Human Judgement | Code | 0 |
| Hint-before-Solving Prompting: Guiding LLMs to Effectively Utilize Encoded Knowledge | Code | 0 |
| GOTaxon: Representing the evolution of biological functions in the Gene Ontology | Code | 0 |
| Weisfeiler and Leman Go Relational | Code | 0 |
| Climate Finance Bench | Code | 0 |
| Generating Programmatic Referring Expressions via Program Synthesis | Code | 0 |
| Noisy Exemplars Make Large Language Models More Robust: A Domain-Agnostic Behavioral Analysis | Code | 0 |
| Who Speaks Next? Multi-party AI Discussion Leveraging the Systematics of Turn-taking in Murder Mystery Games | Code | 0 |
| Generating by Understanding: Neural Visual Generation with Logical Symbol Groundings | Code | 0 |
| Exploring the Reversal Curse and Other Deductive Logical Reasoning in BERT and GPT-Based Large Language Models | Code | 0 |
| Sequential Recommendation with Probabilistic Logical Reasoning | Code | 0 |
| GammaE: Gamma Embeddings for Logical Queries on Knowledge Graphs | Code | 0 |
| From Babbling to Fluency: Evaluating the Evolution of Language Models in Terms of Human Language Acquisition | Code | 0 |
| Object-centric proto-symbolic behavioural reasoning from pixels | Code | 0 |
| Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework | Code | 0 |
| FLEX: Feature-Logic Embedding Framework for CompleX Knowledge Graph Reasoning | Code | 0 |
| Exploring Reasoning Biases in Large Language Models Through Syllogism: Insights from the NeuBAROCO Dataset | Code | 0 |
| Classifying Conspiratorial Narratives At Scale: False Alarms and Erroneous Connections | Code | 0 |
| One-Step Abductive Multi-Target Learning with Diverse Noisy Samples and Its Application to Tumour Segmentation for Breast Cancer | Code | 0 |
| Assessing Logical Reasoning Capabilities of Encoder-Only Transformer Models | Code | 0 |
| EviNet: Evidential Reasoning Network for Resilient Graph Learning in the Open and Noisy Environments | Code | 0 |

Benchmark Results

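| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Claude Opus | Delta_NoContext | 28.8 | | Unverified |
| 2 | GPT-4o | Delta_NoContext | 25.1 | | Unverified |
| 3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 | | Unverified |
| 4 | GPT-4 | Delta_NoContext | 21.5 | | Unverified |
| 5 | Command R+ | Delta_NoContext | 11.6 | | Unverified |
| 6 | GPT-3.5 | Delta_NoContext | 11.2 | | Unverified |
| 7 | Mixtral 8x7B | Delta_NoContext | 6.4 | | Unverified |
| 8 | Llama 3 8B | Delta_NoContext | 4.9 | | Unverified |
| 9 | Llama 3 70B | Delta_NoContext | 2.9 | | Unverified |
| 10 | Gemma 7B | Delta_NoContext | 2.2 | | Unverified |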

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 | | Unverified |
| 3 | OPT 66B (few-shot, k=3) | Accuracy | 54 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 | | Unverified |
| 5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 | | Unverified |
| 7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | | Unverified |
| 8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 | | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 | | Unverified |
| 9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 | | Unverified |
| 5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 | | Unverified |
| 6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 | | Unverified |
| 8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 | | Unverified |
| 9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 | | Unverified |
| 2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 | | Unverified |
| 3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 | | Unverified |
| 4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | | Unverified |
| 5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 | | Unverified |
| 6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 | | Unverified |
| 7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 | | Unverified |
| 8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 | | Unverified |
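
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | | Unverified |
| 2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 | | Unverified |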
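
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 | | Unverified |
| 2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 | | Unverified |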
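
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Human benchmark | Accuracy | 83.7 | | Unverified |
| 2 | RuGPT-3 Large | Accuracy | 40.7 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 38 | | Unverified |
| 4 | RuGPT-3 Small | Accuracy | 34 | | Unverified |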
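
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Human benchmark | Accuracy | 87 | | Unverified |
| 2 | RuGPT-3 Small | Accuracy | 57.9 | | Unverified |
| 3 | RuGPT-3 Medium | Accuracy | 57.2 | | Unverified |
| 4 | RuGPT-3 Large | Accuracy | 55.5 | | Unverified |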

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 | | Unverified |