SOTAVerified

Logical Reasoning

Papers

Showing 451-500 of 747 papers

Title | Status | Hype
I-Design: Personalized LLM Interior Designer | - | 0
Language Model Guided Interpretable Video Action Reasoning | Code | 0
Classifying Conspiratorial Narratives At Scale: False Alarms and Erroneous Connections | Code | 0
Sphere Neural-Networks for Rational Reasoning | - | 0
Natural Language as Policies: Reasoning for Coordinate-Level Embodied Control with LLMs | - | 0
Reasoning in Transformers - Mitigating Spurious Correlations and Reasoning Shortcuts | - | 0
Transforming Competition into Collaboration: The Revolutionary Role of Multi-Agent Systems and Language Models in Modern Organizations | Code | 0
Learning Guided Automated Reasoning: A Brief Survey | - | 0
Fuzzy Datalog^ over Arbitrary t-Norms | - | 0
AS-ES Learning: Towards Efficient CoT Learning in Small Models | - | 0
Balancing Exploration and Exploitation in LLM using Soft RLLF for Enhanced Negation Understanding | - | 0
Towards Generalist Prompting for Large Language Models by Mental Models | - | 0
Do Large Language Models Mirror Cognitive Language Processing? | - | 0
Enhanced User Interaction in Operating Systems through Machine Learning Language Models | - | 0
Hint-before-Solving Prompting: Guiding LLMs to Effectively Utilize Encoded Knowledge | Code | 0
Federated Neural Graph Databases | - | 0
A Neuro-Symbolic Approach to Multi-Agent RL for Interpretability and Probabilistic Decision Making | - | 0
Science Checker Reloaded: A Bidirectional Paradigm for Transparency and Logical Reasoning | Code | 0
Reasoning Algorithmically in Graph Neural Networks | - | 0
Conditional Logical Message Passing Transformer for Complex Query Answering | Code | 0
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models | - | 0
Do Large Language Models Understand Logic or Just Mimick Context? | - | 0
DiLA: Enhancing LLM Tool Learning with Differential Logic Layer | - | 0
Puzzle Solving using Reasoning of Large Language Models: A Survey | - | 0
Assessing the Reasoning Abilities of ChatGPT in the Context of Claim Verification | - | 0
Navigating the Dual Facets: A Comprehensive Evaluation of Sequential Memory Editing in Large Language Models | - | 0
Beyond LLMs: Advancing the Landscape of Complex Reasoning | - | 0
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs | - | 0
Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study | - | 0
Large Language User Interfaces: Voice Interactive User Interfaces powered by LLMs | - | 0
Symbol Correctness in Deep Neural Networks Containing Symbolic Layers | - | 0
Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning | - | 0
Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models | - | 0
Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing | - | 0
Revisiting Document-Level Relation Extraction with Context-Guided Link Prediction | Code | 0
Detection-based Intermediate Supervision for Visual Question Answering | - | 0
Dynamic In-Context Learning from Nearest Neighbors for Bundle Generation | - | 0
Empowering Few-Shot Recommender Systems with Large Language Models -- Enhanced Representations | Code | 0
Understanding Inter-Session Intentions via Complex Logical Reasoning | Code | 0
The Good, The Bad, and Why: Unveiling Emotions in Generative AI | - | 0
Assessing Logical Reasoning Capabilities of Encoder-Only Transformer Models | Code | 0
Assessing SATNet's Ability to Solve the Symbol Grounding Problem | - | 0
Large Language Model Enhanced Multi-Agent Systems for 6G Communications | - | 0
Large Language Models are Complex Table Parsers | - | 0
Exploring the Reversal Curse and Other Deductive Logical Reasoning in BERT and GPT-Based Large Language Models | Code | 0
Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games | - | 0
Generation of Explanations for Logic Reasoning | - | 0
Enhancing Logical Reasoning in Large Language Models to Facilitate Legal Applications | - | 0
De-fine: Decomposing and Refining Visual Programs with Auto-Feedback | - | 0
WatME: Towards Lossless Watermarking Through Lexical Redundancy | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Claude Opus | Delta_NoContext | 28.8 | - | Unverified
2 | GPT-4o | Delta_NoContext | 25.1 | - | Unverified
3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 | - | Unverified
4 | GPT-4 | Delta_NoContext | 21.5 | - | Unverified
5 | Command R+ | Delta_NoContext | 11.6 | - | Unverified
6 | GPT-3.5 | Delta_NoContext | 11.2 | - | Unverified
7 | Mixtral 8x7B | Delta_NoContext | 6.4 | - | Unverified
8 | Llama 3 8B | Delta_NoContext | 4.9 | - | Unverified
9 | Llama 3 70B | Delta_NoContext | 2.9 | - | Unverified
10 | Gemma 7B | Delta_NoContext | 2.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 | - | Unverified
2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 | - | Unverified
3 | OPT 66B (few-shot, k=3) | Accuracy | 54 | - | Unverified
4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 | - | Unverified
5 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 | - | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 | - | Unverified
7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | - | Unverified
8 | Bloomberg GPT 50B (few-shot, k=3) | Accuracy | 50.8 | - | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 | - | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 | - | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | - | Unverified
4 | PaLM 540B (few-shot, k=3) | Accuracy | 44.5 | - | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 40.6 | - | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 40.41 | - | Unverified
7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 37.67 | - | Unverified
8 | GPT-NeoX (few-shot, k=3) | Accuracy | 33.56 | - | Unverified
9 | OPT 66B (few-shot, k=3) | Accuracy | 28.08 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | - | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 | - | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | - | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 49.2 | - | Unverified
5 | PaLM 540B (few-shot, k=3) | Accuracy | 38 | - | Unverified
6 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | - | Unverified
7 | Bloomberg GPT (few-shot, k=3) | Accuracy | 34.8 | - | Unverified
8 | OPT 66B (few-shot, k=3) | Accuracy | 31.2 | - | Unverified
9 | GPT-NeoX (few-shot, k=3) | Accuracy | 26 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 | - | Unverified
2 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 | - | Unverified
3 | PaLM 540B (few-shot, k=3) | Accuracy | 39.6 | - | Unverified
4 | BLOOM 176B (few-shot, k=3) | Accuracy | 36.8 | - | Unverified
5 | Chinchilla-70B (few-shot, k=5) | Accuracy | 32 | - | Unverified
6 | Bloomberg GPT (few-shot, k=3) | Accuracy | 29.2 | - | Unverified
7 | OPT 66B (few-shot, k=3) | Accuracy | 23.6 | - | Unverified
8 | GPT-NeoX (few-shot, k=3) | Accuracy | 21.2 | - | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 19 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | - | Unverified
2 | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 | - | Unverified
3 | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 | - | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 35.1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaLM-540B (few-shot, k=5) | Accuracy | 73.9 | - | Unverified
2 | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | - | Unverified
3 | PaLM-62B (few-shot, k=5) | Accuracy | 65.4 | - | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 61 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Human benchmark | Accuracy | 83.7 | - | Unverified
2 | RuGPT-3 Large | Accuracy | 40.7 | - | Unverified
3 | RuGPT-3 Medium | Accuracy | 38 | - | Unverified
4 | RuGPT-3 Small | Accuracy | 34 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Human benchmark | Accuracy | 87 | - | Unverified
2 | RuGPT-3 Small | Accuracy | 57.9 | - | Unverified
3 | RuGPT-3 Medium | Accuracy | 57.2 | - | Unverified
4 | RuGPT-3 Large | Accuracy | 55.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | - | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 58.9 | - | Unverified