SOTAVerified

Math

Papers

Showing 1201-1225 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| Instances Need More Care: Rewriting Prompts for Instances with LLMs in the Loop Yields Better Zero-Shot Performance | Code | 0 |
| Benchmarking and Improving Generator-Validator Consistency of Language Models | | 0 |
| Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions | | 0 |
| Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems | Code | 0 |
| Investigating the Efficacy of Large Language Models in Reflective Assessment Methods through Chain of Thoughts Prompting | | 0 |
| L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models | | 0 |
| Fairness Hub Technical Briefs: AUC Gap | | 0 |
| Contrastive Decoding Improves Reasoning in Large Language Models | | 0 |
| Odd period cycles and ergodic properties in price dynamics for an exchange economy | | 0 |
| ChatGPT-4 with Code Interpreter can be used to solve introductory college-level vector calculus and electromagnetism problems | | 0 |
| Using Large Language Model to Solve and Explain Physics Word Problems Approaching Human Level | | 0 |
| MathAttack: Attacking Large Language Models Towards Math Solving Ability | | 0 |
| Solving Math Word Problem with Problem Type Classification | Code | 0 |
| GraphReason: Enhancing Reasoning Capabilities of Large Language Models through A Graph-Based Verification Approach | | 0 |
| Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems | | 0 |
| NEOLAF, an LLM-powered neural-symbolic cognitive architecture | | 0 |
| Scalable and Equitable Math Problem Solving Strategy Prediction in Big Educational Data | Code | 0 |
| Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning | Code | 0 |
| Reasoning in Large Language Models Through Symbolic Math Word Problems | Code | 0 |
| Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models | | 0 |
| Augmented Math: Authoring AR-Based Explorable Explanations by Augmenting Static Math Textbooks | Code | 0 |
| A large language model-assisted education tool to provide feedback on open-ended responses | Code | 0 |
| ARB: Advanced Reasoning Benchmark for Large Language Models | | 0 |
| Explaining Math Word Problem Solvers | | 0 |
| Controlling Equational Reasoning in Large Language Models with Prompt Interventions | | 0 |
Page 49 of 64

No leaderboard results yet.