SOTAVerified

Math

Papers

Showing 1051–1100 of 1596 papers

Title | Status | Hype
NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models | Code | 0
Improve Mathematical Reasoning in Language Models by Automated Process Supervision |  | 0
mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models | Code | 0
D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models |  | 0
Code Pretraining Improves Entity Tracking Abilities of Language Models |  | 0
Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation |  | 0
Cutting Through the Noise: Boosting LLM Performance on Math Word Problems | Code | 0
Arithmetic Reasoning with LLM: Prolog Generation & Permutation |  | 0
MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time |  | 0
Learning Beyond Pattern Matching? Assaying Mathematical Understanding in LLMs |  | 0
Large Language Models Can Self-Correct with Key Condition Verification |  | 0
Can LLMs Solve longer Math Word Problems Better? | Code | 0
"Turing Tests" For An AI Scientist |  | 0
Investigating Symbolic Capabilities of Large Language Models |  | 0
DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction | Code | 0
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving |  | 0
Continued Pretraining for Domain Adaptation of Wav2vec2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings |  | 0
A safety realignment framework via subspace-oriented model fusion for large language models | Code | 0
Meaning-Typed Programming: Language Abstraction and Runtime for Model-Integrated Applications |  | 0
MathDivide: Improved mathematical reasoning by large language models |  | 0
Can Large Language Models Replicate ITS Feedback on Open-Ended Math Questions? | Code | 0
Learning to Solve Geometry Problems via Simulating Human Dual-Reasoning Process | Code | 0
Aligning Tutor Discourse Supporting Rigorous Thinking with Tutee Content Mastery for Predicting Math Achievement |  | 0
LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought |  | 0
MAmmoTH2: Scaling Instructions from the Web |  | 0
Assessing and Verifying Task Utility in LLM-Powered Applications |  | 0
Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models |  | 0
A Careful Examination of Large Language Model Performance on Grade School Arithmetic |  | 0
Math Multiple Choice Question Generation via Human-Large Language Model Collaboration |  | 0
Iterative Reasoning Preference Optimization |  | 0
Small Language Models Need Strong Verifiers to Self-Correct Reasoning | Code | 0
Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training |  | 0
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone |  | 0
PARAMANU-GANITA: Language Model with Mathematical Capabilities |  | 0
Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank |  | 0
On the Empirical Complexity of Reasoning and Planning in LLMs |  | 0
Mental Stress Detection: Development and Evaluation of a Wearable In-Ear Plethysmography |  | 0
Personality-aware Student Simulation for Conversational Intelligent Tutoring Systems |  | 0
MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education |  | 0
FRACTAL: Fine-Grained Scoring from Aggregate Text Labels |  | 0
MM-MATH: Advancing Multimodal Math Evaluation with Process Evaluation and Fine-grained Classification | Code | 0
Data Augmentation with In-Context Learning and Comparative Evaluation in Math Word Problem Solving |  | 0
HyperCLOVA X Technical Report |  | 0
Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Code | 0
LM^2: A Simple Society of Language Models Solves Complex Reasoning | Code | 0
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations |  | 0
Exploring the Mystery of Influential Data for Mathematical Reasoning |  | 0
Stable Code Technical Report |  | 0
Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models | Code | 0
Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Code | 0
Page 22 of 32

No leaderboard results yet.