SOTAVerified

Math

Papers

Showing 451-500 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Code | 1 |
| On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents | Code | 1 |
| FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains | Code | 1 |
| Eliciting Latent Knowledge from Quirky Language Models | Code | 1 |
| Kalman Filter Enhanced GRPO for Reinforcement Learning-Based Language Model Reasoning | Code | 1 |
| Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning | Code | 1 |
| TAIA: Large Language Models are Out-of-Distribution Data Learners | Code | 1 |
| TANQ: An open domain dataset of table answered questions | Code | 1 |
| Teaching Language Models to Self-Improve through Interactive Demonstrations | Code | 1 |
| Template-based math word problem solvers with recursive neural networks | Code | 1 |
| Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations | Code | 1 |
| The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models | Code | 1 |
| The NCTE Transcripts: A Dataset of Elementary Math Classroom Transcripts | Code | 1 |
| TheoremQA: A Theorem-driven Question Answering dataset | Code | 1 |
| Is ChatGPT a Good Teacher Coach? Measuring Zero-Shot Performance For Scoring and Providing Actionable Insights on Classroom Instruction | Code | 1 |
| JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models | Code | 1 |
| JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding | Code | 1 |
| Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems | Code | 1 |
| ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models | Code | 1 |
| Thinking Preference Optimization | Code | 1 |
| ArMATH: a Dataset for Solving Arabic Math Word Problems | Code | 1 |
| Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics | Code | 1 |
| Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Code | 1 |
| Injecting Numerical Reasoning Skills into Language Models | Code | 1 |
| Broken Neural Scaling Laws | Code | 1 |
| Toward Adaptive Reasoning in Large Language Models with Thought Rollback | Code | 1 |
| Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers | Code | 1 |
| Towards an AI to Win Ghana's National Science and Maths Quiz | Code | 1 |
| Large Language Models Are Neurosymbolic Reasoners | Code | 1 |
| How to Get Your LLM to Generate Challenging Problems for Evaluation | Code | 1 |
| Entropy-Based Adaptive Weighting for Self-Training | Code | 1 |
| MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation | Code | 1 |
| How well do Large Language Models perform in Arithmetic tasks? | Code | 1 |
| Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning | Code | 1 |
| HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics | Code | 1 |
| HARP: A challenging human-annotated math reasoning benchmark | Code | 1 |
| Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | Code | 1 |
| HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems | Code | 1 |
| Can an AI Win Ghana's National Science and Maths Quiz? An AI Grand Challenge for Education | Code | 1 |
| Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models | Code | 1 |
| Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning | Code | 1 |
| Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | Code | 1 |
| EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Code | 1 |
| Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning | Code | 1 |
| Implicit Chain of Thought Reasoning via Knowledge Distillation | Code | 1 |
| Are NLP Models really able to Solve Simple Math Word Problems? | Code | 1 |
| Case-Based or Rule-Based: How Do Transformers Do the Math? | Code | 1 |
| Graph-to-Tree Neural Networks for Learning Structured Input-Output Translation with Applications to Semantic Parsing and Math Word Problem | Code | 1 |
| CARL-GT: Evaluating Causal Reasoning Capabilities of Large Language Models | Code | 1 |
| AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation | Code | 1 |

No leaderboard results yet.