SOTAVerified

Math

Papers

Showing 451–500 of 1596 papers

Title | Status | Hype
Reasoning with Reinforced Functional Token Tuning | Code | 1
Efficient RL Training for Reasoning Models via Length-Aware Optimization | Code | 1
Recall and Learn: A Memory-augmented Solver for Math Word Problems | Code | 1
GOLD: Geometry Problem Solver with Natural Language Description | Code | 1
Graph-to-Tree Learning for Solving Math Word Problems | Code | 1
Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Code | 1
Get an A in Math: Progressive Rectification Prompting | Code | 1
Graph-to-Tree Neural Networks for Learning Structured Input-Output Translation with Applications to Semantic Parsing and Math Word Problem | Code | 1
GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving | Code | 1
REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning | Code | 1
Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models | Code | 1
Training Step-Level Reasoning Verifiers with Formal Verification Tools | Code | 1
GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning | Code | 1
ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges | Code | 1
CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning | Code | 1
RaDeR: Reasoning-aware Dense Retrieval Models | Code | 1
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Code | 1
Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems | Code | 1
QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks? | Code | 1
FormulaNet: A Benchmark Dataset for Mathematical Formula Detection | Code | 1
Aioli: A Unified Optimization Framework for Language Model Data Mixing | Code | 1
From GAN to WGAN | Code | 1
CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Code | 1
On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents | Code | 1
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Code | 1
From Zero to Hero: Convincing with Extremely Complicated Math | Code | 1
Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers | Code | 1
A Relation Spectrum Inheriting Taylor Series: Muscle Synergy and Coupling for Hand | Code | 1
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Code | 1
NeMo-Inspector: A Visualization Tool for LLM Generation Analysis | Code | 1
ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models | Code | 1
Neural-Symbolic Solver for Math Word Problems with Auxiliary Tasks | Code | 1
NLPBench: Evaluating Large Language Models on Solving NLP Problems | Code | 1
Entropy-Regularized Process Reward Model | Code | 1
ArMATH: a Dataset for Solving Arabic Math Word Problems | Code | 1
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics | Code | 1
FELM: Benchmarking Factuality Evaluation of Large Language Models | Code | 1
Fine-Tuning Large Language Models on Quantum Optimization Problems for Circuit Generation | Code | 1
PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models | Code | 1
Pretrained Language Models are Symbolic Mathematics Solvers too! | Code | 1
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation | Code | 1
Problem-Oriented Segmentation and Retrieval: Case Study on Tutoring Conversations | Code | 1
Expression Syntax Information Bottleneck for Math Word Problems | Code | 1
Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning | Code | 1
Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts | Code | 1
EXAONE Deep: Reasoning Enhanced Language Models | Code | 1
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | Code | 1
Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Code | 1
Are NLP Models really able to Solve Simple Math Word Problems? | Code | 1
Case-Based or Rule-Based: How Do Transformers Do the Math? | Code | 1

No leaderboard results yet.