SOTAVerified

Math

Papers

Showing 251–275 of 1596 papers

Title | Status | Hype
MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents | Code | 1
MathPrompter: Mathematical Reasoning using Large Language Models | Code | 1
Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models | Code | 1
Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers | Code | 1
Broken Neural Scaling Laws | Code | 1
Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks | Code | 1
Brilla AI: AI Contestant for the National Science and Maths Quiz | Code | 1
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Code | 1
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Code | 1
Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems | Code | 1
FormulaNet: A Benchmark Dataset for Mathematical Formula Detection | Code | 1
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization | Code | 1
FELM: Benchmarking Factuality Evaluation of Large Language Models | Code | 1
Fine-Tuning Large Language Models on Quantum Optimization Problems for Circuit Generation | Code | 1
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | Code | 1
Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations | Code | 1
Expression Syntax Information Bottleneck for Math Word Problems | Code | 1
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning | Code | 1
Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching | Code | 1
Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Code | 1
MathBERT: A Pre-trained Language Model for General NLP Tasks in Mathematics Education | Code | 1
MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations | Code | 1
MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning | Code | 1
Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula | Code | 1
BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing | Code | 1

No leaderboard results yet.