SOTAVerified

Math

Papers

Showing 326–350 of 1596 papers

Title | Status | Hype
FormulaNet: A Benchmark Dataset for Mathematical Formula Detection | Code | 1
Control LLM: Controlled Evolution for Intelligence Retention in LLM | Code | 1
Math Word Problem Solving with Explicit Numerical Values | Code | 1
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models | Code | 1
Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Code | 1
A Tree-Structured Decoder for Image-to-Markup Generation | Code | 1
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Code | 1
EXAONE Deep: Reasoning Enhanced Language Models | Code | 1
Expression Syntax Information Bottleneck for Math Word Problems | Code | 1
MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations | Code | 1
Measuring Conversational Uptake: A Case Study on Student-Teacher Interactions | Code | 1
AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models | Code | 1
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Code | 1
Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula | Code | 1
CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis | Code | 1
A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings | Code | 1
Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset | Code | 1
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | Code | 1
Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning | Code | 1
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Code | 1
FELM: Benchmarking Factuality Evaluation of Large Language Models | Code | 1
A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models | Code | 1
MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents | Code | 1
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data | Code | 1
Entropy-Based Adaptive Weighting for Self-Training | Code | 1
Page 14 of 64

No leaderboard results yet.