SOTAVerified

Math

Papers

Showing 426-450 of 1596 papers

Title | Status | Hype
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning | Code | 1
OJBench: A Competition Level Code Benchmark For Large Language Models | Code | 1
FELM: Benchmarking Factuality Evaluation of Large Language Models | Code | 1
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Code | 1
FormulaNet: A Benchmark Dataset for Mathematical Formula Detection | Code | 1
Expression Syntax Information Bottleneck for Math Word Problems | Code | 1
Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Code | 1
EXAONE Deep: Reasoning Enhanced Language Models | Code | 1
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Code | 1
CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning | Code | 1
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning | Code | 1
BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing | Code | 1
Non-Autoregressive Math Word Problem Solver with Unified Tree Structure | Code | 1
NeMo-Inspector: A Visualization Tool for LLM Generation Analysis | Code | 1
Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula | Code | 1
Nerva: a Truly Sparse Implementation of Neural Networks | Code | 1
Aioli: A Unified Optimization Framework for Language Model Data Mixing | Code | 1
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning | Code | 1
Neural-Symbolic Solver for Math Word Problems with Auxiliary Tasks | Code | 1
CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Code | 1
Mathematical Capabilities of ChatGPT | Code | 1
On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents | Code | 1
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Code | 1
Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning | Code | 1
MWPToolkit: An Open-Source Framework for Deep Learning-Based Math Word Problem Solvers | Code | 1

No leaderboard results yet.