SOTAVerified

Math

Papers

Showing 351–375 of 1596 papers

Title | Status | Hype
Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset | Code | 1
EXAONE Deep: Reasoning Enhanced Language Models | Code | 1
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Code | 1
Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Code | 1
A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models | Code | 1
Expression Syntax Information Bottleneck for Math Word Problems | Code | 1
GOLD: Geometry Problem Solver with Natural Language Description | Code | 1
Multiple-Choice Questions are Efficient and Robust LLM Evaluators | Code | 1
NeMo-Inspector: A Visualization Tool for LLM Generation Analysis | Code | 1
Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts | Code | 1
Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning | Code | 1
NLPBench: Evaluating Large Language Models on Solving NLP Problems | Code | 1
ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models | Code | 1
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Code | 1
Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices | Code | 1
HARP: A challenging human-annotated math reasoning benchmark | Code | 1
Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning | Code | 1
MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports | Code | 1
Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula | Code | 1
A Symbolic Character-Aware Model for Solving Geometry Problems | Code | 1
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models | Code | 1
Entropy-Based Adaptive Weighting for Self-Training | Code | 1
Entropy-Regularized Process Reward Model | Code | 1
Math Word Problem Solving with Explicit Numerical Values | Code | 1
Measuring Conversational Uptake: A Case Study on Student-Teacher Interactions | Code | 1

No leaderboard results yet.