SOTAVerified

Math

Papers

Showing 501–525 of 1596 papers

Title | Status | Hype
Injecting Numerical Reasoning Skills into Language Models | Code | 1
Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Code | 1
Implicit Chain of Thought Reasoning via Knowledge Distillation | Code | 1
How to Get Your LLM to Generate Challenging Problems for Evaluation | Code | 1
HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics | Code | 1
EXAONE Deep: Reasoning Enhanced Language Models | Code | 1
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | Code | 1
How well do Large Language Models perform in Arithmetic tasks? | Code | 1
FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains | Code | 1
LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits | Code | 1
MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports | Code | 1
Retrieval-augmented Generation to Improve Math Question-Answering: Trade-offs Between Groundedness and Human Preference | Code | 1
Examining the Robustness of Large Language Models across Language Complexity |  | 0
Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil |  | 0
Can Stories Help LLMs Reason? Curating Information Space Through Narrative |  | 0
Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization |  | 0
Can LLMs understand Math? -- Exploring the Pitfalls in Mathematical Reasoning |  | 0
A range characterization of the single-quadrant ADRT |  | 0
EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages |  | 0
AI4Math: A Native Spanish Benchmark for University-Level Mathematical Reasoning in Large Language Models |  | 0
Hard Math -- Easy UVM: Pragmatic solutions for verifying hardware algorithms using UVM |  | 0
Evaluating the Design Features of an Intelligent Tutoring System for Advanced Mathematics Learning |  | 0
Evaluating Robustness of Reward Models for Mathematical Reasoning |  | 0
Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation |  | 0
A Graph-Based Synthetic Data Pipeline for Scaling High-Quality Reasoning Instructions |  | 0
Page 21 of 64

No leaderboard results yet.