SOTAVerified

Math

Papers

Showing 1426-1450 of 1596 papers

Title | Status | Hype
Bounds on Multi-asset Derivatives via Neural Networks | Code | 0
HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization | Code | 0
Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models | Code | 0
CER: Confidence Enhanced Reasoning in LLMs | Code | 0
A Robustly Optimized Long Text to Math Models for Numerical Reasoning On FinQA | Code | 0
TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students | Code | 0
Continual Pre-training of Language Models for Math Problem Understanding with Syntax-Aware Memory Network | Code | 0
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? | Code | 0
Reasoning Graph Enhanced Exemplars Retrieval for In-Context Learning | Code | 0
Reasoning in Large Language Models Through Symbolic Math Word Problems | Code | 0
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer | Code | 0
The Rise of Open Science: Tracking the Evolution and Perceived Value of Data and Methods Link-Sharing Practices | Code | 0
Greek2MathTex: A Greek Speech-to-Text Framework for LaTeX Equations Generation | Code | 0
SemEval-2019 Task 10: Math Question Answering | Code | 0
Does ChatGPT Comprehend the Place Value in Numbers When Solving Math Word Problems? | Code | 0
Sequence to General Tree: Knowledge-Guided Geometry Word Problem Solving | Code | 0
Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | Code | 0
GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking | Code | 0
Guided Speculative Inference for Efficient Test-Time Alignment of LLMs | Code | 0
Can Vision-Language Models Evaluate Handwritten Math? | Code | 0
Adversarial Examples for Evaluating Math Word Problem Solvers | Code | 0
Guiding Through Complexity: What Makes Good Supervision for Hard Reasoning Tasks? | Code | 0
Effective Skill Unlearning through Intervention and Abstention | Code | 0
Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning | Code | 0
HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class | Code | 0

No leaderboard results yet.