SOTAVerified

Math

Papers

Showing 1401–1450 of 1596 papers

Title | Status | Hype
LLM Performance for Code Generation on Noisy Tasks | Code | 0
Adversarial Math Word Problem Generation | Code | 0
Generalizing Math Word Problem Solvers via Solution Diversification | Code | 0
Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Code | 0
An Exploration of Self-Supervised Mutual Information Alignment for Multi-Task Settings | Code | 0
AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Code | 0
X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs | Code | 0
Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Code | 0
An Edge-Enhanced Hierarchical Graph-to-Tree Network for Math Word Problem Solving | Code | 0
Towards Effective and Efficient Continual Pre-training of Large Language Models | Code | 0
Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction | Code | 0
Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision | Code | 0
Brain-Inspired Two-Stage Approach: Enhancing Mathematical Reasoning by Imitating Human Thought Processes | Code | 0
Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models | Code | 0
Wide & Deep Learning for Judging Student Performance in Online One-on-one Math Classes | Code | 0
Automatic Generation of Headlines for Online Math Questions | Code | 0
We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields | Code | 0
GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation | Code | 0
LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning | Code | 0
Semantically-Aligned Equation Generation for Solving and Reasoning Math Word Problems | Code | 0
The paradox of the compositionality of natural language: a neural machine translation case study | Code | 0
Neural Machine Translation and Sequence-to-sequence Models: A Tutorial | Code | 0
Give me a hint: Can LLMs take a hint to solve math problems? | Code | 0
TEIMMA: The First Content Reuse Annotator for Text, Images, and Math | Code | 0
Structure-Unified M-Tree Coding Solver for MathWord Problem | Code | 0
Bounds on Multi-asset Derivatives via Neural Networks | Code | 0
HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization | Code | 0
Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models | Code | 0
CER: Confidence Enhanced Reasoning in LLMs | Code | 0
A Robustly Optimized Long Text to Math Models for Numerical Reasoning On FinQA | Code | 0
TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students | Code | 0
Continual Pre-training of Language Models for Math Problem Understanding with Syntax-Aware Memory Network | Code | 0
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? | Code | 0
Reasoning Graph Enhanced Exemplars Retrieval for In-Context Learning | Code | 0
Reasoning in Large Language Models Through Symbolic Math Word Problems | Code | 0
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer | Code | 0
The Rise of Open Science: Tracking the Evolution and Perceived Value of Data and Methods Link-Sharing Practices | Code | 0
Greek2MathTex: A Greek Speech-to-Text Framework for LaTeX Equations Generation | Code | 0
SemEval-2019 Task 10: Math Question Answering | Code | 0
Does ChatGPT Comprehend the Place Value in Numbers When Solving Math Word Problems? | Code | 0
Sequence to General Tree: Knowledge-Guided Geometry Word Problem Solving | Code | 0
Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | Code | 0
GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking | Code | 0
Guided Speculative Inference for Efficient Test-Time Alignment of LLMs | Code | 0
Can Vision-Language Models Evaluate Handwritten Math? | Code | 0
Adversarial Examples for Evaluating Math Word Problem Solvers | Code | 0
Guiding Through Complexity: What Makes Good Supervision for Hard Reasoning Tasks? | Code | 0
Effective Skill Unlearning through Intervention and Abstention | Code | 0
Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning | Code | 0
HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class | Code | 0
Page 29 of 32

No leaderboard results yet.