SOTAVerified

Mathematical Problem-Solving

Papers

Showing 51-100 of 106 papers

Title | Status | Hype
Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving | - | 0
Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs | Code | 1
VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models | Code | 1
Efficiently Serving LLM Reasoning Programs with Certaindex | Code | 3
Large Language Models for Mathematical Analysis | Code | 0
Training and Evaluating Language Models with Template-based Data Generation | Code | 1
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? | Code | 7
Kwai-STaR: Transform LLMs into State-Transition Reasoners | - | 0
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | Code | 5
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | - | 0
Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks | - | 0
Non-myopic Generation of Language Models for Reasoning and Planning | Code | 1
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | - | 0
Can LLMs plan paths with extra hints from solvers? | - | 0
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning | Code | 5
PersonaMath: Enhancing Math Reasoning through Persona-Driven Data Augmentation | - | 0
BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search | Code | 1
Building Math Agents with Multi-Turn Iterative Preference Learning | - | 0
Logic Contrastive Reasoning with Lightweight Large Language Model for Math Word Problems | - | 0
Benchmarking Large Language Models for Math Reasoning Tasks | Code | 0
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine | Code | 4
MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula | Code | 1
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Code | 2
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Code | 2
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | Code | 2
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory | Code | 0
Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | - | 0
3D-Properties: Identifying Challenges in DPO and Charting a Path Forward | - | 0
OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step | - | 0
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | Code | 1
The Buffer Mechanism for Multi-Step Information Reasoning in Language Models | - | 0
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving | - | 0
Mixture-of-Instructions: Comprehensive Alignment of a Large Language Model through the Mixture of Diverse System Prompting Instructions | - | 0
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks | Code | 1
Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks | Code | 0
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline | Code | 2
Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Code | 0
PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models | Code | 3
SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models | Code | 0
Premise Order Matters in Reasoning with Large Language Models | - | 0
Large Language Models for Mathematical Reasoning: Progresses and Challenges | - | 0
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model | Code | 4
Three Questions Concerning the Use of Large Language Models to Facilitate Mathematics Learning | - | 0
SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving | Code | 0
Data Contamination Through the Lens of Time | Code | 0
The Consensus Game: Language Model Generation via Equilibrium Search | - | 0
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving | Code | 3
Beyond Traditional Teaching: The Potential of Large Language Models and Chatbots in Graduate Engineering Education | - | 0
Bayesian artificial brain with ChatGPT | - | 0
JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving | - | 0

No leaderboard results yet.