SOTAVerified

Mathematical Problem-Solving

Papers

Showing 1-50 of 106 papers

Title | Status | Hype
EvoAgentX: An Automated Framework for Evolving Agentic Workflows | Code | 7
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? | Code | 7
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning | Code | 5
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | Code | 5
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model | Code | 4
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine | Code | 4
PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models | Code | 3
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving | Code | 3
Efficiently Serving LLM Reasoning Programs with Certaindex | Code | 3
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline | Code | 2
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | Code | 2
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Code | 2
Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures | Code | 2
Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation | Code | 2
Measuring Mathematical Problem Solving With the MATH Dataset | Code | 2
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | Code | 2
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Code | 2
Solving Inequality Proofs with Large Language Models | Code | 1
Entropy-Based Adaptive Weighting for Self-Training | Code | 1
BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search | Code | 1
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | Code | 1
Training and Evaluating Language Models with Template-based Data Generation | Code | 1
MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion | Code | 1
Non-myopic Generation of Language Models for Reasoning and Planning | Code | 1
Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers | Code | 1
VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models | Code | 1
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets | Code | 1
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning | Code | 1
Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Code | 1
Evaluating Language Models for Mathematics through Interactions | Code | 1
Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs | Code | 1
MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning | Code | 1
Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models | Code | 1
MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula | Code | 1
RaDeR: Reasoning-aware Dense Retrieval Models | Code | 1
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks | Code | 1
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Code | 1
Advancing Reasoning in Large Language Models: Promising Methods and Approaches | | 0
Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations | | 0
Bayesian artificial brain with ChatGPT | | 0
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task | | 0
Large Language Models for Mathematical Reasoning: Progresses and Challenges | | 0
Kwai-STaR: Transform LLMs into State-Transition Reasoners | | 0
Automating Mathematical Proof Generation Using Large Language Model Agents and Knowledge Graphs | | 0
JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving | | 0
Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks | | 0
How Do Large Language Monkeys Get Their Power (Laws)? | | 0
Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models | | 0
Can reasoning models comprehend mathematical problems in Chinese ancient texts? An empirical study based on data from Suanjing Shishu | | 0
Can LLMs plan paths with extra hints from solvers? | | 0
Page 1 of 3

No leaderboard results yet.