
Mathematical Problem-Solving

Papers

Showing 1–25 of 106 papers

Title | Status | Hype
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? | Code | 7
EvoAgentX: An Automated Framework for Evolving Agentic Workflows | Code | 7
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning | Code | 5
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | Code | 5
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model | Code | 4
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine | Code | 4
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving | Code | 3
Efficiently Serving LLM Reasoning Programs with Certaindex | Code | 3
PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models | Code | 3
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Code | 2
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Code | 2
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline | Code | 2
Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation | Code | 2
Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures | Code | 2
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | Code | 2
Measuring Mathematical Problem Solving With the MATH Dataset | Code | 2
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | Code | 2
MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion | Code | 1
BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search | Code | 1
MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula | Code | 1
Evaluating Language Models for Mathematics through Interactions | Code | 1
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | Code | 1
Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Code | 1
Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers | Code | 1
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets | Code | 1

No leaderboard results yet.