SOTAVerified

Math

Papers

Showing 426-450 of 1596 papers

Title | Status | Hype
Non-myopic Generation of Language Models for Reasoning and Planning | Code | 1
Language Models as Science Tutors | Code | 1
LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits | Code | 1
Kalman Filter Enhanced GRPO for Reinforcement Learning-Based Language Model Reasoning | Code | 1
JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models | Code | 1
Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs | Code | 1
JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding | Code | 1
Injecting Numerical Reasoning Skills into Language Models | Code | 1
CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning | Code | 1
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration | Code | 1
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning | Code | 1
Is ChatGPT a Good Teacher Coach? Measuring Zero-Shot Performance For Scoring and Providing Actionable Insights on Classroom Instruction | Code | 1
FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains | Code | 1
LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Code | 1
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | Code | 1
Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning | Code | 1
Aioli: A Unified Optimization Framework for Language Model Data Mixing | Code | 1
HARP: A challenging human-annotated math reasoning benchmark | Code | 1
How to Get Your LLM to Generate Challenging Problems for Evaluation | Code | 1
CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Code | 1
On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents | Code | 1
HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems | Code | 1
HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics | Code | 1
How well do Large Language Models perform in Arithmetic tasks? | Code | 1
GOLD: Geometry Problem Solver with Natural Language Description | Code | 1
Page 18 of 64

No leaderboard results yet.