SOTAVerified

Mathematical Problem-Solving

Papers

Showing 1–10 of 106 papers

Title | Status | Hype
EvoAgentX: An Automated Framework for Evolving Agentic Workflows | Code | 7
LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning | Code | 0
TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving | | 0
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning | Code | 1
Solving Inequality Proofs with Large Language Models | Code | 1
Chain-of-Code Collapse: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation | Code | 0
MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning | Code | 1
PoLAR: Polar-Decomposed Low-Rank Adapter Representation | | 0
Evaluation of LLMs for mathematical problem solving | | 0
Decomposing Elements of Problem Solving: What "Math" Does RL Teach? | Code | 0

No leaderboard results yet.