SOTAVerified

Math

Papers

Showing 251–300 of 1596 papers

Title | Status | Hype
MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents | Code | 1
Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks | Code | 1
Brilla AI: AI Contestant for the National Science and Maths Quiz | Code | 1
MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations | Code | 1
MathPrompter: Mathematical Reasoning using Large Language Models | Code | 1
Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems | Code | 1
Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning | Code | 1
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization | Code | 1
Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes | Code | 1
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | Code | 1
Math Word Problem Solving with Explicit Numerical Values | Code | 1
Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations | Code | 1
Mathematical Capabilities of ChatGPT | Code | 1
MathGloss: Building mathematical glossaries from text | Code | 1
MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems | Code | 1
Math-KG: Construction and Applications of Mathematical Knowledge Graph | Code | 1
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | Code | 1
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning | Code | 1
Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching | Code | 1
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback | Code | 1
Multiple-Choice Questions are Efficient and Robust LLM Evaluators | Code | 1
Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models | Code | 1
BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing | Code | 1
An In-depth Look at Gemini's Language Abilities | Code | 1
MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models | Code | 1
Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs | Code | 1
MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning | Code | 1
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs | Code | 1
Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers | Code | 1
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | Code | 1
LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Code | 1
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks | Code | 1
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models | Code | 1
A Neural Network Solves, Explains, and Generates University Math Problems by Program Synthesis and Few-Shot Learning at Human Level | Code | 1
CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models | Code | 1
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start | Code | 1
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling | Code | 1
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability | Code | 1
Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT | Code | 1
MathChat: Converse to Tackle Challenging Math Problems with LLM Agents | Code | 1
Let's Verify Math Questions Step by Step | Code | 1
BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search | Code | 1
Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation | Code | 1
An Early Evaluation of GPT-4V(ision) | Code | 1
Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction | Code | 1
LEVER: Learning to Verify Language-to-Code Generation with Execution | Code | 1
MathBERT: A Pre-trained Language Model for General NLP Tasks in Mathematics Education | Code | 1
LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits | Code | 1
CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis | Code | 1
Learning by Fixing: Solving Math Word Problems with Weak Supervision | Code | 1

No leaderboard results yet.