SOTAVerified

Math

Papers

Showing 501-550 of 1596 papers

Title | Status | Hype
Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Code | 1
Reasoning with Reinforced Functional Token Tuning | Code | 1
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Code | 1
FormulaNet: A Benchmark Dataset for Mathematical Formula Detection | Code | 1
Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models | Code | 1
RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning | Code | 1
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold | Code | 1
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning | Code | 1
SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models | Code | 1
Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression | Code | 1
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs | Code | 1
EXAONE Deep: Reasoning Enhanced Language Models | Code | 1
Can LLMs Solve longer Math Word Problems Better? | Code | 0
A quantitative study of NLP approaches to question difficulty estimation | Code | 0
Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval | Code | 0
Can LLMs Reason in the Wild with Programs? | Code | 0
A Probabilistic Model for Node Classification in Directed Graphs | Code | 0
Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Code | 0
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators | Code | 0
Evaluating and Optimizing Educational Content with Large Language Model Judgments | Code | 0
Can Large Language Models Replicate ITS Feedback on Open-Ended Math Questions? | Code | 0
ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization | Code | 0
A Goal-Driven Tree-Structured Neural Model for Math Word Problems | Code | 0
Reasoning Graph Enhanced Exemplars Retrieval for In-Context Learning | Code | 0
Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision | Code | 0
EPT-X: An Expression-Pointer Transformer model that generates eXplanations for numbers | Code | 0
EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | Code | 0
Reasoning in Large Language Models Through Symbolic Math Word Problems | Code | 0
Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions | Code | 0
AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control | Code | 0
PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning | Code | 0
RESOLVE: Relational Reasoning with Symbolic and Object-Level Features Using Vector Symbolic Processing | Code | 0
Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving | Code | 0
Enhancing Textbooks with Visuals from the Web for Improved Learning | Code | 0
Practice Makes a Solver Perfect: Data Augmentation for Math Word Problem Solvers | Code | 0
AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search | Code | 0
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing | Code | 0
OntoMath^PRO Ontology: A Linked Data Hub for Mathematics | Code | 0
Brain-Inspired Two-Stage Approach: Enhancing Mathematical Reasoning by Imitating Human Thought Processes | Code | 0
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Code | 0
Bounds on Multi-asset Derivatives via Neural Networks | Code | 0
Efficient Non-Parametric Optimizer Search for Diverse Tasks | Code | 0
NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models | Code | 0
Prover-Verifier Games improve legibility of LLM outputs | Code | 0
Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning | Code | 0
Effects of structure on reasoning in instance-level Self-Discover | Code | 0
Effective Skill Unlearning through Intervention and Abstention | Code | 0
Neural Machine Translation and Sequence-to-sequence Models: A Tutorial | Code | 0
Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Code | 0
DyRRen: A Dynamic Retriever-Reranker-Generator Model for Numerical Reasoning over Tabular and Textual Data | Code | 0
Page 11 of 32

No leaderboard results yet.