SOTAVerified

Math

Papers

Showing 351–375 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| Nerva: a Truly Sparse Implementation of Neural Networks | Code | 1 |
| MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents | Code | 1 |
| Toward Adaptive Reasoning in Large Language Models with Thought Rollback | Code | 1 |
| Learning Goal-Conditioned Representations for Language Reward Models | Code | 1 |
| TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish | Code | 1 |
| OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling | Code | 1 |
| AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models | Code | 1 |
| DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning | Code | 1 |
| Eliminating Position Bias of Language Models: A Mechanistic Approach | Code | 1 |
| Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning | Code | 1 |
| Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs | Code | 1 |
| LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback | Code | 1 |
| RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold | Code | 1 |
| CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Code | 1 |
| Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | Code | 1 |
| DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling | Code | 1 |
| Collective Constitutional AI: Aligning a Language Model with Public Input | Code | 1 |
| DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning | Code | 1 |
| TAIA: Large Language Models are Out-of-Distribution Data Learners | Code | 1 |
| MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | Code | 1 |
| JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models | Code | 1 |
| Multiple-Choice Questions are Efficient and Robust LLM Evaluators | Code | 1 |
| TANQ: An open domain dataset of table answered questions | Code | 1 |
| VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context | Code | 1 |
| GOLD: Geometry Problem Solver with Natural Language Description | Code | 1 |
Page 15 of 64

No leaderboard results yet.