SOTAVerified

Math

Papers

Showing 1451–1475 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval | Code | 0 |
| MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training | Code | 0 |
| Hard Negative Contrastive Learning for Fine-Grained Geometric Understanding in Large Multimodal Models | Code | 0 |
| Effects of structure on reasoning in instance-level Self-Discover | Code | 0 |
| Mapping to Declarative Knowledge for Word Problem Solving | Code | 0 |
| NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models | Code | 0 |
| MARGE: Improving Math Reasoning for LLMs with Guided Exploration | Code | 0 |
| Helpful assistant or fruitful facilitator? Investigating how personas affect language model behavior | Code | 0 |
| Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators | Code | 0 |
| Automated Knowledge Concept Annotation and Question Representation Learning for Knowledge Tracing | Code | 0 |
| Efficient Non-Parametric Optimizer Search for Diverse Tasks | Code | 0 |
| Heteroclinic cycling and extinction in May-Leonard models with demographic stochasticity | Code | 0 |
| Deterministic and Nondeterministic Particle Motion with Interaction Mechanisms | Code | 0 |
| ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving | Code | 0 |
| LM^2: A Simple Society of Language Models Solves Complex Reasoning | Code | 0 |
| AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control | Code | 0 |
| Textual Enhanced Contrastive Learning for Solving Math Word Problems | Code | 0 |
| ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization | Code | 0 |
| How Do Humans Write Code? Large Models Do It the Same Way Too | Code | 0 |
| Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls | Code | 0 |
| How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark | Code | 0 |
| How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study | Code | 0 |
| World Models for Math Story Problems | Code | 0 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Code | 0 |
| ChatBench: From Static Benchmarks to Human-AI Evaluation | Code | 0 |

No leaderboard results yet.