SOTAVerified

Math

Papers

Showing 351-400 of 1596 papers

Title | Status | Hype
From Zero to Hero: Convincing with Extremely Complicated Math | Code | 1
Nerva: a Truly Sparse Implementation of Neural Networks | Code | 1
Fine-Tuning Large Language Models on Quantum Optimization Problems for Circuit Generation | Code | 1
A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings | Code | 1
MWPToolkit: An Open-Source Framework for Deep Learning-Based Math Word Problem Solvers | Code | 1
Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset | Code | 1
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning | Code | 1
Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning | Code | 1
FELM: Benchmarking Factuality Evaluation of Large Language Models | Code | 1
A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers | Code | 1
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Code | 1
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | Code | 1
A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models | Code | 1
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Code | 1
JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models | Code | 1
Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts | Code | 1
NeMo-Inspector: A Visualization Tool for LLM Generation Analysis | Code | 1
Non-Autoregressive Math Word Problem Solver with Unified Tree Structure | Code | 1
EXAONE Deep: Reasoning Enhanced Language Models | Code | 1
Is ChatGPT a Good Teacher Coach? Measuring Zero-Shot Performance For Scoring and Providing Actionable Insights on Classroom Instruction | Code | 1
ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models | Code | 1
Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Code | 1
Multiple-Choice Questions are Efficient and Robust LLM Evaluators | Code | 1
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Code | 1
Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent | Code | 1
ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges | Code | 1
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Code | 1
A Relation Spectrum Inheriting Taylor Series: Muscle Synergy and Coupling for Hand | Code | 1
A Symbolic Character-Aware Model for Solving Geometry Problems | Code | 1
Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning | Code | 1
Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula | Code | 1
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Code | 1
CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning | Code | 1
Collective Constitutional AI: Aligning a Language Model with Public Input | Code | 1
A Categorical Archive of ChatGPT Failures | Code | 1
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data | Code | 1
Entropy-Regularized Process Reward Model | Code | 1
Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices | Code | 1
Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Code | 1
MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports | Code | 1
Entropy-Based Adaptive Weighting for Self-Training | Code | 1
Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping | Code | 1
Language Models Encode the Value of Numbers Linearly | Code | 1
Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning | Code | 1
Eliminating Position Bias of Language Models: A Mechanistic Approach | Code | 1
Large Language Models Can Be Easily Distracted by Irrelevant Context | Code | 1
Large (Vision) Language Models are Unsupervised In-Context Learners | Code | 1
Discovering Mathematical Objects of Interest -- A Study of Mathematical Notations | Code | 1
Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning | Code | 1
Efficient Reasoning for LLMs through Speculative Chain-of-Thought | Code | 1
Page 8 of 32

No leaderboard results yet.