SOTAVerified

Math

Papers

Showing 301–350 of 1596 papers

Title | Status | Hype
Multiple-Choice Questions are Efficient and Robust LLM Evaluators | Code | 1
Graph-to-Tree Learning for Solving Math Word Problems | Code | 1
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Code | 1
GOLD: Geometry Problem Solver with Natural Language Description | Code | 1
Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent | Code | 1
Automatic Generation of Socratic Subquestions for Teaching Math Word Problems | Code | 1
GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning | Code | 1
GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving | Code | 1
A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers | Code | 1
Get an A in Math: Progressive Rectification Prompting | Code | 1
Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models | Code | 1
Mining Mathematical Documents for Question Answering via Unsupervised Formula Labeling | Code | 1
ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges | Code | 1
A Relation Spectrum Inheriting Taylor Series: Muscle Synergy and Coupling for Hand | Code | 1
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability | Code | 1
Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices | Code | 1
CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models | Code | 1
From GAN to WGAN | Code | 1
From Zero to Hero: Convincing with Extremely Complicated Math | Code | 1
Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency | Code | 1
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Code | 1
MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports | Code | 1
Fine-Tuning Large Language Models on Quantum Optimization Problems for Circuit Generation | Code | 1
Augmenting Math Word Problems via Iterative Question Composing | Code | 1
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Code | 1
FormulaNet: A Benchmark Dataset for Mathematical Formula Detection | Code | 1
Control LLM: Controlled Evolution for Intelligence Retention in LLM | Code | 1
Math Word Problem Solving with Explicit Numerical Values | Code | 1
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models | Code | 1
Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Code | 1
A Tree-Structured Decoder for Image-to-Markup Generation | Code | 1
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Code | 1
EXAONE Deep: Reasoning Enhanced Language Models | Code | 1
Expression Syntax Information Bottleneck for Math Word Problems | Code | 1
MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations | Code | 1
Measuring Conversational Uptake: A Case Study on Student-Teacher Interactions | Code | 1
AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models | Code | 1
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Code | 1
Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula | Code | 1
CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis | Code | 1
A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings | Code | 1
Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset | Code | 1
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | Code | 1
Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning | Code | 1
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Code | 1
FELM: Benchmarking Factuality Evaluation of Large Language Models | Code | 1
A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models | Code | 1
MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents | Code | 1
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data | Code | 1
Entropy-Based Adaptive Weighting for Self-Training | Code | 1

No leaderboard results yet.