SOTAVerified

Math

Papers

Showing 401-450 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT | Code | 1 |
| Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models | Code | 1 |
| PECC: Problem Extraction and Coding Challenges | Code | 1 |
| GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning | Code | 1 |
| Pretrained Language Models are Symbolic Mathematics Solvers too! | Code | 1 |
| Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement | Code | 1 |
| A Symbolic Character-Aware Model for Solving Geometry Problems | Code | 1 |
| Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament | Code | 1 |
| From Zero to Hero: Convincing with Extremely Complicated Math | Code | 1 |
| From GAN to WGAN | Code | 1 |
| A Neural Network Solves, Explains, and Generates University Math Problems by Program Synthesis and Few-Shot Learning at Human Level | Code | 1 |
| LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Code | 1 |
| Get an A in Math: Progressive Rectification Prompting | Code | 1 |
| DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents | Code | 1 |
| CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning | Code | 1 |
| Collective Constitutional AI: Aligning a Language Model with Public Input | Code | 1 |
| A Categorical Archive of ChatGPT Failures | Code | 1 |
| Over-Reasoning and Redundant Calculation of Large Language Models | Code | 1 |
| Problem-Oriented Segmentation and Retrieval: Case Study on Tutoring Conversations | Code | 1 |
| Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs | Code | 1 |
| M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models | Code | 1 |
| Fine-Tuning Large Language Models on Quantum Optimization Problems for Circuit Generation | Code | 1 |
| Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Code | 1 |
| Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Code | 1 |
| Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers | Code | 1 |
| DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning | Code | 1 |
| OJBench: A Competition Level Code Benchmark For Large Language Models | Code | 1 |
| FELM: Benchmarking Factuality Evaluation of Large Language Models | Code | 1 |
| FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Code | 1 |
| FormulaNet: A Benchmark Dataset for Mathematical Formula Detection | Code | 1 |
| Expression Syntax Information Bottleneck for Math Word Problems | Code | 1 |
| Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Code | 1 |
| EXAONE Deep: Reasoning Enhanced Language Models | Code | 1 |
| Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Code | 1 |
| CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning | Code | 1 |
| Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning | Code | 1 |
| BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing | Code | 1 |
| Non-Autoregressive Math Word Problem Solver with Unified Tree Structure | Code | 1 |
| NeMo-Inspector: A Visualization Tool for LLM Generation Analysis | Code | 1 |
| Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula | Code | 1 |
| Nerva: a Truly Sparse Implementation of Neural Networks | Code | 1 |
| Aioli: A Unified Optimization Framework for Language Model Data Mixing | Code | 1 |
| Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning | Code | 1 |
| Neural-Symbolic Solver for Math Word Problems with Auxiliary Tasks | Code | 1 |
| CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Code | 1 |
| Mathematical Capabilities of ChatGPT | Code | 1 |
| On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents | Code | 1 |
| EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Code | 1 |
| Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning | Code | 1 |
| MWPToolkit: An Open-Source Framework for Deep Learning-Based Math Word Problem Solvers | Code | 1 |

No leaderboard results yet.