SOTAVerified

Math

Papers

Showing 776–800 of 1596 papers

| Title | Status | Hype |
|---|---|---|
| DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Code | 0 |
| Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | Code | 3 |
| MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Code | 2 |
| Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Code | 2 |
| Task Oriented In-Domain Data Augmentation | | 0 |
| Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs | Code | 1 |
| Generative AI for Enhancing Active Learning in Education: A Comparative Study of GPT-3.5 and GPT-4 in Crafting Customized Test Questions | | 0 |
| RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold | Code | 1 |
| LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback | Code | 1 |
| Towards Infinite-Long Prefix in Transformer | Code | 0 |
| CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Code | 1 |
| Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning | | 0 |
| Adaptable Logical Control for Large Language Models | Code | 2 |
| Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever | | 0 |
| Can LLMs Reason in the Wild with Programs? | Code | 0 |
| DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | Code | 2 |
| ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Code | 14 |
| Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems | | 0 |
| Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | Code | 1 |
| Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts | | 0 |
| DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling | Code | 1 |
| DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Code | 9 |
| GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation | Code | 0 |
| Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment | | 0 |
| Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | | 0 |

No leaderboard results yet.