SOTAVerified

Math

Papers

Showing 901–925 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| Automate Knowledge Concept Tagging on Math Questions with LLMs | | 0 |
| To Err is Machine: Vulnerability Detection Challenges LLM Reasoning | | 0 |
| MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? | | 0 |
| A Chain-of-Thought Prompting Approach with LLMs for Evaluating Students' Formative Assessment Responses in Science | | 0 |
| From Large to Tiny: Distilling and Refining Mathematical Expertise for Math Word Problems with Weakly Supervision | | 0 |
| PARAMANU-AYN: Pretrain from scratch or Continual Pretraining of LLMs for Legal Domain Adaptation? | | 0 |
| Evolutionary Optimization of Model Merging Recipes | Code | 5 |
| Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices | Code | 1 |
| Instructing Large Language Models to Identify and Ignore Irrelevant Conditions | Code | 0 |
| What Makes Math Word Problems Challenging for LLMs? | Code | 0 |
| An upper bound of the mutation probability in the genetic algorithm for general 0-1 knapsack problem | | 0 |
| Incorporating Graph Attention Mechanism into Geometric Problem Solving Based on Deep Reinforcement Learning | Code | 0 |
| Hydrodynamics of Markets: Hidden Links Between Physics and Finance | | 0 |
| Self-Consistency Boosts Calibration for Math Reasoning | | 0 |
| Sabiá-2: A New Generation of Portuguese Large Language Models | | 0 |
| Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision | Code | 2 |
| The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models? | Code | 1 |
| Laying the Foundation First? Investigating the Generalization from Atomic Skills to Complex Reasoning Tasks | | 0 |
| Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models | | 0 |
| FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models | | 0 |
| SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models | Code | 0 |
| Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | | 0 |
| Common 7B Language Models Already Possess Strong Math Capabilities | Code | 5 |
| Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Code | 0 |
| MathScale: Scaling Instruction Tuning for Mathematical Reasoning | Code | 0 |

No leaderboard results yet.