SOTAVerified

Math

Papers

Showing 901–950 of 1596 papers

Title | Status | Hype
Automate Knowledge Concept Tagging on Math Questions with LLMs | - | 0
To Err is Machine: Vulnerability Detection Challenges LLM Reasoning | - | 0
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? | - | 0
A Chain-of-Thought Prompting Approach with LLMs for Evaluating Students' Formative Assessment Responses in Science | - | 0
From Large to Tiny: Distilling and Refining Mathematical Expertise for Math Word Problems with Weakly Supervision | - | 0
PARAMANU-AYN: Pretrain from scratch or Continual Pretraining of LLMs for Legal Domain Adaptation? | - | 0
Evolutionary Optimization of Model Merging Recipes | Code | 5
Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices | Code | 1
Instructing Large Language Models to Identify and Ignore Irrelevant Conditions | Code | 0
What Makes Math Word Problems Challenging for LLMs? | Code | 0
An upper bound of the mutation probability in the genetic algorithm for general 0-1 knapsack problem | - | 0
Incorporating Graph Attention Mechanism into Geometric Problem Solving Based on Deep Reinforcement Learning | Code | 0
Hydrodynamics of Markets: Hidden Links Between Physics and Finance | - | 0
Self-Consistency Boosts Calibration for Math Reasoning | - | 0
Sabiá-2: A New Generation of Portuguese Large Language Models | - | 0
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision | Code | 2
The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models? | Code | 1
Laying the Foundation First? Investigating the Generalization from Atomic Skills to Complex Reasoning Tasks | - | 0
Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models | - | 0
FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models | - | 0
SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models | Code | 0
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | - | 0
Common 7B Language Models Already Possess Strong Math Capabilities | Code | 5
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Code | 0
MathScale: Scaling Instruction Tuning for Mathematical Reasoning | Code | 0
Evaluating and Optimizing Educational Content with Large Language Model Judgments | Code | 0
Experimenting with Generative AI: Does ChatGPT Really Increase Everyone's Productivity? | - | 0
The Claude 3 Model Family: Opus, Sonnet, Haiku | - | 0
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training | - | 0
Brilla AI: AI Contestant for the National Science and Maths Quiz | Code | 1
Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models | Code | 1
Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning | - | 0
Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Code | 1
ClickTree: A Tree-based Method for Predicting Math Students' Performance Based on Clickstream Data | - | 0
Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap | Code | 2
PRSA: Prompt Stealing Attacks against Real-World Prompt Services | - | 0
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers | Code | 2
StarCoder 2 and The Stack v2: The Next Generation | Code | 7
Data Interpreter: An LLM Agent For Data Science | - | 0
Adversarial Math Word Problem Generation | Code | 0
MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning | Code | 0
Case-Based or Rule-Based: How Do Transformers Do the Math? | Code | 1
MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs | - | 0
Stepwise Self-Consistent Mathematical Reasoning with Large Language Models | Code | 1
How Do Humans Write Code? Large Models Do It the Same Way Too | Code | 0
MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations | Code | 1
Brain-Inspired Two-Stage Approach: Enhancing Mathematical Reasoning by Imitating Human Thought Processes | Code | 0
ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models | Code | 1
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset | Code | 2
MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models | - | 0

No leaderboard results yet.