Math

Papers

Showing 726–750 of 1596 papers

Title	Date	Tasks	Status	Hype
A Perspective on Large Language Models, Intelligent Machines, and Knowledge Acquisition	Aug 13, 2024	Common Sense Reasoning, Math	Unverified	0
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers	Aug 12, 2024	GSM8K, Math	Code Available	4
P3: A Policy-Driven, Pace-Adaptive, and Diversity-Promoted Framework for data pruning in LLM Training	Aug 10, 2024	Diversity, Logical Reasoning	Unverified	0
Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil	Aug 9, 2024	Math, Multiple-choice	Unverified	0
Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula	Aug 8, 2024	GSM8K, Language Modeling	Code Available	1
AltCanvas: A Tile-Based Image Editor with Generative AI for Blind or Visually Impaired People	Aug 5, 2024	Math	Unverified	0
The Logic of Political Survival Revisited: Consequences of Elite Uncertainty Under Authoritarian Rule	Aug 4, 2024	Math	Unverified	0
On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents	Aug 2, 2024	Code Generation, Large Language Model	Code Available	1
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities	Aug 1, 2024	Math, MM-Vet	Code Available	3
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models	Aug 1, 2024	Math	Code Available	2
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling	Jul 31, 2024	GSM8K, Math	Code Available	3
AI-Assisted Generation of Difficult Math Questions	Jul 30, 2024	Math, Mathematical Reasoning	Code Available	0
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process	Jul 29, 2024	GSM8K, Math	Code Available	2
Towards Effective and Efficient Continual Pre-training of Large Language Models	Jul 26, 2024	Math	Code Available	0
Recursive Introspection: Teaching Language Model Agents How to Self-Improve	Jul 25, 2024	Imitation Learning, Language Modeling	Unverified	0
Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching	Jul 24, 2024	Math	Code Available	1
MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents	Jul 24, 2024	Math	Code Available	1
LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover	Jul 24, 2024	Automated Theorem Proving, Math	Code Available	4
Nerva: a Truly Sparse Implementation of Neural Networks	Jul 24, 2024	Math	Code Available	1
TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON	Jul 22, 2024	Language Modeling, Language Modelling	Code Available	3
Toward Adaptive Reasoning in Large Language Models with Thought Rollback	Jul 21, 2024	Arithmetic Reasoning, Math	Code Available	1
Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data	Jul 20, 2024	Language Modelling, Machine Translation	Unverified	0
Learning Goal-Conditioned Representations for Language Reward Models	Jul 18, 2024	GSM8K, Math	Code Available	1
Weak-to-Strong Reasoning	Jul 18, 2024	GSM8K, Math	Code Available	2
Prover-Verifier Games improve legibility of LLM outputs	Jul 18, 2024	Math	Code Available	0

