SOTAVerified

Math

Papers

Showing 726-750 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| A Perspective on Large Language Models, Intelligent Machines, and Knowledge Acquisition | | 0 |
| Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers | Code | 4 |
| P3: A Policy-Driven, Pace-Adaptive, and Diversity-Promoted Framework for data pruning in LLM Training | | 0 |
| Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil | | 0 |
| Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula | Code | 1 |
| AltCanvas: A Tile-Based Image Editor with Generative AI for Blind or Visually Impaired People | | 0 |
| The Logic of Political Survival Revisited: Consequences of Elite Uncertainty Under Authoritarian Rule | | 0 |
| On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents | Code | 1 |
| MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities | Code | 3 |
| Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models | Code | 2 |
| Large Language Monkeys: Scaling Inference Compute with Repeated Sampling | Code | 3 |
| AI-Assisted Generation of Difficult Math Questions | Code | 0 |
| Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process | Code | 2 |
| Towards Effective and Efficient Continual Pre-training of Large Language Models | Code | 0 |
| Recursive Introspection: Teaching Language Model Agents How to Self-Improve | | 0 |
| Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching | Code | 1 |
| MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents | Code | 1 |
| LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover | Code | 4 |
| Nerva: a Truly Sparse Implementation of Neural Networks | Code | 1 |
| TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON | Code | 3 |
| Toward Adaptive Reasoning in Large Language Models with Thought Rollback | Code | 1 |
| Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data | | 0 |
| Learning Goal-Conditioned Representations for Language Reward Models | Code | 1 |
| Weak-to-Strong Reasoning | Code | 2 |
| Prover-Verifier Games improve legibility of LLM outputs | Code | 0 |
Page 30 of 64

No leaderboard results yet.