SOTAVerified

Math

Papers

Showing 1001–1050 of 1596 papers

Title | Status | Hype
QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning | - | 0
Benchmarking Large Language Models for Math Reasoning Tasks | Code | 0
A Study of PHOC Spatial Region Configurations for Math Formula Retrieval | - | 0
Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions | - | 0
Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models | - | 0
Leveraging Web-Crawled Data for High-Quality Fine-Tuning | Code | 0
MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | Code | 0
A Perspective on Large Language Models, Intelligent Machines, and Knowledge Acquisition | - | 0
P3: A Policy-Driven, Pace-Adaptive, and Diversity-Promoted Framework for data pruning in LLM Training | - | 0
Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil | - | 0
AltCanvas: A Tile-Based Image Editor with Generative AI for Blind or Visually Impaired People | - | 0
The Logic of Political Survival Revisited: Consequences of Elite Uncertainty Under Authoritarian Rule | - | 0
AI-Assisted Generation of Difficult Math Questions | Code | 0
Towards Effective and Efficient Continual Pre-training of Large Language Models | Code | 0
Recursive Introspection: Teaching Language Model Agents How to Self-Improve | - | 0
Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data | - | 0
Prover-Verifier Games improve legibility of LLM outputs | Code | 0
A LLM Benchmark based on the Minecraft Builder Dialog Agent Task | - | 0
CCoE: A Compact LLM with Collaboration of Experts | - | 0
Reasoning with Large Language Models, a Survey | - | 0
Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models | - | 0
TelecomGPT: A Framework to Build Telecom-Specfic Large Language Models | - | 0
Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors | Code | 0
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On | - | 0
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist | - | 0
ConvNLP: Image-based AI Text Detection | - | 0
Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models | Code | 0
Solving for X and Beyond: Can Large Language Models Solve Complex Math Problems with More-Than-Two Unknowns? | Code | 0
Smart Vision-Language Reasoners | Code | 0
Helpful assistant or fruitful facilitator? Investigating how personas affect language model behavior | Code | 0
Advancing Process Verification for Large Language Models via Tree-Based Preference Learning | - | 0
CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models | - | 0
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting | - | 0
DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Code | 0
Task Oriented In-Domain Data Augmentation | - | 0
Generative AI for Enhancing Active Learning in Education: A Comparative Study of GPT-3.5 and GPT-4 in Crafting Customized Test Questions | - | 0
Towards Infinite-Long Prefix in Transformer | Code | 0
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning | - | 0
Can LLMs Reason in the Wild with Programs? | Code | 0
Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever | - | 0
Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems | - | 0
GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation | Code | 0
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts | - | 0
Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment | - | 0
Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | - | 0
ReMI: A Dataset for Reasoning with Multiple Images | - | 0
CLST: Cold-Start Mitigation in Knowledge Tracing by Aligning a Generative Language Model as a Students' Knowledge Tracer | - | 0
Can I understand what I create? Self-Knowledge Evaluation of Large Language Models | - | 0
Human Learning about AI | - | 0
A multi-core periphery perspective: Ranking via relative centrality | - | 0
Page 21 of 32

No leaderboard results yet.