| QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning | Aug 20, 2024 | Benchmarking, Language Modelling | Unverified | 0 |
| Benchmarking Large Language Models for Math Reasoning Tasks | Aug 20, 2024 | Benchmarking, In-Context Learning | Code Available | 0 |
| A Study of PHOC Spatial Region Configurations for Math Formula Retrieval | Aug 17, 2024 | Math, Retrieval | Unverified | 0 |
| Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions | Aug 16, 2024 | Descriptive, Hallucination | Unverified | 0 |
| Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models | Aug 15, 2024 | Math | Unverified | 0 |
| Leveraging Web-Crawled Data for High-Quality Fine-Tuning | Aug 15, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | Aug 14, 2024 | Math, Mathematical Reasoning | Code Available | 0 |
| A Perspective on Large Language Models, Intelligent Machines, and Knowledge Acquisition | Aug 13, 2024 | Common Sense Reasoning, Math | Unverified | 0 |
| P3: A Policy-Driven, Pace-Adaptive, and Diversity-Promoted Framework for data pruning in LLM Training | Aug 10, 2024 | Diversity, Logical Reasoning | Unverified | 0 |
| Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil | Aug 9, 2024 | Math, Multiple-choice | Unverified | 0 |
| AltCanvas: A Tile-Based Image Editor with Generative AI for Blind or Visually Impaired People | Aug 5, 2024 | Math | Unverified | 0 |
| The Logic of Political Survival Revisited: Consequences of Elite Uncertainty Under Authoritarian Rule | Aug 4, 2024 | Math | Unverified | 0 |
| AI-Assisted Generation of Difficult Math Questions | Jul 30, 2024 | Math, Mathematical Reasoning | Code Available | 0 |
| Towards Effective and Efficient Continual Pre-training of Large Language Models | Jul 26, 2024 | Math | Code Available | 0 |
| Recursive Introspection: Teaching Language Model Agents How to Self-Improve | Jul 25, 2024 | Imitation Learning, Language Modeling | Unverified | 0 |
| Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data | Jul 20, 2024 | Language Modelling, Machine Translation | Unverified | 0 |
| Prover-Verifier Games improve legibility of LLM outputs | Jul 18, 2024 | Math | Code Available | 0 |
| A LLM Benchmark based on the Minecraft Builder Dialog Agent Task | Jul 17, 2024 | Math, Minecraft | Unverified | 0 |
| CCoE: A Compact LLM with Collaboration of Experts | Jul 16, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| Reasoning with Large Language Models, a Survey | Jul 16, 2024 | Few-Shot Learning, In-Context Learning | Unverified | 0 |
| Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models | Jul 12, 2024 | GSM8K, Math | Unverified | 0 |
| TelecomGPT: A Framework to Build Telecom-Specfic Large Language Models | Jul 12, 2024 | Code Generation, Math | Unverified | 0 |
| Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors | Jul 12, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On | Jul 11, 2024 | GSM8K, Math | Unverified | 0 |
| Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist | Jul 11, 2024 | GSM8K, Math | Unverified | 0 |