| AltCanvas: A Tile-Based Image Editor with Generative AI for Blind or Visually Impaired People | Aug 5, 2024 | Math | —Unverified | 0 |
| The Logic of Political Survival Revisited: Consequences of Elite Uncertainty Under Authoritarian Rule | Aug 4, 2024 | Math | —Unverified | 0 |
| On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents | Aug 2, 2024 | Code GenerationLarge Language Model | CodeCode Available | 1 |
| MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities | Aug 1, 2024 | MathMM-Vet | CodeCode Available | 3 |
| Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models | Aug 1, 2024 | Math | CodeCode Available | 2 |
| Large Language Monkeys: Scaling Inference Compute with Repeated Sampling | Jul 31, 2024 | GSM8KMath | CodeCode Available | 3 |
| AI-Assisted Generation of Difficult Math Questions | Jul 30, 2024 | MathMathematical Reasoning | CodeCode Available | 0 |
| Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process | Jul 29, 2024 | GSM8KMath | CodeCode Available | 2 |
| Towards Effective and Efficient Continual Pre-training of Large Language Models | Jul 26, 2024 | Math | CodeCode Available | 0 |
| Recursive Introspection: Teaching Language Model Agents How to Self-Improve | Jul 25, 2024 | Imitation LearningLanguage Modeling | —Unverified | 0 |