| A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration | Oct 3, 2023 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| FELM: Benchmarking Factuality Evaluation of Large Language Models | Oct 1, 2023 | Benchmarking, Math | Code Available | 1 |
| NLPBench: Evaluating Large Language Models on Solving NLP Problems | Sep 27, 2023 | Benchmarking, Math | Code Available | 1 |
| Design of Chain-of-Thought in Math Problem Solving | Sep 20, 2023 | Diversity, GSM8K | Code Available | 1 |
| Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning | Sep 19, 2023 | Instruction Following, Language Modeling | Code Available | 1 |
| Towards an AI to Win Ghana's National Science and Maths Quiz | Aug 8, 2023 | Math, Question Answering | Code Available | 1 |
| Studying Large Language Model Generalization with Influence Functions | Aug 7, 2023 | counterfactual, Language Modeling | Code Available | 1 |
| A Symbolic Character-Aware Model for Solving Geometry Problems | Aug 5, 2023 | Math, Multi-Label Classification | Code Available | 1 |
| SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning | Aug 1, 2023 | GSM8K, Math | Code Available | 1 |
| SIGHT: A Large Annotated Dataset on Student Insights Gathered from Higher Education Transcripts | Jun 15, 2023 | Math | Code Available | 1 |