| ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Jun 18, 2024 | AllGSM8K | CodeCode Available | 14 |
| ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation | Jun 16, 2024 | Continual LearningGSM8K | CodeCode Available | 0 |
| Reactor Mk.1 performances: MMLU, HumanEval and BBH test results | Jun 15, 2024 | BenchmarkingHumanEval | —Unverified | 0 |
| PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases | Jun 11, 2024 | Code GenerationHumanEval | —Unverified | 0 |
| Validating LLM-Generated Programs with Metamorphic Prompt Testing | Jun 11, 2024 | HumanEval | —Unverified | 0 |
| JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models | Jun 10, 2024 | BenchmarkingCode Generation | CodeCode Available | 0 |
| How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark | Jun 10, 2024 | HumanEvalProgram Synthesis | CodeCode Available | 1 |
| Does your data spark joy? Performance gains from domain upsampling at the end of training | Jun 5, 2024 | GSM8KHumanEval | —Unverified | 0 |
| SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning | Jun 3, 2024 | Code CompletionCode Generation | CodeCode Available | 1 |
| Automatic Instruction Evolving for Large Language Models | Jun 2, 2024 | GSM8KHumanEval | CodeCode Available | 3 |