| A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library | Dec 19, 2023 | GPU | Code Available | 2 |
| XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX | Dec 19, 2023 | Diversity, GPU | Code Available | 2 |
| mLoRA: Fine-Tuning LoRA Adapters via Highly-Efficient Pipeline Parallelism in Multiple GPUs | Dec 5, 2023 | GPU, Large Language Model | Code Available | 2 |
| CoLLiE: Collaborative Training of Large Language Models in an Efficient Way | Dec 1, 2023 | GPU, parameter-efficient fine-tuning | Code Available | 2 |
| XLB: A differentiable massively parallel lattice Boltzmann library in Python | Nov 27, 2023 | CPU, GPU | Code Available | 2 |
| Learning to Fly in Seconds | Nov 22, 2023 | GPU, Reinforcement Learning (RL) | Code Available | 2 |
| Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model | Nov 22, 2023 | Denoising, GPU | Code Available | 2 |
| JaxMARL: Multi-Agent RL Environments and Algorithms in JAX | Nov 16, 2023 | CPU, GPU | Code Available | 2 |
| Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster | Nov 14, 2023 | GPU, Position | Code Available | 2 |
| Black-Box Prompt Optimization: Aligning Large Language Models without Model Training | Nov 7, 2023 | GPU | Code Available | 2 |