| MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning | Feb 27, 2024 | 8k, Language Modeling | Code Available | 0 |
| Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers | Feb 27, 2024 | MMLU | Code Available | 1 |
| ChatMusician: Understanding and Generating Music Intrinsically with LLM | Feb 25, 2024 | MMLU, Text Generation | Code Available | 3 |
| tinyBenchmarks: evaluating LLMs with fewer examples | Feb 22, 2024 | MMLU, Multiple-choice | Code Available | 2 |
| ARL2: Aligning Retrievers for Black-box Large Language Models via Self-guided Adaptive Relevance Labeling | Feb 21, 2024 | MMLU, Retrieval | Code Available | 0 |
| Automating Dataset Updates Towards Reliable and Timely Evaluation of Large Language Models | Feb 19, 2024 | MMLU | Unverified | 0 |
| A StrongREJECT for Empty Jailbreaks | Feb 15, 2024 | MMLU | Code Available | 2 |
| Accurate LoRA-Finetuning Quantization of LLMs via Information Retention | Feb 8, 2024 | MMLU, Quantization | Code Available | 2 |
| When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards | Feb 1, 2024 | Answer Selection, Language Modeling | Code Available | 0 |
| Towards Uncertainty-Aware Language Agent | Jan 25, 2024 | MMLU, StrategyQA | Unverified | 0 |