| Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems | Oct 19, 2023 | In-Context Learning, Language Modeling | Code Available | 0 |
| TabuLa: Harnessing Language Models for Tabular Data Synthesis | Oct 19, 2023 | Language Modeling | Code Available | 1 |
| Data Augmentations for Improved (Large) Language Model Generalization | Oct 19, 2023 | Attribute, counterfactual | Unverified | 0 |
| Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture | Oct 18, 2023 | 4k, image-classification | Code Available | 2 |
| Solving the multiplication problem of a large language model system using a graph-based method | Oct 18, 2023 | Chatbot, Language Modeling | Unverified | 0 |
| Preference Optimization for Molecular Language Models | Oct 18, 2023 | Language Modeling | Code Available | 0 |
| Document-Level Language Models for Machine Translation | Oct 18, 2023 | Language Modeling | Unverified | 0 |
| Pseudointelligence: A Unifying Framework for Language Model Evaluation | Oct 18, 2023 | Language Model Evaluation, Language Modeling | Unverified | 0 |
| Harnessing Dataset Cartography for Improved Compositional Generalization in Transformers | Oct 18, 2023 | Language Modeling | Code Available | 0 |
| Zero-shot Faithfulness Evaluation for Text Summarization with Foundation Language Model | Oct 18, 2023 | Language Modeling | Code Available | 1 |
| Fast Multipole Attention: A Divide-and-Conquer Attention Mechanism for Long Sequences | Oct 18, 2023 | Language Modeling | Code Available | 0 |
| Solving Hard Analogy Questions with Relation Embedding Chains | Oct 18, 2023 | Knowledge Graphs, Language Modeling | Code Available | 0 |
| Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament | Oct 17, 2023 | Language Modeling | Unverified | 0 |
| Iterative Shallow Fusion of Backward Language Model for End-to-End Speech Recognition | Oct 17, 2023 | Automatic Speech Recognition (ASR) | Unverified | 0 |
| Multi-stage Large Language Model Correction for Speech Recognition | Oct 17, 2023 | Language Modeling | Unverified | 0 |
| Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging | Oct 17, 2023 | Language Modeling | Code Available | 1 |
| Revealing the Unwritten: Visual Investigation of Beam Search Trees to Address Language Model Prompting Challenges | Oct 17, 2023 | Language Modeling | Unverified | 0 |
| BitNet: Scaling 1-bit Transformers for Large Language Models | Oct 17, 2023 | Language Modeling | Code Available | 2 |
| Leveraging Large Language Model for Automatic Evolving of Industrial Data-Centric R&D Cycle | Oct 17, 2023 | Anomaly Detection, Decision Making | Unverified | 0 |
| ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing | Oct 17, 2023 | Language Modeling | Code Available | 0 |
| Correction Focused Language Model Training for Speech Recognition | Oct 17, 2023 | Automatic Speech Recognition (ASR) | Unverified | 0 |
| Learn Your Tokens: Word-Pooled Tokenization for Language Modeling | Oct 17, 2023 | Language Modeling | Code Available | 0 |
| Watermarking LLMs with Weight Quantization | Oct 17, 2023 | Language Modeling | Code Available | 1 |
| ChapGTP, ILLC's Attempt at Raising a BabyLM: Improving Data Efficiency by Automatic Task Formation | Oct 17, 2023 | Data Augmentation, Language Modeling | Unverified | 0 |
| Utilising a Large Language Model to Annotate Subject Metadata: A Case Study in an Australian National Research Data Catalogue | Oct 17, 2023 | In-Context Learning, Language Modeling | Unverified | 0 |