SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using words scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded the purely statistical models, such as word n-gram language models.
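The word n-gram models mentioned above can be sketched in a few lines. Below is a minimal bigram model estimated by relative frequency; the toy corpus is illustrative, and real systems add smoothing for unseen bigrams:

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count bigrams and estimate P(curr | prev) by relative frequency."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[prev][curr] += 1
    # Maximum-likelihood estimate; unseen contexts get probability 0.0
    return lambda prev, curr: (
        bigrams[prev][curr] / unigrams[prev] if unigrams[prev] else 0.0
    )

corpus = ["the cat sat", "the cat ran", "the dog sat"]
prob = train_bigram_model(corpus)
print(prob("the", "cat"))  # 2 of the 3 bigrams starting with "the" are "the cat"
```

An unsmoothed estimator like this assigns zero probability to any bigram absent from training data, which is the main weakness that smoothing (and later, neural models) addresses.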

Source: Wikipedia

Papers

Showing 5851–5900 of 17610 papers

Title | Status | Hype
A-MESS: Anchor based Multimodal Embedding with Semantic Synchronization for Multimodal Intent Recognition | – | 0
A Meta-Learning Perspective on Transformers for Causal Language Modeling | – | 0
A Methodology for Obtaining Concept Graphs from Word Graphs | – | 0
A Method on Searching Better Activation Functions | – | 0
Amharic-English Speech Translation in Tourism Domain | – | 0
Amharic Word Sequence Prediction | – | 0
A Mixture-of-Expert Approach to RL-based Dialogue Management | – | 0
A Mixture of h-1 Heads is Better than h Heads | – | 0
A Mixture of h - 1 Heads is Better than h Heads | – | 0
A ML-LLM pairing for better code comment classification | – | 0
Amobee at SemEval-2020 Task 7: Regularization of Language Model Based Classifiers | – | 0
A Monte Carlo Framework for Calibrated Uncertainty Estimation in Sequence Prediction | – | 0
A Monte Carlo Language Model Pipeline for Zero-Shot Sociopolitical Event Extraction | – | 0
A more abstractive summarization model | – | 0
AMPO: Active Multi-Preference Optimization | – | 0
A Multi-Context Character Prediction Model for a Brain-Computer Interface | – | 0
A Multi-Expert Large Language Model Architecture for Verilog Code Generation | – | 0
Bridging Items and Language: A Transition Paradigm for Large Language Model-Based Recommendation | – | 0
A Multi-Granularity Retrieval Framework for Visually-Rich Documents | – | 0
A Multilingual Bag-of-Entities Model for Zero-Shot Cross-Lingual Text Classification | – | 0
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models | – | 0
A Multimodal Automated Interpretability Agent | – | 0
A Multimodal Educational Corpus of Oral Courses: Annotation, Analysis and Case Study | – | 0
A Multi-Modal Knowledge-Enhanced Framework for Vessel Trajectory Prediction | – | 0
A Multimodal Recaptioning Framework to Account for Perceptual Diversity in Multilingual Vision-Language Modeling | – | 0
A Multi-Phase Analysis of Blood Culture Stewardship: Machine Learning Prediction, Expert Recommendation Assessment, and LLM Automation | – | 0
A Multi-solution Study on GDPR AI-enabled Completeness Checking of DPAs | – | 0
A Multi-Task Text Classification Pipeline with Natural Language Explanations: A User-Centric Evaluation in Sentiment Analysis and Offensive Language Identification in Greek Tweets | – | 0
A multitask transfer learning framework for the prediction of virus-human protein-protein interactions | – | 0
An Actor-Critic Approach to Boosting Text-to-SQL Large Language Model | – | 0
An Adversarial Multi-Task Learning Method for Chinese Text Correction with Semantic Detection | – | 0
An Agentic Framework for Autonomous Metamaterial Modeling and Inverse Design | – | 0
AnaLog: Testing Analytical and Deductive Logic Learnability in Language Models | – | 0
Analyse des performances de modèles de langage sub-lexicale pour des langues peu-dotées à morphologie riche (Performance analysis of sub-word language modeling for under-resourced languages with rich morphology: case study on Swahili and Amharic) [in French] | – | 0
Analysing Dropout and Compounding Errors in Neural Language Models | – | 0
Analysing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets | – | 0
Analysing the Effect of Out-of-Domain Data on SMT Systems | – | 0
Analysis of Argument Structure Constructions in the Large Language Model BERT | – | 0
Analysis of Disinformation and Fake News Detection Using Fine-Tuned Large Language Model | – | 0
An Analysis of Semantically-Aligned Speech-Text Embeddings | – | 0
Analysis of Plan-based Retrieval for Grounded Text Generation | – | 0
Analysis of the User Perception of Chatbots in Education Using A Partial Least Squares Structural Equation Modeling Approach | – | 0
Analysis of Word Embeddings and Sequence Features for Clinical Information Extraction | – | 0
Analytic Subspace Routing: How Recursive Least Squares Works in Continual Learning of Large Language Model | – | 0
Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models | – | 0
Analyzing and Reducing the Performance Gap in Cross-Lingual Transfer with Fine-tuning Slow and Fast | – | 0
Analyzing Bias in Swiss Federal Supreme Court Judgments Using Facebook's Holistic Bias Dataset: Implications for Language Model Training | – | 0
Analyzing FOMC Minutes: Accuracy and Constraints of Language Models | – | 0
以語言模型判斷學習者文句流暢度 (Analyzing Learners' Writing Fluency Based on Language Model) [in Chinese] | – | 0
Page 118 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | – | Unverified
2 | GRU | Validation perplexity | 53.78 | – | Unverified
3 | LSTM | Validation perplexity | 52.73 | – | Unverified
4 | LSTM | Test perplexity | 48.7 | – | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | – | Unverified
6 | TCN | Test perplexity | 45.19 | – | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | – | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | – | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | – | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | – | Unverified
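For reference, the perplexity figures in these tables are the exponential of the average negative log-likelihood per token: lower means the model assigns higher probability to the held-out text. A minimal sketch (the example probabilities are illustrative only):

```python
import math

def perplexity(log_probs):
    """Perplexity = exp of the mean negative log-likelihood (natural log) per token."""
    return math.exp(-sum(log_probs) / len(log_probs))

# A model that assigns probability 0.25 to each of four tokens
# is exactly as uncertain as a uniform choice among 4 options.
lps = [math.log(0.25)] * 4
print(perplexity(lps))  # 4.0
```

Note that reported perplexities are only comparable when computed over the same tokenization and test set, which is one reason a site like this re-verifies claimed numbers.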
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | – | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | – | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | – | Unverified
4 | R-Transformer | Test perplexity | 84.38 | – | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | – | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | – | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | – | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | – | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | – | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per character (BPC) | 1.67 | – | Unverified
2 | Hypernetworks | Bits per character (BPC) | 1.34 | – | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per character (BPC) | 1.33 | – | Unverified
4 | LN HM-LSTM | Bits per character (BPC) | 1.32 | – | Unverified
5 | ByteNet | Bits per character (BPC) | 1.31 | – | Unverified
6 | Recurrent Highway Networks | Bits per character (BPC) | 1.27 | – | Unverified
7 | Large FS-LSTM-4 | Bits per character (BPC) | 1.25 | – | Unverified
8 | Large mLSTM | Bits per character (BPC) | 1.24 | – | Unverified
9 | AWD-LSTM (3 layers) | Bits per character (BPC) | 1.23 | – | Unverified
10 | Cluster-Former (#C=512) | Bits per character (BPC) | 1.22 | – | Unverified
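Bits per character (BPC) is the same cross-entropy quantity as perplexity, but measured in base 2 per character rather than per word: 2**BPC is the per-character perplexity, and a per-character loss in nats converts to BPC by dividing by ln 2. A small sketch of both conversions (values illustrative):

```python
import math

def bpc_to_char_perplexity(bpc):
    """2**BPC is the effective per-character branching factor (perplexity)."""
    return 2 ** bpc

def nats_per_char_to_bpc(nats_per_char):
    """Convert a per-character cross-entropy in nats to bits per character."""
    return nats_per_char / math.log(2)

# e.g. the best entry in the table above, BPC = 1.22
print(round(bpc_to_char_perplexity(1.22), 2))  # 2.33
```

Because BPC is per character while word-level perplexity is per word, the two metrics are not directly comparable without knowing the average characters per word of the test corpus.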
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | – | Unverified
2 | OPT 125M | Test perplexity | 32.26 | – | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | – | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | – | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | – | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | – | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | – | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | – | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | – | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | – | Unverified