SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using words scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded the purely statistical models, such as word n-gram language models.
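The word n-gram models mentioned above are simple enough to sketch in a few lines. Below is a minimal bigram (n=2) variant that estimates P(word | previous word) from raw counts; the toy corpus and function names are illustrative, not from any paper or benchmark on this page:

```python
from collections import Counter, defaultdict

# Toy corpus (illustrative): a bigram model sees it as a sequence of word pairs.
corpus = "the cat sat on the mat the cat ate".split()

# Count how often each word follows each preceding word.
bigrams = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigrams[prev][word] += 1

def prob(word, prev):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev is unseen."""
    total = sum(bigrams[prev].values())
    return bigrams[prev][word] / total if total else 0.0

print(prob("cat", "the"))  # 2 of the 3 continuations of "the" are "cat" -> 0.666...
```

Real n-gram systems add smoothing (e.g. Kneser-Ney) so unseen word pairs do not get zero probability, which is one of the weaknesses that motivated the neural models that replaced them.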

Source: Wikipedia

Papers

Showing 6251–6300 of 17610 papers

Title | Status | Hype
Assessing Relative Sentence Complexity using an Incremental CCG Parser | | 0
Assessing Social Alignment: Do Personality-Prompted Large Language Models Behave Like Humans? | | 0
Assessing the Aesthetic Evaluation Capabilities of GPT-4 with Vision: Insights from Group and Individual Assessments | | 0
Assessing the Answerability of Queries in Retrieval-Augmented Code Generation | | 0
Assessing the Effectiveness of GPT-4o in Climate Change Evidence Synthesis and Systematic Assessments: Preliminary Insights | | 0
Assessing the efficacy of large language models in generating accurate teacher responses | | 0
Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities | | 0
Assessing the Performance of Chinese Open Source Large Language Models in Information Extraction Tasks | | 0
Assessing the Performance of Human-Capable LLMs -- Are LLMs Coming for Your Job? | | 0
Assessing the Readability of Sentences: Which Corpora and Features? | | 0
Assessing the Stylistic Properties of Neurally Generated Text in Authorship Attribution | | 0
Assessing Translation capabilities of Large Language Models involving English and Indian Languages | | 0
Assessing Wikipedia-Based Cross-Language Retrieval Models | | 0
Assessment of the Relative Importance of different hyper-parameters of LSTM for an IDS | | 0
Assigning Deep Lexical Types Using Structured Classifier Features for Grammatical Dependencies | | 0
Assisted Debate Builder with Large Language Models | | 0
Assisted Text Annotation Using Active Learning to Achieve High Quality with Little Effort | | 0
AssistGUI: Task-Oriented PC Graphical User Interface Automation | | 0
Assisting Undergraduate Students in Writing Spanish Methodology Sections | | 0
Assistive Large Language Model Agents for Socially-Aware Negotiation Dialogues | | 0
Assistive Recipe Editing through Critiquing | | 0
Associative and Semantic Features Extracted From Web-Harvested Corpora | | 0
AStarTwice at SemEval-2021 Task 5: Toxic Span Detection Using RoBERTa-CRF, Domain Specific Pre-Training and Self-Training | | 0
A transfer learning framework for weak-to-strong generalization | | 0
A statistically consistent measure of Semantic Variability using Language Models | | 0
A Statistical Physics of Language Model Reasoning | | 0
AstBERT: Enabling Language Model for Financial Code Understanding with Abstract Syntax Trees | | 0
A Step-Wise Weighting Approach for Controllable Text Generation | | 0
A string-to-graph constructive alignment algorithm for discrete and probabilistic language modeling | | 0
Astrocyte-Enabled Advancements in Spiking Neural Networks for Large Language Modeling | | 0
Astro-HEP-BERT: A bidirectional language model for studying the meanings of concepts in astrophysics and high energy physics | | 0
AstroLLaVA: towards the unification of astronomical data and natural language | | 0
Multi-Sentence Grounding for Long-term Instructional Video | | 0
A Structured Language Model for Incremental Tree-to-String Translation | | 0
A Study of BFLOAT16 for Deep Learning Training | | 0
Continual Learning Under Language Shift | | 0
A Study of Language Modeling for Chinese Spelling Check | | 0
A study of the impact of generative AI-based data augmentation on software metadata classification | | 0
A Study of Word-Classing for MT Reordering | | 0
A Study on Contextualized Language Modeling for Machine Reading Comprehension | | 0
A Study on Contextualized Language Modeling for FAQ Retrieval | | 0
A Study on Educational Data Analysis and Personalized Feedback Report Generation Based on Tags and ChatGPT | | 0
A Study on Effect of Reference Knowledge Choice in Generating Technical Content Relevant to SAPPhIRE Model Using Large Language Model | | 0
A Study on Effects of Implicit and Explicit Language Model Information for DBLSTM-CTC Based Handwriting Recognition | | 0
A study on native American English speech recognition by Indian listeners with varying word familiarity level | | 0
A Study on Neural Network Language Modeling | | 0
A Study on Prompt-based Few-Shot Learning Methods for Belief State Tracking in Task-oriented Dialog Systems | | 0
A Study on the Implementation of Generative AI Services Using an Enterprise Data-Based LLM Application Architecture | | 0
A Subword Level Language Model for Bangla Language | | 0
A Superalignment Framework in Autonomous Driving with Large Language Models | | 0
Page 126 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | | Unverified
2 | GRU | Validation perplexity | 53.78 | | Unverified
3 | LSTM | Validation perplexity | 52.73 | | Unverified
4 | LSTM | Test perplexity | 48.7 | | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | | Unverified
6 | TCN | Test perplexity | 45.19 | | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | | Unverified
4 | R-Transformer | Test perplexity | 84.38 | | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per Character (BPC) | 1.67 | | Unverified
2 | Hypernetworks | Bits per Character (BPC) | 1.34 | | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per Character (BPC) | 1.33 | | Unverified
4 | LN HM-LSTM | Bits per Character (BPC) | 1.32 | | Unverified
5 | ByteNet | Bits per Character (BPC) | 1.31 | | Unverified
6 | Recurrent Highway Networks | Bits per Character (BPC) | 1.27 | | Unverified
7 | Large FS-LSTM-4 | Bits per Character (BPC) | 1.25 | | Unverified
8 | Large mLSTM | Bits per Character (BPC) | 1.24 | | Unverified
9 | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 | | Unverified
10 | Cluster-Former (#C=512) | Bits per Character (BPC) | 1.22 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | | Unverified
2 | OPT 125M | Test perplexity | 32.26 | | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | | Unverified
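The two metrics in these tables are closely related: perplexity is the exponentiated average negative log-likelihood per token, and bits per character is the same average negative log-likelihood expressed in base 2, per character. A minimal sketch of both, using made-up toy log-probabilities rather than any claimed value from the tables above:

```python
import math

def perplexity(log_probs):
    """exp of the average negative natural-log probability per token."""
    return math.exp(-sum(log_probs) / len(log_probs))

def bpc(log_probs):
    """Average negative log2 probability per character (log_probs in nats)."""
    return -sum(lp / math.log(2) for lp in log_probs) / len(log_probs)

# A model that assigns probability 1/4 to each of 4 characters:
lps = [math.log(0.25)] * 4
print(perplexity(lps))  # 4.0 -- as uncertain as a uniform 4-way choice
print(bpc(lps))         # 2.0 -- log2(4) bits per character
```

Note that perplexity numbers are only comparable when computed over the same vocabulary and test set, which is one reason a site like this distinguishes claimed from verified results.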