SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using words scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
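To make the "purely statistical" baseline concrete, here is a minimal bigram language model with add-one (Laplace) smoothing. This is an illustrative sketch, not code from any cited paper: the toy corpus, the function names, and the smoothing choice are all our own, and real n-gram systems use larger n and stronger smoothing such as Kneser-Ney.

```python
import math
from collections import Counter

# Toy corpus (invented for illustration).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
vocab = set(corpus)

# Count bigrams and their left contexts.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def bigram_prob(prev, word):
    """P(word | prev) with add-one smoothing over the vocabulary."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

def log_prob(words):
    """Log-probability of a word sequence under the bigram model."""
    return sum(math.log(bigram_prob(p, w)) for p, w in zip(words, words[1:]))

print(log_prob("the cat sat on the rug .".split()))
```

Smoothing matters here: without it, any unseen bigram (such as "cat" followed by "rug") would get probability zero and make every sentence containing it impossible.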

Papers

Showing 15001–15050 of 17610 papers

Title | Status | Hype
Quantifying Semantic Emergence in Language Models | Code | 0
Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions | Code | 0
Multilinguals at SemEval-2022 Task 11: Complex NER in Semantically Ambiguous Settings for Low Resource Languages | Code | 0
TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency | Code | 0
Revisiting The Classics: A Study on Identifying and Rectifying Gender Stereotypes in Rhymes and Poems | Code | 0
Topics as Entity Clusters: Entity-based Topics from Large Language Models and Graph Neural Networks | Code | 0
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning | Code | 0
Revisiting Topic-Guided Language Models | Code | 0
QLESS: A Quantized Approach for Data Valuation and Selection in Large Language Model Fine-Tuning | Code | 0
LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting | Code | 0
QASiNa: Religious Domain Question Answering using Sirah Nabawiyah | Code | 0
Pythia: AI-assisted Code Completion System | Code | 0
L-MAGIC: Language Model Assisted Generation of Images with Coherence | Code | 0
KALM: Knowledge-Aware Integration of Local, Document, and Global Contexts for Long Document Understanding | Code | 0
Multi-Lingual Question Generation with Language Agnostic Language Model | Code | 0
Pyramidal Recurrent Unit for Language Modeling | Code | 0
Putting words in context: LSTM language models and lexical ambiguity | Code | 0
Putting GPT-3's Creativity to the (Alternative Uses) Test | Code | 0
Multilingual Normalization of Temporal Expressions with Masked Language Models | Code | 0
Dynamically Allocated Interval-Based Generative Linguistic Steganography with Roulette Wheel | Code | 0
Understanding Language Modeling Paradigm Adaptations in Recommender Systems: Lessons Learned and Open Challenges | Code | 0
Put It Back: Entity Typing with Language Model Enhancement | Code | 0
LLM vs. Lawyers: Identifying a Subset of Summary Judgments in a Large UK Case Law Dataset | Code | 0
Learning to Solve Voxel Building Embodied Tasks from Pixels and Natural Language Instructions | Code | 0
Pushing the bounds of dropout | Code | 0
Learning to Rank Context for Named Entity Recognition Using a Synthetic Dataset | Code | 0
Multilingual Named Entity Recognition Using Pretrained Embeddings, Attention Mechanism and NCRF | Code | 0
PuoBERTa: Training and evaluation of a curated language model for Setswana | Code | 0
Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding | Code | 0
Language Models Meet Anomaly Detection for Better Interpretability and Generalizability | Code | 0
The Birth of Bias: A case study on the evolution of gender bias in an English language model | Code | 0
Non-Determinism of "Deterministic" LLM Settings | Code | 0
RIDE: Enhancing Large Language Model Alignment through Restyled In-Context Learning Demonstration Exemplars | Code | 0
Riemannian Normalizing Flow on Variational Wasserstein Autoencoder for Text Modeling | Code | 0
PunKtuator: A Multilingual Punctuation Restoration System for Spoken and Written Text | Code | 0
Stable LM 2 1.6B Technical Report | Code | 0
Pun Generation with Surprise | Code | 0
The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments | Code | 0
RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy | Code | 0
Punctuation Restoration Improves Structure Understanding Without Supervision | Code | 0
Learning to Plan for Language Modeling from Unlabeled Data | Code | 0
Punctuation Restoration for Singaporean Spoken Languages: English, Malay, and Mandarin | Code | 0
PULSAR at MEDIQA-Sum 2023: Large Language Models Augmented by Synthetic Dialogue Convert Patient Dialogues to Medical Records | Code | 0
MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations | Code | 0
LLMs learn governing principles of dynamical systems, revealing an in-context neural scaling law | Code | 0
Multi-Granularity Tibetan Textual Adversarial Attack Method Based on Masked Language Model | Code | 0
The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Model Performance | Code | 0
Stacked AMR Parsing with Silver Data | Code | 0
Towards Zero-shot Commonsense Reasoning with Self-supervised Refinement of Language Models | Code | 0
Public Sentiment Toward Solar Energy: Opinion Mining of Twitter Using a Transformer-Based Language Model | Code | 0
Page 301 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | – | Unverified
2 | GRU | Validation perplexity | 53.78 | – | Unverified
3 | LSTM | Validation perplexity | 52.73 | – | Unverified
4 | LSTM | Test perplexity | 48.7 | – | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | – | Unverified
6 | TCN | Test perplexity | 45.19 | – | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | – | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | – | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | – | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | – | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | – | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | – | Unverified
4 | R-Transformer | Test perplexity | 84.38 | – | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | – | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | – | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | – | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | – | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | – | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bit per Character (BPC) | 1.67 | – | Unverified
2 | Hypernetworks | Bit per Character (BPC) | 1.34 | – | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bit per Character (BPC) | 1.33 | – | Unverified
4 | LN HM-LSTM | Bit per Character (BPC) | 1.32 | – | Unverified
5 | ByteNet | Bit per Character (BPC) | 1.31 | – | Unverified
6 | Recurrent Highway Networks | Bit per Character (BPC) | 1.27 | – | Unverified
7 | Large FS-LSTM-4 | Bit per Character (BPC) | 1.25 | – | Unverified
8 | Large mLSTM | Bit per Character (BPC) | 1.24 | – | Unverified
9 | AWD-LSTM (3 layers) | Bit per Character (BPC) | 1.23 | – | Unverified
10 | Cluster-Former (#C=512) | Bit per Character (BPC) | 1.22 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | – | Unverified
2 | OPT 125M | Test perplexity | 32.26 | – | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | – | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | – | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | – | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | – | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | – | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | – | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | – | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | – | Unverified
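The two metrics in the tables above are both monotone transforms of cross-entropy loss: perplexity is the exponential of the mean negative log-likelihood per token, and bits per character is the mean negative log-likelihood per character expressed in base 2. A minimal sketch of both conversions, using invented loss values (lower is better for either metric):

```python
import math

# Per-token negative log-likelihoods in nats, as a typical
# cross-entropy training loss would report them (values invented).
token_nlls = [3.2, 4.1, 2.7, 3.6]

# Perplexity: exp of the mean per-token NLL.
perplexity = math.exp(sum(token_nlls) / len(token_nlls))

# Per-character NLLs in nats (values invented).
char_nlls = [0.9, 0.8, 1.0, 0.85]

# Bits per character: mean per-character NLL converted to log base 2.
bpc = sum(char_nlls) / len(char_nlls) / math.log(2)

print(perplexity, bpc)
```

Note that perplexity is only comparable across rows that share a tokenization and test set, which is one reason the tables keep word-level perplexity and character-level BPC as separate leaderboards.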