SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using words scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded the purely statistical models, such as the word n-gram language model.

Source: Wikipedia
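
As a concrete illustration of the "purely statistical" word n-gram models mentioned above, here is a minimal sketch of a bigram model with add-one smoothing. The corpus and all names are toy examples for illustration only; real n-gram systems train on far larger corpora and use stronger smoothing such as Kneser-Ney.

```python
from collections import Counter

# Toy corpus; real n-gram models are trained on millions of words.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab_size = len(unigrams)

def bigram_prob(prev_word, word):
    """P(word | prev_word) with add-one (Laplace) smoothing."""
    return (bigrams[(prev_word, word)] + 1) / (unigrams[prev_word] + vocab_size)

print(bigram_prob("the", "cat"))  # seen bigram: relatively high
print(bigram_prob("cat", "dog"))  # unseen bigram: small but nonzero
```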

Papers

Showing 7251–7300 of 17610 papers

Title | Status | Hype
FQuAD2.0: French Question Answering and Learning When You Don’t Know | | 0
FQuAD: French Question Answering Dataset | | 0
FRAME: Evaluating Rationale-Label Consistency Metrics for Free-Text Rationales | | 0
Framing the News: From Human Perception to Large Language Model Inferences | | 0
Free Acoustic and Language Models for Large Vocabulary Continuous Speech Recognition in Swedish | | 0
Free and Fair Hardware: A Pathway to Copyright Infringement-Free Verilog Generation using LLMs | | 0
FreeLM: Fine-Tuning-Free Language Model | | 0
Free on-line speech recogniser based on Kaldi ASR toolkit producing word posterior lattices | | 0
FreeRide: Harvesting Bubbles in Pipeline Parallelism | | 0
Freezing the Pivot for Triangular Machine Translation | | 0
FreqKV: Frequency Domain Key-Value Compression for Efficient Context Window Extension | | 0
Frequency Autoregressive Image Generation with Continuous Tokens | | 0
Frequency-aware SGD for Efficient Embedding Learning with Provable Benefits | | 0
From a Lossless (~1.5:1) Compression Algorithm for Llama2 7B Weights to Variable Precision, Variable Range, Compressed Numeric Data Types for CNNs and LLMs | | 0
From Attention to Atoms: Spectral Dictionary Learning for Fast, Interpretable Language Models | | 0
From Beginner to Expert: Modeling Medical Knowledge into General LLMs | | 0
From Caesar Cipher to Unsupervised Learning: A New Method for Classifier Parameter Estimation | | 0
From Captions to Rewards (CAREVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models | | 0
From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding | | 0
From Characters to Words to in Between: Do We Capture Morphology? | | 0
From Complex to Simple: Unraveling the Cognitive Tree for Reasoning with Small Language Models | | 0
From Concept to Manufacturing: Evaluating Vision-Language Models for Engineering Design | | 0
From Context to Action: Analysis of the Impact of State Representation and Context on the Generalization of Multi-Turn Web Navigation Agents | | 0
From Cooking Recipes to Robot Task Trees -- Improving Planning Correctness and Task Efficiency by Leveraging LLMs with a Knowledge Network | | 0
From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models | | 0
From FreEM to D’AlemBERT: a Large Corpus and a Language Model for Early Modern French | | 0
From General to Specific: Tailoring Large Language Models for Personalized Healthcare | | 0
From Hallucinations to Facts: Enhancing Language Models with Curated Knowledge Graphs | | 0
From Human to Machine Psychology: A Conceptual Framework for Understanding Well-Being in Large Language Model | | 0
From Idea to CAD: A Language Model-Driven Multi-Agent System for Collaborative Design | | 0
From Information Bottleneck To Activation Norm Penalty | | 0
From Instructions to Constraints: Language Model Alignment with Automatic Constraint Verification | | 0
From Keywords to Structured Summaries: Streamlining Scholarly Information Access | | 0
From Language Models over Tokens to Language Models over Characters | | 0
From Language to Language-ish: How Brain-Like is an LSTM's Representation of Nonsensical Language Stimuli? | | 0
From LLM to NMT: Advancing Low-Resource Machine Translation with Claude | | 0
From Machine Learning to Machine Reasoning | | 0
From Macro to Micro: Probing Dataset Diversity in Language Model Fine-Tuning | | 0
From melodic note sequences to pitches using word2vec | | 0
From Narratives to Numbers: Valid Inference Using Language Model Predictions from Verbal Autopsy Narratives | | 0
From n-gram to Attention: How Model Architectures Learn and Propagate Bias in Language Modeling | | 0
From Nodes to Networks: Evolving Recurrent Neural Networks | | 0
ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings | | 0
From PARIS to LE-PARIS: Toward Patent Response Automation with Recommender Systems and Collaborative Large Language Models | | 0
From Pixels to Graphs: using Scene and Knowledge Graphs for HD-EPIC VQA Challenge | | 0
From Prediction to Application: Language Model-based Code Knowledge Tracing with Domain Adaptive Pre-Training and Automatic Feedback System with Pedagogical Prompting for Comprehensive Programming Education | | 0
From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings | | 0
From Questions to Insightful Answers: Building an Informed Chatbot for University Resources | | 0
From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents | | 0
Page 146 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | | Unverified
2 | GRU | Validation perplexity | 53.78 | | Unverified
3 | LSTM | Validation perplexity | 52.73 | | Unverified
4 | LSTM | Test perplexity | 48.7 | | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | | Unverified
6 | TCN | Test perplexity | 45.19 | | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | | Unverified
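
The perplexity figures in these tables are the exponential of the average per-token negative log-likelihood on the held-out set; lower is better, and a perplexity of k roughly means the model is as uncertain as a uniform choice among k tokens at each step. A minimal sketch of the computation (the probabilities below are made up for illustration, not real model output):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(average negative log-likelihood per token)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Illustrative probabilities the model assigned to the actual next tokens:
print(perplexity([0.2, 0.1, 0.05, 0.3]))  # ~7.6
```
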
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | | Unverified
4 | R-Transformer | Test perplexity | 84.38 | | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bit per Character (BPC) | 1.67 | | Unverified
2 | Hypernetworks | Bit per Character (BPC) | 1.34 | | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bit per Character (BPC) | 1.33 | | Unverified
4 | LN HM-LSTM | Bit per Character (BPC) | 1.32 | | Unverified
5 | ByteNet | Bit per Character (BPC) | 1.31 | | Unverified
6 | Recurrent Highway Networks | Bit per Character (BPC) | 1.27 | | Unverified
7 | Large FS-LSTM-4 | Bit per Character (BPC) | 1.25 | | Unverified
8 | Large mLSTM | Bit per Character (BPC) | 1.24 | | Unverified
9 | AWD-LSTM (3 layers) | Bit per Character (BPC) | 1.23 | | Unverified
10 | Cluster-Former (#C=512) | Bit per Character (BPC) | 1.22 | | Unverified
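
Bit per Character (BPC), the metric in the table above, is the average negative log2-probability a character-level model assigns to each character of the test text; it relates to character-level perplexity by perplexity = 2 ** BPC. A small sketch with illustrative (made-up) probabilities:

```python
import math

def bits_per_character(char_probs):
    """BPC = average negative log2-probability per character."""
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)

# Made-up per-character probabilities for illustration:
print(bits_per_character([0.5, 0.4, 0.3]))  # ~1.35
# Relation to character-level perplexity: perplexity = 2 ** BPC.
print(2 ** 1.22)  # ~2.33, the effective per-character choice at BPC 1.22
```
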
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | | Unverified
2 | OPT 125M | Test perplexity | 32.26 | | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | | Unverified
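
Several entries in the last table (OPT, GPT-Neo) are publicly released checkpoints, so their claimed perplexities can in principle be re-checked. Below is a minimal sketch of scoring a text with a Hugging Face causal language model; the single short input and the lack of a sliding evaluation window are simplifying assumptions, and reproducing a leaderboard number also depends on the exact test split, tokenization, and context-window handling.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# One of the checkpoints listed above; any causal LM on the Hub works.
model_name = "EleutherAI/gpt-neo-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "A language model is a model of natural language."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels=input_ids, .loss is the mean next-token
    # cross-entropy (in nats) over the sequence.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"perplexity: {math.exp(loss.item()):.2f}")
```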