SOTAVerified

Language Modelling

A language model is a probabilistic model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had in turn superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
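The excerpt mentions word n-gram language models as the purely statistical predecessors of neural LMs. As a quick illustration of the idea, here is a minimal bigram (2-gram) model with add-one smoothing; the toy corpus and function names are made up for this sketch and do not come from any paper listed below:

```python
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over a whitespace-tokenized corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """P(word | prev) with add-one (Laplace) smoothing."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

corpus = ["the cat sat", "the dog sat", "the cat ran"]
uni, bi = train_bigram(corpus)
print(bigram_prob(uni, bi, "the", "cat"))  # 0.3: "the cat" seen twice
print(bigram_prob(uni, bi, "the", "dog"))  # 0.2: seen once
```

Real n-gram systems use backoff or Kneser-Ney smoothing rather than add-one, but the estimate-counts-then-normalize structure is the same.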

Papers

Showing 15851–15900 of 17610 papers

Title | Status | Hype
Teaching LLMs to Abstain across Languages via Multilingual Feedback | Code | 0
Teaching Smaller Language Models To Generalise To Unseen Compositional Questions | Code | 0
Teaching Specific Scientific Knowledge into Large Language Models through Additional Training | Code | 0
Knowledge Graph informed Fake News Classification via Heterogeneous Representation Ensembles | Code | 0
Knowledge Graph Completion using Structural and Textual Embeddings | Code | 0
OmniNet: Omnidirectional Representations from Transformers | Code | 0
oLMpics -- On what Language Model Pre-training Captures | Code | 0
Leveraging Protein Language Model Embeddings for Catalytic Turnover Prediction of Adenylate Kinase Orthologs in a Low-Data Regime | Code | 0
Connecting degree and polarity: An artificial language learning study | Code | 0
OffensiveLang: A Community Based Implicit Offensive Language Dataset | Code | 0
Objectively Evaluating the Reliability of Cell Type Annotation Using LLM-Based Strategies | Code | 0
NVP-HRI: Zero Shot Natural Voice and Posture-based Human-Robot Interaction via Large Language Model | Code | 0
TiEBe: Tracking Language Model Recall of Notable Worldwide Events Through Time | Code | 0
keepitsimple at SemEval-2025 Task 3: LLM-Uncertainty based Approach for Multilingual Hallucination Span Detection | Code | 0
Layout Generation Agents with Large Language Models | Code | 0
Nutri-bullets: Summarizing Health Studies by Composing Segments | Code | 0
Team Ohio State at CMCL 2021 Shared Task: Fine-Tuned RoBERTa for Eye-Tracking Data Prediction | Code | 0
Team Papelo: Transformer Networks at FEVER | Code | 0
Leveraging pre-trained language models for code generation | Code | 0
MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training | Code | 0
SLIDE: Integrating Speech Language Model with LLM for Spontaneous Spoken Dialogue Generation | Code | 0
Nutribullets Hybrid: Multi-document Health Summarization | Code | 0
Can Out-of-Domain data help to Learn Domain-Specific Prompts for Multimodal Misinformation Detection? | Code | 0
Making the Most of Text Semantics to Improve Biomedical Vision--Language Processing | Code | 0
Towards Robust Named Entity Recognition for Historic German | Code | 0
TigerLLM -- A Family of Bangla Large Language Models | Code | 0
Tight Clusters Make Specialized Experts | Code | 0
ÚFAL CorPipe at CRAC 2023: Larger Context Improves Multilingual Coreference Resolution | Code | 0
Transformer based neural networks for emotion recognition in conversations | Code | 0
Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers | Code | 0
SLM-Mod: Small Language Models Surpass LLMs at Content Moderation | Code | 0
NukeBERT: A Pre-trained language model for Low Resource Nuclear Domain | Code | 0
Techniques to Improve Neural Math Word Problem Solvers | Code | 0
Slot Induction via Pre-trained Language Model Probing and Multi-level Contrastive Learning | Code | 0
Nugget: Neural Agglomerative Embeddings of Text | Code | 0
Leveraging Open Information Extraction for More Robust Domain Transfer of Event Trigger Detection | Code | 0
No Wrong Turns: The Simple Geometry Of Neural Networks Optimization Paths | Code | 0
TeDA: Boosting Vision-Language Models for Zero-Shot 3D Object Retrieval via Testing-time Distribution Alignment | Code | 0
Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions | Code | 0
UFIN: Universal Feature Interaction Network for Multi-Domain Click-Through Rate Prediction | Code | 0
Not all parameters are born equal: Attention is mostly what you need | Code | 0
Language Modeling with Syntactic and Semantic Representation for Sentence Acceptability Predictions | Code | 0
Balancing Rigor and Utility: Mitigating Cognitive Biases in Large Language Models for Multiple-Choice Questions | Code | 0
NormSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly | Code | 0
Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model | Code | 0
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers | Code | 0
Knowledge-enhanced Prompt Tuning for Dialogue-based Relation Extraction with Trigger and Label Semantic | Code | 0
TEII: Think, Explain, Interact and Iterate with Large Language Models to Solve Cross-lingual Emotion Detection | Code | 0
Leveraging Multimodal LLM for Inspirational User Interface Search | Code | 0
NoPPA: Non-Parametric Pairwise Attention Random Walk Model for Sentence Representation | Code | 0
Page 318 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | – | Unverified
2 | GRU | Validation perplexity | 53.78 | – | Unverified
3 | LSTM | Validation perplexity | 52.73 | – | Unverified
4 | LSTM | Test perplexity | 48.7 | – | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | – | Unverified
6 | TCN | Test perplexity | 45.19 | – | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | – | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | – | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | – | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | – | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | – | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | – | Unverified
4 | R-Transformer | Test perplexity | 84.38 | – | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | – | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | – | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | – | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | – | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | – | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bit per Character (BPC) | 1.67 | – | Unverified
2 | Hypernetworks | Bit per Character (BPC) | 1.34 | – | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bit per Character (BPC) | 1.33 | – | Unverified
4 | LN HM-LSTM | Bit per Character (BPC) | 1.32 | – | Unverified
5 | ByteNet | Bit per Character (BPC) | 1.31 | – | Unverified
6 | Recurrent Highway Networks | Bit per Character (BPC) | 1.27 | – | Unverified
7 | Large FS-LSTM-4 | Bit per Character (BPC) | 1.25 | – | Unverified
8 | Large mLSTM | Bit per Character (BPC) | 1.24 | – | Unverified
9 | AWD-LSTM (3 layers) | Bit per Character (BPC) | 1.23 | – | Unverified
10 | Cluster-Former (#C=512) | Bit per Character (BPC) | 1.22 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | – | Unverified
2 | OPT 125M | Test perplexity | 32.26 | – | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | – | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | – | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | – | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | – | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | – | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | – | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | – | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | – | Unverified
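Both metrics in the tables above are monotone transforms of a model's average cross-entropy loss, so lower is better for both. A minimal sketch of the standard conversions (the function names and the example loss value are illustrative, not taken from any listed result):

```python
import math

def perplexity(avg_nll_nats: float) -> float:
    """Perplexity = exp(average negative log-likelihood per token, in nats)."""
    return math.exp(avg_nll_nats)

def bits_per_character(avg_nll_nats: float) -> float:
    """BPC = average negative log-likelihood per character, converted to bits."""
    return avg_nll_nats / math.log(2)

# Illustrative: a character-level model with an average loss of 0.85 nats/char
loss = 0.85
print(f"BPC: {bits_per_character(loss):.2f}")  # 1.23
print(f"PPL: {perplexity(loss):.2f}")          # 2.34 (per character)
```

Note that word-level perplexity and BPC are not directly comparable: the former exponentiates a per-word loss while the latter rescales a per-character loss. For instance, the best BPC above, 1.22, corresponds to a per-character perplexity of 2^1.22 ≈ 2.33.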