SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
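The word n-gram models mentioned above can be sketched in a few lines: estimate conditional probabilities from counts, then score held-out text. The corpus and sentence below are toy examples chosen for illustration (not drawn from this page), and add-one smoothing is just one simple choice.

```python
import math
from collections import Counter

# A minimal word-bigram language model with add-one (Laplace) smoothing.
# The corpus is an illustrative toy example.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab_size = len(unigrams)

def bigram_prob(prev, word):
    """P(word | prev) with add-one smoothing over the observed vocabulary."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Score a held-out sentence as a product of conditional probabilities,
# then convert average negative log-likelihood (bits/token) to perplexity.
sentence = "the cat sat on the rug .".split()
log_prob = sum(math.log2(bigram_prob(p, w))
               for p, w in zip(sentence, sentence[1:]))
cross_entropy = -log_prob / (len(sentence) - 1)  # bits per token
perplexity = 2 ** cross_entropy
```

Lower perplexity means the model assigns higher probability to the held-out text; the same metric appears in the benchmark tables below, where it is computed by neural models rather than counts.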

Papers

17,610 papers

| Title | Status | Hype |
|---|---|---|
| Randomized Geometric Algebra Methods for Convex Neural Networks | Code | 0 |
| Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion | | 0 |
| Seed-TTS: A Family of High-Quality Versatile Speech Generation Models | Code | 7 |
| CR-UTP: Certified Robustness against Universal Text Perturbations on Large Language Models | Code | 0 |
| Conditional Language Learning with Context | Code | 0 |
| Assessing the Performance of Chinese Open Source Large Language Models in Information Extraction Tasks | | 0 |
| Towards Effective Time-Aware Language Representation: Exploring Enhanced Temporal Understanding in Language Models | | 0 |
| Edit Distance Robust Watermarks via Indexing Pseudorandom Codes | | 0 |
| Block Transformer: Global-to-Local Language Modeling for Fast Inference | Code | 2 |
| Diver: Large Language Model Decoding with Span-Level Mutual Information Verification | | 0 |
| Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities | Code | 0 |
| Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis | | 0 |
| RKLD: Reverse KL-Divergence-based Knowledge Distillation for Unlearning Personal Information in Large Language Models | | 0 |
| Radar Spectra-Language Model for Automotive Scene Parsing | | 0 |
| Meta-Designing Quantum Experiments with Language Models | | 0 |
| Scalable MatMul-free Language Modeling | Code | 7 |
| Why Would You Suggest That? Human Trust in Language Model Responses | | 0 |
| LlamaCare: A Large Medical Language Model for Enhancing Healthcare Knowledge Sharing | Code | 1 |
| Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations | Code | 1 |
| HoneyGPT: Breaking the Trilemma in Terminal Honeypots with Large Language Model | | 0 |
| Large Language Model-Enabled Multi-Agent Manufacturing Systems | | 0 |
| LongSSM: On the Length Extension of State-space Models in Language Modelling | | 0 |
| MaskSR: Masked Language Model for Full-band Speech Restoration | | 0 |
| Zyda: A 1.3T Dataset for Open Language Modeling | Code | 1 |
| DrEureka: Language Model Guided Sim-To-Real Transfer | | 0 |
| TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability | Code | 0 |
| An Independence-promoting Loss for Music Generation with Language Models | | 0 |
| HPE-CogVLM: Advancing Vision Language Models with a Head Pose Grounding Task | | 0 |
| GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model | Code | 2 |
| Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models | Code | 4 |
| VerilogReader: LLM-Aided Hardware Test Generation | Code | 1 |
| SwitchLoRA: Switched Low-Rank Adaptation Can Learn Full-Rank Information | | 0 |
| SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM | Code | 2 |
| Graph Neural Network Enhanced Retrieval for Question Answering of LLMs | | 0 |
| Large Language Model Assisted Optimal Bidding of BESS in FCAS Market: An AI-agent based Approach | | 0 |
| Towards Harnessing Large Language Models for Comprehension of Conversational Grounding | Code | 0 |
| Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration | Code | 0 |
| OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models | | 0 |
| Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer | Code | 2 |
| Synergizing Unsupervised and Supervised Learning: A Hybrid Approach for Accurate Natural Language Task Modeling | | 0 |
| Unsupervised Distractor Generation via Large Language Model Distilling and Counterfactual Contrastive Decoding | | 0 |
| Scalable Ensembling For Mitigating Reward Overoptimisation | | 0 |
| SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models | | 0 |
| The Geometry of Categorical and Hierarchical Concepts in Large Language Models | Code | 2 |
| LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning | | 0 |
| TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy | Code | 2 |
| Asynchronous Multi-Server Federated Learning for Geo-Distributed Clients | | 0 |
| Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost | | 0 |
| L-MAGIC: Language Model Assisted Generation of Images with Coherence | Code | 0 |
| Superhuman performance in urology board questions by an explainable large language model enabled for context integration of the European Association of Urology guidelines: the UroBot study | | 0 |
Page 104 of 353

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Decay RNN | Validation perplexity | 76.67 | | Unverified |
| 2 | GRU | Validation perplexity | 53.78 | | Unverified |
| 3 | LSTM | Validation perplexity | 52.73 | | Unverified |
| 4 | LSTM | Test perplexity | 48.7 | | Unverified |
| 5 | Temporal CNN | Test perplexity | 45.2 | | Unverified |
| 6 | TCN | Test perplexity | 45.19 | | Unverified |
| 7 | GCNN-8 | Test perplexity | 44.9 | | Unverified |
| 8 | Neural cache model (size = 100) | Test perplexity | 44.8 | | Unverified |
| 9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | | Unverified |
| 10 | GPT-2 Small | Test perplexity | 37.5 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TCN | Test perplexity | 108.47 | | Unverified |
| 2 | Seq-U-Net | Test perplexity | 107.95 | | Unverified |
| 3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | | Unverified |
| 4 | R-Transformer | Test perplexity | 84.38 | | Unverified |
| 5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | | Unverified |
| 6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | | Unverified |
| 7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | | Unverified |
| 8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | | Unverified |
| 9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | | Unverified |
| 10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | LSTM (7 layers) | Bits per Character (BPC) | 1.67 | | Unverified |
| 2 | Hypernetworks | Bits per Character (BPC) | 1.34 | | Unverified |
| 3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per Character (BPC) | 1.33 | | Unverified |
| 4 | LN HM-LSTM | Bits per Character (BPC) | 1.32 | | Unverified |
| 5 | ByteNet | Bits per Character (BPC) | 1.31 | | Unverified |
| 6 | Recurrent Highway Networks | Bits per Character (BPC) | 1.27 | | Unverified |
| 7 | Large FS-LSTM-4 | Bits per Character (BPC) | 1.25 | | Unverified |
| 8 | Large mLSTM | Bits per Character (BPC) | 1.24 | | Unverified |
| 9 | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 | | Unverified |
| 10 | Cluster-Former (#C=512) | Bits per Character (BPC) | 1.22 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | | Unverified |
| 2 | OPT 125M | Test perplexity | 32.26 | | Unverified |
| 3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | | Unverified |
| 4 | OPT 1.3B | Test perplexity | 19.55 | | Unverified |
| 5 | GPT-Neo 125M | Test perplexity | 17.83 | | Unverified |
| 6 | OPT 2.7B | Test perplexity | 17.81 | | Unverified |
| 7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | | Unverified |
| 8 | GPT-Neo 1.3B | Test perplexity | 11.46 | | Unverified |
| 9 | Transformer 125M | Test perplexity | 10.7 | | Unverified |
| 10 | GPT-Neo 2.7B | Test perplexity | 10.44 | | Unverified |
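The two metrics in the tables above, perplexity and bits per character, are both views of the same underlying cross-entropy: perplexity exponentiates the average negative log-likelihood per token, and BPC is that average expressed in bits per character. The conversion helpers below are a generic sketch of that relationship, not tied to any specific model or dataset in the tables.

```python
import math

def perplexity_from_nats(avg_nll_nats: float) -> float:
    """Perplexity from average negative log-likelihood in nats per token."""
    return math.exp(avg_nll_nats)

def bpc_from_nats(avg_nll_nats: float) -> float:
    """Bits per character from average NLL in nats per character."""
    return avg_nll_nats / math.log(2)

# A character-level model at 1.23 BPC assigns each character an average
# probability of 2 ** -1.23, i.e. a per-character perplexity of 2 ** 1.23.
bpc = 1.23
char_perplexity = 2 ** bpc
```

Note that word-level perplexity and character-level BPC are only comparable across models evaluated on the same tokenization of the same test set, which is why each table keeps a single metric.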