SOTAVerified

Language Modelling

A language model is a probabilistic model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as word n-gram language models.
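To make the contrast concrete, here is a minimal sketch of the statistical approach that preceded neural models: a bigram (word 2-gram) language model with add-alpha smoothing. The function names, the toy corpus, and the smoothing constant are illustrative choices, not taken from any particular paper on this page.

```python
from collections import Counter

def train_bigram_lm(tokens, alpha=1.0):
    """Train a bigram LM: count unigrams and bigrams over the token stream.

    alpha is the add-alpha smoothing constant, so unseen bigrams still
    receive nonzero probability.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    def prob(prev, word):
        # P(word | prev) = (count(prev, word) + alpha) / (count(prev) + alpha * |V|)
        return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)

    return prob

# Toy corpus: "cat" follows "the" twice, "mat" only once.
corpus = "the cat sat on the mat the cat ran".split()
prob = train_bigram_lm(corpus)
assert prob("the", "cat") > prob("the", "mat")
```

An n-gram model can only condition on the previous n-1 words; transformers remove that fixed-window limitation, which is one reason they superseded this approach.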

Source: Wikipedia

Papers

Showing 1001–1050 of 17,610 papers

Title | Status | Hype
TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and Generation | Code | 2
Towards 3D Molecule-Text Interpretation in Language Models | Code | 2
ChatterBox: Multi-round Multimodal Referring and Grounding | Code | 2
DsDm: Model-Aware Dataset Selection with Datamodels | Code | 2
In-Context Language Learning: Architectures and Algorithms | Code | 2
With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation | Code | 2
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning | Code | 2
Evolutionary Computation in the Era of Large Language Model: Survey and Roadmap | Code | 2
SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model | Code | 2
Spatial-Temporal Large Language Model for Traffic Prediction | Code | 2
Graph Language Models | Code | 2
Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation | Code | 2
TechGPT-2.0: A large language model project to solve the task of knowledge graph construction | Code | 2
SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems | Code | 2
Malla: Demystifying Real-world Large Language Model Integrated Malicious Services | Code | 2
ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning | Code | 2
ChangeCLIP: Remote sensing change detection with multimodal vision-language representation learning | Code | 2
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity | Code | 2
LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge | Code | 2
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining | Code | 2
Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference | Code | 2
LingoQA: Visual Question Answering for Autonomous Driving | Code | 2
Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy | Code | 2
Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach | Code | 2
LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin | Code | 2
Holodeck: Language Guided Generation of 3D Embodied AI Environments | Code | 2
VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation | Code | 2
Large Language Models on Graphs: A Comprehensive Survey | Code | 2
Customization Assistant for Text-to-image Generation | Code | 2
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | Code | 2
CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation | Code | 2
War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars | Code | 2
LLMGA: Multimodal Large Language Model based Generation Assistant | Code | 2
YUAN 2.0: A Large Language Model with Localized Filtering-based Attention | Code | 2
Algorithm Evolution Using Large Language Model | Code | 2
GeoChat: Grounded Large Vision-Language Model for Remote Sensing | Code | 2
Controlled Text Generation via Language Model Arithmetic | Code | 2
FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design | Code | 2
A Survey of Graph Meets Large Language Model: Progress and Future Directions | Code | 2
Meta Prompting for AI Systems | Code | 2
Open-Vocabulary Camouflaged Object Segmentation | Code | 2
HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs | Code | 2
Exponentially Faster Language Modelling | Code | 2
REST: Retrieval-Based Speculative Decoding | Code | 2
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | Code | 2
Tamil-Llama: A New Tamil Language Model Based on Llama 2 | Code | 2
BeLLM: Backward Dependency Enhanced Large Language Model for Sentence Embeddings | Code | 2
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving | Code | 2
Large Trajectory Models are Scalable Motion Predictors and Planners | Code | 2
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution | Code | 2
Page 21 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | | Unverified
2 | GRU | Validation perplexity | 53.78 | | Unverified
3 | LSTM | Validation perplexity | 52.73 | | Unverified
4 | LSTM | Test perplexity | 48.7 | | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | | Unverified
6 | TCN | Test perplexity | 45.19 | | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | | Unverified
4 | R-Transformer | Test perplexity | 84.38 | | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per character (BPC) | 1.67 | | Unverified
2 | Hypernetworks | Bits per character (BPC) | 1.34 | | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per character (BPC) | 1.33 | | Unverified
4 | LN HM-LSTM | Bits per character (BPC) | 1.32 | | Unverified
5 | ByteNet | Bits per character (BPC) | 1.31 | | Unverified
6 | Recurrent Highway Networks | Bits per character (BPC) | 1.27 | | Unverified
7 | Large FS-LSTM-4 | Bits per character (BPC) | 1.25 | | Unverified
8 | Large mLSTM | Bits per character (BPC) | 1.24 | | Unverified
9 | AWD-LSTM (3 layers) | Bits per character (BPC) | 1.23 | | Unverified
10 | Cluster-Former (#C=512) | Bits per character (BPC) | 1.22 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | | Unverified
2 | OPT 125M | Test perplexity | 32.26 | | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | | Unverified
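The tables above report two closely related metrics: perplexity (lower is better) and bits per character (BPC). Both are functions of the model's average negative log-likelihood on held-out text. The following sketch shows the standard relationships; the function names are illustrative, not part of this site's verification pipeline.

```python
import math

def perplexity(neg_log_likelihoods):
    """Perplexity = exp of the mean negative log-likelihood (in nats)
    over the evaluation tokens."""
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

def bpc_to_perplexity(bpc):
    """For character-level models, per-character perplexity = 2 ** BPC,
    since BPC is the mean negative log2-likelihood per character."""
    return 2 ** bpc

# Sanity check: a uniform model over a 50-symbol vocabulary assigns each
# token probability 1/50, so its perplexity is exactly 50.
uniform_nll = [math.log(50)] * 10
assert abs(perplexity(uniform_nll) - 50) < 1e-9
```

Under these definitions, a BPC of 1.22 (the best entry in the character-level table) corresponds to a per-character perplexity of about 2.33, which is why character-level numbers look much smaller than the word-level test perplexities in the other tables.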