
Lexical Normalization

Lexical normalization is the task of transforming non-standard text into a standard register.

Example:

new pix comming tomoroe
new pictures coming tomorrow
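
The example above can be reproduced with a minimal dictionary-lookup sketch. This is only an illustration of word-level replacement, not how any of the systems listed on this page work; the `LEXICON` table and the `normalize` function are hypothetical names introduced here.

```python
# Hypothetical lookup table covering only the example sentence.
# A real system would learn or curate a far larger set of mappings.
LEXICON = {
    "pix": "pictures",
    "comming": "coming",
    "tomoroe": "tomorrow",
}


def normalize(tokens, lexicon=LEXICON):
    """Replace each token with its standard form; unknown tokens pass through."""
    return [lexicon.get(tok.lower(), tok) for tok in tokens]


tokens = "new pix comming tomoroe".split()
print(" ".join(normalize(tokens)))  # prints "new pictures coming tomorrow"
```

The output has exactly one entry per input token, which reflects the word-level framing of the task: tokens are replaced in place rather than inserted, deleted, or reordered.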

Datasets usually consist of tweets, since these naturally contain a fair amount of non-standard language.

For lexical normalization, only word-level replacements are annotated. Some corpora also annotate 1-N and N-1 replacements. However, word insertion, deletion, and reordering are not part of the task.
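
One way to picture the replacement types is as pairs of source and target token spans. The sketch below is an assumption made for illustration: the `Alignment` type, the `replacement_type` function, and the sample pairs are hypothetical and do not come from any particular corpus format.

```python
from typing import List, Tuple

# Each annotation pairs a span of source tokens with its replacement tokens.
Alignment = Tuple[List[str], List[str]]


def replacement_type(pair: Alignment) -> str:
    """Classify an aligned pair by the number of tokens on each side."""
    src, tgt = pair
    if len(src) == 1 and len(tgt) == 1:
        return "1-1"
    if len(src) == 1 and len(tgt) > 1:
        return "1-N"  # one noisy token expands to several words
    if len(src) > 1 and len(tgt) == 1:
        return "N-1"  # several noisy tokens merge into one word
    # Empty sides would be insertions or deletions, which the task excludes.
    return "other"


# Illustrative pairs only, not taken from a real dataset.
annotated: List[Alignment] = [
    (["pix"], ["pictures"]),
    (["lol"], ["laughing", "out", "loud"]),
    (["some", "one"], ["someone"]),
]

for pair in annotated:
    print(pair[0], "->", pair[1], ":", replacement_type(pair))
```

A pair with an empty source or target side falls into the `"other"` branch, which matches the statement that insertion and deletion are outside the task.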

Papers

Showing 1-47 of 47 papers

| Title | Status | Hype |
| --- | --- | --- |
| ViLexNorm: A Lexical Normalization Corpus for Vietnamese Social Media Text | Code | 1 |
| ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5 | Code | 1 |
| Lexical Normalization for Code-switched Data and its Effect on POS Tagging | Code | 0 |
| A Clustering Framework for Lexical Normalization of Roman Urdu | Code | 0 |
| Modeling Input Uncertainty in Neural Network Dependency Parsing | Code | 0 |
| MoNoise: A Multi-lingual and Easy-to-use Lexical Normalization Tool | Code | 0 |
| MoNoise: Modeling Noise Using a Modular Normalization System | Code | 0 |
| MultiLexNorm: A Shared Task on Multilingual Lexical Normalization | Code | 0 |
| DaN+: Danish Nested Named Entities and Lexical Normalization | Code | 0 |
| Adapting Deep Learning for Sentiment Classification of Code-Switched Informal Short Text | Code | 0 |
| Automatic Textual Normalization for Hate Speech Detection | Code | 0 |
| User-Generated Text Corpus for Evaluating Japanese Morphological Analysis and Lexical Normalization | Code | 0 |
| A Multi-cascaded Deep Model for Bilingual SMS Classification | Code | 0 |
| Increasing Robustness for Cross-domain Dialogue Act Classification on Social Media Data | Code | 0 |
| ViSoLex: An Open-Source Repository for Vietnamese Social Media Lexical Normalization | Code | 0 |
| Adapting Sequence to Sequence models for Text Normalization in Social Media | Code | 0 |
| NCSU-SAS-Ning: Candidate Generation and Feature Engineering for Supervised Lexical Normalization | | 0 |
| NCSU_SAS_SAM: Deep Encoding and Reconstruction for Normalization of Noisy Text | | 0 |
| Noise-Robust Morphological Disambiguation for Dialectal Arabic | | 0 |
| Normalization of Indonesian-English Code-Mixed Twitter Data | | 0 |
| Norm It! Lexical Normalization for Italian and Its Downstream Effects for Dependency Parsing | | 0 |
| Sequence-to-Sequence Lexical Normalization with Multilingual Transformers | | 0 |
| Sesame Street to Mount Sinai: BERT-constrained character-level Moses models for multilingual lexical normalization | | 0 |
| Shared Tasks of the 2015 Workshop on Noisy User-generated Text: Twitter Lexical Normalization and Named Entity Recognition | | 0 |
| Synthetic Data for English Lexical Normalization: How Close Can We Get to Manually Annotated Data? | | 0 |
| The Denoised Web Treebank: Evaluating Dependency Parsing under Noisy Input Conditions | | 0 |
| Towards Shared Datasets for Normalization Research | | 0 |
| To What Extent Does Lexical Normalization Help English-as-a-Second Language Learners to Read Noisy English Texts? | | 0 |
| Tweet Normalization with Syllables | | 0 |
| Accurate Word Segmentation and POS Tagging for Japanese Microblogs: Corpus Annotation and Joint Modeling with Lexical Normalization | | 0 |
| USZEGED: Correction Type-sensitive Normalization of English Tweets Using Efficiently Indexed n-gram Statistics | | 0 |
| TweetNorm_es: an annotated corpus for Spanish microtext normalization | | 0 |
| A Character-level Ngram-based MT Approach for Lexical Normalization in Social Media | | 0 |
| A Large Corpus of Product Reviews in Portuguese: Tackling Out-Of-Vocabulary Words | | 0 |
| A Log-Linear Model for Unsupervised Text Normalization | | 0 |
| An In-depth Analysis of the Effect of Lexical Normalization on the Dependency Parsing of Social Media | | 0 |
| A Taxonomy for In-depth Evaluation of Normalization for User Generated Content | | 0 |
| A Text Editing Approach to Joint Japanese Word Segmentation, POS Tagging, and Lexical Normalization | | 0 |
| A Weakly Supervised Data Labeling Framework for Machine Lexical Normalization in Vietnamese Social Media | | 0 |
| CL-MoNoise: Cross-lingual Lexical Normalization | | 0 |
| Contrastive String Representation Learning using Synthetic Data | | 0 |
| Enhancing BERT for Lexical Normalization | | 0 |
| Handling Normalization Issues for Part-of-Speech Tagging of Online Conversational Text | | 0 |
| IHS_RD: Lexical Normalization for English Tweets | | 0 |
| Lexical Normalization for Code-switched Data and its Effect on POS-tagging | | 0 |
| Lexical Normalization of User-Generated Medical Text | | 0 |
| Multilingual Sequence Labeling Approach to solve Lexical Normalization | | 0 |

Benchmark Results

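| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MoNoise | Accuracy | 87.63 | | Unverified |
| 2 | Syllable based | Accuracy | 86.08 | | Unverified |
| 3 | TextNorm | Accuracy | 83.94 | | Unverified |
| 4 | unLOL | Accuracy | 82.06 | | Unverified |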