SOTAVerified

Lexical Normalization

Lexical normalization is the task of transforming non-standard text into a standard register.

Example:

new pix comming tomoroe
new pictures coming tomorrow

Datasets usually consist of tweets, since these naturally contain a fair amount of non-standard language.

For lexical normalization, only replacements at the word level are annotated. Some corpora include annotation for 1-N replacements (one word normalized to several words) and N-1 replacements (several words merged into one). However, word insertion, deletion, and reordering are not part of the task.
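The word-level framing above can be sketched with a simple lookup table. This is a minimal illustration, not a method from any of the listed papers: the lexicon `NORMALIZATION_LEXICON`, its entries, and the function `normalize` are hypothetical names chosen for this example. Each input token yields exactly one output string, so a 1-N replacement is stored as a multi-word string in a single slot. N-1 replacements would require merging adjacent tokens and are not handled here.

```python
# Hypothetical lookup table (illustrative only): noisy token -> standard form.
NORMALIZATION_LEXICON = {
    "pix": "pictures",
    "comming": "coming",
    "tomoroe": "tomorrow",
    "gonna": "going to",  # 1-N: one source word maps to two words
}


def normalize(tokens):
    """Return one normalized string per input token.

    The output has the same length as the input, mirroring the task
    definition: words are replaced, never inserted, deleted, or reordered.
    Tokens missing from the lexicon are kept unchanged.
    """
    return [NORMALIZATION_LEXICON.get(tok.lower(), tok) for tok in tokens]


tokens = "new pix comming tomoroe".split()
normalized = normalize(tokens)
print(" ".join(normalized))  # new pictures coming tomorrow
```

Keeping the output aligned one-to-one with the input tokens is a design choice that makes word-level annotation and evaluation straightforward.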

Papers

Showing 1-10 of 47 papers

| Title | Status | Hype |
|---|---|---|
| ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5 | Code | 1 |
| ViLexNorm: A Lexical Normalization Corpus for Vietnamese Social Media Text | Code | 1 |
| Accurate Word Segmentation and POS Tagging for Japanese Microblogs: Corpus Annotation and Joint Modeling with Lexical Normalization | | 0 |
| A Weakly Supervised Data Labeling Framework for Machine Lexical Normalization in Vietnamese Social Media | | 0 |
| A Large Corpus of Product Reviews in Portuguese: Tackling Out-Of-Vocabulary Words | | 0 |
| A Log-Linear Model for Unsupervised Text Normalization | | 0 |
| An In-depth Analysis of the Effect of Lexical Normalization on the Dependency Parsing of Social Media | | 0 |
| A Character-level Ngram-based MT Approach for Lexical Normalization in Social Media | | 0 |
| A Taxonomy for In-depth Evaluation of Normalization for User Generated Content | | 0 |
| A Text Editing Approach to Joint Japanese Word Segmentation, POS Tagging, and Lexical Normalization | | 0 |

Benchmark Results

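| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MoNoise | Accuracy | 87.63 | | Unverified |
| 2 | Syllable based | Accuracy | 86.08 | | Unverified |
| 3 | TextNorm | Accuracy | 83.94 | | Unverified |
| 4 | unLOL | Accuracy | 82.06 | | Unverified |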
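The listing does not say how the Accuracy metric is computed. One common reading, assumed here, is word-level accuracy: the fraction of tokens whose predicted normalization exactly matches the gold annotation. The function `word_accuracy` below is a hypothetical sketch under that assumption and may differ from the evaluation behind the reported scores.

```python
def word_accuracy(gold, predicted):
    """Fraction of positions where the predicted word equals the gold word.

    Assumes both sequences are aligned token by token, which holds when
    normalization only replaces words and never inserts or deletes them.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


gold = ["new", "pictures", "coming", "tomorrow"]
predicted = ["new", "pictures", "coming", "tomoroe"]  # last word left unnormalized
print(word_accuracy(gold, predicted))  # 0.75
```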