| FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models | May 23, 2023 | Language Modeling, Language Modelling | Code Available | 1 |
| Extrapolating Multilingual Understanding Models as Multilingual Generators | May 22, 2023 | Denoising, Language Modeling | Unverified | 0 |
| DUMB: A Benchmark for Smart Evaluation of Dutch Models | May 22, 2023 | XLM-R | Code Available | 1 |
| Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages | May 20, 2023 | Language Modelling, XLM-R | Code Available | 1 |
| ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain | May 20, 2023 | De-identification, Language Modeling | Code Available | 1 |
| USTC-NELSLIP at SemEval-2023 Task 2: Statistical Construction and Dual Adaptation of Gazetteer for Multilingual Complex NER | May 4, 2023 | named-entity-recognition, Named Entity Recognition | Unverified | 0 |
| DN at SemEval-2023 Task 12: Low-Resource Language Text Classification via Multilingual Pretrained Language Model Fine-tuning | May 4, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Transfer to a Low-Resource Language via Close Relatives: The Case Study on Faroese | Apr 18, 2023 | Cross-Lingual Transfer, named-entity-recognition | Unverified | 0 |
| GreekBART: The First Pretrained Greek Sequence-to-Sequence Model | Apr 3, 2023 | Natural Language Inference, Text Generation | Code Available | 0 |
| Tollywood Emotions: Annotation of Valence-Arousal in Telugu Song Lyrics | Mar 16, 2023 | Emotion Recognition, Music Emotion Recognition | Unverified | 0 |
| Evaluating the Effectiveness of Pre-trained Language Models in Predicting the Helpfulness of Online Product Reviews | Feb 19, 2023 | Feature Engineering, XLM-R | Code Available | 0 |
| Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval | Feb 3, 2023 | Relation, Representation Learning | Code Available | 0 |
| LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain | Jan 30, 2023 | XLM-R | Code Available | 1 |
| Multilingual Sentence Transformer as A Multilingual Word Aligner | Jan 28, 2023 | Sentence, Word Alignment | Code Available | 1 |
| XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models | Jan 25, 2023 | Language Modeling, Language Modelling | Code Available | 0 |
| ViHOS: Hate Speech Spans Detection for Vietnamese | Jan 24, 2023 | Hate Span Identification, Sequence-to-sequence Language Modeling | Code Available | 1 |
| Integrating Semantic Information into Sketchy Reading Module of Retro-Reader for Vietnamese Machine Reading Comprehension | Jan 1, 2023 | Machine Reading Comprehension, Reading Comprehension | Unverified | 0 |
| DAMP: Doubly Aligned Multilingual Parser for Task-Oriented Dialogue | Dec 15, 2022 | Semantic Parsing, XLM-R | Code Available | 0 |
| VTCC-NLP at NL4Opt competition subtask 1: An Ensemble Pre-trained language models for Named Entity Recognition | Dec 14, 2022 | named-entity-recognition, Named Entity Recognition | Unverified | 0 |
| Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages | Dec 11, 2022 | Natural Language Understanding, XLM-R | Code Available | 1 |
| Punctuation Restoration for Singaporean Spoken Languages: English, Malay, and Mandarin | Dec 10, 2022 | Language Modelling, Punctuation Restoration | Code Available | 0 |
| Languages You Know Influence Those You Learn: Impact of Language Characteristics on Multi-Lingual Text-to-Text Transfer | Dec 4, 2022 | Cross-Lingual Transfer, XLM-R | Unverified | 0 |
| Compressing Cross-Lingual Multi-Task Models at Qualtrics | Nov 29, 2022 | Management, Model Compression | Unverified | 0 |
| X^2-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | Nov 22, 2022 | All, Cross-Modal Retrieval | Code Available | 2 |
| L3Cube-HindBERT and DevBERT: Pre-Trained BERT Transformer models for Devanagari based Hindi and Marathi Languages | Nov 21, 2022 | named-entity-recognition, Named Entity Recognition | Unverified | 0 |
| AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities | Nov 12, 2022 | Contrastive Learning, Cross-Modal Retrieval | Code Available | 4 |
| Collateral facilitation in humans and language models | Nov 9, 2022 | XLM-R | Code Available | 0 |
| IITD at the WANLP 2022 Shared Task: Multilingual Multi-Granularity Network for Propaganda Detection | Oct 31, 2022 | Multi-Label Classification, MUlTI-LABEL-ClASSIFICATION | Code Available | 0 |
| Improving Bilingual Lexicon Induction with Cross-Encoder Reranking | Oct 30, 2022 | Bilingual Lexicon Induction, Cross Encoder Reranking | Code Available | 1 |
| Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models | Oct 24, 2022 | Knowledge Distillation, Model Compression | Unverified | 0 |
| Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task | Oct 18, 2022 | Language Modeling, Language Modelling | Code Available | 0 |
| A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning | Oct 18, 2022 | Cross-Lingual Transfer, text-classification | Code Available | 0 |
| Are Pretrained Multilingual Models Equally Fair Across Languages? | Oct 11, 2022 | Cloze Test, Fairness | Code Available | 0 |
| Disfluency Detection for Vietnamese | Oct 1, 2022 | Vietnamese Word Segmentation, XLM-R | Code Available | 0 |
| How about Time? Probing a Multilingual Language Model for Temporal Relations | Oct 1, 2022 | Language Modeling, Language Modelling | Code Available | 0 |
| Extending Word-Level Quality Estimation for Post-Editing Assistance | Sep 23, 2022 | Word Alignment, XLM-R | Unverified | 0 |
| SMTCE: A Social Media Text Classification Evaluation Benchmark and BERTology Models for Vietnamese | Sep 21, 2022 | Classification, text-classification | Unverified | 0 |
| ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification | Sep 19, 2022 | Articles, Lexical Simplification | Unverified | 0 |
| From Disfluency Detection to Intent Detection and Slot Filling | Sep 17, 2022 | Intent Detection, Language Modeling | Code Available | 0 |
| Multi-stage Distillation Framework for Cross-Lingual Semantic Similarity Matching | Sep 13, 2022 | Contrastive Learning, Knowledge Distillation | Code Available | 0 |
| Multilingual Bidirectional Unsupervised Translation Through Multilingual Finetuning and Back-Translation | Sep 6, 2022 | Decoder, NMT | Code Available | 0 |
| Neural Approaches to Multilingual Information Retrieval | Sep 3, 2022 | Document Translation, Information Retrieval | Code Available | 0 |
| Investigating Language Relationships in Multilingual Sentence Encoders Through the Lens of Linguistic Typology | Sep 1, 2022 | Sentence, XLM-R | Unverified | 0 |
| BERTifying Sinhala -- A Comprehensive Analysis of Pre-trained Language Models for Sinhala Text Classification | Aug 16, 2022 | Classification, text-classification | Unverified | 0 |
| Massively Multilingual Lexical Specialization of Multilingual Transformers | Aug 1, 2022 | Bilingual Lexicon Induction, Retrieval | Unverified | 0 |
| AsNER -- Annotated Dataset and Baseline for Assamese Named Entity recognition | Jul 7, 2022 | named-entity-recognition, Named Entity Recognition | Unverified | 0 |
| Improving Code-Switching Dependency Parsing with Semi-Supervised Auxiliary Tasks | Jul 1, 2022 | Dependency Parsing, XLM-R | Code Available | 0 |
| Hitachi at SemEval-2022 Task 2: On the Effectiveness of Span-based Classification Approaches for Multilingual Idiomaticity Detection | Jul 1, 2022 | Classification, Sentence | Unverified | 0 |
| Sliced at SemEval-2022 Task 11: Bigger, Better? Massively Multilingual LMs for Multilingual Complex NER on an Academic GPU Budget | Jul 1, 2022 | GPU, NER | Unverified | 0 |
| Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems | Jun 15, 2022 | Cross-Lingual Natural Language Inference, intent-classification | Unverified | 0 |