| Prototypical Verbalizer for Prompt-based Few-shot Tuning | Mar 18, 2022 | Contrastive Learning, Entity Typing | Code Available | 4 |
| Language Through a Prism: A Spectral Approach for Multiscale Language Representations | Nov 9, 2020 | Part-Of-Speech Tagging, Topic Classification | Code Available | 1 |
| 2kenize: Tying Subword Sequences for Chinese Script Conversion | May 7, 2020 | General Classification, Topic Classification | Code Available | 1 |
| SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation | May 16, 2024 | Bias Detection, Diversity | Code Available | 1 |
| DocSCAN: Unsupervised Text Classification via Learning from Neighbors | May 9, 2021 | Classification, Clustering | Code Available | 1 |
| Hierarchical Multi-Label Classification of Scientific Documents | Nov 5, 2022 | Classification, Hierarchical Multi-label Classification | Code Available | 1 |
| Hierarchical Transformers for Long Document Classification | Oct 23, 2019 | Classification, Document Classification | Code Available | 1 |
| MultiEURLEX -- A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer | Sep 2, 2021 | Cross-Lingual Transfer, Document Classification | Code Available | 1 |
| SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects | Sep 14, 2023 | Cross-Lingual Transfer, Language Modelling | Code Available | 1 |
| Cross-Lingual Adaptation using Structural Correspondence Learning | Aug 4, 2010 | Classification, Domain Adaptation | Code Available | 1 |
| Polyglot Prompt: Multilingual Multitask PrompTraining | Apr 29, 2022 | named-entity-recognition, Named Entity Recognition | Code Available | 1 |
| GrEmLIn: A Repository of Green Baseline Embeddings for 87 Low-Resource Languages Injected with Multilingual Graph Knowledge | Sep 26, 2024 | Natural Language Inference, Sentiment Analysis | Code Available | 1 |
| HUE: Pretrained Model and Dataset for Understanding Hanja Documents of Ancient Korea | Oct 11, 2022 | named-entity-recognition, Named Entity Recognition | Code Available | 1 |
| Explaining NLP Models via Minimal Contrastive Editing (MiCE) | Dec 27, 2020 | counterfactual, Multiple-choice | Code Available | 1 |
| L3Cube-IndicNews: News-based Short Text and Long Document Classification Datasets in Indic Languages | Jan 4, 2024 | Articles, Classification | Code Available | 1 |
| KLUE: Korean Language Understanding Evaluation | May 20, 2021 | Dependency Parsing, Dialogue State Tracking | Code Available | 1 |
| LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons | Feb 21, 2024 | Sentiment Analysis, Topic Classification | Code Available | 1 |
| Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering | Jul 6, 2021 | Active Learning, Object Recognition | Code Available | 1 |
| Newswire: A Large-Scale Structured Database of a Century of Historical News | Jun 13, 2024 | Articles, Entity Disambiguation | Code Available | 1 |
| Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function | Sep 8, 2020 | Classification, General Classification | Code Available | 1 |
| TEMPERA: Test-Time Prompting via Reinforcement Learning | Nov 21, 2022 | Few-Shot Learning, Natural Language Inference | Code Available | 1 |
| Zero-Shot Text Classification via Self-Supervised Tuning | May 19, 2023 | Classification, Self-Supervised Learning | Code Available | 1 |
| MultiEURLEX - A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer | Nov 1, 2021 | Cross-Lingual Transfer, Document Classification | Code Available | 1 |
| Adapting Pre-trained Language Models to African Languages via Multilingual Adaptive Fine-Tuning | Apr 13, 2022 | Cross-Lingual Transfer, Language Modelling | Code Available | 1 |
| Entailment as Few-Shot Learner | Apr 29, 2021 | Contrastive Learning, Data Augmentation | Code Available | 1 |
| MasakhaNEWS: News Topic Classification for African languages | Apr 19, 2023 | Classification, Few-Shot Learning | Code Available | 1 |
| Label Semantic Aware Pre-training for Few-shot Text Classification | Apr 14, 2022 | Classification, Few-Shot Text Classification | Code Available | 1 |
| In-Context Learning with Iterative Demonstration Selection | Oct 15, 2023 | Few-Shot Learning, In-Context Learning | Code Available | 1 |
| Baselines and Bigrams: Simple, Good Sentiment and Topic Classification | Jul 1, 2012 | Classification, General Classification | Unverified | 0 |
| Analysis of Policy Agendas: Lessons Learned from Automatic Topic Classification of Croatian Political Texts | Aug 1, 2016 | Decision Making, General Classification | Unverified | 0 |
| AWS CORD-19 Search: A Neural Search Engine for COVID-19 Literature | Jul 17, 2020 | Articles, Document Ranking | Unverified | 0 |
| Evaluating Pixel Language Models on Non-Standardized Languages | Dec 12, 2024 | Dependency Parsing, Intent Detection | Unverified | 0 |
| Expanding the Text Classification Toolbox with Cross-Lingual Embeddings | Mar 23, 2019 | Classification, General Classification | Unverified | 0 |
| Few-Shot Cross-Lingual Transfer for Prompting Large Language Models in Low-Resource Languages | Mar 9, 2024 | Abstractive Text Summarization, Cross-Lingual Transfer | Unverified | 0 |
| A Multilingual Bag-of-Entities Model for Zero-Shot Cross-Lingual Text Classification | Nov 16, 2021 | Classification, Entity Typing | Unverified | 0 |
| Attention-Enhancing Backdoor Attacks Against BERT-based Models | Oct 23, 2023 | Sentiment Analysis, Topic Classification | Unverified | 0 |
| A Multilingual Bag-of-Entities Model for Zero-Shot Cross-Lingual Text Classification | Oct 15, 2021 | Classification, Entity Typing | Unverified | 0 |
| A Statistical Theory of Contrastive Learning via Approximate Sufficient Statistics | Mar 21, 2025 | Contrastive Learning, Data Augmentation | Unverified | 0 |
| Embracing Error to Enable Rapid Crowdsourcing | Feb 14, 2016 | General Classification, Sentiment Analysis | Unverified | 0 |
| Estimating Confidence of Predictions of Individual Classifiers and Their Ensembles for the Genre Classification Task | Jun 1, 2022 | Genre classification, text-classification | Unverified | 0 |
| From Measurement Instruments to Data: Leveraging Theory-Driven Synthetic Training Data for Classifying Social Constructs | Oct 16, 2024 | Classification, text-classification | Unverified | 0 |
| Cross-Lingual Classification of Topics in Political Texts | Aug 1, 2017 | Classification, General Classification | Unverified | 0 |
| Assessing In-context Learning and Fine-tuning for Topic Classification of German Web Data | Jul 23, 2024 | Binary Classification, In-Context Learning | Unverified | 0 |
| Co-Training for Topic Classification of Scholarly Data | Sep 1, 2015 | Classification, General Classification | Unverified | 0 |
| A Soft Contrastive Learning-based Prompt Model for Few-shot Sentiment Analysis | Dec 16, 2023 | Classification, Contrastive Learning | Unverified | 0 |
| CTM - A Model for Large-Scale Multi-View Tweet Topic Classification | Nov 16, 2021 | Classification, Topic Classification | Unverified | 0 |
| CTM -- A Model for Large-Scale Multi-View Tweet Topic Classification | May 3, 2022 | Classification, Topic Classification | Unverified | 0 |
| CTM - A Model for Large-Scale Multi-View Tweet Topic Classification | Jul 1, 2022 | Classification, Topic Classification | Unverified | 0 |
| Data Sets: Word Embeddings Learned from Tweets and General Data | Aug 14, 2017 | Articles, Sentiment Analysis | Unverified | 0 |
| American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers | Aug 24, 2023 | Articles, Language Modeling | Unverified | 0 |