Cross-Lingual Document Classification
Cross-lingual document classification is the task of using the data and models available for a resource-rich language (e.g., English) to solve classification tasks in another, typically low-resource, language.
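As a rough illustration of the zero-shot setting described above, the sketch below trains a nearest-centroid classifier on English documents only and then applies it unchanged to German documents. Everything in it is an assumption made for illustration: the tiny hand-written `SHARED_EMBEDDINGS` table stands in for a learned multilingual embedding space, and the helper names (`embed`, `train_centroids`, `classify`) and the example documents are hypothetical. It is not the method of any model listed on this page.

```python
from collections import defaultdict

# Hypothetical toy "shared embedding space": words from both languages map to
# 2-d vectors (axis 0 ~ sports, axis 1 ~ finance). Real systems learn such a
# space from data; these numbers are invented for illustration only.
SHARED_EMBEDDINGS = {
    # English
    "match": (1.0, 0.0), "goal": (0.9, 0.1), "team": (0.8, 0.2),
    "bank": (0.0, 1.0), "market": (0.1, 0.9), "shares": (0.2, 0.8),
    # German
    "spiel": (1.0, 0.0), "tor": (0.9, 0.1), "mannschaft": (0.8, 0.2),
    "markt": (0.1, 0.9), "aktien": (0.2, 0.8),
}


def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [SHARED_EMBEDDINGS[w] for w in text.lower().split()
               if w in SHARED_EMBEDDINGS]
    if not vectors:
        return (0.0, 0.0)
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(2))


def train_centroids(labelled_docs):
    """Compute one mean document vector (centroid) per class label."""
    grouped = defaultdict(list)
    for text, label in labelled_docs:
        grouped[label].append(embed(text))
    return {
        label: tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(2))
        for label, vecs in grouped.items()
    }


def classify(text, centroids):
    """Return the label whose centroid is closest to the document vector."""
    doc = embed(text)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(doc, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Train on English only (the resource-rich source language).
english_train = [("team goal match", "sports"), ("bank market shares", "finance")]
centroids = train_centroids(english_train)

# Evaluate on German without any German training labels (zero-shot transfer).
german_test = [("mannschaft tor spiel", "sports"), ("markt aktien", "finance")]
predictions = [classify(text, centroids) for text, _ in german_test]
accuracy = sum(p == y for p, (_, y) in zip(predictions, german_test)) / len(german_test)
print(predictions, accuracy)
```

The transfer works here only because both languages share one vector space; the classifier itself never sees a German label. The final line computes accuracy as the fraction of correctly classified target-language documents.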
Datasets: MLDoc Zero-Shot English-to-French, MLDoc Zero-Shot English-to-Spanish, MLDoc Zero-Shot English-to-Chinese, MLDoc Zero-Shot English-to-German, MLDoc Zero-Shot English-to-Russian, MLDoc Zero-Shot English-to-Italian, MLDoc Zero-Shot English-to-Japanese, Reuters RCV1/RCV2 English-to-German, Reuters RCV1/RCV2 German-to-English, MLDoc Zero-Shot German-to-French
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | XLMft UDA | Accuracy | 96.05 | — | Unverified |
| 2 | MultiFiT, pseudo | Accuracy | 89.42 | — | Unverified |
| 3 | Massively Multilingual Sentence Embeddings | Accuracy | 77.95 | — | Unverified |
| 4 | BiLSTM (UN) | Accuracy | 74.52 | — | Unverified |
| 5 | BiLSTM (Europarl) | Accuracy | 72.83 | — | Unverified |
| 6 | MultiCCA + CNN | Accuracy | 72.38 | — | Unverified |