SOTAVerified

Multimodal Machine Translation

Multimodal machine translation is the task of translating text with the help of one or more additional data sources, such as images. For example, the English sentence "a bird is flying over water", paired with an image of a bird over water, is translated into German text.

(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)

Papers

Showing 1-25 of 108 papers

Title | Status | Hype
Seamless: Multilingual Expressive and Streaming Speech Translation | Code | 6
Attention Is All You Need | Code | 3
M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training | Code | 1
Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation | Code | 1
Distill the Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation | Code | 1
BERTGEN: Multi-task Generation through BERT | Code | 1
Neural Machine Translation with Phrase-Level Universal Visual Representations | Code | 1
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation | Code | 1
Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination | Code | 1
Self-Knowledge Distillation with Progressive Refinement of Targets | Code | 1
VISA: An Ambiguous Subtitles Dataset for Visual Scene-Aware Machine Translation | Code | 1
BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation | Code | 1
Multimodal Transformer for Multimodal Machine Translation | Code | 1
On Vision Features in Multimodal Machine Translation | Code | 1
VALHALLA: Visual Hallucination for Machine Translation | Code | 1
Cross-lingual Visual Pre-training for Multimodal Machine Translation | Code | 1
3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset | Code | 1
MSCTD: A Multimodal Sentiment Chat Translation Dataset | Code | 1
Dynamic Context-guided Capsule Network for Multimodal Machine Translation | Code | 1
Multimodal Lexical Translation | Code | 0
Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation | Code | 0
Multimodal Machine Translation with Embedding Prediction | Code | 0
Distilling Translations with Visual Awareness | Code | 0
Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation | Code | 0
Multi30K: Multilingual English-German Image Descriptions | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | del | Meteor (EN-FR) | 74.6 | | Unverified
2 | ERNIE-UniX2 | BLEU (EN-DE) | 49.3 | | Unverified
3 | IKD-MMT | BLEU (EN-DE) | 41.28 | | Unverified
4 | DCCN | BLEU (EN-DE) | 39.7 | | Unverified
5 | Caglayan | BLEU (EN-DE) | 39.4 | | Unverified
6 | Gumbel-Attention MMT | BLEU (EN-DE) | 39.2 | | Unverified
7 | Multimodal Transformer | BLEU (EN-DE) | 38.7 | | Unverified
8 | ImagiT | BLEU (EN-DE) | 38.4 | | Unverified
9 | del+obj | BLEU (EN-DE) | 38 | | Unverified
10 | VMMTF | BLEU (EN-DE) | 37.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ViTA | BLEU (EN-HI) | 51.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ViTA | BLEU (EN-HI) | 44.6 | | Unverified