SOTAVerified

Multimodal Machine Translation

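Multimodal machine translation (MMT) is machine translation that draws on more than one data source: the source-language text plus another modality, such as an image or video. For example, a system may translate the English sentence "a bird is flying over water", together with an image of a bird flying over water, into German text.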

(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)
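The following minimal sketch shows one way of combining the two data sources. It assumes that the source sentence and the image have already been encoded into feature vectors, one per token and one per image region. In the spirit of the doubly-attentive decoders listed below, a decoder state attends separately over the text states and the image-region features, and the two context vectors are then concatenated. The function names, the toy 3-dimensional features, and concatenation as the fusion step are illustrative assumptions. This is not the architecture of any particular paper.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Dot-product attention; for simplicity the keys also serve as values."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(keys[0])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]


def multimodal_context(query: Vector,
                       text_states: List[Vector],
                       image_regions: List[Vector]) -> Vector:
    """Attend separately over source-text states and image-region features,
    then concatenate the two context vectors (hypothetical fusion choice)."""
    text_ctx = attend(query, text_states)
    image_ctx = attend(query, image_regions)
    return text_ctx + image_ctx


if __name__ == "__main__":
    # Toy features, one vector per token of "a bird is flying over water"
    text_states = [[0.1, 0.0, 0.2], [0.9, 0.1, 0.0], [0.0, 0.3, 0.1],
                   [0.2, 0.8, 0.0], [0.1, 0.1, 0.1], [0.0, 0.2, 0.9]]
    # Toy features for two image regions (e.g. a "bird" and a "water" region)
    image_regions = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    decoder_state = [1.0, 0.0, 0.5]
    ctx = multimodal_context(decoder_state, text_states, image_regions)
    print(len(ctx))  # 6: 3 text dims + 3 image dims
```

Concatenation keeps the text and image contexts separable for the downstream decoder layers. Other fusion choices, such as summing or gating the two contexts, are also possible.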

Papers

Showing 1-50 of 108 papers

Title | Status | Hype
Seamless: Multilingual Expressive and Streaming Speech Translation | Code | 6
Attention Is All You Need | Code | 3
Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation | Code | 1
BERTGEN: Multi-task Generation through BERT | Code | 1
Self-Knowledge Distillation with Progressive Refinement of Targets | Code | 1
On Vision Features in Multimodal Machine Translation | Code | 1
M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training | Code | 1
MSCTD: A Multimodal Sentiment Chat Translation Dataset | Code | 1
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation | Code | 1
Dynamic Context-guided Capsule Network for Multimodal Machine Translation | Code | 1
BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation | Code | 1
VALHALLA: Visual Hallucination for Machine Translation | Code | 1
Distill the Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation | Code | 1
Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination | Code | 1
Cross-lingual Visual Pre-training for Multimodal Machine Translation | Code | 1
Neural Machine Translation with Phrase-Level Universal Visual Representations | Code | 1
Multimodal Transformer for Multimodal Machine Translation | Code | 1
3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset | Code | 1
VISA: An Ambiguous Subtitles Dataset for Visual Scene-Aware Machine Translation | Code | 1
Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation | Code | 0
Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation | Code | 0
Distilling Translations with Visual Awareness | Code | 0
Towards Zero-Shot Multimodal Machine Translation | Code | 0
HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language | Code | 0
Multimodal Lexical Translation | Code | 0
Video-Helpful Multimodal Machine Translation | Code | 0
UMONS Submission for WMT18 Multimodal Translation Task | Code | 0
Vision Matters When It Should: Sanity Checking Multimodal Machine Translation Models | Code | 0
Findings of the Third Shared Task on Multimodal Machine Translation | Code | 0
TopicVD: A Topic-Based Dataset of Video-Guided Multimodal Machine Translation for Documentaries | Code | 0
ViTA: Visual-Linguistic Translation by Aligning Object Tags | Code | 0
Multimodal Machine Translation with Embedding Prediction | Code | 0
A Visual Attention Grounding Neural Model for Multimodal Machine Translation | Code | 0
Multi30K: Multilingual English-German Image Descriptions | Code | 0
Latent Variable Model for Multi-modal Translation | Code | 0
Incorporating Probing Signals into Multimodal Machine Translation via Visual Question-Answering Pairs | Code | 0
Cultural and Geographical Influences on Image Translatability of Words across Languages | Code | 0
Efficient Object-Level Visual Context Modeling for Multimodal Machine Translation: Masking Irrelevant Objects Helps Grounding | - | 0
Adaptive Fusion Techniques for Multimodal Data | - | 0
CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation | - | 0
Doubly Attentive Transformer Machine Translation | - | 0
A Survey of Vision-Language Pre-training from the Lens of Multimodal Machine Translation | - | 0
Doubly-Attentive Decoder for Multi-modal Neural Machine Translation | - | 0
Does Multimodality Help Human and Machine for Translation and Image Captioning? | - | 0
Gumbel-Attention for Multi-modal Machine Translation | - | 0
Grounded Word Sense Translation | - | 0
A Shared Task on Multimodal Machine Translation and Crosslingual Image Description | - | 0
A Dataset and Reranking Method for Multimodal MT of User-Generated Image Captions | - | 0
Good for Misconceived Reasons: Revisiting Neural Multimodal Machine Translation | - | 0
Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | del | Meteor (EN-FR) | 74.6 | - | Unverified
2 | ERNIE-UniX2 | BLEU (EN-DE) | 49.3 | - | Unverified
3 | IKD-MMT | BLEU (EN-DE) | 41.28 | - | Unverified
4 | DCCN | BLEU (EN-DE) | 39.7 | - | Unverified
5 | Caglayan | BLEU (EN-DE) | 39.4 | - | Unverified
6 | Gumbel-Attention MMT | BLEU (EN-DE) | 39.2 | - | Unverified
7 | Multimodal Transformer | BLEU (EN-DE) | 38.7 | - | Unverified
8 | ImagiT | BLEU (EN-DE) | 38.4 | - | Unverified
9 | del+obj | BLEU (EN-DE) | 38 | - | Unverified
10 | VMMTF | BLEU (EN-DE) | 37.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ViTA | BLEU (EN-HI) | 51.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ViTA | BLEU (EN-HI) | 44.6 | - | Unverified
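Most entries in the benchmark tables report BLEU. The minimal corpus-level BLEU sketch below illustrates what that score measures. It assumes pre-tokenized input, one reference per sentence, uniform weights over 1- to 4-gram precisions, and no smoothing. It takes the geometric mean of clipped n-gram precisions and multiplies the result by a brevity penalty. Published scores are usually computed with standard tools such as sacreBLEU, whose tokenization and settings differ, so this sketch is not expected to reproduce the numbers above. The Meteor metric is not covered.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]],
                references: List[List[str]],
                max_n: int = 4) -> float:
    """Corpus-level BLEU on a 0-100 scale (single reference per sentence,
    uniform n-gram weights, no smoothing)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            ref_ngrams = ngram_counts(ref, n)
            # Clipping: an n-gram is credited at most as often as it occurs in the reference
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # a zero n-gram precision makes unsmoothed BLEU zero
    # Geometric mean of the n-gram precisions, computed in log space
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    hyp = "ein vogel fliegt über das wasser".split()
    ref = "ein vogel fliegt über dem wasser".split()
    print(round(corpus_bleu([hyp], [ref]), 2))  # 53.73
```

In the usage example, the n-gram precisions are 5/6, 3/5, 2/4, and 1/3, and the brevity penalty is 1. The score is therefore 100 × (1/12)^(1/4), approximately 53.73.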