SOTAVerified

Multimodal Machine Translation

Multimodal machine translation is the task of translating text with the help of one or more additional data sources. For example, a system might translate the sentence "a bird is flying over water", together with an image of a bird over water, into German.

(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)

Papers

Showing 76-100 of 108 papers

Title | Status | Hype
Supervised Visual Attention for Simultaneous Multimodal Machine Translation | - | 0
The AFRL-Ohio State WMT18 Multimodal System: Combining Visual with Traditional | - | 0
The AFRL-OSU WMT17 Multimodal Translation System: An Image Processing Approach | - | 0
The Case for Evaluating Multimodal Translation Models on Text Datasets | - | 0
The MeMAD Submission to the WMT18 Multimodal Translation Task | - | 0
TMU Japanese-English Multimodal Machine Translation System for WAT 2020 | - | 0
Understanding the Effect of Textual Adversaries in Multimodal Machine Translation | - | 0
Transformer-based Cascaded Multimodal Speech Translation | - | 0
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish | - | 0
Multilingual Multimodal Machine Translation for Dravidian Languages utilizing Phonetic Transcription | - | 0
Multimodal Machine Translation through Visuals and Speech | - | 0
Multimodal Machine Translation with Reinforcement Learning | - | 0
Multimodal Machine Translation with Visual Scene Graph Pruning | - | 0
Latent Variable Model for Multi-modal Translation | Code | 0
Video-Helpful Multimodal Machine Translation | Code | 0
Multi30K: Multilingual English-German Image Descriptions | Code | 0
A Visual Attention Grounding Neural Model for Multimodal Machine Translation | Code | 0
Multimodal Lexical Translation | Code | 0
Does Multimodality Help Human and Machine for Translation and Image Captioning? | Code | 0
Distilling Translations with Visual Awareness | Code | 0
Multimodal Machine Translation with Embedding Prediction | Code | 0
Cultural and Geographical Influences on Image Translatability of Words across Languages | Code | 0
Vision Matters When It Should: Sanity Checking Multimodal Machine Translation Models | Code | 0
Incorporating Probing Signals into Multimodal Machine Translation via Visual Question-Answering Pairs | Code | 0
TopicVD: A Topic-Based Dataset of Video-Guided Multimodal Machine Translation for Documentaries | Code | 0
Page 4 of 5

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | del | Meteor (EN-FR) | 74.6 | - | Unverified
2 | ERNIE-UniX2 | BLEU (EN-DE) | 49.3 | - | Unverified
3 | IKD-MMT | BLEU (EN-DE) | 41.28 | - | Unverified
4 | DCCN | BLEU (EN-DE) | 39.7 | - | Unverified
5 | Caglayan | BLEU (EN-DE) | 39.4 | - | Unverified
6 | Gumbel-Attention MMT | BLEU (EN-DE) | 39.2 | - | Unverified
7 | Multimodal Transformer | BLEU (EN-DE) | 38.7 | - | Unverified
8 | ImagiT | BLEU (EN-DE) | 38.4 | - | Unverified
9 | del+obj | BLEU (EN-DE) | 38 | - | Unverified
10 | VMMTF | BLEU (EN-DE) | 37.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ViTA | BLEU (EN-HI) | 51.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ViTA | BLEU (EN-HI) | 44.6 | - | Unverified