SOTAVerified

Multimodal Machine Translation

Multimodal machine translation is the task of translating text using multiple data sources, such as the source sentence together with an image. For example, the input "a bird is flying over water" plus an image of a bird over water is translated into German text.
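As a rough illustration of what "multiple data sources" means for a single input, the sketch below models one translation instance as a record holding the source sentence and an optional image. The class name `MultimodalExample`, its fields, the helper `modalities`, and the image path are all hypothetical and chosen only for this example; they do not come from any particular dataset or toolkit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MultimodalExample:
    """One translation instance: source text plus an optional image (hypothetical type)."""
    source_text: str
    source_lang: str
    target_lang: str
    image_path: Optional[str] = None  # assumed path; None means a text-only input


def modalities(example: MultimodalExample) -> List[str]:
    """List the data sources a model could condition on for this example."""
    present = ["text"]
    if example.image_path is not None:
        present.append("image")
    return present


# The example from the task description: an English caption and its image,
# to be translated into German.
example = MultimodalExample(
    source_text="a bird is flying over water",
    source_lang="en",
    target_lang="de",
    image_path="images/bird_over_water.jpg",  # hypothetical file name
)
print(modalities(example))
```

A text-only translation input is the same record with `image_path` left as `None`, which makes it easy to compare a model's behaviour with and without the visual context.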

(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)

Papers

Showing 1–50 of 108 papers

| Title | Status | Hype |
| --- | --- | --- |
| CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation | | 0 |
| Multimodal Machine Translation with Visual Scene Graph Pruning | | 0 |
| TopicVD: A Topic-Based Dataset of Video-Guided Multimodal Machine Translation for Documentaries | Code | 0 |
| Florenz: Scaling Laws for Systematic Generalization in Vision-Language Models | | 0 |
| Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation | | 0 |
| EMMeTT: Efficient Multimodal Machine Translation Training | | 0 |
| Towards Zero-Shot Multimodal Machine Translation | Code | 0 |
| 3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset | Code | 1 |
| Exploring the Necessity of Visual Modality in Multimodal Machine Translation using Authentic Datasets | | 0 |
| The Case for Evaluating Multimodal Translation Models on Text Datasets | | 0 |
| Adding Multimodal Capabilities to a Text-only Translation Model | | 0 |
| Detecting Concrete Visual Tokens for Multimodal Machine Translation | | 0 |
| Seamless: Multilingual Expressive and Streaming Speech Translation | Code | 6 |
| Video-Helpful Multimodal Machine Translation | Code | 0 |
| Incorporating Probing Signals into Multimodal Machine Translation via Visual Question-Answering Pairs | Code | 0 |
| Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation | Code | 0 |
| CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation | Code | 1 |
| A Survey of Vision-Language Pre-training from the Lens of Multimodal Machine Translation | | 0 |
| HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language | Code | 0 |
| BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation | Code | 1 |
| Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination | Code | 1 |
| Iterative Adversarial Attack on Image-guided Story Ending Generation | | 0 |
| Generalization algorithm of multimodal pre-training model based on graph-text self-supervised training | | 0 |
| Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation | Code | 1 |
| Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation | Code | 0 |
| ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation | | 0 |
| LVP-M3: Language-aware Visual Prompt for Multilingual Multimodal Machine Translation | | 0 |
| Increasing Visual Awareness in Multimodal Neural Machine Translation from an Information Theoretic Perspective | | 0 |
| Distill the Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation | Code | 1 |
| VALHALLA: Visual Hallucination for Machine Translation | Code | 1 |
| Neural Machine Translation with Phrase-Level Universal Visual Representations | Code | 1 |
| On Vision Features in Multimodal Machine Translation | Code | 1 |
| MSCTD: A Multimodal Sentiment Chat Translation Dataset | Code | 1 |
| Supervised Visual Attention for Simultaneous Multimodal Machine Translation | | 0 |
| VISA: An Ambiguous Subtitles Dataset for Visual Scene-Aware Machine Translation | Code | 1 |
| On Vision Features in Multimodal Machine Translation | | 0 |
| Vision Matters When It Should: Sanity Checking Multimodal Machine Translation Models | Code | 0 |
| Multimodal Neural Machine Translation System for English to Bengali | | 0 |
| Low Resource Multimodal Neural Machine Translation of English-Hindi in News Domain | | 0 |
| Experiences of Adapting Multimodal Machine Translation Techniques for Hindi | | 0 |
| Make the Blind Translator See The World: A Novel Transfer Learning Solution for Multimodal Machine Translation | | 0 |
| Rakuten’s Participation in WAT 2021: Examining the Effectiveness of Pre-trained Models for Multilingual and Multimodal Machine Translation | | 0 |
| BERTGEN: Multi-task Generation through BERT | Code | 1 |
| Cultural and Geographical Influences on Image Translatability of Words across Languages | Code | 0 |
| ViTA: Visual-Linguistic Translation by Aligning Object Tags | Code | 0 |
| Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation | | 0 |
| Gumbel-Attention for Multi-modal Machine Translation | | 0 |
| Cross-lingual Visual Pre-training for Multimodal Machine Translation | Code | 1 |
| Good for Misconceived Reasons: Revisiting Neural Multimodal Machine Translation | | 0 |
| Efficient Object-Level Visual Context Modeling for Multimodal Machine Translation: Masking Irrelevant Objects Helps Grounding | | 0 |

Benchmark Results

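| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | del | Meteor (EN-FR) | 74.6 | | Unverified |
| 2 | ERNIE-UniX2 | BLEU (EN-DE) | 49.3 | | Unverified |
| 3 | IKD-MMT | BLEU (EN-DE) | 41.28 | | Unverified |
| 4 | DCCN | BLEU (EN-DE) | 39.7 | | Unverified |
| 5 | Caglayan | BLEU (EN-DE) | 39.4 | | Unverified |
| 6 | Gumbel-Attention MMT | BLEU (EN-DE) | 39.2 | | Unverified |
| 7 | Multimodal Transformer | BLEU (EN-DE) | 38.7 | | Unverified |
| 8 | ImagiT | BLEU (EN-DE) | 38.4 | | Unverified |
| 9 | del+obj | BLEU (EN-DE) | 38 | | Unverified |
| 10 | VMMTF | BLEU (EN-DE) | 37.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ViTA | BLEU (EN-HI) | 51.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ViTA | BLEU (EN-HI) | 44.6 | | Unverified |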