Attention-Based Multimodal Image Matching

2021-03-20

Aviad Moreshet, Yosi Keller

Code

github.com/CodeJjang/multiscale-attention-patch-matching
Official · In paper · PyTorch · ★ 13

Abstract

We propose an attention-based approach for multimodal image patch matching using a Transformer encoder attending to the feature maps of a multiscale Siamese CNN. Our encoder is shown to efficiently aggregate multiscale image embeddings while emphasizing task-specific appearance-invariant image cues. We also introduce an attention-residual architecture, using a residual connection bypassing the encoder. This additional learning signal facilitates end-to-end training from scratch. Our approach is experimentally shown to achieve new state-of-the-art accuracy on both multimodal and single modality benchmarks, illustrating its general applicability. To the best of our knowledge, this is the first successful application of the Transformer encoder architecture to the multimodal image patch matching task.
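The abstract describes the architecture only at a high level, so the following is a minimal pure-Python sketch of the attention-residual idea, not the authors' PyTorch implementation from the repository above. It assumes (hypothetically) that each scale of the Siamese CNN contributes one feature vector as a token, uses a single-head self-attention layer as a stand-in for the full Transformer encoder, mean-pools the encoder outputs, and adds a bypass branch (the mean of the input tokens) that skips the encoder. Both patches go through the same weights, which is what makes the network Siamese. All helper names (embed, self_attention, random_matrix, and so on) are illustrative only.

```python
import math
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: list[Vector], wq: Matrix, wk: Matrix, wv: Matrix) -> list[Vector]:
    """Single-head scaled dot-product self-attention (stand-in for the Transformer encoder)."""
    q = [matvec(wq, t) for t in tokens]
    k = [matvec(wk, t) for t in tokens]
    v = [matvec(wv, t) for t in tokens]
    scale = math.sqrt(len(k[0]))
    out = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        # Each output token is an attention-weighted mix of all value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def mean(vectors: list[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def l2_normalize(x: Vector) -> Vector:
    norm = math.sqrt(sum(c * c for c in x)) or 1.0  # avoid dividing by zero
    return [c / norm for c in x]


def embed(scale_tokens: list[Vector], params: tuple[Matrix, Matrix, Matrix]) -> Vector:
    """Hypothetical attention-residual descriptor: encoder branch plus bypass branch."""
    encoder_branch = mean(self_attention(scale_tokens, *params))
    residual_branch = mean(scale_tokens)  # this branch skips the encoder entirely
    return l2_normalize([a + b for a, b in zip(encoder_branch, residual_branch)])


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]


# Usage: one shared set of weights for both patches (the Siamese part).
rng = random.Random(0)
dim, num_scales = 8, 3
# W_v must be square (dim x dim) so the encoder output can be added to the bypass branch.
params = tuple(random_matrix(dim, dim, rng) for _ in range(3))  # (W_q, W_k, W_v)
# Hypothetical per-scale feature vectors for two patches (e.g. pooled CNN maps, one per scale).
patch_a = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_scales)]
patch_b = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_scales)]
desc_a = embed(patch_a, params)
desc_b = embed(patch_b, params)
distance = math.dist(desc_a, desc_b)  # smaller distance = more likely a match
print(f"descriptor distance: {distance:.4f}")
```

Because the attention layer is followed by mean pooling, this sketch's descriptor does not depend on the order of the scale tokens; a fuller implementation might add scale or positional embeddings to break that symmetry.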

Tasks

Multimodal Patch Matching, Patch Matching

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Brown Dataset	Multiscale Transformer Encoder	FPR95 (%)	0.9	-	Unverified
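FPR95 is the false positive rate among non-matching pairs at the descriptor-distance threshold where 95% of the matching pairs are accepted; lower is better. Below is a minimal sketch of how it is commonly computed from pair distances. The function name and the convention that a pair counts as a predicted match when its distance is at most the threshold are assumptions of this sketch, and the toy distances are made up for illustration.

```python
import math


def fpr_at_recall(pos_dists: list[float], neg_dists: list[float], recall: float = 0.95) -> float:
    """False positive rate at the smallest threshold that accepts `recall` of matching pairs.

    A pair is treated as a predicted match when its descriptor distance is <= the threshold.
    """
    if not pos_dists or not neg_dists:
        raise ValueError("need at least one matching and one non-matching pair")
    ranked = sorted(pos_dists)
    # Number of positives that must be accepted; round() guards against float noise in recall * n.
    k = max(1, math.ceil(round(recall * len(ranked), 9)))
    threshold = ranked[k - 1]
    false_positives = sum(1 for d in neg_dists if d <= threshold)
    return false_positives / len(neg_dists)


# Toy distances: 20 matching pairs and 5 non-matching pairs.
matching = [float(i) for i in range(1, 21)]
non_matching = [5.0, 10.0, 19.0, 20.0, 30.0]
print(f"FPR95 = {100 * fpr_at_recall(matching, non_matching):.1f}%")  # → FPR95 = 60.0%
```

Tie handling and the exact threshold convention can differ slightly between evaluation scripts, so this sketch may not reproduce published FPR95 values to the last digit.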
