Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis

2021-11-29 · CVPR 2022 · Code Available

Yucheng Tang, Dong Yang, Wenqi Li, Holger Roth, Bennett Landman, Daguang Xu, Vishwesh Nath, Ali Hatamizadeh

Code

github.com/Project-MONAI/research-contributions/tree/master/SwinUNETR
Official · pytorch · ★ 0
github.com/jusiro/fewshot-finetuning
pytorch · ★ 43

Abstract

Vision Transformers (ViTs) have shown great performance in self-supervised learning of global and local representations that can be transferred to downstream applications. Inspired by these results, we introduce a novel self-supervised learning framework with tailored proxy tasks for medical image analysis. Specifically, we propose: (i) a new 3D transformer-based model, dubbed Swin UNEt TRansformers (Swin UNETR), with a hierarchical encoder for self-supervised pre-training; (ii) tailored proxy tasks for learning the underlying pattern of human anatomy. We demonstrate successful pre-training of the proposed model on 5,050 publicly available computed tomography (CT) images from various body organs. The effectiveness of our approach is validated by fine-tuning the pre-trained models on the Beyond the Cranial Vault (BTCV) Segmentation Challenge with 13 abdominal organs and segmentation tasks from the Medical Segmentation Decathlon (MSD) dataset. Our model is currently the state-of-the-art (i.e., ranked 1st) on the public test leaderboards of both MSD and BTCV datasets. Code: https://monai.io/research/swin-unetr

Tasks

Anatomy, Computed Tomography (CT), Medical Image Analysis, Medical Image Segmentation, Segmentation, Self-Supervised Learning

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Medical Segmentation Decathlon	Swin UNETR	Dice (Average)	78.68	—	Unverified
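
The table reports a "Dice (Average)" score but this page does not say how the average is taken. As a rough illustration only, the sketch below computes the standard Dice overlap, 2|P ∩ T| / (|P| + |T|), for each label and averages over labels on a 0 to 100 scale. The function names `dice_score` and `mean_dice`, the flattened-list input, and the handling of labels absent from both masks are assumptions made for this example, not the leaderboard's or the authors' evaluation code.

```python
from typing import Iterable, Sequence


def dice_score(pred: Sequence[int], truth: Sequence[int], label: int) -> float:
    """Dice overlap for one label: 2|P ∩ T| / (|P| + |T|), in [0, 1]."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must contain the same number of voxels")
    pred_hits = sum(1 for p in pred if p == label)
    truth_hits = sum(1 for t in truth if t == label)
    overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    denom = pred_hits + truth_hits
    # Assumed convention: a label missing from both masks counts as a perfect match.
    return 1.0 if denom == 0 else 2.0 * overlap / denom


def mean_dice(pred: Sequence[int], truth: Sequence[int], labels: Iterable[int]) -> float:
    """Average per-label Dice, reported as a percentage (assumed averaging scheme)."""
    scores = [dice_score(pred, truth, label) for label in labels]
    if not scores:
        raise ValueError("labels must not be empty")
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy flattened label maps: 0 = background, 1 and 2 = two hypothetical organs.
    predicted = [0, 1, 1, 2, 2, 0]
    reference = [0, 1, 2, 2, 2, 0]
    print(round(mean_dice(predicted, reference, labels=[1, 2]), 2))
```

In practice a 3D label volume would be flattened before being passed in, and background (label 0) is typically left out of the average, as in the toy call above.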
