Abstract

Though U-Net has achieved tremendous success in medical image segmentation tasks, it lacks the ability to explicitly model long-range dependencies. Vision Transformers have therefore recently emerged as alternative segmentation structures, owing to their innate ability to capture long-range correlations through Self-Attention (SA). However, Transformers usually rely on large-scale pre-training and have high computational complexity. Furthermore, SA can only model self-affinities within a single sample, ignoring the potential correlations across the overall dataset. To address these problems, we propose a novel Transformer module named Mixed Transformer Module (MTM) for simultaneous inter- and intra-affinity learning. MTM first calculates self-affinities efficiently through our well-designed Local-Global Gaussian-Weighted Self-Attention (LGG-SA). It then mines inter-sample connections through External Attention (EA). Using MTM, we construct a U-shaped model named Mixed Transformer U-Net (MT-UNet) for accurate medical image segmentation. We evaluate our method on two public datasets, and the experimental results show that the proposed method outperforms other state-of-the-art methods. The code is available at: https://github.com/Dootmaan/MT-UNet.
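The abstract does not include code, so as a minimal sketch of the External Attention (EA) building block it references, the snippet below follows the standard EA formulation: tokens attend against small learnable external memories (keys and values) shared across all samples, with double normalization (softmax over tokens, then L1 over memory slots), which is what lets EA capture inter-sample correlations. The class name, the memory size `s=64`, and the PyTorch framing are illustrative assumptions, not the authors' implementation; the actual MTM/LGG-SA code is in the linked repository.

```python
import torch
import torch.nn as nn


class ExternalAttention(nn.Module):
    """Sketch of External Attention (EA): attention against a small,
    learnable external memory shared across the whole dataset, rather
    than against the sample's own keys/values as in Self-Attention.
    Memory size `s` is a hyperparameter; 64 here is illustrative."""

    def __init__(self, d_model: int, s: int = 64):
        super().__init__()
        self.mk = nn.Linear(d_model, s, bias=False)  # external key memory M_k
        self.mv = nn.Linear(s, d_model, bias=False)  # external value memory M_v

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_tokens, d_model)
        attn = self.mk(x)                  # (B, N, S) affinities with memory slots
        attn = torch.softmax(attn, dim=1)  # normalize over tokens...
        attn = attn / (attn.sum(dim=2, keepdim=True) + 1e-9)  # ...then L1 over slots
        return self.mv(attn)               # (B, N, d_model)


if __name__ == "__main__":
    ea = ExternalAttention(d_model=256, s=64)
    tokens = torch.randn(2, 196, 256)  # e.g. a 14x14 feature map, flattened
    print(ea(tokens).shape)            # torch.Size([2, 196, 256])
```

Because the memory matrices are learned parameters independent of any one input, they act as a compact summary of the training set, which is the "inter-affinity" half of MTM; the "intra-affinity" half (LGG-SA) is a sample-wise attention whose exact design is given in the paper itself.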
Year | DOI | Venue
---|---|---
2022 | 10.1109/ICASSP43922.2022.9746172 | IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

DocType | Citations | PageRank
---|---|---
Conference | 0 | 0.34

References | Authors
---|---
0 | 7
Name | Order | Citations | PageRank |
---|---|---|---|
Hongyi Wang | 1 | 0 | 0.68 |
Shiao Xie | 2 | 0 | 0.68 |
Lanfen Lin | 3 | 4 | 8.67 |
Yutaro Iwamoto | 4 | 13 | 17.95 |
Xian-Hua Han | 5 | 14 | 10.19 |
Yen-Wei Chen | 6 | 720 | 155.73 |
Ruofeng Tong | 7 | 466 | 49.69 |