- Title
- A Novel Vision Transformer Model for Skin Cancer Classification
- Creator
- Yang, Guang; Luo, Suhuai; Greer, Peter
- Relation
- Neural Processing Letters, Vol. 55, 27 March 2023, pp. 9335-9351
- Publisher Link
- http://dx.doi.org/10.1007/s11063-023-11204-5
- Publisher
- Springer
- Resource Type
- journal article
- Date
- 2023
- Description
- Skin cancer can be fatal if it is malignant. Modern diagnosis of skin cancer relies heavily on visual inspection through clinical screening, dermoscopy, or histopathological examination. However, because different cancer types look similar, it is usually challenging to identify the type of skin cancer, especially at its early stages. Deep learning techniques developed over the last few years have helped improve the accuracy of diagnosis and classification, but the latest deep learning algorithms still do not provide ideal classification accuracy. To further improve classification accuracy, this paper presents a novel method for classifying skin cancer in clinical skin images. The method consists of four blocks. First, class rebalancing is applied to the images of seven skin cancer types to improve classification performance. Second, each image is preprocessed by splitting it into patches of the same size, which are then flattened into a sequence of tokens. Third, a transformer encoder processes the flattened patches. The encoder consists of N identical layers, each containing two sublayers: sublayer one is a multihead self-attention unit, and sublayer two is a fully connected feed-forward network unit. For each sublayer, normalization is applied to its input, and a residual connection adds that input to the sublayer's output. Finally, a classification block follows the transformer encoder; it consists of a flatten layer and a dense layer with batch normalization. The whole network is built with transfer learning: it is pretrained on the ImageNet dataset and fine-tuned on the HAM10000 dataset.
Experiments show that the method achieves a classification accuracy of 94.1%, outperforming the current state-of-the-art model, IRv2 with soft attention, on the same training and testing datasets. On the Edinburgh DERMOFIT dataset, the method also performs better than the baseline models.
- Subject
- skin cancer classification; deep learning; transformer; image processing; neural networks; SDG 3; Sustainable Development Goals
- Identifier
- http://hdl.handle.net/1959.13/1494304
- Identifier
- uon:53760
- Identifier
- ISSN:1370-4621
- Rights
- This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
- Language
- eng
- Full Text
- Reviewed
File | Description | Size | Format
---|---|---|---
ATTACHMENT02 | Publisher version (open access) | 716 KB | Adobe Acrobat PDF
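The Description field does not say which class-rebalancing technique the authors used for the seven skin cancer types. As one common option, the following minimal Python sketch uses random oversampling: each class is topped up with randomly drawn copies of its own samples until it matches the largest class. The function name `oversample`, the fixed seed, and the toy labels are illustrative assumptions, not the paper's procedure.

```python
import random
from collections import Counter, defaultdict


def oversample(samples, labels, seed=0):
    """Randomly oversample every class up to the size of the largest class.

    Hypothetical illustration of class rebalancing; the paper's method may differ.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Keep every original sample, then draw extra copies with replacement.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


if __name__ == "__main__":
    # Toy, made-up data: three imbalanced classes.
    images = ["img%d" % i for i in range(10)]
    labels = ["class_a"] * 7 + ["class_b"] * 2 + ["class_c"]
    _, balanced = oversample(images, labels)
    print(Counter(balanced))  # every class now has 7 samples
```

Oversampling only duplicates existing images; in practice it is often combined with data augmentation so that the repeated minority samples are not pixel-identical.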
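The second block described in the Description field, splitting an image into equal-sized patches and flattening each patch into a token, is plain array reshaping. The minimal stdlib-only sketch below assumes an image stored as nested lists of shape H x W x C whose height and width are divisible by the patch size P. The standard ViT design also applies a learned linear projection and adds position embeddings to the tokens; those steps are omitted here. The function name `image_to_patch_tokens` is hypothetical.

```python
def image_to_patch_tokens(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened P x P patch tokens.

    Patches are read in row-major order; each token lists the pixels of one
    patch row by row, keeping all C channel values of a pixel together.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")

    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            token = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    token.extend(image[row][col])  # append all channels of this pixel
            tokens.append(token)
    return tokens


if __name__ == "__main__":
    # 4 x 4 single-channel toy image whose pixel values are 0..15.
    img = [[[r * 4 + c] for c in range(4)] for r in range(4)]
    tokens = image_to_patch_tokens(img, patch_size=2)
    print(len(tokens), len(tokens[0]))  # 4 tokens, each of length 2*2*1 = 4
    print(tokens[0])                    # [0, 1, 4, 5]
```

For scale: a 224 x 224 x 3 input with P = 16, a common ViT-B/16 setting, gives 14 x 14 = 196 tokens of length 16 x 16 x 3 = 768. The abstract does not state the input size or patch size the authors used.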
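The Description field specifies that each encoder layer normalizes the input of each sublayer and adds that input back to the sublayer's output, that is, x + Sublayer(Norm(x)), first for self-attention and then for the feed-forward unit. The stdlib-only Python sketch below shows one such layer under simplifying assumptions that are not from the paper: a single attention head instead of multiple heads, layer normalization without a learned scale or shift, no biases or attention output projection, and a ReLU feed-forward activation (ViT implementations commonly use GELU). The parameter names `wq`, `wk`, `wv`, `w1`, and `w2` are hypothetical.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize one token vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a list of token vectors."""
    q = [matvec(wq, t) for t in tokens]
    k = [matvec(wk, t) for t in tokens]
    v = [matvec(wv, t) for t in tokens]
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        out.append([sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(len(v[0]))])
    return out


def feed_forward(x, w1, w2):
    """Two-layer position-wise network with a ReLU in between (biases omitted)."""
    hidden = [max(0.0, h) for h in matvec(w1, x)]
    return matvec(w2, hidden)


def encoder_layer(tokens, params):
    """One pre-norm layer: x + Sublayer(Norm(x)) for attention, then for the FFN."""
    normed = [layer_norm(t) for t in tokens]
    attn = self_attention(normed, params["wq"], params["wk"], params["wv"])
    tokens = [[a + b for a, b in zip(t, o)] for t, o in zip(tokens, attn)]  # residual 1
    ffn = [feed_forward(layer_norm(t), params["w1"], params["w2"]) for t in tokens]
    return [[a + b for a, b in zip(t, o)] for t, o in zip(tokens, ffn)]  # residual 2


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


if __name__ == "__main__":
    params = {"wq": zeros(2, 2), "wk": zeros(2, 2), "wv": zeros(2, 2),
              "w1": zeros(3, 2), "w2": zeros(2, 3)}
    x = [[1.0, 3.0], [2.0, 2.0]]
    # With all-zero weights both sublayers output zeros, so only the residual path remains.
    print(encoder_layer(x, params))  # [[1.0, 3.0], [2.0, 2.0]]
```

The all-zero example makes the role of the two residual connections easy to check: the layer reduces to the identity. Stacking the N identical layers mentioned in the abstract amounts to calling `encoder_layer` N times, each with its own parameters.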