Suat TORAMAN, Bihter DAŞ
Accurately detecting genomic markers associated with cancer is a computationally challenging task. Most cancer prediction studies digitize cancer gene sequences, extract a large number of features, and then apply classification approaches to predict cancer. With recent advances in genomics, researchers have applied many approaches to DNA data to extract the hidden features and periodicities within DNA sequences. In this study, an algorithm based on the convolutional neural network (CNN) is presented for classifying breast cancer gene sequences and healthy DNA sequences without the need for manual feature extraction and feature selection. The proposed method consists of four stages. In the first stage, DNA gene sequences are digitized using two different numerical mapping techniques: electron-ion interaction pseudopotential (EIIP) and entropy-based mapping. In the second stage, the digitized DNA gene sequences are converted into spectrograms, which moves the data into two-dimensional space. In the third stage, features are extracted from the spectrogram images using ResNet, a pre-trained CNN model, yielding a 2048-dimensional feature vector for each image. Finally, the feature vectors are classified by a support vector machine (SVM). The proposed method achieves an accuracy of 98.93% with the entropy-based numerical mapping technique and 99.00% with the EIIP technique. The results show that the proposed method could be used in the classification or analysis of DNA gene sequences.

Keywords: Cancer Gene, CNN, Deep Learning, DNA Signals
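
The first two stages described in the abstract (numerical mapping and spectrogram conversion) can be illustrated with a minimal sketch. It uses the commonly cited EIIP values for the four nucleotides; the function names `eiip_encode` and `spectrogram`, the rectangular window, the window length of 64, the hop of 16, and the handling of unknown symbols are all assumptions made for illustration, since the abstract does not state these settings. The ResNet feature extraction and SVM classification stages are not shown.

```python
import cmath

# Commonly cited EIIP values for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_encode(sequence):
    """Map a DNA string to a list of EIIP values.

    Symbols other than A, C, G, T map to 0.0 (an assumption of this sketch).
    """
    return [EIIP.get(base, 0.0) for base in sequence.upper()]


def spectrogram(signal, window=64, hop=16):
    """Return a list of power spectra, one per frame.

    Each frame is `window` samples long, frames start every `hop` samples,
    and each row holds window // 2 + 1 frequency bins. A plain DFT with a
    rectangular window is used to keep the sketch dependency-free.
    """
    frames = []
    for start in range(0, len(signal) - window + 1, hop):
        frame = signal[start:start + window]
        row = []
        for k in range(window // 2 + 1):
            coeff = sum(
                x * cmath.exp(-2j * cmath.pi * k * n / window)
                for n, x in enumerate(frame)
            )
            row.append(abs(coeff) ** 2)
        frames.append(row)
    return frames


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the study's data.
    toy = "ATGCGTACGTTAGC" * 10
    spec = spectrogram(eiip_encode(toy))
    print(len(spec), "frames x", len(spec[0]), "frequency bins")
```

In practice the resulting two-dimensional array would be rendered as an image before being passed to a pre-trained CNN; a library routine for the short-time Fourier transform would normally replace the hand-written DFT here.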