Diagnosing COVID-19 with Xception + Bi-LSTM: Detecting Anomalies with Grad-CAM

To contain the spread of COVID-19, fast and accurate testing is essential. PCR tests are currently the most common way to detect COVID-19, but they typically take about 24 hours to return results. Deep learning algorithms are therefore being developed to detect COVID-19 accurately and quickly. With this aim, we propose a deep learning model that detects COVID-19 in X-ray images by combining Xception with a bidirectional LSTM (Bi-LSTM). Reshaping the output of the Xception network into a three-dimensional tensor made it compatible with the subsequent Bi-LSTM network. The resulting model achieved an accuracy of 98.5%, higher than the VGG16, DenseNet, MobileNet, MobileNetV2, ResNet50, and DNN models we compared it against. In addition, by generating a heat map with a class activation map, our model could localize the anomalous regions. However, our model did not achieve high accuracy on a lung CT scan dataset: although training and validation accuracy kept rising, test accuracy was far lower. Given these limitations, including a small sample size, accuracy that may be inflated by the binary classification setting, and poor performance on CT images, follow-up research is needed to improve the model.


Background
The coronavirus disease 2019 (COVID-19) pandemic has had unprecedented impacts on modern society as the virus has spread across the world. COVID-19 is an infectious respiratory disease whose symptoms include fever, dry cough, and tiredness [1]. According to the World Health Organization, as of April 1, 2021, there were more than 125 million confirmed cases and about 3 million deaths worldwide [2]. Adding to the concern, beginning in December 2020, new variants of the virus were detected in countries including the United Kingdom, South Africa, and Brazil. These variant strains were reported to have greater potency and transmissibility among humans [3]. Throughout the continuing battle against COVID-19, health experts have consistently stressed that accurate, timely detection is critical to controlling the rapid spread of the virus. Although 462 million individuals had been vaccinated as of April 1, 2021, health experts maintain that vaccination must continue until herd immunity is established. The most common method of diagnosing COVID-19 worldwide is the polymerase chain reaction (PCR) test, a type of nucleic acid amplification test (NAAT). After a specimen is collected from an individual's nose or mouth, the test converts the virus's ribonucleic acid (RNA), if present, into deoxyribonucleic acid (DNA). The DNA is then amplified and tested for the presence of viral genetic material; results typically take about 24 hours [4]. More recently, the advent of COVID-19 diagnostics based on artificial intelligence has opened the way to timely, precise testing. Using CT or X-ray scans, deep learning models have shown the potential to determine whether an individual is infected with COVID-19 in a shorter time while maintaining high accuracy [5].

Objective
The purpose of this paper is to put forth a deep learning model, combining a convolutional neural network (Xception) with a recurrent network (Bi-LSTM), that accurately detects whether an individual has been infected with COVID-19 by automatically analyzing X-ray scans. Although deep learning-based COVID-19 diagnostic algorithms exist, many of them report accuracies that leave room for improvement. For instance, Aras M. Ismael and Abdulkadir Şengür presented three deep CNN approaches to detect COVID-19, whose accuracies ranged from 91.6% to 94.7% [6]. Specifically, ResNet50 features + SVM achieved 94.7%, fine-tuning ResNet50 yielded 92.6%, and end-to-end training of a CNN produced 91.6%, while a BSIF (texture descriptor) + SVM baseline yielded 90.5% [6]. In addition, Ismael and Şengür did not provide heat maps showing which areas of the X-ray images exhibited signs of COVID-19 [6]. Heat maps reveal which parts of the lung a deep learning model relies on when it detects signs of COVID-19, which can help demonstrate overall trends [6]. Hence, to improve on existing deep learning approaches for identifying COVID-19 in both accuracy and the ability to localize anomalies, we developed a new deep learning model.

Related Works
Recently, with the help of data science and various machine learning techniques, researchers have used CT and X-ray scans of the lungs to distinguish COVID-19-positive from COVID-19-negative cases. Islam et al. used publicly available X-ray scans and combined a CNN with an LSTM to classify them. Their images consisted of 613 X-ray scans of COVID-19 cases, 1525 of pneumonia cases, and 1525 of cases negative for both COVID-19 and pneumonia. Overall, they obtained an accuracy of 98.5%, a specificity of 98.2%, and a sensitivity of 99.0% for COVID-19 cases [7]. Hassantabar et al. took into account the large number of layers and the amount of memory this task requires, and compared a DNN with a CNN on a dataset of 613 MRI images classified as COVID-19 positive or negative. The CNN achieved higher accuracy and sensitivity than the DNN, with an accuracy of 93.2% and a sensitivity of 96.1% [8]. Khadidos et al. used a combined dataset of 349 lung CT scans of COVID-19-positive patients and 463 lung CT scans of COVID-19-negative patients. To achieve the highest possible accuracy, the team designed the DeepSense method, which combines a CNN and a DNN to simplify input classification. The DeepSense method achieved an accuracy of 96.69% and a sensitivity of 84.06% [9]. Ahuja et al. used 349 lung CT scans of COVID-19-positive patients and 397 lung CT scans of COVID-19-negative patients, and classified them with ResNet18, ResNet50, ResNet101, and SqueezeNet. ResNet18 performed best, with an accuracy of 99.82% and a validation accuracy of 97.32% [10]. Bukhari et al. used 89 X-ray scans of COVID-19-positive patients, 96 X-ray scans of patients diagnosed with pneumonia, and 93 X-ray scans of healthy patients; their ResNet-50 model achieved an accuracy of 98.18% [11].

Data Description
The first dataset, obtained from Kaggle, consists of chest X-ray images: the training set contains 74 images per class and the test set contains 20 images per class. It is available at https://www.kaggle.com/khoongweihao/covid19-xraydataset-train-test-sets [12]. The second dataset contains lung CT scans, 397 non-COVID-19 images and 349 COVID-19 images, and is available at https://www.kaggle.com/luisblanche/covidct?select=CT_NonCOVID [13]. For this CT dataset, 70% of the images were used for training and 30% for testing.
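If the 70%/30% split of the CT dataset was made separately within each class (a stratified split), it could look like the minimal Python sketch below. Stratification, the fixed random seed, and the helper name `stratified_split` are assumptions made for illustration; the paper only states the split proportions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str], train_frac: float = 0.7, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split sample indices into train/test sets while keeping class proportions.

    Hypothetical helper: stratification and seeding are assumptions, not the paper's method.
    """
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_frac)  # 397 -> 278 train, 349 -> 244 train
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# Class counts taken from the CT dataset description above.
labels = ["non-COVID"] * 397 + ["COVID"] * 349
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 522 224
```

Splitting per class keeps the roughly 53%/47% class balance identical in the training and test sets, which a single shuffle over all 746 images would only approximate.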

LSTM
Long short-term memory (LSTM) is one of the most widely used recurrent neural networks (RNNs), a family of deep learning models. A feedforward deep neural network (DNN) is a one-way network: input data passes through the network once, in a single direction. The architecture of an RNN is quite different. Instead of flowing only forward, the output (hidden state) of the nodes in an RNN is fed back as input to the same nodes at the next time step, which is why it is called "recurrent". However, RNNs suffer from the vanishing gradient problem, which makes it hard for them to learn long-range dependencies. LSTM overcomes this drawback with a "memory cell" that can preserve information for long periods of time. The flow of information into and out of this cell is controlled by an input gate, an output gate, and a "forget" gate [14].
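The gate mechanism described above can be written as a single LSTM time step. The sketch below is a minimal pure-Python illustration of a single-unit LSTM (scalar hidden and cell state) using the standard gate equations; the parameter layout and the helper names `lstm_step` and `run_lstm` are hypothetical, and a real model would use a framework layer such as Keras's `LSTM`.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Per-gate parameters: (input weights w, recurrent weight u, bias b).
GateParams = Tuple[List[float], float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(
    x: Sequence[float], h_prev: float, c_prev: float, params: Dict[str, GateParams]
) -> Tuple[float, float]:
    """One time step of a single-unit LSTM; returns (hidden state, cell state)."""

    def pre(gate: str) -> float:
        w, u, b = params[gate]
        return sum(wi * xi for wi, xi in zip(w, x)) + u * h_prev + b

    f = sigmoid(pre("f"))    # forget gate: how much of the old cell state to keep
    i = sigmoid(pre("i"))    # input gate: how much new information to write
    o = sigmoid(pre("o"))    # output gate: how much of the cell to expose as output
    g = math.tanh(pre("g"))  # candidate values to write into the cell
    c = f * c_prev + i * g   # memory cell: additive update helps gradients flow
    h = o * math.tanh(c)
    return h, c


def run_lstm(
    sequence: Sequence[Sequence[float]], params: Dict[str, GateParams]
) -> Tuple[List[float], float]:
    """Run the cell over a sequence from zero initial state; return all h and the final c."""
    h, c = 0.0, 0.0
    outputs: List[float] = []
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs, c


# Illustrative (untrained) parameters: every gate shares the same small weights.
params = {gate: ([0.5], 0.1, 0.0) for gate in ("f", "i", "o", "g")}
outputs, final_cell = run_lstm([[1.0], [0.5], [-1.0]], params)
print(outputs)
```

Setting the forget-gate bias high and the input-gate bias low makes the cell copy its previous value almost unchanged, which is exactly the "preserve information for long periods" behavior described above.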

Bi-LSTM
Because of its one-way structure, a DNN cannot preserve past information in its hidden layers, whereas RNNs and LSTMs can. In other words, the hidden-layer outputs at different time steps influence one another. However, even though these models are recurrent, their hidden-layer computation runs in only one direction, so the output at each step depends only on earlier inputs. This one-way computation is a significant drawback of RNNs and LSTMs. Bidirectional LSTM (Bi-LSTM) addresses this issue by processing the sequence in both the forward and reverse directions. The two directions are trained together end to end, which has the advantage of minimizing the loss by training all of the parameters jointly [15].
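To make the bidirectional idea concrete, the sketch below runs a recurrent step function over a sequence in both directions and pairs the two states at every position. For readability, a toy running-sum step stands in for the LSTM cell; in a real Bi-LSTM each direction has its own LSTM cell with separate weights. The function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # recurrent state type
X = TypeVar("X")  # input element type


def run_direction(seq: Sequence[X], step: Callable[[S, X], S], init: S) -> List[S]:
    """Apply a recurrent step left to right, collecting the state after each input."""
    state = init
    states: List[S] = []
    for x in seq:
        state = step(state, x)
        states.append(state)
    return states


def bidirectional(
    seq: Sequence[X],
    step_fwd: Callable[[S, X], S],
    step_bwd: Callable[[S, X], S],
    init: S,
) -> List[Tuple[S, S]]:
    """Pair the forward and backward states for every position t."""
    forward = run_direction(seq, step_fwd, init)  # forward[t] summarizes seq[:t + 1]
    backward = run_direction(list(reversed(seq)), step_bwd, init)
    backward.reverse()  # realign so backward[t] summarizes seq[t:]
    return list(zip(forward, backward))


def running_sum(state: int, x: int) -> int:
    """Toy step function standing in for an LSTM cell."""
    return state + x


print(bidirectional([1, 2, 3, 4], running_sum, running_sum, 0))
# [(1, 10), (3, 9), (6, 7), (10, 4)]
```

Each position now carries a summary of both its past (forward state) and its future (backward state). Keras's `tf.keras.layers.Bidirectional` wrapper combines the two directions this way and, by default, concatenates their outputs (`merge_mode="concat"`).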

Xception
The Xception model, introduced in 2016, is based on the Inception model. Inception modules partially separate the mapping of cross-channel correlations and spatial correlations; Xception ("Extreme Inception") pushes this idea further by assuming that the two can be mapped completely separately. To do so, the Xception model is built from depthwise separable convolution layers with residual connections. A depthwise separable convolution differs from an Inception module in the order of its operations: instead of a 1x1 convolution followed by 3x3 convolutions, it applies a 3x3 convolution to each channel separately, followed by a 1x1 convolution across channels. The 1x1 operation is known as a pointwise convolution, while the 3x3 operation is a depthwise (channel-wise) convolution [16].
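One practical consequence of this factorization is a large reduction in the number of weights. The worked example below counts parameters (ignoring biases) for a single 3x3 layer that maps 256 channels to 256 channels; the channel counts are an arbitrary illustrative choice, not a specific Xception layer.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a regular k x k convolution: every filter spans all input channels."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise separable convolution (biases ignored)."""
    depthwise = k * k * c_in  # one k x k spatial filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


standard = standard_conv_params(3, 256, 256)
separable = separable_conv_params(3, 256, 256)
print(standard, separable, round(standard / separable, 2))  # 589824 67840 8.69
```

Because the spatial filtering (depthwise step) and the channel mixing (pointwise step) are handled separately, the layer needs roughly 8.7 times fewer weights here, which is what lets Xception stack many such layers.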

Proposed Model
Our proposed model combines the Xception model with a Bi-LSTM model. Given an input image, the Xception network first extracts features, as a convolutional neural network (CNN) does. Then, unlike a standard CNN or pretrained models such as Xception, Inception, MobileNet, or DenseNet, which classify these features directly, our model passes them to a Bi-LSTM, aiming for better classification. To connect the Bi-LSTM to Xception, we reshaped the output of Xception into a three-dimensional tensor, since an LSTM layer expects input of shape (batch, time steps, features). We chose a bidirectional LSTM because previous research has reported that bidirectional LSTMs perform better than unidirectional ones.

Figure 10. Graph for accuracy comparison.
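The reshaping step can be pictured as turning each spatial position of Xception's final feature map into one time step of a sequence. The sketch below is a minimal pure-Python illustration of one plausible layout (row-major flattening of an H x W x C map into H*W steps of C features); the paper does not give the exact target shape, so this layout and the helper names are assumptions. For reference, Xception without its classification top produces a 10 x 10 x 2048 feature map for a 299 x 299 input, which this layout would turn into 100 steps of 2048 features.

```python
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [row][col][channel]
Sequence2D = List[List[float]]        # indexed as [time_step][feature]


def feature_map_to_sequence(fmap: FeatureMap) -> Sequence2D:
    """Flatten an H x W x C feature map row by row into H*W time steps of C features."""
    return [list(pixel) for row in fmap for pixel in row]


def sequence_shape(h: int, w: int, c: int) -> Tuple[int, int]:
    """Shape (time steps, features) produced by the flattening above."""
    return h * w, c


# A tiny 2 x 3 map with 4 channels; each value encodes (row, col, channel) as digits.
fmap = [
    [[float(100 * r + 10 * col + ch) for ch in range(4)] for col in range(3)]
    for r in range(2)
]
seq = feature_map_to_sequence(fmap)
print(len(seq), len(seq[0]))         # 6 4
print(sequence_shape(10, 10, 2048))  # (100, 2048)
```

Stacking such sequences for a batch of images gives the three-dimensional (batch, time steps, features) input an LSTM layer expects. In Keras, a `tf.keras.layers.Reshape((100, 2048))` layer placed after the Xception base would perform the same row-major flattening.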

Result for X-ray Data
We compared the accuracy of different deep learning models in detecting whether an individual has been infected with COVID-19. We tested seven models: VGG16 (96%), DenseNet (96.3%), MobileNet (95.2%), MobileNetV2 (93.4%), ResNet50 (94.4%), DNN (73%), and our proposed Xception + Bi-LSTM model (98.5%). As shown in Figure 10, the DNN model had the lowest accuracy, at 73%, and our proposed Xception + Bi-LSTM model had the highest, at 98.5%. When the model was applied to the CT scan data, its accuracy was 48%. Figures 14 and 15 show the loss and accuracy curves. In addition, as shown in Figures 12 and 13, we used a class activation map to locate the anomalous regions in the COVID-19 X-ray images. In this heat map, red marks the most anomalous regions and purple marks normal regions.
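The text refers to a class activation map, and the title mentions Grad-CAM, so the sketch below assumes Grad-CAM. Grad-CAM weights each feature map of the last convolutional layer by the spatial average of the class score's gradient with respect to that map, sums the weighted maps, and keeps only positive values. This pure-Python version assumes the activations and gradients have already been computed (for example, by a framework's automatic differentiation); the function name is hypothetical, and scaling the result to [0, 1] is one common choice before applying a color map.

```python
from typing import List

Grid = List[List[float]]


def grad_cam(activations: List[Grid], gradients: List[Grid]) -> Grid:
    """Grad-CAM heat map from precomputed activations and gradients.

    activations[k] is feature map k of the last conv layer; gradients[k] holds
    d(class score) / d(activations[k]) at every spatial location.
    """
    h, w = len(activations[0]), len(activations[0][0])
    # alpha_k: global-average-pooled gradient, i.e. the importance of channel k.
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]

    cam = [[0.0] * w for _ in range(h)]
    for alpha, a in zip(alphas, activations):
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * a[i][j]

    cam = [[max(v, 0.0) for v in row] for row in cam]  # ReLU: keep positive evidence only
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]  # scale to [0, 1] for coloring
    return cam


# Two 2 x 2 feature maps: channel 0 supports the class, channel 1 argues against it.
acts = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 1.0]]
```

In practice the low-resolution map (for Xception, 10 x 10 for a 299 x 299 input) is upsampled to the X-ray's size and overlaid with a color map, so that high values appear as the warm colors described above.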

Result for CT-scan Data
We also applied our proposed model to the CT scan data. However, its accuracy was only 48%, much lower than on the X-ray data and roughly at chance level for a two-class problem. Figures 14 and 15 show the loss and accuracy curves for the training and validation sets. They indicate that the model overfit: training and validation accuracy were far higher than test accuracy.

Conclusion
In this work, we proposed a new deep learning model to improve on the accuracy of preexisting models. Our model combines the Xception model and the bidirectional LSTM (Bi-LSTM) model: Xception extracts features from the input images, and the Bi-LSTM classifies the images. On the X-ray dataset, the models achieved accuracies of 96% for VGG16, 96.3% for DenseNet, 95.2% for MobileNet, 93.4% for MobileNetV2, 94.4% for ResNet50, 73% for DNN, and 98.5% for Xception + Bi-LSTM. On the CT scan data, our model achieved an accuracy of only 48.0%.

Outlook
Our proposed Xception + Bi-LSTM model performed well at determining whether an individual has COVID-19 from his or her X-ray scan. It achieved an accuracy of 98.5%, higher than the VGG16, DenseNet, MobileNet, MobileNetV2, ResNet50, and DNN models. In addition, using a class activation map, we were able to create a heat map illustrating the anomalies in patients with COVID-19. Thus, our proposed model improved on preexisting models in both detection accuracy and the ability to localize anomalies. However, our model also has certain limitations. First, before augmentation, only 74 X-ray images per class were used to train our model and 20 per class to test it. A larger dataset is needed to determine whether our model consistently achieves high accuracy. Second, our model performed binary classification, which is generally easier than multiclass classification; for example, uniform random guessing has an expected accuracy of 50% with two balanced classes but only about 33% with three. Our model's high accuracy may therefore be inflated to some extent compared with multiclass models. Finally, when given CT scan images instead of X-ray images, our Xception + Bi-LSTM model achieved a low accuracy of 48.0%. In follow-up research, we will seek to address this issue and increase the accuracy of our model for detecting COVID-19 in CT images.
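To make the sample-size limitation concrete, a confidence interval shows how uncertain an accuracy estimate is when the test set has only 40 images (20 per class). The sketch below computes a 95% Wilson score interval for a binomial proportion; the counts used (39 of 40 correct, versus 390 of 400) are hypothetical numbers chosen for illustration, not results from this study.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion such as test accuracy.

    z = 1.96 gives an approximately 95% interval.
    """
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: the same 97.5% accuracy on a small and a larger test set.
small = wilson_interval(39, 40)
large = wilson_interval(390, 400)
print(f"n=40:  {small[0]:.3f} to {small[1]:.3f}")
print(f"n=400: {large[0]:.3f} to {large[1]:.3f}")
```

With only 40 test images, the interval for a 97.5% accuracy stretches from roughly 87% to above 99%; ten times as many images narrows it to roughly 95% to 99%.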