Finding the Best Performing Pre-Trained CNN Model for Image Classification: Using a Class Activation Map to Spot Abnormal Parts in Diabetic Retinopathy Image

Abstract: Diabetic retinopathy (DR) is a common eye disease caused by diabetes; about 33.7% of people with diabetes have DR. Using a dataset of eye images with and without DR, we compared convolutional neural network (CNN) models to find the one with the best accuracy. We tested a default CNN model and 5 pre-trained models: MobileNet, VGG16, VGG19, Inception V3, and Inception ResNet V2. The default CNN model performed poorly, reaching only 10.4% accuracy. The pre-trained models also did not perform as well as expected, so we combined them with a GRU, which increased the score. To raise accuracy further, we added a bidirectional GRU so that all parameters in the model are trained. The 5 pre-trained models averaged 74.2% accuracy, and Inception ResNet V2 with a bidirectional GRU scored the highest, 83.57%. As an additional study, we used a class activation map to locate the abnormal parts of eyes with DR, and we could identify abnormal veins and bleeding. Our research has two limitations. First, we did not use segmentation methods such as U-Net, Fully Convolutional Network (FCN), DeepLab V3, and Feature Pyramid Network, which are more advanced than classification. Second, although our model distinguishes 5 classes, its highest accuracy remained below 90%. In future work, we will prepare masks so that segmentation methods can be applied to our dataset.


Background
Diabetes is a lifelong disease related to insulin. There are two types. Type 1 diabetes arises when the body naturally produces little or no insulin, and type 2 diabetes develops when the body becomes resistant to insulin over time. The main risk factors for type 2 are inactivity and being overweight. Diabetic retinopathy is a complication of diabetes that affects the eyes. It can cause swelling in the retina, called diabetic macular edema (DME), which can blur vision. It can also cause neovascular glaucoma, in which abnormal blood vessels grow out of the retina and block fluid from draining. The most severe outcome of diabetic retinopathy is blindness, although the chance is very low [1]. Figure 1 shows the proportion of diabetic patients who have diabetic retinopathy. Out of 395 patients, 133 had diabetic retinopathy, which is 33.7% (shown as 34% in the graph). Most patients do not develop retinopathy, but it is still common. Scientists and doctors now often use deep learning techniques to diagnose diseases such as diabetic retinopathy, and deep learning achieves strong results on these tasks [2].

Objective
Many people with diabetes are not worried about diabetic retinopathy because only about 34 percent of them develop it, and they do not expect to be among that 34 percent. However, it is important to take the disease seriously, since it can cause severe damage. Our objective is to find the deep learning model that performs best on the dataset, both to increase classification accuracy and to detect the abnormal parts of the eye for diagnosing diabetic retinopathy. To do this, we try different models to find the one that fits the dataset best, and we use a heatmap to locate the abnormal parts of the eye [3].

Related Works
Convolutional neural networks perform very well at detecting and classifying images. Qummar et al. used pretrained CNN models (Resnet50, Inceptionv3, Xception, Dense121, and Dense169) to predict retinopathy, and the fine-tuned models yielded 70% accuracy [3]. Kwasigroch used CNN models and achieved 82% accuracy and a Kappa score of 0.776 on the datasets provided by the EyePACS organization [4]. Ghosh et al. also used a CNN and obtained 95% accuracy for binary classification and 85% accuracy for five-class classification on a validation set of 3,000 images from Kaggle [5]. Qomariah et al. used a Support Vector Machine (SVM) for classification, with features extracted by a CNN as its input. Their highest accuracy values were 95.83% and 95.24% for base 12 and base 13, respectively [6]. Mobeen-ur-Rehman et al. used 3 pre-trained CNN models, AlexNet, VGG-16, and SqueezeNet, on the Messidor datasets. These models recorded high accuracy scores of 93.46%, 91.82%, and 94.49%, respectively [7].

Data Description
In our study, we used about 30,000 images provided on Kaggle, available at https://www.kaggle.com/sovitrath/diabetic-retinopathy-224x224-gaussian-filtered [8]. The images were collected from various sources and are classified into 5 levels of Diabetic Retinopathy (DR): No DR, Mild DR, Moderate DR, Severe DR, and Proliferate DR. We divided the data into training data and testing data. The split was made within each level of DR at a ratio of 7:3, so that 70 percent of each level is training data and 30 percent is testing data. The figures below show an example of each level of DR.
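The per-level 7:3 split described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in the study; the function name `stratified_split`, the fixed seed, and the toy label counts are all assumptions made for the example.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices class by class so every class keeps the same ratio."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# Toy labels standing in for the five DR levels (the counts are made up).
labels = (
    ["No DR"] * 50
    + ["Mild DR"] * 20
    + ["Moderate DR"] * 30
    + ["Severe DR"] * 10
    + ["Proliferate DR"] * 10
)
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting inside each level, rather than over the pooled images, keeps rare levels such as Severe DR represented in both the training and the testing data.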

GRU
Gated Recurrent Units (GRU) were introduced to address a weakness of the Recurrent Neural Network (RNN): the vanishing and exploding gradient problems in backpropagation through time. A GRU is composed of a reset gate, an update gate, and a candidate state. The reset gate decides how much past information to discard by multiplying the previous hidden state by a value in (0, 1) produced by a sigmoid function (1). The update gate plays a role similar to the forget gate and input gate of Long Short-Term Memory (LSTM): it decides the ratio in which past and current information are combined (2). The candidate stage computes the new candidate information (3). Finally, the hidden state is computed by adding the past state and the candidate, each weighted by the update gate (4) [9].
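Equations (1) to (4) referenced above do not appear in this text. A commonly used formulation of the GRU is sketched below as a reading aid; the symbols are assumptions rather than the paper's own notation: x_t is the input at step t, h_{t-1} is the previous hidden state, W, U, and b are learned weights and biases, \sigma is the sigmoid function, and \odot is element-wise multiplication. Some references swap the roles of z_t and 1 - z_t in (4).

```latex
\begin{align}
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) \tag{1} \\
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) \tag{2} \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \tag{3} \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t \tag{4}
\end{align}
```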

CNN
A CNN (Convolutional Neural Network) consists of convolutional layers, pooling layers, and fully connected layers. When the CNN receives an input image, the convolutional layer first convolves it with a filter, also called a kernel, and yields a feature map. The pooling layer then reduces the size of the feature map by taking the average or the maximum of the values in each region; these two variants are called average pooling and max pooling. The fully connected layer works like a deep neural network (DNN), and its main purpose is to classify the target with an activation function. For multi-class classification, the softmax function is used as the activation function, and for binary classification, the sigmoid function is mainly used [10].
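The three operations described above (convolution with a kernel, max pooling, and a softmax over class scores) can be illustrated with a small pure-Python sketch. The 5 x 5 image, the 2 x 2 kernel, and the five class scores are made-up toy values, and the helper names are hypothetical; a real CNN learns its kernels and uses an optimized library.

```python
import math


def conv2d(image, kernel):
    """Valid (no padding) sliding-window product, as in a CNN convolutional layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows or columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b] for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 3, 1, 0, 1],
]
kernel = [[1, 0], [0, -1]]  # toy 2x2 filter

feature_map = conv2d(image, kernel)            # 4x4 feature map
pooled = max_pool(feature_map)                 # 2x2 after 2x2 max pooling
probs = softmax([2.0, 1.0, 0.5, 0.1, -1.0])    # five made-up class scores
print(pooled, probs)
```

Replacing `max` with an average inside `max_pool` would give average pooling instead.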

Pretrained CNN
To extract features from the images, we used the pre-trained CNN models VGG16, VGG19, MobileNet, Inception-Resnet_v2, and Inception_v3. These models were downloaded from Keras and had been pre-trained on the ImageNet dataset. Inception-Resnet_v2 in particular takes an input image of shape 299 x 299 x 3. After the models extracted the features from the images, user-defined layers were used to classify the targets [11].

Proposed Model
Our proposed model consists of Inception-ResNet-v2 and a Bidirectional GRU (Bi-GRU). After the pretrained Inception-ResNet-v2 extracts the features from the given images, a lambda layer reshapes the output of the flatten layer so that it can be passed to the Bi-GRU. Unlike the plain GRU setup, the Bi-GRU model is trained end to end, minimizing the loss at the output by updating all of the model's parameters. Adam was used as the optimizer with a learning rate of 0.0001. Furthermore, all of our data were preprocessed with the image data generator function because the number of images was not large enough for efficient training. The image data generator function augmented the images by varying their width, height, and brightness [12].
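To make the reshape and Bi-GRU steps concrete, the sketch below splits a flat feature vector into a short sequence and runs a toy GRU over it in both directions, concatenating the two final states. It is a pure-Python illustration under strong assumptions, not the Keras model used in the study: the hidden state has a single unit, the weights are random rather than trained, the 12 features are random stand-ins for the CNN output, and every function name is hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reshape_to_sequence(flat, timesteps):
    """Split a flat feature vector into `timesteps` equal-length chunks."""
    if len(flat) % timesteps != 0:
        raise ValueError("feature length must be divisible by timesteps")
    step = len(flat) // timesteps
    return [flat[i * step:(i + 1) * step] for i in range(timesteps)]


def make_gru_params(input_size, rng):
    """Random (untrained) weights for a GRU with one hidden unit."""
    return {
        gate: {
            "w": [rng.uniform(-0.5, 0.5) for _ in range(input_size)],
            "u": rng.uniform(-0.5, 0.5),
            "b": 0.0,
        }
        for gate in ("reset", "update", "candidate")
    }


def affine(p, x, h):
    """Weighted input plus weighted hidden state plus bias."""
    return sum(w * v for w, v in zip(p["w"], x)) + p["u"] * h + p["b"]


def gru_final_state(sequence, params):
    """Run a standard GRU update over the sequence and return the last hidden state."""
    h = 0.0
    for x in sequence:
        r = sigmoid(affine(params["reset"], x, h))
        z = sigmoid(affine(params["update"], x, h))
        candidate = math.tanh(affine(params["candidate"], x, r * h))
        h = (1.0 - z) * h + z * candidate
    return h


def bidirectional_gru(sequence, forward_params, backward_params):
    """Read the sequence left-to-right and right-to-left, then concatenate both states."""
    forward = gru_final_state(sequence, forward_params)
    backward = gru_final_state(sequence[::-1], backward_params)
    return [forward, backward]


rng = random.Random(0)
flat_features = [rng.uniform(0.0, 1.0) for _ in range(12)]  # stand-in for flattened CNN output
sequence = reshape_to_sequence(flat_features, timesteps=4)  # 4 steps of 3 features each
encoding = bidirectional_gru(sequence, make_gru_params(3, rng), make_gru_params(3, rng))
print(encoding)
```

In a full model, the concatenated encoding would feed a softmax layer over the 5 DR levels, and the CNN and GRU weights would be updated together by the optimizer.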

Figure 8. Comparison of model accuracy.
At the beginning of our study, we tried a plain CNN model, but it reached only 10.4% accuracy. We therefore decided to use these pretrained models from Keras: MobileNet, VGG16, VGG19, Inception V3, and Inception ResNet V2. The benefit of these models is that they have already been trained on very large datasets, so they usually perform better. As expected, the pre-trained models reached much higher accuracy than the plain CNN model. MobileNet reached 66.04% accuracy, VGG16 reached 71.21%, VGG19, a deeper variant of VGG16, reached 73.35%, Inception V3 reached 76.84%, and Inception ResNet V2 achieved the highest score, 83.57%. In addition, to find the abnormal parts of eyes with diabetic retinopathy, we used a class activation map to create heatmaps from our images. Figure 9 and Figure 10 below show these heatmaps: the blue-purple areas mark the normal parts, and the red-yellow areas mark the abnormal parts [13]. Figure 12 shows the accuracy and loss curves for the training and validation datasets. The curves show that both training and validation accuracy kept increasing over the epochs, and that the loss on both datasets kept decreasing. We therefore conclude that training was done efficiently, with a low probability of overfitting.
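The idea behind a class activation map can be sketched as a weighted sum of the final convolutional feature maps, using the weights that connect each map to the chosen class. The code below is a minimal pure-Python illustration, not the heatmap program used in the study: the 2 x 2 feature maps and the class weights are made-up values, the function names are hypothetical, and clipping negative values before rescaling is a display choice assumed here.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps: cam[y][x] = sum over k of w_k * A_k[y][x]."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


def normalize(cam):
    """Clip negatives to zero and rescale to [0, 1] so the map can be drawn as a heatmap."""
    clipped = [[max(v, 0.0) for v in row] for row in cam]
    peak = max(max(row) for row in clipped)
    if peak == 0.0:
        return clipped
    return [[v / peak for v in row] for row in clipped]


# Two toy 2x2 feature maps and the weights linking them to one class.
feature_maps = [
    [[0.0, 1.0], [2.0, 0.0]],
    [[1.0, 0.0], [0.0, 3.0]],
]
class_weights = [0.5, -1.0]

heatmap = normalize(class_activation_map(feature_maps, class_weights))
print(heatmap)
```

In practice the small heatmap is upsampled to the input image size and overlaid with a color map, so that high values appear red-yellow and low values appear blue-purple.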

Principal Finding
A strength of our research is that we tried various kinds of models to find the best one for the dataset. We also used class activation maps to locate the abnormal parts of eyes with diabetic retinopathy (DR), which makes it easier to determine whether a patient has diabetic retinopathy [14]. In addition, Wu et al. show that combining a CNN model with Gated Recurrent Units (GRU) helps improve accuracy. Using Inception ResNet V2 alone, we reached only 77.97% accuracy, so we decided to include a GRU as well. With the GRU combined, our accuracy improved by 5.6 percentage points, from 77.97% to 83.57%. By combining the GRU with the pre-trained model and training it as a bidirectional GRU, we could improve the results further [15].

Limitation
One limitation is that we could have located the abnormal parts of the eye more precisely with segmentation methods such as U-Net, Fully Convolutional Network (FCN), DeepLab V3, and Feature Pyramid Network, because the class activation map was inaccurate for some of the images. Another limitation is that the accuracy was lower than expected, even though we tried many different approaches to raise it. In the next study, we will aim for at least 90% accuracy using CNN models.

Conclusion and Recommendations
We compared different CNN models to find the best accuracy score, and we used a class activation map to locate the abnormal parts of eyes with diabetic retinopathy. As a result, we reached 83.57% accuracy using the pre-trained Inception ResNet V2 model combined with a GRU, which performed better than the pre-trained model alone. The heatmaps showed more abnormal veins and bleeding in eyes with DR. A distinguishing feature of our study is the additional heatmap analysis, which highlights the differences between normal eyes and eyes with DR. However, the heatmap program did not perform well on some of the images, so in the next study we will try different heatmap methods to find one that performs better.