A Classification Model for Severity of Neonatal Jaundice Using Deep Learning


Introduction
Jaundice is a common condition in newborn babies. The majority of cases are due to the breakdown of red blood cells, which releases bilirubin into the blood, together with the immaturity of the liver, which cannot yet metabolize the bilirubin effectively and prepare it for excretion, thus causing elevated levels of bilirubin. Neonatal jaundice is a yellowish discoloration of the skin and the white part of the eyes in a newborn baby due to high bilirubin levels. Symptoms include excess sleepiness or poor feeding, while complications may include seizures, cerebral palsy, or kernicterus [1]. Jaundice is considered pathologic when bilirubin levels are greater than 308 µmol/L (18 mg/dL), when it is noticed in the first day of life, when it lasts for more than two weeks, or when the baby appears unwell. Jaundice tends to develop in neonates because of two factors [2]: the breakdown of fetal hemoglobin as it is replaced with adult hemoglobin, and the relatively immature metabolic pathways of the liver, which are unable to conjugate and excrete bilirubin as quickly as those of an adult. This causes an accumulation of bilirubin in the blood (hyperbilirubinemia), leading to the symptoms of jaundice. If neonatal jaundice does not clear up with simple phototherapy, other causes such as biliary atresia, progressive familial intrahepatic cholestasis, bile duct paucity, Alagille syndrome, alpha 1-antitrypsin deficiency, and other pediatric liver diseases should be considered. Prolonged neonatal jaundice is serious and should be followed up promptly [3].
Hyperbilirubinemia, the cause of jaundice, appears in approximately 60% of newborns at term and in almost all preterm neonates, with a prevalence greater than 80% [4]. In the vast majority of newborns, jaundice is a benign condition. However, an incorrect or delayed diagnosis may put newborns at risk of developing kernicterus [5]. Kernicterus is the chronic form of bilirubin encephalopathy and occurs when the deposition of bilirubin in the brain causes irreversible damage [6]. The correct identification of newborns at risk of developing severe hyperbilirubinemia and kernicterus is essential for early treatment. Therefore, protecting newborns from toxic bilirubin levels, especially given their immature central nervous system, has become a main concern for pediatricians [7]. The risk of neonatal jaundice is currently assessed with the support of specific nomograms that take into account the age of the newborn, the serum or transcutaneous bilirubin levels and associated risk factors [8]. Despite the use of different methodologies to assess the risk of developing neonatal hyperbilirubinemia, several studies have pointed out a resurgence of bilirubin encephalopathy and kernicterus, identifying the need to improve diagnosis [9].
Information about modifiable risk factors and the associated degree of jaundice can assist in the detection of jaundice among neonates, thus reducing complications arising from the disease. An increasing number of newborns are being discharged from hospital within 48 hours after birth and, with such a short post-natal hospital stay, jaundice may not be apparent at the time of discharge [10]. The neonatal mortality rate was reported as 48 per 1000 live births, with 284,000 newborns dying annually at an average of 700 per day; neonatal deaths in Nigeria account for a quarter of under-five deaths, as many babies arrive late in hospitals with kernicterus [11]. Early intervention plays a key role in the prevention of the adverse outcomes resulting from neonatal hyperbilirubinemia [12]. Early post-natal discharge from the hospital requires that parents be able to recognize neonatal jaundice and seek prompt medical attention for it.
Data mining is a relatively new area of computer science that draws on statistical techniques, databases, artificial intelligence and pattern recognition (an area of machine learning) to extract knowledge from data. Its methodologies rest on the ability to find patterns and relationships within large quantities of data, which enables the construction of models that assign class labels to unlabeled cases by combining statistical methods and artificial intelligence with database management [13]. Data mining techniques have thus been applied successfully in a variety of classification tasks [14]. By identifying hidden patterns, data mining can yield information that offers a new perspective on certain diseases and knowledge that can foster more research in several areas of medicine. The high degree of accuracy of developed models is a good example of data mining's contribution to medicine [15]. In many areas of medicine, data mining has proven to add great value by contributing new discoveries and improving on the results obtained with other methodologies [16].
Machine learning techniques, together with improvements in available computational power, have come to play a vital role in Big Data analytics and knowledge discovery in medical data [17]. In contrast to conventional learning methods, which depend on shallow-structured learning architectures, deep learning refers to machine learning techniques that use supervised and unsupervised approaches to automatically learn hierarchical representations in deep architectures in order to perform classification tasks more effectively [18]. Deep learning has been implemented successfully in industry domains, where it performs very well on enormous amounts of data [19]. Imaging modalities such as MRI, CT scans, and X-rays generate massive volumes of images every day, creating strong demand for deep learning projects. Due to the increasing number of patients' data stored in clinical databases, there has been increasing demand for the application of deep learning algorithms to the extraction of relevant patterns from Big Data. There is therefore a need for a predictive model for the early detection of jaundice in neonates, which is the focus of this paper. This study presents a classification model for the severity of jaundice among neonates based on information about modifiable risk factors using deep learning.

Related Works
This section provides a review of related works in the subject area of knowledge discovery using data mining and deep learning. Among the related works, it was observed that deep learning algorithms were mostly applied for the extraction of complex unseen patterns from unstructured data (such as text-based, sound and video) and data of large sizes. Other works showed the advantage of adopting a neural network-based model for solving classification tasks in the medical domain. A number of such papers are presented in the following paragraphs.
In 2018, Balogun et al. [13] worked on the development of an ensemble model for the severity of sickle cell anemia among pediatric patients in Nigeria. The study collected data from a tertiary hospital consisting of the records of 115 pediatric sickle cell patients, with the severity classified as low, moderate or high risk. The ensemble model was formulated using a combination of C4.5 decision tree (DT), support vector machine and naïve Bayes (NB) classifiers. The results showed that the best classification model for determining the severity of anemia among sickle cell disease (SCD) patients was developed using an ensemble of the DT and NB algorithms. The study was limited to the classification of the severity of sickle cell disease among pediatric patients.
In 2017, Idowu et al. [20] worked on the development of a classification model for the survival of HIV/AIDS among pediatric patients. The study collected data from 216 pediatric patients receiving antiretroviral drug treatment in Nigeria, which were used to develop a predictive model for HIV/AIDS survival based on identified variables. The model was formulated using the naïve Bayes classifier and validated with the 10-fold cross validation technique. The results showed an accuracy of 81.02% for the naïve Bayes classifier used in developing the predictive model for HIV/AIDS survival in pediatric patients. In addition, the area under the receiver operating characteristic (ROC) curve had a value of 0.933. The study was limited to the classification of HIV/AIDS survival among pediatric patients.
In 2017, Purushothama et al. [21] worked on the application of deep learning models to large healthcare data. The study collected data from the MIMIC-III dataset on Intensive Care Unit (ICU) mortality of patients. The study adopted deep learning for the formulation of three (3) models: mortality prediction, forecasting of length of stay, and ICD-9 code group prediction, using raw and processed datasets. The results showed that the deep learning models gave better performance on the datasets used to develop the classification and regression models required for the study. The results also showed that deep learning algorithms, however effective, required more computational time to complete. The study was limited to the application of deep learning algorithms to the classification of ICU mortality.
In 2017, Balajee and Sethumadahavi [22] worked on the application of deep learning to the processing of structured and unstructured data. The study collected medical data consisting of doctors' reports, medical test results, medical images and genomics data, which were pre-processed into structured datasets. The models were formulated using deep belief networks (DBN), convolutional neural networks (CNN) and recurrent neural networks (RNN) on the collected dataset. The results showed that the different deep learning algorithms achieved better accuracy than conventional techniques. The study was limited to the application of deep learning to the extraction of relevant patterns from medical data.

Materials and Methods
This section presents the materials and methods that were adopted for the development of the classification model for the severity of neonatal jaundice. The materials and methods adopted for data identification and collection, model formulation and simulation, and performance evaluation are presented.

Method of Data Identification and Collection
Following a review of the literature on the severity of neonatal jaundice and the variables related to determining it, a number of variables were identified. The identified variables for determining the risk of jaundice were validated in an interview with a neonatologist with more than 10 years' experience in medicine before the data was collected. For the purpose of this study, data was collected from the hospital case files of 23 neonates at a hospital located in the south-western part of Nigeria. The information collected was stored in a spreadsheet application (Microsoft Excel, Microsoft Office 2013). It contained, for each neonate, the explanatory variables for the diagnosis of jaundice as proposed by the neonatologist. A description of the attributes contained in the dataset is presented in Table 1. Once the data for the 23 neonates had been collected, comprising the attributes (22 risk factors) and the diagnosis of jaundice, it was checked for errors in data entry, including misspellings and missing data. No misspellings were found, but data was missing in the cell describing severe anaemia for one record. The data was then transformed into the attribute-relation file format (.arff) for the development of the predictive model for the severity of neonatal jaundice in the simulation environment. The dataset was stored in .arff format under the name Jaundice_Data.arff, and the attribute section listed 23 attributes including the target attribute. Following this, the values of the risk factors for the records of the 23 neonates considered for this study were provided.
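The data-preparation steps described above (checking for missing entries, then writing an .arff file) can be sketched as follows. This is a minimal illustration and not the authors' procedure: the attribute names, the two records and the relation name are hypothetical placeholders, since the real attributes are those listed in Table 1. The only ARFF conventions relied on are the `@relation`, `@attribute` and `@data` sections and the use of `?` for a missing value.

```python
# Hypothetical attribute names and values, for illustration only.
ATTRIBUTES = [
    ("gender", ["Male", "Female"]),
    ("birth_weight_kg", "numeric"),
    ("severe_anaemia", ["Yes", "No"]),
    ("severity", ["Low", "Moderate", "High"]),  # target attribute, listed last
]

# Two made-up records; None marks a cell with missing data.
RECORDS = [
    {"gender": "Male", "birth_weight_kg": 2.6, "severe_anaemia": "No", "severity": "Low"},
    {"gender": "Female", "birth_weight_kg": 1.9, "severe_anaemia": None, "severity": "High"},
]


def find_missing(records, attributes):
    """Return (row_index, attribute_name) pairs whose value is missing."""
    return [
        (row, name)
        for row, record in enumerate(records)
        for name, _ in attributes
        if record.get(name) is None
    ]


def to_arff(relation, attributes, records):
    """Render records as ARFF text; missing values are written as '?'."""
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if kind == "numeric":
            lines.append("@attribute " + name + " numeric")
        else:
            # Nominal attributes list their allowed values in braces.
            lines.append("@attribute " + name + " {" + ",".join(kind) + "}")
    lines += ["", "@data"]
    for record in records:
        values = []
        for name, _ in attributes:
            value = record.get(name)
            values.append("?" if value is None else str(value))
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"


missing = find_missing(RECORDS, ATTRIBUTES)
arff_text = to_arff("jaundice", ATTRIBUTES, RECORDS)
```

Writing `arff_text` to a file with the .arff extension would give a file that WEKA can open; the missing severe anaemia value appears as `?` in the data section.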

Model Formulation of Classification Model
Supervised machine learning algorithms make it possible to assign a set of records (risk factors for neonatal jaundice) to a target class, the severity of neonatal jaundice. Such algorithms are black-box models; thus it is not possible to give an exact description of the mathematical relationship between the independent variables (input variables) and the target variable (the output variable, severity of neonatal jaundice).
Cost functions are used by supervised machine learning algorithms to estimate the error in prediction during training. For any supervised machine learning algorithm proposed for the formulation of a predictive model, a mapping function can be used to express the general form of the classification model for the prediction of the severity of neonatal jaundice.
The historical dataset S consists of the records of neonates, each containing fields that represent the set of classification factors (n input variables for each of m neonates) alongside the respective target variable (severity of neonatal jaundice), represented by the variable $Y_j$ for the jth neonate in the m records of data collected from the hospital selected for the study. Equation (1) shows the mapping function that describes the relationship between the classification factors and the target class, the severity of neonatal jaundice.
\[ f : X \rightarrow Y \tag{1} \]
The equation shows the relationship between the set of risk factors (input vector space), represented by a vector X consisting of the values of n variables, and the label Y, which defines the severity of neonatal jaundice for each neonate. Assuming the values of the set of variables for a neonate are represented as $X_1, X_2, X_3, \ldots, X_n$, where $X_i$ is the value of the ith variable (i = 1 to n), the mapping used to represent the predictive model maps the variables of each neonate to the respective severity of neonatal jaundice according to equation (2).
\[ f(X_1, X_2, \ldots, X_n) = Y, \quad Y \in \{\text{Low}, \text{Moderate}, \text{High}\} \tag{2} \]

Multi-Layer Perceptron (MLP) Architecture
A Multi-Layer Perceptron (MLP) is a deep network of perceptrons (an interconnected group of nodes using synaptic weights) which is similar in topology to the network of neurons in a human brain. MLPs are presented as systems of interconnected neurons (containing activation functions) which send messages to each other. Each connection has a numeric synaptic weight (with a value between -1.0 and 1.0) that can be tuned based on experience (model learning), thus making neural nets adaptive to inputs and capable of learning unknown patterns from data. The word network refers to the inter-connections, by means of synaptic weights, between the neurons located at different layers, as shown in Figure 1. The MLP propagated the sum of products of the weights w and the inputs i (the risk factors for the severity of neonatal jaundice) through the nodes j, starting from the input layer. This sum of products of weights and inputs, as expressed in equation (3), was used by the MLP to propagate the outputs of previous nodes to successive nodes all the way to the output node. The output of each successive hidden node was passed through a sigmoid-type activation function to normalize it into the interval [0, 1] (logistic function) or [-1, 1] (hyperbolic tangent), as expressed in equation (4).
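For reference, the weighted sum and the squashing functions described in this paragraph are commonly written in the form below. This is the standard textbook form and is offered as an assumption; the paper's own equations (3) and (4) may use different symbols. Here $x_i$ is the ith input to node j, $w_{ij}$ the weight on that connection, and $b_j$ a bias term that the text does not mention.

```latex
% Weighted sum (net input) arriving at node j
z_j = \sum_{i=1}^{n} w_{ij}\, x_i + b_j

% Logistic sigmoid: output in the interval (0, 1)
\sigma(z_j) = \frac{1}{1 + e^{-z_j}}

% Hyperbolic tangent: output in the interval (-1, 1)
\tanh(z_j) = \frac{e^{z_j} - e^{-z_j}}{e^{z_j} + e^{-z_j}}
```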
Using the back-propagation algorithm, the MLP compares the output calculated by the feed-forward propagation with the actual value in order to compute an error function. The error at the output activations is then propagated backward through the neural network, using the target of the training pattern, in order to generate the deltas of all output and hidden neurons by gradient descent according to equation (5), while the weights are adjusted as a function of the error determined at each node using equations (6) and (7). The process was repeated for a number of training cycles until the MLP network converged to a state where the error was small enough, at which point the network had learned the target function.
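The feed-forward and back-propagation steps described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the WEKA implementation used in the study: it uses one hidden layer, logistic activations, a squared-error cost and a made-up two-input toy dataset, and the names `forward`, `train_step`, `DATA` and the learning rate are all hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Feed-forward pass: returns the hidden activations and the output."""
    # Each weight row ends with a bias weight, paired with a constant input of 1.
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, output


def train_step(x, target, w_hidden, w_out, lr):
    """One back-propagation update for a single training pattern."""
    hidden, output = forward(x, w_hidden, w_out)
    # Output delta: gradient of 0.5 * (output - target)^2 times the
    # sigmoid derivative output * (1 - output).
    delta_out = (output - target) * output * (1.0 - output)
    # Hidden deltas: the output delta sent back through the current output weights.
    delta_hidden = [delta_out * w_out[j] * h * (1.0 - h) for j, h in enumerate(hidden)]
    # Gradient-descent weight adjustments.
    for j, v in enumerate(hidden + [1.0]):
        w_out[j] -= lr * delta_out * v
    for j, row in enumerate(w_hidden):
        for i, v in enumerate(x + [1.0]):
            row[i] -= lr * delta_hidden[j] * v


def mean_squared_error(data, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - t) ** 2 for x, t in data) / len(data)


# Toy data (not from the study): target is 1 if either binary input is present.
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

random.seed(0)
N_INPUTS, N_HIDDEN = 2, 2
# Initial synaptic weights drawn from [-1.0, 1.0].
w_hidden = [[random.uniform(-1.0, 1.0) for _ in range(N_INPUTS + 1)] for _ in range(N_HIDDEN)]
w_out = [random.uniform(-1.0, 1.0) for _ in range(N_HIDDEN + 1)]

initial_error = mean_squared_error(DATA, w_hidden, w_out)
for _ in range(2000):  # training cycles (epochs)
    for x, target in DATA:
        train_step(x, target, w_hidden, w_out, lr=0.5)
final_error = mean_squared_error(DATA, w_hidden, w_out)
```

Comparing `final_error` with `initial_error` shows the error shrinking over the training cycles, which is the convergence behaviour the paragraph describes.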
The classification model for the severity of neonatal jaundice was built using the deep learning classification/regression scheme with a multi-layer perceptron made available in the Waikato Environment for Knowledge Analysis (WEKA) simulation environment. Figure 2 shows the interface for the properties of the algorithm in WEKA. The panel on the far right shows the main properties of the algorithm, while the two dialog boxes on the left (top and bottom) were needed for managing the layers of the MLP architecture. Once the properties of the deep learning scheme had been set, the properties of the MLP component were defined as shown in Figure 3. The specification of the MLP required the choice of an optimization algorithm from among variants of gradient descent, such as linear-based, conjugate-based and stochastic-based variants.

Simulation of Classification Model
The dataset collected was divided into two parts, training data and testing data: the training data was used to formulate the model while the test data was used to validate it. According to the literature, training and testing a predictive model is not straightforward, especially given the variety of available validation procedures. For this problem, it was natural to measure the model's performance in terms of the error rate, that is, the proportion of errors made over a whole set of instances, which measures the overall performance of the classifier. The error rate on the training dataset was not likely to be a good indicator of future performance because the model was learned from that very same data. In order to predict the performance of the model on new data, the error rate had to be assessed on a dataset that played no part in the formation of the model. This independent dataset was called the test dataset and, like the training data, was a representative sample of the underlying problem. It was important that the test dataset was not used in any way to create the classifier, since building a machine learning classifier involves two stages: one to come up with the basic structure of the predictive model and a second to optimize the parameters involved in that structure. For this study the cross-validation procedure was employed, which involved dividing the whole dataset into k folds (or partitions). One partition was selected for testing with the remaining k - 1 partitions used for training; the next partition was then used for testing with the remaining k - 1 partitions (including the first partition used for testing) used for training, and so on until all k partitions had been selected for testing. The error rates recorded from each round were added up and averaged to give the mean error rate.
The process used in this study was the stratified 10-fold cross validation method, which involved splitting the whole dataset into ten partitions.
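The stratified k-fold procedure described above can be sketched as follows. This is a simplified illustration, not WEKA's implementation: it omits the random shuffling that a real run would normally apply, the label counts (3 Low, 13 Moderate, 7 High) are an assumption read off the validation results reported later, and the majority-class baseline is a hypothetical stand-in for training and testing the actual classifier.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Assign each instance index to one of k folds, spreading every class evenly."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        for index in by_class[label]:
            folds[position % k].append(index)  # deal indices out round-robin
            position += 1
    return folds


def cross_validate(labels, k, error_rate_fn):
    """Mean error rate over k rounds; each fold is used for testing exactly once."""
    folds = stratified_folds(labels, k)
    errors = []
    for test_fold in folds:
        held_out = set(test_fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        errors.append(error_rate_fn(train, test_fold))
    return sum(errors) / k


# Assumed class counts for the 23 neonates (see the lead-in above).
LABELS = ["Low"] * 3 + ["Moderate"] * 13 + ["High"] * 7


def majority_baseline_error(train, test):
    """Stand-in 'model': always predict the most common class in the training folds."""
    majority = Counter(LABELS[i] for i in train).most_common(1)[0][0]
    return sum(LABELS[i] != majority for i in test) / len(test)


folds = stratified_folds(LABELS, 10)
mean_error = cross_validate(LABELS, 10, majority_baseline_error)
```

With only 23 instances, each of the ten test folds holds two or three neonates, so the per-fold error rates are coarse; this is one reason the mean over all folds is reported rather than any single fold.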

Performance Evaluation for Validation of Classification Model
A confusion matrix is a square matrix which shows the actual classification along the vertical and the predicted classification along the horizontal, as shown in Figure 4. All correct classifications lie along the diagonal of the matrix. Using the performance metrics of accuracy, true positive (TP) rate, false positive (FP) rate and precision, the performance of the predictive model for the classification of the severity of neonatal jaundice can be evaluated by validation using the historical dataset collected from the hospital case files. The TP rate, the FP rate and the precision lie within the interval [0, 1], while the accuracy lies within the interval [0, 100]%. The closer the accuracy is to 100% the better the model; the closer the TP rate and precision are to 1 the better; and the closer the FP rate is to 0 the better. Therefore, an effective model has a high TP rate and precision and a low FP rate.
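For reference, these metrics are usually defined as below for a given class, treating that class as positive and all other classes as negative. These are the standard definitions and are given here as an assumption about the forms intended. TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives read from the confusion matrix.

```latex
\text{Accuracy} = \frac{\text{number of correct classifications}}{\text{total number of instances}} \times 100\%

\text{TP rate} = \frac{TP}{TP + FN}

\text{FP rate} = \frac{FP}{FP + TN}

\text{Precision} = \frac{TP}{TP + FP}
```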

Results
In this section, the results of the methodological approach described earlier are discussed. The dataset collected was first analysed in order to understand the distribution of the values of the variables among the patients selected for this study, using the minimum and maximum values, the mean and the standard deviation. Following this, the results of the model formulation and simulation process for the development of the classification model for the severity of neonatal jaundice are presented. The performance of the predictive models developed using deep learning was evaluated in order to identify the most effective and efficient predictive model for the severity of neonatal jaundice.

Results of the Identification and Collection of Variables
The analysis of the data containing information about the attributes for the 23 neonates is shown in Tables 2 and 3. Table 2 shows the description of the nominal variables while Table 3 shows the distribution of the numeric variables. From the description shown in Table 2, there were more male than female neonates, with a ratio of about 3 to 1, and the neonates were mostly of Yoruba ethnicity and Christian religion. The majority of the neonates were observed to have G6PD deficiency, ABO incompatibility, regular breastfeeding by their mothers, some form of infection/syndrome, UTI, complications of the bile duct, hypothermia, severe anaemia, siblings with anaemia and a family history of jaundice. Descriptive statistics for the four (4) numeric features were computed for neonates with gestational ages of 8 and 9 months (32 to 36 weeks), based on the minimum, maximum, mean and standard deviation. The results showed that the weight of the neonates at birth was greater than their present weight, with a mean birth weight of 2.56 kg and a mean present weight of 2.38 kg. Also, the maximum weight of neonates both at birth and at present was 2.9 kg, while the minimum weights were 1.6 kg at birth and 1.5 kg at present. Figure 5 shows a diagram of the arff file for the new training data stored in the file Jaundice_training_data.arff.

Results of Formulation and Simulation of Classification Model
Following the simulation of the predictive model for the severity of neonatal jaundice using the deep learning with MLP classifier, the performance of the model was recorded after validation using the 10-fold cross validation method. The simulation was repeated for 5, 10, 15, 20, 25, 30, 35, 40, 45 and 50 epochs, and the build time and mean absolute error (MAE) of the model were observed in each case. The results showed that the best-performing model for the classification of the severity of neonatal jaundice was obtained using 5 epochs, although this setting had the longest build time. As observed from Table 4, the build times plotted in Figure 6 show a steady decrease from 4.87 s at 5 epochs to 0.72 s at 20 epochs, followed by a steady increase to 1.47 s at 50 epochs. Figure 7 also shows that the lowest MAE, 0.3889, was recorded at 5 epochs, and that the MAE increased to 0.3947 at 50 epochs.

Results of Validation of Model
Following the process of model building using the 10-fold cross validation technique, the deep learning with MLP classifier was also applied to the dataset in order to validate the classification model. The validation showed that the model correctly classified 14 of the 23 instances: 3 Low cases, 8 Moderate cases and 3 High cases. Figure 8 shows a screenshot of the plot of correct and incorrect classifications. Figure 8 (left) shows the correct classifications as crosses and the misclassifications as boxes, coloured blue for Low cases, red for Moderate cases and green for High cases. According to the confusion matrix shown in Figure 8 (right), all the Low cases were correctly classified; 8 of the 13 actual Moderate cases were correctly classified while 2 and 3 were incorrectly classified as Low and High cases respectively; and 3 of the 7 actual High cases were correctly classified with 4 misclassified as Moderate cases. Overall, there were 14 correct classifications out of 23, giving an accuracy of 60.9%.
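The per-class figures can be recomputed from the counts given in this paragraph, as in the sketch below. The matrix is a reconstruction from the text, not a copy of Figure 8, and it assumes 13 actual Moderate cases (8 + 2 + 3), which is the reading consistent with the reported 61.5% of Moderate cases identified and the overall accuracy of 60.9%. The function and variable names are hypothetical.

```python
CLASSES = ["Low", "Moderate", "High"]

# Rows are actual classes, columns are predicted classes (same order as CLASSES).
# Reconstructed from the counts in the text; an assumption, not the original figure.
CONFUSION = [
    [3, 0, 0],  # actual Low: all correctly classified
    [2, 8, 3],  # actual Moderate: 2 predicted Low, 8 correct, 3 predicted High
    [0, 4, 3],  # actual High: 4 predicted Moderate, 3 correct
]


def per_class_metrics(matrix, k):
    """One-vs-rest TP rate, FP rate and precision for the class at index k."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                  # actual k, predicted as another class
    fp = sum(row[k] for row in matrix) - tp   # another class, predicted as k
    tn = total - tp - fn - fp
    return {
        "tp_rate": tp / (tp + fn),
        "fp_rate": fp / (fp + tn),
        "precision": tp / (tp + fp),
    }


total_instances = sum(sum(row) for row in CONFUSION)
accuracy = sum(CONFUSION[k][k] for k in range(len(CLASSES))) / total_instances
metrics = {name: per_class_metrics(CONFUSION, k) for k, name in enumerate(CLASSES)}
```

Under this reading, the FP rate for the Moderate class is 4 out of the 10 Low and High cases, which agrees with the 40% figure given in the discussion.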

Discussions
The results of the evaluation of the performance of the deep learning with MLP classifier, based on the 5 epochs used in this study for the development of the classification model for the severity of neonatal jaundice, are shown in Table 5. The evaluation used three metrics: the TP rate, which gives the proportion of actual cases correctly classified (the higher the better); the FP rate, which gives the proportion of cases of other classes misclassified as a given class (the lower the better); and the precision, which gives the proportion of predictions that are correct (the higher the better). The results showed that the performance of the deep learning with MLP classifier using 5 epochs was better than that obtained using the other settings from 10 to 50 epochs, based on the 10-fold cross validation technique. The results also showed that lower values of the MAE were reflected in the accuracy of the model, that is, in its ability to correctly classify the severity of neonatal jaundice. The classifier using 5 epochs had its lowest TP rate for the High cases, but it was able to identify all Low cases and 61.5% of Moderate cases. It had its lowest FP rates for the Low and High cases, while the highest FP rate was observed for Moderate cases, showing that 40% of Low and High cases were misclassified as Moderate cases. The highest value for the area under the ROC curve was observed for the Low cases, owing to a high TP rate and a low FP rate, followed by the High cases, with a moderate TP rate and a low FP rate, while the lowest value was reported for Moderate cases, with moderate TP and FP rates.

Conclusion
This study developed a classification model for predicting the severity of neonatal jaundice from the values of identified variables, using a dataset collected from a tertiary hospital in south-western Nigeria. Twenty-three variables were identified in the dataset, which consisted of records for 23 patients alongside their respective severity of neonatal jaundice as the target class. After data collection and pre-processing, the deep learning with MLP classifier was used to develop the classification model from the historical dataset, from which the training and testing data were drawn. The 10-fold cross validation method was used to train the predictive model developed using the classifier and to evaluate its performance. The MLP classifier correctly classified 14 of the 23 cases (60.9% accuracy), including all the Low cases. Following the development of the predictive model for the severity of neonatal jaundice, the deep learning with MLP classifier using 5 epochs was proposed, based on the understanding it provides of the relationship between the attributes and neonatal jaundice severity. The model can also be integrated into an existing Health Information System (HIS) that captures and manages clinical information, which can be fed to the neonatal jaundice predictive model, thus improving decisions affecting patient outcomes and the real-time assessment of clinical information affecting the risk of kernicterus and liver disease among patients.