Exploiting Machine Learning Algorithms for Predicting Crash Injury Severity in Yemen: Hospital Case Study

This study exploits machine learning algorithms to classify and predict the injury severity of vehicle crashes in Yemen. The primary objective is to assess the contribution of the leading causes of injury severity. The selected machine learning algorithms were compared with a traditional statistical method. The filtered secondary data, collected over two months (August-October 2015) from two main hospitals, included 156 patients injured in vehicle crashes reported from 128 locations. The data were classified into three categories of injury severity: Severe, Serious, and Minor. They were balanced using the synthetic minority oversampling technique (SMOTE). The multinomial logit model (MNL) was compared with five machine learning classifiers: Naïve Bayes (NB), J48 Decision Tree, Random Forest (RF), Support Vector Machine (SVM), and Multilayer Perceptron (MLP). The results showed that most of the machine learning algorithms performed well in predicting and classifying the severity of traffic injuries. Of the five classifiers, RF was the best, with an accuracy of 94.84%. Road type, total number of injured persons, crash type, road user, mode of transport to the emergency department (ED), and accident action were the most critical factors in the severity of traffic injuries. Strategies that enhance the use of roadway facilities and regulations may improve the safety of road users.


Introduction
Vehicles (cars, motorbikes, or bicycles) are regarded worldwide as safe means of transportation. However, one of the negative results of their use is road traffic accidents (RTAs), which are among the top ten leading causes of death. RTAs cause an estimated 1.35 million deaths globally, and between 20 and 50 million people suffer non-fatal injuries. The most affected are young pedestrians, cyclists, and motorcyclists [1,2]. The behavior of riders, drivers, cyclists, and pedestrians is the crucial cause of accidents [3]. Using a mobile phone or being under the influence of drugs or alcohol while driving or riding has been reported as a major cause of RTAs among young drivers. Cognitive, visual, and mobility impairments, on the other hand, are factors causing accidents among elderly drivers [4]. Rapid economic growth in low- and middle-income countries has been accompanied by a huge number of injuries because of greater access to transportation. Together with the limited technology for vehicle control [5], this leads to the highest rates of RTAs in Africa and the southern part of Asia [6].
Many strategies have been employed to decrease the rate of crash injuries and the related consequences of RTAs [1]; these include road safety efforts and targets, problem-solving policies, and monitoring and evaluation [7]. Safety management system practices linked with incident reduction have been proposed as reliable and secure solutions that lead to the prevention of RTAs and injuries [8].
In Yemen, violations of traffic regulations, uncontrolled passenger movements, and road conditions are the most probable causes of accidents. In 2015, Sana'a traffic data showed 27.5 counted deaths per 10,000 vehicles, and injuries and fatalities caused by driving at high speed accounted for 38.1% of total RTAs [9]. Further studies showed that RTAs cause 12% of all mortality and that the trend in vehicle crash injuries remains high [9][10][11][12]. After 2015, owing to the political situation, the RTA database became unavailable for use.
Causality models, based on both statistical evidence and reasonableness for viable decision making, and regression models for predicting yearly deaths, fatalities, and injuries have been applied [10,12]. Unfortunately, these models could not show RTA trends in relation to accident causes. A hospital case study showed that the majority of RTAs in Yemen affected people under 30 years old [13]. Comprehensive studies combining road accidents and their leading causes are therefore much needed in Yemen to boost safety and the sustainable development of public health.
Recently, machine learning techniques have become established methods in transportation safety studies for identifying the substantial influences associated with crash injury severity [14,15]. They can show the proportional contribution of each factor influencing RTAs for each type of vehicle [16].
Other studies have also used machine learning techniques to identify the most critical factors that might lead to crashes, to investigate motorcycle crashes, and to detect crash characteristics and the influence of vehicle users on crash injuries [17][18][19]. Furthermore, these algorithms clarify the complex patterns associated with crash risk [20]. For more detail on the analysis of RTA data, see [21] for count data, [22] for regression models in prediction, [23] for artificial neural networks (ANN), [24] for support vector machines (SVM), and [20] for decision trees (DTs).
Unlike statistical models, machine learning algorithms need no assumed mathematical or statistical model defining the relationship between the dependent and independent variables. They can deal with nominal and discrete variables that have more than two levels and handle multicollinear explanatory variables well [25].
In brief, previous studies contributed only partially to the analysis and forecasting of RTA consequences, using unspecified count data and traditional statistical methods.
The factors that effectively influence crash injury severity therefore remain a gap, and machine learning algorithms have become a leading tool for showing the relationship between road accidents and their associated causes. These algorithms have not been used in studies conducted in Yemen. Unlike traditional methods, they classify the observed data into several classes, giving access to every scenario that occurred at the accident scene before and after the accident, which is the in-depth analysis needed in Yemen.
This study, however, is limited to assessing and analyzing the relative importance of the characteristics and behavior of the targeted variables in predicting the injury severity of vehicle crashes, and then establishing the effect rate of the factors influencing RTAs. The lack of studies on the contributing factors associated with crash injuries in Yemen motivates this comparison of selected machine learning algorithms (Support Vector Machines, Naïve Bayes, Random Forest, Multilayer Perceptron, and J48 decision tree) with a traditional statistical method (multinomial logit model) for predicting crash injury severity.

Data
Despite several challenges in obtaining detailed data, we used secondary data collected in two ways. The records of admitted crash injuries were obtained from the emergency departments (EDs) of Science and Technology University Hospital (STUH) and Al-Gumhouri General Hospital (AGH) in Sana'a, the capital of Yemen. The injured people were interviewed using a questionnaire designed by [26]. The data were collected between August 24, 2015, and October 8, 2015, and were published in [13]. Injury severity was scored at three levels: minor, for simple or no apparent injury; serious, for injury that needs some treatment; and severe, for injury that requires intensive medical or surgical management.

Participants
The secondary data, combining cases from the recorded files and the questionnaire, comprised 156 cases (injuries and deaths; 128 males and 28 females). The data covered 128 roads and were collected from the two main hospitals (STUH and AGH).

Data Analysis Tools
The data were analyzed using the Waikato Environment for Knowledge Analysis (WEKA), a collection of data mining algorithms implemented in Java and created by the University of Waikato in New Zealand. This workbench has been used in similar studies because it is an open-access tool that supports data mining functions such as data preprocessing, visualization, classification, feature selection, regression, clustering, and association rules [27]. The WEKA workbench provides over 100 classification algorithms, 75 algorithms for preprocessing data, 25 algorithms for dimensionality reduction and evaluation metrics, and 20 algorithms for clustering. WEKA version 3.9 was freely downloaded and installed on a Lenovo IdeaPad S410p computer equipped with 4 GB RAM, 2.23 GHz, a 64-bit operating system, and an Intel(R) Core(TM) i5-4200U CPU to perform the experiments.

Preprocessing of Dataset
In this study, we first cleaned the data and transformed the continuous attributes into categorical attributes. The entire dataset contains 24 attributes, which were converted from numeric to nominal. These attributes include accident time, road type, road user, road conditions, vehicle type, and the victim's injured organs. The response variable is multiclass: minor injury takes the value 1, serious injury the value 2, and severe injury the value 3.

Balancing Data
Most road accident data are imbalanced because the minor injury class has more instances than the serious or severe injury classes. In our data, the minor class has 95 cases, whereas the serious and severe classes contain 35 and 26 cases, respectively. When the training dataset is excessively imbalanced, predictions for the minority classes do not reflect the true information, which affects prediction accuracy [28]. In this study, the synthetic minority oversampling technique (SMOTE) [29] was used to balance the data.
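The core idea of SMOTE, creating a synthetic minority instance on the line segment between a minority instance and one of its nearest minority neighbours, can be sketched as follows. This is a minimal illustration that assumes numeric features and Euclidean distance; it is not the WEKA filter used in the study, nominal attributes need a different treatment that is not shown, and the function name, parameters, and data points are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding the point itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical 2-D minority class; the values are illustrative only.
severe = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(severe, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority points, it always stays inside the range already spanned by the minority class.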

Attribute Selection
After the data were balanced, we applied the CorrelationAttributeEval tool in WEKA to select the attributes most correlated with the class. Of the 24 categorical attributes, seven were removed: education status, occupation, accident time, vehicle type, cuts, bites or open wound, sex, and road conditions. The final list of attributes and their descriptions is presented in Table 1.

Machine Learning Classifiers
In this study, the target variable is crash injury severity, which takes one of three possible outcomes (severe, serious, and minor). The most suitable machine learning task is therefore classification. Supervised learning algorithms include classifiers that can learn from labeled datasets and provide useful results. Classification methods are predictive techniques used to forecast the class of a target variable from measurements of one or more attributes. The classification process has three parts: the input, a defined set of known explanatory variables; the classifier, which forecasts the variable whose value is unknown; and the output, the value of the unknown variable as determined from the known explanatory variables by the classification algorithm [30].
Several classifiers in WEKA can handle various classification problems; they are divided into sub-packages by category (Bayesian classifiers, lazy classifiers, decision tree classifiers, function-based algorithms, meta-learning algorithms, rule-based algorithms, and miscellaneous) [31]. WEKA was used in this study because it is open-access software freely available under the General Public License. It is written in Java, is compatible with most modern computing platforms, and comprises a complete set of data preprocessing and modeling methods. The following sub-sections briefly explain the classification algorithms selected for this work.

J48 Decision Tree
The J48 decision tree classifier is an open-source implementation of the C4.5 algorithm in WEKA that generates a decision tree using information entropy. This method uses the well-known divide-and-conquer approach to solve the learning problem from a set of independent instances. It uses the attribute values of a new instance to choose its target value. In the tree, the interior nodes represent the attributes, the branches between nodes represent the possible values that the attributes can take in the observed instances, and the terminal nodes represent the final class value [32].
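As a rough illustration of the entropy-based splitting described above, the sketch below computes the entropy of a class distribution and the information gain of a nominal attribute. It is not the J48 code itself: C4.5 ranks splits by gain ratio, which further normalizes the gain, and that step is omitted here. The attribute name and the toy rows are hypothetical; only the class counts (95, 35, 26) come from the Balancing Data section.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one nominal attribute."""
    total = len(labels)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Class counts before balancing, as reported in the Balancing Data section.
labels = ["minor"] * 95 + ["serious"] * 35 + ["severe"] * 26
class_entropy = entropy(labels)

# Hypothetical toy data in which the attribute separates the classes perfectly.
toy_rows = [{"road_user": "driver"}, {"road_user": "driver"},
            {"road_user": "pedestrian"}, {"road_user": "pedestrian"}]
toy_labels = ["minor", "minor", "severe", "severe"]
toy_gain = information_gain(toy_rows, toy_labels, "road_user")
```

An attribute that separates the classes perfectly, as in the toy rows, recovers the full entropy of the labels as gain; an uninformative attribute would yield a gain near zero.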

Random Forest (RF)
The RF approach uses decision trees as its base learners and combines a sampling procedure, a subspace technique, and an ensemble strategy to optimize model building. The RF principle is to aggregate many binary decision trees as follows: bootstrap samples (obtained by randomly selecting observations with replacement from the learning set L) are used instead of the whole sample L, and a randomized tree predictor is constructed instead of CART on each bootstrap sample [33]. In classification problems, it takes a majority vote among all individual tree predictions and assigns each new instance to the class with the most votes.
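The two ingredients named above, bootstrap sampling and majority voting, can be sketched in a few lines. This is a toy illustration, not WEKA's RandomForest: the base learner is a hypothetical one-attribute "stump" that stands in for a randomized tree, and the attribute names and records are invented for the example.

```python
import random
from collections import Counter


def majority_vote(predictions):
    """Return the label that occurs most often."""
    return Counter(predictions).most_common(1)[0][0]


def bootstrap_sample(rows, rng):
    """Draw len(rows) observations with replacement from the learning set."""
    return rng.choices(rows, k=len(rows))


def train_stump(sample, rng):
    """Toy base learner: pick one random attribute, memorise the majority label per value."""
    attribute = rng.choice(sorted(sample[0][0].keys()))
    by_value = {}
    for features, label in sample:
        by_value.setdefault(features[attribute], []).append(label)
    table = {value: majority_vote(labs) for value, labs in by_value.items()}
    default = majority_vote([label for _, label in sample])
    return lambda features: table.get(features[attribute], default)


def train_forest(rows, n_trees=25, seed=1):
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(rows, rng), rng) for _ in range(n_trees)]


def predict(forest, features):
    """Aggregate the individual predictions by majority vote."""
    return majority_vote([tree(features) for tree in forest])


# Hypothetical records: (attributes, injury class).
rows = ([({"road_type": "highway", "road_user": "pedestrian"}, "severe")] * 5
        + [({"road_type": "street", "road_user": "driver"}, "minor")] * 5)
forest = train_forest(rows)
prediction = predict(forest, {"road_type": "highway", "road_user": "pedestrian"})
```

Each stump sees a different bootstrap sample and a randomly chosen attribute, so individual learners can disagree; the vote smooths out those differences.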

Naïve Bayes (NB)
NB is a robust learning algorithm for classification in WEKA, based on Bayes' rule with the strong assumption that the attributes are conditionally independent given the class. The advantages of naïve Bayes classification include competitive classification accuracy, computational efficiency, and many other desirable features, including support for multiclass classification [34].
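Under the conditional independence assumption, the decision rule can be written as below. The notation is assumed here for illustration: x_1, ..., x_n denote the attribute values of an instance and c ranges over the three injury classes.

```latex
\hat{c} \;=\; \arg\max_{c \,\in\, \{\text{minor},\,\text{serious},\,\text{severe}\}}
\; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The product form is what makes the method computationally cheap: each factor P(x_i | c) is estimated separately from frequency counts in the training data.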

Support Vector Machine (SVM)
SVM is a powerful binary classification method that performs classification primarily by constructing hyperplanes in a high-dimensional space that separate instances belonging to different class labels [35]. To apply SVM to multiclass problems, three popular methods (one against all, one against one, and directed acyclic graph) are used; in terms of classification accuracy, the directed acyclic graph SVM has been noted to be more robust [36]. In this study, SVM was implemented using the SMO algorithm in WEKA, which uses the one-against-one method.
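The one-against-one scheme can be sketched as a vote among pairwise binary classifiers. In the sketch below the binary classifiers are hypothetical threshold rules on a single score, standing in for trained binary SVMs; only the voting logic is the point, and tie-breaking is not handled.

```python
from collections import Counter
from itertools import combinations

classes = ["minor", "serious", "severe"]
# One binary problem per unordered pair of classes: k(k-1)/2 = 3 for k = 3.
pairs = list(combinations(classes, 2))

# Hypothetical stand-ins for trained binary SVMs, each deciding between two classes.
binary_classifiers = {
    ("minor", "serious"): lambda score: "minor" if score < 1.0 else "serious",
    ("minor", "severe"): lambda score: "minor" if score < 1.5 else "severe",
    ("serious", "severe"): lambda score: "serious" if score < 2.0 else "severe",
}


def one_vs_one_predict(classifiers, x):
    """Each pairwise classifier casts one vote; the class with the most votes wins."""
    votes = Counter(classifiers[pair](x) for pair in pairs)
    return votes.most_common(1)[0][0]


low = one_vs_one_predict(binary_classifiers, 0.5)
middle = one_vs_one_predict(binary_classifiers, 1.2)
high = one_vs_one_predict(binary_classifiers, 2.5)
```

With three injury classes only three binary models are needed, which keeps the pairwise scheme inexpensive for this problem.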

Multilayer Perceptron (MLP)
MLP is a feed-forward neural network classifier with one or more hidden layers. Each neuron computes the weighted sum of its inputs plus a bias and passes it through a linear or nonlinear activation function, which decides whether the neuron fires [37]. Training is an iterative procedure that adjusts the weights to minimize the error function and achieve good prediction [38].
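The forward computation of a single layer, a weighted sum plus bias followed by an activation, can be sketched as below. The sigmoid activation, the layer sizes, and all weight values are assumptions chosen for illustration; the iterative weight adjustment (training) is not shown.

```python
import math


def sigmoid(z):
    """Nonlinear activation squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases, activation=sigmoid):
    """One dense layer: activation(weighted sum of inputs + bias) for every neuron."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 3 outputs (one per class).
hidden_weights = [[0.5, -0.2], [0.1, 0.4]]
hidden_biases = [0.0, 0.1]
output_weights = [[1.0, -1.0], [0.3, 0.3], [-0.5, 0.8]]
output_biases = [0.0, 0.0, 0.0]

features = [1.0, 2.0]
hidden = layer(features, hidden_weights, hidden_biases)
outputs = layer(hidden, output_weights, output_biases)
predicted_index = outputs.index(max(outputs))  # index of the most activated output
```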

Multinomial Logit Model (MNL)
The MNL model is an extension of binary logit regression; both are traditional statistical methods used to predict the probability of class membership of a dependent variable based on several predictor variables. The dependent variable in question is nominal with more than two categories, while the predictor variables can be dichotomous or continuous. MNL is popular for multiclass classification because it accommodates outcome variables with more than two categories and applies maximum likelihood estimation to assess the probability of categorical membership. MNL relies on assumptions of independence, normality, and multicollinearity [39].
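In its usual form, the MNL class probabilities can be written as below. The notation is assumed for illustration: x is the vector of predictors, beta_k is the coefficient vector of category k, K = 3 is the number of injury classes, and category K serves as the reference; which class acts as the reference is an assumption not stated in the text.

```latex
P(y = k \mid \mathbf{x}) =
  \frac{\exp\!\left(\boldsymbol{\beta}_k^{\top}\mathbf{x}\right)}
       {1 + \sum_{j=1}^{K-1} \exp\!\left(\boldsymbol{\beta}_j^{\top}\mathbf{x}\right)},
  \quad k = 1, \dots, K-1,
\qquad
P(y = K \mid \mathbf{x}) =
  \frac{1}{1 + \sum_{j=1}^{K-1} \exp\!\left(\boldsymbol{\beta}_j^{\top}\mathbf{x}\right)}
```

The coefficients are estimated by maximum likelihood, and the predicted class is the one with the largest probability.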

Parameter Setting
The data were balanced in WEKA using the SMOTE oversampling technique. After balancing, the data were randomly split into training data (70%) and test data (30%). The parameters are shown in Table 2. After selecting the best parameters, we applied the selected machine learning algorithms for the classification step.
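A random 70/30 split of this kind can be sketched as follows. The function name, the seed, and the placeholder records are hypothetical; the study performed the split in WEKA, not with this code.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder records standing in for the balanced dataset.
records = list(range(100))
train, test = train_test_split(records)
```

Fixing the seed makes the split reproducible, which matters when several classifiers are compared on the same test set.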

Evaluation
To evaluate classifier performance, the following metrics are used: confusion matrix, accuracy, precision, recall, F-measure, and Kappa statistic. The confusion matrix shows the correctly predicted and the misclassified instances. Accuracy is the ratio of correctly predicted instances to the total number of instances. Recall is the proportion of actual positive instances that are correctly predicted as positive. Precision is the proportion of instances predicted as positive that are actually positive, and the F-measure is the harmonic mean of recall and precision [41]. The Kappa statistic measures how far the observed agreement (P_o) of the classifier exceeds the agreement expected by chance (P_e) [42]. The columns in Table 3 denote the predicted class instances, the rows indicate the actual class instances, and the diagonal elements represent correct predictions. Thus, the performance of a classifier can be visualized in the confusion matrix [32]. This confusion matrix can be generalized to multiclass problems.
According to [27], true positives (TP) and true negatives (TN) are correctly classified cases. A false positive (FP) is a case wrongly classified as "Yes", and a false negative (FN) is a case wrongly classified as "No". TPR measures the rate of positive cases that are correctly identified, whereas FPR measures the rate of negative cases that are incorrectly classified as positive. The evaluation metrics can be given as follows:
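A sketch of these metrics in their standard form, with one class treated as positive against the rest. The symbols P_o and P_e for the observed and chance agreement are notation assumed here.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall} = \text{TPR} &= \frac{TP}{TP + FN} \\
\text{FPR}       &= \frac{FP}{FP + TN} \\
F\text{-measure} &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \\
\kappa           &= \frac{P_o - P_e}{1 - P_e}
\end{align}
```

For the three-class problem, precision, recall, and FPR are computed per class, and multiclass accuracy is the sum of the diagonal of the confusion matrix divided by the total number of instances.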

Experiment and Analysis
In this study, machine learning classifiers and MNL were applied to model crash injury severities. The injury severity attribute is the class attribute and takes three target values (Minor, Serious, and Severe). The distribution of values in the dataset is presented in Table 1. The numbers (1, 2, …, 10) are the values assigned to each variable; rows represent attributes, and columns represent the classes of these attributes. Some attributes were environmental leading causes (E1-E3), namely accident action, road conditions, and accident time, and others were infrastructure leading causes, such as the characteristics of the road type. In total, 156 crash records were reported between August 24, 2015, and October 8, 2015. Of these, 104 cases were from the emergency department of Science and Technology University Hospital (49% minor, 26% serious, and 25% severe), and 52 were collected from Al-Gumhouri General Hospital (84.6% minor, 15.4% serious, and 0% severe). Overall, approximately 60.9%, 22.4%, and 16.7% of cases were classified as minor, serious, and severe, respectively. The patients' ages ranged from 1 to 65 years, with an average of 23. By occupation, there were 61 students (nursery and school pupils), 10 university students, 47 workers (public and daily workers), 34 private employees, and 10 unemployed persons; by road user type, there were 40 drivers, 59 passengers, and 50 pedestrians.
After the preprocessing step, the dataset was loaded into WEKA as an Attribute-Relation File Format (ARFF) file. Seventeen predictor attributes were used with the class variable to build models for forecasting the level of crash injury severity. Table 4 presents the confusion matrix for each classifier, where M, S, and D denote minor, serious, and severe injury, respectively. The performance metrics of all model types comprise the confusion matrix, TPR, FPR, precision, recall, F-measure, Kappa statistic, and classification accuracy, obtained using a random split for each classifier.
For each class, the confusion matrix shows how the instances of that class were classified. All correctly classified instances lie on the diagonal of the contingency table, so the matrix can be inspected visually for errors. Table 4 gives the confusion matrix of predicted classes for each classifier, and Table 5 gives the performance metrics (TPR, FPR, precision, recall, and accuracy) by class for each classifier. NB was the worst classifier, with an accuracy of 84.7% and a precision of 0.84, 0.833, and 0.867 for minor, serious, and severe, respectively. SVM achieved an accuracy of 90.59%, with a precision of 0.875, 0.857, and 1 for minor, serious, and severe, respectively. MLP achieved an accuracy of 89.41%, with a precision of 0.808, 0.900, and 0.966 for minor, serious, and severe, respectively. J48 achieved an accuracy of 91.76%, with a precision of 0.840, 0.931, and 0.968 for minor, serious, and severe, respectively. The best-performing classifier was RF, with an accuracy of 94.84% and a precision of 0.926, 0.941, and 0.972 for minor, serious, and severe, respectively. MNL achieved an accuracy of 87.05%, with a precision of 0.909, 0.844, and 0.871 for minor, serious, and severe, respectively. Table 5 thus allows a comparison among classes for each classifier through TP and FP rates, precision, recall, and F-measure.
Several features of the threshold curves are presented in Tables 5 and 6 and Figures 1 and 2. These tables and figures indicate that, for each classifier, the accuracy, recall, F-measure, and Kappa statistic were well above 0.7; some values fall in the range of substantial agreement (0.75-0.85), and others fall in the range of perfect agreement (0.857-1.0). These values indicate that most of the machine learning classifiers are better able to classify crash injury severities correctly than the traditional statistical method, represented by MNL in this study. Additionally, four machine learning techniques are better at predicting all injury severity classes. The results indicate that RF was the most accurate classifier, with the highest TPR, precision, and recall and the lowest FPR. J48 was second, followed by SVM and MLP, which were close in accuracy but differed in FPR; all three performed slightly better than the traditional method MNL and NB. RF also performed well in the studies reported in [14,27], although it achieved higher accuracy here. The high accuracy obtained may be due to the SMOTE resampling method.
To compare the performance of RF with that of the other classifiers based on the weighted average F-measure, a paired t-test was used; this test has proved reliable for comparing machine learning algorithms in related studies [43,44]. After the normality of the data was checked, the paired t-test was applied to compare the mean weighted average F-measure of RF with that of each of the other classifiers. The results, shown in Table 7, indicate that RF performed better than the other classifiers, especially NB and MNL. The root mean square error (RMSE) can also be used to evaluate a model, as it reflects the error of the classification process and the misclassified cases [45]. The classification process was repeated ten times, as shown in Figure 3, which indicates that RF has a lower RMSE than the other classifiers and is more stable.
Table 8 shows the importance of the attributes that contribute to determining the relation between the class (injury severity scoring) and the independent attributes. It was obtained from the RF classifier in WEKA with the computeAttributeImportance option set to True. The remaining seven attributes were deleted in the feature selection step.
Regarding the contributing factors, this study shows that road type, crash type, road user, accident action, characteristics of road type, and collision partners had an impact similar to that reported for road accidents involving different vehicles in other countries [27,46]. The predicted leading causes (environmental and infrastructure causes) found to have a significant impact on vehicle crash injury severity were the behavior of road users (cyclists and pedestrians, identified as common causes), the activity during the accident, the characteristics of the road type, the way crash injuries were transported to the hospital, the collision type, the vehicle type (motorcycles expose their users to high risks of crash injuries and fatalities), and the road shoulder condition.
Some of these predicted environmental and infrastructure leading causes, among others, are similar to those predicted for cyclist crash severity in Spain [46]. The factors related to the condition of the injured person also affect injury severity scoring, as can be seen from Tables 1 and 8.
This study has several limitations. The road traffic police database could not be accessed because of the political situation in Yemen. Previous studies used only the counted mortality data published by the Central Statistical Organization in Yemen (yearly statistical books, 1991-2013), but RTA causes were not identified in these books, and their data were not suitable for this study. Because of the data collection procedures and the condition of the participants, some environmental causes (weather conditions), street visibility, violations of road safety regulations (speed, overweight), and other street features (junctions, crossings) that probably lead to vehicle crashes were not reported.
Regardless of these challenges, some strategies are recommended to reduce the severity of injuries in vehicle crashes in Yemen. These safety strategies include the use of roadway facilities such as road signage and speed bumps at junctions, enforcement of laws on red-light violations and speed limits, and road safety and behavior management for road users in both urban areas and rural villages. These strategies are essential for achieving the sustainable development goals on reducing mortality. Improving visibility on the roadway, especially through street lighting and visible road markings, can also be applied. Attention must be paid to the behavior of road users, as established in some studies [47][48][49].

Conclusions
This study compared machine learning algorithms with a traditional statistical method for classifying and predicting the injury severity of vehicle crashes. Based on the performance and error measures, machine learning algorithms classified and predicted vehicle crash injury severity substantially better than the traditional method. Most crash injury severity data are imbalanced, with the severe injury level having the fewest instances compared with the remaining levels. The SMOTE oversampling method was used to balance the crash injury severity data, and it helped to improve classification accuracy. RF was the best classifier, with high accuracy and few misclassified cases. The determined effect rate of the relevant factors (environmental and infrastructural leading causes) that influence injury severity in vehicle crashes indicates that awareness should be raised and that policymakers should improve these conditions.

Funding
There was no available funding for the study.

Conflicts of Interest
The authors declare that they have no competing interest.