Learning Algorithms Using BPNN & SFS for Prediction of Compressive Strength of Ultra-High Performance Concrete

Abstract: This paper presents machine learning algorithms based on a back-propagation neural network (BPNN) that employs sequential feature selection (SFS) to predict the compressive strength of Ultra-High Performance Concrete (UHPC). A database containing 110 data points and eight material constituents was collected from the literature for the development of models using machine learning techniques. The BPNN and SFS were used in combination to identify the relevant features that contribute to the response variable. As a result, the BPNN with the selected features produced more accurate results (r² = 0.991) than the model with all the features (r² = 0.816). The utilization of ANN modelling made its way into the prediction of fresh and hardened properties of concrete based on given experimental input parameters, whereby several authors developed AI models to predict the compressive strength of normal weight, lightweight and recycled concrete. The steps followed in developing a robust and accurate numerical model using SFS include (1) design and validation of the ANN model by manipulating the number of neurons and hidden layers; (2) execution of SFS using ANN as a wrapper; and (3) analysis of the selected features using both ANN and nonlinear regression. It is concluded that the use of ANN with SFS improved the prediction model's accuracy, making it a viable tool for machine learning approaches in civil engineering case studies.


Introduction
Several types of machine learning algorithms, such as the Artificial Neural Network (ANN), have been used in different fields to develop models that predict response parameters (experimental dataset) from certain independent input parameters. However, an experiment could have a large number of independent parameters, most of which are redundant and have negligible effects on the response parameters. Therefore, an artificially intelligent (AI) selection algorithm is required to overcome this shortcoming and identify the underlying parameters that improve the model's accuracy and reduce its computational complexity. The need for soft computing tools and models for the prediction of behavioural properties of engineering components, systems and materials is continuously rising. ANN emerged as one of the soft computing paradigms that have been successfully applied in several engineering fields [1]. Specifically, ANN has been used to solve a wide variety of civil engineering problems [2][3][4]. Mainly, ANN was utilized to model the nonlinear behaviour of fatigue and creep of Reinforced Concrete (RC) members [5][6][7][8]. Recently, research interest has revolved around the development of ANN models to interpret the behaviour of structural materials such as steel, concrete, and composites [9][10][11][12][13][14]. The utilization of ANN modelling made its way into the prediction of fresh and hardened properties of concrete based on given experimental input parameters, whereby several authors developed AI models to predict the compressive strength of normal weight, lightweight and recycled concrete [14][15][16][17]. Afterwards, several authors began developing ANN models for the prediction of compressive strength of high performance concrete [18][19][20][21]. In this study, ANN is employed with other machine learning techniques to identify the parameters that capture the compressive strength of UHPC using data collected from the literature.

UHPC and ANN Background
The evolution of UHPC has led structural engineers to improve the compressive strength, ductility, and durability of heavily loaded reinforced concrete (RC) structures. Several researchers have been investigating the mechanical behaviour of UHPC and its applications over the last four decades, and it was found that UHPC exhibits a compressive strength ranging from 150 to 810 MPa [22,23]. The underlying material constituents that enable such a superior mix are cement (up to 800 kg/m³), a water/binder ratio lower than 0.20, high-range water-reducing (HRWR) admixture, very fine powders (crushed quartzite and silica fume), and steel fibers [24]. Other researchers proposed different mixtures by adding fly ash and sand to reduce the amount of cement and silica fume and obtain an optimum mix that is both economical and sustainable [25,26]. However, developing most of the aforementioned mixtures requires exhausting a large amount of resources and performing tests on many batches, while barely predicting the strength of UHPC [19]. Therefore, researchers began investigating the use of machine learning techniques to develop prediction models that could assist engineers and researchers in producing appropriate UHPC mixes.
Ghafari et al. [19] used the back-propagation neural network (BPNN) and statistical mixture design (SMD) to predict the required performance of UHPC. The objective of the study was to develop an ANN and an SMD model to predict both the compressive strength and the consistency of UHPC with two different types of curing systems (steam curing and wet curing). As a result, BPNN proved to be more accurate than SMD in the prediction of compressive strength and slump flow of UHPC. Despite the statistical advantages of ANN, it has long been regarded as a black box that evaluates functions using input covariates and yields outputs. That is, the model does not produce an analytical model with a mathematical structure that can be studied. Therefore, ANN should be utilized in detecting the dominant input parameters that have a direct association with the ANN model. This reduces the number of parameters in the model, which lowers the computational complexity of the ANN model and simplifies the derivation of a mathematical model used to predict the compressive strength of UHPC. In addition, the prediction of compressive strength of high strength and high performance concrete was addressed by other researchers [20,21].
There are several machine learning techniques in the literature that assist researchers in identifying the underlying covariates impacting the prediction model. Sequential feature selection (SFS) is a machine learning tool that sequentially selects features and inputs them into a fitting model (e.g., ANN) until the model's error function increases. This technique makes use of ANN's complex computation and allows the SFS tool to select the influential parameters and remove the redundant ones. The reduction in the covariate domain improves the accuracy of the fitting model, decreases its computation time, and facilitates a better understanding of the data processing [27]. There are two main classes of feature selection methods: the filter method and the wrapper method [28]. Zhou et al. [29] used the Markov Blanket with a wrapper method to select the most relevant features for human motion recognition. Four sets of open human motion data and two types of machine learning algorithms were used. The total number of features was reduced rapidly, and this reduction helped the algorithm demonstrate better recognition accuracy than traditional methods. Moreover, Rodriguez-Galiano et al. [30] used SFS when tackling groundwater quality problems, where 20 datasets of parameters were extracted from a GIS database. Four types of machine learning algorithms were used as wrappers for the SFS. As a result, the Random Forest machine learning algorithm used with SFS showed promising results, where only three features were sufficient to predict the most accurate results.

Methodology of Modeling
The steps followed in developing a robust and accurate numerical model using SFS include (1) design and validation of the ANN model by manipulating the number of neurons and hidden layers; (2) execution of SFS using ANN as a wrapper; and (3) analysis of the selected features using both ANN and nonlinear regression. Table 1 presents the initial input variables together with their range (maximum and minimum values) and the symbols used to identify them in this experimental program.

Artificial Neural Network
An artificial neural network (ANN) is a machine learning tool that imitates the learning functions of a human brain, providing a robust technique for classifying and predicting certain outcomes based on the model's objective. There are two types of ANN models: (1) feed forward; and (2) feed backward. In this study, the feed backward ANN is used. It is composed of input neurons, hidden neurons, bias units, wires containing randomly generated weights, and output neurons. The input neurons contain the independent parameters presented by the user, the wires represent the randomly generated matrices called weights that control the function's slope or steepness, the hidden neurons map the weighted variables using an activation function, and the bias units control the output function's shift, either upward or downward. Equation (1) shows the linear combination that maps the weights from each input neuron, via wires, to the hidden neurons.

$$O_i = h_\theta(X) = g\left(\sum_{j=0}^{n} \theta_{ij} X_j\right) \qquad (1)$$

where X_j represents the j-th input parameter of size R×1 (R is the number of data points), θ_ij is the weight of size R×(n+1), O_i is the value of the output neuron or prediction function h_θ(X), and g(x) is the activation function. The bias unit is simulated by creating a column vector of size R×1 and filling it with ones, where X_0 and θ_i0 contain the bias values.
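The weighted sum and activation described above can be illustrated with a short numerical sketch. This is only an illustration under stated assumptions: the logistic activation, the function names `sigmoid` and `layer_output`, the weight layout (one row per neuron, bias weight in column 0), and the toy values are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation; the specific choice of g is an assumption."""
    return 1.0 / (1.0 + np.exp(-z))

def layer_output(X, theta, g=sigmoid):
    """Map inputs to one layer of neurons.

    X     : (R, n) matrix, one row per data point, one column per input.
    theta : (m, n + 1) weights; column 0 holds the bias weights.
    Returns an (R, m) matrix of neuron activations.
    """
    R = X.shape[0]
    # Simulate the bias unit with a column of ones (X_0 = 1).
    X_with_bias = np.hstack([np.ones((R, 1)), X])
    # Weighted sum over inputs, then the activation function.
    return g(X_with_bias @ theta.T)

# Toy example: 3 data points, 2 inputs, 2 neurons (illustrative values only).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
theta = np.array([[0.0, 1.0, -1.0],
                  [0.5, 0.0, 0.0]])
H = layer_output(X, theta)
```

Each row of `H` holds the activations of the two neurons for one data point; the second neuron ignores the inputs and returns only its activated bias.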

Sequential Feature Selection
SFS reduces the dimensionality of data by selecting only a subset of measured features to create a prediction model. SFS is composed of two components: the objective function, which is the criterion the algorithm follows when selecting the features (e.g., the NMSE), and the search algorithm, which is the method by which the machine adds or removes features from the subset. There are two types of search algorithms: sequential forward selection and sequential backward selection. In this study, the previously verified ANN model was used as the objective function and forward selection was used to select the relevant features. Figure 1 shows the algorithm SFS uses when performing forward selection.
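A minimal sketch of forward selection with a wrapper is given here to make the search loop concrete. It is not the study's implementation: to stay self-contained, a least-squares fit is swapped in for the ANN wrapper, the NMSE is assumed to be the mean squared error divided by the variance of the target, and all names (`forward_select`, `linear_wrapper_error`) and the synthetic data are hypothetical.

```python
import numpy as np

def nmse(y_true, y_pred):
    """Assumed definition: mean squared error divided by target variance."""
    return float(np.mean((y_true - y_pred) ** 2) / np.var(y_true))

def linear_wrapper_error(X_train, y_train, X_val, y_val):
    """Stand-in wrapper (NOT an ANN): least-squares fit scored by NMSE."""
    A = np.hstack([np.ones((len(X_train), 1)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    B = np.hstack([np.ones((len(X_val), 1)), X_val])
    return nmse(y_val, B @ coef)

def forward_select(X_train, y_train, X_val, y_val, error_fn=linear_wrapper_error):
    """Add one feature at a time; stop when the validation error stops improving."""
    remaining = list(range(X_train.shape[1]))
    selected, best_err = [], np.inf
    while remaining:
        # Score every candidate feature added to the current subset.
        errs = {j: error_fn(X_train[:, selected + [j]], y_train,
                            X_val[:, selected + [j]], y_val)
                for j in remaining}
        j_best = min(errs, key=errs.get)
        if errs[j_best] >= best_err:
            break  # no candidate lowers the error any further
        selected.append(j_best)
        remaining.remove(j_best)
        best_err = errs[j_best]
    return selected, best_err

# Synthetic data: only columns 1 and 4 influence the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.05 * rng.normal(size=200)
chosen, err = forward_select(X[:140], y[:140], X[140:], y[140:])
```

Passing a different `error_fn` (for example, one that trains a neural network) changes the wrapper without changing the search loop.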

Verification of ANN
The ANN numerical solver, Levenberg-Marquardt, was verified by testing different numbers of neurons, using the normalized mean square error (NMSE) to measure the error. The number of neurons was increased from one to 15, and the model was analyzed 10 times at each increment because the Levenberg-Marquardt algorithm locates a local, and not necessarily the global, minimum of a function. Hence, for each number of neurons tested, ten NMSE values were stored in a column vector, and each column vector was averaged and plotted against its corresponding number of neurons. Figure 2 shows the plot of all the scenarios, with the minimum point circled at 11 neurons. Therefore, approximately 11 neurons are sufficient for the BPNN to produce an accurate ANN model for the collected dataset.
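The averaging procedure can be sketched as follows. This is an illustration only: the training step is replaced by a stand-in (a random tanh hidden layer whose output weights are fitted by least squares), which is not Levenberg-Marquardt training, the NMSE definition is assumed, and the data are synthetic. Only the structure of the sweep (1 to 15 neurons, 10 repetitions each, averaged NMSE) follows the text.

```python
import numpy as np

def nmse(y_true, y_pred):
    """Assumed definition: mean squared error divided by target variance."""
    return float(np.mean((y_true - y_pred) ** 2) / np.var(y_true))

def random_hidden_layer_nmse(X_tr, y_tr, X_va, y_va, n_neurons, rng):
    """Stand-in for one training run: random tanh hidden layer with
    least-squares output weights (NOT Levenberg-Marquardt)."""
    W = rng.normal(size=(X_tr.shape[1] + 1, n_neurons))

    def hidden(X):
        Xb = np.hstack([np.ones((len(X), 1)), X])
        return np.hstack([np.ones((len(X), 1)), np.tanh(Xb @ W)])

    beta, *_ = np.linalg.lstsq(hidden(X_tr), y_tr, rcond=None)
    return nmse(y_va, hidden(X_va) @ beta)

def sweep_neurons(X_tr, y_tr, X_va, y_va, max_neurons=15, repeats=10, seed=0):
    """Average the NMSE over repeated runs for each hidden-layer size."""
    rng = np.random.default_rng(seed)
    errors = np.empty((repeats, max_neurons))
    for k in range(1, max_neurons + 1):
        for r in range(repeats):
            # Each repeat starts from different random weights.
            errors[r, k - 1] = random_hidden_layer_nmse(
                X_tr, y_tr, X_va, y_va, k, rng)
    mean_nmse = errors.mean(axis=0)
    return mean_nmse, int(np.argmin(mean_nmse)) + 1

# Synthetic data with 110 points, matching only the size of the database.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(110, 4))
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2
mean_nmse, best_k = sweep_neurons(X[:80], y[:80], X[80:], y[80:])
```

Plotting `mean_nmse` against the neuron counts 1 to 15 gives the kind of curve from which a minimum can be read off.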

Execution of SFS
The SFS algorithm was run 200 times to capture the possible combinations of independent features when using ANN. Table 2 tabulates the percentage of the 200 trials in which each feature was used. Based on the results of these trials, the most abundant combination during the SFS analysis, within a 20% threshold, was selected as the set of important parameters that contribute most to the model. In this study, four variables (Cement, Silica Fume, Fly Ash, and Water) were selected as the most relevant features for the prediction model. Figure 3(a) and (b) present the architecture of both ANN models, before and after selection.
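One way to tabulate repeated SFS runs is sketched here. The reading of the 20% threshold (keep every feature selected in at least 20% of the trials) is an assumption, as are the function names, the feature list, and the ten made-up trials, which stand in for the 200 runs.

```python
from collections import Counter

def selection_frequency(trials, feature_names):
    """Fraction of trials in which each feature was selected."""
    counts = Counter(name for subset in trials for name in set(subset))
    return {name: counts[name] / len(trials) for name in feature_names}

def frequent_features(freq, threshold=0.20):
    """Keep features selected in at least `threshold` of the trials (assumed rule)."""
    return [name for name, f in freq.items() if f >= threshold]

# Hypothetical feature names and trial outcomes, for illustration only.
features = ["Cement", "Silica Fume", "Fly Ash", "Sand", "Water"]
trials = [
    ["Cement", "Water"],
    ["Cement", "Silica Fume", "Water"],
    ["Cement", "Fly Ash"],
    ["Cement", "Silica Fume", "Fly Ash", "Water"],
    ["Cement", "Sand"],
    ["Cement", "Water", "Silica Fume"],
    ["Fly Ash", "Water"],
    ["Cement", "Silica Fume"],
    ["Cement", "Water", "Fly Ash"],
    ["Cement", "Water"],
]
freq = selection_frequency(trials, features)
kept = frequent_features(freq)  # ['Cement', 'Silica Fume', 'Fly Ash', 'Water']
```

In this toy tally, "Sand" appears in only one of ten trials and falls below the threshold.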

ANN Results
The features selected using SFS were analyzed with the previous BPNN model. As a result, the model that used the selected features showed stronger agreement with the experimental results than the model prior to the selection. Table 3 shows the statistical measurements calculated for both cases. The r² increased from 81.6% before selection to 99.1% after selection, while the NMSE decreased from 0.0594 to 0.026. The correlation plots between the predicted and experimental results for the ANN models, with and without the features selected using SFS, are summarized in Figure 4. Figure 4 also presents the percent deviation, where boundary lines were plotted above and below the perfect-fit line at a deviation of ±20%. As a result, the ANN model with the relevant features predicted 89.6% of its values within these boundaries, as opposed to the ANN model with all the features, which predicted 58.7% of its values within the boundaries.
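The two agreement measures discussed above can be computed as in the following sketch. The formulas are assumptions, since the text does not state them: r² is taken as the coefficient of determination, and the ±20% band is measured relative to each experimental value. The strength values are made up for illustration.

```python
import numpy as np

def r_squared(y_exp, y_pred):
    """Coefficient of determination (assumed definition of r^2)."""
    ss_res = np.sum((y_exp - y_pred) ** 2)
    ss_tot = np.sum((y_exp - np.mean(y_exp)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def fraction_within(y_exp, y_pred, tol=0.20):
    """Share of predictions within +/- tol of the experimental value."""
    return float(np.mean(np.abs(y_pred - y_exp) <= tol * np.abs(y_exp)))

# Made-up experimental and predicted strengths in MPa.
y_exp = np.array([100.0, 120.0, 150.0, 180.0, 200.0])
y_pred = np.array([110.0, 118.0, 100.0, 185.0, 150.0])

r2 = r_squared(y_exp, y_pred)
share = fraction_within(y_exp, y_pred)  # 3 of the 5 points fall inside the band
```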

Least Square Regression Analysis
A linear regression model was developed using the Least Square Regression (LSR) method, where the analytical model consisted of the previously selected features. Table 4 shows the coefficient values, with their corresponding symbols, for each UHPC constituent, together with the statistical measurements of the LSR model. The LSR model is a linear function and its form is shown in Equation (2):

$$f_c = \theta_1 C + \theta_2 SI + \theta_3 FA + \theta_4 W \qquad (2)$$
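A least-squares fit of a four-term linear model with no intercept can be sketched as below. The data and the coefficients in `true_theta` are synthetic placeholders, not the values of Table 4, and the reading of C, SI, FA, and W as the quantities of cement, silica fume, fly ash, and water is assumed from context. `fit_lsr` is a hypothetical helper name.

```python
import numpy as np

def fit_lsr(C, SI, FA, W, fc):
    """Least-squares estimate of theta in fc = th1*C + th2*SI + th3*FA + th4*W."""
    A = np.column_stack([C, SI, FA, W])  # one column per constituent
    theta, *_ = np.linalg.lstsq(A, fc, rcond=None)
    return theta

# Synthetic mixes (kg/m^3) and placeholder coefficients, for illustration only.
rng = np.random.default_rng(0)
n = 50
C = rng.uniform(600, 1400, n)
SI = rng.uniform(40, 160, n)
FA = rng.uniform(0, 200, n)
W = rng.uniform(150, 200, n)
true_theta = np.array([0.10, 0.30, 0.05, -0.20])
fc = np.column_stack([C, SI, FA, W]) @ true_theta  # noise-free response

theta = fit_lsr(C, SI, FA, W, fc)
```

Because the synthetic response is noise-free, the fit recovers the placeholder coefficients; with experimental data the residual would be nonzero.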

Parametric Study
Since the developed LSR model is capable of accurately predicting the experimentally measured compressive strength, a parametric study was conducted using this model to study the effect of Fly Ash and Silica Fume on the compressive strength of UHPC. Using Fly Ash quantities that range between 0 and 200 kg/m³ and Silica Fume quantities that range between 40 and 160 kg/m³, while fixing the quantity of cement at 1400 kg/m³ and water at 175 kg/m³, several plots showing the variation of the strength of UHPC were generated, as shown in Figure 5. It is observed from Figure 6 that there is a noticeable increase in the compressive strength of UHPC with the increase in Fly Ash, and a more noticeable increase with the increase in Silica Fume.
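The parametric sweep can be sketched as a grid evaluation of the linear model. The coefficients in `theta` are placeholders chosen only for illustration (the fitted values are those of Table 4 and are not reproduced here), so the numbers produced say nothing about real UHPC; only the ranges and fixed quantities come from the text.

```python
import numpy as np

def predict_fc(theta, C, SI, FA, W):
    """Evaluate fc = th1*C + th2*SI + th3*FA + th4*W."""
    return theta[0] * C + theta[1] * SI + theta[2] * FA + theta[3] * W

# Placeholder coefficients, NOT the fitted values from the study.
theta = np.array([0.10, 0.30, 0.05, -0.20])

fly_ash = np.linspace(0, 200, 5)       # kg/m^3
silica_fume = np.linspace(40, 160, 4)  # kg/m^3
FA, SI = np.meshgrid(fly_ash, silica_fume)

# Cement and water are held fixed at 1400 and 175 kg/m^3.
fc = predict_fc(theta, 1400.0, SI, FA, 175.0)
```

Each row of `fc` corresponds to one Silica Fume quantity and each column to one Fly Ash quantity, so a row can be plotted directly as a strength curve against Fly Ash content.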

Conclusion
This study was conducted to detect the correlation between the material constituents of UHPC and its compressive strength. BPNN was used and three major steps were executed: (1) verification of ANN; (2) application of both SFS and NID; and (3) analysis of selected features using ANN and LSR. The SFS tool was used to select the relevant constituents that have the most impact on the compressive strength of UHPC, which are mainly Cement, Silica Fume, Fly Ash, and Water. It can be concluded from this study that:
1) The use of ANN with SFS reduced the number of input parameters needed to accurately predict the compressive strength of a UHPC mix, making the prediction less computationally expensive.
2) The use of ANN with selected input parameters improved the accuracy of prediction of compressive strength of UHPC and reduced the computational effort. The r² improved from 81.6% before the use of SFS to 99.1% after it, while the NMSE improved from 0.0594 to 0.026.
3) The ANN model with the selected relevant input parameters predicted 89.6% of its values within ±20% of the experimental results, compared with 58.7% for the ANN model with all the features.
4) LSR was implemented using the selected input parameters to develop an analytical model that can be used to accurately predict the compressive strength of UHPC.