BADM-Net: Hierarchical Classification Network for Identifying Anomalous Trends in Bridge Monitoring Data Patterns

Structural health monitoring (SHM) systems have been widely deployed on long-span bridges, and large volumes of field measurement data have accumulated. Because of imperfect sensors, data transmission and acquisition, various anomalies inevitably exist in SHM data and may lead to unreliable structural condition assessment. An effective approach for detecting data anomalies is therefore highly desirable. Because the data are imbalanced, some anomalous patterns are undertrained in popular end-to-end deep neural network models, which reduces detection precision. In this paper, a hierarchical classification model with a deep neural network (DNN) tree is proposed for imbalanced data. The DNN tree contains three levels: (1) a CNN that divides the seven types of data into four categories (134, 2, 5, 67), denoted as C4; (2) two DNNs that each perform a two-class split (1 versus 34, and 6 versus 7), denoted as D2D2; (3) a DNN that separates the remaining two classes (3, 4), denoted as D2. The DNN tree is therefore written as C4_D2D2_D2. The DNN tree is an open framework and can be defined based on the data characteristics. In the data processing, three data sets are built for training, namely the single-channel data set, the dual-channel data set and the statistical data set. To validate our work, we considered the effects of balanced and imbalanced training sets and of training ratios. The results show that our model detects the multi-pattern anomalies of SHM data efficiently, with an accuracy of 95.5%. In addition, the proportion of abnormal data classified as normal data is reduced, especially for pattern 3-minor. The model addresses the problem in a simple and easy-to-understand way and can serve as a reference for future bridge anomaly assessment.


Introduction
Bridges are fundamental facilities in the transportation system, and their operational safety is crucial to economic and social development. Routine environmental actions and operational loads, as well as extreme events such as typhoons and earthquakes, may cause unfavorable effects on bridges and shorten their service life [1]. To assess structural condition online and in real time, more and more bridges have been equipped during the last three decades with structural health monitoring (SHM) systems, which use sensing techniques and structural characteristics analysis to detect structural damage or degradation [2][3][4][5][6].
However, the SHM system itself suffers from malfunctions and produces a variety of data anomalies. This practical issue adversely affects the subsequent analysis of monitoring data and may mislead the bridge condition assessment. Identifying data anomalies therefore becomes a prerequisite for effective SHM [7][8].
To address the imbalanced classification problem caused by the skewed distribution of the data and to further improve anomaly detection accuracy, a hierarchical classification model with a deep neural network (DNN) tree is proposed. First, in the data processing, the raw data, its Fourier transform images and its statistical characteristics are integrated for model training. Second, a DNN tree with three levels is established: (1) a CNN divides the seven types of data into four categories (134, 2, 5, 67), denoted as C4; (2) two DNNs each perform a two-class split (1 versus 34, and 6 versus 7), denoted as D2D2; (3) a DNN separates the remaining two classes (3, 4), denoted as D2. The DNN tree is therefore written as C4_D2D2_D2. The experimental results show that our model detects the multi-pattern anomalies of SHM data efficiently, with an accuracy of 95.5%. In addition, the proportion of abnormal data classified as normal data is reduced, especially for pattern 3-minor.
The main contributions of this paper are as follows: 1) A hierarchical model with a deep neural network tree is proposed for the imbalanced classification problem caused by the skewed distribution of the data. It is an open framework and can be defined based on the data characteristics. Compared with end-to-end models, it makes the training details easier to control, which reduces the number of abnormal samples classified as normal, especially for pattern 3-minor. 2) Three data sets are built for training, namely the single-channel data set, the dual-channel data set and the statistical data set. Compared with using only the raw data, more features are integrated, and they play an important role in the subtrees of the model.

Related Work
Since Geoffrey Hinton and Simon Osindero proposed a new training method for deep belief networks in 2006 [9], deep learning has developed rapidly and has been applied to many fields. Compared with earlier shallow learners, deep learning has a stronger feature learning ability and captures a more essential representation of the data. Many feature extraction steps that previously required manual coding are replaced by generic network layers, which greatly reduces the difficulty of developing new algorithms for specific tasks. Previous work shows that, given enough training, deep networks can often extract better feature representations than carefully hand-designed ones [10]. Many successful deep neural networks have been proposed, such as AlexNet [10], VGGNet [11], Google Inception Net [12] and ResNet [13]. In this paper, we combine the advantages of these networks with the characteristics of the data to propose an effective network for feature extraction and anomaly detection.
Much current research is based on neural networks. Gu et al. detected structural damage under varying temperatures using auto-associative neural networks [14]. Ye et al. conducted vision-based dynamic displacement measurements of a long-span bridge using a pattern matching algorithm [15]. Zhou et al. applied a hierarchical clustering method for damage detection [16]. Ma used a deep belief network (DBN) to train on labeled data and proposed the GSO-DBN model to keep parameters from falling into local optima [17]. Bao et al. presented an automatic anomaly detection method based on computer vision and deep learning [7]; they trained a deep neural network (DNN) to classify future data segments automatically. Tang et al. [8] used a convolutional neural network (CNN) to detect anomalies. A common problem in these studies is the imbalance of the input data sets. Some anomalous patterns were undertrained because of the lack of training samples, which resulted in lower detection accuracy than for other patterns.

Method
In this study, we conducted data anomaly detection on the acceleration data from the SHM system of a long-span cable-stayed bridge in China. The system consists of 38 channels with a sampling frequency of 20 Hz. The acceleration data is divided into 7 patterns: normal and 6 abnormal patterns (missing, minor, outlier, square, trend and drift). A brief description of the characteristics of these seven data patterns is presented in Table 1.
Table 1 (excerpt). No. 6, Trend: The data has an obvious trend in the time domain and an obvious peak value in the frequency domain. No. 7, Drift: The vibration response is non-stationary, with random drift.
Figure 1 shows 7 images each for channels 2, 20, and 35 from the training set. The coordinate axes are hidden, because the duration and amplitude of the vibration response are not essential for classifying the outline of the signal.

Raw Data
The data was measured by the SHM system in January 2012. There were 38 channels in the system, and each channel contained 744 samples, for a total of 28,272 samples. Each sample covers 1 hour and contains 72,000 time points, since the sensor collected data every 0.05 seconds. These samples are divided into the 7 data patterns. The distribution of samples is shown in Table 2. The seven data patterns are unevenly distributed. Apart from normal data, trend data accounts for the largest share, followed by missing and square data. It is therefore a classification problem with imbalanced labels. The patterns normal, missing, minor, outlier, square, trend and drift are labeled with the numbers 1 to 7 respectively. During training, we convert the labels into one-hot labels.
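The sample counts and the one-hot conversion mentioned above can be checked with a minimal sketch. The function name `one_hot` and the constant names are illustrative assumptions; only the numbers (38 channels, 744 hourly samples, 20 Hz) and the 1 to 7 pattern numbering come from the text.

```python
# Pattern numbering used in the paper: 1-normal ... 7-drift.
PATTERNS = ["normal", "missing", "minor", "outlier", "square", "trend", "drift"]


def one_hot(label, num_classes=len(PATTERNS)):
    """Return a one-hot list for a 1-based pattern label."""
    if not 1 <= label <= num_classes:
        raise ValueError(f"label must be in 1..{num_classes}, got {label}")
    vec = [0] * num_classes
    vec[label - 1] = 1
    return vec


CHANNELS = 38
SAMPLES_PER_CHANNEL = 31 * 24            # one sample per hour in January: 744
SAMPLING_HZ = 20                         # one reading every 0.05 s
POINTS_PER_SAMPLE = SAMPLING_HZ * 3600   # 72,000 points per hourly sample
TOTAL_SAMPLES = CHANNELS * SAMPLES_PER_CHANNEL  # 28,272

print(one_hot(5))  # one-hot vector for pattern 5-square
```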
Three data sets are built from these samples: the single-channel data set, the dual-channel data set and the statistical data set.

Single-Channel Data Set
The 72,000 data points recorded in one hour on each channel form a 72000×1 image vector. This vector is then transformed and resized, in row-by-row order, into a 100×100 pixel array. There are 38×744 images in total, denoted as $\{x_{i,j} \mid i \in [0, 744],\ j \in [1, 38]\}$, where $i$ indexes the hourly sample and $j$ the channel.
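One possible reading of this step is sketched below. The text does not say how 72,000 points are reduced to 10,000 pixels or how values become gray levels, so the index-based resampling and the min-max scaling to 0..255 are assumptions made only for illustration, as is the function name `to_gray_image`.

```python
def to_gray_image(signal, side=100):
    """Resample a 1-D signal to side*side points and lay them out row by row.

    Assumptions (not stated in the paper): points are picked by evenly
    spaced index, and values are min-max scaled to integer gray levels 0..255.
    """
    n = len(signal)
    total = side * side
    lo, hi = min(signal), max(signal)
    span = hi - lo
    pixels = []
    for k in range(total):
        value = signal[k * n // total]  # evenly spaced index into the signal
        # A constant signal has no dynamic range, so it maps to gray level 0.
        pixels.append(0 if span == 0 else round(255 * (value - lo) / span))
    # Row-by-row (line-by-line) reshape into a side x side nested list.
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


image = to_gray_image([float(t) for t in range(72000)])
print(len(image), len(image[0]))
```

A real pipeline would also need a policy for missing readings, which this sketch does not handle.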

Dual-Channel Data Set
The raw data undergoes a fast Fourier transform to obtain magnitude maps. The magnitude maps are converted into 100×100 grayscale maps, which by themselves form another single-channel data set. The dual-channel data set is obtained by combining these Fourier maps with the single-channel data set above, and is denoted as $\{m_{i,j} \mid i \in [0, 744],\ j \in [1, 38]\}$.
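The idea of pairing a time-domain image with a frequency-domain magnitude image can be sketched as follows. This is an illustration under assumptions: a naive discrete Fourier transform stands in for the fast Fourier transform (a library FFT would be used on 72,000-point samples), and the helper names `dft_magnitude` and `stack_channels` are hypothetical.

```python
import cmath
import math


def dft_magnitude(signal):
    """Magnitude spectrum via a naive O(n^2) DFT (stand-in for an FFT)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def stack_channels(time_image, freq_image):
    """Combine two equally sized 2-D images into an H x W x 2 nested list."""
    if len(time_image) != len(freq_image) or any(
        len(a) != len(b) for a, b in zip(time_image, freq_image)
    ):
        raise ValueError("both images must have the same shape")
    return [
        [[a, b] for a, b in zip(row_t, row_f)]
        for row_t, row_f in zip(time_image, freq_image)
    ]


# A short sinusoid with 5 cycles: its spectrum peaks at frequency bin 5.
wave = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
spectrum = dft_magnitude(wave)
print(max(range(32), key=spectrum.__getitem__))
```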

Statistical Data Set
The statistical characteristics of the data are also extracted. For each sample of each channel, the mean u, standard deviation s and range r of every 100 time points are computed to form a 3×720 matrix. Similarly, the same three statistics are computed on the Fourier-transformed data to construct a second 3×720 matrix. The two matrices are combined into a 6×720 matrix. The statistical data set therefore contains 38×744 matrices (38 channels with 744 samples each), each of size 6×720, and is denoted as $[u, u', s, s', r, r']$, where the primed rows are the statistics of the Fourier-transformed data.

DNN Tree Based Hierarchical Classification Model
Following the divide and conquer strategy, a hierarchical classification model with a deep neural network tree is proposed for data anomaly detection. The DNN tree combines a convolutional neural network with deep neural networks built from fully connected layers. The CNN first divides the data patterns into a few easily distinguishable classes. DNN models are then used to separate the remaining classes that are not yet well distinguished. In detail, the DNN tree contains three levels:
1) A CNN divides the seven types of data into four categories (134, 2, 5, 67), denoted as C4. At this first level, the normal, minor and outlier patterns are grouped into one new class, and the trend and drift patterns into another.
2) Two DNNs each perform a two-class split (1 versus 34, and 6 versus 7), denoted as D2D2.
3) A DNN separates the remaining two classes (3, 4), denoted as D2.
The DNN tree is therefore written as C4_D2D2_D2. The DNN tree can be defined based on the data characteristics. In practice, experiments show which classes are easy to distinguish and which are not. The classes that are hard to distinguish are grouped into one category and then separated layer by layer.
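The routing logic of the three levels can be sketched independently of how each stage is trained. In the sketch below, the class `TreeNode`, the builder `build_c4_d2d2_d2` and the stub classifiers are hypothetical names introduced for illustration; each stage classifier is assumed to be any callable that maps a sample to the index of a child. The stubs simply read a known label so that the routing can be exercised without trained networks.

```python
class TreeNode:
    """One stage of the tree: a classifier plus one child per classifier output.

    A child is either a final pattern label (int) or another TreeNode.
    """

    def __init__(self, classifier, children):
        self.classifier = classifier  # callable: sample -> index into children
        self.children = children

    def predict(self, sample):
        child = self.children[self.classifier(sample)]
        return child.predict(sample) if isinstance(child, TreeNode) else child


def build_c4_d2d2_d2(c4, d2_1_vs_34, d2_6_vs_7, d2_3_vs_4):
    """Wire four stage classifiers into the C4_D2D2_D2 layout."""
    split_3_4 = TreeNode(d2_3_vs_4, [3, 4])            # level 3: D2
    split_1_34 = TreeNode(d2_1_vs_34, [1, split_3_4])  # level 2: first D2
    split_6_7 = TreeNode(d2_6_vs_7, [6, 7])            # level 2: second D2
    # Level 1: C4 returns a group index 0..3 for the groups (134, 2, 5, 67).
    return TreeNode(c4, [split_1_34, 2, 5, split_6_7])


# Stub classifiers that read a known label, used only to exercise the routing.
GROUP_OF = {1: 0, 3: 0, 4: 0, 2: 1, 5: 2, 6: 3, 7: 3}
tree = build_c4_d2d2_d2(
    c4=lambda s: GROUP_OF[s["label"]],
    d2_1_vs_34=lambda s: 0 if s["label"] == 1 else 1,
    d2_6_vs_7=lambda s: 0 if s["label"] == 6 else 1,
    d2_3_vs_4=lambda s: 0 if s["label"] == 3 else 1,
)
print([tree.predict({"label": k}) for k in range(1, 8)])
```

Because every node only needs a callable, a different grouping can be tried by rewiring the children lists, which reflects the open-framework property described above.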
The model includes two basic models, namely a CNN and a DNN. The CNN is in charge of image processing and completes the preliminary classification. Because it is well suited to extracting image features, it is placed at the first branch of the classification tree. The CNN includes four convolutional layers that operate over two dimensions, height and width. The first two layers have 8 filters each and are followed by a max-pooling layer with a 2×2 pool size. The last two layers have 16 filters each. All convolutional kernels are 6×6, and the weights are shared within each layer. Another max-pooling layer with a pool size of 2 is placed before the dense classification layer. ReLU is used as the activation function. The convolutional layers recognize and extract image features, and max-pooling downsamples the feature maps. A dense layer with 7 outputs provides the classification results.
Accuracy is used as the metric, and categorical_crossentropy is used as the loss function.
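The feature-map sizes implied by this layer stack can be worked out with a short sketch. The padding and stride are not given in the text, so unpadded convolutions with stride 1 and pooling with stride equal to the pool size are assumed here, along with a 100×100 dual-channel input; the function names are illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution (assumed unpadded, stride 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, pool):
    """Output side length of max-pooling with stride equal to the pool size."""
    return size // pool


def cnn_feature_shapes(side=100, channels=2):
    """Trace (height, width, channels) through the described layer stack."""
    layers = [("conv", 8), ("conv", 8), ("pool", 2),
              ("conv", 16), ("conv", 16), ("pool", 2)]
    shapes = []
    for kind, arg in layers:
        if kind == "conv":
            side, channels = conv_out(side, kernel=6), arg  # 6x6 kernels
        else:
            side = pool_out(side, arg)
        shapes.append((side, side, channels))
    return shapes


shapes = cnn_feature_shapes()
height, width, depth = shapes[-1]
print(shapes, height * width * depth)
```

Under these assumptions the dense layer receives 17 × 17 × 16 = 4624 features. With zero-padded convolutions the sizes would differ.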
The DNN performs the further classification. It has three hidden layers with 100, 50 and 25 units respectively, each using ReLU as the activation function. A dropout rate of 0.2 is applied at the first layer to avoid overfitting caused by a possible lack of data. The layers are fully connected, so the computed vectors propagate effectively between adjacent layers, and stacking several layers improves the expressive ability of the model. Each layer has its own weights, and each unit computes a weighted combination of its inputs. The final dense layer has 3 outputs activated by softmax and completes the classification. As in the CNN, accuracy is used as the metric and categorical cross-entropy as the loss function.
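The softmax output and the categorical cross-entropy loss named above can be illustrated with a small standalone sketch. This is not the training code of the paper; the function names and the `eps` clipping constant are assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot_target, probs, eps=1e-12):
    """Loss for one sample: minus the log-probability of the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot_target, probs))


probs = softmax([2.0, 1.0, 0.1])
print(probs, categorical_cross_entropy([1, 0, 0], probs))
```

A uniform prediction over three classes gives a loss of ln 3, which is a useful reference point when reading training curves.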

Model Training
In the C4 stage, a CNN model trained on 3000 samples selected from the dual-channel data set is applied. The selected samples contain 300 samples for each class of anomaly, 1800 samples in total for the 6 classes of anomalies, and 1200 normal samples. In this way, the 7 data patterns are classified into 4 classes. Patterns 2 and 5 can be clearly distinguished, while patterns 1, 3, 4 and patterns 6, 7 each remain grouped in one class. The model then continues to classify these two classes of data.
For the class containing data patterns 6 and 7, the DNN model with the single-channel data set is used. The data obtained from C4 is organized into the form of the single-channel data set, and 800 samples of pattern 6 and 400 samples of pattern 7 are selected for training.
For the other class, containing data patterns 1, 3 and 4, a two-class split is applied twice. The data obtained from C4 is organized into the form of the statistical data set. The DNN model is trained with 1600, 1200 and 400 samples of patterns 1, 3 and 4 respectively. In this way, pattern 1 is separated out, while patterns 3 and 4 remain grouped together. Then the DNN model with the single-channel data set is used again to distinguish patterns 3 and 4, trained with 400 samples of each pattern.
At the end of this process, every data pattern has been separated into its own class.
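The fixed per-class sample counts used at each stage can be drawn with a simple quota sampler. The sketch below is an assumption about how such a selection might be implemented; the function `sample_by_quota`, the seed and the synthetic label list are illustrative, while the quotas for the C4 stage (1200 normal, 300 per anomaly) come from the text.

```python
import random
from collections import defaultdict


def sample_by_quota(labels, quotas, seed=0):
    """Pick quotas[c] sample indices for each class c, without replacement."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    chosen = []
    for cls, quota in quotas.items():
        pool = by_class.get(cls, [])
        if len(pool) < quota:
            raise ValueError(f"class {cls}: need {quota} samples, have {len(pool)}")
        chosen.extend(rng.sample(pool, quota))
    return chosen


# Quotas of the C4 stage: 1200 normal samples and 300 per anomaly pattern.
C4_QUOTAS = {1: 1200, **{pattern: 300 for pattern in range(2, 8)}}

# Synthetic labels (1500 per pattern) used only to demonstrate the call.
demo_labels = [pattern for pattern in range(1, 8) for _ in range(1500)]
train_indices = sample_by_quota(demo_labels, C4_QUOTAS)
print(len(train_indices))
```

The later stages would use the same helper with their own quotas, for example `{6: 800, 7: 400}` for the trend versus drift split.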

Model Evaluation
Before our model was proposed, other models such as CNN [8], DNN [7] and CNN+DNN were also considered. The CNN+DNN model is a combination of the DNN model and the CNN model.
The results obtained from the C4_D2D2_D2 model and the three other models are presented in Table 4. In the training phase, 3000 samples, about 10% of the total, are used for training, and the remaining samples are used for validation. The selected samples contain 300 samples for each class of anomaly, 1800 samples in total for the 6 classes of anomalies, and 1200 normal samples.
Accuracy, precision and recall are the main indicators used to evaluate the models. Accuracy is the ratio of correctly predicted samples to all samples. For each abnormal pattern, precision is the proportion of samples predicted as that pattern that truly belong to it, and the reported precision is the average over the abnormal patterns. For each abnormal pattern, recall is the proportion of samples truly belonging to that pattern that are predicted as it, and the reported recall is the average over the abnormal patterns.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$
where TP = true positive, FP = false positive, TN = true negative and FN = false negative.
Training shows that the accuracies of the three models with different data sets differ little, except that the DNN model with the statistical data set achieves the lowest accuracy. However, when a CNN or DNN is used alone, an increase in the accuracy of one anomaly is likely to cause a decrease in the accuracy of another. In addition, a single model treats the data set identically for every anomaly, so a data set that lets the model separate two anomalies well may make other anomalies hard to classify. A hierarchical classification model is therefore designed for better detection performance across all anomalies, and our method achieves higher precision and recall. In detail, the confusion matrices shown in Figure 4 indicate that the three training models discriminate the two abnormal patterns 2-missing and 5-square well, with few other patterns mixed into them. In contrast, the three patterns 1-normal, 3-minor and 4-outlier are always classified together, and patterns 6-trend and 7-drift are sometimes classified together. C4_D2D2_D2 performs better in anomaly identification, especially in classifying patterns 1-normal, 3-minor and 4-outlier.
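The three indicators can be computed directly from a confusion matrix, using the standard definitions precision = TP/(TP + FP) and recall = TP/(TP + FN) per class. The sketch below is illustrative: the function names and the small 3×3 matrix are made up for the example and are not results from the paper.

```python
def per_class_metrics(confusion):
    """Return (accuracy, precision per class, recall per class).

    confusion[i][j] is the number of samples of true class i predicted as j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(n))
    precision, recall = [], []
    for c in range(n):
        tp = confusion[c][c]
        predicted_as_c = sum(confusion[i][c] for i in range(n))  # TP + FP
        truly_c = sum(confusion[c])                              # TP + FN
        precision.append(tp / predicted_as_c if predicted_as_c else 0.0)
        recall.append(tp / truly_c if truly_c else 0.0)
    return correct / total, precision, recall


def macro_average(values, classes):
    """Average a per-class metric over the selected class indices."""
    return sum(values[c] for c in classes) / len(classes)


# Toy matrix: row and column 0 play the role of the normal class.
toy = [[8, 2, 0],
       [1, 9, 0],
       [0, 1, 9]]
accuracy, precision, recall = per_class_metrics(toy)
# Average over the abnormal classes only, as the text describes.
print(accuracy, macro_average(precision, [1, 2]), macro_average(recall, [1, 2]))
```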
The hierarchical classification model can therefore address, to some extent, the shortcomings of the individual models. With a hierarchical model, the easily distinguishable anomalies are separated first, and the anomalies that are harder to differentiate are resolved afterwards. In addition, each stage can use the data set whose processing suits it best, so there is less concern that a single data set will cause several anomalies to be confused. A hierarchical classification model also makes it easier to adjust model parameters during training.
Heat maps of the 38×744 labels, with time on the x-axis and channel number on the y-axis, are shown in Figure 5 to compare the actual labels with the model predictions. The two maps are nearly indistinguishable, although Figure 5 also reveals some subtle differences in the marked boxes 1, 2 and 3. The model predictions are largely correct in terms of their distribution over time. The misclassified samples are mostly concentrated in specific regions, with only a very small fraction scattered elsewhere. The heat map can therefore be used to guide improvements to our model.

Design of Training Set
A specific number of normal and abnormal samples was chosen when designing the network. The effect on the training set of two sampling methods, drawing a fixed number of samples for each anomaly and random sampling, should also be considered. Because the accuracy differences between the models are small, the DNN model with the single-channel data set is used to find the best training set size, since it has the shortest training time. For the design of the training set, 8 training tasks are set, which are shown in Table 5. After training, some inaccuracies remain in general. For random sampling, if the sample size is too small, some anomalies are not recognized. When the sample size is increased to a certain level, the training accuracy and the test accuracy tend to agree, because the samples then cover almost all anomalies in proportion to their frequency. As the accuracy of the model improves, samples of patterns 1-normal, 3-minor and 4-outlier become more likely to be recognized as pattern 1-normal. In brief, a training set size of 3000 samples is the best choice.
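Why a small random sample can miss a rare anomaly entirely can be illustrated with a short calculation. The sketch assumes independent draws, which approximates sampling from a large data set, and the 1% class proportion is a hypothetical value chosen for the example, not a figure from Table 2.

```python
def prob_class_absent(n_draws, proportion):
    """Probability that a class with the given share never appears in n draws."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return (1.0 - proportion) ** n_draws


def expected_count(n_draws, proportion):
    """Expected number of samples of the class among n random draws."""
    return n_draws * proportion


# Hypothetical rare pattern that makes up 1% of all samples.
for n in (100, 500, 3000):
    print(n, expected_count(n, 0.01), prob_class_absent(n, 0.01))
```

With 100 random draws such a class is missing from the training set about 37% of the time, whereas with 3000 draws it is almost certainly present. Fixed per-class quotas avoid this risk altogether.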
The single-channel data set was used to choose the training set size. However, this type of data set is not necessarily the best choice, and the other types should also be considered. The DNN with the statistical data set and the CNN with the dual-channel data set are therefore each trained on 3000 samples. From the results presented in Figure 8, the DNN model with the statistical data set performs well in distinguishing patterns 1, 3 and 4, which are always difficult to classify, but it is less able to distinguish the other anomalies. The CNN with the dual-channel data set distinguishes the various types of anomalies well, so it is applied in our hierarchical model. The highlight of the chosen data sets is the statistical data set, which compresses and retains the information in the data better than a data set of single grayscale images. When the original data is transformed into images, the vertical axis is scaled adaptively, so the resulting images may lose part of the information carried by the vertical coordinates. The statistical data set, by combining three numerical features (mean, standard deviation and range), reflects both the trend of the data and its order of magnitude, and therefore retains a considerable part of the information in the vertical coordinates. At the same time, the compression ratio of the statistical data set is relatively high.
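The windowed statistics behind the statistical data set can be sketched as follows. The window of 100 points and the three features come from the text; the use of the population standard deviation and the function name `window_stats` are assumptions for illustration.

```python
import statistics


def window_stats(signal, window=100):
    """Return three rows (mean, standard deviation, range), one value per window.

    Assumption: the population standard deviation is used; the paper does not
    say which variant was applied.
    """
    if len(signal) % window:
        raise ValueError("signal length must be a multiple of the window size")
    means, stds, ranges = [], [], []
    for start in range(0, len(signal), window):
        chunk = signal[start:start + window]
        means.append(statistics.fmean(chunk))
        stds.append(statistics.pstdev(chunk))
        ranges.append(max(chunk) - min(chunk))
    return [means, stds, ranges]


# A 72,000-point sample yields a 3 x 720 matrix.
demo = [float(t % 100) for t in range(72000)]
features = window_stats(demo)
print(len(features), len(features[0]))
```

Applying the same function to the Fourier magnitudes of the sample and stacking the two results would give the 6×720 matrix described for the statistical data set, a 100-fold reduction in length that still keeps the amplitude scale of the signal.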

Defect Recognition
By classifying with the C4_D2D2_D2 model and comparing the predicted labels with visualizations of the original data, we found that more than 100 samples were mislabeled in the data set, which lowers the measured accuracy of the model predictions. The sample in Figure 7 was originally labeled as 6-trend but was predicted as 7-drift by the C4_D2D2_D2 model. Based on the way the data moves, it is in fact 7-drift. Similarly, Figure 8 shows a 3-minor sample that was wrongly labeled as 4-outlier.
For bridge anomaly detection, previous research has used measured vertical deformation variables to detect bridge anomalies or damage. That method uses thermodynamics to establish the equation of state of the beam under quasi-static processes, and the equation of state is verified through case studies. The thermodynamic rotational equation of state is combined with Bayesian theory to obtain a dual-indicator anomaly detection method, which is then applied to actual bridges. The method requires rich information on the bridge temperature field to predict the temperature effect accurately, so its requirements on the collected information are high, and it is less effective in scenarios where the sensor distribution is poor. In contrast, our proposed BADM-Net can analyze bridge anomalies using only the data already collected by current SHM systems. Others have also built neural network models to analyze bridge anomalies, but the data processing networks were very complex and their interpretability was poor. This paper builds a network model from several convolutional and fully connected networks that processes the data set in a hierarchical manner. The model improves the speed of data processing and can be applied to real-time bridge anomaly detection, which gives it better applicability than previously proposed deep learning models. Because of the hierarchical processing in BADM-Net, the data are divided more finely, so the deep learning network identifies bridge anomaly types more accurately.

Limitation
Our study still has some weaknesses. The main one is that 1-normal data may be classified as abnormal because of the limitations of the model, and the 4-outlier pattern is not well separated from 1-normal, which reduces the accuracy of the final classification. A second limitation is that the data preprocessing steps are complex. Third, some of the labels supplied with the data set are wrong, and we did not correct them, which causes some misclassification.

Conclusion
In this paper, an anomaly detection method based on computer vision and deep learning was presented to automatically detect multiple anomalies in SHM systems. The SHM time series data are first converted into images, and the image vectors of the grayscale figures are used as the training set. A hierarchical classification model called C4_D2D2_D2 was then proposed. The example data includes six patterns of data anomaly, and the designed and trained C4_D2D2_D2 achieves a global detection accuracy of 95.5%. Compared with manual inspection, the proposed method is much more efficient. However, the data preprocessing is more complicated, and the model does not fully separate anomaly 4-outlier from 1-normal. Because data sets in actual applications are very large and the proportions between patterns are extremely imbalanced, in future work we will use image processing to increase the number of samples.

Figure 2. Workflow of the data processing.

Figure 5. Comparison between actual data anomaly distribution and detection results.

Figure 6. The confusion matrix of DNN and CNN.

Table 1. Ratio of each pattern in actual data anomaly distribution.

Table 2. Ratio of each pattern in actual data anomaly distribution.

Table 4. Comparison of different models.

Table 5. Comparison of different data sets under DNN model.