Analysis and Prediction of Urban Traffic Congestion Based on Big Data

With the rapid development of big data technology, its application has become more and more extensive. The application of big data technology in intelligent transportation systems is the best way to solve traffic congestion in big cities. The paper analyses in detail the main causes of traffic congestion in big cities and the classification and evaluation of traffic congestion. Utilizing the Internet of Things and modern communication technologies, large-scale traffic data and related data based on GPS are acquired


Introduction
Traffic congestion has become increasingly serious, and traffic accidents occur frequently in many cities. These have become traffic management problems that need to be solved. Given the increasing traffic demand and the pressure on transportation resources, the traditional way of traffic management is inadequate [1]. The intelligent transportation system (ITS) has become an important way to improve traffic management. Using modern information technology based on Big Data (BD), traditional road traffic management methods are deeply reformed to improve the efficiency of the urban traffic network, ease urban traffic problems, reduce unnecessary losses, and improve the efficiency of public transportation [2].
At present, most cities have completed or are building comprehensive three-dimensional traffic monitoring platforms [3]. Vehicle GPS, pedestrian GPS, cameras, and other monitoring tools are used to monitor vehicle travel speed, road cross-section flow, and intersection diversion. These data make real-time evaluation of the operational status of urban roads possible [4]. The large volume of real-time monitoring data forms massive traffic information, which provides effective data support for traffic congestion prediction, and also requires that prediction to cover the entire urban transportation network efficiently and comprehensively. Accurate prediction of traffic congestion is one of the core objectives of intelligent transportation systems [5]. Urban traffic conditions follow certain self-similarity laws that can be used for prediction; however, differences between road segments and environmental factors produce road segment diversity, and external factors make the road network dynamic, so the laws governing traffic congestion are highly complex and uncertain. The prediction model therefore needs to adapt to complex road network conditions, achieve high-precision prediction, and update efficiently as the road network environment changes [6].
In the field of traffic flow prediction, a variety of prediction models and methods have been developed, such as linear regression, time series, Kalman filtering, and nonparametric models. Kalman filtering is a relatively advanced data processing method based on the filtering theory proposed by Kalman in the 1960s. After the Kalman filtering method was first applied to traffic prediction by Okutani and Stephanedes in 1984, many improvements were made [7]. In 2004, Yang et al. proposed a recursive least squares method for short-term vehicle speed prediction based on maximum likelihood and Bayesian rule. The Kalman filter method was used to adapt to the fast-changing mode. In general, the Kalman filtering method is a matrix iterative parameter estimation method for linear regression analysis models, which has the advantages of flexible selection of predictive factors and high precision. However, due to the large number of matrix and vector operations, the algorithm is relatively complicated and difficult to use for real-time online prediction.
With the development of Big Data and artificial intelligence, prediction models based on artificial neural networks have gradually been introduced into the field of short-term traffic volume prediction and have achieved satisfactory results. Smith and Demetsky applied backpropagation neural networks to short-term traffic flow prediction [8]. The neural network models currently applied in the field of traffic prediction include backpropagation networks, recurrent neural networks, radial basis function (RBF) neural networks, and multilayer feedback neural networks. This paper mainly uses Big Data technology combined with deep learning to analyse and predict urban traffic congestion [9].

Population and Vehicles Increasing in Large Cities
In many big cities, population growth and the surge in private cars and other vehicles are among the main causes of traffic congestion [10]. The reasons are as follows: the distribution of workplaces and residences is unbalanced, and the population density is too high in some areas; the number of public transportation vehicles in service is also increasing; and the increase in non-motor vehicles and pedestrians has a certain impact on traffic congestion [11].

Problems of Urban Basic Transportation Facilities
In some large cities, the width of many roads is limited due to land shortages. Due to urban planning problems, the distance between some adjacent traffic intersections is relatively short [12,13]. Other factors are traffic lights at intersections next to bus stops and the number of buses stopping there. The switching frequency of the traffic signal and an overly long red or green light in a certain direction also matter. Another problem that cannot be ignored is that the entrances to schools and hospitals are relatively close to the roads. These situations can all lead to traffic congestion [14].
Figure 1 below illustrates how unreasonable traffic facilities may cause traffic congestion. The factors affecting the efficiency of the traffic intersection are: the width of the AB unilateral road; the distance between O1 and O2; the turning direction of vehicles at the O2 intersection; the switching frequency of the O2 traffic signal; the duration of the red light (or green light) in a certain direction at O2; and the conflict between right-turning vehicles and straight-moving non-motorized vehicles and pedestrians at the O2 intersection.

Other Reasons Cause Traffic Congestion
Traffic accidents can also lead to traffic congestion. The degree of congestion is closely related to the number and severity of road traffic accidents and the speed of accident handling. Major events and the traffic control they require are also causes of traffic congestion. In addition, traffic light failures (for example, from accidental power outages), road construction, and bad weather (such as heavy rain, blizzards, and typhoons) can cause congestion. Sometimes, traffic congestion in the city shows some regularity. For example, congestion is more likely during commuting hours. Traffic congestion is also related to the season. In northern China, due to the cold weather, fewer vehicles and pedestrians travel in winter than in summer, and the traffic conditions in winter are better [15].
In many large cities in China, shared bicycles facilitate people's travel. The subway has alleviated the traffic congestion pressure of the city to a certain extent, but it also brings certain problems: the traffic congestion near subway stations is very serious [16].

Traffic Flow and Traffic Congestion Evaluation
Traffic flow is an important factor in traffic congestion. It refers to the number of vehicles passing per unit time in a particular direction of a road, or through a traffic intersection in a certain direction [17].
The intelligent transportation industry commonly uses three parameters to describe traffic flow quantitatively:
(1) The first is traffic flow, also known as traffic volume, which is the number of vehicles passing through a designated section of the road per unit time. The unit is vehicles/hour (V stands for traffic volume).
(2) The second is the traffic flow speed, referred to as the flow rate, which is the speed of the traffic flow, in metres per second or kilometres per hour (S stands for traffic flow speed).
(3) The third is the traffic flow density, which is the number of vehicles per unit length of road. The unit is vehicles/km (D stands for traffic flow density).
The relationship between the parameters is:

$$V = S \times D \tag{1}$$

From Equation 1, it can be seen that when there are few vehicles on the road, the driver can choose a higher speed. At this time, the traffic flow speed is larger, but because the traffic flow density is small, the traffic flow is also relatively small. As the number of vehicles on the road increases, the traffic density increases, the speed of each vehicle decreases due to the constraints of the vehicles in front and behind, and the flow rate decreases, but the traffic flow increases until the product of flow rate and density reaches its maximum value under certain conditions, that is, until the traffic flow is at its maximum. The flow rate at this time is called the optimum speed, and the density is called the optimum density. If the number of vehicles on the road increases further, the density continues to increase, while the flow rate and the traffic flow decline [18].
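The rise and fall of traffic flow described above can be illustrated numerically. The following is a minimal sketch, taking the relationship as flow = speed × density (V = S × D) and assuming a linear speed-density relation (the Greenshields model), which is not part of this paper; the free-flow speed of 60 km/h and jam density of 120 vehicles/km are hypothetical values chosen only for illustration.

```python
def traffic_flow(speed_kmh: float, density_veh_per_km: float) -> float:
    """Traffic flow V = S * D, in vehicles/hour."""
    return speed_kmh * density_veh_per_km


def assumed_speed(density: float,
                  free_flow_speed: float = 60.0,
                  jam_density: float = 120.0) -> float:
    """ASSUMPTION: speed falls linearly with density (Greenshields model).

    free_flow_speed (km/h) and jam_density (vehicles/km) are hypothetical.
    """
    return free_flow_speed * (1.0 - density / jam_density)


# Evaluate the flow on a grid of densities from empty road to full jam.
flows = {d: traffic_flow(assumed_speed(d), d) for d in range(0, 121, 10)}

# The optimum density is the one at which the flow is largest.
optimum_density = max(flows, key=flows.get)
optimum_speed = assumed_speed(optimum_density)
maximum_flow = flows[optimum_density]
```

Under these assumed values the flow peaks at half the jam density and falls back to zero at the jam density, which matches the qualitative behaviour described in the text.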
Traffic congestion evaluation indicators are mainly based on road speed, road traffic density, traffic volume and travel time [19].

Research on Traffic Congestion Based on Big Data
The collected traffic data on the factors affecting traffic congestion are divided into static data and dynamic operational data. Static data are long-term fixed data, such as the width of roads, the setting of traffic intersection signals, the lane setting of roads, the distance between adjacent intersections, and so on [20]. Dynamic operational data are traffic data that change over time, such as the location of vehicles, the number of pedestrians at traffic intersections, the number of motor vehicles at traffic intersections, and so on. Where legally permitted and with personal privacy protected, the location of a vehicle or pedestrian is obtained by the GPS positioning system, and its movement trajectory is predicted.
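The split between static and dynamic data can be pictured as two record types that share a road segment identifier. The sketch below is only an illustration: every class, field, and function name is hypothetical and not taken from the paper, and the values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticRoadData:
    """Long-term fixed attributes of one road segment (hypothetical fields)."""
    segment_id: str
    road_width_m: float
    lane_count: int
    distance_to_next_intersection_m: float


@dataclass
class DynamicObservation:
    """One time-stamped reading for a segment (hypothetical fields)."""
    segment_id: str
    timestamp_s: int
    vehicle_count: int
    pedestrian_count: int


def vehicles_per_lane(static: StaticRoadData, obs: DynamicObservation) -> float:
    """Join a dynamic reading with its static record to get a per-lane load."""
    if static.segment_id != obs.segment_id:
        raise ValueError("observation belongs to a different segment")
    return obs.vehicle_count / static.lane_count


# Illustrative values only.
segment = StaticRoadData("AB", road_width_m=14.0, lane_count=4,
                         distance_to_next_intersection_m=250.0)
reading = DynamicObservation("AB", timestamp_s=1_700_000_000,
                             vehicle_count=36, pedestrian_count=12)
load = vehicles_per_lane(segment, reading)
```

Marking the static record as frozen reflects that it changes rarely, while dynamic observations arrive continuously and are joined to it by segment.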
Traffic information is collected by remote terminal equipment and transmitted to the Big Data centre, where a dedicated storage system is established and the data are analysed, classified, and organized to support decisions and control. First of all, the electronic map library is the basis of the intelligent traffic control system. It must be established first and updated in real time to guarantee the accuracy of GPS navigation. Road information, traffic flow information, vehicle status information, and parking space information must be available in real time to ensure the accuracy and security of traffic scheduling [21].
Big Data uses road information and human travel information for analysis. Big Data technology greatly reduces the requirements for the structure of data [22]. It processes information such as road information and the behavioural habits and preferences of pedestrians in real time to outline the characteristics of each individual and to discover the implicit patterns and rules in a large amount of traffic flow information. Big Data technology brings together traffic data dispersed across different departments, such as personal information, public transport network information, logistics information, weather information, and other information held by traffic-related departments, so that the information of all departments is open and interoperable, achieving multi-level, cross-sectoral exchange and sharing of information resources. The collaborative computing of heterogeneous data enhances the knowledge discovery ability; it solves not only the problem of information fusion but also the cross-domain association problem of multi-source data [23].

System Structure Based on Big Data
The system architecture uses the perception layer at the bottom of the Internet of Things to obtain data, the network layer to collect data, Big Data technology to preprocess the data, and a cloud computing platform to provide computing services and Big Data analysis, and finally delivers results to the upper application layer of the Internet of Things. Big Data analysis of historical and real-time data is used to predict traffic congestion and to control traffic lights remotely. For congestion at a single intersection, the traffic signal switching frequency is adjusted or the green light time in the congested direction is extended, according to the corresponding strategy. For multi-intersection congestion, multi-intersection signal coordination is used to smooth traffic [24]. The hierarchical structure of the system is shown in detail in Figure 2.
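The single-intersection strategy of lengthening the green light in the congested direction can be sketched as follows. This is not the control strategy of the paper, which is not specified; it is a minimal illustration that assumes green time is shared in proportion to predicted demand, with a hypothetical function name, a hypothetical 90-second green budget, and a hypothetical 15-second minimum per direction.

```python
def allocate_green(predicted_demand: dict[str, float],
                   total_green_s: float = 90.0,
                   min_green_s: float = 15.0) -> dict[str, float]:
    """Share a fixed green-time budget among directions by predicted demand.

    ASSUMPTION: proportional allocation above a minimum green time; the
    budget and minimum are illustrative values, not from the paper.
    """
    directions = list(predicted_demand)
    spare = total_green_s - min_green_s * len(directions)
    if spare < 0:
        raise ValueError("green budget too small for the minimum green times")
    total_demand = sum(predicted_demand.values())
    if total_demand == 0:
        # No predicted demand anywhere: split the budget evenly.
        return {d: total_green_s / len(directions) for d in directions}
    return {d: min_green_s + spare * predicted_demand[d] / total_demand
            for d in directions}


# The north-south direction is predicted to carry three times the demand.
plan = allocate_green({"north_south": 300.0, "east_west": 100.0})
```

The congested direction receives the longer green phase, while the minimum keeps the other direction from being starved.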

Traffic Data Acquisition
Using a series of data acquisition devices, such as mobile phones, car networks, GPS receivers, video surveillance, and cameras, information such as the time, location, and number of vehicles can be obtained. The purpose of obtaining this information is to analyse vehicle trajectories and predict the amount of traffic and the likelihood of congestion at a given place [25]. Table 1 below lists the required information and how to obtain it. Car GPS is developing rapidly and generates a large amount of spatio-temporal data, which can be used to predict congested road segments. Logistics-related vehicles include oil tank trucks, cement tank trucks, garbage collectors, trucks with detachable decks, car carriers, water sprinklers, etc.
Pedestrians and non-motorized vehicles are an important factor affecting traffic congestion in some large cities [26]. Especially at some traffic intersections, pedestrians or non-motorized vehicles that do not obey the traffic rules disrupt normal driving, which is likely to cause traffic congestion. Where traffic lights are set improperly at an intersection, turning vehicles may also collide with pedestrians and non-motorized vehicles, causing traffic accidents and congestion. Table 2 lists the data on pedestrians and non-motorized vehicles. In China, motorcycles are driven on non-motorized roads. The subway has relieved the pressure on ground traffic, but subway stations are places where traffic congestion is likely to occur. Non-motor vehicles are divided into bicycles, tricycles, electric bicycles, motorized wheelchairs for the disabled, and animal-drawn vehicles [27].
The condition of roads is also a factor affecting traffic. Traffic is generally better with more lanes, wider lanes, a longer distance between adjacent intersections, and a higher lane grade. Table 3 details this aspect.
There are many other complex issues that affect the state of urban road traffic. They include not only differences between regions of the city, such as differences in population density and in road infrastructure construction specifications, but also environmental factors such as abnormal weather, for example heavy rain and blizzards [28,29]. Other factors that affect traffic congestion include the occurrence of major events, traffic flow control, and maintenance of transportation facilities. These are described in detail in Table 4 below.

Data Analysis and Processing
Based on the analysis of dynamic traffic flow and road network congestion status, combined with the temporal and spatial characteristics of traffic data and traffic domain constraints, the potential similarities and correlations of the data are analysed in depth [30]. The traffic data undergo clustering, predictive analysis, correlation analysis, and anomaly detection, so as to discover the knowledge hidden in different feature dimensions and data granularities, and dimensionality reduction is used to process the data.
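Of the processing steps listed above, anomaly detection is the simplest to illustrate. The paper does not name a specific method, so the sketch below assumes a plain z-score rule; the function name, the threshold of 2.0, and the flow readings are all hypothetical.

```python
import statistics


def find_anomalies(flows: list[float], threshold: float = 2.0) -> list[int]:
    """Return the indices of readings far from the mean.

    ASSUMPTION: a z-score rule with a hypothetical threshold; a reading is
    flagged when it lies more than `threshold` standard deviations away.
    """
    mean = statistics.fmean(flows)
    spread = statistics.pstdev(flows)
    if spread == 0:
        return []  # all readings identical, nothing stands out
    return [i for i, value in enumerate(flows)
            if abs(value - mean) / spread > threshold]


# Illustrative vehicle counts per interval; the fifth reading is a spike.
flows = [120, 125, 118, 122, 400, 121, 119]
anomalies = find_anomalies(flows)
```

A single large spike also inflates the mean and standard deviation, so on real data a robust variant (for example, one based on the median) may be preferable.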

Short-Term Intersection Traffic Flow Forecast
Data mining technology is used to predict traffic flow. Traffic flow data is a form of time series that is segmented according to changes in time series data characteristics. The most commonly used time series segmentation method is a piecewise linear description, that is, the sequence is segmented and described piece by piece using a linear model. K-Means clustering algorithm is used as the basis of sequence segmentation. The traffic flow is predicted by the combined model algorithm of sequence segmentation and BP neural network. The traffic flow pattern is divided according to the two-dimensional clustering algorithm of flow and time [31,32].
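The segmentation step can be sketched as a two-dimensional K-Means clustering over (time, flow) points, followed by splitting the series wherever the cluster label changes. This is only an assumed reading of the procedure: the initialisation, the number of clusters, and the sample data are hypothetical, and the BP neural network that would then be trained on the segments is not shown.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain K-Means on 2-D points with a deterministic initialisation.

    ASSUMPTION: initial centroids are points spaced evenly through the
    time-ordered list; this choice is for reproducibility only.
    """
    step = len(points) / k
    centroids = [points[int(i * step + step / 2)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels, centroids


def segment(labels):
    """Split a label sequence into runs: (start, end_exclusive, label)."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i, labels[start]))
            start = i
    return segments


# Illustrative data: (hour, flow in hundreds of vehicles/hour).
series = [(h, f) for h, f in enumerate([1, 1, 1, 1, 9, 9, 9, 9, 4, 4, 4, 4])]
labels, centroids = kmeans(series, k=3)
segments = segment(labels)
```

Time and flow are placed on comparable numeric scales here; with raw units the larger-valued axis would dominate the distance, so some rescaling is usually needed.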

Traffic Congestion Forecast
At present, research on traffic congestion prediction is mainly based on time series correlation analysis, neural network prediction, Bayesian network prediction, and multi-classifier combination prediction [33].
The traffic flow feature vector is constructed by summarizing basic data such as the traffic flow parameters, the environmental state, and the time period. Four predicted states are defined according to driving speed: smooth: V ≥ 30; crowded: 10 ≤ V < 30; congestion: 3 ≤ V < 10; blockage: V < 3, where V stands for speed and is measured in km/h. An autoencoder network, a deep learning method, learns from the unlabelled data set to obtain hidden layer parameters that characterize deep features of the data and generate a new feature set. Softmax regression is then trained on the labelled new feature set to generate a predictive classifier, and the model predicts the multiple states of traffic congestion [34].
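The four speed-based states and the softmax output stage can be sketched as follows. This is a minimal illustration, not the trained model: assigning a speed that falls exactly on a boundary to the less congested state is an assumption, the function names are hypothetical, and the class scores passed to the softmax are made-up numbers standing in for the output of a trained classifier.

```python
import math

STATES = ["smooth", "crowded", "congestion", "blockage"]


def congestion_state(speed_kmh: float) -> str:
    """Map a driving speed in km/h to one of the four states.

    ASSUMPTION: boundary speeds (30, 10, 3) go to the less congested state.
    """
    if speed_kmh >= 30:
        return "smooth"
    if speed_kmh >= 10:
        return "crowded"
    if speed_kmh >= 3:
        return "congestion"
    return "blockage"


def softmax(scores: list[float]) -> list[float]:
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for the four states, in the order of STATES.
probabilities = softmax([0.2, 2.5, 1.0, -1.0])
predicted_state = STATES[probabilities.index(max(probabilities))]
```

The first function produces the labels used for training, while the softmax converts the classifier's scores for a new feature vector into a probability for each state.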

Pedestrian and Non-motorized Vehicle Trajectory Prediction Based on GPS Data
At many traffic intersections, collisions of pedestrians and non-motorized vehicles with motor vehicles have led to traffic congestion. Based on GPS data, the moving trajectory of a pedestrian or a non-motor vehicle at a certain time is predicted using a relevant clustering algorithm. In this way, the number of pedestrians and non-motorized vehicles at a certain traffic intersection at a specific time can be predicted [35].
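The idea of predicting how many pedestrians or non-motorized vehicles will be at an intersection can be sketched with a much simpler stand-in for the clustering algorithm, which the paper does not specify. The sketch below assumes constant-velocity extrapolation of each GPS track from its last two fixes and counts the predicted positions within a radius of the intersection; all names, coordinates, and the 30-metre radius are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def extrapolate(track, t_future):
    """ASSUMPTION: constant velocity taken from the last two (t, lat, lon) fixes."""
    (t0, lat0, lon0), (t1, lat1, lon1) = track[-2], track[-1]
    if t1 <= t0:
        raise ValueError("fixes must be in increasing time order")
    scale = (t_future - t1) / (t1 - t0)
    return lat1 + (lat1 - lat0) * scale, lon1 + (lon1 - lon0) * scale


def predicted_count(tracks, intersection, t_future, radius_m=30.0):
    """Count tracks whose extrapolated position is within radius_m of the intersection."""
    lat_c, lon_c = intersection
    count = 0
    for track in tracks:
        lat, lon = extrapolate(track, t_future)
        if haversine_m(lat, lon, lat_c, lon_c) <= radius_m:
            count += 1
    return count


# Two illustrative tracks of (time in s, latitude, longitude) fixes.
tracks = [
    [(0, 39.8990, 116.4000), (10, 39.8995, 116.4000)],  # heading to the intersection
    [(0, 39.9100, 116.4000), (10, 39.9100, 116.4001)],  # about 1 km away
]
count_at_20s = predicted_count(tracks, (39.9000, 116.4000), t_future=20)
```

Linear extrapolation in degrees is only reasonable over short distances and short horizons; a clustering-based predictor would instead learn typical routes from many historical tracks.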

Conclusions
Big Data technology can obtain the traffic information of the entire city and provide practical data and solutions for traffic guidance and urban planning. It can address the following problems: traffic congestion prediction, analysis, and processing; traffic flow forecasting; and scientific planning of transport infrastructure. Internet of Things technology is used to collect real-time and historical data; Big Data technology is then used to clean and preprocess the data, and an appropriate algorithm is selected to establish a traffic prediction model. With the widespread use of driverless technology in large cities, intelligent transportation systems based on Big Data can provide accurate and highly reliable traffic information for the Internet of Vehicles.

Abbreviations
The following abbreviations are used in this manuscript:
BD: Big Data
ITS: Intelligent Transportation System