Ship Trajectory Data Compression Algorithms for Automatic Identification System: Comparison and Analysis

Abstract: With the development of Internet of Things (IoT) technology and its vast applications in ship transportation systems, such as the Automatic Identification System (AIS)


Introduction
Maritime transport plays a vital role in global supply chains. The report 50 Years of Review of Maritime Transport, 1968-2018, published by the United Nations Conference on Trade and Development (UNCTAD), notes that shipping carries the vast majority of international trade, with its share ranging between 80 and 90 per cent of trade. In terms of trade value, the shipping share is around 60 to 70 per cent of trade. The importance of maritime transport for trade and development cannot be overemphasized, and ocean shipping will remain the most important mode of transport for international merchandise trade. However, maritime transport faces many challenges in ensuring a high level of efficiency, safety and environmental protection, which require academia to develop supporting models and methods of analysis [1,2].
With the development of information technology and its wide application in transportation, trajectory data have become easy to obtain and are widely used in road, railway and air traffic research and practice. In marine transportation, ship trajectories are becoming one of the main data sources for studying the characteristics of ship traffic behavior, and they will be an important basis for future research and applications in maritime transport. With the widespread adoption of the Automatic Identification System (AIS), a large volume of ship trajectory data has been recorded and stored, and ship transportation has entered the age of big data [3][4][5][6].
Furthermore, a growing number of methods, theories and technologies for big data, knowledge mining and machine learning have been proposed. How to make full use of the data to make marine transportation more intelligent has therefore become one of the most important research topics. However, the AIS equipment of a ship generally broadcasts a message every 2 s to 6 min, which makes the trajectory data from the AIS notably large. Because AIS messages are so frequent, the trajectory data are highly redundant [7][8][9], which makes them difficult to use in research and practical applications. Ship trajectory data compression is therefore particularly important [10,11].
Several methods have been proposed to compress ship trajectory data. First, because ship trajectory compression quality depends significantly on threshold selection, an adaptive Douglas Peucker algorithm with automatic thresholding for AIS-based ship trajectory compression was proposed [12][13][14]. Second, reconstruction approaches can also reduce the volume of the data. Some algorithms were proposed to reconstruct a ship's trajectory from AIS data. With these algorithms, the navigational behavior is clearly shown in the reconstructed trajectory and the data volume also decreases [15][16][17]. Third, the semantic trajectory compression method, originally used for movement trajectories in an urban environment, has been applied to the reduction of AIS data [18,19]. Many other methods have also been used to compress ship trajectory data, e.g. the clustering method [20,21], piecewise linear segmentation method [22], directed acyclic graph method [23], direction-preserving trajectory simplification method [24] and improved sliding window algorithm [25].
Notably, as a type of vector data, ship trajectory data can be compressed by vector data compression algorithms, which are effective, easy to implement and have low time complexity. They can also serve as a pre-processing step for the compression methods above in practical applications. Generalized vector data compression includes storage compression and re-sampling of the vector data [26,27]. Storage compression reduces the amount of vector data by converting the data type or file type. Re-sampling extracts a subset B from set A, the collection of points that compose the vector graphics. Subset B should reflect the original data set A as closely as possible within a given accuracy while containing as few points as possible.
At present, the most widely used algorithms for vector data include the choosing interval points algorithm, limiting vertical distance algorithm, limiting angle algorithm, offset angle algorithm, Douglas Peucker algorithm and grating algorithm, among others [28][29][30]. Research on ship trajectory compression has focused mainly on applying and improving the Douglas Peucker algorithm, and some problems in the practical application of ship trajectory data have been effectively solved. However, many other vector data compression algorithms have not been applied to ship trajectory compression, and more testing and analysis of these algorithms in that setting are needed. Moreover, different algorithms have different characteristics and may be highly effective in specific data compression applications. It is therefore necessary to introduce the above vector data compression algorithms and to study their advantages and disadvantages in ship trajectory compression through experiments. The limiting angle algorithm performs poorly on vector data when the points are dense, so it is not tested in this paper. The other five are typical vector data compression algorithms, and many newer vector data compression algorithms are based on them. Therefore, the five algorithms and their pseudo-code for ship trajectory data compression are introduced. Data compression experiments on actual ship trajectories from AIS are carried out, and the results are used to analyze and compare the algorithms.
The remainder of this paper is organized as follows. Section 2 introduces the pseudo-code of the five algorithms for ship trajectory data compression. Section 3 presents the data compression experiments in which the performance of the algorithms is tested, and the results are analyzed and discussed. The study's conclusions are summarized in Section 4.

Compression Algorithms
Suppose a ship's trajectory is composed of a set of points in chronological order, represented by set A = {p_1, p_2, …, p_n}, where p is a point on the trajectory, the subscript is the index of the point in time order, and n is the total number of points on the trajectory. The ship sailed through the points in chronological order.
Let subset B stand for the compression result of set A. The pseudo-code of five typical vector data compression algorithms for ship trajectory data compression is as follows.

Choosing Interval Points Algorithm (CIPA)
The basic idea of the choosing interval points algorithm is to retain one point every k points, or every equal distance d, along the trajectory. Let k stand for the number of interval points. In the pseudo-code of this algorithm, ⌊(n − 1)/k⌋ denotes (n − 1)/k rounded down to the nearest integer. This algorithm supports real-time compression processing.
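The interval-based selection described above can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's own pseudo-code: the function name `choose_interval_points` is hypothetical, only the point-count variant (every k-th point) is shown, and keeping the final point of the trajectory is an added assumption so that the end of the track is not lost.

```python
def choose_interval_points(points, k):
    """Keep every k-th point of a trajectory (sketch, not the paper's pseudo-code)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    if not points:
        return []
    # Indices 0, k, 2k, ..., floor((n - 1) / k) * k are retained.
    kept = list(points[::k])
    # Assumption: always keep the last point so the track end is preserved.
    if (len(points) - 1) % k != 0:
        kept.append(points[-1])
    return kept


# Points are stand-in labels here; any sequence of positions works the same way.
print(choose_interval_points(list(range(11)), 3))
```

Because each decision depends only on the running index, the same rule can be applied to a live stream of messages, which matches the real-time property stated above.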

Limiting Vertical Distance Algorithm (LVDA)
The basic idea of the limiting vertical distance algorithm is to select three consecutive points and calculate the vertical (perpendicular) distance from the middle point to the straight line through the other two points.
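A minimal Python sketch of this idea follows. The decision rule is an assumption, since the text only gives the distance computation: here the middle point is dropped when its distance is below a threshold `d_threshold`, and the next triple starts from the last retained point. The names `point_line_distance` and `limit_vertical_distance` are hypothetical, and points are assumed to be planar (x, y) coordinates in metres, so AIS latitude and longitude would need to be projected first.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from p to the infinite line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        # a and b coincide: fall back to the point-to-point distance.
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / length


def limit_vertical_distance(points, d_threshold):
    """Drop middle points that lie closer than d_threshold to the chord (assumed rule)."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    for i in range(1, len(points) - 1):
        # Triple: last retained point, current point, next point.
        if point_line_distance(points[i], kept[-1], points[i + 1]) >= d_threshold:
            kept.append(points[i])
    kept.append(points[-1])
    return kept


track = [(0, 0), (1, 0.01), (2, 0), (3, 1), (4, 0)]
print(limit_vertical_distance(track, 0.1))
```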

Offset Angle Algorithm (OAA)
The basic idea of the offset angle algorithm is to select three consecutive points, such as p_1, p_2 and p_3. After that step, calculate the degree of the angle ∠p_1 p_2 p_3. Next, compare it with a given threshold θ_threshold. When the points are dense or the course changes slowly, the algorithm may delete all the points on a curved segment and lead to compression error. To compensate for this defect, the course change can be highlighted by increasing the distance between the three selected points.
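The angle test can be sketched as follows. The exact comparison rule is not recoverable from the text, so this sketch assumes one plausible reading: the offset is taken as the change of heading at the middle point (180° minus the interior angle ∠p_1 p_2 p_3), and the middle point is retained only when that change reaches the threshold. The function names and the planar (x, y) coordinates are likewise assumptions for illustration.

```python
import math


def turning_angle(p1, p2, p3):
    """Heading change in degrees at p2, i.e. 180 minus the interior angle at p2."""
    h1 = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    h2 = math.atan2(p3[1] - p2[1], p3[0] - p2[0])
    diff = abs(h2 - h1) % (2 * math.pi)
    if diff > math.pi:
        diff = 2 * math.pi - diff
    return math.degrees(diff)


def offset_angle_compress(points, theta_threshold_deg):
    """Keep a middle point only if the heading change reaches the threshold (assumed rule)."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    for i in range(1, len(points) - 1):
        # Triple: last retained point, current point, next point.
        if turning_angle(kept[-1], points[i], points[i + 1]) >= theta_threshold_deg:
            kept.append(points[i])
    kept.append(points[-1])
    return kept


track = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(offset_angle_compress(track, 10.0))
```

On a slow, densely sampled turn every individual heading change can fall below the threshold, which is the failure mode noted above; spacing the three points further apart makes the change larger and easier to detect.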

Douglas Peucker Algorithm (DPA)
The basic idea of the Douglas Peucker algorithm is to connect the first point p_1 and the last point p_n of the trajectory with a straight line. After that step, calculate the distance from each of the intermediate points to this straight line. Next, find the maximum distance d_max and the corresponding point p_i. Then, compare d_max with d_threshold. If d_max < d_threshold, delete all points between the first and last points. If d_max ≥ d_threshold, retain the point p_i and divide the trajectory into two segments {p_1, …, p_i} and {p_i, …, p_n}. Next, repeat the above process for each segment until no segment can be divided further. In the pseudo-code of this algorithm, unique(B) returns a copy of the subset B that contains only the sorted unique observations. This algorithm does not support real-time compression processing.
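The recursive splitting described above can be sketched in Python as follows. This is an illustrative version, not the paper's pseudo-code: it uses an explicit stack instead of recursion, marks retained indices in a boolean list (which makes a separate de-duplication step unnecessary), and assumes planar (x, y) coordinates. The function names are hypothetical.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / length


def douglas_peucker(points, d_threshold):
    """Douglas Peucker simplification with an explicit stack of segments."""
    n = len(points)
    if n < 3:
        return list(points)
    keep = [False] * n
    keep[0] = keep[-1] = True
    stack = [(0, n - 1)]
    while stack:
        first, last = stack.pop()
        d_max, index = 0.0, None
        for i in range(first + 1, last):
            d = point_line_distance(points[i], points[first], points[last])
            if d > d_max:
                d_max, index = d, i
        # Split at the farthest point only when it reaches the threshold.
        if index is not None and d_max >= d_threshold:
            keep[index] = True
            stack.append((first, index))
            stack.append((index, last))
    return [p for p, flag in zip(points, keep) if flag]


track = [(0, 0), (1, 0), (2, 0), (3, 2), (4, 0)]
print(douglas_peucker(track, 0.5))
```

The first step already needs the last point of the trajectory, which is why the method works on complete historical tracks rather than on a live stream.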

Grating Algorithm (GA)
The basic idea of the grating algorithm is to define a fan-shaped region and judge whether each point on the trajectory lies inside or outside the region. If a point is inside, it is deleted.
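The text gives only the outline of the grating algorithm, so the following Python sketch fills in the details with explicit assumptions. The fan is assumed to open from the last retained point (the anchor), each new point is assumed to narrow the fan to the bearings that pass within `d_threshold` of it, and the point before the first one that falls outside the fan is assumed to become the next anchor. This resembles sector-based opening-window simplification and may differ from the paper's own pseudo-code; all names are hypothetical and coordinates are assumed planar.

```python
import math


def _wrap(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def grating_compress(points, d_threshold):
    """Sector-based simplification sketch (assumed reading of the grating algorithm)."""
    n = len(points)
    if n < 3:
        return list(points)
    kept = [points[0]]
    anchor = 0
    while True:
        ref = None                      # reference bearing of the current fan
        lo, hi = -math.inf, math.inf    # allowed bearing offsets from ref
        j = anchor + 1
        while j < n:
            dx = points[j][0] - points[anchor][0]
            dy = points[j][1] - points[anchor][1]
            dist = math.hypot(dx, dy)
            if dist > 0:
                bearing = math.atan2(dy, dx)
                if ref is None:
                    ref = bearing
                offset = _wrap(bearing - ref)
                if not lo <= offset <= hi:
                    break               # point j left the fan
                # Narrow the fan to bearings passing within d_threshold of point j.
                half = math.atan2(d_threshold, dist)
                lo, hi = max(lo, offset - half), min(hi, offset + half)
            j += 1
        if j >= n:
            break
        # The point just before the escaping one becomes the new anchor.
        anchor = j - 1
        kept.append(points[anchor])
    kept.append(points[-1])
    return kept


track = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(grating_compress(track, 0.1))
```

Each point is examined once as it arrives and only the anchor and the current fan limits are stored, so a rule of this kind can run on streaming data.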

Compression Experiment
This section introduces the compression experiments based on the algorithms above, and the results are compared and analyzed. The data sample for the experiments comes from the position report messages of AIS. The update frequency of these messages is related to a ship's speed and rate of turn (ROT). Therefore, the sample data should include AIS messages from ships with different speeds and ROTs. The trajectories of ships in the Qiongzhou Strait, shown in the bottom layer of Figure 1, contain AIS data of ships in different navigational states. Five representative trajectories with obvious variations in speed and ROT were taken as the samples for the first experiment, as shown in the middle layer of Figure 1. Another two representative trajectories without obvious variations in speed and ROT were taken as the samples for the second experiment, as shown in the top layer of Figure 1.
Firstly, the five representative trajectories are compressed by the choosing interval points algorithm, limiting vertical distance algorithm, offset angle algorithm, Douglas Peucker algorithm and grating algorithm. When the data compression ratio is 20, the compressed trajectories and the mean of the points' displacements are shown in Figure 2(b) to (f). All the compressed trajectories can reflect the spatial distribution of the raw trajectories. However, there are also some differences between the compression results. The compression result of the limiting vertical distance algorithm, as shown in Figure 2(c), retains more details of the raw trajectories than that of the offset angle algorithm, but the improper distribution of points remains and many points are concentrated on the corners. The mean of the points' displacements is 27.3 m, which is less than that of the choosing interval points algorithm and the offset angle algorithm. Figure 2(e) and Figure 2(f) show the compression results of the Douglas Peucker algorithm and the grating algorithm. These two algorithms perform better than the other three. The points are properly distributed, which means that the spatial structural characteristics of the trajectories and the details on the corners are preserved well. The mean of the points' displacements caused by the Douglas Peucker algorithm is the minimum, at 13.2 m. The point displacement caused by the grating algorithm is a little larger, at 16.5 m.
Secondly, the two representative trajectories are compressed by the choosing interval points algorithm, limiting vertical distance algorithm, offset angle algorithm, Douglas Peucker algorithm and grating algorithm. When the data compression ratio is 20, the compressed trajectories and the mean of the points' displacements are shown in Figure 3(b) to (f). All the compressed trajectories can reflect the spatial distribution of the raw trajectories. However, there are also some differences between the compression results.
For the choosing interval points algorithm, as shown in Figure 3(b), some details still cannot be retained. However, as there are no obvious corners on the two trajectories, the compression result is better than the result shown in Figure 2(b). The mean of the points' displacements reduces to 6.6 m. In contrast, for the offset angle algorithm, because the feature points on the local segments are ignored, the improper distribution of the points is enlarged. As shown in Figure 3(c), and as in Figure 2(c), the compression result of the limiting vertical distance algorithm is better than that of the offset angle algorithm, but the improper distribution of the points still exists. The mean of the points' displacements shown in Figure 3(c) is 51.4 m. Figure 3(e) and Figure 3(f) show the compression results of the Douglas Peucker algorithm and the grating algorithm. The point distribution of these two algorithms is still better than that of the other algorithms. The important points on local segments are retained, and the detailed variations of the two trajectories are preserved well. The mean of the points' displacements caused by the Douglas Peucker algorithm is 5.0 m, which is still the minimum among the results. The mean of the points' displacements caused by the grating algorithm is 6.8 m, similar to that of the choosing interval points algorithm.
In summary, the offset angle algorithm performs worse than the other algorithms, and the limiting vertical distance algorithm performs only a little better. The performance of the choosing interval points algorithm is unstable, and it is more suitable for trajectories that are approximately straight lines. The Douglas Peucker algorithm and the grating algorithm perform better than the other algorithms in the experiments. The data compression error caused by the Douglas Peucker algorithm is the minimum, but it does not support real-time processing. Although the error caused by the grating algorithm is a little larger than that of the Douglas Peucker algorithm, it supports real-time processing. Therefore, the Douglas Peucker algorithm is recommended for compressing historical ship trajectory data, and the grating algorithm is recommended for compressing ship trajectory data in real time.
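The two quantities used in this comparison, the compression ratio and the mean of the points' displacements, can be computed along the following lines. The definitions here are assumptions, since the text does not spell them out: the ratio is taken as the raw point count divided by the retained point count, and the displacement of a raw point is taken as its shortest distance to the compressed polyline. The function names are hypothetical and coordinates are assumed to be planar, in metres.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from p to the segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg2 = dx * dx + dy * dy
    if seg2 == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment and clamp to its end points.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg2
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def mean_displacement(raw, compressed):
    """Mean distance from each raw point to the compressed polyline (assumed metric)."""
    if len(compressed) < 2 or not raw:
        raise ValueError("need at least one raw point and two compressed points")
    segments = list(zip(compressed, compressed[1:]))
    total = sum(min(point_segment_distance(p, a, b) for a, b in segments) for p in raw)
    return total / len(raw)


def compression_ratio(raw, compressed):
    """Raw point count divided by retained point count (assumed definition)."""
    return len(raw) / len(compressed)


raw = [(0, 0), (1, 1), (2, 0)]
compressed = [(0, 0), (2, 0)]
print(compression_ratio(raw, compressed), mean_displacement(raw, compressed))
```

To compare algorithms at a fixed ratio, as in the experiments above, each algorithm's threshold would be tuned until the ratio matches before the displacement is measured.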

Conclusions
Several classic vector data compression algorithms are introduced, together with their pseudo-code for ship trajectory compression. Through compression experiments on ship trajectory data from AIS, the performance of these algorithms is analyzed, and their advantages and disadvantages in compressing ship trajectory data are compared. The results show that the Douglas Peucker algorithm and the grating algorithm perform better than the other algorithms. The Douglas Peucker algorithm is suitable for historical data compression, and the grating algorithm is suitable for real-time data compression. When the trajectories are approximately straight lines, the choosing interval points algorithm can be considered, because its performance improves in this situation and it is suitable for both historical and real-time data compression. The research results provide support for the selection of algorithms in practical AIS-based applications, e.g. ship traffic monitoring, safety enhancement and fleet management.