The Power of the Pruned Exact Linear Time (PELT) Test in Multiple Changepoint Detection

Changepoint detection is the problem of estimating the points at which the statistical properties of a sequence of observations change. Several multiple changepoint search algorithms have been proposed for this problem, including the binary segmentation algorithm, the segment neighbourhood algorithm and the Pruned Exact Linear Time (PELT) algorithm. The PELT algorithm is exact and, under mild conditions, has a computational cost that is linear in the number of data points. PELT is more accurate than binary segmentation and faster than other exact search methods. However


Introduction
Changepoint analysis can be considered to be the identification of points within a data set where the statistical properties change. Detecting such changes is important in many application areas, including climatology [1], bioinformatics [2], finance [3], oceanography [4] and medical imaging [5]. The challenge in multiple changepoint detection is identifying the optimal number and locations of changepoints, since the number of possible solutions grows rapidly with the size of the data (a sequence of n observations has 2^(n-1) possible segmentations). This problem has attracted considerable attention in statistics, and a variety of search methods have been proposed and implemented. As increasingly long data sets are collected, more and more applications require the detection of changes in the distributional properties of such data, so an efficient method for searching the large solution space is clearly desirable.
Several multiple changepoint search algorithms have been proposed to meet this challenge. They include binary segmentation, from the work of [6], the segment neighbourhood algorithm of [7] and the Pruned Exact Linear Time (PELT) algorithm of [8]. Both the segment neighbourhood algorithm and PELT are exact methods. PELT follows the common approach of detecting changepoints by minimising a penalised cost, and under mild conditions its computational cost is O(n), that is, linear in the number of data points n. This differs from binary segmentation, which finds multiple changepoints by first applying a single-changepoint method to the whole data set and then iteratively and independently to each resulting partition until no further changepoints are detected. The main assumption behind PELT's linear cost is that the number of changepoints increases linearly with the length of the data set, that is, the changepoints are spread throughout the data rather than confined to one portion of it. PELT is more accurate than approximate search methods and faster than other exact search methods.

Review of Previous Studies
The changepoint problem has been discussed extensively in the literature in recent years. Its study dates back to [9] and [10], which tested for the existence of a single changepoint, and to [11], which was motivated by a "tracking" problem. Gichuhi et al. [12] considered changepoint analysis in a regression setting in which the responses are dichotomous. They used a neural network to detect a changepoint and maximum likelihood estimation to estimate its location once it had been detected.
Multiple changepoint problems have also been considered by many authors, including [13], who tested and estimated linear models with multiple structural changes. Pan and Chen [14] used a modified information criterion to detect multiple changepoints. The problem has also been discussed in a Bayesian framework. Yao [15] considered the problem of estimating a signal that is a step function when one observes the signal plus Gaussian noise. Barry and Hartigan [16] showed that, with an appropriate selection of prior product models, the observations can eventually determine the true partition approximately. Lee [17] concluded that, under mild assumptions and with respect to a suitable prior distribution, the posterior mode of the number of changepoints converges to the true number of changepoints in the frequentist sense. Changepoint problems for dependent observations are discussed in [18] and [19].
Binary segmentation, from the work of [6], is arguably the most established search method in the changepoint literature. It is an approximate method with computational cost O(n log n), where n is the number of data points. The segment neighbourhood algorithm of [7] is an exact method that searches the entire segmentation space using dynamic programming. Its computational cost is significant, O(Qn^2), where Q is the maximum number of changepoints and n is the number of data points. If the number of changepoints increases linearly with the amount of observed data, then Q = O(n) and the computational cost becomes O(n^3). The optimal partitioning method of [20] improves on the computational efficiency of the segment neighbourhood method but cannot match the efficiency of binary segmentation. It is an exact approach with computational cost O(n^2), although it can be applied to a slightly smaller class of problems.
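To make the contrast with the exact methods concrete, the following is a minimal sketch of binary segmentation for a change in mean. It assumes normal data with known unit variance, so that twice the negative log-likelihood of a segment reduces, up to an additive constant, to its residual sum of squares; a split is accepted only when it lowers this cost by more than a penalty β. The names `segment_cost` and `binary_segmentation` are hypothetical, and this is an illustration of the idea rather than the implementation of [6].

```python
from itertools import accumulate


def segment_cost(cs, cs2, a, b):
    """Residual sum of squares of y[a:b] about its mean, from prefix sums.
    For unit-variance normal data this is -2 log-likelihood up to a constant."""
    n = b - a
    s = cs[b] - cs[a]
    s2 = cs2[b] - cs2[a]
    return s2 - s * s / n


def binary_segmentation(y, beta):
    """Approximate search: split the best single changepoint, then recurse on
    each side independently until no split reduces the cost by more than beta."""
    cs = [0.0] + list(accumulate(y))             # prefix sums of y
    cs2 = [0.0] + list(accumulate(v * v for v in y))  # prefix sums of y^2
    changepoints = []
    stack = [(0, len(y))]                        # segments still to examine
    while stack:
        a, b = stack.pop()
        if b - a < 2:
            continue
        full = segment_cost(cs, cs2, a, b)
        best_gain, best_t = 0.0, None
        for t in range(a + 1, b):                # candidate split y[a:t] | y[t:b]
            gain = full - segment_cost(cs, cs2, a, t) - segment_cost(cs, cs2, t, b)
            if gain > best_gain:
                best_gain, best_t = gain, t
        if best_t is not None and best_gain > beta:
            changepoints.append(best_t)          # tau = last index of left segment
            stack.append((a, best_t))
            stack.append((best_t, b))
    return sorted(changepoints)
```

For example, `binary_segmentation([0.0] * 30 + [5.0] * 30 + [0.0] * 40, 10.0)` first splits at 60 (the single split with the largest cost reduction) and then at 30 within the left part. Because each split is chosen greedily, the result is not guaranteed to minimise the overall penalised cost, which is why binary segmentation is an approximate method.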
The PELT method has been used by [21] to analyse oceanographic time series, where it identified the start and end of the storm season automatically. PELT produced quicker and more consistent results than identification 'by eye' or than assuming that the variability is constant; the analysis focused on changes in variability within oceanographic data. Madon and Hingrat [22] used the PELT algorithm to analyse animal tracking locations, treating them as another kind of time series. The algorithm deciphered the timing of movement for nine migrant Macqueen's Bustards, and the PELT-TREE method stood out for its ability to highlight fine patterns in movement. The segments obtained by the changepoint analysis highlighted the complexity of the migration strategy of this species, which would make any purely visual analysis highly subjective.

Multiple Changepoint Detection
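One commonly used approach to identify multiple changepoints is to minimise

$$\sum_{i=1}^{m+1} C\left(y_{(\tau_{i-1}+1):\tau_i}\right) + \beta f(m), \tag{1}$$

where the data $y_{1:n} = (y_1, \ldots, y_n)$ contain m changepoints at positions $\tau_1 < \tau_2 < \cdots < \tau_m$, with $\tau_0 = 0$ and $\tau_{m+1} = n$, C is a cost function for a segment and $\beta f(m)$ is a penalty to guard against overfitting.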
Twice the negative log-likelihood is a commonly used cost function in the changepoint literature, although other cost functions are also used, such as quadratic loss, cumulative sums, or costs based on both the segment log-likelihood and the length of the segment. In practice, the most common choice of penalty is one that is linear in the number of changepoints, that is, βf(m) = βm. Examples of such penalties include the Akaike Information Criterion (AIC), with β = 2p, and the Schwarz Information Criterion (SIC, also known as BIC), with β = p log n, where p is the number of additional parameters introduced by adding a changepoint. The PELT method is designed for such penalties, which are linear in the number of changepoints.
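As an illustration of the penalised cost with a linear penalty βm, the sketch below assumes normal data with known variance and a change in mean only, and uses twice the negative log-likelihood as the segment cost C. The helper names are hypothetical, and p (the number of additional parameters per changepoint) is left to the caller; p = 1 in the usage below is an illustrative choice, not one taken from this study.

```python
import math


def neg2_loglik(segment, sigma=1.0):
    """Twice the negative Gaussian log-likelihood of one segment, with the mean
    set to its MLE (the segment mean) and sigma assumed known."""
    m = len(segment)
    mu = sum(segment) / m
    rss = sum((v - mu) ** 2 for v in segment)
    return rss / sigma ** 2 + m * math.log(2 * math.pi * sigma ** 2)


def penalised_cost(y, taus, beta):
    """Segment costs plus beta per changepoint, i.e. f(m) = m.
    taus holds tau_1 < ... < tau_m; segment i is y[tau_{i-1}:tau_i] (0-based)."""
    bounds = [0] + list(taus) + [len(y)]
    total = sum(neg2_loglik(y[a:b]) for a, b in zip(bounds, bounds[1:]))
    return total + beta * len(taus)


def aic_beta(p):
    """AIC penalty per changepoint: beta = 2p."""
    return 2 * p


def bic_beta(p, n):
    """SIC/BIC penalty per changepoint: beta = p log n."""
    return p * math.log(n)
```

For `y = [0.0] * 50 + [3.0] * 50`, `penalised_cost(y, [50], bic_beta(1, 100))` is smaller than `penalised_cost(y, [], bic_beta(1, 100))`, because the reduction in the residual sum of squares from placing a changepoint at 50 far exceeds the penalty β = log 100.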

The Pruned Exact Linear Time (PELT) Method
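The PELT method is based on the optimal partitioning algorithm of [20], but adds a pruning step within the dynamic program. Jackson et al. [20] proposed a search method that aims to minimise

$$\sum_{i=1}^{m+1} \left[ C\left(y_{(\tau_{i-1}+1):\tau_i}\right) + \beta \right], \tag{2}$$

where C is a cost function for a segment and β is a penalty to guard against overfitting. Equation 2 is equivalent to equation 1 with f(m) = m, since the two objectives differ only by the constant β.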
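The PELT method modifies the optimal partitioning method of [20] by pruning; combining optimal partitioning with pruning gives an exact search whose computational cost is linear in n. For the data $y_{1:s}$, let $\mathcal{T}_s = \{\tau : 0 = \tau_0 < \tau_1 < \cdots < \tau_m < \tau_{m+1} = s\}$ be the set of possible changepoint vectors, and set $F(0) = -\beta$. The optimal segmentation is F(n), where

$$F(s) = \min_{\tau \in \mathcal{T}_s} \left\{ \sum_{i=1}^{m+1} \left[ C\left(y_{(\tau_{i-1}+1):\tau_i}\right) + \beta \right] - \beta \right\}. \tag{3}$$

Conditioning on the last changepoint, $\tau_m = t$, and calculating the optimal segmentation of the data up to that changepoint gives

$$F(s) = \min_{0 \le t < s} \left\{ \min_{\tau \in \mathcal{T}_t} \left( \sum_{i=1}^{m} \left[ C\left(y_{(\tau_{i-1}+1):\tau_i}\right) + \beta \right] - \beta \right) + C\left(y_{(t+1):s}\right) + \beta \right\}. \tag{4}$$

This could equally be repeated for the second-to-last, third-to-last, … changepoints. The recursive nature of this conditioning becomes clearer on noting that the inner minimisation has the same form as equation 3. In fact, the inner minimisation is equal to F(t), and so equation 3 can be rewritten as

$$F(s) = \min_{0 \le t < s} \left\{ F(t) + C\left(y_{(t+1):s}\right) + \beta \right\}. \tag{5}$$

We start by calculating F(1) and then recursively calculate F(2), …, F(n). At each step we store the optimal segmentation up to s. When we reach F(n), the optimal segmentation for the entire data set has been identified, and the number and locations of the changepoints have been recorded.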
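The recursion that computes F(1), F(2), …, F(n) from earlier values, with F(s) equal to the minimum over 0 ≤ t < s of F(t) + C(y_(t+1):s) + β and F(0) = −β, can be sketched directly as optimal partitioning. The sketch assumes a change in mean of unit-variance normal data and drops the additive constant of the −2 log-likelihood (it is the same for every segmentation), leaving the residual sum of squares as the cost. `make_cost` and `optimal_partitioning` are hypothetical names. Every earlier t is examined at each step, which is the O(n^2) cost that PELT's pruning reduces.

```python
import math
from itertools import accumulate


def make_cost(y):
    """Return cost(t, s) = residual sum of squares of y[t:s], i.e. the cost of
    y_{(t+1):s} for unit-variance normal data, without its additive constant."""
    cs = [0.0] + list(accumulate(y))
    cs2 = [0.0] + list(accumulate(v * v for v in y))

    def cost(t, s):
        seg_sum = cs[s] - cs[t]
        return (cs2[s] - cs2[t]) - seg_sum * seg_sum / (s - t)

    return cost


def optimal_partitioning(y, beta):
    """Exact search: F(s) = min_{0 <= t < s} F(t) + C(y_{(t+1):s}) + beta."""
    n = len(y)
    cost = make_cost(y)
    F = [-beta] + [math.inf] * n      # F[0] = -beta
    last = [0] * (n + 1)              # last[s]: optimal last changepoint before s
    for s in range(1, n + 1):
        for t in range(s):            # every earlier t is a candidate
            value = F[t] + cost(t, s) + beta
            if value < F[s]:
                F[s], last[s] = value, t
    changepoints, s = [], last[n]
    while s > 0:                      # walk back through the stored optima
        changepoints.append(s)
        s = last[s]
    return sorted(changepoints), F[n]
```

Storing only the last changepoint for each s is enough: following `last` backwards from n recovers the full optimal segmentation, as described in the text.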
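At each step, the minimisation over t covers all previous values; for example, when calculating F(3) the minimisation covers t = 0, 1, 2. The computational efficiency of the PELT method is achieved by removing candidate values of t from the minimisation at each step. The essence of pruning in this context is to remove from the minimisation performed at each iteration those values of t that can never be minima. Specifically, if there is a constant K such that $C(y_{(t+1):s}) + C(y_{(s+1):T}) + K \le C(y_{(t+1):T})$ for all $t < s < T$, then whenever $F(t) + C(y_{(t+1):s}) + K > F(s)$, t can never be the optimal last changepoint at any future time T > s and can be discarded. For twice the negative log-likelihood this condition holds with K = 0.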
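Adding the pruning step to the recursion for F(s) gives the following minimal sketch of PELT. It assumes the same unit-variance change-in-mean cost (the residual sum of squares), for which splitting a segment never increases the cost, so the pruning constant K can be taken as 0: a candidate t is dropped once F(t) + C(y_(t+1):s) + K > F(s). The names `make_cost` and `pelt` are hypothetical; the sketch follows the description in [8] but is not its reference implementation.

```python
import math
from itertools import accumulate


def make_cost(y):
    """cost(t, s) = residual sum of squares of y[t:s] (unit-variance Gaussian
    -2 log-likelihood of y_{(t+1):s} without its additive constant)."""
    cs = [0.0] + list(accumulate(y))
    cs2 = [0.0] + list(accumulate(v * v for v in y))

    def cost(t, s):
        seg_sum = cs[s] - cs[t]
        return (cs2[s] - cs2[t]) - seg_sum * seg_sum / (s - t)

    return cost


def pelt(y, beta, K=0.0):
    """Optimal partitioning recursion plus pruning of impossible candidates."""
    n = len(y)
    cost = make_cost(y)
    F = [-beta] + [math.inf] * n      # F[0] = -beta
    last = [0] * (n + 1)              # optimal last changepoint before s
    candidates = [0]                  # surviving values of t
    for s in range(1, n + 1):
        # F(t) + C(y_{(t+1):s}) for every surviving candidate t
        fits = [(F[t] + cost(t, s), t) for t in candidates]
        best_fit, last[s] = min(fits)
        F[s] = best_fit + beta
        # Pruning: keep t only if F(t) + C(y_{(t+1):s}) + K <= F(s)
        candidates = [t for fit, t in fits if fit + K <= F[s]]
        candidates.append(s)          # s becomes a candidate for later steps
    changepoints, s = [], last[n]
    while s > 0:
        changepoints.append(s)
        s = last[s]
    return sorted(changepoints), F[n]
```

Because only values of t that can never be optimal are removed, the result is the same as for the unpruned recursion; the saving comes from the candidate list staying short when changepoints are spread throughout the data.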

Power of the Test
The changepoint hypothesis problem is stated as:

H_0: there is no changepoint in the data.
H_1: there is a changepoint in the data.

The likelihood ratio statistic for a change at k is

$$\Lambda_k = \frac{L_1(k)}{L_0},$$

the ratio of the likelihood of the sample with a change at k to its likelihood with no change. The changepoint k is not fixed and its location is unknown, so the test is based on $\max_{1 \le k \le n-1} \Lambda_k$: the test statistic $Q_n$ is an increasing function of this maximum, and H_0 is rejected when $Q_n > R$, where R is a bound that depends on the level of significance α and the sample size n. For a given α, R grows with n so that the size of the test is asymptotically α, that is, $P_{H_0}(Q_n > R) \to \alpha$ as $n \to \infty$. If there is a change, it occurs at a certain point k in the data, with 1 ≤ k ≤ n − 1. If, as n → ∞, both k → ∞ and n − k → ∞, with k/n → λ ∈ (0, 1), then for a given size α the power of the test converges to 1. Therefore, the test is consistent.
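As an illustration only, assume independent normal observations with known unit variance and a possible change in mean. Then 2 log Λ_k equals the drop in the residual sum of squares, k(n − k)/n · (ȳ_1:k − ȳ_(k+1):n)^2, and maximising over the unknown k gives a statistic of the kind described above. This model and the function name `lr_statistic` are assumptions for the sketch; the exact form of Q_n used in this study is not reproduced here.

```python
def lr_statistic(y):
    """Return (max_k 2 log Lambda_k, argmax k) for a single change in mean,
    assuming independent N(mu, 1) observations (known unit variance).

    For this model 2 log Lambda_k = RSS_0 - RSS_1(k)
                                  = k (n - k) / n * (mean(y[:k]) - mean(y[k:]))**2.
    """
    n = len(y)
    total = sum(y)
    left = 0.0
    best_stat, best_k = float("-inf"), None
    for k in range(1, n):            # candidate changepoints 1 <= k <= n - 1
        left += y[k - 1]             # running sum of y_1, ..., y_k
        diff = left / k - (total - left) / (n - k)
        stat = k * (n - k) / n * diff * diff
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_stat, best_k
```

For `[0.0] * 10 + [2.0] * 10` the maximum is attained at k = 10, where the statistic is 10 · 10 / 20 · 2^2 = 20.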
This study estimates the power of the PELT algorithm for finite sample sizes under specific alternatives with one changepoint. The test rejects the null hypothesis if Q_n > R, where R is the asymptotic critical value, which depends on the size α and the sample size n. For a given level α, the power of the test is the probability of correctly accepting the alternative, that is,

$$\pi(\alpha) = P\left(Q_n > R \mid H_1\right). \tag{9}$$

Since the distribution of Q_n under H_1 is not known, simulations were used to estimate the power of the test. For a sample of size n, 1000 replicates were generated and Q_n was computed in each replicate. The power function for a given level α was estimated as

$$\hat{\pi}(\alpha) = \frac{\#\left(Q_n > R(\alpha)\right)}{1000}, \tag{10}$$

where #(Q_n > R(α)) denotes the number of replicates in which Q_n > R(α). For a given sample, the power of the test was evaluated at each changepoint location: for a sample of size n = 200, the changepoint k was placed at k = 20, 40, 60, 80, 100, 120, 140, 160, 180, and 1000 simulations were done at each location, with Q_n computed in each. Using the critical values R_1 and R_2 generated using Theorem 2.1 or 3.1 in [23], the power of the test was computed using equation 10. In this way the study analysed the sensitivity of the test as the changepoints approach the extremes. A further 1000 simulations were then carried out to investigate the power of the test in relation to the size of the change Δ, which was varied in steps of 0.5. The data were assumed to follow a normal distribution.
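The simulation scheme of equation 10 can be sketched as follows, under loudly stated assumptions: the statistic is the maximised likelihood ratio statistic for a change in mean of N(μ, 1) data, and the critical value R(α) is replaced by an empirical upper-α quantile of the statistic under H_0, obtained by simulation, instead of the asymptotic critical values from [23], which are not reproduced here. All function names are hypothetical.

```python
import random


def lr_statistic(y):
    """max over k of 2 log Lambda_k for a mean change in N(mu, 1) data (assumed model)."""
    n, total, left, best = len(y), sum(y), 0.0, 0.0
    for k in range(1, n):
        left += y[k - 1]
        diff = left / k - (total - left) / (n - k)
        best = max(best, k * (n - k) / n * diff * diff)
    return best


def simulate(n, k, delta, rng):
    """n N(0, 1) observations whose mean shifts by delta after position k."""
    return [rng.gauss(0.0, 1.0) + (delta if i >= k else 0.0) for i in range(n)]


def null_critical_value(n, alpha, reps, rng):
    """Empirical upper-alpha quantile of the statistic under H0 (no change);
    a stand-in for the asymptotic critical values used in the study."""
    stats = sorted(lr_statistic(simulate(n, n, 0.0, rng)) for _ in range(reps))
    return stats[min(reps - 1, int((1 - alpha) * reps))]


def estimate_power(n, k, delta, alpha=0.05, reps=1000, seed=1):
    """Equation 10: proportion of replicates in which the statistic exceeds R(alpha)."""
    rng = random.Random(seed)                 # fixed seed for reproducibility
    crit = null_critical_value(n, alpha, reps, rng)
    hits = sum(lr_statistic(simulate(n, k, delta, rng)) > crit for _ in range(reps))
    return hits / reps
```

For example, `[estimate_power(200, k, 2.0) for k in range(20, 200, 20)]` follows the design in which the changepoint location varies, and calling `estimate_power(200, 100, delta)` for delta = 0.5, 1.0, …, 6.0 follows the design in which the size of change varies.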

Empirical Results
The study used simulated data, assumed to follow a normal distribution with constant variance. The study analysed the sensitivity of the test as the changepoints approach the extremes by evaluating the power of the test at each changepoint location. For a sample of size n = 200, the changepoint k was placed at n/10, n/5, 3n/10, 2n/5, n/2, 3n/5, 7n/10, 4n/5 and 9n/10, that is, k = 20, 40, 60, 80, 100, 120, 140, 160, 180. For each size of change (∆ = 0.5, 2, 3, 6), 1000 simulations were done at each changepoint location and the power of the test was computed. The results are presented in table 1, and a plot of the power of the test against the location of the changepoint at α = 0.05 is presented in figure 1. The results in table 1 and figure 1 show that, for a given size of change, the power of the PELT method is almost the same at all changepoint locations.
The study also analysed the sensitivity of the test as the size of change increases. For a sample of size n = 200, 1000 simulations were done to determine the power of the test for different sizes of change in the mean. The size of change, denoted by ∆, was varied in steps of 0.5, giving ∆ = 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6. The results are presented in table 2. A plot of the power of the test against the size of change at α = 0.05 for k = 20, k = 60 and k = 100 is presented in figure 2. The results in table 2 and figure 2 show that the power of the test increases as the size of change increases.
Simulations were also carried out to find the power of the test as n increases. The simulated data consisted of five scenarios with lengths n = 100, 200, 300, 400, 500. The changepoint was placed at k = n/2. With a size of change of 6, 1000 simulations were done for each sample size and the power of the test was computed. The results are presented in table 3. A plot of the 95% average power function at different sample sizes is presented in figure 3.

Conclusion
The study found that the power of the PELT method increases with the size of change: the bigger the change, the more likely it is to be detected. It also found that, for a given size of change, the power of the test is almost the same at all changepoint locations. This suggests that PELT is more uniformly sensitive across changepoint locations than methods whose power is greatest when the change is near n/2, which is consistent with PELT being an exact search method. Its efficient computational cost also makes it a good test. In addition, the power of the test is almost the same for different sample sizes when the changepoint is placed at k = n/2. Overall, this study found PELT to be a powerful search method, and we therefore recommend its use by other researchers in other application areas.