Estimation of Change Point in Poisson Random Variables Using the Maximum Likelihood Method

The point at which a process undergoes a significant shift from its usual course is known as change point. Change point analysis entails testing for the presence of change in a given process, and the location of a single or multiple change points. This study presents a maximum likelihood estimate of a single change point in a sequence of independent and identically distributed Poisson random variables which are dependent on some covariates. A Poisson regression model is used to estimate the mean parameter and the likelihood function. A likelihood ratio test is conducted to check whether change exists with critical values of the test being obtained as in Gombay and Horvath [9]. The procedure is validated for simulated data for cases when there is no change and when there is a predefined change point with special application to incidence of road accidents in Kenya.


Introduction
Change is a usual aspect of everyday life. It can be noted perhaps in the socio-economic status of a low income earner who recently won a substantial sum of money in a lottery. It could be noted in the behavior of a young girl who is at the onset of adolescence. It could even be in the health status of an elderly man who has begun daily physical exercises at the local gymnasium. In other words, change is something one can easily relate to. However, change often goes unnoticed and most of the time, only the effects of change are noted at a much later time following its occurrence.
From a statistical viewpoint, the realizations of any scientific process usually vary considerably over a defined threshold, within which the process is said to be in its "usual state". Nevertheless, these realizations may exceed an acceptable threshold or their distributions may change at one point or at multiple points.
If the exact instance when change occurred were known, then perhaps one could be a step closer to establishing the causes that could be attributed to it. This could help in taking appropriate measures to either reinforce these changes or more so to avoid them if they had adverse effects. For instance in medical science, the treatment and management of cancer has posed a great challenge in the current and past century even with the advancement of medical technology. This is mainly attributed to the fact that the success of cancer treatment is to a large extent dependent on the stage of development at which the cancerous growths were detected, Farber [8], Cooper [7]. However, most cancer patients do not realize that they have the disease until it is in advanced stages of development when the symptoms are obvious. This may be attributed to the relatively long latency period between exposure to carcinogens and the transformation of normal body cells to cancerous cells, Cooper [6]. The earlier the detection and onset of treatment of cancer the more manageable it is.
Historically control charts, such as the CUSUM charts and Shewhart charts, were the most popular monitoring tools used to detect deviations in various processes. These charts were developed in the 1950's for industrial quality control. They signaled that a process was out of control once measurements departed significantly from some predefined benchmark values.
However, it was noted with concern that the point at which a control chart gave an out-of-control signal was not the actual point of change, but rather a belated point.
Change point analysis was initially introduced in the quality control context as an improvement on the method of control charting. With CPA, it was possible not only to detect the presence of change in various processes, but also to locate the point of change. However, the two methodscontrol charting and CPA, are often used to complement each other, Amiri [1].
Over the years CPA has developed into a fundamental problem with applications in various fields including but not limited to; dose-response surveys, credit scoring, identifying structural breaks, studying major drifts in weather patterns and earthquake patterns.
The study of change points has evolved in time from the detection and estimation of a single change point, to that of multiple change points in a system. It has been applied over time to offline and online data sequences from various distributions. Different approaches to estimation of change points have been employed, the most common of which are Bayesian and likelihood-based approaches. The following sections 2 and 3 give a summary of the methodology applied and results obtained.

Generalized Linear Models
Generalized Linear Models, as in Nelder [11] are a class of linear models that provide an avenue to model a response variable against several predictor variables without requiring a linear relationship between individual predictors and the response variable. GLM's have three main components: a random component which specifies the probability distribution of the response variable; a systematic component and a link function.
GLM's are based on the assumption that the distribution functions of the response variables belong to the exponential family of distributions with a given mean that specifies the form of the link function, Lee (2007). Particularly, a random variable Y is said to belong to the exponential family of distributions if its probability distribution can be expressed in the form: For some constants , , a b c and scale and location parameters ϕ and θ respectively Some models that belong to the class of generalized linear models are: the simple linear regression model, the logistic regression models, the log-linear models and the Poisson regression model. The Poisson regression model has the natural logarithm as the canonical link function. The link function allows the response variable to relate to the explanatory variables through a set of regression coefficients. This model rides on the basic assumption that the probability distribution of the response variable under consideration is the Poisson distribution. Particularly, a random variable Y is said to follow a Poisson distribution with mean parameter θ if it takes integer values 0,1, 2,... y = with probability function: It can be shown that the Poisson variable Y belongs to the exponential family of distributions.
In this model the natural logarithm of the expected value of the response variable, which is the Poisson mean parameter θ is expressed as a linear combination of the predictor variables. The Poisson regression model with log link takes the form: This is equivalent to the linear form expressed as: Where; X is a vector of explanatory variables, β is a vector of regression coefficients, θ is the Poisson mean parameter.

The Poisson Regression Change Point Model
The general concept behind detection of a single change point is binary splitting. This entails the partitioning of a sequence of realizations of a random variable into two subsequences; those cases whose values fall below some value, say k, and those whose values are above this value. The constant k is known as the change point. The change-point is chosen such that it maximizes the distinction between the two sub-sequences, Boudjellaba [2]. Similarly for a multiple change point problem, this binary splitting algorithm is applied recursively to each sub-sequence to obtain further change points, until the change points are exhausted. Several methods have been studied to estimate the locations of change points so far, including Bayesian and frequentist approaches, Chen and Gupta [5]. In this study the binary splitting algorithm is performed for the single change point case with respect to Poisson data sequences and maximum likelihood approach used to estimate the change point.
Consider a sequence 1 Where the MLE of is obtained through Poisson regression as; Under the alternative hypothesis, the likelihood function takes the form: The likelihood ratio for 2 ≤ ≤ − 1 is obtained as: The likelihood ratio test statistic that is used in this study is; The change point k is estimated such that k B is maximized. The null hypothesis is rejected for large values of where C is a constant that is determined by the size of the test, the sample size and the null distribution of this test statistic. The reader is referred to Gombay and Horvath [9] for a detailed illustration of the asymptotic distribution and the asymptotic critical values for the statistic k B . Various sample sizes and dimensions were considered and critical values obtained for each sample size at three different levels of the test. See tables of critical values in the appendix.

Simulation Study
Simulated observations for the response variable and two independent variables were random numbers generated using the R statistical software as follows: 1 X , taken to represent the age of the driver, was obtained from a truncated Normal distribution with mean 35 and standard deviation of 10, confined within the integers 18 and 60. 2 X , taken to represent the type of vehicle, was obtained from the Bernoulli distribution with parameter 0.6. Particularly it assumed the value 1 for a PSV and 0 otherwise. Y , taken to represent the total annual accident count, was obtained from the Poisson distribution with mean parameter θ , obtained by fixing the regression coefficients as 1 2 exp(1.5+0.019X +0.005X ) θ = for a case of no change;  table A1 in the appendix). Therefore the null hypothesis was not rejected; as such, the method correctly showed that there was no change. Figure 1 shows the result of the LRT for a case of no change with n=200 . The red line marks the maximum value of the test statistic while the green line marks the critical value at 5% level.
For the case of a preset change point with a sample size of 200, it was found that the maximum value of the LRT statistic 6.45632 was above each of the critical values for all values of alpha (see table A1 in the appendix). Therefore the null hypothesis was rejected; as such, the method correctly showed that there was a change. Figure 2 shows the results of the LRT for a change at 2 n when n=200 . The redline marks the maximum value of the test statistic whereas the green line marks the critical value at 5% level. Histograms of change points for the case of a change were plotted to investigate the power of the Likelihood Ratio test. It was noted as expected that majority of change points for different sample sizes lay around the preset change points. The power of the LRT was obtained as a ratio of the number of times the test yielded the correct results for the change points estimates to the total number of iterations. The reader is referred to Mundia and Waititu [10] for further reading on the power of the LRT for a single change point . The results   are summarized in table 1 for some preset change points at  4 , 2  n n and 3 4 n for a sample size n = 200.   The test was found to be most powerful when the change point was set at 2 n . Moreover it was noted that the test was more powerful at 3 4 n compared to 4 n even when the two points are in the same relative position in respect of the end points. Thus the distribution of change points is asymmetrical. Secondary data on road accidents in Kenya were obtained from the Ministry of Transport, Traffic department for the period 2000-2011. The Poisson response variable Y, was the total annual accident count, presumed to be influenced by four major causes: X 1 -human errors, X 2 -non-human errors, X 3 -bad weather and X 4 -vehicle and road defects.

Model Selection
Hierarchical Poisson regression models were fitted to the real data and the AIC alongside the deviance statistic used to choose the model that best fit the data. Hierarchical models such as those used in Syamsunder and Naikan [12] are useful in evaluating the effects of various covariates on the response variable.
The full model included all four independent variables to take the form: Where;θ =mean annual accident count. Three reduced models were obtained by leaving out some of the independent variables. The results obtained for the four models with regard to AIC, the deviance and the corresponding degrees of freedom are summarized in table 2. With regard to the AIC values, Model 1 was the best choice since it had the smallest AIC.
A chi-square test for change in deviance was constructed at 5% level to compare each of the smaller models against the full model. The hypotheses tested were of the form: H 0 : Reduced model is better H 1 : Saturated model is better The p-values for models 2, 3 and were; 10 4.775*10 − , 0.0314 and 0.00174 respectively. The null hypothesis was rejected for all three models since their p-values were lower than 5%. Therefore the most appropriate regression model was Model 1.

Model Fitting
A graph of the LRT statistic for the real data as shown in figure 5 attained a maximum at the year 2003. This maximum value (3.4271) as marked by the red line exceeded the critical value at 5% level (see table A2 in the appendix) as marked by the green line. This revealed that a significant change was present; hence the null hypothesis of no change was rejected. It was concluded that there was a change in the distribution of the two sub-sequences before and after the

Conclusions, Limitations and Recommendations
It was found that there was a significant change in the annual number accidents on Kenyan roads in Transport, the late Hon. John Michuki. However as time passed, these rules were flouted especially by motorists in the public service sector, and the accident rate increased consequently. Since these rules had a marked positive contribution toward the efforts to curb road carnage they should be reinforced by the Ministry of transport.
This study focused on the estimation of a single change point in a sequence of i.i.d. random variables using the maximum likelihood approach. A Poisson regression model was fitted to the data with the necessary assumption that the sequence of variables was derived from a Poisson distribution with equal mean and variance. However, as the case may be in some datasets, the variance may be lower than or exceed the mean, which would rule out the applicability of a Poisson regression model. In such cases, the negative binomial regression model for instance, may be considered. More suggestions on how to deal with non-homogeneous Poisson processes may be found in Chang [4].
A major drawback faced in this study was dealing with large data values that are problematic under the Poisson model. This problem could possibly be resolved by considering a non-parametric approach. Moreover, an alternative method of estimation to the MLE such as the CUSUM procedure or Bayesian analysis as in Carlin [3] may be considered. Further, there is a need for improved documentation of data on road accident in Kenya especially with regard to the variables affecting these events.