Power of Overdispersion Tests in Zero-Truncated Negative Binomial Regression Model

Poisson regression is the most widely used model for data measured as counts. Its main characteristic is the equidispersion restriction, under which the mean and variance of the count variable are equal. However, in many situations the variance of the count variable is greater than the mean, which is known as overdispersion and leads to a poor fit and unreliable inference about the regression parameters. Negative binomial regression is preferred when overdispersion is present. In addition, in some cases zero counts are not observed in the data, which is known as zero-truncation. In the presence of overdispersion in zero-truncated count data, the zero-truncated negative binomial (ZTNB) regression model can be used as an alternative to the zero-truncated Poisson (ZTP) regression model. In this paper, the likelihood ratio test (LRT), score test, and Wald test are proposed for testing overdispersion in the ZTNB regression model against the ZTP regression model. A Monte Carlo simulation is carried out to examine the empirical power of these test statistics under different levels of overdispersion and various sample sizes. The simulation results indicate that the Wald test is more powerful than the LRT and the score test for detecting overdispersion in the ZTNB regression model against the ZTP regression model. Thus, the Wald test is preferable for detecting overdispersion in zero-truncated count data.


Introduction
Count data can be defined as the number of occurrences of an event within a fixed period of time; such data take only non-negative integer values. Poisson regression is the most common modeling technique for count data in a wide variety of fields such as biostatistics, agriculture, econometrics, epidemiology, psychology, and many others. The standard Poisson regression model has the equidispersion restriction, under which the mean and variance of the counts are equal. However, many count data sets do not satisfy the equidispersion property; they are either overdispersed (variance greater than the mean) or underdispersed (variance less than the mean) (Cameron and Trivedi [4]). Negative binomial regression is an appropriate approach for modeling overdispersed count data as an alternative to the Poisson regression model.
In addition, in many cases count data are recorded only over part of the response variable's range, and the data are then said to be truncated. Examples include the length of hospital stay, the age of an animal in years, and the number of accidents per worker in a factory. In particular, the case in which zero counts are not observed is known as zero-truncation, or more generally left-truncation or truncation from below. Right-truncation, or truncation from above, may also arise.
The zero-truncated Poisson (ZTP) distribution is the most widely used distribution for modeling zero-truncated count data (Gurmu [8], Johnson et al. [13]). It is also called the positive Poisson distribution, and the truncated Poisson distribution has been preferred for fitting positive counts (Singh [22], Matthews and Appleton [16]). A Poisson regression model for truncated count data was proposed by Shaw [21]. However, in the presence of overdispersion, the fit will be poor and the estimates of the regression parameters will be biased and inconsistent (Grogger and Carson [7], Cameron and Trivedi [4]). The most common way to handle overdispersion in data truncated at zero is to employ the zero-truncated negative binomial (ZTNB) distribution, which was presented by Sampford [20]. Further, the ZTNB regression model is a suitable approach for modeling zero-truncated count data in the presence of overdispersion.
Several tests have been proposed for testing overdispersion in count data; see Lee [14], Dean and Lawless [6], Cameron and Trivedi [3], O'Hara Hines [18], and Hilbe [10] for more details. Various tests of overdispersion in the Poisson model against a more general parametric model have been proposed, and they fall into three categories of tests for nested models (e.g., Gurmu and Trivedi [9], Yang et al. [25], Yang et al. [26], Zhao et al. [29], Molla and Muniswamy [17], Zamani and Ismail [27], Pongsapukdee et al. [19]). These categories are based on the estimation of the unrestricted model, as in the Wald test (Winkelmann [24]); the estimation of the restricted model, as in the score test (Cameron and Trivedi [2], Dean [5], Gurmu and Trivedi [9]); and the difference between the restricted and unrestricted log likelihood values, as in the likelihood ratio test (LRT) (Vives et al. [23], Winkelmann [24]).
The aim of this paper is to detect overdispersion in zero-truncated count data based on the ZTNB regression model against the ZTP regression model. To this end, the likelihood ratio test (LRT), score test, and Wald test are proposed for testing the overdispersion parameter in the ZTNB regression model against the ZTP regression model. In addition, the empirical power of these test statistics is examined in a simulation study under different levels of overdispersion and different sample sizes, in order to choose the most powerful test for detecting overdispersion.
The rest of this paper is structured as follows: A brief review of ZTP and ZTNB regression models is provided in section 2. Maximum likelihood estimates for parameters of ZTP and ZTNB are derived in section 3. Testing for overdispersion is discussed in section 4. A simulation study is carried out in section 5 to investigate the empirical power for statistics of LRT, score, and Wald tests under different levels of overdispersion and various sample sizes in ZTNB against ZTP regression models. Finally, some conclusions are summarized in section 6.

Zero-Truncated Poisson and Negative Binomial Regression Models
Let $y_i$, $i = 1, 2, \ldots, n$, be a count variable with discrete probability function $f(y_i) = P(Y_i = y_i)$. If the values of $y_i$ below $l + 1$ are omitted, the resulting distribution is called left-truncated and its probability function is

$$f(y_i \mid y_i > l) = \frac{f(y_i)}{1 - F(l)}, \qquad y_i = l + 1, l + 2, \ldots, \qquad (1)$$

where $f(y_i \mid y_i > l)$ is the truncated (above $l$) probability function, $f(y_i) = P(Y_i = y_i)$ is the probability function of the random variable $Y_i$, and $F(l)$ is the distribution function evaluated at $l$.
Zero-truncation, in which $l = 0$, is the most common form of left-truncation in count models. Since many count data sets have been analyzed with generalizations of the Poisson distribution, Grogger and Carson [7] proposed a Poisson distribution for count data left-truncated at $l = 0$ as follows:

$$P(Y_i = y_i \mid y_i > 0) = \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!\,(1 - e^{-\mu_i})}, \qquad y_i = 1, 2, \ldots, \qquad (2)$$

where the value $y_i = 0$ is omitted, $\mu_i = \exp(x_i\beta)$, $i = 1, 2, \ldots, n$, is the mean of the Poisson distribution, $x_i$ is a $1 \times p$ vector of covariates, $\beta$ is a $p \times 1$ vector of parameters, and $1 - F(0) = 1 - e^{-\mu_i}$, with $F(0) = e^{-\mu_i}$ the Poisson distribution function at 0. The conditional mean and variance of $y_i$ are given respectively by

$$E(y_i \mid y_i > 0) = \frac{\mu_i}{1 - e^{-\mu_i}}, \qquad (3)$$

$$\mathrm{Var}(y_i \mid y_i > 0) = \frac{\mu_i}{1 - e^{-\mu_i}}\left[1 - \frac{\mu_i e^{-\mu_i}}{1 - e^{-\mu_i}}\right]. \qquad (4)$$

In many applications the data are overdispersed, so the estimates of the regression parameters of the truncated Poisson model will be biased and inconsistent (Cameron and Trivedi [4]). One way to handle overdispersion is to consider a mixture model with an overdispersed distribution. For example, the negative binomial model is a gamma-Poisson mixture model, which arises when the Poisson mean has a gamma distribution (Winkelmann [24], Arrabal et al. [1]).
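The ZTP probability function in (2) and its conditional mean and variance can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the value `mu = 2.5` are illustrative assumptions, not part of the paper.

```python
import math

def ztp_pmf(y, mu):
    """P(Y = y | Y > 0) for a Poisson(mu) count, y = 1, 2, ..."""
    if y < 1:
        return 0.0
    log_poisson = -mu + y * math.log(mu) - math.lgamma(y + 1)
    return math.exp(log_poisson) / (-math.expm1(-mu))  # divide by 1 - exp(-mu)

def ztp_mean(mu):
    """Conditional mean E(Y | Y > 0) = mu / (1 - exp(-mu))."""
    return mu / (-math.expm1(-mu))

def ztp_var(mu):
    """Conditional variance Var(Y | Y > 0)."""
    p_pos = -math.expm1(-mu)  # P(Y > 0) = 1 - exp(-mu)
    return (mu / p_pos) * (1.0 - mu * math.exp(-mu) / p_pos)

mu = 2.5  # hypothetical Poisson mean, chosen only for illustration
support = range(1, 200)  # the tail beyond 200 is numerically negligible here
total = sum(ztp_pmf(y, mu) for y in support)
mean_num = sum(y * ztp_pmf(y, mu) for y in support)
var_num = sum(y * y * ztp_pmf(y, mu) for y in support) - mean_num ** 2
print(total, mean_num, ztp_mean(mu), var_num, ztp_var(mu))
```

The summed probabilities should be close to one, and the moments obtained by direct summation should agree with the closed forms. Note that the truncated variance is smaller than the truncated mean, so equality of mean and variance no longer holds even without overdispersion.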
Then, the conditional mean and variance of $y_i$ under the ZTNB model are given respectively as follows:
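For orientation, the ZTNB probability function and its conditional moments can be written out under the common NB2 parametrization. This parametrization is an assumption here: it takes the untruncated negative binomial to have mean $\mu_i$ and variance $\mu_i + \alpha\mu_i^2$, with $\alpha > 0$ the overdispersion parameter, and the paper's own form may differ.

```latex
% Sketch under the assumed NB2 parametrization (mean \mu_i, variance \mu_i + \alpha\mu_i^2)
P(Y_i = y_i \mid y_i > 0)
  = \frac{\Gamma(y_i + \alpha^{-1})}{\Gamma(\alpha^{-1})\, y_i!}\,
    \frac{(\alpha\mu_i)^{y_i}\,(1 + \alpha\mu_i)^{-(y_i + \alpha^{-1})}}
         {1 - (1 + \alpha\mu_i)^{-1/\alpha}},
  \qquad y_i = 1, 2, \ldots

E(y_i \mid y_i > 0) = \frac{\mu_i}{1 - (1 + \alpha\mu_i)^{-1/\alpha}}

\mathrm{Var}(y_i \mid y_i > 0)
  = \frac{\mu_i}{1 - (1 + \alpha\mu_i)^{-1/\alpha}}
    \left[ 1 + \alpha\mu_i
      - \frac{\mu_i\,(1 + \alpha\mu_i)^{-1/\alpha}}{1 - (1 + \alpha\mu_i)^{-1/\alpha}} \right]
```

As $\alpha \to 0$, $(1 + \alpha\mu_i)^{-1/\alpha} \to e^{-\mu_i}$, so these expressions reduce to the ZTP probability function and its conditional mean and variance. This is why the ZTP model is nested in the ZTNB model at $\alpha = 0$.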

Maximum Likelihood Estimation of ZTP and ZTNB Regression Models
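For the ZTP regression model in (2), the maximum likelihood (ML) method can be used to estimate the parameter vector $\beta$ by taking the partial derivative of the log likelihood function with respect to $\beta$ and setting it equal to zero. The likelihood function is

$$L(\beta) = \prod_{i=1}^{n} \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!\,(1 - e^{-\mu_i})}, \qquad (9)$$

where $n$ is the number of observations in the truncated sample. Then, the log likelihood function is given by

$$\ell(\beta) = \sum_{i=1}^{n}\left[y_i \ln\mu_i - \mu_i - \ln\left(1 - e^{-\mu_i}\right) - \ln y_i!\right]. \qquad (10)$$

With $\mu_i = \exp(x_i\beta)$, the first partial derivative with respect to $\beta$ is given as follows:

$$\frac{\partial\ell}{\partial\beta} = \sum_{i=1}^{n}\left[y_i - \frac{\mu_i}{1 - e^{-\mu_i}}\right]x_i'. \qquad (11)$$

Therefore, the first order condition for maximizing $\ell$ with respect to $\beta$ is

$$\sum_{i=1}^{n}\left[y_i - \frac{\mu_i}{1 - e^{-\mu_i}}\right]x_i' = 0. \qquad (12)$$

An iterative procedure such as Newton-Raphson or Fisher scoring can be implemented to solve (12) numerically.
The second partial derivatives with respect to $\beta$ form the elements of Fisher's information matrix

$$I(\beta) = -E\left[\frac{\partial^2\ell}{\partial\beta\,\partial\beta'}\right] = \sum_{i=1}^{n}\frac{\mu_i\left(1 - e^{-\mu_i} - \mu_i e^{-\mu_i}\right)}{\left(1 - e^{-\mu_i}\right)^2}\,x_i'x_i. \qquad (13)$$

Then, the maximum likelihood estimator of $\beta$ is asymptotically normally distributed with mean $\beta$ and variance matrix $[I(\beta)]^{-1}$.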
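The Newton-Raphson iteration for the ZTP first order condition can be sketched in a few lines. This is a minimal illustration that assumes the log link $\mu_i = \exp(x_i\beta)$ and a design matrix whose first column is a constant. The helper names (`solve`, `fit_ztp`, `rztp`), the starting values, and the simulated coefficients are all hypothetical, not the paper's implementation.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ztp(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for ZTP regression with mu_i = exp(x_i beta)."""
    p = len(X[0])
    # Intercept-only starting value; assumes the first column of X is all ones.
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(max_iter):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = math.exp(sum(b * x for b, x in zip(beta, xi)))
            p_pos = -math.expm1(-mu)                          # 1 - exp(-mu)
            mean = mu / p_pos                                 # E(y | y > 0)
            var = mean * (1.0 - mu * math.exp(-mu) / p_pos)   # Var(y | y > 0)
            for j in range(p):
                score[j] += (yi - mean) * xi[j]               # score contribution
                for k in range(p):
                    info[j][k] += var * xi[j] * xi[k]         # information contribution
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def rztp(mu, rng):
    """Draw one ZTP value by rejecting zeros from Knuth's Poisson sampler."""
    while True:
        k, prod, limit = 0, rng.random(), math.exp(-mu)
        while prod > limit:
            k += 1
            prod *= rng.random()
        if k > 0:
            return k

rng = random.Random(1)
true_beta = [1.0, 0.5]  # hypothetical coefficients, for illustration only
X = [[1.0, rng.random()] for _ in range(2000)]
y = [rztp(math.exp(true_beta[0] + true_beta[1] * x[1]), rng) for x in X]
beta_hat = fit_ztp(X, y)
print(beta_hat)
```

Because the Hessian of the ZTP log likelihood does not depend on $y_i$ under the log link, Newton-Raphson and Fisher scoring coincide for this model.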
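In addition, for the ZTNB regression model, the maximum likelihood method can be used to obtain the parameter estimates as follows. From (6), the likelihood function (14) is the product of the ZTNB probabilities over the $n$ observations, and its logarithm is the log likelihood function $\ell$ in (15). The maximum likelihood estimators of the ZTNB parameters are then obtained by setting the first partial derivatives of $\ell$ in (15) with respect to $\beta$ and $\alpha$ equal to zero, which gives the likelihood equations (16) and (17). One can use the Newton-Raphson procedure or Fisher scoring to solve (16) and (17) numerically. The second order derivatives of $\ell$ with respect to the parameters $\beta$ and $\alpha$ form the elements of the information matrix. Let $\theta = (\alpha, \beta')'$ be a $(p + 1) \times 1$ vector of parameters with $\beta$ having $p$ elements. Then the observed Fisher information matrix can be partitioned as follows:

$$I(\theta) = \begin{pmatrix} I_{\alpha\alpha} & I_{\alpha\beta} \\ I_{\beta\alpha} & I_{\beta\beta} \end{pmatrix}, \qquad (21)$$

where $I_{\alpha\alpha} = -\partial^2\ell/\partial\alpha^2$, $I_{\alpha\beta} = I_{\beta\alpha}' = -\partial^2\ell/\partial\alpha\,\partial\beta'$, and $I_{\beta\beta} = -\partial^2\ell/\partial\beta\,\partial\beta'$.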
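As a rough illustration of the quantity being maximized, the ZTNB log likelihood can be coded directly. The sketch below assumes the NB2 parametrization (untruncated variance $\mu_i + \alpha\mu_i^2$), which may differ from the form used in the paper. The function names and the small data set are hypothetical.

```python
import math

def ztp_loglik(y, mu):
    """ZTP log likelihood for counts y with Poisson means mu."""
    return sum(yi * math.log(m) - m - math.lgamma(yi + 1)
               - math.log(-math.expm1(-m)) for yi, m in zip(y, mu))

def ztnb_loglik(y, mu, alpha):
    """ZTNB log likelihood under the assumed NB2 form, for alpha > 0."""
    total = 0.0
    for yi, m in zip(y, mu):
        # log[Gamma(y + 1/alpha) / Gamma(1/alpha)] + y*log(alpha)
        #   = sum_{j=0}^{y-1} log(1 + alpha*j), valid for integer y
        log_nb = sum(math.log1p(alpha * j) for j in range(yi))
        log_nb += yi * math.log(m) - math.lgamma(yi + 1)
        log_nb -= (yi + 1.0 / alpha) * math.log1p(alpha * m)
        log_p0 = -math.log1p(alpha * m) / alpha  # log P(Y = 0) under NB2
        total += log_nb - math.log(-math.expm1(log_p0))  # condition on Y > 0
    return total

y = [1, 2, 3, 5, 8]              # hypothetical positive counts
mu = [1.5, 2.0, 2.5, 3.0, 4.0]   # hypothetical fitted means
ll_ztp = ztp_loglik(y, mu)
ll_small_alpha = ztnb_loglik(y, mu, 1e-6)
ll_overdispersed = ztnb_loglik(y, mu, 0.5)
print(ll_ztp, ll_small_alpha, ll_overdispersed)
```

For a very small $\alpha$ the ZTNB log likelihood is numerically indistinguishable from the ZTP log likelihood at the same means, which reflects the nesting of the ZTP model at $\alpha = 0$. In practice the function would be passed to a numerical optimizer over $(\alpha, \beta)$.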

Testing for Overdispersion
For detecting the overdispersion parameter in the ZTNB regression model versus the ZTP regression model, the following hypothesis can be tested:

$$H_0: \alpha = 0 \quad \text{versus} \quad H_1: \alpha > 0, \qquad (22)$$

where the null hypothesis is rejected when there is evidence that the overdispersion parameter is significant.
The likelihood ratio test (LRT) is one of the tests that can be used for the hypothesis in (22). It is based on the difference between the log likelihood functions evaluated at the restricted and unrestricted maximum likelihood estimates. The LRT statistic, denoted $T_{LR}$, is given by

$$T_{LR} = 2\left[\ell(\hat{\mu}, \hat{\alpha}) - \ell(\tilde{\mu})\right],$$

where $\ell(\tilde{\mu})$ and $\ell(\hat{\mu}, \hat{\alpha})$ are the maximized log likelihoods under the ZTP and ZTNB models respectively, and $\hat{\mu}$ and $\hat{\alpha}$ are the maximum likelihood estimates of $\mu$ and $\alpha$ respectively. Under $H_0$, the $T_{LR}$ statistic has an asymptotic chi-square distribution with one degree of freedom. If the $T_{LR}$ statistic is significant, the unrestricted model is said to fit the data significantly better than the restricted model.
A score test is an alternative that can be used for testing the significance of the overdispersion parameter in (22). Let $\tilde{\theta} = (0, \tilde{\beta}')'$ be the restricted maximum likelihood estimate of $\theta$ under $H_0$. Then the score test statistic, denoted $T_S$, is given by (25), where $\tilde{\mu}_i$ is the fitted value from the ZTP model. Under $H_0$, the score statistic has an asymptotic chi-square distribution with one degree of freedom (Cameron and Trivedi [2]). In addition, the statistic $T_S$ in (25) can be written equivalently in a form that has an asymptotic standard normal distribution under $H_0$.
Also, the Wald-type t-statistic can be used for testing overdispersion in (22). It is defined as the ratio of the estimate of $\alpha$ to its standard error. The Wald test requires estimating only the unrestricted model, whereas the LRT requires fitting both the restricted and the unrestricted models. The Wald statistic, denoted $T_W$, is given as

$$T_W = \frac{\hat{\alpha}}{\mathrm{se}(\hat{\alpha})}.$$
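Once both models have been fitted, the LRT and Wald statistics are simple to compute. The sketch below assumes the maximized log likelihoods, the estimate of the overdispersion parameter, and its standard error are already available. The function names and the numerical inputs are hypothetical. The LRT is referred to a chi-square distribution with one degree of freedom, as stated above, and the Wald statistic is treated as one-sided, which is an assumption of this example.

```python
import math

def chi2_1_sf(t):
    """Upper-tail probability P(X > t) for a chi-square(1) variable."""
    return math.erfc(math.sqrt(max(t, 0.0) / 2.0))

def lrt(loglik_ztp, loglik_ztnb):
    """LRT statistic 2 * (unrestricted - restricted) and its chi-square(1) p-value."""
    stat = 2.0 * (loglik_ztnb - loglik_ztp)
    return stat, chi2_1_sf(stat)

def wald(alpha_hat, se_alpha):
    """Wald statistic alpha_hat / se and a one-sided normal p-value P(Z > z)."""
    z = alpha_hat / se_alpha
    return z, 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical fitted values, for illustration only.
lrt_stat, lrt_p = lrt(loglik_ztp=-250.0, loglik_ztnb=-246.0)
wald_stat, wald_p = wald(alpha_hat=0.42, se_alpha=0.15)
print(lrt_stat, lrt_p, wald_stat, wald_p)
```

The chi-square(1) tail probability uses the identity $P(\chi^2_1 > t) = P(|Z| > \sqrt{t})$, which keeps the example free of external dependencies.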

Simulation Study
In this section, a simulation study is carried out to compare the empirical power of the LRT, score, and Wald tests for testing the overdispersion parameter in a zero-truncated negative binomial model under different situations. The model considered in this study is $y_i \sim \mathrm{ZTNB}(\alpha, \mu_i)$, $i = 1, 2, \ldots, n$, where $\log\mu_i = 2 - 0.8x_{i1} + x_{i2}$. A set of random numbers is generated from a continuous uniform [0, 1] distribution for the covariate $x_{i1}$, and another set is generated from a continuous uniform distribution for the covariate $x_{i2}$. The empirical power of the tests is reported in Table 1 and displayed in Figure 1.
Based on Table 1, it is clear that the power of the three tests (LRT, score, and Wald) increases as $\alpha$ increases for all sample sizes $n = 20, 50, 100,$ and $200$. Also, as $n$ increases, the power of these tests increases, and the Wald test has the greatest power uniformly for all cases of $\alpha$ and $n$, as displayed in Figure 1. Moreover, as $\alpha$ and $n$ increase, the difference between ZTP and ZTNB increases for all tests, which indicates very strong evidence against the fit of the ZTP model to the overdispersed data. Overall, the Wald test dominates the score test and the LRT uniformly in terms of power for choosing between ZTP and ZTNB in the presence of overdispersion in the data.
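Two building blocks of such a simulation can be sketched briefly: drawing ZTNB responses and tallying empirical power from the p-values of the replications. The sketch assumes the NB2 gamma-Poisson mixture (gamma shape $1/\alpha$, scale $\alpha\mu$) with zeros rejected. The function names and the values of `mu` and `alpha` are hypothetical. Fitting the two models in each replication is omitted.

```python
import math
import random

def rpois(lam, rng):
    """Knuth's Poisson sampler; adequate for moderate lam."""
    k, prod, limit = 0, rng.random(), math.exp(-lam)
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def rztnb(mu, alpha, rng):
    """Draw one ZTNB value: gamma-Poisson mixture with zeros rejected."""
    while True:
        lam = rng.gammavariate(1.0 / alpha, alpha * mu)  # mean mu, variance alpha*mu^2
        y = rpois(lam, rng)
        if y > 0:
            return y

def empirical_power(p_values, level=0.05):
    """Share of replications whose p-value falls below the nominal level."""
    return sum(p < level for p in p_values) / len(p_values)

rng = random.Random(2)
mu, alpha = 3.0, 0.5  # hypothetical mean and overdispersion parameter
draws = [rztnb(mu, alpha, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
theory_mean = mu / (1.0 - (1.0 + alpha * mu) ** (-1.0 / alpha))
print(sample_mean, theory_mean)

# Tallying power from a list of hypothetical p-values, one per replication.
print(empirical_power([0.01, 0.20, 0.03, 0.60]))
```

In a full study, each replication would generate the covariates and responses, fit the ZTP and ZTNB models, compute the three test statistics, and record whether each test rejects. The empirical power is then the rejection rate for each combination of $\alpha$ and $n$.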

Conclusion
Overdispersion is often encountered in count regression, and it leads to a poor fit and unreliable inference about the regression parameters. For testing overdispersion in the ZTNB regression model against the ZTP regression model, the LRT, score, and Wald tests were proposed. The empirical power of these test statistics was assessed under different levels of overdispersion by Monte Carlo simulation. The results showed that the Wald test provides the highest statistical power. Thus, it is preferable for detecting overdispersion in zero-truncated count data.
In future work, the power of overdispersion tests can be examined for right-truncated count data.