Using Maximum Likelihood Ratio Test to Discriminate Between the Inverse Gaussian and Gamma Distributions
Zakariya Y. Algamal
Department of Statistics and Informatics, Computer science and Mathematical College, Mosul University, Mosul, Iraq.
To cite this article:
Zakariya Y. Algamal. Using Maximum Likelihood Ratio Test to Discriminate Between the Inverse Gaussian and Gamma Distributions. International Journal of Statistical Distributions and Applications. Vol. 1, No. 1, 2015, pp. 27-32. doi: 10.11648/j.ijsd.20150101.15
Abstract: A recurring problem in reliability and survival analysis is how to choose the distribution that best fits the data. Sometimes two distributions both fit the data at hand. The inverse Gaussian and gamma distributions are both among the well-known failure time distributions used for positively skewed data. The problem of selecting between them is considered. We use the logarithm of the maximum likelihood ratio as a test for discriminating between these two distributions. The test is carried out on six different data sets.
Keywords: Inverse Gaussian Distribution, Gamma Distribution, Ratio Maximum Likelihood, Discrimination
1. Introduction
It is well known that the inverse Gaussian (IG) and gamma (GAM) distributions are used to analyze positively skewed data. In reliability and survival analysis these distributions are needed for modeling failure time data. Sometimes both distributions fit the data, so the question is: which one is preferable? To answer this question, this paper uses the likelihood ratio test to discriminate between the IG and GAM distributions. Six data sets are used to illustrate the test. Discrimination between two general probability distributions was studied by Atkinson (1969, 1970), Dumonceaux et al. (1973), Dumonceaux and Antle (1973), and Kundu and Manglick (2004, 2005).
This paper is organized as follows. Sections 2 and 3 present the properties of the IG and GAM distributions, respectively. Section 4 describes the likelihood ratio test. Six data sets are analyzed in Section 5.
2. The Inverse Gaussian Distribution
The inverse Gaussian distribution is used to model positive, skewed data. It originates in the theory of Brownian motion: the first passage time of a Brownian motion follows an inverse Gaussian distribution (Chhikara & Folks 1988). The inverse Gaussian distribution has many applications, especially in reliability and survival analysis and in the natural and social sciences. As a positively skewed distribution, it is considered to have advantages over other skewed distributions such as the lognormal, gamma, and Weibull.
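The p.d.f. of an inverse Gaussian random variable $X$ is
$$f(x;\mu,\lambda)=\sqrt{\frac{\lambda}{2\pi x^{3}}}\,\exp\left\{-\frac{\lambda(x-\mu)^{2}}{2\mu^{2}x}\right\},\qquad x>0, \tag{1}$$
where $\mu>0$ and $\lambda>0$. The parameter $\mu$ represents the mean of the distribution and $\lambda$ represents the scale parameter. There are three other forms of (1) (Tweedie 1957).
The likelihood function of (1) for a sample $x_1,\dots,x_n$ is
$$L(\mu,\lambda)=\prod_{i=1}^{n}\sqrt{\frac{\lambda}{2\pi x_i^{3}}}\,\exp\left\{-\frac{\lambda(x_i-\mu)^{2}}{2\mu^{2}x_i}\right\}, \tag{2}$$
and the natural logarithm of (2) is
$$\ln L(\mu,\lambda)=\frac{n}{2}\ln\lambda-\frac{n}{2}\ln(2\pi)-\frac{3}{2}\sum_{i=1}^{n}\ln x_i-\frac{\lambda}{2\mu^{2}}\sum_{i=1}^{n}\frac{(x_i-\mu)^{2}}{x_i}. \tag{3}$$
From (3) one can obtain the m.l.e. of $\mu$ and $\lambda$ (Tweedie 1956) as follows:
$$\hat{\mu}=\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i, \tag{4}$$
$$\hat{\lambda}=\frac{n}{\sum_{i=1}^{n}\left(\dfrac{1}{x_i}-\dfrac{1}{\bar{x}}\right)}. \tag{5}$$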
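As an illustration of the closed-form estimates (4) and (5), the following is a minimal Python sketch, not the author's code. The function names `ig_mle` and `ig_loglik` are hypothetical, the parameterization assumed is the usual one with mean $\mu$ and scale $\lambda$, and the sample is data set (4) of Section 5.4.

```python
import math


def ig_mle(x):
    """Closed-form m.l.e. of (mu, lambda) for the inverse Gaussian distribution."""
    n = len(x)
    mu_hat = sum(x) / n                                      # (4): sample mean
    lam_hat = n / sum(1.0 / xi - 1.0 / mu_hat for xi in x)   # (5)
    return mu_hat, lam_hat


def ig_loglik(x, mu, lam):
    """Log-likelihood (3) of an i.i.d. sample under IG(mu, lam)."""
    n = len(x)
    return (0.5 * n * math.log(lam / (2.0 * math.pi))
            - 1.5 * sum(math.log(xi) for xi in x)
            - lam / (2.0 * mu ** 2) * sum((xi - mu) ** 2 / xi for xi in x))


# Data set (4) of Section 5.4 (Kumagai and Matsunaga, 1995).
data = [1.5, 1.7, 2.1, 2.2, 2.4, 2.5, 2.6, 3.8, 3.8, 4.2, 4.3, 5.6, 6, 7, 7.5,
        9.3, 9.9, 10.2, 10.6, 12.3, 12.9, 13.7, 14.1, 17.8, 27.6, 31, 42,
        45.6, 51.9, 91.3, 131.8]

mu_hat, lam_hat = ig_mle(data)
print(f"mu_hat = {mu_hat:.4f}, lambda_hat = {lam_hat:.4f}")
print(f"log-likelihood at the m.l.e. = {ig_loglik(data, mu_hat, lam_hat):.4f}")
```

Because both estimates are available in closed form, no iterative optimization is needed for the IG side of the comparison.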
3. The Gamma Distribution
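The gamma distribution is widely used in engineering, science, and business to model continuous variables that are always positive and have skewed distributions. It is also a flexible life distribution model that may offer a good fit to some sets of failure data. The density function of the gamma distribution with shape parameter $\alpha>0$ and scale parameter $\beta>0$ is
$$f(x;\alpha,\beta)=\frac{x^{\alpha-1}e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)},\qquad x>0. \tag{6}$$
The likelihood function of the gamma p.d.f. is
$$L(\alpha,\beta)=\prod_{i=1}^{n}\frac{x_i^{\alpha-1}e^{-x_i/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}. \tag{7}$$
The natural logarithm of (7) is
$$\ln L(\alpha,\beta)=-n\alpha\ln\beta-n\ln\Gamma(\alpha)+(\alpha-1)\sum_{i=1}^{n}\ln x_i-\frac{1}{\beta}\sum_{i=1}^{n}x_i. \tag{8}$$
Setting the derivative of (8) with respect to $\beta$ to zero and solving, we get
$$\hat{\beta}=\frac{\bar{x}}{\hat{\alpha}}. \tag{9}$$
Setting the derivative of (8) with respect to $\alpha$ to zero and substituting (9), we get
$$\ln\hat{\alpha}-\psi(\hat{\alpha})=\ln\bar{x}-\frac{1}{n}\sum_{i=1}^{n}\ln x_i, \tag{10}$$
where $\psi(\alpha)=\Gamma'(\alpha)/\Gamma(\alpha)$ is the digamma function. Equations (9) and (10) give the m.l.e. of $\beta$ and $\alpha$ (Johnson & Kotz, 1995). Equation (10) has no closed-form solution and must be solved numerically.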
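Since the likelihood equation for the shape parameter involves the digamma function, it has to be solved numerically. The following is a minimal Python sketch under stated assumptions: the shape equation is taken to be $\ln\alpha-\psi(\alpha)=\ln(\bar{x}/G)$ with $G$ the geometric mean, the scale is $\bar{x}/\hat{\alpha}$, and the root is found by bisection (a choice made here, not one stated in the paper). The helpers `digamma` and `gamma_mle` are hypothetical names; `digamma` is a hand-written approximation because the Python standard library does not provide one.

```python
import math


def digamma(z):
    """Digamma psi(z) for z > 0: recurrence up to z >= 6, then asymptotic series."""
    acc = 0.0
    while z < 6.0:                      # psi(z) = psi(z + 1) - 1/z
        acc -= 1.0 / z
        z += 1.0
    inv2 = 1.0 / (z * z)
    tail = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return acc + math.log(z) - 0.5 / z - tail


def gamma_mle(x):
    """M.l.e. of (shape alpha, scale beta) for the gamma distribution."""
    n = len(x)
    mean = sum(x) / n
    s = math.log(mean) - sum(math.log(xi) for xi in x) / n   # ln(mean / G)
    if s <= 0.0:
        raise ValueError("sample needs at least two distinct positive values")

    def g(a):                           # strictly decreasing in a
        return math.log(a) - digamma(a) - s

    lo, hi = 1e-8, 1.0
    while g(hi) > 0.0:                  # expand until the root is bracketed
        lo, hi = hi, 2.0 * hi
    for _ in range(200):                # bisection
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    alpha_hat = 0.5 * (lo + hi)
    return alpha_hat, mean / alpha_hat


# Data set (4) of Section 5.4 (Kumagai and Matsunaga, 1995).
data = [1.5, 1.7, 2.1, 2.2, 2.4, 2.5, 2.6, 3.8, 3.8, 4.2, 4.3, 5.6, 6, 7, 7.5,
        9.3, 9.9, 10.2, 10.6, 12.3, 12.9, 13.7, 14.1, 17.8, 27.6, 31, 42,
        45.6, 51.9, 91.3, 131.8]

alpha_hat, beta_hat = gamma_mle(data)
print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.4f}")
```

Bisection is slower than Newton's method but needs only the digamma function and cannot diverge, because the left-hand side of the shape equation is monotone in $\alpha$.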
4. Likelihood Ratio Test
A likelihood ratio test (LRT) is a statistical test based on a test statistic computed as the ratio of the maximized values of two likelihood functions.
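Let $X_1,\dots,X_n$ be i.i.d. random variables from a known distribution with p.d.f. $f(x;\theta)$. Recall that the likelihood function and its logarithm were given in Sections 2 and 3; the LRT statistic (denoted here by $L$) is then defined as
$$L=\frac{L_{1}(\hat{\theta}_{1})}{L_{2}(\hat{\theta}_{2})}, \tag{11}$$
where $L_{1}$ and $L_{2}$ are the likelihood functions of two different known p.d.f.s, $f_{1}(x;\theta_{1})$ and $f_{2}(x;\theta_{2})$, and $\hat{\theta}_{1}$ and $\hat{\theta}_{2}$ are the m.l.e. of $\theta_{1}$ and $\theta_{2}$, respectively. For our problem, we rewrite (11) as
$$L=\frac{L_{IG}(\hat{\mu},\hat{\lambda})}{L_{GAM}(\hat{\alpha},\hat{\beta})}. \tag{12}$$
By taking the natural logarithm of (12) and using (3), (4), (5), (8), (9), and (10), one gets
$$T=\ln L=n\left[\frac{1}{2}\ln\frac{\bar{x}H}{2\pi(\bar{x}-H)}-\left(\hat{\alpha}+\frac{1}{2}\right)\ln G+\hat{\alpha}\ln\frac{\bar{x}}{\hat{\alpha}}+\ln\Gamma(\hat{\alpha})+\hat{\alpha}-\frac{1}{2}\right], \tag{13}$$
where $\bar{x}$, $G$, and $H$ are the arithmetic, geometric, and harmonic means of the sample, respectively. The hypotheses are
$H_0$: The data belong to the IG distribution.
$H_1$: The data belong to the GAM distribution.
The decision whether the data belong to the IG or to the GAM distribution is based on the value of (13). If $T>0$ we choose the IG distribution as the fit to the data; otherwise ($T<0$) we prefer the GAM distribution.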
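The step from (12) to (13) is stated without intermediate detail. The following worked sketch fills it in under the assumption of the standard parameterizations (IG with mean $\mu$ and scale $\lambda$; gamma with shape $\alpha$ and scale $\beta$, so that $\hat{\beta}=\bar{x}/\hat{\alpha}$). The label $T$ for the logarithm of (12) is a notational choice made here.

```latex
\begin{align*}
\sum_{i=1}^{n}\frac{(x_i-\bar{x})^{2}}{x_i}
  &= \bar{x}^{2}\sum_{i=1}^{n}\left(\frac{1}{x_i}-\frac{1}{\bar{x}}\right)
   = \frac{n\bar{x}^{2}}{\hat{\lambda}},
  \qquad \hat{\lambda}=\frac{\bar{x}H}{\bar{x}-H},\\[4pt]
\ln L_{IG}(\hat{\mu},\hat{\lambda})
  &= \frac{n}{2}\ln\frac{\hat{\lambda}}{2\pi}-\frac{3n}{2}\ln G-\frac{n}{2},\\[4pt]
\ln L_{GAM}(\hat{\alpha},\hat{\beta})
  &= -n\hat{\alpha}\ln\frac{\bar{x}}{\hat{\alpha}}-n\ln\Gamma(\hat{\alpha})
     +n(\hat{\alpha}-1)\ln G-n\hat{\alpha},\\[4pt]
T=\ln L_{IG}-\ln L_{GAM}
  &= n\left[\frac{1}{2}\ln\frac{\bar{x}H}{2\pi(\bar{x}-H)}
     -\left(\hat{\alpha}+\frac{1}{2}\right)\ln G
     +\hat{\alpha}\ln\frac{\bar{x}}{\hat{\alpha}}
     +\ln\Gamma(\hat{\alpha})+\hat{\alpha}-\frac{1}{2}\right].
\end{align*}
```

Here $\sum\ln x_i=n\ln G$ and $\sum 1/x_i=n/H$, which is how the geometric and harmonic means enter the statistic; the only quantity without a closed form is $\hat{\alpha}$.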
5. Analysis of Data
In this section, six data sets are used to apply formula (13) for discriminating between the two distributions.
5.1. Data Set (1)
Gacula and Kubala (1975) give the following data on shelf life (days) of a food product: 24, 24, 26, 26, 32, 32, 33, 33, 33, 35, 41, 42, 43, 47, 48, 48, 48, 50, 52, 54, 55, 57, 57, 57, 57, 61.
|K-S = 0.1386||K-S = 0.1378|
Both K-S values indicate an acceptable fit (i.e. the data are compatible with both distributions). However, the value of (13) is 1.1369 > 0, so the IG distribution is more suitable than the GAM distribution. Also, the K-S distance of IG is less than the K-S distance of GAM.
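The K-S distances quoted in this section are reported without the computation behind them. The following is a minimal Python sketch of how such a distance could be obtained for the fitted IG distribution, using the shelf-life data above. It assumes the usual closed-form IG distribution function in terms of the standard normal c.d.f. and the two-sided K-S distance; `norm_cdf`, `ig_cdf`, and `ks_distance` are hypothetical helper names, and no claim is made that this reproduces the exact procedure used for the reported figures.

```python
import math


def norm_cdf(z):
    """Standard normal c.d.f., written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def ig_cdf(x, mu, lam):
    """Inverse Gaussian c.d.f.; exp(2*lam/mu) can overflow if lam/mu is very large."""
    r = math.sqrt(lam / x)
    return (norm_cdf(r * (x / mu - 1.0))
            + math.exp(2.0 * lam / mu) * norm_cdf(-r * (x / mu + 1.0)))


def ks_distance(x, cdf):
    """Two-sided Kolmogorov-Smirnov distance between the sample and a c.d.f."""
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, xi in enumerate(xs, start=1):
        f = cdf(xi)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Data set (1): shelf life in days (Gacula and Kubala, 1975).
shelf_life = [24, 24, 26, 26, 32, 32, 33, 33, 33, 35, 41, 42, 43, 47, 48, 48,
              48, 50, 52, 54, 55, 57, 57, 57, 57, 61]

n = len(shelf_life)
mu_hat = sum(shelf_life) / n
lam_hat = n / sum(1.0 / xi - 1.0 / mu_hat for xi in shelf_life)

d_ig = ks_distance(shelf_life, lambda t: ig_cdf(t, mu_hat, lam_hat))
print(f"K-S distance for the fitted IG distribution: {d_ig:.4f}")
```

The same `ks_distance` function can be reused for the gamma fit by passing a gamma c.d.f., which requires the regularized incomplete gamma function and is therefore left out of this standard-library sketch.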
5.2. Data Set (2)
The second set gives data of precipitation (inches) from Jug Bridge, Maryland (Chhikara and Folks, 1978).
|K-S = 0.1458||K-S = 0.15|
Because the value of (13) is 1.8952 > 0, we conclude that the data are better fitted by the IG distribution.
5.3. Data Set (3)
Kumagai et al. (1989) presented the following time series of toluene exposure concentrations (8-hr TWAs) for a worker doing stain removal.
|K-S = 0.0973||K-S = 0.0952|
According to the K-S tests for the two distributions, the data are very well described by both. However, since the value of (13) is 2.4588 > 0, the IG distribution is the more reasonable choice.
5.4. Data Set (4)
Kumagai and Matsunaga (1995) give the following data: 1.5, 1.7, 2.1, 2.2, 2.4, 2.5, 2.6, 3.8, 3.8, 4.2, 4.3, 5.6, 6, 7, 7.5, 9.3, 9.9, 10.2, 10.6, 12.3, 12.9, 13.7, 14.1, 17.8, 27.6, 31, 42, 45.6, 51.9, 91.3, 131.8.
|K-S = 0.2205||K-S = 0.088|
The value of (13) is 5.9404 > 0, which suggests that the IG distribution is to be preferred over the GAM distribution. According to the K-S test, these data are compatible with both distributions.
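The statistic for this data set can be recomputed from the three sample means. The following is a minimal Python sketch under stated assumptions: the statistic is taken to be the logarithm of the ratio of the maximized IG likelihood to the maximized gamma likelihood, written in terms of the arithmetic, geometric, and harmonic means; the gamma shape is found by a ternary search on the profile log-likelihood rather than by solving the digamma equation (a substitution made here for brevity; both have the same maximizer). The function name `log_lr_statistic` is hypothetical.

```python
import math


def log_lr_statistic(x):
    """Return (T, alpha_hat, lambda_hat), with T = ln(L_IG / L_GAM) at the m.l.e."""
    n = len(x)
    mean = sum(x) / n                                  # arithmetic mean
    log_g = sum(math.log(xi) for xi in x) / n          # log of the geometric mean
    harm = n / sum(1.0 / xi for xi in x)               # harmonic mean
    if mean - harm <= 0.0:
        raise ValueError("sample needs at least two distinct positive values")

    # Per-observation gamma profile log-likelihood (scale = mean / a);
    # it is concave in a, so a ternary search finds its maximizer.
    def profile(a):
        return a * math.log(a / mean) - math.lgamma(a) + (a - 1.0) * log_g - a

    hi = 1.0
    while profile(2.0 * hi) > profile(hi):             # bracket the maximizer
        hi *= 2.0
    lo, hi = 1e-8, 2.0 * hi
    for _ in range(300):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if profile(m1) < profile(m2):
            lo = m1
        else:
            hi = m2
    alpha = 0.5 * (lo + hi)

    lam = mean * harm / (mean - harm)                  # IG m.l.e. of lambda
    t = n * (0.5 * math.log(lam / (2.0 * math.pi))
             - (alpha + 0.5) * log_g
             + alpha * math.log(mean / alpha)
             + math.lgamma(alpha) + alpha - 0.5)
    return t, alpha, lam


# Data set (4) (Kumagai and Matsunaga, 1995).
data = [1.5, 1.7, 2.1, 2.2, 2.4, 2.5, 2.6, 3.8, 3.8, 4.2, 4.3, 5.6, 6, 7, 7.5,
        9.3, 9.9, 10.2, 10.6, 12.3, 12.9, 13.7, 14.1, 17.8, 27.6, 31, 42,
        45.6, 51.9, 91.3, 131.8]

t, alpha, lam = log_lr_statistic(data)
choice = "IG" if t > 0 else "GAM"
print(f"T = {t:.4f}, alpha_hat = {alpha:.4f}, lambda_hat = {lam:.4f} -> prefer {choice}")
```

On these data the statistic should come out close to the value reported above; small differences can arise from rounding in the listed observations or from the numerical tolerance of the search.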
5.5. Data Set (5)
These data represent the survival times in weeks for male rats (Lawless, 2003).
|K-S = 0.09221||K-S = 0.1561|
Both K-S values indicate an acceptable fit. However, the value of (13) is -1.152 < 0, so the GAM distribution is more suitable than the IG distribution.
5.6. Data Set (6)
The following data are failure times (in minutes) of electronic components (Lawless, 2003).
|K-S = 0.10336||K-S = 0.25|
According to the K-S tests for the two distributions, the data are very well described by both. However, since the value of (13) is -22.0976 < 0, the GAM distribution is the more reasonable choice.