A Copula Based Test for a Two Component Bivariate Mixture Distribution

The paper presents a Copula based approach to test for a two component bivariate mixture distribution. The regular joint density is modeled by using the Copula, and the Locally Most Powerful (LMP) test is then derived from this Copula based regular density. This is a fairly simple test compared to the dip / depth test developed by Hartigan. Our simulation results show that this Copula based LMP test is very powerful in detecting a mixture.


Introduction
Mixtures of multivariate distributions are a recurring phenomenon in many fields, and the scientific community has long felt the need for a reasonably good hypothesis test. Wolfe (1970) used the likelihood ratio to test for a two-component normal mixture against a normal null hypothesis, but that test is very sensitive to the normality assumptions. Another test, developed by Engelman and Hartigan (1969), does not work when the bimodal alternative is not a normal mixture. The test developed by Giacomelli et al. (1971) needs the modes to be specified in advance. The 'Dip Test of Unimodality' developed by Hartigan and Hartigan (1985) requires a fairly large sample to work. In a way, the presence of multiple modes is an indication that the data follow a mixture of distributions. Ironically, however, a mixture of distributions need not possess multiple modes. Moreover, the maximum likelihood estimate for the density may not exist in the neighborhood of the mode, and the situation becomes trickier in the case of multivariate distributions. So, when the "Dip" test fails to detect multiple modes, that does not necessarily mean that the underlying distribution is not a mixture of distributions.
Copulas, on the other hand, are a general way of formulating a multivariate distribution so that the dependence can be infused in a reasonable manner. This is based on the simple idea that the joint distribution can be represented as a transformation of the underlying marginal distributions (see Sklar 1959). There are several copulas, and they differ according to the strength of the dependence and the direction of the association. For a literature review, interested readers are referred to Nelsen (2006). In this paper, we consider two particular copulas: the Gaussian Copula and the Clayton Copula.
The Clayton Copula belongs to the family of Archimedean Copulas, whereas the Gaussian Copula does not. The Copulas are used to construct the regular joint density, and a Locally Most Powerful (LMP) test (see Rao (1973)) is then derived from this copula joint density and used in place of the "Dip" test to test for a possible mixture of multivariate (in this case, bivariate) distributions. Here each component of the mixture is a bivariate distribution. The LMP test is seen to be very powerful in detecting whether a distribution is a mixture or not in this bivariate situation.
The paper is organized as follows. The first two sections contain the introduction and the background on copulas, the third section contains the methodology, the fourth section presents the numerical results, and the last section gives the conclusion.

Copulas
In this section, we present the results pertaining to the copulas.

Definition
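A copula is a multivariate joint distribution defined on the $k$-dimensional unit cube $[0,1]^k$ such that every marginal distribution is uniform on $[0,1]$. In particular, $C(u) = u_i$ whenever $u \in [0,1]^k$ has all the components equal to 1 except the $i$th one, which is equal to $u_i$.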

Copula Construction
Here, we discuss the construction of the Copulas. There are two major types of Copulas: Archimedean Copulas and non-Archimedean Copulas.
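Archimedean Copula: This is a family of copulas, and the $k$-dimensional Archimedean Copula is defined as follows.
$$C(u_1, \ldots, u_k) = \Psi^{-1}\bigl(\Psi(u_1) + \cdots + \Psi(u_k)\bigr),$$
where $\Psi$ is known as the generator function and $u_i$ is the marginal distribution of the $i$th component. Any generator function which satisfies the following properties is the basis for a copula: $\Psi(1) = 0$, $\lim_{t \to 0^{+}} \Psi(t) = \infty$, $\Psi'(t) < 0$, and $\Psi''(t) > 0$.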

Clayton Copula
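Let the generator function be
$$\Psi(t) = t^{-\alpha} - 1, \qquad \alpha > 0.$$
One can show that the functional inverse is $\Psi^{-1}(s) = (1 + s)^{-1/\alpha}$. In the bivariate case, the Archimedean formulation yields the Clayton Copula as
$$C(u, v) = \bigl(u^{-\alpha} + v^{-\alpha} - 1\bigr)^{-1/\alpha}.$$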
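The construction can be checked numerically. The following is a minimal sketch, assuming the generator $\Psi(t) = t^{-\alpha} - 1$ with $\alpha > 0$ (the form with an extra factor $1/\alpha$ leads to the same copula); the function names and the value of `alpha` used below are illustrative choices, not taken from the paper.

```python
def clayton_generator(t, alpha):
    """Assumed Clayton generator psi(t) = t**(-alpha) - 1 for 0 < t <= 1, alpha > 0."""
    return t ** (-alpha) - 1.0


def clayton_generator_inverse(s, alpha):
    """Functional inverse psi^{-1}(s) = (1 + s)**(-1/alpha) for s >= 0."""
    return (1.0 + s) ** (-1.0 / alpha)


def archimedean_copula(us, alpha):
    """k-dimensional Archimedean form psi^{-1}(psi(u_1) + ... + psi(u_k))."""
    total = sum(clayton_generator(u, alpha) for u in us)
    return clayton_generator_inverse(total, alpha)


def clayton_copula(u, v, alpha):
    """Closed-form bivariate Clayton copula; arguments must lie in (0, 1]."""
    return (u ** (-alpha) + v ** (-alpha) - 1.0) ** (-1.0 / alpha)
```

For any `u` and `v` in (0, 1], `archimedean_copula([u, v], alpha)` and `clayton_copula(u, v, alpha)` should agree up to rounding, and `clayton_copula(u, 1.0, alpha)` should return `u`, which is the uniform-margin property from the definition.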

Methodology
The two component situation arises in many contexts. It is very common in educational, econometric, and astrophysical data.
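Suppose that we have a bivariate two component mixture with the component densities given by $f_1$ and $f_2$ respectively, with $p$ as the mixing proportion. Then, the mixture density function is given by
$$f(x, y) = (1 - p)\, f_1(x, y) + p\, f_2(x, y), \tag{3.1}$$
and the likelihood function is
$$L(p) = \prod_{i=1}^{n} \bigl[(1 - p)\, f_1(x_i, y_i) + p\, f_2(x_i, y_i)\bigr].$$
By taking the natural logarithm, we have
$$\ln L(p) = \sum_{i=1}^{n} \ln \bigl[(1 - p)\, f_1(x_i, y_i) + p\, f_2(x_i, y_i)\bigr].$$
Next, taking the partial derivative with respect to $p$ and then evaluating the derivative at $p = 0$ yields
$$\left. \frac{\partial \ln L(p)}{\partial p} \right|_{p = 0} = \sum_{i=1}^{n} \frac{f_2(x_i, y_i) - f_1(x_i, y_i)}{f_1(x_i, y_i)} = \sum_{i=1}^{n} \frac{f_2(x_i, y_i)}{f_1(x_i, y_i)} - n.$$
This means that in order to test the hypothesis $H_0: p = 0$ versus $H_1: p > 0$, the LMP test rejects $H_0$ for large values of
$$T = \sum_{i=1}^{n} \frac{f_2(x_i, y_i)}{f_1(x_i, y_i)}. \tag{3.5}$$
As we can see from (3.5), the test statistic involves the ratio of the joint density functions. Here, we will use the Clayton Copula to construct the joint density function from the marginal distributions.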
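The derivation can be illustrated with a short computation. The sketch below assumes the mixture is written as $(1 - p) f_1 + p f_2$, so that the derivative of the log-likelihood at $p = 0$ equals $\sum_i f_2(x_i, y_i)/f_1(x_i, y_i) - n$. The helper `independent_exponential_density` is a hypothetical stand-in for the component densities, used only so that the functions can be exercised; it is not the copula density used in the paper.

```python
import math


def log_likelihood(p, sample, f1, f2):
    """Log-likelihood of the mixture (1 - p) * f1 + p * f2."""
    return sum(math.log((1.0 - p) * f1(x, y) + p * f2(x, y)) for x, y in sample)


def lmp_statistic(sample, f1, f2):
    """Sum of the density ratios f2 / f1 over the sample."""
    return sum(f2(x, y) / f1(x, y) for x, y in sample)


def score_at_null(sample, f1, f2):
    """Derivative of the log-likelihood with respect to p, evaluated at p = 0."""
    return lmp_statistic(sample, f1, f2) - len(sample)


def independent_exponential_density(mean_x, mean_y):
    """Hypothetical component density with independent exponential marginals."""
    def density(x, y):
        return math.exp(-x / mean_x - y / mean_y) / (mean_x * mean_y)
    return density
```

A finite difference of `log_likelihood` around `p = 0` should agree with `score_at_null` on any sample, which is a quick way to confirm the algebra of the score.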

Joint Density Construction
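Clayton Copula: According to (3.1), the mixture distribution is given by
$$F(x, y) = (1 - p)\, F_1(x, y) + p\, F_2(x, y),$$
and by using the Clayton Copula, one can write the joint distribution function of a component with marginal distribution functions $F_X$, $F_Y$ and marginal densities $f_X$, $f_Y$ as
$$H(x, y) = \bigl(F_X(x)^{-\alpha} + F_Y(y)^{-\alpha} - 1\bigr)^{-1/\alpha}.$$
Note that the partial derivative with respect to $x$ is
$$\frac{\partial H}{\partial x} = f_X(x)\, F_X(x)^{-\alpha - 1} \bigl(F_X(x)^{-\alpha} + F_Y(y)^{-\alpha} - 1\bigr)^{-1/\alpha - 1}.$$
Similarly, taking the partial derivative of this expression with respect to $y$ gives the joint density
$$h(x, y) = (1 + \alpha)\, f_X(x)\, f_Y(y)\, \bigl(F_X(x)\, F_Y(y)\bigr)^{-\alpha - 1} \bigl(F_X(x)^{-\alpha} + F_Y(y)^{-\alpha} - 1\bigr)^{-1/\alpha - 2}.$$
For exponential marginals, $F_X(x) = 1 - e^{-x/a}$ and $F_Y(y) = 1 - e^{-y/b}$, where $a, b$ are the respective means of $X, Y$ under the null distribution.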
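As a numerical companion to the density construction, here is a minimal sketch of the Clayton joint distribution function and joint density, assuming exponential marginals with means `mean_x` and `mean_y` (an assumption suggested by the numerical results, where exponential marginals are used). The function names are illustrative.

```python
import math


def exponential_cdf(x, mean):
    """Exponential distribution function 1 - exp(-x / mean), computed stably."""
    return -math.expm1(-x / mean)


def exponential_pdf(x, mean):
    """Exponential density exp(-x / mean) / mean."""
    return math.exp(-x / mean) / mean


def clayton_exp_joint_cdf(x, y, mean_x, mean_y, alpha):
    """Joint distribution function: Clayton copula applied to exponential marginals."""
    u = exponential_cdf(x, mean_x)
    v = exponential_cdf(y, mean_y)
    return (u ** (-alpha) + v ** (-alpha) - 1.0) ** (-1.0 / alpha)


def clayton_copula_density(u, v, alpha):
    """Mixed second partial derivative of the Clayton copula, for u, v in (0, 1]."""
    s = u ** (-alpha) + v ** (-alpha) - 1.0
    return (1.0 + alpha) * (u * v) ** (-alpha - 1.0) * s ** (-1.0 / alpha - 2.0)


def clayton_exp_joint_density(x, y, mean_x, mean_y, alpha):
    """Joint density: copula density at the marginal CDFs times the marginal densities."""
    u = exponential_cdf(x, mean_x)
    v = exponential_cdf(y, mean_y)
    return (clayton_copula_density(u, v, alpha)
            * exponential_pdf(x, mean_x) * exponential_pdf(y, mean_y))
```

The density ratio in the test statistic is then the quotient of two calls to `clayton_exp_joint_density`, one with the marginal means of the second component and one with those of the null component. A central finite difference of `clayton_exp_joint_cdf` in both arguments is a convenient check on the closed form.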
Example: Let us suppose that in a given situation, only the marginal distributions are known to us and that ( ) are the marginal distributions of the second component.
Here, we assume that the joint density can be modeled by using the Clayton Copula.
So, according to (3.8), the density ratio is as follows.

Numerical Results
We ran a large number of simulations, generating samples of size $n = 100$. In order to generate the bivariate exponential sample, we set the parameters 1 0.7, for the respective marginal exponential distributions. Here is how we generated the bivariate exponential sample with the dependence parameter .
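One standard way to draw such a sample is conditional inversion of the Clayton Copula followed by the inverse exponential distribution function. The sketch below is an assumption about how this could be done and is not necessarily the procedure used for the results reported here; the function name and its arguments are illustrative. Given $u$ and an independent uniform $w$, it solves $\partial C(u, v)/\partial u = w$ for $v$.

```python
import math
import random


def sample_clayton_exponential(n, mean_x, mean_y, alpha, rng):
    """Draw n pairs with exponential marginals linked by a Clayton copula (alpha > 0)."""
    pairs = []
    while len(pairs) < n:
        u, w = rng.random(), rng.random()
        if u == 0.0 or w == 0.0:
            continue  # random() may return 0.0, where the negative powers are undefined
        # Conditional inversion: solve dC(u, v)/du = w for v.
        v = (u ** (-alpha) * (w ** (-alpha / (1.0 + alpha)) - 1.0) + 1.0) ** (-1.0 / alpha)
        if v >= 1.0:
            continue  # rare rounding case; avoids log1p(-1.0)
        # Inverse exponential CDF: x = -mean * log(1 - u).
        pairs.append((-mean_x * math.log1p(-u), -mean_y * math.log1p(-v)))
    return pairs
```

For example, `sample_clayton_exponential(100, 1.0, 2.0, 2.0, random.Random(0))` returns 100 pairs. For the Clayton Copula, Kendall's tau equals $\alpha / (\alpha + 2)$, so the sample rank correlation gives a simple sanity check on the dependence parameter.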

Power Computation (Clayton Copula based)
Here, we present the estimated power of the hypothesis test for various choices of the mixing proportion based on 1000 simulation runs from the Clayton Copula model. We used the same set of values that were used in the previous example.
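A power estimate of this kind can be sketched as a Monte Carlo loop. The code below is an illustration under stated assumptions, not the paper's implementation: both components use a Clayton Copula with the same `alpha` and exponential marginals, the marginal means in `null_means` and `alt_means` are hypothetical values, and the critical value is taken as an empirical quantile of the statistic simulated under $p = 0$.

```python
import math
import random


def clayton_exp_density(x, y, mean_x, mean_y, alpha):
    """Joint density: Clayton copula with exponential marginals."""
    u = -math.expm1(-x / mean_x)
    v = -math.expm1(-y / mean_y)
    s = u ** (-alpha) + v ** (-alpha) - 1.0
    copula = (1.0 + alpha) * (u * v) ** (-alpha - 1.0) * s ** (-1.0 / alpha - 2.0)
    return copula * math.exp(-x / mean_x - y / mean_y) / (mean_x * mean_y)


def draw_pair(rng, mean_x, mean_y, alpha):
    """One draw by conditional inversion of the Clayton copula."""
    while True:
        u, w = rng.random(), rng.random()
        if u == 0.0 or w == 0.0:
            continue
        v = (u ** (-alpha) * (w ** (-alpha / (1.0 + alpha)) - 1.0) + 1.0) ** (-1.0 / alpha)
        if v < 1.0:
            return -mean_x * math.log1p(-u), -mean_y * math.log1p(-v)


def draw_sample(rng, n, p, null_means, alt_means, alpha):
    """n draws from the mixture (1 - p) * f1 + p * f2."""
    sample = []
    for _ in range(n):
        means = alt_means if rng.random() < p else null_means
        sample.append(draw_pair(rng, means[0], means[1], alpha))
    return sample


def lmp_statistic(sample, null_means, alt_means, alpha):
    """Sum of density ratios f2 / f1 over the sample."""
    return sum(
        clayton_exp_density(x, y, alt_means[0], alt_means[1], alpha)
        / clayton_exp_density(x, y, null_means[0], null_means[1], alpha)
        for x, y in sample
    )


def estimate_power(p, n=100, reps=1000, level=0.05,
                   null_means=(1.0, 1.0), alt_means=(0.3, 0.3), alpha=2.0, seed=0):
    """Fraction of simulated mixture samples whose statistic exceeds the critical value."""
    rng = random.Random(seed)

    def simulated_statistics(mix):
        return [lmp_statistic(draw_sample(rng, n, mix, null_means, alt_means, alpha),
                              null_means, alt_means, alpha) for _ in range(reps)]

    # Empirical critical value from samples generated under the null (p = 0).
    null_stats = sorted(simulated_statistics(0.0))
    critical = null_stats[min(reps - 1, math.ceil((1.0 - level) * reps))]
    return sum(t > critical for t in simulated_statistics(p)) / reps
```

Calling `estimate_power(p)` for a grid of mixing proportions gives one estimated power value per proportion; at `p = 0.0` the result estimates the size of the test and should be close to `level`.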

Gaussian Mixture Copula Model
Here, we will investigate a mixture of Gaussian Copulas. Consider the following two component Gaussian mixture.

Power Computation (Gaussian Copula Based)
Here, we present the estimated power of the hypothesis test for various choices of the mixing proportion based on 1000 simulation runs from the Gaussian Copula model. We used the same set of values that were used in the previous example. In order to describe the dependence, we found the corresponding correlation coefficients.
For this hypothesis test, $H_0: p = 0$ (no mixture) versus $H_1: p > 0$ (mixture). Note that in Table 2, $p$ represents the mixing proportion.
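For the Gaussian Copula based test, the component densities in the ratio are built from the Gaussian Copula density rather than the Clayton one. The sketch below uses the standard closed form of the bivariate Gaussian Copula density with correlation `rho`; the function name is illustrative, and the correlation values actually used in the simulations are not assumed here.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the quantile transform


def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density for 0 < u, v < 1 and -1 < rho < 1."""
    a = _STD_NORMAL.inv_cdf(u)  # normal scores of the uniform arguments
    b = _STD_NORMAL.inv_cdf(v)
    one_minus_rho2 = 1.0 - rho * rho
    # Ratio of the bivariate normal density to the product of its marginals.
    quad = (rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * one_minus_rho2)
    return math.exp(-quad) / math.sqrt(one_minus_rho2)
```

A joint density with given marginals is obtained by evaluating this function at the marginal distribution functions and multiplying by the marginal densities. With `rho = 0.0` the density is identically 1, which corresponds to independence.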

Power Comparison of Clayton and Gaussian Copulas
The power computations in Tables 1 and 2 show that the Gaussian Copula based method is slightly better in this case at detecting a mixture than the Clayton Copula based method. Overall, both methods perform very well.

Conclusion and Discussion
As we can see from the power computations, our method is not only computationally very easy but also very powerful. The test easily detects bivariate mixture distributions. In this paper, only the Clayton and the Gaussian Copulas were considered. The author proposes to consider other types of Copulas in future work.