Two Factor Data Analysis with Unequal Cell Frequencies and Interaction

This paper proposes a nonparametric method for two-factor data analysis with unequal cell frequencies and interaction. A chi-square test statistic is developed for testing the null hypotheses of no treatment effect and of no interaction between factor A and factor B. The proposed method is illustrated with data and compared with the usual method of unweighted means. The results show that the proposed method is more powerful than the method of unweighted means.


Introduction
Analysis of variance (ANOVA) is generally regarded as the best analysis technique for balanced experiments, that is, experiments with an equal number of subjects in each group (cells with equal frequency) [3]. Just as it may often be too difficult or too expensive to obtain more than one observation per treatment combination, it may also prove impossible to obtain an equal number of observations per cell in a two-factor analysis. For example, even though an experiment was planned with an equal number of observations per cell, some of the observations may end up missing for various reasons.
Missingness has been classified into three types [4]: missing completely at random (MCAR), where the probability that data are missing does not depend on either the observed or the unobserved data; missing at random (MAR), where, conditional on the observed data, the probability that data are missing does not depend on the unobserved data; and missing not at random (MNAR), where the probability that data are missing depends on the unobserved data even after conditioning on the observed data. When data with unequal cell frequencies are not too far from the equal-frequency case, it is sometimes possible to use approximate procedures that convert the former to the latter. In practice, a judgment must be made as to whether the data are close enough to the equal-frequency case for the approximation introduced to be relatively unimportant [1]. Two-way ANOVA with unequal cell frequencies and without the assumption of equal error variances has been considered using a generalized approach to finding p-values [5]. However, when the sample size is not the same for all treatment combinations in a two-factor ANOVA, the analysis of factor effects becomes more complicated and the usual calculations are no longer directly applicable [7], [9]. In this situation, the easiest exact way to obtain the proper sums of squares for testing factor effects and interactions is through the regression approach [8]. Approximate methods also exist, including the so-called method of weighted means, provided all the assumptions of ANOVA are satisfied [6], [10], [1]. We therefore present an alternative nonparametric method that accommodates the different factor effects and the interaction effects.

Methodology
Let $x_{ilj}$ be the $i$th observation at the $l$th level of factor A and the $j$th level of factor B, and let $n_{lj}$ be the number of observations in the $(l, j)$th cell, so that $i = 1, 2, \ldots, n_{lj}$. An analysis based on the unweighted means uses the cell means [6]

$$\bar{x}_{lj} = \frac{1}{n_{lj}} \sum_{i=1}^{n_{lj}} x_{ilj} \qquad (1)$$

treating the $\bar{x}_{lj}$'s as if they were single observations for the treatment combination per cell. However, the sums of squares are no longer additive, in the sense that the individual sums of squares no longer add up to the total sum of squares. The error sum of squares must now be calculated directly and independently from its basic definition, which may sometimes be more time consuming. Hence, instead of using the unweighted means approach, we propose a method based on the ranks of the sample observations. The resulting rank-based test statistics have chi-square distributions with the appropriate degrees of freedom; in particular, the interaction statistic may be used to test the null hypothesis of no factor A by factor B interaction effects.
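The cell means used by the unweighted means approach can be computed as in the following minimal sketch. The function name `cell_means`, the representation of the data as (l, j, x) triples, and the data values are assumptions made for illustration only; they are not taken from the paper or its tables.

```python
from collections import defaultdict


def cell_means(observations):
    """Return {(l, j): mean of the observations in cell (l, j)}.

    `observations` is an iterable of (l, j, x) triples, where l and j are
    the levels of factors A and B. Cells may hold different numbers of
    observations (unequal cell frequencies).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for l, j, x in observations:
        totals[(l, j)] += x
        counts[(l, j)] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}


# Hypothetical data (not from Table 1): two levels of A, two levels of B.
data = [
    (1, 1, 3.0), (1, 1, 4.0),
    (1, 2, 2.0),
    (2, 1, 5.0), (2, 1, 3.0), (2, 1, 4.0),
    (2, 2, 1.0), (2, 2, 2.0),
]
means = cell_means(data)
```

Each cell contributes a single value to the subsequent balanced analysis regardless of how many observations it holds, which is why the method is called unweighted.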
In two-factor analysis, a null hypothesis that is usually of interest is that there are no treatment effects. If this null hypothesis is rejected, one may proceed to test the null hypotheses that the effects of each of factors A and B are zero, provided the interaction effects have been found not to be statistically significant or have been removed by an appropriate data transformation.
The null hypothesis of no treatment effect is tested using the chi-square statistic of Equation (4). The null hypothesis is rejected at the α level of significance if the computed chi-square value is greater than the chi-square critical value with the appropriate degrees of freedom. If this null hypothesis is rejected, we would then need to first test the null hypothesis of no significant factor A by factor B interaction effects. This null hypothesis is tested using the chi-square statistic for interaction in Equation (11).
The null hypothesis of no factor A by factor B interaction effect is rejected at the α level of significance if the chi-square value of Equation (10) or (11) is greater than the chi-square critical value with the appropriate degrees of freedom. If this null hypothesis is not rejected, one may then proceed to test the null hypotheses about factor A and factor B effects using Equations (8) and (9).
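The rejection rule used in each of these tests, comparing a computed chi-square value with the critical value at level α, can be sketched as follows. This is only an illustration under stated assumptions: the function names `chi2_sf` and `reject_null`, the statistic values and the degrees of freedom are hypothetical, and the statistics themselves are not computed here. The upper-tail probability is obtained from a standard series for the regularized incomplete gamma function, which is adequate for moderate values but is not a substitute for a statistical library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    # Series for the lower incomplete gamma function:
    # gamma(a, z) = z**a * exp(-z) * sum_n z**n / (a * (a + 1) * ... * (a + n))
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    # Divide by Gamma(a) on the log scale to obtain the regularized value.
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def reject_null(chi_square_value, df, alpha=0.05):
    """Reject H0 when the p-value is below alpha.

    This is equivalent to the statistic exceeding the chi-square critical
    value with df degrees of freedom at significance level alpha.
    """
    return chi2_sf(chi_square_value, df) < alpha


# Hypothetical statistic values and degrees of freedom, not from the paper.
decision_large = reject_null(12.4, df=3)
decision_small = reject_null(2.1, df=3)
```

Working with the p-value avoids inverting the distribution to find the critical value, while leading to the same decision.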

Illustration
We use data on the final cumulative grade point average (FCGPA) of students who graduated in statistics from a certain university, classified by state of origin, for four years. The data are presented in Table 1 [10]. Using the unweighted means approach, we obtain the entries in Table 2 from Equation (1). The data in Table 2 are subjected to the standard balanced ANOVA technique without interaction to obtain the sums of squares, and the result of the analysis is presented in Table 3.

The proposed method
The observations are pooled together and assigned ranks. In the presence of tied observations, the mean of their ranks is assigned to each of them. The individual observations are then replaced with their ranks, as presented in Table 4. The ranks in each cell are summed to obtain $R_{lj}$, and the sums are presented in Table 5. From Table 5, the chi-square values for the sources of variation Total SS, SST, SSA, SSB and SSE were obtained and are presented in Table 6.
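The ranking step described above (pooling the observations, assigning mean ranks to ties, and summing the ranks within each cell) can be sketched as follows. The function names `midranks` and `cell_rank_sums`, the (l, j, x) data layout, and the data values are assumptions for illustration; the values are not those of Table 1, and the chi-square statistics built from the rank sums are not reproduced here.

```python
from collections import defaultdict


def midranks(values):
    """Rank pooled values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Mean of the ranks start+1, ..., end+1 occupied by this tied block.
        mean_rank = (start + 1 + end + 1) / 2.0
        for k in order[start:end + 1]:
            ranks[k] = mean_rank
        start = end + 1
    return ranks


def cell_rank_sums(observations):
    """Return {(l, j): sum of pooled-sample ranks in cell (l, j)}.

    `observations` is a list of (l, j, x) triples.
    """
    ranks = midranks([x for _, _, x in observations])
    sums = defaultdict(float)
    for (l, j, _), r in zip(observations, ranks):
        sums[(l, j)] += r
    return dict(sums)


# Hypothetical grade-point data (not the values in Table 1).
data = [
    (1, 1, 3.2), (1, 1, 2.8),
    (1, 2, 3.2),
    (2, 1, 4.1), (2, 1, 2.5), (2, 1, 3.6),
    (2, 2, 2.8),
]
rank_sums = cell_rank_sums(data)
```

A quick check on any such computation is that the cell rank sums add up to N(N + 1)/2, where N is the total number of pooled observations.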

Conclusion
In this paper, we have proposed a nonparametric method for two-factor data analysis with unequal cell frequencies and interaction. This was done by using the ranks of the sample observations to obtain chi-square statistics for testing the null hypotheses of no treatment effect and of no interaction between factor A and factor B.
Further, the application of the proposed method was studied in practice using a real-life example on students' final cumulative grade point average (FCGPA) and the state of origin of these students. The chi-square test statistics were computed using the proposed method, and the results obtained were better than those from the method of unweighted means.