Intrinsically Ties Adjusted Partial Tau (C-Tap) Correlation Coefficient

This paper presents a non-parametric statistical method for estimating a partial correlation coefficient that is intrinsically adjusted for tied observations in the data. The method, based on a modification of the method of estimating the tau correlation coefficient, may be used when the populations of interest are measurements on as low as the ordinal scale that are not necessarily continuous or even numeric. The estimated partial correlation coefficient is built from weighted averages of the tau estimates obtained when each population in turn has its assigned ranks arranged in their natural order while the other has its assigned ranks tagged along, with the weights being functions of the number of tied observations in each population. It is shown that failure to adjust for ties tends to lead to an underestimation of the true partial correlation coefficient, an effect that increases with the number of ties in the data. The proposed method is illustrated with some data and shown to compare favorably with the Kendall approach.


Introduction
Kendall [4] proposed a non-parametric method for estimating the simple correlation coefficient, as well as any desired partial correlation coefficient, between two samples drawn from two populations while holding constant the values of observations drawn from a third population.
In Kendall's approach, the populations of interest may be measurements on as low as the ordinal scale and need not be continuous or normally distributed.
Building on [4], [1] proposed a more formalized non-parametric statistical method for estimating a partial correlation coefficient between two variables, X and Y say, when a third variable, Z say, is held constant. As in Kendall's approach, these variables need not be continuous or normally distributed.
In this paper, we propose another non-parametric method for estimating a partial correlation coefficient, one that is intrinsically adjusted for tied observations in the data.
Apart from not needing to be continuous or normally distributed, the data need not even be numeric. The proposed method is more general: it is intrinsically adjusted for tied observations in any of the three populations, and it covers both equal and unequal numbers of tied observations in the sampled populations, cases that [4] and [1] did not consider.
Now, for samples of size $n$, let

$$S_{\max} = \frac{n(n-1)}{2} \qquad (1)$$

be the maximum possible total number of agreements between the ranks assigned to observations from populations X and Y, which is attained when the rankings of observations from Y and X are both in their natural order [4]. Then

$$r_k = \frac{S_a}{S_{\max}} \qquad (2)$$

This is the rationale of Kendall's tau correlation coefficient and is the basis of estimation, where $S_a$ is the total sum of the 1s (+) and -1s (-) obtained by comparing each pair of ranks assigned to Y when the observations are arranged in the natural order of the ranks of the observations from X.
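The rationale of Equation 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `kendall_tau_unadjusted` is hypothetical, $S_{\max}$ is taken to be $n(n-1)/2$, and no correction for ties is made (tied pairs simply score 0).

```python
from itertools import combinations


def kendall_tau_unadjusted(x_ranks, y_ranks):
    """Return S_a / S_max for paired ranks, with no correction for ties."""
    n = len(x_ranks)
    # Arrange the pairs in the natural order of the X ranks; Y ranks are tagged along.
    y_tagged = [y for _, y in sorted(zip(x_ranks, y_ranks), key=lambda p: p[0])]
    s_a = 0
    for j, k in combinations(range(n), 2):  # every pair with j < k
        if y_tagged[k] > y_tagged[j]:
            s_a += 1   # pair is in its natural order
        elif y_tagged[k] < y_tagged[j]:
            s_a -= 1   # pair is out of order
    s_max = n * (n - 1) // 2
    return s_a / s_max
```

With ranks that agree perfectly the sketch returns 1, and with ranks in exactly reversed order it returns -1.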
Using Equation 2 for any two samples of equal size, we estimate a partial correlation coefficient between two populations X and Y when the values of a third population Z are held constant. The partial correlation coefficient based on the simple tau correlation coefficients is expressed as [4], [5], [7]

$$r_{xy.z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}} \qquad (3)$$

where $r_{xy}$, $r_{xz}$ and $r_{yz}$ are respectively the tau correlation coefficients between observations from populations X and Y, X and Z, and Y and Z. The methods of [4], [5] for obtaining $r_{xy.z}$ are very tedious in practice. Moreover, Kendall's basic formulae make no provision for ties in the data.
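As a small illustration of how the three simple tau coefficients combine, the following Python sketch assumes that Equation 3 takes the standard Kendall partial tau form, $(r_{xy} - r_{xz} r_{yz}) / \sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}$. The function name `partial_tau` is hypothetical.

```python
import math


def partial_tau(r_xy, r_xz, r_yz):
    """Partial tau between X and Y holding Z constant (assumed form of Equation 3)."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        # Undefined when X or Y is perfectly correlated with Z.
        raise ValueError("partial tau is undefined when |r_xz| or |r_yz| equals 1")
    return (r_xy - r_xz * r_yz) / denom
```

When Z is uncorrelated with both X and Y, the partial coefficient reduces to the simple coefficient $r_{xy}$.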
Here, we propose to develop, using Equations 2 and 3, a more formalized non-parametric statistical method for estimating the partial correlation coefficient between two variables, X and Y say, when a third variable, Z say, is held constant, where these variables are not necessarily continuous or normally distributed but are measured on at least the ordinal scale.
The proposed method is more general and covers cases in which there are no tied observations in any of the three populations, as well as cases in which there are equal or unequal numbers of tied observations in the sampled populations.
The method is referred to as C-TAP, for ties adjusted partial tau correlation coefficient, which distinguishes it from the usual Kendall partial tau correlation coefficient.

The Proposed Method
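Consider the variables $x_i$, $y_i$ and $z_i$ for the $i$th observation in a random sample of size $n$ drawn from populations X, Y and Z respectively. In this method, populations X, Y and Z may be measurements on as low as the ordinal scale that are not necessarily continuous or even numeric. We define $r_{ix}$ as the rank assigned to $x_i$, $r_{iy}$ as the rank assigned to $y_i$ and $r_{iz}$ as the rank assigned to $z_i$, for $i = 1, 2, 3, \ldots, n$; as usual, tied observations in each sample are assigned their mean ranks. In the notation below, a leading subscript, as in ${}_{z}r_{x}$, identifies the population whose ranks are arranged in their natural order, and the other subscript identifies the population whose ranks are tagged along. Then let

$$u_{jk;x.z} = \begin{cases} 1, & \text{if } r_{jx} < r_{kx} \\ 0, & \text{if } r_{jx} = r_{kx} \\ -1, & \text{if } r_{jx} > r_{kx} \end{cases} \qquad (4)$$

for $j = 1, 2, \ldots, n-1$; $k = 2, 3, \ldots, n$; $j < k$ (see [1]). That is, $u_{jk;x.z}$ is defined provided that the rank assigned to the $k$th observation from population X comes after, that is, succeeds, the rank assigned to the $j$th observation from the same population when these observations are arranged in accordance with the natural ordering of the ranks of the corresponding sister observations from population Z ($j < k$). Let

$$\pi^{+}_{z.x} = P(u_{jk;x.z} = 1); \quad \pi^{0}_{z.x} = P(u_{jk;x.z} = 0); \quad \pi^{-}_{z.x} = P(u_{jk;x.z} = -1)$$

where $\pi^{+}_{z.x} + \pi^{0}_{z.x} + \pi^{-}_{z.x} = 1$. From Equation (4) and these probabilities, we have that

$$E(u_{jk;x.z}) = \pi^{+}_{z.x} - \pi^{-}_{z.x}; \qquad Var(u_{jk;x.z}) = \pi^{+}_{z.x} + \pi^{-}_{z.x} - \left(\pi^{+}_{z.x} - \pi^{-}_{z.x}\right)^{2}$$

Here $\pi^{+}_{z.x}$, $\pi^{0}_{z.x}$ and $\pi^{-}_{z.x}$ are respectively the probabilities that the rank assigned to the $j$th observation from population X is less than, equal to or greater than the rank assigned to the $k$th observation from the same population, if the rank assigned to this $k$th observation succeeds the rank assigned to the $j$th observation from X when these observations are arranged in accordance with the natural ordering of the ranks assigned to the observations from population Z. These probabilities are estimated as

$$\hat{\pi}^{+}_{z.x} = \frac{2\,{}_{z}f_{x}^{+}}{n(n-1)}; \quad \hat{\pi}^{0}_{z.x} = \frac{2\,{}_{z}f_{x}^{0}}{n(n-1)}; \quad \hat{\pi}^{-}_{z.x} = \frac{2\,{}_{z}f_{x}^{-}}{n(n-1)}$$

where ${}_{z}f_{x}^{+}$, ${}_{z}f_{x}^{0}$ and ${}_{z}f_{x}^{-}$ are respectively the numbers of 1s, 0s and -1s in the frequency distribution of the values of $u_{jk;x.z}$; $j = 1, 2, \ldots, n-1$; $k = 2, 3, \ldots, n$; $j < k$.

Hence the sample estimate of the total number of times the rankings of observations from population X are in their natural order and consistent with the natural ordering of the ranks of observations from population Z, less the number of times they are out of order, is

$${}_{z}S_{x} = {}_{z}f_{x}^{+} - {}_{z}f_{x}^{-} = \frac{n(n-1)}{2}\left(\hat{\pi}^{+}_{z.x} - \hat{\pi}^{-}_{z.x}\right)$$

As noted above, if these rankings are in their natural order then the maximum possible total number of arrangements or scores is $S_{\max} = n(n-1)/2$ (Equation (1)). Hence the Kendall tau correlation coefficient between observations from population X and observations from population Z, uncorrected for ties in X, may be estimated from Equation (2) as

$${}_{z}r_{x} = \frac{{}_{z}S_{x}}{S_{\max}} = \hat{\pi}^{+}_{z.x} - \hat{\pi}^{-}_{z.x} \qquad (12)$$

Note that if there are tied observations in X, then the estimated correlation coefficient ${}_{z}r_{x}$ of Equation (12) would not be an unbiased estimate of the true tau correlation coefficient between X and Z. This is because, even though the numerator ${}_{z}S_{x}$ has by specification been adjusted for possible ties in X, its denominator $S_{\max}$ has not been so adjusted. Therefore $S_{\max}$ needs to be adjusted. To do this, we subtract ${}_{z}f_{x}^{0}$, the number of tied pairs of observations in X, from $S_{\max}$ to obtain

$$S_{\max} - {}_{z}f_{x}^{0} = \frac{n(n-1)}{2}\left(1 - \hat{\pi}^{0}_{z.x}\right)$$

Hence an estimate of the tau correlation coefficient between X and Z adjusted for ties in X is

$${}_{z}r_{x.c} = \frac{{}_{z}S_{x}}{S_{\max} - {}_{z}f_{x}^{0}} = \frac{\hat{\pi}^{+}_{z.x} - \hat{\pi}^{-}_{z.x}}{1 - \hat{\pi}^{0}_{z.x}} = \frac{\hat{\pi}^{+}_{z.x} - \hat{\pi}^{-}_{z.x}}{\hat{\pi}^{+}_{z.x} + \hat{\pi}^{-}_{z.x}}$$

The unadjusted and adjusted tau correlation coefficients between observations from populations Y and Z are similarly estimated. Thus, having arranged the ranks assigned to observations from Z in their natural order and tagged along the ranks assigned to the corresponding observations from Y, we let

$$u_{jk;y.z} = \begin{cases} 1, & \text{if } r_{jy} < r_{ky} \\ 0, & \text{if } r_{jy} = r_{ky} \\ -1, & \text{if } r_{jy} > r_{ky} \end{cases} \qquad (18)$$

provided that the rank assigned to the $k$th observation from population Y comes after, that is, succeeds, the rank assigned to the $j$th observation from the same population when these observations are arranged in accordance with the natural ordering of the ranks of the corresponding sister observations from population Z ($j < k$). Similarly,

$$E(u_{jk;y.z}) = \pi^{+}_{z.y} - \pi^{-}_{z.y}; \qquad Var(u_{jk;y.z}) = \pi^{+}_{z.y} + \pi^{-}_{z.y} - \left(\pi^{+}_{z.y} - \pi^{-}_{z.y}\right)^{2}$$

Note that $\pi^{+}_{z.y}$, $\pi^{0}_{z.y}$ and $\pi^{-}_{z.y}$ are respectively the probabilities that the rank assigned to the $j$th observation from population Y is less than, equal to or greater than the rank assigned to the $k$th observation from the same population, if the rank assigned to the $k$th observation succeeds the rank assigned to the $j$th observation when the ranks assigned to these observations are arranged in accordance with the natural ordering of the ranks assigned to the observations from population Z. These probabilities are estimated as

$$\hat{\pi}^{+}_{z.y} = \frac{2\,{}_{z}f_{y}^{+}}{n(n-1)}; \quad \hat{\pi}^{0}_{z.y} = \frac{2\,{}_{z}f_{y}^{0}}{n(n-1)}; \quad \hat{\pi}^{-}_{z.y} = \frac{2\,{}_{z}f_{y}^{-}}{n(n-1)}$$

where ${}_{z}f_{y}^{+}$, ${}_{z}f_{y}^{0}$ and ${}_{z}f_{y}^{-}$ are respectively the numbers of 1s, 0s and -1s in the frequency distribution of the values of $u_{jk;y.z}$; $j = 1, 2, \ldots, n-1$; $k = 2, 3, \ldots, n$; $j < k$.

Hence the sample estimate of the total number of times the rankings of observations from population Y are in their natural order and consistent with the natural ordering of the ranks of observations from population Z, less the number of times they are out of order, is ${}_{z}S_{y} = {}_{z}f_{y}^{+} - {}_{z}f_{y}^{-}$. As before, the tau correlation coefficient between Y and Z unadjusted for ties in Y is estimated as

$${}_{z}r_{y} = \frac{{}_{z}f_{y}^{+} - {}_{z}f_{y}^{-}}{n(n-1)/2} = \hat{\pi}^{+}_{z.y} - \hat{\pi}^{-}_{z.y}$$

The corresponding estimated variance is

$$Var({}_{z}r_{y}) = \frac{2\left[\hat{\pi}^{+}_{z.y} + \hat{\pi}^{-}_{z.y} - \left(\hat{\pi}^{+}_{z.y} - \hat{\pi}^{-}_{z.y}\right)^{2}\right]}{n(n-1)}$$

Now the tau correlation coefficient between Y and Z adjusted for ties in Y is estimated as

$${}_{z}r_{y.c} = \frac{\hat{\pi}^{+}_{z.y} - \hat{\pi}^{-}_{z.y}}{\hat{\pi}^{+}_{z.y} + \hat{\pi}^{-}_{z.y}}$$

To estimate the tau correlation coefficient between observations from population X and observations from population Y, we let

$$u_{jk;xy.z} = \begin{cases} 1, & \text{if } r_{jx} < r_{ky} \\ 0, & \text{if } r_{jx} = r_{ky} \\ -1, & \text{if } r_{jx} > r_{ky} \end{cases} \qquad (28)$$

provided that the rank assigned to the $k$th observation in the sample drawn from population Y comes after, that is, succeeds, the rank assigned to the $j$th observation in the sample drawn from population X when these observations are arranged in accordance with the natural ordering of the ranks of the corresponding sister observations from population Z ($j < k$). Then

$$E(u_{jk;xy.z}) = \pi^{+}_{xy} - \pi^{-}_{xy}; \qquad Var(u_{jk;xy.z}) = \pi^{+}_{xy} + \pi^{-}_{xy} - \left(\pi^{+}_{xy} - \pi^{-}_{xy}\right)^{2}$$

Note that $\pi^{+}_{xy}$, $\pi^{0}_{xy}$ and $\pi^{-}_{xy}$ are respectively the probabilities that the rank assigned to the $j$th observation from population X is less than, equal to or greater than the rank assigned to the $k$th observation from population Y, when the observations have been arranged so that the ranks assigned to them correspond with the natural order of the ranks of their sister observations from population Z, with the rank assigned to the $k$th observation in Y succeeding the rank assigned to the $j$th observation in X. Following a similar argument as above, the tau correlation coefficient between X and Y unadjusted for ties between these populations is estimated as

$$r_{xy} = \hat{\pi}^{+}_{xy} - \hat{\pi}^{-}_{xy}$$

The corresponding tau correlation coefficient adjusted for ties between X and Y is estimated as

$$r_{xy.c} = \frac{\hat{\pi}^{+}_{xy} - \hat{\pi}^{-}_{xy}}{\hat{\pi}^{+}_{xy} + \hat{\pi}^{-}_{xy}} \qquad (39)$$

As noted above, in the presence of ties in the data, the estimated tau correlation coefficient is not independent of which of the two populations being correlated has its assigned ranks arranged in their natural order and which has its assigned ranks tagged along. To adjust for this effect we need to let each of the two sets of ranks play each of the two roles in turn. Thus to estimate ${}_{x}r_{z}$, the correlation coefficient between X and Z when X has its assigned ranks arranged in their natural order and Z has its corresponding assigned ranks tagged along, we let

$$u_{jk;z.x} = \begin{cases} 1, & \text{if } r_{jz} < r_{kz} \\ 0, & \text{if } r_{jz} = r_{kz} \\ -1, & \text{if } r_{jz} > r_{kz} \end{cases} \qquad (41)$$

for $j < k$, so that

$$E(u_{jk;z.x}) = \pi^{+}_{x.z} - \pi^{-}_{x.z}; \qquad Var(u_{jk;z.x}) = \pi^{+}_{x.z} + \pi^{-}_{x.z} - \left(\pi^{+}_{x.z} - \pi^{-}_{x.z}\right)^{2}$$

Note that $\pi^{+}_{x.z}$, $\pi^{0}_{x.z}$ and $\pi^{-}_{x.z}$ are respectively the probabilities that the rank assigned to the $j$th observation from population Z is less than, equal to or greater than the rank assigned to the $k$th observation from the same population, if the rank assigned to the $k$th observation succeeds the rank assigned to the $j$th observation when the ranks assigned to these observations are arranged in accordance with the natural ordering of the ranks assigned to the corresponding sister observations from population X. They are estimated, as before, by the proportions of 1s, 0s and -1s among the $n(n-1)/2$ values of $u_{jk;z.x}$. Hence the tau correlation coefficient between X and Z unadjusted for ties in Z is estimated as

$${}_{x}r_{z} = \hat{\pi}^{+}_{x.z} - \hat{\pi}^{-}_{x.z}, \qquad Var({}_{x}r_{z}) = \frac{2\left[\hat{\pi}^{+}_{x.z} + \hat{\pi}^{-}_{x.z} - \left(\hat{\pi}^{+}_{x.z} - \hat{\pi}^{-}_{x.z}\right)^{2}\right]}{n(n-1)}$$

Now the tau correlation coefficient between X and Z adjusted for ties in Z is estimated as

$${}_{x}r_{z.c} = \frac{\hat{\pi}^{+}_{x.z} - \hat{\pi}^{-}_{x.z}}{1 - \hat{\pi}^{0}_{x.z}} = \frac{\hat{\pi}^{+}_{x.z} - \hat{\pi}^{-}_{x.z}}{\hat{\pi}^{+}_{x.z} + \hat{\pi}^{-}_{x.z}}$$

The tau correlation coefficients between populations Y and Z, unadjusted as well as adjusted for ties in Z, are similarly estimated. Thus, following the above procedures, we have that

$${}_{y}r_{z} = \hat{\pi}^{+}_{y.z} - \hat{\pi}^{-}_{y.z}; \qquad {}_{y}r_{z.c} = \frac{\hat{\pi}^{+}_{y.z} - \hat{\pi}^{-}_{y.z}}{\hat{\pi}^{+}_{y.z} + \hat{\pi}^{-}_{y.z}}$$

The ties adjusted tau correlation coefficient between X and Z is estimated as a weighted average of ${}_{x}r_{z.c}$ and ${}_{z}r_{x.c}$, where the weights are functions of the number of tied observations in the two sampled populations:

$$r_{xz.c} = \frac{\left(\hat{\pi}^{+}_{x.z} + \hat{\pi}^{-}_{x.z}\right){}_{x}r_{z.c} + \left(\hat{\pi}^{+}_{z.x} + \hat{\pi}^{-}_{z.x}\right){}_{z}r_{x.c}}{\hat{\pi}^{+}_{x.z} + \hat{\pi}^{-}_{x.z} + \hat{\pi}^{+}_{z.x} + \hat{\pi}^{-}_{z.x}} = \frac{{}_{x}r_{z} + {}_{z}r_{x}}{\hat{\pi}^{+}_{x.z} + \hat{\pi}^{-}_{x.z} + \hat{\pi}^{+}_{z.x} + \hat{\pi}^{-}_{z.x}} \qquad (56)$$

Similarly, the ties adjusted tau correlation coefficient between Y and Z is estimated as

$$r_{yz.c} = \frac{\left(\hat{\pi}^{+}_{y.z} + \hat{\pi}^{-}_{y.z}\right){}_{y}r_{z.c} + \left(\hat{\pi}^{+}_{z.y} + \hat{\pi}^{-}_{z.y}\right){}_{z}r_{y.c}}{\hat{\pi}^{+}_{y.z} + \hat{\pi}^{-}_{y.z} + \hat{\pi}^{+}_{z.y} + \hat{\pi}^{-}_{z.y}} = \frac{{}_{y}r_{z} + {}_{z}r_{y}}{\hat{\pi}^{+}_{y.z} + \hat{\pi}^{-}_{y.z} + \hat{\pi}^{+}_{z.y} + \hat{\pi}^{-}_{z.y}} \qquad (57)$$

Use of Equations (39), (56) and (57) in Equation (3) yields an estimate of the ties adjusted, or corrected, partial tau correlation coefficient between observations from populations X and Y, holding observations from population Z at a constant level, as

$$\text{C-TAP} = r_{xy.z;c} = \frac{r_{xy.c} - r_{xz.c}\, r_{yz.c}}{\sqrt{\left(1 - r_{xz.c}^{2}\right)\left(1 - r_{yz.c}^{2}\right)}}$$

Note that the unadjusted partial correlation coefficient $r_{xy.z}$ is equal to the ties adjusted partial correlation coefficient $r_{xy.z;c}$ only if there are no ties in the data; otherwise $r_{xy.z}$ tends to underestimate the true partial correlation coefficient, a bias that increases with the number of ties in the data.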
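The procedure of this section can be sketched in Python as follows. This is a tentative reading, not a definitive implementation: it assumes each ties adjusted coefficient has the form $(\hat{\pi}^{+} - \hat{\pi}^{-}) / (\hat{\pi}^{+} + \hat{\pi}^{-})$, that the weights in the weighted averages are $\hat{\pi}^{+} + \hat{\pi}^{-}$ for each ordering, and that observations tied on the ordering variable are left in their input order. All function names are hypothetical.

```python
import math
from itertools import combinations


def pair_proportions(ordered, tagged):
    """Estimate (pi_plus, pi_minus): sort by `ordered`, tag `tagged` along,
    then compare the tagged ranks of every pair j < k."""
    n = len(ordered)
    # Stable sort: observations tied on `ordered` keep their input order (assumption).
    t = [b for _, b in sorted(zip(ordered, tagged), key=lambda p: p[0])]
    plus = minus = 0
    for j, k in combinations(range(n), 2):
        if t[j] < t[k]:
            plus += 1
        elif t[j] > t[k]:
            minus += 1
    s_max = n * (n - 1) / 2
    return plus / s_max, minus / s_max


def ties_adjusted_tau(a, b):
    """Weighted average of the two ties adjusted coefficients (assumed weights)."""
    p_ab, m_ab = pair_proportions(a, b)  # a in natural order, b tagged along
    p_ba, m_ba = pair_proportions(b, a)  # b in natural order, a tagged along
    weight_sum = p_ab + m_ab + p_ba + m_ba
    if weight_sum == 0:
        raise ValueError("all pairs are tied; coefficient is undefined")
    # w1 * r1 + w2 * r2 simplifies to the sum of the unadjusted coefficients.
    return ((p_ab - m_ab) + (p_ba - m_ba)) / weight_sum


def ties_adjusted_tau_xy_given_z(x, y, z):
    """Compare the j-th X rank with the k-th Y rank (j < k) in the natural order of Z."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    plus = minus = 0
    for j, k in combinations(range(len(order)), 2):
        if xs[j] < ys[k]:
            plus += 1
        elif xs[j] > ys[k]:
            minus += 1
    if plus + minus == 0:
        raise ValueError("all pairs are tied; coefficient is undefined")
    return (plus - minus) / (plus + minus)


def c_tap(x, y, z):
    """Ties adjusted partial tau between X and Y holding Z constant."""
    r_xy = ties_adjusted_tau_xy_given_z(x, y, z)
    r_xz = ties_adjusted_tau(x, z)
    r_yz = ties_adjusted_tau(y, z)
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("partial coefficient is undefined when |r_xz| or |r_yz| is 1")
    return (r_xy - r_xz * r_yz) / denom
```

Calling `c_tap(x_ranks, y_ranks, z_ranks)` with three equal-length lists of mean ranks returns the sketch's estimate. When no pairs are tied, each adjusted coefficient in the sketch coincides with its unadjusted counterpart, in line with the remark above.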

Illustrative Example
The following are the letter grades earned by 13 candidates under three judges in a job interview. Two of the judges, X and Y, are male while judge Z is female. Interest here is in estimating the partial correlation coefficient between the assessments of the candidates by the two male judges when the assessment by the female judge is controlled. The letter grades awarded to the candidates by each of the three judges are ranked from the highest, 'A+', to the lowest, 'F', assigning the rank of 1 to A+, the rank of 2 to A and so on until the rank of 13 is assigned to 'F'. All tied grades or scores under each judge are assigned their mean rank.
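The assignment of mean ranks to tied letter grades can be sketched as below. The grade scale and the five grades used in the usage note are hypothetical and are not the data of this example; the function name `mean_ranks` is likewise an assumption.

```python
def mean_ranks(grades, grade_order):
    """Rank candidates from best grade (rank 1) to worst; ties share their mean rank."""
    position = {g: i for i, g in enumerate(grade_order)}
    # Candidate indices sorted from best to worst grade.
    order = sorted(range(len(grades)), key=lambda i: position[grades[i]])
    ranks = [0.0] * len(grades)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next candidate has the same grade.
        while end + 1 < len(order) and grades[order[end + 1]] == grades[order[start]]:
            end += 1
        # Mean of the 1-based positions start+1 .. end+1 occupied by the tied block.
        mean_rank = (start + 1 + end + 1) / 2
        for idx in order[start:end + 1]:
            ranks[idx] = mean_rank
        start = end + 1
    return ranks
```

For instance, with the hypothetical scale `["A+", "A", "B", "C", "F"]`, the grades `["A", "B", "A", "C", "B"]` receive the ranks 1.5, 3.5, 1.5, 5 and 3.5.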
To apply the proposed method, we first arrange the ranks assigned by judge Z in their natural order and then tag along the ranks of the grades assigned to the candidates by each of the other two judges, X and Y. The results are shown in Table 2.

Table 2. Natural order of ranks for Z with corresponding ranks for X and Y.
Columns: Candidate Number; Natural Order for Rank of Z; corresponding ranks of X and Y.
Extracted entries: 6.5 13 13 8 6.5 6.5 8 10 6.5 11 6 11 9.5 11 11.5 12 9.5 1.5 2 9 11 8.5 6 4 12.5 4.5 6 7 12.5 8.5 11.5

To calculate ${}_{z}r_{x.c}$, the ties adjusted correlation coefficient between X and Z when the ranks of Z are arranged in their natural order and the corresponding ranks of X are tagged along, we may first obtain the values of $u_{jk;x.z}$ of Equation 4, preferably in tabular form (Table 3). The required estimates are then obtained from Table 3.

The tau correlation coefficient $r_{xy}$ between X and Y is obtained using the values of $u_{jk;xy.z}$ of Equation 28 shown in Table 5.

To estimate the tau correlation coefficient ${}_{x}r_{z}$ between X and Z when the ranks assigned to observations from X are arranged in their natural order and the ranks assigned to the corresponding observations from Z are tagged along, we use the values of $u_{jk;z.x}$ of Equation 41 shown in Table 6.

Notice that the simple correlation coefficient between X and Y is about 0.180 while the partial correlation coefficient between X and Y is only 0.163, indicating that the assessment by the female judge seems to reduce the strength of the association or agreement between the male judges in their assessment of the candidates. Also, had no adjustment been made for the presence of tied observations in the data, the estimate would have been the unadjusted partial correlation coefficient $r_{xy.z}$, showing that failure to adjust for the presence of ties in the data tends to lead to a probable underestimation of the true partial correlation coefficient.

Conclusion
This paper has presented a non-parametric method for estimating the partial correlation coefficient between two variables, holding a third variable constant, when there are tied observations in the data. The proposed method is more general and covers cases in which there are no tied observations in any of the three populations, as well as cases in which there are equal or unequal numbers of tied observations in the sampled populations. The proposed method uses a modified approach to the estimation of the tau correlation coefficient and assumes that the populations of interest may be measurements on as low as the ordinal scale. The estimated ties adjusted partial correlation coefficient is built from weighted averages of the estimates obtained when the ranks assigned to the observations from each of the sampled populations are used in turn as those that are naturally ordered and as those that are tagged along. It is shown that when there are ties in the data, failure to adjust for these ties tends to result in an underestimate of the true partial correlation coefficient. This bias increases with the number of tied observations in the data.
The proposed method is illustrated with some data and is shown to compare favorably with the Kendall approach.