Modeling Dependence Relationships of Anthropometric Variables Using Copula Approach

A copula model is introduced for modeling the co-dependence structures of four anthropometric variables (body mass index (BMI), abdominal circumference, adiposity and percent body fat) because it can capture monotonic dependence. Four new copula-based Kumaraswamy-epsilon distributions, built on the Gaussian, Clayton, Frank and Gumbel copulas, are derived and used to determine the best fit to the anthropometric data. The Clayton model provided the best fit in four bivariate pairs (BMI and percent body fat, BMI and abdominal circumference, adiposity and abdominal circumference, and abdominal circumference and percent body fat), while the Gaussian model is best for the BMI and adiposity pair and the Frank model is best for the adiposity and percent body fat pair. Copula-based Kendall's tau and tail dependence are used as estimates of the strength of the co-dependence. The results strongly recommend the use of BMI as an anthropometric index for estimating adiposity. However, for individuals with BMI values in the two extreme tails, adiposity should be measured directly. The results do not identify any suitable anthropometric index for estimating percent body fat, and it is therefore recommended that, for such epidemiological research, percent body fat be measured directly. The results also clearly show that Kendall's tau and the corresponding Pearson correlation coefficient estimates are largely at variance whenever the co-dependence structure cannot be described as linear dependence. This can prompt contradictory conclusions. It is therefore suggested that, whenever the Pearson correlation coefficient is used in such research, a coefficient of determination of at least 75% should be obtained before any anthropometric index is recommended as a substitute for a body composition measure.


Introduction
In the study of dependence among variables, Pearson's correlation coefficient is the most widely used measure of dependence. It is, however, a measure of linear dependence and not of general dependence. Other measures of dependence are Kendall's tau and Spearman's rho, which are distribution-free. When a bivariate distribution has an elliptical form (for example, the multivariate normal or multivariate Student's t), the dependence structure among the variables is linear and the use of Pearson's correlation coefficient is appropriate. But when the distribution is non-elliptical, the use of Pearson's correlation coefficient may lead to misleading conclusions [1]. Consequently, there is a need for alternative measures of dependence that are appropriate when the bivariate distribution is non-elliptical. Copula-based measures serve this purpose.
A copula provides a link between a bivariate (multivariate) distribution and its component marginal distributions. It has the advantage that the appropriate marginal distributions can be selected freely and linked through a suitable copula. Copula-based measures of dependence measure the degree of monotonic dependence between two variables, whereas linear correlation measures the degree of monotonic linear dependence only. The use of copula-based measures of monotonic dependence instead of the linear correlation coefficient is suggested in [13]. These authors also noted that a copula is invariant under increasing and continuous transformations of the marginals. Thus, the copula approach is recognized as a powerful tool for modeling dependence between variables. A diagram distinguishing between monotonic dependence and monotonic linear dependence is shown in Figure 1 below.
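The distinction between monotonic and monotonic linear dependence can be illustrated with a small numerical sketch. The data below are simulated and purely hypothetical, and the helper names `pearson_r` and `kendall_tau` are introduced only for this illustration. Kendall's tau depends only on the ordering of the observations, so it is unchanged by an increasing transformation, whereas Pearson's coefficient drops once the relationship is no longer linear.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


random.seed(1)
x = [random.uniform(0.0, 5.0) for _ in range(200)]  # hypothetical data
y_linear = [2.0 * a + 1.0 for a in x]   # monotonic and linear in x
y_curved = [math.exp(a) for a in x]     # monotonic but not linear in x

tau_linear, tau_curved = kendall_tau(x, y_linear), kendall_tau(x, y_curved)
r_linear, r_curved = pearson_r(x, y_linear), pearson_r(x, y_curved)
print(f"tau: linear={tau_linear:.3f}, curved={tau_curved:.3f}")
print(f"r:   linear={r_linear:.3f}, curved={r_curved:.3f}")
```

Both tau values equal one because both relationships are perfectly increasing, while Pearson's coefficient falls below one for the exponential relationship.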
In most of the literature, the study of co-dependence between anthropometric variables is based on the assumption of linear correlation. For example, the linear correlation between body mass index (BMI) and some anthropometric variables was found to be strong and positive [7]. Other examples include [2,3,29,37]. In this study we introduce a copula approach. That is, we construct copula-based bivariate Kumaraswamy-epsilon distributions and apply them in modeling co-dependence between anthropometric variables. This is new.

Review of Copulas
A copula is a multivariate distribution function on the unit cube $[0,1]^n$ with uniform marginal distributions. It relates an arbitrary distribution function $F$ on $\mathbb{R}^n$ to a copula through the marginal distribution functions $F_1, \dots, F_n$. The name and theory of copulas are rooted in Sklar's theorem [36], and the frequency of their appearance in the literature has increased since 1999 [15]. An elaborate article [13] motivated the application of copulas in the financial sector for the assessment and management of risk in portfolio investments. Li argued "… why a copula function approach should be used to specify the joint distribution of survival times after marginal distributions of survival times are derived from market information, such as risky bond prices or asset swap spreads" [25]. Today, studies and applications of copulas have become very popular among academicians, engineers, economists, actuarial scientists, dynamic system modelers and more [14].
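For reference, Sklar's theorem can be written out as follows. This is the standard textbook statement rather than a formula quoted from this paper, and the symbols $F$, $F_j$, $f_j$, $C$ and $c$ are notation assumed here for illustration. The second line is the bivariate density form, valid when the margins are continuous and the copula density exists.

```latex
\begin{aligned}
F(x_1, \dots, x_n) &= C\bigl(F_1(x_1), \dots, F_n(x_n)\bigr), \\
f(x_1, x_2) &= c\bigl(F_1(x_1), F_2(x_2)\bigr)\, f_1(x_1)\, f_2(x_2),
\qquad c(u_1, u_2) = \frac{\partial^2 C(u_1, u_2)}{\partial u_1\, \partial u_2}.
\end{aligned}
```

The density form shows why the margins can be chosen freely: the copula density carries the dependence, and the marginal densities enter only as separate factors.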
Many copula families are in use for constructing multivariate distributions, for example the elliptical, Archimedean, Archimax and order statistics copulas. Comprehensive studies of these families and some areas of application can be found in the literature; see, for example, [31,18,30]. Members of the elliptical copula family, for instance, are the normal and Student's t copulas, which form a class of implicit copulas. These copulas have found application in modeling multivariate relationships in the financial sector [40,25,27,23].

Univariate Kumaraswamy-epsilon Distribution
The Kumaraswamy-epsilon distribution (henceforth denoted the K-epsilon distribution) was introduced in [16] as a new probability distribution with shapes similar to most lifetime distributions, for instance the gamma, Weibull and lognormal. It is a continuous probability distribution of the Kumaraswamy-G family [9] with the epsilon probability distribution [12] as its base.
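A continuous random variable $X$ is distributed according to the K-epsilon distribution with four parameters if its probability density function takes the Kumaraswamy-G form with the epsilon distribution as the base distribution. The support of $X$ is bounded, all four parameters are strictly positive, and two of the parameters control the skewness and tail weight of the distribution, respectively.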
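A sketch of what the density might look like is given below. It combines the general Kumaraswamy-G construction with the distribution function of the epsilon distribution as it is usually stated. The parameter names $a$, $b$, $\lambda$ and $d$, and the exact parametrization, are assumptions made for this illustration and may differ from the notation of [16].

```latex
% Assumed notation: a, b > 0 are the Kumaraswamy shape parameters;
% \lambda, d > 0 are the epsilon base parameters; support 0 < x < d.
\begin{aligned}
G(x) &= 1 - \left(\frac{d + x}{d - x}\right)^{-\lambda d / 2}, &
g(x) &= \frac{\lambda d^{2}}{d^{2} - x^{2}} \left(\frac{d + x}{d - x}\right)^{-\lambda d / 2}, \\
F(x) &= 1 - \left[1 - G(x)^{a}\right]^{b}, &
f(x) &= a\, b\, g(x)\, G(x)^{a-1} \left[1 - G(x)^{a}\right]^{b-1}.
\end{aligned}
```

Here $G$ and $g$ are the base distribution and density functions, and $F$ and $f$ are the resulting Kumaraswamy-G distribution and density functions. Under this parametrization the margin has four positive parameters, which is consistent with the nine parameters (four per margin plus one copula parameter) of the bivariate model.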

Gaussian Copula-based Bivariate K-epsilon Distribution
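The Gaussian copula is an implicit copula presented in the form of the bivariate Gaussian cumulative distribution function. It is given by

$$C(u_1, u_2; \theta) = \Phi_{\theta}\left(\Phi^{-1}(u_1), \Phi^{-1}(u_2)\right),$$

where $-1 < \theta < 1$ and $\theta$ denotes the copula parameter, which is copula-type specific, and $\Phi_{\theta}$ is the bivariate standard normal distribution function with correlation $\theta$. The corresponding Gaussian copula density function is given by

$$c(u_1, u_2; \theta) = \frac{1}{\sqrt{1 - \theta^2}} \exp\left\{-\frac{\theta^2\left(\psi_1^2 + \psi_2^2\right) - 2\theta\psi_1\psi_2}{2\left(1 - \theta^2\right)}\right\},$$

where $\psi_i = \Phi^{-1}(u_i)$, $i = 1, 2$, and $\Phi^{-1}(\cdot)$ is the inverse standard normal distribution function. The Gaussian copula-based bivariate K-epsilon probability density function is given by

$$f(x_1, x_2) = c\left(F_1(x_1), F_2(x_2); \theta\right) f_1(x_1)\, f_2(x_2), \tag{8}$$

where $f_i$ and $F_i$ are the K-epsilon density and distribution functions of the $i$-th margin.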
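A minimal Python sketch of the Gaussian copula density and of the resulting copula-based joint density is given below. It uses the standard closed form of the Gaussian copula density. The function names and the idea of passing the marginal density and distribution functions as callables are assumptions made for this illustration. The K-epsilon margins are not implemented here; any continuous margin can be plugged in.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def gaussian_copula_density(u1, u2, theta):
    """Bivariate Gaussian copula density c(u1, u2; theta), -1 < theta < 1."""
    z1 = _STD_NORMAL.inv_cdf(u1)
    z2 = _STD_NORMAL.inv_cdf(u2)
    one_minus = 1.0 - theta * theta
    expo = -(theta * theta * (z1 * z1 + z2 * z2) - 2.0 * theta * z1 * z2)
    expo /= 2.0 * one_minus
    return math.exp(expo) / math.sqrt(one_minus)


def bivariate_density(x1, x2, theta, pdf1, cdf1, pdf2, cdf2):
    """Copula-based joint density c(F1(x1), F2(x2); theta) * f1(x1) * f2(x2)."""
    c = gaussian_copula_density(cdf1(x1), cdf2(x2), theta)
    return c * pdf1(x1) * pdf2(x2)


# Usage with standard normal margins (a stand-in for the K-epsilon margins):
margin = NormalDist()
value = bivariate_density(0.5, -0.3, 0.6,
                          margin.pdf, margin.cdf, margin.pdf, margin.cdf)
print(value)
```

With standard normal margins the result coincides with the bivariate normal density with correlation `theta`, which gives a convenient sanity check.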

Frank Copula-based Bivariate K-epsilon Distribution
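The Frank copula distribution and density functions are, respectively, given by

$$C(u_1, u_2; \theta) = -\frac{1}{\theta} \ln\left[1 + \frac{\left(e^{-\theta u_1} - 1\right)\left(e^{-\theta u_2} - 1\right)}{e^{-\theta} - 1}\right]$$

and

$$c(u_1, u_2; \theta) = \frac{\theta\left(1 - e^{-\theta}\right) e^{-\theta(u_1 + u_2)}}{\left[\left(1 - e^{-\theta}\right) - \left(1 - e^{-\theta u_1}\right)\left(1 - e^{-\theta u_2}\right)\right]^2},$$

where $\theta \neq 0$ is the copula parameter. The Frank copula-based bivariate K-epsilon probability density function is, therefore, given by

$$f(x_1, x_2) = c\left(F_1(x_1), F_2(x_2); \theta\right) f_1(x_1)\, f_2(x_2), \tag{14}$$

where $f_i$ and $F_i$ are the K-epsilon density and distribution functions of the $i$-th margin.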
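As a hedged illustration, the standard textbook forms of the Frank copula distribution and density functions can be coded as follows. The function names are hypothetical, and the finite-difference helper is included only to check that the density is the mixed second derivative of the distribution function.

```python
import math


def frank_cdf(u1, u2, theta):
    """Frank copula C(u1, u2; theta), theta != 0."""
    num = math.expm1(-theta * u1) * math.expm1(-theta * u2)
    return -math.log1p(num / math.expm1(-theta)) / theta


def frank_density(u1, u2, theta):
    """Frank copula density c(u1, u2; theta), theta != 0."""
    a = -math.expm1(-theta)          # 1 - exp(-theta)
    b1 = -math.expm1(-theta * u1)    # 1 - exp(-theta * u1)
    b2 = -math.expm1(-theta * u2)    # 1 - exp(-theta * u2)
    return theta * a * math.exp(-theta * (u1 + u2)) / (a - b1 * b2) ** 2


def mixed_difference(cdf, u1, u2, theta, h=1e-4):
    """Central finite-difference estimate of d^2 C / (du1 du2)."""
    return (cdf(u1 + h, u2 + h, theta) - cdf(u1 + h, u2 - h, theta)
            - cdf(u1 - h, u2 + h, theta) + cdf(u1 - h, u2 - h, theta)) / (4 * h * h)


print(frank_density(0.3, 0.6, 3.0), mixed_difference(frank_cdf, 0.3, 0.6, 3.0))
```

The two printed numbers should agree to several decimal places. `math.expm1` and `math.log1p` are used to limit rounding error when `theta` is small.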

Gumbel Copula-based Bivariate K-epsilon Distribution
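The Gumbel copula is both an Archimedean and an extreme-value copula. Its distribution and density functions are given, respectively, by

$$C(u_1, u_2; \theta) = \exp\left\{-\left[\tilde{u}_1^{\theta} + \tilde{u}_2^{\theta}\right]^{1/\theta}\right\}$$

and

$$c(u_1, u_2; \theta) = \frac{C(u_1, u_2; \theta)\left(\tilde{u}_1 \tilde{u}_2\right)^{\theta - 1}}{u_1 u_2 \left(\tilde{u}_1^{\theta} + \tilde{u}_2^{\theta}\right)^{2 - 1/\theta}} \left[\left(\tilde{u}_1^{\theta} + \tilde{u}_2^{\theta}\right)^{1/\theta} + \theta - 1\right], \tag{16}$$

where $\tilde{u}_i = -\ln u_i$, $i = 1, 2$, and $\theta \geq 1$ is the copula parameter. The Gumbel copula-based bivariate K-epsilon probability density function is, therefore, given by

$$f(x_1, x_2) = c\left(F_1(x_1), F_2(x_2); \theta\right) f_1(x_1)\, f_2(x_2), \tag{17}$$

where $f_i$ and $F_i$ are the K-epsilon density and distribution functions of the $i$-th margin.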
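The Gumbel copula can be sketched in the same way, using its standard closed forms. The function names are hypothetical. A useful property to check is that the copula reduces to the independence copula when the parameter equals one, so that the density is identically one there.

```python
import math


def gumbel_cdf(u1, u2, theta):
    """Gumbel copula C(u1, u2; theta), theta >= 1."""
    t1, t2 = -math.log(u1), -math.log(u2)
    return math.exp(-((t1 ** theta + t2 ** theta) ** (1.0 / theta)))


def gumbel_density(u1, u2, theta):
    """Gumbel copula density c(u1, u2; theta), theta >= 1, 0 < u1, u2 < 1."""
    t1, t2 = -math.log(u1), -math.log(u2)
    s = t1 ** theta + t2 ** theta
    root = s ** (1.0 / theta)
    numer = math.exp(-root) * (t1 * t2) ** (theta - 1.0) * (root + theta - 1.0)
    denom = u1 * u2 * s ** (2.0 - 1.0 / theta)
    return numer / denom


# theta = 1 gives independence: C(u1, u2) = u1 * u2 and c(u1, u2) = 1.
print(gumbel_cdf(0.3, 0.6, 1.0), gumbel_density(0.3, 0.6, 1.0))
# Larger theta concentrates mass along the diagonal, especially in the upper tail.
print(gumbel_density(0.95, 0.95, 2.0), gumbel_density(0.95, 0.05, 2.0))
```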

Copula Measures of Dependence
A copula-based measure of dependence measures the degree of monotonic dependence between two variables, whereas Pearson's correlation coefficient measures the degree of monotonic linear dependence only. Copula-based measures of dependence for the Kendall's tau and tail dependence can be derived from relations obtainable from the literature, see for example [1]. The expressions for these derived measures are presented in Table 1 below.
Here $\theta$ is the copula parameter and $D_k(\theta) = \dfrac{k}{\theta^k} \displaystyle\int_0^{\theta} \dfrac{t^k}{e^t - 1}\, dt$, $k = 1, 2$, is the $k$-th order Debye function.
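The following sketch collects the standard closed-form expressions for Kendall's tau and the tail dependence coefficients of the four copula families. These are the well-known textbook relations and are assumed, not confirmed, to match the entries of Table 1. The function names and the midpoint-rule evaluation of the Debye function are choices made for this illustration.

```python
import math


def debye(theta, k=1, steps=2000):
    """k-th order Debye function D_k(theta), theta > 0, by the midpoint rule."""
    h = theta / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** k / math.expm1(t)
    return k / theta ** k * total * h


def kendall_tau(family, theta):
    """Kendall's tau implied by the copula parameter theta."""
    if family == "gaussian":
        return 2.0 / math.pi * math.asin(theta)
    if family == "clayton":
        return theta / (theta + 2.0)
    if family == "gumbel":
        return 1.0 - 1.0 / theta
    if family == "frank":  # positive theta only in this sketch
        return 1.0 - 4.0 / theta * (1.0 - debye(theta, 1))
    raise ValueError(f"unknown family: {family}")


def tail_dependence(family, theta):
    """Return (lower, upper) tail dependence coefficients."""
    if family == "clayton":
        return 2.0 ** (-1.0 / theta), 0.0
    if family == "gumbel":
        return 0.0, 2.0 - 2.0 ** (1.0 / theta)
    if family in ("gaussian", "frank"):
        return 0.0, 0.0
    raise ValueError(f"unknown family: {family}")


for fam, th in [("gaussian", 0.5), ("clayton", 2.0), ("frank", 5.736), ("gumbel", 2.0)]:
    print(fam, round(kendall_tau(fam, th), 3), tail_dependence(fam, th))
```

The parameter values in the loop are arbitrary examples, not estimates from the anthropometric data.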

Review of the Anthropometric Variables and Impact on Human Health
Anthropometry is the scientific study of the measurements and proportions of the human body. Anthropometric measurements are applied in the textile industry for design purposes. For instance, there are 41 predefined feature lengths on the body that are commonly used by the fashion industry [24] for footwear and clothing design, and they are also used in both working and household environments to achieve the best match between products and their users [41]. The focus of this study, however, is their use in determining human health indices.
Certain anthropometric measures are used as indicators, or identifiers, of chronic human health risks. For instance, abdominal (waist) circumference is a relative determinant of adiposity, also known as obesity, and obese persons have a high risk of cardiovascular diseases and diabetes mellitus [5]. Abdominal circumference is also used as a complementary measure to provide information on percent body fat [4]. Percent body fat is likewise an indicator of human health risks, and its assessment is commonly used for categorization in health and sports performance: men and women with more than 25 and 30 percent body fat, respectively, are considered obese [39] and are at risk of hypertension, dyslipidemia and hyperglycemia [44]. Body mass index (BMI) is also a common measure of obesity, and studies have shown that it correlates with percent body fat. The correlation between BMI and percent body fat was found [29] to range from 0.61 to 0.85 within location and sex groups in Nigerian, Jamaican and African American populations. Many other references on the interdependence of anthropometric variables and their impact on human health can be found in the literature; see, for example, [3,6,2,26,37].
Body composition (adiposity and percent body fat) can be measured directly. For example, percent body fat can be measured [2] using bioelectrical impedance analysis (BIA), but this is very expensive. Consequently, for health research, it is important to find a reliable easy-to-use method of determining body composition. Research so far has concentrated on exploring the correlation between anthropometric variables and body composition measures by the use of Pearson correlation coefficient. This informed the reasoning behind substituting, for example, BMI for body composition assessment. Here, we introduce copula model alternatives because they have the advantage of capturing general dependence as opposed to the case of Pearson correlation coefficient which assumes only linear dependence.

The Model
In most studies on anthropometric variables, the Pearson product-moment correlation coefficient is employed as a measure of dependence. As explained earlier, Pearson's correlation coefficient is a measure of linear dependence and not of general dependence. For example, it was found [29] that the functional relationship between percent body fat and BMI was quadratic in all location and sex groups in African American and Jamaican populations. This implies that using the Pearson correlation coefficient to depict dependence in this and many similar scenarios may lead to misleading conclusions, which creates the need for alternative methods for capturing co-dependence, such as copula-based measures. A copula-based measure of dependence measures the degree of monotonic dependence between two variables, whereas Pearson's correlation coefficient measures the degree of monotonic linear dependence only.
Consequently, we introduce the copula approach to measure co-dependence among anthropometric variables. It entails the following steps:
1. We model the co-dependence of variables in anthropometry using the copula-based bivariate K-epsilon distributions derived in equations (8), (11), (14) and (17).
2. We then estimate the copula-based Kendall's tau and tail dependence measures from the estimated copula parameter of the models.
Four copula models are used in this study in order to provide greater depth in the search for appropriate models that can best describe the co-dependence structure between the pairs of anthropometric variables examined.

The Data
The data used for the application are anthropometric measurements on four variables, namely BMI, percent body fat, adiposity and abdominal circumference. The data were collected for 250 men between the ages of 22 and 81 years (average age 44.88 years) and were obtained from a sample data set placed on Dr. John Ralph's statistics website: https://www2.statson.edu/jrasp/data.htm/body fat. The data were originally on 252 men, but two records were removed during data cleaning. Weight and height were initially recorded in pounds (lb) and inches (in), respectively, and were converted to kilograms (kg) and metres (m) by multiplying by the factors 0.453592 and 0.0254, respectively. BMI was computed as weight (kg) divided by the square of height (m).
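The unit conversion and BMI computation described above can be sketched as follows. The conversion factors are those stated in the text. The function name and the input values in the usage line are hypothetical and are not taken from the data set.

```python
LB_TO_KG = 0.453592   # pounds to kilograms, as stated in the text
IN_TO_M = 0.0254      # inches to metres, as stated in the text


def bmi_from_imperial(weight_lb, height_in):
    """Convert weight and height to SI units and return BMI in kg/m^2."""
    weight_kg = weight_lb * LB_TO_KG
    height_m = height_in * IN_TO_M
    return weight_kg / height_m ** 2


# Hypothetical individual: about 100 kg and about 2 m tall.
bmi_example = bmi_from_imperial(220.462, 78.7402)
print(round(bmi_example, 2))
```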

Correlation Matrix Plots
Four anthropometric variables were chosen based on their relevance in determining possible human health risks [33,44,5]. Preliminary correlation plots showing the trend in the scatter plots of the variables in the study are presented in Figure 1 below. This serves as an essential prelude to any statistical analysis.
The scatter plots present varied pictures, suggestive of the possible copula model that can capture their co-dependence.

Fitting the Univariate Distributions
Since the interest here is in fitting the copula-based bivariate K-epsilon distributions in equations (8), (11), (14) and (17), we first need to find appropriate marginal distributions. This is done by fitting the univariate K-epsilon and normal distributions to the data sets using the fitdistrplus package in R. The results are presented in Tables 2 and 3, respectively. Both distributions fit the data. However, the values of the Akaike information criterion (AIC) suggest that the K-epsilon distribution performed better. Hence the choice of the K-epsilon distribution as the marginal distribution for determining the best copula model(s) is appropriate.
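The AIC comparison can be illustrated with a short Python sketch. The paper's own fitting was done with fitdistrplus in R; the code below only shows how an AIC value is formed from a maximized log-likelihood, using a normal margin fitted by maximum likelihood. The data values and function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2.0 * n_params - 2.0 * loglik


def normal_loglik(data):
    """Maximized normal log-likelihood (MLE uses the population std. dev.)."""
    dist = NormalDist(fmean(data), pstdev(data))
    return sum(math.log(dist.pdf(x)) for x in data)


# Hypothetical abdominal circumference values in cm, not taken from the data set.
data = [88.2, 92.5, 79.9, 101.3, 95.0, 84.7, 90.1, 97.6]
aic_normal = aic(normal_loglik(data), n_params=2)
print(round(aic_normal, 3))
```

A competing four-parameter margin would be scored with `aic(loglik, n_params=4)` from its own maximized log-likelihood, and the model with the smaller AIC would be preferred.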

Fitting the Copula-based Bivariate K-epsilon Distributions
As derived above, we use copula-based bivariate K-epsilon distributions as models for studying the co-dependence structure of the anthropometric variables in the study. To further understand the nature of these dependences, we fit four types of copula-based bivariate K-epsilon density functions, namely the Gaussian, Clayton, Frank and Gumbel copula-based density functions. These are given in equations (8), (11), (14) and (17). For a sample $(x_{1i}, x_{2i})$, $i = 1, \dots, n$, the log-likelihood function is

$$\ell(\beta_1, \beta_2, \theta) = \sum_{i=1}^{n} \log c\left(F_1(x_{1i}; \beta_1), F_2(x_{2i}; \beta_2); \theta\right) + \sum_{j=1}^{2} \sum_{i=1}^{n} \log f_j(x_{ji}; \beta_j), \tag{18}$$

where $\beta_j$ is the parameter vector of the $j$-th K-epsilon margin and $\theta$ is the copula parameter.

Three methods can be applied for estimating the parameters of the log-likelihood function in equation (18). These are exact maximum likelihood (EML), inference function for margins (IFM) and canonical maximum likelihood (CML). The CML is a semi-parametric method that uses the univariate empirical cumulative distribution functions for the margins and plugs them into equation (19) below to estimate the copula parameter. This is not of interest in this study. The EML method involves estimating all nine parameters in equation (18) simultaneously. It produces inconsistent parameter estimates when the number of parameters is large and the sample size is small [20].

The IFM method is a sequential procedure that involves estimating the parameters of the marginal distributions first. That is, we evaluate

$$\hat{\beta}_j = \arg\max_{\beta_j} \sum_{i=1}^{n} \log f_j(x_{ji}; \beta_j), \quad j = 1, 2,$$

and then estimate the copula parameter as

$$\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{n} \log c\left(F_1(x_{1i}; \hat{\beta}_1), F_2(x_{2i}; \hat{\beta}_2); \theta\right). \tag{19}$$

The IFM gives parameter estimates that are consistent and asymptotically normal [20]. It produces efficient parameter estimates [19]. Also, in an unpublished paper, Xu suggests "that the IFM method is highly efficient compared with the (exact) MLE method" [18]. Its other advantage is that it turns out to be the best method when the number of parameters in the margins is large. With nine parameters in the bivariate K-epsilon distribution, we considered the IFM method an appropriate choice. Equation (19) was used to estimate the copula parameter, using the estimated parameter values for the margins from Table 2. The estimation was done using the optim function in R.
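The two-step IFM procedure can be sketched in Python as follows. This is an illustration under stated assumptions and not the estimation code used in the study, which relied on R. Normal margins stand in for the K-epsilon margins, the Clayton copula is used as the example copula, a golden-section search replaces a general-purpose optimizer, and the data are simulated. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def clayton_loglik(theta, u, v):
    """Clayton copula log-likelihood for pseudo-observations (u, v), theta > 0."""
    total = 0.0
    for a, b in zip(u, v):
        total += (math.log1p(theta)
                  - (1.0 + theta) * (math.log(a) + math.log(b))
                  - (2.0 + 1.0 / theta) * math.log(a ** -theta + b ** -theta - 1.0))
    return total


def golden_max(f, lo, hi, tol=1e-6):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = f(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2.0


def ifm_clayton(x1, x2):
    # Step 1: fit each margin separately (normal MLE as a stand-in margin).
    m1 = NormalDist(fmean(x1), pstdev(x1))
    m2 = NormalDist(fmean(x2), pstdev(x2))
    # Step 2: plug the fitted margins into the copula likelihood, maximize over theta.
    eps = 1e-10
    u = [min(max(m1.cdf(a), eps), 1.0 - eps) for a in x1]
    v = [min(max(m2.cdf(b), eps), 1.0 - eps) for b in x2]
    theta = golden_max(lambda t: clayton_loglik(t, u, v), 0.01, 20.0)
    return m1, m2, theta


def simulate_clayton(n, theta, rng):
    """Draw n pairs from a Clayton copula by conditional inversion."""
    pairs = []
    for _ in range(n):
        u, w = rng.random(), rng.random()
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


rng = random.Random(42)
sample = simulate_clayton(1000, 2.0, rng)            # true theta = 2 (simulated)
x1 = [NormalDist(25.0, 3.0).inv_cdf(u) for u, _ in sample]
x2 = [NormalDist(90.0, 10.0).inv_cdf(v) for _, v in sample]
m1, m2, theta_hat = ifm_clayton(x1, x2)
print(round(theta_hat, 2))
```

Separating the marginal fits from the copula fit reduces a nine-parameter optimization to two four-parameter problems and one one-dimensional problem, which is the practical appeal of IFM noted above.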
The copula parameter estimates for the dependence among the anthropometric variables BMI, percent body fat, adiposity and abdominal circumference are presented in Table 4 below. The bolded row in every pair of variables indicates the best fitted copula. The Kendall's tau and lower tail dependence estimates based on the best fitted copula in each paired combination are presented in Table 5. The corresponding estimates of Pearson's correlation coefficient and the assessment of the strength of dependence in each case are also indicated in Table 5. Scatter plots of the observed data and of data simulated from the bivariate K-epsilon distribution based on the best copulas, together with their respective marginal fits, are presented in Figures 3-5 for the dependence relationship between pairs of anthropometric variables.

Discussion of Results
In Table 4, the bolded row in every pair of variables indicates the best fitted copula. The results clearly show that the Clayton copula is the best copula model for measuring the co-dependence in four out of the six combinations of anthropometric variables examined. The Gaussian and Frank copulas are best in one each. Again from Figure 2, the scatter plots, superimposed with the simulated data points from the respective best copula-based K-epsilon distribution, depict the same.

The Gaussian copula model is compatible with the BMI and adiposity bivariate data. The Gaussian copula is elliptical, so the co-dependence between BMI and adiposity can be described as linear dependence. Consequently, the Pearson linear correlation coefficient is an appropriate measure of this dependence. However, it should be noted that the Gaussian copula has no tail dependence: the coefficients of lower and upper tail dependence are both zero. This means that, irrespective of any high correlation coefficient estimate that may be obtained for BMI and adiposity, extreme events appear to occur independently.

The Clayton copula models are compatible with the bivariate data for the pairs BMI and percent body fat, BMI and abdominal circumference, adiposity and abdominal circumference, and percent body fat and abdominal circumference. The Frank copula model is compatible only with the adiposity and percent body fat bivariate data. These two types of copula models describe a co-dependence that is monotonic but not linear. Copula-based measures of Kendall's tau and tail dependence are better measures of this co-dependence than the linear correlation coefficient [13]. It should be noted that the Clayton copula has lower tail dependence. This means that at the lower extremes (but not the upper), there is dependence between the two variables in the model in question.
Fitting the various copula models to the bivariate data has shed more light on the nature of their co-dependence structure. But the measure of co-dependence derivable from the model has far more practical implications. It provides a reliable basis for the use of anthropometric measures as surrogates for estimating body composition (adiposity and percent body fat) in human health assessment.
The measures of co-dependence used in this study are the copula-based Kendall's tau and tail dependence, the results of which are tabulated in Table 5. The estimated values of Kendall's tau range from 0.41 to 0.94, indicating moderate to very strong dependence. The highest value obtained is 0.94, for the co-dependence between BMI and adiposity. This indicates very strong dependence and supports the substitution of BMI for the body composition assessment of adiposity. As explained earlier, the Gaussian copula model has no tail dependence. Hence, for individuals with BMI values in the two extreme tails, it is better to measure adiposity directly. The Kendall's tau estimate obtained for the co-dependence between adiposity and abdominal circumference is 0.65, indicating strong dependence. Hence abdominal circumference can also be used as a substitute for the assessment of adiposity, but BMI is a better substitute. It should be noted that the copula model that fits the bivariate data of adiposity and abdominal circumference is the Clayton copula, which is non-elliptical and has no upper tail dependence. This suggests that for individuals with abdominal circumference values in the extreme upper tail, adiposity should be measured directly.
For the BMI and adiposity pair, the Kendall's tau estimate and the Pearson correlation coefficient are approximately the same and suggest the same conclusion. This is not surprising, because the Gaussian copula model suggests a linear dependence, for which the Pearson correlation coefficient is also an appropriate measure. In all the other pairs, the corresponding values of these estimates are conspicuously at variance and suggest contradictory conclusions. For example, consider the pair percent body fat and abdominal circumference, for which a Kendall's tau estimate of 0.478 is obtained. This is within the limits of moderate dependence; hence the substitution of abdominal circumference for percent body fat cannot be strongly recommended. On the other hand, the corresponding estimate of the Pearson correlation coefficient is 0.809. This is within the limits of very strong dependence and therefore, on the contrary, strongly suggests a substitution. The same contradiction appears in the results of all the other bivariate pairs. This is not surprising, because for these pairs the copula models are non-elliptical, and Kendall's tau is then a more appropriate measure of co-dependence than Pearson's correlation coefficient. The contradiction supports the view expressed in the literature [1] that the use of Pearson's correlation coefficient in estimating co-dependence for non-elliptical bivariate distributions may lead to misleading conclusions.
The Kendall's tau estimates for percent body fat and each of the anthropometric variables (BMI and abdominal circumference) are in the region of 0.4, which indicates only moderate dependence and consequently does not suggest a substitution. However, it is interesting to look at the result for the BMI and percent body fat pair, as it provides an opportunity to contribute to the debate [29] on using BMI as a surrogate for percent body fat. Here the Kendall's tau estimate is 0.415, which does not suggest using BMI as a substitute. On the contrary, the Pearson correlation coefficient estimate, 0.72, strongly supports using BMI as a substitute. It is therefore tempting to adopt the conclusion from the Pearson correlation result, all the more so when the estimated value is highly significant. However, if we assume a linear regression model for this dependence, a correlation coefficient estimate of 0.72 implies a coefficient of determination of 52%. That is, fifty-two percent of the variation in percent body fat is explained by fitting the linear regression model, and the unexplained variation is 48%. This is too high and suggests that there are other important variables in the dependence structure that are not accommodated in the linear model. Hence, the estimated correlation coefficient does not strongly suggest the use of BMI alone as a substitute. We therefore agree with the suggestion [29] that, for epidemiological research, percent body fat should be measured directly instead of using BMI as a substitute. We go further to suggest that, in studies of this kind where the Pearson correlation coefficient is used, a coefficient of determination of at least 75% should be obtained before any anthropometric variable is recommended as a substitute for estimating body composition variables.
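The arithmetic behind the suggested 75% criterion can be checked with a few lines of Python. The correlation values 0.72 and 0.809 are the Pearson estimates quoted in the text. The function names are hypothetical, and the criterion itself is the subjective benchmark proposed in this study.

```python
import math


def coefficient_of_determination(r):
    """Coefficient of determination implied by a simple linear regression."""
    return r * r


def supports_substitution(r, r2_threshold=0.75):
    """True if the implied coefficient of determination meets the benchmark."""
    return coefficient_of_determination(r) >= r2_threshold


def minimum_abs_r(r2_threshold=0.75):
    """Smallest |r| whose square reaches the benchmark."""
    return math.sqrt(r2_threshold)


for label, r in [("BMI vs percent body fat", 0.72),
                 ("abdominal circumference vs percent body fat", 0.809)]:
    r2 = coefficient_of_determination(r)
    print(f"{label}: r = {r}, R^2 = {r2:.3f}, meets 75%: {supports_substitution(r)}")
print(f"minimum |r| for a 75% benchmark: {minimum_abs_r():.3f}")
```

Under this benchmark a Pearson coefficient must be at least about 0.866 in absolute value, so neither of the two quoted estimates would justify a substitution.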

Conclusion
Copula-based bivariate K-epsilon distributions are derived for four copula types (Gaussian, Clayton, Frank and Gumbel) and fitted as models to anthropometric variables in order to capture the structure of their co-dependence. Appropriate fits were obtained, and the Clayton model was best for four of the six pairs of variables examined. The fitted models suggest that the co-dependence structure can be described as general monotonic dependence in all the pairs considered except the BMI and adiposity pair, for which it is a monotonic linear dependence. The results of the copula-based Kendall's tau very strongly suggest the use of BMI as the best substitute for adiposity in body composition assessment. For individuals with extreme values of BMI, however, adiposity should be measured directly. The results also give a strong indication for the use of abdominal circumference as a substitute for adiposity. For percent body fat, there are no strong indications for the use of either BMI or abdominal circumference as a substitute. Hence, for epidemiological research, it is better to measure percent body fat directly instead of using a surrogate estimate, as suggested in some research based on the Pearson correlation coefficient.
It is noticeable that the Pearson correlation coefficient estimates are conspicuously higher than the Kendall's tau estimates in the cases where the Pearson coefficient is not the appropriate measure of the co-dependence structure, that is, where the monotonic dependence is not linear. Its use in such circumstances could prompt a misleading conclusion. Consequently, in studies where Pearson's correlation coefficient is used for determining an appropriate surrogate anthropometric variable, it is suggested that the coefficient of determination derived from the estimated Pearson's correlation coefficient be at least 75% before any recommendation is advanced. It must be noted that this is a subjective criterion, and the researcher may adjust the benchmark (75%) as appropriate to accommodate the peculiarities of a given study.