Estimating Average Variation About the Population Mean Using Geometric Measure of Variation

Measures of dispersion are important statistical tools used to describe the distribution of datasets, and measures of dispersion about the mean in particular have allowed researchers to characterize a wide range of data. The established measures of dispersion about the mean are the mean deviation, the variance and the standard deviation. None of these is without weaknesses: the variance is an average of squared deviations and is therefore expressed in different units from the original data; the mean deviation gives a larger average deviation than the actual one because it violates the algebraic laws governing absolute values; and the standard deviation is affected by outliers and skewed datasets. There was therefore a need for a more efficient measure of variation about the mean that overcomes these weaknesses. The aim of this paper was to estimate the average variation about the population mean using the geometric measure of variation. The study used the geometric measure of variation to estimate the average variation about the population mean for un-weighted datasets, weighted datasets, probability mass functions and probability density functions on finite intervals. The measure does, however, face serious integration problems for probability density functions, owing to the complexity of the integration by parts involved and to integration over infinite intervals. Despite this challenge, the study established that the geometric measure of variation overcomes the weaknesses of the existing measures of variation about the population mean.


Introduction
The geometric measure of variation about the mean is a new measure that is intended to address the weaknesses of the current measures of variation about the mean by using geometric averaging to estimate the average deviation about the mean. Geometric averaging is suitable for this purpose because, according to past research, it is not affected by outliers and skewed datasets. The technique also averages a product of numbers rather than a sum, which means that it does not violate the algebraic laws governing absolute values [4,19,21].
Currently, there are three known measures of variation about the mean: the mean deviation, which is the average absolute deviation about the mean; the variance, which is the average of the squared deviations about the mean; and the standard deviation, which is the square root of the average squared deviation about the mean [17]. Past studies have established that the existing measures of variation about the mean are not fully efficient, because of various issues that arise when they are used to estimate the average variation about the mean [2, 3, 11-13, 16, 18, 20]. For example, past studies have determined that the mean deviation violates algebraic number theory. According to that theory, since the mean deviation is an average of absolute deviations, an absolute value on a field P is a function $|\cdot| : P \to \mathbb{R}_{\geq 0}$ that must satisfy the following conditions for all $x, y \in P$ [5]:
1. $|x| = 0$ if and only if $x = 0$;
2. $|xy| = |x|\,|y|$;
3. $|x + y| \leq |x| + |y|$.
The mean deviation about the mean is given by the function [17]:
$$MD = \frac{1}{n}\sum_{i=1}^{n} |v_i - \mu| \tag{1}$$
where $|v_i - \mu|$ is the absolute deviation from the mean. Because it averages a sum of absolute values, this measure violates the algebraic laws as illustrated by the third condition (3), and the average deviation about the mean that it estimates is not accurate because
$$\left|\sum_{i=1}^{n} (v_i - \mu)\right| \leq \sum_{i=1}^{n} |v_i - \mu|$$
Therefore, the measure always gives bigger estimates than the actual deviation about the mean. The measure is nevertheless defended on the basis of the theory behind the measurement of average deviation about the mean, which holds that the distance from the mean (the metric) is more important than the sign of the deviation. Past studies have also determined that the mean deviation does not allow further algebraic manipulation because of the absolute value, and as a result the measure is not considered efficient [6].
A second measure of variation about the mean is the variance, which is given by the function [1,17]:
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (v_i - \mu)^2 \tag{2}$$
where $(v_i - \mu)^2$ is the squared deviation from the mean. This measure allows further algebraic manipulation, which is an improvement on the mean deviation, and it does not violate algebraic number theory. However, the average deviation about the mean that it gives is squared and is therefore not in the same units as the original data. This makes the results given by the formula inappropriate as a measure of average deviation [1,17].
The last measure of variation about the mean is the standard deviation, which improves on the variance by giving results in the same units as the original data. The measure is usually estimated by [1,3,13,16]:
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (v_i - \mu)^2} \tag{3}$$
Over the years, the standard deviation has been the most widely used measure of variation about the mean, because it is capable of further algebraic manipulation and, by taking the square root, it solves the units problem of the variance. However, past studies have determined that the standard deviation is affected by outliers and skewed datasets, which makes it inefficient for datasets that contain outliers or are skewed [2-3, 11-13, 16, 18, 20].
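To make the contrast between the three existing measures concrete, the following minimal Python sketch computes the mean deviation, the variance and the standard deviation for a small dataset, with and without one extreme value. The data values and the function name `classical_measures` are illustrative assumptions and are not taken from the study.

```python
import math

def classical_measures(data):
    """Return (mean deviation, variance, standard deviation) about the population mean."""
    n = len(data)
    mu = sum(data) / n
    mean_dev = sum(abs(v - mu) for v in data) / n      # average absolute deviation
    variance = sum((v - mu) ** 2 for v in data) / n    # average squared deviation (squared units)
    std_dev = math.sqrt(variance)                      # back in the units of the data
    return mean_dev, variance, std_dev

# Hypothetical data, with and without one extreme value.
base = [2, 4, 4, 4, 5, 5, 7, 9]
with_outlier = base + [50]

md, var, sd = classical_measures(base)
md_o, var_o, sd_o = classical_measures(with_outlier)
print(md, var, sd)
print(md_o, var_o, sd_o)
```

For the base data the mean is 5, so the mean deviation is 1.5, the variance is 4 (in squared units) and the standard deviation is 2. Adding the single value 50 raises the standard deviation to more than five times that, which illustrates the sensitivity to outliers described above.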
Given the shortcomings of the three existing measures of variation about the mean, this paper aimed to estimate the average variation about the population mean using the geometric measure of variation. The measure is intended to overcome the weaknesses of the current measures by not violating the algebraic laws, by giving estimates in the same units as the original data, by not being affected by outliers and skewed datasets, and by allowing further algebraic manipulation. This is because the geometric measure of variation uses geometric averaging, a technique that averages products and therefore does not violate the algebraic laws governing absolute values, and that has been determined not to be affected by outliers and skewed datasets.

Un-Weighted Dataset Geometric Measure of Dispersion
Consider a vector V of data points $v_i$, none of which is weighted by any coefficient. The geometric measure of deviation from the mean for un-weighted datasets is given by the function
$$G = \begin{cases} 0, & v_i = \mu \text{ for all } i = 1, \dots, n \\ \left(\prod_{i=1}^{p} |v_i - \mu|\right)^{1/p}, & \text{otherwise} \end{cases} \tag{4}$$
where n is the total number of data points in the dataset V, $\mu$ is the mean of the dataset, p is the total number of data points that are not equal to the mean, and $|v_i - \mu|$ is the absolute deviation from the mean.
If all the data points satisfy $v_i = \mu$, then every deviation from the mean is zero, and the geometric deviation from the mean for such a dataset is zero. For a dataset with p points not equal to the mean ($v_i \neq \mu$), the geometric mean deviation from the mean is given by the formula
$$G = \sqrt[p]{\prod_{i=1}^{p} |v_i - \mu|} \tag{5}$$
To make the formula applicable to large datasets, it can be simplified using a logarithmic transformation. Let
$$d_i = |v_i - \mu| \tag{6}$$
Then the function (5) can be written as
$$G = \left(\prod_{i=1}^{p} d_i\right)^{1/p} \tag{7}$$
Introducing the natural logarithm on both sides gives
$$\ln G = \ln\left(\prod_{i=1}^{p} d_i\right)^{1/p} \tag{8}$$
Equation (8) can be simplified using the rules of logarithms:
$$\ln G = \frac{1}{p}\ln\prod_{i=1}^{p} d_i$$
Transforming back by taking the exponential of both sides gives
$$G = \exp\left(\frac{1}{p}\ln\prod_{i=1}^{p} |v_i - \mu|\right) \tag{9}$$
The function (9) reduces the problems that result from taking geometric roots. The absolute value in the formula prevents complex results from arising when roots of negative numbers are taken. The geometric mean is appropriate because it satisfies the algebraic formula [2]
$$\left|\prod_{i=1}^{p} (v_i - \mu)\right| = \prod_{i=1}^{p} |v_i - \mu| \tag{10}$$
which is an improvement in accuracy over the arithmetic averaging of absolute values, for which [2]
$$\left|\sum_{i=1}^{n} (v_i - \mu)\right| \leq \sum_{i=1}^{n} |v_i - \mu|$$
Hence geometric averaging gives more accurate results than arithmetic averaging. Equation (9) can be simplified further by replacing the product with a sum, because the logarithm of a product is the sum of the individual logarithms:
$$\ln\prod_{i=1}^{p} |v_i - \mu| = \sum_{i=1}^{p} \ln|v_i - \mu| \tag{11}$$
Based on the transformation in (11), (9) can be rewritten as
$$G = \exp\left(\frac{1}{p}\sum_{i=1}^{p} \ln|v_i - \mu|\right) \tag{12}$$
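The log-sum form (12) can be sketched in Python as below. This is a minimal illustration rather than the study's own implementation: the function name `geometric_deviation` and the data are hypothetical, and the sketch assumes that the root is taken over the p non-zero deviations (exponent 1/p).

```python
import math

def geometric_deviation(data):
    """Geometric measure of variation about the mean, in the log-sum form (12)."""
    n = len(data)
    mu = sum(data) / n
    # Keep only the p non-zero absolute deviations (points not equal to the mean).
    devs = [abs(v - mu) for v in data if v != mu]
    if not devs:
        return 0.0  # every point equals the mean
    p = len(devs)
    # ASSUMPTION: the root is taken over the p non-zero deviations (exponent 1/p).
    return math.exp(sum(math.log(d) for d in devs) / p)

data = [2, 4, 4, 4, 5, 5, 7, 9]               # hypothetical values, mean 5
g_log = geometric_deviation(data)
g_root = (3 * 1 * 1 * 1 * 2 * 4) ** (1 / 6)   # direct p-th root of the product, as in (5)
print(g_log, g_root)
```

Summing logarithms instead of multiplying deviations avoids overflow or underflow of the running product when the dataset is large, which is the practical reason for preferring (12) over (5).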

Weighted Dataset Geometric Measure of Dispersion
Consider a data vector V such that $V = (v_1, \dots, v_n)$ and a weight vector W of the same dimension as V, such that $W = (w_1, \dots, w_n)$. Pairing the two vectors results in a weighted dataset. The geometric measure of deviation from the mean for weighted datasets is given by the function
$$G = \begin{cases} 0, & v_i = \mu \text{ for all } i \\ \left(\prod_{i=1}^{p} |v_i - \mu|^{w_i}\right)^{1/\sum_{i=1}^{p} w_i}, & \text{otherwise} \end{cases} \tag{13}$$
where $w_i$ is the weight of data point $v_i$, $\mu$ is the mean of the dataset, p is the total number of data points that are not equal to the mean, and $|v_i - \mu|$ is the absolute deviation from the mean.
If all the data points satisfy $v_i = \mu$, then every deviation from the mean is zero, and the geometric deviation from the mean for such a dataset is zero. For a dataset with p points not equal to the mean ($v_i \neq \mu$), the geometric mean deviation from the mean is given by the formula
$$G = \left(\prod_{i=1}^{p} |v_i - \mu|^{w_i}\right)^{1/\sum_{i=1}^{p} w_i} \tag{14}$$
To make the formula applicable to large datasets, it can be simplified using a logarithmic transformation. Using (6), that is, writing $d_i = |v_i - \mu|$, the function (14) can be written as
$$G = \left(\prod_{i=1}^{p} d_i^{\,w_i}\right)^{1/\sum_{i=1}^{p} w_i} \tag{15}$$
Introducing the natural logarithm on both sides gives
$$\ln G = \ln\left(\prod_{i=1}^{p} d_i^{\,w_i}\right)^{1/\sum_{i=1}^{p} w_i} \tag{16}$$
Equation (16) can be simplified using the rules of logarithms:
$$\ln G = \frac{1}{\sum_{i=1}^{p} w_i}\ln\prod_{i=1}^{p} d_i^{\,w_i}$$
Transforming back by taking the exponential of both sides gives
$$G = \exp\left(\frac{1}{\sum_{i=1}^{p} w_i}\ln\prod_{i=1}^{p} |v_i - \mu|^{w_i}\right) \tag{17}$$
The function (17) reduces the problems that result from taking geometric roots. Equation (17) can also be simplified further by replacing the product with a sum, based on the rules of logarithms:
$$\ln\prod_{i=1}^{p} |v_i - \mu|^{w_i} = \sum_{i=1}^{p} w_i \ln|v_i - \mu| \tag{18}$$
Based on the transformation in (18), (17) can be rewritten as
$$G = \exp\left(\frac{\sum_{i=1}^{p} w_i \ln|v_i - \mu|}{\sum_{i=1}^{p} w_i}\right) \tag{19}$$
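The weighted version can be sketched in the same way. The Python snippet below is a hedged illustration: the function name `weighted_geometric_deviation` and the data are hypothetical, the mean is taken to be the weighted mean over all points, and the exponent is assumed to use only the weights of the points that differ from the mean.

```python
import math

def weighted_geometric_deviation(values, weights):
    """Weighted geometric measure of variation about the weighted mean (log-sum form)."""
    total_w = sum(weights)
    mu = sum(w * v for v, w in zip(values, weights)) / total_w
    # Keep only points that differ from the mean and carry a positive weight.
    pairs = [(abs(v - mu), w) for v, w in zip(values, weights) if v != mu and w > 0]
    if not pairs:
        return 0.0
    # ASSUMPTION: the exponent uses the weights of the retained points only.
    w_used = sum(w for _, w in pairs)
    return math.exp(sum(w * math.log(d) for d, w in pairs) / w_used)

values = [2, 4, 5, 7, 9]      # hypothetical distinct values
weights = [1, 3, 2, 1, 1]     # hypothetical frequencies
g_w = weighted_geometric_deviation(values, weights)
print(g_w)
```

With frequencies as weights, this dataset is the un-weighted list 2, 4, 4, 4, 5, 5, 7, 9 in compressed form, so under the stated assumptions the result coincides with the sixth root of 24 obtained from the raw values.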

Application on Probability Mass Functions
Equation (18) shows that the logarithm of the geometric measure is a weighted sum of the terms $\ln|v_i - \mu|$, and that $|v_i - \mu|$ carries the same weights as $v_i$. This relationship can therefore be extended to probability mass functions. Assume that the variable $v$ is discrete with probability mass function $p(v)$ for all $v$ in its support and 0 otherwise. Assume that $d$, which is equal to $|v - \mu|$ where $\mu$ is the mean of the random variable $v$, is distributed in the same way as $v$, with probability mass function $p(v)$. The geometric deviation for probability mass functions can then be given as
$$G = \exp\left(\sum_{v} p(v)\,\ln|v - \mu|\right) \tag{20}$$
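A minimal Python sketch of this extension is given below, treating the probabilities as weights that sum to one. The function name `geometric_deviation_pmf` and the example distribution are hypothetical, and the sketch deliberately refuses inputs in which a support point coincides with the mean, since how such a point should be handled is not settled here.

```python
import math

def geometric_deviation_pmf(pmf):
    """pmf: dict mapping each support value v to its probability p(v)."""
    mu = sum(v * prob for v, prob in pmf.items())
    log_g = 0.0
    for v, prob in pmf.items():
        dev = abs(v - mu)
        if dev == 0:
            # ASSUMPTION: the sketch does not decide how a support point equal
            # to the mean should be treated, so it rejects such inputs.
            raise ValueError("a support point equals the mean")
        log_g += prob * math.log(dev)
    return math.exp(log_g)

pmf = {0: 0.2, 1: 0.5, 3: 0.3}   # hypothetical distribution with mean 1.4
g = geometric_deviation_pmf(pmf)
print(g)
```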

Application on Probability Density Functions
The relationship in equation (18) can also be extended to continuous random variables. Assume that the variable $v$ is continuous on the interval $[a, b]$ with probability density function $f(v)$. Assume that $d$, which is equal to $|v - \mu|$ where $\mu$ is the mean of the random variable $v$, is distributed in the same way as $v$, with probability density function $f(v)$. The geometric deviation for probability density functions can then be given as
$$G = \exp\left(\int_{a}^{b} f(v)\,\ln|v - \mu|\,dv\right) \tag{21}$$
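Where the integral is awkward to evaluate by hand, it can be approximated numerically on a finite interval. The Python sketch below uses a simple midpoint rule; the function name `geometric_deviation_pdf`, the number of steps and the example density are assumptions made for illustration, and the approach is a numerical stand-in rather than the analytical route followed in the study.

```python
import math

def geometric_deviation_pdf(f, a, b, steps=100_000):
    """Midpoint-rule approximation of exp( integral_a^b f(v) ln|v - mu| dv )."""
    h = (b - a) / steps
    mids = [a + (k + 0.5) * h for k in range(steps)]
    mu = sum(v * f(v) for v in mids) * h        # mean of the distribution
    log_g = 0.0
    for v in mids:
        dev = abs(v - mu)
        if dev > 0:                             # skip the singular point, where ln(0) is undefined
            log_g += f(v) * math.log(dev) * h
    return math.exp(log_g)

# Hypothetical density f(v) = 2v on [0, 1], whose mean is 2/3.
g = geometric_deviation_pdf(lambda v: 2.0 * v, 0.0, 1.0)
print(g)
```

The logarithm has an integrable singularity at the mean, so the midpoint rule converges, but only at a rate roughly proportional to the step width; a fine grid is therefore used.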

Application on Un-weighted Datasets
Several simulations were conducted on both discrete and continuous distributions, using small populations of size 10 drawn from Bernoulli, Binomial, Geometric, Normal, Chi-square and F distributions. The results were as shown below.

Bernoulli Population
Consider a population of size 10 that is Bernoulli distributed with a probability of success of 0.7. The geometric measure of variation from the mean for the population was estimated as illustrated in Table 1, using the function for un-weighted datasets, equation (12).

Binomial Population
Consider a population of size 10 that is Binomial distributed with 30 trials and a probability of success of 0.64. The geometric measure of variation from the mean was estimated as shown in Table 2.

Geometric Population
Consider a population of size 10 that is Geometric distributed with a probability of success of 0.5. The average deviation from the mean was estimated using the geometric measure of deviation from the mean for the population, as illustrated in Table 3.

Chi-square Population
A population of size 10 from a Chi-square distribution with 1 degree of freedom was simulated, and the geometric measure for the population was estimated, as illustrated in Table 4.

Normal Distribution Population
A population of size 10 from a Normal distribution with a mean of 10 and a standard deviation of 2 was simulated, and the average deviation from the mean was estimated using the geometric measure of variation about the mean, as illustrated in Table 5.

F-distributed Population
A population of size 10 from an F-distribution with 2 numerator and 5 denominator degrees of freedom was simulated, and the geometric measure of variation about the mean for the population was calculated, as illustrated in Table 6.
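The simulation design described in this section can be sketched with the Python standard library as follows. This is only an illustration of the procedure: the seed, the helper names, the convention used for the Geometric distribution (number of trials up to the first success) and the assumption that the root is taken over the non-zero deviations are all choices made here, so the output will not reproduce the values in Tables 1 to 6.

```python
import math
import random

def geometric_deviation(data):
    mu = sum(data) / len(data)
    devs = [abs(v - mu) for v in data if v != mu]
    if not devs:
        return 0.0
    # ASSUMPTION: root taken over the p non-zero deviations.
    return math.exp(sum(math.log(d) for d in devs) / len(devs))

def chi_square(rng, df):
    # Sum of df squared standard normal draws.
    return sum(rng.gauss(0, 1) ** 2 for _ in range(df))

def geometric_trials(rng, p):
    # Number of trials up to and including the first success (an assumed convention).
    count = 1
    while rng.random() >= p:
        count += 1
    return count

rng = random.Random(2024)   # arbitrary seed, for repeatability only
N = 10
populations = {
    "Bernoulli(0.7)": [1 if rng.random() < 0.7 else 0 for _ in range(N)],
    "Binomial(30, 0.64)": [sum(rng.random() < 0.64 for _ in range(30)) for _ in range(N)],
    "Geometric(0.5)": [geometric_trials(rng, 0.5) for _ in range(N)],
    "Chi-square(1)": [chi_square(rng, 1) for _ in range(N)],
    "Normal(10, 2)": [rng.gauss(10, 2) for _ in range(N)],
    "F(2, 5)": [(chi_square(rng, 2) / 2) / (chi_square(rng, 5) / 5) for _ in range(N)],
}
results = {name: geometric_deviation(pop) for name, pop in populations.items()}
for name, g in results.items():
    print(f"{name:>20}: G = {g:.4f}")
```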

Application on Weighted Datasets
The study sought to determine whether the geometric measure could be used to estimate average deviations from the mean for weighted datasets, with a specific focus on frequency distributions, where the frequencies were used as the weights of the respective data points. Simulated data for three discrete distributions (Bernoulli, Binomial and Geometric) and three continuous distributions (Normal, Chi-square and Fisher's F) were used. For each distribution, 100 observations were simulated and summarized into a frequency distribution, after which the geometric measure of the average deviation about the mean was estimated for each frequency distribution. The results of the simulation and the estimation were as illustrated below.

Bernoulli Population
Consider a population of size 100 that is Bernoulli distributed with a probability of success of 0.7. The frequency distribution and the estimation of the geometric measure of variation from the mean (G) for the population, using the function for weighted datasets, were as illustrated below.

Binomial Population
Consider a population of size 100 that is Binomial distributed with 30 trials and a probability of success of 0.64. The frequency distribution and the estimation of the geometric measure of variation from the mean for the population were as illustrated in Table 8.

Geometric Population
Consider a population of size 100 that is Geometric distributed with a probability of success of 0.5. The frequency distribution and the estimation of the geometric measure of variation from the mean for the population were as illustrated in Table 9.

Chi-square Population
A population of size 100 from a Chi-square distribution with 1 degree of freedom was simulated. The frequency distribution and the estimation of the geometric measure of variation from the mean for the population were as illustrated in Table 10.
Table 10. Estimation for weighted Chi-square population.

Normal Distribution Population
A population of size 100 from a Normal distribution with a mean of 10 and a standard deviation of 2 was simulated. The frequency distribution and the estimation of the geometric measure of variation from the mean for the population were as illustrated in Table 11.

F-distributed Population
A population of size 100 from an F-distribution with 2 numerator and 5 denominator degrees of freedom was simulated as illustrated in table 13. The geometric measure of variation about the mean for the population was calculated as illustrated in Table 12. Based on these calculations, the geometric measure can be used to estimate the average variation from the mean for both discrete and continuous weighted datasets.
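The step of summarizing raw observations into a frequency distribution and using the frequencies as weights can be sketched as below. The seed and function names are hypothetical, and the sketch assumes that points equal to the mean are dropped from both the product and the exponent; under that assumption the frequency-weighted estimate agrees with the estimate computed from the raw observations.

```python
import math
import random
from collections import Counter

def geometric_deviation(data):
    mu = sum(data) / len(data)
    logs = [math.log(abs(v - mu)) for v in data if v != mu]
    return math.exp(sum(logs) / len(logs)) if logs else 0.0

def weighted_geometric_deviation(values, weights):
    mu = sum(w * v for v, w in zip(values, weights)) / sum(weights)
    # ASSUMPTION: points equal to the mean are left out of the product and the exponent.
    terms = [(w, math.log(abs(v - mu))) for v, w in zip(values, weights) if v != mu]
    used = sum(w for w, _ in terms)
    return math.exp(sum(w * l for w, l in terms) / used) if terms else 0.0

rng = random.Random(7)  # arbitrary seed
# 100 simulated Binomial(30, 0.64) observations, each a count of successes.
sample = [sum(rng.random() < 0.64 for _ in range(30)) for _ in range(100)]

freq = Counter(sample)                 # frequency distribution
values = sorted(freq)
weights = [freq[v] for v in values]    # frequencies used as weights

g_freq = weighted_geometric_deviation(values, weights)
g_raw = geometric_deviation(sample)
print(g_freq, g_raw)
```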

Application on Probability Mass Functions
The study considered various probability mass functions to illustrate how the function can be used in estimating the average variation from the mean for probability mass functions. The results were as illustrated below;

Coin Tossing Gambling Game
In a coin-tossing gambling game, a player loses 1 shilling whenever a fair coin lands heads and gains 2 shillings whenever it lands tails. Estimate the average variation of the gain using the geometric measure of variation from the mean.
The geometric variation for the probability mass function is estimated using the functions
$$\mu = \sum_{v} v\,p(v), \qquad G = \exp\left(\sum_{v} p(v)\,\ln|v - \mu|\right)$$
The calculation of the estimate was as follows:
$$\mu = 0.5(-1) + 0.5(2) = 0.5$$
$$\ln G = 0.5\ln|-1 - 0.5| + 0.5\ln|2 - 0.5| = \ln 1.5, \qquad G = 1.5$$

Dice
When an individual rolls a die, one of the numbers from 1 to 6 is obtained. Estimate the variation of the number obtained when a die is rolled.
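The die example, and the coin game before it, can be checked with a few lines of Python. The sketch assumes a fair die with each face having probability 1/6, and the helper name `geometric_deviation_pmf` is hypothetical; it also assumes that no outcome coincides with the mean, which holds for both examples.

```python
import math

def geometric_deviation_pmf(pmf):
    mu = sum(v * p for v, p in pmf.items())
    # Assumes no outcome coincides with the mean, so every logarithm is finite.
    return math.exp(sum(p * math.log(abs(v - mu)) for v, p in pmf.items()))

coin = {-1: 0.5, 2: 0.5}                      # lose 1 on heads, gain 2 on tails
die = {face: 1 / 6 for face in range(1, 7)}   # ASSUMPTION: a fair die

g_coin = geometric_deviation_pmf(coin)
g_die = geometric_deviation_pmf(die)
print(g_coin, g_die)
```

For the die the mean is 3.5 and the absolute deviations are 2.5, 1.5 and 0.5, each occurring twice, so under these assumptions the measure is the cube root of 1.875, roughly 1.23.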

Bernoulli Distribution
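Consider a Bernoulli distribution with probability mass function
$$p(v) = \theta^{v}(1 - \theta)^{1 - v}, \quad v \in \{0, 1\} \tag{22}$$
where $\theta$ is the probability of success. The mean of the distribution is $\theta$, so the natural logarithm of the geometric measure of variation about the mean can be estimated for the function by
$$\ln G = (1 - \theta)\ln|0 - \theta| + \theta\ln|1 - \theta| = (1 - \theta)\ln\theta + \theta\ln(1 - \theta) \tag{23}$$
Hence,
$$G = \theta^{\,1 - \theta}(1 - \theta)^{\theta} \tag{24}$$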
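For the Bernoulli case the sum has only two terms, so the result can be checked directly. The Python sketch below compares the closed form $\theta^{1-\theta}(1-\theta)^{\theta}$ with a term-by-term evaluation of the defining sum, and with the standard deviation $\sqrt{\theta(1-\theta)}$; the function names and the choice $\theta = 0.7$ (the success probability used in the simulations) are illustrative assumptions.

```python
import math

def bernoulli_geometric_deviation(theta):
    """Closed form theta**(1 - theta) * (1 - theta)**theta, valid for 0 < theta < 1."""
    return theta ** (1 - theta) * (1 - theta) ** theta

def bernoulli_by_definition(theta):
    mu = theta  # mean of a Bernoulli variable
    log_g = (1 - theta) * math.log(abs(0 - mu)) + theta * math.log(abs(1 - mu))
    return math.exp(log_g)

theta = 0.7
closed = bernoulli_geometric_deviation(theta)
direct = bernoulli_by_definition(theta)
sd = math.sqrt(theta * (1 - theta))   # standard deviation, for comparison
print(closed, direct, sd)
```

The geometric value comes out below the standard deviation here, which is in line with the comparison drawn in the conclusion.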

Binomial Distribution
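Consider a binomial distribution with probability mass function
$$p(v) = \binom{n}{v}\theta^{v}(1 - \theta)^{n - v}, \quad v = 0, 1, \dots, n \tag{25}$$
where n is the number of trials and $\theta$ is the probability of success. The mean of the distribution is $n\theta$, so the natural logarithm of the geometric measure of variation about the mean can be estimated for the function by
$$\ln G = \sum_{v=0}^{n} \binom{n}{v}\theta^{v}(1 - \theta)^{n - v}\,\ln|v - n\theta| \tag{26}$$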

Geometric Distribution
Consider a geometric distribution with probability mass function (28), where $\theta$ is the probability of success. The natural logarithm of the geometric measure of variation about the mean can be estimated for the function by applying equation (20) to this probability mass function.

Poisson Distribution
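Consider a Poisson distribution with probability mass function
$$p(v) = \frac{e^{-\lambda}\lambda^{v}}{v!}, \quad v = 0, 1, 2, \dots$$
where $\lambda$ is the average rate of event occurrence. The mean of the distribution is $\lambda$, so the natural logarithm of the geometric measure of variation about the mean can be estimated for the function by
$$\ln G = \sum_{v=0}^{\infty} \frac{e^{-\lambda}\lambda^{v}}{v!}\,\ln|v - \lambda|$$
Hence,
$$G = \exp\left(\sum_{v=0}^{\infty} \frac{e^{-\lambda}\lambda^{v}}{v!}\,\ln|v - \lambda|\right)$$
The above illustrations show how the geometric measure about the mean can be used to estimate the average variation about the mean of probability mass functions.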
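For a distribution with infinite support such as the Poisson, the series can be approximated by truncation. The Python sketch below is a hedged illustration: the function name, the number of terms and the rate 2.5 are assumptions, and the rate is deliberately non-integer so that no support point coincides with the mean.

```python
import math

def poisson_geometric_deviation(lam, terms=200):
    """Truncated-series approximation; lam must not be an integer so that v != lam."""
    prob = math.exp(-lam)          # P(V = 0)
    log_g = 0.0
    total = 0.0
    for v in range(terms):
        log_g += prob * math.log(abs(v - lam))
        total += prob
        prob *= lam / (v + 1)      # P(V = v + 1) obtained from P(V = v)
    return math.exp(log_g), total

g, mass = poisson_geometric_deviation(2.5)
print(g, mass)
```

The second returned value is the probability mass covered by the truncated series, which gives a quick check that enough terms were used.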

Application on Probability Density Functions
The study considered various probability density functions to illustrate how the function can be used in estimating the average variation from the mean for probability density functions. The results were as illustrated below;

Simple Probability Density Function
Consider a random variable $v$ which is distributed on a finite interval with a given probability density function $f(v)$. The average deviation from the mean for $v$ can be estimated using the geometric measure as follows. The geometric average deviation from the mean is given by the function
$$G = \exp\left(\int f(v)\,\ln|v - \mu|\,dv\right)$$
The expected value of the distribution is first obtained from
$$\mu = E(v) = \int v\,f(v)\,dv$$
The natural logarithm of the geometric measure of variation from the mean is, by definition, given by the formula
$$\ln G = \int f(v)\,\ln|v - \mu|\,dv$$
Substituting the example density into this expression and integrating gives $\ln G$, and taking the exponential of the result gives the geometric measure of variation from the mean.

Uniform Distribution
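Consider a random variable v which is distributed uniformly on the interval $[a, b]$. By definition, the probability density function of v is given by
$$f(v) = \frac{1}{b - a}, \quad a \leq v \leq b$$
The expectation of the distribution is given by
$$E(v) = \frac{a + b}{2}$$
Therefore, the natural logarithm of the geometric measure of variation from the mean for the distribution is given by
$$\ln G = \int_{a}^{b} \frac{1}{b - a}\,\ln\left|v - \frac{a + b}{2}\right| dv = \ln\left(\frac{b - a}{2}\right) - 1$$
Therefore, the geometric measure of variation from the mean is given by
$$G = \frac{b - a}{2e}$$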
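Carrying out the integral for the uniform case gives $\ln G = \ln\left(\frac{b-a}{2}\right) - 1$, that is, $G = (b - a)/(2e)$. The Python sketch below compares this closed form with a midpoint-rule approximation of the integral; the function names, the interval [2, 6] and the number of steps are illustrative assumptions.

```python
import math

def uniform_geometric_deviation(a, b):
    """Closed form (b - a) / (2e) for the uniform distribution on [a, b]."""
    return (b - a) / (2 * math.e)

def uniform_numeric(a, b, steps=100_000):
    # Midpoint rule for (1/(b-a)) * integral of ln|v - (a+b)/2| dv. With an even
    # number of steps no midpoint lands on the mean, where ln is singular.
    mu = (a + b) / 2
    h = (b - a) / steps
    log_g = sum(math.log(abs(a + (k + 0.5) * h - mu)) for k in range(steps)) * h / (b - a)
    return math.exp(log_g)

closed = uniform_geometric_deviation(2.0, 6.0)
approx = uniform_numeric(2.0, 6.0)
print(closed, approx)
```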

Exponential Probability Density Function
Consider an exponential distribution with probability density function
$$f(v) = \lambda e^{-\lambda v}, \quad v \geq 0$$
By definition,
$$E(v) = \frac{1}{\lambda}$$
Therefore, the natural logarithm of the geometric measure about the mean is given by
$$\ln G = \int_{0}^{\infty} \lambda e^{-\lambda v}\,\ln\left|v - \frac{1}{\lambda}\right| dv$$
Hence,
$$G = \exp\left(\int_{0}^{\infty} \lambda e^{-\lambda v}\,\ln\left|v - \frac{1}{\lambda}\right| dv\right)$$
The above illustrations show that the geometric measure of variation from the mean can be used to estimate the average deviation from the mean. However, the function faces a challenge in the estimation, especially when the limits of integration are infinite. A further challenge arises when integrating by parts for complex probability density functions whose integration by parts does not terminate in a finite number of steps. For example, consider the case of the standard normal probability density function.

Standard Normal
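Consider a standard normal distribution for a random variable v, with probability density function
$$f(v) = \frac{1}{\sqrt{2\pi}}\,e^{-v^{2}/2}, \quad -\infty < v < \infty$$
The mean of the distribution is 0. Therefore, the natural logarithm of the geometric measure about the mean is given by
$$\ln G = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-v^{2}/2}\,\ln|v|\,dv$$
Hence,
$$G = \exp\left(\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-v^{2}/2}\,\ln|v|\,dv\right)$$
This shows that such functions are very difficult to integrate by parts. This is a shortcoming of the geometric measure of variation, especially in its application to probability density functions.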

Conclusion
In conclusion, the geometric measure of variation about the mean can be used to estimate the average deviation from the population mean for un-weighted datasets, weighted datasets, probability mass functions and probability density functions on finite intervals. However, the function faces serious integration problems when estimating the average deviation for probability density functions, as a result of the complexity of the integration by parts involved and of integration over infinite intervals. The geometric measure of variation was also determined to give smaller estimates of the variation from the mean than the standard deviation.