The Derivation and Choice of Appropriate Test Statistics (Z, t, F and Chi-Square Tests) in Research Methodology

The main objective of this paper is to guide the choice of an appropriate test statistic in research methodology. Specifically, this article explores the concept of the statistical hypothesis test, the derivation of the test statistic and its role in research methodology. It also shows the basics of formulating and testing hypotheses using a test statistic, since choosing an appropriate test statistic is one of the most important tools of research. To test a hypothesis, various statistical tests such as the Z-test, Student's t-test, the F-test (as in ANOVA) and the chi-square test were identified. For testing the mean of a population or comparing the means of two continuous populations, the Z-test and t-test are used, while the F-test is used for comparing more than two means and for testing equality of variances. The chi-square test is used for testing independence and goodness of fit in categorical data, and for testing the variance of a single sample. Therefore, choosing an appropriate test statistic yields valid conclusions in hypothesis testing.


Introduction
The application of statistical tests in scientific research has increased dramatically in recent years in almost every science. Despite this wide applicability, there is misunderstanding about choosing the appropriate test statistic for a given problem. Thus, the main objective of this study is to give direction on which test statistic is appropriate when conducting statistical hypothesis testing in research methodology.
Choosing an appropriate statistical test depends on the type of data (continuous or categorical); i.e., whether a t-test, Z-test, F-test or chi-square test should be used depends on the nature of the data. For example, the two-sample independent t-test and Z-test are used if the two samples are independent, a paired Z-test or t-test is used if the two samples are dependent, while the F-test can be used if the samples are independent.
Therefore, testing hypotheses about the mean of a single population, comparing the means of two populations, comparing several means using one-way ANOVA, testing the variance of a single population using the chi-square test and testing the equality of two variances using the F-test are the important topics of this paper.

Derivation of Z, t, F and Chi-Square Test Statistic
Theorem 1: If we draw independent random samples \(X_1, X_2, \ldots, X_n\) from a population, compute the sample mean \(\bar{X}\) and repeat this process many times, then the sampling distribution of \(\bar{X}\) is approximately normal. Since assumptions form the "if" part, the conditions used to deduce the sampling distribution of a statistic, the \(\chi^2\), t and F distributions all depend on a normal "parent" population.

Chi-Square Distribution
The chi-square distribution is a theoretical distribution which has wide application in statistical work. The term "chi-square" (denoted by the Greek letter \(\chi^2\) and pronounced "chi") is used to define this distribution. The theoretical chi-square distribution is a continuous distribution, but the statistic is obtained in a discrete manner based on the discrete differences between observed and expected values. A chi-square random variable (\(\chi^2\)) is the sum of independent squared normal random variables with mean 0 and variance 1 (i.e., standard normal random variables).
Suppose we pick \(X_1, X_2, \ldots, X_k\) from a normal distribution with mean \(\mu\) and variance \(\sigma^2\), that is, \(X_i \sim N(\mu, \sigma^2)\). Then \(Z_i = (X_i - \mu)/\sigma\), \(i = 1, \ldots, k\), are independent standard normal variables (\(Z_i \sim N(0,1)\)), and the sum of their squares has a chi-square distribution with \(k\) degrees of freedom:

\[ \sum_{i=1}^{k} Z_i^2 = \chi^2_k. \]

By the Central Limit Theorem, the limit of the \(\chi^2_k\) distribution is normal as \(k \to \infty\).

The F Distribution
\(\mathcal{F}_{m,n}\) is the ratio of two independent chi-squared random variables, each divided by its respective degrees of freedom:

\[ \mathcal{F}_{m,n} = \frac{\chi^2_m / m}{\chi^2_n / n}. \]

Since the \(\chi^2\)'s depend on the normal distribution, the \(\mathcal{F}\) distribution also depends on the normal distribution. The limiting distribution of \(\mathcal{F}_{m,n}\) as \(n \to \infty\) is \(\chi^2_m / m\).

Student's t-Distribution
The t-distribution is a probability distribution which is frequently used to evaluate hypotheses regarding the mean of continuous variables. Student's t-distribution is quite similar to the normal distribution, but the exact shape of the t-distribution depends on the sample size; i.e., as the sample size increases, the t-distribution approaches the normal distribution. Unlike the normal distribution, however, which has a standard deviation of 1, the standard deviation of the t-distribution varies with an entity known as the degrees of freedom.
We know that the t statistic is the ratio of a standard normal variable to the square root of an independent chi-squared variable divided by its degrees of freedom:

\[ t_k = \frac{Z}{\sqrt{\chi^2_k / k}}, \]

which has a Student's t-distribution with \(k\) degrees of freedom.

Test Statistic Approach of Hypothesis Testing
Hypotheses are predictions about the relationship among two or more variables or groups based on a theory or previous research [3]. Hence, hypothesis testing is the art of testing if variation between two sample distributions can be explained through random chance or not. The three approaches of hypothesis testing are test statistic, p-value and confidence interval.
In this paper, we focus only on the test-statistic approach to hypothesis testing. A test statistic is a function of a random sample and is therefore a random variable. When we compute the statistic for a given sample, we obtain an outcome of the test statistic. In order to perform a statistical test, we should know the distribution of the test statistic under the null hypothesis. This distribution depends largely on the assumptions made in the model. If the specification of the model includes the assumption of normality, then the appropriate statistical distribution is the normal distribution or any of the distributions associated with it, such as the chi-square, Student's t, or Snedecor's F.

Z-tests (Large Sample Case)
A Z-test is any statistical test for which the distribution of the test statistic under the null hypothesis can be approximated by a normal distribution [17]. In this paper, the Z-test is used for testing the significance of a single population mean and of the difference of two population means when the sample is taken from a normal distribution with known variance, or when the sample size is large enough to invoke the Central Limit Theorem (usually \(n \geq 30\) is a good rule of thumb).

One-Sample Z-test for Mean
A one-sample Z-test helps to determine whether the population mean \(\mu\) is equal to a hypothesized value when the sample is taken from a normal distribution with known variance \(\sigma^2\). Recall that \(\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i\). The mean of a sample from the population is not constant; rather, it varies depending on which sample is drawn. Therefore, the sample mean is a random variable with its own distribution, called the sampling distribution, with mean \(E(\bar{X}) = \mu\) and variance \(\mathrm{Var}(\bar{X}) = \sigma^2/n\). Proof: consider all possible samples of size \(n\) from a population with expected value \(\mu\) and variance \(\sigma^2\). If a sample \(X_1, X_2, \ldots, X_n\) is chosen, each observation comes from the same population, so each has the same expected value \(\mu\) and variance \(\sigma^2\).
The first step is to state the null hypothesis, which is a statement about a parameter of the population(s) labeled \(H_0\), and the alternative hypothesis, which is a statement about a parameter of the population(s) opposite to the null hypothesis, labeled \(H_1\). The null and alternative hypotheses (two-tail test) are given by \(H_0: \mu = \mu_0\) against \(H_1: \mu \neq \mu_0\). Theorem 2: If \(X_1, X_2, \ldots, X_n\) are normally distributed random variables with mean \(\mu\) and variance \(\sigma^2\), then the standardized random variable \(Z\) (one-sample Z-test) has a standard normal distribution:

\[ Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}, \]

where \(\bar{X}\) is the sample mean, \(\mu_0\) is the hypothesized population mean under the null hypothesis, \(\sigma\) is the population standard deviation and \(n\) is the sample size; recall that \(\sigma/\sqrt{n} = \sigma_{\bar{X}}\) (the standard error). However, according to Joginder K. [6], if the population standard deviation is unknown but \(n \geq 30\), then by the Central Limit Theorem the test statistic is

\[ Z = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}, \]

where \(s\) is the sample standard deviation.
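As a minimal sketch of the one-sample Z-test above (all data and parameter values below are hypothetical, chosen only for illustration), using SciPy for the normal tail probability:

```python
import math
from scipy.stats import norm

def one_sample_z_test(xbar, mu0, sigma, n):
    """One-sample Z-test: Z = (xbar - mu0) / (sigma / sqrt(n))."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p = 2 * norm.sf(abs(z))  # two-tailed p-value from the standard normal
    return z, p

# Hypothetical example: sample mean 52 from n = 64 observations,
# testing H0: mu = 50 with known sigma = 8
z, p = one_sample_z_test(xbar=52.0, mu0=50.0, sigma=8.0, n=64)
# z = 2 / (8 / 8) = 2.0, so H0 is rejected at the 5% level
```

The same function covers the large-sample case with unknown variance by passing the sample standard deviation as `sigma`.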

Two-Sample Z-test (When Variances Are Unequal)
A two-sample Z-test can be applied to investigate the significance of the difference between the means of two populations when the population variances are known and unequal, but both distributions are normal and the samples are drawn independently. Suppose we have independent random samples of sizes \(n\) and \(m\) from two normal populations having means \(\mu_X\) and \(\mu_Y\) and known variances \(\sigma_X^2\) and \(\sigma_Y^2\), respectively, and that we want to test the null hypothesis \(H_0: \mu_X - \mu_Y = \delta_0\), where \(\delta_0\) is a given constant, against the alternative \(H_1: \mu_X - \mu_Y \neq \delta_0\). The test statistic is

\[ Z = \frac{(\bar{X} - \bar{Y}) - \delta_0}{\sqrt{\sigma_X^2 / n + \sigma_Y^2 / m}}. \]
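A minimal sketch of this statistic (the summary statistics below are hypothetical, not from any real study):

```python
import math
from scipy.stats import norm

def two_sample_z_test(xbar, ybar, var_x, var_y, n, m, delta0=0.0):
    """Two-sample Z-test with known, possibly unequal, variances."""
    z = (xbar - ybar - delta0) / math.sqrt(var_x / n + var_y / m)
    return z, 2 * norm.sf(abs(z))  # two-tailed p-value

# Hypothetical summary statistics (illustrative only)
z, p = two_sample_z_test(xbar=105.0, ybar=100.0,
                         var_x=100.0, var_y=64.0, n=50, m=40)
```

With these numbers z = 5 / sqrt(100/50 + 64/40) ≈ 2.64, which exceeds the 5% critical value of 1.96.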

Two-Sample Z-test (When Variances Are Equal) for Mean
It can be applied to investigate the significance of the difference between the means of two populations when the population variances are known and equal and both distributions are normal. The null and alternative hypotheses (two-tail test) are given by \(H_0: \mu_X - \mu_Y = \mu_0\) and \(H_1: \mu_X - \mu_Y \neq \mu_0\), respectively. The test statistic is

\[ Z = \frac{(\bar{X} - \bar{Y}) - \mu_0}{\sigma \sqrt{1/n + 1/m}}. \]

If we do not know the variances of the two distributions, then we may use the sample variances,

\[ Z = \frac{(\bar{X} - \bar{Y}) - \mu_0}{\sqrt{s_X^2 / n + s_Y^2 / m}}, \]

provided \(n\) and \(m\) are large enough to invoke the Central Limit Theorem.

Paired Sample Z-test
According to Banda Gerald [2], words like dependent, repeated, before and after, matched pairs, paired and so on are hints for dependent samples. Paired samples, also called dependent samples, are used for testing the population mean of paired differences. A paired Z-test helps to determine whether the mean difference between randomly drawn samples of paired observations is significant. Statistically, it is equivalent to a one-sample Z-test on the differences between the paired observations. The null and alternative hypotheses (two-tail test) are given by \(H_0: \mu_d = \mu_{d_0}\) and \(H_1: \mu_d \neq \mu_{d_0}\), respectively.
Theorem 3: If \(X_1, X_2, \ldots, X_n\) and \(Y_1, Y_2, \ldots, Y_n\) are normally distributed random variables, but \(X_i\) and \(Y_i\) are dependent, then the standardized random variable \(Z\) is

\[ Z = \frac{\bar{d} - \mu_{d_0}}{\sigma_d / \sqrt{n}}, \]

where \(d_i = X_i - Y_i\) is the pairwise difference, \(\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i\) is the sample mean of the differences between the first and second results of the participants in the study, \(\mu_{d_0}\) is the mean value of the pairwise difference under the null hypothesis, and \(\sigma_d\) is the population standard deviation of the pairwise differences. The basic reason is that \(X_i\) and \(Y_i\) are normal random variables, so \(d_i = X_i - Y_i\) is also a normal random variable; thus \(d_1, d_2, \ldots, d_n\) can be thought of as a sequence of normal random variables.

Student's t-test
Unfortunately, Z-tests require that either the population is normally distributed with known variance, or the sample size is large. When the sample size is small (\(n < 30\)), the Z value for the area under the normal curve is not accurate. Thus, instead of a Z value, t values are more accurate for small sample sizes, since the t value depends upon the number of cases in the sample; i.e., as the sample size changes, the t value changes [10]. However, the t-test requires the assumption that the sample values are drawn randomly and are normally distributed (since the data are continuous). Therefore, the t-test is a useful technique for comparing the mean value of a group against some hypothesized mean (one-sample), or two separate sets of numbers against each other (two-sample), in the case of small samples (\(n < 30\)) taken from a normal distribution with unknown variance.
A t-test is defined as a statistical test in which the test statistic follows a Student's t-distribution under the assumption that the null hypothesis is true [5].

One-Sample t-test
Banda Gerald [2] suggested that a one-sample t-test is used to compare the mean of a sample to a predefined value. Thus, we use the one-sample t-test to investigate the significance of the difference between an assumed population mean and a sample mean. The population characteristics are known from theory or are calculated from the population. Thus, a researcher can use a one-sample t-test to compare the mean of a sample with a hypothesized population mean to see if the sample is significantly different. The null and alternative hypotheses (two-tail test) are given by \(H_0: \mu = \mu_0\) and \(H_1: \mu \neq \mu_0\), respectively. The t-test statistic is

\[ t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}, \]

where \(\bar{X}\) is the sample mean, \(\mu_0\) is the population mean or hypothesized value, \(n\) is the sample size and \(s\) is the sample standard deviation, calculated as \(s = \sqrt{\sum (X_i - \bar{X})^2 / (n - 1)}\).
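A short sketch of the one-sample t-test (the measurements below are hypothetical), showing that SciPy's built-in test matches the formula above:

```python
import math
from scipy import stats

# Hypothetical measurements (illustrative), testing H0: mu = 12.0
sample = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9]
t_stat, p_value = stats.ttest_1samp(sample, popmean=12.0)

# The same statistic from the formula t = (xbar - mu0) / (s / sqrt(n))
n = len(sample)
xbar = sum(sample) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
t_manual = (xbar - 12.0) / (s / math.sqrt(n))
```

Here the sample mean is 12.05, giving a small t value, so the hypothesized mean of 12.0 would not be rejected.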

Independent Two-Sample t-test (Variances Unknown But Equal)
A two-sample t-test is used to compare two sample means when the independent variable is nominal-level data and the dependent variable is interval/ratio-level data [13]. Thus, the independent-samples t-test is used to compare two groups whose means are not dependent on one another. Two samples are independent if the sample values selected from one population are not related to the sample values selected from the other population. Thus, an independent-samples t-test helps to investigate the significance of the difference between the means of two populations when the population variances are unknown but equal. Assume two groups of independent normal samples \(X_1, X_2, \ldots, X_n\) and \(Y_1, Y_2, \ldots, Y_m\) distributed as \(X_i \sim N(\mu_X, \sigma^2)\) and \(Y_j \sim N(\mu_Y, \sigma^2)\), respectively. Then the null and alternative hypotheses (two-tail test) are given by \(H_0: \mu_X - \mu_Y = \mu_0\) and \(H_1: \mu_X - \mu_Y \neq \mu_0\), respectively. If the sample sizes \(n\) and \(m\) are not large enough to invoke the Central Limit Theorem, then the test statistic is

\[ t = \frac{(\bar{X} - \bar{Y}) - \mu_0}{s_p \sqrt{1/n + 1/m}}, \]

where \(n\) is the sample size of the first group (X), \(m\) is the sample size of the second group (Y), \(\bar{X}\) is the sample mean of the first group, \(\bar{Y}\) is the sample mean of the second group, and \(s_p\) is the pooled standard deviation for both groups, calculated as

\[ s_p = \sqrt{\frac{(n-1)s_X^2 + (m-1)s_Y^2}{n + m - 2}}. \]
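The pooled-variance test can be sketched with SciPy (the group scores below are hypothetical, for illustration only):

```python
from scipy import stats

# Hypothetical scores for two independent groups (illustrative only)
group_x = [23, 25, 28, 30, 26]
group_y = [20, 22, 27, 24, 21, 23]

# equal_var=True selects the pooled-variance (Student) two-sample t-test
t_stat, p_value = stats.ttest_ind(group_x, group_y, equal_var=True)
```

With these data the pooled t statistic is about 2.28 on n + m - 2 = 9 degrees of freedom.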

Independent Two-Sample t-test (Variances Unknown But Unequal)
However, if the variances are unknown and unequal, then we use a t-test for two population means. Given the null hypothesis of the independent-samples t-test (two-tail test), \(H_0: \mu_X - \mu_Y = \mu_0\), the test statistic is

\[ t = \frac{(\bar{X} - \bar{Y}) - \mu_0}{\sqrt{s_X^2 / n + s_Y^2 / m}}, \]

where \(s_X^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\) and \(s_Y^2 = \frac{1}{m-1}\sum_{j=1}^{m}(Y_j - \bar{Y})^2\).

Paired (Dependent) Sample t-Test
Two samples are dependent (or consist of matched pairs) if the members of one sample can be used to determine the members of the other sample. A paired t-test is used to compare two population means when we have two samples in which the observations in one sample can be paired with the observations in the other sample under a Gaussian distribution [7,16]. When two variables are paired, the difference of the two variables, \(d_i = X_i - Y_i\), is treated as if it were a single sample. This test is appropriate for pre-post treatment responses. The null hypothesis is that the true mean difference of the two variables is \(\mu_{d_0}\), given by \(H_0: \mu_d = \mu_{d_0}\). The test statistic is

\[ t = \frac{\bar{d} - \mu_{d_0}}{s_d / \sqrt{n}}, \]

where \(\bar{d}\) is the sample mean of the pairwise differences, \(\mu_{d_0}\) is the mean of the pairwise differences under the null hypothesis and \(s_d\) is the sample standard deviation of the pairwise differences. Proof: Let \(X_i\) and \(Y_i\) be normal random variables. Therefore \(d_i = X_i - Y_i\) is also a normal random variable. Then \(d_1, d_2, \ldots, d_n\) can be thought of as a sequence of normal random variables, and \(t \sim t_{n-1}\).
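A sketch of the paired t-test on hypothetical pre-post treatment scores (the numbers are invented for illustration):

```python
from scipy import stats

# Hypothetical pre/post treatment scores for six participants
before = [72, 68, 75, 80, 66, 71]
after  = [75, 70, 74, 84, 70, 73]

# Equivalent to a one-sample t-test on the differences d_i = after_i - before_i
t_stat, p_value = stats.ttest_rel(after, before)
```

With these pairs the mean difference is about 2.33, giving t ≈ 3.07 on n - 1 = 5 degrees of freedom.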

Effect Size
Both independent- and dependent-samples t-tests give the researcher an indication of whether the difference between the two groups is statistically significant, but not the size (magnitude) of the effect. Effect size, by contrast, expresses the degree to which the two variables are associated with one another. Thus, effect size is a statistic that indicates the relative magnitude of the differences between means, or the amount of total variance in the dependent variable that is predictable from knowledge of the levels of the independent variable.
i. Effect Size of an Independent-Samples t-test. An effect size is a standardized measure of the magnitude of an observed effect. Pearson's correlation coefficient r is used as a common measure of effect size. The correlation coefficient measures the strength of the relationship between two variables. From the researcher's point of view, a correlation coefficient of 0 means there is no effect, and a value of 1 means there is a perfect effect. The formula used to calculate the effect size is

\[ r = \sqrt{\frac{t^2}{t^2 + df}}, \]

where \(t\) is the calculated value of the independent-samples t-test and \(df\) is its degrees of freedom; \(r = 0.1\) refers to a small effect, \(r = 0.3\) to a medium effect and \(r = 0.5\) to a large effect.
ii. Effect Size of a Dependent-Samples t-test. The most commonly used method for calculating the effect size of a dependent-samples t-test is Eta squared [11]. The formula for the simplified Eta squared is

\[ \eta^2 = \frac{t^2}{t^2 + n - 1}, \]

where \(t\) is the calculated value of the dependent-samples t-test and \(n\) is the sample size. The calculated value of Eta squared lies between 0 and 1; values of 0.01, 0.06 and 0.14 indicate small, medium and large effects, respectively.
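Both effect-size formulas above are simple to compute; a sketch with hypothetical t values and sample sizes:

```python
import math

def r_effect_size(t, df):
    """Effect size r for an independent-samples t-test: sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t ** 2 / (t ** 2 + df))

def eta_squared(t, n):
    """Simplified eta squared for a dependent-samples t-test."""
    return t ** 2 / (t ** 2 + n - 1)

# Hypothetical: t = 2.0 with df = 36 gives r = sqrt(4 / 40) ~ 0.316 (medium effect)
r = r_effect_size(2.0, 36)
# Hypothetical: t = 3.0 with n = 10 gives eta^2 = 9 / 18 = 0.5 (large effect)
eta2 = eta_squared(3.0, 10)
```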

Chi-Square Test Statistic
Karl Pearson in 1900 proposed the famous chi-squared test for comparing an observed frequency distribution to a theoretically assumed distribution, using a large-sample approximation with the chi-squared distribution. Following Pearson, the chi-square test is a nonparametric test used for testing the hypothesis of no association between two or more groups, populations or criteria (i.e., to check independence between two variables) and how well the observed distribution of the data fits the expected distribution (i.e., to test goodness of fit) in categorical data. Here, we apply the chi-square test for testing a population variance, independence and goodness of fit.

Chi-Square Test of Population Variance (One-sample Test for Variance)
The chi-square test can be used to judge whether a random sample has been drawn from a normal population (interval or ratio data) with mean \(\mu\) and specified variance \(\sigma^2\) [15]. Thus, it is used to compare the sample variance \(s^2\) to a hypothesized variance \(\sigma_0^2\) (i.e., it is a test for a single sample variance). Assume the data \(X_1, X_2, \ldots, X_n\) are independent normal samples with \(X_i \sim N(\mu, \sigma^2)\). The test statistic is

\[ \chi^2 = \frac{(n-1)s^2}{\sigma_0^2}, \]

which under the null hypothesis is compared against the chi-squared distribution with \(\nu = n - 1\) degrees of freedom. If we construct a critical region of size \(\alpha\) in the two-tail case, the critical region consists of all values of \(\chi^2\) such that either \(\chi^2 \leq \chi^2_{1-\alpha/2,\, n-1}\) or \(\chi^2 \geq \chi^2_{\alpha/2,\, n-1}\).

Chi-Square Test of Independence
The chi-square test of independence is a nonparametric statistical test used for deciding whether two categorical (nominal) variables are associated or independent [1,5]. Let the two variables in the cross-classification be X and Y; then the null hypothesis is \(H_0\): no association between X and Y, against the alternative \(H_1\): some association between X and Y. The chi-square statistic used to conduct this test is the same as in the goodness-of-fit test:

\[ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \]

where \(\chi^2\) is the test statistic that asymptotically approaches a chi-square distribution, \(O_{ij}\) is the observed frequency of the i-th row and j-th column (the number of respondents taking on each combination of values of the two variables) and \(E_{ij}\) is the corresponding expected frequency. If \(\chi^2 > \chi^2_{\alpha,\,(r-1)(c-1)}\), reject the null hypothesis and conclude that there is a relationship between the variables. If \(\chi^2 < \chi^2_{\alpha,\,(r-1)(c-1)}\), the null hypothesis cannot be rejected, and we conclude that there is insufficient evidence of a relationship between the variables.
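A sketch of the independence test on a hypothetical 2x2 contingency table (the counts are invented for illustration):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table (illustrative only):
# rows = two groups, columns = two response categories
observed = np.array([[30, 10],
                     [20, 40]])

# correction=False gives the plain Pearson chi-square statistic above
stat, p_value, dof, expected = chi2_contingency(observed, correction=False)
```

For this table the expected cell counts are [[20, 20], [30, 30]], the statistic is about 16.67 on (2-1)(2-1) = 1 degree of freedom, and independence is rejected.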

Chi-Square Test for Goodness of Fit
A chi-square test for goodness of fit is used to compare the frequencies (counts) among multiple categories of nominal- or ordinal-level data for one sample [13]. Moreover, a chi-square test for goodness of fit compares the expected and observed values to determine how well the experimenter's predictions fit the data [14].
The null hypothesis (\(H_0: O_i = E_i\)) states that the population distribution of the variable is the same as the proposed distribution, against the alternative (\(H_1: O_i \neq E_i\)) that the distributions are different. The chi-square statistic is defined as

\[ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \]

where \(O_i\) and \(E_i\) are the observed and expected frequencies of category \(i\). Large values of this statistic lead to rejection of the null hypothesis, meaning the observed values are not equal to the theoretical (expected) values.
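A sketch of the goodness-of-fit test using hypothetical die-roll counts (illustrative data only):

```python
from scipy.stats import chisquare

# Hypothetical counts for the six faces of a die over 100 rolls
observed = [18, 22, 16, 14, 12, 18]

# With no expected frequencies supplied, scipy assumes a uniform distribution,
# i.e. E_i = 100 / 6 for each face, matching the fair-die null hypothesis
stat, p_value = chisquare(observed)
```

Here the statistic is 3.68 on k - 1 = 5 degrees of freedom, so the fair-die hypothesis is not rejected.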

The F-test Statistic
F-tests are used when comparing statistical models that have been fitted to a data set, in order to determine the model that best fits the population from which the data were sampled [4]. The F distribution is the distribution of the ratio of two independent variables, each having a chi-square distribution [9]. The F distribution requires both parent populations to be normal, to have the same variance and to yield independent samples. The F-test is applicable for testing the equality of variances and the equality of several means.

Testing for Equality of Variances
The chi-square test is used for testing a single population variance, while the \(\mathcal{F}\)-test is used for testing the equality of the variances of two independent populations (comparing two variances). Thus, it helps to make inferences about population variances from normal populations, given the null hypothesis \(H_0: \sigma_X^2 = \sigma_Y^2\) against the alternative hypothesis \(H_1: \sigma_X^2 \neq \sigma_Y^2\). Suppose two independent random samples are drawn from normally distributed populations, \(X \sim N(\mu_X, \sigma_X^2)\) and \(Y \sim N(\mu_Y, \sigma_Y^2)\), and let \(s_X^2\) and \(s_Y^2\) represent the sample variances of the two populations. If both populations are normal and the population variances \(\sigma_X^2\) and \(\sigma_Y^2\) are equal but unknown, then the sampling distribution of

\[ \mathcal{F} = \frac{s_X^2}{s_Y^2} \]

is an \(\mathcal{F}\)-distribution with \(\nu_1 = n - 1\) numerator degrees of freedom and \(\nu_2 = m - 1\) denominator degrees of freedom, where \(n\) and \(m\) are the sample sizes from populations X and Y.
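A sketch of the variance-ratio F-test (the sample variances and sizes below are hypothetical, and the two-tailed p-value is formed by doubling the smaller tail probability, one common convention):

```python
from scipy.stats import f

def f_test_variances(s2_x, s2_y, n, m):
    """F = s_X^2 / s_Y^2 with (n - 1, m - 1) degrees of freedom."""
    stat = s2_x / s2_y
    # two-tailed p-value: double the smaller of the two tail probabilities
    p = 2 * min(f.cdf(stat, n - 1, m - 1), f.sf(stat, n - 1, m - 1))
    return stat, p

# Hypothetical sample variances 12 and 4 from samples of size 16 each
stat, p = f_test_variances(s2_x=12.0, s2_y=4.0, n=16, m=16)
```

With F = 3.0 on (15, 15) degrees of freedom, the equality of variances is rejected at the 5% level.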

Testing Equality of Several Means (ANOVA)
Analysis of variance (ANOVA) is a statistical technique used for comparing the means of more than two groups (usually at least three), given the assumptions of normality, independence and equality of the error variances. In one-way ANOVA, group means are compared by comparing the variability between groups with the variability within groups. This is done by computing an F-statistic. The F-value is computed by dividing the mean sum of squares between groups by the mean sum of squares within groups [5], [8].
Mathematically, the total variability is partitioned into between-group and within-group components, and the F statistic for one-way ANOVA is

\[ \mathcal{F} = \frac{MSB}{MSW} = \frac{SSB / (k - 1)}{SSW / (N - k)}, \]

where \(SSB = \sum_{j=1}^{k} n_j (\bar{X}_j - \bar{X})^2\) is the sum of squares between groups, \(SSW = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2\) is the sum of squares within groups, \(k\) is the number of groups and \(N\) is the total number of observations. Reject \(H_0\) if the calculated \(\mathcal{F}\) is greater than the tabulated value (\(\mathcal{F}_{\alpha,\, k-1,\, N-k}\)), meaning the variability between groups is large compared to the variation within groups, while we fail to reject \(H_0\) if the calculated \(\mathcal{F}\) is less than the tabulated value, meaning the variability between groups is negligible compared to the variation within groups.
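The one-way ANOVA above can be sketched with SciPy (the three groups of scores below are hypothetical, for illustration only):

```python
from scipy import stats

# Hypothetical scores for three independent groups (illustrative only)
g1 = [4, 5, 6, 5]
g2 = [7, 8, 6, 7]
g3 = [9, 10, 11, 10]

# f_oneway computes F = MSB / MSW for a one-way ANOVA
f_stat, p_value = stats.f_oneway(g1, g2, g3)
# Here SSB = 50.667, SSW = 6, so F = (50.667/2) / (6/9) = 38.0
```

Since F greatly exceeds the tabulated value on (2, 9) degrees of freedom, the hypothesis of equal group means is rejected.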