Impact of Measurement Errors on Estimators of Parameters of a Finite Population with Linear Trend Under Systematic Sampling

This study investigates the impact of measurement errors on estimators of the parameters of a finite population with a linear trend among its values, under systematic sampling. It examines the amount and nature of the deviation introduced by the errors and how these errors affect the estimators. Measurement errors are assumed to follow a normal distribution. A systematic sample of size $n$ is drawn from the finite population using a random start and a fixed sampling interval $a$. Systematic sampling is used instead of simple random sampling because of its effectiveness in dealing with linear trend. Explicit expressions for the population totals, means and variances, together with their estimates, are derived. The results indicate that the population mean is overestimated when the expected systematic error is positive and underestimated when it is negative. Random errors have no effect on the estimated population parameters.


Introduction
In an ideal situation, it is assumed that under some form of probability sampling, in this case systematic sampling, the observation $y_i$ on the $i$-th unit is the correct value for that unit, and that errors arise solely from the random sampling variation present when $n$ units are measured instead of the complete population of $N$ units. Contrary to this assumption, non-sampling errors due to measurement or observation do occur at the data collection stage.
The true score theory is a simple and useful model for measurement. Under it, the measured value consists of a true value and two error components, a random error and a systematic error:

$$y = t + e_r + \varepsilon_s$$

where $y$ is the measured value, $t$ is the true value, $e_r$ is the random error and $\varepsilon_s$ is the systematic error.

Random error is caused by unpredictable fluctuations in the readings of a measurement apparatus or in the experimenter's interpretation of the instrumental reading. Systematic error is caused by any physical factor that affects an experiment, or the measurement of the variable across the sample, in a predictable direction.
According to Fuller and Carroll et al [1,2], it is well known that measurement errors in observed data can lead researchers to draw incorrect inferences. Much of the early work in this area focused on the typical textbook model of classical measurement error. Bound et al [3] concluded their survey of measurement errors by calling for researchers to pay more attention to the possibility of non-classical measurement error, both in assessing the likely biases in analyses that take no account of measurement error and in devising procedures that correct for such error.
In recent years, a number of papers have examined the consequences of non-classical measurement errors in labor economics. Studies [4][5][6] all noted that non-classical measurement error of the type typically found in income data attenuates the role of white noise measurement error in models of earnings dynamics.
Measurement errors can best be studied if the true value is obtained. This approach is limited to items for which a feasible method of finding the true value exists. For instance, the majority of studies using Body Mass Index (BMI) rely on self-reported measures from survey data sets. However, Conor et al show that a large body of evidence suggests that self-reported BMI tends to underestimate true BMI; this occurs because people both underreport their weight and overestimate their height [7]. Looking specifically at measurement errors in self-reported BMI, Plankey et al examined the consequences of these errors when classifying people according to obesity status [8]. Stommel et al [9] compared self-reported and recorded BMI using US data and found a substantial amount of misclassification of obesity status when using self-reported BMI, particularly in the extreme (overweight or underweight) categories. The consequences of these measurement errors were examined when analysing the impact of BMI on a range of health risks.
Belloc [10] compared data on hospitalization as reported in household interviews with the hospital records for the same individuals. The hospital records provide the true values, which are then compared with the observed values from the household interviews. Gray [11] compared employees' statements of sick leave with the personnel office records. The comparison was intended to determine the presence of measurement errors, if any. Other studies [12][13] compared respondents' reported illness with either doctors' records on the respondents or the results of a complete medical examination.
According to Särndal [14], measurement errors arise during the data collection stage and may have a considerable impact on the estimates. Among recent studies, Nyabwanga [15] studied the effect of measurement errors on a population in random order. Rosella et al [16] studied the influence of measurement error on the calibration, discrimination and overall estimation of a risk prediction model. O'Neil et al [17] examined the consequences of measurement errors in self-reported BMI when estimating the relationship between obesity and income. See also Subramani et al [18].

A finite population with linear trend consists of $N$ units identified by the labels $1, 2, \dots, N$, ordered in increasing size. Through systematic sampling, the population is divided into $a$ samples of $n$ units each.
The table below shows sets of all possible samples.
Each sample, or cluster, is selected with probability $1/a$ and observed completely as per the design.
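The set of all possible systematic samples can be listed mechanically. The sketch below is only an illustration: the function name `systematic_samples` and the small values $N = 12$, $n = 3$ are hypothetical, and it assumes $N$ is an exact multiple of $n$.

```python
def systematic_samples(N, n):
    """Return all a = N // n possible systematic samples of unit labels 1..N."""
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    a = N // n  # sampling interval
    # The sample with random start r holds the labels r, r + a, ..., r + (n - 1) * a.
    return {r: [r + j * a for j in range(n)] for r in range(1, a + 1)}


samples = systematic_samples(N=12, n=3)
for r, labels in samples.items():
    print(f"start r={r}: units {labels}")
```

Each label appears in exactly one of the $a$ samples, which is why every unit has the same chance $1/a$ of being observed.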

Estimation of Parameters for a Finite Population with Linear Trend Under Systematic Sampling
Let $N$ be the size of a finite population. Suppose the observed values follow the hypothetical trend

$$y_i = \mu + \beta i, \quad i = 1, 2, \dots, N,$$

where $\mu$ and $\beta$ are constants and the labels $i$ are ordered in increasing size. The population is then said to possess a linear trend among its values.
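As a worked check, and assuming only that $y_i = \mu + \beta i$ holds exactly for every unit, the population total, mean and variance follow from the sums of the first $N$ integers and their squares. The symbols $Y$, $\bar{Y}$ and $S^2$ (with divisor $N - 1$) are notation chosen here for illustration and may differ from the paper's own.

```latex
\begin{align*}
Y &= \sum_{i=1}^{N} (\mu + \beta i) = N\mu + \beta\,\frac{N(N+1)}{2}, \\
\bar{Y} &= \frac{Y}{N} = \mu + \beta\,\frac{N+1}{2}, \\
S^2 &= \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{Y})^2
     = \frac{\beta^2}{N-1}\sum_{i=1}^{N}\left(i - \frac{N+1}{2}\right)^2
     = \beta^2\,\frac{N(N+1)}{12}.
\end{align*}
```

For instance, with the hypothetical values $N = 800$, $\mu = 10$ and $\beta = 0.5$, these give $Y = 168200$ and $\bar{Y} = 210.25$.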
Let the population size $N$ be a multiple of $n$, so that $N = an$. The estimate of the population mean is obtained under linear systematic sampling, in which a single random start $r$ is taken. According to Mukhopadhyay [20], the systematic sample from a population with linear trend consists of the values at the labels $r, r + a, \dots, r + (n-1)a$. The sample mean $\bar{y}$ is the mean of this systematic sample and, through probability sampling, it is an unbiased estimator of the population mean $\bar{Y}$. Similarly, the population total is the sum of all $N$ population values. Since the interest is in estimating the population total from a design-based approach, the Horvitz-Thompson estimator (HTE) of Horvitz and Thompson [22] is used.
The estimator is defined as

$$\hat{t}_\pi = \sum_{i \in s} \frac{y_i}{\pi_i},$$

where $\pi_i > 0$ is the first-order inclusion probability of unit $i$.
Under the systematic sampling (SY) design with sampling interval $a$ and response variable $y_i$, every unit has inclusion probability $\pi_i = 1/a$, so the population total estimator reduces to $\hat{t}_\pi = a \sum_{i \in s} y_i$.

According to Daroga et al [23], systematic sampling is much more efficient than simple random sampling for removing the effect of linear trend.
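The design-unbiasedness of the Horvitz-Thompson estimator under systematic sampling can be checked by enumerating all $a$ starts. This is a minimal sketch, assuming an error-free linear-trend population; the function name `ht_total_systematic` and the values of `mu` and `beta` are hypothetical choices made for illustration.

```python
def ht_total_systematic(y, a, r):
    """Horvitz-Thompson estimate of the total from the systematic sample with start r.

    y holds the population values y_1..y_N (index 0 holds y_1). Every unit has
    inclusion probability 1/a, so each sampled value is weighted by a.
    """
    sample = y[r - 1::a]
    return a * sum(sample)


mu, beta = 10.0, 0.5   # hypothetical trend parameters
N, n = 800, 32
a = N // n             # sampling interval, here 25
y = [mu + beta * i for i in range(1, N + 1)]

true_total = sum(y)
estimates = [ht_total_systematic(y, a, r) for r in range(1, a + 1)]
mean_estimate = sum(estimates) / a  # expectation over the a equally likely starts
print(true_total, mean_estimate, min(estimates), max(estimates))
```

Individual estimates differ from the true total depending on the start, but their average over all starts reproduces it.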

Estimation of Variance from a Single Systematic Sample
According to Särndal [14], a major drawback of SY is that there is no unbiased estimator for the variance of the estimator of population mean except for some cases of circular systematic sampling. This is because SY is equivalent to cluster sampling with only one cluster selected.
However, under some assumptions about the nature of the population, it is possible to propose estimators that are approximately unbiased for the design variance.
In this study, the variance estimators considered are those appropriate for a population with linear trend, in which the values of the units increase steadily by a constant amount.
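Although the design variance cannot be estimated without bias from one systematic sample, it can be computed exactly when the whole population is known, by treating the $a$ possible samples as equally likely clusters. The sketch below assumes an error-free linear-trend population with hypothetical `mu` and `beta`. The closed form `v_closed` is derived here from that model rather than quoted from the paper.

```python
def design_variance_sy(y, a):
    """Exact design variance of the HT total estimator under systematic sampling,
    found by enumerating all a possible samples (each has probability 1/a)."""
    total = sum(y)
    # Index r = 0..a-1 corresponds to random start r + 1.
    estimates = [a * sum(y[r::a]) for r in range(a)]
    return sum((t - total) ** 2 for t in estimates) / a


mu, beta = 10.0, 0.5   # hypothetical trend parameters
N, n = 800, 32
a = N // n
y = [mu + beta * i for i in range(1, N + 1)]

v_exact = design_variance_sy(y, a)
# Under an exact linear trend each estimate deviates from the total by
# N * beta * (r - (a + 1) / 2), which gives the variance below.
v_closed = (N * beta) ** 2 * (a ** 2 - 1) / 12
print(v_exact, v_closed)
```

The enumeration needs all $a$ clusters, whereas a survey observes only one of them, which is the source of the difficulty described above.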
Many biased estimators have been proposed for this kind of population. Wolter [22] made analytical studies of populations with linear trend and proposed the following estimator for the variance of the estimator of the population total. Assume $n = 2m$.
Since a systematic sample can be viewed as grouping the population into $m$ groups of size $2a$ and choosing 2 units from each group, an estimator of the mean of the $g$-th group can be formed, together with a variance estimator.

Cochran [25] suggested a further estimator as appropriate, with its terms indexed over $1 \le i < n - 2$. Unless $n$ is small, the factor $n'/n^2$ in that estimator can be replaced by $1/n$.

Yates [26] suggested, among others, an estimator based successively on second and higher order differences.

Population Total Estimator and Its Variance in Presence of Random Errors
Through the measurement procedure, the observation on the $i$-th individual is accompanied by a random error term $e_i$. The observed value is thus given as

$$y_i = \mu + \beta i + e_i \qquad (13)$$

where $y_i$ is the observed value for the $i$-th individual.

The joint expectation of the population total estimator is taken with respect to both the sampling design and the measurement model. According to Särndal [14], the total variance of $\hat{t}_\pi$ with respect to the sampling design $p(\cdot)$ and the measurement model $m$ is given as

$$V_{pm}(\hat{t}_\pi) = E_p\left[V_m(\hat{t}_\pi \mid s)\right] + V_p\left[E_m(\hat{t}_\pi \mid s)\right]$$

That is, the variance of the population total estimator is the sum of the expected value of the conditional variance and the variance of the conditional expected value.
The total variance therefore consists of the measurement variance and the sampling variance, respectively.
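To make the decomposition concrete, the sketch below works it through under assumptions that go beyond what is stated above. The random errors $e_i$ are taken to be independent across units, with $E_m(e_i) = 0$ and a common variance $\sigma_e^2$ (a symbol introduced here for illustration). The systematic design is taken to give every unit inclusion probability $1/a$, so that $\hat{t}_\pi = a \sum_{i \in s} y_i$.

```latex
\begin{align*}
E_m(\hat{t}_\pi \mid s) &= a \sum_{i \in s} (\mu + \beta i), \\
V_m(\hat{t}_\pi \mid s) &= a^2 n \sigma_e^2 = a N \sigma_e^2, \\
E_{pm}(\hat{t}_\pi) &= E_p\!\left[ a \sum_{i \in s} (\mu + \beta i) \right]
                     = \sum_{i=1}^{N} (\mu + \beta i), \\
V_{pm}(\hat{t}_\pi) &= \underbrace{a N \sigma_e^2}_{\text{measurement variance}}
  + \underbrace{V_p\!\left[ a \sum_{i \in s} (\mu + \beta i) \right]}_{\text{sampling variance}}.
\end{align*}
```

Under these assumptions, zero-mean random errors leave the expectation of the estimator unchanged and contribute only an additive term to its variance.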
The measurement variance and the sampling variance can each be decomposed further. Correlated response variance, also known as interviewer variance, occurs because response errors are correlated for sample units interviewed by the same interviewer.

Mathematical Model for Errors of Measurement
Suppose the measurement could be independently repeated many times on unit $i \in s$; different $y_i$-values would then be generated.
Let $y_i$ be the realized value in one such repeated observation. Unlike random error, systematic error tends to be consistently either positive or negative; because of this, systematic error is sometimes considered to be a bias in measurement.
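The idea of repeating a measurement on one unit can be sketched numerically. The values below are all hypothetical: a true value of 50, normally distributed errors with standard deviation 2, and a systematic error modelled as a normal error whose mean is shifted to 1.5.

```python
import random
import statistics

random.seed(0)
true_value = 50.0   # hypothetical true value for one unit
repeats = 20_000    # number of independent repeated measurements

# Random error only: normal with mean zero.
random_only = [true_value + random.gauss(0.0, 2.0) for _ in range(repeats)]
# Systematic error: normal with a non-zero mean (assumed here to be +1.5).
with_systematic = [true_value + random.gauss(1.5, 2.0) for _ in range(repeats)]

bias_random = statistics.fmean(random_only) - true_value
bias_systematic = statistics.fmean(with_systematic) - true_value
print(f"average deviation, random error only: {bias_random:.3f}")
print(f"average deviation, systematic error:  {bias_systematic:.3f}")
```

Averaging over repetitions removes most of the random error, while the systematic component persists at roughly its assumed mean.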

Measurement Bias and Expectation of π-estimator
Measurement bias arises when the expected measurement values on the elements do not agree with the true element values.
The expected measurement value is derived first, followed by the total measurement bias with respect to the sampling design $p(\cdot)$ and the measurement model $m$.
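How measurement bias carries over to the estimator of the total can be illustrated by averaging the Horvitz-Thompson estimate over all $a$ starts for one realized set of errors. This is a sketch under stated assumptions: the trend parameters, the error standard deviation of 1 and the error means of $\pm 2$ are hypothetical, and the helper names are not from the paper.

```python
import random

random.seed(1)
mu, beta = 10.0, 0.5   # hypothetical trend parameters
N, n = 800, 32
a = N // n
true_y = [mu + beta * i for i in range(1, N + 1)]


def expected_ht_total(observed, a):
    """Design expectation of the HT estimator: average over the a possible starts."""
    return sum(a * sum(observed[r::a]) for r in range(a)) / a


def total_bias(error_mean, error_sd=1.0):
    """Bias of the HT total for one realization of normally distributed errors."""
    observed = [t + random.gauss(error_mean, error_sd) for t in true_y]
    return expected_ht_total(observed, a) - sum(true_y)


bias_pos = total_bias(+2.0)   # errors with positive expectation
bias_neg = total_bias(-2.0)   # errors with negative expectation
bias_zero = total_bias(0.0)   # zero-mean (purely random) errors
print(bias_pos, bias_neg, bias_zero)
```

With these settings the bias is close to $N$ times the error mean, about $+1600$ and $-1600$, and much smaller in magnitude for zero-mean errors.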

Decomposing Variance of Population Total Estimator When Systematic Errors Are Present
According to Särndal [14], the total variance of $\hat{t}_\pi$ with respect to the sampling design $p(\cdot)$ and the measurement model $m$ is given as

$$V_{pm}(\hat{t}_\pi) = E_p\left[V_m(\hat{t}_\pi \mid s)\right] + V_p\left[E_m(\hat{t}_\pi \mid s)\right]$$

The measurement variance (first term) and the sampling variance (second term) are each decomposed, and the results are combined to give the total variance.

Numerical Results and Discussion
Finite populations of size $N$ are generated: one without errors, one with random errors and one with systematic errors.
The population total variances, the population means and the population totals are then computed.
To select a systematic sample of size $n$, a random start $r$ is chosen between 1 and $a$ inclusive, where $a$ is the sampling interval.
To estimate the parameters, the data are simulated 10 times and an estimate is obtained in each case. The results are then averaged to give the estimates of all parameters required in the study.
Estimation of variance is done using the three estimators below simplified to reflect systematic sampling.
The tables for Case 1 to Case 3 contain the parameters and their estimates for populations without errors and populations with errors.
Case 1: Let $N = 800$, $n = 32$, $a = 25$.
From Tables 6 and 7, the results show that:
1. When the sample size is increased, both the population variances and the estimated variances are reduced.
2. Estimates from $v_1$ are much higher than the respective estimates from $v_2$ and $v_3$.
3. Positive expected systematic errors lead to overestimation of the population means and totals, while negative expected systematic errors lead to underestimation of the population means and totals.
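The simulation procedure described in this section can be sketched as follows for Case 1. This is not the authors' code. The trend parameters, the error standard deviation of 1, the error means of $\pm 2$ and the seed are assumptions made for illustration. The sketch also draws errors only for the sampled units, which is equivalent to generating the whole population when errors are independent across units.

```python
import random


def simulate(N=800, n=32, reps=10, mu=10.0, beta=0.5, seed=42):
    """Average the systematic-sample mean over `reps` replications for four cases."""
    rng = random.Random(seed)
    a = N // n  # sampling interval
    # Mean of the normal error in each case; None means no measurement error.
    error_means = {"none": None, "random": 0.0, "positive": 2.0, "negative": -2.0}
    sums = {name: 0.0 for name in error_means}
    for _ in range(reps):
        r = rng.randint(1, a)         # random start, 1..a inclusive
        labels = range(r, N + 1, a)   # the n sampled unit labels
        for name, m in error_means.items():
            values = [
                mu + beta * i + (0.0 if m is None else rng.gauss(m, 1.0))
                for i in labels
            ]
            sums[name] += sum(values) / n   # systematic sample mean
    return {name: s / reps for name, s in sums.items()}


estimates = simulate()
for name, value in estimates.items():
    print(f"{name:>8}: estimated mean = {value:.3f}")
```

Using the same random start for all four cases within a replication is a design choice made here so that differences between cases reflect the errors rather than the start.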

Summary
From the study, it is observed that:
1. The population means, and hence the population totals, are overestimated when the expectation of the systematic errors is positive.
2. The population means, and hence the population totals, are underestimated when the expectation of the systematic errors is negative.
3. The impact of random errors on the population mean and population total is minimal and inconsistent.
4. The variances of the population total estimator are all underestimated by the three estimators $v_1$, $v_2$ and $v_3$.
5. An increase in sample size leads to a decrease in the estimated variance of the population total estimator.
6. For populations with systematic errors, the estimated variances are overrepresented. Estimator $v_1$ gives a higher variance than estimators $v_2$ and $v_3$.

Conclusions
The study has shown that the impact of random errors on the population mean, the population total and the estimated variance of the population total estimator is minimal. Systematic errors produce a systematic bias that overestimates the population mean when the bias is positive and underestimates it when the bias is negative.
All three estimators underestimate the population variances and are therefore biased. Among the three, $v_1$ performs best because it gives values closest to the population variance.
Generally, systematic errors lead to overrepresentation of the estimated variance, while random errors have no impact on the estimates of the population variance.