American Journal of Theoretical and Applied Statistics
Volume 5, Issue 3, May 2016, Pages: 138-145

An Almost Unbiased Estimator in Group Testing with Errors in Inspection

Langat Erick Kipyegon, Tonui Benard Cheruiyot*, Langat Reuben Cheruiyot

Department of Mathematics & Computer Science, University of Kabianga, Kericho, Kenya

Email address:

(L. E. Kipyegon)
(T. B. Cheruiyot)
(L. R. Cheruiyot)

*Corresponding author

To cite this article:

Langat Erick Kipyegon, Tonui Benard Cheruiyot, Langat Reuben Cheruiyot. An Almost Unbiased Estimator in Group Testing with Errors in Inspection. American Journal of Theoretical and Applied Statistics. Vol. 5, No. 3, 2016, pp. 138-145. doi: 10.11648/j.ajtas.20160503.19

Received: April 26, 2016; Accepted: May 9, 2016; Published: May 25, 2016

Abstract: Pooling samples is discussed as a cost-effective method of screening individuals for the presence of a disease in a large population. Group testing was designed to reduce diagnostic cost. Testing a population in pools also lowers misclassification errors in low-prevalence populations. In this study we relax the assumptions of homogeneity and perfect tests by investigating the estimation problem in the presence of test errors. This is accomplished through Maximum Likelihood Estimation (MLE). The purpose of this study is to determine an analytical procedure for bias reduction in estimating population prevalence using a group testing procedure in the presence of test errors. Specifically, we construct an almost unbiased estimator in a pool-testing strategy in the presence of test errors and compute the modified MLE of the prevalence of the population. For single-stage procedures with equal group sizes, we also propose a numerical method for bias correction which produces an almost unbiased estimator with errors. The existence of bias is shown with the help of a Taylor series expansion for group sizes greater than one. The indicator function with errors is used in the development of the model. A modified formula for bias correction is shown analytically to reduce the bias of a group testing model. The Fisher information and asymptotic variance are also shown to exist. We use MATLAB for simulation and verification of the model. Tables are then presented to illustrate how the modified bias formula behaves for different values of sensitivity and specificity.

Keywords: Group Testing, Maximum Likelihood Estimator, Almost Unbiased Estimator, Bias Adjuster Formula, Bias-Corrected Estimates

1. Introduction

Group testing, also known as pooled testing, occurs when units from a population are pooled and tested as a group for the presence of a particular attribute, such as a disease or a defect. The problem of group testing is concerned with classifying each of the $N$ given units in a population into one of two disjoint categories, defective and non-defective. The characteristic feature is that any number of units $k$ (in a group) can be tested simultaneously, but the information obtained from a single test on $k$ units, without any chance of error, is either negative or positive. A negative test implies that all $k$ units in that group are non-defective; a positive test implies that at least one of the $k$ units is defective, but it is not known which ones or how many are defective. The problem is to devise a sequential sampling scheme which minimizes the expected number of tests required to classify all $N$ units as defective or non-defective, that is, to find $p$, the proportion of defective units in the population. The idea is to construct $n$ groups of size $k$ of, say, biological samples (e.g. blood) from a population of size $N$. The population may consist of samples from a number of individuals pooled into $n$ groups. Each group is tested by a single test. If the reading is negative, the group is dropped from further investigation; otherwise, sequential testing is performed on the group. The sequential testing procedure will enable us to construct an almost unbiased estimator or to propose analytical procedures that reduce bias.

Group testing, where subjects are tested in pools rather than individually, has a long history of successful applications in screening for infectious diseases. Whether the aim is to diagnose individuals (classification) or to estimate disease prevalence, it is cost effective since the test is done on a group and not on individuals. Group testing first appeared in the statistical literature in the context of blood testing [1] but has since been applied in many fields, including transmission of viruses by insect vectors [2], genetics [3], plant disease assessment [4] and quality control [5]. Pool testing serves two purposes. The first is the cost-effective identification of positive individuals in a population (see [1]). This involves testing batches of items; for batches that test positive, the constituent members are then tested to identify the positive ones. There is abundant literature on this classification problem. For instance, [6] and [7] proposed hierarchical or multistage models based on Dorfman's idea, which involve subdividing positive pools into smaller pools with the purpose of reducing cost. They showed that some savings can be achieved via multistage models. The second purpose is the estimation of the prevalence rate, as championed by [8]. There is also abundant literature on this problem, as established by [9] and [10]. Still on the estimation problem, [11] used Maximum Likelihood Estimation (MLE) to estimate elements of drugs in a composition of elements. For multistage problems with the purpose of estimation, [12] proposed a multistage estimation model, and [10] proposed confidence intervals for the prevalence rate when pool testing procedures are applied. Bayesian inference on population prevalence has also been studied (see for instance [9]). Some procedures for bias reduction in group testing models without errors have been proposed [13].

In the group testing literature, with the objective of estimating the prevalence of an attribute of interest, the MLE is the dominant procedure. If the group size is $k = 1$, the MLE has been shown to be an unbiased estimator ([12] and [14]). When the group size is $k > 1$, however, the MLE has been shown to be biased, and this is a drawback for statistical inference in pool testing procedures, as observed by [8]. A more general bias adjustment, which was not specifically derived for group testing, was described by [15]. The purpose of this study is to determine an analytical procedure for bias reduction in estimating population prevalence using a group testing procedure in the presence of test errors. Specifically, we construct an almost unbiased estimator in a pool-testing strategy in the presence of test errors and compute the modified MLE of the prevalence of the population. For single-stage procedures with equal group sizes, we also propose a numerical method for bias correction which produces an almost unbiased estimator with errors.

The rest of the paper is organized as follows. In Section 2, we give analytic construction of MLE where we discuss the MLE in group testing with errors, an almost unbiased estimator, the bias adjuster formula and bias-corrected estimates of the prevalence of the population. In Section 3 we give results and discussions while in Section 4 we give conclusion.

2. Analytic Construction of MLE

In this section, it is shown that the MLE for group size $k = 1$ is unbiased but for $k > 1$ it is biased. Ways of improving the MLE are then proposed using the Bias Adjuster formula.

2.1. MLE in Group Testing with Errors

Suppose we have a large population of size $N$; the idea is to construct $n$ groups from this population. The population may be blood from a number of individuals, which is then pooled into $n$ groups. The probability of classifying a group as positive in the absence of errors is

$$\pi = 1 - (1 - p)^k, \qquad (1)$$

where $p$ is the probability of an individual being classified as positive and $k$ is the size of the pool. When the error element is introduced in (1) we obtain

$$\pi(p) = \eta\left[1 - (1 - p)^k\right] + (1 - \beta)(1 - p)^k, \qquad (2)$$

where $\eta$ and $\beta$ denote the sensitivity and specificity of the test kits. By sensitivity we mean the probability of classifying a positive group as positive, while specificity is the probability of classifying a non-positive group as non-positive. For the derivation of (2), see [12] and [14]. Upon using (2), the MLE of $p$ can be obtained as

$$\hat{p} = 1 - \left[\frac{\eta - x/n}{\eta + \beta - 1}\right]^{1/k}, \qquad (3)$$

where $x$ is the number of the $n$ groups that test positive. For $k = 1$, and upon using (3), the MLE of $p$ is unbiased, that is, $E(\hat{p}) = p$. But for $k > 1$ it has been shown to be biased, that is, $E(\hat{p}) \neq p$, and this is a drawback to group testing inference. We therefore construct, in the subsequent sections, an analytical procedure that can help reduce the bias.
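
The claim that the estimator is unbiased for a group size of one but biased for larger groups can be checked numerically. The following Python sketch computes the exact expectation of the MLE by summing over all binomial outcomes. It assumes the usual closed form of the estimator, p_hat = 1 - ((sens - x/n) / (sens + spec - 1))^(1/k). The function names, the example parameter values, and the convention of setting the estimate to 1 when no real root exists are illustrative assumptions and are not taken from the study.

```python
from math import comb


def group_positive_prob(p, k, sens, spec):
    """Probability that a pool of size k tests positive, allowing for test errors."""
    q_k = (1.0 - p) ** k
    return sens * (1.0 - q_k) + (1.0 - spec) * q_k


def mle_prevalence(x, n, k, sens, spec):
    """MLE of p from x positive pools out of n (assumed closed form)."""
    base = (sens - x / n) / (sens + spec - 1.0)
    if k == 1:
        return 1.0 - base  # linear in x, so no root is needed
    if base < 0.0:
        return 1.0  # illustrative convention: no real k-th root exists
    return 1.0 - base ** (1.0 / k)


def exact_bias(p, n, k, sens, spec):
    """E(p_hat) - p, summing over the binomial distribution of x."""
    pi = group_positive_prob(p, k, sens, spec)
    expectation = sum(
        comb(n, x) * pi ** x * (1.0 - pi) ** (n - x)
        * mle_prevalence(x, n, k, sens, spec)
        for x in range(n + 1)
    )
    return expectation - p


if __name__ == "__main__":
    # Hypothetical settings: prevalence 0.05, 8 pools, imperfect test kits.
    for k in (1, 5, 25):
        print(k, exact_bias(p=0.05, n=8, k=k, sens=0.95, spec=0.90))
```

For a group size of one the computed bias is zero up to rounding error, because the estimator is then linear in the number of positive groups; for larger groups the bias is non-zero.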

2.2. Improved MLE to Almost Unbiased Estimator

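In this section we construct an MLE in pool testing with errors in inspection such that, when $k > 1$, the proposed estimator is almost unbiased. To achieve this, we require Gart's formula

$$B(p) = -\frac{1}{2I^2}\left[2\frac{\partial I}{\partial p} + E\left(\frac{\partial^3 l}{\partial p^3}\right)\right] + O(n^{-2}), \qquad (4)$$

where $I$ is the Fisher information, $l$ is the log-likelihood, $B(p)$ is Gart's bias and $O$ is the order of the error, see [15]. We notice that the Fisher information, $I$, has been computed in the pool testing literature (see, for instance, [12] and [14]) and is given by

$$I = \frac{n k^2 (\eta + \beta - 1)^2 (1 - p)^{2k - 2}}{\pi(p)\left[1 - \pi(p)\right]}, \qquad (5)$$

where $\pi(p)$ is the probability in (2) that a group tests positive, and $\eta$ and $\beta$ are the sensitivity and specificity of the test.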
From (5), we have



Detailed derivation of (6) is provided in Appendix A.



Technical derivation of (7) is provided in Appendix B.

Equations (6) and (7) are vital in the next sections.

2.3. Bias Adjuster Formula

With equations (6) and (7) at hand and Formula (4), upon substitution we have


On simplifying equation (8) above we obtain;


where $O$ denotes the order of the error.
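
As a cross-check on the simplification, the first-order bias can also be obtained from a delta-method argument. The sketch below assumes the binomial model in which the number of positive groups $x$ has success probability $\pi(p) = \eta[1 - (1 - p)^k] + (1 - \beta)(1 - p)^k$, with $\eta$ and $\beta$ the sensitivity and specificity. The notation $B(p)$ and $g$ is introduced here for illustration only, and the closed form is derived independently, so its presentation may differ from that of equation (9).

```latex
\begin{align*}
\hat{p} &= g\!\left(\tfrac{x}{n}\right), \qquad g = \pi^{-1}, \qquad
E\!\left(\tfrac{x}{n}\right) = \pi, \qquad
\operatorname{Var}\!\left(\tfrac{x}{n}\right) = \frac{\pi(1-\pi)}{n}, \\
B(p) &\approx \tfrac{1}{2}\, g''(\pi)\, \frac{\pi(1-\pi)}{n}
      = -\frac{\pi''(p)\, \pi(1-\pi)}{2n\, [\pi'(p)]^{3}}, \\
\pi'(p) &= k(\eta+\beta-1)(1-p)^{k-1}, \qquad
\pi''(p) = -k(k-1)(\eta+\beta-1)(1-p)^{k-2}, \\
B(p) &\approx \frac{(k-1)\, \pi(p)\, [1-\pi(p)]}
                   {2n\, k^{2} (\eta+\beta-1)^{2} (1-p)^{2k-1}} .
\end{align*}
```

Under these assumptions the first-order bias vanishes for $k = 1$, and its denominator is zero at $p = 1$.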

The Gart’s Bias-Corrected estimates are given by;




as suggested by [13].

We distinguish these two approaches by describing them as ‘Vertical’ or ‘Horizontal’ correction, or more briefly as ‘Gart-V’ and ‘Gart-H’. The Gart-V correction has the disadvantage of not being able to handle $\hat{p} = 1$, owing to a zero denominator in the bias formula. The Gart-H correction, in contrast, does not require $\hat{p} = 1$ to be substituted in, and so an estimate can be found. Gart's method with Vertical correction is highly effective in reducing the bias for small $p$. With Horizontal correction, Gart's method is moderately effective (see [13]). In the next section, our main focus is on the Vertical correction since it is highly effective.
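
The difference between the two corrections can be illustrated numerically. The Python sketch below assumes a first-order bias of the form B(p) = (k - 1) pi (1 - pi) / (2 n k^2 (sens + spec - 1)^2 (1 - p)^(2k - 1)), which follows from a delta-method argument under the binomial model. It interprets the Vertical correction as p_hat - B(p_hat) and the Horizontal correction as the root t of t + B(t) = p_hat. The function names, the bisection solver and the example values are hypothetical.

```python
def first_order_bias(p, n, k, sens, spec):
    """Assumed first-order bias of the MLE at prevalence p."""
    c = sens + spec - 1.0
    q_k = (1.0 - p) ** k
    pi = sens * (1.0 - q_k) + (1.0 - spec) * q_k
    denom = 2.0 * n * k ** 2 * c ** 2 * (1.0 - p) ** (2 * k - 1)
    return (k - 1) * pi * (1.0 - pi) / denom  # denominator is zero at p = 1


def gart_vertical(p_hat, n, k, sens, spec):
    """Vertical correction: subtract the bias evaluated at the estimate."""
    return p_hat - first_order_bias(p_hat, n, k, sens, spec)


def gart_horizontal(p_hat, n, k, sens, spec, tol=1e-12):
    """Horizontal correction: solve t + B(t) = p_hat by bisection on [0, p_hat]."""

    def h(t):
        return t + first_order_bias(t, n, k, sens, spec) - p_hat

    lo, hi = 0.0, p_hat
    if h(lo) > 0.0:
        raise ValueError("no root of t + B(t) = p_hat in [0, p_hat]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical case: 4 of 8 pools of size 25 test positive.
    n, k, sens, spec = 8, 25, 0.95, 0.90
    p_hat = 1.0 - ((sens - 4 / n) / (sens + spec - 1.0)) ** (1.0 / k)
    print("MLE   :", p_hat)
    print("Gart-V:", gart_vertical(p_hat, n, k, sens, spec))
    print("Gart-H:", gart_horizontal(p_hat, n, k, sens, spec))
```

In this sketch, calling `gart_vertical` with an estimate of exactly 1 raises a `ZeroDivisionError`, which mirrors the zero denominator described above, whereas the horizontal solver never evaluates the bias at the upper end point of its search interval.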

3. Results and Discussion

In this section, a sample from a population is taken, split into groups and tested for some attribute of interest. The estimates of p, the proportion of defective units in the population, under the MLE, the Bias Adjuster Formula and Gart’s Vertical Correction are obtained. These estimates are then presented in tables.

We considered bias as the main issue in group testing problems. We investigated the MLE for single stage procedure. The estimates in the case of the ‘all positives’ outcome are shown to have a large effect on bias calculations.

We base our discussion on Monte Carlo simulation of the bias and the MLE for various group sizes at given sensitivity and specificity (see Tables 1, 2, 3 and 4).

In the tables that follow, we report the simulated MLE for various group sizes at given values of sensitivity, specificity and prevalence rate.

The simplest possible group testing procedure is a single stage with equal group sizes. We take a population of 200 samples, split into 8 groups each of size 25 samples, and test for the prevalence of some attribute of interest. Hence $k = 25$ and $n = 8$. From equation (3), the MLE of $p$ for different values of sensitivity and specificity yields the results tabulated in Tables 1, 2, 3 and 4.
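
To make the single-stage design concrete, the Python sketch below tabulates the estimate for every possible number of positive groups when there are 8 groups of size 25. It assumes the usual closed form of the estimator, p_hat = 1 - ((sens - x/n) / (sens + spec - 1))^(1/k), and uses a sensitivity of 95% and a specificity of 90% as example values. The function names and the choice of returning `None` when no real root exists are illustrative assumptions.

```python
def mle_prevalence(x, n, k, sens, spec):
    """MLE of p from x positive groups out of n (assumed closed form).

    Returns None when the quantity under the k-th root is negative,
    so that no real-valued estimate exists.
    """
    base = (sens - x / n) / (sens + spec - 1.0)
    if base < 0.0:
        return None
    return 1.0 - base ** (1.0 / k)


def estimate_table(n, k, sens, spec):
    """List of (x, p_hat) pairs for x = 0, 1, ..., n positive groups."""
    return [(x, mle_prevalence(x, n, k, sens, spec)) for x in range(n + 1)]


if __name__ == "__main__":
    for x, p_hat in estimate_table(n=8, k=25, sens=0.95, spec=0.90):
        label = "undefined" if p_hat is None else f"{p_hat:.5f}"
        print(f"x = {x}: p_hat = {label}")
```

With these example values the estimate is negative when no group tests positive, and it has no real value when all eight groups test positive.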

Also from equation (9) when , it reduces to;


Upon simulating the MLE in MATLAB for different values of sensitivity and specificity, we obtain the results provided in the following tables:

Table 1. Estimates of positive groups for .

Table 2. Estimates of positive groups for .

Table 3. Estimates of positive groups for .

Table 4. Estimates of positive groups for .

From the values in the tables above, the MLE of $p$ and the Bias Adjuster of $p$ increase with increasing sensitivity and specificity of the test kits. When the sensitivity and specificity are 95% and 90%, the MLE of $p$ and the correction are negative. This occurs when there is no positive outcome in the group test. It is also seen that the Vertical correction decreases as the sensitivity and specificity of the test kits decrease. On the other hand, when all the groups test positive, $\hat{p}$ is almost certainly an overestimate of $p$, as it is most unusual for every unit in a population to be positive. When all the individuals in the group are positive, the estimated probability exceeds 1, and this outcome is shown to have a large effect on bias calculations. The main reason for this rare occurrence is the presence of test errors, i.e. imperfect sensitivity and specificity, rather than human error during the experiment, since test errors are assumed to have the greater effect.

4. Conclusion

From the tables in Section 3, it is observed that the bias has been considerably reduced when compared to when , which conforms to the convention of describing a bias of less than about 10% as acceptable. It is shown that the Vertical correction is most effective in reducing the bias. However, the correction is undefined when $\hat{p} = 1$, owing to a zero in the denominator of the modified Bias Adjuster. Thus, when all groups test positive, the pool testing procedure is not applicable. See, for instance, [13], who simply stated that ‘if all groups turn out to be positive, no sensible estimate of the infection rate can be obtained from the data’. The derivative of the log-likelihood function has been shown to yield the Fisher information ($I$) and the asymptotic variance. The modified Bias Adjuster Formula for bias correction has been shown to reduce the bias. This is evident in Tables 1, 2, 3 and 4 for different values of specificity and sensitivity.

In this study we have considered bias reduction by constructing an almost unbiased estimator in a simple group testing model. However, more complex group testing models exist in the literature. For further research, it would be interesting to study the bias properties of such models and to suggest modifications to them.


Appendix A

In this appendix, we provide a detailed derivation of equation (6). First, we know that

On simplifying the above, we get

The first derivative can be obtained as


Substituting for

In the above, we get


Simplifying yields


Factoring out  and , we get


Which simplifies to


But we know that



On simplifying, we get;




Appendix B

A detailed derivation of equation (7) is given in this appendix. Upon taking natural logs of l (p/x), we have

On finding the derivative with respect to $p$, we have

The second derivative becomes;


Upon simplification, we have:


The third derivative is given by;


To obtain equation (7), we take expectation on both sides of equation  to obtain;



Upon simplification, we get;



















Substituting  through  into , we get;


which reduces to;


Hence on substituting  and  in, we get;




Equation  simplifies to;



References

  1. Dorfman, R. (1943). The detection of defective members of large populations. Annals of Mathematical Statistics, 14, 436-440.
  2. Walter, S. D., Hildreth, S. W. and Beaty, B. J. (1980). Estimation of infection rates in populations of organisms using pools of variable size. American Journal of Epidemiology, 112, 124-128.
  3. Chick, S. E. (1996). Bayesian models for limiting dilution assay and group test data. Biometrics, 52, 1055-1062.
  4. Fletcher, J. D., Russell, A. C. and Butler, R. C. (1999). Seed-borne cucumber mosaic virus in New Zealand lentil crops: yield effects and disease incidence. New Zealand Journal of Crop and Horticultural Science, 27, 197-204.
  5. Wanyonyi, R. W., Nyongesa, L. K. and Wasike, A. (2015). Estimation of proportion of a trait by batch testing with errors in inspection in a quality control process. International Journal of Statistics and Applications, 5(6), 268-278.
  6. Johnson, N. L., Kotz, S. and Wu, X. (1992). Inspection Errors for Attributes in Quality Control. Chapman & Hall.
  7. Nyongesa, L. K. (2004). Multistage group testing procedure (group screening). Communications in Statistics - Simulation and Computation, 33, 621-637.
  8. Thompson, K. H. (1962). Estimation of the proportion of vectors in a natural population of insects. Biometrics, 18, 568-578.
  9. Bilder, C. R. and Tebbs, J. M. (2005). Empirical Bayesian estimation of the disease transmission probability in multiple-vector-transfer designs. Biometrical Journal, 47, 502-516.
  10. Hepworth, G. (2005). Confidence intervals for proportions estimated by group testing with groups of unequal size. Journal of Agricultural, Biological and Environmental Statistics, 10, 478-497.
  11. Tu, X. M., Litvak, E. and Pagano, M. (1995). On the informativeness and accuracy of pooled testing in estimating prevalence of a rare disease: application to HIV screening. Biometrika, 82, 287-297.
  12. Brookmeyer, R. (1999). Analysis of multistage pooling studies of biological specimens for estimating disease incidence and prevalence. Biometrics, 55, 608-612.
  13. Hepworth, G. and Watson, R. (2009). Debiased estimation of proportions in group testing. Applied Statistics, 58, 105-121.
  14. Nyongesa, L. K. (2011). Dual estimation of prevalence and disease incidence in pooling strategy. Communications in Statistics - Theory and Methods, 40, 1-12.
  15. Gart, J. J. (1991). An application of score methodology: confidence intervals and tests of fit for one-hit curves. In Handbook of Statistics (Eds C. R. Rao and R. Chakraborty), Vol. 8, pp. 395-406. Amsterdam: Elsevier.
