Dealing with Outliers in Linear Calibration Curves: A Case Study of Graphite Furnace Atomic Absorption Spectrometry

Outliers in the calibration of lead by graphite furnace atomic absorption spectrometry (GFAAS) were studied with the help of two statistical tools, the F-test and the t-test. Five standard solutions were each measured in triplicate to prepare the calibration curve. The ordinary least squares method (OLSM) was used to obtain the equation of the linear calibration curve, and the correlation coefficient R² and analysis of variance (ANOVA) were used to validate its linearity. The plot of the calibration data and the residual plot revealed a suspected outlier. The F-test and the t-test were used to examine the suspected value; they yielded the same result and confirmed it as an outlier. The calibration curves with and without the outlier were then compared to investigate the effect of the outlier on the limit of detection (LOD), the limit of quantification (LOQ), the concentration of lead in fourteen samples and the uncertainty related to the calibration curve. The linearity of the calibration curve was accepted in both cases. For the calibration with the outlier, the estimated LOD and LOQ were 1.73 μg L⁻¹ and 5.18 μg L⁻¹, respectively. The concentrations of lead in the samples were between 2.85 μg L⁻¹ and 22.61 μg L⁻¹, and the uncertainty related to the calibration curve varied between 1.17 μg L⁻¹ and 1.41 μg L⁻¹. For the calibration without the outlier, the LOD and LOQ improved to 1.32 μg L⁻¹ and 3.96 μg L⁻¹, respectively. The concentrations of lead in the samples varied between 2.75 μg L⁻¹ and 22.52 μg L⁻¹, a decrease of 0.40% to 3.29% relative to the first case. The uncertainties related to the calibration curve varied between 0.90 μg L⁻¹ and 1.09 μg L⁻¹, a decrease of 22.48% to 23.53% relative to the first case. In conclusion, dealing with outliers improves the quality of the measurement and allows reliable analytical data to be produced.


Introduction
Graphite furnace atomic absorption spectrometry (GFAAS) is one of the most relevant instrumental techniques for the determination of trace elements; it offers high specificity, selectivity and sensitivity, and low detection limits. GFAAS is a single-element technique, so a calibration is required for each element analysed. Instrument calibration is an important step that must be carried out carefully before a sample is analysed. Calibration establishes a relationship between the known values of standards and the instrument response, and this relationship is then used to predict an analyte concentration from an instrument response. Most analytical methods use a linear relationship for the calibration curve. Examination of this curve is an important step in analytical method validation [1]. Besides being tested for linearity, the calibration curve should also be free from outliers, and checking for them should be an everyday task in routine analytical operations.
The ordinary least squares method (OLSM) is a widely used statistical method for building the calibration curve; the slope and the intercept of a linear calibration model are estimates based on a finite number of measurements of known samples [2] [3]. An outlier in the data used to create the calibration model is likely to affect the model, and may therefore affect the result and the uncertainty of any unknown sample analysed with it. Dealing with outliers in the calibration curve is thus an important step in avoiding errors and improving the quality of the measurement.
In the present work, lead was chosen because this element has carcinogenic properties and must be monitored carefully [4]; in addition, a suspected outlier was observed in the calibration curve of this element during the measurement.
The F-test and the t-test were applied to deal with the outlier in the calibration curve used to determine the concentration of lead (Pb) in the samples [5] [6] [7]. The correlation coefficient and analysis of variance (ANOVA) were used to validate the linearity of the calibration model [8] [9].
This study considers two calibration curves: the first contains an outlier, and the second is the same calibration with the outlier removed after it was confirmed statistically. For both calibrations, the limit of detection, the limit of quantification, the sample concentrations and the uncertainty related to the calibration were calculated and compared.

Instrument
A Varian AA240Z atomic absorption spectrometer was used to determine the concentration of lead. This instrument is equipped with a PSD 120 autosampler. A hollow cathode lamp was used, with a slit width of 0.5 nm, a lamp current of 10.0 mA and a wavelength of 283.3 nm. Background correction was applied during the measurement [4] [10].
The atomization conditions for the determination of lead are shown in Table 1. Samples and standards were injected automatically into the graphite furnace with a volume of 20 µL. Five calibration standards with concentrations of 5, 10, 15, 20 and 25 µg L⁻¹ were analysed in triplicate. The values measured for these standards were used to create the calibration curve from which the concentration of lead in the samples was determined.

Ordinary Least Squares Method (OLSM) [2] [8]
Ordinary least squares is a widely used statistical method for creating a linear calibration model; the intercept and the slope are estimated from measurements of samples of known concentration. The calibration model is described by the following linear function:

$Y = a + bX$ (1)

where $Y$ is the instrument response, $a$ is the intercept, $b$ is the slope and $X$ is the analyte concentration. The intercept and the slope are calculated with the following equations:

$a = \bar{y} - b\,\bar{x}$ (2)

$b = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$ (3)

where $n$ is the number of calibration data points, $x_i$ and $y_i$ are the concentration and the instrument response of calibration point $i$, $\bar{x}$ is the average concentration of all the standards and $\bar{y}$ is the average of the instrument responses.
The inverse of equation 1 is used to calculate the concentration of an unknown sample from the instrument response:

$X_s = \dfrac{Y_s - a}{b}$ (4)

where $X_s$ is the concentration of the sample and $Y_s$ is the instrument response.
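The fitting and inverse-prediction steps described above can be sketched in Python. This is a minimal illustration rather than the software used in this work: the function names `fit_line` and `predict_concentration` are hypothetical, the absorbances are made-up noise-free values, and `a` and `b` are assumed names for the intercept and the slope.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = y_mean - b * x_mean    # intercept
    return a, b


def predict_concentration(y_sample, a, b):
    """Invert the calibration line to get a concentration from a response."""
    return (y_sample - a) / b


# Hypothetical, noise-free absorbances generated from y = 0.002 + 0.0024*x
conc = [5, 10, 15, 20, 25]
absorbance = [0.002 + 0.0024 * c for c in conc]

a, b = fit_line(conc, absorbance)
x_s = predict_concentration(0.026, a, b)  # response of a hypothetical sample
print(a, b, x_s)
```

Because the example data lie exactly on a line, the fit recovers the generating intercept and slope, which makes the sketch easy to check by hand.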

Linearity Verification of the Calibration Curve
The correlation coefficient is one of the parameters used to verify the linearity of the curve. The coefficient r ranges between -1 and +1: a value of -1 indicates a perfect negative relationship, +1 a perfect positive one, and 0 no linear relationship; its square, R², therefore ranges between 0 and 1. If the absolute value of the coefficient is at least 0.999, the method can be regarded as linear within the range of concentrations covered by the standard solutions used to construct the calibration graph [6]. The correlation coefficient is calculated with the following equation:

$r = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}}$

where $x_i$ and $y_i$ are the concentration and the instrument response of calibration point $i$, and $\bar{x}$ and $\bar{y}$ are their averages.
The lack-of-fit test (ANOVA) is another method for validating the linearity of the regression model [5] [6]. The objective is to calculate the test value F and compare it with the critical value of the F-distribution at the significance level P (95% or 99%) with k-2 and n-k degrees of freedom, where k is the number of concentration levels and n is the total number of calibration measurements. If F > F (P, k-2, n-k), the linear regression model is inadequate. On the other hand, if F < F (P, k-2, n-k), the linear regression model is justified [2]. Table 2 shows the ANOVA table for regression.
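A lack-of-fit test of this kind can be sketched as follows. The sketch assumes the usual decomposition of the residual sum of squares into a pure-error part (scatter of replicates around their level mean) and a lack-of-fit part; the function name `lack_of_fit`, the data and the layout of the inputs are hypothetical, and the critical value is taken from a printed F table rather than computed.

```python
def lack_of_fit(levels, replicates):
    """Return (F, df1, df2) for the lack-of-fit test of a straight-line fit.

    levels:     k concentration levels
    replicates: k lists of replicate responses, one list per level
    """
    x = [lv for lv, reps in zip(levels, replicates) for _ in reps]
    y = [r for reps in replicates for r in reps]
    n, k = len(y), len(levels)

    # Ordinary least squares line through all n points
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean

    # Residual, pure-error and lack-of-fit sums of squares
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_pe = sum((r - sum(reps) / len(reps)) ** 2
                for reps in replicates for r in reps)
    ss_lof = ss_res - ss_pe

    f_value = (ss_lof / (k - 2)) / (ss_pe / (n - k))
    return f_value, k - 2, n - k


# Hypothetical data: three levels measured in duplicate
f_value, df1, df2 = lack_of_fit([1, 2, 3],
                                [[1.0, 1.2], [2.1, 1.9], [3.0, 3.2]])
f_crit = 10.13  # tabulated F (95%, 1, 3)
linear_model_justified = f_value < f_crit
print(f_value, df1, df2, linear_model_justified)
```

Passing the critical value in from a table keeps the sketch free of third-party dependencies; a statistics library could supply it instead.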

Outlier Tests for the Calibration Curve [5] [7] [11]
Statistical outlier tests are the tools of choice for checking for outliers in a linear calibration curve; potential outliers can also be identified from residual analysis. In this work, the F-test and the t-test were used to identify outliers in the data used to construct the calibration model.
For the F-test, the test value is calculated according to equation 5 and then compared with the tabulated F value. For the t-test, each calibration value should lie within the prognosis range calculated with formula 6; a value outside this range is considered an outlier.
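Both tests can be sketched in Python under stated assumptions. The sketch assumes the forms commonly used for calibration outliers: an F test value TV = ((N_A - 2) s_A² - (N_B - 2) s_B²) / s_B², where s_A and s_B are the residual standard deviations with and without the suspected point, and a prognosis interval ŷ ± t · s_B · sqrt(1 + 1/N_B + (x - x̄_B)² / Q_xx) built from the fit without that point. Whether these match the exact expressions used in this work is an assumption; the function names and the data are hypothetical, and the t and F values are taken from printed tables.

```python
import math


def fit_stats(x, y):
    """OLS fit of y = a + b*x; returns (a, b, s, x_mean, sxx), s = residual SD."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, math.sqrt(ss_res / (n - 2)), x_mean, sxx


def outlier_tests(x, y, idx, t_value):
    """Return (F test value, prognosis interval) for the point at index idx."""
    x_b = x[:idx] + x[idx + 1:]   # calibration data without the suspect
    y_b = y[:idx] + y[idx + 1:]
    _, _, s_a, _, _ = fit_stats(x, y)
    a_b, b_b, s_b, x_mean_b, sxx_b = fit_stats(x_b, y_b)
    n_a, n_b = len(x), len(x_b)

    f_value = ((n_a - 2) * s_a ** 2 - (n_b - 2) * s_b ** 2) / s_b ** 2

    y_hat = a_b + b_b * x[idx]
    half = t_value * s_b * math.sqrt(
        1 + 1 / n_b + (x[idx] - x_mean_b) ** 2 / sxx_b)
    return f_value, (y_hat - half, y_hat + half)


# Hypothetical data with a deliberately displaced point at index 3
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 2.1, 2.9, 6.0, 5.1, 5.9]

# Tabulated values: t (95%, two-sided, f = 3) = 3.182; F (99%, 1, 3) = 34.12
f_value, (low, high) = outlier_tests(x, y, idx=3, t_value=3.182)
f_flags_outlier = f_value > 34.12
t_flags_outlier = not (low <= y[3] <= high)
print(f_value, low, high, f_flags_outlier, t_flags_outlier)
```

In this made-up data set both criteria flag the displaced point, which mirrors the agreement between the two tests reported in this study.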

Standard Uncertainty of Linear Calibration Curve
Uncertainty is one of the values that describe the quality of a measurement: the smaller the uncertainty, the more reliable the result. The Guide to the Expression of Uncertainty in Measurement (GUM) is a widely used method for estimating uncertainty, and the standard uncertainty of the linear calibration curve contributes to the expanded uncertainty of the measurement [12]. The standard uncertainty of the linear calibration curve is calculated with the following equation [6] [13].
Where:
t = two-tailed Student's t value for significance level α and n-2 degrees of freedom
N = number of sample measurements
n = number of calibration points
S_KL = residual standard deviation of the calibration curve
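A calculation of this kind can be sketched in Python. The sketch assumes the common form u = t · (s / b) · sqrt(1/N + 1/n + (y₀ - ȳ)² / (b² Σ(xᵢ - x̄)²)), where s is the residual standard deviation, b the slope and y₀ the sample response; this is an assumed reading of the symbols listed above, not a confirmed transcription of the equation used in this work. The function name, the data and the tabulated t value are hypothetical.

```python
import math


def calibration_uncertainty(y_sample, n_rep, x, y, t_value):
    """Uncertainty of a concentration read from a straight-line calibration.

    y_sample: mean response of the sample
    n_rep:    number of sample measurements (N)
    x, y:     calibration concentrations and responses (n points)
    t_value:  tabulated two-tailed Student's t for n - 2 degrees of freedom
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_res = math.sqrt(ss_res / (n - 2))   # residual standard deviation
    return t_value * (s_res / b) * math.sqrt(
        1 / n_rep + 1 / n + (y_sample - y_mean) ** 2 / (b ** 2 * sxx))


# Hypothetical calibration data; t (95%, two-sided, f = 3) = 3.182
x = [1, 2, 3, 5, 6]
y = [1.0, 2.1, 2.9, 5.1, 5.9]
u_center = calibration_uncertainty(3.4, 3, x, y, 3.182)  # at the mean response
u_far = calibration_uncertainty(5.9, 3, x, y, 3.182)     # near the upper end
print(u_center, u_far)
```

Under this form the uncertainty is smallest for a sample response at the centre of the calibration and grows towards the ends of the range, which is consistent with the uncertainty varying from sample to sample.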

Limit of Detection and Quantification [6] [9]
The limits of detection and quantification are among the parameters used to validate a method. They can be calculated from the standard deviation of the response of the calibration model and the slope, according to formulas 8 and 9.
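The calculation can be sketched as a multiplier times the ratio of the residual standard deviation to the slope. The multipliers below (3.3 for the LOD and 10 for the LOQ) are a widely used convention and are an assumption here; the multipliers actually used in formulas 8 and 9 may differ. The function name and the residual standard deviation are hypothetical; only the slope is taken from the calibration reported in this study.

```python
def detection_limits(s_res, slope, k_lod=3.3, k_loq=10.0):
    """Return (LOD, LOQ) in concentration units.

    s_res: standard deviation of the response (same unit as the response)
    slope: slope of the calibration curve (response per concentration unit)
    k_lod, k_loq: multipliers (assumed convention, adjust to the method used)
    """
    lod = k_lod * s_res / slope
    loq = k_loq * s_res / slope
    return lod, loq


slope = 0.002374   # slope of the calibration with the outlier (Abs per µg/L)
s_res = 0.001      # hypothetical residual standard deviation (Abs)
lod, loq = detection_limits(s_res, slope)
print(lod, loq)
```

Since both limits are proportional to the residual standard deviation, removing a point that inflates the scatter around the line lowers the LOD and the LOQ together.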

Samples
INSTN-Madagascar is a research centre (research laboratory) on the Antananarivo University campus. It has seven departments, one of which is the TFXE department. This department also receives samples (soil, water, etc.) from customers. EDXRF is used to determine the concentration of heavy metals in samples, and GFAAS, FAAS and IC are used to determine the concentrations of heavy metals and of major and minor ions in water. In this case study, GFAAS was used to determine the concentration of lead in 14 water samples received from customers during 2014.

Calibration
The instrument responses (absorbances) measured for the standard solutions, presented in Table 2, were used to determine the calibration model by OLSM. The intercept and the slope of the calibration curve are 0.001517 and 0.002374, respectively, and the correlation coefficient is 0.9943. Figure 1 shows the plot of the calibration data and the fitted line; the equation of the curve is given in equation 10:
Y = 0.002374 X + 0.001517 (10)
where Y is the instrument response (Abs) and X is the concentration of Pb.
As Figure 1 shows, the point for the standard with a concentration of 15 µg L⁻¹ and an absorbance of 0.0339 Abs does not lie on the fitted line. The residual plot in Figure 2 confirms that the distance between the fitted line and this point is larger than for the other standards, so this value is a suspected outlier. To deal with the outlier in the calibration data, the F-test and the t-test (formulas 5 and 6) were used. For this purpose, the linear regression parameters were recalculated with the suspected value omitted. The parameters of the resulting calibration model are presented in Table 3.
The equation of this calibration curve is given in equation 12:
Y = 0.002374 X + 0.001517 (12)
According to formula 6, the value of the F-test is 10.2283, and the critical value F (f1 = 1, f2 = 12, P = 99%) is 9.33. Since the calculated F exceeds the critical F, the datum with the value 0.0339 in the calibration data may be regarded as an outlier.
Using the t-test, the prognosis interval given by formula 7 is 0.03736 ± 0.00228. The suspected value of 0.0339 Abs lies outside the prognosis range of 0.03508 to 0.03963. The F-test and the t-test therefore yield the same result, confirming that the value 0.0339 in the calibration data in Table 3 is an outlier. The calibration curve and the residual plot of the calibration without the outlier are shown in Figures 3 and 4.
One value in the calibration data has thus been shown to be an outlier. The results obtained with the calibration data with and without the outlier are now calculated and compared. In particular, the estimated limit of detection, the limit of quantification, the concentration of lead in the samples and the uncertainty related to the calibration curve are compared.
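The two decisions can be retraced from the numbers quoted in this section. The short Python check below only repeats arithmetic on the reported values (slope, intercept, test values and interval); the variable names are illustrative, and no raw calibration data are assumed.

```python
# Reported calibration with the outlier (equation 10)
slope, intercept = 0.002374, 0.001517
suspect_x, suspect_y = 15.0, 0.0339      # suspected standard (µg/L, Abs)

# Residual of the suspected point from the fitted line
predicted = slope * suspect_x + intercept
residual = suspect_y - predicted

# F-test: reported test value against the reported critical value
f_calculated, f_critical = 10.2283, 9.33
f_flags_outlier = f_calculated > f_critical

# t-test: reported prognosis interval centre and half-width
center, half_width = 0.03736, 0.00228
low, high = center - half_width, center + half_width
t_flags_outlier = not (low <= suspect_y <= high)

print(predicted, residual, low, high, f_flags_outlier, t_flags_outlier)
```

The suspected absorbance lies about 0.0032 Abs below the fitted line and below the lower bound of the interval, so both criteria agree. The upper bound obtained from the rounded centre and half-width differs from the quoted one only in the last digit.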

Results Using the Calibration with Outlier
The linearity of the calibration curve was checked using the correlation coefficient and the ANOVA method. The correlation coefficient of the curve was R² = 0.9943, which means that the data are well correlated and show good linearity. Likewise for ANOVA, the calculated test value F is 0.426334 and the one-sided critical value F (95%, 4, 12) is 3.259167; the critical value is greater than the test value. Thus, the linearity of the regression function is accepted.
The LOD and LOQ of the measurement of lead were estimated from the calibration data as described in formulas 8 and 9. The estimated LOD and LOQ are 1.73 µg L⁻¹ and 5.18 µg L⁻¹, respectively.
Each sample was analysed three times, and the mean absorbances are presented in Table 4. The concentrations of Pb in samples WS02 and WS03 are lower than the estimated limit of detection; the other samples have concentrations between 2.85 µg L⁻¹ and 22.61 µg L⁻¹. The uncertainty of the Pb concentration related to the calibration was calculated for each sample using equation 7. This uncertainty varies between 1.17 µg L⁻¹ and 1.41 µg L⁻¹.

Results Using the Calibration Without Outlier
After the outlier was removed from the calibration data, the correlation coefficient R² was 0.9969. This value is better than the previous one, and the linearity of the calibration is accepted. The ANOVA yields the same conclusion: the calculated F is 0.2398727, which is lower than the critical value F (P = 95%, 3, 9) of 3.862548.
The estimated LOD and LOQ were 1.32 µg L⁻¹ and 3.96 µg L⁻¹, respectively. These values are lower than those obtained above. This means that, with this calibration, the recovery is better and the sensitivity is increased [14].
With the outlier removed, the concentrations of Pb in the 14 samples are presented in Table 6. Samples WS02 and WS03 remain below the estimated limit of detection; the concentrations of Pb in the other samples vary between 2.75 µg L⁻¹ and 22.52 µg L⁻¹. The uncertainty of the Pb concentration related to the calibration ranges from a minimum of 0.90 µg L⁻¹ to a maximum of 1.09 µg L⁻¹.

Comparison
The linearity of the calibration curve was validated with the correlation coefficient R² and ANOVA. The R² of the calibration without the outlier is better, and linearity is accepted. The ANOVA test yields the same result: the linearity of the calibration curve is accepted both with and without the outlier.
The LOD and LOQ are among the parameters used to validate an analytical method; smaller values mean better sensitivity and recovery. These parameters can be estimated from repeated analysis of a blank or from the calibration curve [5] [6]. In this case study, the calibration curve was used to determine the LOD and LOQ, and the calculated values are presented in Table 6. The LOD and LOQ are better for the calibration without the outlier. The difference between the Pb concentrations obtained with and without the outlier is small: it varies between +0.40% and +3.29%. The positive sign means that the concentration is higher when the outlier is not removed. For the uncertainty of the Pb concentration related to the calibration curve, the difference is not negligible: it varies between +22.48% and +23.53%. Therefore, checking for and removing outliers in the calibration data improves the quality of the measurement.
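The percentage differences can be illustrated with the rounded extreme values quoted in this study. The sketch assumes that each difference is expressed relative to the value obtained with the outlier; the function name is hypothetical, and because the inputs are rounded range limits rather than per-sample values, the results only approximate the reported percentages.

```python
def relative_decrease(with_outlier, without_outlier):
    """Percentage by which a value drops once the outlier is removed,
    relative to the value obtained with the outlier (assumed convention)."""
    return 100.0 * (with_outlier - without_outlier) / with_outlier


# Rounded values quoted in the text (µg/L)
highest_concentration = relative_decrease(22.61, 22.52)
lowest_uncertainty = relative_decrease(1.17, 0.90)
highest_uncertainty = relative_decrease(1.41, 1.09)
lod_change = relative_decrease(1.73, 1.32)

print(highest_concentration, lowest_uncertainty,
      highest_uncertainty, lod_change)
```

The concentration changes by well under 1% at the top of the range, while the uncertainty and the LOD change by more than 20%, which matches the pattern described in the comparison.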

Conclusion
Lead was determined in fourteen water samples by graphite furnace atomic absorption spectrometry, and a suspected outlier was found when the calibration data and the residuals were plotted. The F-test and the t-test confirmed the suspected outlier. The linearity of the calibration curve was checked both with and without the outlier; the correlation coefficient and ANOVA show that linearity is accepted in both cases. The LOD, the LOQ, the lead concentrations and the uncertainty related to the calibration were also compared, and the comparison shows that the outlier degrades these parameters. The difference is marked for the LOD, the LOQ and the uncertainty, whereas the effect on the concentrations is small.
Thus, the presence of an outlier in the calibration data reduces the quality of the measurement. It is therefore important to check the calibration data for outliers and to deal with them, in order to improve the quality of the measurement and obtain reliable analytical results.