A New Receptor Model Based on the Alternating Trilinear Decomposition Followed by a Score Matrix Reconstruction for Source Apportionment of Ambient Particulate Matter

A new receptor model based on the alternating trilinear decomposition followed by a score matrix reconstruction (ATLD-SMR) was developed for the source apportionment of urban PM10 for the first time. First, simulated three-way data arrays of gas chromatography-mass spectrometry (GC-MS) were used to verify the feasibility of the ATLD-SMR method. Then, PM10 samples (receptor) at five locations and TSP samples of ten pollution sources were collected during July and August, 2018 in Loudi City, China. The collected samples were measured by GC-MS. PAHs were used as tracers and their concentrations were accurately obtained by the ATLD-SMR analysis of GC-MS data of these samples after the problems of GC-MS including baseline drift, retention-time shift and unexpected peaks overlapping were successfully resolved. The highest concentrations of individual PAH in these samples were for phenanthrene and benzo [a] pyrene (40.76 ng m -3 and 39.63 ng m -3 in Liangang steel-making workshop, respectively). Last, a relative contribution matrix of the source to the receptor was estimated by the ATLD-SMR method. The proposed method was employed to apportion the source contributions to PM10 particles at five locations and reasonable results were obtained, thus presenting a promising tool for source apportionment of complex ambient particulate matter.


Introduction
Atmospheric pollution is an environmental concern worldwide leading to increasing morbidity due to cardiovascular disease and cancer as a result of exposure to polluted air [1][2][3]. Accurate source apportionment of ambient particulate matter has become among the hot topics in the atmospheric sciences. Until now, three approaches have been applied to estimate source contributions from source sample data and/or ambient sample data. The first, emission inventory is a list model of anthropogenic emissions based on their history and heritage [4][5]. The second, diffusion model (also termed a source model), evaluates the influences of pollution sources on airborne particulates at a given site based on the discharge rates of the point source and meteorological data [6]. The third, known as a receptor model, focuses on studying the contribution of the pollution source to the receptor [7][8][9].
Receptor models include source unknown receptor models, such as principal component analysis (PCA) [10,11], PCA/multiple linear regression (PCA/MLR) [12,13] and positive matrix factorization (PMF) [14], and source known receptor models, for example, the classic chemical matter balance model (CMB) [15]. Receptor models have been widely applied in the field of source apportionment of ambient particulate matter [7,[16][17][18][19]. In these methods, inorganic matter (metallic elements and inorganic ions) or organic compounds are commonly used as tracers to calculate the presence of the effluent from different sources within the ambient particulate matter [20,21]. Therefore, there are two important problems that must be solved before source apportionment. The first is tracer identification in a complex environmental sample and the second is accurate quantification of tracers. However, ambient particulate samples contain the most chemically complicated components with low concentrations and severe mutual interference. This makes accurate tracer qualitative and quantitative determination in these samples even more difficult or impossible even with the aid of modern analytical instruments such as liquid or gas chromatography-mass spectrometry (LC/GC-MS). However, the accuracy of these methods is closely associated with tracer qualitative and quantitative results. Thus, there is a great need to develop one alternative strategy to carry out the nondestructive and accurate identification and quantification of tracers in source and receptor samples even in the presence of unknown interferents. Fortunately, the chemometrics second-order calibration method provides a good analytical strategy to conduct source apportionment of ambient particulate matter. It has the advantages of simplifying sample pretreatments, resolving overlapped chromatographic peaks and providing physicochemical information on components in an actual system because of its well-known "second-order advantage" [22][23][24].
According to the aforementioned discussion, there may be three methods which can be developed to construct source apportionment models for analysis of the three-way data array. A graphical representation of these ways was shown in Figure  1. The first method is based on the trilinear decomposition of only the source dataset with second-order calibration algorithm to obtain the relative contributions of pollution sources. The second is built on the iteration and decomposition of only the receptor dataset to extract the number and category of pollution sources. The third is based on first decomposing the source dataset and the receptor dataset with second-order calibration algorithm, respectively, and then estimating the relative contribution coefficients by combining the obtained source and receptor score matrices. These aforementioned methods may be found in some existing approaches elsewhere [25][26][27]. However, there are some deficiencies. For example, only the information from one side of the source and receptor can be utilized in the first two ways and the chemical compositions of sources may not be fully matched with those of the receptor in the third way.
Thus, in this work, a method based on another new way is proposed to construct a source apportionment model. The proposed method first decomposed the combined three-way GC-MS data array of source and receptor samples with alternating trilinear decomposition algorithm and then estimated the relative contribution of a pollution source to ambient particulate matter according to the reconstructed score matrix of tracers extracted from the source and receptor samples. This new method, the alternating trilinear decomposition followed by a score matrix reconstruction (ATLD-SMR), has the advantages of fast, simple, accurate interference-free quantification of tracers and full utilization of the fingerprint information of samples to estimate the contributions of the source to the receptor. The developed method is elaborated in the following sections.

Alternating Trilinear Decomposition
In this work, alternating trilinear decomposition (ATLD) was used to decompose the obtained three-way data array of the source and receptor samples. ATLD was proposed by Wu et al in 1998 [28]. The algorithm has fast convergence and is insensitive to excess factors, which are very useful properties in real environmental situations for three-way data decomposition. Details of ATLD can be found in the literature [28]. The matrices A, B and C can be provided by ATLD decomposition of three-way data array. The analyte (s) concentrations can be obtained by regression of the appropriate column of C corresponding to each analyte against its standard concentrations.

ATLD Followed by a Score Matrix Reconstruction
In the case of GC-MS, if the measured matrices of the k samples including the k 1 calibration samples, k 2 source samples and k 3 receptor samples (k=k 1 +k 2 +k 3 ), are stacked in a three-way data array (GC×MS×Sample), it can be decomposed via ATLD according to Equation (1) where, x ijk is the element of the three-way data array X (I×J×K), and a in , b jn , and c kn are the corresponding elements of the loading matrices A (I×N), B (J×N) and score matrix C (K×N) of X, respectively. The term e ijk is the element of the three-way residue data array E (I×J×K). A and B collect the normalized elution-time and mass-to-charge (m/z) profiles of the N factors, respectively. C collects the relative concentration or score of N factors in all samples. Then, Equation (1) can be written in terms of matrix X k , where X k is the kth (I×J) slice of X. ' ' ' 1 1 1 2 2 2 n n X a b + a b + + a b + E , 1,2,..., ; 1,2,..., .
where a n and b n are the nth column of the loading matrices A and B, respectively, and ( ' n n a b ) is the loading matrix of the nth factor. The score value (c kn ) is the relative contribution of the loading factor ( ' n n a b ) to matrix X k . However, N denotes the total number of tracers of interest, interference and background. Thus, we can extract the vectors of tracers from raw score matrix. By regression of the vector (corresponding to relative concentrations) of each tracer against its standard concentrations in calibration samples, the targeted concentrations of markers can be obtained, which is second-order calibration.
For the reconstruction of a score matrix of only markers to estimate the contribution of the source to the receptor, the following equation should first be calculated for the k 2 source The subscript "r" represents "receptor".
P is a reconstructed score matrix, which is also called a relative contribution matrix. X k s to the receptor matrix 3 , X k r . Finally, the measured matrix of any receptor sample can be expressed with respect to the k 2 measured matrices of pollution sources as follows: Thus, we can obtain the relative contribution of the pollution source to ambient particulate matter based on the alternating trilinear decomposition followed by a score matrix reconstruction (ATLD-SMR) of the three-way GC-MS data of source and receptor samples.

Simulated Chromatography-mass Spectrum-sample Data
A chromatography-mass spectrum data set was simulated for 15 samples with four components. The chromatographic profiles were generated as follows: A( , 1)=Gaussian peak(0.10, 4, 10, ), Then, a three-way data array was obtained with the size of 80 × 200 × 15. To simulate a real environmental measurement process, homoscedastic and heteroscedastic noises were added to the simulated data set as follows: where σ homo and σ heter are the homoscedastic and heteroscedastic noise levels, taking values of 0.001, 0.002, 0.005 and 0.01.

Sampling Procedure
PM10 (particulate matter with dp < 10 µm) samples were collected for 12 hours once every four days during July and August of 2018 at five functional zones of Loudi City, China using a large-volume air sampler (YH-1000, Qingdao Jingcheng Instrument Co., Ltd, China) at a flow rate of 1.0 m3 min -1 . The sampler was mounted on the roof of a building approximately 15 m above the ground. PM10 was captured onto quartz-fiber filters 200 mm × 240 mm in size. TSP (total suspended particulates) samples of different sources were collected for 24 hours during the month by using four medium-volume samplers (JCH-120F (one) and JCH-6120 (three), Qingdao Juchuang Environmental Co., Ltd, China) at a flow rate of 100.0 L min -1 . TSP was captured onto 88-mm diameter quartz-fiber filters. Figure 2 shows the locations of the sampling sites of the PM10 and TSP samples. The information for these samples is provided in supplementary Table A1. These samples were separately wrapped in zip-top plastic bags and stored at -20.0°C in a refrigerator.

PAHs Extraction
The Soxhlet method was used to extract PAHs from the filter samples. A total of 300.0 mL of dichloromethane/n-hexane with a volume ratio of 3:2 was used as an extraction agent and added into a Soxhlet extractor to fully immerse the filter samples. The extraction temperature was set at 78°C. The extraction time was 20 min for each cycle and the filling/emptying cycles exceeded twelve. After the extraction, the solvent was concentrated using rotary evaporation at 40°C under low pressure to 2.0 mL and the solution was maintained on standby in a small beaker. 2. Instrumentation and software Retention time-mass spectral matrices were obtained on gas chromatography-mass spectrometry (GC/MS-QP2010, Shimadzu Corporation, Japan). The column for GC separation was an Rxi®-1MS (0.25 µm and 0.25 mm × 30 m) nonpolar capillary column (Restek, USA). The temperature program of the GC was that it was initiated at 150°C for 2.0 min, increased to 300°C at a rate of 30°C min -1 , then maintained at 300°C for 5.0 min. The mass spectrum detector was operated with electron impact (EI) source and under the following conditions: full scan mode with a range of 50-400 amu; injector temperature, 250°C; ion source temperature, 230°C; interface temperature, 250°C; and detection voltage, 0.94 kV. In addition, the sample injection volume was 1.0 µL with a split ratio of 5:1 and helium was used as carrier gas at a flow rate of 2.7 mL/min. The data analysis was performed on Matlab (MATLAB, 7.9.0, Math Works, USA) software with home-written programs in the environment of a Microsoft Windows 8 operating system.

Analytical Procedures
3. Standard solutions Each of analytical standards including BeP and BbF was weighed accurately, dissolved into a little dichloromethane, then transferred into a 100.0 mL volumetric flask and diluted to the mark with methanol to obtain the appropriate stock solutions. Stock solutions of the other eight PAHs were prepared by directly dissolving corresponding analytical standards into methanol. These stock solutions were stored at 4.0°C in a refrigerator.
4. Sample sets Seven calibration samples (C01-C07) were prepared by mixing the working solutions of 10 PAHs, and then diluting to volume with pure methanol in 10.0 mL volumetric flasks. Concentrations of 10 PAHs in these samples were summarized in supplementary Table A2. Samples of PM10 (n=5) and TSP (n=12) were directly extracted for analysis by GC-MS. All of the samples were filtered with 0.22-μm nonsterile PTFE syringe filter (i-Quip®N2536) before auto-injection into the separation column. Note that human exposure to PAHs may cause fatal risk, so effective precautions have to be taken in the whole experiment. The sample preparation should be carried out in a fume hood with wearing suitable gloves, eye/face protection and other protective clothes.

A Noise-free Three-way Data Array
First, a three-way chromatography-mass spectrum data array without any noise was simulated to investigate the accuracy of the novel method. In the simulated data, four source samples were designed to reproduce four receptor samples according to Equation (7) and four species were used as markers in the source and receptor samples. A display of the normalized chromatogram, mass spectrogram and concentration profiles of simulated data is shown in Figure A1. From Figure A1 (A) and (B), we can see that it was complex system because there was serious overlap between the four components in the dimensions of both chromatography and mass spectrometry. However, ATLD-SMR can be firstly applied to decompose the simulated three-way data array to obtain the accurate chromatograms, mass spectrograms, and concentrations of the four components (The information is not shown here because the discussion on the resolution and quantification of analytes with second-order calibration methods can be found elsewhere). Then, the contributions of different sources to receptors can be estimated by the obtained concentrations of the four components with the ATLD-SMR method. The fourth column of Tables A3-A4 and the third column of Table 1 list the predicted contributions and statistical results of the four sources to four receptors in the simulated data. The accuracy of the developed method was 100% for the noise-free three-way data, which indicates that the proposed method is feasible.

A Three-way Data Array with Different
Homoscedastic Noise Levels In real environmental analysis, noise often occurs in measured data arrays. Thus, different homoscedastic noise levels (0.001, 0.002, 0.005 and 0.01, respectively) were first added in the simulated data array to further check the performance of the proposed method. From Tables A3-A4 and  Table 1, it can be seen that when the homoscedastic noise level was less than 0.005, the results of the developed method were satisfactory with an AD (×10 -2 ) ranging from 0.2 to 3.8 and an RMSEP (×10 -3 ) ranging from 0.3 to 9.5. When the noise level increased to 0.01, the results of the new method were still satisfactory with an ARC ranging from 0.9609 to 1.0417 and T-test values less than 1.5. These results of the developed method were acceptable for a homoscedastic noise level as high as 0.01.

A Three-way Data Array with Different
Heteroscedastic Noise Levels Then, different heteroscedastic noise levels (0.001 to 0.01) were integrated into the simulated three-way data array to verify the feasibility of the novel method. The results are summarized in Table 2 and Supporting Information A5-A6. It was found that the results were similar to those of the homoscedastic noise. The AD and RMSEP values increased with an increase in the heteroscedastic noise level. When the noise level was 0.01, the results of the proposed method remained acceptable, although there was one outlier value of related coefficient (see Supporting Information A6). During the real measurement process, the noise level was rarely greater than 0.01. Thus, the new method is a promising tool for source apportionment of complex ambient particulate matter as demonstrated by real atmospheric data in the forthcoming sections.
where ctrue and cpredicted are the actual and predicted contribution, respectively, and I is the number of prediction samples.  Figure 3 (A) shows the total ion current (TIC) chromatograms of all of the samples studied in this work. As seen from Figure 3 (A), there were serious baseline drifts in the studied retention domains for most of the samples. Another troublesome problem was that the chromatographic baseline drift of each sample differed from that of one another. Thus, it was clear that it was impossible to carry out an accurate identification and quantification of tracers of interest for such an actual complex situation with univariate regression. Usually, the baseline drift can be fitted as one or more extra component (s) in LC×LC-DAD or LC-MS data with second-order calibration method [29][30]. Herein, the ATLD method was firstly introduced to address the difficult problem of baseline drift in GC-MS data for the first time. As illustrated in Figure 3 (B), raw GC-MS data array of BbF in calibration samples (7.57-7.72 min, only one analyte of BbF) was decomposed by the ATLD method for 100 times with factor number N=1, the result was very poor and there was only 33% probability to get BbF with regression coefficient (R) of only 0.9834. However, when N=2, the raw array was fitted very well by ATLD (R=0.9999). The results demonstrated that baseline drift was fully extracted out as an extra component and "pure signal" of BbF can be exactly identified. Therefore, the problem of serious baseline drifts in GC-MS analysis can be easily resolved with the aid of the proposed strategy. For the sake of providing accurate solutions and reducing data analysis and handling time, the elution region of 10 PAHs could be divided into four sub-segments, including 2.75-4.73, 5.50-5.79, 6.73-6.80 and 7.62-7.99 minutes, respectively. The analytes in these sub-segments overlapped with each other partially or fully on the retention time. In the fourth sub-segment, there were obvious retention-time shifts, which could be found in raw TIC profiles of Figure 3 (C). Additionally, when N=3 in the example of BbF mentioned above, the shift component was also extracted out by the ATLD method (see the retention-time and sample-number plots of Figure 3 (B) (N=3)). Thus, the retention-time shift must be aligned before applying ATLD-SMR to analyze the three-way data array. Many chemometric strategies had been developed for aligning chromatographic peaks shifts among samples [31][32][33]. In the work, a named abstract subspace difference (ASSD) method was used to align the time shift occurring in the fourth sub-domain [34]. The aligned chromatographs of seven calibration and three test samples by ASSD method were shown in Figure 3 (C), which showed that the problem of retention-time shifts had been resolved successfully.

Identification of Tracers in Source and Receptor
Samples During the progress of model construction, a three-way dataset of twenty-two samples including seven calibration samples, ten source samples and five receptor samples was firstly divided into four sub-three-way data arrays according to four elution chromatographic segments mentioned above. A pictorial display of the identified results with the ATLD-SMR method was given in Figure 4. In the dimension of elution time, it was found that chromatograms of ten tracers are heavily overlapped with each other or unknown interference co-eluted. Even so, ATLD-SMR can still extract clear chromatographic profiles of ten tracers in all of samples. The resolved chromatographic profiles of ten tracers were in good agreement with their corresponding real ones, which were shown in Figure 4 (A1-A4). In the dimension of mass spectrum, besides overlapped with uncalibrated components, there were three pairs of tracers completely overlapped with each other, such as (1) FLO and PYR, (2) BaA and CHR, (3) BbF, BaP and BeP. However, the resolved mass spectra of ten tracers matched quite very well with the actual ones (shown in Figure 4 (B1-B4)). These results indicated the ATLD-SMR method had the accurate and reliable qualitative performance, which can be applied to identify tracers from complex environmental samples even in the presence of unknown and uncalibrated interference.

Quantification of Tracers in Source and Receptor
Samples Based on the above reliable qualitative results, a quantification of ten tracers in source and receptor samples can be obtained by the developed method. Table 3 listed the predicted concentrations of tracers in source and receptor samples. From the table we can see, the highest concentrations of individual PAHs in these samples were for phenanthrene and benzo [a] pyrene (40.76 ng m -3 and 39.63 ng m -3 in Liangang steel-making workshop, respectively). Then, the obtained concentrations of ten tracers can be applied to estimate the relative source contributions to different PM10 samples with the ATLD-SMR method in the following step.

The Estimation of the Source Contribution to PM10
According to the obtained concentrations of ten tracers in the source and receptor samples, the relative contribution matrix of P can be calculated with the ATLD-SMR method.
Wherein, the concentrations of ten tracers in the source samples were used as the score matrix of sources, and these in the receptor samples as the score matrix of receptors. Then, we obtained the relative contribution of each single source to the PM10 samples. They are listed in Table 4. At site BWPM, the main pollution sources were AED (73.35%), SYB (9.93%) and LGSW (6.19%). At site RWSPM, MGPM, and SYPM, the contribution of AED was also the highest, accounting for 58.25%, 38.58% and 47.32%, respectively. Only for site LGPM, the main pollution sources were BSD (35.05%), SYB (21.29%) and AED (17.19%). These results showed that AED was the principal single pollution source to all measured PM10 samples expect for LGPM. In order to apportion the sources of PM10 better, 10 pollution sources were classified into five types, including paved road dust (SYB and EG), indoor emissions (NFY), coal and charcoal combustion (BSD, LGPW and LGSW), construction fugitive dust (BJMD) and vehicle exhaust (MED, AED and TED). A diagrammatic representation of the combined contributions is provided in Figure 5. As shown in Figure 5, vehicle exhaust was the most significant pollution source at the sampling point of the Biwu building, which was near a main urban road, contributing 77.64% of the PM10. Paved road dust provided the second highest contribution, accounting for 13.16%. Industrial coal combustion and construction fugitive dust accounted for 6.19% and 3.01%, respectively.
For the railway station point, vehicle exhaust was also the main contributor, accounting for 64.72%, which is easy to understand for a heavy traffic area, and then followed by the contributions that coal and charcoal combustion (15.77%), construction fugitive dust (11.89%) and paved road dust (7.63%) gave ( Figure 5 (B)). For the Liangang industrial area ( Figure 5 (C)), coal and charcoal combustion, vehicle exhaust and paved road dust provided the main contributions to the PM10, accounting for 42.54%, 25.12% and 21.29%, respectively. For the municipal government and Shuyi building ( Figure 5 (D) and 5(E)), the contribution of vehicle exhaust was the highest as 71.65% and 49.83%, respectively. These results showed that vehicle exhaust was the main contributor for all sampling points except for Liangang industrial area in the studied city. This can be easily explained as Loudi is a city with few coal-burning industries but the use of family car increases sharply in the city in recent years. Also, all the contributions estimated by the proposed method conformed to the actual situation in these five areas in the studied city.

Conclusion
In the work, the procedure based on the alternating trilinear decomposition followed by a score matrix reconstruction (ATLD-SMR) of three-way GC-MS data of source and receptor samples allowed the source apportionment of PM10 in different urban locations. The principal of the developed method was based on firstly decomposing the combined three-way dataset of source and receptor samples by ATLD algorithm, then calculating the relative contribution matrix according to the reconstructed score matrix of tracers extracted from source and receptor scores. The contributions of different sources as well as chromatographic and mass spectral profiles of ten tracers were obtained using the procedure. From the obtained contributions, it was found that vehicle exhaust was the principal pollution source to all measured PM10 mass except for Liangang. At Liangang, the main pollution source was industrial coal and charcoal combustion. The new ATLD-SMR method can also address other three-way data such as those produced by excitation-emission matrix fluorescence or phosphorescence, high-performance liquid chromatography-diode array detection, liquid chromatography-mass spectrometry, etc. It should be applied in other fields, such as the source apportionment of water and soil pollution, in the near future. Therefore, the newly developed method is a promising tool for source apportionment of environmental organic pollutants. Figure A1. A plot of the normalized chromatogram (A), mass spectrogram (B) and concentration profiles (C) of the simulated data.