Estimating Canopy Nitrogen Content of Rice Using Hyperspectral Reflectance Combined with SG-FD-CARS-ELM in Cold Region

In this study, visible and near infrared hyperspectral imaging technique was used to predict canopy leaf nitrogen content (CLNC) of rice in cold region. Canopy hyperspectral images of rice were acquired at tillering, jointing and heading stage, respectively. Original spectra was extracted using ENVI5.0 software, and leaf nitrogen content was obtained by chemical analysis method. 5 pre-processing methods of savitzky-golay smoothing (SG), multiplicative scatter correction (MSC), standard normal variate (SNV), first derivative (FD) and second derivative (SD) were used to eliminate unexpected noise. After comparing the performance of PLSR models based on spectra of full wavelengths after pre-processing, SG combined with FD had the best performance for eliminating the noise interference and improving the performance of models. In order to further simplify and enhance the models, 3 variable selection methods of successive projections algorithm (SPA), uninformative variable elimination (UVE) and competitive adaptive reweighted sampling (CARS) were used to select the characteristic wavelengths, and partial least square regression (PLSR) and extreme learning machine (ELM) were used to establish prediction models. After comparing the performance of PLSR models and ELM models, CARS could effectively select the wavelengths that had strong information and were not sensitive to external disturbance factors, and the nonlinear ELM model was more suitable for predicting CLNC of rice in cold region, the specific values of RC 2 and RP 2 of ELM models based on CARS were 0.906 and 0.888 for tillering stage, 0.903 and 0.892 for jointing stage, and 0.894, 0.887 for heading stage, respectively. The results of this study could provide a reference for quantitative analysis of nitrogen content of rice using hyperspectral technology.


Introduction
Nitrogen is one of the essential nutrients for rice growth and the most active factor in soil fertility [1,2]. In certain range of nitrogen application, the nitrogen uptake, nitrogen utilization efficiency, traits and yield of rice will be improved with nitrogen application increased. However, if nitrogen application is excessive, it will lead to reduced nitrogen utilization efficiency, soil degradation, decreased rice yield and inferior grain quality, and may even cause ecological pollution. Therefore, it is very meaningful to realize the rapid diagnosis of nitrogen status of rice in order to rationally and accurately apply nitrogen fertilizer. Traditional nitrogen diagnostic methods are visual diagnosis, chemistry diagnosis and chlorophyll meter diagnosis. The visual diagnosis is intuitive, but it's easy to cause confusion and misjudgment. The chemistry diagnosis is more accurate, but so much work is needed to do and the cost is high. The chlorophyll meter diagnosis can only quantitatively estimate nutritional content of specified leaf, which is difficult to reflect nutritional status of large areas of cropland [3][4][5]. Therefore, traditional diagnosis methods have been difficult to meet the actual demand of large scale rice production in time and space hyperspectral technology has the advantages of convenience, accuracy and environmentally friendly, and has become one of the most effective techniques for crop nutrition diagnosis. At present, there are many research results in this field, involving grain crops [6], fruit trees [7], vegetables and so on. In terms of nitrogen nutrition diagnosis of rice, also made a lot of research results. Tian found that the ratio vegetation index constructed by 553 and 537 nm had a good contribution to estimating leaf nitrogen content of rice [8]. Chu used the ratio vegetation index constructed by 770 and 752 nm to well predict nitrogen accumulation of rice leaf [9]. Yu analyzed the laws of nitrogen content and canopy spectra of rice under different nitrogen levels, and found that the optimum multiple narrow band reflectance (OMNBR) model established by maximum R 2 improvement method (MAXR) could improve the accuracy for predicting nitrogen content [10]. Qin believed that the linear model constructed from the ratio of the first derivative of 738 and 522 nm selected by contour of determination coefficient was the optimal prediction model for nitrogen content of rice [11]. Similar to the above studies, scholars mostly used sensitive wavelengths or vegetation indices constructed by sensitive wavelengths to establish the simple linear or nonlinear models to predict nitrogen content of rice. The advantages of this method are simple and intuitive, computing fast and easy to implement. However, the disadvantages are that the accuracy is generally low, the anti-interference ability is poor, and the models in different studies are quite different. At present, there are few studies on the use of various pre-processing methods to filter original spectra, the use of multiple variable selection methods to extract characteristic wavelengths, and then to establish the higher precision linear and nonlinear models. As the raw spectra obtained by hyperspectral technique usually has the obvious noise and contains a large amount of irrelevant information that will weaken the performance of models, especially the canopy spectra collected in the cropland [12]. Therefore, it is necessary to carry out the elimination of uninformative variables and the selection of key variables before using hyperspectral data to quantitatively analyze nitrogen content of rice. Meanwhile, some new studies have shown that the nonlinear models had a more pronounced advantage in quantitative analysis of plant nutrition than the linear models [13]. Therefore, it is very meaningful to establish the nonlinear models to predict nitrogen content of rice. In the existing studies, there are also few studies on hyperspectral monitoring of nitrogen content of rice in cold region of northeast China. Especially, the monitoring periods, indicators and models under the canopy scale are still lack of systematic research. Heilongjiang Province, located in the northeastern part of China, is not only the largest growing area of rice in cold region, but also the most important commodity grain base in China. Using hyperspectral technology to monitor nitrogen content of rice, to provide rich and comprehensive research results, which is of great significance to guarantee rice yield and quality. Therefore, the rice cultivated in Heilongjiang Province is selected as the research object in this study. On the basis of experiments with different nitrogen levels, the canopy spectral information of rice at tillering, jointing and heading stage was obtained by visible and near infrared hyperspectral imaging technique. The performance of various pre-processing methods, variable selection methods and modeling methods in predicting canopy leaf nitrogen content (CLNC) of rice was systematically compared, and then the optimal pre-processing method, wavelength selection method and modeling theory for quantitative analysis of CLNC of rice in cold region were obtained.

Experimental Design
The field experiments were carried out in 2016 in the area of Harbin, Heilongjiang Province, as shown in Figure 1.
The climate is medium-temperate continental monsoon, with very cold winters and warm summers. The annual average temperature and precipitation are 3-4°C and 500-800 mm, respectively. The climatic characteristics are suitable for many field crops (e.g. rice, soybeans, wheat and corn), which have only one harvest per year, and the most widely planted crop in this area is rice. In this study, the experimental rice variety was Daohuaxiang, the major cultivar grown in Heilongjiang Province. The experimental soil was meadow paddy soil, organic matter content was 35.5 g·kg -1 , total nitrogen content was 1.44 g·kg -1 , effective phosphorus content was 51.8 g·kg -1 , available potassium content was 111 g·kg -1 and pH was 6.30. The experimental field was designed with four replications and four nitrogen gradient treatments: N0 (0 kg·ha -1 ), N1 (60 kg·ha -1 ), N2 (120 kg·ha -1 ), N3 (180 kg·ha -1 ), that were used to obtain a large range of nitrogen content. The individual plot size was 4 by 4 m. Other field management practices, such as irrigation, pesticide application, etc., followed local standard practices.

Canopy Images Acquisition
The canopy hyperspectral images of rice were captured with SOC710VP ® HS-Portable (Surface Optics Corp., CA, USA). Its spectral range was 372-1 038 nm and resolution was 4.68 nm. The canopy images were obtained between 10 a.m. and 14 p.m. local time under clear and cloudless conditions, on June 25th, July 15th and August 20th, corresponding to tillering stage, jointing stage and heading stage, respectively. Prior to the canopy images acquisition, calibration measurements were done with a white reference panel. When the images were captured, hyperspectral equipment was placed at a height of 1 m above the rice canopy, two positions were randomly selected to capture images in each plot, and two rice plants were contained in each image. Hyperscanner software (Surface Optics Corp., CA, USA) was used to complete the acquisition and transmission of hyperspectral images. The canopy hyperspectral images of rice at tillering stage under different nitrogen levels are shown in Figure 2.

Nitrogen Content Measurements
After the canopy images acquisition was done, the top 10 leaves of each rice plant in captured images were cut off and put into a numbered sealed bags. All leaves were rinsed with water, put into the oven at 105°C for 30 min and dried at 80°C to constant weight, then crushed and digested with Kjeldahl method. Total nitrogen content of leaves was measured by AA3 flow analyzer (SEAL Analytical Corp., Norderstedt, Germany) according to indophenol blue method, and the average value of 10 leaves was taken as CLNC of the corresponding rice plant.

Reflectance Measurements
A total of 96 canopy images were captured at 3 growth stages, and the reflectance measurements were performed by ENVI5.0 software (Research Systems Inc., CO, USA). 5 regions of interest (ROI) were selected for each rice plant, and the average value of reflectance was taken as the canopy reflectance of the corresponding rice plant. At tillering stage, 64 sets of data were obtained. After removing 2 sets of abnormal data, 42 sets of data were selected randomly as calibration set, the rest were prediction set. In the same way, at jointing stage, 41 sets of data were selected randomly as calibration set, 20 sets of data were prediction set. At heading stage, 42 sets of data were selected randomly as calibration set, 20 sets of data were prediction set.

Spectral Pre-processing
In both spectroscopy and hyperspectral imaging, the spectra is often disturbed by various disturbances. For example, the path length of light transmission is usually affected by the thickness of sample, and the measured values are often affected by the physical properties such as particle size and distribution [14]. Spectral pre-processing was performed using Unscrambler software (Version 9.7, CAMO, Oslo, Norway). The purpose of spectral pre-processing is typically to eliminate the influence of light scattering, background noise, baseline shift, and random error caused by uncontrolled external factors [15]. In this study, 5 spectral pre-processing methods, namely savitzky-golay smoothing (SG), multiplicative scatter correction (MSC), standard normal variate (SNV), first derivative (FD) and second derivative (SD) were applied in 11 strategies, namely SG, MSC, SNV, FD, SD, SG-FD, SG-SD, MSC-FD, MSC-SD, SNV-FD and SNV-SD. SG can filter out the high frequency noise in spectral data, MSC is a transformation method used to compensate for additive and multiplicative effects and SNV is commonly applied to remove the variability caused by light scattering [16]. FD and SD are often used to remove background noise, baseline drift and enhance small spectral features [17]. In order to screen out the optimal pro-processing method from the above methods, PLSR was used to model and predict using the spectral data after various pre-processing methods. Then, the appropriate pre-processing method was selected according to the determination coefficient (R 2 ) and root mean square error (RMSE) of calibration set and prediction set.

Characteristic Wavelengths Selection
Hyperspectral image data is characterized by its 3-dimensionality with multicollinearity, redundancy among contiguous wavelengths, which make the data processing time consuming and could weaken the performance of models [18]. Therefore, the most informative wavelengths should be selected from the whole spectral range of samples to reduce or even eliminate redundancy, thus speeding up data processing and improving the efficiency of data analysis [19]. In this study, 3 variable selection methods, namely successive projections algorithm (SPA), uninformative variable elimination (UVE) and competitive adaptive reweighted sampling (CARS) were used for wavelengths selection in 4 strategies, namely SPA, UVE, UVE-SPA and CARS. SPA, UVE and CARS are all typical variable selection methods for spectral analysis [20][21][22]. SPA selects the variables with minimally redundancy to solve the collinearity problems [23]. UVE selects the informative variables according to their stability calculated from PLSR regression analysis [24]. In the calculation of CARS, the wavelengths with larger absolute regression coefficients of PLSR models are considered as good candidates and selected based on the principle of 'survival of the fittest' from Darwin's Evolution Theory [25]. In addition, besides the calculation of SPA based on full wavelengths, SPA is also commonly carried out after UVE calculation to select the variables that informative but no collinearity.

Models Establishment and Evaluation
The application of chemometrics in modeling spectral data is widely used and is considered as a standard procedure for establishing prediction models in the analysis of hyperspectral images [26]. In this study, partial least square regression (PLSR) and extreme learning machine (ELM) were respectively used to establish prediction models between the spectral data of samples and the corresponding CLNC. PLSR is a classic linear multivariate statistical analysis method that is widely used in stoichiometric modeling analysis. Its principle is to perform factor analysis on characteristic wavelengths matrix X and sample target matrix Y, decompose X and Y into multiple latent variables, and select the optimal latent variables by cross validation method for regression. The cross validation method can well verify the accuracy of models and whether they are supersaturated [27,28]. ELM is a simple supervised learning algorithm for single-hidden layer feedforward neural network (SLFN), which randomly generates the connection weights between the input layer and the hidden layer, and the threshold of neurons in the hidden layer. In the training process, the uniquely optimal solution will be obtained, just by setting the number of neurons in the hidden layer. Compared with traditional computational intelligence techniques, ELM has proved to be an alternative in terms of generalization performance, learning speed, and computational stability [29]. After the models were established, the performance of models needed to be evaluated quantitatively to determine the merits and demerits of them. The determination coefficient of calibration set (R C 2 ) and determination coefficient of prediction set (R P 2 ) were the main criterion, the root mean square error of calibration set (RMSEC) and root mean square error of prediction set (RMSEP) were the auxiliary criterion. The best model should have high R C 2 , R P 2 and low RMSEC, RMSEP. All prediction models development procedures were carried out with MATLAB R2014a (The Math Works, Inc., Massachusetts, USA). The main steps of CLNC prediction of rice from sampling to modeling is shown in Figure 3.

Comparison of Spectral Pre-processing Methods
The canopy spectra of rice of each growth stage was treated by 11 pre-processing methods, respectively. The prediction models were established by PLSR, and the performance of models is shown in Table 2. The performance of PLSR models based on spectra of full wavelengths after different pre-processing methods was different, because not all methods were able to reduce the noise effects and improve the robustness of models. At tillering stage, the performance of PLSR models based on SG and SG-FD was better than model based on original spectra, the specific values of R C 2 and R P 2 were 0.831 and 0.827 for the SG model and 0.848, 0.834 for the SG-FD model. The PLSR models based on SNV, FD and SG-SD also had higher R C 2 values, but the R p 2 values were not as good as that of original spectra. In addition, the performance of models based on MSC, SD, MSC-FD, MSC-SD, SNV-FD and SNV-SD was not better than model based on original spectra. At jointing stage, the performance of PLSR models based on SG and SG-FD was also better than model based on original spectra, the specific values of R C 2 and R P 2 were both higher. Meanwhile, the regularity of performance of PLSR models based on SNV, FD, SD, SG-SD, MSC-FD, MSC-SD, SNV-FD and SNV-SD was the same as tillering stage. At heading stage, the PLSR models based on SG and SG-FD still performed better than model based on original spectra, but the performance of models based on other pre-processing methods was not better. Among them, the PLSR models based on FD, SD and SG-SD performed slightly better than MSC, SNV, MSC-FD, MSC-SD, SNV-FD and SNV-SD. From tillering stage to heading stage, compared the performance of SG, MSC and SNV, the PLSR models based on SG were the best, indicating that the noise of canopy spectra was mainly caused by the high frequency noise. The PLSR models based on SNV had higher R C 2 values and lower R P 2 values at tillering and jointing stage, lower R C 2 and R P 2 values at heading stage, which indicated that the light scattering was really present in original spectra, but it was not the main factor that affected the performance of models. The PLSR models based on SMC with lower R C 2 and R P 2 values indicated that although the additive and multiplicative effects were compensated, the high frequency noise in original spectra was also amplified. Compared the performance of SG, SMC, SNV combined with FD and SD, the performance of models based on SG-FD, SMC-FD and SNV-FD was better than models based on SG-SD, SMC-SD and SNV-SD, indicating that FD was more suitable for the filtering of canopy spectra than SD. Meanwhile, the performance of models based on MSC-FD, MSC-SD, SNV-FD and SNV-SD was not better than models based on single one pre-processing, indicating that the small spectral features enhanced by FD and SD contained some noise, so the noise was also amplified. In summary, at tillering, jointing and heading stage, SG and SG-FD had better performance in eliminating the noise interference and improving the performance of models.
Meanwhile, the R C 2 and R P 2 values of models based on SG-FD were higher than that of models based on SG. Therefore, the best model to predict CLNC of rice should be based on SG-FD pre-processing method. Original spectra and spectra after SG and SG-FD are showed in Figure 4.

Characteristic Wavelengths Selection
After selecting the optimal spectral pre-processing method, SPA, UVE, UVE-SPA and CARS were used to select the characteristic wavelengths to reduce the number of input variables, improve the operation speed, and improve the accuracy and robustness of models. In the SPA calculation, after comparing the RMSEs of different candidate subsets of variables that were obtained by a sequence of projection operations, 30, 26 and 22 wavelengths which had the lowest RMSEs were selected as the characteristic wavelengths at tillering, jointing and heading stage, respectively. The number of these selected wavelengths, which was 23.43% of the number of full wavelengths at tillering stage, 20.31% at jointing stage and 17.18% at heading stage, respectively. In the UVE calculation, the distribution of stability values of all wavelengths was obtained, and the wavelengths with stability values outside the threshold line were defined as the characteristic wavelengths. 56, 51 and 57 wavelengths were obtained at tillering, jointing and heading stage, respectively. The number of these wavelengths was 43.47% of full wavelengths at tillering stage, 39.84% at jointing stage and 44.53% at heading stage, respectively. Besides the calculation of SPA based on full wavelengths, SPA was also commonly carried out after UVE calculation to select the variables that informative but no collinearity [30]. In this study, this strategy was also applied, 16, 16 and 14 wavelengths were obtained at 3 growth stages, respectively. After the UVE-SPA calculation, the information in original spectra was greatly compressed, the number of selected wavelengths was just 12.5% of full wavelengths at tillering stage and jointing stage, 10.93% at heading stage, respectively. At last, CARS was carried out to select the characteristic wavelengths based on the identification of wavelengths with higher absolute coefficients of PLSR models. 11, 10 and 10 wavelengths were selected at tillering, jointing and heading stage, respectively. The number of these wavelengths was just 8.59% of full wavelengths at tillering stage, and 7.81% at jointing stage and heading stage, respectively. The characteristic wavelengths of each growth stage selected by SPA, UVE, UVE-SPA and CARS are shown in Figure 5.

PLSR Models
The characteristic wavelengths of each growth stage selected by the above methods were used to establish the new PLSR models to predict CLNC of rice, the results are shown in Table  3. At tillering stage, the performance of PLSR models based on UVE, UVE-SPA and CARS were better than model based on full wavelengths, the specific values of R C 2 and R P 2 were 0.895 and 0.879 for the UVE model, 0.863 and 0.860 for the UVE-SPA model, and 0.881, 0.871 for the CARS model. But the PLSR model based on SPA had lower R C 2 and R P 2 values than that of full wavelengths. At jointing stage, the PLSR model based on CARS performed better than model based on full wavelengths, the specific values of R C 2 and R P 2 were 0.886 and 0.851. However, the performance of PLSR models based on SPA, UVE and UVE-SPA was not as good as full wavelengths. Among them, the PLSR models based on UVE performed slightly better than SPA and UVE-SPA. At heading stage, the regularity of performance of PLSR models based on SPA, UVE, UVE-SPA and CARS was the same as tillering stage.
From tillering stage to heading stage, the performance of PLSR models based on SPA was not better than PLSR models based on full wavelengths, which indicated that even after filtering by SG-FD, some noise still existed in original spectra, which was non-colinear with the important spectral information. The noise existed in the wavelengths selected by SPA and the performance of PLSR models would be affected. Compared with PLSR models based on full wavelengths, the PLSR models based on UVE had higher R C 2 and R P 2 values at tillering and heading stage, lower R C 2 and R P 2 values at jointing stage, perhaps because during the elimination of uninformative variables, a small amount of useful information was also removed, which resulted in lower R C 2 and R P 2 values at jointing stage. Meanwhile, at tillering and heading stage, the PLSR models based on UVE performed the best, but the number of wavelengths was too large. After carrying out the SPA calculation on the wavelengths selected by UVE, the colinear variables were removed, and the number of wavelengths was significantly reduced, but the R C 2 and R P 2 values of PLSR models also decreased slightly. Compared with SPA, UVE and UVE-SPA, the PLSR models based on CARS always had the better performance. The prediction effect of prediction set of each growth stage based on CARS-PLSR is shown in Figure 6.

ELM Models
Next, the selected characteristic wavelengths of each growth stage were used to establish the ELM models to evaluate the ability of non-linear models in predicting CLNC of rice, the results are shown in Table 4. When implementing the ELM algorithm, the number of neurons in the hidden layer ranged from 5 to 50 in increments of 5, and the number of neurons that achieved the best prediction results was chosen. After repeated training, the optimal number of neurons for 3 growth stages was 15. At tillering stage, jointing stage and heading stage, the regularity of performance of ELM models based on SPA, UVE, UVE-SPA and CARS was consistent with PLSR models. Compared Table 3 and Table 4, the nonlinear ELM models were superior to the linear PLSR models in predicting CLNC, and the ELM models based on UVE, UVE-SPA and CARS all obtained better performance. This might be because, on the one hand, when the amount of nitrogen fertilizer changed, rice plants would undergo complex chemical changes, so there might be a nonlinear relationship between the spectral characteristics and CLNC. On the other hand, although the characteristic wavelengths were selected by SPA, UVE and CARS, which are the variable selection methods based on the linear analysis, there might be still non-linear information in the selected wavelengths.
Compared the performance of different wavelength selection methods in Table 3 and Table 4, if SPA was used to select the characteristic wavelengths from original spectra directly, the performance of models established by these wavelengths was not ideal due to the influence of many external disturbances. UVE could select the wavelengths that had strong information and were not sensitive to external influencing factors, but the number of wavelengths was too large and the performance of models was unstable. By combining UVE with SPA, the number of characteristic wavelengths could be reduced to a minimum of 14, only 10.93% of full wavelengths, but the accuracy also declined. Compared with SPA, UVE and UVE-SPA, CARS could also effectively select the wavelengths that had strong information and were not sensitive to external influencing factors. The maximum number of the selected wavelengths was only 8.59% of full wavelengths, and the performance of PLSR models and ELM models established by these wavelengths was better, especially the ELM models. Therefore, the comprehensive evaluation showed that CARS-ELM could be used as an effective wavelength selection and modeling method for predicting CLNC of rice in cold region. The prediction effect of prediction set of each growth stage based on CARS-ELM is shown in Figure 7.

Conclusions
This study was conducted to evaluate the feasibility of using visible and near infrared hyperspectral imaging technology combined with multiple spectral pro-processing methods, different characteristic wavelength selection methods, linear and nonlinear models for the rapid and non-destructive prediction of CLNC of rice in cold region. In order to eliminate the noise influence and the redundant information, and then select the key variables to establish the higher precision linear and nonlinear models, 5 pre-processing methods of SG, MSC, SNV, FD and SD, 3 variable selection methods of SPA, UVE and CARS, and 2 modeling methods of PLSR and ELM were applied. The results of comprehensive comparison showed that, SG-FD was the optimal pre-processing method to eliminate unexpected noise and enhance the performance of models. CARS could effectively select the characteristic wavelengths that had strong information and were not sensitive to external disturbance factors, and the nonlinear ELM model was more suitable for predicting CLNC of rice in cold region. The results of this study could provide a reference for quantitative analysis of nitrogen content of rice using hyperspectral technology, and technical support for guiding the application of nitrogen fertilizer during the growth process of rice in cold region. Future research is needed to test the performance of models with more samples from different places and different varieties in Heilongjiang Province. Meanwhile, for the ELM model, the wavelengths selected by the methods used in this study may not be the optimal variables. How to develop the nonlinear variable selection methods will also be studied in the future.