Kernel Collaborative Representation Classification Based on Adaptive Dictionary Learning

Abstract: In recent years, with the progress of technology, face recognition has been used more and more widely in various fields. Classification algorithms based on sparse representation have made great breakthroughs in face recognition. However, face images are often affected by different poses, lighting


Introduction
Recently, sparse coding and dictionary learning have become some of the most successful techniques in pattern recognition and computer vision, and pattern recognition based on dictionary learning has attracted wide attention. Because face images are affected by changes in pose, illumination and expression [1,2], it is usually difficult to represent a test sample with a limited number of original training samples. Dictionary learning addresses this problem by using a large amount of data to model the corresponding variations in pose, illumination and facial expression, so that the test sample can be better represented by the atoms of the optimized dictionary.
Wright et al. [3] demonstrated that a test image can be approximated by a sparse linear combination of the training images, in which the coefficients corresponding to most of the training samples are zero [4]. The test sample is then assigned to the class with the minimum reconstruction error. In this way, the method tries to find the potential contribution of each training sample to the test image, using all the training images as a shared dictionary D. When the variations in the face images are small or the training samples are sufficient, the shared dictionary can fully capture the main features of the face images, so it can both represent and classify the test sample. Sparse representation based classification has made great breakthroughs in face recognition [5][6][7][8]. However, because of various kinds of noise in real applications, training samples usually have large intra-class variations, and it is usually difficult to represent a test sample with a limited number of original training samples. At present, research on face recognition based on sparse representation mainly focuses on two points: (1) different sparsity constraints; (2) dictionary optimization under different constraints.
For sparsity-constrained problems, sparse representation based on ℓ0-norm minimization can obtain good recognition results [9]. However, ℓ0-norm minimization is NP-hard and computationally expensive. The traditional sparse representation model therefore uses the ℓ1-norm as a convex surrogate of the ℓ0-norm to solve this sparsity problem [10], but the algorithm remains complex and its time cost is relatively high. Zhang L [11] proposed collaborative representation with the ℓ2-norm. When the training dictionary is sufficient, its recognition performance is similar to that of the ℓ1-norm model, and because it has a closed-form analytic solution, its computational complexity is lower than that of ℓ1-norm sparse representation. This research shows that, for the classification performance of sparse representation, collaboration between classes contributes more than sparsity does. The authors of [12] gave a probabilistic explanation of collaborative representation and proposed probabilistic collaborative representation. Zheng J [13] proposed a face recognition method based on iterative constrained group sparsity and adaptive weight learning, which obtains more structural and discriminative information than other methods based on linear regression. All of the methods above rest on the same basic assumption: a test sample can be represented by a linear combination of training samples. However, original face images are affected by nonlinear factors such as changes in facial expression, pose, illumination and occlusion. The linear hypothesis does not exploit the nonlinear relationships among the training samples, which may reduce the robustness of face recognition.
To classify faces more effectively, Xu Y [14] used a nonlinear function to map the original space to a feature space in which the test image can be approximated by a sparse linear combination of the training images. Based on this assumption, the kernel method has been combined with sparse representation [15,16] and low-rank representation [17,18] to solve classification and approximation problems. Dong Wang et al. proposed Kernel Collaborative Representation (KCR) [19], which combines a Hamming kernel with LBP features for face recognition; their experimental results show that the method recognizes faces well, especially when the training samples are insufficient.
Regarding dictionary optimization, Rubinstein R [20] described how mathematical and learning models have driven the evolution of dictionaries, showing the importance of dictionary learning for the sparse representation model. Xu Y [21] provided a comprehensive overview of dictionary learning methods for face recognition. To handle occlusion and improve the robustness of face recognition, Yang M [22] used Gabor features to compress the dictionary and proposed a dictionary model within a sparse-constraint framework. For the single-sample problem, Wei C [23] proposed an auxiliary dictionary learning algorithm that expands the original dictionary and thus improves classification performance. Hu Y S [24] used the D-KSVD dictionary learning method to obtain a discriminative structured dictionary. In [25], a domain-adaptive dictionary learning algorithm was proposed that expands the intra-class diversity of the original training samples through collaboration with source data, addressing visual image classification across different source domains. Xu et al. [21] first selected the training samples that are nearest to the test sample, then estimated the test image by a linear combination of the selected training samples to obtain the recognition result. Experiments show that this simple sparse representation face recognition algorithm (NTS) [26] can also achieve good performance. These works indicate that the quality of the dictionary affects the performance of image classification.
Moreover, because face images are affected by changes in pose, lighting and expression, it is difficult to represent a test sample with a limited number of original training samples, and the best representation dictionary may differ from one test image to another. Since conventional dictionary learning methods lack adaptability, we propose to construct an adaptive dictionary associated with each test sample. First, a labeled atom dictionary is learned from the training samples of each class by sparse approximation; this scheme yields an efficient algorithm for generating an adaptive dictionary related to the test sample. Second, the coarse-to-fine sparse representation is tied to the adaptive dictionary learning problem. To fully exploit the nonlinear factors present in face images, such as changes in facial expression, pose, lighting and occlusion, kernel collaborative representation is used to realize competitive classification among classes.

Kernel Collaborative Representation Classification Based on Adaptive Dictionary Learning

Sparse Representation Based Classification
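Suppose that there are L individuals and each sample is represented as a column vector. Let $d_{ij} \in \mathbb{R}^m$ be the jth training sample of the ith individual, where m is the dimension of the samples, $n_i$ is the number of training samples of the ith individual, and $n = \sum_{i=1}^{L} n_i$ is the total number of training samples. The training samples of the ith individual form $D_i = [d_{i1}, \dots, d_{in_i}] \in \mathbb{R}^{m \times n_i}$, and all training samples form $D = [D_1, \dots, D_L] \in \mathbb{R}^{m \times n}$.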
The purpose of sparse representation classification is, given a training set, to identify correctly the class to which a test sample belongs. Since the label of the test sample y is unknown, y is represented by a linear combination of all the training samples D:

$$ y = Dx \tag{1} $$

where x is the sparse coefficient vector. Ideally, $x = [0, \dots, 0, x_{i1}, \dots, x_{in_i}, 0, \dots, 0]^T$, that is, the nonzero coefficients in x correspond to the training samples of the ith individual, the class of y. In face recognition, face images usually satisfy $m < n$, so the linear equation $y = Dx$ is underdetermined and its solution is not unique. Because only a few elements of the desired x are nonzero, the solution is sparse, and it can be sought by ℓ0-norm minimization:

$$ \hat{x}_0 = \arg\min_x \|x\|_0 \quad \text{s.t.} \quad Dx = y \tag{2} $$

where the ℓ0-norm is the number of nonzero elements of a vector. However, Eq. (2) is NP-hard and difficult to solve exactly. Existing optimization theory shows that if the solution is sufficiently sparse, the ℓ0-norm problem can be transformed into an ℓ1-norm problem:

$$ \hat{x}_1 = \arg\min_x \|x\|_1 \quad \text{s.t.} \quad Dx = y \tag{3} $$

However, face images collected in practice are inevitably affected by noise, so test samples can rarely be represented exactly. A representation error is therefore permitted, with tolerance ε, and Eq. (3) is revised as:

$$ \hat{x}_1 = \arg\min_x \|x\|_1 \quad \text{s.t.} \quad \|Dx - y\|_2 \le \varepsilon \tag{4} $$

Finally, the test sample y is assigned to the class with the minimum residual:

$$ r_i(y) = \|y - D\,\delta_i(\hat{x}_1)\|_2, \qquad \text{identity}(y) = \arg\min_i r_i(y) \tag{5} $$

where $\delta_i(\hat{x}_1)$ keeps the entries of $\hat{x}_1$ associated with the ith class and sets all others to zero. The authors of [4] proposed a classification algorithm based on collaborative representation and showed that collaboration between classes in representing the query sample matters more than the sparsity constraint. Compared with ℓ1-regularized sparse representation based classification (SRC), the ℓ2-regularized CRC_RLS achieves very competitive face recognition accuracy with significantly lower complexity. Its coefficient vector ρ is obtained from:

$$ \hat{\rho} = \arg\min_\rho \left\{ \|y - D\rho\|_2^2 + \lambda \|\rho\|_2^2 \right\} \tag{6} $$

where λ is the regularization parameter. The role of the regularization term is twofold.
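On the one hand, it makes the least-squares solution $\hat{\rho}$ stable; on the other hand, it introduces a certain amount of sparsity into $\hat{\rho}$, although this sparsity is much weaker than that induced by the ℓ1-norm. The solution of CR with regularized least squares in Eq. (6) can be derived easily and analytically as:

$$ \hat{\rho} = (D^T D + \lambda I)^{-1} D^T y \tag{7} $$

Let $P = (D^T D + \lambda I)^{-1} D^T$. Clearly, P is independent of the query y, so it can be pre-calculated as a projection matrix. Once a query sample y arrives, we simply project it onto P via Py. The CRC with regularized least squares (CRC_RLS) algorithm is summarized as follows.
(1) Normalize the columns of D to have unit ℓ2-norm.
(2) Code y over D by $\hat{\rho} = Py$.
(3) Compute the regularized residuals $r_i = \|y - D_i \hat{\rho}_i\|_2 / \|\hat{\rho}_i\|_2$, where $\hat{\rho}_i$ is the part of $\hat{\rho}$ associated with class i.
(4) Output the identity of y as $\text{identity}(y) = \arg\min_i r_i$.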
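As a minimal sketch of the two classifiers described in this subsection, the NumPy code below assumes that training samples are stored as columns of `D` with one integer label per column. One substitution is made: instead of the constrained problem in Eq. (4), `src_classify` solves the unconstrained Lagrangian form $\min_x \tfrac{1}{2}\|y - Dx\|_2^2 + \lambda\|x\|_1$ with plain ISTA (iterative soft-thresholding). The function names and the defaults for `lam` and `n_iter` are illustrative choices, not values taken from the paper.

```python
import numpy as np


def normalize_columns(D):
    """Step (1) of CRC_RLS: scale every column (training sample) to unit l2-norm."""
    return D / np.linalg.norm(D, axis=0, keepdims=True)


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def src_classify(D, labels, y, lam=1e-3, n_iter=500):
    """SRC sketch: ISTA on 0.5*||y - Dx||^2 + lam*||x||_1 (Lagrangian stand-in for Eq. (4))."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - step * (D.T @ (D @ x - y)), lam * step)
    classes = np.unique(labels)
    # Eq. (5): residual that keeps only the coefficients of class c, i.e. D * delta_c(x)
    residuals = [np.linalg.norm(y - D[:, labels == c] @ x[labels == c]) for c in classes]
    return classes[int(np.argmin(residuals))]


def crc_projection(D, lam=1e-3):
    """P = (D^T D + lam*I)^{-1} D^T from Eq. (7); independent of y, so computed once."""
    return np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T)


def crc_rls_classify(D, labels, P, y):
    """CRC_RLS steps (2)-(4): code y by rho = P y, then use regularized class residuals."""
    rho = P @ y
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        idx = labels == c
        # ||y - D_c rho_c||_2 / ||rho_c||_2
        residuals.append(np.linalg.norm(y - D[:, idx] @ rho[idx]) / np.linalg.norm(rho[idx]))
    return classes[int(np.argmin(residuals))]


# Toy usage on synthetic data: 3 classes, 5 noisy samples each, 30-dimensional.
rng = np.random.default_rng(0)
prototypes = rng.standard_normal((30, 3))
labels = np.repeat(np.arange(3), 5)
D = normalize_columns(prototypes[:, labels] + 0.1 * rng.standard_normal((30, 15)))
y = prototypes[:, 1] + 0.1 * rng.standard_normal(30)
print(src_classify(D, labels, y), crc_rls_classify(D, labels, crc_projection(D), y))
```

Because P does not depend on y, `crc_projection` is called once per training set and reused for every query, which is where the speed advantage of CRC_RLS over iterative ℓ1 solvers comes from.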

Kernel Collaborative Representation
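The kernel collaborative representation (KCR) face recognition algorithm first uses a nonlinear mapping φ, induced by a kernel function, to transform the original space into a feature space. Suppose that any sample in the feature space is a column vector and that the test sample can be represented in the feature space as a linear combination of all n training samples:

$$ \phi(y) = \sum_{i=1}^{n} \alpha_i \phi(d_i) \tag{11} $$

where $d_i$ denotes the ith column of D. Eq. (11) can be rewritten as:

$$ \phi(y) = \Phi \alpha \tag{12} $$

where $\Phi = [\phi(d_1), \dots, \phi(d_n)]$ and $\alpha = [\alpha_1, \dots, \alpha_n]^T$. Since Φ may not be a square matrix and φ is unknown, Eq. (12) cannot be solved directly. However, it can be expressed as:

$$ \Phi^T \phi(y) = \Phi^T \Phi \alpha \tag{13} $$

With the kernel function defined as $k(u, v) = \phi(u)^T \phi(v)$, Eq. (13) can be converted into:

$$ K_y = K \alpha \tag{14} $$

where $K = \Phi^T \Phi$ is the n × n kernel matrix with entries $K_{ij} = k(d_i, d_j)$, and $K_y = [k(d_1, y), \dots, k(d_n, y)]^T$. If K is nonsingular, Eq. (14) can be solved as:

$$ \alpha = K^{-1} K_y \tag{15} $$

If K is singular, Eq. (14) can be solved as:

$$ \alpha = (K + \mu I)^{-1} K_y \tag{16} $$

where µ is a positive constant and I is the identity matrix. With the solution of Eq. (16), the test sample is represented as:

$$ K_y \approx K\alpha = \sum_{i=1}^{n} \alpha_i K_i \tag{17} $$

This shows that representing the test sample in the feature space has been converted into the new problem of representing $K_y$ by the $K_i$, where $K_i$ is the ith column of the matrix K; we call $K_i$ the kernel vector of the ith training sample. Different classes of training samples make different contributions to representing $K_y$. We evaluate the contribution of each class and classify the test sample as follows. First, we calculate the sum of the contributions of the training samples from each class; for the kth class,

$$ g_k = \sum_{i \in \mathcal{C}_k} \alpha_i K_i $$

where $\mathcal{C}_k$ is the index set of the training samples of the kth class. The smaller the deviation $e_k = \|K_y - g_k\|_2$ is, the greater the contribution of the kth class. We identify the class that contributes the most to representing the test sample y (that is, the class with the minimum error) and assign y to this class. This is equivalent to classifying the test sample into the class most similar to it, because the smallest residual $e_k$ means that the linear combination of the training samples of class k is the closest to the test sample.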
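A compact NumPy sketch of the KCR classifier described above follows. It rests on assumptions: the Gaussian kernel form $k(u, v) = \exp(-\|u - v\|_2^2 / (2\sigma^2))$, the values of `sigma` and `mu`, and the function names are illustrative choices. It always uses the regularized solve of Eq. (16), which approaches Eq. (15) when K is nonsingular and µ is small.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Kernel matrix between columns of A and B, assuming k(u, v) = exp(-||u - v||^2 / (2 sigma^2))."""
    sq_dist = np.sum(A**2, axis=0)[:, None] + np.sum(B**2, axis=0)[None, :] - 2.0 * A.T @ B
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def kcr_classify(D, labels, y, sigma=1.0, mu=1e-3):
    """Kernel collaborative representation: solve (K + mu*I) alpha = K_y, then compare class errors."""
    K = gaussian_kernel(D, D, sigma)                   # K = Phi^T Phi, shape (n, n)
    K_y = gaussian_kernel(D, y[:, None], sigma)[:, 0]  # K_y = Phi^T phi(y), shape (n,)
    alpha = np.linalg.solve(K + mu * np.eye(K.shape[0]), K_y)  # Eq. (16)
    classes = np.unique(labels)
    errors = []
    for c in classes:
        idx = labels == c
        g_c = K[:, idx] @ alpha[idx]               # sum of alpha_i K_i over class c
        errors.append(np.linalg.norm(K_y - g_c))   # e_c = ||K_y - g_c||_2
    errors = np.array(errors)
    return classes[int(np.argmin(errors))], errors


# Toy usage: 3 classes, 5 unit-norm samples each; y is a noisy copy of class 2's prototype.
rng = np.random.default_rng(1)
prototypes = rng.standard_normal((30, 3))
labels = np.repeat(np.arange(3), 5)
D = prototypes[:, labels] + 0.1 * rng.standard_normal((30, 15))
D /= np.linalg.norm(D, axis=0, keepdims=True)
y = prototypes[:, 2] + 0.1 * rng.standard_normal(30)
y /= np.linalg.norm(y)
label, errors = kcr_classify(D, labels, y)
print(label, errors)
```

Only kernel evaluations between samples are needed; φ itself never appears, which is the point of rewriting Eq. (12) as Eq. (14).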

Kernel Collaborative Representation Classification Based on Adaptive Dictionary Learning
The original sparse representation uses the whole training set as the dictionary, so the choice of dictionary is not flexible enough and the time complexity is high. To address dictionary optimization, we propose kernel collaborative representation classification based on adaptive dictionary learning. Fig. 1 shows the flow chart of kernel collaborative representation classification based on adaptive dictionary learning. Suppose there are L classes. The method mainly includes two steps. First, a labeled atom is learned from the training samples of each class by sparse approximation; this scheme yields L fitted images, which serve as an adaptive dictionary related to the test sample. Second, the test sample and the adaptive class dictionary D are projected into the kernel space, and the test sample is classified by a linear combination of the labeled adaptive dictionary atoms in the kernel space. Each training sample and the test sample are converted into column vectors. The steps of kernel collaborative representation classification based on adaptive dictionary learning are as follows:
(1) Code the test sample y over the training samples $D_i$ of each class i, i = 1, ..., L, to obtain a coefficient vector $\hat{\beta}_i$.
(2) Form the sparse approximation of the test sample by each class of training samples, $\hat{y}_i = D_i \hat{\beta}_i$.
(3) Use the sparsely approximated images as the adaptive dictionary $D = [\hat{y}_1, \dots, \hat{y}_L]$, map the test sample y and the adaptive class dictionary D into the kernel space, and solve the linear system $K_y = K\alpha$, where $K_{ij} = k(\hat{y}_i, \hat{y}_j)$ and $K_y = [k(\hat{y}_1, y), \dots, k(\hat{y}_L, y)]^T$. If K is nonsingular, it is solved as $\alpha = K^{-1} K_y$; if K is singular, it is solved as $\alpha = (K + \mu I)^{-1} K_y$.
(4) Represent the test sample by the class-k dictionary atom, $g_k = \alpha_k K_k$.
(5) Classify the test sample using the residuals $e_k = \|K_y - g_k\|_2$, assigning y to the class with the smallest $e_k$.
The kernel space classification method has two advantages: (1) it formulates a kernel linear system and uses it to classify test samples; (2) identification by solving this linear system has a low time complexity.
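The five steps of the proposed method can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the exact coding problem of steps (1) and (2) is not given in this text, so the sketch assumes a hypothetical `class_fit` that solves an ℓ1-regularized least-squares problem per class with ISTA. The Gaussian kernel form and the values of `lam`, `sigma` and `mu` are likewise assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def class_fit(D_i, y, lam=1e-3, n_iter=300):
    """Hypothetical steps (1)-(2): code y over class i by ISTA, return y_hat_i = D_i beta_i."""
    step = 1.0 / np.linalg.norm(D_i, 2) ** 2
    beta = np.zeros(D_i.shape[1])
    for _ in range(n_iter):
        beta = soft_threshold(beta - step * (D_i.T @ (D_i @ beta - y)), lam * step)
    return D_i @ beta


def gaussian_kernel(A, B, sigma):
    """Assumed kernel k(u, v) = exp(-||u - v||^2 / (2 sigma^2)) between columns of A and B."""
    sq_dist = np.sum(A**2, axis=0)[:, None] + np.sum(B**2, axis=0)[None, :] - 2.0 * A.T @ B
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def kadl_classify(D, labels, y, sigma=1.0, mu=1e-3):
    classes = np.unique(labels)
    # Step (3): adaptive dictionary with one fitted image per class (L columns)
    A = np.column_stack([class_fit(D[:, labels == c], y) for c in classes])
    K = gaussian_kernel(A, A, sigma)                   # shape (L, L)
    K_y = gaussian_kernel(A, y[:, None], sigma)[:, 0]  # shape (L,)
    alpha = np.linalg.solve(K + mu * np.eye(len(classes)), K_y)
    # Steps (4)-(5): class k is represented by alpha_k * K_k; the smallest residual wins
    errors = np.array([np.linalg.norm(K_y - alpha[k] * K[:, k]) for k in range(len(classes))])
    return classes[int(np.argmin(errors))], A


# Toy usage: 4 classes, 5 unit-norm samples each; y is a noisy copy of class 0's prototype.
rng = np.random.default_rng(2)
prototypes = rng.standard_normal((30, 4))
labels = np.repeat(np.arange(4), 5)
D = prototypes[:, labels] + 0.1 * rng.standard_normal((30, 20))
D /= np.linalg.norm(D, axis=0, keepdims=True)
y = prototypes[:, 0] + 0.1 * rng.standard_normal(30)
y /= np.linalg.norm(y)
label, A = kadl_classify(D, labels, y)
print(label, A.shape)
```

Note that the kernel system here is only L × L (one atom per class), whereas the KCR system over all training samples is n × n.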
Suppose each class has m training samples (in this paragraph m denotes the number of training samples per class) and there are L individuals, so that n = Lm is the total number of training samples. The kernel representation method needs to solve only one linear system in the form of Eq. (11), with time complexity $O(n^3 + n^2)$. ℓ1 sparse representation solves the linear system iteratively, which has a higher time complexity: even if the iterative process is ignored and only the linear system is solved, its time complexity is $O(n^3 + n^2 M + nM)$, where M is the dimension of the sample vectors in the original space. Thus, the kernel space representation method has a lower time complexity than the ℓ1 sparse representation method.
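To give a sense of scale for the two complexity bounds, the sketch below evaluates only their dominant terms (constants dropped, so these are rough operation counts rather than timings) for an illustrative setting assumed from the AR experiment: L = 100 individuals with m = 8 training samples each (n = 800) and M = 20 × 25 = 500 pixels per down-sampled image. The helper names are hypothetical.

```python
def kernel_cost(n):
    """Dominant terms of O(n^3 + n^2) for the kernel representation method."""
    return n**3 + n**2


def l1_linear_solve_cost(n, M):
    """Dominant terms of O(n^3 + n^2 M + n M) for one l1 linear solve (iterations ignored)."""
    return n**3 + n**2 * M + n * M


L, m, M = 100, 8, 20 * 25  # assumed: AR subset, 8 training images per class, 20 x 25 pixels
n = L * m
print(kernel_cost(n), l1_linear_solve_cost(n, M))
print(l1_linear_solve_cost(n, M) / kernel_cost(n))
```

With these numbers the counts are 512,640,000 and 832,400,000, so the $n^2 M$ term alone makes the ℓ1 count about 1.6 times larger before any iterations are counted.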

Experimental Results
To verify the effectiveness and stability of the proposed algorithm, we adopted the Gaussian kernel function, whose width is controlled by the kernel parameter σ. We performed the experiments on the AR [27], FERET [28] and GT [29] face databases and selected ADL, NTS [26] and SRC as benchmarks. All comparison experiments were conducted on a PC with MATLAB R2014a.

Experiment on AR Face Database
The AR database contains 126 people, and each person has 26 images. In this paper, 50 women and 50 men were selected from the AR face database, each with 14 images that include changes in hairstyle, expression and illumination. The facial images are of size 40 × 50, and all of them are down-sampled to 20 × 25 pixels. Fig. 2 shows the adaptive class dictionary obtained by fitting the training samples on the AR face database with 8 training samples per class.
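The text does not state how the AR images are down-sampled from 40 × 50 to 20 × 25 or how they are turned into column vectors. As a hedged sketch, assuming 2 × 2 block averaging, column-major vectorization and unit ℓ2-norm scaling (all three are assumptions, and the helper names are hypothetical), the preprocessing could look like this:

```python
import numpy as np


def downsample_2x(img):
    """Assumed down-sampling: average each non-overlapping 2 x 2 block (odd edges are cropped)."""
    h, w = img.shape
    img = img[: h - h % 2, : w - w % 2].astype(float)
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def to_column(img):
    """Stack the image columns into one vector and scale it to unit l2-norm."""
    v = img.ravel(order="F")
    return v / np.linalg.norm(v)


face = np.arange(50 * 40, dtype=float).reshape(50, 40)  # stand-in for one 40 x 50 AR image
small = downsample_2x(face)
print(small.shape, to_column(small).shape)
```

Halving each side gives 20 × 25 = 500 pixels, so every AR sample becomes a 500-dimensional column of the dictionary.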
On this subset, N (= 2, 4, 6, 8, 10) images of each class were selected as training samples, and the remaining images were used as test samples. Table 1 shows the recognition rates of kernel collaborative representation classification based on adaptive dictionary learning (KADL), sparse representation classification based on adaptive dictionary learning (ADL), the simple fast sparse representation (NTS) and the sparse representation classification algorithm (SRC) for different numbers of training samples. Table 1 shows that the recognition rate improves as the number of training samples increases, because more training samples provide more discriminative information. In addition, our method fits each class of training samples and thus obtains more global information than NTS, and it achieves a higher recognition rate than SRC and ADL.
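The evaluation protocol above (N training images per class, the rest for testing) can be sketched as follows. Random per-class selection is an assumption for AR (the FERET and GT experiments state random selection), and `split_per_class`, `recognition_rate` and the `classify(D, labels, y)` callback signature are hypothetical names.

```python
import numpy as np


def split_per_class(labels, n_train, seed=0):
    """Pick n_train random indices per class for training; the rest are test indices."""
    rng = np.random.default_rng(seed)
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return np.array(train), np.array(test)


def recognition_rate(classify, X, labels, n_train, seed=0):
    """Fraction of test columns of X whose predicted label matches the true label."""
    train, test = split_per_class(labels, n_train, seed)
    D, train_labels = X[:, train], labels[train]
    correct = sum(classify(D, train_labels, X[:, j]) == labels[j] for j in test)
    return correct / len(test)


labels = np.repeat(np.arange(100), 14)  # 100 AR individuals, 14 images each
for N in (2, 4, 6, 8, 10):
    train, test = split_per_class(labels, N)
    print(N, len(train), len(test))
```

Averaging `recognition_rate` over several seeds would reduce the dependence on any single random split.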
To analyze the impact of dimensionality on the methods and to demonstrate the robustness of our algorithm, we performed experiments with different sample dimensions. Figure 3 shows how the recognition rate of each algorithm changes with the dimension of the image features. The recognition rate of the proposed KADL approach is generally higher than those of the other algorithms.

Experiment on FERET Face Database
This experiment was conducted on the FERET database. The FERET dataset contains a large number of face images, and the images of the same person vary in age, pose, expression and lighting. This experiment uses a cropped FERET face database that includes 200 individuals with 7 face images each, for a total of 1400 images. All images in the database are 80 × 80 and are resized to 32 × 32. For each class, 5 images were randomly selected as training samples, and the remaining 2 were used as test samples. Figure 4 shows the adaptive class dictionary images obtained by fitting the training samples on the FERET face database. Table 2 shows the recognition rates of KADL, ADL, NTS and SRC on the FERET database. To further verify the validity of the KADL algorithm, experiments were performed with different feature dimensions; Figure 5 shows the recognition rate curves of each algorithm for different feature dimensions. These results show that the proposed algorithm classifies better than the other methods.

Experiment on GT Face Database
This section uses the GT face dataset to test the proposed approach. The GT dataset contains 50 persons, each with 15 images; some sample images from the GT database are shown in Figure 6. In the experiment, N (= 7, 8, 9, 10) images per person were randomly selected as training samples and the rest were used as test samples. All images are down-sampled to 32 × 32. Table 3 shows the recognition rates of KADL, ADL, NTS and SRC with different numbers of training samples. To further verify the performance of the KADL algorithm, we also performed experiments with different dimensions; Figure 7 shows the recognition rate curves of the different methods under different dimensions. As Figure 7 shows, the results of the proposed KADL approach are generally higher than those of the other methods.

Conclusions
To overcome the shortcomings of traditional dictionary learning, such as a fixed dictionary that lacks adaptability, this paper discusses adaptive dictionary learning. For each test image, a proper dictionary model is constructed according to its own characteristics, generating an adaptive dictionary that approximates the input pattern. To fully extract nonlinear factors in face images such as changes in facial expression, pose, illumination and occlusion, the test sample and the adaptive dictionary are mapped into a high-dimensional feature space using a kernel function, and the test sample is classified in the kernel space instead of the original space. Face recognition is thus performed by combining a coarse-to-fine two-step sparse representation with adaptive dictionary learning. A series of experiments on the AR, FERET and GT face databases demonstrates the validity of kernel collaborative representation classification based on adaptive dictionary learning.
Science Foundation of Hebei Province of China under Grant (No. F2016203422) and Hebei key laboratory of information transmission and signal processing. The authors declare that there is no conflict of interests regarding the publication of this paper.