Multimodal Biometrics Data Analysis for Gender Estimation Using Deep Learning

With the rapid growth of technology, security problems have become ubiquitous in daily life, and the use of biometrics has consequently become inevitable. The field of biometrics has gained wide acceptance because of its ability to identify and authenticate individuals. In many practical scenarios, multimodal gender estimation helps to increase the security and efficiency of other biometric systems. Likewise, in contrast to a uni-modal biometric, a multimodal biometric system is far more difficult to spoof because it relies on multiple distinct biometric traits. Gender identification from biometric traits is mainly used for reducing the search space, indexing, and generating statistical reports. In this paper, a robust multimodal gender identification method is presented in which deep features are computed using an off-the-shelf pre-trained deep convolutional neural network architecture based on AlexNet. The proposed model consists of 20 layers, comprising convolutional layers with different window sizes followed by fully connected layers for feature extraction and classification. Extensive experiments have been conducted on the homologous SDUMLA-HMT (Shandong University, Machine Learning and Applications group) multimodal database of 15,052 images. The proposed method achieves an accuracy of 99.9%, which outperforms the results reported in the literature.

Moreover, any uni-modal biometric system suffers from a variety of problems [29], including distorted data, intra-class variations, a constrained degree of freedom, non-universality, spoofing, and intolerable error rates. These limitations can be mitigated by fusing different uni-modalities. Intuitively, multimodal systems are a more reliable and viable solution, as multiple independent modalities are fused together, which in turn provides higher accuracy than any uni-modal system. Multimodal-biometrics-based gender identification represents an emerging trend with many real-world applications such as human-computer interfaces, security, gender-based advertisement, forensics, and surveillance [30].
In this work, we present a gender identification system based on a deep neural network architecture using multiple modalities: face, iris, and fingerprint. The proposed work presents an effective and real-time method for multimodal gender estimation through a network fine-tuning strategy, which in turn yields competitive results. The paper is organized as follows: Section 2 contains the literature review; Section 3 describes the proposed methodology; Section 4 describes the database; Section 5 discusses the results; and finally, conclusions are drawn in Section 6.

Related Work
Preceding research has shown the possibility of authenticating an individual from their respective modalities. However, the related literature shows that only limited work has been carried out on identifying a person's gender from multimodal biometrics. This section reviews the studies that have been reported on gender identification using multimodal biometrics. Xiong Li et al. [24] performed multimodal gender identification by combining local binary pattern and bag-of-words features with decision-level fusion on the face and fingerprint traits of an internal database of 397 volunteers of Han nationality, and obtained an accuracy of 94% using a Bayesian hierarchical model. Mohamed A. et al. [25] performed multimodal gender identification by combining eigenvalue, syntactic complexity, response length, shallow and deep syntax, and mean and max-min-difference heart-rate features over five different traits, i.e., visual, linguistic, physiological, thermal, and acoustic. A database of 51 males and 53 females was used, and an overall accuracy of 80.6% was obtained using a decision tree classifier. Abdenour H. et al. [26] used three different unimodal databases, namely the CRIM, VidTIMIT, and Cohn-Kanade face and video datasets, which were combined to form a multimodal (face and video) dataset; using local binary patterns, an overall accuracy of 96.3% was obtained with a support vector machine. Caifeng Shan et al. [27] performed multimodal gender identification by combining the face and gait modalities of the CASIA Gait Dataset B, from which frontal face images of 119 subjects were extracted and combined with gait videos. An AdaBoost-based face detector and background-subtraction feature extraction were applied, and an overall accuracy of 97.2% was achieved with a support vector machine classifier. S S Gornale et al.
[28] used binarized statistical image features along with multi-block local binary pattern features on the face, iris, and fingerprint traits of the SDUMLA-HMT dataset; with a support vector machine classifier, a highest accuracy of 99.8% was obtained.
Further, S. S. Gornale et al. [29] performed feature synthesis along with classifier fusion on the face, fingerprint, and iris images of the SDUMLA-HMT and KVK multimodal datasets; highest accuracies of 99.9% and 99.8%, respectively, were noted.
From the literature it is observed that the major focus has been on handcrafted features and only limited work uses deep learning algorithms; there is therefore still scope to develop a multimodal gender classification system based on deep convolutional methods.

Proposed Methodology
Deep learning is a branch of machine learning that has been used to solve many classical artificial-intelligence problems, such as image classification, medical imaging for skin-disease identification, cancer-cell detection, and natural language processing [31]. A convolutional neural network (CNN) is a special kind of neural network that processes data with a grid-like topology: it accepts the whole image as input and automatically extracts the meaningful features required for efficient image classification.
In this work, we have trained AlexNet on SDUMLA-HMT multimodal images for gender identification. AlexNet was originally designed by Alex Krizhevsky [32] in 2012 to solve the 1000-class image classification problem of ILSVRC-2012. We fine-tune AlexNet because it can be trained well with a limited amount of data and because, as a deep network, it can extract many abstract, high-level multimodal features.
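The essence of such fine-tuning is replacing AlexNet's 1000-way ILSVRC output with a 2-way gender head. A minimal numpy sketch of the replacement softmax head follows; the weights W and bias b here are random placeholders for illustration, not the trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(4096, 2))   # hypothetical 2-way head weights
b = np.zeros(2)                              # bias, initialized to 0

def gender_head(fc7):
    """Map a 4096-d FC7 feature vector to male/female class probabilities."""
    z = fc7 @ W + b
    e = np.exp(z - z.max())                  # numerically stable softmax
    return e / e.sum()

probs = gender_head(rng.normal(size=4096))   # probabilities over {male, female}
```

In fine-tuning, only this head needs to be learned from scratch; the earlier layers start from the pre-trained filters.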
Gender classification using multimodal biometrics generally involves three steps, namely pre-processing, feature extraction, and classification. The initial task is pre-processing, in which each image is resized to 224×224 pixels. Since AlexNet requires a fixed-size 224×224 colour image as input, we empirically convert each grey-scale image to colour by duplicating the grey channel. The feature computation step extracts feature information through meaningful feature maps, and the result is finally evaluated with different binary classifiers. AlexNet is a simple feed-forward neural network architecture with multiple subsequently connected layers of densely connected neurons and successive hidden layers. In this work, we fine-tune the AlexNet architecture, configuring the network with the rectified linear unit (ReLU) activation function, f(u_k) = max(0, u_k), whose operation is non-linear, where k indicates the current layer and u_k is the input to the ReLU. With the ReLU activation function, the training time decreases significantly compared with a hyperbolic-tangent non-linearity.
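The pre-processing step above (resizing and duplicating the grey channel into three colour channels) can be sketched in numpy. The nearest-neighbour resize below is an assumption for illustration, since the paper does not state which interpolation was used:

```python
import numpy as np

def to_alexnet_input(gray, size=224):
    """Resize a 2-D grey-scale image with nearest-neighbour sampling and
    stack it into 3 identical channels to mimic a colour input."""
    h, w = gray.shape
    rows = np.arange(size) * h // size       # source row for each output row
    cols = np.arange(size) * w // size       # source column for each output column
    resized = gray[rows[:, None], cols]      # (size, size) nearest-neighbour resize
    return np.stack([resized] * 3, axis=-1)  # (size, size, 3), identical channels

img = np.random.randint(0, 256, (120, 160), dtype=np.uint8)  # arbitrary grey image
x = to_alexnet_input(img)                                    # (224, 224, 3)
```

Duplicating the channel preserves the original intensity information exactly; the network simply sees the same plane three times.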
Generally, there are three fully connected layers in the network, namely FC6, FC7, and FC8; FC6 and FC7 have 4096 neurons each, and FC8 has 1000 neurons. An image presented to a connected layer results in a feature map. During training we initialize with FC7; through propagation the resultant feature map is convolved with sets of weights called filters, x_k = w_k * x_{k-1} + b_k, where w_k is the weight of layer k, x_{k-1} represents the output of layer k-1, and b_k is the bias, initialized to 0. Different filter sets are applied to different feature maps, but the same filters are shared among all neurons of a map. A pooling layer reduces the spatial extent of the representation output by a convolution layer, thereby reducing the number of parameters and the number of computations within the network. Pooling works independently on every depth slice of its input, usually by applying a max operation.
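The max-pooling operation described above can be illustrated with a small numpy sketch. A plain 2×2 window with stride 2 is used here for simplicity (an assumption; AlexNet itself uses overlapping 3×3 pooling with stride 2):

```python
import numpy as np

def max_pool(x, k=2, s=2):
    """Max pooling with a k x k window and stride s over an (H, W, C) feature
    map; each depth slice is pooled independently via the max operation."""
    h, w, c = x.shape
    out = np.zeros(((h - k) // s + 1, (w - k) // s + 1, c))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = x[i * s:i * s + k, j * s:j * s + k].max(axis=(0, 1))
    return out

fmap = np.arange(16, dtype=float).reshape(4, 4, 1)  # toy 4x4 single-channel map
pooled = max_pool(fmap)                             # (2, 2, 1)
```

Note how the 4×4 input shrinks to 2×2: the parameter and computation counts of later layers shrink accordingly.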
Overfitting is one of the most common problems in neural networks and can be mitigated by dropout and batch normalization. Dropout is based on the idea of omitting certain neurons in each iteration of training, while batch normalization reduces the covariate shift and makes training faster. Gradient descent (GD), the learning process of a neural network, searches for the combination of learning parameters that yields the lowest value of the loss function. The gradient of the loss function gives the direction of the weight update, W_i ← W_i − α ∂L/∂W_i, where α represents the learning rate and W_i are the weights of layer i; during training the weights are updated in each such iteration. Stochastic gradient descent (SGD) is the optimization method used, which updates the corresponding weights at each iteration. The learning rate was initialized to 0.003 in all cases; this may seem small, but relative to the dataset size this value controlled training well. Here the pre-trained weights are transferred rather than sparsified, which results in higher accuracy.
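The weight-update rule can be made concrete with a short sketch that applies W ← W − α ∂L/∂W, using the paper's learning rate of 0.003, to a toy quadratic loss L(w) = w² (the toy loss is for illustration only, not the network's cross-entropy loss):

```python
def sgd_step(w, grad, alpha=0.003):
    """One SGD update: W <- W - alpha * dL/dW (paper's learning rate 0.003)."""
    return w - alpha * grad

# Toy loss L(w) = w**2, so dL/dw = 2w; repeated updates drive w toward 0.
w = 5.0
for _ in range(2000):
    w = sgd_step(w, 2.0 * w)
```

In the real network, the same update is applied to every layer's weights, with the gradient supplied by backpropagation over a mini-batch.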

Database
In this work, we have used the publicly available SDUMLA-HMT standard dataset, which is collected and maintained by the Machine Learning and Applications lab of Shandong University [33]. The dataset includes real multimodal data from 106 individuals, of whom 59 are male volunteers and 47 are female volunteers. The face images were collected with different poses, expressions, and accessories. The iris images were captured under proper direction to the volunteers, with images collected from both eyes. Likewise, the fingerprint images were acquired with an FT-2BU sensor; from each subject, images of the thumb, index, and middle fingers of both hands were collected under prerequisite directions. In total, the database consists of 15,052 multimodal images. Sample male and female images from the dataset are shown in Figure 2.

Results and Discussion
In this study, we have trained AlexNet on multimodal biometric images, which are then validated with different binary classifiers, i.e., Naive Bayes, K-Nearest Neighbour, Support Vector Machine, and Decision Tree, on the publicly available SDUMLA-HMT database.
The algorithmic steps are as follows. Input: multimodal images. Output: gender identification.
Step 1: Input the images from database.
Step 2: Image pre-processing is carried out.
Step 3: Fine tuning of pre-trained AlexNet model is performed.
Step 4: Resultant feature map is trained and tested.
Step 5: Gender Classification is performed by different binary classifiers.
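The five steps above can be sketched as a skeleton pipeline. The `extract_features` function and the nearest-centroid `classify` rule below are illustrative stand-ins for the fine-tuned AlexNet feature map and the binary classifiers, not the paper's actual components:

```python
import numpy as np

def preprocess(gray):                               # Step 2: grey -> 3 channels
    return np.stack([gray] * 3, axis=-1)

def extract_features(x):                            # Steps 3-4: placeholder for
    return x.astype(float).mean(axis=(0, 1))        # the AlexNet feature map

def classify(feat, centroids, labels=("male", "female")):  # Step 5 stand-in:
    d = [np.linalg.norm(feat - c) for c in centroids]      # nearest centroid
    return labels[int(np.argmin(d))]

centroids = [np.full(3, 40.0), np.full(3, 200.0)]   # hypothetical class centres
img = np.full((8, 8), 190, dtype=np.uint8)          # Step 1: a toy input image
label = classify(extract_features(preprocess(img)), centroids)
```

In the actual system, `extract_features` would run the fine-tuned network and `classify` would be one of the four binary classifiers evaluated in the next section.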
The proposed architecture consists of 20 layers, including five convolution layers with different window sizes of 11×11, 5×5, and 3×3 pixels, followed by fully connected layers and a softmax layer. The first layer is the input layer, which accepts an image of 250×250×3 pixels. The next layer, convlayer1, has a window size of 11×11 pixels with 96 filters and is followed by a ReLU non-linear activation, a cross-channel normalization layer, and a max-pooling layer. This is followed by convlayer2, which has a window size of 5×5 pixels with 256 filters, an intermediate sub-sampling pooling layer, another cross-channel normalization layer, and a max-pooling layer. Then come convlayer3, with a window size of 3×3 pixels, 384 filters, and a ReLU; convlayer4, with a window size of 3×3 pixels, 384 filters, and another ReLU; and convlayer5, with a window size of 3×3 pixels, 256 filters, another ReLU, and an intermediate down-sampling pooling layer. The next layer is fully connected layer FC7 with 4096 neurons and a ReLU activation, followed by a dropout layer used to reduce overfitting, and fully connected layer FC8 with 1000 neurons. The final softmax layer has 2 neurons that determine class membership, predicting whether the multimodal biometric images belong to a male or a female. The results are depicted as a confusion matrix in Table 1. From Table 1 it can be seen that the highest accuracy of 99.9% is obtained with the SVM classifier and the lowest accuracy of 81.2% with the Naive Bayes classifier. Further, the KNN classifier with an empirically fixed K=3 and the City-Block distance yields an accuracy of 98.7%, while the Decision Tree classifier and the KNN classifier with Euclidean distance each give an accuracy of 98.6%.
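The K=3 City-Block K-NN used for the 98.7% result can be sketched in a few lines of numpy (an illustrative re-implementation, not the authors' code):

```python
import numpy as np

def knn_predict(X, y, query, k=3):
    """k-NN with City-Block (L1) distance and majority vote over binary labels."""
    d = np.abs(X - query).sum(axis=1)   # City-Block distance to each sample
    nearest = y[np.argsort(d)[:k]]      # labels of the k closest samples
    return int(nearest.sum() * 2 > k)   # 1 iff the majority of the k votes are 1

# Toy 2-D features with binary gender labels (0/1), purely for demonstration.
X = np.array([[0, 0], [0, 1], [5, 5], [6, 5]], dtype=float)
y = np.array([0, 0, 1, 1])
pred = knn_predict(X, y, np.array([5.0, 6.0]))
```

Swapping the L1 distance for `np.sqrt(((X - query) ** 2).sum(axis=1))` gives the Euclidean variant that scored 98.6%.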

Conclusion
Gender identification has been explored by several researchers; in the past decades the major focus has been on handcrafted-feature-based unimodal methods and their implementation. In this paper, a robust algorithm has been presented that automatically performs multimodal gender identification using a fine-tuned AlexNet architecture. A deep convolutional method is used to obtain robust gender determination, achieving an accuracy of 99.9% on the SDUMLA-HMT multimodal biometric database.