Product of Likelihood Ratio Scores Fusion of Face, Speech and Signature Based FJ-GMM for Biometrics Authentication Application Systems

Abstract: This paper proposes a likelihood ratio fusion of face, voice and signature modalities for multimodal biometric verification systems. The Figueiredo-Jain (FJ) algorithm is employed to estimate the finite Gaussian mixture models (GMM). Automated biometric systems for human identification measure a "signature" of the human body, compare the resulting characteristic to a database, and render an application-dependent decision. These biometric systems for personal authentication and identification are based on physiological or behavioral features that are typically distinctive. Multi-biometric systems, which consolidate information from multiple biometric sources, are gaining popularity because they are able to overcome limitations such as non-universality, noisy sensor data, large intra-user variations and susceptibility to spoof attacks that are commonly encountered in mono-modal biometric systems. Simulations show that the finite Gaussian mixture model is quite effective in modelling the genuine and impostor score densities, and that fusion based on the resulting density estimates achieves significant performance on the eNTERFACE 2005 multimodal database using the face, signature and voice modalities.


Introduction
The word biometrics comes from the ancient Greek words bios (life) and metron (measure), meaning life measurement. In this context, the science of biometrics is concerned with the accurate measurement of unique biological characteristics of an individual in order to securely identify them to a computer or other electronic system. Biological characteristics measured usually include fingerprints, voice patterns, retinal and iris scans, face patterns, and even the chemical composition of an individual's DNA [1]. Biometric authentication (BA) ("Am I whom I claim I am?") involves confirming or denying a person's claimed identity based on his or her physiological or behavioral characteristics [2]. BA is becoming an important alternative to traditional authentication methods such as keys ("something one has", i.e., by possession) or PINs ("something one knows", i.e., by knowledge) because it is essentially "who one is", i.e., by biometric information. Therefore, it is not susceptible to misplacement or forgetfulness [3]. These biometric systems for personal authentication and identification are based on physiological or behavioral features which are typically distinctive, although time varying, such as fingerprints, hand geometry, face, voice, lip movement, gait, and iris patterns. Multi-biometric systems, which consolidate information from multiple biometric sources, are gaining popularity because they are able to overcome limitations such as non-universality, noisy sensor data, large intra-user variations and susceptibility to spoof attacks that are commonly encountered in mono-biometric systems.
Several multimodal biometric identity verification systems have been reported in the literature. S. K. Sahoo et al. [4] present a bimodal biometric system using speech and face features and test its performance under degraded conditions using sum-rule score fusion. The speaker verification (SV) system is built using Mel-Frequency Cepstral Coefficients (MFCC) followed by delta and delta-delta coefficients for feature extraction and a Gaussian Mixture Model (GMM) for modeling, and the face verification (FV) system is built using a combination of Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Danpinder Kaur et al. [5] propose a new fusion technique at the feature extraction level, named msum, which combines the sum method and the mean method to enhance security and accuracy. In that work, the database was gathered from 14 users. Each user contributed 4 samples of signature and of speech, and 14 forgeries were added to test the system. SIFT features are extracted for the offline signature, giving a feature vector of 128 numbers, and MFCC features are extracted for speech, giving a feature vector of 195 numbers. The experimental results show that the proposed multimodal biometric system achieves a recognition accuracy of 98.2%, with a false rejection rate (FRR) of 0.9% and a false acceptance rate (FAR) of 0.9%. Sheetal Chaudhary et al. [6] describe a multimodal biometric system that combines iris, face and voice at the match score level using the simple sum rule, with the match scores normalized by min-max normalization. The experimental evaluations are performed on a public dataset, and the effectiveness of the system in terms of FAR (False Accept Rate) and GAR (Genuine Accept Rate) is demonstrated with the help of the MUBI (Multimodal Biometrics Integration) software. Girija M. K et al. [7] develop a multimodal biometric system using speech, signature and handwriting features, with the objective of improving performance and robustness. Mel Frequency Cepstral Coefficients (MFCCs) are obtained by extracting and analyzing speaker-specific features from the speech signal; features such as the Horizontal Projection Profile (HPP), the Vertical Projection Profile (VPP) and the Discrete Cosine Transform (DCT) are computed for signature recognition; and handwriting biometric features are also used. Mendu Anusha et al. [8] present a multimodal biometric system that integrates iris, face and fingerprint to identify a person, using Daugman's algorithm for iris recognition, WLD and Eigenfaces for face recognition, and minutiae features with a decision tree algorithm for fingerprint recognition. The experimental evaluations are performed on a public dataset, and the effectiveness of the system with respect to False Accept Rate and Genuine Accept Rate is demonstrated with the help of the Multimodal Biometrics Integration software. P. S. Sanjekar et al. [9] present an overview of multimodal biometrics, including the block diagram of a general multimodal biometric system, its modules, the different levels of fusion, and related work. Mandeep Kaur et al. [10] discuss multimodal biometric systems based on the signature and speech modalities, which are used to overcome some of the problems of uni-modal systems such as noise in sensed data, intra-class variations, lack of distinctiveness, and spoof attacks.
A multi-modal biometric verification system based on dynamic facial, signature and vocal modalities is described in this paper. Face, signature and speech biometrics are chosen because of their complementary physiological and behavioral characteristics. In multimodal systems, complementary input modalities provide the system with non-redundant information, whereas redundant input modalities increase both the accuracy of the fused information, by reducing overall uncertainty, and the reliability of the system when a single modality is noisy. Information in one modality may be used to disambiguate information in the others. Improved precision and reliability are the potential result of integrating modalities and/or measurements sensed by multiple sensors [11].

Face Extraction and Recognition
Face recognition, authentication and identification are often confused. Face recognition is a general topic that includes both face identification and face authentication (also called verification). On one hand, face authentication is concerned with validating a claimed identity based on the image of a face, and either accepting or rejecting the identity claim (one-to-one matching). On the other hand, the goal of face identification is to identify a person based on the image of a face. This face image has to be compared with all the registered persons (one-to-many matching). Thus, the key issue in face recognition is to extract the meaningful features that characterize a human face. Hence there are two major tasks for that: Face detection and face verification.

Face Detection
Face detection is concerned with finding whether or not there are any faces in a given image (usually in gray scale) and, if present, return the image location and content of each face. This is the first step of any fully automatic system that analyzes the information contained in faces (e.g., identity, gender, expression, age, race and pose). While earlier work dealt mainly with upright frontal faces, several systems have been developed that are able to detect faces fairly accurately with in-plane or out-of-plane rotations in real time. For biometric systems that use faces as non-intrusive input modules, it is imperative to locate faces in a scene before any recognition algorithm can be applied. An intelligent vision based user interface should be able to tell the attention focus of the user (i.e., where the user is looking at) in order to respond accordingly. To detect facial features accurately for applications such as digital cosmetics, faces need to be located and registered first to facilitate further processing. It is evident that face detection plays an important and critical role for the success of any face processing systems.
For the results presented in this paper, only size normalization of the extracted faces was used. All face images were resized to 130x150 pixels using bi-cubic interpolation. After this stage, a position correction algorithm was also developed, based on detecting the eyes in the face and applying a rotation and resize so that the eyes of all pictures are aligned at the same coordinates. The face detection and segmentation tasks presented in this paper were performed based on 'Face analysis in Polar Frequency Domain' proposed by Yossi Z. et al. [12]. First, the Fourier-Bessel (FB) coefficients are extracted from the images. Next, the Cartesian distance between all the Fourier-Bessel transformation (FBT) representations is computed, and each object is re-defined by its distance to all other objects. Images were transformed by an FBT up to the 30th Bessel order and 6th root with an angular resolution of 3˚, thus obtaining 372 coefficients. These coefficients correspond to a frequency range of up to 30 and 3 cycles/image of angular and radial frequency, respectively. Figure 1 shows the face and eye detections for different users from the database, and Figure 2 shows the face normalization for the same users.

Polar Frequency Analysis: The FB series is useful to describe the radial and angular components in images [12]. FBT analysis starts by converting the coordinates of a region of interest from Cartesian (x, y) to polar (r, θ). The function f(r, θ) is represented by the two-dimensional FB series, defined as:

$$f(r,\theta) = \sum_{i=1}^{\infty}\sum_{n=0}^{\infty} A_{n,i}\, J_n\!\left(\frac{\alpha_{n,i}\, r}{R}\right)\cos(n\theta) + \sum_{i=1}^{\infty}\sum_{n=0}^{\infty} B_{n,i}\, J_n\!\left(\frac{\alpha_{n,i}\, r}{R}\right)\sin(n\theta)$$

where $J_n$ is the Bessel function of order n, $f(R,\theta) = 0$ and $0 \le r \le R$. $\alpha_{n,i}$ is the i-th root of the $J_n$ function, i.e. the zero crossing value satisfying $J_n(\alpha_{n,i}) = 0$, and R is the radial distance to the edge of the image. The orthogonal coefficients $A_{n,i}$ and $B_{n,i}$ are given by:

$$A_{0,i} = \frac{1}{\pi R^{2} J_{1}^{2}(\alpha_{0,i})}\int_{0}^{2\pi}\!\!\int_{0}^{R} f(r,\theta)\, r\, J_{0}\!\left(\frac{\alpha_{0,i}\, r}{R}\right) dr\, d\theta$$

if $B_{0,i} = 0$ and $n = 0$, and

$$\begin{bmatrix} A_{n,i} \\ B_{n,i} \end{bmatrix} = \frac{2}{\pi R^{2} J_{n+1}^{2}(\alpha_{n,i})}\int_{0}^{2\pi}\!\!\int_{0}^{R} f(r,\theta)\, r\, J_{n}\!\left(\frac{\alpha_{n,i}\, r}{R}\right)\begin{bmatrix} \cos(n\theta) \\ \sin(n\theta) \end{bmatrix} dr\, d\theta$$

if $n > 0$. An alternative method to polar frequency analysis is to represent images by polar Fourier transform descriptors. The polar Fourier transform is a well-known mathematical operation in which, after converting the image coordinates from Cartesian to polar as described above, a conventional Fourier transformation is applied. These descriptors are directly related to radial and angular components, but are not identical to the coefficients extracted by the FBT.
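The coefficient count quoted above can be checked with a few lines of Python. This is only a sketch of one reading of the stated parameters (Bessel orders 0 to 30, six roots per order, one cosine and one sine coefficient per pair); the helper names `to_polar` and `fb_index_pairs` are hypothetical and are not taken from [12].

```python
import math

def to_polar(x, y, cx=0.0, cy=0.0):
    """Convert Cartesian (x, y) to polar (r, theta) around a centre (cx, cy)."""
    dx, dy = x - cx, y - cy
    return math.hypot(dx, dy), math.atan2(dy, dx)

def fb_index_pairs(max_order=30, num_roots=6):
    """Enumerate (n, i): Bessel order n = 0..max_order, root index i = 1..num_roots."""
    return [(n, i) for n in range(max_order + 1) for i in range(1, num_roots + 1)]

pairs = fb_index_pairs()
# Assumption: one A (cosine) and one B (sine) slot is counted for every pair,
# including n = 0, where B is defined to be zero.
num_coefficients = 2 * len(pairs)
print(len(pairs), num_coefficients)  # prints: 186 372
```

Under this reading, 31 orders times 6 roots gives 186 index pairs and 372 coefficient slots, which matches the number reported in the text.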

Face Verification
Feature Extraction: The so-called "eigenfaces" method [13] is one of the most popular methods for face recognition. It is based on the Principal Components Analysis (PCA) of the face images in a training set. The main idea is that since all human faces share certain common characteristics, pixels in a set of face images will be highly correlated. The K-L (Karhunen-Loeve) transform can be used to project face images to a different vector space of reduced dimensionality in which the features are uncorrelated. In the new space, nearest neighbor classifiers can be used for classification. Euclidean distances d in the projection space are mapped into the [0, 1] interval of the real line using the mapping function f = d / (1 + d). It is easily seen that f is also a metric, with distance values in [0, 1). Thus, the decomposition of a face image into an eigenface space provides a set of features. The maximum number of features is restricted to the number of images used to compute the KL transform, although usually only the more relevant features are selected, removing the ones associated with the smallest eigenvalues. The method consists of two stages: the database training stage and the operational stage [13]. The concept of the verification system is illustrated in Figure 4.
The training stage: Face spaces are eigenvectors of the covariance matrix corresponding to the original face images, and since they are face-like in appearance, they are called Eigenfaces.
Consider the training set of face images $\Gamma_1, \Gamma_2, \ldots, \Gamma_M$; the average face of the set is defined as:

$$\Psi = \frac{1}{M}\sum_{i=1}^{M}\Gamma_i$$

where M is the total number of images. Each face differs from the average by the difference image $\Phi_i = \Gamma_i - \Psi$.
Then, the eigenvectors $v_k$ and the eigenvalues $\lambda_k$ of the symmetric matrix $C$ formed from the difference images are calculated. $v_k$ determines the linear combination of the M difference images $\Phi_i$ that forms the Eigenfaces:

$$u_k = \sum_{i=1}^{M} v_{ki}\,\Phi_i$$

From these Eigenfaces, $K\ (< M)$ Eigenfaces are selected, corresponding to the K highest eigenvalues.
At the training stage, the set of normalized face images $\{\Gamma_i\}$ is described in the lower-dimensional subspace (Eigenface space) that best represents the distribution of the training facial images, using the following operation:

$$\omega_{ik} = u_k^{T}\,(\Gamma_i - \Psi)$$

where $i = 1, \ldots, M$ and $k = 1, \ldots, K$.
After that, the training facial images are projected onto the eigenspace to generate their Eigenface representations $\Omega_i$:

$$\Omega_i = [\omega_{i1}, \omega_{i2}, \ldots, \omega_{iK}]^{T}$$

where $i = 1, 2, \ldots, M$.

The operational stage: This approach is based on the same principles as standard PCA, explained in the training stage. The difference is that an eigenface space is extracted for each user. Thus, when a claimant wants to verify his or her identity, the vectorized face image is projected exclusively into the eigenface space of the claimed user and the corresponding likelihood is computed. The advantage of this approach is that it allows a more accurate model of the user's most relevant information, where the first eigenfaces directly carry the most representative information about the user's face. Another interesting point of this method is its scalability in terms of the number of users. Adding a new user, or new pictures of an already registered user, only requires computing or recomputing that user's eigenface space, not the space for the whole database as in the standard approach. For verification systems, the computation of the claimant's likelihood of being a specific user is independent of the number of users in the dataset. On the contrary, for identification systems, the number of operations increases in proportion to the number of users, because as many projections as there are users are required. In the verification system described in this article, the independent user eigenface approach has been chosen. Each user's eigenface space was computed with 16 frames extracted from the still faces in the database.
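The distance-to-score mapping f = d / (1 + d) used with the nearest neighbor classifier in the feature extraction step can be illustrated with a small sketch. The function names and the toy projected vectors below are hypothetical; real feature vectors would come from the eigenface projection.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two projected feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def bounded_distance(d):
    """Map a distance d >= 0 to f = d / (1 + d), which lies in [0, 1)."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    return d / (1.0 + d)

def nearest_neighbor(probe, gallery):
    """Return the gallery label whose vector is closest to the probe.

    gallery: dict mapping a label to a projected feature vector (toy data here).
    """
    return min(gallery, key=lambda label: bounded_distance(euclidean(probe, gallery[label])))

gallery = {"user_a": [1.0, 0.0], "user_b": [3.0, 4.0]}
print(nearest_neighbor([0.0, 0.0], gallery))  # prints: user_a
```

Because the mapping is monotonic in d, it does not change which neighbor is nearest; it only bounds the score, which is convenient when scores from several experts are later combined.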

Voice Analysis and Feature Extraction
Gaussian Mixture Models (GMMs) are the main tool used in text-independent speaker verification, and they can be trained using the Figueiredo-Jain (FJ) algorithm [14] [15]. In this work the speech modality is authenticated with a multi-lingual text-independent speaker verification system. The speech expert comprises two main components, as shown in Figure 5: speech feature extraction and a Gaussian Mixture Model (GMM) classifier. The speech signal is analyzed on a frame-by-frame basis, with a typical frame length of 20 ms and a frame advance of 10 ms [16]. For each frame, a feature vector is extracted: the discrete Fourier spectrum is obtained via a fast Fourier transform, from which the magnitude squared spectrum is computed and passed through a bank of filters. The critical band warping follows an approximation to the Mel-frequency scale, which is linear up to 1000 Hz and logarithmic above 1000 Hz. The Mel-scale cepstral coefficients are computed from the outputs of the filter bank [17]. The state-of-the-art speech feature extraction scheme, Mel frequency cepstral coefficients (MFCC), is based on auditory processing of the spectrum of the speech signal and a cepstral representation of the resulting features [18]. One of the powerful properties of the cepstrum is that any periodicities, or repeated patterns, in a spectrum are mapped to one or two specific components in the cepstrum. If a spectrum contains several harmonic series, they are separated in a way similar to the way the spectrum separates repetitive time patterns in the waveform. The steps used to extract the characteristic features of an audio sample with MFCC are shown in Figure 6 [17].
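The framing step described above (20 ms frames advanced by 10 ms) can be sketched as follows. The function names are hypothetical, and the Mel mapping shown is a commonly used approximation, 2595 log10(1 + f/700); it is an assumption here and not necessarily the exact warping used by the authors.

```python
import math

def frame_signal(samples, sample_rate, frame_ms=20, hop_ms=10):
    """Split a sample sequence into overlapping frames (frame length and advance in ms)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames

def hz_to_mel(freq_hz):
    """Assumed Mel approximation: roughly linear below 1 kHz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

# One second of silence at 16 kHz, the rate of the extracted WAV files.
frames = frame_signal([0.0] * 16000, 16000)
print(len(frames), len(frames[0]))  # prints: 99 320
```

At 16000 Hz, a 20 ms frame holds 320 samples and the 10 ms advance is 160 samples, so one second of audio yields 99 complete frames.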
The distribution of feature vectors for each person is modeled by a GMM. The parameters of the Gaussian mixture probability density function are estimated with the Figueiredo-Jain (FJ) algorithm [14]. Given a claim for person C's identity and a set of feature vectors $X = \{x_i\}_{i=1}^{N_V}$ supporting the claim, the average log likelihood of the claimant being the true claimant is calculated using:

$$\mathcal{L}(X|\lambda_C) = \frac{1}{N_V}\sum_{i=1}^{N_V}\log p(x_i|\lambda_C)$$

where

$$p(x|\lambda) = \sum_{j=1}^{N_G} m_j\,\mathcal{N}(x;\mu_j,\Sigma_j)$$

and

$$\lambda = \{m_j, \mu_j, \Sigma_j\}_{j=1}^{N_G} \tag{11}$$

Here $\lambda_C$ is the model for person C, $N_G$ is the number of mixtures, $m_j$ is the weight for mixture j (with the constraint $\sum_{j=1}^{N_G} m_j = 1$), and $\mathcal{N}(x;\mu,\Sigma)$ is a multi-variate Gaussian function with mean $\mu$ and diagonal covariance matrix $\Sigma$. Given a set $\{\lambda_b\}_{b=1}^{B}$ of B background person models for person C, the average log likelihood of the claimant being an impostor is found using:

$$\mathcal{L}(X|\lambda_{\bar{C}}) = \log\left[\frac{1}{B}\sum_{b=1}^{B}\exp\,\mathcal{L}(X|\lambda_b)\right]$$

The set of background person models is found using the method described in [19]. An opinion on the claim is found using:

$$\Lambda(X) = \mathcal{L}(X|\lambda_C) - \mathcal{L}(X|\lambda_{\bar{C}})$$

The opinion reflects the likelihood that a given claimant is the true claimant (i.e., a low opinion suggests that the claimant is an impostor, while a high opinion suggests that the claimant is the true claimant).
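The opinion score described above can be sketched in a few lines of Python for diagonal-covariance GMMs. This is a minimal illustration under stated assumptions: models are represented as lists of hypothetical (weight, mean, variance) tuples with strictly positive weights, and the toy one-dimensional models below are invented for the example.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (var holds the diagonal)."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def log_gmm(x, model):
    """Log of the mixture density; model is a list of (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in model]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def avg_log_likelihood(X, model):
    """Average log likelihood of the feature vectors X under one model."""
    return sum(log_gmm(x, model) for x in X) / len(X)

def impostor_log_likelihood(X, background_models):
    """Log of the mean of exp(average log likelihood) over the background models."""
    scores = [avg_log_likelihood(X, b) for b in background_models]
    top = max(scores)
    return top + math.log(sum(math.exp(s - top) for s in scores) / len(scores))

def opinion(X, client_model, background_models):
    """Client average log likelihood minus impostor average log likelihood."""
    return avg_log_likelihood(X, client_model) - impostor_log_likelihood(X, background_models)

client = [(1.0, [0.0], [1.0])]          # toy single-component client model
background = [[(1.0, [5.0], [1.0])]]    # one toy background model
print(opinion([[0.0], [0.1]], client, background) > 0)  # prints: True
```

A positive opinion means the claim is better explained by the client model than by the background set; the decision threshold applied to it is a separate design choice.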

Signature Verification Systems
The handwritten signature is one of the first accepted civilian and forensic biometric identification techniques in our society [20] [21] [22]. Human verification is normally very accurate in identifying genuine signatures. A signature verification system must be able to detect forgeries and at the same time reduce the rejection of genuine signatures. The signature verification problem can be classified into two categories: offline and online. Offline signature verification does not use the dynamic information that is used extensively in online signature verification systems. This paper investigates the problem of offline signature verification. The problem has been addressed by taking into account three different types of forgeries: random forgeries, produced without knowing either the name of the signer or the shape of his signature; simple forgeries, produced knowing the name of the signer but without having an example of his signature; and skilled forgeries, produced by people who, looking at an original instance of the signature, attempt to imitate it as closely as possible. A whitening linear transformation is finally applied to each discrete-time function so as to obtain function values with zero mean and unit standard deviation. Seven-dimensional feature vectors are used for the GMM processing described in the following section. Figure 9 shows the x, y, p and velocity signals of an example signature.
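The normalization to zero mean and unit standard deviation mentioned above can be sketched per discrete-time function (for example, the sampled x positions of the pen). The function name `standardize` is hypothetical, and the handling of a constant signal is an assumption made only so that the sketch never divides by zero.

```python
import math

def standardize(values):
    """Shift and scale a discrete-time function to zero mean and unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        # Assumption: a constant signal is mapped to all zeros.
        return [0.0] * n
    return [(v - mean) / std for v in values]

z = standardize([1.0, 2.0, 3.0, 4.0])
print([round(v, 3) for v in z])
```

Each of the feature functions would be normalized independently in this way before the resulting vectors are passed to the GMM stage.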

Multimodal Biometric Fusion Decision
The process of biometric user authentication can be outlined by the following steps [23]: a) acquisition of raw data, b) extraction of features from these raw data, c) computing a score for the similarity or dissimilarity between these features and a previously given set of reference features and d) classification with respect to the score, using a threshold. The results of the decision processing steps are true or false (or accept/reject) for verification purposes or the user identity for identification scenarios.
The fusion of different signals can be performed 1) at the raw data or feature level, 2) at the score level or 3) at the decision level. These approaches have different advantages and disadvantages. For raw data or feature level fusion, the underlying data have to be compatible for all modalities and a common matching algorithm (processing step c) must be used. If these conditions are met, the separate feature vectors of the modalities can easily be concatenated into a single new vector. This level of fusion has the advantage that only one algorithm is necessary for the further processing steps instead of one for each modality. Another advantage of fusing at this early stage of processing is that no information is lost in previous processing steps. The main disadvantage is the requirement that the different raw data or features be compatible. Fusion at the score level is performed by computing a similarity or dissimilarity (distance) score for each single modality. Before these scores are combined, they should be normalized. The most straightforward and most rigid approach is fusion at the decision level. Here, each biometric modality produces its own decision; in a verification scenario this is a set of true and false values. From this set, a kind of voting (majority decision) or a logical AND or OR decision can be computed. This level of fusion is the least powerful, because much of the information is no longer available. On the other hand, the advantage of this fusion strategy is its simplicity and the guaranteed availability of all single-modality decision results. In practice, score level fusion is the best-researched approach, and it appears to give larger improvements in recognition accuracy than the other strategies.
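The score normalization and decision-level rules described above can be made concrete with a short sketch. The function names are hypothetical, and min-max normalization is used only as one common choice; the text itself does not prescribe a particular normalization.

```python
def min_max(score, lo, hi):
    """Min-max normalization of a raw score to [0, 1], given the score range [lo, hi]."""
    return (score - lo) / (hi - lo)

def majority_vote(decisions):
    """Decision-level fusion: accept if more than half of the modalities accept."""
    return sum(decisions) > len(decisions) / 2

def and_rule(decisions):
    """Accept only if every modality accepts."""
    return all(decisions)

def or_rule(decisions):
    """Accept if at least one modality accepts."""
    return any(decisions)

# Toy decisions from face, voice and signature experts.
decisions = [True, True, False]
print(majority_vote(decisions), and_rule(decisions), or_rule(decisions))
# prints: True False True
```

The example shows why decision-level fusion is described as rigid: once each expert has been reduced to a single bit, the three rules can disagree and no score information is left to arbitrate between them.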
Adaptive Bayesian Method Based Score Fusion

Let $X = [X_1, X_2, \ldots, X_K]$ denote the match scores of K different biometric matchers, where $X_k$ is the random variable representing the match score of the k-th matcher, $k = 1, 2, \ldots, K$. Let $f_{gen}(x)$ and $f_{imp}(x)$ be the conditional joint densities of the K match scores given the genuine and impostor classes, respectively, where $x = [x_1, x_2, \ldots, x_K]$. Suppose we need to assign the observed match score vector X to the genuine or the impostor class. Let $\Psi$ be a statistical test for testing $H_0$: X corresponds to an impostor, against $H_1$: X corresponds to a genuine user. Let $\Psi(x) = i$ imply that we decide in favor of $H_i$, $i = 0, 1$. The probability of rejecting $H_0$ when $H_0$ is true is known as the false accept rate (the size or level of the test). The probability of correctly rejecting $H_0$ when $H_1$ is true is known as the genuine accept rate. The Neyman-Pearson theorem [24] [25] states that:

(1) For testing $H_0$ against $H_1$, there exists a test $\Psi$ and a constant $\eta$ such that:

$$P(\Psi(X) = 1 \mid H_0) = \alpha \tag{16}$$

and

$$\Psi(x) = \begin{cases} 1, & \text{when } \dfrac{f_{gen}(x)}{f_{imp}(x)} \ge \eta, \\[2ex] 0, & \text{when } \dfrac{f_{gen}(x)}{f_{imp}(x)} < \eta. \end{cases} \tag{17}$$

(2) If a test satisfies equations (16) and (17) for some $\eta$, then it is the most powerful test for testing $H_0$ against $H_1$ at level $\alpha$.

According to the Neyman-Pearson theorem, given the false accept rate (FAR) $\alpha$, the optimal test for deciding whether a score vector X corresponds to a genuine user or an impostor is the likelihood ratio test given by equation (17). For a fixed FAR, a threshold $\eta$ can be selected such that the likelihood ratio test maximizes the genuine accept rate (GAR). Based on the Neyman-Pearson theorem, we are guaranteed that there does not exist any other decision rule with a higher GAR. However, this optimality of the likelihood ratio test is guaranteed only when the underlying densities are known. In practice, the densities $f_{gen}(x)$ and $f_{imp}(x)$ are estimated from the training set of genuine and impostor match scores, respectively, and the performance of the likelihood ratio test depends on the accuracy of these estimates [23] [26].
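The likelihood ratio test can be sketched as follows. Two simplifications are assumed and are not taken from the text: the joint likelihood ratio is formed as a product of per-modality ratios, which holds only if the matchers are treated as independent, and single Gaussians stand in for the estimated genuine and impostor densities. All function names and parameter values are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Univariate Gaussian density, used here as a stand-in for an estimated score density."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def product_likelihood_ratio(scores, gen_params, imp_params):
    """Product of per-modality ratios f_gen(s) / f_imp(s) (independence assumed)."""
    lr = 1.0
    for s, (g_mean, g_std), (i_mean, i_std) in zip(scores, gen_params, imp_params):
        lr *= gaussian_pdf(s, g_mean, g_std) / gaussian_pdf(s, i_mean, i_std)
    return lr

def threshold_for_far(impostor_lrs, target_far):
    """Pick eta so that the fraction of impostor ratios strictly above eta is <= target_far."""
    ordered = sorted(impostor_lrs, reverse=True)
    allowed = int(target_far * len(ordered))  # impostors allowed to pass
    if allowed >= len(ordered):
        return float("-inf")
    return ordered[allowed]

def accept(lr, eta):
    """Accept the claim when the ratio exceeds eta (strict, so ties do not raise the FAR)."""
    return lr > eta

impostor_lrs = [0.1, 0.2, 0.5, 2.0, 0.05, 0.3, 0.4, 0.6, 0.7, 0.8]
eta = threshold_for_far(impostor_lrs, 0.1)
print(eta, sum(accept(lr, eta) for lr in impostor_lrs))  # prints: 0.8 1
```

With ten impostor ratios and a target FAR of 0.1, the threshold admits exactly one impostor. The empirical threshold is only as good as the density estimates behind the ratios, which is the point made at the end of the paragraph above.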

1) Estimation of Match Score Densities
Gaussian mixture model (GMM) has been successfully used to estimate arbitrary densities and it is used for estimating the genuine and impostor score densities [14] [27].
Let $\phi^{K}(x; \mu, \Sigma)$ be the K-variate Gaussian density with mean vector $\mu$ and covariance matrix $\Sigma$, i.e.,

$$\phi^{K}(x;\mu,\Sigma) = (2\pi)^{-K/2}\,|\Sigma|^{-1/2}\exp\left(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)$$

Assuming independent, identically distributed data $X = \{x_i\}_{i=1}^{N}$, the likelihood of the parameters $\Theta$ can be written as:

$$p(X|\Theta) = \prod_{i=1}^{N} p(x_i|\Theta) = \mathcal{L}(\Theta|X)$$

The maximum of this function can be found by taking the derivative and setting it equal to zero, assuming an analytical function.
The incomplete-data log-likelihood of the data for the mixture model is given by:

$$\log \mathcal{L}(\Theta|X) = \sum_{i=1}^{N}\log\left(\sum_{k=1}^{K}\alpha_k\, p_k(x_i|\theta_k)\right)$$

which is difficult to optimize because it contains the log of a sum. If X is considered incomplete, however, and we posit the existence of unobserved data items $Y = \{y_i\}_{i=1}^{N}$ whose values tell us which component density generated each data item, the likelihood expression is significantly simplified. That is, we assume that $y_i \in \{1, \ldots, K\}$ for each i, and $y_i = k$ if the i-th sample was generated by the k-th mixture component. If the values of Y are known, the complete-data log-likelihood is given by:

$$\log \mathcal{L}(\Theta|X,Y) = \sum_{i=1}^{N}\log\left(\alpha_{y_i}\, p_{y_i}(x_i|\theta_{y_i})\right)$$

which, given a particular form of the component densities, can be optimized using a variety of techniques [28].

EM Algorithm: The expectation-maximization (EM) algorithm [23] [27] [29] [30] is a procedure for maximum-likelihood (ML) estimation in cases where a closed-form expression for the optimal parameters is hard to obtain. This iterative algorithm guarantees a monotonic increase in the likelihood L when the algorithm is run on the same training database.
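The probability density of the Gaussian mixture of k components in $\mathbb{R}^{d}$ can be described as follows:

$$\Phi(x) = \sum_{i=1}^{k}\pi_i\,\phi(x|\theta_i), \quad \forall x \in \mathbb{R}^{d} \tag{27}$$

where $\phi(x|\theta_i)$ is a Gaussian probability density with the parameters $\theta_i = (m_i, \Sigma_i)$, $m_i$ is the mean vector and $\Sigma_i$ is the covariance matrix, which is assumed positive definite, given by:

$$\phi(x|\theta_i) = \phi(x|m_i,\Sigma_i) = \frac{1}{(2\pi)^{d/2}|\Sigma_i|^{1/2}}\exp\left(-\frac{1}{2}(x-m_i)^{T}\Sigma_i^{-1}(x-m_i)\right)$$

and $\pi_i \in [0, 1]$ $(i = 1, 2, \ldots, k)$ are the mixing proportions under the constraint $\sum_{i=1}^{k}\pi_i = 1$. If all the parameters are encapsulated into one vector $\Theta_k = (\pi_1, \pi_2, \ldots, \pi_k, \theta_1, \theta_2, \ldots, \theta_k)$, then, according to Eq. (27), the density of the Gaussian mixture can be rewritten as:

$$\Phi(x|\Theta_k) = \sum_{i=1}^{k}\pi_i\,\phi(x|\theta_i) = \sum_{i=1}^{k}\pi_i\,\phi(x|m_i,\Sigma_i)$$

There are many learning algorithms for Gaussian mixture modeling, but the EM algorithm may be the best known. By alternating the E-step, which estimates the probability distribution of the unobservable random variable, and the M-step, which increases the log-likelihood function, the EM algorithm eventually reaches a local maximum of the log-likelihood function of the model. For the Gaussian mixture model, given a sample data set $S = \{x_1, x_2, \ldots, x_N\}$ as a special incomplete data set, the log-likelihood function can be expressed as follows:

$$\log p(S|\Theta_k) = \sum_{t=1}^{N}\log\sum_{i=1}^{k}\pi_i\,\phi(x_t|\theta_i)$$

which can be optimized iteratively via the EM algorithm as follows:

$$P(j|x_t) = \frac{\pi_j\,\phi(x_t|\theta_j)}{\sum_{i=1}^{k}\pi_i\,\phi(x_t|\theta_i)}$$

$$\pi_j^{+} = \frac{1}{N}\sum_{t=1}^{N}P(j|x_t)$$

$$m_j^{+} = \frac{\sum_{t=1}^{N}P(j|x_t)\,x_t}{\sum_{t=1}^{N}P(j|x_t)}$$

$$\Sigma_j^{+} = \frac{\sum_{t=1}^{N}P(j|x_t)\,(x_t-m_j^{+})(x_t-m_j^{+})^{T}}{\sum_{t=1}^{N}P(j|x_t)}$$

Although the EM algorithm has good convergence properties in certain situations, it cannot determine the proper number of components for a sample data set, because it is based only on the maximization of the likelihood.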
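The E-step and M-step for a Gaussian mixture can be illustrated with a minimal one-dimensional sketch. The function names, the small variance floor and the toy data are assumptions made for the example; the multivariate case replaces the scalar variance with a covariance matrix.

```python
import math

def gauss(x, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under the current mixture."""
    return sum(math.log(sum(w * gauss(x, m, v)
                            for w, m, v in zip(weights, means, variances)))
               for x in data)

def em_step(data, weights, means, variances, var_floor=1e-6):
    """One EM iteration for a 1-D Gaussian mixture with a fixed number of components."""
    k = len(weights)
    # E-step: posterior probability of each component for each sample.
    resp = []
    for x in data:
        joint = [weights[j] * gauss(x, means[j], variances[j]) for j in range(k)]
        total = sum(joint)
        resp.append([p / total for p in joint])
    # M-step: re-estimate weights, means and variances from the posteriors.
    new_w, new_m, new_v = [], [], []
    for j in range(k):
        mass = sum(r[j] for r in resp)
        mean = sum(r[j] * x for r, x in zip(resp, data)) / mass
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / mass
        new_w.append(mass / len(data))
        new_m.append(mean)
        new_v.append(max(var, var_floor))  # assumed floor to avoid a singular component
    return new_w, new_m, new_v

data = [-2.1, -1.9, -2.0, 2.0, 2.1, 1.9]
params = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
history = [log_likelihood(data, *params)]
for _ in range(20):
    params = em_step(data, *params)
    history.append(log_likelihood(data, *params))
print([round(m, 2) for m in params[1]])
```

On this toy data the two means move toward the two clusters and the recorded log-likelihood does not decrease from one iteration to the next, while the number of components stays fixed at the value chosen for initialization.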

2) Figueiredo-Jain Algorithm
The Figueiredo-Jain (FJ) algorithm [23] [25] [29] [30] tries to overcome three major weaknesses of the basic EM algorithm. The EM algorithm presented in the previous section requires the user to set the number of components, and this number stays fixed during the estimation process. The FJ algorithm adjusts the number of components during estimation by annihilating components that are not supported by the data. This leads to the second EM failure point, the boundary of the parameter space. FJ avoids the boundary by annihilating components that are becoming singular. FJ also allows starting with an arbitrarily large number of components, which addresses the initialization issue of the EM algorithm. The initial guesses for the component means can be distributed over the whole space occupied by the training samples, even setting one component for every single training sample.
The classical way to select the number of mixture components is to adopt the "model-class/model" hierarchy, where some candidate models (mixture pdfs) are computed for each model-class (number of components), and the "best" model is then selected. The idea behind the FJ algorithm is to abandon this hierarchy and to find the "best" overall model directly. Using the minimum message length criterion and applying it to mixture models leads to the objective function:

$$\Lambda(\theta, X) = \frac{V}{2}\sum_{c:\,\alpha_c>0}\ln\left(\frac{N\alpha_c}{12}\right) + \frac{C_{nz}}{2}\ln\frac{N}{12} + \frac{C_{nz}(V+1)}{2} - \ln\mathcal{L}(X,\theta)$$

where N is the number of training points, V is the number of free parameters specifying a component, and $C_{nz}$ is the number of components with nonzero weight in the mixture ($\alpha_c > 0$). $\theta$ in the case of a Gaussian mixture is the same as in (Eq. 11), and the last term $\ln\mathcal{L}(X,\theta)$ is the log-likelihood of the training data given the distribution parameters $\theta$ (Eq. 27).
The EM algorithm can be used to minimize (Eq. 36) with a fixed $C_{nz}$. It leads to the M-step with the component weight updating formula:

$$\alpha_c^{i+1} = \frac{\max\left\{0,\ \left(\sum_{n=1}^{N} w_{n,c}\right) - \frac{V}{2}\right\}}{\sum_{j=1}^{C}\max\left\{0,\ \left(\sum_{n=1}^{N} w_{n,j}\right) - \frac{V}{2}\right\}}$$

This formula contains an explicit rule for annihilating components by setting their weights to zero.
The above M-step is not suitable for the basic EM algorithm, though. When the initial C is high, it can happen that all weights become zero because none of the components has enough support from the data. Therefore a component-wise EM algorithm (CEM) is adopted. CEM updates the components one by one, computing the E-step (updating W) after each component update, whereas the basic EM updates all components "simultaneously". When a component is annihilated, its probability mass is immediately redistributed, strengthening the remaining components.
When CEM converges, it is not guaranteed that the minimum of $\Lambda(\theta, X)$ has been found, because the annihilation rule (Eq. 35) does not take into account the decrease caused by decreasing $C_{nz}$. After convergence, the component with the smallest weight is removed and the CEM is run again, repeating until $C_{nz} = 1$. Then the estimate with the smallest $\Lambda(\theta, X)$ is chosen. The implementation of the FJ algorithm uses a modified cost function instead of $\Lambda(\theta, X)$.
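The weight update with its annihilation rule can be sketched in isolation. The function name and the input numbers are hypothetical; `resp_mass[c]` stands for the sum of the posterior weights of component c over all training points, and `num_params` for V.

```python
def fj_weight_update(resp_mass, num_params):
    """Component weights after subtracting V/2 from each posterior mass and clipping at zero.

    resp_mass[c]: summed posterior weight of component c over the training points.
    num_params:   V, the number of free parameters per component.
    """
    raw = [max(0.0, mass - num_params / 2.0) for mass in resp_mass]
    total = sum(raw)
    if total == 0.0:
        # The situation the text warns about: no component has enough support.
        raise ValueError("all components would be annihilated")
    return [r / total for r in raw]

# Toy example: a 1-D Gaussian component has V = 2 free parameters (mean and variance).
weights = fj_weight_update([10.0, 0.5, 5.0], num_params=2)
print(weights)
```

Here the second component has a posterior mass below V/2 and receives weight zero, while the remaining weights are renormalized to sum to one. The guard for the all-zero case mirrors the reason given above for switching to the component-wise EM.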

Experiments and Results
The experiments were performed using a database of still faces, signatures and audio. The faces and audio were extracted from video encoded as raw UYVY AVI, 640 x 480, 15.00 fps, with uncompressed 16-bit PCM audio (mono, 32000 Hz, little endian). Uncompressed PNG files were extracted from the video files to feed the face detection algorithms. The capturing devices for recording the video and audio data were an Allied Vision Technologies AVT Marlin MF-046C (10 bit ADC, 1/2" (8 mm) progressive scan SONY IT CCD) and a Shure SM58 microphone, a unidirectional (cardioid) dynamic vocal microphone with a frequency response of 50 Hz to 15000 Hz. Thirty subjects were used for the experiments in which twenty-six are males and four are females. For each subject, 30 signatures (dat files with a header) are used. Each line of a dat file consists of four comma-separated integer values for the sampled x- and y-position of the pen tip, the pen pressure and the timestamp (in ms); lines with values of -1 for x, y and pressure represent a pen-up/pen-down event. The device used for recording the handwriting data was a Wacom Graphire3 digitizing tablet. The size of the sensing surface is 127.6 mm x 92.8 mm, with a spatial resolution of 2032 lpi (lines per inch) and 512 levels of pressure. The signature data is acquired with a non-fixed sampling rate of about 100 Hz. The audio is extracted as a 16-bit PCM WAV file (with wav header), sampled at 16000 Hz, mono, little endian. For the audio, six multi-lingual recordings (.wav files) of one minute each were used for each subject. The database was obtained from eNTERFACE 2005 [31]. Thirty subjects were used for the experiments in which twenty-five are males and five are females. For the face experts, ninety-six face images from a subject were randomly selected for training and projected into the eigenspace, and the other twenty-four samples were used for the subsequent validation and testing.
Similarly, four samples were used in speech experts for the modeling (training); two samples were used for the subsequent validation and testing. For signature experts, twenty four signatures from a subject were randomly selected for training, and the other six samples were used for the subsequent validation and testing. Three sessions of the face database, signature and speech database were used separately. Session one was used for training the speech and face experts. Each expert used ten mixture client models. To find the performance, Sessions two and three were used for obtaining expert opinions of known impostor and true claims.
Performance Criteria: The basic error measures of a verification system are the false rejection rate (FRR) and the false acceptance rate (FAR), defined as follows. The False Rejection Rate ($FRR_i$) is the average number of falsely rejected transactions. If n is a transaction, x(n) is the verification result, where 1 is falsely rejected and 0 is accepted, and N is the total number of transactions, then the personal False Rejection Rate for user i is

$$FRR_i = \frac{1}{N}\sum_{n=1}^{N} x(n)$$

The False Acceptance Rate ($FAR_i$) is defined analogously as the average number of falsely accepted transactions, with x(n) equal to 1 for a falsely accepted transaction and 0 for a correctly rejected one:

$$FAR_i = \frac{1}{N}\sum_{n=1}^{N} x(n)$$

The Equal Error Rate (EER) is the operating point at which FAR and FRR are equal. The threshold value at this point is commonly taken as the setting at which the system performs at its best.
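The error rates above can be computed from lists of genuine and impostor scores with a short sketch. The function names and the toy scores are hypothetical, and the EER is approximated by scanning the observed scores as candidate thresholds, which is one simple choice among several.

```python
def far_frr(genuine, impostor, threshold):
    """FAR: fraction of impostor scores accepted; FRR: fraction of genuine scores rejected."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

def equal_error_rate(genuine, impostor):
    """Scan the observed scores and return (EER estimate, threshold) where |FAR - FRR| is smallest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]

genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.2, 0.3, 0.6]
print(equal_error_rate(genuine, impostor))  # prints: (0.25, 0.6)
```

On this toy data one impostor score lies above one genuine score, so FAR and FRR meet at 25% for a threshold of 0.6. Plotting the (FAR, FRR) pairs over all thresholds gives the trade-off curve from which such an operating point is read.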
As a common starting point, classifier parameters were selected to obtain performance as close as possible to the EER on clean test data (following the standard practice in the face and speaker verification area of using the EER as a measure of expected performance). A good choice is to set the decision threshold so that the false accept rate equals the false reject rate. In this paper, the Detection Error Tradeoff (DET) curve is used to visualize and compare the performance of the system (see Figure 11).

Conclusions
This paper has presented a human authentication method that combines dynamic face, signature and speech information in order to address the limitations of single-biometric authentication, which suffers from the fundamental problems of high FAR and FRR. It has presented a framework for the fusion of match scores in a multi-modal biometric system based on an adaptive Bayesian method. The likelihood ratio based fusion rule with GMM-based Figueiredo-Jain (FJ) density estimation achieves significant recognition rates. As a result, the combined authentication method can provide a stable authentication rate and overcomes the limitations of a single-mode system. The experimental results show that the EER is reduced significantly when moving from the face and signature modes to a combined face-voice-signature mode.