Identification of Nonlinear Model with General Disturbances

A Hammerstein model consists of a static nonlinearity followed by a linear dynamic system. The dominant approach to estimating the components of this model has been to minimize the error between the simulated and the measured outputs. For the special case of Gaussian input signals, we estimate the linear part of the Hammerstein model using Bussgang's classic theorem. For the case of general disturbances, we derive the Maximum Likelihood method. Finally, a simulation example is used to illustrate the effectiveness of the approach.


Introduction
Many nonlinear systems can be modeled by a Hammerstein model (a linear time-invariant (LTI) block following a static nonlinear block), a Wiener model (an LTI block preceding a static nonlinear block), or a Hammerstein-Wiener model (an LTI block sandwiched between two static nonlinear blocks).
The Hammerstein model is a special class of nonlinear system with applications in many engineering problems, and identification of Hammerstein models has therefore been an active research topic for a long time. Existing methods in the literature can be roughly divided into six categories: the iterative method, the over-parameterization method, the stochastic method, the nonlinear least squares method, the separable least squares method, and the blind method.
This paper focuses on the identification of the Hammerstein models shown in Figure 1, which consist of a memoryless nonlinear element followed by a linear dynamical system. The input signal is denoted by u(t), the output signal by y(t), and x(t) denotes the intermediate, immeasurable signal. We will call w(t) the process noise and e(t) the measurement noise, and assume that they are independent. Note that since G is a linear system, the process noise can equally well be applied anywhere after the nonlinearity, with an additional filter. This paper focuses on parametric models: we assume that f and G each belong to a parameterized model class. Examples of such a model class for the nonlinear function f are polynomials, splines, or neural networks -- in general, a basis function expansion. The nonlinearity f may also be a piecewise linear function, such as a saturation or a dead-zone. Common model classes for G are FIR filters, rational transfer functions (OE models), and state-space models, but, for example, Laguerre filters may also be used.
If the process noise w and the intermediate signal x are unknown, the parameterization of the Hammerstein model is not unique: in the characterization of the Hammerstein model shown in Figure 1, f and G are not individually identifiable. Any pair (k f(u), G(z)/k), for some nonzero and finite constant k, would produce identical input and output measurements; in other words, no identification scheme can distinguish it from (f(u), G(z)). Therefore, to obtain a unique parameterization, and without loss of generality, the gain of either f(u) or G(z) has to be fixed. (We may also need to scale the process noise variance by the factor k.) Given input and output data, and model classes for f and G, we want to find the parameters θ and η that best match the data, measured as input u and output y from the system.
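The gain ambiguity is easy to check numerically. The following sketch uses a hypothetical polynomial nonlinearity and FIR linear block (not the paper's example) to verify that the pair (k f(u), G/k) produces exactly the same output as (f(u), G):

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
u = rng.normal(size=500)

def f(u):                        # hypothetical static nonlinearity
    return u + 0.5 * u**2

b = [1.0, 0.4]                   # hypothetical FIR linear block G
k = 3.7                          # any nonzero, finite gain

y1 = lfilter(b, [1.0], f(u))                         # pair (f, G)
y2 = lfilter([bi / k for bi in b], [1.0], k * f(u))  # pair (k*f, G/k)

print(np.allclose(y1, y2))       # the two outputs coincide
```

This is why one gain must be fixed before the parameters can be estimated uniquely.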

A Standard Method and Possible Bias Problems
Several different methods to identify Hammerstein models have been suggested in the literature. A common approach is to parameterize the linear and nonlinear block, and to estimate the parameters from data, by minimizing an error criterion.
If the process noise w(t) in Figure 1 is disregarded or zero, a natural criterion is to minimize

V_N(θ, η) = (1/N) Σ_{t=1}^{N} [ y(t) − G(q, θ) f(u(t), η) ]².   (2)

This is a standard approach and has been used in several papers. If the process noise is indeed zero, this is the prediction error criterion. If the measurement noise is additionally white and Gaussian, (2) is also the Maximum Likelihood criterion, and the estimate is consistent.
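As a small illustration (a hypothetical system, not the paper's experiment), criterion (2) can be minimized numerically for a second-order FIR block and a quadratic nonlinearity, with the gain ambiguity resolved by fixing the leading FIR coefficient to one:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.signal import lfilter

rng = np.random.default_rng(1)
u = rng.normal(size=2000)

# hypothetical true system: f(u) = u + 0.5*u^2, G = FIR [1, 0.4], white measurement noise
x = u + 0.5 * u**2
y = lfilter([1.0, 0.4], [1.0], x) + 0.05 * rng.normal(size=u.size)

def criterion(p):
    g1, eta = p                                   # G = [1, g1] (gain fixed), f = u + eta*u^2
    yhat = lfilter([1.0, g1], [1.0], u + eta * u**2)
    return np.mean((y - yhat) ** 2)               # criterion (2)

est = minimize(criterion, x0=[0.0, 0.0]).x
print(est)                                        # close to the true values [0.4, 0.5]
```

With zero process noise, as here, the minimizer recovers the true parameters; the following sections examine what happens when process noise is present.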
While measurement noise e is discussed in several papers, few consider the process noise w. Reference [10] is one exception, where both the input and the output are subject to noise. Consistency of the estimation method is, however, not discussed in that paper. It may seem reasonable to use an error criterion like (2) even in the case where there is process noise.
The predictor implicit in (2) is, however, not the true predictor in this case. We will call this method the approximate prediction error method, and we will show that the estimate obtained this way is not necessarily consistent. Suppose that the true system can be described within the model class, i.e., that there exist parameters (θ₀, η₀) for which the model reproduces the true system.
An estimate from a certain estimation method is said to be consistent if the parameters converge to their true values as the number of data points N tends to infinity.
To investigate the minimum of the approximate PEM criterion (2), we write the true system as

y(t) = G(q, θ₀) f(u(t), η₀) + w̃(t) + e(t),   (4)

where w̃(t) may be regarded as an (input-dependent) transformation of the process noise to the output. Stochastic properties, such as the mean and variance of the process noise, will typically be preserved in this transformation. Now insert the expression for y in (4) into the criterion (2), assume that all noises are ergodic, so that time averages tend to their mathematical expectations as N tends to infinity, and assume also that u is a (quasi-)stationary sequence, so that it has well-defined sample averages. Let E denote both mathematical expectation and averaging over time signals.
Using the fact that the measurement noise e is zero mean and independent of the input u and the process noise w, several cross terms disappear, and the criterion tends to a limit of quadratic form. Whether the true value (θ₀, η₀) minimizes this limit can be examined by setting the partial derivatives with respect to the parameters to zero.
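The ergodicity assumption used above says that time averages of the noise signals approach their mathematical expectations as N grows. A minimal numerical check, with an assumed process-noise variance of 0.3:

```python
import numpy as np

rng = np.random.default_rng(2)
lam_w = 0.3                              # assumed process-noise variance
N = 100_000
w = rng.normal(scale=np.sqrt(lam_w), size=N)

# the time average of w(t)^2 approaches its mathematical expectation lam_w
print(np.mean(w**2))                     # close to 0.3 for large N
```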

Bussgang's Theorem and Its Implication for Hammerstein Models
We will generalize the following classical theorem due to Bussgang.

Theorem 1. Let y(t) be the stationary output from a static nonlinearity f with a stationary Gaussian input u(t), and assume that the relevant expectations exist. Then the cross-covariance between y and u is proportional to the autocovariance of u:

R_yu(τ) = b₀ R_u(τ),   (10)

where b₀ is a constant (for differentiable f, b₀ = E f'(u)).

Since we are dealing with stationary processes, the ensemble average and the time average can be equated, and Theorem 1 follows. Let Φ_yu and Φ_u denote the z-transforms of R_yu and R_u, respectively. Provided that these transforms are well defined, (10) can also be written as Φ_yu(z) = b₀ Φ_u(z). Bussgang's theorem has turned out to be very useful in the theory of Hammerstein and Wiener system identification. The reason is that it explains why it is possible to estimate the linear and nonlinear parts of a Hammerstein system separately when the input is Gaussian, and it can be used to obtain a good estimate of the linear part of the model. It is interesting to note that the result applies also to our more general situation with process noise w.
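Bussgang's relation (10) can be verified numerically. The sketch below uses a colored Gaussian input and the cubic nonlinearity f(u) = u³ (chosen because its Bussgang gain has the closed form b₀ = E f'(u) = 3 E u²); the ratio R_yu(τ)/R_u(τ) is then approximately constant over lags and equal to b₀:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(3)
N = 200_000
u = lfilter([1.0], [1.0, -0.5], rng.normal(size=N))   # colored, stationary Gaussian input
y = u**3                                              # static nonlinearity f(u) = u^3

def xcov(a, b, tau):
    """Sample cross-covariance E[a(t) b(t - tau)] for zero-mean signals."""
    return np.mean(a[tau:] * b[:len(b) - tau]) if tau else np.mean(a * b)

b0 = 3 * np.mean(u**2)            # Bussgang gain E[f'(u)] = 3*E[u^2] for f(u) = u^3
for tau in range(4):
    print(xcov(y, u, tau) / xcov(u, u, tau), b0)      # ratios approximately equal b0
```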
Theorem 2. Consider the model structure defined by Figure 1. Assume that the input u(t) and the process noise w(t) are independent, Gaussian, stationary processes (not necessarily white). Assume that the measurement noise e(t) is a stationary stochastic process, independent of u and w; it is not assumed that e is white or Gaussian. Since e is independent of u, it does not contribute to the cross-spectrum between u and y. Moreover, since u and w are Gaussian, x(t) is Gaussian, so Bussgang's theorem tells us that the cross-spectrum between u and y is proportional to G₀ Φ_u. Now it is well known (see, e.g., Chapter 8 in Ljung (1999)) that the estimate θ̂_N will converge to a value that minimizes the corresponding limit criterion. The theorem is a consequence of the fact that the best linear approximation relating u and y is proportional to the linear part G₀ of the true system.
Basically, this means that an estimate of the linear system G(q) will also be consistent for many other common linear identification methods. Note that the gain of G cannot be estimated in any case, since a gain factor can be moved between G and f without affecting the input-output behavior.
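The consequence of Theorem 2 can be illustrated with a plain least-squares FIR fit of y on lagged u, which estimates the best linear approximation. In this hypothetical setup (tanh nonlinearity, white Gaussian input, process and measurement noise present), the fitted impulse response is proportional to the true linear block, so it matches after normalizing away the unidentifiable gain:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(4)
N = 50_000
u = rng.normal(size=N)
g_true = np.array([1.0, 0.6, 0.2])            # hypothetical FIR linear block G0

x = np.tanh(u) + 0.1 * rng.normal(size=N)     # nonlinearity plus process noise
y = lfilter(g_true, [1.0], x) + 0.1 * rng.normal(size=N)   # plus measurement noise

# least-squares FIR fit of y on lagged u = best linear approximation of the system
m = 3
Phi = np.column_stack([np.roll(u, k) for k in range(m)])
Phi[:m] = 0.0                                  # discard wrapped-around samples
g_hat = np.linalg.lstsq(Phi, y, rcond=None)[0]

print(g_hat / g_hat[0])                        # close to g_true / g_true[0]
```

The raw coefficients are scaled by the Bussgang gain b₀ of tanh, which is exactly the gain factor that cannot (and need not) be identified.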

An Example
Consider a generalized Hammerstein system. In Figure 2 we compare the output responses of the real system and the identified system, and in Figure 3 we plot the error between the two output responses. From these figures we can see that, as the number of measured data pairs tends to infinity, the identified system matches the real system and the error tends to zero.

Derivation of the Likelihood Function for White Disturbances
The likelihood function is the probability density function (PDF) of the outputs y^N = (y(1), ..., y(N)) for a given input sequence u^N. For given u^N, y(t) will be a sequence of independent variables, which in turn implies that the PDF of y^N is the product of the PDFs of y(t), t = 1, 2, ..., N. It is thus sufficient to derive the PDF of y(t). To simplify notation, we write y = y(t), x = x(t), and so on. To find the PDF, we introduce the intermediate signal x as a nuisance parameter. The PDF of y given x is basically a reflection of the PDF of e, since y(t) = G(q, θ) x(t) + e(t).
If e is white noise, this PDF is easy to find:

p(y | x) = p_e( y − G(q, θ) x ),

where p_e is the PDF of e.
The same is true for the PDF of x given u^N if w is white noise: p(x | u^N) = p_w( x − f(u, η) ), where p_w is the PDF of w. By integrating over all x ∈ R, we eliminate this immeasurable signal:

p_y(y) = ∫ p_e( y − G(q, θ) x ) p_w( x − f(u, η) ) dx.   (25)

We now assume that the process noise w(t) and the measurement noise e(t) are white. We can then calculate p_y and its gradients for each θ and η, which means that the ML criterion can be maximized numerically.
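For Gaussian w and e, the marginalization integral (25) has a closed form, which gives a convenient sanity check for a numerical implementation. The scalar sketch below (a single time step with an assumed static gain g₀ standing in for G, and assumed values for f(u) and the noise variances) verifies that the numerical integral matches the Gaussian closed form:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

lam_w, lam_e = 0.3, 0.2          # assumed noise variances
g0, fu, y = 0.8, 1.5, 2.0        # assumed static gain, f(u(t)), and observed y(t)

# integrand of (25): p_e(y - g0*x) * p_w(x - f(u))
integrand = lambda x: (norm.pdf(y - g0 * x, scale=np.sqrt(lam_e))
                       * norm.pdf(x - fu, scale=np.sqrt(lam_w)))

p_num, _ = quad(integrand, -20.0, 20.0)

# closed form: y = g0*(f(u) + w) + e is N(g0*f(u), g0^2*lam_w + lam_e)
p_closed = norm.pdf(y, loc=g0 * fu, scale=np.sqrt(g0**2 * lam_w + lam_e))
print(p_num, p_closed)           # the two values agree
```

For non-Gaussian w or e, no closed form exists in general, and the integral must be evaluated numerically as above for each candidate (θ, η).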
We may also note that each integral in (25) depends on x(t) at only one time instant t, so the integrals can be computed in parallel. If the noise variances λ_w and λ_e are unknown, they can simply be included among the parameters θ and η, and their ML estimates are still obtained from (20).

Special Case: No Process Noise or no Measurement Noise
Most approaches suggested in the literature restrict the noise to either process noise or measurement noise. In these cases, the likelihood function (25) simplifies considerably, and the criterion reduces to something we recognize from other references.
Consider first the case of no process noise, λ_w = 0. Since the only stochastic part is then the measurement noise, the ML criterion reduces to the prediction error criterion discussed before. It gives a consistent estimate if the condition of no process noise is satisfied. If there is process noise, however, this criterion does not use the true predictor, and may give biased estimates.
For systems with no measurement noise, λ_e = 0, the criterion (25) forces x(t) = G⁻¹(q, θ) y(t). The maximum can be found by maximizing the logarithm, which reduces the problem to minimizing a criterion in the reconstructed process noise G⁻¹(q, θ) y(t) − f(u(t), η).

Remark. The following theorem describes the existence of the inverse G⁻¹(q, θ).

Theorem 3. Assume that the filter G(q, θ) is stable and has no zeros on or outside the unit circle. Then the inverse G⁻¹(q, θ) exists and is stable.

All that is needed is that the function 1/G(q, θ) be analytic in |q| ≥ 1; that is, that it has no poles on or outside the unit circle. We could also phrase the condition as: G(q, θ) must have no zeros on or outside the unit circle. This ties in very nicely with the spectral factorization result, according to which, for rational strictly positive spectra, we can always find a representation G(q, θ) with these properties. If the process and measurement noise are colored, we may represent the Hammerstein model as in the corresponding figure.
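The no-measurement-noise case can be sketched with a hypothetical stable, minimum-phase rational G = B(q)/A(q): inverting the filter (filtering y with A/B) recovers the intermediate signal x exactly, and the criterion then measures the reconstructed process noise:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(5)
u = rng.normal(size=5000)

b, a = [1.0, 0.5], [1.0, -0.3]   # hypothetical G = B/A: pole at 0.3, zero at -0.5 (minimum phase)
x = u + 0.5 * u**2 + 0.1 * rng.normal(size=u.size)   # f(u) plus process noise, lam_w = 0.01
y = lfilter(b, a, x)              # no measurement noise

# since G has no zeros on or outside the unit circle, 1/G = A/B is a stable filter
x_hat = lfilter(a, b, y)
print(np.allclose(x_hat, x))      # inverse filtering recovers x exactly

# the criterion to minimize: squared reconstructed process noise G^{-1} y - f(u)
V = np.mean((x_hat - (u + 0.5 * u**2)) ** 2)
print(V)                          # close to the process-noise variance 0.01
```

If G had a zero outside the unit circle, the inverse filter A/B would be unstable and x_hat would diverge, which is exactly why Theorem 3 is needed.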

Colored Noise
The only stochastic parts are e and w. For a given sequence x^N, the joint PDF of y^N is obtained in the standard way.