Data-Driven Modelling of Natural Gas Dehydrators for Dew Point Determination

The presence of water in natural gas streams is a recurring problem that the oil and gas industry has dealt with for many years. Failure to remove water vapor from a natural gas stream leads to hydrate formation and corrosion of critical facilities. Determining the natural gas dew point, the temperature at which water vapor condenses out of the gas, is difficult; hence there is a need to explore modern and more effective ways of determining it. In this research, a data-driven modeling technique is used to generate an expression for the dew point of a natural gas stream exiting a molecular sieve dehydrator bed. After data quality analysis, various model structures were used for modeling. The Autoregressive Moving Average with exogenous inputs (ARMAX) model proved suitable for predicting the plant output with the highest level of accuracy.


Introduction
Determining the precise amount of moisture present in a natural gas well stream before dehydration is of utmost significance, as it aids and expedites the dehydration procedures and helps in resolving defects and problems that arise in natural gas processing plants [1]. The natural gas dew point is the temperature at which the free water and other moisture in the natural gas begin to condense and drop out of the gas stream [2]. The dew point is the backbone of the recovery of Natural Gas Liquid (NGL) from a natural gas well stream and must be made as low as possible for efficient NGL production. If the desired dew point of the NGL recovery plant, which is a cryogenic plant, is not met, hydrate formation and corrosion are likely downstream of the recovery plant [3]. Because of these problems, considerable effort has been directed at lowering the dew point and determining it accurately for natural gas processes. Taking the dynamics of the gas processing plant into consideration, the desired and accurate dew point can be achieved only if the natural gas dehydrators are adequately modeled with a sufficient number of inputs and outputs. The dew point of the natural gas leaving the dehydrators must be lower than the coldest point of the NGL production plant; this prevents the formation of hydrates at any point in the production plant [4].

Background Information on Data-Driven Modelling
Data-driven modelling is a technique used to examine the time-series input and output data of a physical process by means of mathematical equations that are not derived from the physical process under examination [5]. Furthermore, data-driven modelling can be seen as a means of representing large and difficult data sets in a very basic way using mathematical processes, without reference to the fundamental principles of the physical system [6]. Such models are constructed by fitting equations to data obtained from the physical process for the purpose of prediction, without reference to the physical principles behind the process [7]. It has also been observed that the function of data-driven modelling is to aid and guide the derivation of a correct and useful mathematical function relating the input and output variables [8]. System identification is a tool of data-driven modelling and employs two key techniques. The first is black-box modelling, which is useful when the primary interest is in fitting the data regardless of the particular mathematical structure of the model; it is usually a trial-and-error process in which the parameters of various structures are estimated and the results compared. The second is grey-box modelling, which is used to estimate the values of the unknown parameters of a model structure that has already been deduced from physical principles [9].

Data Acquisition
The data for this analysis were recorded from the data acquisition device of a Natural Gas Liquid (NGL) molecular sieve dehydrator bed, with three input variables (feed gas flow rate, feed gas temperature and feed gas pressure) and one output, the dew point temperature. These parameters were chosen because the quantity of moisture present in natural gas depends on the temperature, pressure and composition of the gas, which requires monitoring of the temperature and pressure of the dehydrator beds [10].
The Pi software, used in combination with variable sensors and the triplex and delta V monitoring interface, samples the measured values at 6.00 am (once a day) and logs them for future use. The data obtained from the Pi software for this project cover a period of two and a half years, from 1st January 2011 to 1st July 2013.
To retrieve the data, the Pi software system was accessed, the trends and logged variables of the molecular sieve dehydrator bed system were selected, and the same two-and-a-half-year period was specified. The data were then downloaded in Microsoft Excel file format. A sample of the data is shown in Table 1.

Data Pre-Processing
The raw data obtained from the data acquisition device of the Natural Gas Liquid (NGL) molecular sieve dehydrator bed cannot be used immediately for modelling the dew point of natural gas exiting the dehydrator, because the data contain defects. These defects include missing values, outliers, offsets, drift and other disturbances [11].
The available data pre-processing techniques include detection and removal of outliers, filtering, detrending, resampling, and replacement or removal of missing values.
These pre-processing techniques can be applied to the acquired data to compensate for deficiencies in the data set and make it suitable for the required modelling. Some of them are discussed below as they apply to this project. However, a pre-processing technique should not be applied to the data set until a data quality analysis has been carried out to ascertain the state of the data [5].
When modelling with data off-line, plotting the data must be the first step, so that the defects can be identified and examined [12].

Data Quality Analysis
Prior to commencing the estimation of models from data, the measured data was checked for the presence of any undesirable characteristics by carrying out a time plot of the data. The undesirable characteristics may include the following:
a. Missing data samples
b. Drifts and outliers
c. Offsets and trends

Transient Plot of Data
To determine and examine the quality of the data obtained for the modeling of the dew point of natural gas exiting the molecular sieve dehydrator bed, the data was plotted as shown in Figure 1.

Handling Missing Data
Several factors are responsible for the missing values observed in the acquired and logged data. These factors include:
a. Shutdown of the NGL extraction plant and other plants that serve as sources of feed gas to the dehydrator beds, either for turnaround maintenance or as an unscheduled shutdown.
b. Failure of either the process variable sensing element or the transmitter.
c. Malfunction of the Pi software itself.
d. Network failure within the Oso facility instrument network and the data acquisition system.
Handling missing values is one of the pre-processing treatments that can be applied to the data set before carrying out data quality analysis. When a data file with missing values is loaded into the MATLAB workspace, MATLAB converts the missing values to NaN (Not-a-Number) without altering the structure of the variables that contain them.
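As a minimal illustration of this behaviour outside MATLAB, the Python sketch below parses a few logged rows in which blank cells stand for missing samples, stores them as NaN and counts the gaps per variable. The column names and all sample values are hypothetical and are not taken from the plant data.

```python
import math

# Hypothetical daily log rows: flow rate, temperature, pressure, dew point.
# Blank cells mark samples that were not logged (all values are invented).
RAW_ROWS = [
    "101.2,35.1,68.0,-71.5",
    "100.8,,68.2,-71.3",
    ",,,",
    "99.7,34.8,67.9,",
]
COLUMNS = ["flow_rate", "temperature", "pressure", "dew_point"]


def parse_row(row):
    """Convert one comma-separated row to floats, mapping blanks to NaN."""
    return [float(cell) if cell.strip() else math.nan for cell in row.split(",")]


def count_missing(rows):
    """Return the number of NaN entries in each column."""
    counts = {name: 0 for name in COLUMNS}
    for values in rows:
        for name, value in zip(COLUMNS, values):
            if math.isnan(value):
                counts[name] += 1
    return counts


data = [parse_row(row) for row in RAW_ROWS]
missing = count_missing(data)
```

Every row keeps its four columns, so the structure of the record is unchanged and only the missing entries are flagged.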

Data Frequency Response Plots
The spa function was used to estimate the frequency response with fixed frequency resolution using spectral analysis, and the result was displayed on a Bode plot. The algorithm computes the Fourier transforms of the covariance and cross-covariance functions, from which it calculates the frequency response function and the output noise spectrum of the system. The Bode plots are presented in Figure 2. The frequency response function describes the steady-state response of the dehydrator system to sinusoidal inputs. There are amplitude peaks at certain frequencies for all input-output data combinations, which indicate that the dehydrator may become unsteady at these frequencies. The amplitude peaks at frequencies of about 0.7 rad/day, 0.5 rad/day and 0.4 rad/day suggest possible resonant behaviour (complex poles) for all the input-to-output combinations. Moreover, there is rapid phase roll-off at frequencies above 0.3 rad/day, suggesting the presence of time delays in the data samples. The frequency responses for the other pairs show similar characteristics. The step response and impulse response analyses are not presented due to space constraints.
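The spectral analysis step can be sketched in a few lines of Python. This is an illustrative Blackman-Tukey style estimate, not the toolbox implementation: the covariance estimates are weighted with a Hann lag window, transformed to the frequency domain, and the frequency response is taken as the ratio of the cross spectrum to the input spectrum. The function names, the lag window size and the test signal are assumptions made for the example.

```python
import cmath
import math
import random


def remove_mean(x):
    """Subtract the sample mean so that covariances are estimated."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def cross_covariance(y, u, tau):
    """Estimate R_yu(tau) = (1/N) * sum over t of y(t + tau) * u(t)."""
    n = len(u)
    total = 0.0
    for t in range(n):
        if 0 <= t + tau < n:
            total += y[t + tau] * u[t]
    return total / n


def frequency_response(u, y, max_lag, freqs):
    """Estimate G(e^{jw}) = Phi_yu(w) / Phi_u(w) at the given frequencies.

    freqs are in rad/sample; with one sample per day this is rad/day.
    """
    u = remove_mean(u)
    y = remove_mean(y)
    lags = range(-max_lag, max_lag + 1)
    # Hann lag window, equal to 1 at lag 0 and 0 at lags +/- max_lag.
    window = {tau: 0.5 * (1.0 + math.cos(math.pi * tau / max_lag)) for tau in lags}
    r_uu = {tau: cross_covariance(u, u, tau) for tau in lags}
    r_yu = {tau: cross_covariance(y, u, tau) for tau in lags}
    response = []
    for w in freqs:
        phi_u = sum(window[tau] * r_uu[tau] * cmath.exp(-1j * w * tau) for tau in lags)
        phi_yu = sum(window[tau] * r_yu[tau] * cmath.exp(-1j * w * tau) for tau in lags)
        response.append(phi_yu / phi_u)
    return response


# Synthetic check: a pure gain of 2 should be recovered at every frequency.
random.seed(0)
u_demo = [random.gauss(0.0, 1.0) for _ in range(400)]
y_demo = [2.0 * v for v in u_demo]
freqs_demo = [0.1 * k for k in range(1, 31)]
g_demo = frequency_response(u_demo, y_demo, max_lag=20, freqs=freqs_demo)
```

The magnitude and phase of each entry of g_demo are what a Bode plot displays; a resonance would appear as a peak in the magnitude, and a time delay as a phase that falls steadily with frequency.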

Decisions
After careful examination of the results of the analysis carried out on the data set, a decision was reached not to detrend or filter the data and not to remove the outliers, because the non-linearity found in the data set means that these operations may affect the estimation of the model. Resampling was, however, carried out to aid the recovery of some lost information.

Resampling
This was the only data pre-processing technique carried out on the data sets. The resampling technique uses an anti-aliasing low-pass FIR filter with a sampling interval of 0.08 day (about 2 hours) to interpolate for the missing information about the dynamics of the system. Conversely, it is advisable to decimate data that were sampled at a much faster rate than needed, because such data may contain high-frequency noise outside the frequency range of the system. For the purpose of model estimation and validation, the resampled data was split into two sets as displayed in Table 2.
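The resampling operation can be illustrated with the following Python sketch, which changes the sampling rate by a rational factor p/q using zero-stuffing, a windowed-sinc low-pass FIR filter and decimation. Going from a 1-day to a 0.08-day interval corresponds to p/q = 25/2. The filter design (Hamming window, half-width of 10 input samples) and the function names are assumptions for the example and are not the settings used in the study.

```python
import math


def lowpass_fir(p, q, half_width=10):
    """Hamming-windowed sinc low-pass filter for resampling by p/q.

    The cutoff is the lower of the two Nyquist limits, which suppresses
    both interpolation images and aliasing.
    """
    rate = max(p, q)
    half = half_width * rate
    taps = []
    for k in range(2 * half + 1):
        x = (k - half) / rate
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        hamming = 0.54 - 0.46 * math.cos(2.0 * math.pi * k / (2 * half))
        taps.append(sinc * hamming)
    gain = p / sum(taps)  # compensates the amplitude lost by zero-stuffing
    return [gain * h for h in taps], half


def resample(x, p, q):
    """Resample x by the rational factor p/q (values before and after x are taken as 0)."""
    taps, half = lowpass_fir(p, q)
    n_out = math.ceil(len(x) * p / q)
    out = []
    for j in range(n_out):
        acc = 0.0
        for i, value in enumerate(x):
            # Input sample i sits at index i*p of the zero-stuffed signal,
            # output sample j at index j*q; the filter is centred at `half`.
            k = j * q - i * p + half
            if 0 <= k < len(taps):
                acc += value * taps[k]
        out.append(acc)
    return out


# Example: 40 daily samples of a constant signal resampled to a 0.08-day interval.
daily = [5.0] * 40
two_hourly = resample(daily, p=25, q=2)
```

Away from the ends of the record, where the filter no longer sees the implicit zeros, the constant level is reproduced; the first and last few days show edge effects.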

Modelling Structure
The following model structures were utilized for model cross-validation [5]:
i. Nonlinear Auto-Regressive with exogenous inputs (NARX) model. The structure of this model is written as:
y(t) = f(y(t-1), ..., y(t-na), u(t-nk), ..., u(t-nk-nb+1))
ii. Auto-Regressive (AR) model
vii. State-Space model
Where: f is a function that depends on a known number of previous inputs u and outputs y, na is the number of past output terms used to predict the current output, nb is the number of past input terms used to predict the current output and nk is the delay from the input to the output, specified as the number of samples.
q is the time shift operator, dependent on the number of delays in the data samples, and the term e(t) models the noise sequence or disturbance inherent in the system.
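The role of the orders and the delay in the ARX-type structures can be shown with a short Python sketch that assembles the vector of past outputs and past inputs used to predict the current output. The names na, nb and nk follow the usual System Identification Toolbox convention and are assumed here; the helper functions and the sample data are hypothetical.

```python
def narx_regressors(y, u, t, na, nb, nk):
    """Return [y(t-1), ..., y(t-na), u(t-nk), ..., u(t-nk-nb+1)]."""
    oldest = max(na, nk + nb - 1)
    if t < oldest:
        raise ValueError("not enough past samples at time index t")
    past_outputs = [y[t - k] for k in range(1, na + 1)]
    past_inputs = [u[t - nk - k] for k in range(nb)]
    return past_outputs + past_inputs


def linear_arx_output(theta, regressors):
    """Linear special case: f is a weighted sum of the regressors."""
    return sum(w * r for w, r in zip(theta, regressors))


# Hypothetical single-input data, used only to show the indexing.
y_demo = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
u_demo = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
phi = narx_regressors(y_demo, u_demo, t=5, na=2, nb=2, nk=1)
```

A nonlinear ARX model replaces the weighted sum with a nonlinear function of the same regressor vector, while a linear ARX model uses the weighted sum directly.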
The model evaluation criteria utilized were Fitness (FIT), Loss Function (V), and Akaike's Final Prediction Error (FPE). Model validation was carried out by examining the model-output plots to see how well the models' outputs match the measured output in the validation data set. Therefore, any of the following model structures can be used to model the dehydrator bed dynamics: nonlinear ARX, linear ARX, state-space, Box-Jenkins and linear ARMAX. Table 8 summarizes the evaluation criteria values for all the model structures used in modeling the dehydrator bed. It can be seen from Table 8 that the nonlinear and linear ARX model structures yield the same results in their ability to capture the dehydrator bed dynamics. However, the linear ARMAX model performs better than the other models, with the highest fitness of 99.70% for estimation and 99.78% for validation, the lowest loss function of 0.0047564 and a final prediction error of 0.0047890. This confirms that the linear ARMAX model captured the dynamics of the dehydrator bed with the greatest accuracy, given the data sets used in this research, and contradicts the suggestion offered by the advice command.
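For reference, the three criteria can be computed as in the Python sketch below. The formulas are the definitions commonly associated with the System Identification Toolbox (normalized root-mean-square fit, mean squared prediction error and Akaike's FPE with d estimated parameters and N samples); the paper does not state them explicitly, so they are assumptions here, and the function names are hypothetical.

```python
import math


def fit_percent(y, y_hat):
    """FIT = 100 * (1 - ||y - y_hat|| / ||y - mean(y)||)."""
    mean_y = sum(y) / len(y)
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)))
    spread = math.sqrt(sum((a - mean_y) ** 2 for a in y))
    return 100.0 * (1.0 - err / spread)


def loss_function(y, y_hat):
    """V = mean squared prediction error for a single-output model."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def final_prediction_error(v, d, n):
    """Akaike's FPE = V * (1 + d/N) / (1 - d/N)."""
    return v * (1.0 + d / n) / (1.0 - d / n)


# Small made-up example.
measured = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
v_demo = loss_function(measured, predicted)
fpe_demo = final_prediction_error(v_demo, d=1, n=3)
```

A fit of 100% means the model output reproduces the measured output exactly, while a fit of 0% means the model does no better than the mean of the measured output. The FPE penalizes the loss function for the number of estimated parameters, so it approaches V as N grows relative to d.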

Selection and Presentation of Dehydrator Bed Model
Model estimation and validation carried out on the different model structures (NARX, ARX, state-space, BJ and ARMAX) proved successful, with no failure on the validation data set. Hence, the best estimated model was selected from among the five resulting models. From Table 8, the linear ARMAX model produces the best combination of the model validity criteria for both estimation and validation data. Therefore, the linear ARMAX model was selected because it most accurately captured the dehydrator bed dynamics. Table 9 highlights the model parameters. The expression generated by MATLAB for the molecular sieve dehydrator bed is the discrete-time IDPOLY model:
A(q)y(t) = B(q)u(t) + C(q)e(t)
Where:
A(q) = 1 - 2.511 q^-1 + 1.536 q^-2 - 0.1147 q^-3 + 1.475 q^-4 - 2.172 q^-5 + 0.7871 q^-6
B1(q) = -0.01352 q^-10 + 0.01252 q^-11 + 0.01147 q^-12 - 0.01085 q^-13
B2(q) = 0.0001241 q^-10 - 0.0001583 q^-11
B3(q) = -1.653e-005 q^-1
C(q) = 1 + 0.6052 q^-1 - 1.089 q^-2 - 1.088 q^-3 + 0.6042 q^-4 + 0.9976 q^-5
Rearranging the expressions above mathematically, the Auto-Regressive Moving Average with exogenous inputs model structure is expressed as the discrete-time polynomial model of equation 10:
A(q)y(t) = B1(q)u1(t) + B2(q)u2(t) + B3(q)u3(t) + C(q)e(t)    (10)
Where y(t) is the output (the dehydrator dew point) at time t, u1 is the first input (feed gas temperature), u2 is the second input (feed gas flow rate), u3 is the third input (feed gas pressure) and q is the time shift operator. The estimated linear ARMAX model is intended only for predicting future values of the dehydrator dew point from past observed inputs and outputs.
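To show how the reported polynomials can be used for prediction, the Python sketch below implements a one-step-ahead predictor. It assumes the standard ARMAX form A(q)y(t) = B1(q)u1(t) + B2(q)u2(t) + B3(q)u3(t) + C(q)e(t), replaces the unknown noise e(t) with the past prediction errors, and takes all signals as zero before the first sample. The coefficients are those reported above; the function name, the data layout and the zero initial conditions are assumptions.

```python
# Coefficients of A(q) and C(q), indexed by the power of q^-1.
A = [1.0, -2.511, 1.536, -0.1147, 1.475, -2.172, 0.7871]
C = [1.0, 0.6052, -1.089, -1.088, 0.6042, 0.9976]
# B polynomials as {lag: coefficient}, one per input
# (temperature, flow rate, pressure).
B = [
    {10: -0.01352, 11: 0.01252, 12: 0.01147, 13: -0.01085},
    {10: 0.0001241, 11: -0.0001583},
    {1: -1.653e-05},
]


def predict_one_step(y, inputs):
    """Return the one-step-ahead predictions of y.

    y_hat(t) = (1 - A(q)) y(t) + sum_i B_i(q) u_i(t) + (C(q) - 1) eps(t),
    where eps(t) = y(t) - y_hat(t) is the past prediction error.
    """
    n = len(y)
    y_hat = [0.0] * n
    eps = [0.0] * n
    for t in range(n):
        acc = 0.0
        for k in range(1, len(A)):
            if t - k >= 0:
                acc -= A[k] * y[t - k]
        for poly, u in zip(B, inputs):
            for lag, coeff in poly.items():
                if t - lag >= 0:
                    acc += coeff * u[t - lag]
        for k in range(1, len(C)):
            if t - k >= 0:
                acc += C[k] * eps[t - k]
        y_hat[t] = acc
        eps[t] = y[t] - acc
    return y_hat


# Made-up signals, used only to exercise the recursion.
zeros = [0.0] * 5
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
y_hat_output_impulse = predict_one_step(impulse, [zeros, zeros, zeros])
y_hat_pressure_impulse = predict_one_step(zeros, [zeros, zeros, impulse])
```

Because each prediction uses the measured outputs up to the previous sample, this recursion corresponds to prediction rather than to simulation from the inputs alone.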

Conclusion
An expression for the dew point of natural gas exiting the molecular sieve dehydration bed, with feed gas temperature, feed gas flow rate and feed gas pressure as the three input variables, has been successfully obtained using data-driven modelling with the aid of the System Identification (SID) toolbox available in MATLAB.
The modelling process commenced with the Nonlinear Auto-Regressive with eXogenous input (NARX) model, as suggested by the advice command, but on comparison with the linear models, the Autoregressive Moving Average with exogenous inputs (ARMAX) model emerged as the model with the highest fitness of 99.70% for estimation and 99.78% for validation, the lowest loss function of 0.0047564 and a final prediction error of 0.0047890.
The appropriateness of the models for simulation and prediction purposes was also investigated. The linear and nonlinear models were found to exhibit a low level of accuracy in simulation, that is, in reproducing the dew point of the dehydrator bed from sets of measured input values alone. In contrast, the resulting ARMAX model proved suitable for predicting the plant output with a high level of accuracy.