Computer Simulation-Based Designs for Industrial Engineering Experiments

: Computer simulations have been receiving a lot of attention in industrial engineering as the rapid growth in computer power and numerical techniques. In contrast to physical experiments which are usually carried out in factories, laboratories or fields, computer simulations can save considerable time and cost. From the statistical perspective, the current research work about computer simulations is mostly focusing on modeling the relationship between the output variable from the simulator and the input variables set by the experimenter. However, an experimental design with careful selection of the values of the input variables can significantly affect the quality of the statistical model. Specifically, prediction on the edge area of the experimental domain, which is extremely critical for an industrial engineering experiment often suffers from inadequate data information because the design points usually do not well cover the edge area of the experimental domain. To address this issue, a new type of design, called semi-LHD is proposed in this paper. Such a design type has the following appealing properties: (1) it encompasses a Latin hypercube design as a sub-design so that the design points are uniformly scattered over the interior of the design region; and (2) it possesses some extra marginal design points which are close to the edge so that the prediction accuracy on the edge area of the experimental domain is fully taken into account. Detailed algorithms for finding the marginal design points and how to construct the proposed semi-LHDs are given. Numerical comparisons between the proposed semi-LHDs with the commonly-used Latin hypercube designs, in terms of prediction accuracy, are illustrated through simulation studies. It turns out that the proposed semi-LHDs yield desirable prediction accuracy not only in the interior but also on the edge area of the experimental domain, so they are recommended as the experimental designs for simulation-based industrial engineering experiments.


Introduction
Computer experiments are becoming widely used in industrial engineering where computational methods are used to simulate the real-world phenomena (see, for example, references [1][2][3][4][5][6][7]). Most computer experiments for industrial engineering are unique in that the response has no randomerror component. That is, replicates of the same inputs to the computer code will yield the same response. Since a computer experiment is typically treated as a black-box and is time demanding, the experimental design should be chosen judiciously and explore the experimental domain as thoroughly as possible. To this end, Mckay, Beckman and Conover [8] first proposed the Latin hypercube designs (LHDs) which possess the property that when projected onto any one dimension, they achieve maximum stratification. LHDs are the most popular designs for computer-based industrial engineering experiments and have enjoyed lots of development (see, for example, references [6,7,[9][10][11][12]).
The issue of building prediction models for computer experiments has drawn significant attention of researchers. Currin, Mitchell, Morris and Ylvisaker [13] applied Bayesian prediction to analyze the outputs of computer experiments; Joseph, Hung and Sudjianto [14] proposed blind kriging for developing surrogate models; Kennedy and O'Hagan [15] gave a method for predicting the output from a computer experiment when fast approximations are available. For a detailed treatment of this topic, please refer to references [6,7]. Most of the existing work are paying attentions on modeling methods, while a design with careful planning can significantly affect the prediction accuracy of the model. When predicting the response at an unobserved site located in the interior of the design region, information is often sufficiently received from the design points around it. However, when the unobserved site is located on the edge of the design region, inadequate information may be received because the relatively small number of design points around it. As a result, the prediction accuracy on the edge of the design region may be poor, especially for industrial engineering experiments. The engine block and head joint sealing assembly experiment, the robot arm experiment and the integrated circuit experiment described by [1] are some good examples of this and very few experimental designs have been proposed to address this issue.
Motivated by the aforementioned issue, a new design type called semi-LHD is proposed for computer-based industrial engineering experiments which pays extra attention to the edge area of the experimental domain. A semi-LHD adds extra points to the marginal points of an LHD so that the proposed design inherits the space-filling property of the LHD. Furthermore, the proposed semi-LHD helps to increase the prediction accuracy on the edge while not deteriorating the overall prediction accuracy. Methods for finding marginal points and choosing the added points are systematically proposed, which are the central contributions of this work.
The remainder of this paper is organized as follows. Section 2 provides useful definitions and notations. Section 3 presents the design construction method and gives some numerical examples that simulate industrial engineering experiments. Concluding remarks in Section 4 close this paper.

Definitions and Notations
In this section, some useful definitions and notation will be given. A Latin hypercube design (LHD) is a n p × matrix, denoted by LHD (n, p) in which the number of levels in each column equals to the sample size and the levels are equally spaced.
and ij u is a random number drawn from U(0, 1) and ij a and ij u are independent, i=1,...n, j=1,...p. Then D is a random LHD.
One primary goal of a computer experiment is to establish an emulator (also called a metamodel) which can capture the relationship between the inputs and the output of the computer code. In this paper the most commonly-used model, the Gaussian process model (see references [6,7,16]) is considered. Given an input [0,1] p x ∈ , the response ( ) y x is modeled as r ⋅ Θ is the correlation function defined by the parameter Θ . The most commonly-used correlation function is the Gaussian correlation function, which is given by where ' i s θ are required to be positive. Throughout this paper, this correlation function is used since it is infinitely differentiable which is favorable in applications for industrial engineering [3].
A popular method for analyzing the model in (2) is through kriging [7]. Given a design of n inputs , according to the Gaussian process assumption, y follows the following multivariate normal distribution The negative log-likelihood of y is proportional to where | | A denotes the determinant of matrix A . Following the maximum likelihood approach, the maximum likelihood estimates (MLEs) of β and 2 σ are given by respectively. It is easy to see that β ∧ , 2 σ ∧ and R are functions of the parameter Θ . Plug the MLEs of β and 2 σ into formula (4), the MLE of Θ is then given by There are several optimization algorithms available to solve the problem in (7), see Fang, Li and Sudjianto [1] for details. Plug ∧ Θ back into Eqs. (5) and (6), the best linear unbiased predictor (BLUP) of the response y at an untried point * x is given by It is easy to verify that predictor (8) interpolates the response at any observed point i x , which is a desired property for deterministic computer experiments.

The Design Construction Method
In this section an easy-to-implement method is presented to construct a new type of design called semi Latin hypercube design (semi-LHD). This design type is useful for predicting the responses at untried points which are specially located on or close to the edge of the experimental region. As for the points far away from the edge, the proposed design can also yield good prediction accuracy. The main idea of the proposed design is to add points to an existing LHD, thus the resulting design contains an LHD as a subdesign.
Suppose an n -run semi-LHD is to be constructed, the number of runs of the built-in LHD 0 ( ) n n < is determined according to the needs of the experiment. Determine the parameter m which is the number of "marginal points" of the built-in LHD. Given n , 0 n and m , the general framework of the proposed design construction method is presented as follows.
Display 1 (Framework of the construction of semi-LHDs).
Without loss of generality, let x are rows (points) of 1 D .
Step 2. Step 4. Combine the ml points added in the previous step The steps in Display 1 are given in a general manner, hereafter the details surrounding the framework are presented. As alluded to earlier, 0 n is determined according to the experimental needs. If n is not large, say 0 50, / 2 2 / 3 n n n n ≤ ≤ ≤ is recommended as a standard choice so that at least a small number of design points can be added; if n is large, 0 n can be a relative large portion of n . In Step 1 of Display 1, a random LHD or a maximin distance LHD with sample size 0 n can be used. Other optimal LHDs could also be considered at this step. For simplicity, a random LHD is used throughout this paper. In Step 2, the average distance i a from i x of 1 D to the remaining points in 1 D is calculated by d . An algorithm for how to add the points is given as follows.
Step x < α is a random sample drawn from standard normal distribution and γ is a random sample drawn from uniform distribution on (0,1).
Step 2. Step 1 of Algorithm 1 defines the scope of the neighborhood which is neither too close to nor too far away from the marginal points and chooses points from this neighborhood. Fraction 0 2 / n controls the degree of farness which could be altered to other values according to the practical needs and 0.5 0.5γ + controls the degree of closeness which could also be changed. Since the coordinates of the points added according to Step 1 of Algorithm 1 may exceeds the scope of the design region, Step 2 is needed to deal with this issue.
Step 3 of Algorithm 1 assesses the combination of the added and existing points, i.e. * D , so they are not too close from each other. If the points in * D are too close from each other, during the analysis of the experiment some computational issues, such as the singularity problem, may be encountered. Moreover, Algorithm 1 still maintains the space-filling property of the resulting design.
After implementing Algorithm 1, Step 3 of Display 1 is completed and one needs to go on to Step 4 of Display 1. In this step, since ml may be larger than In practice, the behavior of an industrial engineering experiment on (or near) the edge of the experimental domain is usually of great importance to practitioners since it may be unstable or variable to influence the statistical inference. The proposed designs not only spread points over the experimental region but also put effort on the edge area, which benefits predictions on both the edge area and interior of the experimental region.
An example illustrating the construction of semi-LHDs is given below.
Example 1. Suppose one would like to construct a semi-LHD (40, 2;24). Firstly, an LHD (24, 2), denoted by 1 D is generated as the built-in LHD. Next, eight marginal points are found from 1 D . Following the steps of Algorithm 1, two extra points are added to each of the marginal points. Combine all the added points with points in 1 D to obtain the destined design which is denoted by (1) D . Similarly, a semi-LHD (40, 2; 30) is constructed which is denoted by (2) D . To show the difference between semi-LHDs and LHDs, an LHD (40, 2), denoted by (3) D , is constructed. The bivariate projections for (1) D , (2) D and (3) D are visualized in Figure  1. It is not difficult to observe that (1) D has more points near to the edge than (2) D since the built-in LHD of (1) D has less samples than that of (2) D while the sample sizes of (1) D and (2) D are the same. As alluded to earlier, it is up to the practical needs to determine the sample size of the built-in LHD. On the other hand, the LHD spreads design points more evenly over the design region. It worthwhile to note that Steps 1-4 of Algorithm 1 may suffer from an endless loop since the condition 0 / 3 a a ≥ in Step 4 of Algorithm 1 may be hard to achieve. It is thus recommended that the number of circulation should be preset.

Simulation Studies
Two simulative examples are given in this section to illustrate the usefulness of the proposed design. The prediction performances between random LHDs and semi-LHDs are compared. As for the criteria assessing the prediction performance, the leave-one-out cross-validation error (CVE) and the empirical mean squared prediction error In the simulative examples, the following steps are carried out to assess the prediction performance.
Step 1. Create two data sets A and B . Let A be the collection of 1 N random points drawn from [0,1] p and B be the collection of 2 N points drawn from the area near the edge of [0,1] p .
Step 2. Let 1,..., r t N = . Given t , this step proceeds as follows: (1) Construct a semi-LHD ( 0 , ; n p n ), denoted by 1t X , and conduct the computer experiment based on 1t X to obtain the response vector 1t y . Fit model (2) based on 1 1 { , } t t X y . Calculate CVE in (10) and denote the result by 1t c . Calculate MSPE in (11) based on A and B and denote the results by 1t a and 1t b , respectively. (2) Construct an LHD ( , n p ), denoted by 2t X , and conduct the computer experiment based on 2t X to obtain the response vector 2t y . Fit model (2) based on X y . Calculate CVE in (10) and denote the result by 2t c . Calculate MSPE in (11) based on A and B and denote the results by 2t a and 2t b , respectively.
Step 3. Let Example 2. Consider a function from Currin, Mitchell, Morris and Ylvisaker (1991) which is given by 3 The shape of (12) is visualized in Figure 2. This function is treated as a deterministic computer model for this example.
Once an experimental design is arranged, the response vector can be obtained from Eq. (12). As described at the beginning of this section, two testing data sets A and B are created where the former are collections of 40 points of a random  Tables 1 and 2.
From Table 1 it is easy to see that when the testing data set is A , the prediction performance of semi-LHDs compares favorably to that of LHDs. On the other hand, when the testing data set is B , the mean and standard deviation of MSPE of semi-LHDs are considerably smaller than those of LHDs. This indicates that the proposed designs benefit the prediction capability on the area near to the edge while still behave well over the entire design region. Note that both of these design types have lower prediction accuracies on set B than set A . This is expected because predicting on the edge of the design region usually has more uncertainty, especially for industrial engineering experiments. Table 2 displays the CVE values from which the advantage of semi-LHDs over LHDs is also obvious.
Example 3. In this example, the following function     Tables 3 and 4, similar conclusions as in Example 2 can be drawn. That is, Semi-LHDs benefit the prediction accuracy on the edge area while still behave well over the entire design region.

Conclusion
With the advent of computing technology and numerical methods, industrial engineers frequently use computer simulations to study actual or theoretical physical systems. Typically, the simulation model is resource-intensive in terms of model preparation, computation, and output analysis. As a result, the availability of a "cheap-to-compute" statistical model as a surrogate to the original complex simulation model is particularly useful. To establish a high quality statistical model, choosing a good set of design points is an important issue. In this paper, a novel type of experimental design, namely semi-LHD is proposed and applied to simulation-based industrial engineering experiments. Compared to the most commonly-used Latin hypercube design [8], the proposed design has advantage in helping making predictions on the edge of the design region where the behavior of the response often exhibits substantial uncertainty for an industrial engineering process. An explicit design construction algorithm is given and simulation studies confirm that the proposed semi-LHDs are desirable for computer simulation-based industrial engineering experiments because they can yield good prediction accuracy not only in the interior but also on the edge area of the experimental domain.