Estimation of Parameters in the SIR Epidemic Model Using Particle Swarm Optimization

Susceptible, Infected and Resistant (SIR) models are used to observe the spread of infection from infected populations into healthy populations. Stability analysis of the model is done using the Routh-Hurwitz criterion, the basic reproduction number or Lyapunov stability. Stability analysis requires parameter values, and these values are usually assumed. Observed data cannot be used directly to determine the parameter values of the SIR model because the analytic solution of the system of nonlinear differential equations cannot be determined. In this article, we determine the parameters of the exponential growth model, the logistic model and the SIR model using the Particle Swarm Optimization (PSO) algorithm. The SIR model is solved numerically using the Euler method based on the parameter values determined by PSO. The simulation results show that the PSO algorithm performs well in determining the parameters of the three models compared to analytical methods and Gauss-Newton's method. Based on a hypothesis test on the mean, the relative error obtained from the PSO algorithm in determining the parameters is less than 3% at a significance level of 1%.


Introduction
Mathematical models can be interpreted as mathematical equations that explain behavior in the real world. These equations are formed by translating events in the community into variables and parameters. A widely used class of mathematical models is that of differential equations. For example, models of motion, whether of a spring, a pendulum or an aircraft maneuver, are expressed as systems of ordinary differential equations. The disease spread model, better known as the Susceptible, Infected, Resistant (SIR) model, is also expressed as a system of ordinary differential equations. The SIR model is very widely used to analyze the spread of diseases in the human environment, such as the ebola virus [1], zika [2], malaria [3] and diabetes [4]. Not only physical diseases but also bad habits can be analyzed with the SIR model. Mu'tamar [5] developed a SIR model to analyze the spread of the habit of consuming alcoholic beverages and implemented optimal control for treatment measures. Beyond the human environment, the SIR model can also be used to analyze the spread of viruses in a computer environment [6]. If the mathematical model is combined with data, mathematics can be an excellent tool for environmental observation and a basis for policy making. Unfortunately, combining data with mathematical models in the form of ordinary differential equations is not easy. The process relies on curve fitting, which so far has only been carried out on functions that have an explicit form. This is difficult for the SIR model because the model cannot be solved analytically, so an explicit solution of the equations cannot be determined.
Kennedy and Eberhart [7] introduced a search method called Particle Swarm Optimization (PSO). The method was developed from the behavior of herd animals such as bees and ants in finding food locations. It does not require complex mathematical tools such as the Jacobian or the Hessian to determine the solution or the maximum of a system of nonlinear equations. Therefore, its usage is very broad, especially in the field of control. Naiborhu et al. [9] use PSO to determine an alternative path when the exact linearization method fails to determine the control of a nonlinear system due to discontinuity. Mu'tamar and Naiborhu [10] combine PSO with fuzzy logic to determine the weighting matrix of the LQR control applied to a track control system. Hasni et al. [11] use PSO to determine the parameters of the GreenHouse climate model and compare it with genetic algorithms. Jalilvand et al. [12] use PSO and modify the random number component using position and PersonalBest ratios to speed up the search for a solution. Chiu et al. [13] apply PSO to determine the parameters of an antenna array so that signal noise can be minimized. Solihin and Akmeliawati [14] use PSO to determine the optimum control parameters applied to the linear form of the inverted pendulum.
It should be emphasized that, in the earlier studies, the parameters determined using PSO are parameters of functions that are explicitly available. The GreenHouse model in [11] uses a mathematical model for which explicit solutions are available, so the fitness value can be calculated easily. In this study, in contrast, the parameters to be determined belong to a mathematical model whose solution is not available by analytical methods. Nevertheless, the results of determining the parameters of the SIR epidemic model using PSO indicate that the PSO algorithm succeeds in recovering the original parameters with very low error rates. Based on a hypothesis test on the mean, the relative error obtained from the PSO algorithm in determining the parameters is less than 3% at a significance level of 1%.

Material and Methods
The method used in this research is a literature study that develops previous research. Because parameter estimation in dynamic systems with this method has not been done in previous studies, the research is first carried out on simple models, namely the exponential and logistic growth models. The method is then applied to the Susceptible, Infected and Resistant (SIR) epidemic model. The work steps in this research are (1) determining the data to be used as work material, whose characteristics fit the exponential and logistic models, (2) determining the analytical solutions of the exponential and logistic models, both of which involve two parameters of unknown value, (3) determining the numerical solutions of the logistic model and the SIR epidemic model, where the logistic model is also solved numerically as a comparative test of the success of the PSO method in determining parameters, (4) determining the parameter values of each model, where the PSO method is used for all models, the linear curve fitting method for the exponential and logistic models, and Gauss-Newton's method only for the exponential model, (5) comparing the data with the function values based on the generated parameter values, and (6) specifically for the SIR epidemic model, obtaining the data by simulation with predetermined parameter values. A hypothesis test is then performed to see whether the resulting parameters give a small error value compared to the proposed hypothesis value. Some definitions and theories related to this research are presented in the following discussion.

Linear Function Curve Fitting
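Given $n$ pairs of data $(x_i, y_i)$, $i = 1, \dots, n$, select the fitting function for the data, $y = a_1 x + a_0$, with $a_0, a_1 \in \mathbb{R}$ and $a_1 \neq 0$. Define the error between the data and the approximating function as the sum of squared residuals, $E(a_0, a_1) = \sum_{i=1}^{n} \left( y_i - (a_1 x_i + a_0) \right)^2$.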
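As an illustration of linear curve fitting, the following minimal sketch fits a straight line $y = a_1 x + a_0$ by least squares, solving the 2×2 normal equations directly. The function name `fit_line` and the sample data are hypothetical and chosen only for illustration; this is not the paper's Algorithm 1.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a1*x + a0 via the 2x2 normal equations."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # zero only when all x values coincide
    if det == 0:
        raise ValueError("x values must not all be equal")
    a1 = (n * sxy - sx * sy) / det    # slope
    a0 = (sy * sxx - sx * sxy) / det  # intercept
    return a1, a0

# Hypothetical data lying exactly on the line y = 2x + 1
a1, a0 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

Because the sample points lie exactly on a line, the fit recovers its slope and intercept; with noisy data the same formulas return the least-squares estimates.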

Particle Swarm Optimization (PSO)
Particle Swarm Optimization (PSO) is a heuristic method, developed by Kennedy and Eberhart [7], that determines the solution of a goal function based on the behavior of ant or bee herds. The goal function solution is the swarm position, calculated by the equation

$x_{i+1} = x_i + V_{i+1}$ (6)

where $V$ is the speed of swarm motion, expressed in the equation

$V_{i+1} = V_i + c_1 r_1 (Pb - x_i) + c_2 r_2 (Gb - x_i)$ (7)

The definitions of the symbols in equations (6) and (7) are given in Table 1.

Table 1. The symbols in equations (6) and (7).
Symbol        Description
$c_1, c_2$    Individual and social cognitive coefficients of the swarm, numbers that express the swarm's ability to determine solutions and its ability to develop with the herd. The best value is determined based on research.
$r_1, r_2$    Computer-generated random numbers
Pb            PersonalBest, the best solution of the swarm position
Gb            GlobalBest, the best solution for all swarms from all iterations
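The position and velocity updates of equations (6) and (7) can be sketched for a single particle as follows. This assumes the basic Kennedy-Eberhart update without an inertia weight. The function name `pso_step` and the default coefficients of 1.8 (the value used later in the simulations) are illustrative choices rather than the authors' implementation.

```python
import random

def pso_step(x, v, pbest, gbest, c1=1.8, c2=1.8, rng=random):
    """One velocity and position update for a single particle.

    x, v, pbest and gbest are lists of floats with one entry per parameter.
    """
    new_x, new_v = [], []
    for xi, vi, pb, gb in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()  # fresh random numbers in [0, 1)
        vi_next = vi + c1 * r1 * (pb - xi) + c2 * r2 * (gb - xi)
        new_v.append(vi_next)
        new_x.append(xi + vi_next)
    return new_x, new_v

# A particle resting at both its personal best and the global best does not move.
x_next, v_next = pso_step([0.3, 0.7], [0.0, 0.0], [0.3, 0.7], [0.3, 0.7])
```

The usage line shows the fixed-point property of the update: when a particle already sits at both best positions with zero velocity, both attraction terms vanish.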

The Susceptible, Infected and Resistant (SIR) Epidemic Model
The SIR epidemic model is a mathematical model in the form of a system of ordinary differential equations that describes the spread of disease from infected individuals. The SIR epidemic model is expressed in equation (8). The variables and parameters in equation (8) are described in Table 2.
Table 2. The parameters and variables in equation (8).
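For reference, a commonly used form of the SIR system is the Kermack-McKendrick model shown below. Taking $\alpha$ as the infection rate and $\beta$ as the recovery rate is an assumption made here for illustration only; the roles of the parameters in equation (8) are those defined in Table 2.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\alpha S I, \\
\frac{dI}{dt} &= \alpha S I - \beta I, \\
\frac{dR}{dt} &= \beta I.
\end{aligned}
```

In this form the right-hand sides sum to zero, so the total population $S + I + R$ stays constant over time.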

Gauss-Newton's Method
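Given a system of nonlinear equations, $y = f(x)$. The value $x_e$ is a root of the system of nonlinear equations if it satisfies $f(x_e) = 0$. To determine the solution of the system of nonlinear equations, Gauss-Newton's method can be used, where the iterate $x_{k+1}$ is given by

$x_{k+1} = x_k - \left( J^T J \right)^{-1} J^T f(x_k), \quad k = 0, 1, 2, \dots$

where $x_0$ is the initial guess and $J$ is the Jacobian matrix of the system of nonlinear equations evaluated at $x_k$, i.e. $J = \left[ \partial f_i / \partial x_j \right]$.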
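A minimal sketch of a Gauss-Newton iteration for two unknown parameters is given below, applied to residuals of an exponential model $N_0 e^{\lambda t}$. The function name `gauss_newton_2p`, the synthetic data and the initial guess are hypothetical. As with any Gauss-Newton iteration, convergence depends on the initial guess.

```python
import math

def gauss_newton_2p(residuals, jacobian, p0, iters=20):
    """Gauss-Newton for two parameters: p <- p - (J^T J)^{-1} J^T r."""
    p1, p2 = p0
    for _ in range(iters):
        r = residuals(p1, p2)
        J = jacobian(p1, p2)  # list of rows [dr/dp1, dr/dp2]
        a11 = sum(j[0] * j[0] for j in J)
        a12 = sum(j[0] * j[1] for j in J)
        a22 = sum(j[1] * j[1] for j in J)
        g1 = sum(j[0] * ri for j, ri in zip(J, r))
        g2 = sum(j[1] * ri for j, ri in zip(J, r))
        det = a11 * a22 - a12 * a12
        if det == 0:
            raise ZeroDivisionError("J^T J is singular")
        # Solve the 2x2 system (J^T J) d = J^T r by Cramer's rule
        d1 = (a22 * g1 - a12 * g2) / det
        d2 = (a11 * g2 - a12 * g1) / det
        p1, p2 = p1 - d1, p2 - d2
    return p1, p2

# Synthetic data from N(t) = 2 * exp(0.5 * t) (hypothetical values)
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
data = [2.0 * math.exp(0.5 * t) for t in ts]
res = lambda n0, lam: [n0 * math.exp(lam * t) - d for t, d in zip(ts, data)]
jac = lambda n0, lam: [[math.exp(lam * t), n0 * t * math.exp(lam * t)] for t in ts]
n0_fit, lam_fit = gauss_newton_2p(res, jac, (1.5, 0.4))
```

The need to supply the Jacobian by hand, visible in `jac`, is the reason this approach becomes cumbersome for more complex models.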

Euler Method for Ordinary Differential Equation Systems
Given a first-order autonomous system of ordinary differential equations, $y' = f(y)$, with initial value $y(t_0) = y_0$ on the interval $I = [t_0, t_n]$. The Euler method for the numerical solution of the system is given by

$y_{k+1} = y_k + h f(y_k), \quad k = 0, 1, \dots, n-1$

where $h$ is the width of the partition of $I$, which is $h = (t_n - t_0)/n$.
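The Euler method for an autonomous system can be sketched in a few lines. The function name `euler` and the test equation $y' = -y$ are illustrative assumptions; the state is stored as a list so that the same routine works for systems.

```python
import math

def euler(f, y0, t0, t_end, n):
    """Explicit Euler for the autonomous system y' = f(y) with n uniform steps."""
    h = (t_end - t0) / n  # width of each subinterval
    ys = [list(y0)]
    for _ in range(n):
        y = ys[-1]
        dy = f(y)
        ys.append([yi + h * di for yi, di in zip(y, dy)])
    return ys

# Test equation y' = -y with y(0) = 1, whose exact solution is exp(-t)
path = euler(lambda y: [-y[0]], [1.0], 0.0, 1.0, 1000)
approx = path[-1][0]
error = abs(approx - math.exp(-1.0))
```

The Euler method is first-order accurate, so halving `h` roughly halves the error at a fixed end time.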

Result and Discussion
This section discusses the determination of the parameters of the exponential growth model, the logistic model and the SIR epidemic model. The parameter determination methods used are the exact method, the numerical Gauss-Newton's method and the PSO algorithm. For the logistic model, parameter determination is carried out using the exact method and the PSO algorithm only, because Gauss-Newton's method requires the partial derivative with respect to each variable, which makes the equations involved in the logistic model very complex. For the SIR epidemic model, the only parameter determination method used is the PSO algorithm because there is no analytical method to solve the SIR system of differential equations.

Determination of Exponential Model Parameters
The exponential growth model is expressed in the form of a first-order linear ordinary differential equation, i.e.

$\frac{dN(t)}{dt} = \lambda N(t)$ (11)

where $\lambda$ is a proportionality parameter whose value is positive for growth and negative for decay. The analytical solution of the exponential growth model, obtained by separation of variables, is given by

$N(t) = N_0 e^{\lambda (t - t_0)}$ (12)

where $N_0$ is the initial value, the value of $N(t)$ when $t = t_0$. To determine the parameters in equation (12), the equation needs to be expressed as a linear equation. Taking the natural logarithm of both sides of equation (12) results in

$\ln N(t) = \ln N_0 + \lambda (t - t_0)$ (13)
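To make the link with linear curve fitting explicit, the log-transformed solution can be matched term by term against a straight line. The sketch below assumes the solution has the form $N(t) = N_0 e^{\lambda (t - t_0)}$ and writes the line as $y = a_1 x + a_0$; the symbols $x$, $y$, $a_0$ and $a_1$ are used here only for illustration.

```latex
\ln N(t) = \underbrace{\lambda}_{a_1}\,\underbrace{(t - t_0)}_{x} + \underbrace{\ln N_0}_{a_0}
\quad\Longrightarrow\quad
\lambda = a_1, \qquad N_0 = e^{a_0}.
```

Fitting a line to the pairs $(t_i - t_0, \ln N_i)$ therefore yields both parameters of the exponential model. Note that the fit minimizes the error in $\ln N$ rather than in $N$ itself.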

Determination of Logistics Model Parameters
The logistic growth model improves on the exponential model by replacing the constant of proportionality with a linear function with a negative slope. The logistic growth model is given by

$\frac{dN(t)}{dt} = \left( a - b N(t) \right) N(t)$ (14)

where $a$ and $b$ are positive parameters that state the proportion of natural growth and of decline due to population saturation, respectively. Using the separation of variables method, the analytic solution of equation (14) is obtained, i.e.

$N(t) = \frac{N_{\max}}{1 + \left( \frac{N_{\max}}{N_0} - 1 \right) e^{-a (t - t_0)}}, \quad N_{\max} = \frac{a}{b}$
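The closed-form logistic solution can be checked numerically. The sketch below assumes the model has the form $dN/dt = (a - bN)N$ with saturation level $N_{\max} = a/b$. The function name `logistic_exact` and the parameter values are hypothetical.

```python
import math

def logistic_exact(t, a, b, n0, t0=0.0):
    """Closed-form solution of dN/dt = (a - b*N)*N with N(t0) = n0."""
    n_max = a / b  # saturation level
    return n_max / (1.0 + (n_max / n0 - 1.0) * math.exp(-a * (t - t0)))

a, b, n0 = 0.5, 0.01, 5.0  # hypothetical parameters, so that N_max = 50

# Compare a central-difference derivative with the right-hand side of the ODE
t, h = 3.0, 1e-6
n_t = logistic_exact(t, a, b, n0)
slope = (logistic_exact(t + h, a, b, n0) - logistic_exact(t - h, a, b, n0)) / (2 * h)
ode_rhs = (a - b * n_t) * n_t
```

The check confirms that the closed form starts at `n0`, satisfies the differential equation, and levels off at `a / b` for large `t`.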

Determination of SIR Model Parameters Using PSO
To determine parameters with PSO, a solution of the SIR epidemic model is needed. Because the SIR epidemic model cannot be solved analytically, it is solved numerically using the Euler method. PSO uses a swarm to find food sources, which in computing correspond to the solution of the problem. The SIR epidemic model has $nV$ parameters, in this case $\alpha$ and $\beta$; each parameter is searched by $nS$ swarms, and the positions are updated for at most $itM$ iterations. The position matrix $x$ of the PSO swarm stores the candidate solutions for every swarm, parameter and iteration. The same is true for $V$, the velocity matrix of the PSO swarm. Each swarm has its own search history and stores its best search result as PersonalBest. The PersonalBest matrix is therefore defined from the search history of each swarm. The best result among all swarms for a given parameter is selected as the reference for the motion of the whole swarm, known as GlobalBest. GlobalBest is the best position over all swarms for each parameter throughout the iterations.

Algorithm 3. Determination of the SIR epidemic model parameters using PSO.
2. Input the PSO parameters $c_1$, $c_2$ and $itM = 1000$.
3. Input the numeric parameter $tolM = 1\mathrm{E}{-5}$.
4. Set the swarm position at the first iteration randomly.
5. Set the swarm speed at the first iteration randomly.
6. Calculate the fitness value based on Algorithm 4.
7. Set PersonalBest, which is the value of the swarm position in the first iteration.
8. Set GlobalBest, the swarm position value that gives the best fitness value.
9. For i = 1 : itM
10. Update the swarm position value with equation (6).
11. Update the swarm speed value with equation (7).
12. Calculate the fitness value of the swarm at the i-th iteration using Algorithm 4.
13. Determine the PersonalBest of each swarm from the first iteration to the i-th iteration.
14. Determine the GlobalBest of the entire swarm from the first iteration to the i-th iteration.
15. Check whether one of the following conditions holds:
    a) The difference between the swarm positions at the i-th and (i-1)-th iterations is less than or equal to tolM.
    b) The difference in the fitness value of all parameters is less than or equal to tolM.
    c) The maximum iteration itM has been reached.
16. If one of the criteria is true, the iteration is stopped.
17. End for.
GlobalBest and PersonalBest are determined based on fitness values. Fitness value is the value of the function to determine the solution. The fitness value in the SIR epidemic model is the absolute difference between the data and numerical solutions generated using the Euler method with parameter data from the PSO.
The parameter determination steps for the SIR epidemic model using PSO are given in Algorithm 3, and the fitness value is determined by the procedure given in Algorithm 4.
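The overall procedure, with PSO proposing parameters and the Euler method producing the solution that is compared against data, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' Algorithms 3 and 4. It assumes the standard SIR form with $\alpha$ as infection rate and $\beta$ as recovery rate, normalized initial conditions, a velocity clamp `vmax`, and a stopping rule based only on the fitness tolerance and the iteration limit. All function names and numerical values are hypothetical.

```python
import random

def sir_euler(alpha, beta, y0=(0.9, 0.1, 0.0), h=0.5, n=100):
    """Euler solution of the assumed SIR system; returns n + 1 states (S, I, R)."""
    s, i, r = y0
    out = [(s, i, r)]
    for _ in range(n):
        ds = -alpha * s * i
        di = alpha * s * i - beta * i
        dr = beta * i
        s, i, r = s + h * ds, i + h * di, r + h * dr
        out.append((s, i, r))
    return out

def fitness(params, data):
    """Cumulative absolute difference between the data and the Euler solution."""
    sol = sir_euler(*params)
    return sum(abs(a - b) for u, w in zip(sol, data) for a, b in zip(u, w))

def pso_fit(data, n_swarm=30, it_max=200, c1=1.8, c2=1.8, tol=1e-5, seed=0):
    rng = random.Random(seed)
    dim, lo, hi, vmax = 2, 0.0, 1.0, 0.2  # search box and velocity clamp (assumptions)
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_swarm)]
    v = [[rng.uniform(-vmax, vmax) for _ in range(dim)] for _ in range(n_swarm)]
    pbest = [p[:] for p in x]
    pfit = [fitness(p, data) for p in x]
    g = min(range(n_swarm), key=lambda k: pfit[k])
    gbest, gfit = pbest[g][:], pfit[g]
    history = [gfit]  # best fitness found so far, recorded once per iteration
    for _ in range(it_max):
        for k in range(n_swarm):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[k][d] += c1 * r1 * (pbest[k][d] - x[k][d]) + c2 * r2 * (gbest[d] - x[k][d])
                v[k][d] = max(-vmax, min(vmax, v[k][d]))       # clamp the speed
                x[k][d] = max(lo, min(hi, x[k][d] + v[k][d]))  # stay inside [0, 1]
            f = fitness(x[k], data)
            if f < pfit[k]:  # update PersonalBest
                pbest[k], pfit[k] = x[k][:], f
                if f < gfit:  # update GlobalBest
                    gbest, gfit = x[k][:], f
        history.append(gfit)
        if gfit <= tol:
            break
    return gbest, gfit, history

true_params = (0.6, 0.2)        # hypothetical "known" parameters
data = sir_euler(*true_params)  # synthetic data generated from them
best, best_fit, history = pso_fit(data)
```

Each fitness evaluation requires a full numerical solution of the differential equations, which is why the cost grows with the number of swarms and iterations. The recorded `history` never increases because GlobalBest is only replaced by a strictly better position.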

Numerical Simulation
In this section, numerical simulations are performed to compare the analytic method, Gauss-Newton's method and PSO in determining the parameters of dynamic models. For the numerical simulations, the basic parameters used in PSO are given in Table 3. First, the exponential model parameters are determined using the analytical method, Gauss-Newton's method and PSO from data comparing temperature and humidity in the earth's atmosphere [15], presented in Table 4.
Based on the data provided, the procedure in Algorithm 1 yields a matrix $A$ and a vector $b$, i.e.

$A = \begin{bmatrix} 80 & 13 \\ 8100 & 80 \end{bmatrix}, \quad b = \begin{bmatrix} 20.32703 \\ 693.6217 \end{bmatrix}$ (19)

Because the determinant of matrix $A$ in equation (19) is not zero, the system has a unique solution. For PSO, three different swarm values are used with $c_1 = c_2 = 1.8$. Full simulation results are given in Table 5. Table 5 shows that the PSO algorithm generates parameters that bring the exponential model closer to the data than the analytic method and Gauss-Newton's method. In Gauss-Newton's method, different initial values produce different results. An unsuitable initial value causes Gauss-Newton's method to diverge or to fail to find a solution. The swarm movement with $nS = \{30, 100\}$ from the randomly chosen starting point to the final solution point is given in Figure 1. Figure 1 shows the position of each swarm in each iteration. In each iteration, each swarm finds a new position that is a potential solution. Eventually, all swarms gather at the same point, which is the final solution and gives the parameters of the exponential model.
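The linear system (19) quoted above can be solved directly with Cramer's rule, as in the sketch below. Reading the unknowns as the slope $a_1$ and intercept $a_0$ of the log-linear fit, in that order, and taking $t_0 = 0$ so that $\lambda = a_1$ and $N_0 = e^{a_0}$, are assumptions made here for illustration. The values computed this way are not quoted from Table 5.

```python
import math

# Matrix and right-hand side as read from equation (19)
A = [[80.0, 13.0], [8100.0, 80.0]]
b = [20.32703, 693.6217]

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
if det == 0:
    raise ZeroDivisionError("matrix A is singular")

# Cramer's rule for the 2x2 system A @ [a1, a0] = b
a1 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
a0 = (A[0][0] * b[1] - A[1][0] * b[0]) / det

lam = a1          # assumed mapping: slope -> growth parameter
n0 = math.exp(a0)  # assumed mapping: intercept -> ln(N0)
```

Substituting `a1` and `a0` back into both equations is a quick way to confirm that the solution of the system is consistent with the stated matrix and vector.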
The comparison between the data and the numerical solution using the parameters determined by PSO, together with the error, is given in Figure 2.

For the logistic model, the methods compared are the analytical method and PSO. The dataset used is the bison population data in the Yellowstone area for [1902, 1931] [16], presented in Table 6. Based on the data in Table 6, the system of linear equations (20) is obtained. The parameters obtained from the data in Table 6 for the logistic model by the analytical method, which is the solution of the system of linear equations (20), and by the PSO algorithm are compared in Table 7. Table 7 shows that the PSO algorithm obtains parameters that bring the logistic model close to the data, with a very small difference in error rate (0.02%) compared to the analytical method. The analytical method and the PSO algorithm produce parameter values that differ considerably, but both produce almost the same function values. The swarm movement with $nS = \{30, 100\}$ from the randomly selected starting point to the final solution point is given in Figure 3. The comparison between the data and the numerical solution using the parameters determined by PSO, together with the error, is given in Figure 4.

Data describing the SIR epidemic model are not easy to find. Therefore, this simulation uses synthetic data created with predetermined parameters. The PSO algorithm is then used to recover the parameters from the previously generated data. To see whether the error for all parameters is below the maximum tolerance, a hypothesis test is carried out using the R software. The SIR epidemic model parameters used are given in Table 8. These parameters are randomly selected from the interval (0, 1). The generated data consist of 100 data points for each compartment. The simulation results for the determination of the parameters $\alpha$ and $\beta$ in Table 8 using PSO are presented in Table 9.
Table 9 shows $E_{num}$, the cumulative absolute error between the generated data and the numerical solution using the determined parameters; $itM$, the number of iterations PSO needed to obtain the desired parameters; $E_\alpha$ and $E_\beta$, the absolute errors of the parameters $\alpha$ and $\beta$; and $Rel(\alpha)$ and $Rel(\beta)$, the relative errors of each parameter. In Table 9, every parameter has an error of less than 10%. The largest error occurs in parameter set number 18, the smallest error occurs in parameter set number 2, and the average error is 3.35%.

The swarm movement is shown in Figure 5, with the swarm cognitive coefficients set at a value of 1.8. Figure 5 shows that the swarm does not continue until $itM = 1000$ because, at a certain iteration, the stopping condition in Algorithm 3 is fulfilled. Figure 5 also shows that all swarms gather at two points, which are the parameter values to be determined. The numerical solution curves of the SIR epidemic model using the parameters determined by PSO are shown in Figure 6.
The error curve between the data and the numerical solution of the SIR model using the parameters generated by PSO is shown in Figure 7. The error values are obtained as the absolute difference between the data and the numerical solution. The final step of this simulation is to test whether the relative errors of the parameters $\alpha$ and $\beta$ generated by PSO are as expected. For this purpose, the following statistical hypotheses are made:
$H_0$: The average relative error of the parameters $\alpha$, $\beta$ in Table 9 is less than or equal to 3%.
$H_1$: The average relative error of the parameters $\alpha$, $\beta$ in Table 9 is greater than 3%.
Hypothesis testing is done with a t-test in the R software at a significance level of 1%. The output of the t-test on the Table 9 data using R is shown in Figure 8. Figure 8 shows that, for a significance level of 1% with a mean of 3%, the test yields p-value = {0.1245, 0.2938}, which means that $H_0$ is not rejected. In other words, the results are consistent with an average relative error of at most 3% for the parameters $\alpha$, $\beta$ determined by PSO, at a significance level of 1%.
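The logic of the one-sided test can be sketched with the one-sample t statistic, $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$. The relative errors below are hypothetical placeholders, not the values of Table 9. The critical value 2.821 is the upper 1% point of Student's t distribution with 9 degrees of freedom, matching the ten hypothetical observations.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """One-sample t statistic for H0: mean <= mu0 against H1: mean > mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (divisor n - 1)
    return (mean - mu0) / (sd / math.sqrt(n))

# Hypothetical relative errors in percent (not the values of Table 9)
rel_errors = [2.1, 4.0, 1.5, 3.8, 2.9, 5.2, 0.9, 3.3, 2.6, 4.4]

t = t_statistic(rel_errors, 3.0)
T_CRIT = 2.821              # upper 1% point of Student's t, 9 degrees of freedom
reject_h0 = t > T_CRIT      # reject only when the mean is significantly above 3%
```

A sample mean slightly above 3% does not by itself lead to rejection; $H_0$ is rejected only when the t statistic exceeds the critical value, which is equivalent to a p-value below 0.01.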

Conclusions
This article has discussed the determination of parameters in the exponential, logistic and SIR epidemic models using the PSO algorithm. For each model, the parameter values are obtained by applying an algorithm introduced for that model. The exponential and logistic models are basic models for which analytic solutions can be determined. Based on the simulations, the PSO algorithm gives better results than the analytical method and Gauss-Newton's method. For the SIR model, the data used are simulation data generated from parameters whose values are known. The results of parameter determination using PSO show that the PSO algorithm can recover the original parameters with a very small error rate. Based on a hypothesis test on the mean, the relative error of the PSO algorithm in parameter determination is less than 3% at a significance level of 1%. In the simulation for determining the SIR parameters, only one PSO parameter setting is used. This is because the simulation is computationally heavy: the system of differential equations must be solved numerically for each swarm in every iteration. For further research, it is necessary to consider the effect of the number of swarms and the choice of the swarm cognitive coefficients on the resulting error. In addition, adaptive PSO can be applied to improve swarm convergence speed in the determination of parameters of the SIR model and of its extensions such as the SEIR and SIRS models.