The Modeling of a Stochastic SIR Model for HIV/AIDS Epidemic Using Gillespie's Algorithm

Mathematical modeling of disease has been an indispensable tool in accounting for disease transmission dynamics as well as disease spread. Epidemiological disease models have been used to explain the dynamics of HIV/AIDS in the population from the early 1900s. The models developed however faced considerable challenges ranging from inaccurate representation of natural data for deterministic models, to methods of forecasting such as statistical extrapolation which assumes that current conditions will prevail which is not always the case. Despite the spread of HIV/AIDS having been explored widely, not much literature is available on the Gillespie Algorithm based SIR model. This algorithm is able to give a statistically correct of the course of a disease with initial conditions to begin with and propensity functions to update the system. The purpose of this paper is to build on the concept of Gillespie's Algorithm based SIR models by developing a stochastic SIR model to simulate disease evolution in the population setting. The values produced through simulation by the model developed in this paper using a tau value as the time step of the model were compared to HIV/AIDS data from 1985 to 2018, given by NACC. We conclude that the simulated model reflects reality.


Introduction
On June 5, 1981 a mysterious disease was recognized among MSM in the USA. In 1982 the CDC identified the same disease among IVDU, hemophiliacs and Haitian residents. In the same year, it was identified that it attacks the immune system of the host, incapacitating them to heal subsequently leading to death. It was characterized by its etiological agent HIV in 1983, [1]. These researchers from France, Francoise Barre-Sinoussi and Jean-Claude Chermann together with Luc Montagnier, speculate that this virus could be what latter develops to AIDS. A serological test, was then made available. In 1984, Robert Gallo discovered that HIV was responsible for AIDS.
This virus, being highly transmittable is analyzed and capturing how it is transmitted is crucial in attempting to model the disease. There are several channels in which the virus can be transmitted such as inter-species transmission, vector transmission, direct transmission or environmental transmission. HIV is transferred from one individual in three modes: through blood, sexual intercourse and mother-tochild. However, this is not always the case as there exist individual differences in the ability to transmit and acquire HIV that remain unexplained [2]. In 1984, several HIV and AIDS cases were documented in Kenya. The

Literature Review
Mathematical modeling of HIV is the use of statistical tools and procedures to recognize the general pattern in the transmission of HIV and to translate a problem into a statistical form for subsequent analysis. There are various questions still left unanswered to date on the HIV epidemic. These questions are encompassed in the modeling of the HIV immunology, the HIV dynamics as well as the AIDS dynamics such as the dynamic distribution of the disease in the population and its likely magnitude. This study employs mathematical modeling tools in the transmission probability of HIV and analyses done on how the cumulative number of infected individuals responds as well as the AIDS death probability and how the cumulative cases of removed individuals responds to this probability.
The following reviews consider models developed for HIV/AIDS data that either differed too greatly with other model estimates or still fail even with developments on the model. In 2010 several authors came up with a model to predict HIV transmission in China in 2002 [3]. They applied these dynamic models to forecast the transmission of HIV for the Chinese population. In this model, there were no forms of intervention. The average number of partners was different at different ages in the HIV to AIDS cycle. The transmission parameter was held constant for all stages of HIV. The formulated model was used to forecast the number of PLWHA. This model approximated that there would be 6000000 cases of HIV and 400000 cases of AIDS in China if there were no forms of interventions implemented. In 2007 the government of China alongside UNAIDS made an estimate of 700000 cases of HIV and 85000 cases of AIDS in China at the time, which is much lower that the estimates made by Liu [4]. The number of HIV infections in 2010 was predicted to approximately 1000000. The group most affected would be the 31-40 years group. The group comprising of the largest individuals predicted to be living with HIV/AIDS was approximately 650000.
There is need now more than ever to develop the SIR model since its application is going beyond epidemiological application such as how cues influence behaviour in a social setting and the spread of ideas [5].
There are several challenges facing models used for HIV estimates developed by UNIADS. In several concentrated epidemics, HIV prevalence estimates do not match reported cases and mortality estimates do not match reported deaths, even after adjusting. There are issues estimating prevalence in high risk groups and the size of high risk groups. Furthermore, it provides inaccurate estimates where an epidemic has not gone beyond its peak [6]. Even with the 2013 updates of Spectrum where adjustments were made in the parameter values empirically to improve the fit to program data, the estimates given by Spectrum still differed with data available. More adjustments are needed as they desire to make the process where Spectrum selects the incidence curve for the data an automatic process [7].
A stochastic differential equation SI model with demographic stochasticity has already been developed [8]. They considered and analyzed a two stage SI model that allowed for random variation in the demographic structure of the population with the population size changing at different times which had an exponentially distributed rate of infection. The parameter depended on the varying population size N. This meant that both the population size varied as well as the transmission/contact rate. They used the Milstein method to simulate for analysis.
Despite the fact that a lot of research has been done on modeling disease trajectory, not much literature is available on the use of Gillespie based SIR models to simulate the trajectory of a disease in the population. The Gillespie's algorithm based SIR model concept considered the Gillespie algorithm, Euler alongside other CME based exact methods which showed that Gillespie's algorithm had the least execution time [9]. This makes it a prime candidate for the tau step vantage point. Events are selected stochastically in the tau time step such that in the least possible computed time step, one or several events are selected to occur randomly.
Other authors have made contributions to mathematical epidemiology by performing simulations that explain the process of disease spread. In their works they build a disease spread prediction model based on the SIR model and applied parameter values to a stochastic model based on Gillespie's algorithm. This is applied to data and the conclusion was that the model well explains the process of the spread of the disease in the population [10].

Rationale of the Study
Mathematical models generated as deterministic have been used in the past and they offer a lot to be studied and concluded from statistically. According to Koopman, deterministic differential equation models cannot capture the real-life representation because no matter how finely they divide populations into geographic and social space, the infectious population is spread out to cover the entire space. The inability of differential equation models to capture stochastic effects therefore has been demonstrated by studies done by Koopman [11].
Infection-transmission deterministic models are based on the characteristics of population growth, disease occurrence, and spread within a population. There is need to come up with a stochastic mathematical model that better expresses the changing number of HIV/AIDS cases. This study seeks to incorporate a stochastic aspect in the deterministic SIR epidemiological model. A stochastic process, also called a random process is one in which the outcomes are uncertain. By contrast, in a deterministic process, there is no randomness [12]. This will allow us to derive new insight from the analysis of the simulation of this SIR model.
In spite having a lot of work done on mathematical modeling, there isn't adequate literature on the modeling the evolution of disease in the population through simulation. Several execution options have been suggested for the SIR model such as Gillespie's algorithm and agent-based models but they have not been extensively explored in literature. This paper will contribute and build on to the existing literature on modeling disease dynamics in the population with the model tested on HIV/AIDS data 1985-2018 to investigate if the simulated values would reflect results that are close to reality. This paper will help bridge the gap between conceptual epidemiological models and its simulated version by providing a developed version of an SIR model that solves one inherent problem that deterministic models do not reflect the natural data.
In some instances, these deterministic models do not capture some model characteristics and this could lead to biases.

The Stochastic SIR Model
The Classic SIR model The Kermack-McKendrick theory illustrates individuals grouped as susceptible and removeds only [13]. The transmission and infection rates were considered to be variant. The initial conditions changed over time and demographics not being included such that change over time was described as; (1) The Kermack-McKendrick theory was later developed to a version where they tackled the problem of endemics [14], [15]. They set the transmission and infection rates as invariant for all ages and this allowed the inclusion of an infectives class. This transformed the theory to the basic SIR model such that when demographics were included becomes (4) (5) (6) where N denotes the total host population. denotes the birth rate and death rate denotes the infection rate denotes the recovery rate t denotes time point Model development The Gillespie algorithm was used to simulate a statistically correct trajectory given initial SIR conditions. The model explored how altering transmission dynamics affected the model as a whole. The death rates were distinguished such that one death event led an individual out of the model while the other death event led an individual into a different classes. The SIR model explained how the epidemic manifests in all the compartments. The reliability of the simulated values would set the precedent for the valued to be predicted based on the model is also explored. All these aspects determine the quality of the inference drawn. The graphical representation of the developed stochastic model is shown; Figure 1. The stochastic SIR model.
denotes the rate of birth denotes the rate of non-AIDS death denotes the rate of infection denotes AIDS death rate denotes model's time step Gillespie's procedure The Gillespie simulation procedure was developed to produce a statistically correct course for finite well-mixed populations [16,17]. The assumption is that the population is finite and is sub-divided into categories of finite discrete compartments. The interaction between states is made possible by events outlined in this model as birth, infection, non-AIDS death an AIDS death. The compartments consist of initial state values S(t 0 ), I(t 0 ) and R(t 0 ) are contained in a vector and described at initial time t 0 .
This Gillespie's algorithm based stochastic SIR model generates a statistically correct trajectory from the initial vector as where i=s, i, r S+I+R=N i denotes the population size of the state at time t denotes a function characterized by two quantities as a state change vector and a propensity function.
, the state change vector defined as , , where is the change in state i caused by one event. Assuming that the resulting state is . A propensity function is the probability of one event occurring in the time interval ! , ". Continuous-time Markov chains are the basic tool for building discrete population epidemic models. The Markov property lets us specify a model by giving the transition probabilities-defined as rates-on a small interval between the compartments. Considering the fact that the propensity functions require to be in probability form, we explore this assumption further by defining and interpreting it.
A Markov chain model is one where the probability of the next event depends on the probability of the present state. This implies the probabilities are individual therefore discrete. Discrete evolution is modelled in discrete time. A Markov chain is interpreted here then, as a stochastic discrete-valued model with the Markov property that future states of a process depend on the current state. Continuoustime Markov chains are the basic tool for building discrete population epidemic models. The Markov property lets us specify a model by giving the transition probabilities-defined as rates-on a small interval between the compartments. The transition probabilities assigned are defined on an open interval (t, t + ), such that the probability an individual moves from the susceptible compartment to the infectives compartment is [ 1 ]. The SIR Markov chain model transition probabilities for a closed population are; where denotes the rate of flow of individuals from susceptibles to infectives denotes the rate of flow of individuals from infectives to removeds.
The wait times between events can either assume an exponentially distributed wait time or the rate of flow between compartments can assume any of the following distributions depending on the results.

Exponential Increments Between & '( Events
The wait times between one event and the next can assume an exponential distribution

Poisson Increments in & '( Events
The counting process for the flows in the compartments has a Poisson model with evolution in time

Binomial Increments with Linear Probability in & '( Events
The counting process for the flows in the compartments has a binomial model with linear probability in the evolution of time.

Binomial Increments with Exponential Decaying Probability in & '( Events
The counting process for the flows in the compartments has a binomial model with exponentially decaying probability in the evolution of time.

Goodness-of-fit of the Stochastic Model to HIV/AIDS Cases
In order to assess how the simulated data performs against natural data, a modified chi-square test was used. The data was obtained from NACC for HIV/AIDS cases. The means and variances of the simulated and natural data were computed. Considering the hypotheses, 4 = If the mean and variance of the simulated and natural data are equal, the simulated mean does not fit the data. 4 5 = If the mean and variance of the simulated and natural data are not equal, the simulated data fits the natural data.
A modified chi-square test for simulation models was used to see how well the simulated data fit the natural data [18].

The Simulated Stochastic SIR Model
A stochastic SIR model was simulated with a mean step size of 0.006336446. 537 tau steps were made in the model. Variables in the model were S = 3507162, I = 45820, R = 4597, parameters in the model are crude birth rate of 0.06, non-AIDS death rate of 0.025, transition rate of 0.1 and AIDS death rate of 0.48. Curves produced are illustrated below.

Goodness-of-fit for the Stochastic Model
In order to have confidence in the predicted value, we apply a test to check the simulated values against the natural data values by employing hypotheses. An upper tailed test was done-since chi-square test is an asymmetrical distribution-at 33 degrees of freedom and 6 0.05. The results produced show non-equal means and variances. This prompted the use of a modified chi-square test [3]. To begin with, the Pearson's goodness-of-fit test is where C represents the variance of D The calculated value found was 64.958. The critical value was 47.4. Since the calculated value is greater than the critical value, the decision rule is to reject the null hypothesis. Therefore, the conclusion is that the simulated data model fits the natural data model.

Conclusion and Recommendation
Mathematical modeling of disease trajectory using Gillespie based algorithms is yet to be explored extensively in literature. In this study, a simulation was carried out on the SIR model to explain the trajectory of the disease by employing a stochastic element using Gillespie's simulation algorithm. After simulating, values were produced by the algorithm for each time step. The simulated curves were compared to HIV/AIDS data. The simulated curves were found to resemble the data available in reality. Therefore, the implementation of a stochastic factor to an epidemiological model is a useful contribution to mathematical modeling.
Mathematical modeling is an area that requires more research. Recommendation for research would be to explore other variations of the SIR model such as SI, SEIR under Gillespie's algorithm.
Furthermore, making parameter values time-varying under the Gillespie algorithm and comparing it with the version where parameters are invariant to see which performs better is another recommendation.
The SIR model as well as Gillespie algorithm could continue to be applied other areas such as viral marketing and behavioural science as has already been done successfully.