A Simple Stochastic Stomach Cancer Model with Application

Survival analysis is concerned mainly with estimating the time that elapses before an event of interest takes place. This time is a random quantity that evolves over time, so the theory of stochastic processes is crucial in the analysis of survival data. The study employed Markov chain theory to develop a simple stochastic stomach cancer model. The model is depicted with a state diagram and a stochastic matrix. The model was applied to stomach cancer data obtained from Meru Hospice. Transition probability theory was used in determining transition probabilities. The entries of the stochastic matrix T were estimated using the Aalen-Johansen estimators. The time taken for all the people under the study to transit to death was estimated using the limiting matrix.


Introduction
In this section, the background of the study was discussed by outlining the history and impact of cancer in the world in terms of morbidity and mortality. Previous studies by other researchers were also reviewed. The study problem was then formulated by stating the study objectives. The scope of the study was then discussed and the assumptions of the study outlined.
Background of the study
The American Cancer Society report of 2012 in [1] defines cancer as a group of diseases characterized by the uncontrolled spread of abnormal cells which, if not controlled, results in the death of the person affected. Cancer is caused by both external factors, such as tobacco intake, infectious organisms, chemicals and radiation, and internal factors, such as inherited mutations, hormones and mutations that occur from metabolism. Sirpa Heinavaara in [2] argues that the survival of cancer patients is known to depend on prognostic factors such as the age of the patient at diagnosis, the gender of the patient and the site of the cancer. According to the Kenya cancer statistics and National strategy report in [3], cancer remains a global health problem, and it is estimated that globally cancer causes more deaths than Human Immunodeficiency Virus, Tuberculosis and Malaria combined. The World Health Organization report in [4] shows that cancer accounted for 7.9 million deaths in the year 2009, which is 13% of all deaths worldwide. The annual mortality is attributed mainly to lung cancer (1.3 million deaths), stomach cancer (803,000 deaths), colorectal cancer (639,000 deaths), liver cancer (610,000 deaths) and oesophageal cancer (380,000 deaths). According to the Kenya National cancer control strategy in [5], the burden of cancer is projected to continue rising, with an estimated 15.5 million new cases and 12 million deaths by the year 2030. The Kenya cancer statistics and National strategy report in [6] ranks cancer third in terms of morbidity and mortality, accounting for 7% of all deaths annually. There are an estimated 39,000 new cases of cancer and 27,000 cancer deaths each year in Kenya. 60% of Kenyans affected by cancer are below 70 years old and are therefore economically productive. In women, breast cancer leads in morbidity and mortality, followed by cervical cancer, while in men prostate cancer leads in morbidity and mortality.
Meru Hospice in [7] ranks stomach cancer as the highest cause of morbidity and mortality in Meru, followed by breast cancer. Cervical cancer is ranked third and prostate cancer fourth. At the moment, there are approximately 500 cancer cases reported and an average of 5 new cases reported each day. According to Tiwari et al. in [8], cancer has a great negative effect on the economies of many countries globally, Kenya included. Thus, accurate estimation of the number of deaths caused by cancer is crucial, as it helps the government in planning, resource allocation and communication. Inaccurate predictions lead to inefficient planning and budgeting, and hence to ineffective government services. The Kenya National cancer control strategy in [9] classifies cancer research in Kenya as not commensurate with the magnitude of the problem. This is because of inadequate funding and training facilities in cancer research. One of the intervention measures suggested in the strategy is to strengthen research through cancer data collection, analysis, interpretation and dissemination. Moreover, little or no statistical research on stomach cancer in Meru County is known, despite stomach cancer leading in morbidity and mortality in the county.
Various models have been used in the analysis of survival data. Moghimi et al. in [10] used the Cox regression model and compared it with Weibull, Exponential and Lognormal models to evaluate prognostic factors affecting the survival of patients with stomach cancer. Age of the patient at the time of diagnosis and grade of tumour were used as the independent prognostic factors. The Akaike Information Criterion was used to determine the best model. Results of the data analysis showed that the Cox and parametric models performed approximately the same. However, according to the Akaike Information Criterion, the Weibull and Exponential models were the most favourable for survival analysis. Chun et al. in [11] used a multi-task logistic regression model to study patient-specific cancer survival distributions. The study revealed that multi-task logistic regression predictions were more accurate than those of popular survival models such as the Cox and Aalen regression models. The study further revealed that using patient-specific attributes can reduce the prediction error on survival time by as much as 20% when compared to using cancer site or stage only. Vallinayagam et al. in [12] compared the performance of different parametric models, namely Exponential, Weibull, Gompertz, Lognormal and Log-logistic, using breast cancer patients. From the analysis results, the Lognormal model recorded the lowest deviance, followed by the Log-logistic and Weibull models. Farid et al. in [13] reviewed methodological approaches for dealing with violation of the proportionality assumption of the Cox proportional hazards model in survival analysis. The four modifications suggested in the study were stratification of covariates, partitioning of the time axis, modelling the time dependence of the coefficients and, lastly, allowing the data to select the functional form of the time dependence, for instance by using splines.
The study emphasized a limitation of the Cox proportional hazards model: its assumption of proportional hazards, which, if violated, renders the model inapplicable. The study suggested the use of the four modifications, or the use of a different model, if the proportionality assumption is violated. Uku et al. in [14] compared the efficiency of a mixture of two distributions with that of a single distribution in modelling heterogeneous survival data. Single distributions such as Gamma, Weibull and Exponential were compared with the mixture distributions Exponential-Gamma, Exponential-Weibull and Gamma-Weibull. A mixture of any two distributions provided a better estimate than any single distribution alone. Among the three mixture distributions, the Gamma-Weibull mixture gave the best model. Duke et al. in [15] used the Cox proportional hazards model to forecast survival time for cancer patients. The Weibull model was also applied to the data of patients diagnosed with lung and bronchus cancer in light of censoring. The researchers observed that survival time is non-negative and that its distribution is consequently positively skewed. They argued that for a positively skewed distribution, the option is to use either the Exponential or the Weibull model. However, they noted that the Weibull model is more suitable since it provides more flexibility than the Exponential distribution, whose single parameter implies that all patients are equally likely to die regardless of how many years they have survived. Duke et al. in [16] used the Cox proportional hazards model to estimate the survival time of lung and bronchus cancer patients. To cater for unobserved heterogeneity, a Weibull mixture model was applied to the data. Age, gender, race and registries were used as independent prognostic variables. The results showed that age, gender and registry significantly affect individual survival time.
The study concluded that the Weibull distribution is more suitable than other long-tailed distributions such as the Log-logistic and Expo-power distributions. Henry A. Glick in [17] described Markov models as recursive decision trees that are used for modelling conditions with events that may occur repeatedly over time, or for modelling predictable events that may occur over time, like screening for a disease at fixed intervals. The study described various ways of estimating transition probabilities. He argued that if the available data are hazard rates per unit time, they can be translated into probabilities by using the formula $p_{ij}(t) = 1 - e^{-rt}$, where $p_{ij}(t)$ is the probability of moving from state $i$ at the beginning of a period t to state $j$ at the beginning of period t+1, $r$ is the instantaneous hazard rate per period and $t$ is the length of the period. Juergen Jung in [18] estimated transition probabilities for individual health status as a function of observable characteristics. The study employed three methods. The first is the counting method, in which the average transition probability from state $h$ to state $j$ is estimated from the observed realizations of transitions from state $h$ to state $j$. The second method is the use of ordered logit and ordered probit regression models. The third method is the semi-parametric Cox proportional hazards model. Abner et al. in [19] described a Markov chain process as a stochastic process that describes the movement of an individual through a finite number of states. The study identified various methods of analyzing time-to-event data, including the Cox proportional hazards model and Kaplan-Meier estimation. Heggland in [20] argued that in multistate models, the past and the future are independent given the present. The researcher argues that multistate models are often assumed to be Markovian models.
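The rate-to-probability conversion attributed to Glick, $1 - e^{-rt}$, can be illustrated with a short Python sketch. It assumes a constant hazard rate over the period; the function name `rate_to_probability` and the example rate of 0.2 per year are hypothetical and are not taken from the cited study or from the Meru data.

```python
import math


def rate_to_probability(rate: float, period_length: float = 1.0) -> float:
    """Convert a constant hazard rate per unit time into the probability
    of at least one transition within a period of the given length."""
    if rate < 0 or period_length < 0:
        raise ValueError("rate and period_length must be non-negative")
    return 1.0 - math.exp(-rate * period_length)


# Hypothetical hazard rate (0.2 per year), chosen only for illustration.
annual = rate_to_probability(0.2, 1.0)
monthly = rate_to_probability(0.2, 1.0 / 12.0)
print(f"annual transition probability:  {annual:.4f}")
print(f"monthly transition probability: {monthly:.4f}")
```

Because the conversion is nonlinear, twelve monthly probabilities do not simply add up to the annual probability; the period length has to be chosen before the rate is converted.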
The study asserts that a process X(t) is said to be Markov if

$$P\big(X(t) = h \mid X(s) = g, \mathcal{F}_{s-}\big) = P\big(X(t) = h \mid X(s) = g\big),$$

where $\mathcal{F}_{s-}$ is the history of the process up to time s (information about the earlier transitions of the process). The study further argues that estimation of transition probabilities is done by solving the Chapman-Kolmogorov forward equations. The study defined $N_{gh}(t)$ to be the number of individuals observed to move from state g to state h within the interval (0, t). The study estimated $N_{12}(t)$, $N_{13}(t)$ and $N_{23}(t)$, while $N_{21}(t)$, $N_{31}(t)$ and $N_{32}(t)$ were zero for all values of t. $N_{21}(t)$ was zero because no transitions from state 2 to state 1 were witnessed, while $N_{31}(t)$ and $N_{32}(t)$ were zero because state 3 is an absorbing state. The study then set $N_{1\cdot}(t) = N_{12}(t) + N_{13}(t)$, while $Y_1(t)$ and $Y_2(t)$ were the numbers of healthy and diseased individuals respectively before time t.

Scope of the study
A simple stochastic stomach cancer model was derived and stomach cancer data from Meru County were fitted to the model. The data were analyzed in R using the TPmsm package.
The main aim of the study was to describe cancer progression among patients and to determine the probability of moving from one state of cancer to another. A state diagram was designed to describe the movement of cancer patients from one state to another. The stochastic matrix T was designed and its entries computed by fitting the model to the data. The study assumed that future states depended only on the current state and not on past events, and that all the individuals under study began from the same initial state (the 'Healthy state').

Introduction
In this section, model development was discussed. Data collection and analysis were also discussed.

Simple Stochastic Stomach Cancer Model
The study employed the following methods: Markov chain theory was used to describe the movement of a patient through stomach cancer states. The assumption was that movement to the next state depended on the current state occupied and that all the individuals under study began from the same initial state ('Healthy state'). This was depicted with a state diagram. Transition probability theory was used in designing the stochastic matrix and deriving the transitional probabilities. R software was used in analysis of the data by use of the TPmsm package. Stomach cancer data was obtained from the Meru Hospice, who are the custodians of cancer data in Meru. Data for 274 stomach cancer patients was used in the study.
The stomach cancer model was developed by first identifying stomach cancer stages and then deriving transition probabilities. Stomach cancer has three main stages (tumour grades): grade 1, grade 2 and grade 3. The model was limited to the grade 3 stage because of the nature of the secondary data available. Consequently, the model has three states: the H-state (state 1), in which an individual is free from stomach cancer; the S-state (state 2), in which the individual is suffering from stomach cancer; and the D-state (state 3), in which an individual dies of stomach cancer. This was formalized in the following transition diagram:

[Figure: transition diagram for the states H, S and D]

The transition probability matrix T takes the form:

$$T = \begin{pmatrix} p_{11}(t) & p_{12}(t) & p_{13}(t) \\ p_{21}(t) & p_{22}(t) & p_{23}(t) \\ p_{31}(t) & p_{32}(t) & p_{33}(t) \end{pmatrix}$$

The entries of T represent the probabilities of moving from one state of stomach cancer to another. Notice from figure 2.2.2 that the D state is an absorbing state, therefore $p_{33}(t) = 1$. Therefore T takes the form shown below:

$$T = \begin{pmatrix} p_{11}(t) & p_{12}(t) & p_{13}(t) \\ p_{21}(t) & p_{22}(t) & p_{23}(t) \\ 0 & 0 & 1 \end{pmatrix}$$

The estimate of $A(t)$, which is $\hat{A}(t)$, was used to estimate $p_{ij}(t)$ in equation 4 above.
The estimate of $p_{ij}(t)$ for $i = 1$ and $j = 1$, that is, the probability of remaining in state 1, is computed from $N_{12}$, the number of transitions from state 1 to state 2, $N_{13}$, the number of transitions from state 1 to state 3, and $Y_1$, the number of people in the healthy state.
The estimate of $p_{ij}(t)$ for $i = 1$ and $j = 2$, that is, the probability of moving from state 1 to state 2, is computed from $N_{12}$, the number of transitions from state 1 to state 2, and $Y_1$, the number of people in the healthy state.
Next, the derived results in equations 7, 8 and 9 were applied on Meru County data.
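As an illustration of how counts of transitions and the number of healthy individuals combine into a probability, the following Python sketch computes a product-limit estimate of the probability of remaining in the healthy state, which is the form the Aalen-Johansen estimator takes for staying in the initial state. It is a minimal sketch under stated assumptions, not the study's own derivation or the TPmsm implementation: the function name, the input layout and the example counts are hypothetical.

```python
def product_limit_stay_probability(steps, n_at_risk):
    """Product-limit estimate of the probability of still being in the
    initial (healthy) state after all the given event times.

    steps: iterable of (n_to_sick, n_to_dead, n_censored) tuples, one per
           distinct event time, in chronological order.
    n_at_risk: number of individuals in the healthy state at the start.
    """
    if n_at_risk <= 0:
        raise ValueError("n_at_risk must be positive")
    p_stay = 1.0
    at_risk = n_at_risk
    for n_to_sick, n_to_dead, n_censored in steps:
        if at_risk == 0:
            break  # nobody left in the healthy state
        exits = n_to_sick + n_to_dead
        if exits + n_censored > at_risk:
            raise ValueError("more exits than individuals at risk")
        # Multiply by the fraction of at-risk individuals who stay healthy.
        p_stay *= 1.0 - exits / at_risk
        # Exits and censored individuals leave the risk set.
        at_risk -= exits + n_censored
    return p_stay


# Hypothetical counts (not the Meru Hospice data): 20 healthy individuals,
# three event times, one censored observation at the second event time.
example_steps = [(1, 2, 0), (0, 3, 1), (1, 1, 0)]
p11_hat = product_limit_stay_probability(example_steps, 20)
print(f"estimated probability of remaining healthy: {p11_hat:.4f}")
```

Without censoring the product collapses to the simple proportion of individuals who never left the healthy state, which is a convenient sanity check on any implementation.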

Introduction
In this section, equations 7, 8 and 9 were applied to the data to compute the entries of T. The section also discusses the results.

Transition Matrix
Equations 7, 8 and 9 were used to generate the entries of T, that is, the transition probabilities from state 1 to state $j = 1, 2, 3$ and from state 2 to state $j = 1, 2, 3$, as provided below:

$$T = \begin{pmatrix} 0.630 & 0.027 & 0.343 \\ 0 & 0.161 & 0.839 \\ 0 & 0 & 1 \end{pmatrix}$$

The entries of the transition matrix T above represent the probability of a person transiting from one state to another, that is, the probability of a person moving to state $j$ given that the person is in state $i$. For instance, the probability of a person moving to state 2 (sickness state) given that he/she is in state 1 (health state) is 0.027. The probability of a person moving to the death state given that he/she is in the health state is 0.343. The probability of a stomach cancer patient transiting to the healthy state is zero. The probability of a stomach cancer patient remaining in the same state for time t is 0.161. The probability of a stomach cancer patient transiting to the death state is 0.839. Once a patient enters the death state, he/she cannot go back, since death is an absorbing state.
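The limiting matrix from T was obtained by:

$$\lim_{n \to \infty} T^{n} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}$$

This means that all persons under study will eventually transit to the death state, either as a result of stomach cancer or as a result of other causes.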
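The limiting behaviour can be checked numerically by multiplying T by itself many times. The Python sketch below uses the transition probabilities reported in this section; the first entry, 0.630, is an assumption obtained from the requirement that each row of a stochastic matrix sums to one, and the helper names and the choice of 200 steps are illustrative only.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mat_power(m, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# States: 1 = health, 2 = sickness, 3 = death (absorbing).
# The entry 0.630 is assumed: 1 - 0.027 - 0.343, so that row 1 sums to one.
T = [
    [0.630, 0.027, 0.343],
    [0.000, 0.161, 0.839],
    [0.000, 0.000, 1.000],
]

limit = mat_power(T, 200)  # 200 steps is an arbitrary "large n"
for row in limit:
    print([round(x, 6) for x in row])
```

Each row of the resulting matrix places essentially all probability mass on the death state, whatever the starting state, which is what an absorbing Markov chain with a single absorbing state predicts.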

Conclusions
The following conclusions were made. Markov chain theory was used to develop a simple stochastic stomach cancer model, which was used to describe the movement of a person from one state to another. Movement of a person from one state of cancer to another has been shown to be stochastic and to depend on the state the person is in at time t. The model was depicted with a state diagram and a stochastic matrix. Transition probability theory was used to determine the probability of moving from one state of cancer to another. The model was applied to the data and Aalen-Johansen estimators were used to obtain the entries of the probability matrix T. The limiting probability matrix was obtained from T by multiplying T by itself n times. The results show that once a person enters the sickness state, the probability of returning to the health state is zero. The probability of a stomach cancer patient moving to the absorbing state is 0.839, which is quite high. The limiting probability matrix shows that eventually all the people under study will move to the death state, either as a result of stomach cancer or as a result of other causes.

Recommendations
The following recommendations were made from the study:
a. The study was restricted to three states of stomach cancer. Further research should be done to incorporate more cancer stages and hence more states.
b. The study used a non-parametric method in the analysis of the data. More research using parametric or semi-parametric methods should be done to incorporate prognostic variables that affect the survival time of stomach cancer patients.
c. The study considered right-censored data. More research should be done to include left-censored data.
d. The government and other stakeholders should embrace preventive measures instead of curative measures.
e. People should undergo regular cancer screening to facilitate early diagnosis.