Marshall-Olkin Exponential Pareto Distribution with Application on Cancer Stem Cells
Khairia El-Said El-Nadi, L. M. Fatehy, Nourhan Hamdy Ahmed
Department of Mathematics and Computer Science, Faculty of Science, Alexandria University, Alexandria, Egypt
To cite this article:
Khairia El-Said El-Nadi, L. M. Fatehy, Nourhan Hamdy Ahmed. Marshall-Olkin Exponential Pareto Distribution with Application on Cancer Stem Cells. American Journal of Theoretical and Applied Statistics. Special Issue: Statistical Distributions and Modeling in Applied Mathematics. Vol. 6, No. 5-1, 2017, pp. 1-7. doi: 10.11648/j.ajtas.s.2017060501.11
Received: December 8, 2016; Accepted: December 27, 2016; Published: January 24, 2017
Abstract: A Marshall–Olkin variant of the exponential Pareto distribution is introduced in this paper. Some of its statistical functions and numerical characteristics, among them the characteristic function, the moment generating function and the central moments of real order, are derived in computational series expansion form, and various illustrative special cases are discussed. The density function is used to model a real data set of cancer stem cell patients. The new distribution provides a better fit than related distributions. The proposed distribution could find applications, for instance, in the physical and biological sciences, hydrology, medicine, meteorology and engineering.
Keywords: Pareto Distribution, Cancer Stem Cells, Biological Sciences
1. Introduction
Cancer stem cells (CSCs) are a subset of tumor cells that possess characteristics associated with normal stem cells. Specifically, they have the ability to self-renew, differentiate, and generate the diverse cells that comprise the tumor. CSCs have been identified and isolated in several human cancer types, including breast, brain, colon, head and neck, leukemia, liver, ovarian, pancreas, and prostate. These CSCs represent approximately 1% of the tumor as a distinct population and cause relapse and metastasis by giving rise to new tumors. While chemotherapy and other conventional cancer therapies may be more effective at killing bulk tumor cells, CSCs may manage to escape and seed new tumor growth due to the survival of quiescent CSCs. Therefore, traditional therapies often cannot completely eradicate tumors or prevent cancer recurrence and progression to metastasis. With growing evidence supporting the role of CSCs in tumor genesis, tumor heterogeneity, resistance to chemotherapeutic and radiation therapies, and the metastatic phenotype, the development of specific therapies that target CSCs holds promise for improving survival and quality of life for cancer patients, especially those with metastatic disease.
Tumors consist of heterogeneous cell populations in which only a small fraction, less than 1%, is able to seed new tumors by transplantation, functionally defined as cancer stem cells (CSCs). There is growing interest in identifying markers and therapeutically targeting the CSC population in tumors. Recent studies have shown that CSCs have different drug sensitivities compared to the bulk population and represent an attractive therapeutic target. Studying these cells, however, has been a challenge due to their low abundance in vivo and the phenotypic plasticity they exhibit during expansion. Using current methods, isolated CSCs lose the expression of CSC markers and tumor initiating capacity when cultured in vitro or in vivo in xenograft animal models. The proportion of CSCs tends to an equilibrium level of less than 1% over time, and the cell population derived from CSC cultures typically recapitulates the heterogeneous nature of the original population. Thus, the goal of this contract topic is to meet the critical need to develop cell culture systems that can specifically grow CSCs for basic and translational research.
Developments in stem cell engineering and tissue engineering have generated new culture systems to accelerate the expansion of embryonic, induced pluripotent, and adult stem cell populations in vitro. These systems include technologies such as three dimensional (3D) culture systems containing extracellular matrix components and topological features, or bioreactors for large scale culture of cell spheroids. Preliminary data suggest that these technologies or similar culture systems may be applicable for quick and reproducible expansion of CSCs. Thus, commercial development of these culture systems specifically for CSC culture may have a significant impact in basic research and drug screening applications.
2. Marshall-Olkin Method
Marshall and Olkin (1997) [1] introduced a method of adding a new parameter to an existing distribution. The resulting distribution, known as the Marshall-Olkin extended distribution, includes the original distribution as a special case and gives more flexibility to model various types of data.
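Let $F(x)$ be the baseline CDF with density $f(x)$, and let $\bar{F}(x) = 1 - F(x)$ be the baseline survivor function.
Then the Marshall-Olkin extended distribution has survival function $\bar{G}(x)$ given by
$$\bar{G}(x) = \frac{\alpha\,\bar{F}(x)}{1 - \bar{\alpha}\,\bar{F}(x)}, \qquad \alpha > 0,\ \bar{\alpha} = 1 - \alpha,$$
or, written in an equivalent form,
$$\bar{G}(x) = \frac{\alpha\,\bar{F}(x)}{F(x) + \alpha\,\bar{F}(x)}.$$
The corresponding PDF takes the form
$$g(x) = \frac{\alpha\, f(x)}{\left[1 - \bar{\alpha}\,\bar{F}(x)\right]^{2}}.$$
Finally, a central role in reliability theory is played by the quotient of the probability density function and the survival function,
$$h(x) = \frac{g(x)}{\bar{G}(x)} = \frac{f(x)}{\bar{F}(x)\left[1 - \bar{\alpha}\,\bar{F}(x)\right]},$$
called the hazard function (also frequently called the failure rate function).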
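The construction can be sketched numerically. The following minimal Python sketch assumes the standard Marshall-Olkin form $\bar{G}(x) = \alpha\bar{F}(x) / [1 - (1-\alpha)\bar{F}(x)]$ with tilt parameter $\alpha > 0$. The function names (`mo_survival`, `mo_pdf`, `mo_hazard`) and the unit exponential baseline are illustrative assumptions, not part of the paper.

```python
import math
from typing import Callable

Func = Callable[[float], float]


def mo_survival(base_sf: Func, alpha: float) -> Func:
    """Survival function of the Marshall-Olkin extension of a baseline."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")

    def sf(x: float) -> float:
        s = base_sf(x)
        return alpha * s / (1.0 - (1.0 - alpha) * s)

    return sf


def mo_pdf(base_pdf: Func, base_sf: Func, alpha: float) -> Func:
    """Density alpha * f / (1 - (1 - alpha) * Fbar)^2 of the extended law."""

    def pdf(x: float) -> float:
        s = base_sf(x)
        return alpha * base_pdf(x) / (1.0 - (1.0 - alpha) * s) ** 2

    return pdf


def mo_hazard(base_pdf: Func, base_sf: Func, alpha: float) -> Func:
    """Hazard rate: density divided by survival function."""

    def hazard(x: float) -> float:
        s = base_sf(x)
        return base_pdf(x) / (s * (1.0 - (1.0 - alpha) * s))

    return hazard


# Illustrative baseline (an assumption): the unit exponential distribution.
def exp_sf(x: float) -> float:
    return math.exp(-x)


def exp_pdf(x: float) -> float:
    return math.exp(-x)
```

Setting `alpha = 1.0` returns the baseline itself, which is the sense in which the original distribution is a special case of the extended one.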
Here we introduce the Marshall-Olkin exponential Pareto distribution by applying the method proposed by Marshall and Olkin (1997) to the exponential Pareto distribution as the baseline. Some statistical properties of the novel distribution are established and certain special cases are discussed. The associated density function is used to model a real data set. The new distribution provides a better fit than related distributions, as measured by the Anderson-Darling and Cramér-von Mises statistics.
We now recall the exponential Pareto distribution of Kareema Abed Al-Kadim and Mohammad Abdalhussain Boshi (2013) [2]. The so-called exponential Pareto distribution function is defined by
$$F(x) = \int_{0}^{\frac{1}{1 - G(x)}} f(t)\,dt,$$
where $G(x)$ is the Pareto distribution, $G(x) = 1 - (p/x)^{\theta}$, and $f(t)$ is the exponential distribution, $f(t) = \lambda e^{-\lambda t}$,
so that
$$F(x) = \int_{0}^{(x/p)^{\theta}} \lambda e^{-\lambda t}\,dt.$$
Here $p$ is a constant of the Pareto distribution (the lower bound of the possible values that a Pareto distributed r.v. can take on), $\theta$ is the shape parameter, and
$\lambda$ is the number of events per unit time (rate parameter).
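Hence the CDF of the exponential Pareto distribution can be written in the form
$$F(x) = 1 - e^{-\lambda (x/p)^{\theta}}, \qquad x > 0,\ \lambda, \theta, p > 0.$$
The p.d.f. of this distribution is given by
$$f(x) = \frac{\lambda\theta}{p}\left(\frac{x}{p}\right)^{\theta - 1} e^{-\lambda (x/p)^{\theta}}.$$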
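This distribution is similar to the Weibull distribution, whose density is given by
$$f(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}},$$
where $x \ge 0$, $k > 0$, $\lambda > 0$.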
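The similarity to the Weibull distribution can be checked numerically. The sketch below assumes the commonly cited form of the exponential Pareto CDF, $F(x) = 1 - \exp\{-\lambda (x/p)^{\theta}\}$, and the standard shape/scale Weibull CDF. The function names and the parameter values are illustrative assumptions. Under these assumptions the two CDFs coincide when the Weibull shape is $\theta$ and the Weibull scale is $p\,\lambda^{-1/\theta}$.

```python
import math


def ep_cdf(x: float, lam: float, theta: float, p: float) -> float:
    """Assumed exponential Pareto CDF: 1 - exp(-lam * (x / p) ** theta)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-lam * (x / p) ** theta)


def ep_pdf(x: float, lam: float, theta: float, p: float) -> float:
    """Derivative of ep_cdf with respect to x."""
    if x <= 0:
        return 0.0
    u = (x / p) ** theta
    return lam * theta * u / x * math.exp(-lam * u)


def weibull_cdf(x: float, shape: float, scale: float) -> float:
    """Standard two-parameter Weibull CDF."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


# Illustrative parameter values (assumptions, not estimates from the paper).
lam, theta, p = 0.5, 1.5, 3.0
equivalent_scale = p * lam ** (-1.0 / theta)
gap = abs(ep_cdf(2.0, lam, theta, p) - weibull_cdf(2.0, theta, equivalent_scale))
```

The value of `gap` is zero up to rounding, which is why an extra parameter (here supplied by the Marshall-Olkin construction) is needed to obtain a model that is genuinely more flexible than the Weibull.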
We now take the baseline CDF with one additional parameter, to which the Marshall-Olkin technique will be applied, as follows:
(1)
Hence the p.d.f is given by:
(2)
Applying the Marshall-Olkin technique, we obtain a new distribution, called the Marshall-Olkin exponential Pareto (MOEP) distribution, whose CDF is given by:
(3)
According to (2), the related PDF reads
After a few steps we obtain the final result as follows:
(4)
Here $\lambda > 0$, $\beta > 0$, $\alpha > 0$, $k > 0$, while $\mathbf{1}_{A}(x)$ denotes the characteristic function of the set $A$, that is, $\mathbf{1}_{A}(x) = 1$ when $x \in A$, and vanishes elsewhere. Accordingly, when the r.v. $X$ has the four-parameter CDF of the form (3), we signify this correspondence by $X \sim \text{MOEP}(\lambda, \beta, k, \alpha)$.
3. Moments
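Before concentrating on the derivation of the raw moments of the MOEP$(\lambda, \beta, k, \alpha)$ distribution, we introduce the Fox-Wright function ${}_{p}\Psi_{q}$ [3], [4], which is a generalization of the familiar generalized hypergeometric function ${}_{p}F_{q}$, with $p$ numerator parameters $a_1, \dots, a_p$ and $q$ denominator parameters $b_1, \dots, b_q$, defined by
$${}_{p}\Psi_{q}\left[\begin{matrix}(a_1, A_1), \dots, (a_p, A_p)\\ (b_1, B_1), \dots, (b_q, B_q)\end{matrix};\, z\right] = \sum_{n=0}^{\infty} \frac{\prod_{j=1}^{p}\Gamma(a_j + A_j n)}{\prod_{k=1}^{q}\Gamma(b_k + B_k n)}\,\frac{z^{n}}{n!},$$
where the empty products are conventionally taken to be equal to 1, while $a_j, b_k \in \mathbb{C}$, $A_j, B_k > 0$, and $\Delta = 1 + \sum_{k=1}^{q} B_k - \sum_{j=1}^{p} A_j \ge 0$.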
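For numerical work, the Fox-Wright series can be approximated by truncation. The following minimal sketch assumes the standard series definition $\sum_{n \ge 0} \prod_j \Gamma(a_j + A_j n) / \prod_k \Gamma(b_k + B_k n)\, z^n / n!$, restricted to real positive parameters and a real argument. The function name `fox_wright` and the fixed truncation length are illustrative assumptions, and no convergence check is performed.

```python
import math
from typing import Sequence, Tuple

Pair = Tuple[float, float]


def fox_wright(num: Sequence[Pair], den: Sequence[Pair], z: float,
               n_terms: int = 200) -> float:
    """Truncated Fox-Wright series for real positive (a, A) and (b, B) pairs.

    Each term is assembled on the log scale with lgamma so that large
    gamma values do not overflow before being combined with z**n.
    """
    total = 0.0
    for n in range(n_terms):
        log_coef = -math.lgamma(n + 1)  # the 1 / n! factor
        log_coef += sum(math.lgamma(a + A * n) for a, A in num)
        log_coef -= sum(math.lgamma(b + B * n) for b, B in den)
        if n == 0:
            term = math.exp(log_coef)
        elif z == 0.0:
            term = 0.0
        else:
            sign = -1.0 if (z < 0 and n % 2 == 1) else 1.0
            term = sign * math.exp(log_coef + n * math.log(abs(z)))
        total += term
    return total
```

Two elementary special cases give a quick sanity check: with no parameters the series is the exponential series, and with the single numerator pair $(1, 1)$ it becomes the geometric series $1/(1-z)$ for $|z| < 1$.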
We now derive closed-form representations of the real order moments of a r.v. $X \sim \text{MOEP}(\lambda, \beta, k, \alpha)$.
It is easy to expand the denominator of the PDF (4) into a power series as follows:
Or we can write it in another form:
Now, interchanging the integral and sum, we have:
The moment is a linear combination of integrals (considered already for a similar purpose by Nadarajah and Kotz in [5], [6]), where
The following representation of this integral for general parameter values was obtained by Pogány and Saxena in [5]; see also [6], [7]:
Thus, for all , we have
when =1, we have
The remaining values of the parameter lead to the expected value:
Thus, we get the following result:
Theorem. Let the r.v. and all parameters then, for we have:
4. Parameter Estimation
In this section, we make use of the MOEP, extended Weibull (ExtW) (Peng and Yan, 2014), exponential-Weibull (EW) (Cordeiro et al., 2013c) [8-13] and two-parameter Weibull (Weibull) distributions to model two well-known real data sets, namely the 'Carbon fibers' (Nichols and Padgett, 2006) and the 'Cancer patients' (Lee and Wang, 2003) [14] data sets. The parameters of the MOEP distribution can be estimated by maximum likelihood in conjunction with the NMaximize command in the symbolic computational package Mathematica. Additionally, two goodness-of-fit measures are proposed to compare the density estimates.
In order to estimate the parameters of the proposed MOEP density function as defined in Equation (4), the log-likelihood of the sample is maximized with respect to the parameters. Given the data $x_i$, $i = 1, \dots, n$, the log-likelihood function is
(5)
(6)
(7)
(8)
Solving these equations simultaneously yields the maximum likelihood estimates (MLEs) of the four parameters. Numerical iterative techniques are necessary to estimate the model parameters. The global maximum of the log-likelihood can be sought by taking different initial values for the parameters; however, we observed that the MLEs for this model are not very sensitive to the initial estimates. For interval estimation of the model parameters, the Fisher information matrix is required; in this article we leave this routine calculation to the interested reader.
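The objective that a numerical optimizer maximizes can be sketched as follows. This is a hedged Python illustration rather than the Mathematica computation used in the paper. It assumes the generic Marshall-Olkin density $g = \alpha f / [1 - (1-\alpha)\bar{F}]^2$ together with a hypothetical Weibull-type baseline $F(x) = 1 - \exp(-c\,x^{k})$, which is not the paper's equation (1). The name `mo_loglik`, the parameters `c` and `k`, and the grid over `alpha` are assumptions made for illustration only.

```python
import math
from typing import Sequence


def mo_loglik(data: Sequence[float], alpha: float, c: float, k: float) -> float:
    """Log-likelihood of a Marshall-Olkin model with an assumed baseline.

    Baseline (an assumption): F(x) = 1 - exp(-c * x**k) for x > 0.
    Each observation contributes
        log(alpha) + log f(x) - 2 * log(1 - (1 - alpha) * Fbar(x)).
    """
    if alpha <= 0 or c <= 0 or k <= 0:
        return float("-inf")  # outside the parameter space
    total = 0.0
    for x in data:
        u = c * x ** k
        log_f = math.log(c * k) + (k - 1.0) * math.log(x) - u
        sbar = math.exp(-u)  # baseline survival function at x
        total += math.log(alpha) + log_f - 2.0 * math.log1p(-(1.0 - alpha) * sbar)
    return total


# The first few remission times from the data listing, for illustration.
sample = [0.08, 2.09, 3.48, 4.87, 6.94, 8.66, 13.11, 23.63]

# Crude profile over alpha with the baseline parameters held fixed.
alpha_grid = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
best_alpha = max(alpha_grid, key=lambda a: mo_loglik(sample, a, 0.1, 1.0))
```

In practice the grid search would be replaced by a general-purpose optimizer over all parameters, started from several initial values as described above.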
5. Goodness-of-Fit Statistics
The Anderson-Darling and the Cramér-von Mises statistics are widely used to determine how closely a specific distribution, whose associated cumulative distribution function is denoted by cdf(·), fits the empirical distribution associated with a given data set.
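These statistics are
$$A_0^{2} = -n - \frac{1}{n}\sum_{i=1}^{n} (2i - 1)\,\log\!\left[z_i\,(1 - z_{n-i+1})\right] \quad\text{and}\quad W_0^{2} = \frac{1}{12n} + \sum_{i=1}^{n}\left(z_i - \frac{2i - 1}{2n}\right)^{2},$$
respectively, where $z_i = \text{cdf}(y_i)$, the $y_i$ values being the ordered observations. The smaller these statistics are, the better the fit.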
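Both statistics are straightforward to compute once a fitted CDF is available. The sketch below assumes the standard unmodified definitions, $A^2 = -n - \frac{1}{n}\sum (2i-1)\log[z_i(1 - z_{n-i+1})]$ and $W^2 = \frac{1}{12n} + \sum (z_i - \frac{2i-1}{2n})^2$ with $z_i$ the fitted CDF at the $i$-th ordered observation. Some authors multiply these by small-sample correction factors, which are not applied here. The function name `ad_cvm` is an illustrative assumption.

```python
import math
from typing import Callable, Sequence, Tuple


def ad_cvm(data: Sequence[float],
           cdf: Callable[[float], float]) -> Tuple[float, float]:
    """Return (Anderson-Darling, Cramer-von Mises) statistics for a fitted cdf.

    The cdf values at the observations must lie strictly between 0 and 1,
    otherwise the logarithms in the Anderson-Darling sum are undefined.
    """
    y = sorted(data)
    n = len(y)
    z = [cdf(v) for v in y]
    # z[i - 1] is z_i and z[n - i] is z_{n-i+1} in 1-based notation.
    ad_sum = sum((2 * i - 1) * (math.log(z[i - 1]) + math.log(1.0 - z[n - i]))
                 for i in range(1, n + 1))
    a2 = -n - ad_sum / n
    w2 = 1.0 / (12 * n) + sum((z[i - 1] - (2 * i - 1) / (2.0 * n)) ** 2
                              for i in range(1, n + 1))
    return a2, w2
```

For data placed exactly at the points $(2i-1)/(2n)$ of a uniform distribution, the Cramér-von Mises statistic attains its minimum value $1/(12n)$, which gives a simple check of an implementation.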
6. Application
We now make use of the MOEP, two-parameter gamma (Gamma), two-parameter Weibull [23], generalized gamma (GG) [13], Provost-type gamma-Weibull [10], [11] and extended Weibull (ExtW) [12] distributions to model a well-known real data set, namely the 'Cancer patients' [14] data set. The parameters of the MOEP distribution can be estimated from the log-likelihood of the sample in conjunction with the NMaximize command in the symbolic computational package Mathematica.
More specifically, the models being considered are:
• The classical gamma distribution with density function:
• The classical Weibull distribution with density function:
• The generalized gamma (GG) distribution (Stacy, 1962) [13], with density function:
• The Provost-type gamma-Weibull distribution (Provost et al., 2011) [11], with density function:
• The extended Weibull (ExtW) distribution (Peng and Yan, 2014) [12], with density function:
7. The Cancer Stem Cells Patients Data Set
The data set represents the remission times (in months) of a random sample of 128 bladder cancer patients, as reported in Lee and Wang (2003). The data are:
0.08, 2.09, 3.48, 4.87, 6.94, 8.66, 13.11, 23.63, 0.20, 2.23, 3.52, 4.98, 6.97, 9.02, 13.29, 0.40, 2.26, 3.57, 5.06, 7.09, 9.22, 13.80, 25.74, 0.50, 2.46, 3.64, 5.09, 7.26, 9.47, 14.24, 25.82, 0.51, 2.54, 3.70, 5.17, 7.28, 9.74, 14.76, 26.31, 0.81, 2.62, 3.82, 5.32, 7.32, 10.06, 14.77, 32.15, 2.64, 3.88, 5.32, 7.39, 10.34, 14.83, 34.26, 0.90, 2.69, 4.18, 5.34, 7.59, 10.66, 15.96, 36.66, 1.05, 2.69, 4.23, 5.41, 7.62, 10.75, 16.62, 43.01, 1.19, 2.75, 4.26, 5.41, 7.63, 17.12, 46.12, 1.26, 2.83, 4.33, 5.49, 7.66, 11.25, 17.14, 79.05, 1.35, 2.87, 5.62, 7.87, 11.64, 17.36, 1.40, 3.02, 4.34, 5.71, 7.93, 11.79, 18.10, 1.46, 4.40, 5.85, 8.26, 11.98, 19.13, 1.76, 3.25, 4.50, 6.25, 8.37, 12.02, 2.02, 3.31, 4.51.
If we take the special case of MOEP when p=1, then the PDF and CDF estimates of the MOEP distribution for Cancer patients data are plotted in Figure (1).
The estimates of the parameters and the values of the Anderson-Darling and Cramér-von Mises goodness-of-fit statistics are given in Table (1). It is seen that the proposed MOEP model provides the best fit for this data set.
Table 1. Parameter estimates and goodness-of-fit statistics for the cancer patients data set.
Distribution | Estimates | Anderson-Darling | Cramér-von Mises
Gamma (ξ, ϕ) | 1.17251, 7.98766 | 0.77625 | 0.13606
Weibull (k, λ) | 1.04783, 10.6510 | 0.96345 | 0.15430
GG (k, λ, ξ) | 0.52010, 0.59510, 1.94927 | 0.30087 | 0.04626
P_{g}W (k, ξ, λ) | 0.52001, 1.42917, 0.595104 | 0.49168 | 0.084301
ExtW (a, b, c) | 1.96210, 1×10^{-21}, 3.74383 | 13.3317 | 2.49818
MOEP (λ, β, k, α) | 1.62267, 1×10^{-6}, 0.6160, 25.3808 | 0.2565 | 0.0374
For different applications, see [21-30].
Acknowledgments
We would like to thank the referees and Professor Dr. Judy Garland for their careful reading of the paper and their valuable comments.
References