Extended Cox Modeling of Customer Retention in Mobile Telecommunication Sector of Rwanda
Diane Ingabire*, Samuel Musili Mwalili, George Otieno Orwa
Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
To cite this article:
Diane Ingabire, Samuel Musili Mwalili, George Otieno Orwa. Extended Cox Modeling of Customer Retention in Mobile Telecommunication Sector of Rwanda. American Journal of Theoretical and Applied Statistics. Vol. 4, No. 6, 2015, pp. 471-479. doi: 10.11648/j.ajtas.20150406.17
Abstract: Retaining customers improves profitability and, importantly, reduces the cost incurred in acquiring new customers; moreover, a firm can increase profits by 25-95 percent if it improves its customer retention rate by 5 percent. As markets mature and competitive pressure intensifies, companies can no longer ignore the importance of customer retention, as their existing customer bases have become precious assets. This research aims to model customer retention in the Rwandan telecom sector using survival analysis techniques in order to inform the concerned institutions and companies about telecom customer retention in Rwanda. The Cox regression model and the extended Cox model were fitted to simulated data in order to assess which model is better suited to customer retention. It was found that customers’ socio-economic, demographic and behavioral characteristics have an effect on the churn rate. The extended Cox model gave the better description of how customer retention is achieved. These findings hold implications for industry operators on the key areas to pay attention to in order to achieve customer retention.
Keywords: Customer retention, Cox model, Extended Cox Model
1.Introduction
Mobile communication has become the backbone of society. The mobile telecommunication sector continues to offer unprecedented opportunities for economic growth in both developing and developed markets, and mobile services have become an essential part of how the economy functions. Total mobile penetration has more than doubled in all regions of the world since 2005 (Williams et al. 2012).
Customer retention refers to a customer’s stated continuation of a business relationship with the firm (Timothy et al. 2007). Over the last decade, many companies have come to see customer retention as a central topic in their management and marketing decisions (van Den Poel & Lariviere 2004). Many studies on customer retention argue that retaining customers improves profitability and, importantly, reduces the cost incurred in acquiring new customers. Reichheld & Schefter (2000) reported that a firm can increase profits by 25-95 percent if it improves its customer retention rate by 5 percent. Furthermore, a retained customer tends to be loyal because of attachment and commitment to the organization (Sharmeela-Banu et al. 2012).
As markets mature and competitive pressure intensifies, companies can no longer ignore the importance of customer retention, as their existing customer bases have become precious assets. Customer churn is a focal concern for most companies active in industries with low switching costs, and among the industries that suffer from this issue, telecommunications ranks at the top of the list, with an approximate annual churn rate of 30% (Ali, 2009).
Rwanda is a small, low-income, landlocked country with a surface area of 26,338 square kilometers, situated in east-central Africa. Regular efforts have been made to develop the service sector and to stimulate investment in the industrial sector. Vision 2020 seeks to transform Rwanda from a low-income, agriculture-based economy into a knowledge-based, service-oriented economy with middle-income status by 2020, and ICT is one of the cross-cutting issues of Vision 2020 (RDB, 2013).
The Rwandan telecom sector has shown particularly strong growth in recent years in terms of subscriptions, revenues and investments, buttressed by a vibrant economy and a GDP which has sustained growth of between 7% and 8% annually since 2008. As a result, the country is rapidly catching up with other markets in Africa, with increased penetration particularly evident in the internet and mobile sectors. Although the country was slow to liberalize the mobile sector, there is effective competition among the three current operators, each of which provides wide geographic coverage (Dudde, 2014). Deregulation has led many service providers to enter the industry, which has become very competitive: as at the end of December 2014, there were two fixed line telephony operators and three mobile telephony operators fully operational, namely Mobile Telecommunication Network (MTN), which offers both fixed line and mobile telephony, Millicom Rwanda Limited (Tigo), Airtel Rwanda and Rwandatel (RURA, 2015).
As competition continues to increase in the telecommunications industry, customers continually switch from one company to another, and retaining customers has become a critical concern for most companies; hence the cellular phone companies are doing everything possible to attract new customers and retain the existing ones. However, despite the large amount of research done on customer retention in mobile telecommunications, there is an absence of studies about the sector in Rwanda. To help telecommunications companies reduce churn, we need not only to predict which customers are at high risk of churn, but also to know how soon these high-risk customers will churn, so that the companies can optimize their marketing intervention resources to prevent as many customers as possible from churning. Conventional statistical methods (e.g. logistic regression and decision trees) are very successful in predicting customer churn, but they can hardly predict when customers will churn or how long they will stay with the company. Survival analysis, by contrast, was designed from the outset to handle time-to-event data and is therefore an efficient and powerful tool for predicting customer churn (Lu, 2002).
Survival analysis is a collection of statistical procedures for analyzing data in which the outcome variable of interest is the time until an event occurs. It examines and models the time it takes for events to occur, and survival modeling examines the relationship between survival and one or more predictors (covariates). The main aim of this paper is to model customer retention in the telecommunication sector using survival analysis techniques in order to inform the concerned institutions and companies about telecom customer retention in Rwanda.
2.Review of Previous Research
Many studies have been done on customer retention in the telecommunication sector. In a study aimed at understanding and predicting customer lifetime in a contractual setting in order to improve the practice of customer portfolio management, Portela & Menezes (2011) first estimated the Cox PH model, tested the PH assumption using Schoenfeld residuals, and found that it did not hold; they therefore used Accelerated Failure Time (AFT) models. The log-logistic model was found to be the appropriate parametric model because it had the lowest AIC. Wong (2011) instead used the Cox regression model in studying customer retention in the context of a Canadian wireless telecommunications company and explored the predictors of churn incidence as part of customer relationship management. Ahn et al. (2006) investigated factors leading to customer churn using a sample of 5789 actual customer transactions and billing data. They also analyzed the mediating effects of customer status between churn determinants and customer churn by means of logistic regressions, whose parameters were estimated by maximum likelihood. The likelihood ratio test indicated that these models fit the data very well, but the study had a number of limitations. First, data for some variables, such as account tenure (also called customer duration) and each subscriber’s age, were not available, and account tenure in particular is a very important variable for explaining customer churn. Second, the 8-month data collection period was relatively short, which suggested that an additional longitudinal study with a longer period of data collection and time-series data was necessary. Ocloo & Tsetse (2013) instead adopted a descriptive survey method that combined qualitative and quantitative methods in their study on customer retention in the Ghanaian mobile telecommunication industry.
It is evident that many models have been used to model telecom customer retention globally; however, all the models reviewed above have a number of weaknesses which make them less suitable than the extended Cox regression adopted in this research. Logistic regression models the outcome as a binary variable taking the value 1 or 0; it hence ignores survival times and censoring information. AFT models make assumptions about the distribution of the survival times and do not take into account the effect of time-dependent variables. The Cox regression model ignores the effect of time-dependent variables, and the other customer retention approaches do not take survival times and censoring information into account. Thus the main aim of this paper is to model telecom customer retention using the Cox regression model and the extended Cox model and to determine which is the better retention model.
3.1.Basic Analytical Quantities
Let T represent the survival time; the actual survival time t of an individual can be regarded as a realization of the variable T, which can take any non-negative value. T is regarded as a random variable with cumulative distribution function F(t) = Pr(T ≤ t) and probability density function f(t) = dF(t)/dt. The basic analytical quantities for time-to-event data are the survival function
$$S(t) = \Pr(T > t) = 1 - F(t), \qquad (3.1)$$
which gives the probability that a customer survives longer than some specified time t, and the hazard function
$$h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}, \qquad (3.2)$$
which is also referred to as the hazard rate, instantaneous failure rate, or conditional failure rate. In the context of this study, it can be interpreted as the risk of canceling the contract (or unsubscribing) at time t. Another basic quantity is the cumulative hazard function, defined as
$$H(t) = \int_0^t h(u)\,du = -\log S(t). \qquad (3.3)$$
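The relationships between these quantities can be checked numerically. The sketch below assumes a Weibull survival time, a distribution chosen here only for illustration and not used in the study; the function names and parameter values are hypothetical.

```python
import math

def weibull_survival(t, shape, scale):
    """S(t) = Pr(T > t) for a Weibull survival time (illustrative assumption)."""
    return math.exp(-((t / scale) ** shape))

def weibull_density(t, shape, scale):
    """f(t) = dF(t)/dt for the same distribution."""
    return (shape / scale) * (t / scale) ** (shape - 1) * weibull_survival(t, shape, scale)

def hazard(t, shape, scale):
    """h(t) = f(t) / S(t): instantaneous risk of canceling at time t."""
    return weibull_density(t, shape, scale) / weibull_survival(t, shape, scale)

def cumulative_hazard(t, shape, scale):
    """H(t) = -log S(t)."""
    return -math.log(weibull_survival(t, shape, scale))
```

With shape = 1 the Weibull reduces to the exponential distribution, so the hazard is constant at 1/scale and the cumulative hazard grows linearly in t.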
3.2.The Cox Regression Model
A Cox model is a statistical technique for exploring the relationship between the survival of an individual and several explanatory variables. It allows us to estimate the hazard (or risk) of death (or cancelation) for an individual given their prognostic variables. The general proportional hazards model is given by
$$h_i(t) = h_0(t)\exp(\beta' x_i), \qquad (3.4)$$
where x_i is the vector of covariates for the ith individual, β is the vector of regression coefficients, and h_0(t) is the baseline hazard function, which depends only on time and not on the covariates.
The hazard is a product of two terms: the baseline hazard function, which depends on time t, and the exponential of a linear combination of the covariates, which does not depend on t. The most important assumption of the Cox model is the proportional hazards (PH) assumption; that is, the hazard ratio of any two individuals is constant over time, in the setting where the predictor variables do not vary over time.
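A small numerical sketch of the PH assumption follows. The coefficients, covariate values and baseline hazard values are hypothetical and serve only to show that the baseline hazard cancels out of the ratio.

```python
import math

def cox_hazard(baseline_hazard, beta, x):
    """h(t | x) = h0(t) * exp(beta . x), with h0(t) passed in as a number."""
    return baseline_hazard * math.exp(sum(b * v for b, v in zip(beta, x)))

def hazard_ratio(beta, x_a, x_b):
    """Hazard ratio of customer a versus customer b; h0(t) cancels out."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, x_a, x_b)))

# Hypothetical coefficients and covariates (age in years, number of subscriptions).
beta = [-0.03, -0.7]
customer_a = [40, 1]
customer_b = [30, 2]

# The ratio is the same whatever value the baseline hazard takes at a given time.
ratio_early = cox_hazard(0.02, beta, customer_a) / cox_hazard(0.02, beta, customer_b)
ratio_late = cox_hazard(0.15, beta, customer_a) / cox_hazard(0.15, beta, customer_b)
```

Both ratios equal `hazard_ratio(beta, customer_a, customer_b)`, which is what "proportional hazards" means when the covariates are fixed over time.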
To estimate the parameters of the Cox regression model, Cox (1972) introduced the idea of a partial likelihood, which accommodates censored observations.
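Suppose we observe (t_i, δ_i, X_i) for customer i, where t_i is the survival time, δ_i is the failure (cancelation)/censoring indicator (1 = cancel, 0 = censor), X_i is the vector of covariates and i = 1, 2, ..., n. Then the partial likelihood function is given by
$$L(\beta) = \prod_{i=1}^{n} \left[ \frac{\exp(\beta' X_i)}{\sum_{j \in R(t_i)} \exp(\beta' X_j)} \right]^{\delta_i}, \qquad (3.5)$$
where R(t_i) is the risk set at time t_i, that is, the set of customers still under observation just before t_i. The corresponding log-likelihood function is given by
$$\log L(\beta) = \sum_{i=1}^{n} \delta_i \left[ \beta' X_i - \log \sum_{j \in R(t_i)} \exp(\beta' X_j) \right]. \qquad (3.6)$$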
The maximum likelihood estimates of β-parameters can be found by maximizing this log-likelihood function using numerical methods.
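As a minimal sketch of the partial log-likelihood described above, the function below sums, over the observed cancelations, the linear predictor of the failing customer minus the log of the summed risk scores in the risk set. It assumes no tied event times; the function and argument names are illustrative.

```python
import math

def cox_partial_loglik(beta, times, events, covariates):
    """Cox partial log-likelihood, assuming no tied event times.

    times[i]      survival or censoring time of customer i
    events[i]     1 if customer i canceled, 0 if censored
    covariates[i] list of covariate values of customer i
    """
    def linear_predictor(j):
        return sum(b * x for b, x in zip(beta, covariates[j]))

    loglik = 0.0
    for i in range(len(times)):
        if events[i] == 0:
            continue  # censored customers contribute only through risk sets
        # Risk set R(t_i): everyone whose time is at least t_i.
        risk_sum = sum(
            math.exp(linear_predictor(j))
            for j in range(len(times))
            if times[j] >= times[i]
        )
        loglik += linear_predictor(i) - math.log(risk_sum)
    return loglik
```

At β = 0 every customer in a risk set is equally likely to fail, so with three uncensored customers at distinct times the value is −log 3 − log 2 − log 1 = −log 6.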
The Newton-Raphson procedure is a numerical method used to fit models for censored survival data by maximizing the partial likelihood function.
Let u(β) be the vector of first derivatives of the log-likelihood function in equation (3.6) with respect to the β-parameters. This quantity is known as the vector of efficient scores. Also let I(β) be the p × p matrix of negative second derivatives of the log-likelihood, so that the (j,k)th element of I(β) is
$$I_{jk}(\beta) = -\frac{\partial^2 \log L(\beta)}{\partial \beta_j \, \partial \beta_k}. \qquad (3.7)$$
The matrix I(β) is known as the observed information matrix.
According to the Newton-Raphson procedure, the estimate of the vector of parameters at the (s + 1)th cycle of the iterative procedure, $\hat{\beta}_{s+1}$, is given by
$$\hat{\beta}_{s+1} = \hat{\beta}_s + I^{-1}(\hat{\beta}_s)\, u(\hat{\beta}_s) \qquad (3.8)$$
for s = 0, 1, 2, ..., where $u(\hat{\beta}_s)$ is the vector of efficient scores and $I^{-1}(\hat{\beta}_s)$ is the inverse of the information matrix, both evaluated at $\hat{\beta}_s$. The procedure can be started by taking $\hat{\beta}_0 = 0$. The process is terminated when the change in the log-likelihood function is sufficiently small or when the largest relative change in the values of the parameter estimates is sufficiently small.
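The iteration can be sketched for the simplest case of a single covariate, where the score and the information are scalars. This is an illustrative sketch only: it assumes no tied event times, uses an absolute step-size stopping rule rather than the criteria stated above, and all names are hypothetical.

```python
import math

def score_and_information(beta, times, events, x):
    """Score u(beta) and observed information I(beta) for one covariate."""
    u, info = 0.0, 0.0
    for i in range(len(times)):
        if events[i] == 0:
            continue
        risk = [j for j in range(len(times)) if times[j] >= times[i]]
        weights = [math.exp(beta * x[j]) for j in risk]
        total = sum(weights)
        # Weighted mean and mean square of x over the risk set.
        mean_x = sum(w * x[j] for w, j in zip(weights, risk)) / total
        mean_x2 = sum(w * x[j] ** 2 for w, j in zip(weights, risk)) / total
        u += x[i] - mean_x
        info += mean_x2 - mean_x ** 2
    return u, info

def newton_raphson_cox(times, events, x, beta0=0.0, tol=1e-10, max_iter=50):
    """Iterate beta <- beta + u(beta) / I(beta), starting from beta0."""
    beta = beta0
    for _ in range(max_iter):
        u, info = score_and_information(beta, times, events, x)
        step = u / info
        beta += step
        if abs(step) < tol:
            break
    return beta
```

At convergence the score is numerically zero, which is the first-order condition for a maximum of the partial log-likelihood. In the multi-covariate case the division by `info` becomes multiplication by the inverse information matrix.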
3.3.The Extended Cox Model
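One of the strengths of the Cox model is its ability to encompass covariates that change over time; such covariates are known as time-dependent variables. If time-dependent variables are considered, the Cox model form may still be used, but such a model no longer satisfies the PH assumption and is called the extended Cox model. The extended Cox model for time-dependent variables is given by
$$h(t, X(t)) = h_0(t) \exp\left[ \sum_{i=1}^{p_1} \beta_i X_i + \sum_{j=1}^{p_2} \gamma_j X_j(t) \right], \qquad (3.9)$$
where X(t) = (X_1, ..., X_{p_1}, X_1(t), ..., X_{p_2}(t)) is the entire collection of predictors at time t, the β_i are the coefficients of the time-independent predictors and the γ_j are the coefficients of the time-dependent predictors.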
The extended model contains a baseline hazard function h0(t) and an exponential function which contains both time-independent predictors, denoted by the Xi variables, and time-dependent predictors, denoted by the Xj(t) variables. Even though the values of the variable Xj(t) may change over time, the hazard model provides only one coefficient for each time-dependent variable in the model. Thus, at time t, there is only one value of the variable Xj(t) that has an effect on the hazard, that value being measured at time t.
An important assumption of the extended Cox model is that the effect of a time-dependent variable Xj(t) on the survival probability at time t depends on the value of this variable at that same time t, and not on the value at an earlier or later time.
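The PH assumption is not satisfied for the extended Cox model; thus the hazard ratio depends on time t. The hazard ratio for the extended Cox model is then
$$HR(t) = \frac{h(t, X^*(t))}{h(t, X(t))} = \exp\left[ \sum_{i=1}^{p_1} \beta_i (X_i^* - X_i) + \sum_{j=1}^{p_2} \gamma_j \big(X_j^*(t) - X_j(t)\big) \right], \qquad (3.10)$$
with β_i and γ_j the coefficients of the time-independent predictors X_i and the time-dependent predictors X_j(t), respectively.
The two sets of predictors, X*(t) and X(t), identify two specifications at time t for the combined set of predictors containing both time-independent and time-dependent variables.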
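To illustrate why the hazard ratio now depends on t, the sketch below compares two hypothetical customers who share the same time-independent covariates but whose plan differs only during the first months. The coefficient value, the plan coding (0/1) and the month of the switch are assumptions made for the example.

```python
import math

def extended_hazard_ratio(beta, gamma, x_star, x, z_star_t, z_t):
    """Hazard ratio at one time point under the extended Cox model.

    x_star, x       time-independent covariates of the two customers
    z_star_t, z_t   time-dependent covariates, both evaluated at the same time t
    """
    fixed_part = sum(b * (a - c) for b, a, c in zip(beta, x_star, x))
    varying_part = sum(g * (a - c) for g, a, c in zip(gamma, z_star_t, z_t))
    return math.exp(fixed_part + varying_part)

def plan_a(month):
    """Hypothetical customer who stays on plan 1 throughout."""
    return 1

def plan_b(month):
    """Hypothetical customer who moves from plan 0 to plan 1 at month 6."""
    return 0 if month < 6 else 1

beta = [-0.3]        # hypothetical coefficient of a shared fixed covariate
gamma = [0.5]        # hypothetical coefficient of the plan covariate
shared_x = [1]

hr_month_3 = extended_hazard_ratio(beta, gamma, shared_x, shared_x, [plan_a(3)], [plan_b(3)])
hr_month_10 = extended_hazard_ratio(beta, gamma, shared_x, shared_x, [plan_a(10)], [plan_b(10)])
```

Here the ratio equals exp(0.5) while the plans differ and drops to 1 once both customers are on the same plan, so it is not constant over time.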
The extended Cox model for customer retention used in this study is given by
$$h(t, X(t)) = h_0(t) \exp\left[ \sum_{i=1}^{6} \beta_i X_i + \gamma X_7(t) \right], \qquad (3.11)$$
where h_0(t) is a baseline hazard function,
X_1, ..., X_6 is a set of time-independent covariates, where
X_1 = sex, X_2 = age, X_3 = marital status, X_4 = residence, X_5 = employment status, X_6 = number of subscriptions,
X_7(t) = plan rate, which is the time-varying covariate,
β_1, ..., β_6 = set of coefficients of the time-independent covariates and γ = the coefficient of the time-varying covariate.
The regression coefficients in the extended Cox model are estimated using a maximum likelihood (ML) procedure. ML estimates are obtained by maximizing a (partial) likelihood function L. The partial log-likelihood function from equation (3.6) can be generalized to the case of the extended Cox model. Writing $\eta_l(t) = \sum_k \beta_k X_{kl} + \sum_j \gamma_j X_{jl}(t)$ for the linear predictor of individual l at time t, the partial log-likelihood becomes
$$\log L(\beta, \gamma) = \sum_{i=1}^{n} \delta_i \left[ \eta_i(t_i) - \log \sum_{l \in R(t_i)} \exp\big(\eta_l(t_i)\big) \right], \qquad (3.12)$$
where R(t_i) is the risk set at time t_i, the event (cancelation) time of the ith individual in the study, i = 1, 2, ..., n, and δ_i is an event indicator that is zero if the survival time is censored and unity otherwise. The estimates of the parameters are obtained by maximizing the partial log-likelihood function.
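A minimal sketch of this generalized partial log-likelihood follows, for one time-dependent covariate. The key difference from the ordinary Cox case is that every member of the risk set has its time-dependent covariate evaluated at the event time t_i. The sketch assumes no tied event times and represents each customer's time-dependent covariate as a function of time; these representation choices and all names are illustrative.

```python
import math

def extended_partial_loglik(beta, gamma, times, events, fixed, varying):
    """Partial log-likelihood with one time-dependent covariate, no ties.

    fixed[l]    list of time-independent covariates of customer l
    varying[l]  function t -> value of the time-dependent covariate of customer l
    """
    def eta(l, t):
        # Linear predictor of customer l, with the varying covariate read at time t.
        return sum(b * x for b, x in zip(beta, fixed[l])) + gamma * varying[l](t)

    n = len(times)
    loglik = 0.0
    for i in range(n):
        if events[i] == 0:
            continue
        t_i = times[i]
        risk_sum = sum(math.exp(eta(l, t_i)) for l in range(n) if times[l] >= t_i)
        loglik += eta(i, t_i) - math.log(risk_sum)
    return loglik
```

For example, with two customers failing at months 1 and 2, where the first is always on plan 1 and the second switches from plan 0 to plan 1 at month 2, the value at γ = 1 and β = 0 is 1 − log(1 + e): at month 1 the second customer still contributes exp(0) to the risk set.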
3.4.Simulation to Assess the Validity
R statistical software (version 3.0.3) was used for simulating and analyzing data. Because real telecom data were unavailable and inconsistently recorded, a simulation approach was used to generate the data and fit the customer retention models. According to previous research, some of the customer’s socio-economic, demographic and behavioral characteristics affect the hazard of churning. In this view, six time-independent covariates, namely the customer’s sex, age, marital status, residence, employment status and number of subscriptions, and one time-dependent covariate, namely the customer’s plan rate, were simulated. The maximum follow-up time was set to 25 months; the permalgorithm function was used to generate survival times conditional on this set of covariates, and coxph was used to fit the Cox models. Two models were fitted, the correct model and the incorrect model, in order to determine which one better explains telecom customer retention. The correct model is the extended Cox model, which includes the time-dependent variable; the incorrect model is the Cox regression model with time-independent variables only, that is, it treats the time-dependent variable (TDV) "plan" as a time-independent variable (TIV). Thus dataset1, generated to fit the correct model, contained seven covariates of which "plan rate" is time-varying, while dataset2, generated to fit the incorrect model, assumed the plan rate covariate to be constant over time. Appendices A and B show the first fifty observations generated in the two data sets. The validity of the models was tested through simulation: 500 and 1000 simulations were run for both models, and two samples of size 500 and 1000 were generated for both models in order to assess the effect of sample size.
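The structure of such a simulated data set can be sketched in a few lines. The generator below is a simplified discrete-time stand-in, not the permutational algorithm implemented by permalgorithm in the study: each month the customer may switch plan and then churns with a probability driven by the current hazard. The baseline hazard, switching probability, coefficients and function name are all hypothetical.

```python
import math
import random

def simulate_customer(rng, fixed, beta, gamma, base_hazard=0.05,
                      switch_prob=0.1, max_follow_up=25):
    """Return one customer's history as (start, stop, event, plan) rows.

    fixed   time-independent covariate values of the customer
    plan    binary time-dependent covariate that may switch each month
    """
    plan = rng.randint(0, 1)
    rows = []
    for month in range(1, max_follow_up + 1):
        if month > 1 and rng.random() < switch_prob:
            plan = 1 - plan  # plan change: the covariate varies over time
        eta = sum(b * x for b, x in zip(beta, fixed)) + gamma * plan
        # Probability of churning during this month under a constant monthly hazard.
        churn_prob = 1.0 - math.exp(-base_hazard * math.exp(eta))
        churned = rng.random() < churn_prob
        rows.append((month - 1, month, int(churned), plan))
        if churned:
            break
    return rows

rng = random.Random(2015)
history = simulate_customer(rng, fixed=[1, 0.5], beta=[-0.3, -0.7], gamma=0.4)
```

Each row covers one month in (start, stop] form with the plan value in force during that interval, which is the layout a time-dependent Cox fit needs. A customer who never churns ends with 25 rows and event 0 in all of them, that is, censored at the maximum follow-up time.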
4.Results and Discussion
* Average value of the covariates’ coefficients estimates, #average standard errors.
Table 4.1 and Table 4.2 show the incorrect and correct model respectively. In both tables, the first column gives the average estimated coefficient of each covariate; the estimates for the time-independent covariates are nearly equal in both models, but the estimates for the time-dependent covariate "plan" differ. The exponentiated coefficients in the second column are interpretable as multiplicative effects on the hazard. For example, taking n=1000 and R=1000 and holding other covariates constant in the correct model, an additional year of age multiplies the hazard of churning by 0.7526, a reduction of 24.74 percent, and an additional subscription multiplies it by 0.4904, a reduction of 50.96 percent. The third column gives the average standard error for each covariate. The standard error estimates how far the sample estimate is likely to be from the population value, and it tends to zero with increasing sample size. This also holds in our results, as the standard errors decrease as the sample size increases. The fourth column gives the bias of the estimator, which is the difference between the estimator’s expected value and the true value of the parameter being estimated; the bias of the estimated coefficient of the time-varying covariate is larger in the incorrect model than in the correct model. The average p-values and the proportion of significant p-values for each covariate were also determined, and the covariate "plan" is significant in the correct model but not in the incorrect model. Thus, by ignoring the effect of the time-dependent variable, we may wrongly discard the variable "plan".
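The conversion from an exponentiated coefficient to a percentage change in the hazard is simple arithmetic, sketched below with the two hazard ratios quoted in the text; the function name is illustrative.

```python
def percent_change_in_hazard(hazard_ratio):
    """Percentage change in the hazard for a one-unit increase in the covariate."""
    return (hazard_ratio - 1.0) * 100.0

age_change = percent_change_in_hazard(0.7526)            # one additional year of age
subscription_change = percent_change_in_hazard(0.4904)   # one additional subscription
```

A hazard ratio below 1 gives a negative change, that is, a reduction in the hazard of churning: about 24.74 percent for age and 50.96 percent for the number of subscriptions.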
Furthermore, in assessing the effect of the sample size on the variables’ significance, it was found that the variables become more significant as the sample size increases; all variables in the correct model were found to have a highly significant effect on the risk of churning for n=1000.
The estimated values for the six TIVs are almost the same in both models. The main difference lies in the estimated values of the variable "plan". Table 4.3 compares the two models with respect to the estimates of the coefficient of the time-varying covariate "plan". It shows the bias, the standard error and the mean squared error (MSE) for the two models, each with samples of size 500 and 1000. The bias shows how close the estimate is, on average, to the true value, and the standard error shows how far the sample estimate is likely to be from the population value. The smaller the bias and the standard error, the better the estimate, although it is common to trade off some increase in bias for a larger decrease in the standard error and vice versa. The bias and the standard error decrease as the sample size increases. The mean squared error captures the overall error that the estimator makes; the smaller the MSE, the better the estimate. The results in the table below show that the MSE of the estimated coefficient is smaller for the correct model than for the incorrect model; thus, ignoring the effect of the time-varying covariate increases the MSE, which means that the extended Cox model is the better of the two models.
Table 4.3. Bias, standard error and MSE of the estimated coefficient of the covariate "plan": incorrect model versus correct model.
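The three summaries can be computed from a set of simulation replicates as sketched below. This sketch uses the empirical standard deviation of the estimates across replicates, which is an assumption; the tables in the study report the average of the model-based standard errors instead. The function name and the toy estimates are hypothetical.

```python
import math

def summarize_estimates(estimates, true_value):
    """Bias, empirical standard error and MSE of replicated estimates."""
    r = len(estimates)
    mean_estimate = sum(estimates) / r
    bias = mean_estimate - true_value
    variance = sum((e - mean_estimate) ** 2 for e in estimates) / r
    mse = sum((e - true_value) ** 2 for e in estimates) / r
    return {"bias": bias, "empirical_se": math.sqrt(variance), "mse": mse}

# Toy replicates of a coefficient whose true value is 1.0.
summary = summarize_estimates([0.9, 1.1, 1.3], true_value=1.0)
```

With the divisor r used for both quantities, the MSE decomposes exactly as bias squared plus variance, which is why a model with a larger bias for "plan" also tends to show a larger MSE.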
A box plot of the estimates of the coefficient of the time-varying covariate "plan" in both models, for both 500 and 1000 iterations, is shown in Figure 4.1; it shows the difference in the median of the estimates. The median for the incorrect model is greater than the median for the correct model.
5.Conclusion and Recommendations
This study presented a method of modeling telecom customer retention using survival analysis techniques. A set of seven covariates, consisting of the customer’s sex, age, marital status, employment status, residence, number of subscriptions and plan rate, was generated, and the "PermAlgo" package was used to generate survival times conditional on this set of covariates. It was found that customers’ demographic, socio-economic and behavioral characteristics affect the likelihood of churning. Two models (the incorrect model and the correct model) were fitted to determine which one better describes customer retention. The results highlighted the usefulness and the effect of the time-dependent covariate in the Cox regression model; that is, when the time-dependent covariate (TDC) was assumed to be a time-independent covariate (TIC), the covariate became insignificant although it was in fact significant, and hence we can end up drawing misleading conclusions. A box plot was used to show the difference between the two models in the estimates of the coefficient of the time-varying covariate (plan rate). The median estimate was smaller when plan rate was treated as time-varying than when it was treated as time-independent. We also assessed the effect of the sample size on the model; as expected, the results indicated that the bias, the standard error and the mean squared error decrease as the sample size increases, and the covariates become more significant as the sample size increases. In summary, the extended Cox model gave the better description of how customer retention is achieved.
Most telecommunications companies in developing countries do not take into account the role of statistical data in achieving customer retention and may unwittingly make wrong decisions which can result in losses. One limitation of this study was that companies do not keep proper customer records; hence we recommend that consistent and well-organized records, including all available demographic and socio-economic characteristics, be kept for every customer to allow better monitoring and evaluation in order to achieve customer retention. Given the usefulness of time effects in the Cox model and the fact that many customers change their rate plan or other characteristics over time, customer retention should be analyzed using the extended Cox model instead of the Cox regression model.
Appendix A: data set 1
Appendix B: dataset 2