Development of a Credit Scoring Model for Retail Loan Granting Financial Institutions from Frontier Markets

The primary focus of this paper is to develop a retail credit scoring model suited to financial institutions in emerging economies, where reliable data are scarce. In addition, the study seeks to illustrate the efficacy of such credit scoring models and to highlight the improvements they can bring to decision making in the consumer credit-granting process.


Introduction
The primary activity of commercial banks is extending credit to borrowers by generating loans. Therefore, a significant portion of a bank's risk lies in the quality of its assets that need to be in line with that bank's risk appetite. In order to manage risk efficiently, quantifying risk with the most advanced statistical tools is essential for the credit issuer's existence and growth.
Over the last two decades, consumer lending has become increasingly sophisticated and sensitive, as lenders have moved from traditional interview-based decision making to data-driven models that quantify credit risk. Credit scoring is a systematic method for measuring credit risk because it provides a consistent analysis of the factors that contribute to it. A credit score is a numerical value that represents the degree of default risk, that is, how risky a borrower is: the higher the score, the less risky the borrower is to the creditor. Since 1960, financial institutions in developed countries have used credit scoring models productively for quick and precise assessment of the risk borne by their potential or existing borrowers.
By the nature of banking activity, credit risk is the most typical risk a bank faces, and in terms of potential losses it is usually the largest. Credit risk is also often referred to as default risk, performance risk or counterparty risk.
Credit risk arises when a borrower defaults, that is, fails to meet his obligations to service debt. Default can have several causes. Usually, either the borrower is financially insolvent or simply refuses to fulfill debt service obligations (e.g., in the case of fraud or a legal dispute). Technical defaults may also occur because of flaws in information systems or technology.
There are many definitions of a default event. The most common is a payment delay of at least 90 days.
Empirical and historical evidence suggests that losses from ineffective credit risk management can be large enough to push an institution into insolvency and, eventually, bankruptcy. Hence, credit-granting institutions have to gather accurate information about their potential borrowers and monitor the performance of accepted borrowers over time. Managerial supervision and credit risk strategy therefore directly influence the return and risk of a loan portfolio. For this reason, sound assessment and prudent management of credit risk are critically important, specifically for decreasing costs, increasing profit, and remaining financially healthy and solvent.
The risk manager is challenged to create risk assessment instruments that not only gauge creditworthiness satisfactorily, but also keep per-unit processing costs low while minimizing turnaround time for customers' applications. The success of credit scoring systems has established them as a key decision-support tool in today's risk measurement and management practice.
However, developing a scoring system, particularly for commercial banks in emerging markets, where data are often scarce, hard to access and difficult to interpret, often proves to be a difficult task. Problems commonly faced by small banks and banks in emerging markets include the lack of internal databases and data mining tools, and a shortage of powerful integrated software and experienced staff. Managers of such institutions are usually unwilling to invest in these resources, because they associate the development of quantitative models with excessive, uneconomical costs, and are more comfortable relying solely on routine experience-based judgment to assess the default risk of individual applicants.
However, the ultimate goal of implementing quantitative models is to increase the precision of the credit appraisal process by identifying creditworthy borrowers, thereby reducing the cost and risk of lending; over time, this turns the costs incurred in model design, construction and development into a lasting source of profit.
This paper aims to illustrate the development process and significance of a sound credit risk scoring system for quantifying the probability of loan defaults. The practical example presented in the paper uses empirical data of borrowers' past performance obtained from a sample of emerging market banks and decisively emphasizes the benefits of establishing such credit risk scoring systems even for small banks with scarce data.
The review of existing research and the background needed to understand this article are given in section 2. Section 3 then describes credit risk models, the data preparation methodology, and the quantitative models that can be selected for developing an advanced credit risk assessment system. Finally, section 4 presents a step-by-step procedure for developing a credit risk scoring model that any credit-granting institution can readily replicate.

Literature Review
Since the inception of modern banking, credit risk has been one of the major risks that financial institutions face most frequently, and it has accordingly been researched extensively. Altman's Z-score may be considered a classic and the backbone of credit risk estimation; it is discussed in references [1] and [2]. Later, structural models were introduced by Merton, Black and Cox. These models concentrate on the capital structure of the firm rather than on empirical evidence of portfolio default events; they are discussed in references [3] and [4]. However, they focus more on bonds and traded securities than on non-tradable loans. The efficiency of the Altman model in recent years is discussed by Hayes, Hodge, and Hughes in [5]. Miller's (2009) major finding, discussed in reference [6], was that Merton's model outperformed Altman's Z-score. Altman revisited credit scoring models in the Basel II environment in [7] and outlined that the Z-score as well as structural models could be used as early warning indicators of distress events.

Credit Risk Models
Credit scoring can be defined as a quantitative method for measuring the probability that a loan applicant or existing borrower will default. Such models help determine whether credit should be granted to a borrower. A good credit scoring model has to be highly discriminative: low scores correspond to very high risk and high scores to almost no risk (or vice versa, depending on the sign convention).
Originally, credit approval decisions were made with a purely judgmental approach, by analyzing the details of the borrower's application form. The decision maker focused on the 5 C's of a customer:
Character: the borrower's character and integrity (e.g., reputation, honesty);
Capital: the difference between the borrower's assets and liabilities;
Capacity: the borrower's ability to meet obligations (e.g., job status, income);
Collateral: the collateral available in the case of default;
Condition: the borrower's circumstances (market conditions, competitive pressure, seasonality, etc.).
It is worth mentioning that this expert-based judgmental approach is still widely used by emerging market banks when information is limited and data are unavailable.
A credit scoring model, by contrast, weights key characteristics taken from the application form and aggregates them into an overall score, so that risky borrowers can be identified by their score or score range. The weights are determined from the relationship between the values of the characteristics and past default behavior. Decision makers first set a cutoff (arbitrarily, in the simplest case) and then classify each customer by comparing the customer's overall credit score (Score) with the cutoff (Cutoff):
$$\text{Classification} = \begin{cases} \text{good (predicted not to default)}, & \text{if Score} > \text{Cutoff},\\ \text{bad (predicted to default)}, & \text{if Score} \le \text{Cutoff}. \end{cases}$$
Assuming new borrowers will behave like past ones, the credit scoring system can be used to generate a credit score for each new loan applicant and to assign the applicant to a high or low default risk category. When the applicant's score exceeds the cutoff (threshold) value, the loan is granted. The logic behind the scoring system is simply that it should mimic what a qualified, experienced expert would look for in a borrower's application.
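The weighted score and cutoff rule can be made concrete with a short sketch. The characteristic names, weights and cutoff below are hypothetical placeholders chosen for illustration; in a real scorecard the weights come from the relationship between characteristics and past default behavior.

```python
from typing import Dict

# Hypothetical characteristic weights (illustrative only, not estimated from data).
WEIGHTS: Dict[str, float] = {"income_band": 20.0, "years_at_job": 5.0, "has_collateral": 30.0}


def credit_score(applicant: Dict[str, float], weights: Dict[str, float]) -> float:
    """Overall score: weighted sum of the applicant's characteristic values."""
    return sum(w * applicant.get(name, 0.0) for name, w in weights.items())


def classify(score: float, cutoff: float) -> str:
    """Cutoff rule: grant only when the score strictly exceeds the cutoff."""
    return "grant (low risk)" if score > cutoff else "decline (high risk)"


applicant = {"income_band": 3, "years_at_job": 4, "has_collateral": 1}
score = credit_score(applicant, WEIGHTS)  # 20*3 + 5*4 + 30*1
print(score, classify(score, cutoff=100.0))
```

A score exactly equal to the cutoff is declined here, matching the rule that a loan is granted only when the score exceeds the threshold.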
The major advantage of automated application scorecards is the reduced time needed to assess new applications. Applications can be screened and scored in real time, which is imperative in today's highly competitive credit market. Another key advantage is simplicity: the scorecard is easy to examine, understand, analyze and monitor. Analysts can perform these functions without in-depth knowledge of statistics or programming, which makes the scorecard an effective instrument for managing credit risk. Finally, the development process for these scorecards is transparent, which makes it easier to satisfy regulatory requirements.

Data Preparation
Scoring systems are developed on the assumption that future performance will reflect past performance. The performance of previously opened accounts is analyzed in order to predict the performance of new applicants.
Obtaining reliable and statistically valid data is crucial for the development of such a scoring system. The quantity and quality of data should comply with the requirements of statistical significance and randomness. Financial institutions can opt to rely solely on internal data, or supplement existing internal data with information from external sources.
Custom scorecards are developed using data from the accounts of a single institution, while generic scorecards are constructed using data pooled from multiple institutions. For example, several small banks, none of which has enough data to construct its own custom scorecard, may pool their consumer loan data. Several steps should be taken to make the data relevant for scoring system development:
Step 1. Exclusion - certain types of accounts need to be excluded. For example, if there are markets where the company no longer operates, the data from these markets should be excluded.
Step 2. Seasonality - this step ensures that the development sample does not contain any data from "abnormal" periods, in line with the assumption that the future will replicate the past.
Step 3. Definition of "good", "bad" and "indeterminate" - this step classifies historical debtors as good, bad or indeterminate. For bankruptcy, the definition of "bad" is clear; however, there are many definitions of "bad" based on levels of delinquency. The definition must be consistent with organizational objectives. If the aim is to increase profitability, then "bad" must be defined at the delinquency point where the account becomes unprofitable. The most widely accepted definition of a default event, or "bad" account, is a payment delay of at least 90 days. Several analytical methods, such as roll rate analysis or a comparison of current versus worst delinquency, can be used to confirm the definition of "bad". Indeterminate debtors are those that do not decisively fall into either the "good" or the "bad" category, or that lack enough performance history to be classified. After dropping these indeterminate debtors, one looks for characteristics that indicate the propensity to pay and estimates their relative significance.
Step 4. Segmentation - it is sometimes useful to develop several scorecards for one portfolio to achieve better risk differentiation. This becomes relevant when a population consists of distinct subpopulations.
In section 3.2, we examine the distinctive characteristics of different quantitative methods that can be used to construct effective credit risk scoring models for frontier market financial institutions, where the unavailability and unreliability of data remain a persistent problem.
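Steps 1 to 3 of the data preparation above can be sketched as a single filtering pass. The account fields, market names and abnormal period are hypothetical; the 90-days-past-due "bad" threshold follows the text, while the "good" threshold (below 30 days) and the 12-month minimum history are illustrative assumptions, not values from this study.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Set, Tuple


@dataclass
class Account:
    account_id: str
    market: str          # hypothetical field: market where the loan was booked
    open_date: date
    worst_dpd: int       # worst days past due observed on the account
    months_on_book: int  # length of performance history


def label(acc: Account, bad_dpd: int = 90, good_max_dpd: int = 30, min_history: int = 12) -> str:
    """Step 3: 90+ days past due is "bad"; clean accounts with enough history are
    "good"; everything else is "indeterminate" (good_max_dpd, min_history assumed)."""
    if acc.worst_dpd >= bad_dpd:
        return "bad"
    if acc.worst_dpd < good_max_dpd and acc.months_on_book >= min_history:
        return "good"
    return "indeterminate"


def development_sample(accounts: Iterable[Account], exited_markets: Set[str],
                       abnormal_periods: List[Tuple[date, date]]) -> List[Tuple[str, str]]:
    def abnormal(d: date) -> bool:
        return any(start <= d <= end for start, end in abnormal_periods)

    sample = []
    for acc in accounts:
        if acc.market in exited_markets:   # Step 1: exclusion
            continue
        if abnormal(acc.open_date):        # Step 2: seasonality
            continue
        tag = label(acc)                   # Step 3: good / bad / indeterminate
        if tag != "indeterminate":         # indeterminates are dropped
            sample.append((acc.account_id, tag))
    return sample


accounts = [
    Account("A1", "MarketA", date(2015, 3, 1), 0, 24),
    Account("A2", "ExitedMarket", date(2015, 4, 1), 120, 24),
    Account("A3", "MarketA", date(2016, 12, 15), 0, 24),
    Account("A4", "MarketB", date(2017, 6, 1), 95, 10),
    Account("A5", "MarketB", date(2017, 7, 1), 45, 18),
]
abnormal_window = [(date(2016, 12, 1), date(2016, 12, 31))]
print(development_sample(accounts, {"ExitedMarket"}, abnormal_window))
```

Segmentation (Step 4) would simply repeat this pass per subpopulation before building a separate scorecard for each.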

Credit Scoring Models
Credit scoring models are quantitative models that use borrower-specific characteristics either to calculate a score that represents the applicant's probability of default or to place borrowers into distinct default risk classes. For retail loans, the characteristics typically include socio-demographic variables, such as income, age, occupation and location, as well as the applicant's credit bureau reports. For corporate loans, cash flow information and financial ratios of the corporate entity are commonly analyzed. We now investigate some of the credit scoring models most widely used by contemporary financial institutions.
The Linear Probability Model uses past data to explain the repayment experience on old loans. It assumes that the probability of default (or repayment) varies linearly with the factors used as inputs. The relative importance of the factors in explaining past repayment behavior is then used to predict repayment probabilities for new applicants. Old loans are divided into two groups: those that defaulted and those that did not.
A major weakness of the Logit model is the assumption that the cumulative probability of default takes a particular functional form, namely the logistic function. "Cumulative" here refers to the fact that the default rate increases over time, so this assumption needs to be treated with caution.
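The Linear Probability Model described above amounts to an ordinary least-squares regression of a 0/1 default indicator on the input factors. The sketch below uses a single hypothetical factor and eight made-up loans; it is illustrative only and not the data of this study.

```python
import numpy as np

# Hypothetical past loans: one input factor (e.g. a payment-to-income ratio)
# and the observed outcome z (1 = defaulted, 0 = repaid).
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
z = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)

# Ordinary least squares of the 0/1 outcome on an intercept and the factor.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, z, rcond=None)


def lpm_predict(x_new):
    """Predicted probability of default: linear in the input factor."""
    return beta[0] + beta[1] * np.asarray(x_new, dtype=float)


print(beta)                     # approximately [-0.25, 1.667]
print(lpm_predict([0.0, 1.0]))  # approximately [-0.25, 1.417]
```

Because the fitted relationship is a straight line, predictions for extreme inputs can fall below 0 or above 1, which is exactly what the Logit and Probit models avoid.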
Another alternative to the linear probability model is the Probit model. Like the Logit model, it forces the predicted probability of default to lie between 0 and 1, but it assumes that the probability of default follows a cumulative normal distribution rather than the logistic function.
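The only difference between the Logit and Probit models is the function that maps a linear combination of characteristics, here called t, to a probability. A minimal comparison of the two link functions, using only the Python standard library:

```python
import math


def logit_pd(t: float) -> float:
    """Logistic CDF, written in a numerically stable form for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def probit_pd(t: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


for t in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"t={t:+.1f}  logit={logit_pd(t):.4f}  probit={probit_pd(t):.4f}")
```

Both functions stay strictly between 0 and 1 and equal 0.5 at t = 0; the logistic curve has heavier tails, so it assigns more probability to extreme outcomes than the normal curve does.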
Although credit scoring has significant benefits, its limitations should also be noted. One problem that may arise in developing a credit scoring model is the use of a biased sample of borrowers. Rejected applicants never receive a loan, so they do not appear in the data used to develop the model; the sample is therefore biased because good customers are overrepresented. Another problem is that patterns change over time. Sometimes the distribution of characteristics changes so rapidly and unpredictably that the credit scoring model must be refreshed constantly to remain applicable.
While linear probability and Logit models predict the expected probability of default, Linear Discriminant Models seek to establish a linear classification rule that best distinguishes between categories. Specifically, discriminant models classify borrowers into high or low default risk buckets according to their observed characteristics. As with the Linear Probability Model, Linear Discriminant Models use the relative importance of the factors in explaining past repayment behavior to predict whether a loan falls into the high or low default category. In the next section, we develop a credit scoring model for retail loans that sheds light on loan delinquency and creditworthiness among individual borrowers, and that can help emerging market banks reduce nonperforming loans and manage credit risk more effectively.
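One common way to build the linear classification rule described above is Fisher's linear discriminant: project each borrower onto the direction that best separates the two groups and compare the projection with a threshold. This is a minimal sketch on hypothetical two-characteristic data; the midpoint threshold assumes equal prior probabilities and equal misclassification costs.

```python
import numpy as np


def fit_lda(X, y):
    """Fisher's linear discriminant for two classes (0 = low risk, 1 = high risk)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix.
    s_w = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(s_w, mu1 - mu0)      # discriminant direction
    threshold = 0.5 * (w @ mu0 + w @ mu1)    # midpoint of projected class means
    return w, threshold


def assign_bucket(X, w, threshold):
    """1 = high default-risk bucket, 0 = low default-risk bucket."""
    return (X @ w > threshold).astype(int)


# Hypothetical borrowers with two observed characteristics each.
X = np.array([[1, 1], [1, 2], [2, 1], [2, 2],
              [4, 4], [4, 5], [5, 4], [5, 5]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
w, threshold = fit_lda(X, y)
print(w, threshold, assign_bucket(X, w, threshold))
```

Unlike the Logit output, the result is a bucket assignment rather than a probability of default.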

The Credit Scoring Model for Retail Loans
While quantitative models are widely used in banking systems around the world, the majority of banks in an emerging economy like Bangladesh are still reluctant to develop quantitative models for credit risk assessment. Several external and internal reasons may explain why quantitative models are not applied in credit-granting decisions. Let us examine these reasons first by considering the application of credit scoring to business loans.
Construction of credit scoring models intended for business loans predictably encounters significant complexities. Specifically, the lack of a central database is a serious obstacle; currently no centralized database exists on defaulted business loans. Also, due to rapidly changing financial conditions, there is no reason to expect that borrower-specific financial ratios will remain constant over any period of time.
In my view, developing quantitative models for business lending may prove challenging and, in some cases, practically impossible. On the other hand, constructing quantitative models for retail lending is entirely feasible: internal data on borrowers' past credit performance are all that is required to develop a quantitative model for assessing retail credit risk.
Once segments are identified for retail lending, an appropriate quantitative model must be chosen for credit-granting decisions. This choice is relatively straightforward, because empirical evidence indicates that credit scoring systems (such as linear probability, Logit and Probit models) are the most appropriate methods for assessing the default risk of retail loan applicants. Let us now consider the process of developing credit scorecards and a scoring model for retail loans. In scorecard development, the first goal is to select the best set of characteristics. Examples of scorecard characteristics are socio-demographic parameters (e.g., age, time at job, time at residence), existing relationship (time at bank, number of products, previous claims and payment performance), credit bureau reports (inquiries, trades, delinquencies and public records), real estate and so forth.
Reliable, clean data with a minimum acceptable number of "good" and "bad" accounts are needed to develop a high-precision credit scorecard. As a rule of thumb, there should be roughly 2,000 "bad" and 2,000 "good" accounts.
Nearly 3,000 "bad" and 3,000 "good" retail loan accounts were obtained from a carefully chosen sample of 25 commercial banks in Bangladesh¹ that do not employ credit scoring models to assess applicants' default risk. However, the major difficulty in developing a credit scorecard that could act as a "sole arbiter" turned out to be the scarcity of borrower-specific characteristics. In particular, we obtained information on only four pertinent characteristics (new/existing client, requested amount, payment-to-income ratio, loan type), whereas a credit scorecard should ideally consist of 8 to 15 characteristics. Moreover, as the analysis below shows, not all of these four characteristics are strong. Hence, we are under no illusion: a scorecard built on them will have limited predictive power. Rather, our goal is to show that even with such scarce characteristics, developing a credit scorecard matters, and that a scorecard developed with an appropriate number of relevant characteristics will be a strong predictive tool for assessing applicants' default risk.
To begin the analysis, we assess the strength of each characteristic using the following criteria:
the predictive power of each attribute, measured by the weight of evidence (WOE);
the range and trend of WOEs across the attributes within a characteristic;
the predictive power of the characteristic, measured by the information value (IV).
The WOE measures the strength of each attribute in separating "good" from "bad" accounts. It is calculated as
$$\text{WOE} = \ln\left(\frac{\text{Distr. Good}}{\text{Distr. Bad}}\right) \times 100,$$
where Distr. Good is the share of all "good" accounts in the sample that fall into the attribute, and Distr. Bad is the share of all "bad" accounts in the sample that fall into the attribute. A negative WOE indicates that the attribute isolates a higher proportion of "bads" than "goods". The information value of each characteristic is calculated as
$$\text{IV} = \sum_{i} \left(\text{Distr. Good}_i - \text{Distr. Bad}_i\right) \ln\left(\frac{\text{Distr. Good}_i}{\text{Distr. Bad}_i}\right),$$
where the sum runs over the attributes i of the characteristic.
¹ As per request of the management, the names of the banks involved have been excluded due to confidentiality issues and the sensitive nature of the data.
Best practice for interpreting the information value of a characteristic is as follows:
less than 0.02 - not predictive;
0.02 to 0.1 - weak;
0.1 to 0.3 - medium;
more than 0.3 - strong.
The WOE of each attribute and the IV of each characteristic were calculated for the four characteristics: new/existing clients, requested amount, payment-to-income ratio (PTI) and loan type. Table (1) and table (2) summarize the strength of the characteristics and their attributes for "New/Existing borrowers" and "Loan Type".
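The WOE and IV formulas and the rule-of-thumb IV bands above can be sketched as follows. The good/bad counts are hypothetical and do not reproduce the study's tables; how values exactly on a band boundary are labeled is an assumption, since the bands as stated overlap at their edges.

```python
import math
from typing import Dict, Tuple


def woe_iv(counts: Dict[str, Tuple[int, int]]) -> Tuple[Dict[str, float], float]:
    """counts maps attribute -> (number of goods, number of bads)."""
    total_good = sum(g for g, _ in counts.values())
    total_bad = sum(b for _, b in counts.values())
    woe, iv = {}, 0.0
    for attr, (g, b) in counts.items():
        distr_good, distr_bad = g / total_good, b / total_bad
        ratio = math.log(distr_good / distr_bad)  # raises an error if g or b is 0
        woe[attr] = ratio * 100
        iv += (distr_good - distr_bad) * ratio
    return woe, iv


def iv_strength(iv: float) -> str:
    """Rule-of-thumb bands; boundary values go to the upper band (assumption)."""
    if iv < 0.02:
        return "not predictive"
    if iv < 0.1:
        return "weak"
    if iv < 0.3:
        return "medium"
    return "strong"


# Hypothetical counts for a "New/existing client" characteristic.
woe, iv = woe_iv({"new": (1000, 1500), "existing": (2000, 1500)})
print(woe, round(iv, 4), iv_strength(iv))
```

Attributes with zero goods or zero bads make the logarithm undefined; in practice such attributes are merged with neighbors before computing WOE.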
As the variables "Requested Amount" and "PTI" are continuous, they require binning before WOE and IV can be calculated. Many algorithms exist for optimal binning; however, this issue goes far beyond the scope of this study. Thus, the bins for these variables were chosen arbitrarily, and the final results are presented in the tables below. As the calculations show, the predictive power of the characteristic "PTI" is strong, and the WOEs of its attributes are good enough. The predictive power of "Requested amount", measured by the IV, proved to be medium, with satisfactory WOEs for its attributes. The remaining two characteristics, "New/existing" and "Loan type", are not adequate. However, we use these inadequate characteristics along with "PTI" and "Requested amount", because our key objective is to illustrate the importance of developing credit risk scoring models even with scarce characteristics.
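Binning a continuous characteristic with arbitrary cut points, as done here for "PTI", can be sketched as below. The cut points and the five loans are hypothetical; the output is a table of good and bad counts per bin, which is the input a WOE/IV calculation needs.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical, arbitrarily chosen PTI cut points giving four bins.
PTI_EDGES = [0.20, 0.35, 0.50]
PTI_LABELS = ["<0.20", "0.20-0.35", "0.35-0.50", ">=0.50"]


def pti_bin(pti: float) -> str:
    """Map a payment-to-income ratio to its bin; each bin includes its lower edge."""
    return PTI_LABELS[bisect_right(PTI_EDGES, pti)]


def bin_counts(loans: Iterable[Tuple[float, int]]) -> Dict[str, Tuple[int, int]]:
    """loans: (pti, is_bad) pairs -> {bin: (goods, bads)}."""
    counts = defaultdict(lambda: [0, 0])
    for pti, is_bad in loans:
        counts[pti_bin(pti)][1 if is_bad else 0] += 1
    return {label: (g, b) for label, (g, b) in counts.items()}


print(bin_counts([(0.10, 0), (0.15, 1), (0.40, 1), (0.60, 1), (0.60, 0)]))
```

Bins that end up with no goods or no bads, like "0.35-0.50" in this toy example, would have to be merged with a neighbor before WOE can be computed.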
Since we have already identified the WOE of each attribute, we can now use the WOEs as inputs to a logistic regression and search for the best possible model. The probability of default of loan i is modeled as
$$P_i = \frac{1}{1 + \exp\left(-\left(\beta_0 + \sum_{j=1}^{n} \beta_j x_{ij}\right)\right)}, \qquad (1)$$
where x_{ij} is the WOE of the attribute of characteristic j into which loan i falls. The coefficients are chosen to maximize the likelihood
$$L(p) = \prod_{i=1}^{N} P_i^{Y_i} \left(1 - P_i\right)^{1 - Y_i},$$
where P_i is the probability of default estimated by logistic regression, see (1); Y_i is the dependent variable of the regression, i.e. 1 for "bad" accounts and 0 for "good" accounts; and N is the number of loans. By optimizing the regression coefficients β_0, β_1, ..., β_n to maximize L(p), I obtained the following regression coefficients:
After identifying the input variables and coefficients, I set the rejection cutoff for the probability of default at 50% and calculated the accuracy of the resulting model in the following tables:
The Type I error indicates that, of the observed "goods", my model classified 8% as "bad" accounts; thus, 92% of "goods" were identified correctly. The Type II error indicates that, of the observed "bads", my model classified 45% as "good" accounts.
Further, I used the Gini coefficient as a measure of the discrimination power of my model. The Gini coefficient is calculated as
$$G = 1 - 2\int_0^1 L(q)\,dq,$$
where G is the Gini coefficient and L(q) is the Lorenz curve, a cumulative probability distribution. The Lorenz curve measures cumulative defaults against the cumulative percentage of loans. The information in the Lorenz curve can be summarized by the Gini coefficient and the Lorenz asymmetry coefficient. I used the Gini coefficient, which measures the inequality among the values of a frequency distribution. In developing credit scoring models, the Lorenz curve represents the empirical distribution of bad accounts, and the Gini coefficient becomes a measure of the discrimination power of the developed credit scoring model.
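The estimation and error-rate steps above can be sketched in NumPy: maximize the logistic likelihood with Newton-Raphson iterations, then apply a 50% rejection cutoff and compute Type I and Type II errors as they are defined in the text. The data are simulated stand-ins for WOE-coded characteristics (the "true" coefficients are arbitrary), and treating a PD of exactly 0.5 as a rejection is an assumption.

```python
import numpy as np


def fit_logit(X, y, n_iter=25, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    Xd = np.column_stack([np.ones(len(X)), X])         # intercept + inputs
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))         # logistic PD for each loan
        grad = Xd.T @ (y - p)                          # gradient of the log-likelihood
        info = Xd.T @ (Xd * (p * (1.0 - p))[:, None])  # negative Hessian
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def predict_pd(X, beta):
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-(Xd @ beta)))


def error_rates(y, pd_hat, cutoff=0.5):
    """Type I: share of observed goods rejected (PD >= cutoff).
    Type II: share of observed bads accepted (PD < cutoff)."""
    reject = pd_hat >= cutoff
    return float(np.mean(reject[y == 0])), float(np.mean(~reject[y == 1]))


# Simulated data: two stand-in WOE inputs; y = 1 for "bad", 0 for "good".
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
true_beta = np.array([0.2, -0.8, 0.5])
p_true = 1.0 / (1.0 + np.exp(-(true_beta[0] + X @ true_beta[1:])))
y = (rng.random(2000) < p_true).astype(float)

beta = fit_logit(X, y)
type_1, type_2 = error_rates(y, predict_pd(X, beta))
print(beta, type_1, type_2)
```

Production work would normally rely on a tested statistics package; the hand-written Newton loop is shown only to make the likelihood maximization explicit.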
In the model developed for this paper, the Gini coefficient amounted to 27%, indicating low discrimination power. Even so, the credit scoring model developed in this study provides a solid foundation for further development: as more data are added to the model, its predictive power and the efficiency of the loan-granting process should increase. Several other statistical techniques have been used to improve the predictability of credit scoring models (linear regression, Probit analysis, Bayesian methods, etc.), but the logistic regression implemented in this paper remains the most widely accepted method. Again, this particular model was conceived as a suitable prototype framework applicable to any loan-granting financial institution, particularly in frontier markets where reliable data are scarce, rather than as a sound, sophisticated and stable scoring model for a specific institution in a developed market.
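A Gini coefficient such as the 27% reported above can be computed from predicted PDs and observed outcomes. This sketch assumes the Lorenz curve plots the cumulative share of bads against the cumulative share of goods, with loans ordered from lowest to highest predicted PD; under that convention G = 1 - 2 * (area under L) equals 2 * AUC - 1. Other conventions (for example, plotting against the cumulative share of all loans) give different values, and ties in PD make the result depend on the ordering of tied loans.

```python
import numpy as np


def gini_coefficient(y, pd_hat):
    """G = 1 - 2 * (area under L), where L is assumed to plot the cumulative share
    of bads against the cumulative share of goods, loans sorted by ascending PD."""
    y = np.asarray(y)
    order = np.argsort(np.asarray(pd_hat), kind="mergesort")  # safest loans first
    y_sorted = y[order]
    cum_bad = np.concatenate([[0.0], np.cumsum(y_sorted == 1) / np.sum(y == 1)])
    cum_good = np.concatenate([[0.0], np.cumsum(y_sorted == 0) / np.sum(y == 0)])
    # Trapezoidal area under the Lorenz curve.
    area = np.sum(np.diff(cum_good) * (cum_bad[1:] + cum_bad[:-1]) / 2.0)
    return float(1.0 - 2.0 * area)


print(gini_coefficient([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # perfect ranking
print(gini_coefficient([0, 1, 0, 1], [0.1, 0.2, 0.3, 0.4]))  # partial ranking
```

A value of 1 means every bad is ranked riskier than every good, 0 means no better than random, and negative values mean the ranking is inverted.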

Conclusion
Through this study, an easy-to-understand credit scoring model is developed for estimating the probability of default on retail loans, particularly for institutions operating in frontier markets, where reliable data are scarce. Only four characteristics and their attributes are used to develop the model. Built on such scarce data, the scoring model showed a Type I error of 8% and a Type II error of 45%; the model can be extended to incorporate more variables, which should increase its accuracy. In addition, the discrimination power of the scorecard as measured by the Gini coefficient proved to be low, namely 27%. Nevertheless, the goal of constructing an illustrative credit scoring model with scarce data may be considered achieved. Thus, even with scarce data, it is possible to build a credit scoring model that helps decision makers expedite the credit appraisal process and increase the overall organizational efficiency of retail loan-granting institutions.