Comparison Between Discriminant Analysis Model and Logistic Regression Model to Predict the Distress of the Algerian Economic Companies

This study aims to identify the method best able to distinguish between distressed and non-distressed economic companies, and thereby the variables that most reliably detect pre-distress in the Algerian environment. To this end, we compare the findings of two of the most important statistical methods, discriminant analysis and logistic regression, applied to a sample of 60 companies, half of them distressed and half non-distressed, using four financial ratios selected from those most used by researchers: sales to total assets, working capital to total assets, profit before interest and tax to total assets, and equity to total assets. Comparing the classification results of the two models, we conclude that the logistic model distinguishes better between distressed and non-distressed Algerian economic companies, with a correct classification rate of 96.7%, compared with 91.7% for the Fisher discriminant model.


Introduction
Discriminant analysis and logistic regression are two of the most important methods used to assess the financial position of economic companies. Several studies have shown that they distinguish well between distressed and non-distressed companies and that they predict distress with high accuracy some period before it happens, which enables managers to take appropriate action, make the necessary reforms and avoid this danger. To achieve early prediction of distress for economic companies operating in the Algerian environment, it is necessary to select a suitable model that includes the most important independent variables (financial ratios) able to signal the danger before it happens.

Study's Problem
The problem addressed in this study is whether a model adapted to the Algerian environment can be found that predicts the distress of economic companies before it happens. The main question is therefore: What is the appropriate method for constructing a model that can be adopted to predict the distress of Algerian economic companies? Two sub-questions are derived from it: Is the discriminant model able to distinguish between distressed and non-distressed Algerian economic companies?
To what extent is the logistic model able to predict the distress of Algerian economic companies a year before it happens?

Research Importance
The importance of this research lies in comparing the findings of the discriminant analysis method and the logistic regression method on a sample of Algerian economic companies, in order to determine the ability of each method to predict financial distress some period before it occurs and to select the model better suited to the Algerian environment.

Research Objectives
This research aims to obtain a discriminant model suited to the Algerian environment, with which distressed and non-distressed economic companies can be accurately distinguished, and to provide some proposals and recommendations on the issue of financial distress prediction in Algeria.

Hypotheses
This study compares the discriminant analysis method and the logistic regression method in order to determine which is better able to distinguish and predict the distress of economic companies operating in the Algerian environment, by testing the following hypotheses: The ratio of sales to total assets and the ratio of profit before interest and tax to total assets are the best variables with which the linear discriminant model can be constructed. The ratio of sales to total assets, the ratio of working capital to total assets and the ratio of equity to total assets are the best variables that can be relied upon to construct the logistic model.

Previous Studies
Many studies have addressed the issue of financial distress prediction using the discriminant method and the logistic method. Table 1 shows some of the most important of these studies:

The Concept of Discriminant Analysis
The earliest use of discriminant analysis is attributed to the English statistician Karl Pearson, who in 1920 proposed an intergroup distance index called the CRL [1], which was extensively studied by M. Morant in 1920. In the same year, several studies on distance indexes started in India, to be formalized by P. C. Mahalanobis in 1930. In 1930, Fisher translated the idea of intergroup distance into a composite of variables derived for the purpose of two-group classification. In general, the distance and the variable composite appeared in print prior to Fisher's 1936 article entitled "The use of multiple measurements in taxonomic problems", and they were later used in the process of discrimination. [2] Discriminant analysis is defined as a statistical method that allows discrimination between observations on the basis of the individual characteristics of each of them. Discriminant analysis is also used to predict a phenomenon whose dependent variable is qualitative. [3] Discriminant analysis is a process of devising rules to assign a new individual data point to one of K known groups. The method is usually based on previously known information about the K groups, known as training data, whose classification information is correct. [4]

The Use of Discriminant Analysis in Classification
Let D be the group of distressed companies, N the group of non-distressed companies, $x = (x_1, x_2, \ldots, x_p)$ the vector of the p financial ratios of a company e, $\mu_N$ and $\mu_D$ the mean vectors of x in each group, and T the total variance-covariance matrix. The discriminant function follows from two decision rules. The first is a geometric criterion based on the comparison of distances: $d(x, \mu_D) \le d(x, \mu_N) \Leftrightarrow$ e belongs to Group D. Using the metric $T^{-1}$, the rule becomes: if $f(x)$ is negative, e belongs to Group D, where:
$f(x) = (\mu_N - \mu_D)' \, T^{-1} \left( x - \frac{\mu_N + \mu_D}{2} \right)$
The geometric model works to: [5] maximize the separation between the two groups (maximizing the between-group variance, or distance) and minimize the distance between the elements of the same group (minimizing the within-group variance, or distance). To achieve this aim, the following ratio must be maximized:
$\frac{F' B F}{F' W F}$
where: F: vector of discrimination coefficients; B: between-group variance-covariance matrix; W: within-group variance-covariance matrix. Maximizing this ratio yields the vector F as the eigenvector associated with the largest eigenvalue u. The higher the value of u, the larger the ratio, and thus the better the separation between the two groups and the smaller the distance between the elements within each group.
The second decision rule is Bayes' rule, which minimizes the probability of classification error. In the case of the normal distribution with equal variance-covariance matrices in the two groups, the same discriminant function is produced: [5]
$f(x) = (\mu_N - \mu_D)' \, T^{-1} \left( x - \frac{\mu_N + \mu_D}{2} \right)$
The discriminant equation can be rewritten as a sum of the contributions of the individual ratios:
$f(x) = \sum_{j=1}^{p} \alpha_j (x_j - m_j), \quad \alpha = T^{-1} (\mu_N - \mu_D), \quad m = \frac{\mu_N + \mu_D}{2}$
If the discriminant function is constructed so that a higher score means the company is in a better position, the negative contributions indicate the weaknesses of the company while the positive contributions indicate its strengths.
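As an illustration of the decision rule based on f(x), the following is a minimal Python sketch and not the procedure used in this study, which relied on SPSS. It assumes NumPy is available and uses synthetic two-ratio data; the function name `fit_discriminant`, the simulated group means and the random seed are hypothetical choices made only for the example.

```python
import numpy as np

def fit_discriminant(X_n, X_d):
    """Build f(x) = (mu_N - mu_D)' T^{-1} (x - (mu_N + mu_D) / 2)."""
    mu_n = X_n.mean(axis=0)  # mean vector of the non-distressed group
    mu_d = X_d.mean(axis=0)  # mean vector of the distressed group
    # Total variance-covariance matrix T, estimated on the pooled sample.
    T = np.cov(np.vstack([X_n, X_d]), rowvar=False)
    # Coefficient vector alpha = T^{-1} (mu_N - mu_D), without forming T^{-1}.
    alpha = np.linalg.solve(T, mu_n - mu_d)
    midpoint = (mu_n + mu_d) / 2.0

    def f(x):
        return float(alpha @ (np.asarray(x, dtype=float) - midpoint))

    return f, mu_n, mu_d

# Synthetic data (hypothetical): 30 companies per group, two ratios each.
rng = np.random.default_rng(0)
X_n = rng.normal(loc=[1.0, 0.5], scale=0.3, size=(30, 2))
X_d = rng.normal(loc=[0.2, -0.1], scale=0.3, size=(30, 2))

f, mu_n, mu_d = fit_discriminant(X_n, X_d)
print("score at the non-distressed mean:", f(mu_n))
print("score at the distressed mean:", f(mu_d))
```

With this sign convention, a negative score assigns the company to Group D and the two group means receive scores of equal magnitude and opposite sign.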

The Concept of Logistic Analysis
Logistic regression was used for the first time by Joseph Berkson in 1944; he derived the model from his work on probabilistic (probit) regression and biological assay analysis, carried out with Chester Bliss about ten years earlier. In 1949, George Barnard introduced the term log-odds in the context of the logistic model. If p is the probability of success, i.e., the probability that the response takes the value 1, the odds of success are defined as p / (1-p); the log-odds of success, ln (p / (1-p)), is therefore the linear predictor xβ of the dichotomous logistic model, which plays a key role in logistic modeling. The term "deductive likelihood" was used by Barnard et al. in 1962. Rothamsted's Gavin Ross was the first to develop maximum likelihood software that estimates both the probit model and the logistic model. [7] Logistic regression is a probabilistic method for classification; it aims to determine the probability that distress occurs in a particular company depending on its financial characteristics. [8] Logistic regression is widely used as a substitute for linear regression when the dependent variable is qualitative and takes two or more values. Linear regression cannot be used in this case because it generates expected values that may be any real number, ranging from negative infinity to positive infinity, while the qualitative variable can take only a limited number of separate values within a specified range. [9]

Using Logistic Analysis in the Classification
In the two-group case, the logistic regression model rests on the basic assumption that the dependent variable y, the response under study, is a dichotomous variable that follows a Bernoulli distribution: it takes the value 1 with probability p and the value 0 with probability q = 1 - p, i.e., the response occurs or does not occur. In linear regression, where the independent variables and the dependent variable take continuous values, the model linking the variables is:
$y = \beta_0 + \beta_1 x + e$
where y is a continuous observable variable and E(y) is the mean of the observed values of y at a given value of the variable x.
The variable e represents the error, $e = y - y^{*}$. On this basis, for a dichotomous response the model can be written as:
$E(y) = p = \beta_0 + \beta_1 x$
Since a probability is bounded between 0 and 1, whereas the linear expression is not, the regression model with one independent variable is written as:
$p = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}$
where exp is the inverse of the natural logarithm.
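The link between a probability, its log-odds and the logistic transformation can be sketched in a few lines of Python. This is an illustrative example only; the function names `log_odds` and `logistic` and the coefficient values `beta0` and `beta1` are hypothetical and do not come from the study.

```python
import math

def log_odds(p):
    """Log-odds ln(p / (1 - p)) of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def logistic(z):
    """Inverse of the log-odds: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical one-variable model: log-odds = beta0 + beta1 * x.
beta0, beta1 = -1.0, 2.0
for x in (-2.0, 0.0, 0.5, 3.0):
    z = beta0 + beta1 * x
    print(f"x = {x:5.1f}  linear predictor = {z:5.1f}  p = {logistic(z):.3f}")
```

Whatever the value of the linear predictor, the resulting probability stays between 0 and 1, which is the property that linear regression lacks for a dichotomous response.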

The Application of the Methods of Discriminant and Logistic Analysis to Distinguish Between Distressed and Non-distressed Algerian Economic Companies
To apply the methods of discriminant analysis and logistic regression in the Algerian environment, we selected a sample of 60 economic companies, 30 of them distressed and 30 non-distressed. We relied on the financial data of this sample for three years: 2011, 2012 and 2013. The dependent variable is the dichotomous qualitative variable (distress or non-distress), which takes the value 0 or 1. The four independent variables were selected among the financial ratios most widely used by researchers: sales / total assets, net profit before interest and taxes / total assets, working capital / total assets and equity / total assets.
Using SPSS-20, we have obtained the following results:

Selection of the Most Discriminant Independent Variables
So that the discriminant equation includes only the variables with the best discriminant power, the set of variables is reduced using Wilks' lambda. Table 2 shows the findings of Wilks' lambda in selecting the best variables. Table 2 shows that none of the variables is significant except R 1 and R 2 ; these two independent variables are the most important and the most capable of distinguishing between distressed and non-distressed economic companies. They have the highest F values and the lowest Wilks' lambda values. The non-significant variables are therefore excluded and the significant ones retained.
The significance of the exact F for Wilks' lambda is 0.000, which is less than the significance level of 0.005, meaning that the two ratios together have high predictive ability. Thus, the variables entering the construction of the discriminant function are: net profit before interest and tax / total assets and sales / total assets.

Test of Significance and Relationship Power
The significance and the strength of the relationship are tested by calculating the intrinsic value (eigenvalue) and Wilks' lambda.

Intrinsic Value
The intrinsic value is: "the ratio between the total of intergroup squares to the total of intragroup squares for the analysis of variance whose dependent variable is a discrimination function." [9] The following table shows the intrinsic value obtained with the SPSS software. The canonical correlation shows the relationship between the discriminant scores and the groups; the closer the canonical correlation is to one, the better the model. In Table 3, the obtained value of the canonical correlation is 0.693, which indicates the good discrimination ability of the discriminant function.
As for the intrinsic value, the greater it is, the greater the variance explained by the linear composite, which leads to good performance of the discriminant function; the intrinsic value (eigenvalue) is 0.922 in our study, which is good and demonstrates the predictive power of the discriminant function.
The variance ratio of 100% shows the importance of the discriminant function, while the cumulative column gives the cumulative share of variance as each discriminant function is added to the table. When there are several discriminant functions, the function whose value in the cumulative variance column exceeds 90% is the most important in the analysis; here the cumulative variance equals 100% because there is only one discriminant function.
The number 1 is the number of discriminant functions. A single discriminant function exists because there are two groups (non-distressed companies and distressed companies), and the number of discriminant functions equals the number of groups minus one.

Wilks' Lambda
Table 4 below shows that the value of Wilks' lambda is 0.520, which indicates that the variables combined in the discriminant function play a good role in the discrimination. The significance of Wilks' lambda is tested by the χ² statistic, whose value is 37.248. As the significance value of 0.000 is less than the significance level of 0.005, the obtained discriminant function represents a good and consistent set of financial ratios for accurate prediction.
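For a single discriminant function, the eigenvalue, the canonical correlation and Wilks' lambda are linked by standard identities, so the reported figures can be cross-checked. The sketch below is only a consistency check under stated assumptions: it assumes that the reported χ² is Bartlett's approximation, computed with 60 companies, two retained ratios and two groups; the variable names are illustrative.

```python
import math

eigenvalue = 0.922   # intrinsic value reported in the text
n_companies = 60     # sample size
n_variables = 2      # ratios retained in the discriminant function
n_groups = 2         # distressed and non-distressed

# Canonical correlation: sqrt(eigenvalue / (1 + eigenvalue)).
canonical_corr = math.sqrt(eigenvalue / (1.0 + eigenvalue))

# Wilks' lambda with one discriminant function: 1 / (1 + eigenvalue).
wilks_lambda = 1.0 / (1.0 + eigenvalue)

# Bartlett's chi-square approximation (assumed to be the reported test).
chi_square = -(n_companies - 1 - (n_variables + n_groups) / 2.0) * math.log(wilks_lambda)
degrees_of_freedom = n_variables * (n_groups - 1)

print(f"canonical correlation = {canonical_corr:.3f}")
print(f"Wilks' lambda         = {wilks_lambda:.3f}")
print(f"chi-square            = {chi_square:.2f} (df = {degrees_of_freedom})")
```

Under these assumptions the computed values agree with the reported canonical correlation (0.693) and Wilks' lambda (0.520), and the χ² comes out close to the reported 37.248; the small gap is attributable to the rounding of the eigenvalue.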

Standard Discriminant Coefficients
Based on the following table, the standard discriminant function can be written as follows: $Z = 0.842 R_1 + 0.442 R_2$ The standard discriminant function shows the importance of the variables included in the predictive model: the higher the coefficient of a variable, the more important that variable is in the discriminant function. The equation above shows that ratio number 1 (profit before interest and tax / total assets) is the most important, followed by ratio number 2 (sales / total assets).

Non-Standard Discriminant Coefficients
Non-standard discriminant coefficients are estimates of the parameters b 0 , b 1 and b 2 in the following equation:
$Z = b_0 + b_1 X_1 + b_2 X_2$
where:
b 0 = -0.319
b 1 = 8.303
b 2 = 0.427
X 1 = R 1 = ratio of profit before interest and tax / total assets
X 2 = R 2 = sales / total assets

Determination of the Barycenter
After substituting the values of the independent variables (the financial ratios included in the discriminant function), taken from the financial statements of a particular company, we obtain a discriminant score for that company. This score is compared to the barycenter to determine the group to which the company belongs. The barycenter is calculated from the following table: The table above shows that a company whose discriminant score is closer to the mean discriminant score of distressed companies, -0.944, is classified as distressed, whereas a company whose score is closer to the mean discriminant score of non-distressed companies, 0.944, is classified as non-distressed.
It is also noted that the area between the two mean discriminant scores of distressed and non-distressed companies, i.e., between -0.944 and 0.944, is a small area, indicating the good discriminant ability of the proposed predictive function.
To facilitate the classification, it is better to construct a decision rule in which the barycenter equals the mean of the groups' mean discriminant scores; since we have two groups in this study, the barycenter is: $\frac{-0.944 + 0.944}{2} = 0$ Therefore, the decision-making rule is as follows: If Z ≥ 0: the company is non-distressed. If Z < 0: the company is distressed. The area between the two values (-0.944 and 0.944) represents the gray or critical area, where it is not known whether the company is distressed or non-distressed.
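The scoring and decision rule described above can be sketched in Python using the non-standard coefficients and the group means reported in the text. The function names `z_score` and `classify` and the two example companies are hypothetical and serve only to illustrate the rule.

```python
# Non-standard discriminant coefficients reported in the text.
B0, B1, B2 = -0.319, 8.303, 0.427

# Mean discriminant scores of the two groups and the resulting barycenter.
MEAN_DISTRESSED, MEAN_NON_DISTRESSED = -0.944, 0.944
CUTOFF = (MEAN_DISTRESSED + MEAN_NON_DISTRESSED) / 2.0

def z_score(r1, r2):
    """Score for r1 = profit before interest and tax / total assets
    and r2 = sales / total assets."""
    return B0 + B1 * r1 + B2 * r2

def classify(r1, r2):
    """Apply the rule: Z >= cutoff means non-distressed, otherwise distressed."""
    return "non-distressed" if z_score(r1, r2) >= CUTOFF else "distressed"

# Two hypothetical companies, not taken from the study sample.
for name, r1, r2 in [("Company A", 0.10, 1.0), ("Company B", -0.05, 0.3)]:
    print(f"{name}: Z = {z_score(r1, r2):.3f} -> {classify(r1, r2)}")
```

Scores that fall between -0.944 and 0.944 still receive a label under this rule, but they lie in the gray area and should be read with caution.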

Classification
Classification is one of the methods for measuring the discriminant ability of the proposed model. The correct classification rate is obtained by dividing the number of correctly classified companies in the two groups by the total number of companies. The following table gives the classification findings: According to the table above, the correctly classified companies are: 29 distressed companies classified as distressed (96.7%) and 26 non-distressed companies classified as non-distressed (86.7%). The classification errors are: 1 distressed company classified as non-distressed (3.3%) and 4 non-distressed companies classified as distressed (13.3%).
According to these findings, the overall correct classification rate of distressed and non-distressed companies is (29 + 26) / 60 = 91.7%.
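The group-wise and overall classification rates follow directly from the four counts in the classification table. The following Python sketch shows the arithmetic for the discriminant model; the function name `classification_rates` and its argument names are hypothetical.

```python
def classification_rates(d_as_d, d_as_n, n_as_n, n_as_d):
    """Correct classification rates from the four cells of a 2x2 table.

    d_as_d: distressed classified as distressed
    d_as_n: distressed classified as non-distressed
    n_as_n: non-distressed classified as non-distressed
    n_as_d: non-distressed classified as distressed
    """
    n_distressed = d_as_d + d_as_n
    n_non_distressed = n_as_n + n_as_d
    return {
        "distressed": d_as_d / n_distressed,
        "non_distressed": n_as_n / n_non_distressed,
        "overall": (d_as_d + n_as_n) / (n_distressed + n_non_distressed),
    }

# Counts reported for the discriminant model.
rates = classification_rates(d_as_d=29, d_as_n=1, n_as_n=26, n_as_d=4)
for group, rate in rates.items():
    print(f"{group}: {rate:.1%}")
```

The same function can be applied to the classification table of any other two-group model.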

The Methodology Used in the Logistic Model Construction
Using the forward stepwise selection method available in SPSS-20, we constructed the logistic model. The findings are as follows: The table above indicates that the software went through eight iterations to reach the final estimate of the logistic function; it excluded variable number 1 because it does not combine well with the remaining ratios in constructing the logistic model. The second column, which gives the -2 log-likelihood (-2LL) estimate, shows that its value decreases continuously, from 49.893 to 19.105. The table also shows the ability of the three variables to reduce -2LL until the final model is reached, where -2LL takes its smallest value.

Correction of the Achieved Model
To ascertain that the achieved model is valid, and to determine its importance, it is necessary to compute some statistics that test the hypotheses determining the significance of the proposed model. The findings are shown in the following table:

Tests of the model's specifications

                  χ²       df    Sig.
Step 1   Step     64.073   3     .000
         Block    64.073   3     .000
         Model    64.073   3     .000

The table above indicates that the χ² statistic is 64.073. This value exceeds the tabular value of χ² with 3 degrees of freedom (7.815 at the 5% level), and the significance level of 0.000 is less than 0.05. Consequently, we reject the null hypothesis that all the coefficients of the logistic function are equal to zero: at least one coefficient differs from zero and contributes to distinguishing between the groups of distressed and non-distressed companies.
To assess the change between the value of -2LL at a particular step and its value at the preceding step, we use the measures given in the following table: The pseudo coefficients of determination, 0.875 for Nagelkerke R² and 0.656 for Cox & Snell R², indicate that the financial ratios included in the logistic model contribute 65.6% (Cox & Snell R²) and 87.5% (Nagelkerke R²) to the explanation of distress (the dependent variable); the higher the values of the two coefficients, the better the quality of the model. Therefore, we can say that the obtained model has good discriminant ability.
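The two pseudo coefficients of determination can be recovered from the model χ² and the sample composition, which provides a check on the internal consistency of the reported figures. The sketch below assumes that the model χ² of 64.073 is measured against a constant-only model fitted to the 30 distressed and 30 non-distressed companies, and it uses the usual definitions of Cox & Snell R² and Nagelkerke R²; the variable names are illustrative.

```python
import math

n = 60                     # companies in the sample
n_distressed = 30          # half of the sample is distressed
model_chi_square = 64.073  # reported model chi-square

# -2 log-likelihood of the constant-only model for a 30/30 split.
p = n_distressed / n
neg2ll_null = -2.0 * (n_distressed * math.log(p)
                      + (n - n_distressed) * math.log(1.0 - p))

# -2 log-likelihood of the fitted model.
neg2ll_model = neg2ll_null - model_chi_square

# Cox & Snell R^2 and its maximum attainable value.
cox_snell = 1.0 - math.exp(-model_chi_square / n)
max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)

# Nagelkerke R^2 rescales Cox & Snell so that its maximum is 1.
nagelkerke = cox_snell / max_cox_snell

print(f"-2LL (constant only) = {neg2ll_null:.3f}")
print(f"-2LL (final model)   = {neg2ll_model:.3f}")
print(f"Cox & Snell R2       = {cox_snell:.3f}")
print(f"Nagelkerke R2        = {nagelkerke:.3f}")
```

Under these assumptions the computation reproduces the reported final -2LL of 19.105 and the reported values of 0.656 and 0.875 up to rounding.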
To ensure that all variables in the model contribute to the prediction, it is necessary to evaluate the significance of the estimated coefficients using Wald's test, as shown in the following table: The table gives the estimates of the model's parameters. All of them are significant: the significance value of every estimate is lower than the adopted level of significance (0.05), with one degree of freedom. Therefore, we can say that the three variables included in the logistic model are important in explaining the distress of Algerian economic companies.
In the third column (Wald's test), all the test values are greater than four and hence above the critical χ² value of 3.84 for one degree of freedom, meaning that all the estimated coefficients differ significantly from zero at the 95% confidence level; we can also say that all the determinants affect the situation of the company.
The exponentiated coefficient, in the sixth column, indicates the direction of the relationship: if its value is greater than one, the corresponding regression coefficient is positive; if its value is lower than one, the corresponding regression coefficient is negative.
Logit coefficients are interpreted as the change in the log-odds when the independent variable increases by one unit, the other variables being held constant. Logit coefficients also have an economic meaning: the logit coefficient of the ratio of sales to total assets, $\hat{\beta}_1 = 3.525$, indicates that when the ratio of sales to total assets increases by one unit, the log-odds increase by 3.525, holding the other independent variables constant.
The second ratio included in the logistic model, the ratio of working capital to total assets, is interpreted in the same way: its logit coefficient is $\hat{\beta}_2 = 5.533$, meaning that when the ratio of working capital to total assets increases by one unit, the log-odds increase by 5.533 when the other variables are constant.
The same method is used to interpret the third ratio of the model, equity to total assets, whose logit coefficient is $\hat{\beta}_3 = 6.756$: an increase of one unit in the equity ratio increases the log-odds by 6.756 when the other explanatory variables remain constant.
Therefore, we can formulate the final logistic model, which includes the variables best able to explain the distress or non-distress of companies, as follows:
$\ln\left(\frac{\hat{p}}{1 - \hat{p}}\right) = -2.618 + 3.525 X_1 + 5.533 X_2 + 6.756 X_3$
where: X 1 : R 2 : ratio of sales to total assets. X 2 : R 3 : ratio of working capital to total assets. X 3 : R 4 : ratio of equity to total assets.
To test the deviation of the observed values from the expected ones, we use the Hosmer-Lemeshow test, which determines whether the logistic model fits well, i.e., whether the differences between the observed and the expected values are negligible; it relies on the χ² statistic. It tests the null hypothesis that the model fits the study data. The findings of this test are shown in the following table: This table indicates that the value of χ² is 8.375 with 8 degrees of freedom and a significance level of 0.398, which is greater than the adopted significance level (α = 0.05), meaning that the test is not significant. Therefore, we do not reject the null hypothesis that the observed values are equal to the expected values, which confirms the quality of the achieved model and its ability to represent the data well.
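The final logistic model can be turned into a predicted probability with a short Python sketch. The coefficients are those reported in the text. The text does not state which outcome is coded 1; the sketch assumes that the modeled event is non-distress, which the positive signs of the three coefficients suggest but do not prove. The function names and the two example companies are hypothetical.

```python
import math

INTERCEPT = -2.618
COEFFICIENTS = {
    "sales_to_assets": 3.525,            # X1 = R2
    "working_capital_to_assets": 5.533,  # X2 = R3
    "equity_to_assets": 6.756,           # X3 = R4
}

def log_odds(ratios):
    """Linear predictor: intercept plus coefficient times ratio."""
    return INTERCEPT + sum(COEFFICIENTS[name] * ratios[name] for name in COEFFICIENTS)

def probability(ratios):
    """Predicted probability of the modeled event (assumed: non-distress)."""
    return 1.0 / (1.0 + math.exp(-log_odds(ratios)))

# Exponentiated coefficients: multiplicative change in the odds per unit increase.
odds_ratios = {name: math.exp(beta) for name, beta in COEFFICIENTS.items()}

# Two hypothetical companies, not taken from the study sample.
strong = {"sales_to_assets": 1.0, "working_capital_to_assets": 0.2, "equity_to_assets": 0.4}
weak = {"sales_to_assets": 0.2, "working_capital_to_assets": -0.3, "equity_to_assets": 0.05}

for label, company in (("strong", strong), ("weak", weak)):
    print(f"{label}: log-odds = {log_odds(company):.3f}, p = {probability(company):.3f}")
```

With a cutoff of 0.5 on the predicted probability, which is equivalent to a cutoff of 0 on the log-odds, the first company would be assigned to the modeled event and the second would not.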

Classification
The classification process is one of the methods used to check the quality of achieved models. To test the discriminant ability of the achieved logistic model, we prepared the following classification table: This table indicates that 1 distressed company is erroneously classified as non-distressed, while 29 distressed companies are correctly classified. Likewise, 1 non-distressed company is erroneously classified as distressed, while 29 non-distressed companies are correctly classified. The correct classification rate is therefore 29 / 30 = 96.7% for distressed companies and 29 / 30 = 96.7% for non-distressed companies, giving an overall rate of (29 + 29) / 60 = 96.7%.

Results:
The findings show that both models, obtained by discriminant analysis and by logistic regression, succeeded in classifying the sample under study: 91.7% for the discriminant model and 96.7% for the logistic model. The logistic model can be adopted because of its higher ability to predict the distress of economic companies compared with the discriminant model. It is a good method for distinguishing between distressed and non-distressed companies in Algeria.
The discriminant analysis of Algerian economic companies has shown that two ratios, sales / total assets and net profit before interest and tax / total assets, are among the most important financial ratios for predicting financial distress in the Algerian environment using the Fisher discriminant model. The findings have also shown that three ratios, sales / total assets, working capital / total assets and equity / total assets, are among the most important financial ratios that succeeded in predicting the distress of Algerian economic companies using the logistic model.

Recommendations:
The achieved model can be adopted and relied upon to predict the distress of Algerian companies. The accuracy of its findings depends mainly on the accuracy of the information and data submitted by the companies, so it is necessary to verify the authenticity of the information at its source to avoid mistakes. It is important to review the achieved model before using it and to adjust it periodically so that it stays adapted to changes in the Algerian environment. Greater attention should be paid to financial distress in Algeria: the necessary research on this subject should be completed, and other models, based on other modern methods compatible with the changes in the Algerian environment and the development of the related situations, should be attempted.