Clustering Problem with Fuzzy Data: Empirical Study for Financial Distress Firms
Slah Benyoussef^{1, 2}
^{1}Airport Rd, Al-Imam Muhammad Ibn Saud Islamic University, Riyadh 11432, Saudi Arabia
^{2}Faculté des Sciences Economiques et de Gestion de Sfax, route Aéroport km 4, BPN°1088, 3018 Sfax, Tunisia
Slah Benyoussef. Clustering Problem with Fuzzy Data: Empirical Study for Financial Distress Firms. American Journal of Applied Mathematics. Vol. 3, No. 2, 2015, pp. 75-80. doi: 10.11648/j.ajam.20150302.17
Abstract: In many real applications, the data of classification problems cannot be precisely measured: in an increasingly complex environment, the variables can be imprecise, qualitative or linguistic. In such cases, fuzzy set theory is a convenient tool for handling this imprecision. We therefore propose a new approach, based on a ranking function, which solves classification problems via a fuzzy linear programming model. The approach is applied to financially distressed firms. The results obtained are satisfactory in terms of correct classification rates.
Keywords: Bankruptcy firms, Classification problems, Fuzzy logic, Linear programming, Ranking function
1. Introduction
The bankruptcy problem has become increasingly important as competition between financial institutions has intensified. More and more companies are seeking better strategies with the help of credit scoring models, and discriminant analysis techniques have therefore been widely used in credit evaluation processes. As a result, classification problems have gained serious attention over the past decades.
To solve the classification problem, several parametric discriminant methods have been proposed. The first is the linear discriminant function, the oldest discriminant method, introduced by Fisher in 1936 [14]; it is the optimal linear combination that separates the means of the two groups.
This method requires that the sample be normally distributed and that the variance-covariance matrices of the two groups be homogeneous. The second method is the quadratic function suggested by Smith in 1947 [3], which assumes normality of the sample with heterogeneous variance-covariance matrices. The last parametric method is logistic regression, an econometric method whose endogenous variable is binary; it requires neither normality of the sample nor homogeneity of the variance-covariance matrices of the groups.
Several linear programming models have also been proposed for solving classification problems by various authors, such as Freed and Glover [10, 11, 12], Glen [8], Bajgier and Hill [16], Hasan et al. [1], Markowski and Markowski [13], Gehrlein [17], Glover et al. [6], Nath and Jones [15], Jones [4], Koehler and Erenguc [7] and Stam and Ragsdale [2].
Nevertheless, all of these linear programming models assume that the variables (or attributes) are measured with certainty. In an increasingly complex environment, however, these variables can be imprecise, qualitative or linguistic, hence the need to resort to fuzzy set theory (L. Zadeh [9]). In this respect, we propose a new approach, which solves classification problems via fuzzy linear programming models based on the ranking function proposed by F. Hosseinzadeh Lotfi and B. Mansouri [5].
The rest of the paper is organized as follows. In Section 2, we define the ranking function and its properties. In Section 3, we present our new approach to solving fuzzy linear classification problems. In Section 4, an empirical study is carried out on a sample of 65 Tunisian firms, for which financial and accounting statement data are collected and 14 financial ratios are calculated. In Section 5, we give concluding remarks and directions for future research.
2. Ranking Function
To deal quantitatively with imprecise data in classification problems, the concept of fuzzy numbers is introduced. When the variables are fuzzy, the objective function and the constraints of the decision model also become fuzzy.
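We represent an arbitrary fuzzy number by $\tilde{a} = (a^m, a^l, a^u)$, where $a^m$ is the medium value, $a^l$ the lower value and $a^u$ the upper value. According to F. Hosseinzadeh Lotfi and B. Mansouri [5], one of the most effective approaches to comparing the fuzzy numbers in $F(\mathbb{R})$ is to define a ranking function $\tau: F(\mathbb{R}) \to \mathbb{R}$ such that:
• $\tilde{a} \ge \tilde{b}$ if and only if $\tau(\tilde{a}) \ge \tau(\tilde{b})$
• $\tilde{a} > \tilde{b}$ if and only if $\tau(\tilde{a}) > \tau(\tilde{b})$
• $\tilde{a} = \tilde{b}$ if and only if $\tau(\tilde{a}) = \tau(\tilde{b})$
where $\tilde{a}$ and $\tilde{b}$ are fuzzy numbers.
We restrict our attention to linear ranking functions, that is, ranking functions τ such that $\tau(k\tilde{a} + \tilde{b}) = k\,\tau(\tilde{a}) + \tau(\tilde{b})$ for all $\tilde{a}$ and $\tilde{b}$ belonging to $F(\mathbb{R})$ and for any $k \in \mathbb{R}$.
For a fuzzy number $\tilde{a}$ we use a ranking function $\tau(\tilde{a})$ of this linear type, so that two triangular fuzzy numbers $\tilde{a}$ and $\tilde{b}$ are compared through the real values $\tau(\tilde{a})$ and $\tau(\tilde{b})$.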
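As an illustration of how a linear ranking function orders triangular fuzzy numbers, here is a minimal Python sketch. It assumes the commonly used function τ(ã) = (a^l + 2a^m + a^u)/4, which may differ from the function used in [5]; the class name, the helper `scale` and the numerical values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriangularFuzzy:
    """Triangular fuzzy number given by its lower, medium and upper values."""
    lower: float
    medium: float
    upper: float

    def __add__(self, other):
        # Component-wise addition of two triangular fuzzy numbers.
        return TriangularFuzzy(self.lower + other.lower,
                               self.medium + other.medium,
                               self.upper + other.upper)

    def scale(self, k):
        # Multiplying by a negative factor swaps the lower and upper values.
        lo, hi = sorted((k * self.lower, k * self.upper))
        return TriangularFuzzy(lo, k * self.medium, hi)


def tau(a):
    """Assumed linear ranking function: (lower + 2 * medium + upper) / 4."""
    return (a.lower + 2 * a.medium + a.upper) / 4


a = TriangularFuzzy(1.0, 2.0, 4.0)
b = TriangularFuzzy(0.5, 2.5, 3.0)

rank_a = tau(a)  # (1 + 4 + 4) / 4
rank_b = tau(b)  # (0.5 + 5 + 3) / 4
# a >= b in the fuzzy order if and only if tau(a) >= tau(b).
a_at_least_b = rank_a >= rank_b

# Linearity: tau(k * a + b) equals k * tau(a) + tau(b), also for negative k.
k = -3.0
lhs = tau(a.scale(k) + b)
rhs = k * tau(a) + tau(b)
```

Because τ is linear, ranking a weighted sum of fuzzy variables gives the same value as the weighted sum of their ranks, which is what allows a fuzzy linear constraint to be replaced by a crisp one.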
3. Proposed Methodology
Suppose there are n observations, each characterized by p independent fuzzy variables $\tilde{x}_{i1}, \dots, \tilde{x}_{ip}$ for the ith observation. Suppose also that the observations are classified into two groups $G_1$ and $G_2$ containing, respectively, $n_1$ and $n_2$ observations, with $n_1 + n_2 = n$ and $G_1 \cap G_2 = \emptyset$. The membership of the observations in each group is known a priori. The objective is to find a rule that correctly classifies most imprecise observations and that gives the group membership of any new imprecise observation. The classification rule is obtained in two stages. In the first stage, we determine a nonparametric function which reclassifies the observations; the second stage determines the membership of the observations that were not correctly classified in the first stage. In both stages, the objective is to minimize the total deviation of the misclassified observations. The two stages are mathematically formulated as follows:
In the above model, the decision variables are the positive and negative weights, the positive and negative deviations of the observations of $G_1$, and the positive and negative deviations of the observations of $G_2$.
Model (1) is a fuzzy linear programming model. To obtain an equivalent deterministic model, we use the ranking function τ: since an order relation between two fuzzy numbers holds if and only if the same relation holds between their images under τ, the previous model can be rewritten as follows:
According to the properties of the ranking function τ, we can replace the first two constraints by:
The final model in the first stage, when we apply the ranking function (τ), is formulated as follows:
Given the optimal solution of the model above, the classification rule is as follows:
Ÿ
Ÿ
Otherwise, the observation belongs to the area of overlap. To classify such an observation, a second stage is carried out. Before starting the second stage, we define the following sets:
Ÿ
Ÿ
Ÿ
Ÿ ,, ,
Hence, the model of the second stage is formulated as follows:
Given the optimal solution obtained in the second stage, the classification rule is as follows:
Ÿ
Ÿ
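Both stages come down to solving a linear program on ranked (crisp) data. As a rough illustration only, the sketch below fits a minimize-the-sum-of-deviations discriminant, in the spirit of the linear programming models cited in the Introduction, to values obtained with an assumed ranking function τ(ã) = (a^l + 2a^m + a^u)/4. The constraint form, the unit gap between the groups, the toy data and the names `tau` and `fit_msd` are assumptions of this sketch and are not the paper's models. It relies on NumPy and on SciPy's `linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def tau(tri):
    """Assumed linear ranking function for a (lower, medium, upper) triple."""
    lower, medium, upper = tri
    return (lower + 2 * medium + upper) / 4


def fit_msd(ranked_g1, ranked_g2, gap=1.0):
    """Minimize the sum of deviations d_i subject to
         w.r_i - d_i <= c - gap   for r_i in G1,
         w.r_i + d_i >= c + gap   for r_i in G2,
       with w = w_plus - w_minus, w_plus, w_minus, d >= 0 and c free."""
    r1 = np.asarray(ranked_g1, dtype=float)
    r2 = np.asarray(ranked_g2, dtype=float)
    n1, n2, p = len(r1), len(r2), r1.shape[1]
    n = n1 + n2
    # Variable order: w_plus (p), w_minus (p), c (1), d (n).
    cost = np.concatenate([np.zeros(2 * p + 1), np.ones(n)])
    a_g1 = np.hstack([r1, -r1, -np.ones((n1, 1))])   # w.r - c - d <= -gap
    a_g2 = np.hstack([-r2, r2, np.ones((n2, 1))])    # -w.r + c - d <= -gap
    a_ub = np.hstack([np.vstack([a_g1, a_g2]), -np.eye(n)])
    b_ub = -gap * np.ones(n)
    bounds = [(0, None)] * (2 * p) + [(None, None)] + [(0, None)] * n
    res = linprog(cost, A_ub=a_ub, b_ub=b_ub, bounds=bounds)
    weights = res.x[:p] - res.x[p:2 * p]
    return weights, res.x[2 * p], res.fun


# Hypothetical fuzzy observations: two triangular variables per observation.
fuzzy_g1 = [[(0.8, 1.0, 1.2), (1.5, 2.0, 2.5)],
            [(1.0, 1.5, 2.0), (0.5, 1.0, 1.5)]]
fuzzy_g2 = [[(3.5, 4.0, 4.5), (4.0, 5.0, 6.0)],
            [(4.5, 5.0, 5.5), (3.0, 4.0, 5.0)]]

# Defuzzify each variable with tau, then solve the crisp linear program.
ranked_g1 = [[tau(x) for x in obs] for obs in fuzzy_g1]
ranked_g2 = [[tau(x) for x in obs] for obs in fuzzy_g2]
w, c, total_dev = fit_msd(ranked_g1, ranked_g2)
```

On this toy data the two groups are separable, so the total deviation is zero; with overlapping groups, observations with positive deviations would be the candidates for a second-stage model.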
4. Empirical Study and Results
Our database was obtained from the web site of the "Bourse des Valeurs Mobilières de Tunisie (BVMT)" (http://www.bvmt.com.tn) and consists of real data on 65 Tunisian firms divided into two groups. The first group ($G_1$) consists of 46 non-bankrupt firms. The second group ($G_2$) consists of 19 bankrupt firms. Each firm is described by 14 financial ratios.
In this empirical study, we assume that the 14 financial ratios characterizing the 65 firms are triangular fuzzy numbers.
The membership functions of these variables are given in Fig. 1.
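A triangular fuzzy number with lower, medium and upper values has a piecewise-linear membership function. The following minimal sketch evaluates it; the function name and the example values are hypothetical, since the spreads chosen for the 14 ratios are not given in the text.

```python
def triangular_membership(x, lower, medium, upper):
    """Membership degree of x in the triangular fuzzy number (lower, medium, upper)."""
    if x == medium:
        return 1.0                      # peak of the triangle
    if x <= lower or x >= upper:
        return 0.0                      # outside the support
    if x < medium:
        return (x - lower) / (medium - lower)   # rising edge
    return (upper - x) / (upper - medium)       # falling edge


# Hypothetical ratio described by the triangular number (1.0, 2.0, 4.0).
mu_peak = triangular_membership(2.0, 1.0, 2.0, 4.0)
mu_left = triangular_membership(1.5, 1.0, 2.0, 4.0)
mu_right = triangular_membership(3.0, 1.0, 2.0, 4.0)
mu_outside = triangular_membership(5.0, 1.0, 2.0, 4.0)
```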
In the remainder of this section, we examine the performance of the fuzzy classification linear programming model using the ranking function defined in Section 2.
The coefficients of the discriminant function and the value of the objective function for the first and second stages of this model are given in Table 1 (all results were obtained with the LINDO software):
Stage1 | Stage2 | ||||||
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.2432 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.0001 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.0895 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.2267 |
| 0.0000 |
| 0.2742 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.0053 |
| 0.0000 |
| 0.5627 |
| 0.0000 |
| 0.1211 |
| 0.0000 |
| 0.0765 |
| 0.0000 |
| 0.0332 |
| 0.0000 |
| 0.0000 |
| 0.0221 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.1340 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.0000 |
| 0.2113 |
| -1.0884 |
| -0.7412 | ||||
| -0.4118 | ||||||
VOF | 10.9998 | VOF | 0.1019 |
VOF: Value of the Objective Function
The objective of the first stage is to identify the overlap between the observations based on the score given by the first discriminant function. There is an overlap if and only if the classification score lies between the two bounds of the first-stage classification rule.
The classification score showed the existence of an overlap between observations. The result of assigning observations at this stage showed that 19 observations belong toand 16 observations belong to.
The objective of the second stage, in turn, is to find a new discriminant function with a new threshold in order to reclassify the misclassified observations.
Hence, the new classification rule is as follows:
Ifthe observations belong to
Ifthe observations belong to .
Moreover, the value of the objective function decreased from 10.9998 in stage 1 to 0.1019 in stage 2.
As for any classification problem, the performance of the results must be evaluated using the percentage of correctly classified observations. The classification results of our approach are given in Table 2:
Group | Assigned class G1 | Assigned class G2 | Total
Original count, G1 | 43 | 3 | 46
Original count, G2 | 1 | 18 | 19
Rate (%), G1 | 93.478 | 6.522 | 100
Rate (%), G2 | 5.264 | 94.736 | 100
According to Table 2, only three non-bankrupt firms are reported as bankrupt (93.478% of the firms in the first group are correctly classified) and one bankrupt firm is classified in the group of non-bankrupt firms (94.736% of the firms in the second group are correctly classified). Hence, the correct classification rate given by the proposed approach is 94.107%. The result obtained by the proposed method is very satisfactory.
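The rates in Table 2 follow directly from the counts. The short sketch below recomputes them; the variable names are illustrative. It also shows that the mean of the two group rates is about 94.11%, in line with the reported 94.107% up to rounding, while the share of all 65 firms that are correctly classified is 61/65, about 93.85%.

```python
# Rows: original group; columns: assigned group (counts from Table 2).
confusion = {
    "G1": {"G1": 43, "G2": 3},
    "G2": {"G1": 1, "G2": 18},
}

# Percentage of correctly classified firms within each original group.
group_rates = {
    group: 100 * row[group] / sum(row.values())
    for group, row in confusion.items()
}

# Mean of the two group rates (each group weighted equally).
mean_group_rate = sum(group_rates.values()) / len(group_rates)

# Share of all firms that are correctly classified.
total_firms = sum(sum(row.values()) for row in confusion.values())
correct_firms = sum(confusion[group][group] for group in confusion)
overall_rate = 100 * correct_firms / total_firms

for group, rate in group_rates.items():
    print(f"{group}: {rate:.3f}% correctly classified")
print(f"Mean of group rates: {mean_group_rate:.3f}%")
print(f"Overall rate: {overall_rate:.3f}%")
```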
5. Conclusion
The aim of this paper was to evaluate a new approach for solving classification problems in the presence of fuzzy variables. In the first stage, we solved a first linear programming model to identify the overlap between the two groups. In the second stage, we solved a second linear programming model to find a new discriminant function with a new threshold and reclassify the misclassified observations. To evaluate our approach, we calculated the rate of correct classification obtained by the proposed method. This rate is equal to 94.107%.
The result is satisfactory and shows the ability of this procedure to solve some classification problems.
Given the relevance of this approach and its applicability to various classification problems, we think it would be interesting to extend our work by:
• Adapting the developed method to other linear and nonlinear classification programming models;
• Extending the scope to other classification problems such as medical diagnosis, credit scoring, etc.
References