Distances and Similarity Measures in Heuristic Possibilistic Clustering the Intuitionistic Fuzzy Data: A Comparative Study
Dmitri A. Viattchenin and Stanislav Shiray

The note deals with the problem of heuristic possibilistic clustering of intuitionistic fuzzy data. Different distances between intuitionistic fuzzy sets are considered in the paper, together with similarity measures for intuitionistic fuzzy sets that are used to construct intuitionistic fuzzy tolerance relations. A numerical example of applying these distances and similarity measures to clustering intuitionistic fuzzy data is presented, and some preliminary conclusions are formulated.


Introduction
Cluster analysis is a group of approaches for classifying objects according to their likeness by means of unsupervised learning. It groups objects with greater likeness into a class, or cluster, which occupies a partial area of the feature space. The prototype of each cluster acts as a representative of the corresponding class. Fuzzy set theory, proposed by Zadeh [1], makes it possible to model partial belongingness to a cluster, which is described by a membership function. Fuzzy clustering methods have been applied effectively in image processing, data analysis, symbol recognition and modeling. Moreover, fuzzy set theory is a basis for possibility theory [2]. Accordingly, a possibilistic approach to clustering was proposed by Krishnapuram and Keller in [3] and developed by other researchers. The concept of a possibilistic partition is the basis of possibilistic clustering methods, and the membership values can be interpreted as degrees of typicality. Fuzzy and possibilistic clustering methods are considered at length, for instance, in [4][5][6].
The most common and widespread approach to fuzzy clustering is the optimization approach, and most possibilistic clustering methods are also objective function-based clustering algorithms. However, heuristic algorithms of fuzzy clustering are simple and very effective in many cases, because they display a high level of essential clarity and a low level of complexity. Some heuristic clustering procedures are based on the definition of a cluster concept, and the purpose of these algorithms is to detect clusters that conform to a given definition. Such algorithms are called algorithms of direct classification, or direct clustering algorithms. A heuristic approach to possibilistic clustering of this kind was proposed in [7] and developed in other publications. Its essence is that the sought clustering structure of the set of observations is formed directly on the basis of the formal definition of a fuzzy α-cluster, and possibilistic memberships are determined directly from the values of the pairwise similarity of observations. The allotment among fuzzy clusters is the basic concept of the approach; it is a special case of the possibilistic partition. Direct heuristic algorithms of possibilistic clustering can be divided into two types: relational and prototype-based.
The matrix of a fuzzy tolerance relation T is the initial data matrix for the direct relational heuristic algorithms of possibilistic clustering, and a matrix of attributes is the initial data matrix for the prototype-based algorithms. In particular, the group of direct relational heuristic algorithms of possibilistic clustering includes, among others, the D-AFC(c)-algorithm, which is based on the construction of an allotment among an a priori unknown number c of fully separate fuzzy α-clusters with respect to the minimal value α of the tolerance threshold. It should be noted that these prototype-based heuristic algorithms of possibilistic clustering are based on the transitive closure of the initial fuzzy tolerance.
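The note does not state how the transitive closure is computed. As an illustration only, the sketch below assumes the common max-min composition and repeatedly applies R ← R ∪ (R ∘ R) until the relation stops changing. The helper names `max_min_compose` and `transitive_closure` and the 3×3 matrix are hypothetical, and the algorithms discussed here may use a different procedure.

```python
from typing import List

Matrix = List[List[float]]


def max_min_compose(r: Matrix, s: Matrix) -> Matrix:
    """(R o S)(x_i, x_j) = max_k min(R(x_i, x_k), S(x_k, x_j))."""
    n = len(r)
    return [
        [max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(t: Matrix) -> Matrix:
    """Repeat R <- R union (R o R) until R no longer changes.

    Entries only ever take values already present in t, so the loop
    terminates after finitely many steps.
    """
    current = [row[:] for row in t]  # work on a copy, leave the input intact
    while True:
        composed = max_min_compose(current, current)
        nxt = [
            [max(a, b) for a, b in zip(row_cur, row_comp)]
            for row_cur, row_comp in zip(current, composed)
        ]
        if nxt == current:
            return current
        current = nxt


# Hypothetical 3x3 fuzzy tolerance (reflexive and symmetric).
T = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.6],
    [0.2, 0.6, 1.0],
]
closure = transitive_closure(T)
```

For a reflexive fuzzy tolerance the union with R changes nothing, because R ∘ R already dominates R; it is kept so that the loop also behaves sensibly on non-reflexive input. In the example, the weak direct link between the first and third objects (0.2) is raised to 0.6 through the chain via the second object.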
Since Atanassov's fundamental paper [8] was published, intuitionistic fuzzy set theory has been applied to many areas such as learning, decision-making and classification. Techniques for clustering intuitionistic fuzzy data were proposed by different researchers, and these algorithms are summarized in [9]. An intuitionistic fuzzy set-based extension of the heuristic approach to possibilistic clustering was also outlined in [7]. Direct heuristic algorithms of possibilistic clustering for processing intuitionistic fuzzy data can also be divided into two types: relational and prototype-based. An intuitionistic fuzzy tolerance relation matrix is the initial data matrix for the relational algorithms, and a matrix of attributes is the initial data matrix for the prototype-based algorithms. In particular, the group of direct relational heuristic … (α, β)-clusters [12].
It should be noted that these prototype-based heuristic algorithms of possibilistic clustering are based on the transitive closure of the initial intuitionistic fuzzy tolerance. The corresponding procedure was proposed in [13].
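The closure procedure of [13] is not reproduced in this note. A minimal sketch under an explicit assumption: the composition is taken to be the max-min-max composition commonly used for intuitionistic fuzzy relations (max-min on memberships, min-max on non-memberships), and the union of two relations takes the larger membership and the smaller non-membership. The function names `if_compose` and `if_transitive_closure` and the matrices `MU`, `NU` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def if_compose(mu: Matrix, nu: Matrix) -> Tuple[Matrix, Matrix]:
    """Max-min-max composition of an IF relation with itself (assumed form)."""
    n = len(mu)
    mu_c = [[max(min(mu[i][k], mu[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]
    nu_c = [[min(max(nu[i][k], nu[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]
    return mu_c, nu_c


def if_transitive_closure(mu: Matrix, nu: Matrix) -> Tuple[Matrix, Matrix]:
    """Repeat IR <- IR union (IR o IR) until neither matrix changes."""
    mu_cur = [row[:] for row in mu]
    nu_cur = [row[:] for row in nu]
    while True:
        mu_c, nu_c = if_compose(mu_cur, nu_cur)
        # Union of IF relations: larger membership, smaller non-membership.
        mu_next = [[max(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(mu_cur, mu_c)]
        nu_next = [[min(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(nu_cur, nu_c)]
        if mu_next == mu_cur and nu_next == nu_cur:
            return mu_cur, nu_cur
        mu_cur, nu_cur = mu_next, nu_next


# Hypothetical IF tolerance on three objects.
MU = [[1.0, 0.7, 0.1], [0.7, 1.0, 0.5], [0.1, 0.5, 1.0]]
NU = [[0.0, 0.2, 0.8], [0.2, 0.0, 0.4], [0.8, 0.4, 0.0]]
mu_t, nu_t = if_transitive_closure(MU, NU)
```

With these choices the condition that membership plus non-membership does not exceed 1 is preserved at every step, so the closure is again an intuitionistic fuzzy relation.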
The main purpose of the present paper is a comparative analysis of different distances and similarity measures between intuitionistic fuzzy sets when they are applied to clustering intuitionistic fuzzy data with heuristic algorithms of possibilistic clustering. In particular, the D-PAIFC-algorithm and the D-PAFC-algorithm were selected for the comparison. The paper is organized as follows: the second section describes some definitions of intuitionistic fuzzy set theory, the third section presents distances between intuitionistic fuzzy sets, the fourth section describes similarity measures for constructing intuitionistic fuzzy tolerance relations, the fifth section presents the results of numerical experiments, and the sixth section formulates some preliminary conclusions and outlines perspectives for future investigations.

Basic Definitions of the Intuitionistic Fuzzy Set Theory
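Intuitionistic fuzzy sets, an extension of ordinary fuzzy sets, were further developed by Atanassov in [14], [15] and by other researchers. Let us recall some basic definitions of intuitionistic fuzzy set theory which will be used in further considerations. All concepts will be presented for a finite universe $X = \{x_1, \ldots, x_n\}$.

An intuitionistic fuzzy set $IA$ in $X$ is given by

$$IA = \{\langle x_i, \mu_{IA}(x_i), \nu_{IA}(x_i)\rangle \mid x_i \in X\}, \qquad (1)$$

where $\mu_{IA}: X \to [0, 1]$ and $\nu_{IA}: X \to [0, 1]$ denote the degree of membership and the degree of non-membership of element $x_i \in X$ to $IA$, respectively, and $0 \le \mu_{IA}(x_i) + \nu_{IA}(x_i) \le 1$ for all $x_i \in X$.

For each intuitionistic fuzzy set $IA$ in $X$, an intuitionistic fuzzy index of an element $x_i \in X$ in $IA$ can be defined as follows:

$$\pi_{IA}(x_i) = 1 - \mu_{IA}(x_i) - \nu_{IA}(x_i). \qquad (2)$$

The intuitionistic fuzzy index $\pi_{IA}(x_i)$ satisfies $0 \le \pi_{IA}(x_i) \le 1$ for all $x_i \in X$.

The binary intuitionistic fuzzy relation $IR$ on $X$ is an intuitionistic fuzzy subset $IR$ of $X \times X$, which is given by the expression

$$IR = \{\langle (x_i, x_j), \mu_{IR}(x_i, x_j), \nu_{IR}(x_i, x_j)\rangle \mid (x_i, x_j) \in X \times X\}, \qquad (3)$$

where $\mu_{IR}: X \times X \to [0, 1]$ and $\nu_{IR}: X \times X \to [0, 1]$, and $0 \le \mu_{IR}(x_i, x_j) + \nu_{IR}(x_i, x_j) \le 1$ for every $(x_i, x_j) \in X \times X$.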
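Definitions (1) and (2) can be sketched in code, assuming an intuitionistic fuzzy set on a finite universe is stored as one (membership, non-membership) pair per element. `IFValue` and `hesitation_profile` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List

EPS = 1e-12  # tolerance for floating-point round-off in mu + nu


@dataclass(frozen=True)
class IFValue:
    """Membership and non-membership of one element x_i in an IFS, formula (1)."""

    mu: float  # degree of membership mu_IA(x_i)
    nu: float  # degree of non-membership nu_IA(x_i)

    def __post_init__(self) -> None:
        if not (0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0):
            raise ValueError("mu and nu must lie in [0, 1]")
        if self.mu + self.nu > 1.0 + EPS:
            raise ValueError("mu + nu must not exceed 1")

    @property
    def pi(self) -> float:
        """Intuitionistic fuzzy index 1 - mu - nu, formula (2), clamped at 0 for round-off."""
        return max(0.0, 1.0 - self.mu - self.nu)


# An IFS on the finite universe X = {x_1, ..., x_n}: one IFValue per x_i.
IFS = List[IFValue]


def hesitation_profile(ia: IFS) -> List[float]:
    """Return pi_IA(x_i) for every element of the universe."""
    return [v.pi for v in ia]


ia = [IFValue(0.6, 0.3), IFValue(1.0, 0.0), IFValue(0.2, 0.5)]
profile = hesitation_profile(ia)
```

Rejecting pairs with μ + ν > 1 at construction time keeps every stored value consistent with (1), so the index returned by `pi` always lies in [0, 1]. An element with μ = 1 and ν = 0 behaves like an ordinary crisp member and has index 0.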

Some Distances Between Intuitionistic Fuzzy Sets
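Let us recall some distances between intuitionistic fuzzy sets which were proposed by Szmidt and Kacprzyk in different publications and summarized in [16]. In particular, for two intuitionistic fuzzy sets $IA$ and $IB$ in $X$ the following distances were proposed:

the normalized Hamming distance:

$$l_{IFS}(IA, IB) = \frac{1}{2n}\sum_{i=1}^{n}\left(|\mu_{IA}(x_i) - \mu_{IB}(x_i)| + |\nu_{IA}(x_i) - \nu_{IB}(x_i)| + |\pi_{IA}(x_i) - \pi_{IB}(x_i)|\right), \qquad (4)$$

the normalized Hausdorff distance:

$$h_{IFS}(IA, IB) = \frac{1}{n}\sum_{i=1}^{n}\max\left\{|\mu_{IA}(x_i) - \mu_{IB}(x_i)|,\ |\nu_{IA}(x_i) - \nu_{IB}(x_i)|,\ |\pi_{IA}(x_i) - \pi_{IB}(x_i)|\right\}, \qquad (5)$$

the normalized Euclidean distance:

$$e_{IFS}(IA, IB) = \sqrt{\frac{1}{2n}\sum_{i=1}^{n}\left((\mu_{IA}(x_i) - \mu_{IB}(x_i))^2 + (\nu_{IA}(x_i) - \nu_{IB}(x_i))^2 + (\pi_{IA}(x_i) - \pi_{IB}(x_i))^2\right)}. \qquad (6)$$

These distances satisfy the conditions of a metric [16]. Applying formulas (4)-(6) to intuitionistic fuzzy sets $IA$ and $IB$ in $X$ yields a single value of dissimilarity between them. So, a value of similarity between $IA$ and $IB$ can be calculated with the complement operation [17]:

$$\mu_T(IA, IB) = 1 - d_{IFS}(IA, IB), \qquad (7)$$

where $d_{IFS}(IA, IB)$ is a general notation for the distances (4)-(6).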
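Formulas (4)-(7) translate directly into code. The sketch below stores each intuitionistic fuzzy set as a list of (μ, ν) pairs over the same ordered universe and derives π from (2); the function names are hypothetical.

```python
import math
from typing import Callable, Iterator, List, Tuple

# An intuitionistic fuzzy set on {x_1, ..., x_n}: one (mu, nu) pair per element.
IFS = List[Tuple[float, float]]


def _abs_diffs(a: IFS, b: IFS) -> Iterator[Tuple[float, float, float]]:
    """Yield (|d mu|, |d nu|, |d pi|) per element, with pi = 1 - mu - nu."""
    if len(a) != len(b):
        raise ValueError("both sets must be defined on the same universe")
    for (mu_a, nu_a), (mu_b, nu_b) in zip(a, b):
        pi_a = 1.0 - mu_a - nu_a
        pi_b = 1.0 - mu_b - nu_b
        yield abs(mu_a - mu_b), abs(nu_a - nu_b), abs(pi_a - pi_b)


def hamming(a: IFS, b: IFS) -> float:
    """Normalized Hamming distance, formula (4)."""
    return sum(dm + dn + dp for dm, dn, dp in _abs_diffs(a, b)) / (2 * len(a))


def hausdorff(a: IFS, b: IFS) -> float:
    """Normalized Hausdorff distance, formula (5)."""
    return sum(max(dm, dn, dp) for dm, dn, dp in _abs_diffs(a, b)) / len(a)


def euclidean(a: IFS, b: IFS) -> float:
    """Normalized Euclidean distance, formula (6)."""
    total = sum(dm ** 2 + dn ** 2 + dp ** 2 for dm, dn, dp in _abs_diffs(a, b))
    return math.sqrt(total / (2 * len(a)))


def similarity(a: IFS, b: IFS, dist: Callable[[IFS, IFS], float] = hamming) -> float:
    """Complement of a distance, formula (7)."""
    return 1.0 - dist(a, b)
```

For example, the sets `[(1.0, 0.0)]` and `[(0.0, 1.0)]` (full membership versus full non-membership of a single element) are at distance 1 under all three formulas, so their similarity (7) is 0.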

Similarity Measures for Constructing Intuitionistic Fuzzy Tolerances
A method for constructing an intuitionistic fuzzy tolerance relation from a family of intuitionistic fuzzy sets was proposed by Wang, Xu, Liu and Tang in [18]. The corresponding similarity measure, formula (8), is based on the normalized Hamming distance, and the closeness degree it defines meets the required conditions for any intuitionistic fuzzy sets $IA$ and $IB$; these facts were proved in [18]. On the other hand, a similarity measure based on the normalized Hausdorff distance, formula (9), was proposed in [19], and a similarity measure based on the normalized Euclidean distance, formula (10), was proposed in [20]. The intuitionistic fuzzy relations constructed by formulas (8), (9) and (10) satisfy the symmetry property and the reflexivity property; that is why they are intuitionistic fuzzy tolerances.
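Whatever similarity measure is used, the resulting intuitionistic fuzzy relation can be checked against the properties the text relies on. The sketch assumes the usual definitions: reflexivity means $\mu_{IR}(x_i, x_i) = 1$ and $\nu_{IR}(x_i, x_i) = 0$, and symmetry means both the membership and the non-membership matrices are symmetric. Formulas (8)-(10) themselves are not implemented; the function names and the tolerance `eps` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def is_if_relation(mu: Matrix, nu: Matrix, eps: float = 1e-9) -> bool:
    """Entrywise conditions of formula (3): mu, nu in [0, 1] and mu + nu <= 1."""
    return all(
        0.0 <= m <= 1.0 and 0.0 <= v <= 1.0 and m + v <= 1.0 + eps
        for row_m, row_v in zip(mu, nu)
        for m, v in zip(row_m, row_v)
    )


def is_if_tolerance(mu: Matrix, nu: Matrix, eps: float = 1e-9) -> bool:
    """An IF relation that is reflexive and symmetric."""
    n = len(mu)
    if not is_if_relation(mu, nu, eps):
        return False
    reflexive = all(
        abs(mu[i][i] - 1.0) <= eps and abs(nu[i][i]) <= eps for i in range(n)
    )
    symmetric = all(
        abs(mu[i][j] - mu[j][i]) <= eps and abs(nu[i][j] - nu[j][i]) <= eps
        for i in range(n)
        for j in range(n)
    )
    return reflexive and symmetric
```

Such a check is cheap (quadratic in the number of objects) and catches transcription errors in a relation matrix before it is passed to a relational clustering algorithm.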

Numerical Experiments
Let us consider the application of the above distances and similarity measures between intuitionistic fuzzy sets to a classification problem. For this purpose, Wang's cars data set [18] was used. The data set contains information on ten new cars $x_i$, $i = 1, \ldots, 10$, to be classified into several kinds. Each car has six evaluation attributes, which represent the oil consumption, coefficient of friction, price, comfortable degree, design and safety coefficient. Denote oil consumption by $x^1$, coefficient of friction by $x^2$, price by $x^3$, comfortable degree by $x^4$, design by $x^5$ and safety coefficient by $x^6$. The characteristics of the cars under the six factors $x^{t_1}$, $t_1 = 1, \ldots, 6$, are represented by intuitionistic fuzzy sets, as shown in Table 1. Thus, each car can be interpreted as an intuitionistic fuzzy set $x_i$, $i = 1, \ldots, 10$, on the universe of the six factors $\{x^1, \ldots, x^6\}$.

Matrices of ordinary fuzzy tolerance relations were constructed according to formulas (4)-(6) and (7), so the D-PAFC-algorithm can be applied to each fuzzy tolerance. Let us consider the results of the numerical experiments.
By executing the D-PAFC-algorithm for the fuzzy tolerance relations obtained by using the normalized Hamming distance (4) and the normalized Hausdorff distance (5), the same result is obtained in both cases: the first class is composed of six elements and the second class consists of four elements. The eighth object is the typical point of the first intuitionistic fuzzy cluster, and the third object is the typical point of the second intuitionistic fuzzy cluster.

The following result is also obtained: the first class is formed by three elements, the second class is composed of one element, the third class consists of one element, the fourth class contains two elements, the fifth class is composed of two elements, and the sixth class is formed by one element. The typical points of the corresponding intuitionistic fuzzy clusters are the second, third, tenth, fifth, sixth and ninth objects in the first case; the second, third, fourth, fifth, sixth and ninth objects in the second case; and the second, third, fourth, eighth, sixth and ninth objects in the third case.

Concluding Remarks
The differences between the heuristic possibilistic clustering results obtained from the D-PAFC-algorithm using well-known distances between intuitionistic fuzzy sets and from the D-PAIFC-algorithm using similarity measures for constructing intuitionistic fuzzy tolerances are shown in the paper. A principal allotment among fuzzy clusters is the result of applying the conventional D-PAFC-algorithm of the heuristic approach to possibilistic clustering to the classification of attributive intuitionistic fuzzy data by using distances between intuitionistic fuzzy sets. A principal allotment among intuitionistic fuzzy clusters is the result of applying the D-PAIFC-algorithm to the classification of data which can be obtained by using similarity measures. In this case, the non-membership values of objects are also provided. That is why the use of the D-PAIFC-algorithm in combination with similarity measures is preferable to the use of the D-PAFC-algorithm in combination with distances between intuitionistic fuzzy sets.
Both approaches to clustering intuitionistic fuzzy data were tested on Wang's cars data set [18]. The results of applying the D-PAFC-algorithm in combination with distances between intuitionistic fuzzy sets differ for different distances. On the other hand, the results of applying the D-PAIFC-algorithm in combination with the similarity measures are similar in all cases. However, the differences in the results are not sufficient to prefer one approach to clustering intuitionistic fuzzy data over the other. Therefore, experiments must be performed on other data sets in further studies.