Applying the Self-Organizing Map in the Classification of 195 Countries Using 32 Attributes

Many organizations, such as the World Bank, the UN and Wikipedia, have tried to classify countries as under-developed, developing, developed or highly developed based on certain criteria, but these criteria are not robust enough: in most cases only one to three criteria are used. This research classified 195 countries using 32 attributes (features or criteria) with the self-organizing map (SOM) algorithm. The classification is more robust because 32 features are considered. SOM is an unsupervised learning algorithm that reduces high-dimensional data to two dimensions. The SOM classifies the 195 countries into 5 categories, implying that it is possible to classify countries with the SOM algorithm. There is no benchmark against which to measure the accuracy of the SOM algorithm, because most existing classifications are based on at most three criteria; nevertheless, comparing the SOM results with these weaker classifications still shows the reliability of the SOM algorithm. This research will help scientists, students, lecturers, teachers, organizations and countries to gain robust, unbiased knowledge about the state of their countries, and will help organizations and countries to make concrete decisions about establishing businesses in viable places all over the world. The key limitations are the reliability of the data and the number of attributes, which could be increased in future research for better results.


Introduction
Economic classification is a complex and multidimensional process [1]. The socio-economic situation of a country can be measured in a number of ways, often by looking at indicators that describe different aspects of social and economic reality [2]. Although economic development has been attributed to high levels of per capita income, GDP per capita, GNI per capita and so on, there is a growing consensus among intellectuals that economic development is far more than that. Some have classified economic development by access to clean drinking water, availability of primary health services, income level and more. In fact, several agencies, such as the World Bank and the United Nations, and many researchers have classified countries into different categories (developed, developing and under-developed countries) based on certain criteria using conventionally measured income [3]. Several aggregate blocks of countries have been identified in World Bank reports and in various research studies. For example, for the 2021 fiscal year, low-income economies are defined as those with a GNI per capita, calculated using the World Bank Atlas method, of $1,035 or less; lower middle-income economies are those with a GNI per capita of more than $1,035 [3] and less than $2,046; and high-income countries are those with a GNI per capita above $12,536.
The World Development Report 2001 (World Bank, 2001a) identified three blocks of countries in different geographical regions of the world: low-income (GNP per capita of $755 or less), middle-income (GNP per capita of $756-9,265) and high-income (GNP per capita of $9,266 or more) countries [14]. The middle-income group was further subdivided into lower middle-income and upper middle-income subgroups, while the high-income countries were subdivided into OECD (Organisation for Economic Co-operation and Development) and non-OECD countries. Moreover, WESP classified countries into three broad categories: developed economies, economies in transition and developing economies [4].
When it comes to classifying countries according to their level of development, there is no generally accepted criterion, either grounded in theory or based on an objective benchmark [5]. The standard of living enjoyed by citizens differs discernibly from country to country. According to the US embassy in Mali (2020), the minimum wage in Mali is administratively determined. The most recent revision of the minimum wage scale occurred in 1995, when it was increased to $35 [7]. In reality, full-time salaried employees with formal contracts start at a base of $35, which does not include allowances. This means that an average Malian earns less than $400 in basic salary per annum. By comparison, a Japanese worker in the same kind of job earns above US$35,000 per annum [8]. In the same vein, life expectancy in Mali was 58.45 years as of 2017, compared with an average life expectancy of 89 years in Japan. About 71 per cent of the adult population in Mali is illiterate, while almost 90 per cent of the adult population in Japan is educated [5]. To make better sense of developmental classification, countries are placed in groups: developed, developing and under-developed countries.
While many economists would readily agree that Mali is a developing country and Japan is a developed country, they would be more hesitant to classify Malaysia or Russia. So the problem is: how do we classify countries in a way that aligns with almost every intellectual belief and with equity? Where exactly do we draw the line between developing and developed countries? This is one of the major problems of country classification encountered by people and organizations.
This research used the SOM algorithm, an unsupervised AI algorithm, to solve this classification problem with 32 features. This classification improves on many classifications made by economists and organizations because it considers most aspects of development: not only income or life expectancy but also availability of energy, transition to clean energy and others. Furthermore, the features are processed by an algorithm, which is less biased and gives a more accurate classification. This research aims to provide a solution to the long-awaited classification of countries.

Literature Review
The classification of countries based on development has a long history. The first classification was made in the mid-20th century as a way of mapping the various players in the Cold War (a rivalry that developed after World War II between the United States and the Soviet Union and their respective allies) into "First World countries", "Second World countries" and "Third World countries". This classification was initiated by the French demographer Alfred Sauvy, who coined the term "Third World" in a 1952 article titled "Three Worlds, One Planet" [6]. In the classification, the First World included the United States and its capitalist allies in Western Europe, Japan and Australia, while the Second World consisted of the Soviet Union and its Eastern European satellites. The Third World consisted of other countries in Africa, the Middle East, Latin America and Asia [6]. Today, the powerful economies of the West are still sometimes described as "First World", but the term "Second World" has become largely obsolete following the collapse of the Soviet Union [6]. The meaning of "Third World" has shifted to developing or under-developed countries. Others classify the Third World countries as "low and lower-middle-income countries".
The word pair developing/developed countries came into the limelight in the 1960s: rich countries were termed developed while poor countries were termed developing. This classification was much debated, and some international organizations used membership of the Organisation for Economic Co-operation and Development (OECD) as the main criterion for developed-country status [7]. As OECD membership is limited to a subset of countries (the OECD has grown to 34 members from 20 countries at its establishment in 1961), this heuristic approach resulted in the designation of about 80-85 percent of the world's countries as developing and about 15-20 percent as developed [8].
The Human Development Index (HDI), proposed in 1990, is a common taxonomy that assigns each country an index between 0 and 1. The HDI was created to emphasize that people and their capabilities should be the ultimate criteria for assessing the development of a country, not economic growth alone [9]. The HDI is a summary measure of human development in three key areas: a long and healthy life, education, and a decent standard of living [8]. The health dimension is assessed from birth until death, the education dimension applies until the person is 25 years of age, and the living standard is measured by the Gross National Income per capita of a country [7]. However, since the HDI captures only a part of human development, it does not reflect inequalities, poverty, insecurity, empowerment, gender disparity and so on.
The World Bank (2020) classifies economies based on three thresholds of Gross National Income (GNI) per capita. According to the World Bank, countries with a GNI per capita of less than $1,035 are classified as low-income countries [14]; those with a GNI per capita between $1,036 and $4,085 as lower middle-income countries; those with a GNI per capita between $4,086 and $12,615 as upper middle-income countries; and those with a GNI per capita of more than $12,615 as high-income countries [14].
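The threshold rule quoted above can be written down as a short function. This is only an illustrative sketch: the function name `income_group` is hypothetical, and treating a GNI per capita of exactly $1,035 as low income is an assumption, since the text leaves that boundary open.

```python
def income_group(gni_per_capita: float) -> str:
    """Map GNI per capita (US$) to the World Bank income group quoted in the text."""
    # Assumption: a value of exactly 1,035 is treated as low income.
    if gni_per_capita <= 1035:
        return "low income"
    if gni_per_capita <= 4085:
        return "lower middle income"
    if gni_per_capita <= 12615:
        return "upper middle income"
    return "high income"


# Hypothetical GNI per capita values, one per income group.
for value in (800, 2000, 9000, 50000):
    print(value, "->", income_group(value))
```

A single-indicator rule of this kind is easy to apply, which is part of its appeal, but it uses only one of the many dimensions of development discussed in this review.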
A clear-cut approach needs to be adopted in classifying countries based on certain development thresholds, though the definition of economic development and the threshold benchmark may be difficult to pin down. After much debate, economists and professionals have come to agree that economic development is a multifaceted problem. Over time, the focus of development economics has shifted. For instance, Lynge (2001) argues that development fast-tracks freedom by removing forms of enslavement (e.g., hunger and tyranny) that leave people with little choice and opportunity [11]. This humanistic approach to development leads one to explore what constitutes acceptable minimum living conditions.
Klasen, S. (2018) published a paper that critically evaluates UNDP's current suite of human development indicators and composite indices [7]. It proposes little change to the flagship Human Development Index (HDI), the Inequality-adjusted Human Development Index (IHDI) and the Multidimensional Poverty Index (MPI), but encourages more analysis of trends and determinants in these measures. It proposes revisions to gender indicators, and two new measures to track sustainability and commitment to development. Milorad, K. (2010) published a paper that summarizes the normative issues around the importance of accounting for inequality in the opportunities for and outcomes of human development [8]. He then reviewed different approaches to accounting for inequality when quantifying human development [9]. Milorad (2010) described the inequality-adjusted HDI and provided a limited sensitivity analysis of it [8]. Lynge (2011) analyzed how the UNDP, the World Bank and the IMF classify countries based on their level of development. She argued that these systems lack clarity with regard to their underlying rationale [11]. The paper argues that a country classification system should be based on a transparent and data-driven methodology; such a classification is preferable to judgment or ad hoc rules.
Felix, Lopez, & Ivan (2015) used self-organizing maps (SOM) to compare macroeconomic financial imbalances among European countries [10]. They detected different profiles of countries and identified public expenditure and the saving rate as the most critical variables affecting the national financial situation. Ashok & Amit (2018) investigated the multifaceted nature and complexity of the socio-economic development process of world economies using economic development indicators and poverty and social welfare indicators as input variables [1]. They used the self-organizing map (SOM) algorithm to project the multidimensional data onto a two-dimensional SOM surface. Nguyen (2011) applied the SOM technique to mapping the poverty data of countries [12]. The world poverty map is based on multiple dimensions of poverty, taking into account a number of indicators. The map groups countries with similar levels of poverty together, thus providing a visualization of the structure of poverty. Tadanari (2018) wrote on the analysis of tourist consumption trends, but the analysis only considered the consumption trends of tourists from each country [13]. The study analyzes and visualizes the relationship between consumption trends, visiting rates to Japanese prefectures and tourist nationalities, using a questionnaire given to tourists from 19 countries who visited Japan from 2015 to 2017; the questionnaire results were analyzed using self-organizing maps (SOMs).

Architecture of the Self-Organizing Map
SOM is an efficient algorithm for visualizing data by reducing its dimensions from an n-dimensional input to a lower dimension while maintaining its original topological relationships [13]. The SOM architecture consists of two layers of nodes [10], namely the input layer and the output layer. Each neuron in the input layer is connected to each neuron in the output layer [13]. Each neuron in the output layer then represents a class (cluster) of the given input.

The input layer is \(X = \{x_1, x_2, \ldots, x_n\}\), where \(n\) is the number of input vectors and \(p\) is the length of each vector, i.e. the number of attributes (features). In this architecture \(n = 195\) while \(p = 32\). Each input is attached to neurons, the neurons are connected together, and each neuron has a weight which determines the spatial location of the neuron.
Over time, the weights \(w_{1,1}, w_{1,2}, \ldots, w_{1,j}, \ldots, w_{i,1}, w_{i,2}, \ldots, w_{i,j}\) get trained and updated to move the neurons into clusters, where \(w_{i,j} \in W\). \(W\) is a set of real numbers that are randomly selected. In this architecture i = 6 and j = 32. The architecture of our SOM is shown below. Each circle represents a dimension and we have a total of 32 dimensions. The output of our architecture is defined as \(\{y_1, y_2, y_3, \ldots, y_i\}\), where \(i = 6\).
\(d_{j,i}\) is a lateral distance given as \(d_{j,i} = \lVert r_j - r_i \rVert\), where \(r_j\) and \(r_i\) are the positions of nodes \(j\) and \(i\) in the output layer.
Neighborhood size: \(\sigma(t) = \sigma_0 \exp(-t/\tau)\), where \(\sigma_0\) is the initial neighborhood size and \(\tau\) is a time constant.
Repeat by going to step 2 if the stopping condition \(t = T\) is not satisfied, where \(T\) is the number of iterations.
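The training loop described in this section can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this research: the output layer is taken to be a line of 6 nodes, the neighborhood function is assumed to be Gaussian, the learning-rate decay and the time constant `tau` are assumed choices, and the input data are synthetic. The names `train_som` and `best_matching_unit` are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the index of the output node whose weights are closest to x."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
    )


def train_som(data, n_nodes=6, n_iter=500, lr0=0.5, sigma0=3.0, seed=0):
    """Train a one-dimensional SOM and return its list of weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise every weight with a randomly selected real number.
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    tau = n_iter / math.log(sigma0)  # assumed time constant (needs sigma0 > 1)
    for t in range(n_iter):
        x = rng.choice(data)                  # pick one input vector
        bmu = best_matching_unit(weights, x)  # winning output node
        sigma = sigma0 * math.exp(-t / tau)   # shrinking neighborhood size
        lr = lr0 * math.exp(-t / n_iter)      # assumed learning-rate decay
        for i, w in enumerate(weights):
            d = abs(i - bmu)                  # lateral distance on the line
            h = math.exp(-(d * d) / (2 * sigma * sigma))  # assumed Gaussian
            for k in range(dim):
                w[k] += lr * h * (x[k] - w[k])  # pull node i towards x
    return weights


# Synthetic data: 20 low-scoring and 20 high-scoring 32-feature vectors.
rng = random.Random(1)
low = [[rng.uniform(0, 1) for _ in range(32)] for _ in range(20)]
high = [[rng.uniform(4, 5) for _ in range(32)] for _ in range(20)]
weights = train_som(low + high)
low_node = best_matching_unit(weights, [0.5] * 32)
high_node = best_matching_unit(weights, [4.5] * 32)
print("low-profile node:", low_node, "high-profile node:", high_node)
```

After training, each input vector is assigned to the output node with the closest weight vector, so the node index plays the role of the cluster label.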

Research Methodology
This research classifies 195 countries into 6 categories or groups, namely highly developed countries, developed countries, middle-developed countries, developing countries, transition countries and under-developed countries, using 32 criteria called attributes. The 32 attributes are represented on a scale of 0 to 5, with 0 representing the minimum percentage or quantity and 5 representing the maximum percentage or quantity. The collection of data involves three processes:
1) Data collection: Data are collected from several reliable sources such as the World Bank, UN, UNDP and many more by searching for the appropriate features on their websites. In this research, 32 features/attributes were considered.
2) Features extraction: The source data are usually expressed as percentages, ratios, ranges and so on. Different ranges are used to classify the data into categories. For instance, availability of power was based on 100% in the source data, so the percentages for the 195 countries from the data source were divided into 6 categories from 0 to 5.
3) Data extraction: These categories were labeled 0 to 5 under the data extraction, and this process was done for the 32 features considered in this research.

Features Considered
Thirty-two economic indicators (features) were scored on a scale of 0 to 5, 0 being the lowest and 5 being the highest.
1) GDP: GDP is the monetary value of the goods and services produced by a country in one year. The USA has the highest GDP of $19.45 trillion, followed by China with $12.238 trillion. Tuvalu and South Sudan have the lowest GDP, $32 million and $33.6 million respectively [26]. Countries with a GDP of $10 trillion and above were allocated 5; $3 trillion to less than $10 trillion, 4; above $800 billion but less than $3 trillion, 3; above $300 million but less than $800 billion, 2; $100 million to $300 million, 1; and below $100 million, 0.
2) GDP per capita (GDP*): The GDP per capita is the GDP divided by the population. Luxembourg has the highest GDP per capita of about $105,000 [26]. Countries with a GDP per capita of $80,000 to $105,000 take a weight of 5; $50,000 to $79,000, a weight of 4; $20,000 to $49,000, a weight of 3; $5,000 to $19,000, a weight of 2; $1,000 to $4,900, a weight of 1; and less than $1,000, a weight of 0.
3) GNI per capita (GNI*): Countries were classified as low-income countries (LICs) with a GNI per capita of $1,045 or less, middle-income countries (MICs) with a GNI per capita of more than $1,045 but less than $4,365, upper middle-income countries with a GNI per capita from $4,366 to $12,736, and finally high-income countries with a GNI per capita of more than $12,736 [12]. In this research, a weight of 3 was allocated to high-income countries, 2 to middle-income countries and 1 to low-income countries.
4) Birth rate (BIRTH): The birth rate is the number of births per 1,000 of the population. The world average birth rate is 2.49 while the average birth rate in Africa is 4.58 [15]. Countries with a birth rate of 50 to 60 are accorded a weight of 5; 30 to 49, a weight of 4; 20 to 29, a weight of 3; 10 to 19, a weight of 2; and 0 to 9, a weight of 1.

5) Death rate (DEATH): The death rate is the number of deaths per 1,000 of the population. The death rate is one of the factors for measuring development. Countries with a high death rate of 13 to 15 are assigned a weight of 0, followed by countries with a death rate of 10 to 12.9, a weight of 1. Furthermore, a death rate of 8.0 to 9.9 is assigned a weight of 2; 4.0 to 7.9, a weight of 3; and 1 to 3.9, a weight of 4 [16].
6) Migration rate (IR): A country where most citizens migrate to live and work in other countries shows a lack of development and a low standard of living. So, countries with the most immigrants are better in terms of development than countries with fewer immigrants. Countries with 5 million to 15 million immigrants take a weight of 5; 1 million to 4.9 million, a weight of 4; 0.5 million to 0.99 million, a weight of 3; 0.1 million to 0.49 million, a weight of 2; and less than 0.1 million, a weight of 1 [17].
7) Security (SEC): The safety score for countries equally weighs three factors: war and peace, personal security, and natural disaster risk. The safety score aggregates the indices for these three risks, thus presenting a comprehensive view of safety for each country. This also means that a high level of risk in one factor will have a significant effect on the country's overall ranking. For example, the Philippines is ranked least safe while Yemen is ranked second least safe. This can be attributed to the fact that the Philippines has poor scores in peace, security and prevalence of natural disasters [18]. Yemen's terrible score is due to war and famine, but the country has a very low risk of natural disaster. Thus, the Philippines ranks lower than Yemen even though Yemen is a war zone. Since the safest-countries index is data-driven, Global Finance did not include countries like Syria, Iraq or Afghanistan, so we did not include them in our report either, nor countries that have incomplete or unavailable data. Safety scores of 15 to 20 are given a weight of 5; 10 to 14.9, a weight of 4; 6 to 9.9, a weight of 3; 4 to 5.9, a weight of 2; 2 to 3.9, a weight of 1; and less than 2, a weight of 0.
8) Health care (HC): Health care, or accessibility of primary health care, is an important factor in measuring development. A comprehensive health care index was compiled in mid-2020 by WHO (2021) [19]. The health care index was based on 100%. For this research, the weights were as follows: 80% to 100%, weight 5; 60% to 79%, weight 4; 40% to 59%, weight 3; 20% to 39%, weight 2; and 0% to 19%, weight 1. The higher the weight, the more advanced the health care of the country.
9) Primary and secondary education (EDU): One of the key measurements of development is the level of education. The data published by UNESCO in 2018 [18] cover literacy rates for youth aged 15 to 24, adults and the elderly. This research only considers the data for youth aged 15 to 24 to determine the literacy rate by country. A literacy rate of 0 to 20% is given a weight of 1; 21% to 40%, a weight of 2; 41% to 60%, a weight of 3; 61% to 80%, a weight of 4; and 81% to 100%, a weight of 5.
10) Availability of power (POWER): Availability of power is also an indicator of development. The World Bank gave a comprehensive list of the share of the population with access to electricity, by country. A weight of 5 is allocated to countries with 91% to 100% power supply, 4 to those with 81% to 90%, 3 for 61% to 79%, 2 for 41% to 60% and 1 for below 40% [20].
11) Transition to clean energy (TRANS): The transition to clean energy is an advanced technological shift that the world is gradually making. Most advanced countries are already encouraging it. Govind & Nathan (2005) gave a comprehensive list of the present world transition to clean energy [35].
[...] (2020) gave a comprehensive list of life expectancy all over the world [22]. An average life expectancy of 50-55 years is given 0; 56-60 years, 1; 61-65 years, 2; 66-70 years, 3; 71-75 years, 4; and 76 years and above, 5.
14) Dependency ratio (DEPEND): The dependency ratio compares the non-working population with the working population. It shows whether there is a sufficient working population to support the non-working population. A higher dependency ratio means that there are fewer working people relative to the non-working population. This leads to low investment and thus reduces productivity. Wikipedia (2020) gave a comprehensive list of dependency ratios [23]. We allocate 5 for 20-29, 4 for 30-39, 3 for 40-49, 2 for 50-59, 1 for 60-69 and 0 for 70 and above. A lower ratio could allow for better pensions and better health care for citizens. A higher ratio indicates more financial stress on working people and possible political instability.
15) Corruption (CORRU): Most under-developed or developing countries do not have the machinery to fight corruption. Trading Economics (2021) ranked 180 countries according to their level of corruption [36]. According to Trading Economics, South Sudan is the most corrupt country in the world, followed by Syria. To classify these data into 6 categories from least corrupt to most corrupt countries, we use these ranges: 0-29 (least corrupt) is assigned 5, 30-59 is assigned 4, 60-70 is assigned [...]
[...] gave a comprehensive list of female children who are out of school. From 0 to 100 is allocated 5; 101 to 1,000, 4; 1,001 to 10,000, 3; 10,001 to 100,000, 2; 100,001 to 1 million, 1; and above 1 million, 0.
20) Foreign direct investment (FDI): From 0 to $1,000 is allocated 0; $1,001 to $10,000, 1; $10,001 to $100,000, 2; $101,000 to $500,000, 3; $500,000 to $1 million, 4; and above $1 million, 5.
21) Income distribution (INCD): Income distribution is the amount of income earned by an average citizen. World Bank (2021) gave a comprehensive list of average income distribution by country in four categories: low income, allocated 1; lower middle income, 2; upper middle income, 3; and high income, 4.
22) Gender equality (GEND): Gender equality is the concept that everyone should be treated equally irrespective of sex [27]. Experts believe that gender equality across education, employment, health, politics and economic participation is a reflection of a healthy country and of the most optimized economies [28]. So we allocate 5 for 90% gender equality and above, 4 for 80% to 89%, 3 for 70% to 79%, 2 for 60% to 69%, 1 for 50% to 59% and 0 for less than 50%.
23) Human Development Index: The HDI is one of the most popular indices for classifying countries. It considers health, education and income [9]. Wikipedia gave a comprehensive list of HDI values for 2021.
24) GDP by economic sectors (primary, secondary and tertiary): A country that generates most of its income from the tertiary sector, followed by the secondary sector, is an advanced and highly developed economy. The sector from which a country earns most of its income is a measure of development [29]. Countries that generate 80% and above from the secondary and tertiary sectors are allocated 5; 70-79%, 4; 60-69%, 3; 40-59%, 2; 20-29%, 1; and less than 20%, 0.
25) Child mortality rate (CMORT): The child mortality rate is the number of deaths of children, usually under 5 years old. The World Bank gave a comprehensive list of under-5 mortality rates per 1,000 live births. A rate of 0 to 20 deaths is allocated a weight of 5; 21 to 40 deaths, 4; 41 to 60, 3; 61 to 80, 2; 81 to 100, 1; and 101 to 120, 0.
26) Export earnings (EXP): The United Nations gave a comprehensive list of export earnings by country in million dollars. For the purpose of this research, export earnings of 1,000,000 million dollars and above are allocated 5; between 501,000 and 999,999 million dollars, 4; 301,000 to 500,000 million dollars, 3; 101,000 to 300,000, 2; 50,000 to 100,000 million dollars, 1; and less than 50 million dollars, 0.
27) ND-GAIN index: The ND-GAIN index score is a combination of a vulnerability score and a readiness score.

31) People per doctor (P*DOC): World Bank (2021) gave the number of physicians per 1,000 people [33].
32) Expenditure on education (as a percentage of GDP): Most developed countries spend a significant percentage of their GDP on education [34].
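The range-to-weight rules listed in this section can be expressed as a small lookup. The sketch below is only illustrative: it encodes three of the features with the ranges given above (literacy rate, life expectancy and dependency ratio), and the helper name `score` and the sample country are hypothetical. Two assumptions are made where the text is silent: a value that falls between two listed ranges (for example 20.5%) goes to the next range, and a life expectancy below 50 years is scored 0.

```python
from bisect import bisect_left

# Each entry: (upper bounds of every range except the last, weight per range).
LITERACY = ([20, 40, 60, 80], [1, 2, 3, 4, 5])                # percent
LIFE_EXPECTANCY = ([55, 60, 65, 70, 75], [0, 1, 2, 3, 4, 5])  # years
DEPENDENCY = ([29, 39, 49, 59, 69], [5, 4, 3, 2, 1, 0])       # ratio


def score(value, ranges):
    """Return the weight of the first range whose upper bound is >= value."""
    upper_bounds, weights = ranges
    return weights[bisect_left(upper_bounds, value)]


# Hypothetical country; 58.45 years is the Mali figure quoted in the Introduction.
country = {"literacy": 85.0, "life_expectancy": 58.45, "dependency": 45.0}
features = [
    score(country["literacy"], LITERACY),
    score(country["life_expectancy"], LIFE_EXPECTANCY),
    score(country["dependency"], DEPENDENCY),
]
print(features)  # prints [5, 1, 3]
```

Repeating this for all 32 features would give one 32-element vector of weights per country, which is the form of input the SOM expects.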