Reasons for Some Countries Having More COVID-19 Cases Than Others: Evidence from 70 Most Affected Countries sans China

Country-level data reveal considerable cross-country variation in the number of officially confirmed COVID-19 positive cases, and the existing literature has yet to explain it. This paper attempts to explain that variation across countries around the world and thus fill the research gap. The study develops a unique dataset of 70 of the most COVID-19 affected countries and employs multiple regression techniques. The findings indicate that regional characteristics play an essential role. The percentage of people living in urban areas, the number of tests, and air passenger transport (an indicator of population mobility) also come out as determinants with substantial influence. In addition, the impacts of the trade relationship with China (a proxy for the degree of interaction with the country) and of per capita health expenditure appear to be noteworthy. Differences in temperature are found to have no appreciable impact. Also, factors such as the relative importance of health in national policy, the quality of life, and the quality of governance fail to register any vital influence. The study does not find any evidence of endogeneity of the total number of tests conducted.


Introduction
The city of Wuhan in Hubei province, China, reported the first 'pneumonia of unknown cause' on December 31, 2019 [1]. The virus has now spread to 215 countries and regions [1]. As of July 14, 2020, there were 12,929,306 confirmed cases and 569,738 confirmed deaths, with 216 countries, areas, or territories reporting COVID-19 positive cases [1].
The outbreak has triggered a slew of research focusing on various aspects of the economic crisis that ensued from the pandemic. For example, [2] devotes its 14 chapters to the economic consequences of COVID-19, such as macroeconomic issues, trade impacts, finance, regional influences, and others. [3] appraises the economic impact of COVID-19 in the USA. [4] assesses the developments and determinants of economic anxiety caused by the coronavirus outbreak. Its findings emphasize the importance of information and public education in containment and in coping with the adverse effects arising from higher economic anxiety. [5] finds that the labor market influences of COVID-19 vary across countries and that it hit women and less educated workers the hardest. [6] finds that low-income US counties comply less compared to counties with more robust economic endowments. [7] discovers that belief in science affects physical distancing in response to COVID-19 lockdown policies.
However, a look at the affected countries across the globe reveals notable country-level variation in the number of cases. As of July 2, 2020, there are 84 countries with at least 5,000 cases. 25 of them have 5,000-10,000 cases, 12 countries have 10,000-20,000 cases, 6 countries have 20,000-30,000 cases, 5 countries have 30,000-40,000 cases, and 6 countries have 40,000-50,000 cases. Each of the next two groups (50,000-60,000 and 60,000-70,000 cases) has four countries. Afterward, countries are distributed sparsely across the case categories. There are 3 countries with 70,000-100,000 cases, 6 countries with 100,000-200,000 cases, and 7 countries with 200,000-300,000 cases. 3 countries have between 0.3 million and 1 million cases, and 2 countries have more than 2 million cases. These statistics point to a gap in the existing COVID-19 literature, i.e., what caused this variance in COVID-19 positive cases across countries?
This paper contributes to the growing body of work on COVID-19 Economics in that it tries to discover the forces that probably caused this cross-country variation in the officially confirmed number of total cases. Specifically, it answers the following questions, among others: How vital are the population characteristics like percent of urban population and population mobility (air passenger transport), along with the widely-used demographic traits, in explaining the cross-country variation in total cases? Has economic interaction with China affected the outcome? Does the relative importance of health in national policy help to explain the variation? How important have been the unobserved regional characteristics? Do the quality of life and the quality of governance affect the number of cases?
The organization of the paper is as follows. Section 2 comes after this introductory section and describes the methodology of the research. Section 3 presents and analyzes the results, and Section 4 concludes the paper.

Methodology
In this paper, we consider countries with at least 5,000 confirmed cases. As stated earlier, we want to explain the country-level differences in the number of officially confirmed COVID-19 positive cases. To capture that variation, we use several potentially relevant and important predictors.
The total number of tests is likely to affect the total number of COVID-19 cases positively. Usually, people get tested when they think that they have the symptoms. So, a higher number of tests will push total cases higher.
Then we consider some demographic characteristics of the population. The number of people in the 15-64 age group is likely to affect the total number of cases positively. As people in this age group are the people who are workers or students or both, they need to step outside home more frequently. Consequently, they come in contact with more people and have a higher risk of being infected with the coronavirus. On the contrary, people aged 65 and older are the people who are usually dependent members of a household. More often than not, they do not need to go out as they are less involved in money-earning activities. Hence, we expect a higher number of people in this age group to reduce the total number of cases.
Both the World Health Organization (WHO) and governments have emphasized 'social distancing' to help reduce the spread of the virus. In a densely populated area, it might be difficult to enforce the policy of social distancing. We can apply a similar line of arguments to make a case for percent of people living in the urban area. In urban areas, it is relatively difficult to implement the policy of social distancing. The density of the population is higher in urban areas. Urban people are economically more active than the rural population. They travel more and come into contact with travelers more. So they are more likely to get infected with the coronavirus. We include both population density and percent of people living in the urban area as covariates in our analysis.
The mobility of the population can also play a pivotal role. If a higher number of travels characterizes a population, we are likely to observe more COVID-19 positive cases. We employ the total number of air passengers (domestic and international) per 1 million people as an indicator of the movement of the population.
To indicate the level of the quality of life enjoyed by citizens of different nations, we use the Human Development Index (HDI) Score. We think that it is a better measure of the quality of life, compared to per capita income, because of its multidimensionality. The quality of life is likely to impact the total number of cases negatively.
We also include health expenditure per capita as an explanatory variable. If a country spends more on health per capita, including healthcare goods and services consumed, it is likely to have a more reliable health system. Nevertheless, a better health system does not guarantee better management of a pandemic like COVID-19. As we have seen, in most cases, the decisions regarding the management of a pandemic come from political leadership. However, it is possible that a better health system contributes to developing a nation of individuals who, on average, have stronger immunity against disease. This channel will negatively impact the number of tests as well as cases. On the other hand, if per capita health spending is higher, that indicates higher affordability per capita, and people will go and get tested. In this scenario, both the number of tests and cases will rise. In a country where per capita health expenditure is low, we are likely to observe fewer cases for two reasons: 1. fewer tests are conducted officially, and 2. people are less willing to get tested.
We use a country's trade volume with China divided by the country's total trade volume as a proxy to measure the degree of a country's interaction with China. We define trade volume as the sum of exports and imports. We presume that a country's level of interaction with China is likely to have some bearing on the spread of the virus.
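The proxy described above reduces to a simple ratio, which the following minimal sketch computes. The function name, argument names, and the figures in the example are hypothetical and serve only to illustrate the definition; they are not taken from the study's dataset.

```python
def china_trade_share(exports_to_china, imports_from_china,
                      total_exports, total_imports):
    """Share of a country's total trade volume that is conducted with China.

    Trade volume is the sum of exports and imports, as defined in the text.
    All four arguments must be expressed in the same currency unit.
    """
    total_volume = total_exports + total_imports
    if total_volume <= 0:
        raise ValueError("total trade volume must be positive")
    return (exports_to_china + imports_from_china) / total_volume


# Made-up figures (billions of USD) for a hypothetical country.
share = china_trade_share(
    exports_to_china=12.0,
    imports_from_china=28.0,
    total_exports=150.0,
    total_imports=250.0,
)
print(f"Trade share with China: {share:.3f}")  # (12 + 28) / (150 + 250) = 0.100
```

Because the numerator and denominator use the same units, the proxy is unit-free and comparable across economies of very different size.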
During the first few weeks of the outbreak, there was intense discussion regarding the role of temperature. Some people argued that the virus could not survive higher temperatures. However, the scientific community strongly opposed the notion and expressed that temperature has nothing to do with the spread and severity of the virus. WHO, in its "Coronavirus Myth buster," states: "You can catch COVID-19, no matter how sunny or hot the weather is." It would be interesting to see how temperature sways the number of total cases. Ergo, we include temperature on our list of variables.
Next, we consider prioritizing health, i.e., the relative importance that the health sector receives in government policy. We measure this as the ratio of a country's annual health expenditure to its annual military expenditure. If health is relatively more important, this ratio will be higher, and we expect that the country will have a more capable health sector that better handles crises like COVID-19. With regard to the COVID-19 pandemic, better handling of the crisis implies that there will be more tests and fewer attempts to suppress information. Consequently, we are likely to observe more positive cases.
In the presence of good governance, corruption is likely to go down. The level of corruption can thus indicate the absence of good governance. To capture the impact of the quality of governance on total cases, we use the Corruption Perceptions Index (CPI) score. The CPI score and the quality of governance are positively related. Having competent and incorrupt political leaders is a prerequisite for good governance. Such leaders will manage the pandemic well, and in a well-managed pandemic, we expect to see a lower number of cases per million population.
The following table presents the mean number of total cases per 1 million population by seven region categories as classified by the World Bank. According to the table, the Middle East and North Africa has the highest number of cases per 1 million population (7,977), followed by North America (5,538) and Latin America and the Caribbean (4,098). The considerable variation across regions suggests that specific regional characteristics, mostly unobserved, can play an essential role in explaining the cross-country variation in the number of COVID-19 positive cases. To understand the impact of region-specific traits, we use regional dummies in our model. Based on the arguments presented above, we define the following multiple linear regression model to explain the cross-country variation in the number of COVID-19 positive cases.
$$\text{Total Cases}_i = b_0 + \sum_{j=1}^{12} b_j X_{ji} + \sum_{j=13}^{18} b_j D_{ji} + u_i$$
where i (1, 2, 3, ..., n-1, n) represents a country, X_{1i}, ..., X_{12i} are the twelve predictors described above, D_{13i}, ..., D_{18i} are the regional dummies, and u_i is the error term. We have six regional dummy variables that can assume values 1 and 0 only. Coefficients b_13 to b_18 are the coefficients associated with the dummies. We treat East Asia and the Pacific (EAP) as the base region, and all regional comparisons are made apropos of EAP.
Table 2 provides the list of variables, their descriptions, and the sources of data. Note: the real-time data on total cases, total tests, and population density were collected from Worldometers at 12:43 am Bangladesh time, July 02, 2020 [37]. The UNDP is the source for data on population 15-64, population 65 and above, and the quality of life (HDI) [38]. The degree of interaction with China was measured by using information from the World Bank [39] and Tradingeconomics.com [40]. The quality of governance (CPI) data comes from Transparency International [41]. The World Bank is the prime source of the rest of the data.
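To illustrate how six regional dummies with EAP as the base category enter such a model, here is a minimal sketch using NumPy. The data are synthetic and noise-free, the single covariate `x` stands in for the twelve predictors, and the region abbreviations other than EAP, MENA, and SSA are assumptions based on the seven World Bank regions. It is not the authors' estimation code.

```python
import numpy as np

# Seven World Bank regions; abbreviations other than EAP, MENA, SSA are assumed.
REGIONS = ["EAP", "ECA", "LAC", "MENA", "NA", "SA", "SSA"]
BASE_REGION = "EAP"


def region_dummies(region_labels, base=BASE_REGION):
    """Return an (n x 6) 0/1 matrix with one column per non-base region."""
    levels = [r for r in REGIONS if r != base]
    dummies = np.array(
        [[1.0 if label == level else 0.0 for level in levels]
         for label in region_labels]
    )
    return dummies, levels


# Synthetic data: two hypothetical countries per region, no noise,
# so ordinary least squares recovers the chosen coefficients exactly.
labels = REGIONS * 2
x = np.arange(len(labels), dtype=float)
true_effect = {"EAP": 0.0, "ECA": 2.0, "LAC": 3.0, "MENA": 5.0,
               "NA": 4.0, "SA": 1.5, "SSA": 0.5}
y = 1.0 + 0.5 * x + np.array([true_effect[r] for r in labels])

D, levels = region_dummies(labels)
X = np.column_stack([np.ones(len(y)), x, D])  # intercept, covariate, dummies
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

region_coef = dict(zip(levels, coef[2:]))
for region, value in region_coef.items():
    print(f"{region}: {value:+.2f} relative to {BASE_REGION}")
```

Each dummy coefficient measures the difference in the conditional mean of the outcome between that region and the omitted base region, which is why the base region has no coefficient of its own.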
As of July 02, 2020, 12:43 am Bangladesh time, there were 84 countries with at least 5,000 COVID-19 positive cases. Since one of our prime objectives is to investigate the potential impact of the economic relationship with China, we cannot have China in the dataset. Also, in the process of developing the dataset, we lost another 13 observations due to partial data unavailability. Four countries, Congo Dem. Rep., Tajikistan, Cameroon, and Algeria, were dropped as data on total tests were not available. Eight countries did not have data on the number of air passengers and were hence dropped. They are Haiti, Denmark, Dominican Republic, Norway, Guinea, Armenia, Gabon, and Sweden. For Panama, as the Military-GDP ratio was zero, the Health-Military ratio could not be calculated, and hence it was dropped, leading to the final sample size of 70.
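The sample attrition described above can be tallied step by step, as in the short sketch below. The country lists and counts come from the text; the helper `health_military_ratio` is a hypothetical illustration of why a zero military expenditure leaves the ratio undefined.

```python
def health_military_ratio(health_expenditure, military_expenditure):
    """Health-to-military expenditure ratio, or None when it is undefined."""
    if military_expenditure == 0:
        return None  # division by zero: the ratio cannot be calculated
    return health_expenditure / military_expenditure


STARTING_COUNT = 84  # countries with at least 5,000 cases

dropped = {
    "excluded by design": ["China"],
    "no data on total tests": [
        "Congo Dem. Rep.", "Tajikistan", "Cameroon", "Algeria",
    ],
    "no data on air passengers": [
        "Haiti", "Denmark", "Dominican Republic", "Norway",
        "Guinea", "Armenia", "Gabon", "Sweden",
    ],
    "health-military ratio undefined": ["Panama"],
}

remaining = STARTING_COUNT
for reason, countries in dropped.items():
    remaining -= len(countries)
    print(f"{reason}: -{len(countries)} -> {remaining} countries left")
```

The tally runs 84, 83, 79, 71, 70, matching the final sample size reported in the text.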
Information on Iran's trade-GDP ratio is from [42], whereas Venezuela's information comes from [43]. We used the 2017 trade-GDP ratios as proxies for the 2018 trade-GDP ratios of these two countries. For Bahrain and Qatar, data on population, ages 65 and older (millions), were collected from [44]. Also, for four countries, military expenditure data were not available for 2017. Hence we used their military expenditure data for the closest available year. For the Gambia, Uzbekistan, Qatar, and UAE, those years were 2018, 2018, 2010, and 2014, respectively.
Table 3 reports the regression results as well as the relevant diagnostics. Urban Population and the region MENA affect the total number of cases positively and significantly (p<0.005). The number of tests also affects the outcome variable positively (p=0.086). No other variable has a statistically significant impact at conventional levels of significance. The standard errors seem to be unusually high. Some of the coefficients, namely, Population 15-64, Population 65 and above, and Quality of Governance, bear signs opposite to our expectations. Also, Health expenditure per capita appears to exert a negative impact on total cases. As suggested in the literature, outliers, heteroscedasticity, data definitions, and specification errors are among the causes that can generate wrong signs [45]. The diagnostic tests reveal that the residuals do not follow the normal distribution, which can lead to incorrect inferences. The Breusch-Pagan test rejects the null of constant variance, whereas the White test suggests the opposite. The Ramsey RESET test suggests that the model is misspecified. A look into the studentized residuals shows that four of them exceed 2 in absolute value. Also, inspecting the leverages discloses that eight observations have leverage higher than (2k+2)/n, where k=18 is the number of predictors and n is the sample size.
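The two outlier screens mentioned above can be sketched as follows. With k=18 and n=70, the leverage cutoff (2k+2)/n equals 38/70, roughly 0.543, which is twice the average leverage (k+1)/n. The data below are randomly generated, and the residuals are internally studentized; whether the original analysis used the internal or external version is not stated, so that choice is an assumption.

```python
import numpy as np


def leverage_cutoff(k, n):
    """Rule-of-thumb cutoff (2k + 2) / n for high-leverage observations."""
    return (2 * k + 2) / n


def leverages(X):
    """Diagonal of the hat matrix, computed from a QR decomposition of X."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q ** 2, axis=1)


def studentized_residuals(X, y):
    """Internally studentized residuals e_i / (s * sqrt(1 - h_i))."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    h = leverages(X)
    n, p = X.shape
    s = np.sqrt(resid @ resid / (n - p))
    return resid / (s * np.sqrt(1.0 - h))


# Synthetic design with the same dimensions as the study: n = 70, k = 18.
n, k = 70, 18
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ rng.normal(size=k + 1) + rng.normal(size=n)

h = leverages(X)
r = studentized_residuals(X, y)
cutoff = leverage_cutoff(k, n)

n_high_leverage = int(np.sum(h > cutoff))
n_large_resid = int(np.sum(np.abs(r) > 2))
print(f"cutoff = {cutoff:.3f}")
print(f"observations above the leverage cutoff: {n_high_leverage}")
print(f"studentized residuals beyond +/-2: {n_large_resid}")
```

The leverages always sum to the number of estimated coefficients (here 19), which gives a quick sanity check on the computation.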

Results
As a remedial measure, we transform some variables, including the dependent variable, into their natural logs. We rerun the model and conduct the same diagnostic tests as before. The skewness/kurtosis test does not reject the normality assumption. Neither the Breusch-Pagan test nor the White test rejects the null hypothesis of constant variance. Also, according to the Ramsey RESET test, the model does not suffer from specification error. The model has substantial explanatory power: it explains 73.81 percent of the cross-country variation in the total number of COVID-19 positive cases per 1 million population.
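As a rough illustration of why a natural-log transformation can help with non-normality, the sketch below compares the sample skewness of a right-skewed variable before and after taking logs. The variable is drawn from a lognormal distribution purely for illustration; it is not the study's data, and the example looks at a variable rather than at regression residuals.

```python
import numpy as np


def sample_skewness(values):
    """Moment-based sample skewness: third central moment over sd cubed."""
    v = np.asarray(values, dtype=float)
    centered = v - v.mean()
    return float(np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5)


# Hypothetical right-skewed variable for 70 countries (synthetic draw).
rng = np.random.default_rng(42)
cases_per_million = rng.lognormal(mean=7.0, sigma=1.2, size=70)

skew_raw = sample_skewness(cases_per_million)
skew_log = sample_skewness(np.log(cases_per_million))

print(f"skewness in levels: {skew_raw:.2f}")
print(f"skewness in logs:   {skew_log:.2f}")
```

A long right tail in levels is pulled in by the log, so a handful of very large countries no longer dominate the fit.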
The number of tests has a positive and significant effect on the number of positive cases (p=0.051), which should not be surprising because, as more tests are carried out, more confirmed positive cases are expected. Among the demographic characteristics, the coefficients on the population in the 15-64 and the 65 and above age groups have the expected signs. However, these impacts are not statistically significant. Total cases also rise with population density, but this effect is not statistically significant. The percentage of people living in urban areas affects total cases positively, and the impact is statistically significant at less than the 5% level of significance (α). In line with our expectations, the mobility of people, measured by air passenger transport, affects total cases positively, and the associated p-value is 0.058. The impact of the Human Development Index, used here to represent the quality of life, is negative but statistically insignificant. Health expenditure per capita has a positive influence, which is statistically significant at α=0.09.
The ratio of a country's trade with China to the country's total trade volume, a proxy we use to indicate the degree of that country's interaction with China, affects the total number of COVID-19 cases positively. It is significant at α=0.084. Temperature, the relative importance of the health sector, and the quality of governance as proxied by the CPI fail to register any important individual impact on the outcome variable.
Compared to the base region, East Asia and Pacific (excluding China), when controlled for other influences, all the regions have higher cases of COVID-19. Except for SSA, all the region dummies are significant at α < 0.01.
When the number of total cases rises, more people will be concerned about their health and get tested. Hence the number of total cases can drive the number of total tests, making the latter endogenous. To address the potential endogeneity of total tests, we use the Instrumental Variable (IV) Two-Stage Least Squares (2SLS) procedure. We use the Democracy Index Score of 2019 [46] as the instrument. We argue that testing and tracing are vital to the management of the COVID-19 pandemic. Since, in a democratic society, people have a strong voice and the government is held accountable for its actions, policymakers are more concerned about people's reactions or criticism. During the coronavirus outbreak, this will translate into conducting more tests to manage the pandemic well. Thus, we take the democracy index score to be a valid instrument for the total number of tests. We estimate the IV-2SLS regression, instrumenting total tests by the democracy index. The Durbin-Wu-Hausman test shows that we are unable to reject the null that the variables are exogenous. With the exogeneity of total tests verified, we adhere to the results in Table 4.
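The mechanics of 2SLS and a regression-based version of the Durbin-Wu-Hausman test can be sketched with NumPy as below. Everything here is synthetic: `z` stands in for the instrument (the democracy index), `w` for an exogenous control, and endogeneity is deliberately built into the simulated data so that the test has something to detect, unlike in the study, where exogeneity was not rejected. The t-statistic uses classical homoskedastic standard errors, which is an assumption of this sketch.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares coefficients via a least-squares solve."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)        # instrument (stand-in for the democracy index)
w = rng.normal(size=n)        # exogenous control
shock = rng.normal(size=n)    # unobserved factor driving both tests and cases
tests = 1.0 + 0.8 * z + 0.3 * w + shock + rng.normal(size=n)
cases = 2.0 + 0.5 * tests + 0.4 * w + shock + rng.normal(size=n)
const = np.ones(n)

# Stage 1: regress the suspect regressor on the instrument and the control.
Z = np.column_stack([const, z, w])
tests_hat = Z @ ols(Z, tests)
v_hat = tests - tests_hat     # first-stage residuals

# Stage 2: replace tests with its fitted values (point estimates only;
# standard errors from this manual second stage would not be valid).
beta_2sls = ols(np.column_stack([const, tests_hat, w]), cases)
beta_ols = ols(np.column_stack([const, tests, w]), cases)

# Regression-based Durbin-Wu-Hausman test: add the first-stage residuals
# to the original equation and test whether their coefficient is zero.
X_aug = np.column_stack([const, tests, w, v_hat])
beta_aug = ols(X_aug, cases)
resid = cases - X_aug @ beta_aug
sigma2 = resid @ resid / (n - X_aug.shape[1])
cov = sigma2 * np.linalg.inv(X_aug.T @ X_aug)
t_stat = beta_aug[3] / np.sqrt(cov[3, 3])

print(f"OLS estimate on tests:  {beta_ols[1]:.3f}")
print(f"2SLS estimate on tests: {beta_2sls[1]:.3f}")
print(f"DWH t-statistic on first-stage residuals: {t_stat:.2f}")
```

A large t-statistic on the first-stage residuals signals endogeneity; a small one, as the paper reports for total tests, means exogeneity cannot be rejected and OLS remains appropriate. In this linear setup the coefficient on `tests` in the augmented regression coincides with the 2SLS estimate.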

Conclusion
The findings of this paper suggest that unobserved regional characteristics play the most critical role in explaining the variation in the number of COVID-19 positive cases around the world. All the regional dummies, except the one for Sub-Saharan Africa, impact the total number of cases positively, compared to the base category, and these impacts are statistically significant at less than the 1% level. The percentage of the population living in urban areas, the number of tests, and the level of mobility of the population come out as important determinants. Additionally, trade volume with China relative to a country's total trade volume, a measure of the degree of interaction with China, and health expenditure per capita exercise substantial positive influence. These two impacts are significant at the 10% level.
The factors that fail to mark significant influence on the total number of cases include other demographic characteristics like the number of people in different age groups and the density of population, quality of life, relative importance of health in national policy, quality of governance, and temperature.
This study opens up several avenues for further research. i) We could not explicitly consider mitigation strategy as reliable data on country-level mitigation policies are still emerging and are mostly incomplete. However, variables such as the quality of governance, quality of life, region (along with temperature, it probably encapsulates unobserved characteristics like culture, norms, values, lifestyle, etc.), and relative importance of health might capture the effectiveness of various mitigation strategies to a large extent. ii) Many countries are still in the midst of the crisis, and hence, a complete evaluation will be possible only when the pandemic finally ends. iii) Instead of using the aggregate CPI score for a country, it would be more appropriate to use a measure of the quality of governance in the health sector. iv) We did not test for the robustness of the model to alternative specifications. v) Data on per capita out-of-pocket health expenditure might bring out meaningful insights. vi) In this paper, the total number of tests was found to be exogenous. Future studies should explore this issue further.