Estimation of Risk Factors' Contribution to Mortality from COVID-19 in Highly Populated European Countries

Background: The outbreak of the COVID-19 epidemic and the excess mortality attributed to COVID-19 worldwide raised the need for a simple and applicable mathematical model for predicting mortality in different countries, and for identifying the risk factors for COVID-19 mortality, in particular demographic risk factors. Methods: A linear model was developed based on demographic data (population density, percentage of population over age 65 and degree of urbanity) as well as clinical data (number of days since the first case was diagnosed in each country) from 10 highly populated (over 8.5 million people) randomly selected European countries (Austria, Hungary, Portugal, Sweden, Czech Republic, Belgium, the Netherlands, Romania, Italy, France). A linear regression model was applied, using IBM SPSS version 20 software. Results: The proposed model predicts mortality among the selected countries. The model was found to be highly correlated (R² = 0.821, p = 0.042) with the actual (reported) number of deaths in each country. Percentage of population above age 65, population density and number of days since the first case appeared in each country were positively correlated with COVID-19 mortality, whereas urbanity was negatively correlated with mortality. Conclusions: Percentage of population above age 65, population density and the number of days of exposure to COVID-19 are potential risk factors for dying from the pandemic, whereas urbanity is considered a protective factor. However, this model is based on data from medium to large populations in continental Europe only. Moreover, it is based on mortality data from the "first wave" of the pandemic. Further study should evaluate the model's accuracy using data from the "second wave" and from outside continental Europe.


Introduction
By April 5, 2020, the outbreak of coronavirus disease 2019 (COVID-19) had caused 1,318,713 confirmed cases and 73,146 deaths globally. These numbers are much higher than those of the 2003 Severe Acute Respiratory Syndrome (SARS) (8273 cases, 775 deaths) and the 2012 Middle East Respiratory Syndrome (MERS) (1139 cases, 431 deaths). Within four months of its outbreak, COVID-19 had been detected internationally [1]. Although SARS and MERS are considered much more fatal than COVID-19, the latter tends to spread at a higher rate and infect considerably more people [2]. Among specific groups (males over 75 years of age with underlying disease), the fatality rate of this disease could rise to 14.2% and above [3]. Because of the high COVID-19 mortality rate in certain predisposed populations, it is important to develop a model that could predict its impact based on demographic and minimal clinical data. Over the years, many models have been developed to predict mortality from infectious diseases. These models are based on the classical epidemiological approach known by the acronym SEIR: S for susceptible individuals, E for exposed individuals (infected but not yet infectious themselves), I for infectious individuals, and R for those who have recovered or been removed (died). Each transition is characterized by a specific rate coefficient (rate of infection, recovery, and mortality) and is influenced by governmental policies: levels of social isolation, closure, and use of protective gear and masks. These coefficients are based on post-exposure experience. The SEIR model has a number of notable drawbacks. Populations differ in age, genetics, ethnic characteristics, background diseases, and immune system effectiveness. Another flaw is that the model assumes that infection by the pathogen is random, which is not always the case.
Children and adolescents are exposed to larger populations (kindergartens, primary schools, and high schools) than adults are. In contrast, elderly populations may live in nursing homes, with long and continuous exposure to outside people such as staff and visitors. Furthermore, the model mistakenly assumes that the infection rate is constant throughout the seasons and that the amount of virus to which a person has been exposed does not matter [4].
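The compartment structure described above can be illustrated with a minimal sketch of a basic SEIR model, integrated with a simple Euler scheme. The parameter names (beta, sigma, gamma) and all numeric values are hypothetical choices made for illustration only; they are not taken from this study or from the cited references.

```python
def simulate_seir(n, beta, sigma, gamma, days, e0=1.0, dt=0.1):
    """Integrate a basic SEIR model with Euler steps.

    n     : total population (assumed constant)
    beta  : transmission rate per day (hypothetical value)
    sigma : rate at which exposed individuals become infectious
    gamma : rate at which infectious individuals recover or are removed
    Returns one (S, E, I, R) tuple per day, including day 0.
    """
    s, e, i, r = n - e0, e0, 0.0, 0.0
    history = [(s, e, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E, assumes random mixing
            new_infectious = sigma * e * dt       # E -> I
            new_removed = gamma * i * dt          # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        history.append((s, e, i, r))
    return history


# Illustrative run with made-up parameters.
history = simulate_seir(n=1_000_000, beta=0.5, sigma=0.2, gamma=0.1, days=120)
peak_day = max(range(len(history)), key=lambda d: history[d][2])
print("day of peak infectious count:", peak_day)
```

The term beta * s * i / n encodes the random-mixing assumption criticized above: every susceptible person is treated as equally likely to meet an infectious one, regardless of age group or living arrangement.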
Some researchers developed mathematical models based on virological and serological datasets collected intensively during previous pandemics. Yaari et al. employed a conditional likelihood approach to fit a disease transmission model to virological and serological data collected in Israel during the 2009 H1N1 pandemic. However, this model assumed the existence of vaccination and reflected its influence on the waves of the disease [5].
Some mathematical models were developed to estimate the impact of specific factors on the spread of the disease. A study that examined the control of influenza in the elderly found that 50% of influenza cases in the elderly were caused by direct contact with an infected child [6].
A Structural Equation Modelling (SEM) analysis based on empirical data from 88 countries around the world showed that socioeconomic status, as well as the urbanity and modernity of the living area, has significant effects on COVID-19 pandemic severity (Mokhlesur et al., 2020).
Another study, which applied a linear model to data from the seven countries with the most COVID-19 infections, found an association between the predicted lethal duration and the COVID-19 mortality rate (Vivek et al., 2020).
The U.S. Centers for Disease Control and Prevention (CDC) examined, between February 12 and April 7, the risk factors associated with COVID-19 incidence and mortality rates in 50 U.S. states. The factors found to affect mortality were: 1) duration of exposure to COVID-19; 2) population density; 3) age distribution and prevalence of underlying background diseases; 4) the timing and extent of community mitigation measures; 5) diagnostic testing capacity; and 6) public health reporting practices [7]. Another model based on non-clinical data found that mortality was inversely related to high ambient temperature, low population density and an early lockdown policy [8]. Clinical models found that male patients with heart injury, hyperglycemia, and high-dose corticosteroid use may have a higher mortality risk [9]. Other markers include cardiovascular disease, diabetes, chronic respiratory disease, and serological markers such as C-reactive protein levels (a sign of inflammation) and elevated levels of the enzyme lactate dehydrogenase (LDH) (a sign of tissue damage) [10,11].
In light of the shortcomings of the SEIR model and the CDC's theoretical results, it is highly beneficial to develop a simplified model that can predict mortality based on available demographic information such as the percentage of elderly people (above age 65) in the population, population density and the percentage of urban residence, which serves as another determinant for evaluating population crowding [6]. Another recently published article compared the 10 countries with the highest death rates (more than 25 deaths per 100,000 population) with 83 other countries with lower death rates. The following risk factors were significantly correlated with the higher death rate: Alzheimer's disease, lung cancer, COPD, asthma, depression and the socioeconomic factor Gross Domestic Product per capita. However, age ≥ 65 years, urbanization (%), population density and unemployment (%) were not found to have a statistically significant correlation [10]. The influence of the duration of exposure to COVID-19 was not included in any of the models mentioned [12].

Demographic and Clinical Data
Demographic information included population density (for highly populated countries, above 8.5 million people), degree of urbanization and age distribution (percentage of population above age 65). Clinical information included the number of days since the first case was diagnosed. These data were retrieved from websites with real-time data [11][12][13].

Statistical Analysis
We used IBM SPSS version 20 to develop a linear regression model to predict the number of deaths in each randomly selected European country. The dependent variable was the cumulative number of deaths actually observed up to April 5, 2020, and the explanatory variables were those found to be statistically significant (p < 0.05).
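For orientation, a multiple linear regression of this kind can be written in the general form below. The notation is illustrative and is not taken from the SPSS output: D_i stands for the cumulative number of deaths in country i, the four predictors are the ones named in the text, b_0 to b_4 are the fitted coefficients, and ε_i is the residual. Which predictors remain in the final model depends on the significance criterion described above.

```latex
D_i = b_0
    + b_1\,\mathrm{density}_i
    + b_2\,\mathrm{age65}_i
    + b_3\,\mathrm{urban}_i
    + b_4\,\mathrm{days}_i
    + \varepsilon_i ,
\qquad i = 1, \dots, 10
```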

Demographic Information
The data used for developing the model are summarized in Table 1: population density (population/country area), percentage of population above 65 years, degree of urbanization, the clinical information of the number of days since the first case appeared in the country, and mortality (cumulative number of deaths).

Country       Population density   Population over 65 (%)   Urbanization (%)   Days since first case   Deaths
Austria       109                  19                       57                 42                      204
Hungary       107                  20                       72                 34                      38
Portugal      111                  20                       66                 36                      295
Sweden        25                   20                       88                 74                      401
Czechia       139                  19                       74                 37                      72
Belgium       383                  19                       98                 63                      1447
Netherlands   508                  19                       92                 40                      1766
Romania       84                   17                       55                 41                      162
Italy         206                  22                       69                 67                      15887
France        119                  20                       82                 74                      8078
Table 1

It is instructive to compare countries that share as many characteristics as possible. For example, a comparison between Austria and Romania indicates a link between population density and increased mortality (the rate of urbanization, the number of exposure days and the elderly population are similar). A comparison between Hungary and Portugal, whose population density, number of days of exposure and percentage of elderly population are similar, emphasizes the role of urban life as a factor reducing mortality. A comparison between Sweden and France, which have a similar elderly population rate and a similar number of exposure days, highlights population density and life outside the city as mortality-increasing factors. Belgium and the Netherlands have a similar elderly population rate and a close urbanization rate, but the number of days of exposure in Belgium is about 1.5 times greater while the density in the Netherlands is 1.3 times higher. Mortality in the Netherlands is 1.2 times higher, suggesting that population density is more dominant than the duration of exposure to the virus. The last and most extreme comparison is between Czechia and Italy. Italy is more densely populated, older and has been exposed to the virus much longer; on the other hand, its degree of urbanization is lower. Given these conditions, its mortality is 220 times higher.
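The arithmetic behind these pairwise comparisons can be checked with a short sketch. The values are the Table 1 figures as given in the text; the column order (density, percentage over 65, percentage urban, days since first case, deaths) is an assumption inferred from the surrounding description, and the field names are hypothetical labels.

```python
# Table 1 values as given in the text. Column order is assumed:
# density, % over 65, % urban, days since first case, deaths.
FIELDS = ("density", "pct_over_65", "pct_urban", "days", "deaths")
TABLE_1 = {
    "Austria": (109, 19, 57, 42, 204),
    "Hungary": (107, 20, 72, 34, 38),
    "Portugal": (111, 20, 66, 36, 295),
    "Sweden": (25, 20, 88, 74, 401),
    "Czechia": (139, 19, 74, 37, 72),
    "Belgium": (383, 19, 98, 63, 1447),
    "Netherlands": (508, 19, 92, 40, 1766),
    "Romania": (84, 17, 55, 41, 162),
    "Italy": (206, 22, 69, 67, 15887),
    "France": (119, 20, 82, 74, 8078),
}


def ratios(numerator, denominator):
    """Ratio of each Table 1 field: numerator country over denominator country."""
    pairs = zip(FIELDS, TABLE_1[numerator], TABLE_1[denominator])
    return {field: a / b for field, a, b in pairs}


italy_vs_czechia = ratios("Italy", "Czechia")
netherlands_vs_belgium = ratios("Netherlands", "Belgium")

for label, result in [("Italy / Czechia", italy_vs_czechia),
                      ("Netherlands / Belgium", netherlands_vs_belgium)]:
    print(label, {field: round(value, 2) for field, value in result.items()})
```

Any other pair from the table can be compared the same way, for example ratios("Portugal", "Hungary").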

Linear Model for Predicting Mortality Rate from COVID-19
Based on the data presented in Table 1, a simplified linear model was developed for predicting mortality using demographic information (population density, percentage of population above 65 years and urbanity) and minimal clinical data (days since the appearance of the first case). The model is summarized in Table 2.

Table 2

As can be seen from Table 2, three factors have a positive effect: the percentage of the population over the age of 65, the population density and the number of days that have passed since the first case was diagnosed. That is, the greater the value of the factor, the higher the number of deaths. In contrast, as the population becomes more urban, mortality decreases. The resulting β values indicate the relative contribution of each background factor to mortality. Thus, the factors in order of their contribution are: the degree of urbanization (β = -0.84), the number of days of exposure to the virus (β = 0.77), population density (β = 0.68) and the proportion of the elderly population (over the age of 65) (β = 0.55). A positive β value means that mortality increases linearly as the factor increases, whereas a negative β value means that mortality decreases linearly as the factor increases.
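As an illustration of what standardized β coefficients represent, the following sketch z-scores every variable and solves the least-squares normal equations on the resulting correlation matrix. It uses the Table 1 values as given in the text, with an assumed column order, and it includes all four predictors at once. Because the exact SPSS specification is not reproduced in the text, the output should not be expected to match the reported β values or R².

```python
from statistics import mean, pstdev

# Table 1 values as given in the text. Column order is assumed:
# density, % over 65, % urban, days since first case, deaths.
ROWS = [
    (109, 19, 57, 42, 204), (107, 20, 72, 34, 38), (111, 20, 66, 36, 295),
    (25, 20, 88, 74, 401), (139, 19, 74, 37, 72), (383, 19, 98, 63, 1447),
    (508, 19, 92, 40, 1766), (84, 17, 55, 41, 162), (206, 22, 69, 67, 15887),
    (119, 20, 82, 74, 8078),
]


def zscores(values):
    """Center a column on its mean and scale it by its standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def standardized_betas(rows):
    """Return (betas, r_squared) for a regression of the last column on the rest."""
    columns = [zscores(col) for col in zip(*rows)]
    xs, y = columns[:-1], columns[-1]
    n = len(y)

    def corr(u, v):
        # For z-scored columns the mean product is the Pearson correlation.
        return sum(p * q for p, q in zip(u, v)) / n

    r_xx = [[corr(a, b) for b in xs] for a in xs]
    r_xy = [corr(a, y) for a in xs]
    betas = solve(r_xx, r_xy)
    r_squared = sum(b * r for b, r in zip(betas, r_xy))
    return betas, r_squared


betas, r_squared = standardized_betas(ROWS)
for name, beta in zip(("density", "pct_over_65", "pct_urban", "days"), betas):
    print(f"{name}: beta = {beta:.2f}")
print(f"R^2 = {r_squared:.3f}")
```

Because all variables are on the same unitless scale after z-scoring, the magnitudes of the resulting coefficients can be compared directly, which is the sense in which β values rank the factors by relative contribution.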

Discussion
The suggested simplified linear model predicts mortality based on publicly available demographic and clinical data from 10 randomly selected European countries (population size above 8.5 million people). The model includes the following demographic data: percentage of population above age 65, population density, and urbanity. The only clinical datum included in the model is the number of days since the first case was diagnosed in each country. Compared with the SEIR model mentioned above, which is characterized by complicated differential calculations (infection rate, recovery rate, and mortality rate), the suggested model is easy to implement: population density, percentage of older population (above 65 years) and urbanity are constant over a given period, whereas the number of days since the first diagnosed case is variable. A review published by Eliyahu U and Boaz M noted that urbanization is associated with an increased risk of infection because of the common spaces shared by the population, such as those in skyscrapers and stairwells [16]. However, with regard to the chance of mortality, this model suggests the opposite trend: urbanization reduces the chance of mortality, possibly because effective medical services are more immediately available in cities than in rural areas. Other factors that have not been examined are the quality of medical care in each country, the number of screening tests performed to detect morbidity and the implementation of quarantine policies by the various governments.
A recently published review by Wynants L. et al. reports the results of 145 prediction models, of which 50 predict mortality. In those models the most frequently reported prognostic variables are age, comorbidities and sex. However, the authors concluded that those models are at high risk of bias and that their reported performance is probably optimistic [17]. Therefore, our model tries to emphasize the relative contribution of urbanity, population density, percentage of older population (over age 65) and days since the first case.
Limitations of the proposed model: The model was not validated for smaller European countries (fewer than 8.5 million people) or for countries on continents other than Europe. Additionally, the model did not consider the lockdown and isolation policies imposed in different countries or the timing of their implementation, which probably affected morbidity and thereby mortality rates [18,19]. The suggested model should be applied in different countries and over different time intervals (the "second wave" of the pandemic), which could further explore the model's reliability and validity. Moreover, the model did not consider the quality of the health system in each country, a factor that may affect treatment quality and the mortality rate [18]. Consequently, our model may not provide an exact prediction of future mortality rates, but it points out important associations between the percentage of elderly population, population density and mortality rates, whereas urbanity appears to be a protective factor, perhaps because of the availability of medical services in cities. Such associations may be useful to decision makers when considering measures to be taken during the pandemic, for example when deciding whether to apply restrictions such as a lockdown or school and business closures only in urban areas [19].

Conclusion
The outbreak of this pandemic highlights the importance of developing simplified models for predicting the spread of contagious diseases. The suggested model uses available information to predict mortality in 10 different European countries, based on population density, percentage of population above age 65, urbanity, and the cumulative time since the first case appeared. Our model was highly correlated with actual mortality data among the selected countries. Moreover, the above factors were found to be statistically significant potential determinants of COVID-19 mortality (either risk factors or protective factors), and, as such, they may be considered by decision makers when deciding on measures and policies. Further research is warranted to expand and validate this preliminary model for other countries worldwide.