3² Factorial Design and Model for the Spread of COVID-19 in West Africa

Abstract: Coronavirus disease 2019 (COVID-19) is a worldwide pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This research aimed to use experimental design to derive a model for the spread of COVID-19 in West Africa. A 3² factorial design with mixed factors was used to lay out the data, with country as the random factor and status of COVID-19 patient as the fixed factor. Under country, Nigeria, Senegal and Ghana were randomly selected as the three levels from the seventeen West African countries, while under status of COVID-19 patient, recovery, death and active cases were the three fixed levels. The data were the monthly recorded numbers of the various COVID-19 cases (i.e. recovery cases, death cases and active cases out of the total confirmed cases) for the period from February to September 2020. They were retrieved from the COVID-19 reports issued by the respective countries' health authorities and published on their websites. A mixed effect model was then derived through a formulation process for the prediction of the various COVID-19 cases across West Africa. Several residual analyses were conducted to check model adequacy, and they indicated that the model was adequate for estimating the various COVID-19 cases in the West African sub-region. This makes it the first time experimental design and analysis has successfully been used for a study of this nature on the spread of COVID-19 in West Africa.


What Is COVID-19
Coronavirus disease 2019 (COVID-19) is a worldwide pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Research conducted by Bette Korber and associates at Los Alamos National Laboratory and published in the journal Cell shows that a specific change in the SARS-CoV-2 genome, previously associated with increased viral transmission and the spread of COVID-19, makes the virus more infectious in cell culture. The variant in question, D614G, makes a small but effective change in the virus's 'spike' protein, which the virus uses to enter human cells. The virus carrying the D614G change in spike outcompetes the original strain, but may not increase the severity of the patient's illness [1].
Although COVID-19 causes only mild illness in most people, it can make some people very ill. More rarely, the disease can be fatal, especially in older people and those with pre-existing medical conditions such as high blood pressure, heart problems and diabetes, according to a WHO report of February 2020 [2].
The most common symptoms of COVID-19 are fever, tiredness and dry cough. Some patients may also have aches and pains, nasal congestion, a runny nose or a sore throat. These symptoms are usually mild and begin gradually. Most people (about 80%), the majority of whom are children and young adults, recover from these symptoms without needing special treatment [3].
The best way to prevent and slow down transmission is to be well informed about the COVID-19 virus, the disease it causes and how it spreads. Protect yourself and others from infection by washing your hands or using an alcohol-based sanitizer frequently and by not touching your face. Also, since the COVID-19 virus spreads primarily through droplets of saliva or discharge from the nose when an infected person coughs or sneezes, it is important to practise respiratory etiquette (for example, by coughing or sneezing into a flexed elbow and wearing a face mask) [4].

In The World
When a new virus is discovered, it is important to understand where it comes from. This is critical to be able to identify and isolate the source and prevent further introductions of the virus into the human population. It also helps to understand the dynamic of the beginning of the outbreak, which can be used to inform the public health response. Understanding the origin of the virus may also aid the development of therapeutics and vaccines.
According to a WHO report of 26th March 2020, all available evidence suggests that the virus, SARS-CoV-2, which causes COVID-19, has a natural animal origin and was not created in a laboratory [5].
The first human case of COVID-19 was reported at Wuhan Central Hospital in Wuhan, China, in December 2019. During the early stages of the outbreak, the number of cases doubled approximately every seven and a half days, and the infection spread to other Chinese provinces by mid-January 2020. The infection rate kept increasing, so that by 20th January China was recording about 140 infections a day, which accumulated to 6,174 infected people.
On 30th January, with 7,818 confirmed cases across 19 countries, the WHO declared the outbreak a Public Health Emergency of International Concern (PHEIC), and then a pandemic on 11th March 2020 as Italy, Iran, South Korea and Japan reported increasing numbers of cases. Later that month, the number of cases outside China quickly surpassed the number inside China. Italy, for instance, overtook China as the country with the most reported deaths on 19th March, and the United States of America (USA) took over from Italy as the country with the highest number of confirmed cases in the world. Before this, as at 13th March 2020, the WHO considered Europe the active centre of the pandemic. According to the Johns Hopkins COVID-19 interactive map, as at 30th September 2020 there were 33,706,888 confirmed cases, 1,008,993 deaths and 23,432,488 recoveries globally [6].

In Africa
The spread of the COVID-19 pandemic to Africa was confirmed on 14th February 2020, with Egypt being the first country on the continent to record a confirmed case [7]. Chronologically, Egypt was followed by Algeria, which reported its first case on 25th February, and then by Nigeria, the third country on the continent and the first in Sub-Saharan Africa, with its first case recorded on 27th February. Apart from these three countries, the first cases in the other African countries were detected in March. Lesotho, however, was the last African country to report its first confirmed case, on 13th May 2020 [8].
As at 30th September 2020, the African countries most affected by COVID-19 were South Africa (confirmed cases: 671,669 and deaths: 16

In West Africa
In West Africa, where this study is focused, Nigeria was the first country to report a COVID-19 case, on 27th February 2020, followed by Senegal on 29th February [10]. By the end of March, all seventeen West African countries had reported confirmed COVID-19 cases. As at 30th September 2020, the total number of confirmed COVID-19 cases had reached 187,290, with 2,882 deaths, 166,501 recoveries and 17,980 active cases across the West African sub-region. The five countries with the highest confirmed caseloads were Nigeria (59,345), Ghana (46,829), Côte d'Ivoire (19,849), Senegal (15,094) and Guinea (10,754). However, Ghana and Côte d'Ivoire recorded the highest recovery rates, of 98.5% and 97.8% respectively, while Chad and Liberia recorded the highest fatality rates, of 7.1% and 6.1% respectively [11].

Related Studies
Several researchers have carried out studies that are directly or indirectly related to this one. Zeb and others, for instance, used the nonstandard finite difference (NSFD) scheme and the fourth-order Runge-Kutta method to derive a mathematical model for COVID-19 that contains an isolation class. They concluded from the derived model that the coronavirus spreads through contact. The model was also used to describe how fast the epidemic changes by tracking the number of people who are infected and the likelihood of new infections [12].
Also, in reference [13], Wang and colleagues developed epidemiological models to forecast the evolution of the coronavirus and to estimate the effectiveness of various intervention measures and their impact on the economy. In their study, they reviewed a range of mathematical models for the COVID-19 outbreak that can be employed to estimate epidemiological trends, including severity, the basic reproduction number and herd immunity, as well as the potential effects of interventions on COVID-19. They also found that most of the existing epidemiological models of COVID-19 are based on epidemic-dynamic models rather than on statistical models or machine learning.
Ivorra and associates developed a new θ-SEIHRD model for the spread of COVID-19 in China, which takes into account the known special characteristics of the disease, such as the existence of infectious undetected cases and the different sanitary and infectiousness conditions of hospitalized people. Their model included a novel approach that considers the fraction (θ) of detected cases over the real total of infected cases, which allows the importance of this ratio and its impact on COVID-19 to be studied. The model was also able to estimate the number of hospital beds needed. They further used the reported data on the spread of COVID-19 in about five provinces in China to identify the model parameters, which can be of interest for estimating the spread of COVID-19 in other countries [14].
In reference [15], Li and others established six-compartment dynamic models and time series models based on different mathematical formulas, according to the pattern of variation in the original data on COVID-19 in China. Their results from the time series kinetic model analysis showed that the COVID-19 infection rate and the basic reproduction number of COVID-19 continued to decline, while the results from the sensitivity analysis showed that the time it takes for a suspected case to be diagnosed as a confirmed case can have a significant impact on the peak size and duration of the cumulative number of diagnoses. They then interpreted the results of the model analysis as showing that the emergency intervention measures adopted by the Chinese Government in the early stages of the epidemic (such as locking down Wuhan, restricting the flow of people in Hubei province and increasing the support to Wuhan) had a crucial restraining effect on the original spread of COVID-19.
In this study, the researchers want to answer the following questions: 1. What effects do the countries and the patients' status have on the spread of COVID-19 in West Africa? 2. Could there be a model for the prediction of the COVID-19 infection rate across the West African sub-region? 3. How would such a model be derived? It is in this direction that this research was conducted, employing experimental design and using the available data to derive a model for the spread of COVID-19 in West Africa.

Materials and Methods
The purpose of this research was to employ experimental design in modeling the spread of COVID-19 in the West African Sub-Saharan region. In reference [16], Sweeney defined experimental design as a branch of statistics that deals with the design and analysis of experiments.
In particular, factorial design was used for the whole process of model derivation. In his book, Chris Spatz (2005) described a factor in experimental design as an independent variable, as in reference [17]. A factorial design is a design in which each complete trial (replication) of an experiment contains all possible combinations of the levels of the factors being investigated. Factorial designs can be classified into three types based on the nature of the factors involved in the experiment: factorial designs with fixed factors, with random factors and with mixed factors. In a factorial design with fixed factors, all the factors have a small number of levels, and all of those levels are used in the experiment (design). Here, the conclusions drawn cover only the levels used in the experimental design. In a factorial design with random factors, the levels of the factors have been randomly selected from a large population of possible levels. Here, the conclusions drawn cover the entire population of levels and not just those used in the experimental design. In a factorial design with mixed factors, some of the factors are fixed while others are random. In this design, the conclusions drawn about the random factors extend to the entire population of levels, while those about the fixed factors apply to the levels used in the experiment [18].
In this research, two factors were involved, namely, countries in West Africa as factor A and status of COVID-19 patients as factor B. With factor A, three countries (Nigeria, Senegal and Ghana) were randomly selected from the seventeen West African countries (assumed as population) as the levels while factor B had the status of the COVID-19 patients (Recovery cases, Death cases and Active cases) as the levels.
Since factor A (countries) is a random factor and factor B (status of the COVID-19 patients) is a fixed factor, a two-factor factorial design with mixed factors was used, and hence the analysis of a mixed effect model was employed. More precisely, since there are only two factors, each with three levels, the two-factor factorial design used is the 3² factorial design.
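As a small illustration of this layout, the nine treatment combinations of the design can be enumerated as in the sketch below. The level names and the eight monthly replicates come from the text; the variable names are only illustrative.

```python
from itertools import product

# Levels named in the text; the variable names are illustrative only.
countries = ["Nigeria", "Senegal", "Ghana"]   # factor A (random)
statuses = ["Recovery", "Death", "Active"]    # factor B (fixed)

# Every level of A is crossed with every level of B: 3 x 3 = 9 cells.
cells = list(product(countries, statuses))

n_months = 8  # replicates per cell (February to September 2020)
total_observations = len(cells) * n_months

for country, status in cells:
    print(f"{country:8s} x {status}")
print("cells:", len(cells), "observations:", total_observations)
```

Each printed pair corresponds to one cell of the design, and each cell holds one observation per month.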

Data Collection
The data on the number of confirmed COVID-19 cases recorded as recovery cases, death cases and active cases were collated from the COVID-19 situation reports and timelines for the months from February to September, published on the respective websites associated with each of the three countries involved. For Nigeria, the data were retrieved from the Nigeria Centre for Disease Control (NCDC) situation reports, as in reference [19]. For Ghana, they were retrieved from the Ghana Health Service (GHS), as in reference [20]. For Senegal, they were retrieved from Wikipedia, as in reference [21]. Each replication (observation) is the cumulated monthly number of cases under a given COVID-19 status in each country for the period from February to September 2020.

Concept of Two-Factor Factorial Design
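Generally, the two-factor factorial design is as given in table 1 below. Passing on to the general case, let $y_{ijk}$ be the observed response when factor A is at the $i$th level ($i = 1, 2, \dots, a$) and factor B is at the $j$th level ($j = 1, 2, \dots, b$) for the $k$th replicate ($k = 1, 2, \dots, n$). For example, $y_{124}$ means the 4th observation (replicate) taken at the first level of factor A and the second level of factor B. The order in which the $n$ observations per cell are arranged corresponds to the order of the recordings. It follows that $y_{i..}$ is the total of all the observations under the $i$th level of factor A; $y_{.j.}$ is the total of all the observations under the $j$th level of factor B; $y_{ij.}$ is the total of all the observations in the $ij$th cell; and $y_{...}$ is the grand total of all the observations in the whole data set.
Also, $\bar{y}_{i..}$, $\bar{y}_{.j.}$, $\bar{y}_{ij.}$ and $\bar{y}_{...}$ are the corresponding row, column, cell and grand means. Mathematically, the terms are defined as follows:

$$y_{i..} = \sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}, \qquad \bar{y}_{i..} = \frac{y_{i..}}{bn}, \qquad i = 1, 2, \dots, a$$

$$y_{.j.} = \sum_{i=1}^{a}\sum_{k=1}^{n} y_{ijk}, \qquad \bar{y}_{.j.} = \frac{y_{.j.}}{an}, \qquad j = 1, 2, \dots, b$$

$$y_{ij.} = \sum_{k=1}^{n} y_{ijk}, \qquad \bar{y}_{ij.} = \frac{y_{ij.}}{n}$$

$$y_{...} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}, \qquad \bar{y}_{...} = \frac{y_{...}}{abn}$$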
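The row, column, cell and grand totals and means described here can be computed directly from a three-dimensional array indexed as (level of A, level of B, replicate). The sketch below assumes a balanced layout and uses small hypothetical numbers, not the study's data; the function name `marginal_summaries` is illustrative.

```python
import numpy as np

def marginal_summaries(y):
    """Totals and means for a balanced a x b x n array y[i, j, k]."""
    a, b, n = y.shape
    totals = {
        "row": y.sum(axis=(1, 2)),     # total over j and k for each level i of A
        "column": y.sum(axis=(0, 2)),  # total over i and k for each level j of B
        "cell": y.sum(axis=2),         # total over k for each (i, j) cell
        "grand": y.sum(),              # total over all observations
    }
    means = {
        "row": totals["row"] / (b * n),
        "column": totals["column"] / (a * n),
        "cell": totals["cell"] / n,
        "grand": totals["grand"] / (a * b * n),
    }
    return totals, means

# Hypothetical 2 x 2 x 2 data for illustration only (not the study's table).
y = np.array([[[1.0, 3.0], [5.0, 7.0]],
              [[2.0, 4.0], [6.0, 8.0]]])
totals, means = marginal_summaries(y)
print(totals["cell"])
print(means["column"], means["grand"])
```

For the design in this study the array would have shape (3, 3, 8): three countries, three statuses and eight months.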

Model Formulation and Parameters Estimation
Now considering the situation where one of the factors (A) is random and the other (B) is fixed, a mixed effects model arises, and it is given by:

$$y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}$$

where $\mu$ is the overall mean effect, $\tau_i$ is the effect of the $i$th level of factor A, $\beta_j$ is the effect of the $j$th level of factor B, $(\tau\beta)_{ij}$ is the effect of the interaction between $\tau_i$ and $\beta_j$, and $\varepsilon_{ijk}$ is the random error component. In order to obtain a unique solution for the parameter estimates, constraints are imposed on the effects.
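For orientation, one common textbook choice is sketched here. It is an assumption of this sketch rather than a statement of the constraints actually used in the study: sum-to-zero constraints are placed on the estimated effects, and $\bar{y}_{i..}$, $\bar{y}_{.j.}$, $\bar{y}_{ij.}$ and $\bar{y}_{...}$ denote the row, column, cell and grand means. Under those constraints the least squares estimates reduce to differences of means, and the fitted value in each cell is the cell mean.

```latex
\begin{align*}
&\sum_{i=1}^{a}\hat{\tau}_i = 0, \qquad
 \sum_{j=1}^{b}\hat{\beta}_j = 0, \qquad
 \sum_{i=1}^{a}\widehat{(\tau\beta)}_{ij} = \sum_{j=1}^{b}\widehat{(\tau\beta)}_{ij} = 0 \\
&\hat{\mu} = \bar{y}_{...}, \qquad
 \hat{\tau}_i = \bar{y}_{i..} - \bar{y}_{...}, \qquad
 \hat{\beta}_j = \bar{y}_{.j.} - \bar{y}_{...} \\
&\widehat{(\tau\beta)}_{ij} = \bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...} \\
&\hat{y}_{ijk} = \hat{\mu} + \hat{\tau}_i + \hat{\beta}_j + \widehat{(\tau\beta)}_{ij} = \bar{y}_{ij.}
\end{align*}
```

The last line follows by adding the four estimates: the row, column and grand means cancel, leaving only the cell mean.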

Model Adequacy Checking
Before a conclusion is drawn on the accuracy of a model, the adequacy of the underlying model is checked. The primary diagnostic tool for model adequacy checking is residual analysis. The residual of each observation for the two-factor factorial model is:

$$\varepsilon_{ijk} = y_{ijk} - \hat{y}_{ijk} = y_{ijk} - \bar{y}_{ij.}$$

Residual analysis falls into two main categories: 1. the numerical residual analysis, and 2. the graphical residual analysis.
In the numerical residual analysis, the standardized residual ($d$) is computed as:

$$d = \frac{\varepsilon_{\max}}{\sqrt{MS_E}}$$

where $\varepsilon_{\max}$ is the largest of the residuals and $MS_E$ is the mean sum of squares of the error component, which is given by:

$$MS_E = \frac{SS_E}{ab(n-1)}$$

where $SS_E$ is the sum of squares of the error component, $a$ is the number of levels of factor A, $b$ is the number of levels of factor B and $n$ is the number of observations (replicates) per cell.
If the value of $d$ falls within ±3 (i.e. $-3 \le d \le 3$), then no residual is unusually large, the normality assumption on the residuals (errors) is considered satisfied, and the residuals reveal nothing troublesome. The model is therefore taken to be adequate.
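The numerical check can be sketched as follows for a balanced layout, taking the fitted value of every observation to be its cell mean. The data are hypothetical, the function name is illustrative, and using the largest residual in absolute value is an assumption of this sketch.

```python
import numpy as np

def max_standardized_residual(y):
    """Largest |residual| divided by sqrt(MS_E) for a balanced a x b x n array."""
    a, b, n = y.shape
    cell_means = y.mean(axis=2, keepdims=True)  # fitted values, one per cell
    residuals = y - cell_means                  # observed minus fitted
    ss_e = np.sum(residuals ** 2)               # error sum of squares
    ms_e = ss_e / (a * b * (n - 1))             # error mean square
    # Assumption: "largest residual" is read as largest in absolute value.
    return np.max(np.abs(residuals)) / np.sqrt(ms_e)

# Hypothetical 2 x 2 x 2 data for illustration only (not the study's table).
y = np.array([[[1.0, 3.0], [5.0, 7.0]],
              [[2.0, 4.0], [6.0, 8.0]]])
d = max_standardized_residual(y)
print("within +/-3:", -3 <= d <= 3)
```

In this toy example every residual has magnitude 1 and the error mean square is 2, so the ratio is 1 divided by the square root of 2.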
In the graphical residual analysis, three main graphs were considered: the normal probability plot, the plot of residuals in time sequence and the plot of residuals versus fitted values ($\hat{y}_{ijk}$). In the normal probability plot, if the underlying error distribution is normal, the points lie approximately along a straight line. In judging this, more emphasis is placed on the central values than on the extreme values.
Plotting the residuals in the order in which the data were collected helps in detecting correlation between the residuals. If the points are spread uniformly about the mean of the residuals, with no runs or trends, there is no evidence of such correlation. This would also imply that neither the independence nor the constant variance assumption is violated, and hence that the model is adequate.
When the residuals are plotted against the fitted values, the model is adequate and the various assumptions are satisfied if the points reveal no obvious pattern (that is, the plot is structureless).
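The coordinates of a normal probability plot can be prepared as in the sketch below, which pairs each ordered residual with a cumulative percentage and the matching standard normal quantile. The plotting position (rank − 0.5)/n is one common convention and is an assumption here, as are the hypothetical residuals and the function name.

```python
from statistics import NormalDist

def normal_plot_points(residuals):
    """Return (residual, cumulative percent, normal quantile) in ascending order."""
    ordered = sorted(residuals)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for rank, value in enumerate(ordered, start=1):
        p = (rank - 0.5) / n        # plotting position (assumed convention)
        points.append((value, 100.0 * p, standard_normal.inv_cdf(p)))
    return points

# Hypothetical residuals for illustration only.
residuals = [0.5, -2.0, 2.0, 0.0, -0.5]
points = normal_plot_points(residuals)
for value, percent, z in points:
    print(f"{value:6.2f}  {percent:5.1f}%  {z:7.3f}")
```

Plotting the residuals against either the percentages (on a probability scale) or the quantiles gives the plot described above; points close to a straight line are consistent with normal errors.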

Results
The details of the raw data on the COVID-19 cases, comprising the number of recovery cases, death cases and active cases for Nigeria, Senegal and Ghana, are presented in table 2 below. The first column is headed by country, under which we have Nigeria, Senegal and Ghana; the second column is headed by month, under which we have the months from February to September for each country; and the third column is headed by the status of COVID-19 patient, under which we have recovery cases, death cases and active cases. From table 2, all three countries started with low figures for every COVID-19 status in February; the figures then increased considerably up to a point and began to decrease after some months. For instance, Nigeria's recovery cases started at 0 in February, increased through the consecutive months to 18,073 in August and then fell to 8,720 by the end of September. Senegal's started at 0 in February, increased through the months to 4,227 and then fell to 1,228 in September. Ghana's also started at 0 in February, increased to 16,754 in July and then fell through August to 2,528 in September.
The two-factor factorial design with mixed factors applied to the raw data above is shown in table 3 below. The row factor is country, with level headings Nigeria, Senegal and Ghana. The column factor is the status of the COVID-19 patient, with level headings Recovery cases, Death cases and Active cases. Each cell contains eight replications (observations) arranged in the order of the months in which they were recorded, from February to September, with the index $k = 1, 2, 3, \dots, 8$ of $y_{ijk}$ assigned to the months correspondingly. The last two entries in each cell are the cell total ($y_{ij.}$) and the cell mean ($\bar{y}_{ij.}$) respectively. The last two columns of the table contain the row totals ($y_{i..}$) and row means ($\bar{y}_{i..}$) respectively, while the last two rows of the table contain the column totals ($y_{.j.}$) and column means ($\bar{y}_{.j.}$) respectively. Moreover, the last cell at the bottom right corner contains the grand total ($y_{...}$) and the grand mean ($\bar{y}_{...}$).
The totals and means were calculated using the corresponding equations preceding table 1 above. Since each observation in a cell is estimated by the cell mean, the column means ($\bar{y}_{.j.}$) are suitable for estimating the various COVID-19 cases across the West African sub-region. In addition, the grand mean ($\bar{y}_{...}$) is used to estimate the population's average COVID-19 infection rate. Hence, from table 3, the numbers of COVID-19 recovery, death and active cases across the West African sub-region for the period between February and September 2020 were estimated at about 4524.792, 71.833 and 3666.875 per month respectively. Also, the monthly COVID-19 infection rate across West Africa was estimated at 2754.5.
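As a quick consistency check on these figures, note that in a balanced design (every cell has the same number of replicates) the grand mean equals the simple average of the column means. The short sketch below uses the three column means reported above; the variable names are illustrative.

```python
# Column means reported in the text (cases per month).
column_means = {"recovery": 4524.792, "death": 71.833, "active": 3666.875}

# In a balanced design the grand mean is the average of the column means.
grand_mean = sum(column_means.values()) / len(column_means)

print(round(grand_mean, 3))
```

The result agrees with the reported grand mean of 2754.5.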

Computational Process
The computational process here involves calculating the respective sums of squares, the mean sums of squares and the estimated variances (i.e. the $\hat{\sigma}^2$'s). The respective sums of squares are found first. The estimated variances are the variance components of the model after eliminating the mean squares containing the fixed factors, and none of these variance components is zero or negative.
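Since the worked computations are not reproduced here, the following is a minimal sketch of how such quantities are usually obtained for a balanced two-factor layout. It assumes the restricted mixed-model expected mean squares from standard texts (factor A random, factor B fixed), so that the variance component for A is (MS_A − MS_E)/(bn) and that for the interaction is (MS_AB − MS_E)/n; it is not necessarily the exact procedure followed in the study, and the data are hypothetical.

```python
import numpy as np

def mixed_anova(y):
    """Sums of squares, mean squares and variance components.

    y[i, j, k] is a balanced a x b x n array with factor A (axis 0) random
    and factor B (axis 1) fixed. The variance components rely on the
    restricted mixed-model expected mean squares (an assumption here).
    """
    a, b, n = y.shape
    grand = y.mean()
    row = y.mean(axis=(1, 2))   # means for the levels of A
    col = y.mean(axis=(0, 2))   # means for the levels of B
    cell = y.mean(axis=2)       # cell means

    ss = {
        "A": b * n * np.sum((row - grand) ** 2),
        "B": a * n * np.sum((col - grand) ** 2),
        "AB": n * np.sum((cell - row[:, None] - col[None, :] + grand) ** 2),
        "E": np.sum((y - cell[:, :, None]) ** 2),
        "T": np.sum((y - grand) ** 2),
    }
    ms = {
        "A": ss["A"] / (a - 1),
        "B": ss["B"] / (b - 1),
        "AB": ss["AB"] / ((a - 1) * (b - 1)),
        "E": ss["E"] / (a * b * (n - 1)),
    }
    # The mean square for the fixed factor B is not used for the components.
    var = {
        "A": (ms["A"] - ms["E"]) / (b * n),
        "AB": (ms["AB"] - ms["E"]) / n,
        "E": ms["E"],
    }
    return ss, ms, var

# Hypothetical 2 x 2 x 2 data for illustration only (not the study's table).
y = np.array([[[1.0, 3.0], [5.0, 7.0]],
              [[2.0, 4.0], [6.0, 8.0]]])
ss, ms, var = mixed_anova(y)
print(ss)
print(var)
```

A useful sanity check is that the sums of squares for A, B, the interaction and the error add up to the total sum of squares.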

The Prediction Model
Since the three levels of factor A (country) were randomly selected from the seventeen countries in West Africa and the three levels of factor B (the status of COVID-19 patient) were fixed, a two-factor factorial design with mixed factors was used. Hence the mixed effect model:

$$y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}$$

Through the model formulation process and under some constraints, the number of COVID-19 patients under a particular status per country may be estimated by the corresponding cell mean. This is given mathematically by:

$$\hat{y}_{ijk} = \bar{y}_{ij.}$$

From a broader perspective, the number of COVID-19 patients under a particular status in West Africa may be estimated by the column mean, $\bar{y}_{.j.}$. Precisely, the numbers of recovery, death and active COVID-19 cases per month in West Africa could be estimated as 4524.792, 71.833 and 3666.875 respectively. These are approximated to the nearest whole figures as 4525, 72 and 3667 respectively.
Also, the population mean of the number of confirmed COVID-19 cases per month in West Africa is estimated by the sample grand mean. That is:

$$\hat{\mu} = \bar{y}_{...} = 2754.5 \approx 2755 \text{ (to the nearest whole figure)}$$

Results of the Model Adequacy Checking
Having derived the model for the spread of COVID-19 in West Africa, there is the need to check the adequacy of the model. This is done by first finding the residuals from the prediction model using the relation:

$$\varepsilon_{ijk} = y_{ijk} - \hat{y}_{ijk}$$

That is, the residual is the difference between the actual value and the estimated (predicted) value. The details of the results are shown in table 4 below. In table 4, the first column is labelled standard order, that is, the order in which the observations (in table 3) are arranged, starting from the first cell down to the last cell of the recovery case column, through the death case column, to the last cell of the active case column. The second, third and fourth columns of table 4 are headed Actual value ($y_{ijk}$), Predicted value ($\bar{y}_{ij.}$) and Residual ($\varepsilon_{ijk}$) respectively.
The standardized residual, $d = 2.622$, falls within the interval ±3, indicating that the normality assumption on the residuals is satisfied. Also, the errors are independent of the variability of the country effects, the COVID-19 status effects and the effects of the interaction between country and COVID-19 status.
The normal probability plot presents normal probability (in percentage) on the vertical axis and residuals on the horizontal axis. The details of the plot are shown in the Excel output in figure 1 below. In figure 1, the points generally exhibit linearity from left to right in ascending order. Some points are also scattered around the foot of the vertical axis. However, the effect of these scattered residuals on the variance is unlikely to be as significant as that of the residuals that follow the line. Hence, the normality assumption on the errors (residuals) was not violated in a significant manner.
The Excel output for the plot of residuals against time in months is shown in figure 2 below. The months are coded so that February is assigned 1, March is assigned 2, and so on up to September, which is assigned 8. This plot of residuals in the time order of data collection on the number of COVID-19 cases from February to September 2020 shows that the points are uniformly distributed about the horizontal axis. This implies that the independence and constant variance assumptions on the errors had not been violated.
Another form of graphical residual analysis for model adequacy checking is the plot of the residuals against the predicted values ($\hat{y}_{ijk}$), shown in the Excel output in figure 3 below. The predicted values ($\hat{y}_{ijk}$) are the cell means ($\bar{y}_{ij.}$) in table 3. In the graph, even though the points form a series of vertical lines about the horizontal axis (one for each cell mean), they do not together follow any single pattern, and hence the plot is structureless. This indicates that the constant variance and independence assumptions on the errors (residuals) were reasonable.

Discussion
This research aimed to use experimental design to derive a model for the spread of COVID-19 across the West African sub-region. A two-factor factorial design with mixed factors was used, resulting in the design shown in table 3. From table 3, the average monthly COVID-19 recovery, death and active cases per country in West Africa were about 4525, 72 and 3667 respectively for the period between February and September 2020. Also, the average monthly number of confirmed COVID-19 cases was 2755 per country in West Africa from February to September 2020. These values were estimated using the various terms in the mixed effect model $y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}$.
Since none of the estimated variance components ($\hat{\sigma}^2_{\tau} = 3344138.648$, $\hat{\sigma}^2_{\tau\beta} = 2406022.229$ and $\hat{\sigma}^2_{E} = 20176984.25$) was zero or negative, the effect of the country, the effect of COVID-19 patient status, the effect of the interaction between country and COVID-19 patient status, and the effect of the error incurred during the process all helped in explaining the spread of COVID-19 in West Africa. This parallels the work of Ivorra and associates, who used the reported data on the spread of COVID-19 in about five provinces in China to identify model parameters that could be of interest in estimating the spread of COVID-19 in other countries, as in reference [14].
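To put the three estimates on a common scale, each can be expressed as a share of their sum. The sketch below uses the values quoted above; labelling the first component as the country effect, the second as the country-by-status interaction and the third as the error is an assumption based on the order in which the effects are discussed.

```python
# Variance component estimates quoted in the text.
# The labels are an assumed reading of the garbled subscripts.
components = {
    "country": 3344138.648,
    "country x status": 2406022.229,
    "error": 20176984.25,
}

total = sum(components.values())
shares = {name: value / total for name, value in components.items()}

for name, share in shares.items():
    print(f"{name:17s} {100 * share:5.1f}%")
```

On these figures the error component accounts for roughly three quarters of the total, with the country and interaction components making up the remainder.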
Also, since the standardized residual $d = 2.622$ falls within the interval ±3, the normality of the residuals is supported. Hence, the mixed effect model is adequate for the prediction of the various COVID-19 cases per country across the West African sub-region.
From the normal probability plot in figure 1, the linearity exhibited by the points indicates that the residuals (errors) are approximately normal, meaning that the mixed effect model is appropriate for the prediction of the various COVID-19 cases. Also, in figure 2, the points were uniformly spread about the horizontal axis, supporting the independence and constant variance assumptions on the errors (residuals) and hence the adequacy of the mixed effect model. Moreover, the structureless plot of residuals versus predicted values in figure 3 supports the constant variance assumption on the residuals, again signifying the adequacy of the mixed effect model. From table 3, it is clear that Nigeria led in all the COVID-19 cases among the three randomly selected West African countries, followed by Ghana and Senegal. With this, it was revealed that Nigeria had the highest number of contact times, followed by Ghana and Senegal. Further investigation revealed that most of the West African countries with higher contact times recorded higher COVID-19 infection rates, which is in line with the finding of Zeb and associates, through their derived model, that the coronavirus spreads more with a higher number of contact times, as in reference [12].
Finally, since the processes involved in the derivation and formulation of the mixed effect model were essentially statistical, the derived model is classified as a statistical model. It therefore adds to the small number of epidemiological models for COVID-19 based on statistical processes, as noted by Wang and colleagues in reference [13].

Conclusion
A 3² factorial design with mixed factors was successfully applied to the data on the number of the various COVID-19 cases. The two factors involved serve as the exponent 2, while the three levels of each factor serve as the base 3, hence the 3² factorial design. Also, the levels of factor A (country) were randomly selected while those of factor B (status of COVID-19 patient) were fixed, hence the mixed effect aspect of the design.
From table 3, the 3² factorial design used to construct the table helped in the adequate estimation of the average number of COVID-19 recovery cases, death cases and active cases per month in West Africa. These are the corresponding column means $\bar{y}_{.1.} = 4524.792$, $\bar{y}_{.2.} = 71.833$ and $\bar{y}_{.3.} = 3666.875$, which are approximated to the nearest whole figures as 4525, 72 and 3667 respectively, since counts of people must be whole numbers. Also, the average number of COVID-19 confirmed cases per month in West Africa was adequately estimated using the grand mean $\bar{y}_{...} = 2754.5$, which is 2755 to the nearest whole person.
All the residual analyses conducted for the model adequacy checking indicated that the residuals (errors) incurred in the study are normal and independent. Therefore, there was enough evidence to neglect the residual (error) effect, $\varepsilon_{ijk}$, in the mixed effect model. Hence, the model $y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij}$ is adequate for the prediction of the various COVID-19 cases in West Africa. Also, the various constraints imposed during the model formulation were in order. This means that the various estimations made were correct and adequate. The uniqueness of this research lies in the fact that it is the first time experimental design and analysis has been used in the study of COVID-19 since its discovery in China in December 2019.
The researchers also recommend that future research be directed towards the use of factorial design to study the effect of any possible vaccine on the spread of COVID-19 in a locality.