Modeling Primary School Absenteeism and Academic Performance in Ethiopia: A Multivariate and Count Regression Modeling Approach

School absenteeism and low academic performance in primary schools remain major problems for developing countries such as Ethiopia. This study therefore aims to identify the factors that jointly influence academic performance and school absenteeism in Ethiopian primary schools. Cross-sectional data were obtained from the Young Lives project at wave 1 (the first month of the academic year) and wave 2 (the last month of the academic year). A multivariate regression model was used to investigate the predictors of a linear combination of academic performance measures, and count regression models were used to investigate the predictors of school absenteeism. Both Poisson and negative binomial regression models were considered, and the latter fit the data used in this study better than the former. At the national level, mean school absenteeism was 6 days, and mean academic performance was 67.64 at wave 1 and 61.44 at wave 2. The number of meals, number of siblings, mother's literacy, survivor-ship of the mother, type of school siblings attend, pre-school attendance, time to get to school, grade repeating, dropping out of school, extra class attendance and the availability of a person to help with schoolwork at home were found to have a combined effect on school absenteeism and academic performance. Stakeholders should therefore pay attention to these factors in order to reduce school absenteeism and thereby improve academic performance.


Introduction
Education has long been recognized as a fundamental human right and an important building block in the development of children and broader society [1]. Primary education plays a crucial role in subsequent levels of education and develops the critical skills and abilities needed to become productive and responsible adults [2], [3]. Education scholars recognize that effective participation and engagement in primary education is an important precursor to educational success [4]. Nevertheless, the level of participation and engagement, even in compulsory education, varies along a spectrum. This has been a feature of the educational literature across a range of countries over many decades, and it underscores that there is typically a segment of the population that has persistent difficulties in this area [5].
When students are absent, governments still bear the cost of investing in their education, but neither the student nor the community receives the benefits; absenteeism plagues both urban and rural elementary schools in low-income developing countries such as Ethiopia. Chronic absenteeism, which arises for various reasons such as the economic circumstances of parents [6], is prevalent in many urban and rural schools and usually begins in core subjects such as mathematics and English. Once a student begins to fall behind in one of these subjects, he or she tends to drop out temporarily and then to be absent from all classes, and the net result of frequent absenteeism is often permanent dropout. Persistent non-attendance may require additional resources to re-engage students in schooling and help them catch up on missed learning, involving learning and personal support staff and interagency collaboration, all of which incur additional costs to the country. Poor school attendance thus represents both a loss of educational opportunity and a cost to the community.
Several studies have indicated the impact of chronic absenteeism on academic achievement. For instance, [7] showed a positive relationship between student attendance and academic achievement at the primary level. Other scholars have also found a strong correlation between school attendance and students' academic achievement. Similarly, [8] found that class attendance predicts academic achievement at the primary level better than any other variable.
In Ethiopia, according to a Ministry of Education report, the lowest class repetition rate (6.4%) was observed in grade 6 and the highest (9.0%) in grade 1, mainly due to school absenteeism; grades 5 and 8 had the next highest repetition rates in the 2011/12 academic year. The highest nationally registered dropout rate (18.6%) was in 2008/09 and the lowest (13.1%) in 2009/10 [9]. Even the lowest of these repetition and dropout rates represent a serious problem: they undermine educational outcomes, particularly the production of skilled manpower, and waste resources for any country seeking an efficient education system and better use of limited resources. These rates are commonly used to measure the efficiency of an education system in producing graduates of a particular cycle or level. This study therefore models and investigates in depth the school absenteeism and academic achievement of primary school students in Ethiopia.

Response and Explanatory Variables
School absenteeism and academic performance are the two response variables in this study; their expected determinants are listed, together with their coding, in Table 1.

Source of Data and Young Lives Project
This study is based on secondary data from the Young Lives project (the school survey introduced in Ethiopia in 2010). Young Lives is an innovative, long-term international project investigating the changing nature of childhood poverty in four developing countries (Ethiopia, India (in the state of Andhra Pradesh), Peru and Vietnam) over 15 years. Young Lives was core-funded from 2001 to 2017 by UK aid from the Department for International Development (DFID) and co-funded by the Netherlands Ministry of Foreign Affairs from 2010 to 2014.
Sub-studies are funded by the Bernard van Leer Foundation and the Oak Foundation.
The study covered 13,724 students at wave 1 (September, the start of the Ethiopian academic year) and 11,985 at wave 2 (May, the end of the Ethiopian academic year) in all sentinel sites of seven Ethiopian regional states, including Addis Ababa city administration. The descriptive analysis indicated that 12.67% of wave 1 students had dropped out of school during the study year. Of the students covered at wave 1, only 11,778 (85.82%) took the literacy test and 11,790 (85.91%) took the numeracy test.
The difference between the number of students covered and the number who took a test is the number who did not take that test; for example, 13,724 - 11,778 = 1,946 students did not take the wave 1 literacy test.
Similarly, of the students surveyed at wave 2, only 10,072 took the literacy test and only 10,079 took the numeracy test. Among the students registered at the schools, the percentage who took the tests was higher at wave 2 than at wave 1. About 50.4% of respondents were female and 49.6% male, and rural and urban sites contributed equal percentages of students. Of all participants, 75.5% were in the age group 8-12 years, while the remaining 21.7% and 2.8% were in the age groups 13-15 years and 16-20 years, respectively. The average age of the participants was 12 years. More than 24% of students had repeated a grade at least once since starting school; of these, only 39.7% repeated because they had dropped out of school, and the remaining 60.3% repeated because they failed examinations.
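As a quick consistency check on the coverage figures above, the sketch below recomputes the number of students who missed each test and the share of covered students who took it, using only the counts given in the text. It assumes (this is not stated in the source) that the 12.67% dropout figure is the share of wave 1 students no longer covered at wave 2; the counts do reproduce that value. Shares here are relative to the students covered at each wave, which gives about 84% at wave 2; the wave 1/wave 2 comparison in the text refers instead to students registered at the schools, a denominator not reported here. The names `COVERED`, `TESTED` and `coverage_by_test` are illustrative.

```python
# Counts reported in the text for the Young Lives school survey.
COVERED = {"wave1": 13_724, "wave2": 11_985}
TESTED = {
    "wave1": {"literacy": 11_778, "numeracy": 11_790},
    "wave2": {"literacy": 10_072, "numeracy": 10_079},
}


def coverage_by_test(wave: str) -> dict[str, tuple[int, float]]:
    """For each test: (students covered but not tested, % of covered students tested)."""
    covered = COVERED[wave]
    return {
        name: (covered - n, round(100 * n / covered, 2))
        for name, n in TESTED[wave].items()
    }


# Assumption (not stated in the source): the 12.67% dropout figure is the
# share of wave 1 students who were no longer covered at wave 2.
attrition_pct = round(100 * (COVERED["wave1"] - COVERED["wave2"]) / COVERED["wave1"], 2)

for wave in COVERED:
    print(wave, coverage_by_test(wave))
print("wave 1 -> wave 2 attrition (%):", attrition_pct)
```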

The Statistical Models
Several statistical methods were used to achieve the goals of this study: multivariate multiple linear regression, multivariate analysis of variance (MANOVA), canonical correlation analysis, Poisson and negative binomial regression, binary logistic regression and simple linear regression. They are outlined below:

Multivariate Methods
1. Multivariate Analysis of Variance (MANOVA): tests hypotheses comparing vectors of group means, asking whether two or more distinct groups differ on a linear combination of quantitative variables chosen to discriminate maximally between them.
2. Multivariate Multiple Linear Regression: regression analysis for predicting the values of one or more response (dependent) variables from a collection of predictor (independent) variables.
3. Canonical Correlation Analysis: an analysis of the correlation between a linear combination of the variables in one set and a linear combination of the variables in another set.
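A minimal sketch of multivariate multiple linear regression in its standard least-squares form, Y = [1, X]B + E, with one column of Y per response (for example, scores at wave 1 and wave 2). It uses NumPy and simulated, hypothetical data; `fit_mmlr` is an illustrative name, and the sketch is not necessarily the computation behind the paper's tables. In this formulation the least-squares estimate of B coincides with fitting each response separately; what the multivariate model adds is the estimated error covariance across responses, which joint tests such as Wilks' lambda rely on.

```python
import numpy as np


def fit_mmlr(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Least-squares coefficients for Y = [1, X] B + E.

    X: (n, p) predictors; Y: (n, m) responses.
    Returns B with shape (p + 1, m): one column of coefficients per response.
    """
    design = np.column_stack([np.ones(len(X)), X])
    B, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return B


# Hypothetical simulated data: 3 predictors, 2 correlated responses.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
B_true = np.array([[60.0, 55.0], [2.0, 1.5], [-1.0, -0.5], [0.5, 0.0]])
noise = rng.multivariate_normal([0.0, 0.0], [[25.0, 15.0], [15.0, 25.0]], size=n)
Y = np.column_stack([np.ones(n), X]) @ B_true + noise

B_hat = fit_mmlr(X, Y)
residuals = Y - np.column_stack([np.ones(n), X]) @ B_hat
# Estimated error covariance between the two responses (used by joint tests).
Sigma_hat = residuals.T @ residuals / (n - X.shape[1] - 1)
print(B_hat)
print(Sigma_hat)
```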

Count Regression Models
Poisson regression model: assumes that the response variable has a Poisson distribution and that the logarithm of its expected value can be modeled by a linear combination of unknown parameters.
Negative binomial regression model: a direct extension of the Poisson model that allows for over-dispersion, i.e. a variance larger than the mean.
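The two count models share the log link and differ only in their variance. The sketch below writes out both log-probability mass functions in plain Python, assuming the common NB2 parameterization (variance mu + alpha * mu^2); the paper does not state which negative binomial parameterization its software used, and the function names are illustrative.

```python
import math


def poisson_logpmf(y: int, mu: float) -> float:
    """log P(Y = y) for Y ~ Poisson(mu); here mean = variance = mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def negbin_logpmf(y: int, mu: float, alpha: float) -> float:
    """log P(Y = y) for an NB2 variable with mean mu and variance mu + alpha * mu**2.

    Assumption: NB2 parameterization with size r = 1 / alpha (alpha > 0).
    As alpha -> 0 this approaches the Poisson distribution.
    """
    r = 1.0 / alpha
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def mean_from_linear_predictor(x: list[float], beta: list[float]) -> float:
    """Log link shared by both models: log(mu) = beta[0] + sum_j beta[j + 1] * x[j]."""
    eta = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
    return math.exp(eta)


# Hypothetical example: probability of 10 days absent when the mean is 6 days.
mu = 6.0
print(math.exp(poisson_logpmf(10, mu)), math.exp(negbin_logpmf(10, mu, alpha=0.5)))
```

With the same mean, the negative binomial puts more probability on large counts, which is why it accommodates over-dispersed absenteeism data.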

Directed Acyclic Graphs (DAG)/Causal Diagrams
Graphical models of causal relationships that can complement conventional statistical models.
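To illustrate how a DAG guides covariate adjustment, the sketch below encodes a hypothetical graph with the structure assumed later in this paper for mother's survivor-ship and pre-school attendance: two independent exposures, each with an arrow into the outcome. It applies a simple sufficient condition from the back-door criterion: if the exposure has no parents, no back-door path exists, so no adjustment is needed. Node names and the confounder `wealth` are hypothetical, and a general search for adjustment sets is beyond this sketch.

```python
# Hypothetical DAG (node -> list of children), mirroring the structure assumed
# later in the paper: two independent exposures, each pointing to the outcome.
DAG: dict[str, list[str]] = {
    "mother_alive": ["perf_w1"],
    "preschool": ["perf_w1"],
    "perf_w1": [],
}


def parents(dag: dict[str, list[str]], node: str) -> set[str]:
    """Nodes with an arrow into `node`."""
    return {u for u, children in dag.items() if node in children}


def empty_set_suffices(dag: dict[str, list[str]], exposure: str) -> bool:
    """Sufficient (not necessary) check for 'no adjustment needed'.

    Every back-door path starts with an arrow INTO the exposure, so if the
    exposure has no parents there is no back-door path to block.
    """
    return not parents(dag, exposure)


print(empty_set_suffices(DAG, "mother_alive"), empty_set_suffices(DAG, "preschool"))

# Adding a hypothetical common cause of exposure and outcome breaks the condition.
dag_with_confounder = dict(DAG, wealth=["preschool", "perf_w1"])
print(empty_set_suffices(dag_with_confounder, "preschool"))
```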

Bivariate Descriptive Analysis
The results showed that the average academic performance of pupils was 67.64 at wave 1 and 61.44 at wave 2, whereas average school absenteeism over one year was 6 days. Average academic performance differed by pre-school attendance: at wave 1, students who attended pre-school achieved an average of 67.4, while those who did not attend achieved 59.88, but there was no large difference at wave 2. Average academic performance also differed from region to region in both waves. For example, at wave 1 the highest average was observed in Tigray (71.20), followed by Oromiya region (70.24), whereas the lowest was observed in Amhara region (58.28), followed by Somali region (60.28). At wave 2, the pattern changed somewhat: the highest average was observed in Addis Ababa city (74.44) and the lowest in Affar region (60), while performance in the remaining five regions was nearly the same.
Both average academic performance and school absenteeism differ according to the number of meals students normally eat per day. At wave 1, the average academic performance of students who normally ate only once per day was 53.80, while that of students who normally ate three or more meals per day was 64.88 (Table 2), consistent with the result reported by [10] using similar data. As for absenteeism, the more meals students normally eat per day, the fewer days they are absent from school, which may imply that some students miss school to search for food. Similarly, school absenteeism and average academic performance differ by the number of teaching periods a class receives per day. At wave 1, the highest average academic performance (77.00) was observed for students whose school has four teaching periods per day, as opposed to the lowest (64.80) for students whose school has eight periods per day, which may imply that long school days become boring and that the number of periods per day at primary level should lie somewhere between 4 and 8. On the other hand, the lowest average school absenteeism (2 days) was observed in schools with eight teaching periods per day and the highest (4.05 days) in schools with two periods per day, which may indicate that the number of periods per day needs some adjustment.
Table 2 also shows that both average academic performance and average school absenteeism vary by the type of school the student's siblings attend. At wave 1, the highest average performance (68.00) was observed for students whose sister(s) or brother(s) attend a private school, and the lowest (61.76) for students whose sister(s) or brother(s) do not attend any school. Likewise, the lowest average school absenteeism (4.22 days) was observed for students whose siblings attend a private school, and the highest (6.42 days) for students whose siblings do not attend any school, which may indicate that siblings' schooling influences a student's attendance and achievement.
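The bivariate summaries in this section are group means of a continuous outcome (score or days absent) within the levels of one categorical variable. A minimal sketch using only the Python standard library; `group_means` is an illustrative name and the records are hypothetical toy values, not Young Lives data.

```python
from collections import defaultdict
from statistics import mean


def group_means(records: list[dict], by: str, outcome: str) -> dict:
    """Mean of `outcome` within each level of the categorical variable `by`."""
    groups: dict = defaultdict(list)
    for r in records:
        groups[r[by]].append(r[outcome])
    return {level: mean(values) for level, values in sorted(groups.items())}


# Hypothetical toy records (NOT Young Lives data).
records = [
    {"meals": 1, "score_w1": 50.0, "days_absent": 9},
    {"meals": 1, "score_w1": 56.0, "days_absent": 7},
    {"meals": 3, "score_w1": 66.0, "days_absent": 3},
    {"meals": 3, "score_w1": 64.0, "days_absent": 5},
]
print(group_means(records, "meals", "score_w1"))
print(group_means(records, "meals", "days_absent"))
```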

Result of Multivariate Multiple Linear Regression Analysis
In this section, the determinants of academic performance were assessed simultaneously to identify the key factors in primary school pupils' academic achievement. The results of the multivariate regression model in Table 3 indicate that the factors with a significant effect on the linear combination of average academic performance at wave 1 and wave 2 were: age, number of meals a student normally eats per day, number of younger siblings, number of older siblings, ability of the mother to read and/or write, ability of sister(s) and/or brother(s) to read and/or write, type of school siblings attend, pre-school attendance, time spent travelling to school, grade repeating, dropping out of school, time spent on paid work per day, extra numeracy class attendance, extra literacy class attendance and availability of a person to help with schoolwork at home.
The β coefficients in Table 3 cannot be assembled directly into a regression equation, as in OLS regression, because of differences in computation [11]. They can, however, be interpreted individually. Because average performance is a continuous variable, the negative β coefficient in Table 3 for the number of meals a student normally eats per day (nummeal) indicates that a larger number of meals is significantly associated with a lower average performance at wave 1. Similarly, the positive β coefficient for school dropout (dropschl) indicates that school dropout is significantly associated with a higher average performance at wave 1. The remaining β coefficients can be interpreted in the same way.

Result of MANOVA
A MANOVA was used to examine the main effects of the categorical explanatory variables on multiple continuous dependent variables; it tests the hypothesis that one or more independent variables, or factors, have an effect on a set of two or more dependent variables. Table 4 reports the overall multivariate tests of significance (over all three dependent variables), which indicate whether there are statistically significant differences among the groups on a linear combination of academic achievement at wave 1 and wave 2. The discussion focuses on Wilks' lambda (λ), the most commonly used MANOVA test statistic [12]. λ varies between zero and one and measures the proportion of the generalized variance in the dependent variables that is not explained by differences among the levels of the independent variable, so smaller values indicate stronger effects. In this study, all four multivariate test statistics led to the same conclusion about the significance of each explanatory variable: survivor-ship of the mother and the openness and honesty of teachers were the only two explanatory variables with no significant effect on academic performance at wave 1 and wave 2, while all other covariates had significant effects on student performance.
Therefore, gender, age, number of meals a student eats, number of older siblings, number of younger siblings, survivor-ship of the father, ability of the mother to read/write, ability of the father to read/write, ability of siblings to read/write, type of school siblings attend, pre-school attendance, time taken to get to school, grade repeating, dropping out of school, time spent working for pay on a usual school day, attending extra numeracy classes, attending extra literacy classes, teachers caring about students, existence of a person who helps with schoolwork at home and time spent on homework outside school have significant effects on academic performance in both waves. Examining the results further, the partial eta squared for the main effect of grade repeating is 0.032 (the largest among all predictors), and the power to detect this effect is 1. The Wilks' lambda for grade repeating is 0.968, meaning that about 96.8% of the generalized variance in the set of dependent variables is not explained by grade repeating; only about 3.2% (1 - 0.968, matching the partial eta squared) is explained by differences between its levels. For example, the result for grade repeating could be reported as follows: "A one-way MANOVA revealed a significant multivariate main effect of grade repeating, Wilks' λ = 0.968, F(3, 9106) = 98.82, p < 0.05, partial eta squared = 0.032, power = 1." The null hypothesis of equal mean vectors is therefore rejected, and similar statements hold for the other significant explanatory variables. Of the twenty-two predictors investigated, grade repeating is the most important according to partial eta squared, although its effect on the set of academic performance measures is small.
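The sketch below computes Wilks' lambda for a one-way design as det(E)/det(E + H), where E and H are the within-group and between-group sums-of-squares-and-cross-products matrices, together with a common definition of the multivariate partial eta squared, 1 - λ^(1/s) with s = min(p, k - 1). It uses NumPy and hypothetical simulated data; function names are illustrative, and treating grade repeating as a two-group factor with three dependent variables is an assumption. When s = 1, partial eta squared is simply 1 - λ, consistent with the reported pair λ = 0.968 and 0.032.

```python
import numpy as np


def wilks_lambda(Y: np.ndarray, groups: np.ndarray) -> float:
    """One-way MANOVA Wilks' lambda, det(E) / det(E + H).

    Y: (n, p) dependent variables; groups: (n,) group labels.
    E = within-group SSCP matrix, H = between-group SSCP matrix.
    Lambda is the share of generalized variance NOT explained by the groups.
    """
    grand_mean = Y.mean(axis=0)
    p = Y.shape[1]
    E = np.zeros((p, p))
    H = np.zeros((p, p))
    for g in np.unique(groups):
        Yg = Y[groups == g]
        centred = Yg - Yg.mean(axis=0)
        E += centred.T @ centred
        d = (Yg.mean(axis=0) - grand_mean)[:, None]
        H += len(Yg) * (d @ d.T)
    return float(np.linalg.det(E) / np.linalg.det(E + H))


def partial_eta_squared(lam: float, p: int, k: int) -> float:
    """Multivariate partial eta squared, 1 - lambda**(1/s) with s = min(p, k - 1)."""
    s = min(p, k - 1)
    return 1.0 - lam ** (1.0 / s)


# Hypothetical simulated data: 3 scores, 2 groups (e.g. repeated a grade or not).
rng = np.random.default_rng(1)
g = np.repeat([0, 1], 200)
Y = rng.normal(size=(400, 3)) + np.where(g[:, None] == 1, 0.3, 0.0)
lam = wilks_lambda(Y, g)
print(lam, partial_eta_squared(lam, p=3, k=2))
print(partial_eta_squared(0.968, p=3, k=2))  # two groups: equals 1 - 0.968
```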
As with the multivariate multiple linear regression and the MANOVA, the canonical correlation analysis in Table 5 shows that all four multivariate test statistics are significant. All the coefficients are significant (reported as p = 0.00), as shown in the 'Dimension Reduction Analysis' part of Table 6, so the hypothesis that the canonical variates CVx and CVy are independent, with all pairs tested simultaneously, can be rejected. The dimension reduction analysis then tests all remaining pairs (as a set) with the first pair removed, then all remaining pairs with the first and second pairs removed, and so on. In this study, the canonical correlations remaining after the first pair was removed were also significant (reported as p = 0.00). However, such a test can be significant even when a test of the individual correlation would not be; because a canonical correlation is simply a correlation coefficient, its size matters more than whether it differs from zero. The 'Eigenvalues and Canonical Correlations' part of Table 7 shows the number of pairs of canonical variates formed (three, the maximum) and the canonical correlations between them. The first pair is always of most interest and is probably the only one of substantive interest in this study: the first canonical correlation is 0.37 and the second 0.21. The first canonical correlation is at least as large as the largest correlation between any single variable and the opposite set of variables [13], and it can be much larger. The 'Standardized canonical coefficients for dependent variables' and 'Standardized canonical coefficients for covariates' parts of Tables 7, 8, 9, 10 and 11 give the standardized coefficients ($a_1, a_2, \ldots$ and $b_1, b_2, \ldots$) for each pair of canonical variates.
These coefficients have the same interpretation as regression coefficients, and are provided for each pair of variates created, regardless of the correlation's size or statistical significance.
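Canonical correlations can be computed as the singular values of Qx^T Qy, where Qx and Qy are orthonormal bases (from a QR decomposition) of the centred variable sets. The NumPy sketch below, on hypothetical simulated data, also illustrates the property cited above: the first canonical correlation is at least as large as any single cross-set correlation. The function name is illustrative, and statistical packages additionally report standardized coefficients and significance tests, which this sketch omits.

```python
import numpy as np


def canonical_correlations(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Canonical correlations between variable sets X (n, p) and Y (n, q).

    Centre each set and take orthonormal bases Qx, Qy of their column spaces;
    the singular values of Qx.T @ Qy are the min(p, q) canonical correlations,
    returned in decreasing order. Assumes both sets have full column rank.
    """
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)


# Hypothetical data: one shared latent factor links the two sets.
rng = np.random.default_rng(2)
n = 1000
latent = rng.normal(size=n)
X = rng.normal(size=(n, 4))
X[:, 0] += latent
Y = rng.normal(size=(n, 2))
Y[:, 0] += latent

rho = canonical_correlations(X, Y)
cross = np.corrcoef(X, Y, rowvar=False)[:4, 4:]  # correlations of each x_i with each y_j
print(rho, np.abs(cross).max())
```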

Result of Simple Linear Regression Model
Simple linear regression analysis was used to investigate the effect of school absenteeism on the later academic performance of pupils. The results in Table 12 show that school absenteeism has a significant effect on later academic performance (average academic performance at wave 2), and the significance test in the ANOVA table leads to the same conclusion. The fitted simple linear regression of academic performance on school absenteeism is $\hat{Y} = \hat{\beta}_0 - 0.074X$, where $Y$ is the average academic performance at wave 2, $X$ is the number of days absent from school since wave 1, and $\hat{\beta}_0$ is the estimated intercept. The slope means that each additional day a student is absent is associated with a 0.074-point reduction in his or her academic performance: the more days a student is absent, the lower the test score he or she is expected to achieve.
The remaining modeling in this section is based on the DAG in Figure 1. Consider the relation of mother's survivor-ship and pre-school attendance (the exposures) to a student's average academic performance at wave 1 (the outcome), and suppose, as the narrative asserts, that mother's survivor-ship and pre-school attendance are independent among primary school children. The DAG shows that, with mother's survivor-ship as the exposure and average academic performance at wave 1 as the outcome, no other covariates need to be adjusted for; the same holds when pre-school attendance is the exposure. The statistical model used here is a generalized linear model, namely binary logistic regression assuming no interaction effect.
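The simple linear regression of wave 2 performance on days absent, described above, can be sketched with the closed-form least-squares formulas in plain Python. The data below are hypothetical points generated to lie exactly on a line with slope -0.074 (the intercept 70 is made up), so the fit recovers that slope; with the reported slope, 10 additional days absent predicts a wave 2 score about 0.74 points lower.

```python
def simple_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Closed-form least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


# Hypothetical points on the line y = 70 - 0.074 x (the intercept 70 is made up).
days_absent = [0.0, 2.0, 5.0, 10.0, 20.0]
score_w2 = [70.0 - 0.074 * d for d in days_absent]
b0, b1 = simple_ols(days_absent, score_w2)
print(b0, b1)

# Predicted change in the wave 2 score for 10 extra days absent, using the reported slope.
print(-0.074 * 10)
```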
The explanatory variables are categorical, coded as mother's survivor-ship (0 = mother not alive, 1 = mother alive) and pre-school attendance (0 = did not attend pre-school, 1 = attended pre-school), while the outcome, average academic performance at wave 1, is also coded as binary (0 = below the mean, 1 = above the mean), where the mean score is 67.64.
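A minimal NumPy sketch of main-effects binary logistic regression (no interaction term, as assumed above), fitted by Newton-Raphson, using the coding described in the text (1 = mother alive, 1 = attended pre-school, outcome 1 = above the mean). The data are simulated with made-up coefficients; `fit_logistic` is an illustrative name, and in practice a statistics package would also provide standard errors. Exponentiating the slope coefficients gives the odds ratios.

```python
import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, n_iter: int = 25) -> np.ndarray:
    """Main-effects logistic regression by Newton-Raphson.

    X: (n, p) predictors without an intercept column; y: (n,) 0/1 outcome.
    Returns [b0, b1, ..., bp]. No interaction columns are added.
    """
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (y - p)                         # score vector
        info = A.T @ (A * (p * (1.0 - p))[:, None])  # Fisher information (negative Hessian)
        beta = beta + np.linalg.solve(info, grad)
    return beta


# Hypothetical simulated data with the paper's coding:
# mother_alive (1 = alive), preschool (1 = attended), y (1 = above the mean).
rng = np.random.default_rng(3)
n = 20_000
X = np.column_stack([
    rng.binomial(1, 0.9, size=n),
    rng.binomial(1, 0.4, size=n),
]).astype(float)
true_beta = np.array([0.1, 0.05, -0.8])  # made-up values
prob = 1.0 / (1.0 + np.exp(-(true_beta[0] + X @ true_beta[1:])))
y = rng.binomial(1, prob).astype(float)

beta_hat = fit_logistic(X, y)
print("coefficients:", beta_hat)
print("odds ratios (mother_alive, preschool):", np.exp(beta_hat[1:]))
```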

Result of Directed Acyclic Graphs (DAG): Causal Diagrams
The results in the diagram indicate that, holding pre-school attendance constant, the odds that a student whose mother is alive scored above the mean were 1.048 times (4.8% higher than) the odds for a student whose mother is not alive. Similarly, holding the mother's survivor-ship constant, the odds that a student who attended pre-school scored above the mean were 0.432 times (56.8% lower than) the odds for a student who did not attend pre-school.
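Odds ratios convert to percentage changes in the odds as 100 × (OR - 1), and a logistic coefficient β corresponds to OR = exp(β). Applied to the two odds ratios above, 1.048 gives +4.8% and 0.432 gives -56.8%, that is, lower odds. A small plain-Python sketch; the function names are illustrative.

```python
import math


def odds_ratio_from_coef(beta: float) -> float:
    """Odds ratio for a one-unit change in a logistic-regression predictor."""
    return math.exp(beta)


def pct_change_in_odds(odds_ratio: float) -> float:
    """Percentage change in the odds: positive = higher odds, negative = lower odds."""
    return 100.0 * (odds_ratio - 1.0)


for label, odds_ratio in [("mother alive", 1.048), ("attended pre-school", 0.432)]:
    print(f"{label}: OR = {odds_ratio}, change in odds = {pct_change_in_odds(odds_ratio):+.1f}%")
```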

Results of Count Regression Models
Negative binomial and Poisson regression analyses were used to examine the effect of covariates on school absenteeism, treating absenteeism as a count variable (the number of days absent). Table 13 compares the Poisson and negative binomial regression models of the covariates contributing to school absenteeism. The two models give similar coefficient estimates, but the standard errors from the negative binomial model were consistently larger than those from the Poisson model (roughly 1.5 times, for instance), which follows from the larger variance of the negative binomial distribution in the presence of significant dispersion.
Because the negative binomial model gives larger standard errors than the Poisson model, some covariates that were significant in the Poisson model were not significant in the negative binomial model. The two models differ only in their variances, so their regression coefficients tend to be similar while their standard errors can differ substantially. When the outcome is over-dispersed relative to the Poisson distribution, the negative binomial standard errors are larger but more appropriate, whereas Poisson p-values are artificially low and its confidence intervals too narrow. For the Poisson model of school absenteeism, the deviance was slightly lower than its degrees of freedom (about 0.97 times the degrees of freedom, Table 13), which by this criterion indicates no over-dispersion. The two models were also compared using AIC and BIC: the negative binomial model had the smaller values on both criteria, indicating a better fit to the data than the Poisson regression model.
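The model-comparison quantities used above can be sketched in plain Python: the Poisson deviance (for the deviance/df ratio), a Pearson dispersion statistic, and AIC and BIC from a fitted model's log-likelihood. The function names are illustrative and the paper does not state how its software computed these. The negative binomial model has one more parameter (the dispersion) than the Poisson model, so k differs by one between them; the log-likelihoods in the usage lines are made up.

```python
import math


def poisson_deviance(y: list[int], mu: list[float]) -> float:
    """2 * sum(y*log(y/mu) - (y - mu)), with y*log(y/mu) taken as 0 when y == 0."""
    return 2.0 * sum(
        (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
        for yi, mi in zip(y, mu)
    )


def pearson_dispersion(y: list[int], mu: list[float], n_params: int) -> float:
    """Pearson chi-square / residual df for a Poisson fit; well above 1 suggests over-dispersion."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion; smaller is better."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion; smaller is better."""
    return -2.0 * loglik + k * math.log(n)


# Hypothetical usage: Poisson fit with k parameters versus negative binomial with k + 1.
k, n = 16, 10_000
ll_poisson, ll_negbin = -25_000.0, -24_000.0  # made-up log-likelihoods
print(aic(ll_poisson, k), aic(ll_negbin, k + 1))
print(bic(ll_poisson, k, n), bic(ll_negbin, k + 1, n))
```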
Given the better fit of the negative binomial model according to AIC and BIC, inference was based on the negative binomial regression results. Accordingly, the results in Table 13 indicate that the covariates with a significant effect on school absenteeism were: gender, age, number of meals a student normally eats per day, number of older siblings, number of younger siblings, survivor-ship of the mother, ability of the mother to read and/or write, ability of the father to read and/or write, ability of sister(s) and/or brother(s) to read and/or write, time to get to school, grade repeating, dropping out of school, extra class attendance, availability of a person to help with schoolwork at home and time spent on homework outside school.

Conclusions
This study aimed to investigate the covariates contributing to school absenteeism and associated with average academic performance, using Young Lives data and applying multivariate methods and count regression models. The study found evidence that most of the considered variables have a significant effect on school absenteeism and academic performance. School absenteeism and academic performance varied by the number of meals a student normally eats per day, the type of school siblings attend, pre-school attendance, grade repeating and region.
The multivariate methods applied in this study showed that the covariates affecting the linear combination of academic performance at waves 1 and 2 in Ethiopia include gender, number of meals a student normally eats per day, number of competing siblings, ability of the mother to read/write, type of school siblings attend, pre-school attendance, time to get to school, grade repeating, dropping out of school, time spent working for pay on a usual school day and availability of a person to help with schoolwork at home. The negative binomial regression model revealed that nearly all of the covariates associated with academic performance also contribute to school absenteeism, a notable finding of this study.
To investigate further the effect of school absenteeism on average academic performance at wave 2, a simple linear regression model was fitted. The result indicated that school absenteeism has a significant negative association with the academic performance of pupils at the primary level. This study also found that pre-school attendance is a significant predictor of academic performance, with an adverse association. The number of meals a student normally eats per day is also significantly associated with both school absenteeism and academic performance.
It is hoped that the results of this study will be of value to children's families and to policy makers, because children are the generation expected to change the country, provided that they are well educated. To formulate policies that address school absenteeism and low academic performance in Ethiopia, it is important to understand the effect of reforms on primary schools, and interventions targeting the significant predictors identified here are needed.