Logistic Mixed Modelling of Determinants of International Migration from the Southern Ethiopia: Small Area Estimation Approach

The main objective of this study is to investigate socio-demographic and economic characteristics of a household on international migration and to estimate small area proportions at district and enumeration area level. Migration status refers to whether a household has at least one member who ever migrated abroad or not. A total of 2288 data are collected from sixteen randomly sampled districts in Hadiya and Kembata-Tembaro zonal areas, Southern Ethiopia. Several versions of the binary logistic mixed models, as special cases of the generalized linear mixed model, are analyzed and compared. The findings of the study reveal that about 39.4% of the households have at least one international migrant, and the rest 60.6% have no such migrants. Based on analysis of the generalized linear model and stepwise variable selection, four predictors are found to be significantly related to household migration status at 5% significance level. These are age, occupation, and educational level of household head and family size. Then twelve mixed models are analyzed and compared. The best fitting model to the data is found to be the logistic mixed regression model consisting of the six predictors with age nested within districts as random effects. Area or district specific random effect has variance of 1.6180. The district level random variation founded on final model with six predictor variables about the presence of migrant in the households such as the variation between districts is 33% and variation within the district is 67%. From analysis of the final model, it is found that the likelihood of a household of having international migrant increases with head's age and family size. An increase of family size by one person increases the log odds of having migrant by 0.131 indicating that large family size is one of the determinants for migration in the study area. The migration prevalence varies among the zones, the districts and the enumeration areas. Household characteristics: age, educational level and occupation of head, and family size are determinants of international migration. Community based intervention is needed so as to monitor and regulate the international migration for the benefits of the society.


Introduction
Migration is a complex phenomenon influenced by economic, social, political, geographical and environmental factors. Migration is defined as the movement of a person or a group of persons, either across an international border, or within a state. It is a population movement, encompassing any kind of movement of people, whatever its length, composition and causes; it includes migration of refugees, internal displaced persons, asylum seekers, smuggled migrants, victims of trafficking, economic migrants, and persons moving for other purposes, including family reunification. It is a large concern for policy makers because flows of population can significantly affects local politics, social, economic, and ecological structures for both sending and receiving countries ( [1], [2]).
According to [3], the number of international migrants worldwide has continued to grow rapidly over the past fifteen years reaching 244 million in 2015, up from 222 million in 2010 and 173 million in 2000. Nearly two thirds of all international migrants live in Europe (76 million) or Asia (75 million). Northern America hosted the third largest number of international migrants (54 million) and followed by Africa (21 million). Between 2010 and 2015, the international migrant stock grew by an average of 1.9% per year. The majority of the world's migrants live in high-income countries. As of 2015, 71% of all international migrants worldwide equal to 173 million were living in high-income countries. Of these, 124 million migrants were hosted in high-income OECD countries, while 49 million migrants were living in other in high-income non-OECD countries. Only 29% or 71 million of the world's migrants lived in middle or low income countries. Of these, 61 million migrants resided in middle income countries and 9 million in the low income countries. In Africa, Republic of South Africa was the only country hosted the largest numbers of international migrants' equivalent to 3 million in the year 2015.
Migration with its associated remittance has diverse socioeconomic impacts such as increasing better opportunities for the migrant, improving the livelihood of sending households, contributing economic growth and has emerged as an important policy issue in developing countries [4]. The most recent estimates suggest that there are at least 50 million irregular migrants in the world over one fifth of all international migrants, which is a significant number of whom paid for assistance to illegally cross borders.
The study on the irregular migration of youth from Southern Ethiopia to Republic of South Africa indicates, it is facilitated by a network of human smugglers in Ethiopia work in cooperation with those smugglers from Kenya and Somalia [5]. The problem of irregular migration to Republic of South Africa is widely observed in two zones of the Southern Ethiopia, namely in Hadiya and Kembata-Tembaro zones. The study results on quantitative cross-sectional study, which was carried out on the randomly selected 4 local districts of two zones. The study revealed that irregular migration was denominated by young aged 20-34 and the conclusion made indicates that most of the young adults who move illegally to Republic of South Africa had suffered several problems like being smuggled, physical abuse, and human right violation and in some cases even death [5]. It is known that at regional and national level of Ethiopia many people of Hadiya and Kembata Tembaro zones are migrating to the Republic of South Africa. The households residing in the two zones are sending young adults irregularly to Republic of South Africa and elsewhere abroad are explored.
The main objective of this study is to investigate impacts of socio-demographic and economic characteristics of a household on international migration and to estimate small area proportions at district and enumeration area level. The specific objectives are to: evaluate the socio--demographic and economic characteristics of migrant and non-migrant households in districts of Hadiya and Kembata Tembaro zones; estimate the local district and enumeration area level proportions of international migration; and develop generalized linear mixed models for international migration status. The study is conducted in highly vulnerable areas by irregular migration. Results are expected to be used as a basis for planning, decision and policy makers and different program implementation at the regional as well as national level in Ethiopia. This study can be a basis to conduct indepth further studies in specific aspects of international migration along with small area estimation techniques.

Description of the Study Area
The study areas are Hadiya and Kembata Tembaro zones which are highly vulnerable areas by irregular migration in Southern Ethiopia. Based on statistical report of the 2007 population and housing census results Hadiya Zone has a total count of 231,846 households and Kembata-Tembaro zone has a total count of 122,580 households. Hadiya and Kembata Tembaro zones have 11 and 8 districts respectively.

Sampling Design
In this study the multi-stage sampling design is employed as the sampling design. When the number of small areas is large, it is not feasible for travel cost or time to survey some units in all of them. For travel cost or time to survey, it is sometimes more convenient to use a multi-stage sampling design. Therefore, the sample can be made from administratively clustered small areas and often reduces interviewer travel costs. Sample design for small areas can be determined by only surveying a subset of small areas. In such case sample designs represented by multi-stage sampling where clusters are considered as small areas ( [6], [7]). In this study the sampling frames of 19 local districts are considered as small areas. Four-stages sampling technique are implemented; in the first stage sample of local districts is taken as the primary sampling units. After selecting a sample of local districts, the second stage is selection of samples of Kebeles within each selected local districts and third stage is sampling of enumeration areas and fourthly households within the each selected enumeration areas are chosen. A relisting of all households in sampled enumeration areas are carried out as suggested by [7].

Sample Size Determination
A total population of 354,426 households is grouped exclusively and exhaustively into 19 local districts in such a way that each small area contains a number of Kebeles, enumeration areas and households as subpopulations. Multistage sampling is used: first 16 local districts are selected, at second stage 71 Kebeles are selected, followed by the selection of 89 enumeration areas in the third stage, and finally using systematic random sample selection 2381 households are selected. The sample size determination for local districts is used to estimate the proportions of international migration using the formula as in [8] - [10]: where N is the total number of local districts or clusters anddlevel of precision, d = 0.1. This gives the calculated number of local districts, n = 16 . Using the same procedures from a total of 328 Kebeles and 511 enumeration areas, 71 Kebeles and 89 enumeration areas are sampled. The total number of households in the selected 16 local districts is N, which is equal to 301,531. Then the sample size required n of households is the first estimated by equation (2) with finite population correction in equation (3): where P $ the proportion of international migration in each small area is taken as 0.5. W $ is the proportion of the population in each local district h = 1,2, … , n = 16 computed as ratio of subpopulation size N $ to the total size N , Z +/, = 1.96 is a critical value of the standard normal distribution with significance level, α = 5% , and level of precision, d = 0.02. Then using equation (2), the sample size is estimated to be n = 2401 and after finite population correction by equation (3), the sample size becomes n = 2381 households.

Description of the Data
Data on international migration are collected from households using designed questionnaire. At each household level the interview is carried out with the household heads by well trained numerators. Data collections at local district level are coordinated by Woreda labour and social affairs officers. At each enumeration areas level data are collected by selecting one enumerator with good performance from Kebele agricultural extension workers, health care workers and Kebele administration heads.
The response variable is the migration status of a household as reported by household heads (HH). It is a dichotomous with outcome value, 4 56 = 1, if there is at least one migrant in the household ever migrated abroad and if not the outcome value, 4 56 = 0 . Predictor variables focus on socio-demographic and economic characteristics of each household and its head: gender of head, age of head, marital status of head, educational status of head, place of residence, family size, occupation of head, ethnicity of head, religion of head, farm land size, zone, district, and enumeration area of household. There is about 4% non-response rate and so final data size accessed from 2288 households. Therefore, a total of 2288 households head data are collected from 16 randomly sampled local districts and 86 enumeration areas are collected respectively.

Small Area Estimation Methods
Sample surveys have long been recognized as cost-effective means of obtaining data on wide-ranging topics of interest at frequent intervals over time. In most surveys, estimates are used in practice to provide estimates not only for the total population of interest but also for a variety of subpopulations [11] - [12]. Small area estimation is the process of using statistical models to link survey outcome variables to a set of predictor variables known for small areas, in order to predict area-level estimates. It is becoming important in survey sampling due to a growing demand for reliable small area statistics from both public and private sectors. Small area estimation method seeks to improve the precision of the estimates when standard methods are not accurate enough and produces estimations for the small areas having not reliable direct estimators ( [11], [13], [14]). In this study, small area estimation technique is used to predict area level estimates for proportions of international migration.

Generalized Linear Mixed Model
Extension of linear models to generalized linear models (GLM) was first proposed by [15] by noting that the linear model consists of three components: (i) independent observations (ii) mean of observation as linear function of some covariates, and (iii) constant variance of observation. The observation has probability distribution that belongs to the exponential family. The variance is a function of the mean of observation. GLMs generalize a variety of models including normal, binomial, Poisson, and multinomial. In GLMs, the predictor variables x affect the response Y via the linear predictor. The GLM is obtained by specifying some function of the response conditional on the linear predictor and on other parameters.
Another important extension of GLM is the generalized linear mixed model (GLMM). In the GLMM, the linear predictor contains both fixed and random effects and can be applicable to several areas [16] - [19]. The response is a random variable; Y follows an exponential family distribution defined as: where b θ , a < & c 4, < are known functions and < is a dispersion parameter which may or may not be known. The expectation of the response variable N O = P and the linear predictor are linked using a link function Q P given fixed effects parameters R and random effects S which can be expressed generally as: where Z and [ are design matrices of predictors. This is a mixed logistic regression model with the response variable having Bernoulli distribution in the exponential family and the logit link function g μ = logit μ , which is a special case of GLMM. Area level models relate small area direct estimators to area-specific covariates. The authors [11] & [13] considered sampling models, which was originally studied for small area estimation by involving direct survey estimators and linking model for the small area parameters of interest.

Parameter Estimation Methods
The logistic regression model may be estimated by using either full maximum likelihood (ML) or a GLM methodology. Maximum likelihood estimation typically uses modified forms of Newton-Raphson estimating equations. GLM uses an iteratively re-weighted least squares (IRLS) algorithm that is a simplification of maximum likelihood estimation but is limited to distributions belonging to the exponential family of distributions. In the case of maximum likelihood, an estimating equation is defined as setting to 0 the derivative of the log-likelihood function of the response distribution with respect to one of the parameters of interest, where there is a single estimating equation for each unknown parameter [21]. The IRLS algorithm and related statistical values are based on the formula for the exponential family of distributions in equation (4). The term G(4, <) is the normalization term, which is required to assure that the probabilities sum to 1. For the logistic model, as well as the Poisson and negative binomial count models, the scale is taken as 1. The first derivative of b (θ) with respect to θ, b′(θ), is the mean and its second derivative, b′′(θ), is the variance. Changing the link gives the user alternate models. The logit link ln(μ/(1 − μ)) is a natural link for binomial distribution ( [17], [18], [21]).
GLM applications typically come with a variety of goodness of fit statistics, residuals, and so forth, to make the modeling process much easier than traditional ML. In fact, this is the advantage of using GLM methods over individual ML implementations. Alternatively, for the logistic model, the ML algorithm can provide easier use of so-called Hosmer-Lemeshow fit statistics, which are based on collapsing observations having the same pattern of covariates. The likelihood associated with the mixed models for binary data considered in equation (5) is: L(γ, σ ƒ , |y, X, Z) = ∏ Ž ∏ gpy de •X de , Z e , u e q d • • e fpu e qdu e , (7) where gpy de •X de , Z e , u e q = μ de ' '" (1 − μ de ) ' '" , Statistical computing programming language R evaluates the integral L(γ, σ ƒ , |¢, £, ¤) for the binary response model using standard Gaussian quadrature or adaptive Gaussian quadrature for the numerical integration [21] - [22]. There is no an analytic solution for this integral with normally distributed u že . The analyses are done in R software version 3.3.2 and SPSS version 20.

Intra-class Correlation Coefficient
For binary data, the intra-class correlation coefficient is often expressed in terms of the correlation between the latent responses O * . The logistic distribution for the level-one residual, ε ij , implies a variance of π 2 /3 = 3.29 ( [22], [23]). This means that, for a two-level random intercept model with an intercept variance of σ ƒ , , the intra-class correlation coefficient is:

Descriptive Statistics
The main objective of this study is to investigate impacts of socio-demographic and economic characteristics of a household on international migration and to estimate small area proportions at district and enumeration area levels of Hadiya and Kembata-Tembaro zones, Southern Ethiopia. Primary data are collected with a sample survey conducted from July 2016 -October 2016 for the purpose of studying international migration in the Southern Ethiopia. Out of 2381 sampled households, data are obtained from 2288 households in 16 districts capturing 86 enumeration areas. Out of 2288 households, 65.6% are from Hadiya while the rest 34.4% are Kembata Tembaro zones.
The response variable is migration status (1 if there is at least one member of the household ever migrated abroad, 0 otherwise). Predictor variables focus on demographic and socio-economic characteristics of each household and household head: sex of head, age of head, marital status of head, place of residence, family size, educational status of head, occupation of head, ethnicity of head, religion of head, farm land size, zone, district and enumeration area.
The proportions of migrant households those had at least one person ever migrated abroad is 39.4% and the rest 60.6% of households have no international migrants at their home. Out of 902 migrant households the proportion of 68.6% are found in Hadiya and 31.4% are found in Kembata-Tembaro zones. The proportions of households having at least one migrant are found to be 41.2% and 36% within Hadiya and Kembata Tembaro zones, respectively.
More than 30% proportions of migrant households observed in 11  Out of 86 enumeration areas 32 had more than 50% proportions of migrant households within each enumeration area. Also these 32 enumeration areas with more than 50% proportions of migrant households are found within the 11 districts mentioned above by observing more than 30% of migrant households. The proportion of presence of migrant in the households relating to age composition of household heads are 64.3%, 47.3%, 37.6%, 28.8%, and 27.6% for the age categories ≥60, 50-59, 40-49, 31-39 and 19-30, respectively. The result shows that household heads within older age had more proportion of international migrants. The marital status of most of the household heads (89.44%) is married and the rest 10.66% of household head marital status are single/divorced/widowed. Concerning the educational status of household heads, the proportion shows that the prevalence of migration decreases with increasing educational level of the heads. About 77.9% of the respondents reside in the rural areas and the rest in urban areas. Equal proportions (39.4%) of migrant households are observed in both rural and urban resident households. Large proportion of respondent household heads (77.9%) religion is Protestant followed by 12.1% Orthodox, 6.6% Muslim, 2.1% Catholic and 1.3% others.
The proportion of interviewed household heads with their respective ethnic groups are dominated by two ethnic groups-Hadiya and Kembata. These two ethnic groups account the highest share of the respondent household heads of 61.5% belongs to ethnic group-Hadiya followed by 31.1% of ethnic group-Kembata. The rest ethnic groups of respondent household heads are Donga, Amhara, Guraghe, Silete, Dubamo and others together shares 7.4%. The largest proportions of occupational distribution of household heads are engaging in farming tasks with the proportion of 69.7% followed by 10.1% of merchant. The rest 20.2% of respondents are student, housewife and others. The average and standard deviation of family size are 6.32 and 2.29, respectively. Family sizes in the range 5-7 had more proportion of migrant households while compared to less than 4.

Bivariate Analysis Results
The association tests show that there is significant association between the response household migration status and the predictors: age, ethnicity, occupation of head, family size, district, and enumeration area at 5% significance level. Its association with sex of head, educational level of head, and farm land size are significant at 10% significance level. However, insignificant associations are found between the response migration status with religion, marital status, and place of residence of household head.

Assessing Model Fit
The overall significance is tested, which is derived from the likelihood of observing the actual data under the assumption that the model has been fitted is accurate. The deviance is the log-likelihood of the final model to the loglikelihood of a null model with no predictor variables. The deviance between -2*log-likelihood for the final model is 2865. 18 and for the null model is 3056.22. Therefore, the full model gets smaller deviance, which is good fit to the dataset.
The presence of relationship between the response and combination of predictor variables are based on the statistical significance of the final model chi-square. In this analysis, the distribution reveals that the probability of the model chisquare (χ 2 (17)) with value 191 was 2.2x10 -16 , which is less than 5% level of significance. The null hypothesis that there was no difference between null and final model was rejected. Therefore, the final model predicts the response variable well and it is good fit to the data. The analogous to the linear regression coefficient of determination, R , have been proposed for the logistic regression are Cox & Snell R , , Nagelkerke R , and the McFadden R , value, they provide an indication of the amount of variation in the response variable. In linear regression, R , has a clear meaning; it is the proportion of the variation in the response variable that can be explained by predictor variables in the model. Attempts have been developed to yield an equivalent of this concept for the logistic model. However, it renders the meaning of variance explained for the logistic regression. The pseudo R , value of 6.25% (McFadden pseudo-R , ) and 8.04% (Cox & Snell and Nagelkerke R , ) indicates that the inclusion of the predictor variables in the model reduces the variation as measured by absolute value of log-likelihood of null model.
The overall accuracy of the final model to predict migration status of household is 65% correctly predicted. And 85.6% of absence of migrant and 31.3% of presence of migrant in household are correctly predicted in their respective categories. The Hosmer-Lemeshow test that yields a χ 2 (8) value of 8.851 and is insignificant with p-value of 0.335 which is greater than 5% significance level. This suggesting the final model is good fit to the data well. In other words, the null hypothesis, Ho: null model is a good fit to data is reasonable rejected.
The adequacy of the fitted model is checked for possible presence and treatment of outliers and influential observations. The minimum and maximum values of the test results for Cook's influence statistics are 0.0054 and 0.05127, respectively. DFBETAs for model parameters and Cook's influence statistics are both less than unity, which shows that an observation had no overall impact on the estimated vector of regression coefficients. There is no observations have studentized residuals less than -3 or greater than +3, then we can conclude that there are probably no outliers in the dataset. Therefore, we can go on to evaluate and interpret the model parameters.

Fixed Effects Logistic Regression Analysis
The classical logistic regression model constitutes fixed effects only and is defined as: Model0 logitpP de q = β + β Age de + β , Job de + β ª Educ de + β°Hsize de + ε de We use the algorithm of variable selection suggested in [24]. The algorithm involves variable selections decision at each step of the modeling process. First, fit a univariate model with each of the covariates. Second, select more candidates that are significant at some chosen significance level to build a multivariate model. Any variable whose univariable test has a p-value less than 0.25 is a candidate for the multivariable model along with all variables of known intuitively relevant variables regardless of their statistical significance. Third, following the fit of the multivariable model, the importance of each variable included in the model should be verified. Fourth, once we have obtained a model that we feel contains the essential variables, we should look more closely at the variables in the model. Fifth, once we have refined the main effects model and ascertained that each of the continuous variables is scaled correctly, we check for interactions among the variables in the model.
Using the algorithm and forward-backward variable selections the some categories of four predictor variables such as age, occupation, and educational level of household head (HH), and family size are found statistically significant at 5% significance level. The predictor variable ethnic group of HHs is entered into final model as dummy variables without variable selection as ethnic1 (1=Hadiya, 0=others) and Ethnic2 (1=Kembata, 0=others). In the Table 3, the analyzed results of regression coefficients, standard error, zvalue, p-value, odds ratio and 95% CI of odds ratio using logistic regression analysis for fixed effects are displayed. We can determine which predictor variables matter in logistic regression analysis by looking at the P-values of the individual coefficients. Predictor variables with P-values that are less than 5% significance level would be considered as statistically significant. Meaning that there is statistical evidence that they affect the probability that the response variable is 1, which is the presence of international migrant in the household. More generally, if the P-value is less than α, then a predictor variable is statistically significant at α level of significance. Therefore, the categories of the predictor variables identified by star(s) are statistically significant at 95% confidence level. But there is no statistical evidence to the categories of predictor variables matter on the migration status of household their respective P-value is greater than 0.05 and no interpretation is made for these.
For instance, the p-value of age of household head older than 60 years is 5.18x10 -15 . Thus there is strong statistical evidence that household heads are more likely to send household members abroad if they are older more than 60 years. In general, the results in the age categories of HHs illustrate as age of heads increases they are more likely to send household members abroad. The p-value of family size (Hsize) is 1.83x10 -09 . Thus there is strong statistical evidence that household heads are more likely to send household members abroad if family size increases by a person.
The z-value is the regression coefficient divided by its standard error. If the z-value is large in magnitude (that is, either positive or negative), it indicates that the corresponding true regression coefficient is not 0 and the corresponding predictor-variables matters on the response variable. A good rule of thumb is to use a cut-off value of 2 which approximately corresponds to a two-sided hypothesis test with a significance level of α=0.05. For instance, for the occupation of household head in categories of farmer, merchant, student are having the z-values 0.079, 1.627 and 1.339 which are not large enough to provide strong evidence to be significant.
Odds ratio (OR) -is a measure of association between an exposure and an outcome. The OR represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure. In this study, the response variable migration status denotes the presence or absence of international migrant and one of the predictor variable AgeHHs denotes the age of household head with categories 19-30, 31-39, 40-49, 50-59, and older than 60 years. For instance, the odds ratio of age of HHs older than 60 is 5.277 estimates that presence of migrant is 5.277 times more likely to occur among household head with this age group than among the reference age category 19-30. Also, for the quantitative predictor variable family size, the odds ratio shows the presence of international migrant in the household increase by 1.131 for every a person increase in household. We can follow the same procedures to interpret the rest results in the Table 3.

Small Area Estimation of Binomial Processes
Small-area estimation refers to estimation of parameters for a large number of geographical areas when each has relatively few observations. For instance, one might want district or enumeration area-specific estimates of characteristic such as the proportion of households having one or more international migrants. Small area estimation models are random effects models. These models treat each small area as a cluster with its own random effect coming from a common distribution of the random effects.
Let denote the finite population size by N and assume that it is partitioned into 16 non-overlapping districts (or small areas), each of sizes N d with i = 1, 2, 3, …, 16 for districts In this study, one of the limitations is the true proportions of migrant households in each district level are not found. This is due to the absence of exact number of migrant households in each district. We have only the number of migrant households that considered in conducted sample survey.
In Table 4, ´d is the probability that Y de = 1 and the values under this column is the proportion of migrant household compared to total sample size at each district level. The result in column n d is the number sample size drawn from each district and n˜ is the number of migrant households at each district. Let u d be the random area effect for the district i and the random effect results, u d is each district level variation in migration status of households. Predicted response probability and predicted logit for each district are explored (see Table 4). The values under predprob indicates the predicted probabilities that can be revalidated with the actual outcome to determine the predicted probabilities indeed associated with the presence of migrant in the household at each districts. It predicts the logit of presence of migrant in household from a set of predictors at each district level. Moreover, it is the predicted probability of presence of migrants in the household at district level.

Random Intercept Models
In a two-level null model we split the residual into two components, corresponding to the two levels in the data structure. They are the area level effect as part of the intercept term and the household level random residual -each having normal distributions with zero mean and constant variance. The three null models below with random intercepts are versions of model (6) representing, respectively, zone, district, and enumeration area level effects on migration.

Null Model with District Level Random Effects
The assumption of random effects with zero mean and constant variance is attained, that is, υ d, dwAE~ 0, 1.022 . The intercept is interpreted as the log-odds that y = 1 when x = 0 and υ = 0 and is referred to as the overall intercept in the linear relationship between the log-odds and x. The log-odds of presence of international migrant in household an 'average' at district level (with υ ,e = 0 ) is estimated as β s = −0.5827 . This indicates that the overall estimated mean of migration status (across districts) is -0.5827. The mean for district j is estimated as −0.5827 + υ e , where the variance of υ e is estimated asσ , " = 1.022.The district level random variation, σ , " = 1.022 and the logistic distribution for the level-one residual, εij, has a variance of π , /3 = 3.29. Therefore, the intra-class correlation coefficients, ¦ = § É! § É! § Ê = 0.237 indicates that there is 23.7% of the variation between the districts and the rest 76.3% variation is within the district. Also the correlation between randomly chosen pairs of individuals belonging to the same district is 0.237.

i. Testing for District Effects
To test the significance of district effects, we can carry out a likelihood ratio test comparing the null model with a null single-level model. The null model takes the form logit P de = β . The likelihood ratio statistic for testing the null hypothesis, that is, H : σ , " = 0, can be calculated by comparing the null model, with the corresponding null single-level model without the random effect. The likelihood ratio test statistic is calculated as two times the difference in the log likelihood values for the two models. The likelihood ratio test statistic is 320.06 with 1 DF, so there is strong evidence that the between-district variance is non-zero.

ii. Examining Districts Effects
To estimate the district-level residuals υ Ì e and their associated standard errors, we use the function ranef in R with the condVar option. This creates a random effects object containing the variance-co-variance matrix in the condVar attribute. The 16 district level residuals are stored in υ and υ [1] is the list corresponding to the first set of random effects. The estimates of the district effects, u Í e by obtaining from the null model are examined. To calculate the residuals and produce a 'caterpillar plot' with the district effects in rank order together with 95% confidence intervals we can use the function ranef() that can be work for the continuous and categorical responses of two-level random intercepts model. There is only one set of random effects; the postVar() attribute only contains the "posterior variance" of each district-level residual. To access this set of variances, we look into the attribute postVar() of the data frame S [1]. This returns a three-dimensional array with the third dimension referring to each individual residual.
The district residuals and their standard errors have been calculated and stored for each individual district. We can therefore calculate summary statistics and produce graphs based on these data.  The plot in Figure 1 is the estimated residuals for all districts in the sample. The 95% confidence interval does not overlap the horizontal line at zero, indicating that presence of international migrant in the districts are significantly above the average or below the average.

Null Model with Enumeration Area Effects
Fitting the null model that allows random effects for enumeration area on migration status of household computed using R. The log-odds of presence of migrant in an 'average' enumeration area (one with S ÎD = 0 ) are estimated as R Ï ,ÎD = −0.5751. This indicates that the overall estimated The enumeration area level random variation, σ , "ÎD = 1.659 and the logistic distribution for the level-one residual, ε ij , has a variance of π , /3 = 3.29 . Therefore, the intra-class correlation coefficients, ρ is equal to 0.335 indicates that there is 33.5% of the variation between the enumeration areas and the rest 66.5% variation is within the enumeration area. The correlation between randomly chosen pairs of individuals belonging to the same district is 0.335.

Testing for Enumeration Area Effects
To test the significance of enumeration area effects, we can carry out a likelihood ratio test comparing the null model with the null single-level model. The likelihood ratio statistic for testing the null hypothesis, that is, H : σ , "ÎD = 0, can be calculated by comparing the null model, with the corresponding null single-level model without the random effect enumeration area. The likelihood ratio test statistic is calculated as two times the difference in the log likelihood values for two models. The likelihood ratio test statistic is 361.58 with 1 degree of freedom, so there is strong evidence that the between-enumeration area variance is non-zero. To estimate the enumeration-level random effects or residuals, S Ì d and their associated standard errors, we use the ranef () R function with the condVar () option. This creates a random effects object, containing the variance-covariance matrix in the condVar () attribute. The 86 enumeration area level random effects residuals are stored in S d . The estimates of the enumeration area effects or residuals, Ò Ì d obtained from the null model are examined through the following procedures. The residuals and a 'caterpillar plot' with 95% confidence intervals are produced using the two-level random intercepts model.
We use the plot and segments commands in R to produce a 'caterpillar plot' to show the enumeration area effects in rank order together with 95% confidence intervals.

Figure 2. Plot of Estimated Random Effects for Enumeration Areas.
The plot in figure 2 is the estimated residuals for all enumeration areas in the sample. For a substantial number of enumeration areas, the 95% confidence interval does not overlap the horizontal line at zero, indicating that the presence of international migrant in the enumeration areas are significantly above the average or below the average.

Mixed Logistic Regression Model with Covariates
Comparison of Models Three models (10) - (12) are analyzed in the section 3.6. Here, the following nine more GLMM models constructed from model (6) are analyzed. A model with best fit to the data is determined and further analyzed.
These models involve covariates and random effects.
The results for mixed model analysis of Model 6 are given in Table 8. The age of head, occupation of head, educational level of head, and family size are significant at 5% level of significance. As age of head increase the odds of a household having migrant increase with reference to the lowest age group 19-30. With reference to occupation as government employee, the odds of having migrant is different for households with heads in merchant, housewife and other job categories. Educational level of head is also found important predictor of migration status. The odds of having migrant in a household with a head who can read/write are different from the head who can't read/write. However, it is not significantly different for heads with educational level of primary, secondary and higher education, indicating that these households might behave similarly in terms of sending their members to the international migration. Family size is significant and affecting the odds of migration positively. The odds of a household have migrant increases with family size. Ethnicity is not significant in this case. There exist random effects due to district level and age of head within districts.
This fitted model (22) can be used to make point estimates given information of individual household. The predication of probabilities or proportions of individual households in each district of having international migration can be made using r s = exp 9`ßß=à / 1 + exp 9`ßß=à . The predicted probabilities of each district are made as shown in the Table  4 and the interpretation of the regression coefficients can be made as done in Table 3 for fixed effects. The district level random variation with covariates is, σ , " = 1.618 and the logistic distribution for the level-one residual, ϵ ij , has a variance of σ á '" , = π , /3 = 3.29 . Therefore, the intra-class correlation coefficients, ρ is equal to 0.33 indicates that there is 33% of the variation between the districts and the rest 67% variation is within the district.

Conclusions
The main objective of this study is to investigate impacts of socio-demographic and economic characteristics of a household head and household on international migration and to estimate small area proportions at district and enumeration area level. A total of 2288 data are collected from sixteen randomly sampled districts in Hadiya and Kembata-Tembaro zonal areas in Southern Ethiopia. The response variable migration status refers to whether a household has at least one member who ever migrated abroad or not. The findings of the study reveal that about 39.4% of the households have at least one international migrant, and the rest 60.6% have no such migrants. Proportions of households within districts that have international migrants are estimated. Based on analysis of the generalized linear model and forward-backward stepwise variable selection four predictors are found to be significantly related to household migration status at 5% level of significance. These are age, occupation and educational level of household head and family size.
Several versions of the generalized linear mixed models are proposed, analyzed and compared. The best fitting model to the data is found to be the logistic mixed regression consisting of the six predictors with age nested within districts as random effects. Under this model, the district specific random effect is significant with variance of 1.6180. From analysis of the final model, it is found that the odds of a household head of having international migrant increases with head's age and family size. An increase of family size by one person increases the log odds of having migrant by 0.131 indicating that large family size is one of the determinant factors for migration.
In conclusion, there is high prevalence of international migrant in the study area. The migration prevalence varies among the zones, the districts and the enumeration areas. Household head characteristics: age, educational level and occupation of head, and family size are determinant factors of international migration. The district level random variation without and with six predictor variables, indicates there are 23.7% and 33% of the variation between districts and the rest 76.3% and 67% of variation are within the district correspondingly. The enumeration area level random variation without predictor variables in the model is 0.335 indicates that there is 33.5% of the variation between the enumeration areas and the 66.5% variation is within the enumeration area. Community based intervention is needed so as to monitor and regulate the international migration for the benefits of the society.