Post-Harvest Loss Modeling of Maize Produce in Kenya

The classical linear model is commonly used to model the relationship between a response variable and a set of explanatory variables. The normality assumption is usually imposed to ease hypothesis testing in linear regression models, but it can be misleading for a proportional response variable that is bounded. This makes ordinary least squares regression inappropriate for a model with a bounded dependent variable. This research proposes the fractional beta regression model as an alternative for examining the determinants of post-harvest loss management of maize produce among farmers in Kenya. The response variable, the Post-Harvest Loss Coefficient (PHLC), is assumed to have a mixed continuous-discrete distribution with probability mass at zero or one. The fractional beta distribution is used to describe the continuous component of the model, since its density takes a wide range of shapes depending on the values of the two parameters that index the distribution. The study uses a parameterization of the beta law in terms of its mean and a precision parameter, and the parameters of the mixture distribution are modeled as functions of regression parameters. The explanatory variables considered are Agriculture, Storage, Education, Fumigation and Transport. Inference on parameters, model diagnostics and model selection tools for the fractional beta regression are also provided. The study used primary data collected from maize farmers in Uasin Gishu County, Kenya, through the administration of a research questionnaire.


Introduction
Post-harvest loss reduction has received attention in many policy documents across nations as a means of ensuring global food security, particularly in developing countries. Many researchers have examined various options for reducing post-harvest losses. This study contributes to this scientific discourse by using a different approach. The human element of managing post-harvest loss is central, which raises the question of what characterizes the farmer who manages post-harvest losses better.
The study modeled the post-harvest loss coefficient (PHLC), which measures how effectively a farmer works to manage post-harvest storage losses in Kenya. A mixed continuous-discrete distribution is used to examine the determinants of post-harvest loss management of maize produce among maize farmers in Kenya.
The majority of households in Kenya rely on maize as their staple food, which makes it by far the most important food crop in the country and gives it an integral role in the national food security agenda. Consequently, a significant portion of harvested maize must be stored to guarantee supply between harvest seasons. Post-harvest storage and management of maize is therefore critical to the sustainability of food security in Kenya. Currently, the bulk of storage takes place in on-farm storage systems, which are characterized by traditional storage structures that are prone to invasion by agents of stored food losses, including insects and rodents.
Kenya has faced perennial food crises over the years, stemming from poor management of farm produce among other factors. A food situation assessment carried out in 2017 showed that maize losses could be quite significant. The country produced 37 million bags of maize in 2017, of which 12% is estimated to have been lost through post-harvest losses. These losses translate to about 4.5 million bags. In October 2018, the government of Kenya, through the Ministry of Agriculture, reported that over 60% of the maize stored in National Cereals and Produce Board (NCPB) stores was rotting, which exposes the poor post-harvest storage and management of maize in the country.
In an attempt to minimize these losses, the government has introduced a number of technologies that improve grain storage at the household level and also at the national grain reserves. This study therefore sought to investigate the factors affecting the post-harvest loss management of maize produce in Kenya, with the aim of bridging the knowledge gap and recommending lasting solutions to the food storage challenges in Kenya.

Literature Review
Statistical modeling of continuous proportions has received close attention in the recent past, with applications in many fields. This section gives an overview of recent literature that helps to establish the need for this research.
Unlu & Aktas (2017) applied the parametrized beta regression in modeling the well-being index data of Turkey [1]. Rojas (2019) developed and validated an operational maize yield model for Kenya based on remote sensing and agro-meteorological data, using the multiple linear regression model [2]. Balgar & Mojoko (2017) examined the effects of growing climatic oscillations on maize production using simple logistic regressions [3].
Vasanthi, Muraligopal & Swaminathan (2015) carried out a statistical analysis of trends in maize area, production and productivity in India [4]. Adesoji & Babatunde (2013) applied artificial neural network modeling to predict post-harvest loss in some common agrifood commodities in Nigeria [5]. Hugo (2016) analyzed maize yield levels and technical efficiency for small-scale farmers in Busia County, Kenya, using stochastic frontier analysis [6].
Hassan & Gurmu (2017) used single commodity partial equilibrium and Johansen's co-integration approaches to investigate maize price formation and market integration in the Ethiopian maize market [8]. Ayieko et al (2013) used economic utility maximization theory to ascertain the relative importance of non-price factors in influencing production of the crops, as well as the complementarity between price and non-price incentives [14].
Mwanjele, Waiganjo, Moturi & Muthoni (2014) used a case study approach and the knowledge discovery and data mining process to design and implement an agricultural drought prediction system [15]. Short et al (2012) described the market incentives and disincentives [9], and Koskei et al (2020) carried out a cross-sectional study on post-harvest storage practices of maize produce in the Rift Valley and Lower Eastern regions of Kenya [12]. Rembold et al (2011) proposed an innovative framework to analyze and compute quantitative post-harvest losses for cereals under different farming and environmental conditions in East and Southern Africa [11]. Jadhav et al (2017) applied the ARIMA model to forecasting agricultural prices [7]. Parfitt et al (2010) quantified food waste within food supply chains and its potential for change to 2050 [13]. Chen et al (2018) used the Tobit model to study the main factors affecting post-harvest grain loss during the sales process [10].

Introduction
This chapter describes the fractional beta regression and its application to modeling the determinants of post-harvest losses, using the post-harvest loss coefficient (PHLC) as the main response variable. PHLC was considered to be a function of several variables, which included farmer and farm specific characteristics, location characteristics, and other socio-economic characteristics.

Study Area
The study was carried out in Uasin Gishu County, one of the counties in Kenya. It has a population of 181,338 and an area of approximately 566.50 sq. km. Almost all farmers in Uasin Gishu County grow maize. It is one of the counties where farmers have been affected by post-harvest losses due to poor management of maize after harvest. It has a total of thirty (30) electoral wards, from which the samples of farmers were drawn and in which the research questionnaire was administered.

Data Collection
The study used primary data, collected through the administration of a research questionnaire to a section of Uasin Gishu County maize farmers. The aim was to evaluate the maize post-harvest losses incurred by the farmers for the period January 2018 to December 2019.
The sample size was the number of maize farmers from Uasin Gishu County included in the study. The sample was a section of maize farmers drawn from the entire population of maize farmers in the county. Choosing the right sample size was crucial to finding statistically significant results.

Sample Size Determination
The sample size used for the study was determined using the formula
$$n = \frac{Z^{2}\,\sigma(1-\sigma)}{\Delta^{2}} \tag{1}$$
where;
i. $n$ is the sample size to be determined.
ii. $\sigma$ is the amount of variation expected from the responses. This study uses a variation of 0.5, as it is the most forgiving number and ensures the sample size is large enough.
iii. $Z$ is the z-score obtained at a given confidence level. This study estimates the sample size at a 95% confidence level, giving a z-score of 1.96.
iv. $\Delta$ is the margin of error allowed in the sample size determination. This study uses a margin of error of ±5%.
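
As a rough check of the arithmetic, the inputs listed above (variation 0.5, z-score 1.96, margin of error 5%) can be plugged into the proportion-based sample size formula $n = Z^{2}\sigma(1-\sigma)/\Delta^{2}$. The use of this particular formula and the function name `required_sample_size` are assumptions made for illustration, so the sketch below should be read as a plausibility check rather than as the study's own computation.

```python
import math


def required_sample_size(z: float, sigma: float, margin: float) -> int:
    """Minimum sample size from n = z^2 * sigma * (1 - sigma) / margin^2.

    Assumed formula (proportion-based); the result is rounded up because
    a fraction of a respondent cannot be sampled.
    """
    n = (z ** 2) * sigma * (1.0 - sigma) / (margin ** 2)
    return math.ceil(n)


# Inputs stated in the text: 95% confidence, variation 0.5, 5% margin of error.
n_min = required_sample_size(z=1.96, sigma=0.5, margin=0.05)
print(n_min)
```

Under these assumptions the minimum is 385 respondents, which the 390 farmers interviewed in the study exceed.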

Fractional Beta Regression
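To assess factors that influence post-harvest loss management, this research uses the fractional beta regression model. Under this approach, the probability density function of the response variable $y$ (PHLC) with respect to the measure generated by the mixture was given by;
$$bi_c(y; \alpha, \mu, \phi) = \begin{cases} \alpha, & y = c \\ (1-\alpha)\, f(y; \mu, \phi), & y \in (0, 1) \end{cases} \tag{2}$$
where $f(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1-\mu)\phi)}\, y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1}$ is the beta density with mean $\mu$ and precision $\phi$, and $\alpha$ is the probability mass at $c$ and represents the probability of observing zero ($c=0$) or one ($c=1$). If $c=0$, the density (2) is called a zero-inflated beta distribution, and if $c=1$, the density is called a one-inflated beta distribution. The mean of $y$ and its variance were given respectively as;
$$E(y) = \alpha c + (1-\alpha)\mu, \qquad Var(y) = (1-\alpha)\frac{\mu(1-\mu)}{\phi+1} + \alpha(1-\alpha)(c-\mu)^{2}$$
This beta regression was thus defined in terms of two different logistic functions as;
$$g(\mu_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}$$
for $i = 1, 2, \ldots, n$, where $g(\cdot)$ is the link function and $x_{i1}, \ldots, x_{ik}$ are the covariates of the $i$-th observation.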
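
The mixture described above can be made concrete with a short sketch. It assumes the standard zero-or-one inflated beta form, in which a point mass `alpha` sits at `c` and the continuous part is a beta density written in terms of its mean `mu` and precision `phi`. The function names are hypothetical and chosen only for illustration.

```python
import math


def beta_density(y, mu, phi):
    """Beta density in the mean-precision parameterization, for 0 < y < 1."""
    a, b = mu * phi, (1.0 - mu) * phi
    log_f = (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
             + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))
    return math.exp(log_f)


def inflated_beta_density(y, alpha, mu, phi, c=0):
    """Mixed density: mass alpha at y == c, scaled beta density on (0, 1)."""
    if y == c:
        return alpha
    if 0.0 < y < 1.0:
        return (1.0 - alpha) * beta_density(y, mu, phi)
    return 0.0


def inflated_beta_mean_var(alpha, mu, phi, c=0):
    """Mean and variance of the mixture (law of total expectation/variance)."""
    mean = alpha * c + (1.0 - alpha) * mu
    var = ((1.0 - alpha) * mu * (1.0 - mu) / (phi + 1.0)
           + alpha * (1.0 - alpha) * (c - mu) ** 2)
    return mean, var


# Illustrative values only: 20% mass at zero, beta part with mean 0.5, precision 2.
print(inflated_beta_density(0.0, 0.2, 0.5, 2.0))
print(inflated_beta_mean_var(0.2, 0.5, 2.0))
```

Setting `alpha` to zero recovers the ordinary beta distribution, which is the relevant case when no observation lies exactly at zero or one.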

Parameter Estimation
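To estimate the regression coefficients $\beta$ and the precision parameter $\phi$, the study used the maximum likelihood approach, in which the log-likelihood function was given as;
$$\ell(\beta, \phi) = \sum_{i=1}^{n} \ell_i(\mu_i, \phi)$$
where $\ell_i(\mu_i, \phi)$ is the logarithm of the density evaluated at the $i$-th observation and $\mu_i$ depends on $\beta$ through the regression structure. Differentiating this log-likelihood with respect to $\beta$ and $\phi$ and solving the resulting equations using the Quasi-Newton optimization algorithm yielded the maximum likelihood estimates, with ordinary least squares estimates serving as starting values.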
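
To illustrate the objective being maximized, the following sketch evaluates a beta regression log-likelihood for responses strictly inside (0, 1), using a logit link for the mean and a log link for the precision, as in the mean and precision models reported later in the study. The names `beta_loglik`, `beta`, `gamma`, `X` and `Z` are assumptions introduced for illustration, and the sketch covers only the continuous component of the model.

```python
import math


def inverse_logit(eta):
    """Map a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def beta_loglik(beta, gamma, y, X, Z):
    """Log-likelihood of a beta regression with variable precision.

    beta  : coefficients of the mean model (logit link)
    gamma : coefficients of the precision model (log link)
    y     : responses, each strictly between 0 and 1
    X, Z  : rows of covariates for the mean and precision models
    """
    total = 0.0
    for yi, xi, zi in zip(y, X, Z):
        mu = inverse_logit(sum(b * x for b, x in zip(beta, xi)))
        phi = math.exp(sum(g * z for g, z in zip(gamma, zi)))
        a, b_shape = mu * phi, (1.0 - mu) * phi
        total += (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b_shape)
                  + (a - 1.0) * math.log(yi)
                  + (b_shape - 1.0) * math.log(1.0 - yi))
    return total


# Intercept-only toy example (hypothetical data, not the study's).
y = [0.5, 0.5]
X = [[1.0], [1.0]]
Z = [[1.0], [1.0]]
print(beta_loglik([0.0], [math.log(20.0)], y, X, Z))
```

In practice the negative of such a function could be handed to a quasi-Newton routine, for example `scipy.optimize.minimize` with `method="BFGS"`, to obtain the estimates numerically.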

Model Diagnostics
The adequacy of the beta regression model was assessed using different types of diagnostics. These included the standardized and deviance residuals and the coefficient of variation.
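
A minimal sketch of the standardized residual is given here, assuming the usual beta regression variance $\mu(1-\mu)/(1+\phi)$ for a response with fitted mean $\mu$ and precision $\phi$. The function name and the input values are hypothetical, and deviance residuals are not covered.

```python
import math


def standardized_residuals(y, mu, phi):
    """(y - mu) divided by the fitted standard deviation of a beta response."""
    residuals = []
    for yi, mi, ph in zip(y, mu, phi):
        variance = mi * (1.0 - mi) / (1.0 + ph)
        residuals.append((yi - mi) / math.sqrt(variance))
    return residuals


# Hypothetical observed values, fitted means and fitted precisions.
print(standardized_residuals([0.5, 0.75], [0.5, 0.5], [3.0, 3.0]))
```

Residuals close to zero and roughly symmetric about it would be consistent with an adequate fit.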

Introduction
This chapter gave the data analysis procedures employed in the study, the results obtained and associated discussions.

Data Analysis
To give an insight into the Post-Harvest Loss Coefficient (PHLC) data, descriptive statistics were obtained and the data was visualized using a histogram. A further analysis of the descriptive statistics is also given to see whether meaningful information can be obtained from the data prior to fitting the beta regression model. For the beta regression model coefficients, the PHLC was regressed on five covariates, namely Fumigation, Transport, Storage, Agriculture and Education.
A total of three hundred and ninety (390) maize farmers were interviewed across the thirty (30) electoral wards in Uasin Gishu County, Kenya and used in the study. The study considered the percentage of farmers who fully did agriculture as a means of livelihood, those who had storage facilities, those who had education past primary level, those who fumigated their farm produce and those who had a means of transporting their farm produce to the store/market. The associated PHLC was then calculated across the thirty (30) electoral wards in the County from the total number of bags of maize harvested by a farmer and the total number of bags of maize lost due to post-harvest activities.

Descriptive Statistics
Table 1 gave the descriptive statistics of the Post-Harvest Loss Coefficient (PHLC) of maize produce in Uasin Gishu County, Kenya.
The coefficient of variation was estimated at 0.6. Being less than 1, this implied that the fitted beta distribution was of low variance. The skewness of the data was less than zero, indicating a left-skewed distribution, as confirmed by the histogram and associated density of PHLC in Figure 1. The median of the data was higher than the mean, consistent with this left skewness, with the majority of the coefficients lying to the right of the mean value.
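
The summary measures discussed above can be computed as in the sketch below. The data values are hypothetical and are not the study's PHLC observations. The coefficient of variation is taken as the sample standard deviation divided by the mean, and skewness as the third central moment divided by the second central moment raised to the power 1.5, which is one of several common definitions.

```python
import statistics


def describe(values):
    """Mean, median, coefficient of variation and moment-based skewness."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample standard deviation
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return {
        "mean": mean,
        "median": median,
        "cv": sd / mean,
        "skewness": m3 / m2 ** 1.5,
    }


# Hypothetical left-skewed proportions, for illustration only.
phlc = [0.10, 0.50, 0.60, 0.60, 0.70]
summary = describe(phlc)
print(summary)
```

For these illustrative values the skewness is negative and the median exceeds the mean, the same pattern reported for the PHLC data.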

Parameter Estimates
Tables 2 and 3 gave the regression model coefficients for the mean model with logit link and the precision model with log link respectively. From Table 2, the percentage of farmers who fully did agriculture as a means of livelihood, who had storage facilities, who had education past primary level, who fumigated their farm produce and who had a means of transporting their farm produce to the store/market had respective effects of -5.09%, 0.06%, -1.92%, 0.19% and -3.46% on the Post-Harvest Loss Coefficient of maize produce. The mean regression coefficient was estimated at 2.7046. From Table 3, the same five covariates, in the same order, had respective effects of 0.28%, -3.97%, 7.85%, 0.57% and -35.68% on the precision of the Post-Harvest Loss Coefficient of maize produce, with a regression coefficient of 5.8532.

Results Discussion
The study explored the Post-Harvest Loss Coefficient as a measure of the effectiveness with which farmers endeavor to reduce post-harvest losses, with regard to maize produce in Uasin Gishu County, Kenya. To aid the data exploration, Q-Q plots, standardized residuals and beta diagnostic plots were used. Table 4 gave the residuals of the fitted model in modeling PHLC. The minimum and maximum residuals were -2.4343 and 2.0374 respectively. The median, 1st quartile and 3rd quartile residuals were 0.2435, -0.6551 and 0.6507 respectively. The median standardized residual was found to be close to zero, and a careful analysis of these residuals shows that they are symmetrical. This indicated that the fitted model was not biased in modeling the PHLC data and gave a good fit. A visual representation of this is given in Figure 2.

Residual Probability Plots
The Q-Q plots, which plot the theoretical quantiles of the fitted model against the sample quantiles, were also used in the data analysis. The 45-degree reference line was used as a measure of goodness of fit. Almost all residuals lie along this line, indicating a good fit of the beta regression to the data in relation to the normal distribution. This suggested that the residuals came from the same distribution as a sample from the theoretical normal distribution with mean 0 and standard deviation 1, as illustrated in Figure 2. Figure 3 gave further diagnostic plots that the study used in the data exploration. Six diagnostic plots for the PHLC data were obtained: the standardized residuals plotted against the observation index, the Cook's distance plot, the generalized leverage vs predicted values plot, the residuals vs linear predictor plot, the half-normal plot of residuals and the predicted vs observed values plot.
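
As an illustration of how the points of such a Q-Q plot are formed, the sketch below pairs sorted residuals with standard normal quantiles. The plotting positions $(i - 0.5)/n$ are an assumed convention, the function name `qq_points` is hypothetical, and the residual values are made up for the example.

```python
from statistics import NormalDist


def qq_points(residuals):
    """Pair each sorted residual with its theoretical N(0, 1) quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # Plotting positions (i - 0.5) / n for i = 1..n (assumed convention).
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Hypothetical residuals, for illustration only.
points = qq_points([0.6, -0.7, 0.2])
for theo, obs in points:
    print(round(theo, 4), obs)
```

Points falling close to the 45-degree line would indicate that the residuals behave like a standard normal sample.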
A close inspection of Figure 3 revealed that the largest standardized and deviance residuals in absolute value corresponded to electoral ward (observation) 11. This observation also had a larger Cook's distance than the others, indicating that it had a great influence on the results of the regression analysis. However, the generalized leverage for this observation was not large relative to the remaining ones. This electoral ward (observation) had the highest PHLC value, at 0.753.

Post-Harvest Loss Coefficient Prediction
The Post-Harvest Loss Coefficient prediction was achieved using the fitted model parameters. Table 5 gave a summary of the predicted post-harvest loss coefficients. The mean Post-Harvest Loss Coefficient was less than the average of 0.5. This implied that most of the maize farmers in Uasin Gishu County would suffer from post-harvest maize losses even in the future. This calls for immediate government interventions aimed at curbing the losses, since maize farming is a stable source of income and food for the farmers. Figure 4 gave a graphical visualization of the same. The mean value of the predicted post-harvest loss coefficients was higher than the median value. This indicated that the majority of the predicted post-harvest loss coefficients were centered to the left of the mean value and that the extreme post-harvest loss coefficients lay to the right of the mean value.
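
Because the mean model uses a logit link, a predicted coefficient is obtained by applying the inverse logit to the linear predictor. The sketch below shows this step only. The function name `predict_phlc` and all numeric inputs are hypothetical and are not the fitted values of the study.

```python
import math


def predict_phlc(intercept, coefs, covariates):
    """Predicted mean PHLC: inverse logit of the linear predictor."""
    eta = intercept + sum(b * x for b, x in zip(coefs, covariates))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients and covariate values, for illustration only.
prediction = predict_phlc(0.5, [-0.02, 0.01], [40.0, 30.0])
print(prediction)
```

The inverse logit keeps every prediction strictly between 0 and 1, which is the range required of a proportion such as PHLC.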

Introduction
This chapter was the final stage of the study. It gave the summary, conclusions and recommendations for any further study.

Summary
Maize is a staple food in the Kenyan economy, hence the importance of minimizing the losses that may accrue in the production process. The study examined the post-harvest loss management of maize produce by the use of a Post-Harvest Loss Coefficient (PHLC). This was analyzed by fitting the fractional beta regression model to the PHLC data. A parameterization of the fractional beta regression that allowed the precision parameter $\phi$ to be defined by regression parameters was used.
The response variable (Post-Harvest Loss Coefficient, PHLC) was regressed on five explanatory variables (Agriculture, Storage, Education, Fumigation and Transport). The explanatory variables were obtained as percentages of those farmers who fully did agriculture as a means of livelihood, those who had storage facilities, those who had education past primary level, those who fumigated their farm produce and those who had a means of transporting their farm produce to the store/market. Fumigation had the highest positive effect of 0.19% on PHLC, followed by Storage with 0.06%. The corresponding effects on the precision parameter were 0.56% and -3.97% respectively. Agriculture, Transport and Education had respective negative effects of 5.09%, 3.46% and 1.92% on PHLC and respective effects of 0.28%, -35.68% and 7.85% on the precision parameter. The mean predicted post-harvest loss coefficient was estimated at 0.4780, which was less than the average of 0.5. The graphical visualization of the data using a histogram showed the data to be left skewed, with the majority of the data lying to the right of the mean. The variance of the data was less than the mean, which indicated under-dispersion in the data, as did the coefficient of variation, which was less than 1.
The residual probability plots used in the data exploration indicated that the residuals of the fitted model came from the same distribution as the normal distribution with mean 0 and variance 1. This indicated the unbiasedness of the fitted model in modeling PHLC data.

Conclusion
The study presented an application of a regression model tailored for responses that are measured continuously on the standard unit interval, i.e. $y \in (0, 1)$, which is the situation that practitioners encounter when modeling rates and proportions. An application was made to the Post-Harvest Loss Coefficient of maize produce in Uasin Gishu County, Kenya.
Model parameter estimates of the fitted model were obtained through the maximum likelihood approach. Two sets of parameter estimates were obtained; the mean model with logit link and the precision model with log link. The descriptive statistics of the Post-Harvest Loss Coefficient of maize produce were obtained. The estimated parameters were then used to predict the post-harvest loss coefficients.
This was in line with Unlu & Aktas (2017), who used the parametrized beta regression in the modeling of well-being index data of Turkey [1]. It was also an extension of the Rojas

Recommendations
In order to curb the post-harvest losses incurred by maize farmers in Kenya, the study notes various strategies that ought to be put in place by the Kenyan government. These include improving the road infrastructure in the farms, encouraging mono-cropping among farmers and educating the farmers on proper storage of maize produce after harvesting.
This study gave an application of the fractional beta regression to modeling the Post-Harvest Loss Coefficient of maize produce. The study notes that much work remains to be done in the modeling of data sets with proportional response variables and that this regression technique can be extended to zero-inflated data with response variables in the interval [0, 1]. An extension to parametric survival data can also be considered.