An Assessment of Farmers Livelihood in the Coffee Certification Schemes in Tanzania
Charles Kipkorir Masson
Department of Statistics and Computer Science, Moi University, Eldoret, Kenya
To cite this article:
Charles Kipkorir Masson. An Assessment of Farmers Livelihood in the Coffee Certification Schemes in Tanzania. American Journal of Theoretical and Applied Statistics. Vol. 4, No. 6, 2015, pp. 446-463. doi: 10.11648/j.ajtas.20150406.15
Abstract: This study was undertaken to assess the impacts of adopting various types of coffee certification on the livelihoods of smallholder farmers. The main objective was to compare the livelihoods of farmers in the different producer groups with respect to their income and food security situation. The paper begins with an introduction to impact assessment and a description of the methodology and its challenges, and outlines the methods used for handling outliers and for comparing certified and non-certified farmers and the producer groups. Secondary coffee survey data collected by COSA and partners for analyzing the impact of sustainability standards form the basis of this study. Multi-stage cluster sampling was used to select the farmers who were interviewed. In the first stage, the coffee growing areas in Tanzania and the active certification programs were identified. Second-level producer groups that had obtained certification were then used to obtain the sampling frame of first-level producer groups. Random sampling was used to select the first-level producer groups and, within them, villages with farmers in the producer groups. Non-parametric methods were used to compare the producer groups because at least one sample does not follow a normal distribution and most are highly skewed. Error bar plots were used to assess the significance of differences between the producer groups. Aggregate income from the different forms in which coffee was sold was computed and used for comparison. The study also evaluates the food security situation of farmers across the producer groups in the last production year. The key indicators used show that, in general, adoption of the various coffee certification programs has positive impacts on income and food security.
In the course of this study, the following areas for further research emerged: an evaluation of farmers' livelihoods before the intervention, to ascertain whether their livelihoods have changed because of the adoption of certification or because of other factors; and the development of a stepwise procedure for identifying outliers and ascertaining their validity, since the outlier detection methods used here were subjective.
Keywords: Coffee Certifications, Smallholder Farmers, Livelihoods Assessment, Outlier Detection
1.1. Impact Assessment
Carney (1998) defined a livelihood as comprising the capabilities, assets (including both material and social resources) and activities required for a means of living. A livelihood is sustainable when it can cope with and recover from stresses and shocks and maintain or enhance its capabilities and assets both now and in the future, while not undermining the natural resource base. Impact Assessment (IA) is a process of systematic and objective identification of the short- and long-term effects (positive or negative, direct or indirect, intended or unintended, primary or secondary) on households, institutions and the environment caused by ongoing or completed development activities such as programs or projects. Impact is the difference between what would happen with the action and what would happen without it. The purpose of impact assessment is to provide a better understanding of the extent of activities, the objectives fulfilled and the magnitude of effects. IA involves observing, measuring and describing how the conditions being assessed have been influenced. Impact can be expressed through direct effects on income from increased adoption and use of technologies, measured by the number of farmers or the area planted with an improved technology, yield increases, productivity growth and the economic effects of adopting new technologies. The indicators by which a program is to be assessed are taken as given, as appropriate to the type of program. Knowing the impact is of obvious interest in its own right as a means of measuring the aggregate benefits from the program. However, when reducing poverty is the overall objective of the program, we also want to know the incidence of the welfare gains (Ravallion, 2003).
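Written as an equation (the notation is ours, not the source's), the definition of impact compares the outcome $Y_i$ of a unit $i$ (a farmer, household, etc.) with the intervention against its outcome without it. Only one of the two outcomes is ever observed, so the "without" term has to be estimated, for instance from a comparison group such as the non-certified farmers in this study.

$$\Delta_i = Y_i^{\text{with}} - Y_i^{\text{without}}, \qquad \bar{\Delta} = \frac{1}{n}\sum_{i=1}^{n} \Delta_i$$

Here $\bar{\Delta}$ is the aggregate (average) impact over $n$ units; the incidence of welfare gains that Ravallion (2003) refers to concerns how the individual $\Delta_i$ are distributed across units, not only their mean.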
IAs assess the difference in the values of key variables between the outcomes for 'agents' (individuals, enterprises, households, populations, policymakers, etc.) that have experienced an intervention and the values those variables would have taken had there been no intervention (Hulme, 1997). IA studies have recently become popular with donor agencies and, in consequence, have become an increasingly significant activity for recipient agencies. In part this reflects a cosmetic change, with the term IA simply being substituted for evaluation. But it has also been associated with a greater focus on the outcomes of interventions, rather than on inputs and outputs. While the goals of IA studies commonly incorporate both 'proving' impacts and 'improving' interventions, IAs are more likely to prioritize the proving goal than did the evaluations of the 1980s. A set of factors is associated with the extreme 'pole' positions of this continuum, and these underpin many of the issues that must be resolved (and the personal and institutional tensions that arise) when impact assessments are initiated (Hulme, 1997).
1.2. Methodological Challenges
In surveys, concern for data quality means that considerable effort goes into calculating metadata about quality to support the series being produced. Metadata for this purpose come in a variety of styles. In some cases, such as sampling errors, they are readily calculated from theory. In other cases the quality aspect of primary interest, such as non-response bias, is not easily measured, but a relatively easy calculation, the response rate, gives an indicator of the magnitude of the bias, or at least of the risk that the bias will be large enough to affect the interpretation of the statistics. In a few cases, such as measurement error, very little can be done with the survey data alone, and the only real way to measure quality is an expensive follow-up study. In other dimensions, such as relevance, there is no direct quantification, and only circumstantial information can be provided. In this study, the data were downloaded into a single spreadsheet instead of multiple sheets for ease of navigation. The metadata provided only partially describe the quality of the data, and a number of variables are categorical, which limits the types of analysis that can be done.
1.3. Statement of the Problem
Coffee production is the main livelihood strategy for most smallholder farmers, and coffee certification has been the principal means of maintaining the sustainability of their livelihoods. They have grown coffee for a long time in the expectation that their livelihoods would improve significantly, but, ironically, their livelihoods have stagnated and in the worst cases continued to deteriorate despite the adoption of new coffee production technologies. To address these trends, it is necessary to evaluate the revenue earned from coffee production. However, farmers' yields have been based on subjective estimates, which are not accurate enough for calculating the revenue accrued from farming.
The main objective of the study was to compare the livelihoods of farmers in the different producer groups with respect to their income and food security situation. The study was guided by the following specific objectives:
i. To explore the data and identify outliers.
ii. To compare the income of the different producer groups.
iii. To compare the food security situation of the different producer groups.
iv. To find the relationship between income and food security situation.
2. Literature Review
2.1. Coffee Certifications
Declining coffee prices considerably affect the livelihoods of producer farmers, as they depend largely on income from coffee to meet most of their basic household needs. Lower prices mean, for instance, that they cannot afford to send their children to school or to buy medicines or food. According to Mayne et al. (2002), many farmers were forced to sell assets such as cattle and cut essential expenses, including food, during the price slump between 1999 and 2002. Smallholder livelihoods suffered when international coffee commodity prices plummeted from 1999 to 2004. In response to the coffee crisis, non-governmental organizations (NGOs), selected coffee companies and several coffee producer cooperatives spearheaded efforts to expand sustainable coffee certification programs (Bacon et al., 2008). The impacts of the drop in coffee prices on small-scale and micro producers (less than 14 hectares) included rapidly declining incomes, resulting in hunger, crop abandonment and a range of related problems. The owners of medium-scale farms (14 to 35 hectares) often stopped employing farm workers and decreased management intensity. The largest plantations (more than 35 hectares) employed most of the farm workers and had higher monetary costs of production due to dense cropping patterns, dependence on paid labor and intensive chemical inputs. When international coffee prices were high, high yields and low wages made these operations profitable. When prices fell below the costs of production, banks stopped offering credit and foreclosed on debt-ridden large landholdings (Bacon et al., 2008).
Certification is an instrument for adding value to a product, and it addresses a growing worldwide demand for healthier and more socially and environmentally friendly products. It is based on the idea that consumers are motivated to pay price premia for products that meet certain precisely defined and assured standards (Ponte, 2004b). The social and economic challenges that small-scale coffee producers face today in many coffee producing countries have given strong impetus to the Fair trade movement. Fair trade is a voluntary certification scheme that seeks to challenge the unequal terms of trade in the global coffee value chain in order to facilitate sustainable development. It is an alternative trade initiative promoting a different approach both to the conventional global trading system (free trade) and to development systems (protectionism and development aid) through the philosophy of 'trade-not-aid' (Raynolds, 2002). Certifications are often seen as a solution to the problems of unstable commodity markets. Certification schemes have emerged as one approach to raising the economic, social and environmental standards of coffee production as well as trade (Ponte, 2004a).
2.2. Livelihood Assessment
A livelihood comprises the capabilities, assets and activities required for a means of living. The assets include natural, material and social resources such as land, livestock, machines, tools, stocks of money, education, skills and social networks, while activities include productive ventures such as farming and livestock keeping. Current understanding of livelihoods places considerable emphasis on ownership of, or access to, assets that can be put to productive use as the building blocks by which the poor make their living (Ellis, 2000). Bania et al. (2007) observed that many simple correlations have been noted between food insufficiency and a range of factors, including the level of household income, food stamp receipt, demographics, household composition, education, physical and mental health status, and geography. Lewis (2005) analyzed the Mexican coffee sector, focusing on the links among low coffee prices, migration, and certified coffee production and trade. The results show that although remittances from migrants help finance coffee production, increased migration drains human capital out of the region, which raises the opportunity cost of labor and hence local wages, thus raising the costs of coffee production.
These findings raise doubts about the sustainability of the Fair Trade-organic coffee model in the face of migration opportunities. Bacon (2005) found that, in the Nicaraguan context, Fair trade and organic networks can provide security and increased income but do not offset the many factors leading to a general decline in farmers' quality of life. Wollni and Zeller (2007) used data from coffee farmers in Costa Rica to determine the factors that make farmers participate in a specialty coffee market. They found that certified farmers receive significant price premia compared with their non-certified counterparts, and that social capital, when captured as participation in a cooperative, is highly significant for the decision to grow specialty coffee. Dasgupta (1989) found that the level of education is a strong and significant determinant of farmers' adoption of improved agricultural technologies.
2.3. Outlier Detection
The purpose of outlier detection is to discover unusual data whose behavior is exceptional compared with the rest of the data set. Examining the extraordinary behavior of outliers helps to uncover valuable knowledge hidden behind them and helps decision makers to make profits or improve service quality. Hence, outlier mining is an important area of data mining research with numerous applications, including credit card fraud detection, discovery of criminal activities in electronic commerce, weather prediction, marketing and statistics. Detection methods fall into two classes: univariate and multivariate methods. In univariate methods, observations are examined individually, whereas multivariate methods take into account associations between variables in the same dataset. Classical outlier detection methods are powerful when the data contain only one outlier; however, their performance decreases drastically when more than one outlier is present (Hadi, 1992).
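The weakness of classical methods when several outliers are present (often called masking) can be seen in a small, invented univariate example. The sketch below compares the classical z-score rule (|z| > 3) with a commonly used robust alternative, the modified z-score built on the median and the median absolute deviation (MAD) with a cut-off of 3.5. The robust rule is swapped in purely for illustration; it is not a method used in this study.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Classical rule: flag x when |x - mean| / sd exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [x for x in values if abs(x - mean) / sd > threshold]

def mad_outliers(values, threshold=3.5):
    """Robust rule: modified z-score based on the median and the MAD,
    which a handful of outliers barely moves."""
    med = statistics.median(values)
    mad = statistics.median(abs(x - med) for x in values)
    return [x for x in values if 0.6745 * abs(x - med) / mad > threshold]

base = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11] * 2   # invented "clean" values

print(zscore_outliers(base + [40]))               # -> [40]: a single outlier is found
print(zscore_outliers(base + [40, 41, 42, 40]))   # -> []: four outliers mask each other
print(mad_outliers(base + [40, 41, 42, 40]))      # -> [40, 41, 42, 40]
```

The four large values inflate the standard deviation so much that none of them reaches |z| > 3, whereas the median and MAD are computed almost entirely from the clean values.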
Although outliers are typically detected by comparison with other observations in a redundant data set, an outlier is not just an observation that deviates from other observations. Random errors can be large and, as long as the sources of error are correctly understood, the Standard Uncertainty (SU) will also be large and comparable to the size of the deviations. If such an observation is merged with other observations, it will have an appropriate influence on the mean value, depending on the precision of the other observations. Problems arise only when the error is much larger than would be expected from the SU. An outlier is therefore an observation that is unlikely to be correct within its error limits (Read, 1999).
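The idea that an observation is an outlier only when its deviation is large relative to its SU can be sketched as follows. This is an illustrative rule of our own, not Read's (1999) procedure: each observation is compared with the inverse-variance weighted mean of the other observations, and the difference is scaled by the combined SU; the cut-off k = 3 and the measurements are hypothetical.

```python
import math

def su_outliers(values, su, k=3.0):
    """Return indices of observations that differ from the inverse-variance
    weighted mean of the *other* observations by more than k combined SUs.
    k = 3 is an illustrative cut-off, not a value from the source."""
    flagged = []
    for i, (x, u) in enumerate(zip(values, su)):
        others = [(v, s) for j, (v, s) in enumerate(zip(values, su)) if j != i]
        weights = [1.0 / s ** 2 for _, s in others]
        mean_others = sum(w * v for w, (v, _) in zip(weights, others)) / sum(weights)
        u_mean = 1.0 / math.sqrt(sum(weights))        # SU of that weighted mean
        d = (x - mean_others) / math.sqrt(u ** 2 + u_mean ** 2)
        if abs(d) > k:
            flagged.append(i)
    return flagged

# Invented measurements: ten mutually consistent values plus one at 10.8
values = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.8]
print(su_outliers(values, [0.1] * 11))           # -> [10]: 10.8 is implausible at SU 0.1
print(su_outliers(values, [0.1] * 10 + [0.5]))   # -> []: the same value is fine at SU 0.5
```

The same deviation is flagged or accepted depending only on the stated uncertainty, which is the distinction the paragraph draws.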
3.1. Source of Data
Secondary coffee survey data collected by COSA and partners for analyzing the impact of sustainability standards form the basis of this study. The data were entered into an online database and were accessed by downloading them from http://surveys.tcosa.org/CosaSurveys.html in June 2011. After the launch of the COSA application, users can work with it either online or offline, provided that Google Gears is installed. The features of the survey builder enable standardized customization, so that the basic survey can easily be adapted to different languages, crops and specific conditions in different countries. The COSA methodology was built upon annual field visits to farms located throughout the major growing regions to gather information based on a common set of measures/indicators. The basic parameters of the full methodology include:
i. Farm visits over a minimum of a three-year period to discern measurable changes over time resulting from the implementation of different initiatives;
ii. Indicator selection criteria using SMART concepts;
iii. Farm selection criteria ensuring balanced representation across:
• The six major sustainability initiatives operative in the coffee sector (Organic, Fair Trade, 4Cs, Utz Certified, Rainforest Alliance and Starbucks C.A.F.E. Practices);
• Major coffee growing regions (Africa, Asia and Latin America);
• Small and large farms (based on national norms);
• Distinct agro-ecological zones (rainfall, altitude, etc.);
• Coffee types (Robusta, Arabica, etc.); and
• Different production systems (traditional shade, intensive sun, etc.).
COSA envisions the future global availability of comparably-defined data so that producers and policy-makers can better determine how they compare with producers operating in different regions or applying similar or different standards (Giovannucci et al., 2008).
A sample of 1035 farmers was interviewed. The information collected included the socio-economic characteristics of the farmers, coffee inputs and outputs, assets, and factors of production such as labor and fertilizers together with their costs. The socio-economic variables included the level of education, the number of years of coffee farming, the land tenure situation and the use of improved coffee varieties.
The geographical areas in the North, South and West of Tanzania from which the sample was drawn cover 80% of the coffee growers. Multi-stage cluster sampling was used to select the farmers who were interviewed. In the first stage, the coffee growing areas in Tanzania and the active certification programs were identified. Second-level producer groups (PGs) that had obtained certification were then used to obtain the sampling frame of first-level PGs. Random sampling was used to select the first-level PGs and to select villages with farmers in those PGs. From the sampling frame of first-level PG members, farmers were randomly selected in the selected villages. After the certified second-level PGs had been selected, the certified treatment groups were identified: Starbucks C.A.F.E. Practices (CP), Fair Trade (FT), Organic, FT and Utz, FT and Organic, and FT and CP. PGs operating under conditions similar to those of the certified groups were then identified and approached to obtain lists of their members for sampling. Farmers were selected from these second-level groups by a process similar to that used for the certified sample. In this study, the term 'producer' is used to mean the person(s) responsible for producing the commodity on the farm. In most cases this is the smallholder (female or male) who owns the farm, but it may also be a farm manager, caretaker or another person who can provide information on farm management and production.
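The nested random draws described above can be sketched as a three-stage procedure: producer groups first, then villages within each selected group, then farmers from each selected village's member list. The frame layout, the function name and the sample sizes below are hypothetical; the study's actual frames and sizes are not reproduced here.

```python
import random

def multistage_sample(frame, n_groups, n_villages, n_farmers, seed=0):
    """frame: {producer_group: {village: [farmer_id, ...]}} (hypothetical layout).
    Stage 1 draws producer groups, stage 2 draws villages within each selected
    group, and stage 3 draws farmers from each selected village's member list."""
    rng = random.Random(seed)                 # seeded so the draw is reproducible
    sample = {}
    for pg in rng.sample(sorted(frame), n_groups):                       # stage 1
        villages = frame[pg]
        for village in rng.sample(sorted(villages), min(n_villages, len(villages))):  # stage 2
            members = villages[village]
            sample[(pg, village)] = rng.sample(members, min(n_farmers, len(members)))  # stage 3
    return sample

# Toy frame: 3 producer groups x 4 villages x 10 member farmers (all invented)
frame = {f"PG{g}": {f"V{g}{v}": [f"F{g}{v}{f}" for f in range(10)] for v in range(4)}
         for g in range(3)}
for (pg, village), farmers in multistage_sample(frame, 2, 2, 3, seed=1).items():
    print(pg, village, farmers)
```

Because farmers are drawn only inside selected villages of selected groups, the sample is clustered, which is why later comparisons treat producer groups as the unit of analysis.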
3.3. Variable Selection
The variables behind the key indicators used in this study (Table 1) were identified, retrieved and aggregated to generate the indicators.
The income indicators were computed by aggregating the variables in the second column of Table 1; all the food security variables used are categorical.
|Key indicators||Variables used||Type|
|Block_income_revenue||Q10.3.3-Kg coffee sold Q10.3.4-Price producer received per kg Q10.4.2-Kg sold as not certified but produced as certified Q10.4.3-Price producer received per kg||Continuous|
|Total_crop_revenue||Q10.2.2-Kg of not certified coffee sold Q10.2.3-Price producer received per not certified kg Q10.3.3-Kg coffee sold Q10.3.4-Price producer received per kg Q10.4.2-Kg sold as not certified but produced as certified Q10.4.3-Price producer received per kg Q23.1-How much was the income.||Continuous|
|Coffee_revenue_per_ha||Q10.2-Kg of not certified coffee sold Q10.2.3-Price producer received per not certified kg Q10.3.3-Kg coffee sold Q10.3.4-Price producer received per kg Q10.4.2-Kg sold as not certified but produced as certified Q10.4.3-Price producer received per kg Q22.5.1, Q22.6.1 and Q22.7.1- Plot area.||Continuous|
|Revenue_ha||Q10.3.3-Kg coffee sold Q10.3.4-Price producer received per kg Q10.4.2-Kg sold as not certified but produced as certified Q10.4.3-Price producer received per kg Q22.5.1, Q22.6.1 and Q22.7.1- Plot area.||Continuous|
|Price_cert_sold_uncert||Q10.4.3-Price producer received per kg||Continuous|
|Price_uncert||Q10.2.3-Price producer received per not certified kg.||Continuous|
|Average_price_all_coffee_sold||Q10.3.4-Price producer received per kg Q10.4.3-Price producer received per kg Q10.2.3-Price producer received per not certified kg.||Continuous|
|zero_days_hunger One_nine_days_hunger Ten_twentynine_days_hunger Thirty_or_more_days_hunger||Q17.1- How many days of food insufficiency||Categorical|
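Table 1 lists which questions feed each income indicator but not how they are combined. The sketch below assumes the most natural reading (each revenue term is kilograms sold times price per kilogram, missing answers count as zero, and plot area is the sum of Q22.5.1, Q22.6.1 and Q22.7.1) and computes three of the indicators for one farmer. The record is invented, and the combination rule is our assumption rather than the author's documented procedure.

```python
def income_indicators(record):
    """record maps Table 1 question codes to numbers for one farmer.
    Assumptions (not stated in the source): each revenue term is kg sold
    times price per kg, missing answers count as 0, and plot area is the
    sum of Q22.5.1, Q22.6.1 and Q22.7.1."""
    def q(code):
        return record.get(code) or 0.0

    certified = q("Q10.3.3") * q("Q10.3.4")          # sold as certified
    cert_sold_uncert = q("Q10.4.2") * q("Q10.4.3")   # certified, sold as not certified
    uncertified = q("Q10.2.2") * q("Q10.2.3")        # not certified
    area = q("Q22.5.1") + q("Q22.6.1") + q("Q22.7.1")
    block = certified + cert_sold_uncert
    return {
        "Block_income_revenue": block,
        "Total_crop_revenue": block + uncertified + q("Q23.1"),
        "Revenue_ha": block / area if area else None,  # undefined without plot area
    }

# Invented farmer record, for illustration only
record = {"Q10.3.3": 500, "Q10.3.4": 2000, "Q10.4.2": 100, "Q10.4.3": 1500,
          "Q10.2.2": 50, "Q10.2.3": 1200, "Q23.1": 30000,
          "Q22.5.1": 0.5, "Q22.6.1": 0.25}
print(income_indicators(record))
```

Returning None when no plot area is recorded keeps such farmers out of the per-hectare comparisons instead of producing a division-by-zero error.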
3.4. Test for the Distribution of Data
To test whether the data are normally distributed, their skewness and kurtosis should lie within ±1 and ±3, respectively. Descriptive statistics were run to obtain the skewness and kurtosis, and each value was then divided by its standard error; that is, skewness was assessed by comparing its value with the standard error of skewness. If the result lies within the range, normality is not considered seriously violated. Bulmer (1979) suggested the following rule of thumb:
• If the skewness is < -1 or > +1, the distribution is highly skewed.
• If the skewness is between -1 and -½ or between +½ and +1, the distribution is moderately skewed.
• If the skewness is between -½ and +½, the distribution is approximately symmetric.
With a skewness of 0.1098, the data are approximately symmetric by this rule.
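The skewness check and Bulmer's rule can be sketched as below. We assume the package reports the adjusted Fisher-Pearson coefficient G1 and its usual standard error, sqrt(6n(n-1)/((n-2)(n+1)(n+3))), which is what common statistical packages print; the income values are invented.

```python
import math

def skewness(x):
    """Adjusted Fisher-Pearson sample skewness G1."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5

def se_skewness(n):
    """Standard error of G1 for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def bulmer_class(s):
    """Bulmer's (1979) rule of thumb, as listed above."""
    if abs(s) > 1:
        return "highly skewed"
    if abs(s) > 0.5:
        return "moderately skewed"
    return "approximately symmetric"

income = [1, 1, 1, 2, 2, 3, 10]     # invented, right-skewed values
g1 = skewness(income)
print(round(g1, 2), round(g1 / se_skewness(len(income)), 2), bulmer_class(g1))
print(bulmer_class(0.1098))         # the value quoted above
```

The ratio g1 / se_skewness(n) is the "value divided by its standard error" described in the text; Bulmer's rule is applied to the skewness itself.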
3.5. Outlier Analysis Using the Box and Whisker Plot
A box-and-whisker plot represents the dispersion of the observations graphically. It gives a sense of the data distribution through five summary statistics: the minimum, maximum, first quartile, second quartile (median) and third quartile. The whiskers extend a fixed distance, defined in terms of the inter-quartile range, beyond the lower and upper quartiles, and observations beyond the whiskers are plotted individually as outliers. Box plots of the income indicator variables were plotted by PG, with the outliers labeled by village. We then identified the number of outliers per variable and their distribution across the villages. The number of survey sheets per village was identified and used to compute the percentage of errors per village. We then compiled a list of the same variables by village, without the outliers, to determine which villages were to be eliminated. Values identified as outliers were removed from the dataset before the subsequent box plots were drawn. We checked the randomness of the outliers after every plot to ascertain how many rounds of cleaning were needed.
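A minimal sketch of the box-plot rule and the per-village error percentages described above. It assumes the conventional whisker length of 1.5 times the IQR, Python's "inclusive" quartile definition (statistical packages differ slightly here), fences computed separately for each producer group as in the per-PG box plots, and a hypothetical record layout of (producer_group, village, value).

```python
import statistics
from collections import Counter

def tukey_fences(values, k=1.5):
    """Lower and upper box-plot fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def outliers_by_village(records, k=1.5):
    """records: list of (producer_group, village, value) tuples (hypothetical).
    Returns the percentage of each village's survey sheets flagged as outliers."""
    by_pg = {}
    for pg, _, value in records:
        by_pg.setdefault(pg, []).append(value)
    fences = {pg: tukey_fences(vals, k) for pg, vals in by_pg.items()}
    sheets = Counter(village for _, village, _ in records)
    flagged = Counter(village for pg, village, value in records
                      if not fences[pg][0] <= value <= fences[pg][1])
    return {v: 100.0 * flagged[v] / sheets[v] for v in sheets}

# Invented records for one producer group and two villages
records = [("A", "V1", 10), ("A", "V1", 12), ("A", "V1", 11),
           ("A", "V2", 13), ("A", "V2", 12), ("A", "V2", 95)]
print(outliers_by_village(records))   # V1: 0 %, V2: one of three sheets flagged
```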
3.6. Comparison of Producer Groups Income
For two independent samples (certified and non-certified), we used the Mann-Whitney U test to compare the two independent groups on the dependent variable. The Mann-Whitney U test is a non-parametric test that can be used in place of an unpaired t-test. It is used to test the null hypothesis that two samples come from the same population (i.e. have the same median) or, alternatively, whether observations in one sample tend to be larger than observations in the other (Shier, 2004). The test statistic U is the smaller of

$$U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1 \quad \text{and} \quad U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2,$$

where
$n_1$ = the number of observations in group 1
$n_2$ = the number of observations in group 2
$R_i$ = the sum of the ranks assigned to group $i$
The assumptions made for this test are that the dependent variable is at least ordinally scaled, the independent variable has only two levels, and the subjects are not matched across conditions.
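A minimal, dependency-free sketch of the U statistic defined above: pool both groups, give tied values the average of their ranks, and apply the two formulas. It returns the statistic together with U1 and U2 (which always sum to n1·n2). The group values are invented; for p-values in practice, `scipy.stats.mannwhitneyu` is the usual tool.

```python
def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                          # extend the run of tied values
        avg = (i + j) / 2 + 1               # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(group1, group2):
    """Return (U, U1, U2) with U = min(U1, U2)."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return min(u1, u2), u1, u2

print(mann_whitney_u([1, 2, 3], [4, 5, 6]))   # complete separation: U = 0
print(mann_whitney_u([1, 2, 2], [2, 3]))      # ties share the average rank
```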
The Kruskal-Wallis test was used to compare the different producer groups. Its test statistic is

$$H = \frac{12}{N(N+1)}\sum_{i=1}^{k}\frac{R_i^2}{n_i} - 3(N+1),$$

where
$k$ = the number of independent samples
$n_i$ = the number of cases in the $i$th sample
$N$ = the total number of cases
$R_i$ = the sum of the ranks in the $i$th sample
The assumptions made for this test are that the samples were taken randomly and independently of each other and that the populations have approximately the same shape.
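The Kruskal-Wallis statistic H can be computed directly from pooled ranks. This sketch gives tied values the average of their ranks but omits the usual tie correction to H, so it matches the formula above exactly; `scipy.stats.kruskal` applies that correction and also returns a p-value from the chi-square approximation with k - 1 degrees of freedom. The groups are invented.

```python
def kruskal_wallis_h(*groups):
    """H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), without tie correction."""
    pooled = sorted(v for g in groups for v in g)
    N = len(pooled)
    # average 1-based rank for each distinct value (ties share the mean rank)
    first, last = {}, {}
    for pos, v in enumerate(pooled, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    rank = {v: (first[v] + last[v]) / 2 for v in first}
    total = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (N * (N + 1)) * total - 3 * (N + 1)

print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))   # well separated groups
print(kruskal_wallis_h([1, 2], [1, 2]))                    # identical groups: H = 0
```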
3.7. Comparison of Producer Groups Food Security Situation
In this study, food security was described in the context of food availability for consumption by any member of the farm family. Indices were computed to support a decision on the farmers' food security situation across the producer groups, by counting the households in each category of days of food insufficiency during the last production year. The results are presented as percentages: the producer groups with the highest percentage of households in the 0 days category were considered the most food secure, and those with the highest percentage in the 30 or more days category the most food insecure.
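A sketch of the food-security percentages, assuming the number of days of food insufficiency (Q17.1) is available as a count for each household; if the survey recorded only the category, the binning step can be skipped. The category labels mirror Table 1, and the household data are invented.

```python
from collections import Counter, defaultdict

CATEGORIES = ("0 days", "1-9 days", "10-29 days", "30+ days")

def hunger_category(days):
    """Map a count of food-insufficient days to the four Table 1 categories."""
    if days == 0:
        return CATEGORIES[0]
    if days <= 9:
        return CATEGORIES[1]
    if days <= 29:
        return CATEGORIES[2]
    return CATEGORIES[3]

def food_security_table(households):
    """households: iterable of (producer_group, days_of_food_insufficiency).
    Returns {producer_group: {category: percentage of households}}."""
    counts = defaultdict(Counter)
    for pg, days in households:
        counts[pg][hunger_category(days)] += 1
    table = {}
    for pg, c in counts.items():
        n = sum(c.values())
        table[pg] = {cat: 100.0 * c[cat] / n for cat in CATEGORIES}
    return table

households = [("A", 0), ("A", 0), ("A", 5), ("A", 40),
              ("B", 0), ("B", 12), ("B", 31), ("B", 35)]   # invented
t = food_security_table(households)
print(max(t, key=lambda pg: t[pg]["0 days"]))     # most food secure group
print(max(t, key=lambda pg: t[pg]["30+ days"]))   # most food insecure group
```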
3.8. Relationship Between Income and Food Security
An error bar plot was generated to evaluate the relationship between the farmers' income and their food security situation. Block income revenue was plotted against the four categories of food insufficiency (0 days, 1-9 days, 10-29 days and 30 or more days).
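The quantities behind such an error bar plot are a mean and a confidence interval for each category. The sketch below assumes 95% intervals based on the normal approximation (small groups would call for a t critical value) and uses invented incomes; the plotting itself is left out.

```python
import math
import statistics

def error_bars(groups, confidence=0.95):
    """groups: {food-security category: [block income revenue, ...]}.
    Returns {category: (mean, lower, upper)} using a normal-approximation
    confidence interval for the mean (an assumption)."""
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    bars = {}
    for category, values in groups.items():
        m = statistics.mean(values)
        half = z * statistics.stdev(values) / math.sqrt(len(values))
        bars[category] = (m, m - half, m + half)
    return bars

def overlap(a, b):
    """True when two (mean, lower, upper) intervals overlap."""
    return a[1] <= b[2] and b[1] <= a[2]

groups = {"0 days": [10, 12, 14], "30+ days": [2, 4, 6]}   # invented incomes
bars = error_bars(groups)
print(bars)
print(overlap(bars["0 days"], bars["30+ days"]))
```

Non-overlapping 95% intervals are only an informal signal of a difference; overlapping intervals do not by themselves show that a difference is non-significant.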
4. Empirical Results and Discussion
4.1. Test for Normality
In this study, the normality of the dataset was tested by running descriptive statistics to obtain the skewness and kurtosis together with their respective standard errors. It was found that the SE of skewness for FT South lay outside the expected interval (-0.366 to +0.366). In a normal distribution, the values of skewness and kurtosis are zero. Positive values indicate a pile-up of data points on the left of the distribution, whereas negative values indicate a pile-up of data points on the right. The further these values are from zero, the more likely it is that the data are not normally distributed. A histogram and a Q-Q plot of block income revenue for all target crops were plotted as representative of the other indicators.
The graphical representation (Figure 1) shows that these data are not normally distributed. The histogram, with a normal curve superimposed, shows an asymmetrical bell shape, with more of the values lying to the left than to the right. In the Q-Q plot the reference line lies at almost 45 degrees, but the observations deviate noticeably from it. These results, together with the descriptive analysis, suggest that the samples do not all follow a normal distribution; non-parametric methods were therefore used to compare the certification types and PGs.
|Producer group||Certified (%)||Non-certified (%)|
|C.A.F.E. Practices Control Mbozi||0||18.0|
|C.A.F.E. Practices Lima||147.7||0|
|Fair trade South||20.0||0|
|Fair trade North||10.0||0|
|Fair trade North/Fair Trade and Organic Control||0||21.8|
|Fair trade South/Fair trade and Utz Control||0||27.3|
|Fair trade and C.A.F.E. Practices Control||0||12.8|
|Fair trade and C.A.F.E. Practices Kilicafe||17.3||0|
|Fair trade and Organic||16.7||0|
|Fair trade and Utz||19.2||0|
4.2. The Distribution of the Producer Groups
The survey covered 12 producer groups, classified as either certified or non-certified; the distribution of the sampled farmers is shown in Table 3. The farmers were sampled from 122 villages; 52.7% were certified and 47.3% non-certified. Among the certified farmers, FT and Utz had the highest percentage (19.2%), and among the non-certified farmers FT South/FT and Utz Control had the highest percentage (27.3%).
4.3. Outliers’ Analysis Using Box and Whisker Plots
4.3.1. Sequential Identification of Outliers
Further exploratory analysis of the variables was done using box plots to display the spread of the data at a glance. This showed the overall shape of the graphed data, including its symmetry and departures from assumptions. According to Hawkins (1980), an outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism. Johnson and Wichern (2002) also defined an outlier as an observation in a dataset which appears to be inconsistent with the remainder of that set of data. In this study, we considered outliers to be data that lie outside the expected range of the data distribution, and an outlier analysis was necessary for the purpose of data validation. Outliers can indicate errors, and since the data used in this study are secondary data, it was not possible to check whether the outliers were true values or erroneous data. Erroneous data can arise either from the enumerators during data collection (non-random errors) or during data entry (random errors).
Davies and Gather (1993) drew an important distinction between single-step and sequential procedures for outlier detection. Single-step procedures identify all outliers at once, as opposed to the successive elimination or addition of data points. In sequential procedures, one observation is tested for being an outlier at each step. Outliers caused by errors may occur frequently, while outliers caused by events tend to have a much smaller probability of occurrence (Martincic and Schwiebert, 2006).
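The difference between the two kinds of procedure can be sketched with a simple z-score rule (threshold 3, an illustrative choice; this is not the specific procedure analyzed by Davies and Gather (1993)). In the invented data below, a very large value inflates the standard deviation enough to hide a moderate one from the single-step test, whereas the sequential test removes the largest value first, recomputes, and then exposes the second.

```python
import statistics

def z_scores(values):
    """Absolute z-scores against the mean and SD of the given values."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [abs(v - mean) / sd for v in values]

def single_step(values, threshold=3.0):
    """Test every observation once, against one mean and SD."""
    return [v for v, z in zip(values, z_scores(values)) if z > threshold]

def sequential(values, threshold=3.0):
    """Test only the most extreme observation; if it exceeds the threshold,
    remove it, recompute mean and SD on what is left, and repeat."""
    data, removed = list(values), []
    while len(data) > 2:
        z = z_scores(data)
        i = max(range(len(data)), key=z.__getitem__)
        if z[i] <= threshold:
            break
        removed.append(data.pop(i))
    return removed

sample = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11] * 2 + [40, 200]   # invented
print(single_step(sample))   # -> [200]
print(sequential(sample))    # -> [200, 40]
```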
Erroneous data normally appear as an arbitrary change and are extremely different from the rest of the data. Because such errors affect data quality, they need to be identified and, if possible, corrected, since corrected data may still be usable for analysis. Before addressing how these outliers were identified, we emphasize that not all of them are wrong numbers: they may justifiably be part of the group and may lead to a better understanding of the phenomena being studied. When an outlier is detected, the analyst is faced with a number of questions (Andrews and Pregibon, 1978):
• Is the measurement process out of control?
• Is the model wrong?
• Is some transformation required?
• Is there an identifiable subset of observations that is important in its different behavior?
An exploratory analysis of the income indicators was done using box-and-whisker plots; total crop revenue is used here as an example for all 7 indicators.
From Figure 2, all the PGs had outliers. C.A.F.E. Practices Control Mbozi showed the highest variability of the observations and the highest number of outliers (12), all above the upper whisker, while C.A.F.E. Practices Lima showed 4 outliers above the upper whisker. FT South, FT North, FT North/FT and Organic Control and Fair trade South/Fair trade and Utz Control showed less variability, each with two outliers. Fair trade and C.A.F.E. Practices Control, Fair trade and C.A.F.E. Practices Kilicafe and Fair trade and Organic showed outliers clustered around the upper whisker. Fair trade and Utz had 4 outliers. Organic and Organic Control also had outliers clustered around the two whiskers. There was minimal variability in the observations in most of the producer groups. The outliers were randomly distributed.
In Figure 3, after the outliers were deleted from the original dataset, the box plots for all PGs except Fair trade South became clearer and the number of outliers was reduced. C.A.F.E. Practices Control Mbozi still showed extreme values (around 4,000,000) as outliers. C.A.F.E. Practices Lima, Fair trade and C.A.F.E. Practices Control and Fair trade and C.A.F.E. Practices Kilicafe all had the same median value, each with at least 1 outlier. Fair trade North, Fair trade and Organic, Fair trade and Utz, Organic and Organic Control each had 2 outliers. Fair trade North/Fair Trade and Organic Control had the highest number of outliers (7), clustered around the upper whisker. Fair trade South did not show any outliers. After the high values identified in Figure 2 were eliminated, the outliers were still random and some of the PGs started to show some variability.
In Figure 4, every PG showed at least one outlier, with C.A.F.E. Practices Control Mbozi and Fair trade and C.A.F.E. Practices Control showing extreme values. C.A.F.E. Practices Lima had 1 outlier, while Fair trade and C.A.F.E. Practices Control and Fair trade and C.A.F.E. Practices Kilicafe had the same median, with 5 outliers each. Fair trade South showed the least variability, with 1 outlier, as did Fair trade North. Organic and Organic Control also had the same median value, with 4 outliers each. Fair trade North/Fair Trade and Organic Control and Fair trade South/Fair trade and Utz Control showed less variability, with 4 and 5 outliers respectively.
From Figure 5, the highest number of outliers was clustered around Fair trade and C.A.F.E. Practices Control, followed by Fair trade North/Fair Trade and Organic Control with 3 outliers and then C.A.F.E. Practices Control Mbozi with 2 extreme outliers. Fair trade South/Fair trade and Utz Control, Fair trade and Organic and Fair trade and Utz each had 1 outlier. Fair trade South, Fair trade North, Fair trade and C.A.F.E. Practices Kilicafe, Organic and Organic Control had no outliers (50%), with Fair trade South showing the least variability in the data. To summarize the key indicators, we computed descriptive statistics for each indicator (Table 4) to show how the sample size N, the mean and the standard deviation changed as the data were cleaned three times.
| Round of cleaning | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
From Table 4, the sample size N for all the indicators decreased from one round of cleaning to the next because outliers were deleted sequentially. After the third round of cleaning, the reduction in N was largest for Revenue_ha (104), followed by coffee_revenue_per_ha (75), and smallest for price_cert_sold_uncert (22). The mean of each indicator increased when extremely low values were trimmed off and decreased when extremely high values were trimmed off.
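Table 4 tracks N, the mean and the standard deviation over three rounds of cleaning. A minimal sketch of that procedure, assuming (as stated in the conclusion) that each round deletes values more than three standard deviations from the current mean and then recomputes the statistics; the function name and the data are hypothetical.

```python
from statistics import mean, stdev

def clean_rounds(values, rounds=3, k=3.0):
    """Return [(N, mean, SD)] for the raw data (round 0) and after each cleaning round."""
    data = list(values)
    summary = [(len(data), mean(data), stdev(data))]
    for _ in range(rounds):
        m, s = mean(data), stdev(data)
        # keep points within k standard deviations of the *current* mean
        data = [v for v in data if abs(v - m) <= k * s]
        summary.append((len(data), mean(data), stdev(data)))
    return summary

# Hypothetical indicator values: 18 typical farmers plus two large reports
values = [12, 9, 11, 10, 13, 8, 10, 11, 9, 12, 10, 11, 9, 10, 12, 11, 10, 9, 60, 400]
for rnd, (n, m, s) in enumerate(clean_rounds(values)):
    print(f"round {rnd}: N={n}, mean={m:.2f}, SD={s:.2f}")
```

With these values N goes 20, 19, 18, 18: the first round removes 400, and 60 is only removed in the second round, once 400 no longer inflates the standard deviation. This is the masking effect discussed in Section 4.3.3, and it is one reason why repeated rounds keep shrinking the sample.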
4.3.2. Distribution of the Outliers
The distribution of outliers across the PGs for all the key indicators was determined by calculating their percentages (Table 5).
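| Producer group | N | Outliers, round 0 | Outliers, round 1 | Outliers, round 2 | Outliers, round 3 | % round 0 | % round 1 | % round 2 | % round 3 |
|---|---|---|---|---|---|---|---|---|---|
| C.A.F.E Practices Lima | 72 | 25 | 14 | 9 | 4 | 34.7 | 19.4 | 12.5 | 5.6 |
| FT North/FT and Organic control | 119 | 39 | 26 | 26 | 16 | 32.8 | 21.8 | 21.8 | 13.4 |
| FT South/FT and Utz Control | 149 | 27 | 37 | 26 | 13 | 18.1 | 24.8 | 17.4 | 8.7 |
| FT and C.A.F.E. Practices Control | 70 | 16 | 15 | 10 | 10 | 22.9 | 21.4 | 14.3 | 14.3 |
| FT and C.A.F.E. Practices Kilicafe | 85 | 31 | 29 | 15 | 15 | 36.5 | 34.1 | 17.6 | 17.6 |
| FT and Organic | 82 | 19 | 20 | 7 | 5 | 23.2 | 24.4 | 8.5 | 6.1 |
| FT and Utz | 94 | 24 | 27 | 20 | 9 | 25.5 | 28.7 | 21.3 | 9.6 |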
From Table 5, before the data were cleaned, FT North had the highest percentage of outliers (87.8%), followed by FT South (60%), and Fair trade South/Fair trade and Utz Control had the lowest (18.1%). In the first round of cleaning the percentages fell, with FT North still the highest (42.9%) and Fair trade and C.A.F.E Practices Control the lowest (21.4%). The percentages continued to drop in the second round, and in the third round Fair trade South had the highest (30%) and Organic the lowest (1%). Fair trade North had a relatively high percentage of outliers (more than 50%) because relatively few questionnaires (49) were administered in that PG. The outliers are not randomly distributed across the producer groups (Table 5): their percentages vary from one PG to the next, and no two PGs have the same number of outliers. Outliers in Fair trade North and Fair trade South were clustered both before and after the data were cleaned three times.
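The percentage columns of Table 5 are the number of outliers found in each round divided by the group's original N (not the reduced N after each round). A short check of that arithmetic for three rows of the table; the function name is hypothetical.

```python
def outlier_percentages(n, counts):
    """Percentage of a PG's original N flagged as outliers in each round, to 1 decimal place."""
    return [round(100 * c / n, 1) for c in counts]

# (N, outliers in rounds 0-3) copied from rows of Table 5
rows = {
    "FT North/FT and Organic control": (119, [39, 26, 26, 16]),
    "FT South/FT and Utz Control": (149, [27, 37, 26, 13]),
    "FT and Utz": (94, [24, 27, 20, 9]),
}
for pg, (n, counts) in rows.items():
    print(pg, outlier_percentages(n, counts))
```

Each printed list matches the corresponding percentage columns of Table 5 (for example 32.8, 21.8, 21.8 and 13.4 for FT North/FT and Organic control).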
4.3.3. Source of Outliers
The detection of influential subsets or multiple outliers is more difficult, owing to masking and swamping problems. Masking occurs when an outlier is not detected because of the presence of others, while swamping occurs when a non-outlier is wrongly identified because of the effect of some hidden outliers (Pena and Yohai, 1995). Possible sources of outliers are recording and measurement errors, an incorrect distributional assumption, an unknown data structure, or a novel phenomenon (Iglewicz and Hoaglin, 1993). It is well known that outliers can seriously affect any inferences drawn if they are not treated appropriately. Their detection and treatment can, however, considerably increase the computational effort. Nevertheless, removing the effect of outliers can improve the quality of the data used for statistical inference, although isolated outliers may also have a positive impact on the results of data analysis and data mining. Simple statistical estimates such as the sample mean and standard deviation can be significantly biased by individual outliers that lie far from the middle of the distribution. The box plots below present the outliers when each variable was plotted against the PG.
The amount of coffee sold had the highest number of outliers (43), followed by income from other crops (37) and the amount of coffee produced as certified but sold as uncertified (9). All 7 variables except kg of coffee sold contained zero entries. Income from other crops had both the highest number of extreme high values and the highest number of zeros (216). The outliers were randomly distributed across the producer groups, and on the basis of these results the most appropriate procedure is to perform the cleaning twice, because after the second round the mean and the sample size are reduced. The sample size was reduced by 127, which means that further rounds of cleaning are likely to lose many observations. Since these were secondary data, it is difficult to verify whether the extreme values were genuine outliers or the actual values reported by the farmers.
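The masking effect described by Pena and Yohai (1995) is easy to reproduce with the mean ± 3 SD rule used in this study. In the hypothetical sample below, two equal extreme values inflate the standard deviation so much that neither is flagged. For contrast, the sketch also applies the median/MAD-based modified z-score of Iglewicz and Hoaglin (1993), flagging |M| > 3.5; this robust rule is a swapped-in illustration, not the procedure used in this study, and the function names are hypothetical.

```python
from statistics import mean, median, stdev

def sd_outliers(values, k=3.0):
    """Flag values more than k standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > k * s]

def modified_z_outliers(values, cutoff=3.5):
    """Flag values whose modified z-score 0.6745*(x - median)/MAD exceeds cutoff."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # score undefined when at least half the values equal the median
    return [v for v in values if abs(0.6745 * (v - med) / mad) > cutoff]

data = [9, 10, 11, 10, 12, 8, 10, 11, 100, 100]
print(sd_outliers(data))          # [] : the two 100s inflate the SD and mask each other
print(modified_z_outliers(data))  # [100, 100]
```

Because the median and MAD are barely moved by the two large values, the robust score still separates them from the rest of the sample.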
4.4. Comparison of the Producer Groups
To compare the two groups, certified and non-certified, we tested the following hypotheses:
H0: Certified and non-certified farmers have the same income.
H1: Their incomes differ.
Significance level: α = 0.05; rejection region: reject the null hypothesis if p-value ≤ 0.05.
| | Total crop revenue | Revenue_ha | Block_income_revenue_all_target_crop_revenue | Coffe_reveue_per_ha | Price_uncert | Price_cert_sold_uncert | Average_price_all_coffee_sold |
|---|---|---|---|---|---|---|---|
| Asymp. Sig. (2-tailed) | 0.52 | 0 | 0.61 | 0.87 | | 0.02 | 0.04 |
| Asymp. Sig. (1-tailed) | 0.26 | 0 | 0.3 | 0.44 | 0 | 0.01 | 0.02 |
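For Revenue_ha (0), Price_uncert (0), Price_cert_sold_uncert (0.01) and Average_price_all_coffee_sold (0.02), the one-tailed p-values are ≤ α = 0.05, so we reject the null hypothesis that certified and non-certified farmers have the same income and conclude that the two certification types differ significantly on these indicators. For total crop revenue, block income and coffee revenue per hectare the p-values exceed 0.05, so no significant difference was detected.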
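The table reports asymptotic ("Asymp. Sig.") p-values but does not name the test. Given the non-parametric approach of this study, a natural candidate for comparing two independent groups is the Mann-Whitney U test; the sketch below assumes that test and uses the large-sample normal approximation with a tie correction and no continuity correction. It is a self-contained illustration, not necessarily the routine that produced the reported values, and the function names and revenues are hypothetical.

```python
import math
from collections import Counter

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]

def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided asymptotic p-value), tie-corrected."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    r = average_ranks(pooled)
    u1 = sum(r[:n1]) - n1 * (n1 + 1) / 2
    # tie correction: sum of (t^3 - t) over groups of tied values
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var == 0:  # every observation identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p

# Hypothetical revenue per hectare (Tsh thousands)
certified = [420, 510, 380, 600, 450, 700, 520]
non_certified = [300, 350, 410, 280, 390, 330]
print(mann_whitney_u(certified, non_certified))
```

Rank-based tests compare the groups without assuming normality, which suits indicators as skewed as those described in this study; a one-tailed p-value is half the two-tailed value when the difference lies in the hypothesized direction.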
To compare the 12 producer groups, we tested the following hypotheses:
H0: All the producer groups have the same income.
H1: The income of at least one producer group is different.
Significance level: α = 0.05; rejection region: reject the null hypothesis if p-value ≤ 0.05.
| Total crop revenue | Revenue_ha | Block_income_revenue_all_target_crop_revenue | Coffe_reveue_per_ha | Price_uncert | Price_cert_sold_uncert | Average_price_all_coffee_sold |
|---|---|---|---|---|---|---|
Since p = 0 ≤ α = 0.05 for all indicators, we reject the null hypothesis and conclude that, at the α = 0.05 level of significance, there is enough evidence of a difference in income among the producer groups.
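For the comparison of the 12 producer groups, the rank-based test for k independent samples is the Kruskal-Wallis H test; this is again an assumption, since the text does not name the test. The sketch computes H with the tie correction and refers it to a chi-square distribution with k − 1 degrees of freedom, evaluating the chi-square tail from the series for the regularized incomplete gamma function so that only the standard library is needed. The function names and example groups are hypothetical.

```python
import math
from collections import Counter

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]

def chi2_sf(x, df, max_terms=1000):
    """P(X > x) for chi-square(df), via the series for the regularized lower gamma P(df/2, x/2)."""
    a, z = df / 2, x / 2
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < total * 1e-16:
            break
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

def kruskal_wallis(*groups):
    """Return (H, df, asymptotic p-value) for k independent samples, tie-corrected."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    r = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        h += sum(r[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)  # tie correction; assumes not all values are equal
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)

# Three small hypothetical groups with no overlap between them
print(kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # H ≈ 7.2, df = 2, p ≈ 0.027
```

With 12 producer groups the reference distribution has 11 degrees of freedom; for very large H the computed tail probability becomes 0 to machine precision.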
The results in Figure 13 show two categories of income earned by the farmers, distributed across the two certification types: those who earned below Tsh 60,000 and those who earned above Tsh 60,000. C.A.F.E practices control Mbozi and FT/C.A.F.E practices control are non-certified, yet their block income is the highest (both above Tsh 100,000). FT South, which is certified, had the lowest block income. Organic and Organic control had equal block income even though they belong to different certification types. This suggests that other factors are likely to have contributed to the rise in farmers' income, or that the farmers were already established when the certification programs were introduced, so the impact of the intervention was negligible.
C.A.F.E practices control Mbozi, which is non-certified, had the highest total crop revenue (above Tsh 120,000), while Organic and Organic control had the same total crop revenue.
Coffee revenue per hectare was randomly distributed across the two certification types. C.A.F.E practices control Mbozi had the highest coffee revenue per hectare (Tsh 1,250,000), and FT/C.A.F.E practices control and FT/C.A.F.E practices Kilicafe had the same coffee revenue per hectare.
Figure 4.16 shows a random distribution of revenue per hectare across the two certification types. C.A.F.E practices control Mbozi had the highest revenue per hectare, followed by FT North/FT and Organic control; both of these groups are non-certified. Most of the certified farmers had revenue per hectare below Tsh 500,000.
4.5. Comparison of Farmers Food Security Across the Producer Groups
The frequency of food insufficiency in each producer group was divided into 4 categories (0 days, 1-9 days, 10-29 days and 30+ days) (Table 8).
These results suggest that certified farmers are generally food secure: in every producer group the largest share of farmers reported 0 days of insufficient food, and the percentages dropped sharply as the interval of days of food insufficiency widened. FT North and FT North/FT organic control had 6.1% and 2.5%, respectively, in the 30+ days interval. In the 0 days interval, FT South had the highest percentage (100%), followed by FT South/FT and Utz control (96.4%), and the lowest was C.A.F.E practices Lima (61.1%). In the 1-9 days interval, FT North/FT organic control had the highest percentage (12.6%), followed by Organic (10.2%), and the lowest were FT South and FT and Organic, each with 0%. None of the producer groups reported food insufficiency in the 10-29 days interval. This shows that farmers from the FT South PG are the most food secure and those from FT North the most food insecure, because FT North has the highest percentage (6.1%) in the 30+ days interval of insufficient food.
| Producer group | 0 days (%) | 1-9 days (%) | 10-29 days (%) | 30+ days (%) |
|---|---|---|---|---|
| C.A.F.E. Practices Control Mbozi (N=98) | 81.6 | 3.1 | 0 | 0 |
| Fair trade - South (N=10) | 100 | 0 | 0 | 0 |
| Fair trade North (N=49) | 69.4 | 8.2 | 0 | 6.1 |
| FT North/FT Organic Control (N=119) | 76.5 | 12.6 | 0 | 2.5 |
| FT South/FT and Utz Control (N=149) | 96.4 | 0.7 | 0 | 0 |
| FT and C.A.F.E. Practices Control (N=70) | 77.1 | 5.7 | 0 | 0 |
| FT and C.A.F.E. Practices Kilicafe (N=85) | 94.1 | 2.4 | 0 | 0 |
| FT and Organic (N=82) | 91.5 | 0 | 0 | 0 |
| FT and Utz (N=94) | 80.9 | 1.1 | 0 | 0 |
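Table 8 groups each farmer's reported number of days without enough food into four intervals. A minimal sketch of that tabulation, assuming (as the row totals below 100% suggest) that percentages are taken over the group's full N, so farmers who gave no response stay in the denominator; the function names and responses are hypothetical.

```python
from collections import Counter

INTERVALS = ("0 days", "1-9 days", "10-29 days", "30+ days")

def interval(days):
    """Map a reported number of days without enough food to a Table 8 interval."""
    if days == 0:
        return "0 days"
    if days <= 9:
        return "1-9 days"
    if days <= 29:
        return "10-29 days"
    return "30+ days"

def interval_percentages(responses):
    """Percentage of all farmers in each interval; None marks a missing response."""
    n = len(responses)  # non-responses remain in the denominator
    counts = Counter(interval(d) for d in responses if d is not None)
    return {c: round(100 * counts[c] / n, 1) for c in INTERVALS}

# Hypothetical producer group of 10 farmers, one of whom did not answer
responses = [0, 0, 0, 0, 0, 0, 0, 5, 45, None]
print(interval_percentages(responses))
# {'0 days': 70.0, '1-9 days': 10.0, '10-29 days': 0.0, '30+ days': 10.0}
```

Under this assumption the four percentages in a row sum to the response rate rather than to 100%, which is the pattern seen in several rows of Table 8.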
4.6. Relationship Between Income and Food Security
The number of days on which any member of the farm family did not have enough to eat during the last production year was evaluated across the producer groups. Revenue per hectare was used to compare the food security situation of the certified and non-certified groups. In this context, food security is coded 0 for days of food insufficiency and 1 for days of food sufficiency (Figure 17 and Figure 18).
Figure 17 shows that FT and Utz had equal numbers of farmers reporting days of food security and days of food insecurity. C.A.F.E practices Lima, FT North and FT and C.A.F.E practices Kilicafe had farmers whose days of food security outnumbered their days of food insecurity. FT and Organic and Organic each had farmers whose days of food insecurity outnumbered their days of food security. None of the farmers from FT South were food insecure.
Figure 18 shows two categories: farmers whose revenue per hectare is above Tsh 500,000 and those whose revenue per hectare is below Tsh 500,000. C.A.F.E practices Mbozi, FT North/FT and organic control, FT South/FT and Utz control and FT and C.A.F.E practices control had farmers whose days of food security outnumbered their days of food insecurity. Only Organic control had farmers whose days of food insecurity outnumbered their days of food security.
5. Conclusion and Recommendation
This study showed that outlier analysis by deleting data points that deviated from the mean by more than three standard deviations reduced the sample size, generally to reflect the average farmer in the certification scheme rather than the whole population. Since the validity of the data could not easily be checked, we deleted the outliers and treated them as missing data in the subsequent analysis. We also found that livelihood improvement in the certification schemes has been determined by a wide range of factors apart from joining a producer group, and these have influenced the revenue that farmers get from coffee farming. In addition, the factors that influence which certification a farmer joins differ by certification type. Food security was affected by different factors, since even some certified farmers had insufficient food in the last production season, and vice versa. The key indicators we used to assess farmers' livelihoods showed that, generally, adoption of the various coffee certification programs has positive impacts on income and food security. The tests used showed significant differences between the producer groups. Generally, the certified farmers were more food secure than their counterparts in the last production season.
From these findings, we recommend that outliers identified in the first round of outlier analysis be deleted from the dataset. Outliers identified in the second round should not be deleted from the dataset but should be excluded when performing descriptive analysis. Information on food security should be collected in a standardized way: rather than asking farmers for the number of days on which they had a food deficit in the last production season, they should be asked for the number of months, and preferably the names of the months, in which they had insufficient food. The scale of measurement for coffee yields should be normalized.
I wish to express my sincere thanks to Dr. Dagmar Mithoefer and Ms. Eddah Nangole, both of ICRAF GRP 3, for permission to use the COSA survey data and for technical support, respectively.