Modelling Extreme Temperature Using Extreme Value Theory: A Case Study of Northern Kenya

The impacts of extremely high temperatures on the health of plants, humans and animals have been studied in several parts of the world. However, extreme events are rare and have attracted attention only recently. In this study, extreme temperature behavior was modelled by applying extreme value theory to monthly maximum temperatures over a 36-year period. Monthly maximum temperature data from the Mandera, Wajir and Lodwar stations were modelled using the generalized extreme value (GEV) and generalized Pareto (GP) distributions. The GEV model performed better in modelling extreme temperature behavior because it had the lowest AIC and BIC values. Two goodness-of-fit tests, the Anderson-Darling and Kolmogorov-Smirnov tests, confirmed that the GEV model was adequate for the data. Diagnostic checks of the two models using probability-probability (PP) plots, quantile-quantile (QQ) plots, return level plots and the mean residual life plot showed that the GEV distribution fitted the data well. Return levels estimated for return periods of 5, 10, 20, 50 and 100 years increased with the length of the return period.


Background of Study
Recent special reports on climate extremes have shown evidence of changes in the patterns of climate extremes at global, regional and local scales. Understanding the characteristics of climate extremes at regional and local levels is critical not only for developing preparedness and early warning systems, but also for developing any adaptation strategy. The East African region is prone to climate and weather extremes, has a highly variable climate, and has relatively high levels of population exposure and vulnerability. Kenya in particular is no stranger to extreme rainfall events: Parry, Echeverria, Dekens and Maitima [1] found that Kenya's exposure to climate risk is high, with major droughts about every 10 years and moderate droughts or floods every 3 to 4 years, and the country is regarded as one of the most disaster-prone in the world. Research on extreme temperatures has been carried out in several countries, with most researchers interested in developing appropriate statistical methods for extreme events. In the past few years, several studies have examined extreme climatic events [2][3][4][5][6]. Most of this work is based on extreme value theory (EVT), a branch of statistics that deals with the asymptotic behavior of extreme events. EVT has been applied in meteorology, hydrology, the study of ecological disturbances and finance, with the aim of characterizing rare events and the tails of distributions. Since Kenya has an agriculture-based economy and temperatures driven by climate change pose great challenges and opportunities, the importance of modelling annual extreme temperatures in Kenya cannot be overemphasized.
Economic planners, climatologists, meteorologists and policy makers in Kenya need to understand extreme temperature patterns and their likely future behavior for effective decision making, planning and mitigation. Extreme value theory (EVT) provides pertinent tools for modelling and predicting extreme temperatures in Kenya [7], and this is the focus of this article.

Statement of the Problem
Temperature extremes are considered among the most important climate events and have been extensively explored over the past several decades. Nearly a third of the people in Africa already live in drought-prone areas, and climate change is estimated to add more than 80 million people to those at risk of hunger by 2080 [8]. With increasing anthropogenic influence evident in the climate system, the IPCC (2007) projects such events to increase over the coming century. It is increasingly apparent that, alongside the ongoing research and debate on climate change, many parts of Africa are already experiencing the dire consequences of erratic climatic conditions that are likely associated with regional climatic changes [9]. This is expected to pose unprecedented challenges to most African economies, which depend heavily on predominantly rain-fed agriculture.

General Objective
The main objective of this study is to develop a model which can be used to predict extreme temperature return levels for given return periods.

Specific Objectives
1) To determine the appropriate distribution for the tail of the temperature distribution.
2) To determine the exceedance probabilities for selected temperature levels.
3) To determine the return periods and the corresponding return levels for temperature in the region.

Justification of the Study
Kenya faces serious threats to socio-economic development from climate-related events such as prolonged drought, flash floods, unpredictable rainfall and extreme weather. Several published papers have analyzed extreme rainfall using either the generalized extreme value (GEV) or the generalized Pareto (GP) distribution, which shows the importance of modelling rainfall in different regions of the world, such as Europe. The observational and statistical modelling results of these studies have shown remarkable increases in the intensity of precipitation extremes. However, there has been little or no published research that has attempted to model extreme temperatures in Kenya using the GEV or GP distribution. This paper therefore appears to be the first application of the GEV and GP distributions to extreme temperatures in Kenya. It should help decision-makers, risk managers and climatology researchers understand the behavior of extreme temperatures, so that they can develop appropriate policies and plans and prepare the general public for changes due to extreme temperatures.

Data and Research Methodology
The objectives of any research cannot be achieved without analyzing some form of empirical data. Accordingly, secondary data comprising records of monthly maximum temperatures were obtained from the meteorological services department. The data span 1980 to 2016 and consist of the maximum temperature for each of the twelve months of every year. Extreme value analysis was performed by fitting both the generalized extreme value distribution and the generalized Pareto distribution using maximum likelihood estimation (MLE).

Generalized Extreme Value Distribution
Consider a sequence X_1, X_2, ..., X_n of independent and identically distributed (iid) random variables with a common distribution function F [8]. Let M_n = max(X_1, X_2, ..., X_n); the exact distribution of M_n is H_n(x) = Pr(M_n ≤ x) = F(x)^n. Suppose that there exist sequences of constants b_n > 0 and a_n such that

$$
\Pr\left\{ \frac{M_n - a_n}{b_n} \le x \right\} \to G(x) \quad \text{as } n \to \infty,
$$

where G is a non-degenerate distribution function. Then G belongs to the Gumbel, Fréchet or Weibull family. The cumulative distribution functions of these three families can be summarized by the GEV distribution

$$
G(x) = \exp\left\{ -\left[ 1 + \zeta \left( \frac{x - \mu}{\sigma} \right) \right]^{-1/\zeta} \right\},
$$

defined on the set {x : 1 + ζ(x - µ)/σ > 0}, where x are the extreme values from the blocks, µ is a location parameter, σ > 0 is a scale parameter and ζ is a shape parameter. The case ζ = 0 is interpreted as the limit ζ → 0, which gives G(x) = exp{-exp[-(x - µ)/σ]}. The shape parameter determines the family:
a) ζ = 0: Gumbel distribution
b) ζ > 0: Fréchet distribution
c) ζ < 0: Weibull distribution, with a finite upper end point µ - σ/ζ
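As a minimal sketch of the GEV distribution function above, the Python code below evaluates G(x) for any shape ζ, applies the support condition 1 + ζ(x - µ)/σ > 0, and reports the family and, for ζ < 0, the finite upper end point µ - σ/ζ. The function names and the small tolerance used to treat ζ as zero are illustrative assumptions, not part of any particular package.

```python
import math

def gev_cdf(x, mu, sigma, zeta, eps=1e-12):
    """GEV distribution function G(x) with location mu, scale sigma > 0 and shape zeta."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if abs(zeta) < eps:
        # Gumbel case: the limit of G(x) as zeta -> 0
        return math.exp(-math.exp(-z))
    t = 1.0 + zeta * z
    if t <= 0.0:
        # Outside the support: below the lower end point (Frechet, zeta > 0)
        # or above the upper end point (Weibull, zeta < 0)
        return 0.0 if zeta > 0 else 1.0
    return math.exp(-t ** (-1.0 / zeta))

def gev_family(zeta, eps=1e-12):
    """Extreme value family implied by the sign of the shape parameter."""
    if abs(zeta) < eps:
        return "Gumbel"
    return "Frechet" if zeta > 0 else "Weibull"

def gev_upper_endpoint(mu, sigma, zeta):
    """Finite upper end point mu - sigma/zeta when zeta < 0, otherwise infinity."""
    return mu - sigma / zeta if zeta < 0 else math.inf
```

For example, gev_family(-0.29423) returns "Weibull", matching the sign of the GEV shape estimate reported later in this paper; any values used for µ and σ in such a call would be hypothetical.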

Generalized Pareto Distribution
At the heart of the threshold exceedance approach lies the GPD. The GPD was pioneered by Balkema and de Haan [9] and formally introduced by Pickands III [10] as an appropriate asymptotic model for the stochastic behavior of excesses above a high threshold. Smith [12] and Coles (2001) consider the peaks-over-threshold (POT) approach a better alternative to the block maxima (or block minima) approach because it uses more of the available information. The data exceeding the threshold are modelled by the GPD, whose CDF is

$$
H(y) = 1 - \left( 1 + \frac{\zeta y}{\sigma_u} \right)^{-1/\zeta},
$$

defined for y > 0 and 1 + ζy/σ_u > 0, where y = x - u is the excess of an observation x over the threshold u, σ_u > 0 is the scale parameter and ζ is the shape parameter, which equals the shape parameter of the corresponding GEV distribution. For ζ = 0 the CDF is interpreted as the limit H(y) = 1 - exp(-y/σ_u).
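The sketch below evaluates the GPD distribution function H(y) and the usual tail approximation P(X > x) ≈ φ_u [1 - H(x - u)] for x above the threshold u, where φ_u = P(X > u) would in practice be estimated by the proportion of observations exceeding u. The names gpd_cdf, tail_exceedance and phi_u are illustrative assumptions.

```python
import math

def gpd_cdf(y, sigma_u, zeta, eps=1e-12):
    """GPD distribution function H(y) for an excess y = x - u over the threshold u."""
    if sigma_u <= 0:
        raise ValueError("sigma_u must be positive")
    if y <= 0.0:
        return 0.0
    if abs(zeta) < eps:
        return 1.0 - math.exp(-y / sigma_u)  # exponential limit as zeta -> 0
    t = 1.0 + zeta * y / sigma_u
    if t <= 0.0:
        return 1.0  # beyond the finite upper end point (possible only when zeta < 0)
    return 1.0 - t ** (-1.0 / zeta)

def tail_exceedance(x, u, phi_u, sigma_u, zeta):
    """Approximate P(X > x) for x > u, where phi_u estimates P(X > u)."""
    if x <= u:
        raise ValueError("x must lie above the threshold u")
    return phi_u * (1.0 - gpd_cdf(x - u, sigma_u, zeta))
```

The factor φ_u converts the conditional GPD probability, which only describes excesses, into an unconditional probability for the original observations.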

Threshold Selection
The analysis of extremes based on the POT approach is valid provided the threshold above which observations are treated as extreme values is neither too high nor too low. When the threshold is too high, there are few excesses above it, and the estimates have a large variance. Conversely, a threshold that is too low undermines the asymptotic basis of the GPD approximation, which introduces bias [13]. The threshold must therefore be high enough for the approximation to hold while leaving enough excesses to balance bias against variance. Among the several threshold selection tools proposed in the literature, this section discusses a few that are frequently used.
Threshold selection is analogous to the choice of block size in the block maxima approach. The choice is not straightforward, and a compromise usually has to be found. A high threshold reduces the bias, because the GPD approximation to the excesses becomes more accurate, but it increases the variance of the GPD parameter estimators, because fewer data are available to estimate the parameters. A low threshold has the opposite effect: a higher bias but a lower variance, because more data are available. Various graphical techniques have therefore been proposed for selecting an appropriate threshold, including the mean excess plot, the parameter stability plot and selection based on empirical quantiles.
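Selection based on empirical quantiles can be sketched as follows: candidate thresholds are taken at high sample quantiles, and the number of excesses at each one shows the bias-variance trade-off directly (higher thresholds leave fewer excesses). The probability levels below and the function names are illustrative assumptions; the quantile rule is linear interpolation, which matches NumPy's default "linear" method.

```python
def empirical_quantile(data, q):
    """Empirical quantile by linear interpolation between order statistics."""
    xs = sorted(data)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def threshold_candidates(data, probs=(0.80, 0.85, 0.90, 0.95)):
    """For each probability level q: (q, candidate threshold u, number of excesses over u)."""
    rows = []
    for q in probs:
        u = empirical_quantile(data, q)
        k = sum(1 for x in data if x > u)
        rows.append((q, u, k))
    return rows
```

The number of excesses falls as q rises, which is the variance side of the trade-off discussed above; graphical checks are still needed to judge the bias side.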

Model Diagnostics
Exploratory data analysis is often used to assess the goodness of fit of sample observations to specific target distributions [11]. Several graphical tools have been widely used to detect heavy-tailed or extremal behavior in observed data. Because any combination of the extreme value model parameters can be modelled as a function of time or other covariates when analyzing variables such as temperature or rainfall, there is a wide range of models to choose from, and selecting the best-fitting model becomes an essential issue. We employ quantile-quantile plots, probability-probability plots, mean excess plots, density plots and return level plots to assess the quality of the fitted models.
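The coordinates behind the PP and QQ plots can be sketched for a fitted GEV distribution as below. The plotting position i/(n + 1) is one common choice and is an assumption here, as are the function names; the Gumbel case is omitted to keep the sketch short. For a good fit, both sets of points lie close to the unit diagonal.

```python
import math

def gev_cdf(x, mu, sigma, zeta):
    """GEV distribution function for zeta != 0 (the Gumbel case is omitted for brevity)."""
    t = 1.0 + zeta * (x - mu) / sigma
    if t <= 0.0:
        return 0.0 if zeta > 0 else 1.0
    return math.exp(-t ** (-1.0 / zeta))

def gev_quantile(p, mu, sigma, zeta):
    """Inverse of gev_cdf for 0 < p < 1 (zeta != 0)."""
    return mu + sigma / zeta * ((-math.log(p)) ** (-zeta) - 1.0)

def pp_qq_points(sample, mu, sigma, zeta):
    """PP pairs (empirical, model probability) and QQ pairs (model, empirical quantile)."""
    xs = sorted(sample)
    n = len(xs)
    pp, qq = [], []
    for i, x in enumerate(xs, start=1):
        p_emp = i / (n + 1)  # plotting position of the i-th order statistic
        pp.append((p_emp, gev_cdf(x, mu, sigma, zeta)))
        qq.append((gev_quantile(p_emp, mu, sigma, zeta), x))
    return pp, qq
```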

Model Selection
When two candidate models compete for the same data set, it is important to test which of them fits the data better [14]. There are many measures of how well a model fits the data; the two employed in this study are the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). The AIC is based on the log-likelihood but adds a penalty term proportional to the number of estimated parameters. Because the fit of a model can always be improved by adding more parameters, the AIC balances goodness of fit against model complexity. The BIC works in the same way but uses a penalty that also grows with the sample size, so it penalizes additional parameters more heavily for all but very small samples. For both criteria, the model with the lower value is preferred.
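The two criteria are AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L, where log L is the maximized log-likelihood, k the number of estimated parameters (3 for the GEV distribution, 2 for the GPD with the threshold held fixed) and n the number of observations used in the fit. The sketch below computes both; the helper lowest and any log-likelihood values passed to these functions are illustrative assumptions.

```python
import math

def aic(loglik, k):
    """Akaike information criterion for maximized log-likelihood loglik and k parameters."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion; the penalty k*ln(n) grows with the sample size n."""
    return k * math.log(n) - 2 * loglik

def lowest(scores):
    """Name of the model with the smallest criterion value, e.g. {'GEV': ..., 'GPD': ...}."""
    return min(scores, key=scores.get)
```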

Goodness of Fit Tests
The Kolmogorov-Smirnov and Anderson-Darling goodness-of-fit tests are used to assess how well the fitted distributions agree with the data. The Kolmogorov-Smirnov test [15] is used to decide whether a sample comes from a hypothesized continuous distribution; its statistic is the largest vertical difference between the theoretical and the empirical cumulative distribution functions. The Anderson-Darling test is a related test that gives more weight to discrepancies in the tails of the distribution, which makes it particularly relevant for extremes.
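The two test statistics can be sketched directly from their definitions, given any fitted distribution function cdf (for instance, a GEV distribution with estimated parameters). This sketch computes only the statistics; turning them into p-values requires the appropriate reference distributions, which are not reproduced here. The function names and the clipping constant eps are illustrative assumptions.

```python
import math

def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov statistic: largest vertical gap between the empirical CDF and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides of the jump
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ad_statistic(sample, cdf, eps=1e-12):
    """Anderson-Darling statistic A^2, which weights tail discrepancies more than KS."""
    xs = sorted(sample)
    n = len(xs)

    def clip(p):
        # Keep probabilities away from 0 and 1 so the logarithms stay finite
        return min(max(p, eps), 1.0 - eps)

    s = 0.0
    for i in range(1, n + 1):
        f_low = clip(cdf(xs[i - 1]))   # F(x_(i))
        f_high = clip(cdf(xs[n - i]))  # F(x_(n+1-i))
        s += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - s / n
```

Smaller values of either statistic indicate closer agreement, which is how the two candidate distributions can be ranked.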

Return Level Estimate
The return level is the level that is expected to be exceeded on average once every t time periods, that is, with probability p = 1/t in any one period. In this study, the return level is the maximum temperature amount, and t corresponds to the selection intervals. The analysis is based on monthly rainfall data in Tanzania available from the World Meteorological Organization. The data were recorded from 1901 to 2015 by considering the available monthly rainfall data, and the data set contains 1380 values of monthly rainfall. The table below shows the statistical summaries of annual extreme temperature.

Unit Root Test
To satisfy the stationarity assumption of the generalized extreme value family of distributions, a KPSS test was conducted. The null hypothesis of the KPSS test is that the series is stationary, and the results in Table 1 indicate that the maximum temperature series were stationary. To further confirm stationarity, an augmented Dickey-Fuller (ADF) test was conducted; the results in Table 2 indicate that the series were stationary at the 5% significance level. The Mann-Kendall (MK) test, which does not require normally distributed data and is well suited to data sets with missing values, was also performed to detect the presence of a monotonic (increasing or decreasing) trend. Its null hypothesis is that there is no trend, and the alternative is that a trend is present; the p-value gives the probability of observing a trend at least as strong as the one in the data by chance alone. If the p-value is less than 0.05, the trend is considered significant at the 5% level. As shown in Table 3, the p-value is greater than 0.05 at all three stations, so we conclude that there is no significant trend.
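A minimal sketch of the Mann-Kendall test is given below. It computes the S statistic from the signs of all pairwise differences, uses the no-ties variance n(n - 1)(2n + 5)/18 with a continuity correction, and returns a two-sided p-value from the normal approximation. Omitting the correction for tied values is a simplifying assumption; temperatures recorded to one decimal place may contain ties, in which case a tie-corrected variance should be used.

```python
import math

def mann_kendall(x):
    """Mann-Kendall trend test: returns (S, Z, two-sided p-value); no tie correction."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # variance of S under H0 without ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return s, z, p
```

A p-value above 0.05 means the null hypothesis of no trend is not rejected at the 5% level.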

Fitting the Generalized Extreme Value Distribution
The stationary maximum temperature series was modelled using the generalized extreme value distribution. Table 4 displays the parameter estimates and the AIC and BIC values of the fitted model. The negative shape parameter tentatively suggests that the Weibull member of the GEV family fits the data well. In addition, the confidence interval for the shape parameter excludes zero, so the distribution indeed belongs to the Weibull family. The exceedance probabilities in the tail of the distribution were then explored; Table 5 presents the probabilities of exceeding selected temperatures. Each exceedance probability is interpreted as the probability that the maximum temperature exceeds the given temperature. From Table 5, the probability of observing a temperature of 39°C was very small (0.0010), so the maximum temperature is unlikely to exceed 39°C. However, the probability of observing a maximum temperature of 34°C or more is about 0.3002. Several graphical techniques were then used to assess the fit of the GEV distribution to the data: the QQ-plot, the density plot, the return level plot and a QQ-plot based on data randomly generated from the fitted GEV distribution, as shown in Figure 2.
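Exceedance probabilities of the kind reported in Table 5 are 1 - G(x) under the fitted GEV distribution. The sketch below computes them; the shape value -0.29423 is taken from the text, but the location and scale in the usage line are hypothetical placeholders (the estimates in Table 4 are not reproduced here), so the resulting numbers are not the paper's results.

```python
import math

def gev_exceedance(x, mu, sigma, zeta):
    """P(X > x) = 1 - G(x) for a GEV distribution with zeta != 0."""
    t = 1.0 + zeta * (x - mu) / sigma
    if t <= 0.0:
        # Above the upper end point (zeta < 0) or below the lower one (zeta > 0)
        return 0.0 if zeta < 0 else 1.0
    # 1 - exp(-a) computed as -expm1(-a) to keep accuracy for tiny probabilities
    return -math.expm1(-t ** (-1.0 / zeta))

MU, SIGMA, ZETA = 33.0, 2.0, -0.29423  # MU and SIGMA are hypothetical placeholders
table = {x: gev_exceedance(x, MU, SIGMA, ZETA) for x in (34.0, 36.0, 39.0)}
```

Because the shape is negative, the probability drops to exactly zero beyond the upper end point µ - σ/ζ, which is why very high temperatures receive vanishing exceedance probabilities.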

Parameter Stability Plots
First, for the parameter stability plot, a range of 30 thresholds was selected with the nature of the data in mind. By varying the minimum and maximum thresholds using the in2extRemes package in R, a minimum threshold of 25 and a maximum threshold of 30 were found to yield the most stable parameter estimates.
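The idea behind the parameter stability plot can be sketched as follows: if excesses over u_0 follow a GPD, excesses over any higher threshold u also follow a GPD with the same shape ζ and scale σ_u = σ_{u_0} + ζ(u - u_0), so both ζ and the modified scale σ_u - ζu should stay roughly constant above a valid threshold. This sketch swaps in the simple method-of-moments GPD estimator (valid only for ζ < 1/2) instead of the maximum likelihood fitting done in in2extRemes, and the simulated data, the minimum number of excesses and the function names are assumptions for illustration.

```python
import random

def gpd_mom(excesses):
    """Method-of-moments GPD estimates (sigma_u, zeta); valid only for zeta < 1/2."""
    n = len(excesses)
    m = sum(excesses) / n
    v = sum((y - m) ** 2 for y in excesses) / (n - 1)
    r = m * m / v  # equals 1 - 2*zeta for a GPD
    return 0.5 * m * (r + 1.0), 0.5 * (1.0 - r)

def stability_table(data, thresholds, min_exceed=30):
    """Rows (u, number of excesses, zeta estimate, modified scale sigma_u - zeta*u)."""
    rows = []
    for u in thresholds:
        exc = [x - u for x in data if x > u]
        if len(exc) < min_exceed:
            continue  # too few excesses for a stable estimate
        sigma_u, zeta = gpd_mom(exc)
        rows.append((u, len(exc), zeta, sigma_u - zeta * u))
    return rows

def simulate_gpd(n, sigma, zeta, seed=1):
    """Hypothetical test data: GPD(sigma, zeta) draws over threshold 0 by inverting H."""
    rng = random.Random(seed)
    return [sigma / zeta * ((1.0 - rng.random()) ** (-zeta) - 1.0) for _ in range(n)]
```

Plotting the last two columns of stability_table against u gives the two panels of a parameter stability plot.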

Mean Residual Life Plot
Interpreting a mean residual life plot is not always simple in practice. The idea is to find the lowest threshold above which the plot is approximately linear, taking the 95% confidence bounds into account. Mean residual life plots were produced (see Figure 4). The plot confirms that 28, or any value slightly greater, is a good threshold choice. The downward slope of the plot also suggests a light-tailed (short-tailed) distribution.
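The mean residual life plot shows the sample mean of the excesses x - u against the threshold u. If a GPD with shape ζ < 1 holds above some threshold, the mean excess is linear in u with slope ζ/(1 - ζ), so a downward slope points to ζ < 0 and a short tail. The sketch below computes the plotted points with approximate 95% bounds; the normal-approximation interval and the function name are assumptions.

```python
import math

def mean_residual_life(data, thresholds, z=1.96):
    """Rows (u, mean excess, lower bound, upper bound, number of excesses)."""
    rows = []
    for u in thresholds:
        exc = [x - u for x in data if x > u]
        k = len(exc)
        if k < 2:
            continue  # a standard deviation needs at least two excesses
        m = sum(exc) / k
        sd = math.sqrt(sum((e - m) ** 2 for e in exc) / (k - 1))
        half = z * sd / math.sqrt(k)  # normal-approximation half-width
        rows.append((u, m, m - half, m + half, k))
    return rows
```

For a uniform sample (a GPD with ζ = -1), the mean excess falls with slope -1/2, which is the kind of downward pattern described for Figure 4.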

Parameter Estimation
Using the threshold of 28, the estimated GPD parameters are shown in Table 6. The shape parameter, which largely determines the qualitative behavior of the GPD, is negative. The GPD shape estimate (-0.23359) is close to the GEV shape estimate (-0.29423). This shows that the distribution of excesses has a finite upper end point and is short-tailed. As in the GEV case, a negative shape parameter corresponds to Weibull-type tail behavior.
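The finite upper end point follows from the GPD support condition 1 + ζy/σ_u > 0. For ζ < 0 this requires y < -σ_u/ζ, so an observation x = u + y cannot exceed

$$
x_F = u - \frac{\sigma_u}{\zeta}, \qquad \zeta < 0.
$$

With u = 28 and ζ = -0.23359, this gives x_F = 28 + σ_u/0.23359 ≈ 28 + 4.281 σ_u, where σ_u is the scale estimate in Table 6.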

Model Diagnostics
To further confirm that the selected threshold is suitable for fitting the GPD, diagnostic plots were produced for the threshold of 28. Figure 5 indicates that the assumptions for fitting the GPD to excesses over the threshold were met, and the diagnostic plots agree with those of the GEV distribution. The QQ-plot in Figure 5 shows that the points lie approximately along the unit diagonal, indicating a good fit of the GPD to the maximum temperature data. This agrees with the QQ-plot of data randomly generated from the GPD against the empirical quantiles in Figure 5(b). The empirical density plot in Figure 5(a) also confirms that the GPD models the data adequately. There are 70 excesses over the chosen threshold. The return level plot in Figure 5(d) is also convex, as in the GEV case; apart from a few points in the upper portion that depart from the line, the points lie on it.

Model Selection
The diagnostic plots suggest that both the GEV and GP distributions fit the maximum temperature data well. However, the AIC and BIC values in Tables 4 and 6 show that the GEV distribution fits the data better, because it has the lower AIC and BIC values. To validate this conclusion, further goodness-of-fit tests were conducted using two nonparametric tests, the Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests. Table 7 shows the ranks of the GEV and GP distributions under the two tests; the GEV distribution ranks best. A likelihood ratio test was also used to compare the fit of the two models, with the two-parameter GPD as the null model and the three-parameter GEV distribution as the alternative. The test yielded p-values of 0.0000 for Mandera station, 0.0003 for Wajir station and 0.0015 for Lodwar station, all less than the 5% significance level, so the GPD model was rejected in favor of the GEV model. For Mandera station, the test statistic was 20.6351, well above the 5% critical value of 3.8415 for the chi-square distribution with one degree of freedom. This further supports rejecting the null hypothesis in favor of the alternative, as shown in Table 8.
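The likelihood ratio statistic is D = 2(ℓ_1 - ℓ_0), where ℓ_0 and ℓ_1 are the maximized log-likelihoods of the null and alternative models, and it is compared with a chi-square distribution whose degrees of freedom equal the difference in the number of parameters (one here). The sketch below uses the identity that a chi-square variable with one degree of freedom is the square of a standard normal, so its upper-tail probability is erfc(√(D/2)). The log-likelihood values are not given in the text, so any passed to lr_statistic are hypothetical; the Mandera statistic 20.6351 and the critical value 3.8415 come from the paragraph above.

```python
import math

def lr_statistic(loglik_null, loglik_alt):
    """Likelihood ratio (deviance) statistic D = 2 * (l_alt - l_null)."""
    return 2.0 * (loglik_alt - loglik_null)

def chi2_1df_pvalue(d):
    """Upper-tail probability P(chi2_1 > d) = erfc(sqrt(d / 2))."""
    return math.erfc(math.sqrt(max(d, 0.0) / 2.0))

p_mandera = chi2_1df_pvalue(20.6351)  # statistic reported for Mandera station
p_critical = chi2_1df_pvalue(3.8415)  # about 0.05 at the 5% critical value
```

The first probability is far below 0.05, consistent with the reported p-value of 0.0000 after rounding.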

Return Level Estimate
The return levels of maximum temperature for the selected return periods were then estimated and are presented in Table 8. The return level first exceeds the highest maximum temperature observed in the record at T = 100. The return levels increase steadily with the length of the return period. In practice, return levels for long return periods such as 100 years are often emphasized, while those for shorter periods such as 5, 10 and 20 years are often ignored.
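For the GEV distribution, the level exceeded on average once every T blocks solves G(z_T) = 1 - 1/T, which gives z_T = µ - (σ/ζ)[1 - y^(-ζ)] with y = -log(1 - 1/T), and z_T = µ - σ log y when ζ = 0. The sketch below implements this formula. How the paper converts years into blocks is not stated; if each block is one month, a T-year return period corresponds to 12T blocks, and that conversion is our assumption. The parameters in the usage line are hypothetical apart from the shape -0.29423 taken from the text.

```python
import math

def gev_return_level(t_blocks, mu, sigma, zeta, eps=1e-12):
    """Level exceeded on average once every t_blocks blocks (probability 1/t_blocks per block)."""
    if t_blocks <= 1:
        raise ValueError("the return period must exceed one block")
    y = -math.log(1.0 - 1.0 / t_blocks)
    if abs(zeta) < eps:
        return mu - sigma * math.log(y)  # Gumbel case
    return mu - sigma / zeta * (1.0 - y ** (-zeta))

MU, SIGMA, ZETA = 33.0, 2.0, -0.29423  # MU and SIGMA are hypothetical placeholders
levels = {T: gev_return_level(12 * T, MU, SIGMA, ZETA) for T in (5, 10, 20, 50, 100)}
```

With ζ < 0 the return levels increase with T but approach the finite upper end point µ - σ/ζ rather than growing without bound.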

Conclusion and Recommendation
In this study, the monthly maximum temperatures from January 1980 to December 2016 were analyzed using two extreme value distribution models. Before the models were fitted, stationarity tests showed the data to be stationary, with no significant trend. The block maxima approach was used to fit the generalized extreme value distribution, while the peaks-over-threshold method was used to fit the generalized Pareto distribution. Of the two models, the generalized extreme value distribution provided the better fit, based on evidence from diagnostic plots, model stability checks and model comparison techniques. The return levels increase with the return period, indicating that progressively higher extreme temperatures can be expected over longer horizons, possibly reaching unbearable levels in the far future. Return levels were estimated for return periods of T = 5, 10, 20 and 100 years; the results show that the return level first exceeds the highest maximum temperature of the observation period (36.7°C) at T = 100.
This study will provide decision makers in Kenya with knowledge about extreme temperature events over the return periods considered, enabling them to make appropriate decisions. As climate change persists, continuous preparedness and adaptation measures are essential for the resilience of Kenyan communities. This research will therefore be useful for early planning, management, preparedness, response and mitigation. Although Kenya is heading in the right direction in creating an enabling environment to respond to climate change, much remains to be done. In line with Kenya's Vision 2030, the National Climate Change Action Plan needs to be fully implemented.
This study has shown how extreme value theory serves as a useful tool for modelling extreme events. We considered only two models in this paper, and we hope that this study of extreme temperatures using EVT will be useful for understanding extreme temperature events in Kenya. Future studies could model both extreme rainfall and extreme temperature in a specific region of Kenya. It is further recommended that the study be replicated in other regions to assess the extent of global warming, so that mitigation measures can be adopted to reduce it. Researchers are encouraged to explore other EVT approaches, such as Bayesian methods or extreme quantile estimation, to investigate this problem further in Kenya.