Fitting Wind Speed to a 3-Parameter Distribution Using Maximum Likelihood Technique

Kenya is among the countries with a substantial wind resource. This has prompted the country to pursue technologies for harnessing wind, with the aim of reaching a total wind energy capacity of 2 GW by 2030. The objective of this research is to find the best three-parameter distribution for examining wind speed using the maximum likelihood fitting technique. To achieve this objective, the study used hourly wind speed data collected over three years (2016 to 2018) from five sites within Narok County. The study identified the distributions that the data fit and then tested their suitability using the Kolmogorov-Smirnov test. The distribution parameters were fitted by maximum likelihood, and the models were compared using Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC), with the decision rule that the best distribution is the one with the smallest AIC and BIC values. The research showed that the best distribution is the gamma distribution with a shape parameter of 2.071773, a scale parameter of 1.120855, and a threshold parameter of 0.1174. We conclude that the gamma distribution is the best three-parameter distribution for examining the Narok County wind speed data.


Introduction
Distribution characteristics refer to wind speed parameters such as the mean, variance, standard deviation, and covariance. These parameters vary from place to place depending on factors such as the length of the observed record, the site of the experiment, and the time at which the wind speed is observed. There is therefore a need to study the variation of these parameters at any specific site before installing a wind plant. This is only possible if there is a statistical distribution that has been examined and recommended for the site or region of interest. Many researchers have modelled wind speed using 2-parameter and 3-parameter statistical distributions, namely the log-normal, Weibull, gamma, Rayleigh, and Erlang distributions, among others, in most parts of the world with a good wind resource [4,17].
In Kenya, past studies have considered only the 2-parameter Weibull distribution for studying wind speed and have left out the 3-parameter distributions, yet a 3-parameter distribution may be better than a 2-parameter distribution for studying wind speed in this region [4]. The 2-parameter and 3-parameter distributions used and recommended by a good number of researchers are the Weibull, gamma, and lognormal distributions [12]. The 2-parameter distributions involve a scale and a shape parameter: the scale parameter indicates how windy the region is (statistically, the spread of the wind speed distribution) and the shape parameter indicates how peaked the distribution is (statistically, the most frequently expected wind speed). The 3-parameter distributions add a third parameter, the threshold parameter, which helps in understanding the minimum expected wind speed in the region [12].
In Kenya, Narok is one of the regions with plenty of wind, which makes it one of the places with the potential to generate more wind energy [17], and this is why the wind speed data for the study were collected in Narok County. To help the wind sector achieve its target of 2 GW of wind energy by 2030, the stakeholders need to understand the wind speed characteristics of this region, because wind speed is the most significant factor in the installation of any wind plant. Complete information about wind speed characteristics is only possible with a recommended statistical distribution for examining the wind speed data. Statistical distributions exist for studying wind speed data in other specific regions, but among them there is no established 3-parameter distribution for studying the wind speed of Kenya. This leaves the wind industry and other investors with incomplete knowledge of the suitable distribution for examining wind speed data and reduces their interest, since they lack a reference tool for their studies. This research fills the gap by analyzing hourly wind speed data and fitting the three-parameter Weibull, log-normal, and gamma distributions by maximum likelihood, so that the distribution among the three that best fits the data can be chosen for studying wind speed.
The three-parameter distributions used are represented as follows.

Weibull Distribution with 3 Parameters
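The probability density function of the Weibull distribution with three parameters is given by [1,2,6,18,26,27]:

$$f(u) = \frac{b}{p}\left(\frac{u-w}{p}\right)^{b-1}\exp\left[-\left(\frac{u-w}{p}\right)^{b}\right], \quad u \ge w \tag{1}$$

and the cumulative distribution function is given by

$$F(u) = 1 - \exp\left[-\left(\frac{u-w}{p}\right)^{b}\right], \quad u \ge w$$

Where:
u is the wind speed
b is the shape parameter
p is the scale parameter measured in m/s
w is the threshold parameter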

Lognormal Distribution with 3 Parameters
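This distribution has three parameters, namely the scale parameter, the shape parameter, and the threshold parameter, also known as the location parameter. The probability density function is given by [3,14,19,29]:

$$f(p) = \frac{1}{(p-y)\,k\sqrt{2\pi}}\exp\left[-\frac{\left(\ln\frac{p-y}{v}\right)^{2}}{2k^{2}}\right], \quad p > y \tag{2}$$

The cumulative distribution function is given as:

$$F(p) = \Phi\left(\frac{1}{k}\ln\frac{p-y}{v}\right), \quad p > y$$

Where:
v > 0 is the scale parameter.
k > 0 is the shape parameter.
y is the threshold parameter, also referred to as the location parameter.
p > y is the wind speed.
Φ is the standard normal cumulative distribution function.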

Gamma Distribution with 3 Parameters
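From past studies [11,13,20,28], the gamma probability density function with three parameters is given as follows:

$$f(x; q, z, t) = \frac{(x-t)^{z-1}}{q^{z}\,\Gamma(z)}\exp\left(-\frac{x-t}{q}\right), \quad x > t \tag{3}$$

Where:
x is the wind speed.
q is the scale parameter.
z is the shape parameter.
t is the threshold parameter.
The gamma function Γ is defined by

$$\Gamma(z) = \int_{0}^{\infty} s^{z-1}e^{-s}\,ds$$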
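The three densities can be checked numerically. The sketch below is illustrative only: it assumes the standard three-parameter forms of the Weibull, lognormal, and gamma densities, uses only the Python standard library, and the function names and the midpoint integrator are hypothetical helpers rather than part of the study's Minitab workflow. The gamma parameters are the values reported in this paper; the Weibull and lognormal parameters used in any check are arbitrary.

```python
import math

def weibull3_pdf(u, b, p, w):
    """Three-parameter Weibull density: shape b, scale p, threshold w."""
    if u <= w:
        return 0.0
    s = (u - w) / p
    return (b / p) * s ** (b - 1) * math.exp(-(s ** b))

def lognormal3_pdf(x, k, v, y):
    """Three-parameter lognormal density: shape k, scale v, threshold y."""
    if x <= y:
        return 0.0
    r = math.log((x - y) / v)
    return math.exp(-r * r / (2 * k * k)) / ((x - y) * k * math.sqrt(2 * math.pi))

def gamma3_pdf(x, z, q, t):
    """Three-parameter gamma density: shape z, scale q, threshold t."""
    if x <= t:
        return 0.0
    s = x - t
    # Work in log space so that large shape values do not overflow.
    log_f = (z - 1) * math.log(s) - s / q - z * math.log(q) - math.lgamma(z)
    return math.exp(log_f)

def integrate(f, lo, hi, n=20000):
    """Midpoint-rule integral of f over [lo, hi] (assumed helper)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Gamma parameters reported for the Narok data: shape, scale, threshold.
Z, Q, T = 2.071773, 1.120855, 0.1174
area = integrate(lambda x: gamma3_pdf(x, Z, Q, T), T, 40.0)
print(f"fitted gamma density integrates to {area:.4f}")
```

A density that integrates to approximately one over the range above the threshold is a quick sanity check that the shape, scale, and threshold have been placed in the right positions of the formula.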

Maximum Likelihood Estimation Method (MLE)
According to the Kenya wind sector prospect [21], maximum likelihood estimation is more precise than the method of moments, least-squares estimation, and graphical methods, and it can be applied to many problems since it has a strong intuitive appeal and yields a reasonable estimator. The same source states that the maximum likelihood method is widely used because it yields an excellent estimator when the sample is large. The maximum likelihood estimator $\hat{\theta}$ of the parameter vector $\theta$ is a solution to the maximization problem [9,10,12,15,16,22-25]:

$$\hat{\theta} = \arg\max_{\theta}\ \log L(\theta; x_1, \ldots, x_N)$$

Where $x_1, \ldots, x_N$ represent the wind speed observations. Under suitable regularity conditions, the first-order condition is given as

$$\left.\frac{\partial \log L(\theta; x_1, \ldots, x_N)}{\partial \theta}\right|_{\theta=\hat{\theta}} = 0$$

These conditions are generally called the likelihood or log-likelihood equations. Equivalently, the gradient $g(\theta) = \partial \log L(\theta; x_1, \ldots, x_N)/\partial\theta$ of the log-likelihood evaluated at the point $\hat{\theta}$ satisfies [10,16,22-25]

$$g(\hat{\theta}) = 0$$

The log-likelihood equations correspond to a linear or non-linear system of P equations in the P unknown parameters $\theta_1, \ldots, \theta_P$:

$$\left.\frac{\partial \log L(\theta; x_1, \ldots, x_N)}{\partial \theta_j}\right|_{\theta=\hat{\theta}} = 0, \quad j = 1, \ldots, P$$

Maximum likelihood is a recommended technique for many distributions because it selects the parameter values that make the observed data more likely than any other parameter values. This is achieved by maximizing the likelihood function of the parameters given the data. Maximum likelihood estimators have several good properties: they are asymptotically unbiased, since the bias tends to zero as the sample size increases; they are asymptotically efficient, since they achieve the Cramér-Rao lower bound as the sample size approaches ∞; and they are asymptotically normal [16,21].
The third parameter, called the threshold parameter, is also known as the location parameter; it determines how far the 3-parameter density function is shifted along the X-axis. The threshold parameter locates the distribution along the wind speed axis and has the same units as the distribution's variable. This third parameter is used to bring the data points onto a straight line on a probability plot when the initial data do not fall on a straight line [12,16]. After obtaining the threshold parameter, we subtract it from the original data to obtain a new data set, which is then used to estimate the other two parameters (shape and scale). Since the threshold parameter value is not constant, we use Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC) to estimate it. The threshold value with the lowest AIC and BIC values is considered efficient and precise for further analysis. It is subtracted from the original data set, and the resulting data set is then used to estimate the scale and shape parameters of the Weibull, lognormal, and gamma 3-parameter distributions using the same maximum likelihood estimators as for the 2-parameter distributions.
The shape and scale parameters of the 3-parameter distributions are estimated as follows.
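The threshold selection described above can be sketched as a scan over candidate threshold values. This is a minimal illustration, not the study's Minitab procedure: the data are synthetic, the candidate grid is hypothetical, and the lognormal distribution is used only because its two-parameter maximum likelihood estimates have a closed form. Counting two estimated parameters per candidate is also an assumption.

```python
import math
import random

def lognormal_fit_loglik(shifted):
    """Closed-form lognormal MLE on threshold-shifted data.

    Returns (shape, scale, maximized log-likelihood).
    """
    n = len(shifted)
    logs = [math.log(s) for s in shifted]
    mu = sum(logs) / n
    var = sum((l - mu) ** 2 for l in logs) / n
    k = math.sqrt(var)
    # At the MLE the quadratic term of the log-likelihood sums to n / 2.
    loglik = -sum(logs) - n * math.log(k) - 0.5 * n * math.log(2 * math.pi) - 0.5 * n
    return k, math.exp(mu), loglik

def scan_thresholds(data, candidates, n_params=2):
    """Fit the shifted data for each candidate threshold; return best and all rows."""
    n = len(data)
    smallest = min(data)
    rows = []
    for c in candidates:
        if c >= smallest:
            continue  # the threshold must lie below every observation
        _, _, ll = lognormal_fit_loglik([x - c for x in data])
        aic = -2 * ll + 2 * n_params
        bic = -2 * ll + n_params * math.log(n)
        rows.append((c, aic, bic))
    best = min(rows, key=lambda r: r[1])  # smallest AIC
    return best, rows

random.seed(0)
# Synthetic "wind speeds": a lognormal sample shifted by a true threshold of 0.5.
data = [0.5 + random.lognormvariate(0.5, 0.4) for _ in range(2000)]
candidates = [i / 10 for i in range(0, 10)]
best, rows = scan_thresholds(data, candidates)
print("threshold with smallest AIC:", best[0])
```

Because subtracting a constant does not change the scale of the data, the log-likelihoods obtained at different thresholds are directly comparable, which is what makes ranking thresholds by AIC and BIC meaningful.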

Weibull Distribution Estimates
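Where b is the shape parameter, p the scale parameter, and $u_1, \ldots, u_N$ the wind speed observations with threshold w, the estimate $\hat{b}$ is given by

$$\frac{\sum_{i=1}^{N}(u_i-w)^{\hat{b}}\ln(u_i-w)}{\sum_{i=1}^{N}(u_i-w)^{\hat{b}}} - \frac{1}{\hat{b}} - \frac{1}{N}\sum_{i=1}^{N}\ln(u_i-w) = 0 \tag{8}$$

and the estimate $\hat{p}$ by

$$\hat{p} = \left[\frac{1}{N}\sum_{i=1}^{N}(u_i-w)^{\hat{b}}\right]^{1/\hat{b}} \tag{9}$$

Equations (8) and (9) are solved simultaneously: equation (8) is solved numerically for $\hat{b}$, which subsequently gives $\hat{p}$ from equation (9) [1,7,8,12].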
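As an illustration of how the Weibull estimates can be obtained in practice, the sketch below solves the standard maximum likelihood shape equation by bisection and then computes the scale in closed form. It is a sketch under stated assumptions: the threshold is taken as already known and is subtracted first, the data are synthetic, and the function names are hypothetical.

```python
import math
import random

def shape_equation(b, logs):
    """Left-hand side of the Weibull MLE shape equation at shape b.

    logs holds ln(u_i - w). Weights are rescaled by the largest value to
    avoid overflow; the ratio of sums is unchanged by this rescaling.
    """
    m = max(logs)
    weights = [math.exp(b * (l - m)) for l in logs]
    num = sum(wt * l for wt, l in zip(weights, logs))
    return num / sum(weights) - 1.0 / b - sum(logs) / len(logs)

def fit_weibull(data, threshold=0.0, tol=1e-10):
    """Return (shape, scale) MLEs for data shifted by a known threshold."""
    logs = [math.log(x - threshold) for x in data]
    lo, hi = 1e-3, 1.0
    while shape_equation(hi, logs) < 0:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if shape_equation(mid, logs) < 0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    # Scale = (mean of (u - w)^b)^(1/b), evaluated in log space.
    m = max(logs)
    mean_pow = sum(math.exp(b * (l - m)) for l in logs) / len(logs)
    p = math.exp(m + math.log(mean_pow) / b)
    return b, p

random.seed(1)
# random.weibullvariate(alpha, beta) takes scale alpha and shape beta.
sample = [0.1 + random.weibullvariate(3.0, 2.0) for _ in range(5000)]
b_hat, p_hat = fit_weibull(sample, threshold=0.1)
print(f"shape = {b_hat:.3f}, scale = {p_hat:.3f}")
```

Bisection is used here because the shape equation is monotone in the shape parameter, so a bracketed root is unique; a Newton iteration would converge faster but needs the derivative.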

Lognormal Distribution Estimates
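For the shape parameter k and the scale parameter v in equation (2), with wind speed observations $p_1, \ldots, p_N$ and threshold y, the estimates are obtained as [2,5,14]:

$$\ln\hat{v} = \frac{1}{N}\sum_{i=1}^{N}\ln(p_i-y)$$

$$\hat{k} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[\ln(p_i-y)-\ln\hat{v}\right]^{2}}$$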
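The lognormal estimates have a closed form, which the following sketch illustrates. It assumes the parametrization in which the scale parameter is the exponential of the mean of the logged, threshold-shifted data and the shape parameter is their standard deviation (with divisor N); the data are synthetic and the function name is hypothetical.

```python
import math
import random

def fit_lognormal(data, threshold=0.0):
    """Return (shape k, scale v) MLEs for data shifted by a known threshold."""
    logs = [math.log(x - threshold) for x in data]
    n = len(logs)
    mean_log = sum(logs) / n
    # MLE uses the divisor N rather than N - 1.
    k = math.sqrt(sum((l - mean_log) ** 2 for l in logs) / n)
    v = math.exp(mean_log)
    return k, v

random.seed(2)
# Synthetic sample: threshold 0.1, scale 2.0 (so mean log = ln 2), shape 0.5.
sample = [0.1 + random.lognormvariate(math.log(2.0), 0.5) for _ in range(5000)]
k_hat, v_hat = fit_lognormal(sample, threshold=0.1)
print(f"shape = {k_hat:.3f}, scale = {v_hat:.3f}")
```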

Gamma Distribution Estimates
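Using z as the shape parameter and q as the scale parameter in equation (3), with wind speed observations $x_1, \ldots, x_N$ and threshold t, their respective estimates are obtained from [11,20]:

$$\ln\hat{z} - \psi(\hat{z}) = \ln\left[\frac{1}{N}\sum_{i=1}^{N}(x_i-t)\right] - \frac{1}{N}\sum_{i=1}^{N}\ln(x_i-t)$$

$$\hat{q} = \frac{1}{N\hat{z}}\sum_{i=1}^{N}(x_i-t)$$

where $\psi(z) = \Gamma'(z)/\Gamma(z)$ is the digamma function. The first equation is solved numerically for $\hat{z}$, which then gives $\hat{q}$.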
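The gamma shape estimate has no closed form, so it must be found numerically. The sketch below is one possible approach, assuming the standard gamma likelihood equations and a known threshold: it solves ln z − ψ(z) = ln(mean) − mean(ln) by bisection, with the digamma function ψ approximated by a recurrence plus an asymptotic series because the Python standard library does not provide it. The data are synthetic, generated with the parameter values reported in this paper, and all function names are hypothetical.

```python
import math
import random

def digamma(x):
    """Digamma function for x > 0 via recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:  # shift upward until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    result += math.log(x) - 0.5 * inv - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result

def fit_gamma(data, threshold=0.0, tol=1e-10):
    """Return (shape z, scale q) MLEs for data shifted by a known threshold."""
    shifted = [x - threshold for x in data]
    n = len(shifted)
    mean = sum(shifted) / n
    mean_log = sum(math.log(s) for s in shifted) / n
    target = math.log(mean) - mean_log  # positive unless all values are equal
    if target <= 0:
        raise ValueError("data must not be constant")
    lo, hi = 1e-6, 1.0
    # ln z - digamma(z) decreases in z, so expand hi until it drops below target.
    while math.log(hi) - digamma(hi) > target:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if math.log(mid) - digamma(mid) > target:
            lo = mid
        else:
            hi = mid
    z = 0.5 * (lo + hi)
    return z, mean / z

random.seed(3)
# random.gammavariate(alpha, beta) takes shape alpha and scale beta.
TRUE_Z, TRUE_Q, TRUE_T = 2.071773, 1.120855, 0.1174
sample = [TRUE_T + random.gammavariate(TRUE_Z, TRUE_Q) for _ in range(5000)]
z_hat, q_hat = fit_gamma(sample, threshold=TRUE_T)
print(f"shape = {z_hat:.3f}, scale = {q_hat:.3f}")
```

The product of the two estimates always equals the mean of the shifted data, which gives a simple check on any fitted pair.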

Test of Goodness of Fit
After fitting the three statistical distributions to the data, it is important to verify their suitability and accuracy with a goodness of fit test. We use the Kolmogorov-Smirnov statistic, which indicates how well the data follow each of the three distributions (gamma, Weibull, and log-normal) and so helps identify the most precise and reliable distribution [3,18,19]. The distributions are then compared using Akaike's Information Criterion and the Bayesian Information Criterion.

Kolmogorov-Smirnov Test
This is a two-sample test with the advantage that the distribution of its test statistic does not depend on the underlying cumulative distribution function being tested. It applies only to continuous distributions, which is appropriate here since we are only investigating continuous statistical distributions [19]. It is calculated as

$$D = \max_{t}\left|F_1(t) - F_2(t)\right|$$

Where:
$F_1(t)$ is the proportion of $t_1$ values less than or equal to t
$F_2(t)$ is the proportion of $t_2$ values less than or equal to t
$H_0$: The data follow the specified distribution
$H_1$: The data do not follow the specified distribution
The smaller the test statistic, the better the fit.
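The statistic can be computed directly from its definition as the largest gap between two empirical distribution functions. The following is a minimal sketch with a hypothetical function name and toy inputs; to test data against a fitted distribution rather than a second sample, the second empirical proportion would be replaced by the fitted cumulative distribution function.

```python
def ks_two_sample(sample1, sample2):
    """Largest absolute gap between the two empirical distribution functions."""
    a = sorted(sample1)
    b = sorted(sample2)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        t = min(a[i], b[j])
        # Advance both pointers past every value <= t so ties are handled together.
        while i < na and a[i] <= t:
            i += 1
        while j < nb and b[j] <= t:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d

identical = ks_two_sample([1, 2, 3], [1, 2, 3])
disjoint = ks_two_sample([1, 2, 3], [4, 5, 6])
overlap = ks_two_sample([1, 2, 3, 4], [3, 4, 5, 6])
print(identical, disjoint, overlap)
```

Once one sample is exhausted its empirical proportion is 1 and the gap can only shrink, so the loop may stop there without missing the maximum.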

Akaike's Information Criterion (AIC)
The Akaike's Information Criterion is calculated as shown below

$$AIC = -2\log L + 2w$$

Where log L is the value of the maximized log-likelihood objective function for a model with w parameters. A smaller AIC value represents a better fit.

Bayesian Information Criterion (BIC)
The Bayesian Information Criterion is calculated as below

$$BIC = -2\log L + w\log N \tag{14}$$

Where log L is the value of the maximized log-likelihood objective function for a model with w parameters fitted to N data points. A smaller Bayesian Information Criterion value indicates a better fit (the best model for fitting the data).
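Both criteria are simple functions of the maximized log-likelihood, as the short sketch below shows. The log-likelihood values, parameter count, and sample size in the example are hypothetical and chosen only to illustrate the ranking rule; they are not results from the Narok data.

```python
import math

def aic(loglik, n_params):
    """Akaike's Information Criterion from a maximized log-likelihood."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion from a maximized log-likelihood."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical maximized log-likelihoods for three candidate distributions.
candidates = {"weibull": -1530.2, "lognormal": -1544.9, "gamma": -1521.7}
N_PARAMS, N_OBS = 2, 1000

scores = {
    name: (aic(ll, N_PARAMS), bic(ll, N_PARAMS, N_OBS))
    for name, ll in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(best_by_aic, best_by_bic)
```

When every candidate has the same number of parameters and is fitted to the same data, the penalty terms are identical, so both criteria rank the candidates in the same order as the log-likelihood itself.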

Model/Distribution Comparison Test
The comparison was made using the Akaike's Information Criterion and the Bayesian Information Criterion values of the three distributions, whereby the distribution with the smallest AIC and BIC values is picked as the best.

Maximum Likelihood Analysis: Threshold Estimation
The three parameters are the threshold, shape, and scale parameters. To find a precise threshold parameter, we investigate different threshold values using their AIC and BIC values. Starting from the original threshold value, several iterations are performed to reach the precise threshold value, since it is not a constant, as shown in Table 1.
From Table 1, it can be observed that gamma gives the smallest AIC and BIC values under all threshold values. Therefore, gamma is the best distribution. The threshold value used for the three-parameter analysis is 0.1174. This value was chosen because the analysis of the originally collected wind data using the Minitab statistical package identified a gamma threshold value of 0.1174 m/s, as shown in Figure 1.

Summary Statistics Using the Threshold Value of 0.1174
Using this threshold value, we have the following summary statistics.

Test of Goodness of Fit
To confirm that the data follow any of the three distributions, namely Weibull, gamma, and log-normal, a goodness of fit test was performed using the Kolmogorov-Smirnov statistic, and the results are given in Table 3. From Table 3, it can be seen that the data follow the Weibull, gamma, and lognormal distributions, since their test statistics are less than the critical value of 0.136, so we do not reject the null hypothesis that the data follow each of the three distributions. The data fit the gamma distribution best compared to the other two distributions, since gamma has the smallest test statistic value (0.033614). With the threshold parameter set to 0.1174, the other two parameters, namely the shape and scale parameters, are given in Table 4.

Best 3-parameter Distribution
Using the maximum likelihood method, we conclude that among the three-parameter distributions, the gamma three-parameter distribution is the best for fitting the data, since it has the smallest Akaike's Information Criterion value of 189803, the smallest Bayesian Information Criterion value of 189821.1, and the smallest Kolmogorov-Smirnov test value of 0.033614. The three parameters are the threshold (0.1174), shape (2.071773), and scale (1.120855), as indicated in Table 5.

Conclusion
From the analysis, we can conclude that, using the maximum likelihood method, the gamma distribution with three parameters is the best distribution for fitting the Narok region wind speed data, since it yields the lowest AIC and BIC values. The distribution is given as:

$$f(x) = \frac{(x-0.1174)^{1.071773}}{1.120855^{2.071773}\,\Gamma(2.071773)}\exp\left(-\frac{x-0.1174}{1.120855}\right), \quad x > 0.1174$$

Where x is the wind speed and Γ is the gamma function; the density is treated as a continuous function of the wind speed data.

Recommendation
We recommend that investors and the wind industry interested in studying or predicting the Narok wind speed use the gamma distribution, since it will give the best wind speed probabilities compared to the other distributions. We also recommend that researchers, investors, and wind industries apply this three-parameter gamma distribution when examining wind speed distributions in other regions of the country and in other parts of the world.