Multiple Linear Regressions for Predicting Rainfall for Bangladesh

The agricultural economy depends heavily on crop productivity, which in turn depends on rainfall. Rainfall prediction is therefore necessary for farmers who need to plan for crop productivity. Rainfall prediction is the application of science and technology to predict the state of the atmosphere. Accurate rainfall estimates are important for the effective use of water resources, for crop productivity and for the advance planning of water structures. Data mining may be used to make more precise rainfall predictions. The most widely used techniques for rainfall prediction include clustering, artificial neural networks and linear regression. In this article, multiple linear regression (MLR) is used to predict rainfall in Bangladesh.


Introduction
Irrigation is the prime need for agricultural crop production in Bangladesh, and most irrigation depends on rain. Both a prolonged dry period and heavy rain affect crop yield as well as the economy of the country, so early prediction of rainfall is crucial. A wide range of rainfall forecast methods are employed in weather prediction at regional and national levels. Fundamentally, two approaches are used for predicting rainfall: the empirical approach and the dynamical approach. The empirical approach is based on analysis of historical rainfall data and its relationship to a variety of atmospheric and oceanic variables over different parts of the world. The most widely used empirical approaches for climate prediction are regression, artificial neural networks, fuzzy logic and the group method of data handling. In the dynamical approach, on the other hand, predictions are generated by physical models based on systems of equations that predict the evolution of the global climate system in response to initial atmospheric conditions. The dynamical approach is implemented using numerical rainfall forecasting methods [1].
In statistical analysis, regression models are often used to estimate future events or values from previous values and events. Trend extraction and curve fitting methods are also used to estimate the future behavior of a time series and to fit future data according to the trend. Regression is a statistical empirical technique that is widely used in business, the social and behavioral sciences, the biological sciences, climate prediction and many other areas. Regression analysis includes parametric methods such as linear and logistic regression. Non-parametric methodologies such as projection pursuit, additive models and multivariate adaptive regression have also been applied to estimation and prediction problems [2].

Reviews and Previous Findings
A model was proposed to estimate rainfall in Isparta using a data mining process. The author used monthly rainfall values from the Senirkent, Uluborlu and Eğirdir stations. The relative error of this model was 0.7% [3].
A forecasting model was proposed for predicting the gold price using linear regression. The author used factors such as inflation and money supply, and concluded that multiple linear regression (MLR) performs better than the naïve method of prediction [1].
Another work presented the MPR technique as an effective way to describe complex nonlinear input-output relationships for rainfall prediction, and then compared the MPR and MLR techniques on accuracy [4].
Another study described the development of a statistical forecasting method for summer monsoon rainfall (SMR) over Thailand using multiple linear regression and local polynomial-based nonparametric approaches. Sea surface temperature (SST), sea level pressure (SLP), wind speed, the El Niño Southern Oscillation (ENSO) index and the Indian Ocean Dipole (IOD) were chosen as predictors. The experiments indicated that the correlation between observed and forecast rainfall was 0.6 [5].
Another work proposed a model that combines regression and an artificial neural network (ANN) to predict industry sales, using both historical sales and economic indicators as predictor variables [6].
A model based on multiple polynomial regression was proposed to forecast the growth potential of height with precision. This model is very helpful in studies of children's growth [7].

Multiple Linear Regressions
Regression attempts to determine the strength of the relationship between one dependent variable, usually denoted by Y, and a series of other changing variables known as independent variables. In simple regression there are only two variables, one dependent and one independent, and the relation between them has the form below. This is known as the deterministic model:

$$Y = A + BX \qquad (1)$$

Here Y is the dependent variable, X is the independent variable, and A and B are the regression parameters. In multiple regression there are more than two variables, of which one is the dependent variable and all the others are independent variables, and the equation looks like this:

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_p x_{ip} \qquad (2)$$

To develop the multiple linear regression equation, the parameters are obtained from the training data, and the variables are extracted from the dataset using correlation.
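As an illustration of how the parameters of the multiple regression equation can be obtained from training data, the following is a minimal sketch in Python that assumes ordinary least squares solved through the normal equations. The article does not state which estimation procedure or software was used, so the function names (fit_mlr, predict, solve_linear_system) and the data are hypothetical and purely synthetic.

```python
# Minimal sketch: ordinary least squares for y = b0 + b1*x1 + ... + bp*xp,
# solved through the normal equations (X^T X) b = X^T y.
# Function names and data are illustrative assumptions, not the article's code.

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so that row operations apply to both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Return [b0, b1, ..., bp] for predictor rows and responses y."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 carries the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, row):
    """Evaluate b0 + b1*x1 + ... + bp*xp for one row of predictors."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))


# Synthetic training data generated exactly from y = 2 + 3*x1 - 1*x2.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
y = [2 + 3 * x1 - x2 for x1, x2 in rows]
coeffs = fit_mlr(rows, y)
```

Because the synthetic responses contain no noise, the fitted coefficients should recover the generating values up to rounding. With real weather data the fit would only approximate the observations, and a numerical library would normally be preferred over hand-written elimination.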
The quantity r, called the linear correlation coefficient, measures the strength and direction of the relationship between two variables. It is sometimes called the Pearson product-moment correlation coefficient. The mathematical formula for r is given as [8]:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

where n is the number of pairs of data. The coefficient of determination measures how well the regression line represents the data. If the regression line passes through every point on the scatter plot, it explains all of the variation [7].
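The linear correlation coefficient can be computed directly from paired observations. The sketch below assumes the standard computational form of the Pearson coefficient, stated in the code comment; the function name pearson_r and the sample values are hypothetical and synthetic.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences.

    Uses r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)),
    where S denotes a sum over the n data pairs.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (n * sxy - sx * sy) / denominator


# Synthetic values: rainfall rises in exact proportion to cloud cover,
# and falls in exact proportion in the reversed series.
cloud_cover = [1, 2, 3, 4, 5]
rainfall = [2, 4, 6, 8, 10]
r_positive = pearson_r(cloud_cover, rainfall)
r_negative = pearson_r(cloud_cover, rainfall[::-1])
```

A value of r near +1 or -1 indicates a strong linear relationship, and the sign gives its direction. Values near 0 indicate little linear association.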

R-Squared = Explained Variation/Total Variation
A high r^2 indicates a strong linear relationship between the two variables. If r^2 = 1, the relationship between the two variables is perfect [2]. The standard error of the estimate is a measure of the variability of predictions in a regression [8]. Let y_est be the estimated value of y for a given value of x. This estimated value can be obtained from the regression curve of y on x. From this, the measure of the scatter about the regression curve is supplied by the quantity [9]:

$$s_{y.x} = \sqrt{\frac{\sum \left(y - y_{est}\right)^2}{n}} \qquad (3)$$

where n is the number of observations. Equation 3 is called the standard error of estimate of y on x.
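Both quality measures can be computed from a list of actual values and the corresponding estimates. The following sketch is an illustration under stated assumptions. It computes r^2 as 1 minus the ratio of unexplained to total variation, which equals explained/total variation when the estimates come from a least-squares fit with an intercept. The standard error divides by n by default, while many textbooks divide by n - p - 1 instead. The function names and numbers are hypothetical.

```python
import math


def r_squared(actual, estimated):
    """Coefficient of determination, computed as 1 - unexplained/total variation."""
    mean_y = sum(actual) / len(actual)
    total = sum((a - mean_y) ** 2 for a in actual)
    unexplained = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    return 1 - unexplained / total


def standard_error_of_estimate(actual, estimated, ddof=0):
    """sqrt(sum((y - y_est)^2) / (n - ddof)).

    ddof=0 divides by n; pass ddof=p+1 for the n - p - 1 variant
    (p predictors plus the intercept).
    """
    n = len(actual)
    squared = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    return math.sqrt(squared / (n - ddof))


# Synthetic values, for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 37.0]
r2 = r_squared(actual, estimated)                    # 1 - 26/500
se = standard_error_of_estimate(actual, estimated)   # sqrt(26/4)
```

The standard error is expressed in the same units as y, so it can be read directly as a typical distance between the data points and the regression curve.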

Rainfall Prediction Using MLR
The general process of forecasting rainfall amount involves data collection, data preprocessing and data selection, reduction of explanatory predictors, model building using regression and, finally, a validity check [10][11][12][13].
Data collection is the first and most important step in data mining. The weather dataset was collected from the Bangladesh Meteorological Department. The department maintains the dataset in the form of Excel sheets on a monthly as well as a yearly basis.
Data preprocessing is the next challenging task in data mining. The data obtained at this stage are noisy and contain missing values and unwanted records. The data have to be cleaned by filling in missing values and removing irrelevant data.
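The cleaning step can be sketched as follows. The article does not say how missing values were filled, so this example assumes mean imputation per field, one common choice among several. The field names, the records and the function clean_records are hypothetical.

```python
def clean_records(records, keep_fields):
    """Drop fields outside keep_fields and fill missing values with the field mean.

    Missing values are represented by None. Mean imputation is an
    assumption made for this sketch, not the article's stated method.
    """
    trimmed = [{f: rec.get(f) for f in keep_fields} for rec in records]
    means = {}
    for f in keep_fields:
        observed = [r[f] for r in trimmed if r[f] is not None]
        if not observed:
            raise ValueError(f"field {f!r} has no observed values")
        means[f] = sum(observed) / len(observed)
    return [
        {f: (r[f] if r[f] is not None else means[f]) for f in keep_fields}
        for r in trimmed
    ]


# Synthetic monthly records; "station_note" stands in for irrelevant data.
raw = [
    {"rainfall": 120.0, "temperature": 28.0, "station_note": "ok"},
    {"rainfall": None, "temperature": 30.0, "station_note": "gauge repaired"},
    {"rainfall": 80.0, "temperature": None, "station_note": ""},
]
cleaned = clean_records(raw, keep_fields=["rainfall", "temperature"])
```

For strongly seasonal series such as monthly rainfall, filling a gap with the mean of the same calendar month across years may be more appropriate than the overall mean used here.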
Data selection follows data preprocessing. Here we select the data that are relevant to our analysis and leave out the rest; we use correlation to determine which variables are correlated with rainfall.
Next, predictors that have high inter-correlation with others are removed, because the presence of many highly inter-correlated explanatory variables may substantially increase the sampling variation of the regression coefficients and degrade the model's predictive ability. After the reduction of explanatory predictors, the model is built using the training data. The technique used here is linear regression.
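Selection by correlation and reduction of inter-correlated predictors can be combined in one greedy pass, sketched below. The thresholds (0.3 for correlation with rainfall, 0.9 for inter-correlation), the function select_predictors and all data values are assumptions made for illustration; the article does not report the cut-offs it used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def select_predictors(columns, target, min_target_r=0.3, max_inter_r=0.9):
    """Keep predictors related to the target but not redundant with each other.

    Candidates are visited in decreasing order of |r| with the target.
    A candidate is kept only if |r| with the target reaches min_target_r
    and |r| with every already kept predictor stays at or below max_inter_r.
    """
    strength = {name: abs(pearson_r(col, target)) for name, col in columns.items()}
    kept = []
    for name in sorted(columns, key=strength.get, reverse=True):
        if strength[name] < min_target_r:
            continue
        if all(abs(pearson_r(columns[name], columns[k])) <= max_inter_r for k in kept):
            kept.append(name)
    return kept


# Synthetic series, for illustration only.
rainfall = [10, 20, 30, 40, 50, 60]
columns = {
    "vapor_pressure": [2, 4, 6, 8, 10, 12],       # strongly related to rainfall
    "cloud_cover": [1, 2, 3, 4, 5, 6.5],          # nearly duplicates vapor_pressure
    "temperature": [30, 27, 29, 24, 28, 23],      # moderately, negatively related
    "unrelated_series": [5, 1, 4, 2, 6, 3],       # almost no linear relation
}
selected = select_predictors(columns, rainfall)
```

In this synthetic case cloud_cover is dropped because it is almost a copy of vapor_pressure, and unrelated_series is dropped because it barely correlates with rainfall. The greedy order favors the predictor most correlated with the target whenever two candidates are redundant.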

Experimental Results
Experiments were performed to evaluate the accuracy of rainfall prediction using multiple linear regression, and the prediction results are reported in this section. To measure the quality of the MLR equation, the predicted rainfall amount is compared with the actual rainfall.
For the experiments, regional rainfall data were taken from Rajshahi, Bangladesh, and precipitation, cloud cover, average temperature and vapor pressure were used as predictors. A data set covering 30 years was used. The following table shows each predictor's correlation with rainfall. When the MLR equation is applied to the test data to check its accuracy, the predicted rainfall amounts are close to the actual rainfall data. The graph given below plots the actual values of rainfall against the values predicted by the multiple regression equation, and it shows that the MLR method achieves close agreement between actual and predicted rainfall.
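The comparison between actual and predicted rainfall can also be summarized numerically. The sketch below assumes a chronological hold-out of the most recent records as test data and two common error measures, mean absolute error and root mean squared error. The article reports a graphical comparison only, so the split size, the metrics and all values here are assumptions and do not reproduce the article's results.

```python
import math


def chronological_split(records, n_test):
    """Hold out the last n_test records (assumed ordered by time) for testing."""
    if not 0 < n_test < len(records):
        raise ValueError("n_test must be between 1 and len(records) - 1")
    return records[:-n_test], records[-n_test:]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Thirty yearly records, represented here only by their index 1..30.
years = list(range(1, 31))
train_years, test_years = chronological_split(years, n_test=6)

# Synthetic actual and predicted rainfall amounts, for illustration only.
actual = [100.0, 150.0, 200.0, 120.0]
predicted = [110.0, 140.0, 190.0, 130.0]
mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
```

Splitting by time rather than at random keeps future observations out of the training data, which matches how a forecast would be used in practice.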

Conclusion
We selected a method for rainfall prediction after analyzing the Rajshahi rainfall dataset with data mining techniques: first correlation analysis, then regression analysis. Rainfall has a great impact on agriculture and the economy, not only in Bangladesh but across the whole world. By knowing the climate factors, we can predict rainfall in a future year, which is very useful to farmers in their agricultural work. This is only a prediction of rainfall and is not exact, because climate factors change for many different reasons and we have used only some of them; the remaining factors can also influence rainfall.