Meteorological Data Analysis for Arid Region of Karnataka

Meteorological data analysis is an application of time series prediction. Analysing meteorological data gives insight into weather forecasts and helps a country prepare for extreme situations such as drought and flood. The northern part of Karnataka is drought-prone, and Raichur is an arid region of Karnataka that receives little rainfall. This paper applies decision tree and random forest classifiers to rainfall prediction for the Raichur region and tabulates the resulting accuracy, precision and recall. Ten climate input features are considered. The data were collected from the Indian Meteorological Department (IMD) for a span of 17 years, from January 1999 to December 2016. The decision tree classifier achieved an accuracy of 88%, with a precision and recall of 0.90 and 0.97. Random forest, an ensemble classifier, achieved an accuracy of 96%, with a precision and recall of 1.00 and 0.99. Both algorithms were run on the same 11158 samples, of which 7810 were used for training and 3348 for testing. The decision rules are documented. The random forest algorithm also gives the relative importance of each parameter for Raichur rainfall prediction: the most important is Wet Bulb Temperature (WBT) and the least important is wind direction (FFF).


Introduction
Meteorological data analysis is a rapidly growing field in atmospheric science. Traditional approaches do not scale to large meteorological data sets, which require a more comprehensive approach. One of the key areas of climate research is extreme weather conditions such as drought and flood. This paper focuses on one such issue: drought in the north Karnataka regions of Raichur, Gulbarga, Bellary and Bagalkot. Specifically, it predicts rainfall for Raichur, one of the drought-struck areas of Karnataka. Ten parameters related to humidity, pressure and temperature are used to predict the rainfall of the Raichur region. The data were collected from the Indian Meteorological Department (IMD) and cover 1999 to 2016. There are 11158 samples in total, split into 70% training data (7810 observations) and 30% test data (3348 observations). The prediction reaches an accuracy of 96%, with a precision and recall of 0.99 and 0.97 respectively.
The decision tree algorithm [1] is applied to the Raichur data to predict rainfall. A decision tree is a classification technique that repeatedly chooses the best split to partition the data into the target classes. A random forest is an ensemble of decision trees, each built from a random sample of the input. It combines the outputs of the individual trees to decide the target class.
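The sample counts quoted for the 70/30 split can be checked with a few lines of arithmetic. The sketch below is illustrative only: the paper does not say how the split was performed, so the rounding rule (the test share is rounded up) and the helper names split_sizes and train_test_split_indices are assumptions.

```python
import math
import random


def split_sizes(n_samples, test_fraction=0.3):
    """Return (n_train, n_test), rounding the test share up (assumed rule)."""
    n_test = math.ceil(n_samples * test_fraction)
    return n_samples - n_test, n_test


def train_test_split_indices(n_samples, test_fraction=0.3, seed=0):
    """Shuffle the row indices and cut them into train and test index lists."""
    n_train, _ = split_sizes(n_samples, test_fraction)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    return indices[:n_train], indices[n_train:]


n_train, n_test = split_sizes(11158)
print(n_train, n_test)  # prints: 7810 3348
```

Under this rounding rule, 30% of 11158 is 3347.4, which rounds up to 3348 test samples and leaves 7810 for training, matching the counts reported for the Raichur data.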

Methodology
Supervised learning builds a model through a training process. The input data, called the training data set, is run through a model, for example a classification model, to predict the output. The predicted output is compared with the correct class label, and the model is corrected when the two differ. This process continues until the model reaches the desired level of accuracy. Classification is one such supervised learning technique: each training example has a set of input features together with the correct output label, and the algorithm learns by comparing the predicted output with that label.
A decision tree is a classification algorithm that can also be used for regression problems. The algorithm proceeds as follows. The entire data set is initially treated as the root node. Each node is split on the attribute that gives the maximum information gain or the minimum impurity. This process is repeated until all nodes are pure, which is the terminating condition [3]. Random forest [5] is a classification algorithm based on an ensemble technique.
Random forest works as follows. It builds many decision trees, each on a random sample of the data. The output of each decision tree is given a weight, and the final output is computed from the decisions of the individual trees and their weights. The advantages of the random forest classifier are that it is less prone to overfitting than a single decision tree and that it is applicable to categorical data.
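The tree-growing procedure described above can be sketched in a few dozen lines. This is a minimal illustration, not the implementation used in the study: it uses entropy-based information gain as the split criterion, stops when a node is pure or no split improves it, and all function names and the toy data (two hypothetical features, temperature and relative humidity) are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (gain, feature_index, threshold) with the highest information gain."""
    parent = entropy(labels)
    best = (0.0, None, None)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = parent - child
            if gain > best[0]:
                best = (gain, f, t)
    return best


def build_tree(rows, labels):
    """Grow the tree recursively; a node becomes a leaf when no split has positive gain."""
    _, f, t = best_split(rows, labels)
    if f is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree(*map(list, zip(*left))),
        "right": build_tree(*map(list, zip(*right))),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical toy data: [temperature, relative humidity] -> rainfall class.
rows = [[30.0, 40.0], [31.0, 85.0], [25.0, 90.0], [33.0, 35.0]]
labels = ["no", "yes", "yes", "no"]
tree = build_tree(rows, labels)
print(predict(tree, [28.0, 88.0]))  # prints: yes
```

On this toy data the root splits on the second feature at a threshold of 40.0, which separates the two classes completely, so both children are pure and become leaves.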
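A rough sketch of the ensemble idea follows. It is a simplification under stated assumptions, not the configuration used in the study: each "tree" is a one-level stump on a randomly chosen feature, each stump is trained on a bootstrap sample (rows drawn with replacement), and every stump's vote carries equal weight. The names fit_stump, fit_forest and predict_forest and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """One-level tree on one feature: pick the threshold with the fewest errors."""
    best = None
    for t in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        left_vote = Counter(left).most_common(1)[0][0]
        right_vote = Counter(right).most_common(1)[0][0] if right else left_vote
        errors = sum(y != left_vote for y in left) + sum(y != right_vote for y in right)
        if best is None or errors < best[0]:
            best = (errors, feature, t, left_vote, right_vote)
    return best[1:]  # (feature, threshold, left_vote, right_vote)


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample and random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps (equal weights are an assumption)."""
    votes = [lv if row[f] <= t else rv for f, t, lv, rv in forest]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy data: two features that both separate the classes.
rows = [[20, 30], [22, 35], [24, 40], [30, 80], [32, 85], [34, 90]]
labels = ["no", "no", "no", "yes", "yes", "yes"]
forest = fit_forest(rows, labels)
print(predict_forest(forest, [10, 10]), predict_forest(forest, [40, 100]))
```

Individual stumps may be poor when their bootstrap sample happens to miss one class, but the vote across many stumps smooths out such cases, which is the intuition behind the ensemble being more robust than a single tree.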
Raichur is a region of north Karnataka with deficient rainfall, which usually occurs in the month of September. The decision tree algorithm was used to predict rainfall for the Raichur region. The data were collected from the Indian Meteorological Department (IMD) from 1999 to 2016. There are 11158 samples in total, with 7810 training observations and 3348 testing samples. The precision and recall are 0.99 and 0.97 respectively. The entire data set was initially taken as the root node. The Gini index is used to find the best split at each node. It is defined as $\text{Gini}(\text{node}) = 1 - \sum_j p_j^2$, where $p_j$ is the probability that a sample in the node belongs to class $j$. In this case the class label is rainfall.
The best split attribute at the first level is vapour pressure. At level 2 the tree is split on temperature, and at level 3 the splitting attribute is station level pressure. From these results we conclude that temperature, humidity and station level pressure are the features used for predicting rainfall in the Raichur region.
Random forest is then applied to the Raichur data, using the IMD data set for 1999 to 2016 (11158 samples, of which 7810 are training observations and 3348 are testing samples). The precision and recall are 0.99 and 0.98 respectively. Each node is split by minimising the Mean Squared Error (MSE), the loss function used in least squares regression. The random forest shows that the features affecting the weather of Raichur are RH (Relative Humidity), vapour pressure, MSLP (Mean Sea-Level Pressure) and DBT (Dry Bulb Temperature). The tree was constructed to a depth of 3 levels.
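As a worked illustration of the Gini index, the sketch below computes the impurity of a single node and the size-weighted impurity of a candidate split. The function names and the toy labels are assumptions chosen for the example; they are not taken from the Raichur data.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Gini index of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


# Hypothetical node with three "rain" samples and one "dry" sample.
node = ["rain", "rain", "rain", "dry"]
print(gini(node))  # 1 - (0.75**2 + 0.25**2) = 0.375

# A split that isolates the "dry" sample leaves both children pure.
print(split_gini(["rain", "rain", "rain"], ["dry"]))  # 0.0
```

A pure node has a Gini index of 0, and a two-class node split evenly has the maximum value of 0.5, so the split with the lowest weighted Gini index is preferred.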
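To make the MSE splitting measure concrete, the sketch below computes the MSE of a node and of a candidate split. It assumes, purely for illustration, that rainfall is encoded as 1 (rain) and 0 (no rain); the paper does not state the encoding, and the function names are hypothetical. Under this encoding the MSE of a node with rain proportion p equals p(1 - p), exactly half the two-class Gini index 2p(1 - p), so both measures rank candidate splits in the same order.

```python
def node_mse(targets):
    """Mean squared deviation of the targets from the node mean."""
    mean = sum(targets) / len(targets)
    return sum((y - mean) ** 2 for y in targets) / len(targets)


def split_mse(left, right):
    """MSE of a split: child MSEs weighted by child size."""
    n = len(left) + len(right)
    return (len(left) * node_mse(left) + len(right) * node_mse(right)) / n


# Hypothetical node: three rainy samples (1) and one dry sample (0).
node = [1, 1, 1, 0]
print(node_mse(node))  # 0.75 * 0.25 = 0.1875, half of the Gini index 0.375

# Isolating the dry sample gives two pure children and an MSE of zero.
print(split_mse([1, 1, 1], [0]))  # 0.0
```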

Results and Discussion
The Raichur data were collected over a span of 17 years. The accuracy, precision, recall and F1 score are tabulated as follows. The decision tree for the Raichur region is shown in Figure 1.
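For reference, the four reported measures can be derived from the counts of true and false positives and negatives. The sketch below is a minimal illustration with made-up labels; the function name, the choice of "rain" as the positive class and the toy predictions are assumptions and do not reproduce the reported results.

```python
def classification_metrics(y_true, y_pred, positive="rain"):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 2 true positives, 1 false negative, 1 false positive, 1 true negative.
y_true = ["rain", "rain", "rain", "dry", "dry"]
y_pred = ["rain", "rain", "dry", "rain", "dry"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision is the share of predicted rain events that were correct, recall is the share of actual rain events that were found, and the F1 score is their harmonic mean.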

Conclusion
Meteorological data analysis is a rapidly growing field in atmospheric science. Traditional approaches do not scale to large meteorological data sets, which require a more comprehensive approach. The decision tree and random forest algorithms were applied to Raichur data collected from the Indian Meteorological Department (IMD) from 1999 to 2016, a total of 11158 samples. Rainfall prediction for this drought region of Karnataka reaches an accuracy of 96%. The results show that the factors that most affect the rainfall of the Raichur region are vapour pressure, temperature and sea level pressure, while wind direction and wind speed affect it least.