Long Term Electricity Demand & Peak Power Load Forecasting Variables Identification & Selection

Forecasting electricity demand (kilowatt hour: kWh) and peak power load (kilowatt: kW) is important both for expansion planning (long term) and for dispatching (short term). Hence, across all horizons from the long term to the very short term, the nature of electricity demand and peak power load has to be studied and understood well: the problem must first be understood, and only then can a solution be developed. These activities are within the scope of these research, development, demonstration, and deployment (RD³) studies. The author thinks that the natural mechanisms of the forecasting problem can be understood by finding, defining, identifying, and describing the factors (parameters, variables) that affect electricity demand and peak power load. In this study, GATE is used only during corpus development as a backup check. The R text mining package (Rtm) and TextSTAT are used as the main text mining and analysis tools. This text analysis yields 314 terms as candidate variables. Afterwards, all candidate variables are studied and analyzed by grey based natural reasoning with a simple weighted average (WA) approach, as a step towards the simple additive weighting (SAW) method (only for long term factors, as a preliminary application). Finally, 43 terms (e.g. population, weather, climate, economy, price) are found as variables for infant and mature RD³ studies of a 100% renewable energy (RE) worldwide grid (Global Grid). The findings of this study can also be used for other grid types. It is believed that a specific dictionary and encyclopedia on this particular subject should be developed to give researchers a common understanding, which will also help in building the Global Grid Prediction Systems (GPS).


Introduction
Humankind lives with electricity today. It is consumed almost everywhere and is generated by different methods. On the one hand, there is the non-renewable energy sources (NRES) group, which pollutes our world in every aspect and seriously threatens our health and wildlife. On the other hand, there is the renewable energy sources (RES) group, which is much cleaner than the NRES group and threatens our world much less. In any case, all of these technologies (NRES or RES) change the environment. The question is about the amount and the effects of these changes. The main cause of change in the environment is humankind. No human activity is perfect. Human activities and decisions were not good in the past, they are not good now, and they will never be good enough in the future.
The author of this study thinks that the RES group should be preferred and that 100% RES based grids should be operated. This research study is based on this main idea. Facts show us that 100% RES based grids may be operational in the long term. According to this study's approach, this utopia cannot be realized with separated and distributed grids. Hence, subcontinents, continents, and the world should be interconnected for clean electricity generation. The interconnections could possibly be built with some futuristic wired or wireless technologies (for wireless transmission, correspond also with Dr. Prabhat Ranjan Tripathi). Some futuristic ideas for multi-continental and international electricity grids have already been announced and published, such as European Supergrid [1], Supergrid Concept for America [2], DESERTEC [3], Gobitec [4,5], Asian Super Grid [4,5], and Global Grid [6].
This research study focuses only on the Global Grid Concept, which is described as "a grid spanning the whole planet and connecting most of the large power plants in the world" [6]. The Global Grid Concept may be operational in the long term, so research, development, demonstration, and deployment (RD³) studies on several topics, such as modeling of the Global Grid, should be conducted and brought into the scientific literature. One of the important steps of Global Grid modeling is the prediction of electricity demand and peak power load over the long term time horizon. Several methods should be studied and used for the electricity demand forecasting of the Global Grid Concept. The research findings will mainly affect renewable power plant investment decisions, the Global Grid goals, and international laws and agreements. Other important steps of Global Grid modeling are the electricity demand predictions over the medium, short, and very short term time horizons. The research findings of these studies will affect the whole Global Grid design. Some electricity demand and peak power load forecasting methods are based on variables (e.g. linear regression, multivariate regression, fuzzy inference system).
This research study aims to find, identify, and select the variables for the electricity demand and peak power load forecasting of the Global Grid (applied and presented only for the long term, due to the limited research duration).
This paper consists of four sections. The second section presents the literature review. The third section presents how variables are identified and selected for the electricity demand and peak power load prediction of the Global Grid by text analysis and grey based natural reasoning with a simple weighted average approach. This approach and some of the identified and selected variables can also be used in grey, linear, and non-linear regression and in other electricity demand and peak power load forecasting models. The concluding remarks and further research are presented in the last section.

Literature Review
The literature review had two main parts. The first was to find relevant previous research studies on electricity demand and peak power load forecasting (including studies containing the word "fuzzy"). The second was to find, amongst all the studies found in the first review step, the publications that focused on the identification and selection of variables.
In the first review step, 15 online academic publication websites were searched in June 2015 (search period: 20 days). Listed alphabetically, these websites were ACM Digital Library-ACM [7], ASCE [20], and World Scientific Publishing-WSP [21]. It was believed that searching and reviewing these online academic publication websites was a good start for this research study and the following ones (a drop in the ocean: http://www.phrases.org.uk/ on web).
The first elimination of documents was performed in this review step. Irrelevant studies were neither downloaded nor stored in the specific folders. Only journal and conference papers were taken into account. The studies related to electricity demand forecasting were also analyzed by website.
The top three online academic publication websites were Google Scholar, the ACM Digital Library, and Springer. Searching these three websites found 96% of all research studies in this subject. Google Scholar alone covered 76% of all research studies in this subject. Hence, it was concluded that reviewing only Google Scholar was almost sufficient for this subject, and that Google Scholar, the ACM Digital Library, and Springer together represented almost all previous research studies in this subject.
In the second review step, the documents saved and stored in the specific folder of this study were reviewed to find publications on the identification of variables. It was observed that there were no studies on this subject in the current collection. However, there were a few literature review and survey papers that could be consulted for the previous studies.
One of the interesting literature review conference papers was by Elakrmi and Abu Shikhah [22]. It was found on the Jordan Engineers Association Conferences website (http://www.jeaconf.org/), reached through Google Scholar. They mentioned that weather, cultural, and social factors were used in the models. They also presented the methods used in the literature on this subject: statistical-based methods; regression methods (single variable linear, polynomial, selected-model function such as exponential, logarithmic, and multi-variable); time series methods (ARMA: autoregressive moving average, ARIMA: autoregressive integrated moving average or Box-Jenkins, ARIMAX: autoregressive integrated moving average with exogenous variables, FARMAX: fuzzy autoregressive moving average with exogenous input variables, exponential smoothing, PCA: principal component analysis, similar-day approach, econometric or causal approaches, simulation or end-use approaches); and artificial intelligence (AI) based methods (neural networks: ANN or NN, expert systems, fuzzy logic systems, support vector machines: SVM, particle swarm optimization: PSO). They only mentioned two variables [22].
Another important study was by Hahn et al. [23]. It was found on the website of the European Journal of Operational Research, reached through Google Scholar and ScienceDirect. Their review was similar in approach to that of Elakrmi and Abu Shikhah. Other review studies also had similar approaches.
This literature review was finalized on 29/08/2015 (search period in June 2015). It was understood that, until June 2015, the identification and selection of variables for the electricity demand and peak power load forecasting of the Global Grid had not been studied by researchers.
This research study will hopefully provide a good and interesting starting point for scientific publications on this research topic.
The application in this study was performed only on the long term time horizon publications due to the limited time budget. The very short, short, and medium term time horizon publications were left for the following research studies.

Identification & Selection of Variables
This study is performed and finalized under some limitations and preferences: the personal computer capacity (Windows 10 Pro, Intel(R) Core(TM) i5 CPU 650 @ 3.20 GHz, 6.00 GB RAM), the internet connection speed, a preference for free or free and open-source licensed software applications and tools, offline or online (academic study approach), and finally the duration and time available (no major coding or scripting, i.e. no thousands of lines of code: KLOC). The simplified overview of the text analysis is presented in Figure 2. The files are saved in an organized manner into four classified folders (long, medium, short, and very short term collections). Only some of the documents (220 in total) can be classified and analyzed in this study. First, all files are converted into *.pdf format. Second, the files in each classified folder are merged with PDF Split and Merge basic (http://www.pdfsam.org/). Third, the merged file for each classified folder is saved to its own folder. Fourth, these files are converted into *.txt and *.docx formats with PDF to Text (http://pdftotext.com/) and the PDF to DOC converter (http://pdf2doc.com/). Fifth, the *.txt and *.docx files are compared and the most appropriate one is selected for the text analysis. These activities can be performed without any major trouble because the number of files is rather small. When the number of files increases, the activities will take much longer. These data file preprocessing steps are presented in Table 1. The conversion from *.pdf to *.txt is also performed with GATE Developer 8.0 (https://gate.ac.uk/) (language resources GATE document). The converted *.txt files are compared, and the one that is most appropriate in the author's view is selected as the corpus of this research (Corpus: "a collection of writing" [24][25][26][27][28][29][30][31]).
The corpus of this research consists of the documents that are collected and classified according to their forecasting period (long term collection). For instance, the *.txt file of all the documents in the long term collection is the long term collection corpus (see the online electronic supplementary material (ESM) files).
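The term counting step of such a text analysis can be illustrated with a minimal sketch. This is not the Rtm or TextSTAT workflow used in this study; it is a Python standard library approximation, and the stop-word list, the minimum word length, the function name, and the sample sentence are all assumptions made only for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only (not the list used in the study).
STOPWORDS = {"the", "of", "and", "in", "is", "a", "to", "for", "on"}


def term_frequencies(text, stopwords=STOPWORDS, min_length=3):
    """Count how often each lower-cased word occurs in a corpus text."""
    # Keep runs of letters, allowing internal hyphens (e.g. "power-system").
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return Counter(
        token for token in tokens
        if len(token) >= min_length and token not in stopwords
    )


# Invented sample text standing in for a corpus *.txt file.
sample = (
    "Population growth and economic growth drive electricity demand. "
    "Weather and climate also affect the demand in the power-system."
)
freq = term_frequencies(sample)

# List the most frequent candidate terms first.
for term, count in freq.most_common(3):
    print(term, count)
```

In practice the text would be read from the corpus file of a collection, and the resulting frequency list would only be a starting point for the manual review of candidate variable terms.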
After this text analysis, all of these terms are studied and analyzed as candidate variables by grey based natural reasoning with a simple weighted average (WA) approach, as a step towards the simple additive weighting (SAW) method (only for long term factors, as a preliminary application). For natural reasoning, see [41,42].
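The weighted average idea can be sketched as follows. This is a minimal illustration under stated assumptions and not the grey based calculation of this study (see ESM for that): the criteria names, the weights, the 1 to 5 scoring scale, and the placeholder terms are hypothetical, and crisp scores are used instead of grey numbers.

```python
def weighted_average(scores, weights):
    """Simple weighted average: sum(w_i * s_i) / sum(w_i)."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same criteria")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[c] * scores[c] for c in scores) / total_weight


# Hypothetical criteria weights and 1-5 scores, for illustration only.
weights = {"impact": 0.6, "data_possibility": 0.4}
candidates = {
    "term_a": {"impact": 5, "data_possibility": 5},
    "term_b": {"impact": 3, "data_possibility": 4},
    "term_c": {"impact": 1, "data_possibility": 2},
}

# Rank the placeholder terms from the highest to the lowest weighted average.
ranking = sorted(
    candidates,
    key=lambda term: weighted_average(candidates[term], weights),
    reverse=True,
)
print(ranking)
```

Dividing by the sum of the weights keeps the result on the same scale as the scores even when the weights are not normalized to one.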
A simple natural reasoning model is built in this study as follows (Figure 9 and Table 5). IF term's impact AND data possibility is Very Low THEN Eliminate term directly, IF term's impact is Very High AND data possibility is Above Low THEN Choose term directly, ELSE Execute the current grey based WA approach (see ESM and Figure 9). There are 21 chosen terms (population, growth, temperature, economy, development, production, weather, climate, wind, employee, air temperature, anomaly, irradiance, irradiation, labor, radiation days, radiation, developing countries, domestic production, economic development, economic growth). There are 22 selected terms (economic, computer, journal, price, employment, bank, publication, tariff, climatic, democratic, earthquake, rainfall, cloudiness, coastal, industrialization, inequality, rich, deserts, internet, power-system, refrigerator, sunshine). Together these make up 43 terms. Some of these terms can be used directly and some of them can be used in combination with one another. Some of them have similar meanings and can be eliminated in further analysis. Terms such as population (551), weather (74), climate (61), economy (100), and price (170) are the most likely to be usable, according to the author.
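The IF-THEN-ELSE rule above can be sketched in code. This is a minimal illustration, assuming a five-level linguistic scale and reading "Above Low" as strictly higher than Low. Both are assumptions of this sketch, as are the function and label names. The actual scale is given in Table 5 and ESM.

```python
# Assumed five-level linguistic scale, ordered from lowest to highest.
LEVELS = ["Very Low", "Low", "Medium", "High", "Very High"]


def screen_term(impact, data_possibility):
    """Return 'eliminate', 'choose', or 'weighted_average' for a candidate term."""
    impact_rank = LEVELS.index(impact)
    data_rank = LEVELS.index(data_possibility)
    # Rule 1: both the impact and the data possibility are Very Low.
    if impact_rank == 0 and data_rank == 0:
        return "eliminate"
    # Rule 2: Very High impact and data possibility above Low (assumed strict).
    if impact_rank == len(LEVELS) - 1 and data_rank > LEVELS.index("Low"):
        return "choose"
    # Otherwise the term goes on to the grey based WA evaluation.
    return "weighted_average"


print(screen_term("Very High", "Medium"))
print(screen_term("High", "High"))
```

Only the terms that fall through both direct rules need the more detailed weighted average evaluation, which keeps the manual scoring effort small.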

Conclusions, Future Applications and Research
This study aims to find the variables for electricity demand and power load forecasting models (e.g. fuzzy, grey, fuzzy inference). All parts of this study were carried out by a single researcher, and the number of researchers should be much higher. In the following studies, the number of documents in the specific folders has to be increased. The text analysis activities and the grey based natural reasoning with WA approach have to be reviewed and improved through more detailed analysis. A simple additive weighting (SAW) method should be built to calculate the ranks of the terms, representing the preference for using each term as a variable. Moreover, a specific dictionary and encyclopedia on this particular subject have to be developed and presented on open access World Wide Web sites. It is believed that the findings of these kinds of studies will help in developing the Global Grid Prediction Systems (visit [47]).