Traditional Machine Learning Models for Building Energy Performance Prediction: A Comparative Study

Abstract: Buildings account for a large proportion of total energy consumption. Accurately predicting the heating and cooling demand of a building at the initial design phase is crucial for identifying the most efficient solution among candidate designs. In this paper, to explore how effectively basic machine learning algorithms solve this problem, several machine learning models were used to estimate the heating and cooling loads of buildings from building energy-efficiency data. Notably, this paper also discusses the performance of a deep neural network prediction model and concludes that, among traditional machine learning algorithms, GradientBoostingRegressor achieves the best predictions, with heating prediction reaching 0.998553. Compared with GradientBoostingRegressor, our HB-Regressor model achieves higher prediction accuracy, reaching 0.998672 and 0.995153 for heating and cooling respectively, but its fitting speed is not as fast as that of the GradientBoostingRegressor algorithm.


Introduction
Predicting building energy consumption remains a challenging task because of the variety of factors that influence consumption [1], such as the physical characteristics of the building, installed equipment, outdoor weather conditions, and the energy use behaviour of the building's occupants [2,3].
Physical modelling approaches and data-driven approaches are the two main methods used for predicting the energy consumption of buildings. For thorough energy modelling and analysis, physical models (sometimes called engineering techniques or white-box models) rely on thermodynamic principles. EnergyPlus, eQuest, and Ecotect are examples of building energy modelling software that use physical models [4]. Based on detailed architectural and environmental data, such as building construction specifications, operating schedules, HVAC design information, and climate, sky, and solar/shadow information, such software computes the energy consumption of buildings. At the time of the simulation, the user might not have access to all of this detailed information, and inaccurate inputs can lead to poor prediction performance [5]. Numerous prediction techniques are currently in use, including machine learning, and numerous papers have been published; however, there is little research that examines machine learning and deep neural network methods side by side. To close this gap, this study conducts a comprehensive comparison of deep learning and conventional machine learning methods using real, freely accessible datasets [4]. The results and future research objectives are reviewed. The main objective is the assessment of prior research on energy prediction models for various building-type parameters.
The paper is organised as follows: the pre-processed training and test datasets are presented first. This is followed by a theoretical overview of conventional machine learning algorithms, the metrics used to evaluate the algorithms' strengths and limitations, a comparative assessment of the prediction results, and an outlook on future work.

Data Collection and Processing
The dataset used in this paper was collected and processed by Athanasios Tsanas of the University of Oxford, UK, and consists of 12 different building shapes analysed for energy performance. These buildings differ in parameters such as glazing area, glazing area distribution, and orientation. Various settings were simulated as a function of these characteristics to obtain 768 building shapes. The dataset contains 768 samples and 8 features, and we consider wall area, roof area, and glazing area to be key indicators affecting the energy load efficiency of both heating and cooling [5].
The variables comprise eight input features, relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area, and glazing area distribution, and two targets, the heating load and the cooling load [7]. These eight characteristic factors do not influence the energy performance of the building equally, and the amount of data on each parameter varies slightly, as can be seen from the table of dataset headings below, which also shows the form of data collection. Missing values are filled with the column mean, under the assumption that the impact weights of the eight indicators are equal. The figures below show the characteristics of the eight main indicators, which will help the reader understand the training and test data [9].
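As a concrete illustration of this preprocessing step, the following is a minimal Python sketch. It assumes the dataset is available as the commonly distributed UCI Energy Efficiency file ENB2012_data.xlsx; the file name and the column labels assigned here are assumptions and should be adjusted to the actual data source.

```python
import pandas as pd

# Load the energy efficiency dataset; the file name and column labels
# are assumptions -- adjust them to match the actual data source.
df = pd.read_excel("ENB2012_data.xlsx")
df.columns = [
    "relative_compactness", "surface_area", "wall_area", "roof_area",
    "overall_height", "orientation", "glazing_area",
    "glazing_area_distribution", "heating_load", "cooling_load",
]

# Fill missing values with the column mean, treating the eight
# input features as equally weighted, as described above.
df = df.fillna(df.mean())

X = df.iloc[:, :8]  # the eight building features
y_heating, y_cooling = df["heating_load"], df["cooling_load"]
```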

Method
This chapter uses several types of machine learning algorithm that are currently in wide use and have strong predictive performance: the decision tree algorithm, the random forest algorithm, and gradient boosting regression, all of which were significant when first proposed and are data-driven predictive models [10]. In the field of building energy prediction, data collection involves gathering historical or available data for model training, such as outdoor weather conditions and electricity consumption records. Data pre-processing can include data cleaning, data integration, data transformation, and/or data reduction. Model training fits a model on the training dataset, and model testing evaluates the model using standard evaluation measures [8].
Decision trees have been a common approach to regression problems, and the main focus of research in this area over the last few decades has been the gradient boosting decision tree (GBDT) method. Several open-source packages implement the GBDT algorithm for both classification and regression tasks. While the core ideas remain unchanged, these packages focus mainly on speed-up, parallelisation, large-scale dataset processing, and robust training. This paper uses a decision tree regression model (DTR) together with a grid search (GridSearchCV) to find the best combination of hyperparameters. GridSearchCV is a tuning procedure that determines the optimal hyperparameter values for a given model by performing an exhaustive search over a specified set of parameters. The method is computationally expensive but produces good results [11,12].
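A minimal sketch of this tuning procedure with scikit-learn is shown below. The parameter grid and split ratio are illustrative assumptions rather than the exact values used in the paper, and X and y_heating refer to the feature matrix and target from the data-loading sketch above.

```python
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

X_train, X_test, y_train, y_test = train_test_split(
    X, y_heating, test_size=0.2, random_state=42)

# Candidate hyperparameter values; the grid itself is illustrative.
param_grid = {
    "max_depth": [4, 6, 8],
    "max_leaf_nodes": [15, 31, 63],
    "min_samples_leaf": [4, 5, 6],
    "min_samples_split": [15, 17, 19],
}

# Exhaustive cross-validated search over the grid.
search = GridSearchCV(DecisionTreeRegressor(random_state=42),
                      param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```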
A decision tree consists of decision nodes and leaf nodes and is characterised by its depth. The tree is built based on information gain: the first step is to calculate the information entropy of the root node,

$H(D) = -\sum_{k} p_k \log_2 p_k, \qquad (1)$

where $p_k$ is the proportion of samples belonging to class $k$. The information gain of each attribute is then computed, and the search for suitable attribute nodes continues by repeating this procedure [11]. The random forest algorithm is so named because it combines multiple decision trees, each trained on a randomly drawn dataset with a randomly selected set of features as input. Assume the training set T is of size N, the number of features is M, and the size of the random forest is K [10]. The following is repeated K times: a new sub-training set D is drawn by sampling N times from the training set T with replacement, and m features are randomly selected, where m < M; using the new training set D and the m features, a complete decision tree is learned. Together, the K trees form the random forest [13,14]. Random forests reduce variance by averaging over many noisy but approximately unbiased trees [15], thereby improving prediction accuracy. The variance of a random forest with K trees is

$\mathrm{Var} = \rho\sigma^2 + \frac{1-\rho}{K}\sigma^2, \qquad (2)$

where $\sigma^2$ denotes the variance of an individual tree and $\rho$ the correlation between trees. Clearly, by increasing the total number of trees K, the second term tends to zero. Thus, the variance of a random forest depends on three things [17,18]. First, the correlation $\rho$ between any pair of trees: reducing the correlation reduces the total variance. This is achieved by randomly selecting v of the p variables as split candidates at each node when growing trees on the bootstrap dataset; reducing v reduces both the correlation between trees and the strength of individual trees, and vice versa, so the optimal value of v needs to be found for a particular dataset [16]. Second, the variance $\sigma^2$ of each tree, in other words the strength of each tree: strengthening the performance of each tree reduces the total variance of the model. Third, the total number of trees K: the second term of the equation is reduced by increasing K [19], so a sufficient number of trees should be trained to drive it towards zero.
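Returning to the tree-construction step described above, the entropy and information-gain computation can be sketched as follows. The toy labels are purely illustrative, and note that scikit-learn's regression trees use variance reduction rather than entropy, which applies to classification.

```python
import numpy as np

def entropy(labels):
    """Information entropy H(D) = -sum_k p_k * log2(p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, split_mask):
    """Gain of splitting a dataset into two subsets by a boolean mask."""
    n = len(labels)
    left, right = labels[split_mask], labels[~split_mask]
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

# Toy example with binary class labels.
y = np.array([0, 0, 1, 1, 1, 0])
mask = np.array([True, True, True, False, False, False])
print(information_gain(y, mask))
```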
In general, random forests are based on the idea of bagging, but with the diversity of each tree enforced by random feature selection. The theoretical background of random forests supports parallel computing, so training can be accelerated by growing trees in parallel. The prediction performance of a random forest is influenced by three main factors: the correlation between individual trees, the performance of each tree, and the total number of trees [20,21].
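A minimal random forest sketch with scikit-learn follows; the hyperparameter values are illustrative, X_train and y_train come from the earlier sketch, and n_jobs=-1 exploits the fact that the trees are independent and can be grown in parallel.

```python
from sklearn.ensemble import RandomForestRegressor

# Each tree is grown on a bootstrap sample of the training set;
# max_features controls how many of the M features are candidate
# split variables at each node -- the source of tree diversity.
rf = RandomForestRegressor(
    n_estimators=400,      # total number of trees K
    max_features="sqrt",   # v < M features considered per split
    n_jobs=-1,             # grow the trees in parallel
    random_state=42,
)
rf.fit(X_train, y_train)
print("Test R^2:", rf.score(X_test, y_test))
```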
Gradient boosting regression (GBR) is a technique that learns from its mistakes: unlike bagging, the boosting method generates the underlying models sequentially. Prediction accuracy is improved by developing multiple models in sequence that focus on the difficult-to-estimate training cases. During boosting, examples that were difficult for the previous base model appear more heavily weighted in the training data than examples that were estimated correctly, and each additional base model is designed to correct the errors made by its predecessors [22]. GBR modifies the generic gradient boosting method by using fixed-size regression trees as the base model [23]. Assume the number of leaves per tree is J. Each tree divides the input space into J disjoint regions $R_{1m}, R_{2m}, \ldots, R_{Jm}$ and predicts a constant value $b_{jm}$ for region $R_{jm}$, so the regression tree can be formally represented as

$h_m(x) = \sum_{j=1}^{J} b_{jm}\, \mathbf{1}(x \in R_{jm}).$

Substituting this regression tree for the generic base learner in gradient boosting, the model update equation and gradient descent step become

$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x), \qquad \gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\bigl(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)\bigr).$

Using a separate optimal coefficient $\rho_{jm}$ for each region $R_{jm}$ (the $b_{jm}$ can then be discarded), the model update rule becomes

$F_m(x) = F_{m-1}(x) + \sum_{j=1}^{J} \rho_{jm}\, \mathbf{1}(x \in R_{jm}), \qquad \rho_{jm} = \arg\min_{\rho} \sum_{x_i \in R_{jm}} L\bigl(y_i, F_{m-1}(x_i) + \rho\bigr). \qquad (9)$
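The update rule above can be sketched directly in a few lines for squared-error loss, for which the negative gradient is simply the residual and the per-region line search reduces to the leaf means the tree already stores; the hyperparameters here are illustrative, and the learning rate adds the usual shrinkage.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=100, learning_rate=0.1, max_leaf_nodes=8):
    """Minimal gradient boosting for squared-error loss.

    With L(y, F) = (y - F)^2 / 2 the negative gradient is the residual
    y - F, so each round fits a small regression tree to the residuals
    and adds it to the ensemble: F_m(x) = F_{m-1}(x) + lr * h_m(x).
    """
    F = np.full(len(y), y.mean())  # F_0: constant initial model
    trees = []
    for _ in range(n_rounds):
        residuals = y - F          # negative gradient of the loss
        tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes)
        tree.fit(X, residuals)     # leaf values are the region means rho_jm
        F += learning_rate * tree.predict(X)
        trees.append(tree)
    return y.mean(), trees

def boosted_predict(base, trees, X, learning_rate=0.1):
    F = np.full(len(X), base)
    for tree in trees:
        F += learning_rate * tree.predict(X)
    return F
```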

Experimental Assessment Indicators
In order to compare the performance of the algorithms, metrics are essential for assessing the strengths and weaknesses of the models.
The mean absolute error (MAE) is the average distance between the model prediction and the true value of a sample [21]:

$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert \hat{y}_i - y_i \rvert.$

R² is the goodness of fit, the degree to which the regression line fits the observations. In linear regression analysis, when least squares is used for parameter estimation, R² is the ratio of the regression sum of squares to the total sum of squares of deviations [22], indicating the proportion of the total variation that can be explained by the regression; the larger this proportion, the better [23]. The more accurate the model, the more significant the regression effect. R² lies between 0 and 1; the closer to 1, the better the regression fit, and the fit is generally considered good for models above 0.8:

$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}.$
One advantage of MAE over MSE is that MAE is less sensitive to outliers in the predictions and is therefore more forgiving. We generally use accuracy and error rate to evaluate the model as a whole [25]:

$\text{Error rate} = \frac{1}{n}\sum_{i=1}^{n} \frac{\lvert \hat{y}_i - y_i \rvert}{y_i}, \qquad \text{Accuracy} = 1 - \text{Error rate}. \qquad (13)$
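These metrics can be computed as follows; the error_rate helper follows the relative-error definition of Eq. (13) above, and y_test and rf refer to the earlier sketches.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

def error_rate(y_true, y_pred):
    """Mean relative error |y_hat - y| / y, as in Eq. (13)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs(y_pred - y_true) / y_true)

y_pred = rf.predict(X_test)
print("MAE:", mean_absolute_error(y_test, y_pred))
print("R2:", r2_score(y_test, y_pred))
print("Accuracy:", 1 - error_rate(y_test, y_pred))
```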

Experiments and Results
The experimental environment was based on the PyTorch 1.11.0 framework with CUDA 11.3 and cuDNN 8.2, and models were trained on an NVIDIA GeForce RTX 3060 (12 GB). For DecisionTreeRegressor prediction, the minimum split size min_samples_split was initialised to 15, the maximum depth to 6, the maximum number of leaf nodes to 31, and the minimum number of samples per leaf to 6. The optimal hyperparameters determined by GridSearchCV were max_depth = 6, max_leaf_nodes = 31, min_samples_leaf = 5, and min_samples_split = 17 [26].
R² is an indispensable metric for measuring the fit of DecisionTreeRegressor, random forests, and gradient boosting regression. The DTR algorithm achieves an R² of 0.991543907727448 on the training dataset and 0.973711991905928 on the test dataset. The random forest algorithm initialises the hyperparameter grid with n_estimators in [350, 400, 450], max_features in [1, 2], and max_depth in [85, 90, 95]; the best hyperparameters found by cross-validated grid search are max_depth = 90, max_features = 1, and n_estimators = 450. It is worth noting that the random forest algorithm also reports an R² of 0.991543907727448 on the training dataset and 0.973711991905928 on the test dataset. Gradient boosting regression has one more hyperparameter, subsample, than the two algorithms above; in this experiment we set the subsample rate to 1.0, and the R² on the training dataset is 0.998672666920205.
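The random forest grid search just described, together with the gradient boosting configuration with subsample = 1.0, can be sketched as follows; the cross-validation settings and the remaining GBR parameters are assumptions, and X_train, y_train, X_test, y_test come from the earlier split.

```python
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Random forest grid as reported above; best found: max_depth=90,
# max_features=1, n_estimators=450.
rf_grid = {
    "n_estimators": [350, 400, 450],
    "max_features": [1, 2],
    "max_depth": [85, 90, 95],
}
rf_search = GridSearchCV(RandomForestRegressor(random_state=42),
                         rf_grid, cv=5, scoring="r2", n_jobs=-1)
rf_search.fit(X_train, y_train)

# Gradient boosting with full-sample boosting rounds (subsample=1.0);
# the remaining settings are scikit-learn defaults, an assumption.
gbr = GradientBoostingRegressor(subsample=1.0, random_state=42)
gbr.fit(X_train, y_train)
print(rf_search.best_params_, gbr.score(X_test, y_test))
```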
The R² on the test dataset is 0.9914370717646062. Judging from the R² values alone, the prediction fit of gradient boosting regression is the best. HB-Regressor has 6 hidden layers, of which the first has 180 neurons as input, and the activation function is ReLU. The ReLU function largely eliminates the vanishing-gradient problem: its gradient is constant at 1 for inputs greater than 0, thus avoiding the training difficulties caused by vanishing gradients. The number of iterations was set to 10000 [26].
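A sketch of an HB-Regressor-style network in PyTorch under the stated description (six hidden layers, 180 neurons in the first, ReLU activations, 10000 iterations) is given below; the widths of the remaining hidden layers, the optimiser, and the learning rate are assumptions not taken from the paper.

```python
import torch
import torch.nn as nn

class HBRegressor(nn.Module):
    """Six hidden layers with ReLU; only the first width (180) is
    specified in the text -- the remaining widths are assumptions."""
    def __init__(self, n_features=8):
        super().__init__()
        widths = [n_features, 180, 128, 96, 64, 32, 16]
        layers = []
        for w_in, w_out in zip(widths, widths[1:]):
            layers += [nn.Linear(w_in, w_out), nn.ReLU()]
        layers.append(nn.Linear(widths[-1], 1))  # regression output
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

model = HBRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimiser
loss_fn = nn.MSELoss()

X_t = torch.tensor(X_train.to_numpy(), dtype=torch.float32)
y_t = torch.tensor(y_train.to_numpy(), dtype=torch.float32).unsqueeze(1)

for step in range(10000):  # iteration count from the paper
    optimizer.zero_grad()
    loss = loss_fn(model(X_t), y_t)
    loss.backward()
    optimizer.step()
```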
The table below reports the prediction accuracy on the heating training set, the heating test set, the cooling training set, and the cooling test set, together with the R² index. The prediction accuracy of HB-Regressor on the heating training set and on the heating test set is slightly higher than that of the other three machine learning algorithms, although its test-set accuracy of 99.8672% is slightly lower than its training-set accuracy. The fit of HB-Regressor is also the best, reaching 99.4384%. Overall HB-Regressor is excellent, but its prediction accuracy on the cooling training dataset is slightly worse than that of the GBR algorithm.
The graph below shows the predicted versus true value curves for the cooling and heating data for HB-Regressor [24]. The predicted and true value curves are coloured differently, and we can see that the predicted values are correct in most cases. Because the forecasting model calculates the heating demand from time-stamped input data, the forecast horizon becomes a key parameter for the accuracy of the estimation using measured or forecast data.

Discussion
Based on the analysis of the case studies in this research and the methodologies used, deep neural network prediction methods outperformed techniques such as decision trees and random forests in terms of accuracy. This could be the outcome of the input selection procedure, which disregards the historical lags of the input variables. The use of such lags may be advantageous for prediction techniques, which frequently rely on recent and historical data; real-time access to the measured data is necessary in that situation. Although predictive methods perform better, they have the drawback of requiring a lot of supporting data, much of which may not be readily accessible in practice. Prediction techniques that rely on the current time-step value nonetheless perform well enough with such an input set [18].
All ML models show acceptable target-variable errors on the test data when focusing on prediction error. For each application, the deep learning and tuned tree-based models sit close to the low end of the error range. The HB-Regressor was also found to be among the best-performing algorithms in this study. The case study may not have explored the latency of the inputs and/or the potential length of the dataset, which might have better highlighted the advantages of the deep learning approach; as a result, the differences seen across the test dataset appear small.

Since the 1990s, numerous studies have developed forecasting models for energy use in buildings. These techniques fall into two basic categories: physical approaches and data-driven methods. The physical approach, considered the classic method used in the design phase for building energy evaluation, relies primarily on energy modelling tools. Machine learning (ML) techniques such as support vector machines (SVM) and random forests (RF), which need fewer construction parameters, are frequently used in data-driven methodologies; because results can be obtained quickly, data-driven solutions have been found to be more accurate and efficient. Decision trees (DT), artificial neural networks (ANN), and support vector machines (SVM), to mention a few, are among the most effective techniques used, and each has occasionally outperformed the others. Comparisons of various models, including the multilayer perceptron ANN and SVM, have concluded that ANNs predict energy use better than SVMs. Additionally, feature selection and hyperparameter tuning are significant factors that frequently affect the performance of ML models and significantly impact a model's findings.

Future Work
Future research could also explore the application of other types of machine learning algorithms to energy consumption prediction. For example, deep learning algorithms have been shown to outperform other machine learning algorithms in many other areas (e.g. image classification and multimodal data analysis), but have not been fully investigated in the area of building energy consumption prediction.
As new data-driven models are developed, sharing more information about the development process and purpose, validation, and reusability of these models is essential to avoid unnecessary duplication of research efforts. Some important model information (for example, the purpose of the prediction) is sometimes not reported or not adequately described. Insufficient information provides limited guidance on the applicability of certain models to new contexts, which may inhibit model reusability [27].

Conclusion
It is clear from the high accuracy achieved in this study that a high-performance ML model can be developed to forecast building energy use at the design stage. The ability to estimate building energy use has been demonstrated in previous research; however, there is no recognised best ML model for the design phase. Because they can provide energy performance results in just a few seconds, machine learning models are far more efficient than conventional simulation techniques, as numerous earlier investigations have shown. However, no research had created models for the building design phase by applying feature selection methods and hyperparameter tuning to a variety of algorithms. Accuracy and R² measures were used to assess four base models.
In terms of predicting building energy performance, gradient boosting (GB) fared better than the other conventional machine learning models. This study compared different machine learning models for predicting energy performance, and the results showed that the deep learning model outperformed the other models, supporting its ability to produce high-quality results. These results, however, run counter to some previous studies in which the GBR algorithm produced superior prediction accuracy. The apparent shortfall of the GBR algorithm in performance and computational efficiency here is explained by the observation that GBR algorithms perform better on small datasets, whereas the deep learning algorithm HB performs better on large datasets.

Informed Consent Statement
Informed consent was obtained from all subjects involved in the study.

Figure 1 .
Figure 1. Six features: relative compactness, surface area, wall area, roof area, overall height, orientation.

Figure 2 .
Figure 2. Two features: glazing area and glazing area distribution; glazing area is a key indicator influencing the energy load. The data are divided into training, validation, and test sets using k-fold cross-validation, in the ratio 7:1:2, with random sampling used to split the dataset. The following figure shows the correlation matrix of the training data.

Figure 3 .
Figure 3. Correlation matrix of the training data.

Figure 4 .
Figure 4. Correlation matrix of the training data.

Figure 5 .
Figure 5. Heating test and predicted data/cooling test and predicted data.

Table 1 .
Headings of the data set.