Application and Analysis of Random Forest Algorithm for Estimating Lawn Grass Lengths in Robotic Lawn Mower

This paper presents a method for estimating lawn grass lengths or ground conditions based on the random forest algorithm applied to observation data obtained by sensor fusion. The estimation relates to the Digital Twin and the Virtual Twin of the Hybrid Twin approach for the autonomous driving of robotic lawn mowers. Robotic lawn mowers are becoming popular with the advent of efficient sensors and embedded systems, and we are now developing a practical autonomous driving algorithm and its group control algorithm for large lawn grass areas. However, current robotic lawn mowers do not precisely recognize the length of lawn grasses or such ground conditions as dirt, gravel, or concrete, although this recognition is one of their important functions. As a result, the motor for cutting lawn grasses runs at a constant rotation speed from the beginning to the end of operation. This wastes battery power and is a large drawback for the control of the robotic lawn mower. In order to precisely control the rotation speed of the motor and save battery power, the lawn grass lengths and ground conditions are estimated by using the effective sensor data. The application of the random forest algorithm to the fusion of sensors on a commercial robotic lawn mower attained a correct estimation ratio of more than 90% in several experiments on actual lawn grass areas. The suggested algorithm and the fusion of sensors are now being evaluated against a wide range of lawns and grounds.


Introduction
Recently, robotic lawn mowers (robo-mowers) have become popular with the advent of efficient sensors and embedded systems. However, they do not recognize the length of lawn grasses or such ground conditions as dirt, gravel, or concrete. As a result, the motor for cutting lawn grasses runs at a constant rotation speed from the beginning to the end of operation of the robo-mower. In order to precisely control the rotation speed of the motor, the lawn grass lengths and ground conditions are estimated by using the effective sensor data. The authors are now promoting research and development on autonomous driving and group control for work vehicles [1][2][3].
The Hybrid Twin™ approach [4] is efficient for controlling an object in real time. The suggested estimation method is useful for controlling the operation of the motor of the lawn grass cutter. When the robot is running on an area with long lawn grasses, the motor should run at the specified maximum rotation speed. On the other hand, the rotation speed should be reduced when the robo-mower is running on an area with short lawn grasses, and the motor should be stopped on an area without any lawn grasses. Then, the battery consumption will be reduced. Moreover, if the travelling speed is also reduced or increased according to the length of lawn grasses, the working time will be greatly reduced. Furthermore, a wire must currently be used for defining the working area because a robo-mower cannot recognize the ground conditions. Therefore, if the ground without lawn grasses is properly recognized, this wire installation is not necessary and, as a result, the needed maintenance will be reduced.
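The control rule described above can be sketched as a small lookup from the estimated ground condition to a cutter speed. This is only an illustration: the names Ground and cutter_speed_rpm and the parameter reduced_ratio are hypothetical, and the paper does not state how far the speed is reduced on short lawn grasses.

```python
from enum import Enum


class Ground(Enum):
    """Three ground conditions distinguished in this paper."""
    LONG_GRASS = "Long Lawn Grasses"
    SHORT_GRASS = "Short Lawn Grasses"
    NO_GRASS = "Not Lawn Grasses"


def cutter_speed_rpm(ground: Ground, max_rpm: float, reduced_ratio: float = 0.5) -> float:
    """Map an estimated ground condition to a cutter motor speed.

    reduced_ratio is an assumed tuning parameter, not a value from the paper.
    """
    if ground is Ground.LONG_GRASS:
        return max_rpm                  # long grasses: specified maximum speed
    if ground is Ground.SHORT_GRASS:
        return max_rpm * reduced_ratio  # short grasses: reduced speed
    return 0.0                          # no lawn grasses: motor stopped
```

The same pattern could be applied to the travelling speed, with its own assumed ratios.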
These requirements are efficiently implemented with the Hybrid Twin approach. The Hybrid Twin is an extension of the Digital Twin [5], and it consists of a Digital Twin and a Virtual Twin as shown in Figure 1. The Virtual Twin has a fast 1-D simulator that models the real system physically, and the Digital Twin is a numerical model of the real system. The Hybrid Twin is a kind of feedback control system between the real and virtual spaces.
In the application of the Hybrid Twin, firstly, the set of numerical data s_t obtained at time t by the sensors that model the real space is input to the Digital Twin. Secondly, the set of parameters obtained by the Digital Twin is given to the Virtual Twin, and the set of parameters determining the behaviors at the next time t + ∆t is then given to the real system. By repeating this loop, the dynamic behaviors of the target real system can be precisely controlled. The Hybrid Twin approach is used in such mission-critical control fields as the aerospace and nuclear power industries.
In the control of a robo-mower, several sensors attached to its body obtain a set of data, and the suggested algorithm, acting as a 1-D simulator in the Virtual Twin, estimates the lengths of lawn grasses and the ground conditions. Then, the parameter controlling the rotation speed of the motor cutting lawn grasses is given to the motor of the robo-mower. This paper describes the sensor data which should be used as a numerical model of the robo-mower and the algorithm estimating the lengths of lawn grasses and ground conditions. The novelty of this paper is therefore that the random forest algorithm is applied to the estimation system mentioned above and that a sensor fusion, that is, a combination of necessary sensor data, is suggested. Moreover, the research is based on the implementation of the Hybrid Twin for the autonomous driving of robo-mowers as well as the same type of work vehicles.
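The loop described in this paragraph can be sketched as follows. The function name run_hybrid_twin and its four callables are hypothetical placeholders for the sensor interface, the Digital Twin, the Virtual Twin, and the actuator interface; the sketch only shows the order in which data flows, not an actual implementation of either twin.

```python
from typing import Any, Callable, List


def run_hybrid_twin(
    read_sensors: Callable[[], Any],
    digital_twin: Callable[[Any], Any],
    virtual_twin: Callable[[Any], Any],
    actuate: Callable[[Any], None],
    n_steps: int,
) -> List[Any]:
    """Repeat the sensor -> Digital Twin -> Virtual Twin -> real system loop."""
    history = []
    for _ in range(n_steps):
        s_t = read_sensors()            # numerical data observed at time t
        params = digital_twin(s_t)      # parameters of the numerical model
        control = virtual_twin(params)  # parameters for the next time step
        actuate(control)                # feed the result back to the real system
        history.append(control)
    return history
```

In the robo-mower case, virtual_twin would wrap the trained classifier and actuate would set the cutter motor speed.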
Sensor fusion and the utilization of the obtained big data have attracted the attention of many researchers. Recently, detailed surveys on the combination of sensor fusion and big data analysis have been published [6,7]. Some applications to actual problems have also been reported [8][9][10]. The popular approach to big data analysis is the utilization of machine learning. G. Takami et al. [8] address the problem of observing plant status. They used three kinds of sensors and a deep learning algorithm for the big data analysis. The details of the deep learning algorithm are not described, and the processing time of the observation system is not known. However, their suggestion that the deterioration of sensors could be analyzed by combining them may be useful. S. Alonso et al. [9] adopted the same approach for observing a screw compressor in a chiller. They used five kinds of sensor data and a 1D convolutional neural network (CNN) for their analysis. The adoption of the 1D CNN makes the monitoring fast, and real-time processing is realized. Their approach is likely to be suitable for data without any known features; in our case, however, it is known in advance that some features may be efficient for the estimation. C. Li et al. [10] deal with the diagnosis of rotating machinery. They used vibration sensor signals and a Gaussian-Bernoulli deep Boltzmann machine for their analysis. The accuracy of fault estimation was evaluated; however, the real-time processing requirement was not mentioned. Therefore, this approach cannot be directly applied to the problem dealt with in this paper. In the following, the robo-mower used in this paper is described in chapter 2, although the discussions are not limited to this robo-mower. Chapter 3 describes the suggested algorithm based on random forest, which is used as a 1-D simulator in the Virtual Twin.
In chapter 4, the experimental results based on the big data obtained by the fusion of sensors and a set of features for classifying the obtained sensor data are presented. Moreover, the set of necessary sensors and the performance evaluations of the suggested algorithm are stated. Finally, in chapter 5, the obtained results are summarized.

Robotic Lawn Mower
An example of a commercial robo-mower [11] is shown in Figure 2. This robot is used for the experiments by attaching some sensors, a single-board computer, a personal computer and some peripheral devices. All these devices are under the management of ROS (Robot Operating System) running on the personal computer. The robo-mower can be driven autonomously; however, in the following experiments it is controlled by a Bluetooth controlling device in order to increase the accuracy of the experiments. A camera can be a candidate sensor, but its cost, including image processing software and hardware, is too high for a commercial product. The experiments show that no camera is needed for the required estimation.

Machine Learning Algorithm as 1-D Simulator
A machine learning algorithm is adopted as the 1-D simulator included in the Virtual Twin to estimate lawn grass lengths or ground conditions. The random forest algorithm is used because of its high performance and short processing time.

Random Forest Algorithm
The random forest algorithm, which is one of the machine learning algorithms, originates from Breiman [12], and recently its deep version has also been suggested [13]. This algorithm is used for classification, regression, clustering, etc., and is a kind of ensemble algorithm using a set of decision trees as weak learners in order to avoid over-fitting and to keep generalization. It is fast and attains comparatively high performance. According to the paper [13], the deep random forest algorithm attains better results in some applications; however, the performances are almost the same in other applications. In the following, the original random forest algorithm is adopted.
The random forest algorithm consists of a given number of binary decision trees. The training and the inference phases are shown in Figures 3 (a) and (b), respectively. In the configuration of the binary decision trees, a set of training data sampled from the input data is given to each of the binary decision trees. Then, each binary decision tree is constructed in the way shown in Figure 4. The data consist of the following: the input data, indexed 1, 2, ⋯, which are used for classification, regression, clustering, etc., and the features, indexed 1, 2, ⋯, which are used for classifying the input data.
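The sampling step of the training phase can be sketched as below. The text only says that a set of training data sampled from the input data is given to each tree; drawing as many samples as there are input data, with replacement, is the usual bagging convention and is an assumption here. The function name bootstrap_indices is hypothetical.

```python
import random
from typing import List


def bootstrap_indices(n_samples: int, n_trees: int, seed: int = 0) -> List[List[int]]:
    """Draw one bootstrap sample of row indices per tree.

    Sampling with replacement and a sample size equal to n_samples are
    assumptions (the common bagging setup), not details from the paper.
    """
    rng = random.Random(seed)
    return [
        [rng.randrange(n_samples) for _ in range(n_samples)]
        for _ in range(n_trees)
    ]
```

Each inner list selects the rows used to configure one binary decision tree.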

Configuration of Binary Decision Tree
An example of a binary decision tree is shown in Figure 4. In the root node, the input data are divided into two subsets by using the two conditions given in the figure. If the data satisfy the condition attached to a branch, they are classified into the corresponding class as shown in the figure. When all data are classified into the corresponding class (that is, a leaf), the binary decision tree is completed. Here, for example, the CART (Classification And Regression Tree) algorithm is used, and the objective function may be Gini's diversity index [13]. All parameters in the binary decision trees are utilized in the subsequent classification phase.
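As an illustration of how one node could choose its condition, the sketch below computes Gini's diversity index and searches one feature for the threshold that minimizes the weighted index of the two subsets. It is a simplified CART-style split search; the function names gini and best_split are hypothetical, and candidate thresholds at midpoints between sorted values are an assumption.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini's diversity index: 1 minus the sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values: Sequence[float], labels: Sequence[str]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted Gini index) of the best split on one feature.

    The threshold is None when no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best: Tuple[Optional[float], float] = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a threshold
        left: List[str] = [lab for _, lab in pairs[:i]]
        right: List[str] = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2, score)
    return best
```

A full tree would repeat this search over all features at every node until each leaf holds a single class.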

Classification of Data
In the classification of data, for example, the Bagging method of the ensemble algorithm is applied. Here, the data to be classified are given to all binary decision trees, and the decision of each binary decision tree is obtained. The final decision, that is, the class the data belong to, is determined by majority rule. This process is fast if the binary decision trees are executed in parallel, and the quality of the decision would be better than that obtained by using just one binary decision tree. The processing time for categorizing the field is very important for a 1-D simulator in the Virtual Twin.
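The majority rule can be sketched in a few lines. Here each tree is represented as a plain callable returning a class label, which is a simplification for illustration; the names majority_vote and forest_predict are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(tree_predictions: Sequence[str]) -> str:
    """Return the most frequent class among the per-tree decisions.

    Ties are broken in favour of the class seen first, which is an
    arbitrary choice made for this sketch.
    """
    return Counter(tree_predictions).most_common(1)[0][0]


def forest_predict(trees: Sequence[Callable[[Sequence[float]], str]], x: Sequence[float]) -> str:
    """Give the same input to every tree and combine the decisions."""
    return majority_vote([tree(x) for tree in trees])
```

Because the trees are independent, the list of per-tree decisions could be computed in parallel before voting.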

Experimental Results
This section describes the experimental results when the random forest algorithm is applied to the actual lawn grass length estimation.

Sensors
The sensors attached to the robo-mower shown in Figure 2 are listed in Table 1. The robo-mower is equipped with built-in sensors. In addition, 9-axis Inertial Measurement Units (IMU), MPU-9250 [14], are attached to the inside and to the surface of the body for measuring the acceleration and angular acceleration. Six built-in sensors are available for measuring the corresponding parameters as shown in Table 1. Here, the sensor noise is negligibly small and outliers are excluded.
The objective of experiments is to determine what combination of sensors would be effective for obtaining the relationship between sensors attached to the robo-mower and the estimation of the lawn grass lengths or ground conditions.

Measurement Data
The data measured by the sensors are obtained by actually driving the robo-mower on fields with long lawn grasses, with short lawn grasses and without lawn grasses. The actual remote-controlled driving of the robo-mower is shown in Figure 5. The remote-control function via Bluetooth communication is incorporated into the robo-mower by mounting a mini-PC and executing ROS on it. The mini-PC can also handle the collected sensor data. The collected data are manually categorized into the data set for long or for short lawn grasses according to the specified height of the grass cutter. If the lawn grasses are higher than the grass cutter, they are labeled as long; otherwise, they are labeled as short. When the heights of the lawn grasses and the grass cutter are equal, a human operator determines whether the lawn grasses are long or short according to the operation sound of the grass cutter. The measurement data are collected so that the total time is 2.3 hours. All data are collected on flat land on sunny days.

Features for Classifying Data
Such statistical features of input data,  Table 2. Actually, the total length of data is less than 2.3 hours because of some issues with the measurement devices, but the obtained data are used as they are.
In the experiments, a subset of the time frames obtained from each field's data is used for configuring the binary decision trees, and the completed forest is applied to the remaining test data. Then, the classification performance is evaluated. In each group, a given number of time frames is randomly selected for configuring the binary decision trees (training), and the remaining time frames are used as test data for evaluating the performance of the random forest (testing). These numbers are shown in Table 3.
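As a rough sketch of how statistical features could be computed for one time frame of one sensor signal, the code below evaluates four features that are named elsewhere in this paper (mean, median, minimum and kurtosis). The remaining features of the seven are not reproduced here. The kurtosis is taken as the fourth standardized moment without bias correction, which is an assumption, and the function names are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def kurtosis(xs: Sequence[float]) -> float:
    """Fourth central moment divided by the squared variance (no bias correction)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2) if m2 > 0 else float("nan")


def frame_features(frame: Sequence[float]) -> Dict[str, float]:
    """Compute a few statistical features for one time frame of one signal."""
    return {
        "mean": statistics.fmean(frame),
        "median": statistics.median(frame),
        "min": min(frame),
        "kurtosis": kurtosis(frame),
    }
```

Applying such a function to every selected sensor signal and concatenating the results would give one feature vector per time frame.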
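The random partition into training and test time frames can be sketched as follows. The function name split_frames and the fixed seed are hypothetical; the actual numbers of frames per group are those of Table 3 and are not reproduced here.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_frames(frames: Sequence[T], n_train: int, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly pick n_train frames for training; the rest become test data."""
    rng = random.Random(seed)
    order = list(range(len(frames)))
    rng.shuffle(order)  # random selection without replacement
    train = [frames[i] for i in order[:n_train]]
    test = [frames[i] for i in order[n_train:]]
    return train, test
```

The split would be applied separately to each ground condition so that every class is represented in both parts.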

Evaluation Criteria
Each time frame has its own label, that is, long lawn grasses, short lawn grasses or not lawn grasses, so the prediction can be verified. The prediction process consists of two stages. The first stage estimates whether the area has lawn grasses or not. When the area is estimated to have lawn grasses, the second stage further estimates whether the lawn grasses are long or short. In the testing, four kinds of evaluation criteria are used. These are defined below [15]. (1) Accuracy
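The two-stage prediction and two of the criteria used in this paper, accuracy and recall, can be sketched as below. The definitions are the standard ones and are assumed to agree with [15]. The two stage classifiers are passed in as hypothetical callables is_lawn and is_long.

```python
from typing import Callable, Sequence


def two_stage_predict(
    is_lawn: Callable[[Sequence[float]], bool],
    is_long: Callable[[Sequence[float]], bool],
    x: Sequence[float],
) -> str:
    """Stage 1: lawn grasses or not. Stage 2 (only if lawn): long or short."""
    if not is_lawn(x):
        return "Not Lawn Grasses"
    return "Long Lawn Grasses" if is_long(x) else "Short Lawn Grasses"


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of time frames whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: Sequence[str], y_pred: Sequence[str], cls: str) -> float:
    """Fraction of the frames truly in class cls that are predicted as cls."""
    preds_for_cls = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in preds_for_cls) / len(preds_for_cls)
```

A low recall for one class means that frames of that class are often assigned to the other classes.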

Evaluation Results
Seven combinations of selected sensor data are shown in Table 5. These combinations cover all actual cases. By using the data from Cases 1 to 7, the best combination of sensor data is determined based on the above-mentioned evaluation criteria.
The procedure of the experiment is as follows.
(1) Select the sensor data corresponding to the cases shown in Table 5, collected in three ground conditions, that is, "Long Lawn Grasses", "Short Lawn Grasses", and "Not Lawn Grasses".
(2) Determine the subset of the sensor data selected in (1) and partition it for configuring the binary decision trees and for testing the random forest according to the number of time frames shown in Table 3.
(3) Configure the binary decision trees.
(4) Evaluate the performances of the random forest based on the evaluation criteria.
The number of binary decision trees, that is, the size of the forest, is set to 1,000. Each binary decision tree is configured by using the seven features mentioned in section 4.3 until each leaf corresponds to one of the three ground conditions. An example of a binary decision tree is shown in Figure 6. Here, the median value obtained from the built-in vertical angle sensor data, with a threshold of 370.75, is used as the feature for classifying the input data at the root node. The class "TreeBagger" in the Statistics and Machine Learning Toolbox of MATLAB [16] is used as the implementation of the random forest algorithm. The processing time for configuring 1,000 binary decision trees is less than ten minutes on a PC with standard performance. The completed forest is applied to the testing data, whose size is around 700 in each of the three ground conditions shown in Table 3. The processing time needed for prediction is negligibly small and is not an issue in an actual Hybrid Twin approach. The performances are shown in Table 6. Seven cases are evaluated with respect to the evaluation criteria in each ground condition. The most important performance measure is the accuracy, and it becomes high when the built-in sensor data are used. In particular, Cases 6 and 7, which do not use the built-in horizontal or vertical angle sensors, have higher accuracy. It seems reasonable that the battery status and motor rotation conditions contribute to higher performance because the rotation of the motor would be high when it encounters long lawn grasses. On the other hand, the load on both the grass cutting motor and the travelling motor would be reduced when the robo-mower is travelling on ground without lawn grasses. From the evaluation results, Case 6 would be the most desirable among the seven cases. The reasons are as follows: (1) the accuracy is very high, differing by only 0.1 points from the maximum of 92.28%, and (2) the recall ratio of "Short Lawn Grasses", 87.08%, is the highest.
In particular, a low recall ratio of "Short Lawn Grasses" means that the probability of incorrectly recognizing short lawn grasses as long lawn grasses or as ground without lawn grasses becomes high. When short lawn grasses are recognized as long lawn grasses, the travelling speed of the robo-mower is reduced and the rotation of the grass cutting motor is increased. This would increase the working time and waste electric power. Much more seriously, when a short lawn grass area is recognized as ground without lawn grasses, the robo-mower will not travel over the area and will not cut the lawn grasses. Therefore, it is concluded that Case 6 is the best in these evaluation results. In Case 6, the following sensor data are utilized: the acceleration and angular acceleration values obtained by the 9-axis IMU attached inside the body, the voltage, current and power of the battery, and the rotations of the grass cutting motor and of the travelling motor obtained by the built-in sensors.

Figure 6. Example of binary decision tree in random forest algorithm.

Feature Importance
In section 4.5, the efficient combination of sensor data is discussed and determined. As another discussion, it should be evaluated how much each feature contributes to classifying the sensor data. This analysis is called the "feature importance" in the decision tree. When there are many features, only the features with large importance, rather than all features, may be used to classify the dataset. As a result, the processing time and the memory usage can be reduced, and a random forest algorithm can be implemented on, for example, a single-board microcomputer. The importance of features in the first estimation stage in Case 6 is obtained and shown in Table 7. This stage estimates whether the area has lawn grasses or not. In Case 6, seven kinds of sensor data are used as shown in Table 5. Also, as shown in section 4.3, there are seven features, so currently 49 features obtained from the seven data sets are used in the classification. The result of applying the standard algorithm [16] for calculating the feature importance is shown in Table 7. Here, such features as "Kurtosis of y-Directional Angular Acceleration of 9-Axis IMU attached Inside of Body" and "Minimum of y-Directional Angular Acceleration of 9-Axis IMU attached on Surface of Body" have importance values of more than 1. Figure 7 shows the relationship between "Kurtosis of y-Directional Angular Acceleration of 9-Axis IMU attached Inside of Body" and the area with or without lawn grasses. The values of the kurtosis differ to some extent between the area with lawn grasses and the area without lawn grasses. However, the difference is not large enough to separate the areas with and without lawn grasses, and it is necessary to also use the features with the second or third importance in Table 7. The second feature, "Minimum of y-Directional Angular Acceleration of 9-Axis IMU attached on Surface of Body," gives the same kind of relationship, as shown in Figure 8.
These figures show that a high accuracy of area estimation is not attained by using only these two features. Moreover, the importance of features in the second estimation stage is shown in Table 8. This stage estimates whether the area has long lawn grasses or short lawn grasses. Here, the feature "Mean of Currents of Built-in Battery" has the largest importance. It is therefore likely that a larger amount of current is needed when long lawn grasses are cut. The corresponding relationship is shown in Figure 9. The values of the feature are largely different between the areas with long and short lawn grasses. In this paper, high classification accuracy is necessary and 49 features are used; however, by calculating the importance of each feature, a necessary and sufficient subset of fewer than 49 features may be obtained according to the required accuracy.
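One common way to estimate how much each feature contributes is permutation importance: the values of one feature are shuffled across the data and the resulting drop in accuracy is recorded. The sketch below illustrates this idea only. It is not claimed to be the exact algorithm of [16], the importance values it produces are on a different scale from those in Tables 7 and 8, and the name permutation_importance and its parameters are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def permutation_importance(
    predict: Callable[[Sequence[float]], str],
    X: Sequence[Sequence[float]],
    y: Sequence[str],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean accuracy drop when one feature column is randomly permuted."""
    rng = random.Random(seed)

    def acc(rows: Sequence[Sequence[float]]) -> float:
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    base = acc(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            shuffled = [
                list(row[:j]) + [v] + list(row[j + 1:])
                for row, v in zip(X, column)
            ]
            drops.append(base - acc(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances
```

Features whose permutation barely changes the accuracy are candidates for removal when a smaller feature set is needed, for example on a single-board microcomputer.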

Conclusions
This paper deals with the problem of recognizing the ground conditions, that is, long lawn grasses, short lawn grasses, or no lawn grasses, by analyzing the data obtained from sensors attached to the robo-mower. Ten kinds of sensor data are obtained, and they are analyzed by applying a large-scale random forest algorithm with 1,000 binary decision trees and seven kinds of features. In the experiments with an actually driven robo-mower, the best performance from the practical point of view is attained by the combination of the following sensor data: the acceleration and angular acceleration data from the 9-axis inertial measurement unit attached inside the body, the voltage, current and power of the battery, and the rotation of the grass cutting motor and of the travelling motor obtained from the built-in sensors. In testing, the accuracy on 2,010 sensor data is 92.18%, and the processing time is five to ten minutes in training (configuration of 1,000 binary decision trees) and negligibly small in testing. This shows the feasibility of the suggested approach. Finally, the feature importance in the decision tree is discussed for evaluating the contribution of each feature. This analysis is useful for obtaining the necessary and sufficient features. As future work, the variation of ground conditions in the experiments, that is, the kinds of lawn grasses and of grounds without lawn grasses, should be increased. Moreover, the implementation of the suggested system should be discussed by using single-board microprocessors and efficient communication methods between the robo-mower and a central controller for realizing the Hybrid Twin system.