Autonomous Systems and Reliability Assessment: A Systematic Review

Advances in technology have produced novel computing devices such as self-driving cars, IoT devices, and autonomous systems. These technologies place high computational demands on the underlying hardware. Machine learning, parallelism, multicore processing and scaling are some of the approaches and techniques used to meet these demands. However, there is pressure on the architectural development of recent computing devices, as traditional transistors appear to be fast outgrown. This article examines the reliability of autonomous systems using the PRISMA approach. Autonomous systems are systems that can operate and perform operations (computational or otherwise) with minimal human intervention. They are also capable of evaluating their own performance. Thus, they require a high degree of reliability. Several existing autonomous systems were reviewed and their reliability issues were discussed. It was found that the reliability of a complex system depends on the reliability of its underlying individual components, and that the compromise of any of these components can affect the overall reliability of the entire system. Efforts to enhance the reliability of these components will, in turn, improve the reliability of the entire system.


Introduction
Advances in technological innovation and solutions have seen the birth of many devices and technologies such as the Internet of Things, cloud computing, intelligent devices and autonomous systems, to mention a few. These systems were built on different architectural designs. Computer architecture, as a field of computer science, plays a vital role in the development of these technologies. Several decades ago, technological advances were driven mainly by the available architecture; hence, the solutions developed depended solely on the architectures provided. However, the rapid and relentless advancement in the technological field is now pulling the architecture of recent solutions [1]. This calls for an architecture that will support the massive and fast computations required by these recent technologies. Frequency scaling, delay scaling, multicore processing and parallelism are some of the measures put in place to meet the computational demands of recent technologies. Scaling will slow down as technology advances, and device variability will increase significantly as a result of temperature and random dopant variation [2]. These variations will become extreme and devices will behave unpredictably, which will cause them to degrade over time. As more components are loaded onto computing devices in a bid to meet computational demands, and as technology scales further, hardware will face numerous sources of errors, including process variations, high-energy particle strikes, ageing, insufficient burn-in, thermally induced timing errors, and design errors, which will make the design of reliable computing systems extremely difficult [2].
The advent of intelligent systems and artificial intelligence has given rise to the development of autonomous systems. They can operate and perform operations (computational or otherwise) with minimal human intervention. However, a system that requires minimal human intervention and can evaluate its own performance to determine its failure or success needs a high degree of reliability. The actions of such systems should be predictable and their goals achievable. Autonomous systems need an architecture that cuts across their software and hardware components and supports their independent actions, so that the desired results are obtained within a specified period. This calls for an architectural reliability assessment of such systems. This paper focuses on the reliability assessment of autonomous systems. Section 2 discusses autonomous systems and why their reliability is important, Section 3 discusses system reliability, Section 4 describes the methodology used for the systematic review, and Section 5 discusses the reliability assessment of some existing autonomous systems.

Autonomous Systems
In providing user satisfaction and ensuring seamless and convenient computation, technologies such as machine learning, data analytics and artificial intelligence have played a major role. They have brought about computational devices that aid the human way of life. They are used in virtually every field and sphere of human endeavour, be it education, health, finance, agriculture or communication. These technologies are growing fast, and devices that take actions with limited or no human intervention are coming to fruition. It is common to see terms like robots, artificial intelligence, Internet of Things and self-driving cars. These technologies can accept input from their environment, plan specific actions, execute the actions and measure the level of satisfaction derived from the results of their actions. A system that can do this is known as an autonomous system [3]. For a system to be considered truly autonomous, it must be able to gather information, find a solution based on this information, and execute an action to achieve a goal. In an autonomous system, hardware and software work together to solve a problem by acting. For a system that requires little human intervention, reliability is paramount to maximizing its output [4]. Intelligence, robustness and reliability are attributes of a truly autonomous system. Autonomous systems interact with their environment and thus require a comprehensive overview of it. In this regard, prior knowledge (i.e. knowledge programmed into the knowledge base of the system), the environment module (i.e. information the system has about the current environment and its entities) and the real world (the environment in which the system is deployed) are important in ensuring the efficiency and reliability of the system [5]. These modules interact as shown in Figure 1.
This allows autonomous systems to learn and acquire new information about their environment and thus make better decisions and actions.

System Reliability
Regardless of the computational power of a device or system, end users are mainly concerned with its performance and reliability. Higher performance and low energy consumption play a vital role in driving product roadmaps. The ability of the device to perform computations correctly, and to continue doing so over its expected life span, is also a key factor in user satisfaction. The probability that a system, including all hardware, firmware, and software, will satisfactorily perform the task for which it was designed or intended within a specified time and in a specified environment is regarded as system reliability [6,7]. In measuring or determining reliability, different metrics such as time to failure, failure function, reliability function, failure rate and mean time to failure are used [8]. A system with a high failure-in-time (FIT) rate is said to be less reliable than a system with a lower one. The more often a system fails to achieve its specified goal within the specified time, the less reliable it is. Everyday systems are now pervaded by reliability concerns as well as other constraints such as cost, performance, and power consumption. Traditional solutions tend to focus on power, cost, performance, and area concerns, which has made hardware reliability solutions expensive due to extreme conservatism [9]. Reliability solutions provided by software developers are also eschewed due to high performance overheads. As a result, providing reliability is at odds with meeting performance and energy targets, even though reliability is a key consideration in the development of computational devices. Emerging technologies, such as non-volatile memory (NVM) and die-stacked memory, which are being used to meet the reliability demands of computing devices, have significant reliability challenges of their own.
For example, in most NVM technologies, repeated writes to the memory cells can cause permanent faults in those cells, thereby affecting their reliability. Other industry-standard techniques are also used to ensure the reliability of systems. These techniques are implemented at various stages of the design and may involve additional steps by a third party (possibly even the end users) [10]. Process nodes can be used to optimize the resilience of transistors against faults, while circuit guard-banding can be used to set the operating margin to reduce certain types of failures. Radiation hardening is another technique, used to make storage elements less vulnerable to soft errors. Parity checking and Error Correcting Codes (ECC) are used to detect and correct errors at the microarchitectural level [11]. These approaches are just some of the ways of ensuring system reliability.
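As an illustration of how the metrics mentioned above relate to one another, the following minimal sketch assumes a constant failure rate (the exponential model), which is a simplifying assumption and not a model taken from the reviewed literature. Under this assumption the reliability function is R(t) = exp(-λt) and the mean time to failure is 1/λ. The function names and the FIT values used are hypothetical and chosen only for illustration.

```python
import math

# FIT is conventionally expressed as failures per 10^9 device-hours.
HOURS_PER_FIT_UNIT = 1e9


def failure_rate_from_fit(fit: float) -> float:
    """Convert a FIT value into a failure rate per hour."""
    return fit / HOURS_PER_FIT_UNIT


def reliability(failure_rate: float, hours: float) -> float:
    """Reliability function R(t) = exp(-lambda * t), assuming a constant failure rate."""
    return math.exp(-failure_rate * hours)


def mttf(failure_rate: float) -> float:
    """Mean time to failure (in hours) under the same constant-rate assumption."""
    return 1.0 / failure_rate


if __name__ == "__main__":
    ten_years = 10 * 8760  # hours, ignoring leap days
    for fit in (100.0, 1000.0):  # hypothetical FIT values
        lam = failure_rate_from_fit(fit)
        print(f"FIT={fit:.0f}: R(10 years)={reliability(lam, ten_years):.4f}, "
              f"MTTF={mttf(lam):.3e} hours")
```

Consistent with the discussion above, the system with the higher FIT value has the lower reliability over the same mission time.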

Methodology
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) approach was used. Using this approach, the systematic review involved four phases. The first phase was the identification phase. At this stage, online databases were consulted for relevant materials. Google Scholar was the primary database consulted, and the search strings used included 'System reliability', 'reliability in autonomous systems', 'autonomous systems and reliability', 'concepts of reliability' and 'reliability perspective of autonomous systems'. Other synonyms were used to gather more data. The second phase, the elimination/screening phase, was then carried out on the gathered data. In this phase, duplicate materials were identified and excluded, and materials that would not have a meaningful impact on the research were also eliminated. Titles and abstracts were used to carry out this exercise. In the third phase, the eligibility phase, materials that had not been excluded were thoroughly and critically analyzed for information extraction. The materials that contained a reliability assessment method were used in the review and assessment of reliability in autonomous systems. This final exercise (the inclusion phase) concludes the PRISMA approach.

Reliability Assessment in Some Existing Autonomous Systems
A reliability evaluation of an autonomous smart grid system was carried out by [12]. It was noted that system reliability has always been a major focus area in the design and operation of modern autonomous grids. The system was designed to have intelligent functionalities, which include better situational awareness and operator assistance, actions to increase resilience against component failures and natural disasters, minimizing power outages, maximizing asset utilization, and integrating renewable sources (solar and wind) to ensure a consistent and uninterrupted power supply. Despite the efforts put in place to make the system autonomous and ensure higher reliability, the mix of these resources consumes more computational power, and net demand will thus increase drastically, which will eventually accentuate the reliability challenge even further [12]. Larger transfers of energy over long distances, ageing infrastructure, grid congestion, environmental sustainability concerns, and more complex problems with shorter decision times and smaller error margins are some of the factors that will make the development of a reliable autonomous system challenging. It was proposed that intelligent multiagent frameworks should be developed for autonomous systems, and that adaptive and proactive adjustment, protection and control settings should be utilized.
How wind energy can be used to improve the reliability of autonomous power systems was discussed in [13]. Since wind is a resource that is readily available regardless of the time of day, the restriction of wind penetration in an autonomous power system should be avoided. This will limit power outages in autonomous power systems [13].
A probabilistic model for battery storage to facilitate the reliability of renewable energy-based autonomous power systems was proposed by [14]. It was noted that energy storage has emerged as the most valuable option for ensuring and maintaining the reliability of this type of system. The proposed model consists of multiple states of battery State of Charge (SOC), with a probability associated with each state. The dependence of an autonomous power system on environmental conditions poses a threat to its reliability, because of environmental variation and climate change. The reliability of this system can be enhanced by integrating two resources in a proper combination (e.g. solar and wind) in such a way that the weakness of one is compensated by the strength of the other [14]. In doing so, there is a need to determine the battery level of these systems to know which system to utilize at a particular time. The probabilistic model uses SOC to determine which system to utilize. Although this approach is intended to improve the reliability of the system, it could result in redundancy, as one component tends to be idle while the other functions.
A statistical approach was used to evaluate the reliability of an autonomous vehicle [15]. It was found that, even though the number of crashes was significantly reduced compared to that of human-controlled vehicles, injuries and fatalities are metrics that need to be used when evaluating the reliability of the autonomous vehicle. The statistical result shows that autonomous vehicles will have to be driven thousands of millions of miles, and even thousands of billions of miles, to assess their reliability in terms of fatalities and injuries [15]. This report shows that, statistically, it may be impossible to evaluate the reliability of autonomous vehicles.
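To give a sense of why such large distances arise, the following minimal sketch uses a standard zero-failure demonstration argument, which is an assumption made here for illustration and is not claimed to be the exact method of [15]. If failures occur independently with probability F per mile, the probability of observing no failure in n miles is (1 - F)^n, so demonstrating a rate no worse than F with confidence C requires n ≥ ln(1 - C) / ln(1 - F) failure-free miles. The function name and the example rate are hypothetical.

```python
import math


def failure_free_miles(failure_rate_per_mile: float, confidence: float) -> float:
    """Failure-free miles needed to show, with the given confidence, that the
    true failure rate is no worse than failure_rate_per_mile.

    Assumes independent failures with a constant per-mile probability.
    """
    if not 0.0 < failure_rate_per_mile < 1.0:
        raise ValueError("failure_rate_per_mile must be in (0, 1)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    # log1p(-F) computes ln(1 - F) accurately for very small F.
    return math.log(1.0 - confidence) / math.log1p(-failure_rate_per_mile)


if __name__ == "__main__":
    # Hypothetical rate: one fatality per 100 million miles (illustrative only).
    rate = 1e-8
    miles = failure_free_miles(rate, confidence=0.95)
    print(f"Failure-free miles required: {miles:.3e}")
```

Because fatalities are rare events, the per-mile rate F is very small, and the required distance grows roughly in proportion to 1/F.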
The reliability of autonomous self-driving cars was also reviewed by [16]. Cutting-edge navigation, localization, object recognition, state estimation and control technologies have propelled the reliability of autonomous driving cars. It was recorded that existing self-driving cars can achieve between 95% and 100% accuracy in detecting road lanes, stop signs, pedestrians, and other obstacles. The reliability of the autonomous vehicle is affected by different interacting factors such as hardware fault tolerance, software architecture, machine learning, perception in a dynamic environment, and the interaction between humans and vehicles [16]. Despite the measures put in place, several factors that can affect the reliability of the system are not catered for: the uncertainty of manufacturing tolerances of parts, inaccurate measurement or imprecise perception of the environment, reliability issues associated with machine learning (especially when the input object is outside the data range), and limited test scenarios and procedures due to diverse road conditions.
Reliable automated systems (robots) were designed to lessen and prevent the results of hazardous events. In an autonomous system that has multiple components, the failure of one component can lead to the breakdown of the entire system. It is assumed that these different components have different failure rates. The reliability analysis of the individual components of a system with a large number of components will prove challenging. To this end, [17] developed a modified branching process for the reliability analysis of complex multiple-robot systems. In their work, the probability of system failure was calculated based on the breakdown of significant parts of the robots [17].
Monte Carlo simulation was used to perform a reliability assessment of a remote hybrid renewable energy system in [18]. It was noted that the reliability of a hybrid energy system depends on various factors such as wind speed, solar radiation, load demand, the wind turbine generator Forced Outage Rate (FOR) and the hardware status of the photovoltaic panels. Regardless of the acceptance of the hybrid energy system, an outage of any of its structural components will result in a loss of load in the system; thus, the reliability of each component has to be carefully considered when evaluating the performance of the hybrid system [18].
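The general idea of a Monte Carlo reliability assessment can be sketched as follows. This is a simplified illustration and not the model used in [18]: it assumes that each component is independently unavailable with a fixed forced outage rate, and that an outage of any component causes a loss of load. Wind speed, solar radiation and load demand are deliberately left out, and the function names and outage rates are hypothetical.

```python
import random


def estimate_loss_of_load_probability(outage_rates, trials=100_000, seed=0):
    """Estimate the probability that at least one component is out of service.

    outage_rates: forced outage rate of each component, each in [0, 1].
    Components are assumed to fail independently of one another.
    """
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    losses = 0
    for _ in range(trials):
        # A trial counts as a loss of load if any component is on outage.
        if any(rng.random() < rate for rate in outage_rates):
            losses += 1
    return losses / trials


def analytic_loss_of_load_probability(outage_rates):
    """Closed-form value for the same simplified model, used as a cross-check."""
    all_available = 1.0
    for rate in outage_rates:
        all_available *= 1.0 - rate
    return 1.0 - all_available


if __name__ == "__main__":
    # Hypothetical outage rates for, say, a wind turbine, a PV array and an inverter.
    rates = [0.05, 0.02, 0.01]
    print("simulated:", estimate_loss_of_load_probability(rates))
    print("analytic: ", analytic_loss_of_load_probability(rates))
```

For this simplified model the closed-form value is available, so simulation is not strictly necessary; it becomes useful once time-varying resources and load demand are added and a closed form is no longer tractable.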
The reliability of a semi-autonomous vehicle was assessed and enhanced by [19]. They used risk detection to evaluate the reliability of semi-autonomous vehicles. A flaw in the existing system was observed, and a solution using a facial monitor and biometric features was proposed. Semi-autonomous vehicles are vehicles that can easily switch from human control to fully autonomous operation. To improve the reliability of this type of system, they used a facial monitoring system and biometric features to monitor the health condition of the driver. If a sign of distress such as dizziness, drowsiness or fatigue is flagged, the car automatically switches to autonomous mode and sends alerts and the observed signs to the driver's health care provider (Naga, Rupesh, Karthikeya & Ravi, 2018). In this way, the safety of the car and the user is preserved and reliability is improved. However, when a component of the system is down, it becomes difficult for the system to perform its intended function. Hence, reliability is at risk. A risk detection approach for the components of the system is needed to improve the reliability of the entire system.
In line with this, a failure prevention and correction scheme for autonomous underwater vehicles was proposed by [20]. They hold that the reliability of a system results from the failure prevention and correction procedures, i.e. the risk mitigation procedures, put in place by the system developers. They argue that understanding and eliminating failure modes is key to increasing the reliability of the system. However, there is a difference between failures and faults. A failure is seen as the inability of an item, component or system to perform a required function, while a fault is seen as a component fault or human error [20]. In their study, they proposed a scheme that reduces the likelihood of failure by using probabilistic models to predict failure occurrence. These occurrences are to be updated in the risk profile of the system.

Testing the Reliability of an Autonomous System
Despite the efforts made by researchers to assess the reliability of autonomous systems, it is worth noting that the reliability of such systems depends heavily on the reliability of the underlying components that make up the system. Reliability is measured as the probability of a failure not occurring, i.e. Reliability = 1 - probability of failure. It is the probability that the system will perform what is required of it within the specified period. Components in the system can be arranged in series or in parallel. In a series arrangement, all components must function for the system to function, so the failure of any one of them affects the reliability of the system. The AND logic gate is used to connect these components. On the other hand, when components are arranged in parallel, the failure of one of the components does not necessarily affect the reliability of the system. The OR logic gate is used to connect these components.
If components are in series, the system performs optimally provided all the components are fully functional. The formula for this is shown in Equation 2, which, for components that fail independently, takes the form

$$R = r_1 \times r_2 \times r_3 \times \cdots \times r_n = \prod_{i=1}^{n} r_i \qquad (2)$$

The reliability of the system ($R$) is dependent on the reliability of all the system components ($r_1, r_2, r_3, \ldots, r_n$). It is also worth noting that the more components there are in a series arrangement, the higher the chance of reduced reliability.
If components are arranged in parallel, the system performs optimally and is reliable if any one of the components is fully operational. This arrangement is shown in Figure 3. The reliability of the system is dependent on the reliability of any of the components. It is also worth noting that the more components are arranged in parallel, the higher the chance of increased reliability. The reliability decreases if the number of components decreases.
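The effect of the two arrangements can be illustrated with a minimal sketch. It assumes that components fail independently, in which case the series reliability is the product of the component reliabilities and the parallel reliability is one minus the product of the component failure probabilities. The function names and the component reliability of 0.9 are hypothetical and used only for illustration.

```python
from math import prod


def _check(reliabilities):
    """Validate that every component reliability is a probability."""
    values = list(reliabilities)
    if not values or any(not 0.0 <= r <= 1.0 for r in values):
        raise ValueError("need at least one reliability, each in [0, 1]")
    return values


def series_reliability(reliabilities):
    """All components must work (AND): R = r1 * r2 * ... * rn."""
    return prod(_check(reliabilities))


def parallel_reliability(reliabilities):
    """At least one component must work (OR): R = 1 - (1-r1)(1-r2)...(1-rn)."""
    return 1.0 - prod(1.0 - r for r in _check(reliabilities))


if __name__ == "__main__":
    r = 0.9  # hypothetical reliability of each individual component
    for n in (1, 2, 3, 5):
        components = [r] * n
        print(f"n={n}: series={series_reliability(components):.4f}, "
              f"parallel={parallel_reliability(components):.4f}")
```

With identical components, each additional component in series lowers the system reliability, while each additional component in parallel raises it, which matches the behaviour described above.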

Conclusion and Recommendation
Autonomous systems are systems designed to capture inputs via sensor devices or receptors, analyze the input data, plan steps towards an action, execute the action and evaluate the satisfaction level of the outcome. Intelligence and reliability are major factors that define an autonomous system. These systems contain different components that work together to achieve a common goal. From the reviewed literature, it is clear that the compromise of any of the underlying components of an autonomous system can affect the overall reliability of the entire system. Thus, a proactive measure for component fault detection is proposed. A probabilistic model that examines the state of each component and determines the possibility of a component failure will help reduce component failures, which will enhance the reliability of the system.