Validating the Clocking Subsystem in a Post-Silicon Environment

With the tremendous growth of the automotive and consumer markets, demand for semiconductors is also growing. New microcontrollers with upgraded feature sets appear constantly, and as the feature set grows, so does the complexity of the devices. This increased complexity has a major impact on the clocking and power subsystems of a microcontroller. In this paper, we discuss the clocking subsystem, which is also known as the HEART of any microcontroller. To keep this heart healthy, the microcontroller must be tested robustly under various conditions. In an architecture with multiple clock domains, the main risks are that the SoC gets stuck or produces a wrong clock output. The clock can also become glitchy under extreme weather conditions, and it can malfunction because of a wrong or marginal configuration. To rule out such issues, randomization, sweeps, and testing across different process, voltage, and thermal conditions play an important role. Although it is never possible to cover all combinations during bench validation of these complex SoCs, this paper captures some types of tests that can be performed to check the robustness of a microcontroller.


Introduction
As the automotive industry grows, so does the semiconductor industry. New microcontrollers [1,2] with new and upgraded features appear constantly. Because these microcontrollers are built into vehicles that travel across the world, even in extreme environmental conditions, they should be tested under the same conditions to make sure that they hold no surprises in the field. One vital component whose misbehavior can cause the complete system to malfunction is the clock signal. The clocking subsystem is also known as the HEART of any microcontroller. The clock signal may sometimes become glitchy due to weather conditions, which can lead to many further problems in the system. To make sure that a microcontroller receives a clean clock whatever the weather conditions, it is very important to stress test the clock signal.
Post-silicon debug is becoming a significant issue for Integrated Circuit (IC) designers, and proposals are emerging to help reduce the debugging time of ICs. Modern ICs, however, are very large and intricate, and they present several critical challenges to an effective debug solution. In particular, problems involving multiple clock and power domains pose major challenges to existing solutions. Many of the most complex and hard-to-find bugs in ICs result from interactions between different blocks and subsystems, which are often in different clock domains.
ICs these days have multiple platform clocks, anywhere from 2 to 20 or even more. Covering them is therefore an important part of validation. But how do we plan to cover this? Do we cover each and every scenario, and validate everything at all frequencies with different dividers? That is a lot of work. In this paper, we therefore discuss a few types of tests that can be performed in post-silicon validation to validate the clocking subsystem with maximum coverage and without expensive equipment. Although it is never possible to cover all combinations during bench validation, we have tried to capture tests that give confidence in the robustness of a microcontroller.

Asynchronous Events
Asynchronous events are events that are not controlled by software or by a timing schedule. They can occur at any time and can interrupt the ongoing operation of a microcontroller. These complex microcontrollers support various asynchronous events [3] such as resets, non-maskable interrupts [4], internal timers, external timers, transitions from one power mode [5,6] to another, and many more. While validating clocking IPs, asynchronous events play a vital role because they help us find corner cases that may not be reachable in simulation or on an emulation platform. We can trigger asynchronous events during the different clock transition processes that involve hardware transactions. This helps us validate loss-of-clock scenarios [7], clock stuck scenarios, clock divider stuck scenarios, and phase divider stuck scenarios. Instead of triggering these events only at one particular point in time, we can also sweep them across the whole window of an operation. The granularity of these sweeps can be adjusted according to the available equipment: the finer the granularity, the greater the chance of finding corner scenarios. Asynchronous events can be generated in different power cycles or in the same power cycle, and the time window in which they are generated can be widened or narrowed. Since this involves sweeping at a very fine granularity, in nanoseconds (ns) or picoseconds (ps), it helps us gain in-depth knowledge of the design at time scales that were not accessible earlier. This sweeping can be tried during different clock switching scenarios, while configuring clocks, while switching clocks ON/OFF, and so on. Figure 1 illustrates asynchronous event sweeping.
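The sweep described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 1000 ps window, and the `fake_trial` stand-in (which pretends the device fails when the event lands 310 to 340 ps into the operation) are all hypothetical, and on a real bench `run_trial` would arm the event generator, start the clock switch, and check the device state.

```python
from typing import Callable, Iterator, List


def sweep_offsets(window_ps: int, step_ps: int) -> Iterator[int]:
    """Yield trigger offsets in picoseconds, from 0 up to at most window_ps."""
    if step_ps <= 0:
        raise ValueError("step_ps must be positive")
    yield from range(0, window_ps + 1, step_ps)


def run_sweep(window_ps: int, step_ps: int,
              run_trial: Callable[[int], bool]) -> List[int]:
    """Run one trial per offset and return the offsets at which the trial failed."""
    return [off for off in sweep_offsets(window_ps, step_ps) if not run_trial(off)]


def fake_trial(offset_ps: int) -> bool:
    """Stand-in for the bench (assumption): fails only in a narrow 310..340 ps window."""
    return not (310 <= offset_ps <= 340)


coarse_failures = run_sweep(1000, 100, fake_trial)  # 100 ps steps skip the window
fine_failures = run_sweep(1000, 25, fake_trial)     # 25 ps steps land inside it
```

With the assumed failure window, the coarse sweep reports nothing while the finer sweep catches one failing offset, which is the effect of granularity discussed above.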

Clock Randomization
As the number of clocks increases, the one-dimensional configuration space grows. One-dimensional coverage refers to the configurations that are possible when switching various clocks ON/OFF with different clock settings in a single power mode. Because the number of power modes (low power modes or fully functional modes) is also increasing, one-dimensional coverage is not sufficient to rule out corner scenarios, and two-dimensional coverage is needed. The two-dimensional coverage matrix also includes the mode transitions (from low power modes to fully functional modes and vice versa). For example: switching from full power mode, with all clock sources ON and the PLL [8] as system clock, to a power down mode with all clocks OFF. While drafting this two-dimensional coverage, we need to respect constraints that arise from architectural limitations, safety considerations, or general logical dependencies of one clock on another. Missing these constraints may add illegal configurations to the matrix, which adds unnecessary validation time. With all constraints taken into account, we can randomize the system clock selection and check that the SoC works under different conditions.
Randomizing the clock is very important because some features might work as expected at a fast clock but not at slow frequencies or with inaccurate clocks (such as FIRC). Randomization is also important to check that the microcontroller is still functional after being reconfigured with different values, that there are no marginal issues, that all the random configurations work, and that there is no sequential dependency between configurations.
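A minimal sketch of constrained randomization over the two-dimensional space could look as follows. Everything device-specific here is an assumption made for illustration: the source names other than FIRC and PLL, the two power modes, and the three constraints in `is_legal` (system clock must be an enabled source, the PLL needs FXOSC as reference, the PLL is off in low power mode). A real device has its own constraint list.

```python
import itertools
import random
from typing import List, Tuple

SOURCES = ("FIRC", "SIRC", "FXOSC", "PLL")  # hypothetical clock sources
MODES = ("RUN", "LOW_POWER")                # hypothetical power modes

Config = Tuple[str, Tuple[str, ...], str]   # (mode, enabled sources, system clock)


def is_legal(mode: str, enabled: Tuple[str, ...], sysclk: str) -> bool:
    """Apply the assumed constraints; illegal points are never generated."""
    if sysclk not in enabled:                        # system clock must be running
        return False
    if "PLL" in enabled and "FXOSC" not in enabled:  # assumed PLL reference dependency
        return False
    if mode == "LOW_POWER" and "PLL" in enabled:     # assumed low power restriction
        return False
    return True


def legal_configs() -> List[Config]:
    """Enumerate every (mode, enabled sources, system clock) point that is legal."""
    configs = []
    for mode in MODES:
        for count in range(1, len(SOURCES) + 1):
            for enabled in itertools.combinations(SOURCES, count):
                for sysclk in SOURCES:
                    if is_legal(mode, enabled, sysclk):
                        configs.append((mode, enabled, sysclk))
    return configs


def random_transitions(configs: List[Config], steps: int,
                       seed: int) -> List[Tuple[Config, Config]]:
    """Build a seeded random chain of (from, to) reconfigurations."""
    rng = random.Random(seed)
    current = rng.choice(configs)
    path = []
    for _ in range(steps):
        nxt = rng.choice([c for c in configs if c != current])
        path.append((current, nxt))
        current = nxt
    return path


configs = legal_configs()
transition_plan = random_transitions(configs, steps=5, seed=1234)
```

Seeding the generator keeps a failing sequence reproducible, and chaining the transitions means each configuration is reached from a different predecessor, which is one way to look for sequential dependencies.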

Clock Monitor Units
With advances in technology, growing design complexity, technology scaling, and the introduction of multi-core architectures, demand for different power modes (especially low power modes) has increased, resulting in multiple clock and power domains. Since this raises the complexity of the design, it can introduce challenges related to clock domain crossing [9,10] and reset domain crossing [11]. As a result, the chances of failures or defects are greater. Any defect in this domain can prove catastrophic, especially in SoCs for the automotive industry. System bus and peripheral clock monitoring provide transaction-level visibility, which offers a means to analyze complex system events and to correlate hardware and software activity.
To detect any such clocking failure and take corrective action, a Clock Monitoring Unit is added. Although it is always necessary to validate and debug everything with the clock monitors ON, it is also necessary to check the expected functionality of the system and peripherals with the clock monitors switched OFF, since issues sometimes stay hidden when everything is validated with the monitors ON. Typically, events occur as expected when the clock monitors are switched ON, but once we check the IP functionality with the monitors switched OFF, some IP features, such as lock status bits and clock locking circuitry, may not work as expected. It is therefore important to validate everything with the monitors switched OFF as well.
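One simple way to organize this is to run the same test list twice and flag the tests that pass only with the monitors ON. The sketch below assumes a hypothetical `run_test(name, monitors_enabled)` bench hook; the test names and the `fake_run_test` stand-in are invented for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_both_ways(tests: Iterable[str],
                  run_test: Callable[[str, bool], bool]) -> Dict[str, Tuple[bool, bool]]:
    """Run each test with monitors ON and OFF; map name -> (pass_on, pass_off)."""
    return {name: (run_test(name, True), run_test(name, False)) for name in tests}


def passes_only_with_monitors_on(results: Dict[str, Tuple[bool, bool]]) -> List[str]:
    """Return the tests whose failure shows up only with the monitors OFF."""
    return sorted(name for name, (on, off) in results.items() if on and not off)


def fake_run_test(name: str, monitors_enabled: bool) -> bool:
    """Stand-in for the bench (assumption): one test fails only with monitors OFF."""
    return monitors_enabled or name != "pll_lock_status"


results = run_both_ways(["pll_lock_status", "divider_update", "clock_switch"],
                        fake_run_test)
suspects = passes_only_with_monitors_on(results)
```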

Exposing Clock Signals at the Pads
There should be a provision to bring the system clock and the other available clocks out to a pad so that they can be used for debug purposes. If a clock is available at a GPIO, we can check whether the generated clock is as expected, and we can also check its accuracy and consistency. All clock sources, clock dividers, and phase dividers should be checked in every possible mode, especially in low power modes.
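The accuracy and consistency checks can be expressed as a small calculation on frequencies measured at the pad. The sketch below is illustrative only: the 48 MHz source, the divider of 4, the sample values, and the tolerances are assumed numbers, and the real limits come from the device data sheet.

```python
from typing import Sequence


def expected_pad_hz(source_hz: float, divider: int) -> float:
    """Frequency expected at the pad for a given source and output divider."""
    if divider <= 0:
        raise ValueError("divider must be positive")
    return source_hz / divider


def error_ppm(measured_hz: float, expected_hz: float) -> float:
    """Signed deviation of a measurement from the expected value, in ppm."""
    return (measured_hz - expected_hz) / expected_hz * 1e6


def check_pad_clock(samples_hz: Sequence[float], expected_hz: float,
                    tol_ppm: float) -> bool:
    """Accuracy: True if every sample is within tol_ppm of the expected value."""
    return all(abs(error_ppm(s, expected_hz)) <= tol_ppm for s in samples_hz)


def spread_ppm(samples_hz: Sequence[float], expected_hz: float) -> float:
    """Consistency: peak-to-peak spread of the samples, in ppm of expected."""
    return (max(samples_hz) - min(samples_hz)) / expected_hz * 1e6


expected = expected_pad_hz(48_000_000, 4)               # assumed 48 MHz source, /4
samples = [12_000_600.0, 11_999_400.0, 12_000_000.0]    # assumed measurements in Hz
ok_tight = check_pad_clock(samples, expected, tol_ppm=10)
ok_loose = check_pad_clock(samples, expected, tol_ppm=100)
```

The same check can be repeated for each clock source, divider setting, and power mode that can be routed to the pad.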

PVT Testing
As mentioned earlier, these microcontrollers may operate in any part of the world, including under extreme environmental conditions, so it is mandatory to test them under the same conditions. Clocking validation across process, temperature [12], and voltage [13] therefore becomes important. Process here means that the regression suite should be run on silicon from different process corners, over the full range of temperature and voltage. Current varies significantly with process, voltage, and temperature. The higher the voltage, the greater the current, which means that capacitors charge and discharge faster and delays are correspondingly smaller. Similarly, the slower the process, the smaller the current and the larger the delay.
Further, at higher temperatures, electrons undergo more collisions, so the current is lower, which results in higher delays. The temperature and voltage ranges to cover depend on the data sheet of the microcontroller. The tests targeted for this PVT [14,15] testing should include all clock switching scenarios across all supported power modes. This is an area where we can use more randomization; in other words, PVT testing can be combined with randomization. Random clock switching together with random low power mode transitions can be checked while the voltage is fluctuating (but within the specified range) and across different temperatures. The same suite can be repeated across different process corners.
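Combining PVT testing with randomization can be sketched as a corner matrix in which every corner gets its own seeded random sequence of clock and power mode actions. The corner names, temperatures, voltages, and action names below are assumptions for illustration; real values come from the data sheet. The sketch uses static voltage points and does not model voltage fluctuating during a run.

```python
import itertools
import random
from typing import Dict, List, Sequence, Tuple

Corner = Tuple[str, int, float]  # (process corner, temperature in C, voltage in V)


def pvt_corners(processes: Sequence[str], temps_c: Sequence[int],
                volts: Sequence[float]) -> List[Corner]:
    """Full cross product of process, temperature, and voltage points."""
    return list(itertools.product(processes, temps_c, volts))


def build_plan(corners: Sequence[Corner], actions: Sequence[str],
               steps: int, seed: int) -> Dict[Corner, List[str]]:
    """Assign each corner a reproducible random sequence of actions."""
    rng = random.Random(seed)
    return {corner: [rng.choice(actions) for _ in range(steps)]
            for corner in corners}


PROCESSES = ("slow", "typical", "fast")   # assumed corner names
TEMPS_C = (-40, 25, 125)                  # assumed range, take from data sheet
VOLTS = (2.97, 3.30, 3.63)                # assumed 3.3 V supply with +/-10%
ACTIONS = ("switch_to_pll", "switch_to_firc",
           "enter_low_power", "exit_low_power")  # hypothetical test actions

corners = pvt_corners(PROCESSES, TEMPS_C, VOLTS)
pvt_plan = build_plan(corners, ACTIONS, steps=4, seed=42)
```

Because the plan is derived from a single seed, a failure seen at one corner can be replayed with the same action sequence at the other corners.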

Conclusion
Validation of the clocking subsystem can never be treated as 100% complete. There may always be some way to make the device malfunction, either through physical attacks or through environmental conditions. We should always approach validation with a destructive mindset and try to validate the subsystem by going against the specs. We have tried to capture some of the tests that can be covered in post-silicon validation for robust validation and a "HEALTHY HEART" of the microcontroller, but complete coverage of all combinations of the clocking subsystem is still not possible. There should therefore be a way to restrict these combinations, either in the design or at the system level, so that the device cannot malfunction in the field because of invalid combinations or combinations that have not gone through robust testing.