Low Power Parallel Prefix Adder Design Using Two Phase Adiabatic Logic

In this paper, a low power implementation of parallel prefix adders using two phase adiabatic logic is investigated. A new structure is proposed for the main blocks of parallel prefix adders. Two parallel prefix adders, Kogge-Stone and Brent-Kung, are considered together with a Ripple Carry adder. The effects of power clock frequency and load capacitance on the new blocks are also examined. Simulation results using 180nm technology parameters and a trapezoidal power clock waveform show an average power reduction of 34% in the main building blocks of the adder at a 200MHz clock frequency. The reduction increases to 54% with a sine wave power clock waveform. This research suggests the adiabatic implementation of parallel prefix adders for low power microprocessor and signal processing applications.


Introduction
In recent years, the wide use of portable devices such as cell phones, tablets and GPS receivers has increased the demand for low power electronic circuits. At the same time, the growing number of transistors per chip area and higher clock frequencies have increased the power consumption of electronic devices. Therefore, the design of low power electronic circuits in new nanometer technologies is an important factor in dealing with this problem.
Adders are widely used in microprocessors and digital signal processing systems. Since adders are used in many computational algorithms such as multiplication, division, floating point calculations and address generation, the efficiency of a processor can depend on the efficiency of its adder module. Many adders with different speed, power and chip area have been discussed in the literature. Parallel prefix adders are among the high speed methods used for large numbers of bits and high speed applications [1]. Unfortunately, parallel prefix adders consume large chip area and power, so the design of low power parallel prefix adders is important. Many techniques have been introduced to decrease the power consumption of parallel prefix adders by lowering the power supply voltage, node capacitance and switching activity factor. These techniques are not very effective in large scale applications. Therefore, adiabatic techniques have been considered for these applications [2]. Adiabatic circuit families offer lower power consumption than standard CMOS logic [3]. Adiabatic circuits are divided into fully adiabatic and quasi-adiabatic circuits. In [4], SCRL (Split-Level Charge Recovery Logic) has been introduced, a charge recovery logic that uses a split-level method to recover charge in the logic circuit. This method is similar to standard CMOS, but replaces VDD and GND with clock pulses. In [5], the 2N-2N2D method has been introduced, which uses diodes to implement quasi-adiabatic circuits. Using diodes causes irreversible charge transfer and reduces the output voltage. In [2,6,7], complementary pass transistor adiabatic logic (CPAL) reduces power consumption using pass transistors. In [6], asynchronous adiabatic logic has been implemented using Dual Rail Domino Logic (DRDAAL). This method brings the advantages of asynchronous circuits to adiabatic logic. DRDAAL uses dual rail domino logic to achieve higher speed and a lower Power Delay Product (PDP) than static CMOS and quasi-adiabatic logic.
The authors of [8] and [9] introduce another family of adiabatic circuits that achieves better energy recovery than other methods and uses a single phase sine wave clock. The current spikes are reduced by a factor of 4, which makes these circuits attractive for cryptographic applications that must resist power analysis attacks. This paper discusses power reduction in parallel prefix adders using the two phase clocked (2PASCL) adiabatic method. Instead of a DC power supply, a clocked power supply is used, and by omitting the series diodes the output voltage is increased and the diode voltage drop is eliminated [10,11]. New structures of the Brent-Kung and Kogge-Stone blocks have been designed using 2PASCL, and the simulation results have been compared with static CMOS technology. Concluding remarks are given at the end.

Adiabatic Logic
Adiabatic techniques save power during the charging and discharging of circuit nodes. The word adiabatic is of Greek origin and describes a change with no loss or heat generation; the term is widely used in thermodynamics. In electronics it means charge conservation in each transfer [6]. In adiabatic systems, the energy stored in the circuit can be returned to the power supply in certain phases of circuit operation. If all nodes in a circuit are charged with a constant current, the circuit dissipates the least amount of energy. To implement this, the power supply of the circuit is changed from a DC to an AC waveform [10]. Figure 1 shows the charging and discharging process of adiabatic switching. Switching is slower in adiabatic logic than in static CMOS because the supply voltage ramps gradually rather than being a fixed DC level. The supply voltage is applied during the complementary phases Φ and Φ̄. If I_m is the average charging current of C_L, so that I_m = C_L V_dd / T_p, the energy dissipated in the charging phase can be expressed as equation (1):
$$E_{diss} = I_m^2 R T_p = \frac{R C_L}{T_p} C_L V_{dd}^2 \qquad (1)$$
where R is the effective transistor resistance plus the wiring resistance, C_L is the load capacitance, and T_p is the time the supply takes to rise from 0 to V_dd during Φ. According to (1), the dissipated energy is proportional to R C_L / T_p, that is, inversely proportional to the charging time T_p. Therefore, in theory, as the charging time approaches infinity, the dissipated energy approaches zero.
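To make the R C_L / T_p dependence of equation (1) concrete, the sketch below evaluates it numerically and compares it with the C_L V_dd^2 / 2 dissipated when a conventional step supply charges the same load (a standard textbook result, not a result of this paper). The resistance R and the ramp times are illustrative assumptions; C_L = 200 fF and V_dd = 1.8 V follow the XOR simulation settings used in this paper. Note that (1) is an approximation valid for T_p much larger than R C_L.

```python
def adiabatic_energy(r_ohm: float, c_load: float, vdd: float, t_ramp: float) -> float:
    """Eq. (1): energy (J) dissipated while a linear supply ramp of duration t_ramp
    charges c_load to vdd through resistance r_ohm (valid for t_ramp >> R*C)."""
    return (r_ohm * c_load / t_ramp) * c_load * vdd ** 2


def conventional_energy(c_load: float, vdd: float) -> float:
    """Energy (J) dissipated when a step (DC-switched) supply charges c_load: C*V^2/2."""
    return 0.5 * c_load * vdd ** 2


if __name__ == "__main__":
    R = 1e3          # assumed effective transistor + wiring resistance (ohm)
    C_L = 200e-15    # load capacitance (F), as in the XOR simulations
    VDD = 1.8        # supply voltage (V)
    print(f"conventional: {conventional_energy(C_L, VDD) * 1e15:.1f} fJ")
    for t_p in (0.4e-9, 1.25e-9, 5e-9):  # assumed ramp times (s)
        e = adiabatic_energy(R, C_L, VDD, t_p)
        print(f"T_p = {t_p * 1e9:.2f} ns: adiabatic {e * 1e15:.1f} fJ")
```

With these assumed values the two expressions meet at T_p = 2 R C_L = 0.4 ns, and a 1.25 ns ramp (one quarter of a 200MHz period, assuming four equal phases) dissipates roughly a third of the conventional energy. The break-even point is only indicative, since (1) loses accuracy when T_p is comparable to R C_L.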

Clocked Power Adiabatic Circuits
Short circuit power dissipation is a significant part of the total power dissipation in static CMOS circuits. This dissipation occurs during the rise and fall times of the input signals. Adiabatic logic reduces it by modifying the supply waveform. In adiabatic logic with clocked power, the clock also serves as the power supply of the circuit. A common clocked power waveform is the trapezoidal waveform, which has four phases: charge, evaluation, discharge and idle. Compared with other waveforms, a trapezoidal waveform holds the output logic levels more stably at VDD and GND. However, the multiple clock phases make the circuit slow, and additional circuits are required to generate the linear ramps [12]. Sinusoidal waveforms are also used for clocked power adiabatic circuits. A sine wave saves more energy than trapezoidal and linear waveforms. Most sine wave designs use a two phase clock, which makes larger circuits easier to implement [13].
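The two power clock shapes described above can be sketched as sampled waveforms. This is a minimal model under stated assumptions: the four trapezoid phases (charge, evaluation, discharge, idle) each last a quarter period, both waveforms swing between 0 and VDD, and the complementary phase Φ̄ is taken as VDD − Φ. The function names are hypothetical. With VDD = 1.8 V, the sine clocks cross at 0.9 V, consistent with the 2PASCL clock levels described in the next section.

```python
import math


def trapezoid(t: float, period: float, vdd: float) -> float:
    """Four-phase trapezoidal power clock; each phase assumed to last period/4."""
    quarter = period / 4
    x = t % period
    if x < quarter:            # charge: linear ramp 0 -> vdd
        return vdd * x / quarter
    if x < 2 * quarter:        # evaluation: hold at vdd
        return vdd
    if x < 3 * quarter:        # discharge: linear ramp vdd -> 0
        return vdd * (3 * quarter - x) / quarter
    return 0.0                 # idle: hold at 0


def sine_clock(t: float, period: float, vdd: float) -> float:
    """Sinusoidal power clock swinging between 0 and vdd around vdd/2."""
    return vdd / 2 * (1 - math.cos(2 * math.pi * t / period))


def complement(v: float, vdd: float) -> float:
    """Complementary phase (phi-bar), mirrored about vdd/2."""
    return vdd - v


if __name__ == "__main__":
    VDD, PERIOD = 1.8, 5e-9    # 1.8 V swing, 200 MHz power clock
    for k in range(9):
        t = k * PERIOD / 8
        ph_t = trapezoid(t, PERIOD, VDD)
        ph_s = sine_clock(t, PERIOD, VDD)
        print(f"t={t * 1e9:5.3f} ns  trap {ph_t:.3f}/{complement(ph_t, VDD):.3f}  "
              f"sine {ph_s:.3f}/{complement(ph_s, VDD):.3f}")
```

The trapezoid has flat segments where the output can settle at full logic levels, whereas the sine wave has no flat segment; in a circuit simulator, comparable sources would typically be described with piecewise-linear or sinusoidal voltage sources.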
Depending on the charge recovery method, adiabatic circuits may or may not include diodes. Diodes make circuit implementation easier, but they reduce the output voltage swing and add power dissipation across the diode. A logic gate based on Adiabatic Dynamic CMOS Logic (ADCL), which uses diodes, is shown in Figure 2 [14].
Compared with static CMOS, the diodes P1 and N2 have been added and a sine wave clock is used. The diodes separate the charging and discharging paths but dissipate power themselves. The speed of ADCL decreases as the number of gates and the logic complexity increase [14].

Two Phase clock Adiabatic Split Charge Logic (2PASCL)
In the 2PASCL logic family, unlike diode-based adiabatic circuits, the charging current passes through a transistor instead of a diode. This removes the voltage drop across the diode and increases the output amplitude [11]. Figure 3 shows a 2PASCL inverter that uses two complementary clocks, Φ and Φ̄, as its power supply and ground. Diodes D1 and D2 facilitate charge recovery from the output node and the discharging of internal nodes. Since Φ and Φ̄ are complementary, when Φ rises from 0.9V to 1.8V, Φ̄ falls from 0.9V to 0V.
As shown in Figure 4, there is no transition at the output when the output is high and the pull-up network is on, or when the output is low and the pull-down network is on. However, if the output is high and the pull-down network is on, the output capacitor is discharged through the transistor and D2. This happens when Φ is rising and Φ̄ is falling (evaluation phase). In contrast, if Φ is falling and Φ̄ is rising (hold phase), and the output is high with the pull-up network on, the output is discharged through D1 [11].

Parallel Prefix Adders
Parallel prefix adders are fast and are used for wide, high performance adders. The three main blocks of a parallel prefix adder are pre-computation, carry generation and post-computation [1]. The block diagram of a parallel prefix adder is shown in Figure 5. In the first stage, the Pi (propagate) and Gi (generate) signals are produced. These signals are fed to the carry generation block. Parallel prefix adder structures differ mainly in their carry generation method. The elements of the carry generation block, the black, gray and white operators, are shown in Figure 6. Buffers are used to cancel loading effects in these blocks. The last stage consists of two-input XOR gates that produce the sum bits. Two common parallel prefix structures, Brent-Kung [15] and Kogge-Stone [16], are introduced next.

Brent-Kung Parallel Prefix Adder
The Brent-Kung parallel prefix adder has more logic levels but the smallest chip area compared with the Kogge-Stone and Ladner-Fischer adders. The number of operators in this adder is $2(n-1) - \log_2 n$, where n is the number of bits. The fan-out is two, and the adder has $2\log_2 n - 1$ logic levels. The block diagram of the carry generation section of this adder is shown in Figure 7 for a 16-bit adder [1].
Figure 7. Brent-Kung parallel prefix adder [17].
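The pre-computation, carry generation and post-computation stages above can be modeled at the bit level to check the Brent-Kung operator and level counts. The sketch below is an illustrative logic model, not the paper's 2PASCL circuit. It assumes the standard definitions Gi = Ai·Bi and Pi = Ai ⊕ Bi, the usual prefix operator (G, P) ∘ (G', P') = (G + P·G', P·P'), a carry-in of 0, and n a power of two; every merge is counted as one operator without distinguishing black from gray cells. All function names are hypothetical.

```python
def precompute(a: int, b: int, n: int):
    """Pre-computation: generate G_i = A_i AND B_i, propagate P_i = A_i XOR B_i."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    return g, p


def combine(G, P, i, j):
    """Prefix operator: merge the group ending at bit j into the group ending at bit i."""
    G[i] = G[i] | (P[i] & G[j])
    P[i] = P[i] & P[j]


def brent_kung(g, p):
    """Brent-Kung carry tree. Returns (G, operator count, logic levels), where
    G[i] is the carry out of bit i for a carry-in of 0."""
    n = len(g)
    G, P = list(g), list(p)
    ops = levels = 0
    d = 1
    while d < n:                                  # up-sweep (reduction tree)
        for i in range(2 * d - 1, n, 2 * d):
            combine(G, P, i, i - d)
            ops += 1
        levels += 1
        d *= 2
    d = n // 4
    while d >= 1:                                 # down-sweep (remaining carries)
        for i in range(3 * d - 1, n, 2 * d):
            combine(G, P, i, i - d)
            ops += 1
        levels += 1
        d //= 2
    return G, ops, levels


def add(a: int, b: int, n: int) -> int:
    """Post-computation: S_i = P_i XOR C_i with C_i = G[i-1] and C_0 = 0."""
    g, p = precompute(a, b, n)
    G, _, _ = brent_kung(g, p)
    s = 0
    for i in range(n):
        carry = G[i - 1] if i > 0 else 0
        s |= (p[i] ^ carry) << i
    return s | (G[n - 1] << n)                    # append the carry out


if __name__ == "__main__":
    _, ops, levels = brent_kung(*precompute(0, 0, 16))
    print(ops, levels)  # 2(n-1) - log2(n) = 26 operators, 2*log2(n) - 1 = 7 levels
    assert all(add(a, b, 8) == a + b for a in range(256) for b in range(256))
```

The up-sweep builds carries for bit positions 2^k − 1 in log2 n levels, and the down-sweep fills in the remaining positions in log2 n − 1 further levels, which is where the extra logic depth relative to Kogge-Stone comes from.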

Kogge-Stone Parallel Prefix Adder
The Kogge-Stone adder is a fast parallel prefix adder and is widely used in VLSI applications. It is used for large numbers of bits, since it has the least delay compared with other methods. However, it consumes a relatively large chip area. The carry generation block of this adder is shown in Figure 8. This adder has $\log_2 n$ logic levels and a fan-out of two in each stage. Extensive wiring is required between the logic levels. Since this adder has more blocks than the Brent-Kung adder, its power dissipation is high. If the layout is carefully designed, it may not increase the chip area drastically [1].
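The trade-off between Kogge-Stone and the other adders can be illustrated with a bit-level logic model. The sketch below is an assumption-based illustration rather than the paper's circuit: it uses the standard generate/propagate definitions and prefix operator, a carry-in of 0, and counts every merge as one operator. A Ripple Carry adder is included as a serial prefix chain for comparison. Function names are hypothetical.

```python
from typing import Callable, List, Tuple

Tree = Tuple[List[int], int, int]  # (carry out of each bit, operator count, logic levels)


def kogge_stone(g: List[int], p: List[int]) -> Tree:
    """At level k, every bit i >= d = 2**k merges with bit i - d."""
    n = len(g)
    G, P = list(g), list(p)
    ops = levels = 0
    d = 1
    while d < n:
        new_g, new_p = G[:], P[:]  # all operators in a level read the previous level
        for i in range(d, n):
            new_g[i] = G[i] | (P[i] & G[i - d])
            new_p[i] = P[i] & P[i - d]
            ops += 1
        G, P = new_g, new_p
        levels += 1
        d *= 2
    return G, ops, levels


def ripple(g: List[int], p: List[int]) -> Tree:
    """Ripple carry written as a serial prefix chain, for comparison."""
    n = len(g)
    G, P = list(g), list(p)
    for i in range(1, n):
        G[i] = G[i] | (P[i] & G[i - 1])
        P[i] = P[i] & P[i - 1]
    return G, n - 1, n - 1


def add(a: int, b: int, n: int, tree: Callable[[List[int], List[int]], Tree]) -> int:
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]      # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]    # propagate
    G, _, _ = tree(g, p)
    s = 0
    for i in range(n):
        s |= (p[i] ^ (G[i - 1] if i else 0)) << i         # sum bit, carry-in 0
    return s | (G[n - 1] << n)                            # append the carry out


if __name__ == "__main__":
    zeros = [0] * 16
    for name, tree in (("Kogge-Stone", kogge_stone), ("Ripple", ripple)):
        _, ops, levels = tree(zeros, zeros)
        print(f"{name}: {ops} operators, {levels} levels")
```

For n = 16, this model gives 49 operators in 4 levels for Kogge-Stone (n log2 n − n + 1 operators) against 15 operators in 15 levels for the ripple chain; the Brent-Kung formula given earlier yields 26 operators in 7 levels. This is consistent with the text: Kogge-Stone has the least depth but the most blocks, and hence the highest power.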
In this section, the Brent-Kung and Kogge-Stone parallel prefix adders and a Ripple Carry adder are built using the new adiabatic structures. The XOR gate is widely used in adder structures; its transistor level schematic is shown in Figure 9. To investigate the operation of the XOR gate in 2PASCL adiabatic logic, both trapezoidal and sine waveforms have been applied to the power supply of the gate. The power signals Φ and Φ̄ are complementary. The simulation results for the XOR gate using 180nm technology parameters, a 1.8V power supply voltage, a 200fF load capacitor and a 200MHz clock frequency are shown in Figures 10 and 11. As shown in Figure 10, the output signal is discharged to the threshold voltage of D2. The same power supply waveforms are applied to all blocks of the parallel prefix adders. The simulation results for critical path delay and power dissipation with sine and trapezoidal waveforms are shown in Tables 1 and 2. Further results are listed in Table 3 and compared with static CMOS logic. The effects of the power clock frequency and the load capacitance on the power consumption of the adders have also been investigated, as shown in Figure 12, and a comparison plot is shown in Figure 13.

Discussion
According to the simulation results listed in Table 1, with the trapezoidal waveform the power dissipation of the XOR, gray and black operators is reduced by 31.32%, 39.4% and 31.14% respectively, compared with static CMOS logic. However, this method increases the critical path delay and the number of transistors. The results in Table 2 show that the sine wave clock waveform reduces power dissipation compared with the trapezoidal waveform [10,11]. This also holds for the maximum rise and fall times of the trapezoidal waveform. The disadvantage of the sine wave is its larger delay compared with the trapezoidal waveform. Figure 12 shows that increasing the clock frequency increases the power dissipation of all three adders, and above 100MHz the rate of change in power dissipation is the same for all adders. The results in Figure 13 show that increasing the load capacitance has little effect on power dissipation with 2PASCL, and the slope of the power dissipation curve approaches zero for large load capacitances. In contrast, in static CMOS logic the power dissipation increases directly with load capacitance [10,11].
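The 34% average reduction quoted in the abstract appears consistent with the arithmetic mean of the three per-block reductions reported above; this is our reading, since the averaging method is not stated. The sketch below assumes the usual definition of percentage reduction relative to static CMOS; the function name is hypothetical.

```python
def reduction_pct(p_cmos: float, p_adiabatic: float) -> float:
    """Percentage power reduction relative to static CMOS (assumed definition)."""
    return 100.0 * (p_cmos - p_adiabatic) / p_cmos


# Per-block reductions for the trapezoidal waveform, as reported from Table 1.
reported = {"XOR": 31.32, "gray": 39.4, "black": 31.14}
mean_reduction = sum(reported.values()) / len(reported)
print(f"mean reduction: {mean_reduction:.2f}%")  # 33.95%, i.e. about 34%
```

For example, a block dissipating 100 µW in static CMOS and 66 µW in 2PASCL would show reduction_pct(100.0, 66.0) = 34.0.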
Parallel prefix adders are faster than Ripple Carry adders for large numbers of bits, but they consume more power. As the simulation results in Figure 13 show, with the 2PASCL method the power dissipation of the parallel prefix adders decreases, approaches that of the Ripple Carry adder, and even outperforms it as the load capacitance increases. Static power dissipation also exists in 2PASCL adiabatic logic; its effect is less important in long channel devices and increases in short channel technologies. The static power dissipation of 2PASCL adiabatic circuits can be investigated in future work.

Conclusion
In this paper, the 2PASCL adiabatic circuit has been proposed for parallel prefix adders to reduce power consumption. Diode-based adiabatic circuits suffer from a reduced output voltage and additional power dissipation due to the diode voltage drop. In the proposed method, the diode is bypassed by a transistor, which eliminates these disadvantages. A new structure has been introduced for the parallel prefix adder building blocks using 2PASCL adiabatic logic. Using 180nm technology parameters, simulation results show that the power dissipation of parallel prefix adders using the 2PASCL adiabatic circuit is reduced by 34% on average with a trapezoidal waveform and by 54% with a sine power clock waveform. Furthermore, the capacitive load has a minimal effect on power dissipation with this method. Therefore, this method can be used for parallel prefix adders in low power microprocessor and signal processing applications.