Inverted Gate Vedic Multiplier in 90nm CMOS Technology

This paper proposes the design and implementation of an enhanced binary multiplication technique. Vedic Mathematics is a system of mathematics that was discovered by the Indian mathematician Jagadguru Shri Bharathi Krishna Tirthaji in the period between 1911 and 1918. The main objective of this paper is to design an improved binary multiplier that is faster and consumes less power. The proposed full adder design is shown to outperform the standard full adder cell, with both designed in 90nm. The proposed modified 2-Bit and 4-Bit Vedic multipliers also outperform the existing Vedic multiplier based on the Urdhva Tiryagbhyam sutra in terms of operating frequency, energy and area. The designs are implemented in Cadence Virtuoso 90nm CMOS technology operating at a 2V supply. Compared to the existing standard full adder designs in 90nm, the proposed implementation consumes 60% less power and operates 20% faster. The proposed 2-Bit multiplier, operated at 2V, is likewise shown to be more effective than the existing design. The design was further extended to realise a 4-Bit multiplier. The standard 4-Bit multiplier designed using standard 90nm cells consumed 361.2μW, whereas the proposed 4-Bit multiplier consumed 290.2μW, a decrease of about 20% in power usage.


Introduction
With the advent of VLSI technology and the exponential growth in the number of transistors on a chip, newer architectures need to be faster while keeping power consumption to a minimum. This paper presents a way of modifying the existing design of the Vedic multiplier to suit various applications. The modified full adder and 2-Bit Vedic multiplier are shown to be much more efficient than their conventional standard designs. The proposed 4-Bit multiplier built from these modified designs improves the overall performance of the system. The Discrete Cosine Transform (DCT) plays a crucial role in image compression, and the work in [3] proposes a custom multiplication algorithm for reducing the complexity of matrix multiplication. Vedic multiplication techniques have been shown to be more effective than conventional methods [13], which is the main motivation for our study of Vedic mathematics. Various multipliers and dividers based on Vedic sutras are addressed in [6], whose authors conclude that using these sutras in the computing algorithm of a digital system reduces design complexity, area, execution time and power consumption. The Karatsuba algorithm is preferred for higher-bit multiplication, and numerous implementations of higher-bit multipliers using Xilinx tools have been reported [4,7,8]. Circuit complexity can be reduced by using techniques such as Gate Diffusion Input (GDI), as studied in [11].

Urdhva Tiryagbhyam
The term "Urdhva Tiryagbhyam" means "vertically and crosswise" multiplication [1]. It is a general algorithm applicable to N-Bit numbers. Its advantage is that delay and area increase slowly as the number of bits to be multiplied increases. These circuits are highly regular and therefore have a low layout area.
Conventional full adders use AND and OR gates to generate the carry, whereas the proposed full adder logic eliminates the AND and OR gates from carry generation and uses NAND gates instead. A NAND gate has equal fall and rise times. A NOR gate, by contrast, has two pMOS transistors connected in series, so its transistors must be made larger to achieve good fall and rise times.
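The vertical and crosswise scheme can be illustrated in software. The following is a minimal behavioural sketch, not the circuit of this paper: for each output column it sums the bit products whose indices add up to that column and passes the carry to the next column. The function names `to_bits`, `from_bits` and `urdhva_multiply` are hypothetical and chosen only for illustration.

```python
def to_bits(value, n):
    """Return the n least significant bits of value, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Rebuild an integer from a list of bits given LSB first."""
    return sum(bit << i for i, bit in enumerate(bits))


def urdhva_multiply(a_bits, b_bits):
    """Multiply two equal-length bit lists column by column.

    Column `col` collects every crosswise product a[i] & b[j]
    with i + j == col, plus the carry from the previous column.
    """
    n = len(a_bits)
    result = []
    carry = 0
    for col in range(2 * n - 1):
        total = carry
        for i in range(n):
            j = col - i
            if 0 <= j < n:
                total += a_bits[i] & b_bits[j]
        result.append(total & 1)   # bit kept in this column
        carry = total >> 1         # remainder moves to the next column
    # The product of two n-bit numbers fits in 2n bits,
    # so the leftover carry supplies the final bit.
    while len(result) < 2 * n:
        result.append(carry & 1)
        carry >>= 1
    return result


if __name__ == "__main__":
    product = from_bits(urdhva_multiply(to_bits(13, 4), to_bits(11, 4)))
    print(product)
```

In hardware, all crosswise products of a column are formed in parallel, which is the source of the regular structure noted above; the loop here only models the arithmetic.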

Proposed Full Adder Design
Using NAND gates instead of AND and OR gates reduces the delay, the power consumed and the area occupied by the circuit. Because a NAND gate has a better ratio of output-high drive to output-low drive than a NOR gate, NAND is preferred over NOR [2].
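The replacement of AND and OR gates by NAND gates in the carry path can be checked at the logic level. The sketch below uses the textbook identity carry = NAND(NAND(A, B), NAND(A xor B, Cin)); it is an assumed gate network used for illustration, and the transistor-level design of the proposed cell may differ. All function names are hypothetical.

```python
def nand(x, y):
    """Two-input NAND on single bits (0 or 1)."""
    return 1 - (x & y)


def xor_from_nand(x, y):
    """XOR built from four NAND gates."""
    n = nand(x, y)
    return nand(nand(x, n), nand(y, n))


def full_adder_nand(a, b, cin):
    """Full adder whose carry uses NAND gates only.

    cout = NAND(NAND(a, b), NAND(a ^ b, cin)), which equals
    (a AND b) OR ((a XOR b) AND cin) by De Morgan's law.
    """
    propagate = xor_from_nand(a, b)
    total = xor_from_nand(propagate, cin)
    cout = nand(nand(a, b), nand(propagate, cin))
    return total, cout


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                print(a, b, cin, full_adder_nand(a, b, cin))
```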
Basic gates are designed using an aspect ratio of 1.6. In Figure 3, the binary numbers A1A0 and B1B0 are considered. In the first step, Y0 is obtained by multiplying the bits A0 and B0. The second step adds the products A0B1 and A1B0; their sum gives Y1, and the carry is propagated to the next stage. The addition is performed using a standard half adder.
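The steps above correspond to the conventional 2-Bit Vedic multiplier built from four AND gates and two half adders. A minimal behavioural sketch follows; the final step (adding A1B1 to the propagated carry to obtain Y2 and Y3) is the standard completion and is assumed here, and the names `half_adder` and `vedic_2bit` are hypothetical.

```python
def half_adder(x, y):
    """Return (sum, carry) of two single bits."""
    return x ^ y, x & y


def vedic_2bit(a, b):
    """Multiply two 2-bit integers (0..3) with the vertical and crosswise steps."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1

    y0 = a0 & b0                           # vertical: A0.B0
    y1, c1 = half_adder(a0 & b1, a1 & b0)  # crosswise: A0.B1 + A1.B0
    y2, y3 = half_adder(a1 & b1, c1)       # vertical: A1.B1 plus carry (assumed step)

    return (y3 << 3) | (y2 << 2) | (y1 << 1) | y0


if __name__ == "__main__":
    for a in range(4):
        print([vedic_2bit(a, b) for b in range(4)])
```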

2-Bit Multiplier
In the proposed design, shown in Figure 4, the standard half adder is eliminated and the circuit is modified, which helps reduce the delay and power consumption. The design is tested and verified using Multisim and Cadence Virtuoso 90nm, as shown in Figure 5 and Figure 6.

4-Bit Multiplier
The 4x4 multiplication is performed by splitting each 4-Bit input into two groups of 2 bits, as shown in Figure 7. Each pairing of a 2-bit group from one input with a 2-bit group from the other is handled by a separate 2-Bit multiplier. The vertical and crosswise operation is performed four times in total, which requires four 2-Bit multipliers. The partial products produced by the 2-Bit multipliers are added using Carry Save Adders.
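The grouping can be expressed arithmetically: with A = 4·AH + AL and B = 4·BH + BL, the product is AL·BL + 4·(AH·BL + AL·BH) + 16·AH·BH. The sketch below checks this decomposition; `mul2` stands in for any correct 2-Bit multiplier, and both function names are hypothetical.

```python
def mul2(x, y):
    """Placeholder for a 2-Bit multiplier block; inputs must be 0..3."""
    assert 0 <= x < 4 and 0 <= y < 4
    return x * y


def mul4_from_2bit(a, b):
    """Build a 4x4 product from four 2x2 products."""
    a_lo, a_hi = a & 0b11, a >> 2
    b_lo, b_hi = b & 0b11, b >> 2

    p0 = mul2(a_lo, b_lo)  # vertical, low groups
    p1 = mul2(a_hi, b_lo)  # crosswise
    p2 = mul2(a_lo, b_hi)  # crosswise
    p3 = mul2(a_hi, b_hi)  # vertical, high groups

    # Weight each partial product by the position of its groups.
    return p0 + ((p1 + p2) << 2) + (p3 << 4)


if __name__ == "__main__":
    print(mul4_from_2bit(15, 15))
```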

Carry Save Adder (CSA)
A Carry Save Adder adds three numbers at once. This property of the CSA eliminates the need for a third adder in the 4-Bit multiplier design. Using two adders instead of three decreases the delay, power dissipation and area. The CSA also exploits parallelism to significantly boost computational efficiency when there are multiple operands.
The proposed 4-Bit design uses two Carry Save Adders (CSA-1 and CSA-2) which are discussed in the next section.
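The principle of carry save addition can be sketched as follows. Each bit position is compressed independently by a full adder into a sum bit and a carry bit, so three operands are reduced to two words without any carry rippling between positions. This is the generic textbook form, not the modified CSA-1 or CSA-2 of this paper, and the function name is hypothetical.

```python
def carry_save_add(x, y, z):
    """Compress three non-negative integers into a (sum, carry) pair.

    The pair satisfies x + y + z == sum_word + carry_word.
    """
    sum_word = x ^ y ^ z                             # per-bit sum, no carry propagation
    carry_word = ((x & y) | (y & z) | (x & z)) << 1  # per-bit majority, shifted one place
    return sum_word, carry_word


if __name__ == "__main__":
    s, c = carry_save_add(9, 9, 2)
    print(s, c, s + c)
```

A conventional carry-propagating addition of the two output words is still needed to obtain the final result.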

Carry Save Adder (CSA-1)
The design of the CSA is modified to suit the needs of the 4-Bit multiplier. CSA-1 is used to add three numbers, two of which are four bits wide and the third two bits wide. Figure 10 shows the schematic of CSA-1 implemented in 90nm. CSA-1 adds the partial products of the first three 2-Bit multipliers. Figure 11 shows the schematic of CSA-2. CSA-2 is modified to add two numbers: a 4-bit number that is the output of the fourth 2-Bit multiplier, and a 3-bit number that is the output of CSA-1.
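The operand widths quoted above are consistent with the following data flow, given here as one plausible reading rather than a description of the schematics: the upper two bits of the first partial product and the two crosswise 4-bit products enter CSA-1, and the upper three bits of that sum are added to the fourth product in CSA-2. The assignment of partial products to adder inputs and the function name are assumptions.

```python
def mul4_csa_dataflow(a, b):
    """Behavioural model of the two-stage addition with explicit bit widths."""
    a_lo, a_hi = a & 0b11, a >> 2
    b_lo, b_hi = b & 0b11, b >> 2

    # Outputs of the four 2-Bit multipliers (each at most 9, so 4 bits).
    p0, p1, p2, p3 = a_lo * b_lo, a_hi * b_lo, a_lo * b_hi, a_hi * b_hi

    low = p0 & 0b11                # product bits 1..0 come straight from p0

    # CSA-1: two 4-bit operands plus the 2-bit upper half of p0.
    stage1 = p1 + p2 + (p0 >> 2)
    mid = stage1 & 0b11            # product bits 3..2
    to_csa2 = stage1 >> 2          # remaining bits passed on
    assert to_csa2 < 8             # fits in three bits

    # CSA-2: 4-bit p3 plus the 3-bit value from CSA-1.
    high = p3 + to_csa2
    assert high < 16               # product bits 7..4

    return (high << 4) | (mid << 2) | low


if __name__ == "__main__":
    print(mul4_csa_dataflow(15, 15))
```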

Layouts Designed in 90nm Technology
The main goals of cell placement are to optimize timing, power and area and to minimize routing. To begin with, metal-2 tracks are laid horizontally with uniform spacing to obtain 7-track standard cells. Metal-3 tracks are laid out vertically as required by the routing. The placement of cells plays a crucial role in area optimization, timing analysis and power consumption. The area occupied by the 2-Bit multiplier layout is 43.13 µm² and that of the 4-Bit multiplier is 421.92 µm² in 90nm.
The substrate taps are placed within each cell, and the metal wires run within the cell to connect the transistors. Supply rails are designed to carry more current and to have lower resistance.
Inside each cell, the polysilicon runs vertically to form the transistors, while the diffusion and metal-1 run horizontally. Metal-1 also runs vertically where it does not interfere with other connections, which helps save area. The layouts are LVS and DRC clean to ensure high overall performance and reliability. Figure 13 shows the 4-Bit multiplier layout, which consists of four 2-Bit multipliers (labeled 1, 2, 3 and 4), CSA-1 and CSA-2 as sub-blocks. Power rails run both horizontally and vertically at regular intervals and step down to metal-1 to power individual cells. The areas occupied by CSA-1 and CSA-2 are 150.69 µm² and 93.47 µm² respectively. Figure 15 shows the layout design of CSA-2. The cells are placed in this manner to minimize global and local routing congestion. VDD and VSS tracks run horizontally, and adjacent cells share these tracks to minimize the area occupied.

Results
The performance of the full adders is analysed and the results are shown in Table 1. The proposed full adder consumes 60% less power and is 20% faster than the existing standard cell in 90nm; the gains in power consumption, operating frequency and area are obtained by reducing the transistor sizing according to the required fan-out. The graphs in Figure 16 and Figure 17 depict the rise and fall delay times for the output MSB. Red and blue indicate the propagation delay of the standard 4-Bit and the proposed 4-Bit multipliers respectively. One of the two 4-Bit inputs is varied from (0001)₂ through (1111)₂, and the other is held constant at (1111)₂. The performance analysis of the proposed 2-Bit multiplier is shown in Table 2; it is found to be faster and to consume less power than the existing design. The 4-Bit multiplier architecture is designed using both 2-Bit designs, and the results obtained favor the proposed design. Figure 18 depicts the power consumed by the standard 4-Bit multiplier designed using standard 90nm cells and the power consumed by the proposed 4-Bit multiplier design. In the worst case, the standard design consumes 361.2µW and the proposed design consumes 290.2µW, a decrease of about 20% in power usage.
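The quoted savings follow directly from the measured power figures. The short calculation below reproduces them from the worst-case multiplier values above and from the full adder values reported in the Conclusion (131µW and 51.9µW); the helper name `percent_saving` is hypothetical.

```python
def percent_saving(baseline_uw, proposed_uw):
    """Relative power reduction in percent, given two powers in microwatts."""
    return 100.0 * (baseline_uw - proposed_uw) / baseline_uw


multiplier_saving = percent_saving(361.2, 290.2)  # 4-Bit multiplier, worst case
full_adder_saving = percent_saving(131.0, 51.9)   # full adder cell

if __name__ == "__main__":
    print(f"4-Bit multiplier: {multiplier_saving:.1f}%")
    print(f"Full adder:       {full_adder_saving:.1f}%")
```

The results are about 19.7% for the 4-Bit multiplier and about 60.4% for the full adder, in line with the rounded figures of 20% and 60%.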

Conclusion
The traditional method of generating the carry bit in a full adder uses AND and OR gates. This method has been modified, and the transistor sizing reduced without compromising speed, to create a faster, lower-power full adder. The standard full adder cell in gpdk090 consumes 131µW, compared with 51.9µW for the proposed cell. The proposed full adder is used in the design of the CSAs, which improves the overall performance of the 4-Bit multiplier architecture.
The 2-Bit multiplier based on the Urdhva Tiryagbhyam sutra is modified, which enhances its performance in terms of operating frequency and power consumption. The 4-Bit multiplier is constructed in two variants: one using the standard full adder and 2-Bit Vedic multiplier designs, the other using the proposed full adder and 2-Bit Vedic multiplier designs. The performance of the two variants is analysed, and the results obtained favor the proposed design.