High-Performance SRAM in Nanoscale CMOS: Design Challenges

Transcription

High-Performance SRAM in Nanoscale CMOS: Design Challenges
High-Performance SRAM in Nanoscale CMOS: Design Challenges and Techniques
Ching-Te Chuang, Saibal Mukhopadhyay, Jae-Joon Kim, Keunwoo Kim, and Rahul Rao
IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, U. S. A.
Phone: +1-914-945-3596, Fax: +1-914-945-1358, e-mail: ctchuang@us.ibm.com
the redundancy needed for repair of the projected array size
at 45 nm impractical, and even prohibitive for designs
This paper reviews the design challenges and techniques of below “5-Sigma”, as a “4-Sigma” design requires 33 fixes
high-performance SRAM in the “End of Scaling” nanoscale per Mb of cells (0.57 fixes per Mb of cell for “5-Sigma”
CMOS technologies. The impacts of technology scaling, design).
such as signal loss due to leakage, degradation of noise
margin due to VT scatter caused by process variations and While it is possible and circuit techniques have been
random dopant fluctuation, and long term reliability developed to monitor, detect, and compensate for the global
degradation such as NBTI, are addressed. Design directions systematic die-to-die variations, it is much more difficult to
and leakage/variation/degradation tolerant SRAM circuit develop similar schemes for local random variations.
techniques to mitigate various performance and reliability In addition to leakage and variation, NBTI (Negative Bias
constraints in conventional planar CMOS technology are Temperature Instability) [4, 5] has recently emerged as a
discussed. Examples are given and merits discussed for cell limiting factor in the scaling of PMOS. At high negative
isolation and strength preservation, thin cell layout, bit-line bias and elevated temperature, the PMOS VT gradually
and word-line leakage mitigation, migration to large signal drifts to become more negative, thus reducing PMOS
read-out, unclamped bit-line, dual-supply, dynamic current drive and affecting cell stability, margin, and Vmin
Read/Write supply, floating power-line, header/footer (minimum operating voltage) of SRAM. This long term VT
power-gating structures, Read- and Write-assist circuits, drift and other gate oxide related VT degradation
leakage/variation detection and compensation techniques, mechanisms must be accounted for over the life span of
word-line and bit-line pulsing schemes, gate leakage usage. Furthermore, with high-ț gate materials, PBTI
tolerant design, and NBTI tolerant design. Alternative cell (Positive Bias Temperature Instability) in NMOS must be
structures, such as asymmetrical SRAM, 7T, and 8T considered, as high-ț gates exhibit significant charge
SRAMs, which decouple the cell storage node from the trapping and thus VT shift in NMOS [6, 7].
Read-disturb and half-select disturb to improve the SNM
2. Cell Stability and Disturb
are discussed. Finally, some design issues and opportunities
in emerging technologies such as FD/SOI and multi-gate The cell stability, Static Noise Margin (SNM), and Vmin of
6T SRAM are limited by leakage, variation, and supply
FinFET are illustrated.
voltage in the physical domain. In the design domain, the
major limiting factors are the conflicting Read/Write
1. Technology Scaling and SRAM Design Challenges
requirements and cell disturbs.
High-performance SRAMs are crucial elements for high- Read/Write Requirements: To facilitate Read and
performance microprocessor cache memory and SoC minimize Read-disturb (Fig. 4), it is desirable to have a
applications. Due to large number of transistor count, the strong pull-down NMOS and a small ratio (strength ratio
use of the smallest transistors in the cells, and leakage and of access pass-transistor NMOS to pull-down NMOS).
variation in sub-100 nm CMOS technologies, SRAMs have However, in order to improve Writability (Write margin)
become a major yield limiter. SRAM is more vulnerable to and Write performance, it is desirable to have a strong
process variations and VT mismatch than logic circuits since access pass-transistor NMOS.
minimum size (or sub-groundrule, or groundrule
“waivered”) devices are used in the cell, and there is no Read Disturb: Conventional 6T SRAM is most unstable
“averaging” effect as in logic data paths. As the cell size for (worst SNM) in Read operation (Fig. 4), since the access
6T SRAM is scaled down to between 0.5 μm2 to 0.75 μm2 pass-transistor NMOS and the pull-down NMOS form a
at 65 nm node, and below 0.5 μm2 at 45 nm node (Fig. 1) voltage divider, and the cell “0” storage node rises during
[1], the local random device VT mismatch (VT scattering) Read (Read-disturb voltage). If the Read-disturb voltage is
(Fig. 2) due to discrete dopant effect (Fig. 3(a)) [2, 3] larger than the “Trip” voltage of the other half-cell inverter,
further exasperates the problem to impose a fundamental the cell can flip. As shown in Fig. 4, with technology
limit on the attainable cell size. At 50 nm Leff (90/65 nm scaling, the variations in the Read-disturb voltage (“Cell
node), there are only about 200 dopant atoms in the device down-level”) and inverter Trip voltage (“Cell switch-point”)
channel. At 12 nm Leff (32/28 nm node), the number of increase, causing overlap and thus failure starting at 90 nm
dopant atoms in the device channel is expected to reduce to node [8].
merely 2.6 [2]. The “Sigma” (ı) of VT variation has Half-Select Disturb: During a Read or Write operation, the
remained relatively flat for technology scaling down to half-selected cells on the same word-line are actually
around 130 nm node, yet it is rising quickly starting at 90 experiencing a Read operation, and thus disturb similar to
nm and is expected to increase by a factor of 4 to 30 nm Read-disturb described above (Fig. 5).
node (Fig. 3(b)) [3]. This rapid increase in VT variation,
against the backdrop of lower supply voltage, has rendered
Abstract
978-1-4244-1656-1/07/$25.00 ©2007 IEEE
4
3. Circuit and Design Techniques
Many circuit and design techniques have been developed to
mitigate the leakage and variation, to improve the cell
stability, noise margin, Vmin, and to extend the scalability of
6T SRAM. The most notable ones are:
Thin-Cell Layout: Starting around 90 nm node, thin-cell
layout with uni-directional poly (Fig. 6) have become
prevalent across the entire industry [9-14]. The “thin”
vertical height reduces bit-line loads for performance and
noise immunity, and the uni-directional poly improves the
manufacturability and yield.
Hierarchical Bit-Lines with Short Local Bit-Lines: The
short local bit-lines reduce the bit-line load, charge injection
into the cell, leakage, and noise coupling, thus improving
the performance, power, and noise margin. The global bitlines are typically reset to “Low”, thus further reducing the
power [15-17].
Large Signal Domino-Like Sensing: Large signal read-out
schemes with full swings at local and global bit-lines have
replaced the traditional small signal differential sensing
schemes to ensure stability, combat variability, improve
margin, and address reduced cell current problem. In most
cases, single-ended large signal Read and differential Write
are employed [16, 17].
Unclamped (Floating) or Weakly Clamped Local BitLines: The local bit-lines are left unclamped (floating) or
weakly clamped to suppress or reduce the leakage from the
reset (precharge) devices during the standby mode or for
any unselected sub-arrays during active mode. This is
achieved by turning off the precharge devices in the case of
unclamped bit-lines [16, 17]. In the case of weakly clamped
bit-lines, the bit-line precharged voltage level is lowered by,
for example, one VT drop by using NMOS precharge
devices in source-follower configuration (as opposed to full
rail precharge using PMOS in the clamped bit-line case).
Unclamping of bit-lines also reduces the half-select disturb
since the voltage level at these bit-lines drift lower due to
the current and leakage.
Dual Supplies: The supply voltage for cells and
performance critical peripheral circuits (VCS) is set at a level
150mV – 200mV higher than the logic supply level (VDD) to
ensure cell stability and improve performance, while lesscritical peripheral circuits stay at VDD to reduce power
consumption [15-17]. Fig. 7 illustrates the voltage domains
of the 6.0 GHz SRAM array for CELL Broadband EngineTM
[17] in 65 nm PD/SOI technology. The cell, word-line, bitline precharge signals are at VCS (1.0 V) to improve cell
stability, Read performance , and allow writeability to track
cell stability. Local bit-lines, evaluation, and global write
lines are at VDD (0.8V) to reduce cell stress and lower
power. This dual supply scheme has helped to improve
Vmin from 0.875 V to 0.80 V.
Adaptive Read/Write Supply: The cell supply voltage can
be adaptively/dynamically switched based on Read, Write,
or Standby. Fig. 8 depicts a dynamic supply scheme using
column based header switches [18]. High cell supply
voltage is used for Read (VCC,Cell > VWL), while lower cell
voltage (200 mV lower) is used for Write (VCC,Cell < VWL).
The scheme improves noise margin for both Read and Write
while minimizing leakage, and significantly reduces the
random single bit failure in a 3.0 GHz, 70 Mb SRAM in 65
nm technology.
Proper combination of adaptive Read/Write supply and cell
body bias improves the Read and Write margin, and has
been shown to enable 30% power reduction and 0.3 V Vmin
for a 64 Kb low-power SoC SRAM in 90 nm CMOS
technology [19].
One variation of adaptive Read/Write supply scheme is the
“Floating Power-line Write” where the cell supply voltage
for the selected column is switched off by column power
switch, thus putting the cell supply node in a floating state.
In this scheme, the Write current lowers the cell supply
node for the selected column, thus facilitating Write,
reducing Write power, and improving Vmin by 100 mV in
32Kb/512Kb SRAMs in 90 nm CMOS [20].
Capacitive coupling technique has also been used to
bootstrap cell supply voltage to above VDD during access to
achieve higher cell current and improved stability in a picoJoule class, 1 GHz, 32 KByte x 64b DSP SRAM [21].
Power-Gating
with Header/Footer: Power-gating
structures are used to reduce the leakage in Sleep/Standby
mode or Shut-off mode [21, 22]. Fig. 9 illustrates the
scheme used in the 3.4 GHz, 16MB L3 cache in dual-core
Xeon processor in 65 nm CMOS technology [22], where
NMOS sleep transistor is used in the sub-arrays and PMOS
power-gating is used in the periphery.
Read-Assist and Write-Assist Circuits: The variation
detection and compensation technique is best illustrated in
the scheme depicted in Fig. 10 [23]. In this scheme, a
process and temperature variation tolerant Read-Assist
Circuit (RAC) utilizes Replica Access Transistors (RATs)
as variation sensing devices. The RATs have topologically
the same layout structure as SRAM access transistors. The
selected WL level inversely tracks strength of access
transistors, thus compensating the access transistor strength
variation.
Capacitive coupling technique has also been employed to
facilitate Write. Fig. 11 shows a Write-Assist Circuit (WAC) with divided array supply scheme [23]. In this scheme,
the array supply and a dummy supply (at GND level in
Standby or Read) in the selected column are put into
floating state during Write. The array supply is then shorted
to the dummy supply by an NMOS switch, causing the array
supply to be capacitively coupled down to facilitate Write.
The capacitive coupling effect can be enhanced by dividing
the array supply line into short segments. The capacitive
coupling effect provides a fast, large response, and there is
no need for additional supply. The scheme enhances Write
margin against increasing variation due to scaling, resulting
in significantly higher yield even with a much smaller cell
in a 4 x 256 Kb SRAM macros in 45 nm LSTP CMOS
technology
Word-Line and Bit-Line Pulsing: Proper control of wordline pulse width and temporarily lower bit-line voltage are
also effective techniques to improve cell stability [24]. The
word-line pulse width must be short enough to limit the
Read-disturb; while long enough to ensure adequate ¨VBL
for sensing during Read, and long enough to maintain
enough Write margin. The bit-line voltage is pulled-down
by 100 – 300 mV just before word-line turns on to reduce
5
be scaled down without degrading the Read performance
and Read disturb, while that in the 8T cell can. At 45 nm
node, the cell areas cross if the operating voltage is 1.0 V,
and the area of the 8T cell becomes smaller at the 32 nm
node. If the operating voltage is 0.8 V, the 6T cell area is
saturated and cannot be shrunk beyond 45 nm due to the
large width of the cell pull-down NMOS needed for Read
performance and margin. Furthermore, in 8T cell, high
voltage can be used for Read word-line to enhance Read
performance without causing Read disturb; and high voltage
can be used for Write word-line to facilitate Write and
expand Write margin. These voltage control schemes
further enable the scalability of 8T SRAM cell as shown in
Fig. 13 [27].
5. Asymmetrical SRAMs
The Read-disturb in 6T SRAM can be mitigated/eliminated
by either cutting off the inverter feedback path or skewing
one of the cell inverter to have a higher Trip voltage, while
employing single-ended Read with the other cell inverter.
7T Asymmetrical SRAM: Fig. 14(a) shows the schematics
of a 7T asymmetrical SRAM cell [30]. In this scheme, the
Read word-line is separated from the Write word-line, and
single-ended Read is employed. An extra NMOS is inserted
in the left-side cell inverter, which cuts off the feedback
loop during Read, thus eliminating the failure due to Readdisturb. The scheme has been implemented in a 64 Kb
SRAM in 90 nm CMOS technology and demonstrated 20 ns
access at 0.5V. A Vmin of 0.44 V (determined by Write
SNM) has been achieved with 9% area overhead.
6T Asymmetrical SRAM: Fig. 14(b) shows the schematics
of a 6T asymmetrical SRAM cell [31]. Again, the Read
word-line is separated from the Write word-line, and singleended Read is employed. The left-side cell pull-down
NMOS is weakened, thus increasing the Trip voltage of the
left-side half-cell inverter to improve the Read margin.
Weakening the left-side pull-down NMOS by sizing down
the device width alone is limited by the minimum width in a
given technology, and normally does not provide sufficient
improvement in Read SNM. High-VT device, thick-oxide
device, or mid-gap gate material (which increases VT) can
be used in combination with device down-sizing to further
improve the Read margin [31].
6. Design Considerations for Emerging Devices
In this section, we discuss the design issues/considerations
and some novel schemes exploiting the unique features of
emerging devices such as fully-depleted SOI (FD/SOI)
devices and multi-gate/FinFET devices.
6.1. Fully-Depleted SOI (FD/SOI) Devices
Due to better short-channel control, very low junction
capacitance, and the elimination of hysteric VT variation,
Ultra-Thin Body Fully-Depleted SOI (UTB FD/SOI) device
has emerged as a possible candidate for sub-65 nm
technologies.
Random Dopant Fluctuation (RDF) Effect: The local VT
variation due to RDF is expected to be more serious in
FD/SOI devices compared with bulk-CMOS or PD/SOI
devices. This is primarily due to the stronger dependence of
bulk change (QB) on the body doping density (Nb) in
FD/SOI device (QB Į Nb) compared with bulk-CMOS or
PD/SOI device (QB Į ¥Nb). The RDF effect can be reduced
Read-disturb during Read, and half-select disturb during
Write.
Gate Leakage Tolerant Design: The gate leakage of large
devices in word-line driver can be reduced by collapsing the
voltage across the gate oxide. In [25], the voltage level at
the gate of the word-line driver NMOS device is left
floating, and discharged by junction leakage in standby,
thus reducing gate leakage.
NBTI Tolerant Design: The bias dependence of NBTI can
be exploited to minimize the impact on long term reliability.
Fig. 12 shows a NBTI tolerant sense amplifier design for L1
cache and TLB in a 64b microprocessor in 130 nm
technology [5]. In the original circuit (Fig. 12(a)), The VT
shift over silicon lifetime due to NBTI causes VT mismatch
of as high as 50 mV between pMOS M3 and M4, resulting
in sense delay degradation of 42%. Noting that the NBTI
induced VT shift depends strongly on the gate-source bias
and temperature but barely on the drain voltage, the new
circuit (Fig. 12(b)) replaces the cross-coupled M3/M4 by
T3/T4, for which gates are commonly biased at about 40%
VDD during sensing. As the gate bias is identical for T3 and
T4, the VT mismatch is minimized. pMOS T1/T2 are added
to speed up the output low-to-high transition. Their VT
mismatch is not critical since they get activated after the
critical part of the amplification. The new circuit improves
the deteriorated sense delay by 22%.
4. 8T SRAMs
8T SRAM has recently emerged as a promising candidate to
replace 6T SRAM in sub-32 nm technologies [26-29]. The
8T SRAM utilizes a cell schematically similar to cells
commonly used in register files, by separating the Read
word-line and Write word-line and adding 2 stacked NMOS
for single-ended Read (Fig. 13). If offers the following
advantages:
Read Disturb: By separating the Read word-line and Write
word-line and isolating the cell storage node from Read
current path, the Read disturb is completely eliminated.
Half-Select Disturb: Half-select disturb still exists during
Write operation for 8T SRAM. The half-select disturb can
be eliminated by:
(1) Array architecture approach: The array can be floorplaned such that all bits in a word are spatially adjacent, and
thus requiring no column select. The constraint for this
approach is that bits from different words can not be
physically interleaved. This approach has been used in a 5.3
GHz, 0.41 V Vmin, 32Kb sub-array in 65 nm PD/SOI
technology [28].
(2) Gated Write word-line signal (Byte Write): The local
Write word-line select signal is gated, and “on” only for the
selected block. This scheme has been implemented in a 6.6+
GHz, 0.4 V Vmin, dual-supply, 1.2 Mb SRAM in 65 nm
PD/SOI technology [29].
(3) Write-back scheme: The Read word-line is activated
even during Write, and all cell data in selected word-line are
read out to D-latches. The Data-out is then written back to
half-selected cells. The scheme has been used in a 0.42 V
Vmin, 64 Mb SRAM in 90 nm CMOS technology [27].
Cell Size Scaling: At 90 nm node, the cell size of 8T
SRAM is bout 20% larger than 6T SRAM cell. However, in
6T cell, the device width of cell pull-down NMOS cannot
6
illustrates an example where a down-sized front-gate is used
for the left-side pull-down NMOS while the back-gate is
biased to GND. The scheme offers 1.9X SNM improvement
over symmetrical cell and 1.5X improvement over
asymmetrical cell with device sizing only (Fig. 20(b)) [37].
7. Conclusions
We have discussed CMOS technology scaling and resulting
leakage/variation/reliability design challenges for highperformance
SRAMs.
Cell
stability,
Read/Write
requirements, Read-disturb, and Half-select disturb are
shown to impose constraints on performance, margin and
cell scalability. Examples of circuit techniques to mitigate
various power, performance, and reliability constraints are
given. Alternative cell structures, such as 8T SRAM and
asymmetrical SRAM, and their merits are discussed. Design
considerations for emerging devices, such as FD/SOI and
double-gate/FinFET are addressed, and novel circuit
schemes exploiting unique features of emerging devices are
discussed.
Acknowledgement
The authors were partially supported by DARPA Contract
HR0011-07-9-0002.
References
by lowering the body-doping. However, reduction of bodydoping increases the leakage current. Proper tuning of backgate bias or gate workfunction, combined with body-doping
reduction, has been shown to be very effective in achieving
a given leakage target for robust SRAM applications [32,
33].
Asymmetrical VT Shift: Fig. 15 shows the distribution of
I-V characteristics due to RDF effect for PD/SOI and
FD/SOI devices. For FD/SOI device, the mean VT is lower
than the nominal VT expected from continuous doping case.
This asymmetry in VT shift causes a large spread in leakage,
and degrades the performance and margin as the silicon film
thickness is reduced (Fig. 16). This asymmetrical VT shift
originates from the penetration of drain fringing electric
field underneath the channel, which affects the barrier at the
source for carrier injection and degrades the Short Channel
Effect (SCE). The source carrier injection (hence channel
current) depends exponentially on the barrier height, and
thus highly non-linear and highly asymmetrical. This effect
can be mitigated by using thin Buried Oxide (BOX)
structure, which reduces the penetration of drain fringing
electric field, thus reducing the asymmetry and leakage
spread, at small expense of performance and Write margin
[32, 33].
6.2. Double-Gate/FinFET Devices
Quasi-planar multi-gate devices, such as FinFET, hold the
promise to extend scaling beyond conventional planar
CMOS technologies. The device structure imposes some
constraints while, at the same time, offers unique features
that can be exploited for SRAM applications [34-37].
Fin Height (Device Width) Quantization: The
quantization of device width to integer multiple of fin height
constrains the opportunity of cell transistor sizing for
stability optimization in FinFET SRAM.
Surface Orientation and Mobility: In FinFET devices,
different surface orientation can be realized by rotating the
fin in layout (Fig. 17 and 18(b)). The orientation
dependence of mobility can be exploited to optimize the ȕratio of cell transistors to improve the performance and
margin. As shown in Fig. 18(a), with optimized orientation,
the Read SNM can be improved by 23-35%, and access
time by ~22-33%, at expense of containable Write margin
loss and area [34].
Independent Gate Feedback Schemes: The independentgate capability in double-gate devices can be exploited to
improve performance and margin. Fig. 19(a) shows a 6T
SRAM cell with “Yin-Yang” feedback with double-gate
devices [35]. The front-gate and back-gate of the cell pulldown NMOS devices are tied together. As such, the “On”
NMOS becomes stronger, and the “Off” NMOS becomes
weaker, thus reducing the Read-disturb to improve the
SNM. An improved version of the “Yin-Yang” feedback
cell is shown in Fig. 19(b) [36], where the back-gates of the
access pass-transistor NMOS’s are connected to the storage
nodes. This scheme has the same Read characteristic as
Yin-Yang feedback cell, yet improved Write performance
and margin as the cell “High” side access pass-transistor
becomes stronger.
Back-Gated Asymmetrical SRAM: The back-gating
capability of asymmetrical double-gate device can be
exploited for asymmetrical SRAM applications. Fig. 20(a)
[1] http://www.itrs.net/
[2] D. J. Frank et al., “Monte Carlo Modeling of Threshold
Variation due to Dopant Fluctuations,” Digest of Tech. Papers,
Symp. VLSI Technology, 1999, pp. 169-170.
[3] S. Samaan, “The Impact of Device Parameter Variations on the
Frequency and Performance of Microprocessor Circuits,” ISSCC
Microprocessor Circuit Design Forum on Managing Variability on
sub-100nm Designs, Feb. 19, 2004.
[4] J.C. Lin, A.S. Oates, H.C. Tseng, Y.P. Liao, T.H. Chung, K.C.
Huang, P. Y. Tong, S.H. Yau, and Y.F. Wang, “Prediction and
Control of NBTI – Induced SRAM Vccmin Drift,” Tech. Digest,
IEDM, pp. 345-348.
[5] T. Takayanagi et al., “A Dual-Core 64b UltraSPARC
Microprocessor for Dense Server Applications,” Digest of Tech.
Papers, ISSCC, 2004, pp. 58-59.
[6] S. Zafar, et. al, "A Comparative Study of NBTI and PBTI
(Charge Trapping) in SiO2/HfO2 Stacks with FUSI, TiN, Re
Gates," Dig. Tech. Papers, Symp. VLSI Technology, 2006, pp. 3031.
[7] J. C. Lin, A.S. Oates, and C.H. Yu, “Time Dependent Vccmin
Degradation of SRAM Fabricated with High-k Gate Dielectrics,”
Proc. International Reliability Physics Symposium, 2007, pp. 439444.
[8] H. Pilo, “SRAM Design in the Nanoscale Era,” Digest of Tech.
Papers, ISSCC, SE5, 2005, pp. 366-367.
[9] M. Iwai et al., “45nm CMOS Platform Technology (CMOS6)
with High Density Embedded Memories,” Digest of Tech. Papers,
Symp. VLSI Technology, 2004, pp. 12-13.
[10] F. L. Yang et al., “45nm Node Planar-SOI Technology with
0.296μm2 6T-SRAM Cell,” Digest of Tech. Papers, Symp. VLSI
Technology, 2004, pp. 8-9.
[11] F. Arnaud et al., “Low Cost 65nm CMOS Platform for Low
Power & General Purpose Applications,” Digest of Tech. Papers,
Symp. VLSI Technology, 2004, pp. 10-11.
[12] S. Zhao, et al., Transistor Optimization for Leakage Power
Management in a 65nm CMOS Technology for Wireless and
Mobile Applications, Digest of Tech. Papers, Symp. VLSI
Technology, 2004, pp. 14-15.
[13] K. Zhang et al., “A 3-GHz 70Mb SRAM in 65nm CMOS
Technology with Integrated Column-Based Dynamic Power
Supply,” Digest of Tech. Papers, ISSCC, 2005, pp. 474-475.
7
Power Electronics and Design (ISLPED2007), Portland, Oregon ,
August 27-29, 2007, pp. 20-25.
[34] S. Gangwal et al, “Optimization of Surface Orientation for
High-Performance, Low-Power and Robust FinFET SRAM,” Proc.
IEEE Custom Integrated Circuits Conf. (CICC), 2006, pp. 433436.
[35] M. Yamaoka et al., “Low Power SRAM Menu for SOC
Application
Using
Yin-Yang-Feedback
Memory
Cell
Technology,” Digest of Tech. Papers, Symp. VLSI Circuits, 2004,
pp. 288-291.
[36] Z. Guo et al., “FinFET-Based SRAM Design”, Proc.
International Symp. on Low Power Electronics and Design
(ISLPED), 2005, pp.2-7.
[37] J. J. Kim, K. Kim, and C. T. Chuang, “Independent-Gate
Controlled Asymmetrical SRAM Cells in Double-Gate MOSFET
Technology for Improved READ Stability,” Proc. European SolidState Circuits Conference (ESSCIRC), 2006, pp. 74-77.
[14] E. Leobandung et al., “High Performance 65 nm SOI
Technology with Dual Stress Liner and low capacitance SRAM
cell,” Digest of Tech. Papers, Symp. VLSI Technology, 2005, pp.
126-127.
[15] M. Khellah et al., “A 4.2GHz 0.3mm2 256kb Dual-Vcc
SRAM Building Block in 65nm CMOS,” Digest of Tech. Papers,
ISSCC, 2006, pp. 624-625.
[16] J. Davis et al., “A 5.6GHz 64kB Dual-Read Data Cache for
the POWER6TM Processor,” Digest of Tech. Papers, ISSCC,
2006, pp. 622-623.
[17] J. Pille et al., “Implementation of the CELL Broadband
EngineTM in a 65nm SOI Technology Featuring Dual-Supply
SRAM Arrays Supporting 6GHz at 1.3V,” Digest of Tech. Papers,
ISSCC, 2007, pp. 322-323.
[18] K. Zhang et al., “A 3-GHz 70Mb SRAM in 65nm CMOS
Technology with Integrated Column-Based Dynamic Power
Supply,” Digest of Tech. Papers, ISSCC, 2005, pp. 474-475.
[19] Y. Morita et al., “A Vth-Variation-Tolerant SRAM with 0.3V Minimum Operation Voltage for Memory-Rich SoC under DVS
Environment,” Dig. Tech. Papers, Symp. VLSI Circuits, 2006, pp.
16-17.
[20] M. Yamaoka et al., “Low-Power Embedded SRAM Modules
with Expanded Margins for Writing,” Digest of Tech. Papers,
ISSCC, 2005, pp. 480-481.
[21] Azeeze et al., “A Pico-Joule Class, 1 GHz, 32 KByte x 64b
DSP SRAM with Self Reverse Bias,” Digest of Tech. Papers,
Symp. VLSI Circuits, 2003, pp. 251-252.
[22] S. Rusu et al., “A Dual-Core Multi-Threaded Xeon®
Processor with 16MB L3 Cache,” Digest of Tech. Papers, ISSCC,
2006, pp. 102-103.
[23] M. Yabuuch et al., “A 45nm Low-Standby-Power Embedded
SRAM with Improved Immunity Against Process and Temperature
Variations,” Digest of Tech. Papers, ISSCC, 2007, pp. 326-327.
[24] M. Khellah, Y. Ye, N. S. Kim, D. Somasekhar, G. Pandya, A.
Farhang, K. Zhang, C. Webb, V. De, “Wordline and Bitline
Pulsing Schemes for Improving SRAM Cell Stability in Low-Vcc
65nm CMOS Designs,” Dig. Tech. Papers, Symp. VLSI Circuits,
2006, pp. 12-13.
[25] K. Nii et al., “A 90 nm Low Power 32K-Byte Embedded
SRAM with Gate Leakage Suppression Circuit for Mobile
Applications,” Digest of Tech. Papers, Symp. VLSI Circuits, 2003,
pp. 247-250.
[26] L. Chang et al., “Stable SRAM Cell Design for the 32 nm
Node and Beyond,” Digest of Tech. Papers, Symp VLSI
Technology, 2005, pp. 128-129.
[27] Y. Morita et al., “An Area-Conscious Low-Voltage-Oriented
8T-SRAM Design under DVS Environment,” Digest of Tech.
Papers, Symp. VLSI Circuits, 2007, pp. 256-257.
[28] L. Chang et al., “A 5.3GHz 8T-SRAM with Operation Down
to 0.41V in 65nm CMOS,” Digest of Tech. Papers, Symp. VLSI
Circuits, 2007, pp. 252-253.
[29] R. Joshi et al., “6.6+ GHz Low Vmin, read and half select
disturb-free 1.2 Mb SRAM,” Digest of Tech. Papers, Symp. VLSI
Circuits, 2007, pp. 250-251.
[30] K. Takeda et al., “A Read-Static-Noise-Margin-Free SRAM
Cell for Low-Vdd and High-Speed Applications,” Digest of Tech.
Papers, ISSCC, 2005, pp. 478-479.
[31] Keunwoo Kim, Jae-Joon Kim, and Ching-Te Chuang,
“Asymmetrical SRAM Cells with Enhanced Read and Write
Margins,” Proc. 2007 International Symp. on VLSI Technology,
Systems, and Applications (VLSI-TSA), Hsinchu, Taiwan, April
23-25, 2007, pp. 162-163.
[32] S. Mukhopadhyay, K. Kim, X. Wang, D. J. Frank, P. Oldiges,
C. T. Chuang., and K.. Roy, “Optimal UTB FD/SOI Device
Structure using Thin-BOX for sub-50-nm SRAM Design,” IEEE
Electron Device Letters, vol. 27, no. 4, April 2006, pp. 284-287.
[33] Saibal Mukhopadhyay, Keunwoo Kim, and Ching-Te
Chuang, “Analysis and Optimization of Thin-BOX FD/SOI
Devices for Low-Power and Stable SRAM in sub-50nm
Technologies," Proc. 2007 International Symposium on Low
Fig. 1. 2005 ITRS Product Function Size Trends [1].
VDD
BR
BL
p0 p1 p2
δvti
pi
δvtj
pi
pn
pn
p0
p1
p2
Global systematic variation, die-to-die
WL
δVti
δVtj
Local device mismatch
Fig. 2. Global systematic and local random variation in
device threshold voltage degrade yield of SRAM design.
(a)
(b)
Fig. 3. (a) Number of dopant atoms in the channel as
function of effect channel length [2], and (b) relative
“Sigma” of VT variation as function of technology node [3].
8
Iread
Fig. 4. Scaling and cell stability margin of 6T SRAM [8].
BL
‘1’
BR
BL
‘1’
‘1’
Fig. 7. Example of SRAM with dual-supply, hierarchical
short local bitline, large signal single-ended “rippledomino” Read, and differential Write [17].
BR
‘0’
WL0
‘0’
WL1
Selected for
WRITE
‘1’
COL0
COL1
Disturb failure
can occur
Fig. 5. Half-select disturb (illustration for Write operation).
Fig. 8. Column based header switch dynamic supply scheme
to optimize Read/Write performance [18].
Fig. 6. “Thin Cell” layout to improve performance, noise
immunity, manufacturability, and yield [9-14].
Fig. 9. Power-gating with footer switch [22].
9
Fig. 10. Process and temperature variation tolerant ReadAssist Circuits (RAC) [23].
Fig. 13. Cell size scaling of 6T and 8T SRAM [27].
R/WWL
WWL
PL
AL
PR
Q AR
NR
Qb
NL
WBL
Make the two sides
R/WBL
unbalanced
Trip voltage of PL-NL
pair goes up
(a)
(b)
Fig. 14. (a) Asymmetrical 7T SRAM cell [30], and (b)
asymmetrical 6T SRAM cell [31].
Fig. 11. Capacitive Write-Assist Circuit (WAC) with
divided array supply scheme [23].
Fig. 12. NBTI tolerant TLB sense amplifier circuit [5].
Qbulkα N BODYWdmα N BODY
Qbulkα N BODY Tsiα N BODY
(a)
(b)
Fig. 15. Random Dopant Fluctuation (RDF) effect in (a)
PD/SOI, and (b) FD/SOI device [32, 33].
10
PR
PL
XL
VDD
PR
PL
XR
XL
XR
Q=GND
NL
NR
Q=GND
back feedback Qb=VDD
(yin)
NL
NR
front feedback
(yang)
(a)
(b)
(a)
(b)
Fig. 16. SRAM in FD/SOI: (a) leakage vs. Si film thickness, Fig. 19. Double-gate SRAM cells: (a) Yin-Yang feedback
and (b) statistical distribution of Read-disturb voltage and cell [35], and (b) improved Yin-Yang feedback cell [36].
cell inverter Trip voltage due to RDF effect [32, 33].
1.0
R/WWL
WWL
300 mV
270 mV
PL
AL
PR
Q AR
NR
Qb
Qb (V)
0.8
Sym. 6T
Asym. w/ NL=0.25um
0.6
160 mV
Asym. w/ NL=0.1um
0.4
NL (FG)
0.2
300 mV
320 mV
R/WBL
WBL
0.0
0.0
0.2
0.4
0.6
0.8
1.0
Q (V)
(a)
Fig. 17. Modification of surface orientation in FinFET [34].
(b)
Fig. 20. (a) Back-gated asymmetrical 6T SRAM cell in
asymmetrical double-gate technology, and (b) Read SNM
compared with symmetric 6T SRAM cell [37].
(a)
(b)
Fig. 18. Optimized multi-oriented FinFET 6T SRAM cells:
(a) Access Time vs. Read SNM, and (b) (100) PUP, (110)
AX, (100) PD on (a) (100) wafer [34].
11