Lecture 7: CMOS Proximity Wireless Communications for 3D
Transcription
EE290c Spring 2007, Tues & Thurs 9:30-11:00, 212 Cory, UCB
Lecture 7: CMOS Proximity Wireless Communications for 3D Integration (2)
Tadahiro Kuroda
Visiting MacKay Professor, Department of EECS, University of California, Berkeley
tadahiro@eecs.berkeley.edu, kuroda@elec.keio.ac.jp
http://bwrc.eecs.berkeley.edu/Classes/ee290c_s07
http://www.kuroda.elec.keio.ac.jp/
© T. Kuroda (1/42)

ISSCC2007
• Energy dissipation reduction to 0.14pJ/b: pulse shaping
• Extension of communications range to 1.2mm for through-package link: preamplifier and offset cancellation
[Figure: energy dissipation (power/bandwidth) [pJ/b], log scale from 0.1 to 1000, versus year (’96 to ’07) for published links (Toshiba, NEC, NTT, TI, Intel, Rambus, Keio, Hitachi, Fujitsu, TeraChip, Sun, NC State, SFT, ARCES) and this work (180nm and 90nm)]
[Figure: probe IC (transceiver) on a flexible circuit board (FCB) with inductors in the FCB, glued over a target LSI (in SSOP package) with on-chip inductors on a PCB; CLK, TX, and RX channels; to/from in-circuit emulator]

Low-Power Design Needed
[Figure: SiP combining CPU, SRAM, DRAM, and analog for HDTV, camcorder, mobile phone, and portable game; high bandwidth, low power]
• SiP application: high performance and yet low power
• Example: H.264 video decoding for 1080 HDTV
  Required bandwidth = 20Gb/s
  Decoder power = 100mW (C.C. Lin, ISSCC’06)
  Total IO power < 10mW
  IO energy dissipation < 0.5pJ/b (10mW / 20Gb/s)

Previous Works
• Capacitive coupling: 0.14pJ/b, d=2µm, A. Fazzi (CICC’05)
• Inductive coupling: 2.8pJ/b, d=20µm, N. Miura (ISSCC’06)
• This work: 0.14pJ/b inductive-coupling transceiver
  Tx: digital pulse shaping; Rx: process scaling
  No performance degradation

Data Transceiver Circuit
[Figure: transmitter (Txdata, Txclk, pulse generator driving current IT) and receiver (VR, bias VB, Rxclk, Rxdata); waveforms of Txdata, Txclk, IT [mA], VR [mV], Rxclk, and Rxdata versus time [ns] at 1.8V]
$E_{TX}$ = 2.2pJ/b
$E_{RX} = C V_{DD}^2$ = 0.6pJ/b
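
The IO budget from the low-power design slide and the figures quoted for the previous inductive link can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `energy_per_bit_pj` is an assumption of this note, and the capacitance it prints is derived here from $E_{RX} = C V_{DD}^2$ rather than stated on the slides.

```python
def energy_per_bit_pj(power_w: float, bandwidth_bps: float) -> float:
    """Energy per bit in pJ/b: power divided by bandwidth."""
    return power_w / bandwidth_bps * 1e12


# H.264 1080 HDTV example: 10 mW total IO power at 20 Gb/s.
io_budget_pj = energy_per_bit_pj(10e-3, 20e9)

# Previous inductive-coupling transceiver: E_TX + E_RX in pJ/b.
previous_total_pj = 2.2 + 0.6
over_budget = previous_total_pj / io_budget_pj

# Capacitance implied by E_RX = C * VDD^2 at VDD = 1.8 V.
# pJ divided by V^2 gives pF. Derived here, not quoted from the slides.
implied_c_pf = 0.6 / 1.8**2

print(f"IO budget: {io_budget_pj:.2f} pJ/b")
print(f"Previous link: {previous_total_pj:.1f} pJ/b ({over_budget:.1f}x the budget)")
print(f"Implied receiver capacitance: {implied_c_pf:.3f} pF")
```

Under these numbers the earlier 2.8pJ/b link exceeds the 0.5pJ/b budget by a factor of 5.6, which is the gap that pulse shaping and process scaling are meant to close.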
Energy Dissipation in Tx
[Figure: pulse generator, clocked by Txclk and fed by Txdata, driving transmit current IT (peak IP, width τ, peak at τ/2) through mutual inductance M; received voltage VR swings between +VP and -VP]
$V_R = M \, dI_T/dt$
$V_P = 2 M I_P / \tau = 2 M S_P$
$E_{TX} = V_{DD} I_P \tau = V_{DD} S_P \tau^2$

Bathtub Curve (@ 1Gb/s)
Timing margin = 150ps
[Figure: IT [mA] and VR [mV] pulse waveforms with 180ps annotations; BER (10^-3 to 10^-12) versus sampling timing [ps] (250 to 400)]

Inter-Channel Skew (@ 1Gb/s)
10ps skew in 64ch array
Timing margin = 150ps
[Figure: channel array with 30µm annotation; BER (10^-3 to 10^-12) versus sampling timing [ps] (250 to 400)]

Pulse Shaping Circuit
• Pulse width control (6bit), 4ps step
• Pulse slew rate control (4bit)
• Pulse amplitude control (5bit)
[Figure: Tx chip with 4-phase clock (0º, 45º, 90º, 135º), phase interpolators (PI, 6bit, 0º~45º), and pulse generators driving IT; Rx chip with VR, Rxclk, and Rxdata]

Simulated Waveforms
[Figure: IT [mA] and VR [mV] versus time [ps] (0 to 400) under pulse amplitude control and under pulse width control at constant slew rate; 60ps pulse width annotated]

Timing Control
[Figure: Tx chip with pulse width control (PI driven by 4-phase clock) and Rx chip with sampling timing control (PI driven by 4-phase clock, 0º~135º); data link (IT, VR) alongside a separate clock link (ITC, VRC)]

Clock Transceiver
[Figure: clock transmitter (Txclk, ITC) and clock receiver (VRC, VB1, VB2, VSA, VDD, Rxclk); waveforms of Txclk [V], ITC [mA], VRC [V], VSA [V], and Rxclk [V] versus time [ns] (0 to 5), with clock slew rate Sclk marked]

Test Chip in 180nm CMOS
[Figure: chip photograph showing the Tx chip (10µm-thick) on the Rx chip, the data transceiver (1Gb/s), and the clock transceiver (1GHz); 30µm and 200µm dimensions marked]
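
The Tx energy relations from the Energy Dissipation in Tx slide can be exercised numerically. The following is a minimal sketch, not the lecture's own code: the mutual inductance `M`, the slew-rate parameter `S_P`, and both function names are hypothetical choices made only to show the trend, with $S_P = I_P/\tau$ taken as implied by the slide's two equalities.

```python
def received_peak_v(mutual_h: float, slew_a_per_s: float) -> float:
    """V_P = 2 * M * S_P. Note there is no dependence on pulse width."""
    return 2.0 * mutual_h * slew_a_per_s


def tx_energy_j(vdd_v: float, slew_a_per_s: float, tau_s: float) -> float:
    """E_TX = V_DD * S_P * tau^2."""
    return vdd_v * slew_a_per_s * tau_s**2


VDD = 1.8             # V, supply of the 180nm design
M = 1.8e-9            # H, hypothetical mutual inductance
S_P = 1e-3 / 60e-12   # A/s, hypothetical: 1 mA peak for a 60 ps pulse

v_p = received_peak_v(M, S_P)
e_120 = tx_energy_j(VDD, S_P, 120e-12)
e_60 = tx_energy_j(VDD, S_P, 60e-12)

print(f"V_P = {v_p * 1e3:.1f} mV for either pulse width")
print(f"E_TX(120 ps) = {e_120 * 1e12:.3f} pJ/b")
print(f"E_TX(60 ps)  = {e_60 * 1e12:.3f} pJ/b")
print(f"ratio = {e_120 / e_60:.1f}")
```

At a fixed slew rate, halving τ leaves the received amplitude VP unchanged and cuts ETX to a quarter; the price is a narrower window in which the receiver must sample.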
Clock Jitter Reduction
• Clock slew rate (Sclk) control
• Annotated jitter values: 4.8ps rms (Rxclk, 1/2 of [3]) and 2ps rms
[Figure: clock transceiver schematic (Txclk, ITC, VRC, VB1, VB2, VSA, VDD, Rxclk); 1GHz Txclk and Rxclk waveforms with 100ps and 200mV scales; jitter [ps rms] (4.8 to 6) versus Sclk [mV/ps] (6 to 12) @ 1GHz]

Pulse Amplitude (VP) Control (τ=60ps @ 1Gb/s)
[Figure: VR pulse with VP and τ marked; BER (10^0 to 10^-12) versus sampling timing [ps] (65 to 105) for VP = 20mV, 40mV, 60mV, and 80mV]

Pulse Width (τ) Control (VP=60mV @ 1Gb/s)
• τ=60ps, ETX=0.13pJ/b
• τ=80ps, ETX=0.23pJ/b
• τ=100ps, ETX=0.36pJ/b
• τ=120ps, ETX=0.53pJ/b
[Figure: VR pulse with VP and τ marked; BER (10^0 to 10^-12) versus sampling timing [ps] (20 to 120) for each pulse width; 25ps annotated]

Supply Noise Immunity (ETX=0.13pJ/b @ 1Gb/s)
[Figure: test board with Tx chip, Rx chip, Tx load, Rx load, and probe; supply noise waveform on VDD (350mV, 50ns) under 1GHz random load change; BER (10^-3 to 10^-12) versus supply noise [mV peak-to-peak] (0 to 600) for 1GHz load change and 50kHz load change]

Test Chip in 90nm CMOS
[Figure: chip photograph showing the Tx chip (10µm-thick) on the Rx chip (750µm-thick), metal inductors at pitch P=30µm, and a 3x3 channel array]

Bathtub Curve in 90nm CMOS (@ 1Gb/s)
τ=60ps, ETX=0.11pJ/b, ERX=0.03pJ/b
Timing margin = 30ps
[Figure: BER (10^0 to 10^-12) versus sampling timing [ps] (-40 to 20)]

Performance Summary
                                       This Work    This Work    Previous Work
Process                                90nm CMOS    180nm CMOS   180nm CMOS
Supply voltage VDD                     1V           1.8V         1.8V
Energy dissipation in Tx/Rx, ETOTAL    0.14pJ/b     0.33pJ/b     2.8pJ/b
Energy dissipation in Tx, ETX          0.11pJ/b     0.13pJ/b     2.2pJ/b
Energy dissipation in Rx, ERX          0.03pJ/b     0.2pJ/b      0.6pJ/b
Common to all columns: data rate 1Gb/s, bit error rate <10^-12, clock rate 1GHz, channel area 30µm x 30µm, distance 15µm.
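
The measured numbers above can be cross-checked against the relations given earlier in the lecture. The sketch below, written for this note rather than taken from the slides, tests two things: whether the ETX values on the pulse width control slide follow the $\tau^2$ scaling of $E_{TX} = V_{DD} S_P \tau^2$, and whether each ETOTAL in the performance summary equals ETX + ERX. Variable names are illustrative.

```python
# E_TX in pJ/b versus pulse width in ps, from the pulse width control slide.
measured_etx = {60: 0.13, 80: 0.23, 100: 0.36, 120: 0.53}

# Predict each point from the 60 ps measurement assuming E_TX scales as tau^2.
ref_tau, ref_etx = 60, measured_etx[60]
rel_errors = {}
for tau, etx in measured_etx.items():
    predicted = ref_etx * (tau / ref_tau) ** 2
    rel_errors[tau] = abs(predicted - etx) / etx
    print(f"tau={tau:3d} ps  measured={etx:.2f}  predicted={predicted:.3f}")

# (E_TX, E_RX, E_TOTAL) in pJ/b, from the performance summary.
summary = {
    "this work, 90nm": (0.11, 0.03, 0.14),
    "this work, 180nm": (0.13, 0.2, 0.33),
    "previous work": (2.2, 0.6, 2.8),
}
totals_consistent = all(
    abs(etx + erx - etotal) < 1e-9 for etx, erx, etotal in summary.values()
)
reduction = summary["previous work"][2] / summary["this work, 90nm"][2]
print(f"totals consistent: {totals_consistent}, reduction: {reduction:.0f}x")
```

The $\tau^2$ prediction stays within a few percent of every measured point, and the step from 2.8pJ/b to 0.14pJ/b is a twentyfold reduction.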
World Lowest Energy (0.14pJ/b)
[Figure: energy dissipation [pJ/b], log scale from 0.1 to 1000, versus year (’96 to ’07), with process nodes: Toshiba (350nm), NEC (250nm), NTT (250nm), TI (180nm), Intel (180nm), NEC (130nm, two entries), Rambus (90nm), Keio (350nm), Hitachi (250nm), Fujitsu (90nm), TeraChip (130nm), Sun (350nm), Keio (250nm), [2] Keio (180nm), [1] SFT (180nm), this work (180nm), this work (90nm). For HDTV H.264/AVC (23.1Gb/s): wire bonding 200mW, µ-bump with interposer 20mW, inductive 2mW]
[20.2] “A 0.14pJ/b Inductive-Coupling Inter-Chip Data Transceiver with Digitally-Controlled Precise Pulse Shaping”, ISSCC’07, Keio Univ. [16]

Summary: Energy Reduction
Rx: $Q = CV$ and $E = QV$, so $E_{RX} = C V^2$, which scales as a CMOS gate.
Tx: $E_{TX} = QV = \int I \, dt \cdot V \propto V \cdot T_{pulse}^2$, with $\partial I / \partial t = \text{const.}$
180nm, 1.8V: 2.8pJ/bit (Tx.data 2.2pJ/b, Rx.data 0.6pJ/b)
90nm, 1.0V: 0.14pJ/bit (Tx.data 0.11pJ/b, Rx.data 0.03pJ/b)
Levers: shorten pulse width (timing issue), lower voltage
[Figure: transmit current iT (maximum marked) and received voltage vR versus t, with Tpulse marked]

ISSCC2007 (outline, repeated)
• Energy dissipation reduction to 0.14pJ/b: pulse shaping
• Extension of communications range to 1.2mm for through-package link: preamplifier and offset cancellation

Background
• Pulse-based inductive-coupling technique
  High-speed, low-power, and low-cost chip-to-chip communication in a SiP (BW > 1Tbps)
  Communication range: 10µm to 100µm (ref. Miura et al., ISSCC2006, 23.4)
• New applications opened up by extending the communication distance to the millimeter range
  Detachable high-speed wireless interfaces for real-time on-chip bus monitoring, high-speed memory access, durable contactless connectors, etc.

Target of This Study
• Wireless logic probing through LSI package for firmware debugging
Merits
• Downsizing and cost reduction by eliminating package test pins and PCB patterns for debugging
• Flexibility enhancement by detachable interface
• Security improvement by eliminating easily accessible test pins
• Electrical isolation by removal of contacts

System Overview
[Figure: PC (debugger) connected over USB to the probe IC (amplifier etc.) on the probe (flexible circuit board), which couples inductively to the target µ-controller LSI on the PCB; enlarged view of the wireless interface and its inductors]

Bus Probing for Debugging
[Figure: probe IC (transceiver) on FCB with inductors in the FCB, glued over the target LSI (in SSOP package) with on-chip inductors on the PCB; CLK, TX, and RX channels; to/from in-circuit emulator]
[20.3] “An Attachable Wireless Chip-Access Interface for Arbitrary Data Rate Using Pulse-Based Inductive-Coupling through LSI Package”, ISSCC’07, Keio Univ. [18]

Die Photograph
• Technology: 0.25µm CMOS, standard digital process with embedded flash ROM, 3-layer Al
• Power supply: MCU core and transceiver: 2.5V
• Die size: 10.1mm^2
[Figure: die photograph with MCU core and CLK, TX, and RX inductors marked]

Block Diagram
[Figure: block diagram]

Signaling
[Figure: signaling scheme]

Tradeoff by Inductor Size
Inductor size               Large           Small
Cost                        High            Low
Self resonant frequency     Low             High
Pulse width                 Must be long    Can be short
Attainable data rate        Low             High
Communication distance      Long            Short
TX/RX inductor alignment    Easy            Difficult

Attainable Communication Distance
[Figure: coupling coefficient (1 down to 10^-8) versus communication distance X (10µm to 10mm) for D = 10µm, 100µm, and 1mm; noise floor, target, and detectable levels for comparator only and for amplifier + comparator (30dB)]

Data Receiver
• Pre-amplifier for high sensitivity
• DAC for offset cancellation
• Delay line for decision timing adjustment

Interference Problem
• Switching noise from digital circuits and I/O buffers (mainly just after the clock edge)
• Noise coupling to the receiver via substrate, power lines, ground lines, and bonding wires
• Malfunction of the asynchronous clock receiver
Our solution: de-sensitize the clock receiver after the clock transition

Clock Receiver
[Figure: clock receiver circuit]

Experimental Setup
[Figure: development kit (target µ-controller), wireless probe, debugger, and reference]

Measured Clock and Data Waveforms
[Figure: clock, transmitted (upper) and received (lower); data, transmitted (upper) and received (lower)]

Received Pulse Waveform
[Figure: signal amplitude (DAC input value, -30 to 30) versus time (0 to 4 nsec)]

Alignment Tolerance
[Figure: BER (1 down to 10^-10) versus horizontal alignment error (-1 to 1 mm) at a vertical distance of 1.2mm]

Chip Specification
Technology                     LSI: 0.25µm CMOS, 3-layer metal; Probe: 2-layer metal FCB
Chip size                      2.4mm x 4.2mm
Supply voltage                 MCU core and transceiver: 2.5V; I/O: 3.3V - 5.0V
Data rate                      20 Mbps (full-duplex)
Communication distance         1.2 mm (@ BER < 10^-10)
Alignment tolerance            0.5 mm (@ BER < 10^-10)
Power dissipation (@20Mbps)    CLK: TX 14.3 mW, RX 10.4 mW; DATA: TX 0.5 mW, RX 8.1 mW

Summary: Range Extension
A wireless chip-access interface through the LSI package was realized for firmware debugging. A preamplifier and an offset-cancellation DAC in the receiver extend the communication range to 1.2mm with enough alignment tolerance. A de-glitch circuit enables reliable clock transmission even in the presence of interference.
The interface achieved 20Mbps and has a potential data rate of up to 500Mbps/ch.
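
For comparison with the pJ/b figures in the first half of the lecture, the power numbers in the chip specification can be converted to energy per bit using the same power/bandwidth definition. This conversion is not on the slides; it is a rough sketch added here, and the dictionary keys and variable names are illustrative.

```python
DATA_RATE_BPS = 20e6  # 20 Mbps, from the chip specification

# Power dissipation at 20 Mbps in mW, from the chip specification.
power_mw = {
    "CLK TX": 14.3,
    "CLK RX": 10.4,
    "DATA TX": 0.5,
    "DATA RX": 8.1,
}

# Energy per bit in pJ/b = power [W] / data rate [b/s] * 1e12.
energy_pj = {
    name: p_mw * 1e-3 / DATA_RATE_BPS * 1e12 for name, p_mw in power_mw.items()
}
total_pj = sum(energy_pj.values())

for name, e in energy_pj.items():
    print(f"{name:8s} {e:7.1f} pJ/b")
print(f"{'total':8s} {total_pj:7.1f} pJ/b")
```

By this measure the through-package link spends far more energy per bit than the 0.14pJ/b chip-to-chip transceiver, which fits its different goal: reach and alignment tolerance at 1.2mm rather than minimum energy at 15µm.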