HPEC 2010 - Lightwave Research Laboratory

Transcription

Enabling High Performance Embedded Computing through Memory Access via Photonic Interconnects
Gilbert Hendry
Eric Robinson
Vitaliy Gleyzer
Johnnie Chan
Luca P. Carloni
Nadya Bliss
Keren Bergman
* This work is sponsored by the Defense Advanced Research Projects Agency (DARPA) under Air Force contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.
MIT Lincoln Laboratory
HPEC 2010 - 1
Hendry, et al. 9/17/2010
Columbia University
Silicon Photonics: Advantages and Disadvantages

Advantages:
• High bit rates (40 Gb/s demonstrated)
• Speed-of-light latency
• Distance not a factor (power, latency)
• Bandwidth density through WDM

Disadvantages:
• Unsolved challenges (stability, packaging, cost)
Photonic interconnects hold potential for on-chip computing. However, the target applications must be considered to determine if photonics will be beneficial for them.
Embedded Computing: ISR Applications

• Image Registration: Where is the image in relation to other images already taken?
• SAR Image Formation: How many pulses can feasibly be combined, and what size of image can we take?
• Image Sharpening: Can image fidelity be improved through using additional information or multiple pictures?
Image Registration
Image registration involves:
• Image orientation and scaling
• Image alignment

It produces an image that “fits” properly with other registered images, giving a global view of the area.
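The alignment step can be illustrated with a standard technique that is not specific to this work: phase correlation recovers the translation between two images from the whitened cross-power spectrum. This is a sketch for the translation component only; handling rotation and scaling would need additional machinery.

```python
import numpy as np

def estimate_shift(ref, img):
    """Estimate the circular (row, col) shift mapping ref onto img via
    phase correlation: whiten the cross-power spectrum so only phase
    (displacement) information remains, then locate the peak."""
    cross = np.fft.fft2(img) * np.conj(np.fft.fft2(ref))
    cross /= np.abs(cross) + 1e-12
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap peak indices into signed shifts
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))

# Recover a known (3, -5) pixel shift from a synthetic test image
rng = np.random.default_rng(0)
ref = rng.standard_normal((64, 64))
img = np.roll(ref, shift=(3, -5), axis=(0, 1))
```

For a pure circular shift the whitened spectrum is an exact phase ramp, so the inverse FFT is (numerically) a delta at the displacement.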
Image Sharpening
Image fusion: fuses two low-resolution images to form a high-resolution result.

Filtering: enhances image fidelity by combining filter outputs with the original image (bicubic, bilinear, halfband, ...).
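As a deliberately simple example of the filtering idea (a generic unsharp mask, not one of the kernels used in this work), the blurred copy is subtracted from the original and the difference is added back to boost edges:

```python
import numpy as np

def unsharp_mask(img, amount=1.0):
    """Sharpen by adding back the difference between the image and a
    smoothed (here: 3x3 box-filtered) copy -- one simple example of
    combining a filter output with the original image."""
    pad = np.pad(img, 1, mode="edge")
    # 3x3 box blur built from the nine shifted windows of the padded image
    blur = sum(pad[i:i + img.shape[0], j:j + img.shape[1]]
               for i in range(3) for j in range(3)) / 9.0
    return img + amount * (img - blur)
```

Near an edge the blurred copy differs from the original, so the output overshoots on both sides of the edge; in flat regions the image is unchanged.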
SAR Image Formation
Synthetic Aperture Radar (SAR) is an imaging technique that uses radar pulses rather than photography.

SAR processing:
• Image formation is nontrivial and requires combining pulses
• The more pulses that can be processed, the higher the image resolution
• SAR can operate in conditions where traditional photography fails (low light, cloud cover)
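The pulses-to-resolution link can be made concrete with the textbook cross-range resolution relation (general SAR background, not a number from this work): combining more pulses at a fixed PRF and platform speed lengthens the synthetic aperture, which shrinks the resolution cell.

```python
def cross_range_resolution(wavelength_m, range_m, aperture_m):
    """Textbook SAR relation: cross-range resolution ~ lambda * R / (2 * L),
    where L is the synthetic aperture length. A longer aperture (more
    pulses combined) gives a finer resolution cell."""
    return wavelength_m * range_m / (2.0 * aperture_m)

# X-band example (3 cm wavelength, 10 km range): doubling the aperture
# halves the resolution cell.
```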
ISR Application Kernels
ISR kernels: Matrix Multiply, Projective Transform, Fourier Transform
• Used in a broad range of ISR applications
• Typically a performance bottleneck
• Demand high throughput from the memory and network modules
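For instance, the projective-transform kernel amounts to applying a 3x3 homography to homogeneous pixel coordinates (an illustrative sketch of the kernel's math, not the paper's implementation):

```python
import numpy as np

def projective_transform(points, H):
    """Apply a 3x3 homography H to an N x 2 array of pixel coordinates:
    lift to homogeneous coordinates, multiply, divide by w."""
    pts = np.hstack([points, np.ones((len(points), 1))])  # N x 3
    out = pts @ H.T                                       # N x 3
    return out[:, :2] / out[:, 2:3]

# A pure translation by (2, -1), expressed as a homography
H = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0],
              [0.0, 0.0, 1.0]])
```

Every output pixel requires a gather from a data-dependent input location, which is one reason this kernel stresses the memory system.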
Characteristics of ISR Applications
ISR applications are ideal candidates for photonic interconnects:
• Large memory access size
• Low power requirements
• High memory-access-to-compute ratio
• High throughput requirements
Silicon Photonic Waveguide Technology

Low-loss (1.7 dB/cm), high-bandwidth (> 200 nm) silicon photonic waveguides can be fabricated in a commercial CMOS process.
[Vlasov and McNab, Optics Express 12 (8) 1622 (2004)]

[Figure: 1.28 Tb/s data transmission experiment, occupying a small slice of the available waveguide bandwidth. Eye diagrams (100 ps window) before injection into the waveguide and after a 5-cm waveguide and EDFA, for channels C23 (1559 nm), C28 (1555 nm), C46 (1541 nm), and C51 (1537 nm).]
[B. G. Lee et al., Photon. Technol. Lett. 20 (10) 767 (2008)]
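The bandwidth-density claim is simple arithmetic: the aggregate link rate is the number of wavelength channels times the per-channel rate. The 32 × 40 Gb/s decomposition below is an assumption consistent with the slide's headline numbers, not something the slide states.

```python
def aggregate_rate_gbps(n_channels, per_channel_gbps):
    """Aggregate WDM bandwidth: each wavelength multiplexed onto the
    waveguide carries an independent data stream."""
    return n_channels * per_channel_gbps

# Assumed decomposition: 32 channels x 40 Gb/s = 1280 Gb/s = 1.28 Tb/s
```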
Silicon Photonic Modulator
and Detector Technology
Link: CW laser → modulator → detector

Modulators:
• 85 fJ/bit demonstrated at 10 Gb/s
• Scalable to < 25 fJ/bit
[M. Lipson, Optics Express (2007)] [M. Watts, Group Four Photonics (2008)]

Ge-on-Si detectors:
• 40-GHz bandwidths
• 1 A/W responsivities
[S. Koester, J. Lightw. Technol. (2007)]

Receivers (detectors with CMOS amplifiers):
• 1.1 pJ/bit demonstrated at 10 Gb/s
• Scalable to < 50 fJ/bit
Ring Resonators
• Modulator/filter
• Broadband operation (transmission spectra vs. λ)
Higher Order Switch Designs
[A. Biberman, IEEE Phot. Tech. Letters (2010)]
Two Network Designs
• Purely circuit-switched
• TDM-arbitrated
Circuit-switched P-NoCs
[Figure: ring switch element. Electronic control via p- and n-regions driven between 0 V and 1 V, plus an ohmic heater for thermal control; toggling between the off-resonance and on-resonance transmission profiles switches the injected wavelengths.]
Peripheral Memory Access
[Figure: chip layout showing processor cores, network routers, and memory access points at the chip periphery.]
Memory Access Point
[Figure: memory access point at the chip boundary. On-chip, a data plane connects to/from the network-on-chip and drives modulators; off-chip links run to and from the memory module. A separate control plane carries memory control.]
[V. R. Almeida et al., Cornell]
Photonic TDM Network
• Mesh topology
• Distributed switch control
• Single-dimension transmission
• Controlled by fixed time slots
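The fixed-slot idea can be sketched as round-robin slot ownership (a hypothetical arbiter for illustration, not the paper's exact scheme): in slot t, node t mod N owns the shared channel, so no per-message path setup is needed, at the cost of waiting up to N - 1 slots for a turn.

```python
def tdm_slot_owner(t, n_nodes):
    """Fixed time-slot arbitration: node (t mod n_nodes) owns the
    shared photonic channel during slot t."""
    return t % n_nodes

def slots_until_turn(t, node, n_nodes):
    """Slots to wait, starting at slot t, before `node` may transmit."""
    return (node - t) % n_nodes
```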
Vertical Memory Access
[Figure: vertical coupler for off-chip optical coupling.]
[J. Schrauwen et al., U. of Ghent]
SDRAM DIMM Anatomy
[Figure: DRAM hierarchy. A DRAM_DIMM carries ranks of SDRAM devices; each DRAM_Chip contains I/O control, address/control logic, and banks (usually 8); each DRAM_Bank has a row decoder, column decoder, and sense amps around the DRAM cell arrays, driven by row and column address/enable signals and a data path.]
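The hierarchy above determines how a flat physical address is split into bank, row, and column fields before reaching the decoders. The bit widths below are hypothetical (a common 8-bank, 16K-row, 1K-column device), not taken from the slide; real controllers choose their own bit maps.

```python
def split_dram_address(addr, col_bits=10, row_bits=14, bank_bits=3):
    """Hypothetical address map (low bits -> column, then row, then bank)
    showing how a flat address selects a location in the hierarchy."""
    col = addr & ((1 << col_bits) - 1)
    row = (addr >> col_bits) & ((1 << row_bits) - 1)
    bank = (addr >> (col_bits + row_bits)) & ((1 << bank_bits) - 1)
    return bank, row, col
```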
Optical Circuit Memory (OCM) Anatomy

[Figure: OCM module. A mux chip (laser in, drivers, banks of AWGs) connects through waveguide coupling to DRAM chips; on the DRAM side, receivers and decoders feed I/O gating, address/control, I/O control, and the banks. Laser source, waveguides, address/control lines, and VDD/Gnd are also shown.]
Results: Circuit Switched Application Performance

[Bar chart: performance (GOPS) of Projective Transform, Matrix Multiply, and FFT on four network types (Emesh, EmeshCS, PS-1, PS-2); peak bar 47.3 GOPS.]

EmeshCS yields the best performance, but PS-1 and PS-2 are competitive.
Results: Circuit Switched Power

[Bar chart: network power (W) of Projective Transform, Matrix Multiply, and FFT on Emesh, EmeshCS, PS-1, PS-2; the electronic networks draw roughly 11–19 W, the photonic PS networks roughly 2.2–4.4 W.]

PS-1 and PS-2 use much less power than the electronic alternatives.
Results: Circuit Switched Performance/Watt Comparison

[Bar chart: performance-per-watt improvement factor over the Emesh baseline (factor 1) for Projective Transform, Matrix Multiply, and FFT; EmeshCS reaches roughly 2.8–9.7x, while PS-1 and PS-2 reach roughly 27–89x.]

PS-1 and PS-2 give the best performance per unit of power.
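The improvement factor plotted here is just the ratio of performance-per-watt values. The numbers in the example are made up, since the transcription does not preserve the chart's exact bar-to-kernel mapping:

```python
def perf_per_watt_improvement(gops, watts, base_gops, base_watts):
    """Improvement factor = (GOPS/W) / (baseline GOPS/W)."""
    return (gops / watts) / (base_gops / base_watts)

# A network delivering 20 GOPS at 2 W vs. a 10 GOPS / 10 W baseline:
# (20/2) / (10/10) = 10x better performance per watt.
```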
Results: Circuit Switched Power Budget Breakdown (Projective Transform)

Electronic components dominate the power of all the systems in question:
• Emesh is dominated by the electronic buffers
• EmeshCS is dominated by the crossbar and electronic wire
• PS-1 and PS-2 are both dominated by the electronic crossbar

The electronic crossbar requires a significant amount of power; in the electronic mesh, however, the electronic buffers dominate the energy consumption.
Results: TDM (Projective Transform)

[Charts: performance (GOPS), network power (W), and performance-per-watt improvement for Emesh, Pmesh, P-TDM, and P-ETDM; the photonic networks achieve improvement factors of roughly 5x, 13x, and 22x over the Emesh baseline.]

TDM results:
• Performed on a smaller image (256×256)
• Yields the best performance when packets can be sent in a single time slice
• Constant setup cost means smaller packets can be sent with less overhead

TDM yields advantages when message sizes are smaller.
Conclusions
• ISR front-end application performance is of increasing importance in the community
• These applications put large demands on the memory and network subsystems
• Photonics offers a low-power approach to meeting these performance demands
• Photonics-specific network architecture design is important here
Fin
• Sponsored by DARPA MTO (PhotoMAN)

For the full details on these photonic architectures, see our other publications in the Journal of Parallel and Distributed Computing (JPDC) 2011 and Supercomputing (SC) 2010.