Werden sich Chips in Zukunft selbst heilen? - Eikon eV
Transcription
Werden sich Chips in Zukunft selbst heilen? - Eikon eV
Werden sich Chips in Zukunft selbst heilen? Einsatz von Autonomic Computing Prinzipien in künftigen System on Chip Architekturen A. Herkersdorf Lehrstuhl für Integrierte Systeme TU München www.lis.ei.tum.de Outline Integrated Systems (R)Evolution Moore’s Law continues and enables SoCs of ever more increasing complexity, heterogeneity Conservative worst-case design approach no longer feasible when reaching physical CMOS limits Autonomic ASoC Design Concept – Conquer Complexity with Complexity ASoC Architecture Platform ASoC Research Challenges Acknowledgements: Conclusions This is joint ongoing work with W. Rosenstiel, Uni Tübingen Autonomic SoC - 2 © Lehrstuhl für Integrierte Systeme 1 Source: ITRS Microprocessor complexity Lmin 109 108 106 105 104 Pentium 4 107 103 Pentium Pentium III 80486 Pentium II 80386 106 100 80286 8086 105 8008 104 4004 103 10 1 Designer Productivity 0.1 10 5 2 1 0.5 0.2 0.1 0.05 Min.transistor channel length (µm) 1011 1010 Designer Productivity (K trans. / PM) Proposition: 1968 1972 1976 1980 1984 1988 1992 1996 2000 2004 2008 2012 Year Challenges originating from CMOS scaling: New conceptual approach in IC design method necessary Incorporating autonomic principles into SoC architecture and design tools Increasing fabrication defects and Coping with billion transistor design sensitivity for stochastic events from reaching CMOS physical limits Increasing dynamic and static power dissipation densities complexities at all abstraction levels “Productivity gap” increasing engineering costs and/or development time cycles © Lehrstuhl für Integrierte Systeme Autonomic SoC - 3 Last Year’s EIKON Seminar Aktuelle Herausforderungen an die EDA 1,000 10,000 Transistors / Staff Month. 1,000 100 58%/Yr. compounded Complexity growth rate 10 100 10 1 x 0.1 xx 0.01 xx x x 1 21%/Yr. compound Productivity growth rate x 0.1 0.01 2007 2009 2005 2001 2003 1999 1997 1995 1993 1989 1987 1985 1983 0.001 1981 Complexity Systemkomplexität Siliziumkomplexität Fertigungsschwankungen Sinkende Zuverlässigkeit … Transistor per Chip (106) Transistors / Chip Productivity (K) Transistors / Staff - Mo. 100,000 10,000 1991 Chipcapacity (transistors per chip) Integrated Systems (R)Evolution Source: U. Schlichtmann, “Das Zeitalter der Entwurfsautomatisierung für Integrierte Schaltungen”, EIKON Mitgliederversammlung, 3.02.2004 Autonomic SoC - 4 © Lehrstuhl für Integrierte Systeme 2 Last Year’s EIKON Seminar Aktuelle Herausforderungen an die EDA 10,000 Transistors / Staff Month. 1,000 100 58%/Yr. compounded Complexity growth rate 10 10 1 0.1 xx 0.01 xx x x x 1 21%/Yr. compound Productivity growth rate x 0.1 0.01 Kritischer Pfad 100% 2007 2009 2005 2001 2003 1999 1997 1995 1993 1991 1989 1987 1985 1983 1981 0.001 • Entwurfskosten 100 Productivity (K) Transistors / Staff - Mo. 100,000 Transistors / Chip 1,000 Systemkomplexität Nominale Bibliothekscharakterisierung Siliziumkomplexität NOM Fertigungsschwankungen Fertigungsschwankungen Sinkende Zuverlässigkeit Abwägung möglich zwischen • Produkteigenschaften Korrelationen … • Fertigungskosten Zellbibliothek Complexity Transistorcharakterisierung Transistor per Chip (106) 10,000 ASIC-Design morgen (Statistisches Design) Source: U. Schlichtmann, “Das Zeitalter der für Integrierte Schaltungen”, EIKON Mitgliederversammlung, f 3.02.2004 Parametrische Ausbeute Entwurfsautomatisierung tcrit CLK Autonomic SoC - 5 © Lehrstuhl für Integrierte Systeme Autonomic Computing – IBM Manifesto Server Trends: Inter-networked computer systems witness ever increasing complexity and diversity Millions of lines of code Heterogeneous OS and application SW stacks #1 I/T Challenge: Dealing with the complexity of I/T systems originating as a consequence of technological progress How Does Nature Deal with Complexity? Example: human organism and its autonomic nervous system Conquer Complexity with Complexity… Sub-conscious selfcontrol, -management, -organisation, -healing capabilities and learn to work around imperfections Figure source: Encarta Encyclopedia © Microsoft Corporation Autonomic SoC - 6 © Lehrstuhl für Integrierte Systeme 3 Autonomic System on Chip (ASoC) Systems are defined at different levels: Compute Servers, Internet Routers “Box level” Chip-level multi-processors, platformbased mixed-signal SoCs “Chip level” Why to embed “autonomic” concepts into chip HW layer? Holistic approach: SoC CMOS technology is basis of higher-order IT systems Investigate and establish HW “hooks” for higher layer autonomic software Nanosecond control loop cycle periods don’t allow for SW-in-theloop HW defects may prevent SW diagnosis activation Autonomic SoC - 7 © Lehrstuhl für Integrierte Systeme Autonomic SoC – Future Perspectives CPU stand-by SRAM CPU Network i/f CPU su B HW accel. SDRAM Cntrl. Dealing with defects: Continuous, SoC component- specific diagnosis determines and disables faulty: (sub-) building blocks, interconnect resources, memory regions Active/Stand-by selection among redundant IP functions Determines minimum feature set for current system operation Peripherals Autonomic SoC - 8 © Lehrstuhl für Integrierte Systeme 4 Autonomic SoC – Future Perspective CPU stand-by SRAM su B Performance/Power Optimization: fF CPU vV When work load increases, raw bus capacity is increased, critical transactions among bus HW accel. entities are prioritized, voltage and frequency of CPU are raised, secondary interfaces are temp. SDRAM Cntrl. deactivated, redundant resources may be activated for load balancing, Network i/f reconfigurable HW accelerators Peripherals are activated on demand to facilitate CPU off-load Autonomic SoC - 9 © Lehrstuhl für Integrierte Systeme Existing Techniques Reusable for ASoC DVFS: dynamic voltage and frequency scaling BIST: built-in-self test On-Chip debugging aids: ChipScope, RISCWatch Reconfigurable computing platforms General area of fault tolerant systems However, ... either not yet widely used, or mostly in isolation Have to be made syntactically compatible and semantically coupled Standardization in industry forum required Autonomic SoC - 10 © Lehrstuhl für Integrierte Systeme 5 ASoC Architecture Platform FUNCTIONAL layer Evolutionary approach with compatibility to SoC design method CPU CPU Bus RAM PU Rededicate part of SoC capacity to self-x surveillance and control Autonomic Network i/f Element functions in Autonomic SoC layer Autonomic Element (AE) library monitor Closed HW control loop between evaluator FE and AE element actuator communication i/f monitor, evaluate, memorize, AUTONOMIC layer (communicate), execute Autonomic SW functions have Maintaining state info on PU history access to AE parameters in AE elements can be exploited for emergent system behaviour Autonomic SoC - 11 © Lehrstuhl für Integrierte Systeme ASoC Architecture: Planned Activities CPU: Extension of DIVA/RAZOR concepts Functional and runtime errors Dynamic switch-over between parallel EXE pipelines Demonstrate with public domain soft core (e.g. LEON2) Memories: On the fly interleaved rd/wr AE Interconnect: Shared Buffer-Insertion Ring Native broadcast support Simultaneous p2p connections Simple comm. i/f Distributed AE Control: accesses (w/o parity, ECC) to monitor memory locations Detect / repair sporadic and persistent bit defects Exclude defective locations from future accesses by addr. translation Couple network / system i/f event Autonomic SoC - 12 rate with DVFS and on-chip bus data path width & frequency modulation © Lehrstuhl für Integrierte Systeme 6 ASoC Research Challenges New Architectures, Design Methods and Tools Dealing with graceful degradation and redundancy in distributed SoCs with changing structure Interleaving and/or co-existence of functional and autonomic layers Validation techniques for partially verified interdependent macros Concepts for dynamic and coupled power/performance mgmt. Methods for flexible and dynamic HW/SW repartitioning Defective HW is replaced on demand by equivalent SW process Dynamically loaded HW configuration replaces lowerperforming SW process Autonomic SoC - 13 © Lehrstuhl für Integrierte Systeme Conclusions Reliable design of complex SoCs is the key challenge for nanoelectronics when CMOS technology approaches its physical limits Application of autonomic or organic computing principles has been proposed to tackle this challenge Requires conceptual change in SoC design process, architecture enhancements, and tools support Requires innovative contributions at all levels of abstraction: technology, device, gate, circuit, system, software Thanks! Autonomic SoC - 14 © Lehrstuhl für Integrierte Systeme 7