What Is a Compiler When The Architecture Is Not Hard(ware) ?
Transcription
What Is a Compiler When The Architecture Is Not Hard(ware) ?
What Is a Compiler When The Architecture Is Not Hard(ware) ? Krishna V Palem This work was supported in part by awards from Hewlett-Packard Corporation, IBM, Panasonic AVC, and by DARPA under Contract No. DABT63-96-C-0049 and Grant No. 25-74100-F0944. Portions of this presentation were given by the speaker as a keynote at the ACM LCTES2001 and as an invited speaker at EMSOFT01 UCB November 8, 2001 Krishna V Palem Georgia Tech 2 Embedded Computing Why ? What ? How ? UCB – November 8, 2001 Krishna V Palem Georgia Tech The Nature of Embedded Systems Visible Computing View is of end application (Hidden computing element) UCB – November 8, 2001 Krishna V Palem Georgia Tech 3 4 Favorable Trends Supported by Moore’s (second) law – Computing power doubles every eighteen months Corollary: cost per unit of computing halves every eighteen months From hundreds of millions to billions of units Projected by market research firms (VDC) to be a 50 billion+ space over the next five years High volume, relatively low per unit $ margin UCB – November 8, 2001 Krishna V Palem Georgia Tech 5 Embedded Systems Desiderata Low Power - High battery life Small size or footprint Real-time constraints Performance comparable to or surpassing leading edge COTS technology UCB – November 8, 2001 Rapid time-to-market Krishna V Palem Georgia Tech 6 Timing Example Video-On-Demand Predictable Timing Behavior UCB – November 8, 2001 Unpredictable Timing Behavior Krishna V Palem Georgia Tech 7 Current Art Vertical application domains Telecom Industrial Automation Select computational kernels To ASIC Meet desiderata while overcoming NRE cost hurdles through volume High migration inertia across applications Long time to market UCB – November 8, 2001 Krishna V Palem Georgia Tech 8 Subtle but Sure Hurdles For Moore’s corollary to be true – Non-recurring engineering (NRE) cost must be amortized over high-volume – Else prohibitively high per unit costs Implies “uniform designs” over large workload classes – (Eg). Numerical, integer, signal processing Demands of embedded systems – – – – “Non uniform” or application specific designs Per application volume might not be high High NRE costs infeasible cost/unit Time to market pressure UCB – November 8, 2001 Krishna V Palem Georgia Tech The Embedded Systems Challenge Multiple application domains 3D Industrial Graphics Automation Medtronics E-Textiles Telecom Rapidly changing application kernels in moderate volume Custom computing solution meeting constraints Sustain Moore’s corollary – Keep NRE costs down UCB – November 8, 2001 Krishna V Palem Georgia Tech 9 10 Responding Via Automation Multiple application domains 3D Industrial Graphics Automation Medtronics E-Textiles Telecom Rapidly changing application kernels in low volume Power Size Automatic Tools Timing Application specific design UCB – November 8, 2001 Krishna V Palem Georgia Tech 11 Three Active Approaches Custom microprocessors Architecture exploration and synthesis Architecture assembly for reconfigurable computing UCB – November 8, 2001 Krishna V Palem Georgia Tech Custom Processor Implementation Application analysis Application Proprietary ISA, Fabricate Architecture Processor Specification Proprietary Tools Language with custom extensions Compiler Binary Custom Processor implementation Proprietary ISA Time Intensive High performance implementation Customized in silicon for particular application domain O(months) of design time Once designed, programmable like standard processors Tensilica, HP-ST Microelectronics approach UCB – November 8, 2001 12 Krishna V Palem Georgia Tech Architecture Exploration and Synthesis The PICO Vision Program In Automatic synthesis of application specific parallel / VLIW ULSI microprocessors And their compilers for embedded computing Chip Out “Computer design for the masses” “A custom system architecture in 1 week tape-out in 4 weeks” B Ramakrishna Rau “The Era of Embedded Computing”, Invited talk, CASES 2000. UCB – November 8, 2001 Krishna V Palem Georgia Tech 13 14 Custom Microprocessors Application(s) define workload Optimizing Compiler Microprocessor (eg) Itanium + UCB – November 8, 2001 Analyze Define Compiler Optimizations Define ISA extension (eg) IA 64+ Design Implementation Krishna V Palem Georgia Tech 15 Application Specific Design Applications Single Application Analyze Program Analysis Extended EPIC Compiler Technology Explore and Synthesize implementations Library of possible implementations (Bypass ISA) VLIW Core + Non programmable extension Application specific processor runs single application UCB – November 8, 2001 Krishna V Palem Georgia Tech The Compiler Optimization Trajectory Compiler Hardware Frontend and Optimizer Superscalar Determine Dependences Dataflow Determine Dependences Determine Independences Indep. Arch. Determine Independences Bind Operations to Function Units VLIW Bind Operations to Function Units Bind Transports to Busses Bind Transports to Busses TTA Execute B. Ramakrishna Rau and Joseph A. Fisher. Instruction-level parallel: History overview, and perspective. The Journal of Supercomputing, 7(1-2):9-50, May 1993. UCB – November 8, 2001 Krishna V Palem Georgia Tech 16 What Is the Compiler’s Target ``ISA’’? Compiler Frontend and Optimizer Determine Dependences Determine Independences Hardware Superscalar Dataflow Indep. Arch. Bind Operations to Function Units VLIW Bind Transports to Busses Determine Dependences Determine Independences Bind Operations to Function Units Bind Transports to Busses TTA Execute B. Ramakrishna Rau and Joseph A. Fisher. Instruction-level parallel: History overview, and perspective. The Journal of Supercomputing, 7(1-2):9-50, May 1993. UCB – November 8, 2001 Target is a range of architectures and their building blocks Compiler reaches into a constrained space of silicon Explores architectural implementations O(days – weeks) of design time Exploration sensitive to application specific hardware modules Fixed function silicon is the result Verification NRE costs still there One approach to overcoming time to market Krishna V Palem Georgia Tech 17 18 Choices of Silicon High level design/synthesis (EDIF) Netlist Fixed silicon implementation Standard cell design etc. UCB – November 8, 2001 Emulated i.e. Reconfigurable target Krishna V Palem Georgia Tech 19 Reconfigurable Computing UCB – November 8, 2001 Krishna V Palem Georgia Tech FPGAs As an Alternative Choice for Customization Frequent (re)configuration and hence frequent recustomization Fabrication process is steadily improving Gate densities are going up Performance levels are acceptable Amortize large NRE investments by using COTS platform UCB – November 8, 2001 Krishna V Palem Georgia Tech 20 21 Major Motivations Poor compilation times – Lack of correspondence between standard IR and final configurations – Place and route inherently complex Would like to have compile times of the order of seconds to minutes Provide customization of data-path by automatic analysis and optimization in software UCB – November 8, 2001 Krishna V Palem Georgia Tech 22 Adaptive EPIC Adaptive Explicitly Parallel Instruction Computing Krishna V. Palem and Surendranath Talla, Courant Institute of Mathematical Sciences; Patrick W. Devaney, Panasonic AVC American Laboratories Inc. Proceedings of the 4th Australasian Computer Architecture Conference, Auckland, NZ. January 1999 Adaptive Explicitly Parallel Instruction Computing Surendranath Talla Department of Computer Science, New York University PhD Thesis, May 2001 Janet Fabri award for outstanding dissertation. UCB – November 8, 2001 Krishna V Palem Georgia Tech Compiler-Processor Interface ISA Source program format Compiler ADD semantics LD format semantics Registers Exceptions.. Executable UCB – November 8, 2001 Krishna V Palem Georgia Tech 23 Redefining Processor-Compiler Interface Source program ISA format mysub semantics myxor format semantics Compiler Executable Let compiler determine the instruction sets (and their realization on chip) UCB – November 8, 2001 Krishna V Palem Georgia Tech 24 25 EPIC execution model Record of execution Functional units ILP UCB – November 8, 2001 Krishna V Palem Georgia Tech 26 Adaptive EPIC execution model Record of execution ILP-1 Reconfigure datapath Configured Functional units ILP-2 UCB – November 8, 2001 Krishna V Palem Georgia Tech 27 What can be efficiently compiled for today? FPGA Complex ASIC DPGA RAW RaPiD CVH Parallelism TRACE (Multiscalar) GARP MultiChip Adaptive EPIC SMT SuperSpeculative SuperScalar Dataflow Early x86 Simple ASIC 0 4 16 EPIC/VLIW TTA VECTOR Simple Pipelined/ Embedded 32 64 128-512 1K-10K 100K-1M >1M Approximate instruction packet size UCB – November 8, 2001 Krishna V Palem Georgia Tech 28 Adaptive EPIC AEPIC = EPIC processing + Datapath reconfiguration Reconfiguration through three basic operations: – add a functional unit to the datapath – remove a functional unit from the datapath – execute an operation on resident functional unit RLA F1 UCB – November 8, 2001 F2 Krishna V Palem Georgia Tech 29 AEPIC Machine Organization I-Cache F/D Configuration cache CRF GPR/FPR/… Level-1 configuration cache Multi-context Reconfigurable Logic Array Array Register File L1 L2 UCB – November 8, 2001 Krishna V Palem Georgia Tech 30 AEPIC Compilation UCB – November 8, 2001 Krishna V Palem Georgia Tech 31 Key Tasks For AEPIC Compiler Partitioning – Identifying code sections that convert to custom instructions Operation synthesis – Efficient CFU implementations for identified partitions Instruction selection – Multiple CFU choices for code partitions Resource allocation – Allocation of C-cache and MRLA resources for CFUs Scheduling – When and where to schedule adaptive component instructions UCB – November 8, 2001 Krishna V Palem Georgia Tech 32 Compiler Modules Source program Front-end processing Partitioning High-level optimizations (EPIC core) Configuration Library Mapping (Adaptive) Pre-pass scheduling Resource allocation Machine description Post-pass scheduling Code generation Simulation Performance statistics UCB – November 8, 2001 Krishna V Palem Georgia Tech 33 Resource allocation for configurations Reconfigurable logic can accommodate only two configurations simultaneously Record of execution B G R UCB – November 8, 2001 Overlapping live-range region RL Processor Krishna V Palem Georgia Tech 34 Managing Reconfigurable Resource (contd.) Simpler version related to register allocation problem New parameters – No need to save to memory on spill – configurations immutable – Load costs different for different configurations – C-cache, MRLA = multi-level register file Adapted register allocation techniques to configuration allocation – Non-uniform sizes of configurations: graph multi-coloring – Adapted Chaitin’s coloring based techniques UCB – November 8, 2001 Krishna V Palem Georgia Tech 35 The Problem With Long Reconfiguration Times Configuration load time Configuration execution time UCB – November 8, 2001 Krishna V Palem Georgia Tech 36 Speculating Configuration Loads f1 f2 Speculated configuration loads Configuration execution time • Mask reconfiguration times! • Need to know when and where to speculate • If f1>>f2 do not speculate to “red” empty load slot UCB – November 8, 2001 Krishna V Palem Georgia Tech 37 Sample Compiler Topics Configuration cache management Power/clock rate vs. Performance tradeoff Bit-width optimizations UCB – November 8, 2001 Krishna V Palem Georgia Tech 38 More Generally Architecture Assembly Applications An ISA view Synthesis and other hardware design off-line Much closer to compiler “Compiler” selects assembles and optimizes program optimizations implies Build off-line (synthesis, place and route) Program Prebuilt Implementations faster compile time Data path Storage Interconnect Dynamically variable ISA Architecture implementation UCB – November 8, 2001 Also applicable to yield fixed implementations in silicon Krishna V Palem Georgia Tech