Ph.D. Research Plan Presentation

Anup Gangwar
Embedded Systems Group
(http://www.cse.iitd.ac.in/esproject)
Department of Computer Science & Engineering
Indian Institute of Technology Delhi
June 11, 2002
Presentation Outline
• Introduction and motivation
• Specialization opportunities in VLIW processors
• Methodology
• Validation framework (supporting tools required)
• Work plan
• Status of work
• References
Slide 2
Introduction
• Why customize architectures?
  - General-purpose computing domain vs. embedded
  - Customization leads to cheaper design solutions
• Architectural choices for exploiting ILP
  - Superscalar processors
    - Try to extract ILP at run time, hence complex hardware
    - Limited clock speeds and high power dissipation
    - Not suited to embedded applications
  - VLIW processors
    - Compiler has a lot of knowledge about the hardware
    - Compiler extracts ILP statically, hence simplified hardware
    - Possible to attain higher clock speeds
Slide 3
Introduction - Problems with VLIW Processors
• Complex compiler required for extracting ILP
• Adequate hardware support needed for compiler-controlled execution
• Code size expansion due to explicit NOPs if
  - the application does not contain enough parallelism
  - the compiler is not able to extract parallelism from the application
• Hence the need for good instruction encoding and NOP compression schemes
Slide 4
Specialization Opportunities -> FUs
Slide 6
Specialization Opportunities -> FUs (contd...)
• Functional Unit Types
  - MISO or Multiple Input Single Output
  - MIMO or Multiple Input Multiple Output
  - MIMO with LD/ST or MIMOs with memory interaction
  - Rigid or flexible I/O timeshapes

Name              Inputs and sources           Outputs and destinations     I/O policy
MISO              Multiple (Regfile)           Single (Regfile)             Flexible or rigid
MIMO              Multiple (Regfile)           Multiple (Regfile)           Flexible or rigid
MIMO with LD/ST   Multiple (Regfile or Mem.)   Multiple (Regfile or Mem.)   Flexible or rigid for Reg. and block LD/ST for mem.
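
The functional unit classification above can be captured as a small data model. The following is only a sketch: the class name `FUType`, its fields, and the `FU_TYPES` dictionary are hypothetical names introduced for illustration and are not part of Trimaran or of the presented framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FUType:
    """One row of the functional unit table (hypothetical model)."""
    name: str
    single_output: bool   # True for MISO, False for the MIMO variants
    memory_access: bool   # True only for MIMO with LD/ST

    def operand_locations(self):
        """Where inputs and outputs may live, following the table."""
        return ("regfile", "mem") if self.memory_access else ("regfile",)

    def io_policy(self):
        """Register I/O is flexible or rigid; memory traffic uses block LD/ST."""
        policy = "flexible or rigid (registers)"
        if self.memory_access:
            policy += ", block LD/ST (memory)"
        return policy


FU_TYPES = {
    "MISO": FUType("MISO", single_output=True, memory_access=False),
    "MIMO": FUType("MIMO", single_output=False, memory_access=False),
    "MIMO with LD/ST": FUType("MIMO with LD/ST", single_output=False, memory_access=True),
}
```

For example, `FU_TYPES["MIMO with LD/ST"].operand_locations()` lists both the register file and memory, while the other two types list only the register file.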
Slide 7
Specialization Opportunities -> Reg. File
• Single register file organization doesn't scale well
  - Area grows as N^3
  - Delay grows as N^(3/2)
  - Power grows as N^3
  where N is the number of functional units connected to the register file
• Clustered VLIW architectures are the solution
  - Each FU can read from/write to only a subset of registers
  - Data copying may increase execution latency
  - Powerful application analysis is required to overcome the above-mentioned problems
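
The scaling laws on this slide suggest why clustering helps. The sketch below compares one register file serving all FUs with several smaller ones, in relative units. It assumes equal-sized clusters and deliberately ignores the inter-cluster interconnect and copy overheads that the slide warns about; the function names are hypothetical.

```python
def monolithic_rf_cost(n_fus):
    """Relative area, delay and power of one register file serving n_fus FUs."""
    return {"area": n_fus ** 3, "delay": n_fus ** 1.5, "power": n_fus ** 3}


def clustered_rf_cost(n_fus, n_clusters):
    """Same FUs split evenly over n_clusters register files.

    Assumption: equal-sized clusters; interconnect and copy costs are ignored.
    """
    if n_fus % n_clusters != 0:
        raise ValueError("n_fus must be divisible by n_clusters")
    per_cluster = monolithic_rf_cost(n_fus // n_clusters)
    return {
        "area": n_clusters * per_cluster["area"],    # areas of all files add up
        "delay": per_cluster["delay"],               # files are accessed in parallel
        "power": n_clusters * per_cluster["power"],  # powers of all files add up
    }
```

With 8 FUs, the single register file has a relative area of 512, while four clusters of 2 FUs each total 32 under these assumptions, a reduction by the square of the cluster count.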
Slide 8
Specialization Opportunities -> Reg. File (contd...)
Figure: A Clustered VLIW Architecture
Slide 9
Specialization Opportunities -> Interconnect
• Clustering FUs together requires deciding the interconnection network (ICN)
  - between different clusters
  - between clusters and memory
• Analysis of data access patterns is required for evaluating cost-performance tradeoffs
• Current ASIP vendors do not offer customizable interconnects
Slide 10
Specialization Opportunities -> Encoding
• Instruction encoding/decoding scheme affects
  - Code size
  - Object code compatibility
  - Branch misprediction penalty
  - Hardware cost
  - Address specification in code size
• Each UniOp is equivalent to a RISC/CISC instruction
Figure: a MultiOp made up of four UniOps
Slide 11
Specialization Opportunities -> Encoding (contd...)
Figure: NOPs in a MultiOp

Slot   IALU.0   IALU.1   FALU.0   BU.0
Op     ADD      NOP      FMUL     NOP

Figure: VLIW Processor Pipeline with Instruction Decompressor
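
One simple way to remove explicit NOPs from a MultiOp is to store a slot mask followed by only the non-NOP UniOps, and to let the decompressor reinsert the NOPs. The sketch below illustrates that idea only; it is not claimed to be the scheme used in this work, and the function names are hypothetical.

```python
NOP = "NOP"


def compress_multiop(uniops):
    """Return (mask, ops): bit i of mask is set when slot i holds a real op."""
    mask = 0
    ops = []
    for slot, op in enumerate(uniops):
        if op != NOP:
            mask |= 1 << slot
            ops.append(op)
    return mask, ops


def decompress_multiop(mask, ops, n_slots):
    """Rebuild the full-width MultiOp, reinserting NOPs in the empty slots."""
    remaining = iter(ops)
    return [next(remaining) if mask & (1 << slot) else NOP for slot in range(n_slots)]
```

For the MultiOp in the figure, `compress_multiop(["ADD", "NOP", "FMUL", "NOP"])` keeps two UniOps plus a 4-bit mask, and `decompress_multiop` restores the original four slots.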
Slide 12
Specialization Opportunities -> Summary
Slide 13
Existing Methodologies -> Simulation Driven
Slide 15
VLIW ASIP Synthesis Methodology
Figure: flow diagram with the following blocks
• Task Set and Constraints
• Architecture Description
• Application Parameter Extraction
• Architecture Design Space Exploration
• Retargetable Compiler
• Instruction Encoding Specialization
• Validation (Simulation with encoded instructions)
• Architecture Description (Output to synthesizer)
Slide 16
Validation Framework -> Trimaran
Figure: Trimaran tool flow, from C program to generated simulator
• C Program
• IMPACT
  - ANSI C parsing
  - Code profiling
  - Classical machine-independent optimizations
  - Block formation
• Bridge Code
• ELCOR
  - Machine-dependent code optimizations
  - Code scheduling
  - Register allocation
• ELCOR IR
• SIMULATOR Generator
  - ELCOR IR to low-level C files
  - HPL-PD virtual machine
  - Cache simulation
  - Performance statistics
• Generated Simulator (Statistics)
  - Compute and stall cycles
  - Cache stats
  - Spill code info
• HMDES Machine Description
Slide 18
Validation Framework -> Trimaran (contd...)
Figure: simulator generation flow with the following blocks
• REBEL
• HMDES
• Code Processor
• Low-level C files
• C libraries
• Emulation Library
• Native Compiler
• Executable for the host platform
Slide 19
Validation Framework -> Retargetable Assembler
Figure: assembler generation flow with the following blocks
• Instruction Encoding Description
• Toolkit Generator
• Generated Assembler
• Assembly Instructions
• Object Code
• To Simulator (for simulation with encoded instructions)
Slide 20
Work Plan -> Interconnect/RF/FU Specialization
• Initially model the interconnect problem as an integer linear program (ILP) and later move to other solutions
• The code selection problem in compilers is similar to identifying compute-intensive parts for AFUs
• The number and type of FUs have not been properly explored
• The RF clustering problem has not been dealt with elsewhere
• Jacome et al.
  - Deal with interconnect/RF/FU specialization simultaneously
  - Operation chaining is not considered
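
To make the integer linear programming formulation of the interconnect problem concrete, the sketch below states one possible 0-1 model: choose a subset of candidate links (binary variable per link) that minimizes total cost, subject to every required cluster-to-cluster transfer being covered by at least one chosen link. The candidate links, their costs, and the required pairs are invented for illustration, and the model itself is an assumption rather than the formulation planned in this work. The tiny instance is solved by exhaustive enumeration; a real flow would pass the same model to an ILP solver.

```python
from itertools import product

# Hypothetical candidates: name -> (clusters connected, relative cost)
LINKS = {
    "link01": (frozenset({0, 1}), 2),
    "link12": (frozenset({1, 2}), 2),
    "link02": (frozenset({0, 2}), 2),
    "bus012": (frozenset({0, 1, 2}), 5),
}
# Hypothetical transfers that the application needs
REQUIRED_PAIRS = [(0, 1), (1, 2), (0, 2)]


def solve_interconnect(links, required_pairs):
    """Minimise sum(cost_j * x_j) with x_j in {0, 1}, subject to each
    required pair being covered by at least one chosen link."""
    names = sorted(links)
    best = None  # (total cost, chosen link names)
    for x in product((0, 1), repeat=len(names)):
        chosen = [name for name, bit in zip(names, x) if bit]
        feasible = all(
            any({a, b} <= links[name][0] for name in chosen)
            for a, b in required_pairs
        )
        if not feasible:
            continue
        cost = sum(links[name][1] for name in chosen)
        if best is None or cost < best[0]:
            best = (cost, chosen)
    return best
```

On this instance a single shared bus (cost 5) beats three point-to-point links (cost 6); dropping the `(0, 2)` requirement flips the choice to two point-to-point links.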
Slide 22
Work Plan -> Encoding/Decoding Specialization
• Goal is to be able to generate encoding schemes automatically
• Work of Shail Aditya et al.
  - Basically a parameterized encoding scheme
  - Techniques are specific to the HPL-PD architecture
  - Does not address dynamic code size minimization
  - Encoding template is fixed; exploration is limited to the template design space
• Various encoding templates need to be explored; the template itself may also be derived from the application
• Dynamic code size minimization needs to be considered
Slide 23
Work Status -> Specialized FUs in Trimaran
• Modeling MISOs
  - Model as external function calls
  - Locate the call in the Trimaran bridge code and replace it with the AFU op
  - Model the new AFU in MDES with the required ops
  - Introduce the semantics in the simulator op definitions file
• Modeling MIMOs
  - Model as external function calls returning void
  - Locate the call in the Trimaran bridge code and replace it with the AFU op
  - Explicitly reserve registers in the C code for returning values
  - Introduce the operation semantics in the simulator op definitions file
Slide 25
Work Status -> Specialized FUs in Trimaran (contd...)
• Modeling MIMOs with LD/ST
  - Model as regular MIMOs
  - Memory interaction with block LD/ST at the beginning and end of execute cycles
• Additionally
  - Possible to impose register file constraints
  - Various I/O timeshapes, rigid or flexible
  - Possible to introduce pipelined functional units
Slide 26
Work Status -> Instruction Enc. in Trimaran
Slide 27
Work Status -> Instruction Enc. in Trimaran (contd...)
• New Jersey Machine Code Toolkit (NJMC)
  - Deals with bits at a symbolic level
  - Can be used to write assemblers/disassemblers
  - Specification in SLED (Specification Language for Encoding/Decoding)
• Model the instruction decompressor in HMDES
• Instrument ELCOR to generate assembly code
• Encoding is done using procedures generated by NJMC
• Problems with NJMC
  - VLIW instructions need to be broken up into 32-bit tokens
  - Encoded instructions must end on an 8-bit boundary
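
The two NJMC constraints listed above can be illustrated with a short sketch that pads a wide instruction to a byte boundary and splits it into tokens of at most 32 bits. This is plain Python and does not use NJMC or SLED; the function name, the bit-string representation, and the choice of trailing zero padding are assumptions made for illustration.

```python
def tokenize_instruction(bits, token_width=32, align=8):
    """Split a bit string into tokens of at most token_width bits.

    Assumption: the instruction is zero-padded at the end up to the next
    multiple of `align` bits; the actual padding policy is a design choice.
    """
    if set(bits) - {"0", "1"}:
        raise ValueError("bits must contain only '0' and '1'")
    padding = (-len(bits)) % align          # bits needed to reach the boundary
    padded = bits + "0" * padding
    return [padded[i:i + token_width] for i in range(0, len(padded), token_width)]
```

A 70-bit instruction, for instance, is padded to 72 bits and split into two 32-bit tokens and one 8-bit token.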
Slide 28
Work Status -> Code Gen. for Clustered ASIPs
• ELCOR
  - Disadvantages
    - ELCOR is heavily oriented towards the HPL-PD architecture
    - Does not support clustered VLIW architectures
  - Advantages
    - Strong optimizing compiler
    - Rich library to deal with the IR
• The IMPACT compiler system offers another choice for building a backend
• A feasibility study is being carried out to fix a particular direction of work
Slide 29
References
• Bhuvan Middha, Varun Raj, Anup Gangwar, M. Balakrishnan, Anshul Kumar and Paolo Ienne, “A Trimaran based framework for exploring design space of VLIW ASIPs with coarse grain FUs”, ISSS-2002.
• Anup Gangwar, M. Balakrishnan and Anshul Kumar, “A framework for studying the effect of VLIW processor instruction encoding and decoding schemes”, Mini Project, Dept. of CSE.
• M. Jacome and G. de Veciana, “Design challenges for new application specific processors”, IEEE Design and Test of Computers-2000.
• B. Ramakrishna Rau and Michael S. Schlansker, “Embedded computer architecture and automation”, IEEE Computer-2001.
• Michael S. Schlansker and B. Ramakrishna Rau, “EPIC: An architecture for instruction-level parallel processors”, HPCA-2000.
• N. G. Busa, A. van der Werf and M. Bekooij, “Scheduling coarse grain operations for VLIW processors”, ASPDAC-1998.
• Shail Aditya, Scott A. Mahlke and B. Ramakrishna Rau, “Code size minimization and retargetable assembly for custom EPIC and VLIW processors”, ISSS-1999.
Slide 31
