Slides - WCET 2013

Transcription

Slides - WCET 2013
Predictable hardware: The AURIX
Microcontroller Family
Worst-Case Execution Time Analysis
WCET 2013, July 9, 2013, Paris, France
Jens Harnisch (Jens.Harnisch@Infineon.com),
Infineon Technologies AG,
Automotive Microcontroller, Application and Concept Engineering
22-Jul-13
Copyright © Infineon Technologies 2011. All rights reserved.
Page 1
Confidential
Outline
 Challenges for Automotive Microcontrollers
 Power train application characteristics
 AURIX: WCET related system features
 AURIX: WCET related core features
 Multi-Core Software Considerations
 Tracing Options
22-Jul-13
Copyright © Infineon Technologies 2011. All rights reserved.
Page 2
Confidential
Automotive Microcontroller – The Challenge
Safety &
Security
Performance
Microcontroller
Concept
Power
Consumption
Cost
Keep Constraints for Hard Real Time Support:
• Predictability
• Keep latency constraints (e.g. for interrupts)
• Non intrusive trace support
22-Jul-13
Copyright © Infineon Technologies 2011. All rights reserved.
Page 3
Confidential
Characteristics
of typical applications
running on AURIX (power train)
 Static assignment of tasks to cores, separated schedulers (AMP)
 Time and event driven components (event driven components handled via preemption)
 Roughly every 7th instruction is a branch
 Floating point instructions: varying, depends on code generation
 Data locality: Data (except for LUTs) fits usually into DSPR, having 0
clocks latency
 Look up tables are frequently used
 Instruction level parallelism: 0.8 is often a good result; with high
optimization more than 1 instruction per cycle is feasible
(TriCore TC 1.6P has 3 pipelines)
22-Jul-13
Copyright © Infineon Technologies 2011. All rights reserved.
Page 4
Confidential
AURIX Multi-Core
Controller:
WCET related system features
Checker Core
FPU
PMI
FPU
TriCore
1.6P
PMI
DMI
TriCore
1.6P
FPU
DMI
PMI
TriCore
1.6P
DMI
 Highly predictable architecture
with duplicated resources
(local memories, crossbar) to
avoid resource conflicts
Crossbar
Program
Flash
Data
Flash
Sys
RAM
Bridge
FFT
SDMA
HSSL
BCU
SCU
STM
GTM
GPT12x
CCU6x
HSM
DS-ADC
ADC
Peripheral Bus
 Priority driven and round robin
arbitration for masters
attached to crossbar
 Starvation protection in
crossbar
 No cache coherence
10/04/2013
22-Jul-13
IOM
FCE
I²C
PSI5(S)
SENT
QSPI
ASCLIN
MSC
MultiCAN+
FlexRay
Ethernet
SCR
EVR
5V or 3.3V
single supply
Copyright
Copyright ©
© Infineon
Infineon Technologies
Technologies 2013.
2011. All
All rights
rights reserved.
reserved.
 Dedicated and scalable
communication instructions
Page 5
Confidential
TriCore
1.6E and TriCore 1.6P:
WCET related features
TriCore 1.6Efficiency
F
D
E1
E2
TriCore 1.6Performance
F1
Scalar Harvard
F2
PD
D
E1
E2
PD
D
E1
E2
PD
D
E1
E2
Superscalar Harvard
Common TriCore 1.6 Instruction Set
 One pipeline only for high power efficiency
 Superscalar: integer, load store and loop pipeline
 4 pipeline stages for up to 200MHz
 6 pipeline stages for up to 300MHz
 Power 0.2mW/MHz
 Power 0.3mW/MHz
 1.2 – 1.4 DMIPS/MHz
 1.6 – 2.3 DMIPS/MHz
 Caches: 2 way LRU
 Easy to describe RAW and structural dependencies
 No support of simultaneous multithreading
 4 stage data store buffer
22-Jul-13
Copyright © Infineon Technologies 2011. All rights reserved.
Page 6
Confidential
Support for hard real time systems
 Support for high average case performance usually contradicts
high predictability
 Static timing analysis: precision of results and efficiency of
analysis strongly depend on hardware architecture features
 Three classes of hardware architectures [1]:
– Fully timing compositional architectures: no timing anomalies
– Compositional architectures with constant-bounded effects: timing
anomalies without domino effects -> TriCore is assumed to belong to
this class
– Non-compositional architectures
[1] Predictability Considerations in the Design of Multi-Core Embedded Systems.
C. Cullmann, C. Ferdinand, G. Gebhard, D. Grund, C. Maiza, J. Reineke, B. Triquet, S. Wegener, and R.
Wilhelm. Ingénieurs de l'Automobile, 807, 2010.
22-Jul-13
Copyright © Infineon Technologies 2011. All rights reserved.
Page 7
Confidential
Design
Guidelines for predictable
multicore architectures
 Established by C. Cullmann, C. Ferdinand,
G. Gebhard, D. Grund, C. Maiza, J. Reineke,
B. Triquet, S. Wegener and R. Wilhelm [1]
1.
Fully timing compositional architecture
2.
Disjoint instruction and data caches
3.
Caches with LRU replacement policy
4.
A shared bus protocol with bounded
AURIX
(with bounds)
access delay
5.
Private caches
6.
Private memories, or, only
share lonely resources
[1] Predictability Considerations in the Design of Multi-Core Embedded Systems. C. Cullmann, C. Ferdinand,
G. Gebhard, D. Grund, C. Maiza, J. Reineke, B. Triquet, S. Wegener, and R. Wilhelm. Ingénieurs de
l'Automobile, 807, 2010.
22-Jul-13
Copyright © Infineon Technologies 2011. All rights reserved.
Page 8
Confidential
Some software mapping examples
 Interrupt driven tasks on one core, time driven tasks on other(s)
 Split all tasks across cores to balance performance
 With and without OS (computationally intensive functionality)
 Split by ASIL level
 Split by tier 1 / OEM
FPU
PMI
TriCore
1.6P
LMU
DMI
Overlay
 Important: Differentiate
Progr.
Flash
PMU
Progr.
Flash
Progr.
Flash
Progr.
Flash
SRI Cross Bar
between your(!) optimization
objectives (metrics) and constraints
RAM
PMU
Data Flash
BROM
Key Flash
Checker Core
Checker Core
FPU
PMI
TriCore
1.6P
FPU
DMI
Overlay
PMI
TriCore
1.6E
DMI
Overlay
Standby
 Examples for both optimization
objectives and constraints :
 Core load, memory balance,
specific task latencies,
end-to-end latencies
22-Jul-13
Copyright © Infineon Technologies 2011. All rights reserved.
Page 9
Confidential
What does mapping of software mean?
 To map: functionality to cores, code and data to memories!
To use: scheduling and communication (AUTOSAR mechanisms and/or direct mechanisms).
 Major mapping criteria: Core performance, support for lock step, memory access latencies
and sizes, access conflicts
 Summary for the software developer:
 Mapping of functionality often constrained by performance of cores, lock step and
hence is not so complex
 Mapping of code and data: Up to 9(!) different memory locations to consider: possibly
complex
FPU
PMI
TriCore
1.6P
DMI
PMU
RAM
PMU
SRI Cross Bar
PMI
22-Jul-13
Checker Core
Checker Core
FPU
FPU
TriCore
1.6P
DMI
PMI
TriCore
1.6E
Copyright © Infineon Technologies 2011. All rights reserved.
DMI
Page 10
Confidential
Real life
example: Race condition and
deadlock after few micro seconds!
PMI
TriCore
1.6P
DMI
RAM
core 0
ready
core 1 core 2
ready ready
not used
SRI Cross Bar
PMI
22-Jul-13
TriCore
1.6P
DMI
PMI
TriCore
1.6E
DMI
 3 bits of a 32 bit variable used to
synchronize all three cores,
located in RAM and cached in all
3 DMIs
Copyright © Infineon Technologies 2011. All rights reserved.
Page 11
Confidential
Real life example: Details
 3 bits of a 32 bit variable used in RAM to synchronize all 3
cores, after synchronization
cores should proceed independently.
 Atomic operation applied to 32 bit variable to modify ready
bit for each core.
PMI
TriCore
1.6P
DMI
 Because the synchronization may often be necessary, the
developer decided to cache the variable.
RAM
SRI Cross Bar
 However, atomicity refers only to physical memory location
(in this case the cache, and not LMU RAM)!
 Atomicity was wanted, but not really guaranteed, hence a
race condition was provoked!
PMI
TriCore
1.6P
DMI
PMI
TriCore
1.6E
DMI
 Incorrect status of variable (due to race condition) led to
deadlock.
 Incorrect solution of the developer: change timing via cross
bar priorities -> software was executing without deadlock,
but was still incorrect!
 Correct solution: Do not cache globally used variables!
22-Jul-13
Copyright © Infineon Technologies 2011. All rights reserved.
Page 12
Confidential
Trace, Measure
and Calibrate in Target
System with Emulation Devices (EDs)
TC1798ED
Product Chip Part (SoC)
Cerberus
Shared
Bus I/F
(DMA)
SoC
MLI0
DAP/
JTAG
SBCU
PCP2
SRI
XBAR
TC1.6
BOB
POB
BOB
POB
LMU
EEC
EMLI
Message Sequencer
MCX
MCDS
IOC32
DMC
EEC
Emulation Memory EMEM (768 KB SRAM)
Back Bone Bus BBB
EBCU
ED_block_diagram.vsd
 Tracing as non-intrusive technology (with regards to timing)
gains importance for multi-core software development
 EDs: Same package, no additional pins needed
 Calibration memory with same access speed as flash
22-Jul-13
Copyright © Infineon Technologies 2011. All rights reserved.
Page 13
Confidential
Flow for using MCDS
Target
Execution
Control
– Establish test case
GBits
Trace Qualification
Trace Message
Trace Storage
Trace Reconstruction
mHz
Debug Tool
22-Jul-13
– Configure “trigger, event, action”
– Generate only relevant
information
– Save messages chronologically
– Interpolate history
– Present in human readable form
Copyright © Infineon Technologies 2011. All rights reserved.
Page 14
Confidential
Summary
 Reaching higher performance with more cores rather than
with core and memory hierarchy optimizations might simplify
WCET, if there is little interference between the cores
 Powerful protection and tracing mechanisms will be needed to
assure non-interference or at least assess the degree of
interference
 AURIX has a focus on protection mechanisms to avoid
interference (where not needed) and powerful tracing options
to assess interference (where unavoidable)
22-Jul-13
Copyright © Infineon Technologies 2011. All rights reserved.
Page 15
Confidential
Timing
and Performance Analysis Partners,
Acknowledgements
 Academia Partners
 Prof. Dr. Bernhard Bauer, Augsburg University
 Prof. Dr. Heiko Falk, Ulm University
 Industry Partners
 AbsInt Angewandte Informatik GmbH, Gliwa GmbH,
Symtavison GmbH, Timing Architects GmbH
 Debugger Vendors with Trace support: iSYSTEM AG für
Informatiksysteme, Lauterbach GmbH, PLS Programmierbare
Logik & Systeme GmbH
 Acknowledgements
 Prof. Dr. Claire Maiza (Grenoble INP/ Ensimag)
 Reinhard Deml, Frank Hellwig,
Pawel Jewstafjew, Dr. Albrecht Mayer (Infineon)
22-Jul-13
Copyright © Infineon Technologies 2011. All rights reserved.
Page 16
Confidential
Thank you for the attention
22-Jul-13
Copyright © Infineon Technologies 2011. All rights reserved.
Page 17

Similar documents

Invited Talk - DVCon India

Invited Talk - DVCon India type (read, write, fetch, Accurate Bus Traffic Detailed Processor Trace load) Processor Generator data & number of bytes Model • Instruction Timing• ,Transferred op-code, Busaddress, Protocolmnemon...

More information

AURIX MultiCore_Lauterbach_handout.pptx

AURIX MultiCore_Lauterbach_handout.pptx 180MHz, 2.5M TC1746 TC1766/67180MHz, 2.5M

More information

(TriCore) Jul 31, 2012 | PDF

(TriCore) Jul 31, 2012 | PDF CPU load at 180MHz frequency, for the complete Field Oriented Control (FOC) algorithm. TriCore™ AURIX™ family offers multicore architecture, allowing inverter control, hybrid torque management and ...

More information