SIERRA - Computer Graphics at Stanford University

SIERRA -- A Computational Framework for
Engineering Mechanics Applications
http://www.esc.sandia.gov/sierra.html
James R. Stewart jrstewa@sandia.gov
H. Carter Edwards hcedwar@sandia.gov
Engineering Sciences Center
Sandia National Laboratories
Albuquerque, NM
1
Outline
• SIERRA concepts and overview
• Basic framework services
• Advanced framework services
2
DOE / ASCI
• Developed for
– Department of Energy (DOE)
– Accelerated Strategic Computing Initiative (ASCI)
• Challenges
– Coupled multiphysics: solid+fluid+thermal+chem.
– Large unstructured meshes: 100,000,000s of elements
– Massively parallel computing: 1,000s of processors
– Advanced algorithms: mesh erosion, h-adaptivity,
multilevel, dynamic load balancing
3
SIERRA Concept
Applications share a single framework which provides common capabilities
• Simplify utilization of ASCI supercomputers
• Consolidate common capabilities
• Eliminate redundant development and maintenance
• Encourage architecturally similar applications
Application developers work in a common software development environment
• Uniform access to ASCI resources
• Utilize a common source code repository
• Coordinate development efforts
• Consolidate the set of software development tools
• Share software development processes
4
SIERRA Contributors
• Framework
– Noel Belcourt
– Kevin Copps
– Carter Edwards
– Jonathan Rath
– Greg Sjaardema
– Jim Stewart
– Alan Williams
• Tools
– Kathy Aragon
– Dorothy Brethauer
– Christi Forsythe
– Mark Hamilton
– Erik Illescas
+ many Application Developers!
5
Migration of Sandia
Application Codes
Current stand-alone codes
• VIPAR: parachute performance code, vortex method coupled with transient dynamics
• PRONTO: transient dynamics, Lagrangian solid mechanics
• JAS: quasi-static solid mechanics
• COYOTE: thermal mechanics with chemistry
• GOMA: incompressible fluid mechanics with free surface
• SALINAS: structural dynamics
• SACCARA: compressible fluid mechanics
• DAKOTA: design optimization
Current and future SIERRA-based codes (stand-alone predecessor in parentheses)
• PREMO (SACCARA)
• FUEGO/SYRINX
• DAKOTA
• VIPAR
• SALINAS
• KRINO
• ANDANTE
• PRESTO (PRONTO)
• ARIA (GOMA)
• CALORE (COYOTE)
• ADAGIO (JAS)
6
Basic Framework Services
[Diagram: SIERRA Kernel with the following services]
• Mechanics Mgmt
• Field Mgmt
• Mesh Mgmt
• Parallel Communications
• Bulk Mesh Data I/O
• User Input Parsing
• Linear Solver I/F
• Master Element I/F
8
Mesh Management
• Unstructured mesh
– Arbitrary mesh object connections
– Mixed element topologies (hex, tet, quad, …)
• Fully distributed mesh data structure
• Dynamic creation/deletion of mesh objects
• Can define mesh subsets
– Define by part, material type, boundary, constraint, …
– Define unions and intersections of subsets
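The subset operations above can be sketched with plain sets. This is a minimal illustration only: the identifiers, the subset names, and the helper functions `union` and `intersection` are made up here and are not SIERRA's interface.

```python
# Hypothetical sketch: mesh subsets as sets of mesh-object identifiers.
# All identifiers and subset names below are invented for illustration.

part_block_1 = {100, 200, 300}      # mesh objects in one part
material_steel = {200, 300, 400}    # mesh objects of one material type
boundary_inlet = {410, 420}         # mesh objects on one boundary


def union(*subsets):
    """Mesh objects that belong to at least one of the given subsets."""
    result = set()
    for subset in subsets:
        result |= subset
    return result


def intersection(first, *rest):
    """Mesh objects that belong to every one of the given subsets."""
    result = set(first)
    for subset in rest:
        result &= subset
    return result


steel_in_block_1 = intersection(part_block_1, material_steel)
block_1_or_inlet = union(part_block_1, boundary_inlet)
print(sorted(steel_in_block_1))
print(sorted(block_1_or_inlet))
```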
9
Field Management
• Application defined fields (a.k.a. variables)
– Text name
– Type (int, real, vector, full tensor, symmetric tensor, …)
– Aggregate types (e.g. collection of material variables)
– Optionally associated with a master element
(interpolation field, integration field)
• Fields are defined on mesh objects
• Associated with a mesh subset
– ( field , mesh-object ∈ subset ) → allocated value
– ( field , mesh-object ∉ subset ) → NO value
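The rule that a field has an allocated value only on mesh objects inside its subset can be sketched as follows. The class `Field` and its methods are hypothetical names chosen for this example, and a dictionary stands in for whatever storage the framework really uses.

```python
class Field:
    """Hypothetical field: values exist only for mesh objects in its subset."""

    def __init__(self, name, kind, subset, initial=0.0):
        self.name = name            # text name
        self.kind = kind            # e.g. "int", "real", "vector"
        self.subset = frozenset(subset)
        # Allocate one value per mesh object in the subset, and nowhere else.
        self._values = {obj: initial for obj in self.subset}

    def has_value(self, mesh_object):
        return mesh_object in self._values

    def get(self, mesh_object):
        if mesh_object not in self._values:
            raise KeyError(f"field {self.name!r} has no value on {mesh_object}")
        return self._values[mesh_object]

    def set(self, mesh_object, value):
        if mesh_object not in self._values:
            raise KeyError(f"field {self.name!r} has no value on {mesh_object}")
        self._values[mesh_object] = value


temperature = Field("temperature", "real", subset={100, 200})
temperature.set(100, 300.0)
print(temperature.get(100), temperature.has_value(300))
```

Raising an error for objects outside the subset is a design choice of this sketch; it makes the "NO value" case explicit instead of returning a default.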
10
Mesh/Field Management
[Figure: Finite element mesh as a collection of mesh objects (identifiers 100, 200, 300, 400, 410, 420, 430, 440, 441, 442, 443, 444), each carrying its own data, which is gathered into a contiguous workset array]
11
Mesh/Field Management
[Figure: mesh objects 442, 420, 443, 100, 300, 200, 410, 430, 441, 444, each with its data, laid out in workset arrays]
Workset arrays are:
• Delivered to the mechanics algorithm in Fortran array order
• Sized to minimize cache misses
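A rough sketch of the two workset properties named above: objects are grouped so that one workset fits a cache budget, and each workset is packed in column-major (Fortran) order. The function names, the 32 KiB default budget, and the choice of array dimensions (objects first, components second) are all assumptions made for this example; the slides do not state how SIERRA dimensions its arrays.

```python
def make_worksets(object_ids, values_per_object, cache_bytes=32 * 1024,
                  bytes_per_value=8):
    """Split mesh objects into worksets whose data fits the cache budget."""
    bytes_per_object = values_per_object * bytes_per_value
    workset_size = max(1, cache_bytes // bytes_per_object)
    return [object_ids[i:i + workset_size]
            for i in range(0, len(object_ids), workset_size)]


def pack_fortran_order(workset, field):
    """Pack field[obj] (a tuple of components) into one flat list.

    The result is the column-major layout of an array dimensioned
    (len(workset), n_components): the object index varies fastest,
    so component 0 of every object comes before component 1.
    """
    n_components = len(field[workset[0]])
    return [field[obj][c] for c in range(n_components) for obj in workset]


# Made-up two-component field on three mesh objects.
field = {442: (1.0, 2.0), 420: (3.0, 4.0), 443: (5.0, 6.0)}
# A tiny 32-byte budget forces two objects (2 * 2 * 8 bytes) per workset.
worksets = make_worksets([442, 420, 443], values_per_object=2, cache_bytes=32)
packed = pack_fortran_order(worksets[0], field)
print(worksets, packed)
```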
12
Finite Element Services
• SIERRA provides a master element interface
• Elements are implemented and shared by
application developers
• Current (partial) list of master elements (each
with fully integrated, uniform gradient, and control
surface/volume versions)
– 8-node hexahedron
– 4-node tetrahedron
– 4-node quadrilateral (in 2D and 3D, including shell)
– 3-node triangle (in 3D, including shell)
– 6-node wedge
– Others, and more on the way…
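As an illustration of what a master element supplies, the sketch below evaluates the standard bilinear shape functions of a 4-node quadrilateral on the reference square. The function names are hypothetical and the node ordering (counter-clockwise from the lower-left corner) is an assumption; this is not the SIERRA master element interface.

```python
# Reference coordinates (xi, eta) of the four nodes on [-1, 1] x [-1, 1],
# ordered counter-clockwise from the lower-left corner (an assumption).
QUAD4_NODES = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]


def quad4_shape(xi, eta):
    """Bilinear shape functions N_a = (1 + xi*xi_a)(1 + eta*eta_a) / 4."""
    return [0.25 * (1.0 + xi * a) * (1.0 + eta * b) for a, b in QUAD4_NODES]


def quad4_interpolate(nodal_values, xi, eta):
    """Interpolate a nodal field to the reference point (xi, eta)."""
    return sum(n * v for n, v in zip(quad4_shape(xi, eta), nodal_values))


center_value = quad4_interpolate([1.0, 2.0, 3.0, 4.0], 0.0, 0.0)
print(center_value)
```

At the element center every shape function equals 1/4, so the interpolated value is the average of the nodal values.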
13
Mechanics Management
• A mechanics module consists of algorithms
and supporting data
– Associate mesh subsets with mechanics modules
– Declare fields to support mechanics modules
– Use zero or more master elements
• Mechanics modules are supplied by the application
– Examples: solvers, BCs, external forces, etc.
– Nest mechanics modules inside other modules
14
Mechanics Module Hierarchy
Domain
– Procedure (time step control)
  – Region A (single step of physics A): Mechanics; Mesh and Fields
  – Region B (single step of physics B): Mechanics; Mesh and Fields
  – Transfer (between the regions' meshes and fields)
15
Parallel Communication Services
• Simple scalar reductions (e.g. global dot
product)
• Global assembly
– Update/sum field values on subdomain boundary
• Inter-processor operations
– Communicate fields between processors (MPI)
• Multiple domain decompositions
– Redistribute mesh between decompositions
– SIERRA uses the Zoltan (SNL) partitioning library
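Global assembly, as described above, sums the partial values that each processor holds for a node on a shared subdomain boundary. The sketch below simulates the processors serially with one dictionary each; it uses no MPI, and the function name `global_assembly_sum` is hypothetical.

```python
def global_assembly_sum(local_fields):
    """Sum shared nodal values across simulated processors.

    local_fields holds one dict per processor, mapping node id to that
    processor's partial value. Every processor that holds a node ends up
    with the sum over all processors holding it.
    """
    totals = {}
    for local in local_fields:
        for node, value in local.items():
            totals[node] = totals.get(node, 0.0) + value
    return [{node: totals[node] for node in local} for local in local_fields]


# Node 2 lies on the boundary between the two subdomains.
p0 = {1: 1.0, 2: 0.5}
p1 = {2: 0.5, 3: 2.0}
assembled = global_assembly_sum([p0, p1])
print(assembled)
```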
16
Linear Solver Interface
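[Diagram: Application Mechanics passes Element, Boundary, and Constraint Contributions to the Linear Solver Interface and receives Solution Values. Layers below the interface: Finite Element Interface (FEI), then the solver packages:]
• Prometheus
• Trilinos
• Spooles
• PETSc
• Others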
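The flow in this slide (contributions in, solution values out, interchangeable solver back ends) can be sketched with a small dense system. Everything here is an assumption for illustration: the function names are invented, the matrix is dense, and a hand-written Gaussian elimination stands in for a real solver package such as those listed above.

```python
def assemble(n_dofs, element_contributions):
    """Sum element matrices and load vectors into a global system."""
    K = [[0.0] * n_dofs for _ in range(n_dofs)]
    f = [0.0] * n_dofs
    for dofs, ke, fe in element_contributions:
        for a, i in enumerate(dofs):
            f[i] += fe[a]
            for b, j in enumerate(dofs):
                K[i][j] += ke[a][b]
    return K, f


def apply_fixed_dof(K, f, dof, value=0.0):
    """Boundary contribution: prescribe one degree of freedom in place."""
    for i in range(len(f)):
        if i != dof:
            f[i] -= K[i][dof] * value
        K[i][dof] = 0.0
        K[dof][i] = 0.0
    K[dof][dof] = 1.0
    f[dof] = value


def gauss_solve(K, f):
    """Stand-in solver back end: elimination with partial pivoting."""
    n = len(f)
    A = [row[:] + [rhs] for row, rhs in zip(K, f)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x


def solve_system(n_dofs, contributions, fixed, solver=gauss_solve):
    """The application supplies contributions; any solver callable fits."""
    K, f = assemble(n_dofs, contributions)
    for dof, value in fixed.items():
        apply_fixed_dof(K, f, dof, value)
    return solver(K, f)


# Two unit-stiffness springs in series, node 0 fixed, unit force on node 2.
spring = [[1.0, -1.0], [-1.0, 1.0]]
contributions = [((0, 1), spring, [0.0, 0.0]), ((1, 2), spring, [0.0, 1.0])]
solution = solve_system(3, contributions, fixed={0: 0.0})
print(solution)
```

Passing the solver as a callable is the design point of the sketch: the application code that builds contributions does not change when the back end does.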
17
User Input Parsing
[Diagram: user input parsing data flow]
• Application's Mechanics: Command Registration with the SIERRA Parser; receives Parameter Values
• SIERRA Parser: Parse Commands from the User Input File; Query Specifications
• Command Specifications' XML Database
• Generate Documentation: User Command Specifications' HTML Pages
18
Scaling Tests: 3 Applications
• Presto: explicit dynamics
• Adagio: quasi-statics
• Calore: thermal conduction and enclosure radiation
[Plots: three log-log panels of Scaling (axis ticks 0.1 to 10) versus Number of Processors (axis ticks 1 to 10000), each with series for ASCI-Red, ASCI-Blue-Mtn, and ASCI-Blue-Pac; one vertical axis is labeled "Scaling of Nonlinear Iteration"]
• Presto Scaling for 2K Elements/Processor, Relative to 32 Processors on ASCI-Red
• Adagio Scaling for 2K Elements/Processor, Relative to 32 Processors on ASCI-Red
• Calore Scaling for 10K Elements/Processor, Relative to 32 Processors on ASCI-Red
19
Advanced Framework Services
[Diagram: SIERRA Kernel with the basic services (Mechanics Mgmt, Field Mgmt, Mesh Mgmt, Parallel Communications, Bulk Mesh Data I/O, User Input Parsing, Linear Solver I/F, Master Element I/F) plus the advanced services:]
• Field Transfers
• H-Adaptivity
• Load Balancing
• Element Death
22
Key Concept:
Multiple Domain Decompositions
• Mechanics algorithm chooses an efficient domain decomposition
• Mechanics algorithms are independent of a specific decomposition
Four Processor Example
• Primary Decomposition - graph based,
best suited for element computations
• Secondary Decomposition – geometry based,
best suited for search algorithms.
Original Finite Element Mesh
23
Dynamic Load Balancing:
Surface Example
[Figure: three panels showing a surface on processors P0 and P1]
(a) Original Undecomposed Model
(b) Primary Decomposition: topology based; all of the surface is on P0
– Primary Surface: nodes, faces, edges
– NOT balanced for surface algorithm
Load Balancing Decomposition
(c) Secondary Decomposition: geometry based; surface distributed on P0 & P1
– Secondary Surface: nodes, faces, edges
– Balanced for surface algorithm
Two copies of the same surface (nodes, edges, faces) using exactly
the same amount of memory. However, they are distributed over the
processors differently for each view.
24
Key Technology:
Parallel Communication Specification
“CommSpec”
• Relation: Source-Mesh → Destination-Mesh
– { (MeshObject,SrcProc) → (MeshObject,DestProc) }
• Symmetric: Relation = converse(Relation)
– Shared mesh objects on inter-processor boundaries
• Nonsymmetric
– Arbitrary on- and off-processor mesh object connectivity
– Inter-mesh transfers, load balancing, periodic boundary conditions, contact algorithms, …
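The relation view of a CommSpec can be sketched as a set of pairs, each mapping (mesh object, source processor) to (mesh object, destination processor). The representation as Python tuples and the helper names `converse` and `is_symmetric` are assumptions for illustration; the object and processor numbers are made up.

```python
def converse(relation):
    """Swap the source and destination of every pair in the relation."""
    return {(dst, src) for src, dst in relation}


def is_symmetric(relation):
    """A relation is symmetric when it equals its own converse."""
    return relation == converse(relation)


# Node 7 is shared by processors 0 and 1: each sends to the other.
shared_nodes = {
    ((7, 0), (7, 1)),
    ((7, 1), (7, 0)),
}

# One-way transfer from object 7 on processor 0 to object 42 on processor 1.
transfer = {
    ((7, 0), (42, 1)),
}

print(is_symmetric(shared_nodes), is_symmetric(transfer))
```

Taking the converse of a nonsymmetric relation gives the return path, for example to send results back along the same connectivity.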
25
Parallel Transfer Algorithm
• Transfer nodal field from source to destination
• Mesh decompositions not geometrically aligned
⇒ must “geometrically rendezvous” the meshes
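[Diagram: data flow among four meshes: Source Mesh, Source Rendezvous Mesh, Destination Rendezvous Mesh, Destination Mesh]
• Determine common geometric decomposition for rendezvous
• Source Mesh to Source Rendezvous Mesh: Field & Mesh Data (CommSpec A)
• Destination Mesh to Destination Rendezvous Mesh: Mesh Data (CommSpec B)
• On-processor search (CommSpec C)
• On-processor interpolation
• Interpolated Results (CommSpec B^T)
• Field Data and search/interpolation results are exchanged through compositions of CommSpecs A, B, and C
26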
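The two on-processor steps of the transfer (search, then interpolation) can be sketched in one dimension. This example leaves out the parallel rendezvous entirely and assumes a 1D source mesh with sorted node coordinates and linear elements; the function names are hypothetical.

```python
from bisect import bisect_right


def find_element(source_x, x):
    """Search step: index e with source_x[e] <= x <= source_x[e + 1]."""
    if not source_x[0] <= x <= source_x[-1]:
        raise ValueError(f"point {x} lies outside the source mesh")
    # Clamp so that the last node maps to the last element.
    return min(bisect_right(source_x, x) - 1, len(source_x) - 2)


def transfer_nodal_field(source_x, source_values, destination_x):
    """Interpolation step: linear interpolation at each destination node."""
    result = []
    for x in destination_x:
        e = find_element(source_x, x)
        x0, x1 = source_x[e], source_x[e + 1]
        t = (x - x0) / (x1 - x0)
        result.append((1.0 - t) * source_values[e] + t * source_values[e + 1])
    return result


source_x = [0.0, 1.0, 2.0]            # source mesh nodes (sorted)
source_values = [10.0, 20.0, 40.0]    # nodal field on the source mesh
destination_x = [0.5, 1.0, 1.5, 2.0]  # destination nodes, not aligned
transferred = transfer_nodal_field(source_x, source_values, destination_x)
print(transferred)
```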
Coupled Calore/Fuego
(Thermal/Flow) Pipe Flow Problem
Calore
• Mesh both pipe and fluid
• Transfer fluid temperature
to Fuego
Fuego
• Different mesh for fluid
• Transfer fluid velocity to
Calore
27
H-Adaptivity: Overview
• Goal: Selectively refine elements to
achieve solution accuracy more efficiently
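[Flowchart: adaptive loop with the following steps]
• Solve Physics
• Estimate Solution Error Distribution
• Stopping Criterion Satisfied? (Yes: Proceed to Next Timestep; No: continue adapting)
• Mark Elements (Adaptive Strategy)
• Resolve Markers (2:1)
• Restrict Variables
• Global Mesh Update
• Prolong Variables
Legend: each step is performed by the Application, by SIERRA, or by Either
28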
H-Adaptivity
Restrict Variables → Global Mesh Update → Prolong Variables
Global Mesh Update consists of:
• Unrefine Mesh
• Rebalance Load
• Refine Mesh
Rebalancing is done when the mesh is “smallest”!
29
Adaptive Mesh Refinement
• Elements are hierarchically split into child
elements
• Parent elements are retained in the data structure
• Hanging node constraints are handled by the
application code
• 2:1 refinement ratio is globally enforced
[Figure: elements marked R on processors P0 and P1; 2:1 refinement ratio enforced. R = Refine element]
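Enforcing the 2:1 ratio means that marking one element for refinement can force its coarser neighbors to be refined too. The sketch below shows this marker resolution on a 1D row of leaf elements. It is a simplification under stated assumptions: neighbors are just adjacent list entries, the input mesh already satisfies the 2:1 rule, and the name `resolve_markers` is hypothetical.

```python
def resolve_markers(levels, marked):
    """Enlarge the marked set so that refinement keeps the 2:1 rule.

    levels[i] is the refinement level of leaf element i in a 1D row.
    After refinement a marked element sits one level deeper; adjacent
    elements may then differ by at most one level.
    """
    marked = set(marked)
    changed = True
    while changed:
        changed = False
        for i in range(len(levels) - 1):
            for a, b in ((i, i + 1), (i + 1, i)):
                new_a = levels[a] + (a in marked)
                new_b = levels[b] + (b in marked)
                # Element a would be two levels finer than b: refine b too.
                if new_a - new_b > 1 and b not in marked:
                    marked.add(b)
                    changed = True
    return marked


# Refining the finest element cascades down the whole row.
cascade = resolve_markers(levels=[0, 1, 2], marked={2})
# Here the neighbor is already fine enough, so nothing is added.
no_cascade = resolve_markers(levels=[1, 1, 1], marked={0})
print(sorted(cascade), sorted(no_cascade))
```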
30
Adaptive Mesh Unrefinement
• Parent elements are restored by merging their
children
• Child elements are then deleted from the data
structure
[Figure: four child elements marked U. U = Unrefine element]
31
Adaptive Mesh Unrefinement
• All children must be marked for unrefinement to occur
[Figure: only some of the children are marked U. No unrefinement!]
• Unrefinement will not break the 2:1 refinement ratio
[Figure: all children are marked U, but merging them would violate the 2:1 ratio. No unrefinement!]
U = Unrefine element
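The two unrefinement rules can be sketched as a single check. The data layout (dictionaries for children, levels, and neighbors) and the name `can_unrefine` are assumptions for this example, and the sketch ignores the case where a fine neighbor is itself being unrefined in the same pass.

```python
def can_unrefine(parent, children, marked, level, neighbors):
    """Return True if the parent's children may be merged."""
    kids = children[parent]
    # Rule 1: every child must carry the unrefine marker.
    if any(child not in marked for child in kids):
        return False
    # Rule 2: after merging, the parent becomes the leaf, so no
    # neighboring leaf may be more than one level finer than the parent.
    for child in kids:
        for other in neighbors.get(child, ()):
            if other not in kids and level[other] - level[parent] > 1:
                return False
    return True


children = {"A": ["a1", "a2", "a3", "a4"]}
level = {"A": 0, "a1": 1, "a2": 1, "a3": 1, "a4": 1, "b": 1, "c": 2}
all_marked = {"a1", "a2", "a3", "a4"}

only_three = can_unrefine("A", children, {"a1", "a2", "a3"}, level, {})
coarse_neighbor = can_unrefine("A", children, all_marked, level,
                               {"a2": ["a1", "b"]})
fine_neighbor = can_unrefine("A", children, all_marked, level,
                             {"a2": ["a1", "c"]})
print(only_three, coarse_neighbor, fine_neighbor)
```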
32
Dynamic Load Rebalancing
• Option 1: Rebalance “genesis” mesh only
– Parents and children are not allowed to split
[Figure: genesis mesh partitioned between P0 and P1, and the same mesh after refinement]
Mesh after 3 refinements:
– P0: 3 elements
– P1: 10 elements
33
Dynamic Load Rebalancing
• Option 2: Allow parent-child splitting among
processors
[Figure: genesis mesh partitioned between P0 and P1, and the same mesh after refinement]
Mesh after 3 refinements:
– P0: 6 elements
– P1: 7 elements
Tradeoff: More communication required, but the mesh is more evenly balanced
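The element counts on the two rebalancing slides can be compared with a simple imbalance measure. The metric used here (load of the busiest processor divided by the average load) is an assumption chosen for illustration; the slides do not say which measure SIERRA uses.

```python
def imbalance(elements_per_processor):
    """Busiest processor's load over the average load (1.0 is perfect)."""
    average = sum(elements_per_processor) / len(elements_per_processor)
    return max(elements_per_processor) / average


option_1 = imbalance([3, 10])   # genesis mesh only: P0 = 3, P1 = 10
option_2 = imbalance([6, 7])    # parent-child splitting: P0 = 6, P1 = 7
print(round(option_1, 3), round(option_2, 3))
```

With 13 elements on two processors the average is 6.5, so option 1 leaves the busiest processor with roughly 54% more than its share and option 2 with roughly 8% more.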
34
H-Adaptivity Example:
Electron Beam Rastering
• 3D Si wafer at room temperature
• Beam is modeled by time-dependent heat flux
• Heating is very localized
• A posteriori error indicator: H1 seminorm of temperature
35
Electron Beam Rastering
• Temperature in K
• Two refinements per time step
• Dynamic load rebalancing
[Figure: Solution after 8 time steps]
36
Electron Beam Rastering (cont.)
• Temperature in K
• Two refinements per time step
[Figure: Solution after 32 time steps]
37
Wafer Slice and Rotation
38
Electron Beam Rastering
8 Proc Dynamic Load Rebalancing
Movie
39
Summary
• SIERRA provides scalable framework services
needed by mechanics application codes
• Application code developers focus on mechanics, not
computer science
• Software frameworks such as SIERRA will become increasingly common [e.g., Trellis (Rensselaer Polytechnic Institute), Deal.II and UG (Univ. of Heidelberg), others…]
• “Supporting Infrastructures” mini-symposium at 5th
World Congress on Computational Mechanics,
Vienna, July 2002
40
EXTRA SLIDES
41
Element Refinement Template
• Defines topology and connectivity of children
• Refinement must maintain validity of mesh
• 2D examples:
42
Bulk Mesh Data Input/Output
• Services
– Parallel IO for mesh topology and field values
– Restart
• Simple application interface, specify:
– What files for input/output
– Which mesh subsets and fields
– When to output
• Transparent access to multiple file formats
– ExodusII (SNL), DMF (ASCI Tri-Lab), …
43