SIERRA - Computer Graphics at Stanford University
SIERRA -- A Computational Framework for Engineering Mechanics Applications
http://www.esc.sandia.gov/sierra.html

James R. Stewart, jrstewa@sandia.gov
H. Carter Edwards, hcedwar@sandia.gov
Engineering Sciences Center
Sandia National Laboratories
Albuquerque, NM

Outline
• SIERRA concepts and overview
• Basic framework services
• Advanced framework services

DOE / ASCI
• Developed for
  – Department of Energy (DOE)
  – Accelerated Strategic Computing Initiative (ASCI)
• Challenges
  – Coupled multiphysics: solid + fluid + thermal + chem.
  – Large unstructured meshes: 100,000,000s of elements
  – Massively parallel computing: 1,000s of processors
  – Advanced algorithms: mesh erosion, h-adaptivity, multilevel, dynamic load balancing

SIERRA Concept
Applications share a single framework which provides common capabilities
• Simplify utilization of ASCI supercomputers
• Consolidate common capabilities
• Eliminate redundant development and maintenance
• Encourage architecturally similar applications
Application developers work in a common software development environment
• Uniform access to ASCI resources
• Utilize a common source code repository
• Coordinate development efforts
• Consolidate the set of software development tools
• Share software development processes

SIERRA Contributors
• Framework
  – Noel Belcourt
  – Kevin Copps
  – Carter Edwards
  – Jonathan Rath
  – Greg Sjaardema
  – Jim Stewart
  – Alan Williams
• Tools
  – Kathy Aragon
  – Dorothy Brethauer
  – Christi Forsythe
  – Mark Hamilton
  – Erik Illescas
+ many Application Developers!

Migration of Sandia Application Codes
• VIPAR: parachute performance code, vortex method coupled with transient dynamics
• PRONTO: transient dynamics, Lagrangian solid mechanics
• JAS: quasi-static solid mechanics
• COYOTE: thermal mechanics with chemistry
• GOMA: incompressible fluid mechanics with free surface
• SALINAS: structural dynamics
• SACCARA: compressible fluid mechanics
• DAKOTA: design optimization
[Figure: codes arranged around SIERRA: FUEGO/SYRINX, DAKOTA, VIPAR, SALINAS, KRINO, ANDANTE, PRESTO (PRONTO), ARIA (GOMA), CALORE (COYOTE), ADAGIO (JAS), PREMO (SACCARA). Legend: current and future SIERRA-based codes; current stand-alone codes.]

Basic Framework Services
[Figure: framework services diagram with the labels SIERRA Kernel, Mechanics Mgmt, Field Mgmt, Mesh Mgmt, Parallel Communications, Bulk Mesh Data I/O, User Input Parsing, Linear Solver I/F, Master Element I/F.]

Mesh Management
• Unstructured mesh
  – Arbitrary mesh object connections
  – Mix element topologies (hex, tet, quad, …)
• Fully distributed mesh data structure
• Dynamic creation/deletion of mesh objects
• Can define mesh subsets
  – Define by part, material type, boundary, constraint, …
  – Define unions and intersections of subsets

Field Management
• Application defined fields (a.k.a. variables)
  – Text name
  – Type (int, real, vector, full tensor, symmetric tensor, …)
  – Aggregate types (e.g.
    collection of material variables)
  – Optionally associated with a master element (interpolation field, integration field)
• Fields are defined on mesh objects
• Associated with a mesh subset
  – ( field , mesh-object ∈ subset ) → allocated value
  – ( field , mesh-object ∉ subset ) → NO value

Mesh/Field Management
[Figure: a finite element mesh shown as a collection of numbered mesh objects, each with its data, gathered into a contiguous workset array.]

Mesh/Field Management
Workset arrays are:
• Delivered to the mechanics algorithm in Fortran array order
• Sized to minimize cache misses

Finite Element Services
• SIERRA provides a master element interface
• Elements are implemented and shared by application developers
• Current (partial) list of master elements (each with fully integrated, uniform gradient, and control surface/volume versions)
  – 8-node hexahedron
  – 4-node tetrahedron
  – 4-node quadrilateral (in 2D and 3D, including shell)
  – 3-node triangle (in 3D, including shell)
  – 6-node wedge
  – Others, and more on the way…

Mechanics Management
• A mechanics module consists of algorithms and supporting data
  – Associate mesh subsets with mechanics modules
  – Declare fields to support mechanics modules
  – Uses zero-to-many master elements
• Mechanics modules are supplied by application
  – Examples: solvers, BC’s, external forces, etc.
  – Nest mechanics modules inside other modules

Mechanics Module Hierarchy
[Figure: nested hierarchy. A Domain holds a Procedure (time step control), which holds Region A (single step of physics A) and Region B (single step of physics B). Each region has its own Mechanics and its own Mesh and Fields; a Transfer links the two regions.]

Parallel Communication Services
• Simple scalar reductions (e.g.
  global dot product)
• Global assembly
  – Update/sum field values on subdomain boundary
• Inter-processor operations
  – Communicate fields between processors (MPI)
• Multiple domain decompositions
  – Redistribute mesh between decompositions
  – SIERRA uses the Zoltan (SNL) partitioning library

Linear Solver Interface
[Figure: diagram with the labels Application Mechanics; Element, Boundary, and Constraint Contributions; Linear Solver Interface; Solution Values; Finite Element Interface (FEI); and the solvers Prometheus, Trilinos, Spooles, PETSc, Others.]

User Input Parsing
[Figure: diagram with the labels Application’s Mechanics, Command Registration, Parameter Values, SIERRA Parser, Query Specifications, Command Specifications’ XML Database, Generate Documentation, Parse Commands, User Input File, and User Command Specifications’ HTML Pages.]

Scaling Tests: 3 Applications
• Presto: explicit dynamics
• Adagio: quasi-statics
• Calore: thermal conduction and enclosure radiation
[Figure: three plots of scaling against Number of Processors (1 to 10000), each with the series ASCI-Red, ASCI-Blue-Mtn, and ASCI-Blue-Pac. Panel titles: Presto Scaling for 2K Elements/Processor Relative to 32 Processors on ASCI-Red; Adagio Scaling for 2K Elements/Processor Relative to 32 Processors on ASCI-Red; Calore Scaling for 10K Elements/Processor Relative to 32 Processors on ASCI-Red. Axis labels: Scaling; Scaling of Nonlinear Iteration.]

Advanced Framework Services
[Figure: the basic framework services diagram (SIERRA Kernel, Mechanics Mgmt, Field Mgmt, Mesh Mgmt, Parallel Communications, Bulk Mesh Data I/O, User Input Parsing, Linear Solver I/F, Master Element I/F) extended with Field Transfers, H-Adaptivity, Load Balancing, and Element Death.]

Key Concept: Multiple Domain Decompositions
• Mechanics algorithm chooses an efficient domain decomposition
• Mechanics algorithms are independent of a specific decomposition
Four Processor Example
• Primary Decomposition: graph based, best suited for element computations
• Secondary Decomposition: geometry based, best suited for search algorithms
[Figure: Original Finite Element Mesh]

Dynamic Load Balancing: Surface Example
[Figure: a surface in a model split over processors P0 and P1, in three panels.
  (a) Original Undecomposed Model
  (b) Primary Decomposition, Topology Based: all of surface is on P0. Primary Surface (nodes, faces, edges): NOT balanced for surface algorithm.
  (c) Secondary Decomposition, Geometry Based: distributed surface on P0 & P1. Secondary Surface (nodes, faces, edges): balanced for surface algorithm.
  Label between (b) and (c): Load Balancing Decomposition.]
Two copies of the same surface (nodes, edges, faces) using exactly the same amount of memory. However, they are distributed over the processors differently for each view.

Key Technology: Parallel Communication Specification “CommSpec”
• Relation: Source-Mesh → Destination-Mesh
  – { (MeshObject,SrcProc) → (MeshObject,DestProc) }
• Symmetric: Relation = converse(Relation)
  – Shared mesh objects on inter-processor boundaries
• Nonsymmetric
  – Arbitrary on and off processor mesh object connectivity
  – Inter-mesh transfers, load balancing, periodic boundary conditions, contact algorithms, …

Parallel Transfer Algorithm
• Transfer nodal field from source to destination
• Mesh decompositions not geometrically aligned ⇒ must “geometrically rendezvous” the meshes
[Figure: transfer diagram with the labels Source Mesh, Destination Mesh, Source Rendezvous Mesh, Field & Mesh Data (CommSpec A), Interpolated Results, and Determine common geometric decomposition for rendezvous.]
[Figure, continued: further labels are Destination Rendezvous Mesh, Mesh Data (CommSpec B), On-processor search (CommSpec C), On-processor interpolation, and Results Field Data; the arrows are annotated with CommSpecs A, B, and C and their compositions.]

Coupled Calore/Fuego (Thermal/Flow) Pipe Flow Problem
Calore
• Mesh both pipe and fluid
• Transfer fluid temperature to Fuego
Fuego
• Different mesh for fluid
• Transfer fluid velocity to Calore

H-Adaptivity: Overview
• Goal: Selectively refine elements to achieve solution accuracy more efficiently
[Figure: flowchart with the steps Solve Physics; Estimate Solution Error Distribution; Stopping Criterion Satisfied? (Yes: Proceed to Next Timestep; No: continue); Mark Elements (Adaptive Strategy); Resolve Markers (2:1); Restrict Variables; Global Mesh Update; Prolong Variables. A legend assigns each step to Application, SIERRA, or Either.]

H-Adaptivity Global Mesh Update
[Figure: the Global Mesh Update expanded into Restrict Variables, Unrefine Mesh, Rebalance Load, Refine Mesh, and Prolong Variables.]
Rebalancing is done when the mesh is “smallest”!

Adaptive Mesh Refinement
• Elements are hierarchically split into child elements
• Parent elements are retained in the data structure
• Hanging node constraints are handled by the application code
• 2:1 refinement ratio is globally enforced
[Figure: mesh on processors P0 and P1 with elements marked R (R = Refine element); 2:1 refinement ratio enforced.]

Adaptive Mesh Unrefinement
• Parent elements are restored by merging their children
• Child elements are then deleted from the data structure
[Figure: child elements marked U (U = Unrefine element).]

Adaptive Mesh Unrefinement
• All children must be marked for unrefinement to occur
[Figure: not all children are marked U (U = Unrefine element). No unrefinement!]
• Unrefinement will not break the 2:1 refinement ratio
[Figure: elements marked U whose merge would break the 2:1 ratio. No unrefinement!]
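
The two unrefinement rules on the Adaptive Mesh Unrefinement slides can be sketched in code. This is a minimal sketch on a 1D mesh, assuming each parent splits into exactly two children; the `Leaf` class and the `can_unrefine` function are hypothetical names chosen for illustration and are not SIERRA's interface.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    level: int    # refinement depth; 0 is the original (genesis) mesh
    parent: int   # id of the parent element (hypothetical numbering)
    marked: bool  # True if flagged "U" for unrefinement


def can_unrefine(leaves, start, n_children=2):
    """Return True if leaves[start:start + n_children] may merge into their parent.

    `leaves` is the ordered list of active elements of a 1D mesh.
    """
    group = leaves[start:start + n_children]
    if len(group) < n_children or group[0].level == 0:
        return False
    first = group[0]
    # The group must be siblings: same parent and same level.
    if any(l.parent != first.parent or l.level != first.level for l in group):
        return False
    # Rule 1: all children must be marked.
    if not all(l.marked for l in group):
        return False
    # Rule 2: the restored parent must stay within one level of its neighbors.
    parent_level = first.level - 1
    neighbors = []
    if start > 0:
        neighbors.append(leaves[start - 1])
    if start + n_children < len(leaves):
        neighbors.append(leaves[start + n_children])
    return all(abs(n.level - parent_level) <= 1 for n in neighbors)


# Both children of parent 0 are marked and the neighbor is one level finer: allowed.
mesh = [Leaf(1, 0, True), Leaf(1, 0, True), Leaf(1, 1, False), Leaf(1, 1, False)]
print(can_unrefine(mesh, 0))  # True
```

In 2D or 3D the neighbor check would run over all elements sharing a face or edge with the parent, but the two conditions stay the same.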

Dynamic Load Rebalancing
• Option 1: Rebalance “genesis” mesh only
  – Parents and children are not allowed to split
[Figure: genesis mesh on P0 and P1, and the mesh after 3 refinements: P0 has 3 elements, P1 has 10 elements.]

Dynamic Load Rebalancing
• Option 2: Allow parent-child splitting among processors
[Figure: genesis mesh on P0 and P1, and the mesh after 3 refinements: P0 has 6 elements, P1 has 7 elements.]
Tradeoff: More communication required, but the mesh is more evenly balanced

H-Adaptivity Example: Electron Beam Rastering
• 3D Si wafer at room temperature
• Beam is modeled by time-dependent heat flux
• Heating is very localized
• A posteriori error indicator: H1 seminorm of temperature

Electron Beam Rastering
• Temperature in K
• Two refinements per time step
• Dynamic load rebalancing
[Figure: solution after 8 time steps]

Electron Beam Rastering (cont.)
• Temperature is in K
• Two refinements performed each time step
[Figure: solution after 32 time steps]

Wafer Slice and Rotation
[Figure]

Electron Beam Rastering
[Movie: 8 Proc Dynamic Load Rebalancing]

Summary
• SIERRA provides scalable framework services needed by mechanics application codes
• Application code developers focus on mechanics, not computer science
• Software frameworks such as SIERRA will become increasingly common [e.g., Trellis (Rensselaer Polytechnic Institute), Deal.II and UG (Univ of Heidelberg), others…]
• “Supporting Infrastructures” mini-symposium at 5th World Congress on Computational Mechanics, Vienna, July 2002

EXTRA SLIDES

Element Refinement Template
• Defines topology and connectivity of children
• Refinement must maintain validity of mesh
• 2D examples: [figure]

Bulk Mesh Data Input/Output
• Services
  – Parallel IO for mesh topology and field values
  – Restart
• Simple application interface, specify:
  – What files for input/output
  – Which mesh subsets and fields
  – When to output
• Transparent access to multiple file formats
  – ExodusII (SNL), DMF (ASCI Tri-Lab), …
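
The Element Refinement Template slide in the extra slides says a template defines the topology and connectivity of the children. As an illustration, here is a minimal sketch assuming a uniform 1-to-4 split of a 4-node quadrilateral. The function name `refine_quad`, the node numbering, and the shared midpoint table are hypothetical; the templates SIERRA actually uses are not shown in the slides.

```python
def refine_quad(corners, coords, midpoints):
    """Split one 4-node quad into four child quads.

    corners   : 4 node ids in counter-clockwise order
    coords    : dict node id -> (x, y); new nodes are added to it
    midpoints : dict frozenset({a, b}) -> node id, shared between
                neighboring elements so each edge is split only once
    """
    def new_node(xy):
        nid = max(coords) + 1
        coords[nid] = xy
        return nid

    def mid(a, b):
        key = frozenset((a, b))
        if key not in midpoints:
            (xa, ya), (xb, yb) = coords[a], coords[b]
            midpoints[key] = new_node(((xa + xb) / 2, (ya + yb) / 2))
        return midpoints[key]

    n0, n1, n2, n3 = corners
    m01, m12, m23, m30 = mid(n0, n1), mid(n1, n2), mid(n2, n3), mid(n3, n0)
    cx = sum(coords[n][0] for n in corners) / 4
    cy = sum(coords[n][1] for n in corners) / 4
    c = new_node((cx, cy))
    # Each child keeps one parent corner and the counter-clockwise ordering,
    # so the refined mesh stays valid (no inverted elements).
    return [
        (n0, m01, c, m30),
        (m01, n1, m12, c),
        (c, m12, n2, m23),
        (m30, c, m23, n3),
    ]


coords = {0: (0, 0), 1: (2, 0), 2: (2, 2), 3: (0, 2)}
children = refine_quad((0, 1, 2, 3), coords, {})
print(children)  # [(0, 4, 8, 7), (4, 1, 5, 8), (8, 5, 2, 6), (7, 8, 6, 3)]
```

Keying the midpoint table by the unordered node pair means a neighboring element that refines later reuses the same midside node, which keeps the child connectivity consistent across the shared edge.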