Automated Design Exploration and Optimization + HPC Best Practices
Transcription
Automated Design Exploration and Optimization + HPC Best Practices
Automated Design Exploration and Optimization + HPC Best Practices 1 © 2011 ANSYS, Inc. June 8, 2012 Laz Foley & Simon Pereira Confidence by Design Detroit June 5, 2012 Outline • The Path to Robust Design • ANSYS DesignXplorer • • • • Mesh Morphing and Optimizer RBF Morph Adjoint Solver HPC Best Practices The Path to Robust Design obust Design is an NSYS Advantage ngle Physics olution Robust Design Optimization •Algorithms •Published API Design Exploration “What if” Study Multiphysics Solution •Integration Platform ccuracy, robustness, eed… •Parametric Platform •DOE, Response Surfaces, Correlation, Sensitivity, Unified reporting, etc. •Six Sigma Analysis •Probabilistic Algorithms •Adjoint solver methods Single Physics/Multi Physics e physics is the entry point mulation i‐Physics enables real al prototyping Fluid Pressure Distribution 2 way coupled with a transient structural solution Stress ? omagnetic Force Density with Thermal‐Stress and Electromagnetic Force load Deformation “What If?” Interactively adjust the parameter values and “Update” Needed for "What If?" Parametric CAD Connections Pervasive Parameters Persistent Updates Managed State, Update Mechanisms Remote Solve Manager (RSM) Design Exploration Optimization Optimal Candidates ANSYS DesignXplorer Initial vs. Optimized Design Output Initial Design Optimized Tt Ratio 1.116 1.126 pt Ratio 1.674 1.709 η [%] 71.65 76.25 Power [MW] 1.208 1.268 Engineers can easily appreciate the value of understanding. Thermal Stress Pressure & Flow Velocity Parametric Geometry Deformation ust manifold design Sigma Analysis All samples reports max deformation below 1.5 mm mum Displacement should not exceed 1.5 mm Parameters Diameter of the d ess at inlet al Temperature e RPM Response Parameters Max Flow Temperature Max Deformation Max Von‐Mises stress Uncertainty of input Response Surface showing the effect of engine speed and thickness at outlet on Where are you? may want to consider how far along the “path” your simulation practices are haps you could improve your use of optimization and greater leverage your investment in Simulation? ANSYS DesignXplorer Integral with Workbench • Parametric multiphysics modeling with automated updates • Bi‐directional CAD, RSM, scripting, reporting and more... Robust Design Tools at ANSYS NSYS DesignXplorer Unified Workbench solution NSYS Fluent Built‐in Mesh Morphing and shape Optimization (MMO) tools Baseline Design Being hooked up to DX for 14.5 Adjoint solver High sensitivity – changes to shape have a big effect on drag Optimized Design High sensitivity – Shape on Downforce ࣔࢊ࢝ࢌ࢘ࢉࢋ ࢙ࣔࢎࢇࢋ ࢘ࢇࢍ Low sensitivity R14 Robust Design Tools at ANSYS NSOFT Optimetrics Produce “families of curves” Simultaneous solve with DSO packs Access to EBU Adjoint nd more XY Plot 1 Ansoft Corporation Maxwell3DDesign1 350.00 Curve Info Fm Setup1 : LastAdaptive Gap='0.001in' 300.00 Fm Setup1 : LastAdaptive Gap='0.002in' Fm Setup1 : LastAdaptive Gap='0.003in' Fm [newton] 250.00 Fm Setup1 : LastAdaptive Gap='0.004in' 200.00 Fm Setup1 : LastAdaptive Gap='0.005in' 150.00 Fm Setup1 : LastAdaptive Gap='0.006in' 100.00 50.00 0.00 0.00 100.00 200.00 300.00 400.00 500.00 ampturns [A] 600.00 700.00 800.00 900.00 1000.00 Optimization Partners NSYS simulation software has been effectively used in concert with many optimization partners MATLAB (Mathworks) ModeFrontier (Esteco) OptiSLang (Dynardo) eArtius Optimus (Noesis) RBF‐Morph Sculptor (Optimal) Sigma Technology (IOSO) TOSCA (FE‐DESIGN) iSight (Dassault) Qfin (Qfinsoft) ANSYS DesignXplorer ANSYS DesignXplorer DesignXplorer is everything under this Parameter bar… • Low cost & easy to use! • It drives Workbench • Improves the ROI! ANSYS Workbench Solvers DX Design of Experiments With little more effort than for a single run, you can use DesignXplorer to create a DOE Correlation Matrix Understand how your parameters are correlated/influenced by other parameters! Sensitivity Understand which parameters your design is most sensitive to! Response Surface Understand the ensitivities of the output arameters (results) wrt the input parameters. 2D Slices Response 3D Response Goal‐Driven Optimization Use an optimization algorithm or screening to understand tradeoffs or discover optimal design candidates! Robustness Evaluation Input parameters have variation! Make sure your design is robust! Six Sigma, TQM ut meters also! Understand how our performance will vary with your Predict how many parts will likely fail? Understand which inputs require the Industry Testimonials “The ease of using simulation tools has helped to transform our organization from a test‐centric culture to an analysis‐centric culture.” Bob Tickel, Director of Analysis at Cummins “This technology makes it possible to quickly evaluate hundreds of designs in batch processes to explore the complete design space so that we know we have the best possible design.” Stresses Ken Karbon, Staff Engineer, General Motors er the course of the design process, Dyson’s engineers steadily improved the formance of the fan to the point that the final design has an amplification ratio of 15 ne, a 2.5‐fold improvement over the six‐to‐one ratio of the original concept design. team investigated 200 different design iterations using simulation, which was 10 es the number that would have been possible had physical prototyping been the mary design tool. Physical testing was used to validate the final design, and the ults correlated well with the simulation analysis.” Example 1: Slit Die Need uniform outflow Minimize pressure drop P2 P3 Good GOOD BAD Pressure Drop Bad P1 Example 2: Combustor Diffuser Length Outlet Exit Height Outlet − 3 parameters − Minimize pressure loss − Minimize mach number Outlet Inlet Dump Gap Sensitivity Mesh Morphing and Optimizer Fluent Morpher‐Optimization Feature Allows users to optimize product design based on shape deformation to achieve design objective Based on free‐form deformation tool coupled with various optimization methods Mesh Morphing lies a geometric design change directly to the mesh in the solver s a Bernstein polynomial‐based morphing scheme eform mesh deformation defined on a matrix of control points leads to a ooth deformation orks on all mesh types (Tet/Prism, CutCell, HexaCore, Polyhedral) r prescribes the scale and direction of deformations to control points distributed evenly through the rectilinear region. Process What if? OR Optimizer Setup Case Setup Case Run Run Regions Setup Morph Parameters Setup Optimizer Morph Deformation Optimize Evaluate Choose Optimizer Auto Optimal Deformation Definition • Define constraint(s) (if any) • Select control points and prescribe the relative ranges of motion Objective Function • Objective Function: Equal flow rate Baseline Design Optimized Design Optimizer Algorithms; Compass, Powell, Rosenbrock, Simplex, Torczon uto • Optimize! Example: L‐Shaped Duct Application: L‐shaped duct Objective Function: Uniform flow at the outlet Significant Improvement in Flow Uniformity RBF Morph How Does RBF‐Morph work? A system of radial functions is used to fit a solution for the mesh movement/morphing, from a list of source points and their prescribed displacements Radial Basis Function interpolation is used to derive the displacement at any location in the space The RBF problem definition is mesh independent. RBF‐Morph is Integrated with Fluent Example 1: Internal Flow Example 3: External Flow Example 4: 50:50:50 Optimisation Courtesy of Volvo Cars Recently conducted conceptual study by ANSYS in conjunction with Volvo Cars 50 Million cell hybrid mesh of Volvo XC60 50 Design variants investigated using RBF‐Morph Addon for ANSYS Fluent and Workbench Design Explorer 50 hours total clock time to complete full optimisation on HPC Cluster External Aerodynamics olvo XC60 vehicle model Four shape parameters RBF Morph to define shape parameters ANSYS DesignXplorer • To drive shape parameters • To create DOE • To perform Goal Driven Optimization eal Aerodynamics ptimization Process – Capacity – Automatic – Fast Step #1 : Baseline Model Volume Mesh – TGrid Cell Count : 50.2 Million Cells rism Layers : 10 (First Aspect Ratio 10, Growth 1.1) rism Count : 24.4 Million Cells kewness < 0.9 Prism Layers Cut Plane Y=0 Step #2 : CFD Setup oundary Conditions nlet : Velocity Inlet 100 kmph Outlet : Pressure Outlet, 0 Pa (Gage) Side walls : Wall, no‐slip Top wall : Wall, no‐slip olver Settings Steady, PBCS, Green Gauss Node Based Gradient Fluid : Incompressible air, Density = 1.225 kg/m3, Turbulence : Realizable K‐epsilon, Non‐equilibrium wall treatment Discretization : Pressure – Standard Momentum, TKE, TDR – 2nd Order • Solution Controls – Courant Number = 200 – ERF Momentum, Pressure = 0.75 – URFs Density = 1.0, Body Forces = 1.0 Step #3 : RBF‐Morph Boat Tail Angle (P2) Roof Drop Angle (P3) Step #3 : RBF‐Morph Step #3 : RBF‐Morph • Fully integrated within FLUENT and Workbench • Easy to use • Parallel => rapidly morph large size models • Mesh independent solution works with all element types (tetrahedral, hexahedral, polyhedral, etc.) • Superposition of multiple RBF-solutions makes the FLUENT case truly parametric (only 1 mesh is stored) • RBF-solution can also be applied on the CAD • Precision: exact nodal movement and exact feature preservation. Step #4 : Setup DesignXplorer Central Composite Design, Face Centered, Enhanced Step #5 : Running Simulations Task 768 Cores 384 Cores 288 Cores 240 Cores 144 Cores Time (Seconds) Time (Seconds) Time (Seconds) Time (Seconds) Time (Seconds) Baseline Case (i.e. Design Point 1) d volume mesh of baseline into the CFD solver and y solver settings Solution 225 340 365 481 228 6979 11153 14409 17256 27246 ing CFD data file 681 538 558 600 532 Each Subsequent Design Point ph vehicle shape 84 59 65 69 100 Solution 1284 1754 2208 2630 4100 ing CFD data file 734 559 572 621 532 30.80 35.63 42.98 50.28 72.19 al Run Time (Wall Clock) eded for All 50 Design nts (Hours) Results : Optimization Results : low Results Discussion – Design point 1, 9, 19 & 25 – Velocity contours – Iso‐surface of total pressure = 0.0 Design Points Boat Tail Angle (P2) Long Roof Angle Green House (P3) (P4) Front Spoiler Drag Force (N) Angle (P5) (P1) 1 0.000 0.000 0.000 0.000 388.01 9 0.000 1.500 0.000 1.900 393.01 19 1.850 ‐2.300 ‐0.700 0.000 372.30 25 ‐1.850 1.500 ‐0.700 0.000 397.33 Design Point #1 Design Point #19 Design Point #25 Adjoint Solver Adjoint Solution? An adjoint solver allows specific information about a fluid system to be computed that is very difficult to gather otherwise. The adjoint solution itself is a set of derivatives. • They are not particularly useful in their raw form and must be post‐processed appropriately. • The derivative of an engineering quantity with respect to all of the inputs for the system can be computed in a single calculation. – Example: Sensitivity of the drag on an airfoil to its shape. There are 4 main ways in which these derivatives can be used: 1. Qualitative guidance on what can influence the performance of a system strongly. 2. Quantitative guidance on the anticipated effect of specific design changes. 3. Guidance on important factors in solver numerics. 4. Gradient‐based design optimization. How to Use the Results ‐ Qualitative GOAL: Identify features of a system design that are most influential in the performance of the system. EXAMPLE: – Sensitivity of the Drag on a NACA 0012 airfoil to changes in the shape of the airfoil. – The shape sensitivity field is extracted from the adjoint solution in a post‐processing step. High sensitivity – changes to shape have a big effect on drag Low sensitivity – changes to shape have a small effect on drag How to Use the Results ‐ Quantitative GOAL: Identify specific system design changes that benefit the performance and quantify the improvement in performance that is anticipated. EXAMPLE: – Design modifications to turning vanes in a 90 degree elbow to reduce the total pressure drop. – The optimal adjustment that is made to the shape is defined by the shape sensitivity field (steepest descent algorithm). – Effect of each change can be computed in advance based on linear extrapolation. Baseline Modified Original P = ‐232.8 Pa Expected change computed using the adjoint and linear extrapolation = 10.0 Pa Make the change and recompute the solution. Actual change = 9.0 Pa How to Use the Results ‐ Solver Numerics GOAL: Identify aspects of the solver numerics and computational mesh that have a strong influence on quantities that are being computed that are of engineering interest. EXAMPLE: – Use the adjoint solution to identify parts of the mesh where mesh adaption will benefit the computed drag by reducing the influence of discretization errors. Adapted Mesh Baseline Mesh Adapted Mesh Detail How to Use the Results ‐ Optimization GOAL: Perform a sequence of automated design modifications to improve a specific performance measure for a system EXAMPLE: – Gradient‐based optimization of the total pressure drop in a pipe. – Flow solution is recomputed and the adjoint recomputed at each design iteration. 100 90 80 Initial design ptot [Pa] 70 60 50 40 30 Final design 20 30% reduction in total pressure drop after 30 design iterations 10 0 Mesh Morphing Once a desired change to the geometry of the system has been selected, how is that change to be made? • Mesh morphing provides a convenient and powerful means of changing the geometry and the computational mesh. – Use Bernstein polynomial‐based morphing scheme discussed earlier Mesh Morphing & Adjoint Data Flow Example: Sensitivity of lift to surface shape Select portions of the geometry to be modified Adjoint to deformation operation Surface shape sensitivity becomes control point sensitivity (chain rule for differentiation) Benefit of this approach is two‐fold Smooths the surface sensitivity field Provides a smooth interior and boundary mesh deformation Mesh Morphing, Adjoint Data & Constraints The adjoint solution is determined based on the specific flow physics of the problem in hand. The effect of other practical engineering constraints must be reconciled with the adjoint data to decide on an allowable design change. Example: – Some walls within the control volume may be constrained not to move. – A minimal adjustment is made to the control‐point sensitivity field so that deformation of the fixed walls is eliminated. Fixed wall Fixed wall Moveable walls Current Functionality e adjoint solver is released with all Fluent 14 packages. ocumentation is available Theory Usage Tutorial Case study aining is available nctionality is activated by Loading the adjoint solver add‐on module new menu item is added at the top level Current Functionality Application Drivers ey initial application areas are: Low‐speed external aerodynamics – F1 (increase downforce) – Production automobiles (decrease drag) Low‐speed internal flows – Total pressure drop (reduce losses) n Fluent 14.5 a mechanism for users to efine a wide range of observables of nterest will be provided. • • • • Forces Moments Pressure drop Swirl • • • • • Ratios Products Variances Linear combinations Unary operations Current Scope NSYS‐Fluent flow solver has very broad scope djoint is configured to compute solutions based on some assumptions Steady, incompressible, laminar flow. Steady, incompressible, turbulent flow with standard wall functions. First‐order discretization in space. Frozen turbulence. he primary flow solution does NOT need to be run with these restrictions Strong evidence that these assumptions do not undermine the utility of the adjoint solution data for engineering purposes. lly parallelized. radient algorithm for shape modification Mesh morphing using control points. djoint‐based solution adaption Example 1: Automotive Aerodynamics Surface map of the drag sensitivity to shape changes Example 2: Pressure Drop in a Duct Total Pressure Drop (Pa) Geometry Predicted Result Original ‐‐‐ ‐22.0 Modified ‐14.8 ‐18.3 ggressive adjustment results in a 17% reduction in loss in just one design iteration HPC Best Practices Guidelines : • Know your hardware lifecycle • Have a goal in mind for what you want to achieve • Using Licensing productively • Using ANSYS provided processes effectively Hardware Considerations • This section is meant to provide an overview of the different hardware components and how they can effect solution time. • Hopefully this will give you some of the tools to understand why some of the benchmark numbers in better detail. • ANSYS would always recommend that the best thing to do before buying a system is to look at the latest benchmarks. • If you are not sure please ask. Effect of Clock Speed Impact of CPU Clock on Application Performance Processor: Xeon X5600 Series Hyper Threading: OFF, TURBO: ON Active cores: 12/node; Memory speed: 1333 MHz (performance measure is improvement relative to CPU Clock 2.66 GHz) 1.40 1.35 Improvement due to Clock Higher is better 1.30 1.25 1.20 1.15 2.66 GHz 1.10 2.93 GHz 1.05 3.47 GHz 1.00 0.95 0.90 0.85 0.80 Clock Ratio eddy_417K aircraft_2M turbo_500K sedan_4M truck_14M Effect of Memory Speed We can see here the effect of memory speed. Impact of DIMM speed on ANSYS/FLUENT Application Performance (Intel Xeon x5670, 2.93 GHz) Hyper Threading: OFF, TURBO: ON Active threads per node: 12 (performance measure improvement is relative to memory speed of 1066 MHz) This has implications on how you build your hardware. On other processors non‐ optimally filling the memory channels can slow the memory speed. 125% 120% Impact of Memory Speed Some processors types have slower memory speeds by default. 130% 115% 110% 1066 MHz 105% 1333 MHz 100% 95% 90% 85% 80% eddy_417K turbo_500K aircraft_2M sedan_4M ANSYS/FLUENT Model truck_14M Turbo Boost (Intel) / Turbo Core (AMD) Turbo Boost (Intel)/ Turbo Core(AMD) is a form of over‐clocking that allows you to give more GHz to individual processors when others are idle. With the Intel’s have seen variable performance with this ranging between 0‐8% improvement depending on the numbers of cores in use. The graph below for CFX on a Intel X5550. This only sees a maximum of 2.5% improvement. Hyper‐Threading: ANSYS Fluent Hyper‐Threading Technology makes a single physical processor appear as two logical processors. Evaluation of Hyperthreading on ANSYS/FLUENT Performance iDataplex M3 (Intel Xeon x5670, 2.93 GHz) TURBO: ON (measurement is improvement relative ot Hyperthtreading OFF) This is not the same as physically having two logical processors and does not give double the speedup. It is worth noting that this has licensing implications as you would need to oversubscribe the physical cores and hence would need double the HPC Licenses. Improvemet due to Hyperthreading . Higher is better In our tests we’ve seen as high as a 20% increase in performance although you can see the actual performance can be quite variable from the graph opposite. HT OFF (12 threads on 12 physical cores) HT ON (24 threads on 12 physical cores) 1.10 1.05 1.00 0.95 0.90 eddy_417K turbo_500K aircraft_2M ANSYS/FLUENT Model sedan_4M truck_14M AMD vs. Intel • Traditionally Intel take the “power approach” in general in their 2 socket systems (faster core but less of them per processor/socket). • Traditionally AMD take the economies of scale approach (more cores per processor but individually slower clock speeds). • Remember that this landscape changes because they are constantly in competition with each other. • Please note that whilst we do have some numbers for the new Intel Sandy‐bridge chips we do not have scaling numbers for the equivalent AMD 6200 series at the time of writing this presentation. 2 Socket vs. 4 Socket Systems Current 4 socket systems come up slower than their 2 socket counterparts (based on Intel Westmere vs. Xeon E7‐8837). • Clock speed slower • Memory speed slower • No additional memory bandwidth. Performance of ANSYS Fluent on two‐socket and four‐socket based systems Performance measure is Fluent Rating (higher values are better) 4‐socket based Systems IBM HX5 Blade, X3850 (Xeon E7‐8837 series) 2‐socket based Systems HS22/HS22V Blade, 3550/3650 M3, Dx360 M3 (Xeon 5600 Series) odes 1 Sockets Cores 2 12 Fluent Rating 88 Nodes Sockets Cores 1 2 16 Fluent Rating 96 Effect of the Interconnect When going for multiple systems linked together the interconnect becomes an important factor. ANSYS/FLUENT Performance iDataplex M3 (Intel Xeon x5670, 12C 2.93 GHz) Network: Gigabit, 10-Gigabit, 4X QDR Infiniband (QLogic, Voltaire) Hyperthreading: OFF, TURBO: ON Models: truck_14M QLogic 10-Gigabit Gigabit FLUENT Rating 4000 Higher is better The interconnect is the fabric that connects the nodes. We can see from the graph opposite with FLUENT how quickly the performance of Gigabit Ethernet drops off. Voltaire 5000 3000 2000 1000 0 12 24 48 96 192 Number of Cores used by a single job 384 768 ANSYS Fluent Auto‐Partitioning Time to Partition 200M Cavity Case over 768 cores cavity case, 768 cores 12 to partitioning is now very quick Time in seconds ss than 10s to process 800M cells! 10 8 6 4 2 0 rial pre‐partitioning step no Time nger required 200M 400M 600M 800M 2.914 4.706 6.617 9.86 Time in seconds Time to Partition 111M Truck Case truck_111m 9 8 7 6 5 4 3 2 1 ANSYS CFX Partitioning ptimize parallel partitioning in multi‐core clusters (CFX)β Partitioner determines number of connections between partitions and optimizes part.‐host assignments P4 P6 P2 P7 P1 P8 P3 P5 Partitioning step finds adjacency amongst partitions; partitions with max adjacency are grouped on same compute nodes ‐use previous results to initialize calculations on large problem (CFX) β P1 P3 P2 P7 Large case interpolation for cases with >~100M nodes P5 P6 P4 P8 ean up of coupled partitioning option for multi‐domain cases (CFX) Eliminates ‘isolated’ partition spots amatically reduced partitioning times for cases th fluid‐solid interfaces and very large numbers of gions Compute Node 1 Compute Node 2 ANSYS Fluent Parallel Scalability Intel Harpertown nsistently improved scalability 7000 6000 oss releases 5000 4000 6.3.0 3000 12.0.0 2000 1000 0 dan, 4M cells 0 n X5560 @ 2.80GHz (Nehalem EP) 100 200 300 400 500 600 Intel Westmere 70000 60000 50000 40000 13.0.0 12.0.0 13.0.0 30000 20000 14.0.0 ANSYS Fluent Parallel Scalability Intel Harpertown nsistently improved scalability 450 400 oss releases 350 300 250 6.3.0 200 12.0.0 150 100 50 0 ck, 111M cells 0 ICE 8400EX, Intel 6‐core 200 400 600 800 1000 1200 Intel Westmere hex‐core 2.93 GHz 2500 2000 1500 13.0.0 12.0.0 13.0.0 1000 500 14.0.0 ANSYS Fluent Parallel Scalability on Intel SYS Fluent 14 tive Performance er is better Leading Performance for fluid flow simulation Sedan_4m 1.86 Geomean 1 6 core Xeon X5675 Geomean 8 core Xeon E5‐2680 Truck_14m 1.53 1 6 core Xeon X5675 8 core Xeon E5‐2680 The memory bandwidth of the Intel® Xeon® processor E5-2600 product family allows excellent scalability and per core performance. Support for higher speed memory DIMMs, added on-core capacity for memory loads, as well as a larger cache size are key to extending performance and scalability. Higher memory bandwidth has a pronounced impact with fully coupled solver applications, which are the most memory intensive. Sedan_4m is shown as an example of fully coupled solver performance. Truck_14m is representative of segregated solver performance. The horizontal line at 1.63 represents the geomean speedup over 6 standard benchmarks. ANSYS CFX Parallel Scalability on Intel Intel Xeon 5650 AirliftReactor BigPipe CombBVM CombEDM Cylinder IndyCar Internal LeMansCar LES_001 Pump RadCity RadFurnace tageCompressor aticMixer100MM StaticMixer100 StaticMixer200 taticMixer 400k Turbine Wigley100 Intel Xeon E5-2680 • Good scalability and more operations per clock make obtaining results on Intel® Xeon® E5 1.68x faster than on Intel Xeon 5600 platforms • For end user it is about faster turnaround or solving larger tasks with the same resources along with lower TCO Source: Published/submitted/approved results as of March 6, 2012. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may Including Monitors 3072 cores alability with Monitors Scalability to higher core counts Simulations with monitors including plotting and printing Hex‐core mesh, F1 car, 130 million cells monitor‐enabled Monitor support optimizations 200 400 600 800 1000 Example data for scaling with R14 monitors maintain scalability expectations Fluids I/O UENT, CFX and AUTODYN use a “singular” e structure. This means there is one global set of files and every process writes to them. is methodology falls down at a large mber of cores where the file I/O becomes bottleneck. CFX deals with this by using inline compression cdat) FLUENT has both inline compression (cdat) and at v12.x introduced support for a Parallel File pdat). rallel file system support in ANSYS UENT – ~10x ‐ 20x speedup for data write – Eliminates scaling bottleneck for data ANSYS FLUENT Serial I/O Parallel I/O HPC Fluids Demonstration Case To Demonstrate 50:50:50 Method – Volvo XC60 vehicle model – Four shape parameters – RBF Morph (Integrated within FLUENT) to define shape parameters – Grid morphing in parallel ANSYS WorkBench (Frame Work to Automate Process) – To drive shape parameters – To create DOE – To perform Goal Driven Optimization The 50:50:50 Method 50 50 design points in the design space EXTENT 50 50 million cells used in CFD simulation of each design point ACCURACY 50 50 hours total elapsed time to simulate all the design points SPEED “One – Click” – Entire design space is simulated and post‐ processed completely automatically after the initial baseline case setup HPC Fluids Demonstration Case STEP 1 Prepare Meshed Model for Baseline Vehicle Shape STEP 2 CFD Solver Setup, Define Shape Parameters STEP 3 Generate DOE using Input Shape Parameters Mesh Morpher Integrated within FLUENT Solver (FLUENT), Optimizer (DX) & Post Morph Vehicle Shape Processor (CFD Post) Integrated within STEP 4 ANSYS WorkBench Run CFD Simulation Collate Data, HPC Fluids Demonstration Case Task 768 Cores 384 Cores 288 Cores 240 Cores 144 Cores Time (Seconds) Time (Seconds) Time (Seconds) Time (Seconds) Time (Seconds) Baseline Case (i.e. Design Point 1) d volume mesh of baseline into the CFD solver and y solver settings Solution 225 340 365 481 228 6979 11153 14409 17256 27246 ing CFD data file 681 538 558 600 532 Each Subsequent Design Point ph vehicle shape 84 59 65 69 100 Solution 1284 1754 2208 2630 4100 ing CFD data file 734 559 572 621 532 30.80 35.63 42.98 50.28 72.19 al Run Time (Wall Clock) eded for All 50 Design nts (Hours) HPC Fluids Demonstration Case Compute Cluster Details 1. Intel’s Endeavor Cluster 2. Intel Xeon X5670 (dual socket) 3. Clock speed 2.93 GHz 4. Six cores per socket (12 cores per node) 5. 24 GB RAM @ 1333 MHz, SMT ON, Turbo ON 6. QDR Infiniband 7. RHEL Server Release 6.1 GPU Acceleration for CFD Radiation View Factor calculation (ANSYS FLUENT 14 ‐ beta) capability for “specialty physics” view factors, ray tracing, reaction rates, etc. Getting the right setup is a balancing act.. Factors to Consider • HPC Licensing Cost • Cost of Hardware • Complexity of Deployment and Maintenance HPC Licensing Cost • ANSYS HPC is licensed in either the HPC Workgroup/Enterprise (or individually) or HPC Packs. • Given that it is licensed per partition (which in most cases translated to a core) – the best value for money is in getting the best scalability per core as possible. • When running multiple cores make sure you are using them as effectively as the memory bandwidth allows. Cost of Hardware • ANSYS will, in general, recommend the best hardware for performance that gets you the best out of your licensing investment. However you may need to make trade‐off's for your budget. • 2 socket systems provide the best performance but more inherently more complexity (and hence cost) because of the need for high speed interconnects when in a cluster. • Current 4 socket systems have less performance than their 2 socket counterparts but are also cheaper because of their lack of requirement for the high speed interconnects to get to higher numbers of nodes at the low end. Complexity of Deployment and Maintenance • A large cluster can have significant overheads in ease of deployment & on‐going maintenance costs. • A 4 socket system, whilst having less performance, may provide an easier deployment and maintenance route at the lower end and will be a better fit to what the average IT department is used to. • Often users get too caught up on per core performance at the detriment of not getting any extra speedup at all. • It is important to purchase something you feel you can internally support. • Purchase 3rd party support for high performance clusters if you do not feel you have the skills to support it internally. Remember the Following ... If you opt for unsupported infrastructure – This does not mean that it will not work but you use them at your own risk. – We may ask you to replicate it on a system that is supported before providing further support if you run into problems! We recommend: – Buying Supported Operating systems and Hardware – Using ANSYS Supported Practices – Talking to us before buying! It is in all our interests that you get this right! Information Available NSYS Partner Solutions – http://www.ansys.com/corporate/partners/partners‐hpc.asp • Reference configurations • Performance data • White papers • Sales contact points erformance Data – http://www.ansys.com/benchmarks Information Available NSYS Platform Support http://www.ansys.com/services/ss‐platform‐support.asp – Platform Support Policies – Supported Platforms – Supported Hardware – Tested systems NSYS Virtual Demo Room http://www.ansys.com/demoroom/ – Click on HPC! Information Available e Manual Sections on best practices and parallel processing for various solvers nstallation walkthroughs for installing the products, parallel processing, licensing and RSM remote solve manager) NSYS Advantage Online Magazine Information Available stomer Portal http://www1.ansys.com/customer/ – Knowledge Resources – Installation and Systems FAQ’s stomer Support http://www1.ansys.com/customer/ Portal, Email or Phone Automated Design Exploration and Optimization + HPC Best Practices