HP CLUSTER MANAGEMENT UTILITY
Transcription
March 2011

Various scenarios managed by CMU
– Deployment of images
  • Site has customized or multiple images tuned for its workload
  • Site wants to deploy an image quickly across many nodes
– Support for emerging HPC technologies
  • Site wants to leverage new features not yet in the mainstream
  • Staff has HPC and Linux competency
– System management
  • Need for a simple central GUI for monitoring and issuing commands
  • Need for real-time monitoring of node status and activity on the cluster and subgroups
– Cost
  • "Free software" tools don't work across all platforms and applications, and lack support
  • More expensive options may include features not required, and a steep learning curve

HP CMU Overview
– Easy, low-cost, customizable utility
– Scalable cluster CLI and GUI
  • One-click selection of groups of nodes with menu-selectable operations
  • Extensible
– Features:
  • Scalable provisioning
  • Configurable, scalable monitoring
  • Remote management commands
– Proven: over 50,000 licenses, including Top500 sites with thousands of nodes
– Broad support for HP hardware platforms
– Multiple Linux distributions, including SUSE Linux Enterprise, with hybrid support for Windows

HP CMU Major Features
– Provisioning (GUI and CLI)
  • Capture and deploy a golden image on all nodes (or groups of nodes)
  • Scalable provisioning: 4000+ nodes
  • Unassisted auto-install support (kickstart, autoyast, Debian preseed)
  • Support for diskless compute nodes
– Management (GUI and CLI)
  • Day-to-day administration of the cluster from one central point
  • Halt, (re)boot, or broadcast commands to a set of nodes
  • cmudiff tool for identifying outliers in configuration or operation
– Monitoring
  • View cluster activity in real time, at a glance
  • Receive alerts when something notable happens on a compute node or a set of compute nodes
  • Dynamic resource group creation as jobs are submitted
  • collectl support: http://collectl.sourceforge.net/
– Lightweight: one RPM, easy to upgrade!
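CMU's broadcast feature fans a single command out to a set of nodes over secure shell connections. The parallel fan-out idea can be sketched as follows; this is an illustrative approximation, not CMU's implementation, and the `broadcast` helper with its injectable `run` parameter is a hypothetical name:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def broadcast(nodes, command, run=None, max_workers=8):
    """Run `command` on every node in parallel; return {node: output}.

    `run` is injectable so the fan-out logic can be exercised without a
    live cluster; by default it shells out over ssh (assumes passwordless
    ssh to the compute nodes, as a cluster management node typically has).
    """
    if run is None:
        def run(node, cmd):
            return subprocess.run(
                ["ssh", node, cmd],
                capture_output=True, text=True, timeout=30,
            ).stdout.strip()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {node: pool.submit(run, node, command) for node in nodes}
        return {node: f.result() for node, f in futures.items()}
```

For example, `broadcast(["node001", "node002"], "uptime")` would collect each node's uptime in one call.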
HP CMU configuration GUI
– HP CMU is configured and customized using the HP CMU GUI. Tasks include:
  • Manually adding, removing, or modifying nodes in the HP CMU database
  • Invoking the scan node procedure to automatically add several nodes
  • Adding, deleting, or customizing HP CMU groups
  • Managing the system images stored by HP CMU
  • Configuring the actions performed when a node status changes, such as displaying a warning, executing a command, or sending an email
  • Exporting the HP CMU node list to a simple text file for reuse by other applications
  • Importing nodes from a simple text file into the HP CMU database

HP CMU configuration
1. Start HP CMU on the management node.
2. Start the GUI client on the GUI workstation.
3. Scan the compute nodes.
4. Create the network entities.
5. Create the golden image. More than one golden image can be created.
6. Create the logical groups and user groups.
7. Back up each golden image in its logical group.
8. Clone the compute nodes.
9. Deploy the management agent on the compute nodes.
   − Install the monitoring RPM.
   − Ping all nodes from the management node.

Compute node administration
– Halting
– Rebooting
– Booting and powering off using the management card of the compute nodes
– Broadcasting a command to selected compute nodes using a secure shell connection or a management card connection
– Direct node connection: clicking a node opens a secure shell connection or a management card connection

System disk replication
– Creating a new image. While backing up a compute node system disk, you can dynamically choose which partitions to back up
– Replicating available images on any number of compute nodes in the cluster
– Managing as many different images as needed for different software stacks, different operating systems, or different hardware
– Cloning from 1 to 4096 nodes at a time with a scalable, reliable algorithm that does not stop the entire cloning process if any nodes are broken
– Customizing the reconfiguration scripts associated with each image to execute specific tasks on compute nodes after cloning

Compute node monitoring
– You can monitor up to 4096 nodes in a single window.
– HP CMU provides the connectivity status of each node as well as sensor values.
– HP CMU provides a default set of sensors, such as CPU load, memory usage, I/O performance, and network performance.
– You can customize this list or create your own sensors, and display sensor values for any number of nodes.
– The information provided by HP CMU is used to ensure optimum performance and for troubleshooting.
– You can set thresholds to trigger alerts. All information is transmitted across the network at regular intervals, using a scalable protocol for real-time monitoring.

Monitoring GUI layout:
A – Frame containing the nodes in the cluster
B – Tree structure of nodes
C – All possible states of a node
D – Drop-down menu to view nodes based on classification (Network Entity, Logical Group, User Group)
E – Toolbar containing start, stop, and refresh buttons. The green LED appears when monitoring polls the cluster nodes.
F – Menu bar
G – Title of the figures and tables displayed in the main frame
H – The main frame displaying active monitoring and configuration information

Example of building a supercomputer with CMU
[Stack diagram. User environment: job scheduling and resource management (Platform HPC, LSF, RTM, Adaptive Cluster, Application Center), MPI, HP SHMEM & UPC, HPC open-source and development/job tools. System manager environment: CMU provisioning, system administration, and monitoring.]
Solution Architecture
[Diagram: CMU (auto-discover and config, cluster database) on the HP Cluster Platform, with the HP Cluster Test Suite (cluster hardware diagnostics) and multiple Linux distros and drivers.]

Provisioning a cluster stack with CMU
– Scan node automatically registers the nodes with their network parameters
– Install and configure the cluster stack on one "golden" compute node
  • Use the CMU kickstart/autoyast tools to install a new OS
  • Install the workload scheduler, MPI, LVP, HP-MPI, applications, etc. for the environment
  • Configure the existing user accounts, filesystems, etc.
– Back up the compute node image with CMU into a repository
  • Customer choice: disk image or diskless?
– Clone/distribute this "golden image" to the rest of the cluster
  • While cloning, perform automatic firmware updates if needed
  • Perform post-cloning and node-customization tasks
– Provision multiple stacks on multiple sets of nodes if desired
– Use the image editor for minor image modifications without a full backup operation

CMU Diskless Support
– Based on an NFS-root design
  • The golden node is installed on disk, then copied to the CMU management node
  • The root filesystem is read-only and shared among all compute nodes
  • Specific read-write directories are created for each node and mounted on each node
  • The list of read-write directories and files is customizable
– Diskless cluster requirements
  • One NFS server for every 256 nodes (CMU supports configuring multiple NFS servers)
  • 4 GB of non-SATA storage on the NFS server for each compute node
  • Additional NFS servers as part of an HA solution for NFS
  • A network infrastructure optimized for diskless needs (1Gb/10GbE and no bottlenecks)
– CMU recommends disk-based clusters
  • Easier to manage; more cost-effective; no single point of failure
  • Security is the only advantage of diskless (all data resides on the NFS server)
  • Disk failures are a valid concern, but CMU cloning quickly restores the image on a new disk
  • Better diskless performance is a myth, because the same tunings required for diskless can be done for disk-based clusters

CMU GUI Basics
– The CMU Cluster Management Panel displays all nodes in the selected groupings: by switch location, by image, or by custom grouping
– Node States display the current state of each node
– The CMU Main Display Panel shows alerts along the bottom
– Right-click in the main area to select which sensors to display
– CMU is pre-configured with standard sensors: CPU and memory usage, and disk and network I/O
– Simple to add any sensor or alert
– CMU provides simple support for monitoring GPU temperature and ECC errors on SL390s servers

HP CMU monitoring interface – large cluster view
[Screenshot: dynamic grouping in one petal]

CMU Remote Management Commands
– Commands available for the selected nodes: power commands, broadcast commands, provisioning commands, and user-defined commands
– Multi-window broadcast command (access the OS or the console): type in one window and see it in the others
– Single-window pdsh with cmudiff example: one command executed across a set of selected nodes finds one node running with an old BIOS version

Worldwide CMU Deployments
– Universities, government and research labs, engineering, energy

HP Enabling HPC Innovation, Affordability and Efficiency
– Performance: purpose-built systems for scale
– Efficiency: holistic energy management portfolio
– Agility: modular and adaptable solutions
– Confidence: worldwide expertise and experience

Resources
– www.hp.com/go/hpc
– www.hp.com/go/cmu: links to white papers and documentation; CMU Forum on the IT Resource Center

Outcomes that matter.
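The pdsh-with-cmudiff example above (one command across many nodes revealing a single node with an old BIOS version) boils down to comparing each node's output against the majority value. A minimal sketch of that idea follows; the `find_outliers` helper is a hypothetical name for illustration, not part of CMU:

```python
from collections import Counter

def find_outliers(results):
    """Given {node: command_output}, return the nodes whose output
    differs from the majority value, cmudiff-style."""
    counts = Counter(results.values())
    if len(counts) <= 1:
        return {}  # all nodes agree: nothing to report
    majority_value, _ = counts.most_common(1)[0]
    return {node: value for node, value in results.items()
            if value != majority_value}
```

For example, seven nodes reporting `BIOS 2.1` and one reporting `BIOS 1.9` would yield just the odd node out.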