RCC Documentation Release 1.0 RCC Staff October 20, 2014
CONTENTS

1 Introduction
    1.1 Resources
    1.2 Services
    1.3 Grant Support
    1.4 New Faculty
    1.5 Education
    1.6 Where To Go For Help
2 Accessing RCC Resources
    2.1 Accounts
    2.2 Allocations
3 User Guide: New Users Start Here
    3.1 Basics
    3.2 Logging In
    3.3 Remote Visualization
    3.4 Software Modules
    3.5 Storage
    3.6 Running Jobs
    3.7 Managing Allocations
    3.8 Software Documentation, Examples, and Submission Scripts
    3.9 Accessing and Transferring Files
4 Frequently Asked Questions
    4.1 General
    4.2 Getting Started
    4.3 Allocations
    4.4 Software
    4.5 Cluster Usage
    4.6 Performance and Coding
    4.7 File I/O, Storage, and Transfers
5 Tutorials
    5.1 Introduction to RCC Hands-on Session
    5.2 Introduction to RCC for UChicago Courses
    5.3 Introduction to RCC for CHEM 268
    5.4 Introduction to RCC for CSPP 51087
    5.5 Introduction to RCC
    5.6 Introduction to Crerar Computer Lab
    5.7 Hierarchical Storage Management
    5.8 Software Modules Tutorial
    5.9 Working With Data: Data management plans, transferring data, data intensive computing
    5.10 Slurm Tutorial
    5.11 Introduction to Python Hands-On Exercises
    5.12 Computational Tools for Chemists and Biochemists - Biopython Bio.PDB
    5.13 Computational Tools for Chemists and Biochemists - Open Babel
6 Software Documentation
    6.1 Applications
    6.2 Debugging and Optimization
    6.3 Environments
    6.4 Libraries
    6.5 Scheduler
7 Software Module List
8 KICP
    8.1 Get an account
    8.2 Submit A Job
    8.3 Storage
Index

CHAPTER ONE

INTRODUCTION

The Research Computing Center (RCC) is a centrally managed organization in the Office of the Vice President for Research and National Laboratories. It provides research computing services and resources to University of Chicago researchers across all divisions, colleges, institutes, and departments. This manual provides information to help you gain access to and leverage this support.
• Resources
• Services
• Grant Support
• New Faculty
• Education
• Where To Go For Help

1.1 Resources

RCC manages a research computing infrastructure that includes the following professionally managed components:

• Tightly coupled nodes with a high-performance, non-blocking FDR10 Infiniband network
• Loosely coupled nodes with a GigE network
• Nvidia GPU nodes with the latest Nvidia Tesla accelerators
• Xeon Phi nodes, with the latest Intel coprocessors
• Shared memory nodes with up to 1 terabyte of system memory
• High core count nodes, with up to 64 processing cores per node
• A Hadoop MapReduce cluster, with 160 TB of HDFS storage
• Petascale project storage, for analysis, collaboration, and backup of research data
• High performance scratch storage
• A data visualization laboratory, with active stereoscopic 3D and high resolution 2D capabilities

Access to and use of these resources is free. Faculty, departments, and institutes can receive dedicated computing resources by augmenting this cluster through the Cluster Partnership Program (CPP). Through the CPP, RCC leverages the collective buying power of the UChicago community to negotiate prices on cutting edge hardware with top-tier vendors. Contact info@rcc.uchicago.edu for more information.

1.2 Services

High performance computers have grown increasingly complex over the past decades, creating a high barrier to entry for many researchers. The RCC scientific computing group provides one-on-one consulting, training, and technical user support in the form of code porting, optimization, parallelization, scripting, programming, and more. If the needs of your project or group exceed what can reasonably be provided by basic user support, you may benefit from the Consultant Partnership Program. Contact info@rcc.uchicago.edu for more information.
1.3 Grant Support

The RCC Scientific Computing group assists principal investigators at various stages of grant preparation. This includes providing information about guaranteed computing and storage resources, writing the computational sections of proposals, supplying pricing and other information about hardware and software, and responding to special needs. RCC may be able to support your grant application with

• a letter of commitment guaranteeing the availability of hosting facilities, computer hardware, storage, system administration, and consultant time,
• customized content for your Data Management Plan that considers your research project needs and RCC services,
• agreements with set prices and dates for the acquisition and management of research computing hardware.

These requests will be handled on an individual basis. Contact info@rcc.uchicago.edu for more information.

1.4 New Faculty

Are you a new member of the faculty at the University of Chicago? RCC can assist you with your research computing needs in advance of your move. At any time you can get started with an account, assistance transferring data and porting applications to the RCC cluster, and the purchase of dedicated compute nodes and storage for your group. Contact info@rcc.uchicago.edu for more information.

1.5 Education

The RCC can support your education-related activities by

• providing training for your class/workshop on research computing,
• providing accounts and resource allocations on the cluster for students and instructors,
• reserving the Data Visualization Laboratory for occasional lectures or class meetings.

Contact info@rcc.uchicago.edu for more information.

1.6 Where To Go For Help

If this documentation doesn't have the information you need or you have other issues, the Research Computing Center staff is available to answer your questions. Support requests, suggestions, and error reports can be sent via email to help@rcc.uchicago.edu or made by calling 773-795-2667 during normal business hours.
Users can also stop by the RCC's offices located in Regenstein 216.

CHAPTER TWO

ACCESSING RCC RESOURCES

To make use of resources provided by the Research Computing Center you will need a user account and an associated allocation of resources. This document describes the steps needed to obtain each of these, and the policies that govern their availability.

• Accounts
    – What is a User Account?
    – Sign up for a User Account
• Allocations
    – What is an allocation?
    – Types of allocations
    – How do I apply?
    – What are my responsibilities?
    – Managing Allocations

2.1 Accounts

2.1.1 What is a User Account?

A User Account is a set of credentials for accessing RCC systems and is used to uniquely identify individuals. The RCC uses the University of Chicago CNetID and its associated password for authentication. Use of RCC systems is governed by the RCC User Policy.

2.1.2 Sign up for a User Account

All University of Chicago faculty and staff researchers eligible to be a grant Principal Investigator (PI, as defined by University policy) are eligible to obtain an account by completing a PI Account Request. A PI account enables a user to log into RCC systems, apply for access to RCC resources, and grant access and delegate responsibilities to collaborators at the University of Chicago and elsewhere.

Users who are ineligible for a PI account and have a valid CNetID should complete a User Account Request and provide the name of their PI. The PI will be asked to approve the account request via email. We can also provide a CNetID for non-University of Chicago users. Please contact us at help@rcc.uchicago.edu for more details.

2.2 Allocations

2.2.1 What is an allocation?

An allocation is a quantity of computing time and storage resources that is granted to a PI. The basic unit of computational resources is the service unit (SU), which represents the use of one hour on one core of the Midway cluster.
Storage resources are allocated in units of gigabytes (GB) for the duration of an allocation. Allocations are intended to support researchers furthering specific research goals, and are of limited duration. Dedicated access and long-term storage are not governed by these allocations but rather through the Cluster Partnership Program.

There are various types of allocations with differing application deadlines, lengths, and responsibilities. These are discussed in the following sections.

2.2.2 Types of allocations

The RCC offers four types of allocations to support various needs:

• Startup: These allocations are meant for users to quickly gain access to RCC, port and test their codes, and gather information to use when applying for Research allocations.
• Research: This is the most common type of allocation, and is designed to support specific research programs. These are divided into Research I, for medium-sized needs, which are allocated continuously throughout the year, and Research II, which are intended for larger research programs and are allocated twice a year. Researchers who apply for Research II allocations may be granted a temporary Research I allocation until their Research II application can be reviewed.
• Education: This class of allocation is designed to support short-term training, courses, and workshops that require access to high-performance computing capabilities. The RCC will provide temporary user accounts for participants who do not already have access to RCC resources.
• Special: Requests for resources to support time-critical or high-impact research that cannot be accomplished through standard Research allocations.

2.2.3 How do I apply?

Links to the Research Allocation and Account creation forms can be found at access.rcc.uchicago.edu. If you already have a PI account you need only submit the Research Allocation Form. Startup allocations are given automatically when PI accounts are created.
For most programs, a Research allocation will be most appropriate, and applications are submitted using the Research Allocation Form. Education and Special allocations are handled on an as-needed basis, and can be requested by contacting the RCC at info@rcc.uchicago.edu.

Applications should provide a basic description of the intended research, and sufficient detail of codes and research plans to justify the proposed allocation. Requesters should provide a brief proposal with the following information:

• A summary of the research goals and the potential impact.
• A description of results and publications stemming from any previous RCC allocations.
• An estimate of the RCC resources that will be needed, including estimates of the number of nodes/cores, SUs, memory and storage requirements, and whether special resources will be used (GPU or large-memory nodes). Justification based on actual performance and scaling data is preferred but not required. The proposal should demonstrate the ability to efficiently use the requested resources to accomplish the stated research goals.
• Applications for Special allocations should include the information required for a Research allocation, and also provide a justification for why a standard Research allocation is insufficient for their needs, whether because of a critical deadline or allocation limits.

Large requests will be required to demonstrate scalability and the efficient use of resources. PIs who are new to the system should make use of their Startup allocation to determine how well their codes perform on RCC systems and use this to estimate the resources required for their respective research projects.

The limits on the size of allocations and deadlines for submission are provided in the table below. All Research II allocations expire on September 30 of the current allocation period (e.g. allocations during the period 2013-2014 will expire on 9/30/2014).
Research I allocations are granted throughout the year and roll over at the end of the allocation period.

Allocation type  Maximum size (SU)  Maximum size (GB)  Submission Deadline(s)  Start Date(s)  Expiration
---------------  -----------------  -----------------  ----------------------  -------------  ----------
Startup          5,000              500                N/A                     N/A            N/A
Research I       100,000            500                N/A                     N/A            N/A
Research II      1,000,000          1,000              3/7, 9/12               4/1, 10/1      9/30

Note: Limits for RCC Cluster Partners are 200,000 and 2,000,000 SUs for Research I and II allocations, respectively.

2.2.4 What are my responsibilities?

Any publications based on research conducted using RCC resources must acknowledge that contribution as described in How do I cite RCC in my publications and talks?. At the conclusion of each allocation, RCC will seek to collect the following information from each PI:

• A summary of the time used, tasks performed, and research goals accomplished with the allocated resources.
• A list of grant proposals, publications, presentations, patents, news articles, etc., to which the allocation contributed.
• Feedback for the RCC, including issues using RCC resources or suggestions on how to improve the RCC environment.

This information is essential to the RCC's ability to justify the resources it provides to the University of Chicago community.

2.2.5 Managing Allocations

Please see the documentation on managing allocations in the user guide: Managing Allocations.
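The SU definition in section 2.2.1 (one hour on one core) and the size limits in section 2.2.3 can be combined into a rough estimate when preparing a proposal. The following is a minimal sketch, not an RCC tool: the function names and the labels startup, research1, and research2 are hypothetical, and it assumes that usage is simply cores multiplied by whole wall-clock hours, with no extra weighting for special node types.

```shell
#!/usr/bin/env bash
# Rough SU estimate. Assumption: 1 SU = one hour on one core (section 2.2.1),
# with no weighting for GPU or large-memory nodes. Whole hours only.

# su_estimate CORES HOURS [JOBS] -> prints cores * hours * jobs
su_estimate() {
    local cores=$1 hours=$2 jobs=${3:-1}
    echo $(( cores * hours * jobs ))
}

# su_limit TYPE -> prints the maximum size in SUs from the table in section 2.2.3
# (the type labels are hypothetical names used only in this sketch)
su_limit() {
    case "$1" in
        startup)   echo 5000 ;;
        research1) echo 100000 ;;
        research2) echo 1000000 ;;
        *)         echo "unknown allocation type: $1" >&2; return 1 ;;
    esac
}

# smallest_allocation SUS -> prints the smallest allocation type whose limit fits
smallest_allocation() {
    local sus=$1 type
    for type in startup research1 research2; do
        if [ "$sus" -le "$(su_limit "$type")" ]; then
            echo "$type"
            return 0
        fi
    done
    echo "exceeds-research2"
    return 1
}

# Example: 200 jobs, each using 16 cores for 12 hours.
total=$(su_estimate 16 12 200)
echo "estimated SUs: $total"
echo "smallest fitting allocation: $(smallest_allocation "$total")"
```

The limits coded here are the standard ones; Cluster Partners have the higher limits given in the note under the table. Measured timings from a Startup allocation are a better basis for the cores and hours inputs than guesses.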
CHAPTER THREE

USER GUIDE: NEW USERS START HERE

• Basics
• Logging In
    – SSH
    – NX
• Remote Visualization
    – Prerequisites
    – Using Sviz
• Software Modules
• Storage
    – Persistent Storage
        * Home
        * Project
    – Scratch
        * $HOME/midway-scratch
        * /scratch/local
    – File System Permissions
    – Quotas
    – Snapshots
    – Backup
• Running Jobs
    – Slurm Commands
    – Submitting a Job
    – Slurm Options
        * Requesting Nodes, Tasks, CPUs, and Memory For Your Job
        * Additional Slurm Options
    – Slurm Partitions
    – Partition Limits
• Managing Allocations
• Software Documentation, Examples, and Submission Scripts
• Accessing and Transferring Files
    – SCP
    – Using WinSCP
    – Globus Online
    – Samba
    – HTTP

3.1 Basics

RCC maintains a traditional Linux operating system environment. Login and compute nodes run the 64-bit version of Scientific Linux 6, which is primarily developed by Fermilab. Scientific Linux is a recompilation of Red Hat Enterprise Linux (RHEL), so the two should remain binary compatible. Nodes boot via the network and run the operating system completely from RAM. Local disks are available but are used only for local scratch and swap space.

All users must have a UChicago CNetID to log in to any RCC system. Your RCC account credentials are your CNetID and password:

    Username: CNetID
    Password: CNet password
    Hostname: midway.rcc.uchicago.edu

Note: RCC does not store your CNet password and we are unable to reset your password. If you require password assistance, please contact UChicago IT Services.

Users typically log in via SSH and work via the command line. The Slurm resource manager is used to schedule jobs. Users can submit serial, parallel, and interactive jobs to the queue.

All nodes mount an IBM GPFS file system. Home, project, and high performance scratch directories are available via GPFS.
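Because the username and hostname above are fixed for a given user, some people prefer to record them once in an OpenSSH client configuration entry. The sketch below only prints such an entry; the function name and the alias "midway" are hypothetical choices for this example, while Host, HostName, and User are standard ssh_config keywords. It assumes an OpenSSH client on your local machine.

```shell
#!/usr/bin/env bash
# Print an OpenSSH client configuration entry for the Midway login host.
# The alias "midway" and the function name are hypothetical, chosen for this example.
midway_ssh_config() {
    local cnetid=$1
    if [ -z "$cnetid" ]; then
        echo "usage: midway_ssh_config CNetID" >&2
        return 1
    fi
    printf 'Host midway\n'
    printf '    HostName midway.rcc.uchicago.edu\n'
    printf '    User %s\n' "$cnetid"
}

# Preview the entry. This assumes your local username matches your CNetID;
# pass your CNetID explicitly if it does not.
midway_ssh_config "${USER:-cnetid}"
```

If the printed entry looks right, it can be appended by hand to ~/.ssh/config on your local machine, after which "ssh midway" would connect with the same credentials described above.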
3.2 Logging In

3.2.1 SSH

Access to RCC is provided via secure shell (SSH) login, a tool that allows you to connect securely from any computer (including most smart phones and tablets).

Most Unix-like operating systems (Mac OS X, Linux, etc.) provide an ssh utility by default that can be accessed by typing the command ssh in a terminal. To log in to Midway from a Linux or Mac computer, open a terminal and at the command line enter the following, replacing CNetID with your own CNetID:

    $ ssh CNetID@midway.rcc.uchicago.edu

Windows users will first need to download an ssh client, such as PuTTY, which will allow you to interact with the remote Unix command line. Use the hostname midway.rcc.uchicago.edu and your CNetID username/password to access the RCC login node.

3.2.2 NX

It is possible to use NX to log in to Midway. NX is used for remote X sessions. In other words, NX gives you a graphical interface to Midway.

There are some differences between the Mac OS X, Windows, and Linux clients, but for the most part these steps should be followed:

• Download and install the appropriate NX client here: http://www.nomachine.com/download.php
• Create a new connection using the ssh protocol.
• Name the connection Midway. Enter either midway-login1.rcc.uchicago.edu or midway-login2.rcc.uchicago.edu for the host field. Choosing a specific host allows you to suspend and resume NX sessions.
• During the connection setup, select advanced.
• Select Use the NoMachine login and then hit the Settings button.
• Save the Midway NX Key to your local computer.
• Select Use an alternate server key and enter the path to the key you downloaded in the previous step.
• Select Continue and then connect.
• Sign in using your CNetID. (You may save your password in the config file to avoid signing in every time you connect via NoMachine.)
• Click <New virtual desktop or custom session> and select Create a new KDE virtual desktop
• You will be presented with a barebone IceWM desktop
• Right-click on the desktop to open the Terminal, log off, or access other utilities

Note: Access the NoMachine settings by hovering the mouse in the top right corner of the NoMachine window, or by using the hotkey (ctrl + option + 0) on Mac OS X and (ctrl + alt + 0) on Windows. From there you can change the display settings for the virtual desktop.

3.3 Remote Visualization

RCC provides a custom remote visualization tool that can be used to run graphics-intensive OpenGL applications on designated visualization nodes in the Midway cluster. This tool is called sviz.

Sviz works by launching a TurboVNC server on a Midway compute node that is equipped with a high-performance GPU device. Once the TurboVNC server is started, you can connect to it with a TurboVNC viewer and run graphics-intensive applications through the GUI.

3.3.1 Prerequisites

You will need to install the TurboVNC viewer software on your local machine. This software is available for most platforms and can be obtained from: http://sourceforge.net/projects/turbovnc/files/

3.3.2 Using Sviz

1. Log into Midway via SSH.

2. At the prompt, execute the sviz command:

    [username@midway ~]$ sviz
    Submitted batch job 4655917
    Waiting for JOBID 4655917 to start (press CTRL-C to stop)
    ...........
    -----------------------------------------------------------
    Select your preferred resolution for this session or press
    ENTER for default:
    (1) 800x600
    (2) 1024x768
    (3) 1280x720
    (4) 1440x900
    (5) 1920x1080
    HINT: Your default resolution can be specified in:
    /home/username/.vnc/turbovncserver.conf
    -----------------------------------------------------------
    Enter the number corresponding to your selection:

3. Select your desired resolution for the remote visualization session by entering the number corresponding to your selection and pressing Enter.

4. You will now be presented with connection information for your session:

    A VNC server is now active for your session.
    Server IP address: 10.50.181.229:5904
    Password: 73633147
    - Closing this terminal will terminate your session!
    - This session has been started with an 8-hour time limit.
    - Make sure to preface any OpenGL-dependent commands with 'vglrun'.

Note: Closing the terminal you used to launch sviz will terminate your remote visualization session!

5. On your local machine, start your TurboVNC viewer software, enter the Server IP address (including the port number), and use the provided password.

6. You should now be delivered to a desktop on a Midway visualization node. To obtain a terminal, right-click on the desktop background and select Terminal.

7. To run an OpenGL-dependent program, preface the command that launches the program with the vglrun command:

    $ vglrun <command>

8. To terminate your remote visualization session, type exit at the prompt in the terminal from which sviz was launched.

3.4 Software Modules

See also: Software Module List, Software Documentation

RCC uses Environment Modules for managing software. The modules system permits us to set up the shell environment to make running and compiling software easier. It also allows us to make available many software packages and libraries that would otherwise conflict with one another.

RCC has a large selection of software available, but if you need software not currently available in the module system, send a detailed request to help@rcc.uchicago.edu.

Modules set a standard group of environment variables where applicable, as well as any specialized environment variables that a software package requires.
The environment variables typically set are the following:

CPATH
    Set the compiler include path
LD_LIBRARY_PATH
    Set the runtime search path for shared libraries
LIBRARY_PATH
    Set the compiler/linker search path for libraries
MANPATH
    Set the manual page search path
PATH
    Set the shell executable search path
PKG_CONFIG_PATH
    Set the path searched by the pkg-config command. The pkg-config command is frequently used by software packages that depend on other packages to determine suitable include and/or linker options.

You can see what software modules are available with this command:

    $ module avail

Display a module:

    $ module display openmpi
    -------------------------------------------------------------------
    /software/modulefiles/openmpi/1.6:

    conflict        intelmpi mpich2 mvapich2
    module-whatis   setup openmpi 1.6 compiled with the system compiler
    conflict        openmpi
    prepend-path    PATH /software/openmpi-1.6-el6-x86_64/bin
    prepend-path    LD_LIBRARY_PATH /software/openmpi-1.6-el6-x86_64/lib
    prepend-path    LIBRARY_PATH /software/openmpi-1.6-el6-x86_64/lib
    prepend-path    PKG_CONFIG_PATH /software/openmpi-1.6-el6-x86_64/lib/pkgconfig
    prepend-path    CPATH /software/openmpi-1.6-el6-x86_64/include
    prepend-path    MANPATH /software/openmpi-1.6-el6-x86_64/share/man
    -------------------------------------------------------------------

List all available versions of a specific module:

    $ module avail openmpi
    -------------------- /software/modulefiles ------------------------
    openmpi/1.6(default)   openmpi/1.6+intel-12.1   openmpi/1.6+pgi-2012

Load a default software module:
    $ module load openmpi

Load a specific software module version:

    $ module load openmpi/1.6+intel-12.1

List currently loaded modules:

    $ module list
    Currently Loaded Modulefiles:
      1) slurm/2.4   3) subversion/1.6   5) env/rcc
      2) vim/7.3     4) emacs/23.4       6) openmpi/1.6

Unload a module:

    $ module unload openmpi

Module names reflect the name, version, and compiler of the installed software. They follow the pattern NAME/VERSION+COMPILER. The software packages are installed under /software, in directories that follow the pattern NAME-VERSION-BUILDTAG+COMPILER. Most software also has a file named build.log-TIMESTAMP in its root directory that contains the output of the install process.

Here is an explanation of the components of the module name and directory path:

NAME
    The name of the software module
VERSION
    The version of the software. Typically we only use major version numbers of software unless there is a good reason for maintaining multiple minor versions of a particular piece of software.
BUILDTAG
    This identifies the Linux distribution and/or architecture the software was built on or should be used on. Typically this will be el6-x86_64, to signify the binary is compatible with 64-bit Enterprise Linux 6, or just x86_64, to signify any 64-bit Linux distribution. This is not used in the name of the software module.
COMPILER
    This is optional and is specified if the software was built with a non-default compiler, including MPI. Any modules that require a specific compiler will load the compiler module as well. Examples are intel-12.1 and intelmpi4.0+intel-12.1.

As an example, OpenMPI has modules compiled with the default compiler (GCC), intel/12.1, and pgi/2012.
The default version uses GCC:

    $ module avail openmpi
    --------------------- /software/modulefiles ----------------------
    openmpi/1.6(default)   openmpi/1.6+intel-12.1   openmpi/1.6+pgi-2012

The directories in /software:

    $ ls -d /software/openmpi-1.6-el6-x86_64*
    /software/openmpi-1.6-el6-x86_64
    /software/openmpi-1.6-el6-x86_64+intel-12.1
    /software/openmpi-1.6-el6-x86_64+pgi-2012

3.5 Storage

RCC uses an IBM GPFS shared file system for users' home directories, shared project space, and high performance scratch. All compute nodes also have local storage that can be used for scratch if necessary.

We maintain two data storage systems, one optimized for performance and the other for capacity. The high-performance storage system has a raw capacity of 86 TB and hosts scratch space. The high-capacity storage system provides persistent storage of data and files, and has a raw capacity of 480 TB. It hosts space for the home and project directories.

3.5.1 Persistent Storage

Persistent storage areas are appropriate for long term storage. They have both file system snapshots and tape backups for data protection. The two locations for persistent storage are the home and project directories.

Home

Every RCC user has a home directory located at /home/CNetID. This directory is accessible from all RCC compute systems and is generally used for storing frequently used items such as source code, binaries, and scripts. By default, a home directory is only accessible by its owner (mode 0700) and is suitable for storing files which do not need to be shared with others. The standard quota for a user's home directory is 25 GB.

Project

A project directory is created for all approved project accounts. It is located at /project/project. These directories are generally used for storing files which are shared by members of a project or research group, and are accessible from all RCC compute systems.
The default permissions for files and directories created in a project directory allow group read/write, with the setgid bit (referred to in this guide as the group sticky bit) set on the directory (mode 2770). The group ownership is set to the project account. The quota for each project directory varies with the size of the allocation.

3.5.2 Scratch

$HOME/midway-scratch

High performance shared scratch space is available in $HOME/midway-scratch. This is a symlink to /scratch/midway/CNetID. This scratch space is intended to be used for reading or writing data required by jobs running on the cluster. Scratch space is neither snapshotted nor backed up and may be periodically purged. It is the responsibility of the user to ensure any important data in scratch space is moved to persistent storage. The default permissions for this scratch space allow access only by its owner (mode 0700).

The standard soft quota on scratch is 100 GB. The hard limit is 5 TB, and the quota grace period is 30 days. It is possible to exceed the 100 GB soft limit for up to 30 days before writing to scratch is disabled. It is not possible to exceed the 5 TB hard limit.

Additionally, users whose scratch directory has had a quota status of expired for 45 days will have their scratch directory deleted. The expired status begins after the 30 day grace period. Therefore, if you are above the 100 GB soft limit for 75 days (30 days of grace plus 45 days of expired status), all of your files will be deleted from scratch. You will get email notifications before any of your files are deleted. At this time, there is no automated purge of scratch for users below the soft limit.

/scratch/local

All compute nodes have a single local hard disk available as scratch space for situations where that is more appropriate. It is available in /scratch/local. Users should create a directory there and use that directory for scratch space. All files in /scratch/local will be removed when a node is rebooted.
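The advice to create your own directory under /scratch/local can be followed from inside a job script. The following is a minimal sketch under stated assumptions: the function name and the SCRATCH_BASE override are hypothetical and exist only so the pattern can be tried on a machine without /scratch/local, and the cleanup on exit is a courtesy added here, since this guide only states that files are removed when a node reboots.

```shell
#!/usr/bin/env bash
# Create a private working directory under node-local scratch and remove it on exit.
# SCRATCH_BASE is a hypothetical override used only in this sketch.
make_local_scratch() {
    local base=${SCRATCH_BASE:-/scratch/local}
    # Fall back to the system temp directory if the base is missing or not writable
    if [ ! -d "$base" ] || [ ! -w "$base" ]; then
        base=${TMPDIR:-/tmp}
    fi
    # mktemp -d creates a uniquely named directory and prints its path
    mktemp -d "$base/${USER:-user}.XXXXXX"
}

workdir=$(make_local_scratch) || exit 1

# Remove the directory when the script exits so other jobs on the node find free space
trap 'rm -rf "$workdir"' EXIT

# Write intermediate data to local scratch; copy anything worth keeping
# to persistent storage before the script ends.
echo "intermediate data" > "$workdir/tmp.dat"
echo "working in $workdir"
```

Using a uniquely named directory per job avoids collisions when several of your jobs land on the same node.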
3.5.3 File System Permissions

Let's summarize the default file system permissions:

    Directory              Permissions
    $HOME                  0700 – Accessible only to the owner
    $HOME/midway-scratch   0700 – Accessible only to the owner
    /project/project       2770 – Read/write for the project group

The default umask is 002. When new files or directories are created, the umask influences the default permissions of those files and directories. With the umask set to 002, all files and directories will be group readable and writable by default. In your home directory, the group ownership will be set to your personal group, which is the same as your CNetID, so you will still be the only user that can access your files and directories. In the project directories, the setgid bit (referred to in this guide as the group sticky bit) causes new files to take the same group ownership as the directory. This means files created in a project directory will be readable and writable by the project group, which is typically what is wanted in those directories.

Here is an example of what this means in practice:

    $ ls -ld $HOME /project/rcc
    drwx------ 108 wettstein wettstein 32768 2013-01-15 10:51 /home/wettstein
    drwxrws---  24 root      rcc-staff 32768 2013-01-15 10:48 /project/rcc
    $ touch $HOME/newfile /project/rcc/newfile
    $ ls -l /project/rcc/newfile $HOME/newfile
    -rw-rw-r-- 1 wettstein wettstein 0 2013-01-15 10:48 /home/wettstein/newfile
    -rw-rw-r-- 1 wettstein rcc-staff 0 2013-01-15 10:48 /project/rcc/newfile

Both files are readable and writable by the group owner due to the default umask, but the group owner differs because the setgid bit is set on /project/rcc.

Note: This applies only to newly created files and directories. If files or directories are copied from elsewhere, the ownership and permissions may not work like this.

GPFS also allows file system ACLs to be set. These allow more fine-grained permissions than the typical Unix permissions of user/group/other, but ACLs are more complicated to apply.
The commands used to view and change the ACLs are getfacl and setfacl, respectively. You'll need to consult the man pages for those commands for full details, but here is an example that allows read access on $HOME/acltest to user drudd:

    $ mkdir $HOME/acltest
    $ touch $HOME/acltest/aclfile
    # change the existing ACLs and create a default ACL for drudd
    $ setfacl -Rm u:drudd:rX,d:u:drudd:rX $HOME/acltest
    # create a file to check that the default ACL works
    $ touch $HOME/acltest/aclfile2
    # show the current ACLs on $HOME/acltest
    $ getfacl -R $HOME/acltest
    getfacl: Removing leading '/' from absolute path names
    # file: home/wettstein/acltest
    # owner: wettstein
    # group: wettstein
    user::rwx
    user:drudd:r-x
    group::rwx
    mask::rwx
    other::r-x
    default:user::rwx
    default:user:drudd:r-x
    default:group::rwx
    default:mask::rwx
    default:other::r-x

    # file: home/wettstein/acltest/aclfile
    # owner: wettstein
    # group: wettstein
    user::rw-
    user:drudd:r--
    group::rw-
    mask::rw-
    other::r--

    # file: home/wettstein/acltest/aclfile2
    # owner: wettstein
    # group: wettstein
    user::rw-
    user:drudd:r-x    #effective:r--
    group::rwx        #effective:rw-
    mask::rw-
    other::r--

3.5.4 Quotas

Home directories, project directories, and shared scratch directories have quotas enforced on them. To check your current quotas, use the quota command. Typical output may look like this:

    Disk quotas for user rccuser:
    Filesystem        type     used      quota     limit     files      grace
    ----------------- -------- --------- --------- --------- ---------- --------
    home              USR      2.90 G    10.00 G   12.00 G   39979      none
    midway-scratch    USR      24.68 G   5.00 T    6.00 T    105263     none
    project-rccproj   FILESET  24.33 T   40.00 T   41.00 T   14365487   none

Descriptions of the fields:

Filesystem
    This is the file system or file set where this quota is valid.

type
    This is the type of quota. This can be USR for a user quota, GRP for a group quota, or FILESET for a file set quota. File set quotas can be considered a directory quota.
    USR and GRP quotas can exist within a FILESET quota to further limit a user or group quota inside a file set.

used
    This is the amount of disk space used for the specific quota.

quota
    This is the quota or soft quota. It is possible for usage to exceed the quota for the grace period or up to the hard limit.

limit
    This is the limit or hard quota that is set. It is not possible for usage to exceed this limit.

files
    This is the number of files currently counted in the quota. There are currently no quotas enforced on the number of files.

grace
    This is the grace period, which is the amount of time remaining that the quota can be exceeded. It is currently set to start at 7 days. The value none means that the quota is not exceeded. After the quota has been exceeded for longer than the grace period, it will no longer be possible to create new files.

3.5.5 Snapshots

Automated snapshots of the home and project directories are available in case of accidental file deletion or other problems. Currently snapshots are available for these time periods:

• 4 hourly snapshots
• 7 daily snapshots
• 4 weekly snapshots

Snapshots are in these directories:

• /snapshots/home/*/home/CNetID – Home snapshots
• /snapshots/project/*/project – Project snapshots

The subdirectories refer to the frequency and time of the snapshot, e.g. daily-2012-10-04.06h15 or hourly-2012-10-09.11h00. To restore a file from a snapshot, copy it back to its original location with either cp or rsync.

3.5.6 Backup

Backups are performed on a nightly basis to a tape machine located in a different data center than the main storage system. These backups are meant to safeguard against disasters such as hardware failure or events that could cause the loss of the main data center. Users should make use of the snapshots described above to recover files.

During periods of high activity tape backups may take longer than 24 hours to complete.
It is therefore possible that the tape backup can occasionally be a couple of days out of date.

3.6 Running Jobs

Slurm (Simple Linux Utility for Resource Management) is used to manage resources on the Midway compute cluster. It allocates access to resources, provides a framework for starting, executing, and monitoring calculations, and arbitrates contention for resources by managing a queue of pending jobs.

3.6.1 Slurm Commands

There are several Slurm commands available to perform job submission, view the current job queue, view node status, etc. The most important commands you need to get started are below.

sbatch
    submit a batch job script (similar to qsub)

srun
    submit a job for execution in real time or run a task within an existing allocation

sinteractive
    interactively run jobs from within a normal shell (similar to qsub -I)

squeue
    report the state of jobs or job steps (similar to qstat)

sacct
    display accounting data for jobs

scancel
    cancel a running job, or remove a pending job from the queue (similar to qdel)

sinfo
    view information about nodes and partitions

sview
    graphical display of cluster status (requires X or an NX client)

3.6.2 Submitting a Job

To check that your account is working properly, you can simply run sinteractive after you have logged in. This will give you a single-CPU interactive session using all of the defaults for your account.

For typical usage, though, you will want to create a submission script and set various options. An example follows:

    #!/bin/bash
    # Simple sbatch submission script. Slurm options in the submission script
    # start with #SBATCH
    #SBATCH --job-name=MyJobName
    #SBATCH --output=MyJobName.out
    #SBATCH --ntasks=1
    #SBATCH --time=1:00:00

    hostname

Copy and paste this into a file named myscript.sbatch.
This script can be submitted to the queue with the sbatch command:

    $ sbatch myscript.sbatch

After the job starts, the output can be seen by:

    $ less MyJobName.out

An example MPI job script (see the MPI section for more details on submitting MPI jobs):

    #!/bin/bash

    # Set the number of nodes to 2
    #SBATCH --nodes=2

    # Set --exclusive to guarantee this is the only job running on a machine
    #SBATCH --exclusive

    # Set --constraint=ib to select nodes that have an Infiniband interface
    #SBATCH --constraint=ib

    # load the openmpi module. Depending on your user environment, it may be
    # necessary to add '. /etc/profile' for the module command to be available
    # in your submission script
    module load openmpi

    # no -n option is required for mpirun. All RCC supported MPI libraries
    # can automatically determine the correct layout based on the Slurm options
    # that were specified in the submission
    mpirun hello-ompi

3.6.3 Slurm Options

Options for the sbatch command can be given either on the command line or in the submission script, as seen above, by prefixing the line with #SBATCH. Most of these options can also be used with srun or sinteractive.

Requesting Nodes, Tasks, CPUs, and Memory For Your Job

Slurm gives several options for requesting the number of nodes, tasks, CPUs, and memory for a job. The following options can be used to tell Slurm what resources to allocate to a job:

-N, --nodes=<nodes>
    This option is used to request the number of actual nodes for the job. For example, if the job requests --nodes=2, it is guaranteed that the job will be allocated resources on 2 nodes.

-n, --ntasks=<ntasks>
    This option is used to request the number of tasks (typically a single process or MPI rank) for a job. By default a task typically runs on 1 CPU, but that can be changed with --cpus-per-task. The tasks will be split across your node allocation.
    Normally the scheduler will minimize the number of nodes used for the tasks, but there are no guarantees about task placement unless other options to explicitly request the number of nodes (--nodes=<nodes>) or number of tasks per node (--ntasks-per-node=<ntasks>) are set. The mpiexec and mpirun from RCC supported MPI versions will use the number of tasks specified in --ntasks=<ntasks> as the number of ranks to start if no other explicit option is given. See the MPI section for more details on submitting MPI jobs.

-c, --cpus-per-task=<ncpus>
    This option allows the job to change the number of CPUs allocated per task. The default is 1 CPU per task. A single task will not span multiple nodes, so when this option is used it ensures that the requested number of CPUs for the task will be allocated on a single node. This is typically used for threaded (OpenMP) applications, but can be used in other situations. See Hybrid MPI/OpenMP for an example of using this option.

--ntasks-per-node=<ntasks>
    This option allows the job to explicitly specify how many tasks per node should be run. This option can be used if memory requirements for a job limit the number of processes a single node can run. It is also combined with --cpus-per-task=<ncpus> for hybrid MPI/OpenMP when there is a requirement to run multiple threaded OpenMP MPI ranks on each node.

--mem-per-cpu=<MB>
    This option requests a specific amount of memory per core in megabytes. The default value is 2000 MB, which is based on the hardware specifications of a standard compute node. If a job requests additional memory on a standard compute node, the number of tasks possible per node will be lowered so the memory request can be met. If a job requests more memory than is available on a node, the job will never be eligible to run.

--exclusive
    This option ensures that the job is the only job running on the node or nodes specified. Setting this flag is equivalent to requesting all of the cores on a node.
    The other allocation options above can be set as necessary, but the entire node will be allocated to the job regardless of those options.

Slurm sets several environment variables in the job environment to reflect these options:

SLURM_JOB_NUM_NODES
    The number of nodes allocated to the job.

SLURM_JOB_NODELIST
    The list of nodes allocated to the job.

SLURM_NTASKS_PER_NODE
    The number of tasks per node allocated to the job.

SLURM_NTASKS
    The number of tasks allocated to the job.

SLURM_CPUS_PER_TASK
    The number of CPUs per task allocated to the job.

These variables can be used in submission scripts as command line arguments for programs or to set environment variables such as OMP_NUM_THREADS in the job submission script.

Additional Slurm Options

These are additional frequently used options for sbatch and other Slurm submission utilities:

-A, --account=<account>
    Choose the account your job should be charged to if you have membership in multiple accounts.

-C, --constraint=<list>
    This allows you to choose nodes with specific features set. For multi-node MPI jobs, --constraint=ib should be set to guarantee that the job will be run on nodes with Infiniband.

-J, --job-name=<jobname>
    Set the name of the job in the queue. The default is the name of the submission script.

--mail-type=<type>
    Notify the user by email when the state of the job changes (typically used when a job starts and finishes). Valid types are BEGIN, END, FAIL, REQUEUE, and ALL.

--mail-user=<user>
    Email address used when sending notifications specified by --mail-type. If not specified, it defaults to the email address provided when the RCC account was requested, typically your University of Chicago email address {cnetid}@uchicago.edu.

-o, --output=<output pattern>
    Send the job output to a specific file. The default is slurm-%j.out, where %j is the job id.
-p, --partition=<partition>
    Set the partition for your job.

--qos=<qos>
    Set the Quality of Service (QOS) for your job. The QOS used affects the number of nodes, CPU cores, and wall time available for your job.

-t, --time=<time>
    Set the time limit for this job. Setting an appropriate time limit allows the scheduler to work more effectively. For example, 1:00:00 specifies a 1 hour time limit.

--gres=gpu:<N>
    Choose the number of GPUs per node allocated to your job when running on the GPU partition (1 or 2 only).

Each of the commands above has many more options than are listed here, covering job parameters, output reports, and system interactions. See the man page for each command for the full list.

3.6.4 Slurm Partitions

Slurm groups nodes into what it calls partitions. This is similar to what other resource managers would call a queue. To select which partition your job runs in, use --partition=<partition>. It is not necessary to use that option to submit to the default partition.

The following partitions are available to all users:

sandyb
    This is the default partition. Nodes in this partition have 16 cores total with 32 GB of RAM. The processors are 8 core Intel Xeon E5-2670 2.6 GHz. There are both tightly and loosely coupled nodes available, so it is necessary to use the --constraint option to guarantee which type of node your job should run on. Use --constraint=ib to use nodes that have an Infiniband interface. Use --constraint=noib to use nodes that don't have Infiniband.

westmere
    These nodes have 12 cores total and 24 GB of RAM. The processors are Intel Xeon X5675 3.06 GHz. All nodes are tightly coupled. This partition is currently able to run typical high performance jobs, but it also has a special QOS to run large numbers of single core jobs. These jobs would typically be referred to as high throughput jobs. Add --qos=ht to submit these types of jobs.
bigmem
    These nodes have a large amount of memory to run jobs that need more memory than typical jobs. There are 2 nodes with 16 cores and 256 GB of RAM. The processors are 8 core Intel Xeon E5-2670 2.6 GHz. There is also 1 node with 1 TB of RAM and 4 x 8 core E7-8837 2.67 GHz processors. All nodes are tightly coupled.

gpu
    These nodes have 16 cores total with 256 GB of RAM. The processors are 8 core Intel Xeon E5-2670 2.6 GHz. All nodes are tightly coupled. Nodes in this partition have 2 identical Nvidia GPUs per machine. See the GPU section of this manual for more details.

amd
    These nodes have 64 cores total with 256 GB of RAM. The processors are 16 core AMD Opteron 6386 SE 2.8 GHz. All nodes are tightly coupled.

Other partitions are available, but those are nodes purchased through the Cluster Partnership Program and available only to members of the group that purchased those nodes.

3.6.5 Partition Limits

Limits on resources, like the maximum wall time for a job or the maximum number of CPUs a single user can use, are in place to try to balance usage among various groups or users. These limits are implemented through a Slurm QOS. To view the currently active QOS and the limits for each QOS, run the command:

    rcchelp qos

QOS are available in certain partitions only, so you may not be able to use a QOS even if it is listed. An error message will be given at job submission time when an invalid QOS/partition combination is given.

Here is a list of the typical QOS limits that are set:

MaxWall
    The maximum wall time for a single job

MaxNodes
    The maximum number of nodes that a single job can use

MaxCPUs
    The maximum number of CPUs that a single job can use

MaxCPUsPerUser
    The maximum total number of CPUs that jobs from a single user can use

MaxNodesPerUser
    The maximum total number of nodes that jobs from a single user can use

MaxJobs
    The total number of jobs that a single user can have running

MaxSubmit
    The total number of jobs that a single user can have submitted

Note: All of the limits above are per QOS. In other words, jobs submitted by the same user but under a different QOS will only count towards the limits specific to each job's QOS.

3.7 Managing Allocations

RCC provides the command line tool accounts for users to monitor their allocation account balance, job usage, and do simple reporting. To see complete documentation use the option --help. The accounts tool accepts the following subcommands:

accounts balance
    Query the balance for a specified account or for the accounts associated with a given user (you may only query the balance for another user when they belong to an account for which you are the PI). accounts balance has the following specific options:

    --account ACCOUNT[,...]
        Show the balance for the specified account(s). The current user must belong to those accounts to query balance information.

    --user USER[,...]
        Show the balance for accounts that the user(s) are associated with.

    --period PERIOD
        Show the balance for the specified allocation period (e.g. 2012-2013). If not specified, the balance for the current active allocation period is provided.

accounts usage
    Query usage associated with the specified users or accounts. With no options this command will show the usage for the current user in the current active allocation period. accounts usage has the following specific options:

    --account ACCOUNT[,...]
        Show usage information for the specified account(s).

    --user USER[,...]
        Show usage information for the specified user(s).

    --partition PARTITION[,...]
        Limit usage information to the specified partition(s).

    --period PERIOD[,...]
        Limit usage information to the specified allocation period(s).

    -byuser -bypartition -byperiod -byjob
        Show the usage information broken down by the respective quantity. These flags cannot be combined.
        The output when using the -byjob flag may be extensive; we recommend using more to control the output.

    -all
        Report usage information including jobs run on partitions where usage is not enforced through allocations.

accounts allocations
    Break down the current allocation account total into individual contributions. This may include several allocations (Startup, Research I, etc), as well as credits to the account total for job failures, etc. accounts allocations has the following specific options:

    --account ACCOUNT[,...]
        Show allocations for the specified account(s).

    --period PERIOD[,...]
        Show allocations for the specified period(s).

accounts list
    This command prints all allocation accounts associated with the current user.

accounts periods
    This command prints all allocation periods that can be queried with accounts.

accounts partitions
    This command lists the active partitions for which usage is enforced. To see a complete list of partitions use the Slurm command sinfo.

3.8 Software Documentation, Examples, and Submission Scripts

RCC maintains a custom command-line help system called RCCHelp on the cluster. This system allows you to download batch submission scripts, source code, and input decks for many popular software packages directly from the terminal. Enter:

    rcchelp

for a list of topics. Topics include compilers, MPI, python, scientific applications, and more.

3.9 Accessing and Transferring Files

RCC provides a number of methods for transferring data in/out of Midway. For relatively small amounts of data, we recommend the scp command. For non-trivial file transfers, we recommend using Globus Online for fast, secure and reliable transfers. When working on the UChicago network it is also possible to mount the Midway file systems using Samba.
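As a rough illustration of the scp recommendation for small transfers, the sketch below builds the command for copying a file or directory from Midway back to the current directory on your local computer. The helper name midway_fetch and the RUN switch are assumptions made for this example; only scp itself and the Midway hostname come from this guide.

```shell
#!/bin/bash
# Hypothetical helper (not provided by RCC): print the scp command that
# copies a path from Midway into the current local directory. Set RUN=1 to
# execute the command instead of printing it.
midway_fetch() {
    # $1 is the CNetID, $2 is the remote path; a relative remote path is
    # interpreted by scp relative to the remote home directory
    src="$1@midway.rcc.uchicago.edu:$2"
    if [ "${RUN:-0}" = "1" ]; then
        # -r copies directories recursively and is harmless for single files
        scp -r "$src" .
    else
        echo "scp -r $src ."
    fi
}
```

For example, `midway_fetch jdoe results` prints the command that would copy the results directory from the home directory of a placeholder user jdoe. Printing first makes it easy to check the source path before any data moves.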
3.9.1 SCP

Most UNIX-like operating systems (Mac OS X, Linux, etc) provide a scp command which can be accessed from the command line. To transfer files from your local computer to your home directory on Midway, open a terminal window and issue the command:

Single files:

    $ scp file1 ... <CNetID>@midway.rcc.uchicago.edu:

Directories:

    $ scp -r dir1 ... <CNetID>@midway.rcc.uchicago.edu:

When prompted, enter your CNet password.

Windows users will need to download an SCP client such as WinSCP that provides a graphical interface for transferring files via scp.

3.9.2 Using WinSCP

WinSCP is scp client software that can be used to move files between Midway and a Windows machine. WinSCP can be obtained from http://www.winscp.net.

Use the hostname midway.rcc.uchicago.edu and your CNet credentials when connecting.

If prompted to accept the server's host key, select "yes."

The main WinSCP window allows you to move files from your local machine (left side) to Midway (right side).

3.9.3 Globus Online

Globus Online is a robust tool for transferring large data files to/from Midway. The RCC has a customized Globus Online login site at https://globus.rcc.uchicago.edu and uses the Single Sign On capabilities of CILogon. If you have already signed up, here is the connection information:

    URL: https://globus.rcc.uchicago.edu
    End Point: ucrcc#midway

Follow these instructions to get started:

• Go to https://globus.rcc.uchicago.edu and hit Proceed
• Select "University of Chicago" for the Identity Provider
• Enter your CNetID and password when prompted
• You will need to link your University of Chicago credentials to a Globus Online account.
  Either create a new Globus Online account or sign in to your existing account if you have one.
• Once you are signed in, enter ucrcc#midway as the Endpoint and hit the Go button
• If you want to transfer files from your local computer, click the Get Globus Connect link on the transfer page and follow the instructions.

There is extensive documentation on the Globus Online site as to how to transfer files in different modes. Please refer to their documentation for more details or contact us with any RCC specific issues.

3.9.4 Samba

Samba allows users to connect to (or "mount") their home and project directories on their local computer so that the file system on Midway appears as if it were directly connected to the local machine. This method of accessing your RCC home and project space is only available from within the UChicago campus network. From off-campus you will need to connect through the UChicago virtual private network.

Your Samba account credentials are your CNetID and password:

    Username: ADLOCAL\<CNetID>
    Password: CNet password
    Hostname: midwaysmb.rcc.uchicago.edu

Note: Make sure to prefix your username with ADLOCAL\

On a Windows computer, use the "Map Network Drive" functionality and the following UNC paths:

    Home: \\midwaysmb.rcc.uchicago.edu\homes
    Project: \\midwaysmb.rcc.uchicago.edu\project

On Mac OS X, use these URLs to connect:

    Home: smb://midwaysmb.rcc.uchicago.edu/homes
    Project: smb://midwaysmb.rcc.uchicago.edu/project

To connect on a Mac OS X computer you can either click the above URLs or follow these steps:

• Use the Connect to Server utility in Finder
• Enter one of the URLs from above in the input box for Server Address.
• When prompted for a username and password, select Registered User.
• Enter ADLOCAL\user for the username and enter your CNet password.

3.9.5 HTTP

It is possible to share files from your home directory via HTTP as well. Create a directory named public_html in your home directory and give at least o+x permissions on your home directory and public_html directory. This can be done with the following command:

    chmod o+x $HOME $HOME/public_html

The URL will be: http://users.rcc.uchicago.edu/~cnetid

CHAPTER FOUR

FREQUENTLY ASKED QUESTIONS

• General
    – How do I cite RCC in my publications and talks?
• Getting Started
    – How do I become an RCC user?
    – What is my RCC username and password?
    – Can an external collaborator get a CNetID so they can log in to RCC?
    – What do I need to do to get an account if I used to work at UChicago and had a CNetID?
    – What do I do if I left UChicago and my CNetID password no longer works?
    – How do I change/reset my password?
    – What groups am I a member of?
    – Which RCC systems do I have access to?
    – How do I access the data visualization lab in the Zar room of Crerar Library?
    – How do I access RCC systems?
    – What login shells are supported and how do I change my default shell?
    – Is remote access with Mosh supported?
    – Is SSH key authentication allowed on RCC machines?
    – Why is SSH key authentication not working even though I've added my public key to $HOME/.ssh/authorized_keys?
• Allocations
    – What is an allocation?
    – What is a service unit (SU)?
    – How do I obtain an allocation?
    – How is my usage charged to my account?
    – How do I check the balance of my allocation?
    – How do I see how my allocation has been used?
• Software
    – What software does RCC offer on its compute systems?
    – How do I use the modules utility?
    – How do I get help with RCC software?
    – Why is command XXX not available?
    – Why do I get an error that says a module cannot be loaded due to a conflict?
    – How do I request a new software package or an updated version of an already installed software package?
    – Why do all module commands in my scripts fail with module command not found?
    – Why doesn't the tail -f command work?
    – Why can't I run Gaussian?
• Cluster Usage
    – How do I submit a job to the queue?
    – Where should I run serial executables?
    – Can I login directly to a compute node?
    – How do I run a set of jobs in parallel?
    – What are the queue limits?
    – Why isn't my job starting?
    – Why does my job fail after a couple of seconds?
    – Why does my job fail with "exceeded memory limit, being killed"?
    – How do I get the maximum wall time for my jobs increased?
    – Can I create a cron job?
• Performance and Coding
    – What compilers does RCC support?
    – Which versions of MPI does RCC support?
    – Can RCC help me parallelize and optimize my code?
    – Does RCC provide GPU computing resources?
• File I/O, Storage, and Transfers
    – How do I get a Globus Online Account?
    – How much storage space have I used / do I have access to?
    – How do I get my storage quota increased?
    – How do I share files?
    – I accidentally deleted/corrupted a file, how do I restore it?
    – How do I request a restore of my files from tape backup?

4.1 General

4.1.1 How do I cite RCC in my publications and talks?

Please send an email to info@rcc.uchicago.edu with the title and the journal or conference of any publication or presentation that made enough use of RCC resources to justify citation. These citations help RCC demonstrate the role of computational resources and support staff in research at UChicago.

Reference the RCC as "The University of Chicago Research Computing Center" in citations. Acceptable example citations are below.

• This work was completed in part with resources provided by the University of Chicago Research Computing Center.
• We are grateful for the support of the University of Chicago Research Computing Center for assistance with the calculations carried out in this work.
• We acknowledge the University of Chicago Research Computing Center for support of this work.

4.2 Getting Started

4.2.1 How do I become an RCC user?

RCC user account requests should be submitted via our online application forms. General user account requests can be entered on the User Account Request form, and applications for Principal Investigator accounts can be submitted on the PI Account Request form.

4.2.2 What is my RCC username and password?

RCC uses University of Chicago CNetIDs for user credentials. When your RCC account is created, your RCC username/password will be the same as your CNetID credentials.

4.2.3 Can an external collaborator get a CNetID so they can log in to RCC?

RCC can create CNetIDs for external collaborators as necessary. In the User Account Request form, the user will need to enter their date of birth for us to create a CNetID for them.

4.2.4 What do I need to do to get an account if I used to work at UChicago and had a CNetID?

Your CNetID exists forever, so it should be possible to use your CNetID for your RCC account. If you know you have an RCC account but you can't log in, see the next question.

4.2.5 What do I do if I left UChicago and my CNetID password no longer works?

It should be possible to use your CNetID for authentication indefinitely, but IT services may expire it when you leave. If you have an RCC account, but you still can't log in, it is likely that password authentication has been disabled by IT services. Contact us and we will ask IT services to reactivate authentication for your CNetID.

4.2.6 How do I change/reset my password?

RCC cannot change or reset your password. Go to the CNet Password Reset page to change or reset your password.

4.2.7 What groups am I a member of?
To list the groups you are a member of, type groups on any RCC system.

4.2.8 Which RCC systems do I have access to?

Most users are granted access to the Midway compute cluster by default. For specific requests or questions, please contact help@rcc.uchicago.edu.

4.2.9 How do I access the data visualization lab in the Zar room of Crerar Library?

The Zar room and its visualization equipment can be reserved for events by contacting RCC at help@rcc.uchicago.edu. More information regarding RCC's visualization facilities can be found on the Data Visualization Lab webpage.

4.2.10 How do I access RCC systems?

There are various ways to access RCC systems.

• To access Midway interactively, both SSH and NX can be used. See Logging In for more details.
• If you want to access files stored on Midway, it is possible to use scp, Globus Online, and Samba (Windows file shares). See Accessing and Transferring Files for more details.

4.2.11 What login shells are supported and how do I change my default shell?

RCC supports using these shells for your login shell:

• /bin/bash
• /bin/tcsh
• /bin/zsh

Use this command to change your shell:

    $ chsh -s /path/to/shell

It may take up to 30 minutes for that change to become active.

4.2.12 Is remote access with Mosh supported?

Yes. To use Mosh, first log in to Midway via SSH, and add the command module load mosh to your ~/.bashrc (or ~/.zshenv if you use zsh). Then, you can log in by entering the following command in a terminal window:

    $ mosh <CNetID>@midway.rcc.uchicago.edu

4.2.13 Is SSH key authentication allowed on RCC machines?

SSH key pair authentication is allowed on Midway. RCC recommends using passphrase-protected private keys in combination with an SSH agent on your client machine for security. Add your public key to $HOME/.ssh/authorized_keys on Midway to allow access from your machine.
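A minimal sketch of the last step, adding a public key to $HOME/.ssh/authorized_keys, is shown below. The function name install_pubkey is an assumption made for this example, and the 700/600 modes are the usual OpenSSH convention rather than a setting documented by RCC.

```shell
#!/bin/bash
# Hypothetical helper (not provided by RCC): append a public key file to
# $HOME/.ssh/authorized_keys, skipping keys that are already present.
install_pubkey() (
    pubkey_file="$1"
    ssh_dir="$HOME/.ssh"
    mkdir -p "$ssh_dir" || exit 1
    touch "$ssh_dir/authorized_keys" || exit 1
    # -x matches whole lines, -F treats the key as a fixed string,
    # -f reads the pattern (the key itself) from the public key file
    if ! grep -qxF -f "$pubkey_file" "$ssh_dir/authorized_keys"; then
        cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
    fi
    # Assumed conventional modes: only the owner can read or write
    chmod 700 "$ssh_dir"
    chmod 600 "$ssh_dir/authorized_keys"
)
```

Run on Midway, this might be used as `install_pubkey ~/id_ed25519.pub` after copying the public key file over with scp; the file name is a placeholder. Calling it a second time with the same file leaves authorized_keys unchanged.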
4.2.14 Why is SSH key authentication not working even though I’ve added my public key to $HOME/.ssh/authorized_keys?

The default umask on Midway can cause an issue with SSH key authentication. Use this command to correct the permissions if there are problems using key-based authentication:

    $ chmod -R g-w ~/.ssh

4.3 Allocations

4.3.1 What is an allocation?

An allocation is a quantity of computing time and storage resources that is granted to a PI. An allocation is necessary to run jobs on RCC systems. See allocations for more details.

4.3.2 What is a service unit (SU)?

A service unit (SU) is an abstract quantity of computing resources defined to be equal to one core*hour on the Sandybridge nodes of the Midway cluster.

4.3.3 How do I obtain an allocation?

The RCC accepts proposals for large allocations (more than 50,000 SUs) bi-annually. Requests for medium-sized allocations of up to 50,000 SUs, special purpose allocations for time-critical research, or allocations for education and outreach may be submitted at any time using the Research Allocation Form. See allocations for more details.

4.3.4 How is my usage charged to my account?

The charge associated with a job on Midway is the product of the following factors:

1. The number of cores assigned to the job (if the --exclusive flag is used this is equal to the number of cores per node times the number of nodes)
2. The elapsed wall-clock time in hours
3. A partition-specific usage-factor, which translates from that resource’s cpu-hours to SUs. This is normalized so that a Midway Sandybridge node has a usage-factor of 1.0

Note: Usage is tracked in units of 0.01 SU.

4.3.5 How do I check the balance of my allocation?

The accounts tool provides an easy way for users to check their account balance. Use the following option to query your balance in the current allocation period:

    $ accounts balance

4.3.6 How do I see how my allocation has been used?
The accounts tool has a number of methods for summarizing allocation usage. To see an overall summary use:

    $ accounts usage

To see the individual jobs that contribute to that usage use the --byjob option:

    $ accounts usage --byjob

4.4 Software

4.4.1 What software does RCC offer on its compute systems?

Software available within the RCC environment is continuously changing and we often add new software and versions of existing software. Information about available software can be found in the Software section of this manual. To view the current list of installed software, log in to any RCC system and use the command:

    $ module avail

To view the versions available for a specific piece of software use this command:

    $ module avail <software>

4.4.2 How do I use the modules utility?

The module system can be accessed by entering module on the command line. There is a tutorial for modules available here. More information can also be found in the Software Modules section of the User Guide.

4.4.3 How do I get help with RCC software?

The primary resource for any software is the official documentation and manual, which can be accessed by issuing the following command:

    $ man <command>

RCC also maintains supplementary documentation for issues specific to our systems, including basic usage and customizations. To access this documentation run the following on any RCC system:

    $ rcchelp <command>

4.4.4 Why is command XXX not available?

You probably have not loaded the appropriate software module. To use most software packages, you must first load the appropriate software module. See Software Modules for more information on how to use the module system.

4.4.5 Why do I get an error that says a module cannot be loaded due to a conflict?

Some modules are incompatible with each other and cannot be loaded simultaneously.
This is especially true for software that provides the same commands, such as MPI implementations, which all provide mpirun and mpiexec commands. The module command typically gives you a hint about which module conflicts with the one you are trying to load. If you see such an error you will need to remove the previously loaded module with the command:

    $ module unload <module name>

4.4.6 How do I request a new software package or an updated version of an already installed software package?

Send an email to help@rcc.uchicago.edu with the details of your software request, including what software package you need, which version, and any optional dependencies you require.

4.4.7 Why do all module commands in my scripts fail with module command not found?

Depending on your shell or working environment, it is possible that the module setup commands aren’t run. To correct this you need to source the appropriate shell startup scripts in your script.

• For bash/sh/zsh add . /etc/profile
• For tcsh add source /etc/csh.cshrc

4.4.8 Why doesn’t the tail -f command work?

By default, tail uses an event based process to look for changes to a file, but this doesn’t work on GPFS file systems. You can add the hidden option ---disable-inotify to make tail poll for changes instead. Alternatively, there is a command available named gpfstail that wraps tail to do the right thing. Run it like this (no need to specify -f):

    $ gpfstail file.txt

4.4.9 Why can’t I run Gaussian?

Gaussian’s creators have a strict usage policy, so we have limited its availability on RCC systems. If you need to use Gaussian for your research, please contact us at help@rcc.uchicago.edu to request access.

4.5 Cluster Usage

4.5.1 How do I submit a job to the queue?

RCC systems use Slurm to manage resources and job queues. For more details see Running Jobs.

4.5.2 Where should I run serial executables?
Jobs that do not require inter-node communication are most appropriately run on Midway’s high-throughput (loosely coupled) compute nodes. RCC staff is finalizing the queueing systems and policies governing these nodes and will update this document when policies are decided.

4.5.3 Can I log in directly to a compute node?

It is possible to create an interactive job using the command sinteractive. This command takes the same arguments as sbatch. Interactive jobs are useful for developing and debugging code, but please remember to log out once you are finished so your job allocation will be released.

4.5.4 How do I run a set of jobs in parallel?

There are a variety of methods for configuring parallel jobs based on the software package and resource requirements. Look at the Software section of this manual to see if there is specific documentation for the type of parallel job you are trying to run.

4.5.5 What are the queue limits?

Run rcchelp qos to see the current limits.

4.5.6 Why isn’t my job starting?

There are a number of reasons that your job may be sitting in the queue. The output of squeue typically will help determine why your job is not running. Look at the NODELIST(REASON) column. A pending job may have these reasons:

• (Priority): Other jobs have priority over your job.
• (Resources): Your job has enough priority to run, but there aren’t enough free resources to run it.
• (QOSResourceLimit): Your job exceeds the QOS limits. The QOS limits include wall time, number of jobs a user can have running at once, number of nodes a user can use at once, etc. This may or may not be a permanent status. If your job requests a wall time greater than what is allowed or exceeds the limit on the number of nodes a single job can use, this status will be permanent. However, your job may be in this status if you currently have jobs running and the total number of jobs running or aggregate node usage is at your limits.
In this case, jobs in this state will become eligible when your existing jobs finish.

Please contact RCC support if you feel that your job is not being properly queued.

Note: If you see that a large number of jobs aren’t running when resources are idle, RCC staff may have an upcoming maintenance window. Your job may be requesting a wall time which will overlap our maintenance window, which will cause the job to stay in the queue until after maintenance is performed. RCC staff will notify users via email both prior to performing maintenance and after the maintenance is completed.

4.5.7 Why does my job fail after a couple of seconds?

There is most likely a problem in your job submission script (ex: the program you are attempting to run cannot be found by a compute node), or the program you are attempting to run is producing an error. Check the output from your job, which is slurm-$JOBID.out by default, for details on what may have gone wrong. If you require further assistance troubleshooting the problem, send your submission script and output from your job to help@rcc.uchicago.edu.

4.5.8 Why does my job fail with “exceeded memory limit, being killed”?

By default, SLURM allocates 2GB of memory per CPU core being used in a job. This follows from the fact that most Midway nodes contain 16 cores and 32GB of memory. If your job requires more than the default amount of memory per core, you must include the --mem-per-cpu=<MB> option in your sbatch job script. For example, to use 16 CPU cores and 256GB of memory on a bigmem node the required sbatch flags would be:

    --ntasks=16 --cpus-per-task=1 --mem-per-cpu=16000

4.5.9 How do I get the maximum wall time for my jobs increased?

The RCC queuing system attempts to provide fair and balanced resource allocation to all RCC users. The maximum wall time per job exists to prevent individual users from using more than their fair share of cluster resources.
If your particular job requires an extraordinary amount of wall time, please submit a special request for resources to help@rcc.uchicago.edu.

4.5.10 Can I create a cron job?

RCC does not support users creating cron jobs. However, it is possible to use Slurm to submit cron-like jobs. See Cron-like Batch Job Submission.

4.6 Performance and Coding

4.6.1 What compilers does RCC support?

RCC supports the system GCC, Intel Composer, and PGI. See the Compilers section of this manual for more details.

4.6.2 Which versions of MPI does RCC support?

RCC maintains builds of OpenMPI, IntelMPI, and MVAPICH2 for supported compilers. See the MPI section of this manual for more documentation and samples for MPI.

4.6.3 Can RCC help me parallelize and optimize my code?

Support staff is available to consult with your research team to help parallelize and optimize your code for use on RCC systems. Contact RCC staff to arrange a consultation.

4.6.4 Does RCC provide GPU computing resources?

Yes. RCC maintains a number of GPU-equipped compute nodes. For details on how to submit jobs to the GPU nodes see the GPU section of this manual.

4.7 File I/O, Storage, and Transfers

4.7.1 How do I get a Globus Online Account?

See the Globus Online section of the User Guide for a walkthrough of setting up an account that uses University of Chicago Web-Single-Signon capabilities.

4.7.2 How much storage space have I used / do I have access to?

Users have a 25G storage quota in their home directory. Project storage space varies by project. Use the quota command to list your current usage and available storage areas.

4.7.3 How do I get my storage quota increased?

To request an increase in the amount of data you can store in your project, home or scratch directories, send a request to help@rcc.uchicago.edu specifying and justifying the amount of additional storage you require.
RCC staff will review your request and inform you of the result of the evaluation. Users who participate in the Cluster Partnership Program can purchase additional storage space to augment their default quota allocation.

4.7.4 How do I share files?

Using the /project directory is the preferred way to share data amongst group members. Project directories are created for all project accounts. The default permissions restrict access to the project group account, but permissions can be customized to allow access to other users.

4.7.5 I accidentally deleted/corrupted a file, how do I restore it?

The best way to recover a deleted/corrupted file is from a snapshot. Snapshots of home and project space occur on an hourly, daily, and weekly basis. Snapshots can be accessed in /snapshots. In that directory are timestamped directories of available snapshots. Copy the files from the appropriate directory back to where you want them.

4.7.6 How do I request a restore of my files from tape backup?

RCC maintains a tape backup of all home and project directories, but only for disaster recovery purposes. There is no long term history of files on tape. You should use file system snapshots to retrieve a previous version of a file or directory.

CHAPTER FIVE

TUTORIALS

Tutorials for RCC resources

5.1 Introduction to RCC Hands-on Session

These materials are intended to be used during the hands-on session of the Introduction to RCC workshop.

5.1.1 Exercise 1 Get access to RCC

1.1 Request an Account

Request a user account by completing the applicable form found at http://rcc.uchicago.edu/accounts.

• If you are an eligible Principal Investigator go directly to the PI Account Request form.
• If you are a student, post-doc, or otherwise a member of a UChicago research group, complete the General User Account Request Form.
• RCC will create these accounts during the workshop if possible.
• PIs must acknowledge and approve the request for a user to be added to their projects, so in some cases the activation time will be longer.

1.2 Temporary Accounts

If you do not have an account we can give you temporary access. Let the instructor know you need a temporary account and they will give you a Yubikey. The Yubikey allows you to access guest accounts as indicated in the images below.

Identify your guest username, which is rccguest plus the last four digits of your Yubikey identification number.

Note the button on the bottom of the Yubikey, which will enter your password when pressed.

1. Insert the Yubikey into a USB slot and ssh to Midway
2. Username : rccguest####
3. Password : touch the contact
4. Let the instructor know if you have trouble.

5.1.2 Exercise 2 Log in to Midway

Access to RCC is provided via secure shell (SSH) login, a tool that allows you to connect securely from any computer (including most smartphones and tablets).

All users must have a UChicago CNetID to log in to any RCC systems. Your RCC account credentials are your CNetID and password:

    Username    Password           Hostname
    CNetID      CNetID password    midway.rcc.uchicago.edu

Note: RCC does not store your CNet password and we are unable to reset your password. If you require password assistance, please contact UChicago IT Services.

Most UNIX-like operating systems (Mac OS X, Linux, etc) provide an SSH utility by default that can be accessed by typing the command ssh in a terminal. To log in to Midway from a Linux/Mac computer, open a terminal and at the command line enter:

    ssh <username>@midway.rcc.uchicago.edu

Windows users will first need to download an SSH client, such as PuTTY, which will allow you to interact with the remote Unix server. Use the hostname midway.rcc.uchicago.edu and your CNetID username and password to access the Midway login node.
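On Linux and Mac OS X, the login command above can be shortened by recording the hostname and username in the OpenSSH client configuration file. The sketch below is one optional way to do that. The function name add_midway_alias and the alias name midway are assumptions chosen for illustration and are not part of RCC's tooling.

```shell
# Hypothetical helper (not an RCC tool): add a "midway" host alias
# to an OpenSSH client config file on your local machine.
# Usage: add_midway_alias <CNetID> [config-file]
add_midway_alias() {
    local cnetid="$1"
    local config="${2:-$HOME/.ssh/config}"

    [ -n "$cnetid" ] || { echo "usage: add_midway_alias <CNetID> [config-file]" >&2; return 1; }

    mkdir -p "$(dirname "$config")"
    touch "$config"

    # Do nothing if an alias with this name is already defined.
    if grep -q '^Host midway$' "$config"; then
        return 0
    fi

    cat >> "$config" <<EOF

Host midway
    HostName midway.rcc.uchicago.edu
    User $cnetid
EOF
    # Keep the client config private to your user.
    chmod 600 "$config"
}
```

After running add_midway_alias with your own CNetID, the command ssh midway should behave like ssh <username>@midway.rcc.uchicago.edu.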
5.1.3 Exercise 3 Reserve a node to work on interactively

RCC uses the SLURM resource manager to schedule jobs. To request one processor to use interactively, use the sinteractive command with no further options:

    sinteractive

Note: We have reserved nodes for use during this workshop. If Midway is under heavy use, include the option --reservation=intro in your SLURM commands to access nodes reserved for this workshop session. For example:

    sinteractive --reservation=intro

The sinteractive command provides many more options for reserving processors. For example, two cores, instead of the default of one, could be reserved for four hours in the following manner:

    sinteractive --nodes=1 --ntasks-per-node=2 --time=4:00:00

The option --constraint=ib can be used to ensure that an Infiniband connected node is reserved. Infiniband is a fast networking option that permits up to 40x the bandwidth of gigabit ethernet on Midway.

Note: Some examples related to MPI/parallel jobs will only work on Infiniband capable nodes. Be sure to reserve an IB node for such exercises below, e.g.:

    sinteractive --constraint=ib

5.1.4 Exercise 4 Submit a job to the scheduler

RCC resources are shared by the entire University community. Sharing computational resources creates unique challenges:

• Jobs must be scheduled in a fair manner.
• Resource consumption needs to be accounted.
• Access needs to be controlled.

Thus, a scheduler is used to manage job submissions to the cluster. We use the Slurm scheduler to manage our cluster.

How to submit a job

1. sinteractive - Gets the specified resources and logs the user onto them
2. sbatch - Runs a script which defines resources and commands

SBATCH scripts contain two major elements. After the #!/bin/bash line, a series of #SBATCH parameters are defined.
These are read by the scheduler, SLURM, and relay information about what specific hardware is required to execute the job, how long that hardware is required, and where the output and error (stdout and stderr streams) should be filed. If resources are available the job may start less than one second following submission. When the queue is busy and the resource request is substantial the job may be placed in line with other jobs awaiting execution.

The second major element of an sbatch script is the user defined commands. When the resource request is granted the script is executed just as if it were run interactively. The #SBATCH lines appear as comments to the bash interpreter.

An example sbatch script:

    #!/bin/bash
    #SBATCH --job-name=test
    #SBATCH --output=test.out
    #SBATCH --error=test.err
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1

    # load your modules here
    module load intel

    # execute your tasks here
    echo "Hello, world"
    date
    ls
    pwd
    hostname
    echo "Done with processing"

Copy and modify the script above, then submit it to the scheduler a few times. If a reservation is active for this workshop you can add #SBATCH --reservation=intro to the script header to ensure that the script runs on the reserved nodes. This reservation is not required.

When a scheduled job runs, SLURM sets many environment variables that may be helpful to query from within your job. You can explore the environment variables by adding env | grep SLURM to the sbatch script above. The output will be found in the output file(s) defined in your script header.

5.1.5 Exercise 5 Target Hardware With Job Submission Scripts

Different types of hardware are usually organized by partition. Rules about job limits such as maximum wallclock time and maximum numbers of CPUs that may be requested are governed by a QOS (Quality of Service). You can target appropriate compute nodes for your job by specifying a partition and a qos in your batch script.

Note: Queue parameters change frequently.
Always refer to the official documentation and test your submissions before submitting a large run of jobs.

    #SBATCH --exclusive    Exclusive access to all nodes requested, no other jobs may run here
    #SBATCH --partition    Specifies the partition (ex: ’sandyb’, ’westmere’)
    #SBATCH --qos          Quality of Service (ex: ’normal’, ’debug’)

5.1 Specific Configurations

• GPU

    --gres=gpu:    Specifies number of GPUs to use

• Big Memory - Nodes with >=256G RAM

    --partition=bigmem

By default, SLURM allocates 2GB of memory per CPU core being used in a job. This follows from the fact that most Midway nodes contain 16 cores and 32GB of memory. If your job requires more than the default amount of memory per core, you must include the option --mem-per-cpu=<MB> in your sbatch job script. For example, to use 16 CPU cores and 256GB of memory on a bigmem node the required sbatch flags would be:

    --partition=bigmem --ntasks=16 --cpus-per-task=1 --mem-per-cpu=16000

• Westmere - Not as fast as ‘sandyb’, 12 cores per node (versus 16)

    --partition=westmere

At the command prompt, you can run sinfo to get some information about the available partitions, and rcchelp qos to learn more about the qos.

Try to modify a submission script from one of the earlier examples to run on the bigmem partition.

5.1.6 Exercise 6 Interact With Your Submitted Jobs

The status of submitted jobs can be viewed and altered by several means. The primary command squeue is part of a versatile system of job monitoring.
Example:

    squeue
      JOBID PARTITION     NAME     USER  ST     TIME  NODES NODELIST(REASON)
    3518933   depablo polyA6.0   ccchiu  PD     0:00      1 (QOSResourceLimit)
    3519981   depablo    R_AFM mcgovern  PD     0:00      1 (Resources)
    3519987   depablo    R_AFM mcgovern  PD     0:00      1 (Priority)
    3519988   depablo    R_AFM mcgovern  PD     0:00      1 (Priority)
    3539126       gpu _interac  jcarmas   R    45:52      1 midway231
    3538957       gpu test.6.3 jwhitmer   R    58:52      1 midway230
    3535743      kicp Alleturb   agertz  PD     0:00      6 (QOSResourceLimit)
    3535023      kicp hf.b64.L   mvlima   R  5:11:46      1 midway217
    3525370  westmere phase_di   khaira   R  4:50:02      1 midway008
    3525315  westmere phase_di   khaira   R  4:50:03      1 midway004
    3525316  westmere phase_di   khaira   R  4:50:03      1 midway004

The above tells us:

    Name                Description
    JOBID               Job ID #, unique reference number for each job
    PARTITION           Partition job will run on
    NAME                Name for the job, defaults to slurm-JobID
    USER                User who submitted job
    ST                  State of the job
    TIME                Time used by the job in D-HH:MM:SS
    NODES               Number of Nodes consumed
    NODELIST(REASON)    List of Nodes consumed, or reason the job has not started running

squeue’s output is verbose, but also customizable.

Example:

    squeue --user CNet -i 5

The above will only show information for user CNet and will refresh every 5 seconds.

6.1 Canceling Jobs

Cancel a job:

    scancel <JobID>

or cancel all of your jobs at the same time:

    scancel --user <CNetID>

6.2 More Job Information

More information about a submitted job can be obtained through the scontrol command by specifying a JobID to query:

    scontrol show job <JobID>

Example:

    scontrol show job 3560876
    JobId=3560876 Name=sleep
       UserId=dylanphall(1832378456) GroupId=dylanphall(1832378456)
       Priority=17193 Account=rcc-staff QOS=normal
       JobState=CANCELLED Reason=None Dependency=(null)
       Requeue=1 Restarts=0 BatchFlag=0 ExitCode=0:0
       RunTime=00:00:10 TimeLimit=1-12:00:00 TimeMin=N/A
       SubmitTime=2013-01-09T11:39:40 EligibleTime=2013-01-09T11:39:40
       StartTime=2013-01-09T11:39:40 EndTime=2013-01-09T11:39:50
       PreemptTime=None SuspendTime=None SecsPreSuspend=0
       Partition=sandyb AllocNode:Sid=midway-login2:24907
       ReqNodeList=(null) ExcNodeList=(null)
       NodeList=midway113
       BatchHost=midway113
       NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
       MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
       Features=(null) Gres=(null) Reservation=(null)
       Shared=OK Contiguous=0 Licenses=(null) Network=(null)
       Command=/bin/sleep
       WorkDir=/scratch/midway/dylanphall/repositories/pubsw/rccdocs.sphinx

5.1.7 Exercise 7 Explore the Module System

The module system is a script based system to manage the user environment.

Why isn’t everything installed and available by default? The need for

• multiple versions of software (version number, addons, custom software) and
• multiple build configurations (compiler choice, options, and MPI library)

would lead to hopelessly polluted namespace and PATH problems. Additionally, most of the applications used on HPC machines are research codes in a constant state of development.
Testing, stability, compatibility, and other usability concerns are often not a primary consideration of the authors.

Try running the commands below and review the output to learn more about the module system.

Basic module commands:

    Command                 Description
    module avail [name]     lists modules matching [name] (all if ‘name’ empty)
    module load [name]      loads the named module
    module unload [name]    unloads the named module
    module list             lists the modules currently loaded for the user

7.1 Example - Find and load a particular Mathematica version

Obtain a list of the currently loaded modules:

    $ module list
    Currently Loaded Modulefiles:
      1) slurm/2.4   3) subversion/1.6   5) env/rcc   7) tree/1.6.0
      2) vim/7.3     4) emacs/23.4       6) git/1.7

Obtain a list of all available modules:

    $ module avail
    -------------------------------- /software/modulefiles ---------------------------------
    Minuit2/5.28(default)        intelmpi/4.0
    Minuit2/5.28+intel-12.1      intelmpi/4.0+intel-12.1(default)
    R/2.15(default)              jasper/1.900(default)
    ...
    ifrit/3.4(default)           x264/stable(default)
    intel/11.1                   yasm/1.2(default)
    intel/12.1(default)
    ------------------------- /usr/share/Modules/modulefiles -------------------------------
    dot  module-cvs  module-info  modules  null  use.own
    ----------------------------------- /etc/modulefiles -----------------------------------
    env/rcc  samba/3.6  slurm/2.3  slurm/2.4(default)
    --------------------------------------- Aliases ----------------------------------------

Obtain a list of available versions of a particular piece of software:

    $ module avail mathematica
    ----------------------------------------------------------------------------------------
    mathematica/8.0(default)     mathematica/9.0
    ----------------------------------------------------------------------------------------

Load the default mathematica version:

    $ module load mathematica
    $ mathematica --version
    8.0

List the currently loaded modules:

    $ module list
    Currently Loaded Modulefiles:
      1) slurm/2.4   3) subversion/1.6   5) env/rcc   7) tree/1.6.0
      2) vim/7.3     4) emacs/23.4       6) git/1.7   8) mathematica/8.0

Unload the mathematica/8.0 module and load mathematica version 9.0:

    $ module unload mathematica/8.0
    $ module load mathematica/9.0
    $ mathematica --version
    9.0

List the currently loaded modules:

    $ module list
    Currently Loaded Modulefiles:
      1) slurm/2.4   3) subversion/1.6   5) env/rcc   7) tree/1.6.0
      2) vim/7.3     4) emacs/23.4       6) git/1.7   8) mathematica/9.0

5.1.8 Exercise 8 Learn About Your Storage

There are three different types of storage:

1. Home - for personal configurations, private data, limited space
   /home/[user name]
2. Scratch - fast, for daily use
   /scratch/midway/[user name]
3. Project - for common project files, project data, sharing
   /project/[PI name]

To find the limits enter:

    quota
    Disk quotas for user dylanphall:
    Filesystem         type       used      quota      limit      files    grace
    -----------------  ---------  --------- ---------  ---------- --------
    home               USR        2.90 G    10.00 G    12.00 G    39987    none
    midway-scratch     USR        24.26 G   5.00 T     6.00 T     106292   none
    project-abcdefg    FILESET    24.33 G   500.00 G   550.00 G   5488     none

Descriptions of the fields:

Filesystem
    This is the file system or file set where this quota is valid.
type
    This is the type of quota. This can be USR for a user quota, GRP for a group quota, or FILESET for a file set quota. File set quotas can be considered a directory quota. USR and GRP quotas can exist within a FILESET quota to further limit a user or group quota inside a file set.
used
    This is the amount of disk space used for the specific quota.
quota
    This is the quota or soft quota. It is possible for usage to exceed the quota for the grace period or up to the hard limit.
limit
    This is the limit or hard quota that is set. It is not possible for usage to exceed this limit.
files
    This is the number of files currently counted in the quota.
grace
    This is the grace period which is the amount of time remaining that the quota can be exceeded. The value none means that the quota is not exceeded. After the quota has been exceeded for longer than the grace period, it will no longer be possible to create new files.

Default Storage:

    Partition    Available
    Home         25 G
    Scratch      5 T
    Project      500 G

8.1 Explore File Backups & Restoration

Snapshots

Automated snapshots of the Home and Project areas are available in case of accidental file deletion or other problems. Currently snapshots are available for these time periods:

• 4 hourly snapshots
• 7 daily snapshots
• 4 weekly snapshots

You can find snapshots in these directories:

• /snapshots/home/ – Home snapshots
• /snapshots/project – Project snapshots

The subdirectories within these locations specify the time of the backup. For example, /snapshots/project/daily-2013-10-08.05h15/project/rcc contains an exact snapshot of the contents of /project/rcc as it appeared on October 8, 2013 at 5:15 am.

Try recovering a file from a snapshot of your home directory. If you are a brand new RCC user, or using a guest account, you may not have any snapshots of your home as it has not existed for long enough.

Backup

Backups are performed on a nightly basis to a tape machine located in a different data center than the main storage system. These backups are meant to safeguard against disasters such as hardware failure or events that could cause the loss of the main data center. Users should make use of the snapshots described above to recover files.

5.1.9 Exercise 9 Run a community code

Computing centers usually maintain the most popular community codes in a central repository. The physical and biological sciences in particular have a wealth of codes for simulating the natural world at a variety of scales.
NAMD is an example of a code that can be run on hundreds of thousands of processors in order to study atomistic protein dynamics. Enter the command rcchelp namd and follow the instructions to download sample files and submission scripts. Run the parallel NAMD job.

5.1.10 Exercise 10 Putting it all together

Develop a simple application, compile it, and submit it to the scheduler to run. If you are not sure what to write and compile, try this simple C application:

    #include <stdio.h>

    int main( void )
    {
        printf("Hello, World!\n");
        return 0;
    }

Try compiling with both the GCC and Intel compilers. You will need to use the commands module load gcc and module load intel.

5.1.11 Exercise 11 Connect to the cluster with Samba

Samba allows users to connect to (or “mount”) their home and project directories on their local computer so that the file system on Midway appears as if it were directly connected to the local machine. This method of accessing your RCC home and project space is only available from within the UChicago campus network. From off-campus you will need to connect through the UChicago virtual private network.

Your Samba account credentials are your CNetID and password:

    Username: ADLOCAL\<CNetID>
    Password: CNet password
    Hostname: midwaysmb.rcc.uchicago.edu

Note: Make sure to prefix your username with ADLOCAL\

On a Windows computer, use the “Map Network Drive” functionality and the following UNC paths:

    Home: \\midwaysmb.rcc.uchicago.edu\homes
    Project: \\midwaysmb.rcc.uchicago.edu\project

On Mac OS X, use these URLs to connect:

    Home: smb://midwaysmb.rcc.uchicago.edu/homes
    Project: smb://midwaysmb.rcc.uchicago.edu/project

To connect on a Mac OS X computer you can either click the above URLs or follow these steps:

• Use the Connect to Server utility in Finder
• Enter one of the URLs from above in the input box for Server Address.
• When prompted for a username and password, select Registered User.
• Enter ADLOCAL\user for the username and enter your CNet password.

5.1.12 Exercise 12 Head to the User’s Guide

You are ready to work from the User’s Guide. Head to http://docs.rcc.uchicago.edu to get started. You may want to try:

1. Requesting a user or PI account
2. Connecting with NX, to get a graphical Linux desktop connected to the cluster
3. Using Sviz, for demanding graphical applications that run on the cluster
4. Transferring data to the cluster
5. Running jobs

5.1.13 Exercise 13 Transfer files with Globus Online

• Go to https://globus.rcc.uchicago.edu and follow the instructions to sign in to your Globus Online account (http://docs.rcc.uchicago.edu/user-guide.html#user-guide-globus-online).
• Once you have signed in, enter ucrcc#midway as the Endpoint then click the Go button.
• Click the Get Globus Connect link on the transfer page and follow the installation instructions. The Endpoint Name you create in Step Two will refer to your local computer.
• Open Globus Connect on your local computer.
• Return to the transfer page, enter your local computer’s endpoint name in the second Endpoint field, then click the Go button.
• Find a file within your local endpoint window, select it, then click the opposing arrow to transfer it to your Midway home directory. Click refresh list to view the transferred file.

5.2 Introduction to RCC for UChicago Courses

As part of a course, you have been given access to the University of Chicago Research Computing Center (RCC) Midway compute cluster. Below are the basics you will need to know in order to connect to and use the cluster.
5.2.1 Where to go for help

For technical questions (help logging in, etc.) send a help request to help@rcc.uchicago.edu

Technical documentation is available at http://docs.rcc.uchicago.edu

RCC’s walk-in lab is open during business hours and is located in Regenstein Library room 216. Feel free to drop by to chat with one of our staff members if you get stuck.

5.2.2 Logging into Midway

Access to RCC is provided via secure shell (SSH) login. All users must have a UChicago CNetID to log in to any RCC systems. Your RCC account credentials are your CNetID and password:

    Username   CNetID
    Password   CNetID password
    Hostname   midway.rcc.uchicago.edu

Note: RCC does not store your CNet password and we are unable to reset your password. If you require password assistance, please contact UChicago IT Services.

Most UNIX-like operating systems (Mac OS X, Linux, etc.) provide an SSH utility by default that can be accessed by typing the command ssh in a terminal. To log in to Midway from a Linux/Mac computer, open a terminal and at the command line enter:

    ssh <username>@midway.rcc.uchicago.edu

Windows users will first need to download an SSH client, such as PuTTY, which will allow you to interact with the remote Unix server. Use the hostname midway.rcc.uchicago.edu and your CNetID username and password to access Midway through PuTTY.

5.2.3 Accessing Software on Midway

When you first log into Midway, you will be entered into a very barebones user environment with minimal software available.

Why isn’t everything installed and available by default? The need for

• multiple versions of software (version number, add-ons, custom software) and
• multiple build configurations (compiler choice, options, and MPI library)

would lead to hopelessly polluted namespace and PATH problems. Additionally, most of the applications used on HPC machines are research codes in a constant state of development.
Testing, stability, compatibility, and other usability concerns are often not a primary consideration of the authors.

The module system is a script based system used to manage the user environment and to “activate” software packages. In order to access software that is installed on Midway, you must first load the corresponding software module.

Basic module commands:

    Command                 Description
    module avail            lists all available software modules
    module avail [name]     lists modules matching [name]
    module load [name]      loads the named module
    module unload [name]    unloads the named module
    module list             lists the modules currently loaded for the user

Examples

Obtain a list of the currently loaded modules:

    $ module list
    Currently Loaded Modulefiles:
      1) slurm/2.4   3) subversion/1.6   5) env/rcc   7) tree/1.6.0
      2) vim/7.3     4) emacs/23.4       6) git/1.7

Obtain a list of all available modules:

    $ module avail
    ------------------------ /software/modulefiles ------------------------
    Minuit2/5.28(default)            intelmpi/4.0
    Minuit2/5.28+intel-12.1          intelmpi/4.0+intel-12.1(default)
    R/2.15(default)                  jasper/1.900(default)
    ...
    ifrit/3.4(default)               x264/stable(default)
    intel/11.1                       yasm/1.2(default)
    intel/12.1(default)
    ------------------- /usr/share/Modules/modulefiles --------------------
    dot module-cvs module-info modules null use.own
    -------------------------- /etc/modulefiles ---------------------------
    env/rcc samba/3.6 slurm/2.3 slurm/2.4(default)
    ------------------------------- Aliases -------------------------------
See also: Software Module List

Obtain a list of available versions of a particular piece of software:

    $ module avail python
    ------------------------ /software/modulefiles ------------------------
    python/2.7(default) python/2.7-2013q4 python/2.7-2014q1 python/3.3
    -------------------------- /etc/modulefiles ---------------------------

Load the default python version:

    $ module load python
    $ python --version
    Python 2.7.3

List the currently loaded modules:

    $ module list
    Currently Loaded Modulefiles:
      1) vim/7.4          4) env/rcc     7) netcdf/4.2      10) python/2.7
      2) subversion/1.8   5) mkl/10.3    8) texlive/2012    11) slurm/current
      3) emacs/24         6) hdf5/1.8    9) graphviz/2.28

Unload the python/2.7 module and load python/2.7-2014q1:

    $ module unload python/2.7
    $ module load python/2.7-2014q1
    $ python --version
    Python 2.7.6

5.2.4 The Midway Cluster Environment

Midway is a Linux cluster with approximately 10,000 CPU cores and 1.5PB of storage. Midway is a shared resource used by the entire University community. Sharing computational resources creates unique challenges:

• Jobs must be scheduled in a fair manner.
• Resource consumption needs to be accounted.
• Access needs to be controlled.

Thus, a scheduler is used to manage job submissions to the cluster. RCC uses the Slurm resource manager to schedule jobs and provide interactive access to compute nodes.

When you first log into Midway you will be connected to a login node (midway-login1 or midway-login2). Login nodes are not intended to be used for computationally intensive work. Instead, login nodes should be used for managing files, submitting jobs, etc. If you are going to be running a computationally intensive program, you must do this work on a compute node by either obtaining an interactive session or submitting a job through the scheduler.
However, you are free to run very short, non-computationally intensive jobs on the login nodes, as is often necessary when you are working on and debugging your code. If you are unsure whether your job will be computationally intensive (large memory or CPU usage, long running time, etc.), get a session on a compute node and work there.

There are two ways to send your work to a Midway compute node:

1. sinteractive - Request access to a compute node and log into it
2. sbatch - Write a script which defines commands that need to be executed and let SLURM run them on your behalf

Working interactively on a compute node

To request an interactive session on a compute node use the sinteractive command:

    sinteractive

When this command is executed, you will be connected to one of Midway’s compute nodes where you can then go about running your programs. By default, sinteractive provides access for 2 hours to a compute node with 1 CPU and 2GB of memory. The sinteractive command provides many more options for configuring your session. For example, if you want to get access to a compute node with 1 CPU and 4GB of memory for 5 hours, use the command:

    sinteractive --cpus-per-task=1 --mem-per-cpu=4096 --time=05:00:00

It may take up to 60 seconds for your interactive session to be initialized (assuming there is an available compute node that meets your specified requirements).

Submitting a job to the scheduler

An alternative to working interactively with a compute node is to submit the work you want carried out to the scheduler through an sbatch script.
An example sbatch script is shown below:

    #!/bin/bash
    #SBATCH --job-name=example
    #SBATCH --output=example-%j.out
    #SBATCH --error=example-%j.err
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=1
    #SBATCH --mem-per-cpu=4096

    # load your modules here
    module load python/2.7-2014q1

    # execute your tasks here
    python myScript.py

SBATCH scripts contain two major elements. After the #!/bin/bash line, a series of #SBATCH parameters are defined. These are read by the scheduler, SLURM, and relay information about what specific hardware is required to execute the job, how long that hardware is required, and where the output and error (stdout and stderr streams) should be written to. If resources are available the job may start less than one second following submission. When the queue is busy and the resource request is substantial the job may be placed in line with other jobs awaiting execution.

The %j wildcard included in the output and error file names will cause Slurm to substitute the job's unique ID number into each file name. This will prevent your output and error files from being overwritten if this script is run multiple times in the same directory.

The second major element of an sbatch script is the user defined commands. When the resource request is granted the script is executed just as if it were run interactively (i.e. if you had typed in the commands one after the next at the command line).

Sbatch scripts execute in the directory from which they were submitted. In the above example, we are assuming that this script is located in the same directory where myScript.py is located.

See also: User Guide: Running Jobs

Interact With Your Submitted Jobs

The status of submitted jobs is viewable and alterable by several means. The primary command squeue is part of a versatile system of job monitoring.

Example:

    squeue
    JOBID    PARTITION  NAME      USER      ST  TIME     NODES  NODELIST(REASON)
    3518933  sandyb     polyA6.0  ccchiu    PD  0:00     1      (QOSResourceLimit)
    3519981  sandyb     R_AFM     mcgovern  PD  0:00     1      (Resources)
    3519987  sandyb     R_AFM     mcgovern  PD  0:00     1      (Priority)
    3519988  sandyb     R_AFM     mcgovern  PD  0:00     1      (Priority)
    ...
    3539126  gpu        _interac  jcarmas   R   45:52    1      midway231
    3538957  gpu        test.6.3  jwhitmer  R   58:52    1      midway230
    3525370  westmere   phase_di  khaira    R   4:50:02  1      midway008
    3525315  westmere   phase_di  khaira    R   4:50:03  1      midway004
    3525316  westmere   phase_di  khaira    R   4:50:03  1      midway004

The above tells us:

    Name              Description
    JOBID             Job ID #, unique reference number for each job
    PARTITION         Type of node job is running/will run on
    NAME              Name for the job, defaults to slurm-JobID
    USER              User who submitted job
    ST                State of the job
    TIME              Time used by the job in D-HH:MM:SS
    NODES             Number of Nodes consumed
    NODELIST(REASON)  List of Nodes consumed, or reason the job has not started running

As there are usually a very large number of jobs in the queue, the output of squeue must often be filtered to show you only specific jobs that are of interest to you. To view only the jobs that you have submitted use the command:

    squeue -u <yourCNetID>

To cancel a job that you have submitted, first obtain the job’s JobID number by using the squeue command. Then issue the command:

    scancel <JobID>

or cancel ALL of your jobs at the same time (be sure you really want to do this!) with the command:

    scancel -u <yourCNetID>

5.2.5 Accessing and Transferring Files

RCC provides a number of methods for transferring data in/out of Midway. For relatively small amounts of data, we recommend the scp command. For non-trivial file transfers, we recommend using Globus Online for fast, secure and reliable transfers. When working on the UChicago network it is also possible to mount the Midway file systems using Samba.

Command Line - SCP

Most UNIX-like operating systems (Mac OS X, Linux, etc.) provide a scp command which can be accessed from the command line.
To transfer files from your local computer to your home directory on Midway, open a terminal window and issue the command:

    Single files: $ scp file1 ... <CNetID>@midway.rcc.uchicago.edu:
    Directories:  $ scp -r dir1 ... <CNetID>@midway.rcc.uchicago.edu:

When prompted, enter your CNet password.

Windows users will need to download an SCP client such as WinSCP that provides a graphical interface for transferring files via scp.

Windows GUI - WinSCP

WinSCP is an scp client that can be used to move files between Midway and a Windows machine. WinSCP can be obtained from http://www.winscp.net.

Use the hostname midway.rcc.uchicago.edu and your CNet credentials when connecting.

If prompted to accept the server’s host key, select “yes.”

The main WinSCP window allows you to move files from your local machine (left side) to Midway (right side).

Mac GUI - SFTP Clients

There are a number of graphical SFTP clients available for Mac. FileZilla, for example, is a freely available SFTP client (https://filezilla-project.org/).

Use the hostname midway.rcc.uchicago.edu and your CNet credentials when connecting.

Samba

Samba allows users to connect to (or “mount”) their home directory on their local computer so that the file system on Midway appears as if it were directly connected to the local machine.

This method of accessing your RCC home and project space is only available from within the UChicago campus network. From off-campus you will need to connect through the UChicago virtual private network.
Your Samba account credentials are your CNetID and password:

    Username: ADLOCAL\<CNetID>
    Password: CNet password
    Hostname: midwaysmb.rcc.uchicago.edu

Note: Make sure to prefix your username with ADLOCAL\

On a Windows computer, use the “Map Network Drive” functionality and the following UNC paths:

    Home: \\midwaysmb.rcc.uchicago.edu\homes
    Project: \\midwaysmb.rcc.uchicago.edu\project

On a Mac OS X computer, use these URLs to connect:

    Home: smb://midwaysmb.rcc.uchicago.edu/homes
    Project: smb://midwaysmb.rcc.uchicago.edu/project

To connect on a Mac OS X computer:

• Use the Connect to Server utility in Finder
• Enter one of the URLs from above in the input box for Server Address.
• When prompted for a username and password, select Registered User.
• Enter ADLOCAL\YourCNetID for the username and enter your CNet password.

5.3 Introduction to RCC for CHEM 268

As part of CHEM 268 you have been given access to the University of Chicago Research Computing Center (RCC) Midway compute cluster. Below are the basics you will need to know in order to connect to and use the cluster.

5.3.1 Where to go for help

For technical questions (help logging in, etc.) send a help request to help@rcc.uchicago.edu

Technical documentation is available at http://docs.rcc.uchicago.edu

RCC’s walk-in lab is open during business hours and is located in Regenstein Library room 216. Feel free to drop by to chat with one of our staff members if you get stuck.

5.3.2 Logging into Midway

Access to RCC is provided via secure shell (SSH) login. All users must have a UChicago CNetID to log in to any RCC systems. Your RCC account credentials are your CNetID and password:

    Username   CNetID
    Password   CNetID password
    Hostname   midway.rcc.uchicago.edu

Note: RCC does not store your CNet password and we are unable to reset your password. If you require password assistance, please contact UChicago IT Services.
Most UNIX-like operating systems (Mac OS X, Linux, etc.) provide an SSH utility by default that can be accessed by typing the command ssh in a terminal. To log in to Midway from a Linux/Mac computer, open a terminal and at the command line enter:

    ssh <username>@midway.rcc.uchicago.edu

Windows users will first need to download an SSH client, such as PuTTY, which will allow you to interact with the remote Unix server. Use the hostname midway.rcc.uchicago.edu and your CNetID username and password to access Midway through PuTTY.

5.3.3 Accessing Software on Midway

When you first log into Midway, you will be entered into a very barebones user environment with minimal software available.

Why isn’t everything installed and available by default? The need for

• multiple versions of software (version number, add-ons, custom software) and
• multiple build configurations (compiler choice, options, and MPI library)

would lead to hopelessly polluted namespace and PATH problems. Additionally, most of the applications used on HPC machines are research codes in a constant state of development. Testing, stability, compatibility, and other usability concerns are often not a primary consideration of the authors.

The module system is a script based system used to manage the user environment and to “activate” software packages. In order to access software that is installed on Midway, you must first load the corresponding software module.

Basic module commands:

    Command                 Description
    module avail            lists all available software modules
    module avail [name]     lists modules matching [name]
    module load [name]      loads the named module
    module unload [name]    unloads the named module
    module list             lists the modules currently loaded for the user
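To make the effect of loading a module more concrete, the sketch below imitates it by hand. A module typically works by adjusting environment variables such as PATH so that a package's executables become visible to the shell. The tool name mytool and the temporary directory are hypothetical stand-ins for a real software package, and the real module command is not called here.

```shell
#!/bin/sh
# Sketch: imitate what loading a module does to PATH.
# "mytool" and the temporary directory are hypothetical stand-ins.

pkg_dir=$(mktemp -d)                 # pretend install location
cat > "$pkg_dir/mytool" <<'EOF'
#!/bin/sh
echo "mytool 1.0"
EOF
chmod +x "$pkg_dir/mytool"

# Before "loading": the tool is not on PATH.
if command -v mytool >/dev/null 2>&1; then
    before="found"
else
    before="not found"
fi

# "Load": prepend the package directory to PATH and run the tool.
old_path=$PATH
PATH="$pkg_dir:$PATH"
after=$(mytool)

# "Unload": restore the previous PATH and clean up.
PATH=$old_path
rm -rf "$pkg_dir"

echo "before load: $before"
echo "after load:  $after"
```

Because only the current shell's environment is changed, the effect disappears when the session ends, which is why modules must be loaded again in each new login session or batch script.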
Examples

Obtain a list of the currently loaded modules:

    $ module list
    Currently Loaded Modulefiles:
      1) slurm/2.4   3) subversion/1.6   5) env/rcc   7) tree/1.6.0
      2) vim/7.3     4) emacs/23.4       6) git/1.7

Obtain a list of all available modules:

    $ module avail
    ------------------------ /software/modulefiles ------------------------
    Minuit2/5.28(default)            intelmpi/4.0
    Minuit2/5.28+intel-12.1          intelmpi/4.0+intel-12.1(default)
    R/2.15(default)                  jasper/1.900(default)
    ...
    ifrit/3.4(default)               x264/stable(default)
    intel/11.1                       yasm/1.2(default)
    intel/12.1(default)
    ------------------- /usr/share/Modules/modulefiles --------------------
    dot module-cvs module-info modules null use.own
    -------------------------- /etc/modulefiles ---------------------------
    env/rcc samba/3.6 slurm/2.3 slurm/2.4(default)
    ------------------------------- Aliases -------------------------------

Obtain a list of available versions of a particular piece of software:

    $ module avail python
    ------------------------ /software/modulefiles ------------------------
    python/2.7(default) python/2.7-2013q4 python/2.7-2014q1 python/3.3
    -------------------------- /etc/modulefiles ---------------------------

Load the default python version:

    $ module load python
    $ python --version
    Python 2.7.3

List the currently loaded modules:

    $ module list
    Currently Loaded Modulefiles:
      1) vim/7.4          4) env/rcc     7) netcdf/4.2      10) python/2.7
      2) subversion/1.8   5) mkl/10.3    8) texlive/2012    11) slurm/current
      3) emacs/24         6) hdf5/1.8    9) graphviz/2.28

Unload the python/2.7 module and load python/2.7-2014q1:

    $ module unload python/2.7
    $ module load python/2.7-2014q1
    $ python --version
    Python 2.7.6

5.3.4 The Midway Cluster Environment

Midway is a Linux cluster with approximately 10,000 CPU cores and 1.5PB of storage.
Midway is a shared resource used by the entire University community. Sharing computational resources creates unique challenges:

• Jobs must be scheduled in a fair manner.
• Resource consumption needs to be accounted.
• Access needs to be controlled.

Thus, a scheduler is used to manage job submissions to the cluster. RCC uses the Slurm resource manager to schedule jobs and provide interactive access to compute nodes.

When you first log into Midway you will be connected to a login node (midway-login1 or midway-login2). Login nodes are not intended to be used for computationally intensive work. Instead, login nodes should be used for managing files, submitting jobs, etc. If you are going to be running a computationally intensive program, you must do this work on a compute node by either obtaining an interactive session or submitting a job through the scheduler. However, you are free to run very short, non-computationally intensive jobs on the login nodes, as is often necessary when you are working on and debugging your code. If you are unsure whether your job will be computationally intensive (large memory or CPU usage, long running time, etc.), get a session on a compute node and work there.

There are two ways to send your work to a Midway compute node:

1. sinteractive - Request access to a compute node and log into it
2. sbatch - Write a script which defines commands that need to be executed and let SLURM run them on your behalf

5.3.5 Working interactively on a compute node

To request an interactive session on a compute node use the sinteractive command:

    sinteractive

When this command is executed, you will be connected to one of Midway’s compute nodes where you can then go about running your programs. By default, sinteractive provides access for 2 hours to a compute node with 1 CPU and 2GB of memory. The sinteractive command provides many more options for configuring your session.
For example, if you want to get access to a compute node with 1 CPU and 4GB of memory for 5 hours, use the command:

    sinteractive --cpus-per-task=1 --mem-per-cpu=4096 --time=05:00:00

It may take up to 60 seconds for your interactive session to be initialized (assuming there is an available compute node that meets your specified requirements).

5.3.6 Submitting a job to the scheduler

An alternative to working interactively with a compute node is to submit the work you want carried out to the scheduler through an sbatch script. An example sbatch script is shown below:

    #!/bin/bash
    #SBATCH --job-name=example
    #SBATCH --output=example-%j.out
    #SBATCH --error=example-%j.err
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=1
    #SBATCH --mem-per-cpu=4096

    # load your modules here
    module load python/2.7-2014q1

    # execute your tasks here
    python myScript.py

SBATCH scripts contain two major elements. After the #!/bin/bash line, a series of #SBATCH parameters are defined. These are read by the scheduler, SLURM, and relay information about what specific hardware is required to execute the job, how long that hardware is required, and where the output and error (stdout and stderr streams) should be written to. If resources are available the job may start less than one second following submission. When the queue is busy and the resource request is substantial the job may be placed in line with other jobs awaiting execution.

The %j wildcard included in the output and error file names will cause Slurm to substitute the job's unique ID number into each file name. This will prevent your output and error files from being overwritten if this script is run multiple times in the same directory.

The second major element of an sbatch script is the user defined commands. When the resource request is granted the script is executed just as if it were run interactively (i.e. if you had typed in the commands one after the next at the command line).
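As a small illustration of the points above, the sketch below writes a copy of the example script to a file, submits it only if the sbatch command is present, and shows how the %j placeholder turns into a file name. The file name example.sbatch and the job ID 3518933 are illustrative assumptions, not values the scheduler will actually assign to your job.

```shell
#!/bin/sh
# Sketch: write the example job script to a file and submit it.
# "example.sbatch" is a hypothetical file name.

cat > example.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=example-%j.out
#SBATCH --error=example-%j.err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4096

# load your modules here
module load python/2.7-2014q1

# execute your tasks here
python myScript.py
EOF

# Submit only where Slurm is installed (e.g. on a Midway login node).
if command -v sbatch >/dev/null 2>&1; then
    sbatch example.sbatch
else
    echo "sbatch not found; run this on a cluster login node"
fi

# Illustrate the %j substitution using a made-up job ID.
job_id=3518933
out_file=$(echo "example-%j.out" | sed "s/%j/$job_id/")
echo "output would be written to: $out_file"
```

The substitution at the end is done by Slurm itself for a real job; it is reproduced with sed here only to show the resulting file name.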
Sbatch scripts execute in the directory from which they were submitted. In the above example, we are assuming that this script is located in the same directory where myScript.py is located.

5.3.7 Interact With Your Submitted Jobs

The status of submitted jobs is viewable and alterable by several means. The primary command squeue is part of a versatile system of job monitoring.

Example:

    squeue
    JOBID    PARTITION  NAME      USER      ST  TIME     NODES  NODELIST(REASON)
    3518933  sandyb     polyA6.0  ccchiu    PD  0:00     1      (QOSResourceLimit)
    3519981  sandyb     R_AFM     mcgovern  PD  0:00     1      (Resources)
    3519987  sandyb     R_AFM     mcgovern  PD  0:00     1      (Priority)
    3519988  sandyb     R_AFM     mcgovern  PD  0:00     1      (Priority)
    ...
    3539126  gpu        _interac  jcarmas   R   45:52    1      midway231
    3538957  gpu        test.6.3  jwhitmer  R   58:52    1      midway230
    3525370  westmere   phase_di  khaira    R   4:50:02  1      midway008
    3525315  westmere   phase_di  khaira    R   4:50:03  1      midway004
    3525316  westmere   phase_di  khaira    R   4:50:03  1      midway004

The above tells us:

    Name              Description
    JOBID             Job ID #, unique reference number for each job
    PARTITION         Type of node job is running/will run on
    NAME              Name for the job, defaults to slurm-JobID
    USER              User who submitted job
    ST                State of the job
    TIME              Time used by the job in D-HH:MM:SS
    NODES             Number of Nodes consumed
    NODELIST(REASON)  List of Nodes consumed, or reason the job has not started running

As there are usually a very large number of jobs in the queue, the output of squeue must often be filtered to show you only specific jobs that are of interest to you. To view only the jobs that you have submitted use the command:

    squeue -u <yourCNetID>

To cancel a job that you have submitted, first obtain the job’s JobID number by using the squeue command. Then issue the command:

    scancel <JobID>

or cancel ALL of your jobs at the same time (be sure you really want to do this!)
with the command:

    scancel -u <yourCNetID>

5.3.8 Accessing and Transferring Files

RCC provides a number of methods for transferring data in/out of Midway. For relatively small amounts of data, we recommend the scp command. For non-trivial file transfers, we recommend using Globus Online for fast, secure and reliable transfers. When working on the UChicago network it is also possible to mount the Midway file systems using Samba.

Command Line - SCP

Most UNIX-like operating systems (Mac OS X, Linux, etc.) provide a scp command which can be accessed from the command line. To transfer files from your local computer to your home directory on Midway, open a terminal window and issue the command:

    Single files: $ scp file1 ... <CNetID>@midway.rcc.uchicago.edu:
    Directories:  $ scp -r dir1 ... <CNetID>@midway.rcc.uchicago.edu:

When prompted, enter your CNet password.

Windows users will need to download an SCP client such as WinSCP that provides a graphical interface for transferring files via scp.

Windows GUI - WinSCP

WinSCP is an scp client that can be used to move files between Midway and a Windows machine. WinSCP can be obtained from http://www.winscp.net.

Use the hostname midway.rcc.uchicago.edu and your CNet credentials when connecting.

If prompted to accept the server’s host key, select “yes.”

The main WinSCP window allows you to move files from your local machine (left side) to Midway (right side).

Mac GUI - SFTP Clients

There are a number of graphical SFTP clients available for Mac. FileZilla, for example, is a freely available SFTP client (https://filezilla-project.org/).

Use the hostname midway.rcc.uchicago.edu and your CNet credentials when connecting.

Samba

Samba allows users to connect to (or “mount”) their home directory on their local computer so that the file system on Midway appears as if it were directly connected to the local machine.
This method of accessing your RCC home and project space is only available from within the UChicago campus network. From off-campus you will need to connect through the UChicago virtual private network.

Your Samba account credentials are your CNetID and password:

    Username: ADLOCAL\<CNetID>
    Password: CNet password
    Hostname: midwaysmb.rcc.uchicago.edu

Note: Make sure to prefix your username with ADLOCAL\

On a Windows computer, use the “Map Network Drive” functionality and the following UNC paths:

    Home: \\midwaysmb.rcc.uchicago.edu\homes
    Project: \\midwaysmb.rcc.uchicago.edu\project

On a Mac OS X computer, use these URLs to connect:

    Home: smb://midwaysmb.rcc.uchicago.edu/homes
    Project: smb://midwaysmb.rcc.uchicago.edu/project

To connect on a Mac OS X computer:

• Use the Connect to Server utility in Finder
• Enter one of the URLs from above in the input box for Server Address.
• When prompted for a username and password, select Registered User.
• Enter ADLOCAL\YourCNetID for the username and enter your CNet password.

5.3.9 Gaussian

For the duration of this class, you will have access to a molecular modelling software package called Gaussian. You can load Gaussian with “module load gaussian”.

While Gaussian is running, it creates scratch files for intermediate work. These scratch files can get extremely large, to the point that they go over the disk quota for individual users. Midway has scratch space intended precisely for this purpose, to hold intermediate data generated by a computation. Scratch space is high performance, not backed up, and has a 5 TB quota. Scratch space is available at $HOME/scratch-midway.

To use a scratch folder with Gaussian, first create a folder in scratch space.
For example:

    mkdir -p ~/scratch-midway/gaussian-work

Then, add the following line to your sbatch submission scripts:

    export GAUSS_SCRDIR=$HOME/scratch-midway/gaussian-work

This sets and exports the environment variable GAUSS_SCRDIR, which Gaussian reads in order to decide where to put its scratch files.

5.4 Introduction to RCC for CSPP 51087

The following exercises will walk you through getting access and running simple jobs on Midway. Additional information about Midway and its environment can be found at this website. Contact us at help@rcc.uchicago.edu if you have any problems.

You should contact the TAs or the professor for course-related information on CSPP 51087.

5.4.1 Exercise 1: Log in to Midway

Access to RCC is provided via secure shell (SSH) login, a tool that allows you to connect securely from any computer (including most smartphones and tablets).

All users must have a UChicago CNetID to log in to any RCC systems. Your RCC account credentials are your CNetID and password:

    Username   CNetID
    Password   CNet password
    Hostname   midway.rcc.uchicago.edu

Note: RCC does not store your CNet password and we are unable to reset your password. If you require password assistance, please contact UChicago IT Services.

Most UNIX-like operating systems (Mac OS X, Linux, etc.) provide an SSH utility by default that can be accessed by typing the command ssh in a terminal. To log in to Midway from a Linux/Mac computer, open a terminal and at the command line enter:

    ssh <username>@midway.rcc.uchicago.edu

Windows users will first need to download an SSH client, such as PuTTY, which will allow you to interact with the remote Unix server. Use the hostname midway.rcc.uchicago.edu and your CNetID username and password to access the RCC login node.
5.4.2 Exercise 2: Explore the Module System

The module system is a script based system to manage the user environment.

Try running the commands below and review the output to learn more about the module system.

Basic module commands:

    Command                 Description
    module avail [name]     lists modules matching [name] (all if ‘name’ empty)
    module load [name]      loads the named module
    module unload [name]    unloads the named module
    module list             lists the modules currently loaded for the user

Example - Matlab:

    $ module list
    Currently Loaded Modulefiles:
      1) slurm/2.4   3) subversion/1.6   5) env/rcc   7) tree/1.6.0
      2) vim/7.3     4) emacs/23.4       6) git/1.7

    $ module avail
    ------------------------ /software/modulefiles ------------------------
    Minuit2/5.28(default)            intelmpi/4.0
    Minuit2/5.28+intel-12.1          intelmpi/4.0+intel-12.1(default)
    R/2.15(default)                  jasper/1.900(default)
    ...
    ifrit/3.4(default)               x264/stable(default)
    intel/11.1                       yasm/1.2(default)
    intel/12.1(default)
    ------------------- /usr/share/Modules/modulefiles --------------------
    dot module-cvs module-info modules null use.own
    -------------------------- /etc/modulefiles ---------------------------
    env/rcc samba/3.6 slurm/2.3 slurm/2.4(default)
    ------------------------------- Aliases -------------------------------

    $ module avail matlab
    -----------------------------------------------------------------------
    /software/modulefiles
    -----------------------------------------------------------------------
    matlab/2011b matlab/2012a(default) matlab/2012b
    -----------------------------------------------------------------------

    $ module avail
    $ which matlab
    <not found>

    $ module load matlab
    $ which matlab
    /software/matlab-2012a-x86_64/bin/matlab

    $ module list
    Currently Loaded Modulefiles:
      1) slurm/2.4   3) subversion/1.6   5) env/rcc   7) tree/1.6.0
      2) vim/7.3     4) emacs/23.4       6) git/1.7   8) matlab/2012a

    $ module unload matlab
    $ which matlab
    <not found>

    $ module load matlab/2011b
    $ which matlab
    /software/matlab-2011b-x86_64/bin/matlab

    $ module list
    Currently Loaded Modulefiles:
      1) slurm/2.4   3) subversion/1.6   5) env/rcc   7) tree/1.6.0
      2) vim/7.3     4) emacs/23.4       6) git/1.7   8) matlab/2011b

5.4.3 Exercise 3: Set up an MPI Environment on Midway

The RCC provides several compiler suites on Midway, as well as several MPI environments. For most users these should be completely interchangeable; however, some codes perform differently or experience problems with certain combinations.

    Compiler   Module(s)
    Intel      intel/11.1 intel/12.1(default) intel/13.0
    Portland   pgi/2012(default)
    GNU        No module necessary

A complete MPI environment is composed of an MPI installation and a compiler. Each module has the naming convention [mpi]/[mpi version]+[compiler]-[compiler version]. Not all combinations of compiler and MPI environment are supported, but most are. Once you load a module, the standard MPI commands such as mpicc and mpirun will function normally.

    MPI Environment   URL                                                    Modules
    OpenMPI           http://www.openmpi.org                                 openmpi/1.6(default)
                                                                             openmpi/1.6+intel-12.1
                                                                             openmpi/1.6+pgi-2012
    Intel MPI         http://software.intel.com/en-us/intel-mpi-library      intelmpi/4.0
                                                                             intelmpi/4.0+intel-12.1(default)
                                                                             intelmpi/4.1
                                                                             intelmpi/4.1+intel-12.1
                                                                             intelmpi/4.1+intel-13.0
    Mvapich2          http://mvapich.cse.ohio-state.edu/overview/mvapich2/   mvapich2/1.8(default)
                                                                             mvapich2/1.8+intel-12.1
                                                                             mvapich2/1.8+pgi-2012

Note: A code compiled with one MPI module will generally not run properly with another. If you try several MPI modules, be very careful to recompile your code. Each compiler uses different flags and default options. Use mpicc -show to see the compiler and default command line flags that MPI is passing to the compiler.

5.4.4 Exercise 4: Run a job on Midway

The Slurm scheduler is used to schedule jobs and manage resources. Jobs are either interactive, in which the user logs directly into a compute node and performs tasks directly, or batch, in which a job script is executed by the scheduler on behalf of the user. Interactive jobs are useful during development and debugging, but users will need to wait for nodes to become available.

Interactive Use

To request one processor to use interactively, use the sinteractive command with no further options:

    sinteractive

The sinteractive command provides many options for reserving processors. For example, two cores, instead of the default of one, could be reserved for four hours in the following manner:

    sinteractive --ntasks-per-node=2 --time=4:00:00

The option --constraint=ib can be used to ensure that Infiniband-connected nodes are reserved. Infiniband is a fast networking option that permits up to 40x the bandwidth of gigabit ethernet on Midway. Multi-node jobs that use MPI must request Infiniband.

    sinteractive --constraint=ib

Batch jobs

An example sbatch script:

    #!/bin/bash
    #SBATCH --job-name=test
    #SBATCH --output=test.out
    #SBATCH --error=test.err
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=16
    #SBATCH --partition=sandyb
    #SBATCH --constraint=ib
    #SBATCH --account=CSPP51087

    # load your modules here
    module load intel

    # execute your tasks here
    mpirun hello_world

When a scheduled job runs, SLURM sets many environment variables that may be helpful to query from within your job. You can explore the environment variables by adding

    env | grep SLURM

to the sbatch script above. The output will be found in the file defined in your script header.

Different types of hardware are usually organized by partition. Rules about job limits, such as maximum wallclock time and maximum numbers of CPUs that may be requested, are governed by a QOS (Quality of Service). You can target appropriate compute nodes for your job by specifying a partition and a qos in your batch script. At the command prompt, you can run sinfo to get some information about the available partitions, and rcchelp qos to learn more about the qos.

    #SBATCH --exclusive       Exclusive access to all nodes requested, no other jobs may run here
    #SBATCH --partition       Specifies the partition (ex: 'sandyb', 'westmere')
    #SBATCH --constraint=ib   For the sandyb partition, request nodes with infiniband
    #SBATCH --qos             Quality of Service (ex: 'normal', 'debug')
    #SBATCH --account         Sets the account to be charged for use (the course name CSPP51087)

GPU

Midway has 4 GPU nodes, each with two GPUs. To request a job to run on these nodes use the slurm option:

    --gres=gpu:               Specifies number of GPUs to use

Please contact RCC staff prior to using GPU nodes. We will gladly assist you in making sure your sbatch script is written to properly use the nodes.
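Returning to the environment variables that SLURM sets for a running job (the env | grep SLURM suggestion earlier in this exercise), the same query can be made from inside a Python program. The following is a minimal sketch, not part of the RCC tooling: the function name slurm_variables is hypothetical, and it matches only variable names that begin with SLURM, which is slightly narrower than grep.

```python
import os


def slurm_variables(environ=None):
    """Return (name, value) pairs for variables whose name starts with SLURM.

    environ defaults to the current process environment; passing a plain
    dict makes the function easy to try outside of a job.
    """
    if environ is None:
        environ = os.environ
    return sorted(
        (name, value) for name, value in environ.items()
        if name.startswith("SLURM")
    )


if __name__ == "__main__":
    # Inside a job this prints one NAME=value line per SLURM variable;
    # on a login node it will usually print nothing.
    for name, value in slurm_variables():
        print("%s=%s" % (name, value))
```

Calling this from a program launched by the sbatch script writes the values to the file named by --output, just as env | grep SLURM would.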

5.4.5 Exercise 5: Use rcchelp to download, compile, and submit a parallel job

rcchelp is a custom command-line program that provides online help and code examples. Help on software topics can be accessed with the rccsoftware shortcut. Run this command to see available topics:

    rccsoftware

The output should look similar to this:

    ...
    c       Compile and run a C program []
    fftw    Fastest Fourier Transform in the West []
    gcc     Compile and run a C program []
    mpi     Compile and run an MPI program []
    namd    Submission script and sample files for NAMD []
    ...

The left column contains topics that can be passed to the rccsoftware command. Enter:

    rccsoftware mpi

into the command line and follow the instructions. Choose Yes when you are given the option to download files to your home directory. The final output should look like:

    The following files were copied locally to:
      /home/$HOME/rcchelp/software/mpi.rcc-docs
      hello.c
      mpi.sbatch
      README

The information that is printed to the screen can also be reviewed in the README file. Follow the instructions to compile and run the parallel Hello World code.

5.4.6 Exercise 6: Interact With Your Submitted Jobs

The status of submitted jobs can be viewed and altered by several means. The primary command, squeue, is part of a versatile system of job monitoring.

Example:

    squeue
      JOBID PARTITION     NAME     USER ST     TIME NODES NODELIST(REASON)
    3518933   depablo polyA6.0   ccchiu PD     0:00     1 (QOSResourceLimit)
    3519981   depablo    R_AFM mcgovern PD     0:00     1 (Resources)
    3519987   depablo    R_AFM mcgovern PD     0:00     1 (Priority)
    3519988   depablo    R_AFM mcgovern PD     0:00     1 (Priority)
    3539126       gpu _interac  jcarmas  R    45:52     1 midway231
    3538957       gpu test.6.3 jwhitmer  R    58:52     1 midway230
    3535743      kicp Alleturb   agertz PD     0:00     6 (QOSResourceLimit)
    3535023      kicp hf.b64.L   mvlima  R  5:11:46     1 midway217
    3525370  westmere phase_di   khaira  R  4:50:02     1 midway008
    3525315  westmere phase_di   khaira  R  4:50:03     1 midway004
    3525316  westmere phase_di   khaira  R  4:50:03     1 midway004

The above tells us:

    Name               Description
    JOBID              Job ID #, unique reference number for each job
    PARTITION          Partition job will run on
    NAME               Name for the job, defaults to slurm-JobID
    USER               User who submitted job
    ST                 State of the job
    TIME               Time used by the job in D-HH:MM:SS
    NODES              Number of Nodes consumed
    NODELIST(REASON)   List of Nodes consumed, or reason the job has not started running

squeue's output is customizable.

Example:

    squeue --user CNet -i 5

The above shows only the jobs of user CNet and refreshes every 5 seconds.

6.1 Canceling Jobs

Cancel one job:

    scancel <JobID>

or cancel all of your jobs at the same time:

    scancel --user <User Name>

6.2 More Job Information

    scontrol show job <JobID>

Example:

    scontrol show job
    JobId=3560876 Name=sleep
       UserId=dylanphall(1832378456) GroupId=dylanphall(1832378456)
       Priority=17193 Account=rcc-staff QOS=normal
       JobState=CANCELLED Reason=None Dependency=(null)
       Requeue=1 Restarts=0 BatchFlag=0 ExitCode=0:0
       RunTime=00:00:10 TimeLimit=1-12:00:00 TimeMin=N/A
       SubmitTime=2013-01-09T11:39:40 EligibleTime=2013-01-09T11:39:40
       StartTime=2013-01-09T11:39:40 EndTime=2013-01-09T11:39:50
       PreemptTime=None SuspendTime=None SecsPreSuspend=0
       Partition=sandyb AllocNode:Sid=midway-login2:24907
       ReqNodeList=(null) ExcNodeList=(null)
       NodeList=midway113
       BatchHost=midway113
       NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
       MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
       Features=(null) Gres=(null) Reservation=(null)
       Shared=OK Contiguous=0 Licenses=(null) Network=(null)
       Command=/bin/sleep
       WorkDir=/scratch/midway/dylanphall/repositories/pubsw/rccdocs.sphinx

    sacct[-plus]

Gives detailed reports on user(s). sacct-plus is an RCC wrapper around sacct that offers easier manipulation options.

5.4.7 Exercise 7: Learn About Your Storage

Participants in CSPP51087 have access to two types of storage:

1. Home - for personal configurations, private data, limited space

       /home/[user name]

2. Scratch - fast, for daily use

       /scratch/midway/[user name]

Scratch is also accessible via a symlink in your home directory: $HOME/scratch-midway

To find the limits enter:

    quota
    Disk quotas for user dylanphall:
    Filesystem         type   used       quota      limit      files    grace
    -----------------  ----   ---------  ---------  ---------  -------  -----
    home               USR    2.90 G     10.00 G    12.00 G    39987    none
    midway-scratch     USR    24.26 G    5.00 T     6.00 T     106292   none

Descriptions of the fields:

Filesystem
    This is the file system or file set where this quota is valid.

type
    This is the type of quota.
    This can be USR for a user quota, GRP for a group quota, or FILESET for a file set quota. File set quotas can be considered a directory quota. USR and GRP quotas can exist within a FILESET quota to further limit a user or group quota inside a file set.

used
    This is the amount of disk space used for the specific quota.

quota
    This is the quota or soft quota. It is possible for usage to exceed the quota for the grace period or up to the hard limit.

limit
    This is the limit or hard quota that is set. It is not possible for usage to exceed this limit.

files
    This is the number of files currently counted in the quota. There are currently no quotas enforced on the number of files.

grace
    This is the grace period, which is the amount of time remaining that the quota can be exceeded. It is currently set to start at 7 days. The value none means that the quota is not exceeded. After the quota has been exceeded for longer than the grace period, it will no longer be possible to create new files.

6.1 Explore File Backups & Restoration

Snapshots

Automated snapshots of users' home directories are available in case of accidental file deletion or other problems. Currently snapshots are available for these time periods:

• 4 hourly snapshots
• 7 daily snapshots
• 4 weekly snapshots

You can find snapshots in these directories:

• /snapshots/*/home/CNetID – Home snapshots

The subdirectories refer to the frequency and time of the backup, e.g. daily-2013-10-04.06h15 or hourly-2013-10-09.11h00.

Try recovering a file from the snapshot directory.

Backup

Backups are performed on a nightly basis to a tape machine located in a different data center than the main storage system. These backups are meant to safeguard against disasters such as hardware failure or events that could cause the loss of the main data center. Users should make use of the snapshots described above to recover files.

5.5 Introduction to RCC

This page has been updated.
Please see:

• http://docs.rcc.uchicago.edu/user-guide.html
• http://docs.rcc.uchicago.edu/tutorials/intro-to-rcc-workshop.html

5.6 Introduction to Crerar Computer Lab

These materials are intended to be used during hands-on sessions of RCC workshops held in the Crerar Computer Lab (downstairs). Users have a number of options for connecting to RCC's Midway computing cluster during the workshop.

5.6.1 Personal Laptops

Users with Midway accounts

If you already have an account on Midway, feel free to use it during the workshop. Simply log in using your normal ssh client.

Users without Midway accounts

If you do not have an account we can give you temporary access. Let the instructor know you need a temporary account and they will give you a Yubikey. The Yubikey allows you to access guest accounts as indicated in the images below.

Identify your guest username, which is rccguest plus the last four digits of your Yubikey identification number. Note the button on the bottom of the Yubikey, which will enter your password when pressed.

1. Insert the Yubikey into a USB slot and ssh to Midway
2. Username : rccguest####
3. Password : touch the contact
4. Let the instructor know if you have trouble.

5.6.2 Lab Computers

Access to RCC Midway is provided via secure shell (SSH) login or by NX, a graphical interface. Either can be used from the Crerar lab machines. NX should be used if users need access to graphical text editors or need to run graphical programs. Users comfortable with command line text editors such as VI, Emacs, or nano should use SSH rather than NX for better performance.

Using putty SSH client

1. Locate the putty icon on the Windows desktop
2. Enter the hostname for Midway, and ensure the remaining options are set as in the image below.
   If you are using a Yubikey temporary account, enter rccguest plus the Yubikey identification number as your username; otherwise use your cnetid.
3. At the password prompt, push the button of your Yubikey or enter your cnetid as appropriate.

Using NX graphical client

1. Locate the NX client icon on the Windows desktop
2. If the connection window session list does not contain a "Midway" option, add it through the connection wizard.
3. Select the Midway session, and enter either your Yubikey temporary account username (rccguest plus the Yubikey identification number) or your cnetid. Push the button of your Yubikey to obtain the temporary password, or enter your cnetid as appropriate.

5.7 Hierarchical Storage Management

5.7.1 Summary

HSM Storage is a tape-only storage service that is provided on a per-terabyte basis through the Cluster Partnership Program. It is intended to be a convenient, low-cost storage option for data that needs to be kept for several years.

HSM Storage leaves you in control of your data. This means that you, or anyone to whom you grant appropriate read and write privileges, can access, modify, and remove the data stored on tape at any time.

Characteristics of HSM Storage:

• Tape storage that you control
• Available only through the Cluster Partnership Program
• Scalable to hundreds of terabytes
• Low-cost data storage
• Unsuitable for data that needs to be frequently accessed
• Unsuitable for data that needs to be processed, analyzed, or shared

5.7.2 Introduction

Hierarchical Storage Management (HSM) is a storage technology that automatically moves data between high-speed and low-speed media.
At RCC, HSM Storage is a storage service that complements the high-performance Scratch Storage (/scratch) and high-capacity Project Storage (/project, /home) services.

HSM Storage provides an option for storing data in the RCC ecosystem that is inexpensive relative to disk-based storage. Research groups are charged based on the cost of tape media rather than disk, substantially lowering the per-terabyte cost.

A diagram of the architecture is shown above. Some important things to understand about the service are:

• when the initial backup of data copied to /hsm is completed, the data will always be stored in duplicate. The primary copy could be on the disk cache or on tape. The secondary copy will always be on a tape
• the tape machine is physically located in the University of Chicago 1155 data center
• the two copies are guaranteed not to exist on the same tape; however, all tapes are in the same tape machine and thus the same physical location
• data is transferred from Midway (6045 S. Kenwood) over a dedicated 400 megabit/second fiber channel connection.

The major attraction of RCC's HSM Storage service is seamless integration with existing RCC file systems (/home, /project, /scratch). Your research group is given access to a group directory in /hsm that is analogous to the group directory in /project. Moving data from disk-based file systems to the HSM storage uses the same Unix tools that are typically used to move data: cp, scp, Globus Online, and so on.

    cp /project/rcc/file1 /hsm/rcc/file1

Moving data to HSM Storage works much like moving data between project and scratch, as the data is first copied to a disk cache. The data will be moved off the disk cache to tape by a scheduled process.

The directory structure and files that you copied to the /hsm file system will still appear when you use ls or otherwise list the contents of your HSM directory. These files, however, will exist only as stub files on the HSM disk cache.
A stub file appears to be on disk and immediately available, but in fact the data has been migrated to the tape system. Accessing the stub file automatically initiates a recall of the file from tape.

Use Case

HSM Storage is ideal for storing large amounts of data that are not expected to be needed again but cannot be deleted. For example, HSM Storage would be a good place to move raw data from a series of simulations that have already been processed, but need to be kept.

5.7.3 Performance and Limitations

Understanding the performance limitations of the HSM Storage service is critical to your ability to use this service productively and successfully.

• The disk cache at /hsm that receives your data is a shared resource that is much smaller than the project or scratch file systems. Do not attempt to copy more than 10 terabytes of data to the system at one time.
• Write speeds to the /hsm disk cache are roughly 100-200 MB/s. This means copying one terabyte of data from scratch or project space will take approximately 2 hours under good conditions (i.e., you are transferring a few relatively large files rather than thousands of smaller files).
• Read speeds from tape vary dramatically depending on how much data is being moved, how many files are being recalled, how many tapes the data is stored on, and other factors. However, recalling data from tape is generally slower than writing to the disk by 1 to 2 orders of magnitude. This means that reading a full terabyte of data can take days.

  Note: For significant data recalls it is best to contact RCC. RCC administrators can apply methods not available to general users that reduce the amount of time required to recall data from tape.

• Large numbers of small files incur significant processing overhead. If you are storing directories with more than a few hundred files it may be necessary to tar the directory.
• Do not mv data from the HSM system. Due to the amount of time required to recall data from tape we recommend a two-step method of (1) using cp to copy the data to your home or project storage, then (2) removing the data from the HSM partition if required.
• The snapshots feature is not available for /hsm data.
• You remain in control of the data on tape. You can add, change, and delete this data at any time.
• Change the unix permissions to 400 for any data that should not be modified once copied to /hsm, e.g.,

      cp -r /project/rcc/dir1 /hsm/rcc
      chmod -R 400 /hsm/rcc/dir1

5.7.4 Data Integrity

Two copies will be maintained of any data stored on the HSM storage service. The primary copy may exist on either a disk cache or on the RCC tape infrastructure. The secondary copy will always exist on tape. If the primary copy of the data is on tape, the secondary copy will always exist on a different physical tape. At this time, however, both tapes exist in the same enclosure, so unlike project storage, there is no protection from site-disaster events, such as flooding or fire.

Spectralogic, the manufacturer of RCC's tape system, reports an uncorrected error rate of 1 in 10^17 bits (1 bit of uncorrectable error for every 11.1 petabytes) for tapes that have been stored in ideal conditions for 30 years. With the data stored on two tapes, the probability that the same bit is uncorrectable on both copies drops to 1 in 10^34.

5.7.5 Cost and Access

Contact RCC for more information.

5.8 Software Modules Tutorial

Modular software management allows a user to dynamically augment their environment with software that has been installed on the system. Each module file contains the required information to configure the environment for that particular software package.

As a user, you are able to load module files in your environment, thereby making the software packages they describe available for use.
More information about using the module command can be obtained by typing module help at the command line.

5.8.1 Command Summary

Some commonly used module command options:

    Command                            Description
    module list                        Lists all currently loaded software packages
    module avail                       Displays a list of all software packages that have been
                                       compiled for the system
    module load [software_package]     Enables use of software_package by setting the appropriate
                                       environmental variables and adding the software binaries
                                       to your path
    module swap [pkg1] [pkg2]          Unloads software pkg1 and loads pkg2
    module unload [software_package]   Removes software_package from your path and unsets any
                                       environmental variables set by that package
    module purge                       Unloads all currently loaded software packages
    module help [software_package]     Displays a description of software_package

5.8.2 Example

The following briefly shows how to load, swap, and unload a software package (in this case, the molecular dynamics visualization program VMD):

1. Find the name of the modulefile you wish to load by typing the command: module avail

       $ module avail
       R/2.14  bino/1.3  gnuplot/4.4  matlab/2012a  pgi/2012  vmd/1.9  vmd/1.9.1(default)

2. Verify VMD is not currently loaded by typing the command: module list

       $ module list
       Currently Loaded Modulefiles:
         1) R/2.14   2) gnuplot/4.4

3. To load the default version of a software module, just specify the name of the software module, in this case, "vmd":

       module load vmd

4. Verify the module has been loaded with the command: module list

       $ module list
       Currently Loaded Modulefiles:
         1) R/2.14   2) gnuplot/4.4   3) vmd/1.9.1

VMD has now been loaded and can be invoked from the command line. If for some reason we need to switch to version 1.9 of VMD, the module swap command can be used as follows:

1. Swap the default version of VMD (v1.9.1) for a different version (v1.9 for example):

       module swap vmd/1.9.1 vmd/1.9

2. Verify the module has been swapped with: module list

       $ module list
       Currently Loaded Modulefiles:
         1) R/2.14   2) gnuplot/4.4   3) vmd/1.9

Version 1.9 of VMD is now available on the command line instead of version 1.9.1. To remove VMD from your environment, use the module unload command:

1. Unload the VMD software package with:

       module unload vmd

2. Verify the module has been removed with: module list

       $ module list
       Currently Loaded Modulefiles:
         1) R/2.14   2) gnuplot/4.4

5.8.3 Further Assistance

To request software packages, build variants, or specific versions, or for any other software-package-related questions, please contact help@rcc.uchicago.edu

5.9 Working With Data: Data management plans, transferring data, data intensive computing

Big Data is a collection of data sets so large and complex that they are difficult to process using traditional (desktop environment) data processing techniques. The challenges of these data sets include the capture, curation, storage, search, sharing, transfer, analysis, and visualization of the elements they contain. Despite these problems, there is a clear trend toward larger data sets among UChicago research programs.

In response to these trends the RCC has established a technological framework to enable researchers to work with today's tera- and petascale research problems.

5.9.1 Data Management Plans

Many funding agencies, including the NSF and NIH, require some form of data management plan to supplement funding proposals. The plan describes how the proposal will conform to the agency's data integrity, availability, sharing, and dissemination policies.

RCC can assist researchers with the development of data management plans by providing accurate descriptions of the data storage, versioning, backup, and sharing capabilities that are available to UChicago researchers. It is imperative that RCC is included in the development of data management commitments that leverage RCC resources.
Please contact us (info@rcc.uchicago.edu) to learn more.

The following is an example of text that can be used in your data management plans. It is critical that you contact RCC to review your data management plan if RCC resources are involved.

Note: Data will be stored on the University of Chicago Research Computing Center (RCC) Project Storage Service. Project Storage sits on a petascale GPFS file system managed by professional system administrators. Snapshots, point-in-time views of the file system, are maintained at hourly, daily, and weekly intervals, allowing researchers to independently recover older versions of files at any time. The system is backed up nightly to a tape archive in a separate location. Project Storage is accessible both through the UChicago HPC Cluster and Globus Online (endpoint ucrcc#midway), which provides straightforward high-performance transfer and sharing capabilities to researchers. Files can be shared with any Globus Online user, at UChicago or elsewhere, without need for an account on the UChicago HPC cluster or other resources. RCC Project Storage is connected to the UChicago campus backbone network at 10 gigabit/s, and to the UChicago HPC cluster, available to researchers for data analysis, at 40 gigabit/s. This professionally managed, reliable and highly-available infrastructure is suitable for capturing, generating, analyzing, storing, sharing, and collaborating on petabytes of research data.

5.9.2 Storage Systems

RCC manages three complementary storage systems that each fill a unique niche in the research computing landscape. Briefly, Home and Project storage is maintained on a 1.5 petabyte file system that is both backed up and version controlled with GPFS snapshots. Scratch storage is an 80 terabyte high-performance shared resource.
HSM is a petascale-capable tape system that is useful for storing data that is not expected to be referenced again, but must be kept for the lifetime of a project to meet certain requirements or best practices.

Persistent Storage

Persistent storage areas are appropriate for long term storage. They have both file system snapshots and tape backups for data protection. The two locations for persistent storage are the home and project directories.

Home

Every RCC user has a home directory located at /home/CNetID. This directory is accessible from all RCC compute systems and is generally used for storing frequently used items such as source code, binaries, and scripts. By default, a home directory is only accessible by its owner (mode 0700) and is suitable for storing files which do not need to be shared with others. The standard quota is 10 GB.

Project

Project storage is generally used for storing files which are shared by members of a research group. It is accessible from all RCC compute systems. Every research group is granted a startup 500 GB quota, though scaling individual projects to hundreds of terabytes is straightforward on the RCC's 1.5 PB system. Additional storage is available through the Cluster Partnership Program. Contact info@rcc.uchicago.edu to learn more.

Scratch Storage

Scratch space is a high-performance shared resource intended to be used for active calculations and analysis run on the compute clusters. Users are limited to 5 terabytes of scratch.

5.9.3 Storage Performance Considerations

RCC nodes are connected by two network fabrics: Infiniband (40 gb/s) and Gigabit Ethernet (1 gb/s). The fastest network available on a compute node is used for both interprocess communications and reading and writing to the shared storage systems. Accordingly, the time required to perform file system operations such as moving and copying data will vary according to the available network bandwidth.
Performance is additionally influenced by the characteristics of the file system that is used. Taken together, these factors can result in orders-of-magnitude differences in time to perform seemingly very similar operations. For example, consider the table below, which indicates the time required to operate on 24 gigabytes on project storage and scratch storage, from GigE nodes and Infiniband nodes.

                    24 GB (100 x 240 MB files)   24 GB (100,800 245 KB files)
    Project IB      40                           1805
    Scratch IB      20                           875
    Project GigE    500                          1150
    Scratch GigE    500                          1150

5.9.4 Purchasing Storage

Project storage can be purchased through the Cluster Partnership Program in units as small as 1 terabyte, or exceeding 100 terabytes. Contact info@rcc.uchicago.edu to learn more.

5.9.5 Transferring Data to the RCC

The RCC computing infrastructure is connected to the UChicago backbone network at XX Gb/s. This connection is rarely saturated; when you are transferring data to the RCC, the time required to complete a data transfer is generally limited by the network bandwidth available to the machine you are transferring from.

The table below indicates the amount of time required to transfer a given amount of data, in a best-case scenario, at an indicated speed.

10 Mb/s roughly corresponds to the speed of a modest home internet connection. The UChicago wireless network is capable of sustained 20-40 Mb/s transfers. Most of the data ports on campus are 100 Mb/s, although they can be upgraded with a request to IT Services (INSERT LINK) to 1 Gb/s in most buildings.

                1 PB       100 TB     1 TB       100 GB     10 GB      1 GB
    10 Mb/s     25 years   3 years    9 days     22 hours   2 hours    13 mins
    100 Mb/s    2 years    92 days    22 hours   2 hours    13 mins    1 min
    1 Gb/s      92 days    9 days     2 hours    13 min     1 min      8 sec
    10 Gb/s     9 days     22 hours   13 mins    1 min      8 sec      <1 sec
    100 Gb/s    22 hours   2 hours    1 min      8 sec      <1 sec     <<1 sec

Transfer times are variable, but it is reasonable to estimate based on this table that, for example, transferring 2 TB of data from an external hard drive that is plugged into a lab workstation via UChicago Ethernet (100 Mb/s) will take 40 to 50 hours.

Data Transfer Recommendation

The best way to transfer large amounts of data is with the Globus Online (https://www.globusonline.org/) data movement service. Use your CNet ID to sign in at globus.rcc.uchicago.edu. Globus offers a number of advantages over traditional Unix tools like SCP and rsync for large, regular, or otherwise complex file transfers. These include:

• automatic retries
• email notifications when the transfer completes
• command line tools to automate data transfers
• increased speed.

Of course, common tools such as secure-copy and rsync are also available through any remotely accessible interactive node.

5.9.6 Data Intensive Computing

The RCC's computing infrastructure enables researchers to perform data-intensive computing as well as flop-intensive computing. Elements that are particularly useful for dealing with large data are described below.

Infiniband Network

The nonblocking FDR10 40 gb/s Infiniband network provides up to 5 GB/s reads and writes to and from the shared storage systems.

Large Shared-Memory Nodes

A number of compute nodes are available with very large shared memory. These shared resources are available through the queue and are otherwise identical to other RCC compute nodes.

• Two nodes are available with 256 GB of memory each
• One node has 1 terabyte (1024 GB) of memory
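
The best-case transfer times tabulated in section 5.9.5 follow from simple arithmetic: the size in bits divided by the link speed in bits per second. The sketch below reproduces the 2 TB over 100 Mb/s estimate from that section. It assumes decimal units (1 TB = 10^12 bytes, 1 Mb = 10^6 bits) and no protocol overhead; the function name transfer_seconds is hypothetical and the script is written for Python 3.

```python
def transfer_seconds(size_bytes, rate_bits_per_second):
    """Best-case transfer time: 8 bits per byte, no protocol overhead."""
    return size_bytes * 8.0 / rate_bits_per_second


TB = 10 ** 12    # bytes, decimal terabyte (assumption)
MBIT = 10 ** 6   # bits, decimal megabit (assumption)

# 2 TB over a 100 Mb/s campus data port, as in the example in 5.9.5.
hours = transfer_seconds(2 * TB, 100 * MBIT) / 3600.0
print("%.1f hours" % hours)
```

The result falls inside the 40 to 50 hour range quoted in 5.9.5. Real transfers are usually slower, since disks, protocol overhead, and shared links all reduce the sustained rate.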

Map Reduce Nodes

Ten compute nodes are available with very large (18 terabytes) local storage arrays attached to the nodes. These shared resources are available [describe access and availability].

5.9.7 Data Visualization

RCC maintains a data visualization laboratory in the Crerar Library Kathleen A. Zar Room. The lab is capable of high-definition 2D as well as stereoscopic 3D visualizations. A high-power workstation in the lab direct-mounts the RCC storage systems to facilitate straightforward access to your research data.

5.10 Slurm Tutorial

    srun           Run parallel jobs
    sbatch         Submit a batch script to Slurm
    sinteractive   Start an interactive session
    scancel        Cancel jobs or job steps
    sacct          List all jobs in slurm
    squeue         View information about jobs in the queue
    sview          Graphically view slurm state

5.11 Introduction to Python Hands-On Exercises

5.11.1 Exercise 1

Start the IPython interpreter. Use it to calculate the first few Fibonacci numbers.

    In [1]: terms = [0,1]
    In [2]: for i in xrange(0,10):
       ...:     f = sum(terms)
       ...:     print f
       ...:     terms[0] = terms[1]
       ...:     terms[1] = f
    1
    2
    3
    5
    8
    13
    21
    34
    55
    89

5.11.2 Exercise 2

Run a batch script

Create a new Python source file. Convert the code above to a batch script that can be run on the command line.

5.11.3 Exercise 3

Write a program that will read the array from data1.txt and report the sum of all values, the max value, and the min value.

5.11.4 Exercise 4

Write a program that will estimate the base of the natural logarithm e according to

Make n a command line argument.

5.11.5 Exercise 5

Calculate PI by summing the first few million terms in the Leibniz formula.

    1 - 1/3 + 1/5 - 1/7 + 1/9 - ... = pi/4

5.11.6 Exercise 6

Elevation_Benchmarks.csv contains information on the elevation of many locations around Chicago. Write a script to read the file in, and identify the locations with the highest elevation.
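
One possible starting point for Exercise 6 is sketched below. The layout of Elevation_Benchmarks.csv is not described here, so the column names "Location" and "Elevation" are assumptions that must be replaced with the real header names, and the sample rows are made up so that the script runs without the file. The sketch uses Python 3 syntax, whereas the interpreter examples in this tutorial use Python 2.

```python
import csv
import io


def highest_locations(csv_file, name_col="Location", elev_col="Elevation"):
    """Return the (name, elevation) pairs that share the maximum elevation.

    name_col and elev_col are hypothetical column names; adjust them to
    match the header row of the real file.
    """
    readings = []
    for row in csv.DictReader(csv_file):
        try:
            readings.append((row[name_col], float(row[elev_col])))
        except (TypeError, ValueError):
            # Skip rows whose elevation is blank or not a number.
            continue
    if not readings:
        return []
    top = max(elevation for _, elevation in readings)
    return [(name, elev) for name, elev in readings if elev == top]


# Made-up sample data standing in for the real file.
sample = io.StringIO(
    "Location,Elevation\n"
    "Site A,17.5\n"
    "Site B,21.0\n"
    "Site C,\n"
    "Site D,21.0\n"
)
print(highest_locations(sample))
```

To run it against the real data, pass a file opened with open("Elevation_Benchmarks.csv", newline="") in place of the sample. Returning every location tied for the maximum, rather than a single row, avoids silently dropping ties.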
5.11.7 Exercise 7

Develop a simple class to describe circles based on their radius. Write methods to calculate the diameter, circumference, and area. Save this class in a module (a stand-alone .py file). Import the module from another Python script. Create several circle objects, and print their properties.

5.11.8 Exercise 8

Run and modify the simple harmonic oscillator code.

5.12 Computational Tools for Chemists and Biochemists - Biopython Bio.PDB

5.12.1 Exercise 1

Download a PDB file to your local directory

The RCSB Protein Data Bank hosts an FTP site that can be queried for single named structures or mirrored entirely on a local machine. The Research Computing Center hosts a local snapshot of the PDB at /project/databases/pdb. It contains every PDB file that was available at the time the snapshot was taken, dating back to the founding of the PDB. It is ~170 GB of gzipped data.

Additionally, the getpdb command is provided as a convenience. It queries the online FTP site for one or more four-letter PDB IDs, then downloads the corresponding files locally.

Download several files to work with:

    $ getpdb 1hlw 1vu2 2yoo

5.12.2 Exercise 2

Read a PDB file into memory as a structure object

While logged into Midway, load the default Python module, which has Biopython support added:

    $ module load python

Then start the IPython interpreter so that you can interactively enter commands:

    $ ipython

IPython is an advanced command interpreter; it has all the functionality of the normal python interpreter plus many time- and effort-saving features. Starting it should produce a screen like this:

    Python 2.7.3 (default, Oct 30 2012, 22:25:59)
    Type "copyright", "credits" or "license" for more information.

    IPython 0.13.1 -- An enhanced Interactive Python.
    ?         -> Introduction and overview of IPython's features.
    %quickref -> Quick reference.
    help      -> Python's own help system.
    object?   -> Details about 'object', use 'object??' for extra details.
    In [1]:

Proceed to enter commands:

    In [1]: import Bio.PDB  # Import the Bio.PDB module
    In [2]: parser = Bio.PDB.PDBParser()  # Create a parser object
    In [3]: pdb1hlw = parser.get_structure('1HLW', '1hlw.pdb')  # Create a structure object from the PDB file

The pdb1hlw object now contains the protein structure and associated metadata.

5.12.3 Exercise 3

Query the header information

Header information is mapped to a Python dictionary. Try a few examples such as:

    In [4]: pdb1hlw.header['name']
    Out[4]: ' structure of the h122a mutant of the nucleoside diphosphate kinase'

    In [5]: pdb1hlw.header['resolution']
    Out[5]: 1.9

    In [6]: pdb1hlw.header.keys()
    Out[6]:
    ['structure_method',
     'head',
     'journal',
     'journal_reference',
     'compound',
     'keywords',
     'name',
     'author',
     'deposition_date',
     'release_date',
     'source',
     'resolution',
     'structure_reference']

Parsing PDB files is not an exact science; the format has evolved over decades, and the files contain a variety of inconsistent formatting constructs and errors. Approach any large-scale parsing carefully and apply appropriate error checking to the output.

5.12.4 Exercise 4

Write a program to identify PDB entries based on their resolution

EXERCISE: Develop a program that can scan the entire PDB and identify high-resolution structures solved by x-ray crystallographic methods. Scanning the entire PDB would take a few hours, but the n8 directory provides about 60 structures for a proof-of-principle analysis, which should take no more than 1 minute to process.

id-highres-pdb.py is a functional sample script. It loads each PDB file, parses the header information, looks for methods containing the string 'x-ray', and keeps structures with a resolution of 2.0 angstroms or less. A report is printed when the script completes.
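The selection rule described above can be sketched as a small function that works on header dictionaries of the kind shown in Exercise 3. This is not the contents of id-highres-pdb.py; the function name, the entry IDs, and the header values below are hypothetical, and only the 'structure_method' and 'resolution' keys are taken from the output above. The error handling reflects the earlier advice that header fields may be missing or malformed.

```python
from __future__ import print_function


def is_high_res_xray(header, cutoff=2.0):
    """Return True if a Bio.PDB-style header dict describes an x-ray
    structure with a resolution at or below `cutoff` angstroms."""
    method = header.get('structure_method') or ''
    if 'x-ray' not in method.lower():
        return False
    try:
        return float(header.get('resolution')) <= cutoff
    except (TypeError, ValueError):
        return False  # resolution missing or not a number


# Hypothetical header dictionaries standing in for parsed PDB entries.
headers = {
    'aaaa': {'structure_method': 'x-ray diffraction', 'resolution': 1.9},
    'bbbb': {'structure_method': 'solution nmr', 'resolution': None},
    'cccc': {'structure_method': 'x-ray diffraction', 'resolution': 2.8},
}

hits = sorted(pdb_id for pdb_id, h in headers.items() if is_high_res_xray(h))
print(hits)
```

In a full script, each header would come from parser.get_structure(...).header as in Exercise 2, with the parsing step itself wrapped in a try/except so that one bad file does not stop the scan.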
5.12.5 Exercise 5

Iterate over residues and atoms

Turn these code blocks into functional, running scripts to get a feel for how you can access information about models, chains, residues, and atoms.

Residues:

    import Bio.PDB

    parser = Bio.PDB.PDBParser()
    pdb1hlw = parser.get_structure('1HLW', '1hlw.pdb')

    for model in pdb1hlw:
        for chain in model:
            for residue in chain:
                print residue.get_resname(), residue.id[1]

Atoms:

    import Bio.PDB

    parser = Bio.PDB.PDBParser()
    pdb1hlw = parser.get_structure('1HLW', '1hlw.pdb')

    for model in pdb1hlw:
        for chain in model:
            for residue in chain:
                for atom in residue:
                    print atom.get_id(), atom.get_coord()

5.13 Computational Tools for Chemists and Biochemists - Open Babel

5.13.1 Exercise 1

Interactively determine the 3D structure and properties of Aspirin

The chemical makeup of the pain reliever Aspirin (2-acetoxybenzoic acid) is represented by the one-dimensional SMILES string below.

    CC(=O)Oc1ccccc1C(=O)O

While logged into Midway, load the python and openbabel modules to get access to all of the tools that will be used for the Open Babel exercises:

    $ module load python
    $ module load openbabel

Then start the IPython interpreter so that you can interactively enter commands:

    $ ipython

Pybel provides many convenience functions and data types that make using the Open Babel library straightforward from within your Python scripts. Our first steps are to import the pybel module and use the pybel readstring function to create a Molecule object.

    In [1]: import pybel
    In [2]: aspirin = pybel.readstring('smi', "CC(=O)Oc1ccccc1C(=O)O")

Molecule objects have inherent properties that are determined and immediately available to you upon creation. Some of the most interesting include atoms, charge, spin, molwt, and energy.
You can see and access these attributes:

    In [3]: aspirin.spin
    Out[3]: 1

    In [4]: aspirin.charge
    Out[4]: 0

    In [5]: aspirin.molwt
    Out[5]: 120.19158

    In [6]: aspirin.energy
    Out[6]: 0.0

    In [7]: aspirin.atoms
    Out[7]:
    [<pybel.Atom at 0x1d00590>,
     <pybel.Atom at 0x1d00650>,
     <pybel.Atom at 0x1d006d0>,
     <pybel.Atom at 0x1d00750>,
     <pybel.Atom at 0x1d007d0>,
     <pybel.Atom at 0x1d00850>,
     <pybel.Atom at 0x1d008d0>,
     <pybel.Atom at 0x1d00950>,
     <pybel.Atom at 0x1d009d0>]

The spin, charge, and molwt output is practical and likely in line with your expectations. The energy value is meaningless because we have not yet evaluated the energy or even determined the structure of the compound. The atoms output is not very helpful in this form, but it demonstrates a key feature of the Molecule object: it contains a list of Atom objects, each with its own attributes. Atom properties include atomicmass, atomicnum, partialcharge, formalcharge, type, coords, and vector, among others.

Note: The output recorded in this exercise (nine heavy atoms, all carbon, and a molecular weight of 120.19) corresponds to C9H12 rather than aspirin (C9H8O4, molecular weight about 180.16). This is consistent with the letter O in the SMILES string having been typed as the digit 0. With the string entered as shown above, you should see 13 heavy atoms, four of them oxygen.

Atoms can be accessed with a for loop:

    In [8]: for atom in aspirin:
       ...:     print atom.type, atom.coords
       ...:
    C3 (0.0, 0.0, 0.0)
    C3 (0.0, 0.0, 0.0)
    Car (0.0, 0.0, 0.0)
    Car (0.0, 0.0, 0.0)
    Car (0.0, 0.0, 0.0)
    Car (0.0, 0.0, 0.0)
    Car (0.0, 0.0, 0.0)
    Car (0.0, 0.0, 0.0)
    C3 (0.0, 0.0, 0.0)

The XYZ Cartesian coordinates are all 0.0 because the SMILES string contains no structural information other than bonding patterns. To get 3D coordinates and many other properties we must determine the 3D structure of the compound.

In addition to its attributes, the Molecule object has a number of methods that are very useful for performing basic operations on the compound. These include addh(), calcdesc(), draw(), make3D(), and write().

If a molecule does not have 3D coordinates, they can be generated using the make3D() method.
By default, this includes 50 steps of a geometry optimization using the MMFF94 force field. To further optimize the structure, you can use the localopt() method, which by default carries out 500 steps of an MMFF94 optimization.

    In [9]: aspirin.addh()

    In [10]: for atom in aspirin:
       ....:     print atom.type, atom.coords
       ....:
    C3 (0.0, 0.0, 0.0)
    C3 (0.0, 0.0, 0.0)
    Car (0.0, 0.0, 0.0)
    Car (0.0, 0.0, 0.0)
    Car (0.0, 0.0, 0.0)
    Car (0.0, 0.0, 0.0)
    Car (0.0, 0.0, 0.0)
    Car (0.0, 0.0, 0.0)
    C3 (0.0, 0.0, 0.0)
    H (0.0, 0.0, 0.0)
    H (0.0, 0.0, 0.0)
    H (0.0, 0.0, 0.0)
    H (0.0, 0.0, 0.0)
    H (0.0, 0.0, 0.0)
    H (0.0, 0.0, 0.0)
    H (0.0, 0.0, 0.0)
    H (0.0, 0.0, 0.0)
    H (0.0, 0.0, 0.0)
    H (0.0, 0.0, 0.0)
    H (0.0, 0.0, 0.0)
    H (0.0, 0.0, 0.0)

    In [11]: aspirin.make3D()

    In [12]: for atom in aspirin:
       ....:     print atom.type, atom.coords
       ....:
    C3 (0.9190395977471512, -0.006438115844769086, -0.0876922855614613)
    C3 (0.36519626914487613, -0.07705730841537199, 1.3292477402540028)
    Car (0.8242319910091845, 1.1122634998282788, 2.141594133850694)
    Car (0.1769370712149536, 2.341379478456992, 1.9040754341930466)
    Car (0.6202744602124473, 3.5166545024294438, 2.507356140815162)
    Car (1.6554019166514982, 3.4677683660352656, 3.4283560156126622)
    Car (2.2619831811766002, 2.2474801769364077, 3.7298106371711515)
    Car (1.8983013261095736, 1.0692240107082573, 3.0492498629937064)
    C3 (2.7249269684253337, -0.16046043646236302, 3.312929179521239)
    H (0.5844597940475328, 0.9031432446692841, -0.59943970186114)
    H (0.5788372943708132, -0.864890321069018, -0.6735902838883707)
    H (2.0145315411810905, -0.001970269774536133, -0.08667098738318454)
    H (-0.7311518501412941, -0.09165658472094038, 1.2998200527112589)
    H (0.6460142191803344, -1.025084493078873, 1.7956958366434066)
    H (-0.6613457518690862, 2.3890987942485484, 1.2132639340640525)
    H (0.15618871569080567, 4.465249762538174, 2.250539997018506)
    H (2.002316603791748, 4.380256046774928, 3.9068223409669494)
    H (3.04787085360694, 2.2228983099171398, 4.480778351023797)
    H (3.780668118097511, 0.06201326352731574, 3.1208961489462723)
    H (2.6094480087471075, -0.4846162976377214, 4.351406645057128)
    H (2.4717518219027492, -1.0026214632013046, 2.66715390876939)

    In [13]: aspirin.localopt()

You can now draw your molecule and write it to disk in a form that is read by many computational chemistry codes.

    In [14]: aspirin.draw(show=False, filename='aspirin.png')
    In [15]: aspirin.write('sdf', 'aspirin.sdf', overwrite=True)

Look at the SDF file with a text editor and view the image to see the result of your work.

5.13.2 Exercise 2

Prepare a small library of compounds for docking/QM/MM studies

Small molecules (mol. wt. < 500) have proven to be extremely important to researchers exploring function at the molecular, cellular, and in vivo level. Such molecules have also proven valuable for treating diseases, and most medicines marketed today are from this class. Through the NIH Molecular Library Program, such compounds are distributed electronically through PubChem.

One SDF file containing more than 500 compounds is included in the materials for this workshop. Only the 2D coordinates are available; the compounds need to be geometry optimized for further study.

• EXERCISE: Write a program that reads the SDF file, optimizes the 3D geometry of each compound, and writes the minimized structures out to a new file. Minimizing 500 compounds will take 5-10 minutes of CPU time, so be sure to periodically print output that notifies you of progress.

You may find the following construct helpful for this exercise.

    In [16]: molecules = pybel.readfile('sdf', sdffile)
    In [17]: for molecule in molecules:
       ....:     %time molecule.make3D()
       ....:
    CPU times: user 0.03 s, sys: 0.00 s, total: 0.03 s
    Wall time: 0.05 s

batchmin.py is a working example of the tasks presented in this exercise.
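The periodic progress output suggested in the exercise can be sketched independently of Open Babel. The helper below is a hypothetical illustration, not part of batchmin.py: process_with_progress, work, and every are names introduced here. It applies a function to each item and prints a status line at a fixed interval.

```python
from __future__ import print_function
import time


def process_with_progress(items, work, every=50):
    """Apply work(item) to each item, printing a status line after every
    `every` items. Returns the number of items processed."""
    start = time.time()
    done = 0
    for item in items:
        work(item)
        done += 1
        if done % every == 0:
            print("processed %d items in %.1f s" % (done, time.time() - start))
    return done


# Stand-in workload so the sketch runs without Open Babel installed.
seen = []
count = process_with_progress(range(120), seen.append, every=50)
print("finished:", count)
```

With pybel, items could be the iterator returned by pybel.readfile('sdf', sdffile) and work could be a function that calls make3D() on each molecule, as in the construct above.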
5.13.3 Exercise 3

Find similar compounds in a molecular library

The Tanimoto similarity coefficient is a measure of how similar two small compounds are. The coefficient ranges between 0 and 1, where 1 means the compounds are identical and 0 means that they have no structural similarity whatsoever. The algorithm is implemented in Open Babel and is very straightforward to use given two Molecule objects. You must first calculate the molecular fingerprint of each compound with the calcfp() method, then compare the two fingerprints with the pipe operator, "|". An interactive example is below.

    In [22]: import pybel
    In [23]: benzene = pybel.readstring('smi', 'c1ccccc1')
    In [24]: toluene = pybel.readstring('smi', 'Cc1ccccc1')
    In [25]: aspirin = pybel.readstring('smi', "CC(=O)Oc1ccccc1C(=O)O")

    In [26]: benzene.calcfp() | benzene.calcfp()
    Out[26]: 1.0

    In [27]: benzene.calcfp() | toluene.calcfp()
    Out[27]: 0.5

    In [28]: benzene.calcfp() | aspirin.calcfp()
    Out[28]: 0.3157894736842105

Note: Enter the aspirin SMILES string with the letter O for oxygen, not the digit 0. The value in Out[28] appears to have been recorded with the digit 0, so the coefficient you obtain for aspirin may differ.

• EXERCISE: Write a script to search the SDF file you optimized in the previous exercise for the compound that is most similar to aspirin.

5.13.4 Exercise 4

Nothing here yet

CHAPTER SIX

SOFTWARE DOCUMENTATION

Software documentation and examples

6.1 Applications

Application support

6.1.1 CESM

CESM can be built and run on Midway. For more information about CESM and how to acquire the source code, see http://www.cesm.ucar.edu/

To date, CESM versions 1.0.4 and 1.1.1 have been run on Midway. The port validation procedure has been completed for CESM version 1.0.4 in conjunction with the Intel compiler suite version 12.1. Port validation results were found to be within tolerances (documentation coming soon).

RCC has also downloaded the entire CESM inputdata repository and maintains a local copy. This can be found at /project/databases/cesm/inputdata.
We recommend using the Intel compiler suite with CESM. The relevant modules that need to be loaded when working with CESM are therefore:

    $ module load intelmpi
    $ module load netcdf/4.2+intel-12.1

To configure CESM on Midway, download the CESM source and insert the following configuration files for the specific version of CESM you are using into your local copy of CESM.

CESM 1.0.4

These files should be placed in /path/to/cesm1_0_4/scripts/ccsm_utils/Machines/ (overwriting the default files):

    cesm1_0_4/Macros.midway
    cesm1_0_4/mkbatch.midway
    cesm1_0_4/env_machopts.midway
    cesm1_0_4/config_machines.xml (see the note below about how to edit this file)

You will need to edit your copy of config_machines.xml to point to appropriate locations. On lines 7 and 13, replace the substring “/path/to/cesm1_0_4” with the path to your local installation of CESM 1.0.4.

Once you have edited and inserted these files, you can run an example model with CESM 1.0.4 by running the following script (be sure to edit the first line to point to your local installation of CESM 1.0.4):

    cesm1_0_4/example.sh

CESM 1.1.1

These files should be placed in /path/to/cesm1_1_1/scripts/ccsm_utils/Machines/ (overwriting the default files):

    cesm1_1_1/config_compilers.xml
    cesm1_1_1/mkbatch.midway
    cesm1_1_1/env_mach_specific.midway
    cesm1_1_1/config_machines.xml (see the note below about how to edit this file)

You will need to edit your copy of config_machines.xml to point to appropriate locations. On lines 59, 60, and 63, replace the substring “/path/to/cesm1_1_1” with the path to your local installation of CESM 1.1.1.

Once you have edited and inserted these files, you can run an example model with CESM 1.1.1 by running the following script (be sure to edit the first line to point to your local installation of CESM 1.1.1):

    cesm1_1_1/example.sh

6.1.2 CP2K

cp2k.sbatch demonstrates how to run CP2K. cp2k-H2O.tgz is a tarball with the H2O CP2K input deck used for this example.
    #!/bin/sh

    #SBATCH --output=cp2k-%j.out
    #SBATCH --constraint=ib
    #SBATCH --exclusive
    #SBATCH --nodes=2

    module load cp2k
    mpirun cp2k.popt H2O-32.inp

You can submit to the queue with this command:

    sbatch cp2k.sbatch

6.1.3 HOOMD

For full performance HOOMD should be run on GPUs. It can be run on a CPU only, but performance will not be comparable.

hoomd.sbatch demonstrates how to run the polymer_bmark.hoomd and lj_liquid_bmark.hoomd benchmarks on a GPU device.

    #!/bin/sh

    #SBATCH --time=0:10:00
    #SBATCH --job-name=hoomd
    #SBATCH --output=hoomd-%j.out
    #SBATCH --partition=gpu
    #SBATCH --gres=gpu:1

    module load hoomd
    hoomd polymer_bmark.hoomd
    hoomd lj_liquid_bmark.hoomd

The script can be submitted with this command:

    sbatch hoomd.sbatch

6.1.4 LAMMPS

lammps.sbatch demonstrates how to run the in.lj benchmark with LAMMPS.

    #!/bin/sh

    #SBATCH --job-name=lammps
    #SBATCH --output=lammps-%j.out
    #SBATCH --constraint=ib
    #SBATCH --exclusive
    #SBATCH --nodes=4

    module load lammps
    mpirun lmp_intelmpi < in.lj

You can submit to the queue with this command:

    sbatch lammps.sbatch

6.1.5 NAMD

namd.sbatch is a submission script that can be used to submit the apoa1.namd calculation job to the queue.

    #!/bin/sh

    #SBATCH --job-name=namd
    #SBATCH --output=namd-%j.out
    #SBATCH --constraint=ib
    #SBATCH --exclusive
    #SBATCH --nodes=4

    module load namd/2.9
    mpirun namd2 apoa1.namd

The script can be submitted with this command:

    sbatch namd.sbatch

6.2 Debugging and Optimization

6.2.1 Allinea DDT

Allinea's DDT (Distributed Debugging Tool) is a powerful, commercial GUI-based debugger used in many HPC environments for debugging large MPI and OpenMP parallel programs. The RCC has purchased licenses for 8 processes for use on Midway. This means a total of 8 processes or MPI ranks can be analyzed at one time, either by one user or multiple users.
Threads do not count toward the license limit, so a job using up to 128 cores on Midway can be debugged, provided only one MPI rank is assigned to each 16-core node. To analyze an MPI job with more than 8 ranks, the user must attach DDT to a subset of the MPI ranks.

See the Allinea DDT User Guide for more information.

Usage

DDT can be used to debug a wide variety of programs in a number of different ways, and can be a little complicated to configure. To start DDT, use the commands:

    $ module load ddt
    $ ddt

This will bring up the DDT wizard to guide you through the various configuration options:

• Run: Launch a new program through DDT. This can launch a serial or multithreaded program on the current node (please use an interactive session), or submit a serial, threaded, or MPI job to Slurm. In the latter case DDT will wait for the job to begin execution and automatically attach itself to the job processes.
• Attach: Attach the DDT debugger to one or more existing processes. If you are running on a head node, you will need the output of squeue to tell DDT which nodes your processes are running on.
• Core: Load the process state from a core file generated by a now terminated process. Useful for analyzing a program

The online help is very good, and should be the first place you look when things aren't working.

RCC has configured DDT to work with all supported MPI environments on Midway, and to submit debugging jobs through the batch queues. You may need to alter the default configuration to suit your particular needs. We recommend leaving the MPI implementation option “generic”, as shown below, or “Auto Detect” if you plan to use only one MPI module.

The job submission panel controls how and whether jobs are submitted to the Slurm batch system.
If you run DDT from an interactive session, deselect the “submit job through queue” option. Beware: if this option is unchecked and you are running DDT on one of the head nodes, your job will run there, not on a compute node.

DDT includes a powerful memory checking feature, but it can cause your code to run very slowly because of the overhead involved in checking memory accesses and allocations. This memory debugger is not enabled by default, and when enabled it can be configured with a variety of checks that affect the resulting overhead. Select “Memory Debugging” details from the job submission window to bring up the full set of options. Be sure to select the language “C/Fortran, threads” when debugging a multi-threaded program or “no threads” when debugging serially.

The following sections describe the steps necessary to configure DDT in its various modes.

Interactive

When running DDT from an interactive session on a compute node, be sure to deselect the checkbox in the “Job Submission” section of the Options panel. DDT will then execute the program to debug on the current machine. The job run window will look like the following. Configure the job options, including arguments and working directory, OpenMP threading, and Memory Debugging.

DDT batch submission

When running MPI jobs on more than one node it is necessary to have DDT submit your job to the batch scheduler (this is also possible for OpenMP or serial codes, but in that case an interactive session will be easier). The run window is very similar to the interactive case, with the ability to alter Queue Submission Parameters.
When you select Submit, DDT will bring up the Queue Submission Parameters, which allow you to configure Slurm options based on the following submission template script. If you need to further configure your job script, you can create your own template and use the option in the Job Submission options panel to point DDT there. Instructions on how to customize these templates can be found in the Allinea DDT User Guide, or in the sample script at /software/allinea/tools/templates/sample.qtf.

    #!/bin/bash

    # ACCOUNT_TAG: {type=text,label="Account"}
    # PARTITION_TAG: {type=text,label="Partition",default="sandyb"}
    # QOS_TAG: {type=text,label="QOS",default="debug"}
    # CONSTRAINT_TAG: {type=text,label="Node Constraints (optional)",default="ib"}
    # WALL_CLOCK_LIMIT_TAG: {type=text,label="Wall Clock Limit",default="15:00",mask="09:09"}
    # MODULES_TAG: {type=text,label="Modules to load (optional)"}

    #SBATCH --partition=PARTITION_TAG
    #SBATCH --account=ACCOUNT_TAG
    #SBATCH --qos=QOS_TAG
    #SBATCH --nodes=NUM_NODES_TAG
    #SBATCH --constraint=CONSTRAINT_TAG
    #SBATCH --ntasks-per-node=PROCS_PER_NODE_TAG
    #SBATCH --cpus-per-task=NUM_THREADS_TAG
    #SBATCH --time=WALL_CLOCK_LIMIT_TAG
    #SBATCH --output=PROGRAM_TAG-ddt-%j.out
    #SBATCH --no-requeue
    #SBATCH --exclusive

    module load MODULES_TAG

    AUTO_LAUNCH_TAG

The default template has mandatory fields for walltime, partition, account, and qos. The constraint and modules fields allow you to request nodes with a GPU or load necessary modules (although DDT will export your current environment, so this should not be necessary in general).

Note: The maximum wallclock time for the debug qos is 15 minutes. DDT will resubmit your job to the queue as necessary, or you can select the normal qos for a longer walltime. You may request a Slurm reservation from RCC staff to ensure that nodes are available.
DDT will continually refresh the output of squeue and wait until your job has started. Be sure that you have selected options that will allow your job to eventually start.

Once the job has started and DDT has attached to all running processes, you will be taken to the normal debug window for the number of MPI ranks and threads you chose. Some MPI implementations will show additional threads used internally by the MPI library, which can be safely ignored.

Attach to running process

In order to attach to a running process you will need to know the node(s) and PID(s) of the processes you wish to examine. The Slurm command squeue can be used for the former, and the system call getpid or the ps command can be used for the latter. DDT can be run directly on the node your

The following C code inserts a breakpoint into an MPI program: the selected rank prints its PID and host and then waits, allowing you to attach to the correct process and continue from the specified point in the code:

    #include <stdio.h>
    #include <unistd.h>
    #include <mpi.h>

    void mpi_breakpoint( int proc )
    {
        /* volatile so the wait loop is not optimized away */
        volatile int i = 0;
        int rank;
        char host[256];

        MPI_Comm_rank( MPI_COMM_WORLD, &rank );
        if ( rank == proc ) {
            gethostname(host,256);
            printf("%u entering breakpoint from host %s, %s:%u\n",
                   getpid(), host, __FILE__, __LINE__+1 );
            while ( i == 0 ) {
                sleep(1);
            }
        }
        MPI_Barrier( MPI_COMM_WORLD );
    }

Generally the attached process will be inside the sleep system call. Set a breakpoint at the specified line or at MPI_Barrier, then set the variable i to a non-zero value to allow the process to proceed. Once the code has returned from the mpi_breakpoint function (after the barrier), you can debug that process normally. Other processes will proceed normally, waiting only on blocking communication with the attached process(es).

Debugging

The following images show the DDT debug window for several different program types.

Serial
OpenMP

MPI

6.2.2 HPC Toolkit

HPC Toolkit is an open source suite of profiling and performance analysis tools. It requires relinking the code to be instrumented, but otherwise the code can remain unchanged.

Compile

Compile source files as you normally would, using your normal compiler. Prepend the command hpclink to the linking stage and link statically:

    $ module load hpctoolkit
    $ hpclink gcc -static -g foo.c

Or in the case of an MPI code:

    $ module load hpctoolkit/5.3+openmpi-1.6
    $ hpclink mpicc -static -g <source>.c -o <executable>

Currently HPC Toolkit modules are created only for GNU compiler based MPI environments.

Gather Sampling Data

The command hpcrun is used to run the instrumented program and collect sampling data. The option -l will list the available events that may be profiled, including those defined by PAPI. The user can control which events are profiled and at what frequency using the option -e:

    $ hpcrun -e PAPI_L2_DCM@510011 <executable>

or:

    $ mpirun hpcrun -e PAPI_L2_DCM@510011 <executable>

Multiple events can be profiled in a single invocation of hpcrun, but not all events are compatible. It may also be necessary to run hpcrun multiple times to gather enough samples to capture relatively rare events.

Statically Instrument Code

HPC Toolkit performs a static analysis of the program's original source in order to properly interpret the sampling data.
Use the command hpcstruct:

    $ hpcstruct <executable>

Correlate Source and Sampling Data

The hpcprof command collects all available samples and correlates them with the static analysis produced by hpcstruct:

    $ hpcprof -I path-to-source -S <executable>.hpcstruct hpctoolkit-<executable>-measurements

Analysis

Once the database of measurements has been created, a separate module is available with a program to graphically visualize and explore the data:

    $ module load hpcviewer
    $ hpcviewer hpctoolkit-<executable>-database

6.2.3 PAPI

PAPI is a multi-platform library for portably accessing hardware counters for event profiling of software, including flop counts, cache efficiency, and branch prediction rates. See the PAPI website for more information.

Usage

The user must add PAPI function calls to their code and link to the PAPI library in order to access the hardware counters. Often PAPI calls can be added to previously instrumented code through timing calls; otherwise the user will need to identify the subset of the code to be profiled.

An example code that uses PAPI to identify poor cache performance is located below.

Available Counters

The command papi_avail will determine which PAPI counters are accessible on the current system. Some counters are supported natively, and others can be derived from other counters that are natively supported. PAPI also supports multiplexing, where a larger number of events can be profiled simultaneously using a sampling technique. See the PAPI documentation for more details.

Note: The number of active counters is much less than the number of counters available on the system. Sandy Bridge nodes have 11 registers that can be used for hardware counters, but PAPI requires a few of those registers for internal functions. In practice, ~4 independent PAPI events can be instrumented at one time, and valid combinations of events must be found using trial-and-error.
    $ module load papi/5.1
    $ papi_avail -a
    Available events and hardware information.
    --------------------------------------------------------------------------------
    PAPI Version             : 5.1.0.2
    Vendor string and code   : GenuineIntel (1)
    Model string and code    : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (45)
    CPU Revision             : 7.000000
    CPUID Info               : Family: 6  Model: 45  Stepping: 7
    CPU Max Megahertz        : 2599
    CPU Min Megahertz        : 2599
    Hdw Threads per core     : 2
    Cores per Socket         : 8
    NUMA Nodes               : 2
    CPUs per Node            : 16
    Total CPUs               : 32
    Running in a VM          : no
    Number Hardware Counters : 11
    Max Multiplex Counters   : 64
    --------------------------------------------------------------------------------
    Name          Code        Deriv  Description (Note)
    PAPI_L1_DCM   0x80000000  No     Level 1 data cache misses
    PAPI_L1_ICM   0x80000001  No     Level 1 instruction cache misses
    PAPI_L2_DCM   0x80000002  Yes    Level 2 data cache misses
    PAPI_L2_ICM   0x80000003  No     Level 2 instruction cache misses
    PAPI_L1_TCM   0x80000006  Yes    Level 1 cache misses
    PAPI_L2_TCM   0x80000007  No     Level 2 cache misses
    PAPI_L3_TCM   0x80000008  No     Level 3 cache misses
    PAPI_TLB_DM   0x80000014  Yes    Data translation lookaside buffer misses
    PAPI_TLB_IM   0x80000015  No     Instruction translation lookaside buffer misses
    PAPI_L1_LDM   0x80000017  No     Level 1 load misses
    PAPI_L1_STM   0x80000018  No     Level 1 store misses
    PAPI_L2_STM   0x8000001a  No     Level 2 store misses
    PAPI_STL_ICY  0x80000025  No     Cycles with no instruction issue
    PAPI_BR_UCN   0x8000002a  Yes    Unconditional branch instructions
    PAPI_BR_CN    0x8000002b  No     Conditional branch instructions
    PAPI_BR_TKN   0x8000002c  Yes    Conditional branch instructions taken
    PAPI_BR_NTK   0x8000002d  No     Conditional branch instructions not taken
    PAPI_BR_MSP   0x8000002e  No     Conditional branch instructions mispredicted
    PAPI_BR_PRC   0x8000002f  Yes    Conditional branch instructions correctly predicted
    PAPI_TOT_INS  0x80000032  No     Instructions completed
    PAPI_FP_INS   0x80000034  Yes    Floating point instructions
    PAPI_LD_INS   0x80000035  No     Load instructions
    PAPI_SR_INS   0x80000036  No     Store instructions
    PAPI_BR_INS   0x80000037  No     Branch instructions
    PAPI_TOT_CYC  0x8000003b  No     Total cycles
    PAPI_L2_DCH   0x8000003f  Yes    Level 2 data cache hits
    PAPI_L2_DCA   0x80000041  No     Level 2 data cache accesses
    PAPI_L3_DCA   0x80000042  Yes    Level 3 data cache accesses
    PAPI_L2_DCR   0x80000044  No     Level 2 data cache reads
    PAPI_L3_DCR   0x80000045  No     Level 3 data cache reads
    PAPI_L2_DCW   0x80000047  No     Level 2 data cache writes
    PAPI_L3_DCW   0x80000048  No     Level 3 data cache writes
    PAPI_L2_ICH   0x8000004a  No     Level 2 instruction cache hits
    PAPI_L2_ICA   0x8000004d  No     Level 2 instruction cache accesses
    PAPI_L3_ICA   0x8000004e  No     Level 3 instruction cache accesses
    PAPI_L2_ICR   0x80000050  No     Level 2 instruction cache reads
    PAPI_L3_ICR   0x80000051  No     Level 3 instruction cache reads
    PAPI_L2_TCA   0x80000059  Yes    Level 2 total cache accesses
    PAPI_L3_TCA   0x8000005a  No     Level 3 total cache accesses
    PAPI_L2_TCR   0x8000005c  Yes    Level 2 total cache reads
    PAPI_L3_TCR   0x8000005d  Yes    Level 3 total cache reads
    PAPI_L2_TCW   0x8000005f  No     Level 2 total cache writes
    PAPI_L3_TCW   0x80000060  No     Level 3 total cache writes
    PAPI_FDV_INS  0x80000063  No     Floating point divide instructions
    PAPI_FP_OPS   0x80000066  Yes    Floating point operations
    PAPI_SP_OPS   0x80000067  Yes    Floating point operations; optimized to count scaled single precision v
    PAPI_DP_OPS   0x80000068  Yes    Floating point operations; optimized to count scaled double precision v
    PAPI_VEC_SP   0x80000069  Yes    Single precision vector/SIMD instructions
    PAPI_VEC_DP   0x8000006a  Yes    Double precision vector/SIMD instructions
    PAPI_REF_CYC  0x8000006b  No     Reference clock cycles
    -------------------------------------------------------------------------
    Of 50 available events, 17 are derived.
    avail.c                                PASSED

Example

Download matrixmult_papi.c for an example that uses PAPI to measure the L2 cache miss rate of a poorly written matrix multiplication program:

    $ module load papi/5.1
    $ gcc -O2 matrixmult_papi.c -lpapi
    $ ./a.out
    322761027 L2 cache misses (0.744% misses) in 5740137120 cycles

The precise output will vary with the system load. Reversing the order of the inner two loops should produce a significant improvement in cache efficiency and a corresponding speedup.

6.2.4 Valgrind

Valgrind is an open source set of debugging and profiling tools. It is most commonly used to locate memory errors, including leaks, but it can also be used to debug threaded code and to profile cache efficiency. See the Valgrind Online Documentation for more information.

Usage

The following snippet shows how to load the valgrind module and use it to analyze a C program. Pick one of the tools memcheck, cachegrind, or helgrind:

    module load valgrind
    gcc -g source.c
    valgrind --tool=[memcheck,cachegrind,helgrind] ./a.out

If no tool is specified, valgrind defaults to the memory checker (memcheck).

Memcheck

Memcheck is a tool that detects a wide range of memory errors, including buffer overruns, memory leaks, double-freeing of heap blocks, and use of uninitialized variables.

Download memleak.c for a simple example of using the memcheck tool to identify a memory leak:

    $ module load valgrind
    $ gcc -g memleak.c
    $ valgrind --tool=memcheck ./a.out
    ==3153== Memcheck, a memory error detector
    ==3153== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
    ==3153== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
    ==3153== Command: ./a.out
    ==3153==
    ==3153==
    ==3153== HEAP SUMMARY:
    ==3153==     in use at exit: 800 bytes in 10 blocks
    ==3153==   total heap usage: 10 allocs, 0 frees, 800 bytes allocated
    ==3153==
    ==3153== LEAK SUMMARY:
    ==3153==    definitely lost: 800 bytes in 10 blocks
    ==3153==    indirectly lost: 0 bytes in 0 blocks
    ==3153==      possibly lost: 0 bytes in 0 blocks
    ==3153==    still reachable: 0 bytes in 0 blocks
    ==3153==         suppressed: 0 bytes in 0 blocks
    ==3153== Rerun with --leak-check=full to see details of leaked memory
    ==3153==
    ==3153== For counts of detected and suppressed errors, rerun with: -v
    ==3153== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)

Running memcheck without any options shows that 800 bytes, allocated in 10 distinct blocks, had not been freed when the program terminated. To find out where those blocks were allocated, use the option --leak-check=full:

    $ valgrind --tool=memcheck --leak-check=full ./a.out
    ==3154== Memcheck, a memory error detector
    ==3154== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
    ==3154== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
    ==3154== Command: ./a.out
    ==3154==
    ==3154==
    ==3154== HEAP SUMMARY:
    ==3154==     in use at exit: 800 bytes in 10 blocks
    ==3154==   total heap usage: 10 allocs, 0 frees, 800 bytes allocated
    ==3154==
    ==3154== 800 bytes in 10 blocks are definitely lost in loss record 1 of 1
    ==3154==    at 0x4C278FE: malloc (vg_replace_malloc.c:270)
    ==3154==    by 0x400575: main (memleak.c:i24)
    ==3154==
    ==3154== LEAK SUMMARY:
    ==3154==    definitely lost: 800 bytes in 10 blocks
    ==3154==    indirectly lost: 0 bytes in 0 blocks
    ==3154==      possibly lost: 0 bytes in 0 blocks
    ==3154==    still reachable: 0 bytes in 0 blocks
    ==3154==         suppressed: 0 bytes in 0 blocks
    ==3154==
    ==3154== For counts of detected and suppressed errors, rerun with: -v
    ==3154== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 6 from 6)

Now memcheck reports that the 10 leaked blocks were allocated by the malloc call in main, at the memleak.c line given in the stack trace, and the user can modify the code to free those allocations at the appropriate place.

Cachegrind

Cachegrind is a Valgrind tool that simulates (rather than measures) how a program interacts with the multi-level caches found in modern computer architectures. It is very useful for determining whether cache misses are a performance problem and for identifying the parts of the code responsible. Cachegrind has several limitations and can dramatically increase the time it takes to execute a program. See the cachegrind manual for full details.

Download matrixmult.c for a simple example of using the cachegrind tool to estimate cache efficiency:

    $ module load valgrind
    $ gcc -g matrixmult.c
    $ valgrind --tool=cachegrind ./a.out
    ==2548== Cachegrind, a cache and branch-prediction profiler
    ==2548== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote et al.
    ==2548== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
    ==2548== Command: ./a.out
    ==2548==
    --2548-- warning: L3 cache found, using its data for the LL simulation.
    ==2548==
    ==2548== I   refs:      3,252,178,387
    ==2548== I1  misses:              745
    ==2548== LLi misses:              738
    ==2548== I1  miss rate:          0.00%
    ==2548== LLi miss rate:          0.00%
    ==2548==
    ==2548== D   refs:      1,082,643,679  (720,139,382 rd + 362,504,297 wr)
    ==2548== D1  misses:      406,465,246  (405,103,433 rd +   1,361,813 wr)
    ==2548== LLd misses:          313,706  (      1,950 rd +     311,756 wr)
    ==2548== D1  miss rate:          37.5% (       56.2%   +         0.3%  )
    ==2548== LLd miss rate:           0.0% (        0.0%   +         0.0%  )
    ==2548==
    ==2548== LL refs:         406,465,991  (405,104,178 rd +   1,361,813 wr)
    ==2548== LL misses:           314,444  (      2,688 rd +     311,756 wr)
    ==2548== LL miss rate:            0.0% (        0.0%   +         0.0%  )

The output above shows that the example code has a first-level data cache (D1) read miss rate of 56.2%, which will significantly degrade performance. Since the code was compiled with the -g compiler flag, the cg_annotate tool can be used to parse the cachegrind output and produce a line-by-line annotated report:

    $ cg_annotate --auto=yes cachegrind.out.2548
    --------------------------------------------------------------------------------
    I1 cache:         32768 B, 64 B, 8-way associative
    D1 cache:         32768 B, 64 B, 8-way associative
    LL cache:         20971520 B, 64 B, 20-way associative
    Command:          ./a.out
    Data file:        cachegrind.out.2548
    Events recorded:  Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
    Events shown:     Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
    Event sort order: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
    Thresholds:       0.1 100 100 100 100 100 100 100 100
    Include dirs:
    User annotated:
    Auto-annotation:  on
    --------------------------------------------------------------------------------
    Ir             I1mr  ILmr  Dr           D1mr         DLmr   Dw           D1mw       DLmw
    --------------------------------------------------------------------------------
    3,252,178,387  745   738   720,139,382  405,103,433  1,950  362,504,297  1,361,813  311,756  PROGRAM TOTALS
    --------------------------------------------------------------------------------
    Ir             I1mr  ILmr  Dr           D1mr         DLmr   Dw           D1mw       DLmw     file:function
    --------------------------------------------------------------------------------
    3,251,974,540  6     6     720,090,005  405,100,952  0      362,490,010  1,361,251  311,250  /home/drudd/debug
    --------------------------------------------------------------------------------
    -- Auto-annotated source: /home/drudd/debug/matrixmult.c
    --------------------------------------------------------------------------------
    [Per-line annotated listing of matrixmult.c not reproduced: its column layout
    was lost and the source lines were truncated. In that listing, the innermost
    statement of the triple loop accounts for 720,000,000 data reads (Dr) and
    405,089,701 D1 read misses (D1mr).]
    --------------------------------------------------------------------------------
    Ir   I1mr  ILmr  Dr   D1mr  DLmr  Dw   D1mw  DLmw
    --------------------------------------------------------------------------------
    100  1     1     100  100   0     100  100   100   percentage of events annotated

Note that in this example the loop order causes very poor cache performance for the innermost line of the nested loop: almost all of the D1 read misses occur there. Exchanging the k and j indexed loops will give significantly better performance. Still better performance can be obtained through blocking or, since matrix multiplication is a standard linear algebra operation, by calling a tuned BLAS/LAPACK library such as the Intel Math Kernel Library, which provides optimized routines for such calculations.

Helgrind

Helgrind is a thread error checking tool. Unfortunately it interacts poorly with gcc's OpenMP implementation and can produce a large number of distracting messages. Still, it can be useful for identifying races or unprotected critical sections in shared memory parallel code.

6.3 Environments

6.3.1 R

R is available for statistical computing. There are R modules built with both the GCC and Intel compilers. We recommend the Intel builds, since they performed best in our benchmarks. Some R packages may not compile correctly with the Intel compilers; in that case, use the GCC version. All R modules have been built with OpenMP enabled and use the Intel MKL to improve performance.
The currently available R modules:

    $ module avail R
    ---------------- /software/modulefiles ------------------------------
    R/2.15(default)  R/2.15+intel-12.1  R/3.0  R/3.0+intel-12.1

To install additional R packages in your home directory and use them, set the environment variable R_LIBS_USER. For example:

    export R_LIBS_USER=$HOME/R_libs

The directory specified must exist before you try to install R packages.

The RStudio IDE is also available as the rstudio module. It provides a graphical interface for developing and running R. To use R in this mode, log in to Midway via NX.

Serial Examples

Here is a simple "hello world" example that submits an R job to Slurm. It is appropriate for an R job that expects to use a single CPU.

sbatch script Rhello.sbatch:

    #!/bin/sh
    #SBATCH --ntasks=1

    # load the appropriate R module
    module load R/3.0+intel-12.1

    # Use Rscript to run Rhello.R
    # alternatively, this could be used:
    # R --no-save < Rhello.R
    Rscript Rhello.R

R script Rhello.R:

    print ( "Hello World" )

Output:

    [1] "Hello World"

Parallel Examples

For parallel use there are several options, depending on whether the parallel tasks run on a single node or on multiple nodes and on the level of flexibility required. Other R packages for parallel programming exist beyond those covered here; this section covers some frequently used ones.

Multicore

On a single node, it is possible to use doParallel and foreach.

sbatch script doParallel.sbatch:

    #!/bin/bash
    #SBATCH --nodes=1
    # --ntasks-per-node will be used in doParallel.R to specify the number of
    # cores to use on the machine. Using 16 will allow us to use all cores
    # on a sandyb node
    #SBATCH --ntasks-per-node=16

    module load R/3.0+intel-12.1
    Rscript doParallel.R

R script doParallel.R:

    library(doParallel)

    # use the environment variable SLURM_NTASKS_PER_NODE to set the number of cores
    registerDoParallel(cores=(Sys.getenv("SLURM_NTASKS_PER_NODE")))

    # Bootstrapping iteration example
    x <- iris[which(iris[,5] != "setosa"), c(1,5)]
    iterations <- 10000 # Number of iterations to run

    # Parallel version of code
    # Note the '%dopar%' instruction
    parallel_time <- system.time({
      r <- foreach(icount(iterations), .combine=cbind) %dopar% {
        ind <- sample(100, 100, replace=TRUE)
        result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
        coefficients(result1)
      }
    })[3]

    # Show the number of parallel workers in use
    getDoParWorkers()
    # Print the elapsed time of the parallel loop
    parallel_time

Output:

    Loading required package: foreach
    Loading required package: iterators
    Loading required package: parallel
    [1] "16"
    elapsed
      5.157

Multi-node Parallel

For multiple nodes, you can use the parallel package, which provides a select number of functions to simplify using multi-node clusters.

sbatch script parallel-test.sbatch:

    #!/bin/bash
    #SBATCH --job-name=parallel-test
    #SBATCH --nodes=4
    #SBATCH --time=10
    #SBATCH --exclusive
    #SBATCH --constraint=ib

    module load R/3.0+intel-12.1
    # the openmpi module is not loaded by default with R
    module load openmpi/1.6+intel-12.1

    # Always launch a single process (-np 1). Rmpi is used internally to spawn
    # additional processes dynamically
    mpirun -np 1 Rscript parallel-test.R

R script parallel-test.R:

    ##
    # Source: http://www.umbc.edu/hpcf/resources-tara/how-to-run-R.html
    #
    # This file previously used SNOW, but that functionality has been
    # replaced with the parallel package
    #
    # Notes:
    #  - Library loading order matters
    #  - system.time([function]) is an easy way to test optimizations
    #  - parApply is parallel version of 'apply'
    ##

    # Must be loaded in this order
    library(Rmpi)
    library(parallel)

    # Initialize parallel using MPI communication. The first line will get the
    # number of MPI processes the scheduler assigned to us. Everything else is
    # standard parallel
    np <- mpi.universe.size()
    cluster <- makeCluster(np, type="MPI")

    # Print the hostname for each cluster member
    sayhello <- function() {
      info <- Sys.info()[c("nodename", "machine")]
      paste("Hello from", info[1], "with CPU type", info[2])
    }
    names <- clusterCall(cluster, sayhello)
    print(unlist(names))

    # Compute row sums in parallel using all processes, then a grand sum at the end
    # on the master process
    parallelSum <- function(m, n) {
      A <- matrix(rnorm(m*n), nrow = m, ncol = n)
      # Parallelize the summation
      row.sums <- parApply(cluster, A, 1, sum)
      print(sum(row.sums))
    }

    # Run the operation over different size matrices
    system.time(parallelSum(5000, 5000))

    # Always stop your cluster and exit MPI to ensure resources are properly freed
    stopCluster(cluster)
    mpi.exit()

Output (trimmed for readability):

    64 slaves are spawned successfully. 0 failed.
     [1] "Hello from midway197 with CPU type x86_64"
     [2] "Hello from midway197 with CPU type x86_64"
     [3] "Hello from midway197 with CPU type x86_64"
    ...
    [63] "Hello from midway200 with CPU type x86_64"
    [64] "Hello from midway197 with CPU type x86_64"
    [1] -9363.914
       user  system elapsed
      3.988   0.443   5.553
    [1] 1
    [1] "Detaching Rmpi. Rmpi cannot be used unless relaunching R."

Rmpi

For multiple nodes, you can also use Rmpi directly. This is what snow uses internally. It is less convenient than snow, but also more flexible.
sbatch script Rmpi.sbatch:

    #!/bin/sh
    #SBATCH --nodes=4
    #SBATCH --time=1
    #SBATCH --constraint=ib
    #SBATCH --exclusive

    module load R/3.0+intel-12.1
    # the openmpi module is not loaded by default with R
    module load openmpi/1.6+intel-12.1

    # Always use -n 1 for the Rmpi package. It spawns additional processes dynamically
    mpirun -n 1 Rscript Rmpi.R

R script Rmpi.R:

    # Load the R MPI package if it is not already loaded.
    if (!is.loaded("mpi_initialize")) {
      library("Rmpi")
    }

    # Spawn as many slaves as possible
    mpi.spawn.Rslaves()

    # In case R exits unexpectedly, have it automatically clean up
    # resources taken up by Rmpi (slaves, memory, etc...)
    .Last <- function(){
      if (is.loaded("mpi_initialize")){
        if (mpi.comm.size(1) > 0){
          print("Please use mpi.close.Rslaves() to close slaves.")
          mpi.close.Rslaves()
        }
        print("Please use mpi.quit() to quit R")
        .Call("mpi_finalize")
      }
    }

    # Tell all slaves to return a message identifying themselves
    mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))

    # Tell all slaves to close down, and exit the program
    mpi.close.Rslaves()
    mpi.quit()

Output (trimmed for readability):

    64 slaves are spawned successfully. 0 failed.
    master  (rank 0 , comm 1) of size 65 is running on: midway449
    slave1  (rank 1 , comm 1) of size 65 is running on: midway449
    slave2  (rank 2 , comm 1) of size 65 is running on: midway449
    slave3  (rank 3 , comm 1) of size 65 is running on: midway449
    ... ... ...
    slave63 (rank 63, comm 1) of size 65 is running on: midway452
    slave64 (rank 64, comm 1) of size 65 is running on: midway449
    $slave1
    [1] "I am 1 of 65"
    $slave2
    [1] "I am 2 of 65"
    ...
    $slave63
    [1] "I am 63 of 65"
    $slave64
    [1] "I am 64 of 65"

6.3.2 GPU Computing

RCC's Midway compute cluster contains a number of GPU-equipped compute nodes. The GPUs in these nodes are available for use in general purpose GPU computing applications.
Each GPU compute node contains dual 8-core Intel Sandy Bridge processors and two NVIDIA GPU devices. Three of these nodes contain dual M2090 (Fermi generation) devices, and two of the nodes contain dual K20 (Kepler generation) devices.

Running GPU code on Midway

When submitting jobs to the GPU nodes, you must include the following SBATCH options:

    #SBATCH --partition=gpu
    #SBATCH --gres=gpu:<N>

The flag --gres=gpu:N requests N GPU devices on each of the nodes on which your job will run. Valid values for N are 1 and 2. If you are requesting both GPUs in a node, we also suggest including the #SBATCH --exclusive flag in your submission script to prevent other jobs from being placed on that node.

Note: RCC requests that custom CUDA code not explicitly select a GPU device (i.e. do not call cudaSetDevice(int n)). SLURM allocates a particular GPU in the node on which your job lands, and the CUDA runtime automatically detects which device it has been allocated. If you must specify a particular device ID in your code, check the environment variable CUDA_VISIBLE_DEVICES at run time to determine which device(s) have been allocated to your job.

Additionally, you can specify which type of GPU device your job runs on by including a --constraint option. To ensure your job runs on an M2090 or a K20 device, include one of the following lines in your submission script:

    #SBATCH --constraint=m2090

or

    #SBATCH --constraint=k20m

An example GPU-enabled job script for a CUDA program, gpu.sbatch, is given below:

    #!/bin/bash
    # This script will request one GPU device and 1 CPU core

    #SBATCH --job-name=gpuSbatch
    #SBATCH --output=gpuSbatch.out
    #SBATCH --error=gpuSbatch.err
    #SBATCH --time=01:00:00
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --partition=gpu
    #SBATCH --gres=gpu:1

    # if your executable was built with CUDA, be sure to load the CUDA module:
    module load cuda

    # if your executable was built with PGI (OpenACC), be sure to load the PGI module:
    module load pgi/2013

    #
    # your GPU-based executable here
    #

Compiling CUDA GPU code on Midway

To view available CUDA versions on Midway, use the command:

    module avail cuda

A very basic CUDA example code, cudamemset.cu, is provided below:

    #include <stdio.h>
    #include <stdlib.h>   /* malloc, free */
    #include <cuda.h>

    int main(){
        int n = 16;

        // host and device memory pointers
        int *h_a;
        int *d_a;

        // allocate host memory
        h_a = (int*)malloc(n * sizeof(int));

        // allocate device memory
        cudaMalloc((void**)&d_a, n * sizeof(int));

        // set device memory to all zeros
        cudaMemset(d_a, 0, n * sizeof(int));

        // copy device memory back to host
        cudaMemcpy(h_a, d_a, n * sizeof(int), cudaMemcpyDeviceToHost);

        // print host memory
        for (int i = 0; i < n; i++){
            printf("%d ", h_a[i]);
        }
        printf("\n");

        // free buffers
        free(h_a);
        cudaFree(d_a);

        return 0;
    }

CUDA code must be compiled with NVIDIA's nvcc compiler, which is part of the cuda software module. To build a CUDA executable, first load the desired CUDA module and then compile with:

    nvcc source_code.cu

Compiling OpenACC GPU code on Midway

OpenACC is supported on Midway through the PGI 2013 compiler suite.
To load the OpenACC compiler, use the command:

    module load pgi/2013

A very basic OpenACC example code, stencil.c, is provided below:

    #include <stdio.h>
    #include <stdlib.h>

    int main(){
        int i,j,it;

        // set the size of our test arrays
        int numel = 2000;

        // allocate and initialize test arrays
        float A[numel][numel];
        float Anew[numel][numel];
        for (i = 0; i < numel; i++){
            for ( j = 0; j < numel; j++){
                A[i][j] = drand48();
            }
        }

        // apply stencil 1000 times
        #pragma acc data copy(A), create(Anew)
        for (it = 0; it < 1000; it++){
            #pragma acc parallel loop
            for (i = 1; i < numel-1; i++){
                for (j = 1; j < numel-1; j++){
                    Anew[i][j] = 0.25f * (A[i][j-1] + A[i][j+1] + A[i-1][j] + A[i+1][j]);
                }
            }
            #pragma acc parallel loop
            for (i = 1; i < numel-1; i++){
                for (j = 1; j < numel-1; j++){
                    A[i][j] = Anew[i][j];
                }
            }
        }

        // do something with A[][]

        return 0;
    }

OpenACC code targeted at an NVIDIA GPU must be compiled with the PGI compiler using at least the following options:

    pgcc source_code.c -ta=nvidia -acc

6.3.3 Hybrid MPI/OpenMP

See also: MPI

MPI and OpenMP can be used at the same time to create a hybrid MPI/OpenMP program. Let's look at an example hybrid MPI/OpenMP hello world program and explain the steps needed to compile it and submit it to the queue.

An example hybrid MPI/OpenMP hello world program, hello-hybrid.c:

    #include <stdio.h>
    #include "mpi.h"
    #include <omp.h>

    int main(int argc, char *argv[]) {
        int numprocs, rank, namelen;
        char processor_name[MPI_MAX_PROCESSOR_NAME];
        int iam = 0, np = 1;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(processor_name, &namelen);

        #pragma omp parallel default(shared) private(iam, np)
        {
            np = omp_get_num_threads();
            iam = omp_get_thread_num();
            printf("Hello from thread %d out of %d from process %d out of %d on %s\n",
                   iam, np, rank, numprocs, processor_name);
        }

        MPI_Finalize();
    }

Place hello-hybrid.c in your $HOME directory.
Compile this program interactively by entering the following commands into the terminal:

    module load openmpi
    mpicc -fopenmp hello-hybrid.c -o hello-hybrid

In this case we are using the default version of the openmpi module, which defaults to the GCC compiler. It should be possible to use any available MPI/compiler combination for this example. The additional option -fopenmp must be given to compile a program with OpenMP pragmas (use -openmp for the Intel compiler and -mp for the PGI compiler).

hello-hybrid.sbatch is a submission script that can be used to submit the job to the queue:

    #!/bin/bash

    # set the job name to hello-hybrid
    #SBATCH --job-name=hello-hybrid

    # send output to hello-hybrid.out
    #SBATCH --output=hello-hybrid.out

    # this job requests 2 nodes
    #SBATCH --nodes=2

    # this job requests exclusive access to the nodes it is given
    # this means it will be the only job running on the node
    #SBATCH --exclusive

    # only request 1 MPI task per node
    #SBATCH --ntasks-per-node=1

    # and request 16 cpus per task for OpenMP threads
    #SBATCH --cpus-per-task=16

    # --constraint=ib must be given to guarantee a job is allocated
    # nodes with Infiniband
    #SBATCH --constraint=ib

    # load the openmpi module
    module load openmpi

    # set OMP_NUM_THREADS to the number of --cpus-per-task we asked for
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

    # Run the process with mpirun. Notice -n is not required. mpirun will
    # automatically figure out how many processes to run from the slurm options
    mpirun ./hello-hybrid

The options are similar to those of a pure MPI job, with these notable additions:

• --ntasks-per-node=1 is given to spawn only 1 MPI rank per node
• --cpus-per-task=16 is given to allocate 16 cpus for each task
• export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK sets the number of OpenMP threads per rank to the value given in --cpus-per-task

Generally the product of --ntasks-per-node and --cpus-per-task should equal the number of cores on each node, in this case 16.

You can submit this job with the command below:

    sbatch hello-hybrid.sbatch

Here is example output of this program:

    Hello from node midway123, core 0-15; AKA rank 0, thread 0
    Hello from node midway123, core 0-15; AKA rank 0, thread 14
    Hello from node midway123, core 0-15; AKA rank 0, thread 15
    Hello from node midway123, core 0-15; AKA rank 0, thread 3
    Hello from node midway123, core 0-15; AKA rank 0, thread 13
    Hello from node midway123, core 0-15; AKA rank 0, thread 11
    Hello from node midway123, core 0-15; AKA rank 0, thread 6
    Hello from node midway123, core 0-15; AKA rank 0, thread 1
    Hello from node midway123, core 0-15; AKA rank 0, thread 2
    Hello from node midway123, core 0-15; AKA rank 0, thread 5
    Hello from node midway123, core 0-15; AKA rank 0, thread 7
    Hello from node midway123, core 0-15; AKA rank 0, thread 8
    Hello from node midway123, core 0-15; AKA rank 0, thread 12
    Hello from node midway123, core 0-15; AKA rank 0, thread 4
    Hello from node midway123, core 0-15; AKA rank 0, thread 9
    Hello from node midway124, core 0-15; AKA rank 1, thread 1
    Hello from node midway124, core 0-15; AKA rank 1, thread 3
    Hello from node midway124, core 0-15; AKA rank 1, thread 13
    Hello from node midway124, core 0-15; AKA rank 1, thread 0
    Hello from node midway124, core 0-15; AKA rank 1, thread 8
    Hello from node midway124, core 0-15; AKA rank 1, thread 2
    Hello from node midway124, core 0-15; AKA rank 1, thread 15
    Hello from node midway124, core 0-15; AKA rank 1, thread 6
    Hello from node midway124, core 0-15; AKA rank 1, thread 4
    Hello from node midway124, core 0-15; AKA rank 1, thread 12
    Hello from node midway124, core 0-15; AKA rank 1, thread 10
    Hello from node midway124, core 0-15; AKA rank 1, thread 11
    Hello from node midway124, core 0-15; AKA rank 1, thread 5
    Hello from node midway124, core 0-15; AKA rank 1, thread 9
    Hello from node midway124, core 0-15; AKA rank 1, thread 14
    Hello from node midway124, core 0-15; AKA rank 1, thread 7
    Hello from node midway123, core 0-15; AKA rank 0, thread 10

6.3.4 Matlab Parallel

Running MATLAB effectively in parallel requires a few basic concepts, which can then be optimized and expanded upon. The MATLAB Parallel Computing Toolbox User's Guide is the official documentation and should be consulted for further details, examples, and explanations. Here, we provide some Midway-specific considerations that RCC users should be aware of.

Note: At this time, RCC does not support the Matlab Distributed Compute Server (MDCS). As such, parallel Matlab jobs are limited to 12 workers on a single node with the "local" pool through use of the Parallel Compute Toolbox (PCT).

Basic PCT Operation

The most basic level of parallelization in Matlab is achieved by using a parfor loop in place of a for loop. The iterations of a parfor loop are distributed to the workers in the active matlabpool and computed concurrently. For this reason, care must be taken to ensure that each iteration of the parfor loop is independent of every other.

The overall procedure for leveraging parfor in your Matlab script is as follows:

1. Create a local matlabpool
2. Call parfor in place of for in your Matlab scripts and functions

A simple Matlab script that uses parfor can be downloaded here: matlab_parfor.m

Submitting a PCT Matlab Job to SLURM

Compute-intensive jobs that consume non-trivial amounts of CPU and/or memory should not be run on Midway's login nodes. Instead, the job should be submitted to the scheduler and run on a compute node. A sample submission script for the Matlab example above is provided here: matlab_parfor.sbatch

Running Multiple PCT Matlab Jobs

Specific care must be taken when running multiple PCT jobs on Midway. When you submit multiple jobs that all use the PCT for parallelization, the matlabpools they create can interfere with one another, which can lead to errors and early termination of your scripts.

The Matlab PCT requires a temporary "Job Storage Location" where it stores information about the Matlab pool that is in use. This is simply a directory on the filesystem to which Matlab writes various files in order to coordinate the parallelization of the matlabpool. By default, this information is stored in /home/YourUsername/.matlab/ (the default "Job Storage Location"). When you submit multiple jobs to SLURM that all use the PCT, every job attempts to use this default location, creating a race condition in which one job modifies the files put in place by another. This situation must be avoided.

The solution is to have each job that uses the PCT set a unique location for storing job information. To do this, create a temporary directory in your submission script before launching matlab, and then create the matlabpool so that it explicitly uses this unique temporary directory.
An example sbatch script to do this is shown below:

    #!/bin/bash

    #SBATCH --job-name=whatever
    #SBATCH --output=matlab_parfor.out
    #SBATCH --error=matlab_parfor.err
    #SBATCH --partition=sandyb
    #SBATCH --time=00:10:00
    #SBATCH --exclusive

    module load matlab/2013b

    # Create a local work directory
    mkdir -p /tmp/YourUsername/$SLURM_JOB_ID

    # Kick off matlab
    matlab -nodisplay < multi_parfor.m

    # Cleanup local work directory
    rm -rf /tmp/YourUsername/$SLURM_JOB_ID

And the corresponding Matlab script, multi_parfor.m:

    % create a local cluster object
    pc = parcluster('local')

    % explicitly set the JobStorageLocation to the temp directory that was
    % created in your sbatch script
    pc.JobStorageLocation = strcat('/tmp/YourUsername/', getenv('SLURM_JOB_ID'))

    % start the matlabpool with 12 workers
    matlabpool(pc, 12)

    % run a parallel for loop
    parfor i = 1:100
        ones(10,10)
    end

6.3.5 MATLAB

RCC provides the Matlab programming environment on all Midway compute resources. Most Matlab toolboxes are also available. When running compute- or memory-intensive Matlab jobs on Midway, it is important to run them on compute nodes and not on the login nodes.

Note: Compute- and memory-intensive jobs running on the login nodes are subject to termination without warning by RCC system administrators, because they degrade the performance of the login nodes and the ability of other users to work.

Getting Started

To gain access to Matlab, a Matlab module must be loaded with the command:

    module load matlab

A full list of the available Matlab versions can be obtained by issuing the command:

    module avail matlab

Using Matlab's Textual Interface

On Midway, Matlab can be launched at the terminal with the command:

    matlab

This will launch Matlab's textual interface. We recommend running Matlab on a compute node as opposed to a login node. To obtain a shell on a compute node, use the sinteractive command.
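As a rough sketch of the recommendation above, the following shell functions refuse to start Matlab unless they are called from inside a Slurm job. The names in_slurm_job and launch_matlab are hypothetical and not part of the RCC software stack, and the check assumes that Slurm sets SLURM_JOB_ID inside an sinteractive session in the same way it does for batch jobs.

```shell
#!/bin/bash
# Hypothetical helpers (not provided by RCC); a sketch, not a definitive tool.

# Succeeds only when SLURM_JOB_ID is set and non-empty.
# Assumption: Slurm exports SLURM_JOB_ID inside sinteractive sessions.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

# Load the matlab module and start Matlab, but only inside a Slurm job.
# Any arguments (for example -nodisplay) are passed through to matlab.
launch_matlab() {
    if in_slurm_job; then
        module load matlab
        matlab "$@"
    else
        echo "Not inside a Slurm job; run sinteractive first." >&2
        return 1
    fi
}
```

One possible use is to source this file from a login shell, run sinteractive, and then call launch_matlab -nodisplay on the compute node; on a login node the function only prints the reminder and returns a non-zero status.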
Using Matlab's GUI Interface

To use Matlab's GUI interface on Midway, we recommend connecting to Midway via NX. Information about how to use NX can be found here: http://docs.rcc.uchicago.edu/user-guide.html#nx

Note that once connected via NX, you will be on a Midway login node. To run Matlab with its GUI interface on a compute node, open a terminal in the NX desktop and issue the sinteractive command. This will place you on a compute node. From there, you can launch Matlab with the commands:

    module load matlab
    matlab

and have access to the GUI interface.

Running Matlab Jobs with SLURM

To submit Matlab jobs to Midway's resource scheduler, SLURM, the Matlab commands to be executed must be contained in a single .m script.

matlab_simple.m is a basic Matlab script that computes and prints a 10x10 magic square.
matlab_simple.sbatch is a submission script that submits the Matlab program to the default queue.

To run this example, download both files to a directory on Midway. Then enter the following command to submit the program matlab_simple.m to the scheduler:

    sbatch matlab_simple.sbatch

Output from this example can be found in the file named matlab.out, which will be created in the same directory.

6.3.6 Intel MIC

RCC's Midway compute cluster contains two nodes with Intel MIC co-processor cards. Each MIC compute node contains dual 8-core Intel Sandy Bridge processors and two Intel MIC devices. The MIC nodes are contained in the "mic" partition.

To compile software that interacts with the MIC cards, you must use the Intel compilers available through these software modules on Midway:

    intel/14.0
    intelmpi/4.1+intel-14.0

Here is a basic quickstart using interactive mode that will compile and execute the samples provided with the Intel compiler suite:

    sinteractive -p mic --exclusive
    module load intelmpi/4.1+intel-14.0
    cp -r /software/intel/composer_xe_2013_sp1/Samples/en_US/C++/mic_samples ~
    cd ~/mic_samples/intro_sampleC
    make mic
    ./intro_sampleC.out

It is also possible to submit jobs via sbatch to the mic partition. To do so, the option "--partition=mic" must be included in your submission script.

The MIC cards also mount the filesystem locally via NFS, and it is possible to SSH into the cards from an interactive session on the node. The hostnames on the MIC host node are mic0 and mic1.

6.3.7 MPI

See also: Hybrid MPI/OpenMP

• Overview
• MPI Implementation Notes
  – IntelMPI
  – MVAPICH2
  – OpenMPI
• Example
  – Advanced Usage

Overview

MPI is a commonly-used message-passing library for writing parallel high-performance programs. MPI allows for explicit programmer control of data movement and interprocess communication. The specification for MPI is available here.

RCC supports these MPI implementations:

• IntelMPI
• MVAPICH2
• OpenMPI

Each MPI implementation usually has a module available for use with GCC, the Intel Compiler Suite, and PGI. For example, at the time of this writing these MPI modules were available:

    openmpi/1.6(default)
    openmpi/1.6+intel-12.1
    openmpi/1.6+pgi-2012
    mvapich2/1.8(default)
    mvapich2/1.8+intel-12.1
    mvapich2/1.8+pgi-2012
    mvapich2/1.8-gpudirect
    mvapich2/1.8-gpudirect+intel-12.1
    intelmpi/4.0
    intelmpi/4.0+intel-12.1(default)

MPI Implementation Notes

The different MPI implementations have different options and features. Any notable differences are noted here.

IntelMPI

IntelMPI uses an environment variable to select the network communication fabric it uses:

    I_MPI_FABRICS

During job launch the Slurm TaskProlog detects the network hardware and sets this variable appropriately. This will typically be set to shm:ofa, which makes IntelMPI use shared memory communication within a node and ibverbs between nodes. If a job is run on a node without Infiniband this will be set to shm, which uses shared memory only and limits IntelMPI to a single node job. This is usually what is wanted on nodes without a high speed interconnect. This variable can be overridden if desired in the submission script.

MVAPICH2

MVAPICH2 is compiled with the OFA-IB-CH3 interface. There is no support for running programs compiled with MVAPICH2 on loosely coupled nodes.

GPUDirect builds of MVAPICH2 with CUDA enabled are available for use on the GPU nodes. These builds are otherwise identical to the standard MVAPICH2 build.

OpenMPI

Nothing at this time.

Example

Let's look at an example MPI hello world program and explain the steps needed to compile and submit it to the queue. An example MPI hello world program: hello-mpi.c

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char *argv[], char *envp[]) {
      int numprocs, rank, namelen;
      char processor_name[MPI_MAX_PROCESSOR_NAME];

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Get_processor_name(processor_name, &namelen);

      printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);

      MPI_Finalize();
      return 0;
    }

Place hello-mpi.c in your home directory. Compile this program interactively by entering the following commands into the terminal:

    module load openmpi
    mpicc hello-mpi.c -o hello-mpi

In this case we are using the default version of the openmpi module, which defaults to the GCC compiler. It should be possible to use any available MPI/compiler combination for this example.

hello-mpi.sbatch is a submission script that can be used to submit a job to the queue to run this program.

    #!/bin/bash

    # set the job name to hello-mpi
    #SBATCH --job-name=hello-mpi

    # send output to hello-mpi.out
    #SBATCH --output=hello-mpi.out

    # this job requests 2 nodes
    #SBATCH --nodes=2

    # this job requests exclusive access to the nodes it is given
    # this means it will be the only job running on the node
    #SBATCH --exclusive

    # run this job on the Sandy Bridge nodes.
    #SBATCH --partition=sandyb

    # load the openmpi module
    module load openmpi

    # Run the process with mpirun. Notice -n is not required. mpirun will
    # automatically figure out how many processes to run from the slurm options
    mpirun ./hello-mpi

The inline comments describe what each line does. In addition, note that:

• --partition=sandyb ensures that the program runs on the tightly-coupled Sandy Bridge nodes
• --exclusive is given to guarantee this job will be the only job on the node
• mpirun does not need to be given -n.
  All supported MPI environments automatically determine the proper layout based on the Slurm options.

You can submit this job with this command:

    sbatch hello-mpi.sbatch

Here is example output of this program:

    Process 4 on midway123 out of 32
    Process 0 on midway123 out of 32
    Process 1 on midway123 out of 32
    Process 2 on midway123 out of 32
    Process 5 on midway123 out of 32
    Process 15 on midway123 out of 32
    Process 12 on midway123 out of 32
    Process 7 on midway123 out of 32
    Process 9 on midway123 out of 32
    Process 14 on midway123 out of 32
    Process 8 on midway123 out of 32
    Process 24 on midway124 out of 32
    Process 10 on midway123 out of 32
    Process 11 on midway123 out of 32
    Process 3 on midway123 out of 32
    Process 6 on midway123 out of 32
    Process 13 on midway123 out of 32
    Process 17 on midway124 out of 32
    Process 20 on midway124 out of 32
    Process 19 on midway124 out of 32
    Process 25 on midway124 out of 32
    Process 27 on midway124 out of 32
    Process 26 on midway124 out of 32
    Process 29 on midway124 out of 32
    Process 28 on midway124 out of 32
    Process 31 on midway124 out of 32
    Process 30 on midway124 out of 32
    Process 18 on midway124 out of 32
    Process 22 on midway124 out of 32
    Process 21 on midway124 out of 32
    Process 23 on midway124 out of 32
    Process 16 on midway124 out of 32

It is possible to affect the number of tasks run per node with the --ntasks-per-node option. Submitting the job like this:

    sbatch --ntasks-per-node=1 hello-mpi.sbatch

results in output like this:

    Process 0 on midway123 out of 2
    Process 1 on midway124 out of 2

Advanced Usage

Both OpenMPI and IntelMPI can launch MPI programs directly with the Slurm command srun. It is not necessary to use this mode for most jobs, but it may allow job launch options that would not otherwise be possible.
For example, on a login node it is possible to launch the above hello-mpi command using OpenMPI directly on a compute node with this command:

    srun --partition=sandyb -n16 --exclusive hello-mpi

For IntelMPI, it is necessary to set an environment variable for this to work:

    export I_MPI_PMI_LIBRARY=/software/slurm-current-$DISTARCH/lib/libpmi.so
    srun --partition=sandyb -n16 --exclusive hello-mpi

6.3.8 mpi4py

See also: MPI and Python

• Overview
• Example

Overview

Using a Python module called mpi4py, one can write MPI codes using Python. MPI is a commonly-used message-passing library for writing parallel high-performance programs. MPI allows for explicit programmer control of data movement and interprocess communication. The specification for MPI is available here.

mpi4py allows you to use the power of MPI programming with the ease-of-use of Python. mpi4py functions have a syntax and semantics that will be familiar to MPI programmers; one difference is that there is no need to call MPI_Init() or MPI_Finalize(), as those functions are called automatically when you import the module and before the Python process ends, respectively.

The following mpi4py modules are available on Midway:

    mpi4py/1.3+intelmpi-4.0(default)
    mpi4py/1.3-2014q1+intelmpi-4.0

They will automatically load the corresponding Python module and MPI library.

Example

Let's look at an example mpi4py program hello-mpi.py:

    import sys
    from mpi4py import MPI

    nprocs = MPI.COMM_WORLD.Get_size()
    rank = MPI.COMM_WORLD.Get_rank()
    name = MPI.Get_processor_name()

    # a single parenthesized argument prints the same under Python 2 and 3
    print("Process %d on %s out of %d" % (rank, name, nprocs))

Each MPI process launched by this program will display some information about itself, then exit. Place hello-mpi.py in your home directory.

hello-mpi.sbatch is a submission script that can be used to submit a job to the queue to run this program.

    #!/bin/bash
    #SBATCH --job-name=hello-mpi
    #SBATCH --output=hello-mpi.out
    #SBATCH --ntasks=32
    #SBATCH --partition=sandyb

    module load mpi4py/1.3+intelmpi-4.0
    mpirun python hello-mpi.py

This submission script loads the mpi4py module, then runs the Python program with "mpirun." Note that:

• --partition=sandyb ensures that the job runs on the Sandy Bridge nodes.
• mpirun does not need to be given the -n option. All supported MPI environments automatically determine the proper layout based on the Slurm options.

You can submit this job with this command:

    sbatch hello-mpi.sbatch

Here is example output of this program:

    Process 9 on midway461 out of 32
    Process 10 on midway461 out of 32
    Process 17 on midway462 out of 32
    Process 13 on midway461 out of 32
    Process 2 on midway461 out of 32
    Process 15 on midway461 out of 32
    Process 14 on midway461 out of 32
    Process 7 on midway461 out of 32
    Process 12 on midway461 out of 32
    Process 8 on midway461 out of 32
    Process 4 on midway461 out of 32
    Process 0 on midway461 out of 32
    Process 6 on midway461 out of 32
    Process 16 on midway462 out of 32
    Process 24 on midway462 out of 32
    Process 25 on midway462 out of 32
    Process 19 on midway462 out of 32
    Process 21 on midway462 out of 32
    Process 23 on midway462 out of 32
    Process 18 on midway462 out of 32
    Process 3 on midway461 out of 32
    Process 5 on midway461 out of 32
    Process 1 on midway461 out of 32
    Process 11 on midway461 out of 32
    Process 28 on midway462 out of 32
    Process 22 on midway462 out of 32
    Process 29 on midway462 out of 32
    Process 30 on midway462 out of 32
    Process 31 on midway462 out of 32
    Process 26 on midway462 out of 32
    Process 20 on midway462 out of 32
    Process 27 on midway462 out of 32

6.3.9 Python

hello.py is a Python program.
Execute it by entering the following commands:

    cd $HOME/rcchelp/software/python.rcc-docs
    module load python
    python hello.py

python.sbatch is a submission script that will run the job through the queue. You can submit this job with the command:

    sbatch python.sbatch

6.3.10 Stata

Stata is a powerful statistical software package that is widely used in scientific computing. RCC users are licensed to use Stata on all RCC resources. Stata can be used interactively or as a submitted script. Please note that if you would like to run it interactively, you must still run it on a compute node, in order to keep the login nodes free for other users. Stata can be run in parallel on up to 16 cores.

Note: Stata examples in this document are adapted from a Princeton tutorial. You may find it useful if you are new to Stata or want a refresher.

Getting Started

First, obtain a graphical login to Midway. You can use X-forwarding for this (ssh -X), but you may have a better experience if you connect with NX. Next, obtain an interactive session on a compute node. This is necessary so that your computation doesn't interrupt other users on the login node. Then load and start Stata:

    sinteractive
    module load stata
    xstata

This will open up a Stata window. The middle pane has a text box to enter commands at the bottom, and a box for command results on top. On the left there's a box called "Review" that shows your command history. The right-hand box contains information about variables in the currently-loaded data set.

One way Stata can be used is as a fancy desktop calculator. Type the following code into the command box:

    display 2+2

Stata can do much more if data is loaded into it.
The following code loads census data that ships with Stata, prints a description of the data, then creates a graph of life expectancy over GNP:

    sysuse lifeexp
    describe
    graph twoway scatter lexp gnppc

Running Stata from the command line

This is very similar to running graphically; the command-line interface is equivalent to the "Results" pane in the graphical interface. Again, please use a compute node if you are running computationally-intensive calculations:

    sinteractive
    module load stata
    stata

Running Stata Jobs with SLURM

You can also submit Stata jobs to SLURM, the scheduler. A Stata script is called a "do-file," which contains a list of Stata commands that the interpreter will execute. You can write a do-file in any text editor, or in the Stata GUI's do-file editor: click "Do-File Editor" in the "Window" menu. If your do-file is named "example.do," you can run it with either of the following commands:

    stata < example.do
    stata -b do example.do

Here is a very simple do-file, which computes a regression on the sample data set from above:

    version 13 // current version of Stata, this is optional but recommended.
    sysuse lifeexp
    gen loggnppc = log(gnppc)
    regress lexp loggnppc

Here is a submission script that submits the Stata program to the default queue on Midway:

    #!/bin/bash
    #SBATCH --job-name=stataEx
    #SBATCH --output=stata_example.out
    #SBATCH --error=stata_example.err
    #SBATCH --nodes=1
    #SBATCH --tasks-per-node=1

    module load stata
    stata -b stata_example.do

stata_example.do is our example do-file, and stata_example.sbatch is the submission script. To run this example, download both files to a directory on Midway. Enter the following command to submit the program to the scheduler:

    sbatch stata_example.sbatch

Output from this example can be found in the file named stata_example.log, which will be created automatically in your current directory.

Running Parallel Stata Jobs

The parallel version of Stata, Stata/MP, can speed up computations and make effective use of RCC's resources. When running Stata/MP, you are limited to 16 cores and 5000 variables. Run an interactive Stata/MP session:

    sinteractive
    module load stata
    stata-mp
    # or, for the graphical interface:
    xstata-mp

Here is a sample do-file that would benefit from parallelization. It runs bootstrap estimation on another data set that ships with Stata.

    version 13
    sysuse auto
    expand 10000
    bootstrap: logistic foreign price-gear_ratio

Here is a submission script that will run the above do-file with Stata/MP:

    #!/bin/bash
    #SBATCH --job-name=stataMP
    #SBATCH --output=stata_parallel.out
    #SBATCH --error=stata_parallel.err
    #SBATCH --nodes=1
    #SBATCH --tasks-per-node=16

    module load stata
    stata-mp -b stata_parallel.do

Download stata_parallel.do and stata_parallel.sbatch to Midway, then run the program with:

    sbatch stata_parallel.sbatch

6.4 Libraries

Library Support

6.4.1 FFTW

FFTW, "the Fastest Fourier Transform in the West," is a popular open-source library for computing discrete Fourier transforms. It supports both real and complex transformations, in both single and double precision.

RCC has configured and installed a large variety of fftw modules covering different versions, compiler choices, and parallelization strategies. Please contact RCC if a different version or configuration is necessary for your work.

FFTW2

FFTW 2.1.5 is an older, now unsupported version, but it is still commonly used by codes because the API changed in later versions. The versions compiled by RCC support shared memory parallelism through OpenMP and distributed parallelism through MPI. See FFTW2 Documentation for complete documentation of this version.

The non-MPI modules include both single and double precision versions, as well as OpenMP support.
These have been built for all three RCC supported compilers (gcc, intel, and pgi). The library defaults to double precision; single precision can be used by adding the prefix s to all filenames and library calls (see the following for more details).

The MPI modules are compiled for each MPI library and compiler, leading to a large number of available combinations. These should all be interchangeable; simply ensure you match the module to the MPI library and compiler you are using (the module system will complain otherwise).

Each module adds the fftw2 library to your include path, but for codes that prefer to self-configure, the environment variable FFTW2_DIR points to the currently loaded version.

The RCC help system includes sample codes that perform 1-dimensional complex transformations in serial or in parallel. To compile and run each sample code, use the following commands:

fftw2_serial.c:

    module load fftw2/2.1.5
    gcc fftw2_serial.c -lm -lfftw
    ./a.out 512

fftw2_openmp.c:

    module load fftw2/2.1.5
    gcc -fopenmp fftw2_openmp.c -lm -lfftw -lfftw_threads
    OMP_NUM_THREADS=8 ./a.out 512 8

fftw2_mpi.c:

    module load openmpi/1.6
    module load fftw2/2.1.5+openmpi-1.6
    mpicc fftw2_mpi.c -lfftw_mpi -lfftw
    mpirun -np 4 ./a.out 512

FFTW3

The API for FFTW changed significantly in the 3.X branch of FFTW. MPI support has only recently been reincluded as a stable feature (3.3.X), and it is this version that RCC supports. As FFTW2 is no longer supported, we recommend users upgrade to the newest version if their code allows. Documentation for this version can be viewed at: FFTW3 Documentation

Single precision support is included by post-fixing 'f' to commands and filenames; see this document.

Sample codes for serial, shared memory (OpenMP), and distributed (MPI) use have been included in the RCC help system.
Use the following to compile and run each sample:

fftw3_serial.c:

    module load fftw3/3.3
    gcc fftw3_serial.c -lfftw3 -lm
    ./a.out 512

fftw3_openmp.c:

    module load fftw3/3.3
    gcc -fopenmp fftw3_openmp.c -lfftw3_omp -lfftw3 -lm
    OMP_NUM_THREADS=8 ./a.out 512

fftw3_threads.c: (this is a pthread enabled version)

    module load fftw3/3.3
    gcc fftw3_threads.c -lpthread -lfftw3_threads -lfftw3 -lm
    ./a.out 512 8

fftw3_mpi.c: (note, only 2d or higher dimensional transforms supported in 3.3)

    module load fftw3/3.3+openmpi-1.6
    mpicc fftw3_mpi.c -lfftw3_mpi -lfftw3
    mpirun -np 4 ./a.out 512

6.4.2 NetCDF

NetCDF codes must be compiled before they can be run, usually from a C or Fortran file.

The module system has two versions of NetCDF, 3.6.3 and 4.2 (4.2 may be updated). Both are provided because of incompatibilities between them, particularly with the PGI family of compilers.

To make it simpler to run Fortran code, several files accompany this help file. Copy and run those files to test version compatibility. A sample file is also provided to verify proper functionality.

Example:

    ./pgf90-netcdf-3.6.3 simple_xy_wr.f90 output_file
    ./output_file
    *** SUCCESS writing example file simple_xy.nc!

6.5 Scheduler

Scheduler Support

6.5.1 Interactive Node Usage

sinteractive

The preferred command for interactive node usage is sinteractive, which submits a request to the scheduler for dedicated resources that you can use interactively. When the resources become available (hopefully immediately), sinteractive will do the following:

• Log into the node
• Change into the directory you were working in
• Set up X11 forwarding
• Transfer your entire environment, including any modules you have loaded, to the new node.

To get started quickly with the default parameters, simply enter sinteractive on the command line. Any options used for the command sbatch should also work with sinteractive.
These options include selecting queues, constraints, wallclock time, and number of nodes. An example command to select two full nodes exclusively (16 cores each), on the Infiniband network, for four hours, is below:

    sinteractive --nodes=2 --exclusive --constraint=ib --time=4:0:0

Using Bigmem Nodes Interactively

If you will require more than 32GB of memory in your interactive session, a bigmem node can be used. The Midway cluster contains 3 bigmem nodes: two have 256GB of memory, and the third has 1TB of memory. To access these nodes with an sinteractive session, use the following command:

    sinteractive -p bigmem --mem-per-cpu=<MB>

where <MB> is the amount of memory (in megabytes) you think you will need. To request 1 CPU and 128GB of memory, for example, use the command:

    sinteractive -p bigmem --mem-per-cpu=128000

If your job requires multiple CPUs, you will need to divide the total amount of memory you need by the number of CPUs you are requesting. For example, if you want to request 8 CPU cores and 128GB of memory (128GB / 8 cores = 16GB/core), you would use:

    sinteractive -p bigmem --cpus-per-task=8 --mem-per-cpu=16000

srun

It is also possible to use the command srun for interactive node use:

    srun --pty bash

This method will not set up X11 forwarding and launches the job differently than sinteractive. You can specify any options to srun that are necessary to run your job.

6.5.2 Parallel Batch Job Submission

Batch jobs that consist of large numbers of serial jobs should most likely be combined in some way to optimize job submission and reduce the number of jobs submitted to a more manageable number. RCC has developed a method to handle these types of job submissions using GNU Parallel and srun.
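
As a rough mental model of combining many serial tasks into one job, the Python sketch below pushes a list of short dummy tasks through a fixed-size worker pool. It is only an analogy, not RCC's method (which uses GNU Parallel and srun); the function name run_pool and its parameters are illustrative assumptions, and the pool size stands in for the CPU cores allocated to the single job.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_pool(num_tasks, slots, task_seconds=0.01):
    """Run num_tasks dummy tasks with at most `slots` running at once.

    Returns the list of finished task numbers and the peak concurrency seen.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def task(index):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(task_seconds)  # stand-in for real work
        with lock:
            running -= 1
        return index

    # max_workers caps how many tasks are in flight, like a fixed core count
    with ThreadPoolExecutor(max_workers=slots) as pool:
        finished = list(pool.map(task, range(1, num_tasks + 1)))
    return finished, peak
```

Calling run_pool(128, 32) would work through 128 tasks with no more than 32 in flight at any time, while the scheduler would see only one job.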
For this submission method, a single job is submitted with a chosen number of CPU cores allocated (using the Slurm option --ntasks=X), and the parallel command is used to run that number of tasks simultaneously until all tasks have been completed. An example submit file is parallel.sbatch:

    #!/bin/sh
    #SBATCH --time=01:00:00
    #SBATCH --ntasks=32
    #SBATCH --exclusive

    module load parallel

    # for large numbers of tasks the controlling node will have a large number
    # of processes, so it will be necessary to change the user process limit
    #ulimit -u 10000

    # the --exclusive to srun makes srun use distinct CPUs for each job step
    # -N1 -n1 allocates a single core to each task
    srun="srun --exclusive -N1 -n1"

    # --delay .2 prevents overloading the controlling node
    # -j is the number of tasks parallel runs so we set it to $SLURM_NTASKS
    # --joblog makes parallel create a log of tasks that it has already run
    # --resume makes parallel use the joblog to resume from where it has left off
    # the combination of --joblog and --resume allow jobs to be resubmitted if
    # necessary and continue from where they left off
    parallel="parallel --delay .2 -j $SLURM_NTASKS --joblog runtask.log --resume"

    # this runs the parallel command we want
    # in this case, we are running a script named runtask
    # parallel uses ::: to separate options. Here {1..128} is a shell expansion
    # so parallel will run the runtask script for the numbers 1 through 128
    # {1} is the first argument
    # as an example, the first job will be run like this:
    # srun --exclusive -N1 -n1 ./runtask arg1:1 > runtask.1
    $parallel "$srun ./runtask arg1:{1} > runtask.{1}" ::: {1..128}

In this submit file we want to run the runtask script 128 times. The --ntasks option is set to 32, so we are allocating one quarter as many CPU cores as there are tasks to run.

Parallel is very flexible in what can be used as the command line arguments.
In this case we are using a simple shell expansion of the numbers 1 through 128, but arguments can be piped into the parallel command, arguments could be file names instead of numbers, replacements can be made, and much more. The Parallel Manual gives full details on its capabilities.

Here is the example runtask script. This is the script that parallel runs.

    #!/bin/sh

    # this script echoes some useful output so we can see what parallel
    # and srun are doing

    sleepsecs=$(( (RANDOM % 10) + 10 ))s

    # $1 is arg1:{1} from parallel.
    # $PARALLEL_SEQ is a special variable from parallel. It is the actual sequence
    # number of the job regardless of the arguments given
    # We output the sleep time, hostname, and date for more info
    echo task $1 seq:$PARALLEL_SEQ sleep:$sleepsecs host:$(hostname) date:$(date)

    # sleep a random amount of time
    sleep $sleepsecs

To submit this job, download both the parallel.sbatch and runtask scripts and put them in the same directory. The job can be submitted like this:

    sbatch parallel.sbatch

When this job completes there should be 128 runtask.N output files. The first output file will look similar to this:

    task arg1:1 seq:1 sleep:14s host:midway002 date:Thu Jan 10 09:17:36 CST 2013

A file named runtask.log will also be created that lists the completed tasks. If this job is resubmitted, nothing will be run until the runtask.log file is removed.

It is also possible to use this technique to run tasks that are either multi-threaded or can otherwise use more than one CPU at a time. Here is an example submit file:

    #!/bin/sh
    #SBATCH --time=01:00:00
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=16
    #SBATCH --exclusive

    module load parallel

    # For this mode add -c (--cpus-per-task) to the srun command
    srun="srun --exclusive -N1 -n1 -c$SLURM_CPUS_PER_TASK"

    # Instead of $SLURM_NTASKS we want to use $SLURM_NNODES to tell
    # parallel how many jobs to start
    parallel="parallel --delay .2 -j $SLURM_NNODES --joblog runtask.log --resume"

    # run the parallel command again. The runtask command should be able to use the
    # 16 cpus we requested with -c16 by itself
    $parallel "$srun ./runtask arg1:{1} > runtask.{1}" ::: {1..16}

In this case, the runtask script itself will use 16 CPUs, and parallel is used to give work to only the 2 nodes requested.

6.5.3 sbatch - Slurm Batch Jobs Submission Program

Official Documentation: http://www.schedmd.com/slurmdocs/sbatch.html

The sbatch command is the primary way in which users submit jobs to the Midway cluster through the Slurm resource scheduler system. An "sbatch script" contains all the commands and parameters necessary to run a program on the cluster. This allows the user to keep that file around for job re-submission and fine tuning. Slurm parameters are specified with the #SBATCH prefix followed by a flag. A basic example sbatch script is as follows:

    #!/bin/bash
    #SBATCH --job-name=example_sbatch
    #SBATCH --output=example_sbatch.out
    #SBATCH --error=example_sbatch.err
    #SBATCH --time=00:05:00
    #SBATCH --partition=sandyb
    #SBATCH --qos=normal
    #SBATCH --nodes=4
    #SBATCH --ntasks-per-node=16

    module load openmpi
    mpiexec ./myExecutable

In this basic sbatch example the main sbatch options have been specified:

    Option                       Description
    #SBATCH --job-name           A name for this particular job
    #SBATCH --output             The file where standard output will be written to
    #SBATCH --error              The file where standard error will be written to
    #SBATCH --time               The maximum amount of time this job will run for. If it is
                                 not complete in this amount of time, it will be canceled
    #SBATCH --partition          The group of compute nodes on which this job should run.
                                 Most jobs will use 'sandyb'
    #SBATCH --qos                The quality of service associated with this job. 'normal'
                                 is most frequently used
    #SBATCH --nodes              This many compute nodes are being requested for this job
    #SBATCH --ntasks-per-node    This number of CPUs in each compute node are being requested

Because in this example 4 compute nodes each with 16 CPUs were requested, we have effectively requested a total of 64 CPUs (4 x 16). The lines that follow load the OpenMPI module and launch an MPI-based executable.

This script could be saved into a file named example.sbatch and then submitted to the cluster for execution with the command:

    sbatch example.sbatch

6.5.4 Slurm Array Job Submission

Slurm job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily. In general, array jobs are useful for applying the same processing routine to a collection of multiple input data files. Array jobs offer a very simple way to submit a large number of independent processing jobs.

By submitting a single array job sbatch script, a specified number of "array-tasks" will be created based on this "master" sbatch script. An example array job script, array.sbatch, is given below:

    #!/bin/bash
    #SBATCH --job-name=arrayJob
    #SBATCH --output=arrayJob_%A_%a.out
    #SBATCH --error=arrayJob_%A_%a.err
    #SBATCH --array=1-16
    #SBATCH --time=01:00:00
    #SBATCH --partition=sandyb
    #SBATCH --ntasks=1

    ######################
    # Begin work section #
    ######################

    # Print this sub-job's task ID
    echo "My SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID

    # Do some work based on the SLURM_ARRAY_TASK_ID
    # For example:
    # ./my_process $SLURM_ARRAY_TASK_ID
    #
    # where my_process is your executable

In the above example, the --array=1-16 option will cause 16 array-tasks (numbered 1, 2, ..., 16) to be spawned when this master job script is submitted.
The "array-tasks" are simply copies of this master script that are automatically submitted to the scheduler on your behalf. In each array-task, however, an environment variable called SLURM_ARRAY_TASK_ID will be set to a unique value (in this example, a value in the range 1, 2, ..., 16). In your script, you can use this value to select, for example, the specific data file that each array-task will be responsible for processing.

Array job indices can be specified in a number of ways. For example:

A job array with index values between 0 and 31:

    #SBATCH --array=0-31

A job array with index values of 1, 2, 5, 19, 27:

    #SBATCH --array=1,2,5,19,27

A job array with index values between 1 and 7 with a step size of 2 (i.e. 1, 3, 5, 7):

    #SBATCH --array=1-7:2

The %A_%a construct in the output and error file names is used to generate unique output and error files based on the master job ID (%A) and the array-task ID (%a). In this fashion, each array-task will be able to write to its own output and error file.

The remaining #SBATCH options are used to configure each array-task. All of the standard #SBATCH options are available here. In this example, we are requesting that each array-task be allocated 1 CPU core (--ntasks=1) in the sandyb partition (--partition=sandyb), and be allowed to run for up to 1 hour (--time=01:00:00). To be clear, the overall collection of 16 array-tasks will be allowed to take more than 1 hour to complete, but we have specified that each individual array-task will run for no more than 1 hour.

The total number of array-tasks that are allowed to run in parallel will be governed by the QOS of the partition to which you are submitting. In most cases, this will limit users to a maximum of 64 concurrently running array-tasks. To achieve a higher throughput of array-tasks, see Parallel Batch Job Submission.

More information about Slurm array jobs can be found in the Slurm Array Job Documentation.
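
The index forms shown above can be expanded with a few lines of Python, which can help when checking how many array-tasks a given --array value will create. This is a minimal sketch that handles only the three forms shown in this section; expand_array_spec and input_for_task are hypothetical helper names, not part of Slurm, and the file names in the usage note are made up for illustration.

```python
def expand_array_spec(spec):
    """Expand an --array value such as "0-31", "1,2,5" or "1-7:2" into a list."""
    indices = []
    for part in spec.split(","):
        step = 1
        if ":" in part:
            # "first-last:step" form
            part, step_text = part.split(":")
            step = int(step_text)
        if "-" in part:
            first, last = (int(text) for text in part.split("-"))
            indices.extend(range(first, last + 1, step))
        else:
            indices.append(int(part))
    return indices


def input_for_task(files, task_id, first_index=1):
    """Pick the input file for one array-task from a sorted file list.

    first_index is the lowest array index, so task first_index gets files[0].
    """
    return sorted(files)[task_id - first_index]
```

Inside an array-task, the task ID would come from the SLURM_ARRAY_TASK_ID environment variable; for example, input_for_task(["b.dat", "a.dat", "c.dat"], 2) selects "b.dat" for array-task 2 of an array that starts at 1.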

6.5.5 Cron-like Batch Job Submission

RCC does not currently support cron on any nodes. Cron jobs are tied to a specific machine, whereas the system configuration expects that no permanent user data is stored on a login or compute node. This allows a compute node or login node to be replaced, reinstalled, updated, etc. at any time with no impact to a user, because there is no permanent data stored on a node. Supporting cron in this environment would likely lead to cron jobs being lost with no way to notify users.

It is possible to use Slurm to schedule periodic jobs. Jobs cannot be run on as strict a schedule as with cron, but that is unlikely to be a problem.

Note: Due to the possibility of abuse, you must email help@rcc.uchicago.edu and request access to submit cron-like jobs. These jobs are subject to strict scheduling limits and will be monitored for abuse.

An example submit file is cron.sbatch:

    #!/bin/bash

    # specify the time limit for the cron job
    #SBATCH --time=00:00:10

    # use cron.log and append to it
    #SBATCH --output=cron.log
    #SBATCH --open-mode=append

    # the account, partition, and qos should not be changed
    #SBATCH --account=cron-account
    #SBATCH --partition=cron
    #SBATCH --qos=cron

    # Specify a valid cron string for the schedule
    # this is daily at 5:05 AM
    SCHEDULE='5 5 * * *'

    echo hello on $(hostname) at $(date)

    # resubmit this script with --begin set to the next scheduled cron time
    # next-cron-time is a script that parses a cron schedule string and returns
    # the next execution time
    sbatch --quiet --begin=$(next-cron-time "$SCHEDULE") cron.sbatch

This script just echoes the hostname and date and, importantly, resubmits itself at the end of the script. We add the --begin option with a specific cron schedule to determine the next time the job should start.

These jobs use a separate partition, account, and qos, which should not be changed.
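
The resubmission step depends on turning a cron string into the next start time. The sketch below is not RCC's next-cron-time script; it is a simplified Python illustration that only understands daily schedules of the form 'M H * * *', and the function name next_daily_run is an assumption made for this example.

```python
from datetime import datetime, timedelta


def next_daily_run(schedule, now):
    """Return the next datetime matching a daily 'M H * * *' cron string.

    Simplified illustration only: anything other than '*' in the last
    three fields is rejected rather than interpreted.
    """
    minute, hour, day_of_month, month, day_of_week = schedule.split()
    if (day_of_month, month, day_of_week) != ("*", "*", "*"):
        raise ValueError("this sketch only handles daily 'M H * * *' schedules")
    candidate = now.replace(hour=int(hour), minute=int(minute),
                            second=0, microsecond=0)
    # if today's slot has already passed (or is right now), use tomorrow's
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```

With the schedule from cron.sbatch, next_daily_run('5 5 * * *', datetime.now()).isoformat() gives a timestamp in the YYYY-MM-DDTHH:MM:SS form, which is one of the formats sbatch --begin accepts.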
The cron partition is made up of login nodes and only accepts cron jobs, so there should be few problems scheduling jobs at specific times. Cron jobs will persist until they are cancelled, have a problem resubmitting themselves, or job state for the entire cluster is lost. It is probably wise to add mail notification to these scripts. It should be very rare for the job state of the cluster to be lost.

CHAPTER SEVEN

SOFTWARE MODULE LIST

See also: Module Documentation

This is an auto-generated list of software modules available on Midway. The version in bold is the default module version. Run module load <modulename> to load any of these modules into your environment.

Table 7.1

Application     Module Name                                 Version     Compiler
abaqus          abaqus/6.13                                 6.13
afni            afni/latest                                 latest
amber           amber/12                                    12
                amber/14                                    14
amdappsdk       amdappsdk/2.7                               2.7
amira           amira/5.4.5                                 5.4.5
                amira/5.5.0                                 5.5.0
ant             ant/1.8.4                                   1.8.4
antlr           antlr/2.7                                   2.7
apbs            apbs/1.3                                    1.3
                apbs/1.4                                    1.4
armadillo       armadillo/3.4                               3.4
artisynth       artisynth/2.8                               2.8
                artisynth/2.9                               2.9
                artisynth/3.1                               3.1
autoconf        autoconf/2.68                               2.68
automake        automake/1.13                               1.13
bats            bats/latest                                 latest
beagle          beagle/trunk                                trunk
beast           beast/1.7                                   1.7
bino            bino/1.4                                    1.4
blastplus       blastplus/2.2                               2.2
blender         blender/2.68                                2.68
boost           boost/1.50                                  1.50
                boost/1.51                                  1.51
                boost/1.55+python-2.7-2014q1                1.55        python-2.7-2014q1
caffe           caffe/git                                   git
ccfits          ccfits/2.4                                  2.4
cctools         cctools/3.6                                 3.6
cdo             cdo/1.5                                     1.5
                cdo/1.6                                     1.6
cernlib         cernlib/2006b                               2006b
cfitsio         cfitsio/3                                   3
                cfitsio/3+intel-12.1                        3           intel-12.1
cgal            cgal/4.1                                    4.1
                cgal/4.1+intel-12.1                         4.1         intel-12.1
CIBS            CIBS/2013.0                                 2013.0
clang           clang/trunk                                 trunk
clhep           clhep/2.2                                   2.2
cluto           cluto/2.1                                   2.1
cmake           cmake/2.8                                   2.8
cnvgrib         cnvgrib/1.4                                 1.4
comsol          comsol/44                                   44
condor          condor/7.8                                  7.8
coreutils       coreutils/8.20                              8.20
cp2k            cp2k/2.4                                    2.4
                cp2k/svn                                    svn
cpmd            cpmd/3.15                                   3.15
csdp            csdp/dev+intel-12.1                         dev         intel-12.1
cuda            cuda/4.2                                    4.2
                cuda/5.0                                    5.0
                cuda/5.5                                    5.5
cytoscape       cytoscape/2.8                               2.8
ddd             ddd/3.3                                     3.3
ddt             ddt/4.0                                     4.0
dicom3tools     dicom3tools/1                               1
diderot         diderot/trunk                               trunk
                diderot/vis12                               vis12
disper          disper/0.3.0                                0.3.0
doxygen         doxygen/1.8                                 1.8
emacs           emacs/23.4                                  23.4
                emacs/24                                    24
emu             emu/cvs                                     cvs
env             env/rcc                                     rcc
espresso        espresso/5.1                                5.1
                espresso/5.1+intel-12.1                     5.1         intel-12.1
exabayes        exabayes/1.3                                1.3
examl           examl/git                                   git
fermi-fssc      fermi-fssc/v9r31p1                          v9r31p1
ffmpeg          ffmpeg/0.11                                 0.11
                ffmpeg/1.1                                  1.1
                ffmpeg/2.1                                  2.1
fftw2           fftw2/2.1.5                                 2.1.5
                fftw2/2.1.5+intel-12.1                      2.1.5       intel-12.1
                fftw2/2.1.5+intelmpi-4.0                    2.1.5       intelmpi-4.0
                fftw2/2.1.5+intelmpi-4.0+intel-12.1         2.1.5       intelmpi-4.0+intel-12.1
                fftw2/2.1.5+mvapich2-1.9                    2.1.5       mvapich2-1.9
                fftw2/2.1.5+openmpi-1.6                     2.1.5       openmpi-1.6
                fftw2/2.1.5+openmpi-1.6+intel-12.1          2.1.5       openmpi-1.6+intel-12.1
                fftw2/2.1.5+openmpi-1.6+pgi-2012            2.1.5       openmpi-1.6+pgi-2012
                fftw2/2.1.5+pgi-2012                        2.1.5       pgi-2012
fftw3           fftw3/3.3                                   3.3
                fftw3/3.3+intel-12.1                        3.3         intel-12.1
                fftw3/3.3+intel-13.1                        3.3         intel-13.1
                fftw3/3.3+intelmpi-4.0                      3.3         intelmpi-4.0
                fftw3/3.3+intelmpi-4.0+intel-12.1           3.3         intelmpi-4.0+intel-12.1
                fftw3/3.3+intelmpi-4.1+intel-13.1           3.3         intelmpi-4.1+intel-13.1
                fftw3/3.3+mvapich2-1.9                      3.3         mvapich2-1.9
                fftw3/3.3+mvapich2-1.9+intel-12.1           3.3         mvapich2-1.9+intel-12.1
                fftw3/3.3+mvapich2-1.9+pgi-2012             3.3         mvapich2-1.9+pgi-2012
                fftw3/3.3+openmpi-1.6                       3.3         openmpi-1.6
                fftw3/3.3+openmpi-1.6+intel-12.1            3.3         openmpi-1.6+intel-12.1
                fftw3/3.3+openmpi-1.6+pgi-2012              3.3         openmpi-1.6+pgi-2012
                fftw3/3.3+pgi-2012                          3.3         pgi-2012
fiji            fiji/1.47                                   1.47
firefox         firefox/esr                                 esr
freepascal      freepascal/2.6                              2.6
freesurfer      freesurfer/5.3                              5.3
fsl             fsl/5.0                                     5.0
                fsl/5.0.6                                   5.0.6
gamess          gamess/1May2012R1                           1May2012R1
gaussian        gaussian/09RevA.02                          09RevA.02
                gaussian/09RevB.01                          09RevB.01
gcc             gcc/4.8                                     4.8
gdal            gdal/1.10                                   1.10
                gdal/1.11                                   1.11
                gdal/1.9                                    1.9
gdbm            gdbm/1.8                                    1.8
gedit           gedit/2.28                                  2.28
geos            geos/3.4                                    3.4
gephi           gephi/0.8                                   0.8
gflags          gflags/git                                  git
ghc             ghc/6.8                                     6.8
                ghc/7.4                                     7.4
git             git/1.7                                     1.7
                git/1.8                                     1.8
glew            glew/1.7                                    1.7
                glew/1.9                                    1.9
globus          globus/5.2                                  5.2
glog            glog/0.3                                    0.3
g_mmpbsa        g_mmpbsa/1.0.0                              1.0.0
gnuplot         gnuplot/4.4                                 4.4
                gnuplot/4.6                                 4.6
grace           grace/5.1                                   5.1
grads           grads/1.8                                   1.8
                grads/1.9                                   1.9
                grads/2.0                                   2.0
graph-tool      graph-tool/2.2                              2.2
graphviz        graphviz/2.28                               2.28
grass           grass/6.4                                   6.4
grib_api        grib_api/1.9                                1.9
gromacs         gromacs/4.5.5                               4.5.5
                gromacs/4.6-cuda+intel-12.1                 4.6-cuda    intel-12.1
                gromacs/4.6+intel-12.1                      4.6         intel-12.1
gromacs-plumed  gromacs-plumed/1.3+intelmpi-4.0+intel-12.1  1.3         intelmpi-4.0+intel-12.1
                gromacs-plumed/1.3+openmpi-1.6              1.3         openmpi-1.6
gsl             gsl/1.15                                    1.15
hadoop          hadoop/1.1.2                                1.1.2
                hadoop/2.4                                  2.4
hadoop-rdma     hadoop-rdma/0.9                             0.9
hdf5            hdf5/1.8                                    1.8
healpix         healpix/2.20                                2.20
                healpix/3.11                                3.11
hoomd           hoomd/0.11.0                                0.11.0
                hoomd/0.11                                  0.11
                hoomd/1.0                                   1.0
hpctoolkit      hpctoolkit/5.3                              5.3
                hpctoolkit/5.3+intel-12.1                   5.3         intel-12.1
                hpctoolkit/5.3+intelmpi-4.0                 5.3         intelmpi-4.0
                hpctoolkit/5.3+mvapich2-1.8                 5.3         mvapich2-1.8
                hpctoolkit/5.3+openmpi-1.6                  5.3         openmpi-1.6
hpcviewer       hpcviewer/5.3                               5.3
idl             idl/8.2                                     8.2
ifrit           ifrit/3.4                                   3.4
intel           intel/11.1                                  11.1
                intel/12.1                                  12.1
                intel/13.1                                  13.1
                intel/14.0                                  14.0
                intel/15.0                                  15.0
intelmpi        intelmpi/4.0                                4.0
                intelmpi/4.0+intel-12.1                     4.0         intel-12.1
                intelmpi/4.1                                4.1
                intelmpi/4.1+intel-12.1                     4.1         intel-12.1
                intelmpi/4.1+intel-13.1                     4.1         intel-13.1
                intelmpi/4.1+intel-14.0                     4.1         intel-14.0
                intelmpi/5.0+intel-15.0                     5.0         intel-15.0
ipopt           ipopt/3.11                                  3.11
j3d             j3d/1.5                                     1.5
jasper          jasper/1.900                                1.900
java            java/1.7                                    1.7
jruby           jruby/1.7                                   1.7
julia           julia/0.3                                   0.3
                julia/git                                   git
knitro          knitro/9.0.1-z                              9.0.1-z
lammps          lammps/trunk                                trunk
leveldb         leveldb/1                                   1
libassp         libassp/1                                   1
libint          libint/1.1                                  1.1
                libint/2.0                                  2.0
libspatialite   libspatialite/4.0                           4.0
libtool         libtool/2.4                                 2.4
mallet          mallet/2.0                                  2.0
mathematica     mathematica/8.0                             8.0
                mathematica/9.0                             9.0
matlab          matlab/2011b                                2011b
                matlab/2012a                                2012a
                matlab/2012b                                2012b
                matlab/2013b                                2013b
                matlab/2014b                                2014b
maven           maven/3.1                                   3.1
mercurial       mercurial/2.5                               2.5
                mercurial/2.8                               2.8
                mercurial/3.1                               3.1
midway-hadoop   midway-hadoop/2.0.0                         2.0.0
                midway-hadoop/2.0.0+R-3.0                   2.0.0       R-3.0
migrate-n       migrate-n/3.6                               3.6
Minuit2         Minuit2/5.34                                5.34
                Minuit2/5.34+intel-12.1                     5.34        intel-12.1
mkl             mkl/10.2                                    10.2
                mkl/10.3                                    10.3
                mkl/11.0                                    11.0
                mkl/11.1                                    11.1
                mkl/11.2                                    11.2
mono            mono/2.10                                   2.10
mosh            mosh/1.2                                    1.2
mpg123          mpg123/1.13                                 1.13
                mpg123/1.14                                 1.14
mpi4py          mpi4py/1.3-2014q1+intelmpi-4.0              1.3-2014q1  intelmpi-4.0
                mpi4py/1.3-2014q3+intelmpi-4.0              1.3-2014q3  intelmpi-4.0
                mpi4py/1.3+intelmpi-4.0                     1.3         intelmpi-4.0
mpiblast        mpiblast/trunk                              trunk
mplayer         mplayer/trunk                               trunk
mrbayes         mrbayes/release                             release
mumps           mumps/4.10                                  4.10
mvapich2        mvapich2/1.8+pgi-2012                       1.8         pgi-2012
                mvapich2/1.9                                1.9
                mvapich2/1.9+intel-12.1                     1.9         intel-12.1
                mvapich2/1.9+pgi-2012                       1.9         pgi-2012
                mvapich2/2.0                                2.0
                mvapich2/2.0+intel-12.1                     2.0         intel-12.1
namd            namd/2.10b1                                 2.10b1
                namd/2.9-cuda                               2.9-cuda
                namd/2.9                                    2.9
ncl_ncarg       ncl_ncarg/6.1                               6.1
nco             nco/4.2                                     4.2
                nco/4.3                                     4.3
                nco/4.4                                     4.4
ncview          ncview/2.1.1                                2.1.1
netcdf          netcdf/3.6.3                                3.6.3
                netcdf/3.6.3+intel-12.1                     3.6.3       intel-12.1
                netcdf/4.2                                  4.2
                netcdf/4.2+intel-12.1                       4.2         intel-12.1
                netcdf/4.3                                  4.3
node            node/0.10.29                                0.10.29
octave          octave/3.6                                  3.6
openbabel       openbabel/2.3.1                             2.3.1
openblas        openblas/0.2.6                              0.2.6
opencv          opencv/2.4                                  2.4
openfoam        openfoam/2.1                                2.1
                openfoam/pegged+openmpi-1.6                 pegged      openmpi-1.6
openmm          openmm/6.0                                  6.0
                openmm/6.1                                  6.1
openmpi         openmpi/1.6                                 1.6
                openmpi/1.6+intel-12.1                      1.6         intel-12.1
                openmpi/1.6+pgi-2012                        1.6         pgi-2012
                openmpi/1.8                                 1.8
                openmpi/1.8+intel-12.1                      1.8         intel-12.1
papi            papi/5.1                                    5.1
                papi/5.3                                    5.3
parallel        parallel/latest                             latest
paraview        paraview/3.14                               3.14
pdtoolkit       pdtoolkit/3.18                              3.18
perl            perl/5.18                                   5.18
petsc           petsc/3.4+openmpi-1.6                       3.4         openmpi-1.6
pgi             pgi/2012                                    2012
                pgi/2013                                    2013
picrust         picrust/1.0                                 1.0
pism            pism/0.6                                    0.6
postgresql      postgresql/9.2                              9.2
                postgresql/9.3                              9.3
povray          povray/3.6.1                                3.6.1
praat           praat/5.3                                   5.3
prism           prism/4.0                                   4.0
proj            proj/4.8                                    4.8
proot           proot/current                               current
protobuf        protobuf/2.5                                2.5
protobuff       protobuff/2.4                               2.4
                protobuff/2.5                               2.5
pycelegans      pycelegans/0.4                              0.4
                pycelegans/0.5                              0.5
                pycelegans/0.61                             0.61
pymol           pymol/svn                                   svn
pypy            pypy/1.8                                    1.8
python          python/2.7-2013q4                           2.7-2013q4
                python/2.7-2014q1                           2.7-2014q1
                python/2.7-2014q2                           2.7-2014q2
                python/2.7-2014q3                           2.7-2014q3
                python/2.7                                  2.7
                python/3.3                                  3.3
qd              qd/2.3.14                                   2.3.14
qgis            qgis/1.8                                    1.8
qiime           qiime/1.6                                   1.6
                qiime/1.7                                   1.7
                qiime/1.8                                   1.8
qt              qt/3.3                                      3.3
                qt/4.7                                      4.7
                qt/4.8                                      4.8
qwt             qwt/6.0                                     6.0
R               R/2.15                                      2.15
                R/2.15+intel-12.1                           2.15        intel-12.1
                R/3.0                                       3.0
                R/3.0+intel-12.1                            3.0         intel-12.1
rapsearch       rapsearch/2                                 2
raxml           raxml/trunk                                 trunk
ROOT            ROOT/5.26                                   5.26
                ROOT/5.34                                   5.34
                ROOT/5.34+python-2.7-2014q1                 5.34        python-2.7-2014q1
rstudio         rstudio/0.97                                0.97
                rstudio/0.98                                0.98
ruby            ruby/2.1                                    2.1
samba           samba/3.6                                   3.6
sdpa            sdpa/7.3.8+intel-12.1                       7.3.8       intel-12.1
sdpa-dd         sdpa-dd/7.1.2+intel-12.1                    7.1.2       intel-12.1
sdpa-gmp        sdpa-gmp/7.1.2+intel-12.1                   7.1.2       intel-12.1
sdpa-qd         sdpa-qd/7.1.2+intel-12.1                    7.1.2       intel-12.1
slurm           slurm/2.4                                   2.4
                slurm/2.5                                   2.5
                slurm/current                               current
smlnj           smlnj/110.74                                110.74
                smlnj/110.76                                110.76
snack           snack/2.2                                   2.2
sparsehash      sparsehash/2.0                              2.0
spatialindex    spatialindex/1.8                            1.8
spm             spm/12                                      12
                spm/8                                       8
stanford-nlp    stanford-nlp/3.3                            3.3
stata           stata/13-64                                 13-64
                stata/13                                    13
subversion      subversion/1.6                              1.6
                subversion/1.7                              1.7
                subversion/1.8                              1.8
                subversion/keyring                          keyring
SuiteSparse     SuiteSparse/4.0                             4.0
                SuiteSparse/4.0+intel-12.1                  4.0         intel-12.1
                SuiteSparse/4.2                             4.2
swift           swift/0.94                                  0.94
                swift/0.94.1                                0.94.1
                swift/0.95-RC1                              0.95-RC1
                swift/0.95-RC5                              0.95-RC5
swift-conf      swift-conf/1.0                              1.0
tcllib          tcllib/1.15                                 1.15
teem            teem/trunk                                  trunk
texinfo         texinfo/4.13a                               4.13a
texlive         texlive/2012                                2012
tklib           tklib/0.5                                   0.5
tree            tree/1.6.0                                  1.6.0
treepl          treepl/git                                  git
uclust          uclust/1.2.22q                              1.2.22q
udunits         udunits/2.1                                 2.1
unrar           unrar/5.0                                   5.0
usearch         usearch/6.1                                 6.1
valgrind        valgrind/3.7                                3.7
                valgrind/3.8                                3.8
                valgrind/3.9                                3.9
valkyrie        valkyrie/2.0                                2.0
vim             vim/7.3                                     7.3
                vim/7.4                                     7.4
visit           visit/2.6                                   2.6
                visit/2.7                                   2.7
                visit/2.8                                   2.8
                visit/all                                   all
vmd             vmd/1.9                                     1.9
                vmd/1.9.1                                   1.9.1
vtk             vtk/5.10                                    5.10
                vtk/5.10+python-2.7-2014q1                  5.10        python-2.7-2014q1
                vtk/5.8                                     5.8
weka            weka/3.6                                    3.6
wgrib2          wgrib2/0.1                                  0.1
word2vec        word2vec/trunk                              trunk
wxwidgets       wxwidgets/2.8                               2.8
x264            x264/stable                                 stable
yasm            yasm/1.2                                    1.2
yaz             yaz/5.4                                     5.4
yt              yt/3.0                                      3.0
zlib            zlib/1.2                                    1.2

CHAPTER EIGHT

KICP

The KICP has exclusive access to a number of compute nodes associated with the RCC Midway cluster. Most of the documentation available at http://docs.rcc.uchicago.edu and through the online help system rcchelp is applicable to KICP members and the KICP nodes; however, there are some specific differences, which are described below.

Email sent to kicp@rcc.uchicago.edu will be assigned a trouble ticket and reviewed by Douglas Rudd and the RCC helpdesk. Please don’t hesitate to ask questions if you encounter any issues or have any requests for software installation. The RCC helpdesk can also be reached by phone at 773-795-2667 during normal business hours.

8.1 Get an account

Please complete the RCC User Account Request form to request an account and put KICP as the PI (note: this request will be reviewed for KICP membership). Please clearly state your connection to the KICP, particularly if you are not a local or senior member.
If you are requesting access for someone not at the University of Chicago (i.e. someone who doesn’t have a cnetid), please contact Douglas Rudd directly.

8.2 Submit A Job

As a shared resource, Midway uses a batch queueing system to allocate nodes to individuals and their jobs. Midway uses the Slurm batch queuing system, which is similar to the possibly more familiar PBS batch system.

8.2.1 Slurm

This is a brief introduction to Slurm for KICP users. Additional documentation for Slurm on Midway can be found at Running Jobs.

Useful Commands      Description
sbatch               Submit a job to the Slurm scheduling system.
squeue -p kicp       List the submitted and running jobs in the KICP partition.
squeue -u $USER      List the current user’s own submitted and running jobs.
sinfo -p kicp        List the number of available and allocated KICP nodes.
scancel job_id       Cancel the job identified by the given job_id (e.g. 3636950).
scancel -u $USER     Cancel all jobs submitted by the current user.

There are many ways to submit a batch job, depending on what that job requires (number of processors, number of nodes, etc). Slurm will automatically start your job in the directory from which it was submitted. To submit a job, create a batch script, say my_job.sh, and submit it with the command sbatch my_job.sh.

Commonly used sbatch options are described below; a more complete list can be found in the sbatch man page. The following is an example batch script:

    #!/bin/bash
    #SBATCH --job-name=my_job
    #SBATCH --output=my_job_%j.out
    #SBATCH --time=24:00:00
    #SBATCH --partition=kicp
    #SBATCH --account=kicp
    #SBATCH --nodes=1
    #SBATCH --exclusive

    echo $SLURM_JOB_ID starting execution `date` on `hostname`

    # load required modules (change to your requirements!)
    # example: module load openmpi/1.6

    # uncomment below if your code uses openMP to properly set the number of threads
    # export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

    # the commands to run your job
    # example: mpirun ./my_task
    # Note: slurm will automatically tell MPI how many tasks and tasks per node to use

Each option to sbatch can be specified on the command line, or in the batch script on lines prefixed by #SBATCH. The following table describes commonly used sbatch options.

sbatch option        Description
--job-name           A label for your job.
--output             The file to store all output from your job (stdout and
                     stderr by default). %j will be replaced by the job
                     allocation number.
--nodes              The number of nodes to allocate to the job.
--ntasks             The total number of tasks (processes). May be more or
                     less than the number of cores on a node (16).
--tasks-per-node     The number of ranks to allocate per node (takes the
                     place of -n in a typical mpirun call).
--cpus-per-task      The number of CPUs allocated to each job task (MPI rank
                     if using MPI). Useful when using a hybrid parallel code
                     that uses both threading and MPI. The total number of
                     cores requested is cpus-per-task*tasks-per-node*nodes.
--exclusive          Allocate exclusive access to node(s), preventing other
                     users’ jobs from being scheduled, even if the number of
                     allocated cores is less than the total per node. Only
                     use if necessary, as it will prevent other users from
                     running on unused cores.
--partition          The partition (queue) to submit the job to.
--time               The wallclock time the job will need to run. A smaller
                     number may allow your job to be scheduled sooner, but
                     your job will be killed when the time reaches this
                     limit. Partitions typically limit this number.

8.2.2 KICP Queues

KICP has access to the following queues (called partitions in Slurm):

Partition    Wallclock limit    Job limits
kicp         48h                256 cores, 64 jobs per user
kicp-ht      36h                64 cores/job, 32 jobs per user
kicp-long    100h               128 cores/queue, 64 cores/user

kicp-long requires special authorization.
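The core-count formula given for --cpus-per-task (cpus-per-task*tasks-per-node*nodes) can be checked with a few lines of shell arithmetic before submitting a hybrid job. This is only a sketch: the helper name total_cores and the example values are assumptions chosen for illustration.

```shell
#!/bin/bash
# Hypothetical helper: total cores requested by a hybrid MPI/threaded job,
# following the formula cpus-per-task * tasks-per-node * nodes.
total_cores() {
    local cpus_per_task="$1" tasks_per_node="$2" nodes="$3"
    echo $(( cpus_per_task * tasks_per_node * nodes ))
}

# Illustrative values only: 4 threads per rank, 4 ranks per node, 2 nodes.
# 4 * 4 = 16 cores per node, matching the 16 cores per node noted above.
cores="$(total_cores 4 4 2)"
echo "requested cores: $cores"
```

Comparing the result against the job limits of the partition you plan to use is a quick way to catch a request that the scheduler would never start.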
If you are running jobs with significant I/O or communication between nodes (typically MPI jobs), then you should use the tightly-coupled InfiniBand nodes accessed through the kicp and kicp-long partitions. Purely serial or embarrassingly parallel jobs with a large calculation-to-I/O ratio (say, MCMC likelihood sampling) should use the high-throughput nodes in the kicp-ht queue. The limits for kicp-ht were relaxed to encourage use. If users start to conflict, the limits may be restricted to prevent a single user from dominating those nodes.

Midway also includes two large-memory (256GB) nodes and four GPU-enabled nodes, as well as a significantly larger set of nodes that are shared with the rest of the University. Accessing these resources requires a separate allocation. Please contact Douglas Rudd for more details.

8.3 Storage

You will have access to three different storage locations on Midway. Your home directory has a 10G quota and should be used for small files and codes. KICP has a 50TB allocation in /project/kicp/, and each user is initially given a 1TB quota and their own subdirectory (/project/kicp/$USER). If you require more space, please let Douglas Rudd know and your quota may be increased on a case-by-case basis. Both home and project space are backed up hourly to disk, and daily to tape.

Finally, there is a high-performance filesystem mounted on /scratch, which should be used during runs and has a 5TB quota. A symlink to this directory is placed in your home at $HOME/midway-scratch. This directory is not backed up and should not be used for long-term storage. In the future, files older than a yet-to-be-determined age may be removed automatically, so please practice good data management.

8.3.1 Snapshots and Backups

We all inadvertently delete or overwrite files from time to time. Snapshots are automated backups that are accessible through a separate path.
Snapshots of a user’s home directory can be found in /snapshots/*/home/cnetid/ where the subdirectories refer to the frequency and time of the backup, e.g. daily-2012-10-04.06h15 or hourly-2012-10-09.11h00.