How to make computers work for you when you are enjoying life.
J. Lakshmi
3/16/2009 Advanced User Education Programme SERC, IISc.

Agenda
• Conceptual introduction to batch computing.
• Different batch schedulers available in SERC.
• Queue configuration and job-submission information for different schedulers.
• Generic guidelines while using batch schedulers.
• Questions!

Batch Computing – Introduction
• What is Batch Computing?
  – A set (a "batch") of commands or jobs, typically as a file, submitted to a system which then executes them and returns the results, all without human intervention. This contrasts with an interactive system where the user's commands and the computer's responses are interleaved.
• When to use?
  – For tested programs or codes that need to be run multiple times with different data and that have execution times greater than one hour.

Need for Batch Computing
• On a single system or a distributed set of machines, resources are limited.
• On shared resources, submitting many jobs exhausts the limited set of system resources, which shows up as variable job execution times. In the case of parallel programs, it can also cause errors in execution.
• Batch computing allows for controlled and balanced use of system resources, leading to deterministic job execution times and system throughput.

Generic Architecture of batch schedulers.
[Figure: architecture diagram. Components shown: submission client, job script, batch server, job queues, job scheduler (batch scheduler) and three execution nodes.]

Batch Schedulers at SERC.
• LoadLeveler from IBM.
• PBSPro from Altair Inc.
• LSF from Platform Computing Inc.

LoadLeveler@SERC
• LoadLeveler (LL) is the batch scheduler from IBM.
• LL manages both serial and parallel jobs over a pool of machines or servers, often referred to as an LL cluster.
• Jobs are allocated to machines in the cluster by a scheduler. The allocation depends on the availability of resources within the cluster and on various rules defined by the LL administrator.
• A user submits a job using a job command file which contains details of the executable, its dependencies and LL directives.

LL@SERC
• LL is installed on almost all IBM servers and parallel machines hosted in SERC, which are:
  – P690 or IBM-Regatta machines
  – P575 machines
  – P720 (256 node) IBM Linux cluster
  – IBM Blue-Gene/L
  – IBM-SP3

Sample LL job submission file.
#!/bin/sh
# @ error = job1.$(Host).$(Cluster).$(Process).err
# @ output = job1.$(Host).$(Cluster).$(Process).out
# @ input = inputfile
# @ class = p5task4
# @ job_type = parallel
# @ tasks_per_node = 4
# @ notification = always
# @ requirements = (Arch == "R6000") && (OpSys == "AIX53")
# @ executable = /usr/bin/poe
# @ arguments = executable
# @ queue

Useful LL commands
• llq - Queries information about jobs in the LoadLeveler queues.
• llcancel <jobid> - Cancels one or more jobs from the LoadLeveler queue.
• llclass - Returns information about classes.
• llsubmit - Submits a job to LoadLeveler.
• llstatus - Returns status information about machines in the LoadLeveler cluster.

LL@P690&P575
• There are four logical P690 and two P575 machines that are controlled by a single LL manager. All machines host the AIX OS.
• Three of the P690 (regatta1/2/3) accept parallel jobs and one (regatta4) is for interactive use. Both P575 machines accept parallel jobs.
• The machine regatta4 is the submission host for this cluster.
• Jobs on this cluster are restricted by job time (the wall clock limit of the class).
• Queue information for this cluster is:

Class       Wall_clock_limit   Max Processor
p5task4     4:00:00            4
p5task8     16:00:00           8
p5gtask16   32:00:00           16 (for Gaussian)
p5task16    32:00:00           16

LL@P720 Cluster
• P720 is a Linux cluster and accepts only parallel jobs.
• Jobs are controlled using one LL manager for this cluster.
• Queue information on this cluster is:

Class      Wall_clock_limit   Max Processor   TotTasks
ptask32    02:00:00           32              32
ptask128   1+08:00:00         128             200
ptask64    2+16:00:00         64
(A total of 200 MPI tasks are shared between ptask128 and ptask64.)

LL@BlueGene/L
• Each node on BlueGene consists of two processors and LL can allocate these in two different ways:
  – VN mode: both processors are allocated for computation (beneficial for compute intensive jobs).
  – CO mode: one processor is allocated for computation and the other for communication (beneficial for compute and communication intensive jobs).
• On BlueGene the LL queues are divided into two blocks, namely:
  – Big Block: default processor allocation is VN mode.
  – Small Block: default processor allocation is CO mode, but VN mode is supported too.

LL Queues on BlueGene/L

Queue          Wall_clock_limit   No. of jobs   No. of Nodes   No. of MPI Tasks   Allowed Modes
============   ================   ===========   ============   ================   ==============
pnode32        4:00:00            2             32             32 or 64           Both CO and VN
pnode32-24h    24:00:00           2             32             32 or 64           Both CO and VN
pnode128       16:00:00           2             128            128 or 256         Both CO and VN
pnode128-24h   24:00:00           1             128            128 or 256         Both CO and VN
pnode512       48:00:00           1             512            512 or 1024        Both CO and VN
pnode1024      120:00:00          4             512            1024               Only VN
pnode2048      60:00:00           2             1024           2048               Only VN
pnode4096      48:00:00           1             2048           4096               Only VN

Small Block includes pnode32, pnode32-24h, pnode128, pnode128-24h and pnode512. Small Block will have two midplanes. It supports both CO and VN mode.
Big Block includes pnode1024, pnode2048 and pnode4096. Big Block will have six midplanes. It supports only VN mode.

PBSPro@SERC
• PBSPro is the commercial version of OpenPBS/torque, initially developed at NASA labs, now sold by Altair.
• It is a flexible workload manager that can schedule different jobs for different users on a set of distributed heterogeneous systems.
• Has capabilities to define system/user/software specific controls on jobs.
• Currently we are running PBSPro version 8.0.0.

PBSPro@SERC
• Available on all Linux based systems from SUN, HP and SGI.
• Each PBSPro cluster typically manages a homogeneous set of machines.
• Four clusters are available at SERC, namely:
  – altix
  – altix350-1
  – altix350-2
  – hplx

PBSPro@altix
• Consists of a single 32 CPU SMP machine with hostname altix.
• Supports only 16 CPU parallel jobs.
• Jobs are restricted by per processor CPU-time, number of jobs in execution and number of jobs per user.
• Automatic job routing based on job script parameters.
• Queue parameters:

Queue            Memory CPUTime  Walltime Node Run   Que   Lm   State
---------------- ------ -------- -------- ---- ----- ----- ---- -----
qp100            --     --       --       --   2     12    2    E R
route_q          --     --       --       --   0     0     --   E R
                                               ----- -----
                                               2     12

PBS queues on altix

Queue qp100
    queue_type = Execution
    Priority = 250
    max_queuable = 50
    total_jobs = 14
    state_count = Transit:0 Queued:12 Held:0 Waiting:0 Running:2 Exiting:0 Begun:0
    max_running = 2
    from_route_only = True
    resources_max.ncpus = 16
    resources_max.pcput = 100:00:00
    resources_min.ncpus = 16
    resources_assigned.ncpus = 32
    resources_assigned.nodect = 2
    max_user_run = 1
    enabled = True
    started = True

Queue route_q
    queue_type = Route
    total_jobs = 0
    state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
    route_destinations = qp100
    enabled = True
    started = True

Sample PBSPro job submission script file
#!/bin/sh
#PBS -l ncpus=8
#PBS -l pcput=60:00:00
./job1

Useful PBS commands
• Job submission - qsub <job_script> (returns a job identifier such as 10374.altix)
• Submitted job status - qstat <job_id> or qstat -a
• Kill a running job - qdel <job_id>
• Detailed job status - tracejob <job_id> (this command works correctly only when executed on the node where the PBS server is running)
• Current queue status - qstat -q
• Complete details of your running job - qstat -f <job_id>

Common errors with PBS job scripts
• ncpus must lie within the queue limits: minncpus <= ncpus <= maxncpus.
• pcput > maxpcput
  – Job will not get submitted.
• Not all PBS directives described in the user guide may work for a given installation. This depends on the configuration. If you want to use something specific, please check with your system administrator.

PBSPro@altix350-1
• Consists of a single 16 CPU SMP machine with hostname altix350-1.
• Supports serial and 4/8 CPU parallel jobs.
• Jobs are restricted by per processor CPU-time, total job CPU-time, number of jobs in execution and number of jobs per user.
• Automatic job routing based on job script parameters.
• Queue parameters:

Queue            Memory CPUTime  Walltime Node Run   Que   Lm   State
---------------- ------ -------- -------- ---- ----- ----- ---- -----
route_q          --     --       --       --   0     0     --   E R
qp_4_32          --     128:00:0 --       --   2     32    2    E R
qp_4_64          --     256:00:0 --       --   1     23    2    E R
qp_8_32          --     256:00:0 --       --   0     0     1    E R
qs_32            --     32:00:0  --       --   3     7     4    E R
                                               ----- -----
                                               6     62

• Queue specific details can be found by executing the command qmgr:
  qmgr> list queue qp_4_32

PBSPro@altix350-2
• Consists of a single 16 CPU SMP machine with hostname altix350-2.
• Supports 8 CPU parallel jobs.
• Jobs are restricted by per processor CPU-time, total job CPU-time, number of jobs in execution and number of jobs per user.
• Automatic job routing based on job script parameters.
• Queue parameters:

Queue            Memory CPUTime  Walltime Node Run   Que   Lm   State
---------------- ------ -------- -------- ---- ----- ----- ---- -----
route_q          --     --       --       --   0     0     --   E R
qp_8_64          --     512:00:0 --       --   0     1     2    E R
qp_8_100         --     800:00:0 --       --   2     34    2    E R
                                               ----- -----
                                               2     35

• Queue specific details can be found by executing the command qmgr:
  qmgr> list queue qp_8_64

PBSPro@hplx&sunlx
• Consists of 18 nodes with 10 hplx and 8 sunlx machines. All machines are loaded with a 64-bit Linux OS. The server and scheduler for this cluster is hplx1_2.
• Currently undergoing reconfiguration. For details contact Mr. Chandrappa <chandru@serc.iisc.ernet.in>
• Supports only serial jobs.
• Jobs are restricted by per processor CPU-time, total job CPU-time, number of jobs in execution and number of jobs per user.
• Automatic job routing based on job script parameters.
• Queue parameters:

Queue            Memory CPUTime  Walltime Node Run   Que   Lm   State
---------------- ------ -------- -------- ---- ----- ----- ---- -----
qh64             --     64:00:00 --       --   2     0     24   D R
qh16             --     16:00:00 --       --   0     0     24   D R
route            --     --       --       --   0     0     --   E R
qh8              --     08:00:00 --       --   0     0     24   E R
qh256            --     256:00:0 --       --   2     0     24   D R
qh32             --     32:00:00 --       --   0     0     24   D R
                                               ----- -----
                                               4     0

• Queue specific details can be found by executing the command qmgr:
  qmgr> list queue qh64

LSF@SERC
• Load Sharing Facility (or simply LSF) is a commercial job scheduler sold by Platform Computing. It can be used to execute batch jobs on networked Unix and Windows systems on many different architectures.
• LSF version 4.1 is currently installed on the Compaq ES40 machines (commonly known as alpha servers).
• In LSF there is no concept of a job script. You can create a shell script that contains details of your executable and its dependencies and submit this as a job to LSF.
• You can also use the various job submission options to specify the executable dependencies.

LSF@alphas4
• The alpha server cluster consists of 5 ES40 servers, each with 4 CPUs.
• The cluster allows only serial jobs and has alphas4 as the submission host. All other machines are execution nodes.
• Queue configuration for the cluster:

QNAME      PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
8hr        10    Open:Active  4    1     -           0      0     0    0
64hr       6     Open:Active  4    1     -           0      0     0    0
32hr       6     Open:Active  4    1     -           0      0     0    0
16hr       6     Open:Active  4    1     -           0      0     0    0
128hr      4     Open:Active  4    1     -           0      0     0    0
256hr      2     Open:Active  4    1     -           0      0     0    0
g98_q      1     Open:Active  1    1     -           0      0     0    0
unlimited  1     Open:Active  4    1     -           0      0     0    0

• Queue specific details can be found by executing the command bqueues -l <queue_name>

Commonly used LSF commands
• bhosts - gives the status of the current nodes in the cluster.
• bsub <job_name> - command to submit your executable to LSF.
• bqueues - gives details on configured queues.
• bjobs - gives details on the status of your jobs submitted to LSF.
• bkill <job_id> - kills a submitted job.
• xbsub - is the X based GUI for LSF job submission.

Using Batch Schedulers - General Guidelines
• Choosing an appropriate machine.
  – The physical resources on the machine should meet the resource requirements of your job. Take some time to profile your job and understand the resources it requires.
• Choosing appropriate job queues.
  – Most queues are configured to restrict the CPU time or the job time. Ensure that the queue time is greater than your required time.
  – Write your job so that it stores intermediate results that can be used to restart the job in case of failure or termination.
• Managing job specific I/O files.
  – Use common file systems like hpcscratch or utemp while reading or writing files that are used by your batch job.
  – Give complete file paths and use the file I/O programming APIs to do I/O.
  – Avoid using shell I/O redirection, particularly on the distributed clusters. Your job may fail since file staging is not implemented or configured in any of the batch schedulers at SERC.
  – Include temporary/input file cleanup and output file movement commands as part of your job submission scripts.

When in doubt go to http://www.serc.iisc.ernet.in/ComputingFacilities/software/software.htm

Thank you! ANY QUESTIONS?
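
Checking a queue time limit before submission (illustrative sketch)
The guideline on choosing a job queue (queue time greater than your required time) can be checked with a few lines of shell before a job is submitted. The following is only a minimal sketch in POSIX sh and is not part of LoadLeveler, PBSPro or LSF. The function names to_seconds and fits_queue are hypothetical, the 06:30:00 requirement is an invented example, and the two limits are copied from the P690/P575 class table (p5task4 and p5task8). Only the plain HH:MM:SS form is handled. The day form used on the P720 cluster (for example 1+08:00:00) is not.

```shell
#!/bin/sh
# Hypothetical helpers, shown for illustration only.

# Convert a limit written as HH:MM:SS (hours may exceed 24) into seconds.
to_seconds() {
    h=${1%%:*}        # text before the first ':'
    rest=${1#*:}      # text after the first ':'
    m=${rest%%:*}     # minutes field
    s=${rest#*:}      # seconds field
    # Drop one leading zero so that fields such as 08 are not read as octal.
    h=${h#0}; m=${m#0}; s=${s#0}
    echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}

# Succeed (exit status 0) when the required time is below the queue limit.
fits_queue() {
    required=$(to_seconds "$1")
    limit=$(to_seconds "$2")
    [ "$required" -lt "$limit" ]
}

# Assumed requirement of 6.5 hours, tested against p5task4 and p5task8.
if fits_queue 06:30:00 4:00:00; then
    echo "p5task4 is long enough"
elif fits_queue 06:30:00 16:00:00; then
    echo "p5task8 is long enough"
else
    echo "no listed class is long enough"
fi
# prints: p5task8 is long enough
```

A check of this kind only compares numbers. The limits actually in force should still be confirmed with llclass, qstat -q or bqueues -l on the target cluster.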