Operational GARUDA Architecture
Version 1.0, January 2010
Centre for Development of Advanced Computing, Knowledge Park, Bangalore

1. Introduction

GARUDA is an SOA-based cyber-infrastructure supporting collaboration among science researchers and experimenters on a nationwide grid of computational nodes and mass storage. It connects a wide variety of resources, services and users to provide a stable, robust and efficient grid environment with guaranteed QoS for a wide variety of uses. The Department of Information Technology (DIT), Government of India, has funded the Centre for Development of Advanced Computing (C-DAC) to deploy the nationwide computational grid 'GARUDA', which spans 17 cities and 45 institutions, with the aim of bringing distributed/grid networked infrastructure to academic labs, research labs and industries in India.

This document describes the top level of GARUDA's architecture, which spans the critical pillars of the grid, namely security, data management, resource and service monitoring, and job management. GARUDA resources are integrated through a Service Oriented Architecture, in which resources provide a "service" defined in terms of interface and operation. Functions such as single sign-on, remote job submission, workflow support for complex applications and data movement tools are conceived and developed (or will be developed) from the end-user perspective and offered as services. Distributed accounting and user account management software, verification and validation software, programming and monitoring tools, tools to enforce adherence to SLAs and capture violations, and operational and administrative support tools such as a remote installer and a request tracker are also being (or will be) developed by GARUDA.

2. Interoperability and Standards

It is important that GARUDA Grid's systems and service interfaces are designed and operated in such a way that they interoperate as smoothly as possible with the other elements of our users' overall IT landscape. Although there is huge diversity in external systems, over which we have little or no control, we believe that standards, both formal and informal, offer a good likelihood of making our users' work easier as they integrate GARUDA systems into their overall workflow. Consequently, we place a high premium on compliance with community standards wherever possible. This plays out most clearly in the design of the GARUDA architecture and the definition of service interfaces. Grid community standards in the areas of remote login and computation, data movement and management, science workflow and scientific visualization support, and application development and runtime support make it significantly easier for GARUDA's virtual organizations and other individual users to add GARUDA resources to their existing suite of systems.

One area where we have identified an opportunity for significant improvement is user identity certification and credential management. Identity certification is achieved by setting up and managing the IGCA, accredited by the APGridPMA for grid authentication. X.509 PKI-based security credential management is done by deploying MyProxy, a popular open source tool, whereas managing authorization data within multi-institutional collaborations is achieved by VOMS.
Standardization in the area of remote computation and data-related activities is achieved by adhering to the community standards proposed and implemented by the Globus Toolkit 4.x. The GT4 components, such as MDS, SOAP, GridFTP and WSRF, each follow their own standards; for detailed information on these, refer to the document on GT4 standards.

3. GARUDA Grid System Architecture

GARUDA's resources are key to achieving its objective of providing a stable and robust grid environment for a wide variety of uses. They are not, however, sufficient to attain its other objectives, namely efficiency and guaranteed QoS. Progress in these areas relies on two additional architectural elements: (1) a set of core system components that provide system-wide services, and (2) a set of common interface definitions that resources or services may implement in order to provide users with familiar and consistent interfaces on which to build applications and infrastructure extensions. The architecture is conceived on a service oriented approach: all support offered by GARUDA to the user community is exposed through GARUDA Services (GS) with well-thought-out and unambiguous interfaces. Both internal users maintaining and managing the GARUDA resources and external users, namely application developers who want to access the resources, do so only by writing clients to these GS.

3.1. GARUDA Grid Core System Components

The core system components of GARUDA include the network, resources, the federated information service, security with authentication and authorization services, and job management, which comprises data movement, scheduling, reservation and accounting. They also include access mechanisms such as the access portal, which primarily acts as a GUI for the core systems, workflow tools and Problem Solving Environments (PSE). The entire system needs a proper procedure for maintenance and packaging to ensure the smooth operation of GARUDA. These components are captured in the layered diagram shown in Figure 1. In the following sections some of these components are explained in detail, and a separate section is dedicated to fault tolerance and failover in GARUDA.

Figure 1: High-level system components (grid-enabled applications; PSE, visualization, workflow tool and access portal; job scheduler, data and packaging; grid programming and development environment with MPICH-G2, Gridhra and a compiler service; federated information service over WSRF and GT4 with login, accounting and other services; virtualization support; grid security and high-performance grid networking over NKN connecting C-DAC resource centres, research and non-research organizations, educational institutions and computing centres as computing resources and virtual organizations)

3.2. Network

The GARUDA grid network depends entirely on the National Knowledge Network (NKN), a facility built by the Government of India. The design philosophy of this network is to build a scalable network that can expand both in accessibility and in speed. It acts as a common network backbone, like a national highway, on which different categories of users are supported. The main objective of this network is to build a national highway that enables different user communities to leverage the common infrastructure. The CORE of this network operates at 2.5/10 Gbps and the EDGES at 100 Mbps / 1 Gbps.

Figure 2: Logical NKN Architecture
Figure 2 shows the logical architecture of the NKN backbone links, routers and last-mile connectivity to the NKN participating organizations. The last-mile connectivity to the participating agencies varies from 1 Gbps and 100 Mbps down to 10 Mbps. Features of NKN include a high-capacity and scalable backbone, high reliability and availability, support for strict QoS and security, wide geographical coverage and a common standard platform.

Figure 3: NKN-GARUDA connectivity for a participating organization

Figure 3 shows the last-mile connectivity between the GARUDA network devices and the NKN router. The GARUDA Virtual Routing and Forwarding (VRF) mechanism, enabled in the NKN routers for the participating organizations, allows the GARUDA participating institutes to continue using, in the ongoing GARUDA project, the same IP address scheme that was used in the earlier GARUDA project phases.

3.3. Resources

Essentially, the GARUDA grid is formed by pooling the compute and storage resources, and special devices such as telescopes, provided by C-DAC and its partners. By computing resource we mean a set of computers combined to form a cluster. Every cluster has a head node with many compute nodes attached to it. Every centre, whether a partner's or C-DAC's, has one gateway which acts as its entry point. There are one or more access terminals through which users access the GARUDA resources. Optionally, every participating institution may deploy firewall protection.

Figure 4: GARUDA SOA-based deployment (legend: H = Head Node, G = Gateway; partners with and without resources connect over MPLS/Internet access, with compute nodes, storage and special devices such as a telescope behind the gateways)

Software list on each head node (SF1):
• Globus 4.0.7 or above and its dependent software
• Ganglia (gmetad)
• MyProxy clients
• GridWay (integrated with Globus)
• PBS or LoadLeveler server and scheduler (integrated with Globus), plus Maui or similar reservation software

Software list on each compute node (SF2):
• PBS mom or LoadLeveler starter
• Ganglia (gmond)

Operating systems supported in GARUDA include various flavours of Linux and AIX. The Local Resource Managers (LRM) running on the head nodes include Torque and LSF on Linux clusters and LoadLeveler on AIX clusters. Every cluster has a certain amount of scratch space so that applications can run using that scratch, and some storage space for temporarily holding input and output. There should also be dedicated storage to handle voluminous data requiring longer retention times.

3.4. GARUDA Information Service

The GARUDA Information System (GIS) keeps track of all the information about the distributed GARUDA resources and plays a major role in resource allocation, monitoring and management. GIS needs to gather, collate and publish both static and dynamic information. Static information includes operating systems, processors, storage devices and network devices, whereas dynamic information includes CPU load, file systems, job queues, etc.
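As a rough illustration of the kind of per-cluster record GIS aggregates, the sketch below separates static attributes from dynamic ones. The field names are illustrative assumptions, not the GLUE 2.0 / MDS4 schema actually used in GARUDA.

    # Illustrative sketch only: field names are hypothetical, not the actual
    # GLUE 2.0 / MDS4 schema used by the GARUDA Information System.
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class ClusterRecord:
        # Static information: changes rarely, refreshed slowly
        name: str
        operating_system: str
        processors: int
        storage_devices: List[str]
        # Dynamic information: refreshed frequently by Ganglia / the LRM
        cpu_load: float = 0.0
        free_cpus: int = 0
        queued_jobs: int = 0
        file_systems: Dict[str, int] = field(default_factory=dict)  # mount -> free MB

    record = ClusterRecord(
        name="cluster-example",
        operating_system="Linux",
        processors=64,
        storage_devices=["scratch", "home"],
        cpu_load=3.2,
        free_cpus=48,
        queued_jobs=5,
        file_systems={"/scratch": 500_000},
    )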
3.4.1. GIS Vision

• Create a coordinated way for GARUDA resources to publish the services they offer.
• Devise a method for GARUDA to aggregate and index the information from all the resources, including those of the partners.
• Publish the collated information in a form that can easily be accessed by user software, user interfaces and GARUDA service providers themselves, to discover GARUDA's capabilities and how to access them.
• Ensure the publication of the collated information follows a proper interface, so that grid applications, monitoring and discovery tools, GARUDA services and the grid meta-scheduler(s) can all fetch the same information for decision making.

GIS uses Globus MDS4 with the GLUE schema 2.0 as its base component. MDS4 has two higher-level services: an Index Service, which collects and publishes aggregated information as WSRF resource properties, and a Trigger Service, which collects resource information and performs actions when certain conditions are triggered. A set of information providers collects and formats resource information for use by an aggregator source or by a WSRF service when creating resource properties.

3.4.2. Architecture

GIS introduces an aggregation layer called the Regional Information Service (RIS). Multiple levels of RIS can be built hierarchically, based on the proximity of resources within a geographical region. One of the GARUDA cluster head nodes in a region also acts as the RIS. The prime activity of an RIS at the lowest level is indexing the information from the different head nodes in its region; an RIS at a higher level indexes the information from the RIS instances immediately below it.

Figure 5: GIS architecture (head nodes publish to head nodes acting as RIS at one or more levels, which publish upward to the centralized information server)

Each head node runs an Index Service, which collects and indexes the information provided by Ganglia and the Local Resource Manager (LRM). The information is published to the respective RIS. The Index Service running on the RIS publishes the information to the next higher level until it reaches the centralized information index server, which provides a GARUDA grid-wide view of the aggregated resource information. Refer to figure 5.

3.4.3. High-level Components

Figure 6: High-level components of GIS (clients reach the repositories of GARUDA grid-wide information via WS/HTTP through Apache 2.0, Tomcat and WebMDS, or via WS/SOAP through WS MDS4, backed by PostgreSQL repositories and a cache)

3.4.4. Deployment

Information provided by Ganglia includes basic host data (name, ID), memory size, OS name and version, file system data, processor load data and other basic cluster data. Information provided by the LRM includes queue information, the number of CPUs available and free, job count information, etc. In addition, custom information providers can be added at the head node level to publish information about installed and available software, tools, libraries and their licences. A sample deployment of the GARUDA Information System based on the new architecture is captured in figure 7.

Figure 7: Sample deployment of GIS (regional information servers at participating institutions such as IGIB Delhi, JNU Delhi, IIT Kharagpur, IIT Guwahati, IIT Delhi, C-DAC Chennai, IISc Bangalore, IMSc Chennai, RRI Bangalore and C-DAC Hyderabad publishing to the GARUDA Information Server at C-DAC KP Bangalore, with a backup at C-DAC Pune, serving applications and tools)
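The exact provider interface expected by the GARUDA MDS4 deployment is not specified in this document; as a rough sketch under that assumption, a head-node custom information provider of the kind mentioned in section 3.4.4 could emit an XML fragment describing locally installed software for the Index Service to aggregate. The element names and package list below are illustrative only.

    # Sketch of a custom head-node information provider (assumed interface):
    # it prints an XML fragment describing installed software, which an MDS4
    # aggregator execution source could collect. Element names are illustrative,
    # not the actual GLUE/MDS4 schema.
    import xml.etree.ElementTree as ET

    def installed_software():
        # In a real provider this would be discovered from the node,
        # e.g. by scanning module files or package lists.
        return [
            {"name": "mpich-g2", "version": "1.2.7", "licence": "open"},
            {"name": "blas", "version": "3.8", "licence": "open"},
        ]

    def main():
        root = ET.Element("InstalledSoftware")
        for pkg in installed_software():
            entry = ET.SubElement(root, "Package")
            for key, value in pkg.items():
                ET.SubElement(entry, key.capitalize()).text = value
        # MDS4 providers typically write their output to stdout for aggregation.
        print(ET.tostring(root, encoding="unicode"))

    if __name__ == "__main__":
        main()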
To realize this architecture, MDS4 must be configured on each individual cluster head node to collect and index information from Ganglia, the LRM and the custom information providers. Each head node publishes its information to a pre-selected head node, accepting it as the regional information index server. These regional information index servers in turn publish the complete resource information of their region to the central grid-level information index server.

3.4.5. Advantages

• With this GIS architecture, the GARUDA meta-scheduler can fetch resource information directly from the respective RIS, avoiding the centralized information server. This gives the meta-scheduler high availability of resource information: if a particular region or a region's information server goes down, the meta-scheduler can carry on with the remaining regions.
• When the required match is not found within the RIS, the scheduler can look in the centralized information server.
• Applications and tools are provided with a centralized information server for gathering information about the GARUDA grid.

3.5. Security

The major security components of GARUDA are VOMS for virtual organization management and MyProxy for certificate management, while certificates are issued by IGCA. Clients accessing GARUDA through any of the interfaces, such as the portal, command line, PSE or workflow, need to have a PKI X.509 certificate.

Figure 8: GARUDA security components (IGCA, operating under its CP/CPS, issues user certificates; the certificate management server runs MyProxy and VOMS; clients in the portal, PSE, workflow and gsi-ssh environments run voms-myproxy-init to obtain proxy certificates with VO attributes and pass signed job descriptions to the meta-scheduler, which interacts with the information service and resources)

The GARUDA security components and their interactions are captured in figure 8. More details about the authentication and authorization services, including the policy taxonomy, PDP and PEP, are discussed in the coming sections.

Most of the resources under GARUDA are protected by site-level firewall rules. Even though individual resource sites may follow customised rules, they must adhere to certain common rules specific to the operation of GARUDA. These include rules pertaining to the opening of the specific set of ports related to the middleware and monitoring; other rules related to applications can be decided on a case-by-case basis.
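As an illustration of the common firewall rules mentioned above, the sketch below lists typical upstream default ports of the middleware and monitoring components named in this document (Globus GridFTP and GRAM, the GT4 web services container, MyProxy, Ganglia). These values and the port range are assumptions based on the components' usual defaults; the actual GARUDA port policy is site-specific and is not defined here.

    # Illustrative only: typical upstream defaults for the components named in
    # this document. The ports actually opened at a GARUDA site are decided by
    # that site's firewall policy and are not specified in this architecture.
    COMMON_GRID_PORTS = {
        2811: "GridFTP control channel (Globus)",
        8443: "GT4 web services container (WS GRAM, MDS4 Index Service)",
        2119: "Pre-WS GRAM gatekeeper (Globus)",
        7512: "MyProxy credential server",
        8649: "Ganglia gmond",
        8651: "Ganglia gmetad",
    }

    # GridFTP data channels are usually confined to an ephemeral range
    # advertised through the GLOBUS_TCP_PORT_RANGE environment variable
    # (range below is an assumed example), which the firewall must also allow.
    GLOBUS_TCP_PORT_RANGE = (40000, 41000)

    for port, purpose in sorted(COMMON_GRID_PORTS.items()):
        print(f"allow tcp/{port:<5} # {purpose}")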
3.5.1. Certificate Award Management

GARUDA depends entirely on the Indian Grid Certification Authority (IGCA) for managing the entire process of identifying and verifying user credentials and of issuing and revoking certificates. An accredited member of the APGridPMA (Asia Pacific Grid Policy Management Authority) for grid authentication, IGCA provides X.509 certificates to support a secure environment for grid computing. This PKI certificate is the basic component without which no operations are permitted within the GARUDA grid. Every member and resource that wants to be a part of the GARUDA grid needs to adhere to the rules and regulations of IGCA. For a detailed description of IGCA and its activities, including the certificate policy, certification practice statement and end-entity certificates, refer to http://ca.garudaindia.in

3.5.2. Authentication/Authorization Services

A primary purpose of authentication (verifying identity) and authorization (granting permission to access resources based on identity) is to implement the policies that organizations have created to govern and manage the use of computing resources. The goal of the GARUDA grid is to create a scalable infrastructure that leverages local identity (AuthN) while managing access to shared resources (AuthZ) across the grid.

Authentication (AuthN) is the process in which user authentication credentials are evaluated and verified as being trusted, or as coming from a trusted source. Examples of credentials include a password, a public-key certificate, a photo ID, a fingerprint or another biometric. The GARUDA grid's principles for AuthN are:

• Grid user authentication should leverage existing local identity management processes.
• AuthN to the various applications should be transparent to the user, seamlessly integrating with the existing local infrastructure and user environment.

True to their heterogeneous nature, the partners participating in the GARUDA grid do not necessarily implement local authentication with the same mechanism. Kerberos, LDAP (Lightweight Directory Access Protocol), password databases and even PKI can all be used to establish local identity. On GARUDA, the Globus security component, which uses PKI, has been deployed. The Globus grid-mapfile is used during grid-to-local authentication translation (and vice versa) and informs the grid resource that the user's grid identity certificate has been verified.

Authorization (AuthZ) can refer to the issuing of a token that proves a user has the right to access resources, to the permission granted to access an object, or to the token itself (e.g., a signed assertion). AuthZ is used on grids to enforce the conditions of use on a grid resource as specified by the resource owner. In GARUDA, the grid-mapfile can be viewed as the mechanism for AuthZ. Authentication occurs when the user's local identity is expressed as a certificate that is understood within the grid PKI infrastructure; on the resource side, authentication is completed when this certificate is verified. From that point on, use of the certificate to obtain access to grid resources can be considered authorization.
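To make the grid-to-local mapping concrete, the sketch below shows the shape of a Globus grid-mapfile entry (a quoted certificate Distinguished Name mapped to a local account) and a minimal parse of it. The DN and account name are invented examples, not real GARUDA entries.

    # A grid-mapfile maps a certificate Distinguished Name to a local account.
    # The entry below is an invented example, not a real GARUDA mapping.
    SAMPLE_GRIDMAP = '"/C=IN/O=GARUDA/OU=Example Lab/CN=A User" auser1'

    def parse_gridmap_line(line):
        """Split a grid-mapfile line into (certificate DN, local user name)."""
        dn, _, local_user = line.strip().rpartition('" ')
        return dn.lstrip('"'), local_user

    dn, local_user = parse_gridmap_line(SAMPLE_GRIDMAP)
    print(f"grid identity {dn!r} runs as local account {local_user!r}")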
3.5.3. Access Policy Taxonomy

Any physical user of the grid has an AuthN-ID, a Distinguished Name (DN) and a user name. Once proper mapping to a local user is done, the grid user becomes a logical user with a valid access ID for that cluster. Users are grouped logically to form user groups with special attributes and roles. Similarly, physical resources have a physical file, URL and Fully Qualified Name (FQN), and are further abstracted as logical resources having a logical file name and a unified resource name, which is visible across GARUDA.

Figure 9: Access policy taxonomy (a "physical" user with AuthN-ID, DN and username maps to a "logical" user with an access ID and to user groups with attributes and roles; a physical resource with filename, URL and FQN maps to a logical resource with a logical file name and URN and to resource groups with a classification; an access rule takes the form Puser/Luser/UGroup/Role | Operation/Action | Permission (Permit / Deny / Not applicable) | RGroup/LRsrc/PRsrc, with metadata integrated with the access policy)

Metadata with access policy is maintained for every physical resource together with its corresponding logical resource and resource group. A GARUDA user (physical or logical), being a member of at least one user group with valid permission, is allowed to operate on GARUDA resources (physical or logical), which are grouped on some common classification criteria. Refer to figure 9.

3.5.4. Operations and Permissions

A variety of tools can be deployed to permit users to perform their requested operations based on their attributes and policy decisions. The different tools that could be deployed in future, if needed, to strengthen GARUDA security are captured, with their control flow, in figure 10.

Figure 10: Control flow for PDP and PEP (the client pushes its VOMS proxy credentials to the PDP/PEP server, which pulls AuthZ and attribute assertions from authorization and attribute authorities backed by tools such as MyProxy, VOMS, GridMap, PERMIS, XACML, SAML, SAZ, PRIMA and CAS; AZA = AuthZ Authority, ATA = Attribute Authority, PDP = Policy Decision Point, PEP = Policy Enforcement Point)

A user's PKI certificate issued by IGCA is stored with the MyProxy server, and the user's VO credentials are stored with the VOMS server. When a client wants to use GARUDA resources, it downloads from the MyProxy server a proxy certificate for the desired period, embedded with the relevant VO attributes. This information is pushed to the policy decision/enforcement server, which in turn pulls the relevant information from the authorization and attribute authorities. These authorities fetch the relevant policy and attributes for the given certificate from their respective databases. Upon receiving this information, the server matches the requested operation against the approved operations and grants permissions accordingly. Presently, policy and attribute evaluation is done in a single domain with a centralized policy database/service, but GARUDA will ultimately need to split policy rules and attribute mappings and distribute them across sites.

3.5.5. Access Determination and Policy Assertions

Access determination and policy assertions in GARUDA are handled by deploying the open source toolkits MyProxy and VOMS. Figure 11 shows the role of MyProxy for user mapping and of VOMS for user groups; other possible public domain tools for supporting permitted operations are also indicated. MyProxy and VOMS are discussed in detail in the next two sections.

Figure 11: Possible tools for access and policy control (MyProxy for AuthN username-to-DN mapping, VOMS for user groups and roles, SAZ/PRIMA/GUMS as candidate authorization tools, and a metadata catalogue and data service applying rules of the form Puser/Luser/UGroup/Role | Op | Perm | RGroup/LRsrc/PRsrc)
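As a toy illustration of the access determination just described, the sketch below evaluates a request against policy tuples of the Puser/Luser/UGroup/Role | Op | Perm | RGroup/LRsrc/PRsrc form shown in figure 9. The rule set, the role and group names and the first-match semantics are invented for illustration and do not reflect the actual GARUDA policy engine.

    # Toy policy decision sketch: tuples follow the taxonomy of figure 9
    # (subject | operation | permission | resource group). The rules and the
    # first-match semantics are illustrative assumptions only.
    POLICY = [
        # (subject: user, group or role, operation, permission, resource group)
        ("role:bioinfo-user", "submit-job", "Permit", "rgroup:bio-clusters"),
        ("group:students",    "reserve",    "Deny",   "rgroup:all"),
    ]

    def decide(subject_attrs, operation, resource_groups):
        """Return Permit/Deny/Not applicable for a request; first match wins."""
        for subject, op, perm, rgroup in POLICY:
            if subject in subject_attrs and op == operation and \
               (rgroup == "rgroup:all" or rgroup in resource_groups):
                return perm
        return "Not applicable"

    # A user whose VOMS proxy carries the bioinfo-user role asks to submit a job.
    print(decide({"role:bioinfo-user", "group:students"},
                 "submit-job", {"rgroup:bio-clusters"}))   # -> Permit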
3.5.6. MyProxy Deployment in GARUDA

MyProxy, an open source tool for managing X.509 Public Key Infrastructure (PKI) security credentials (certificates and private keys), is installed in GARUDA and acts as an online credential repository, allowing users to securely obtain credentials when and where they are needed inside GARUDA.

• The MyProxy server, deployed on one of the identified head nodes, acts as the centralized credential server.
• One more GARUDA head node is also configured with a MyProxy server as a fail-over or backup credential server in case the centralized server fails.
• MyProxy clients, which come as part of the standard GT4 distribution, are available on all the head nodes.
• The MyProxy server only acts as an online credential repository and does not function as a Certificate Authority.

3.5.6.1. MyProxy Usage

Figure 12: Control flow in the MyProxy tool

• After obtaining signed certificates from IGCA, a user can optionally upload the certificates into the MyProxy server, making them available in the credential repository. The duration and validity of the stored credential are controlled by the user at the time of upload.
• At the time of usage, the user can download the stored credentials from the MyProxy server to any machine from which they would like to access the grid services. This proxy certificate enables single sign-on and access to any node demanding authentication.
• Users can renew their expired proxy credentials using the MyProxy clients.
• Portals, PSEs and other tools that require grid authentication can integrate directly with the MyProxy APIs to download the user credentials and access the grid services on the user's behalf.

3.5.7. VOMS Deployment in GARUDA

Virtual organizations in GARUDA are managed by deploying the Virtual Organization Management Service (VOMS), an open source tool developed and maintained by the EGEE grid. It provides a database of user roles and capabilities, together with a set of tools for accessing and manipulating the database and for using its contents to generate grid credentials for users when needed. A separate set of tools is provided for the administrators responsible for admitting users and assigning their roles.

• Each VO has a corresponding VO server. Each VO server belonging to a specified application area is set up on one of the identified head nodes belonging to the respective VO.
• Another head node belonging to the same VO is installed with a VO server and acts as a fail-over. (In addition, a common VO server deployed centrally is used for members not belonging to any of the application VOs.)
• VO management clients are available on all the head nodes.
• The VO server can contain the fine-grained access control lists and role definitions for the members of the VO. Role definitions can be used to collectively define a set of authorized operations.
• Individual grid services can be enabled to validate and authorize based on the VO credentials.
• Resources can be part of different Virtual Organizations (VOs) in the grid; the same resource can be part of one or more VOs. Grid users can become members of different VOs in the grid.
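The following is a minimal sketch of the resulting credential workflow, driven from Python using the standard MyProxy and VOMS command-line clients (myproxy-logon, voms-proxy-init). The server name, VO name and role are placeholders, and GARUDA's own wrapper scripts such as the voms-myproxy-init shown in figure 8 are not reproduced here.

    # Minimal sketch of the MyProxy + VOMS credential workflow using the
    # standard command-line clients. The host name, VO name and role are
    # placeholders; GARUDA's own wrapper scripts are not shown.
    import subprocess

    MYPROXY_SERVER = "myproxy.example.garuda"            # placeholder, not a real host
    VO_AND_ROLE = "examplevo:/examplevo/Role=submitter"  # placeholder VO and role

    def obtain_credentials(grid_user):
        # 1. Retrieve a short-lived proxy from the MyProxy repository
        #    (the user previously uploaded credentials with myproxy-init).
        subprocess.run(
            ["myproxy-logon", "-s", MYPROXY_SERVER, "-l", grid_user, "-t", "12"],
            check=True,
        )
        # 2. Add VO membership and role attributes to the proxy via VOMS,
        #    producing the VOMS proxy used for authorization decisions.
        subprocess.run(
            ["voms-proxy-init", "-noregen", "-voms", VO_AND_ROLE],
            check=True,
        )

    if __name__ == "__main__":
        obtain_credentials("auser1")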
3.5.7.1. VOMS Usage

Figure 13: Control flow in the VOMS tool (the client runs voms-proxy-init with the desired role, which retrieves the VO membership and role attributes from the VOMS server and its attribute repository)

• In addition to the standard X.509 grid certificates, users are required to obtain separate VO credentials from the VO server in order to be authorized for and access the VO-enabled services.
• The VO credentials identify the membership and role of a grid user in a specific VO. A grid user can have multiple roles defined in a VO.
• At the time of requesting the credentials, the user can specify the role for which the credentials are needed.
• The VO credentials are appended to the standard X.509 grid proxy certificate.
• VO services can then grant or deny access to users making requests, depending on the validity of these credentials.

3.5.8. Integrated AuthN and AuthZ

The control flow of the integrated authentication and authorization using MyProxy and VOMS is captured in figure 14. GARUDA has a centralized MyProxy server and two VO servers, namely VO1 and VO2, installed on some of the head nodes.

Figure 14: Flow of integrated AuthN and AuthZ in GARUDA (head nodes send grid credential requests to the MyProxy server and receive proxy credentials, and send VO credential requests to the VO-1 or VO-2 server and receive VO credentials with roles)

4. Job Management in GARUDA

A key area in grid computing is job management, which typically includes planning a job's dependencies, selecting the execution cluster(s) for the job, scheduling the job at the cluster(s) and ensuring reliable submission and execution. GARUDA provides a mechanism to reserve resources for assured availability, which enables the smooth completion of jobs. It also provides functions to capture accounting information once a job ends, to create the Usage Record (UR) and Resource Usage Service (RUS) as standardized by the OGF (Open Grid Forum).

4.1. GARUDA Job Management Interfaces

Job management in GARUDA is handled by the user portal, the command line interface, the meta-scheduler, the local resource managers and the various other components that constitute the GARUDA middleware. The access points for GARUDA users for job submission are the GARUDA portal and the Command Line Interface (CLI). The GARUDA access portal is also accessible over the Internet, so that users not belonging to the GARUDA network can submit jobs on any of the GARUDA resources. Other job submission mechanisms, such as Problem Solving Environments (PSE) and workflow tools, are accessible through the portal.

Figure 15: GARUDA job management interfaces (users reach the GARUDA middleware either through the portal, PSEs and workflow tools over the Internet, or through the CLI on the GARUDA network; the middleware hands jobs to the LRMs on the compute clusters)

Both the PSE, an environment prepared for specific problems that have already been enabled on the grid, and the workflow tools, in which programs, data and I/O relations with their execution sequence are defined and submitted as a job, can be invoked through the GARUDA portal. Using the resource reservation interfaces available on the head nodes, GARUDA compute nodes can be reserved to run jobs during specified time slots. Jobs submitted on any of the resources in GARUDA can be tracked and queried for status through the portal. Users can also use the Command Line Interface (CLI) available on the GARUDA head nodes to log in to the GARUDA grid and submit their jobs on the resources.
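As a rough sketch of CLI-style submission through the GridWay meta-scheduler (described in sections 4.3 and 4.5), the snippet below writes a GridWay job template and submits it with gwsubmit. The template keywords are standard GridWay ones, but the executable and file names are placeholders, and GARUDA's own portal and CLI wrappers are not shown.

    # Sketch of a command-line job submission through the GridWay
    # meta-scheduler. The template keywords (EXECUTABLE, STDOUT_FILE, ...)
    # are standard GridWay ones; the program and file names are placeholders,
    # and the GARUDA portal/CLI wrappers are not reproduced here.
    import subprocess, textwrap

    # Remote gsiftp:// sources may also be listed for staging (see section 4.4).
    TEMPLATE = textwrap.dedent("""\
        EXECUTABLE  = my_app
        ARGUMENTS   = input.dat
        INPUT_FILES = input.dat
        STDOUT_FILE = my_app.out
        STDERR_FILE = my_app.err
    """)

    with open("my_app.jt", "w") as jt:
        jt.write(TEMPLATE)

    # A valid VOMS proxy (see section 3.5) is required before submission.
    subprocess.run(["gwsubmit", "-t", "my_app.jt"], check=True)
    subprocess.run(["gwps"], check=True)   # list the user's GridWay jobs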
4.2. Types of Jobs Supported in GARUDA

Users can submit jobs which may include, but are not restricted to, the following types:

• Serial/sequential jobs
• Array jobs
• Client/server jobs
• Data transfer jobs
• Parallel/distributed jobs:
  o Distributed jobs without a parallel programming paradigm
  o MPI jobs running on a single cluster
  o MPI jobs distributed across different clusters

Specific software environments required by jobs for execution can be made available on selected clusters, on request, on a case-by-case basis. Currently the GARUDA job submission mechanism does not support the submission of interactive and real-time jobs, which require user intervention during program execution.

4.3. Components Involved in Job Management

The job submission portal and the command line tools accept job requests from the end users and pass them to the GARUDA meta-scheduler (GridWay). This meta-scheduler takes care of workload management in the GARUDA grid.

Figure 16: Components of GARUDA job management (job submission feeds GridWay, which drives Globus and the Local Resource Managers; accounting records from GridWay, Globus and the cluster LRMs flow into an accounting database with a failover replica)

The reservation module is well integrated into the GARUDA access portal. The module also offers command line tools which help in creating advance resource reservations. The meta-scheduler can schedule jobs on reserved resources based on a prior resource reservation identifier.

• Job execution is achieved using the Globus GRAM services.
• The super-scheduler submits jobs to the GRAM components available on the head nodes, after querying the information system to find a suitable resource candidate on which the job can be executed.
• Job execution is carried out on the computing cluster with the help of the Local Resource Managers, namely PBS/Torque and LoadLeveler.
• The accounting module integrates with GridWay, the LRM and the Globus components to gather the usage information and store it in a centralized database.

4.4. Data Staging

Data staging for job execution in GARUDA is primarily done through GridFTP, which uses the FTP protocol underneath. Users can also use RFT, which in turn uses GridFTP. The storage from which to take input and the place to which to copy the output are specified by the user through the meta-scheduler. The input and output of an application are moved into scratch space during execution; after execution completes, the outputs are moved to the storage specified by the user. This data needs to be moved to permanent locations as early as possible, failing which GARUDA does not guarantee the availability of the data for future reference.

4.5. Super-scheduler

In this new architecture, GridWay, the meta-scheduler, is installed on all the head nodes. GridWay is customized to support GARUDA's resource reservation module by introducing wrappers for some of the GridWay commands like ……. The customized GridWay needs to recognize the virtual organization concept and identify resources according to the user's membership of a particular VO before scheduling the user's job in GARUDA. GridWay has been configured to look for a resource match primarily in the corresponding Regional Information Server; an unsuccessful search leads GridWay to query the grid-wide centralized information server. This behaviour is achieved by modifying the MAD (Middleware Access Driver) module of GridWay.
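A minimal sketch of that lookup order follows, assuming hypothetical query helpers for the regional and centralized information servers; the real mechanism lives inside the modified GridWay information MAD.

    # Sketch of the RIS-first, centralized-fallback lookup described above.
    # query_ris() and query_central() are hypothetical stand-ins for the
    # information queries made by the modified GridWay information MAD.
    def query_ris(requirements):
        """Return matching hosts from the Regional Information Server."""
        return []          # placeholder: no regional match in this example

    def query_central(requirements):
        """Return matching hosts from the centralized information server."""
        return ["cluster-head.example.garuda"]   # placeholder result

    def find_resources(requirements):
        matches = query_ris(requirements)
        if matches:                           # regional match found: use it directly
            return matches
        return query_central(requirements)    # otherwise fall back grid-wide

    print(find_resources({"arch": "x86_64", "free_cpus": 16}))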
Connection failures to the centralized information server are overcome by searching the pool of regional information servers. This deployment model ensures high availability of resource information and allied services for job submission in GARUDA.

4.6. Resource Reservation and Management (RRM)

The advance reservation facility in the grid ensures the availability of the resources required by a user or an application at a future time. A resource reservation in the grid, which can be independent of a particular job, can be requested by a user or an administrator. It is granted by the reservation management system based on the privileges of the user or application, and in accordance with the policies enforced on the requested/matched resources. The reservation causes the associated resources to be reserved for the specified user, administrator or application.

4.6.1. RRM Vision

• RRM should allow the use of reserved resources for the specified duration without any interference from subsequent reserved and non-reserved jobs.
• Non-utilization or mis-utilization of reserved resources should be avoided by employing suitable mechanisms.
• Some resources should remain available to run non-reserved jobs as well.

4.6.2. Reserved Jobs

The GARUDA middleware should handle co-allocation jobs as reserved ones; MPI jobs demanding no co-allocation are treated as non-reserved. Co-allocation jobs are jobs which run on multiple computing resources concurrently to obtain a solution. To ensure their operation, reserved jobs are assigned a higher priority than non-reserved ones. Once a resource reservation is established for a reserved job, the job should be started at the time and date specified during reservation; failing this, the reservation is automatically cancelled after a certain period of time. This elapse time is configurable. Jobs should strictly adhere to the maximum runtime for which the resource is booked.

4.6.3. Non-reserved Jobs

Non-reserved jobs are run without a usage-time reservation, according to the operating policy (First Come, Fair Share, etc.). These non-reserved jobs may be scheduled together with the reserved jobs, using free resources and the free time of established reservations; however, running non-reserved jobs are cancelled when a reserved job is submitted or run. To minimize the impact of this behaviour, it is recommended to prepare a computing resource which cannot be reserved, to assure an environment for running non-reserved jobs. If an attempt is made to run a reserved or non-reserved job beyond its specified duration, the job is forcibly terminated.
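The timing rules above (auto-cancel a reservation whose job has not started within a configurable grace period, and terminate any job that outruns its booked window) can be sketched as follows. The record fields and the grace period are illustrative assumptions, not the GARUDA reservation manager's actual interface.

    # Toy sketch of the reservation timing rules in 4.6.2/4.6.3. The fields
    # and the grace period are illustrative assumptions only.
    from datetime import datetime, timedelta

    GRACE = timedelta(minutes=30)        # configurable elapse time (assumed value)

    def reservation_action(start, duration, job_started_at, now):
        """Decide what the reservation manager should do at time `now`."""
        end = start + duration
        if job_started_at is None and now > start + GRACE:
            return "cancel reservation"          # job never started in time
        if job_started_at is not None and now > end:
            return "terminate job"               # exceeded the booked runtime
        return "no action"

    start = datetime(2010, 1, 15, 9, 0)
    print(reservation_action(start, timedelta(hours=2), None,
                             datetime(2010, 1, 15, 9, 45)))   # -> cancel reservation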
4.6.4. Architecture of GARUDA RRM

The reservation architecture in GARUDA is realized through a two-level implementation, with grid-level and cluster-level components. The block diagram depicting the components involved in the RRM of the GARUDA grid is captured in figure 17.

Figure 17: Components of GARUDA resource reservation (the GARUDA-level reservation component sits alongside the GridWay meta-scheduler and Globus 4.x in the GARUDA middleware and uses a reservation database with a failover replica; the GARUDA LRM-level reservation component works with the cluster reservation manager and scheduler and the Local Resource Manager)

The cluster reservation component, responsible for enforcing reservations at the individual cluster together with the cluster reservation manager and scheduler, guarantees the mapping of grid reservations onto the cluster compute nodes. These components reside on every cluster participating in GARUDA.

4.7. Accounting Server

Most commercial organizations are not likely to join the grid unless they are paid for the resources they provide. In order to enable economic compensation, it is necessary to keep track of the resources utilized by grid users; this is where grid accounting enters the picture. The information acquired through grid accounting can serve several useful purposes. It can:

• Form a basis for economic compensation. Once usage information is available, the resource provider could apply a transformation to convert resource usage into some monetary unit. Direct payments could also be envisioned, where resource usage is charged for immediately, e.g., by means of credit card transactions or bank payment gateways.
• Be used to enforce grid-wide resource allocations; e.g., resources might only grant access to users whose current resource allocation has not been used up.
• Allow tracking of user jobs. Users can obtain information about a submitted job, such as where it was submitted, the resources it consumed and the output it generated.
• Enable evaluation of resource usage. Resource providers need to be able to determine to what extent different resources have been utilized. Furthermore, administrators can obtain information about what job executed on a specific resource at a certain time; such information can be useful when debugging programs or tracking malicious users.
• Be used by resources to dynamically assign priorities to user requests based on previous resource usage.

4.7.1. Architecture of the GARUDA Accounting Service

GARUDA is deployed with Globus, which supports accounting and auditing facilities, as its middleware. Accounting information related to jobs submitted through Globus can be captured into a centralized server using a PostgreSQL database through this facility. Similarly, information such as CPU and memory utilization and execution time for jobs submitted to the LRM can be fed into the central server by writing a custom module. The most important activity is to identify the mapping between the Globus and LRM job IDs; once identified, combining these two tables yields all the accounting information.

Figure 18: Components of the GARUDA accounting service (clients query the accounting service, backed by a PostgreSQL database holding job_id, start_time, end_time, cpu_time, memory and related fields, populated from the GridWay meta-scheduler, Globus GRAM and the LRM servers on the head nodes)
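A minimal sketch of that join follows, assuming two hypothetical tables, globus_jobs and lrm_jobs, that share the job-ID mapping. The schema is invented for illustration and is not the actual GARUDA accounting schema; an in-memory SQLite database stands in for the central PostgreSQL server.

    # Sketch of joining Globus-level and LRM-level accounting records on the
    # job-ID mapping described above. Table and column names are invented for
    # illustration; an in-memory SQLite database stands in for PostgreSQL.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE globus_jobs (globus_id TEXT, lrm_id TEXT, grid_user TEXT);
        CREATE TABLE lrm_jobs    (lrm_id TEXT, cpu_time REAL, memory_mb INTEGER);
        INSERT INTO globus_jobs VALUES ('g-001', 'pbs.42', 'auser1');
        INSERT INTO lrm_jobs    VALUES ('pbs.42', 3600.0, 2048);
    """)

    # Per-user usage summary, of the kind the accounting service would expose.
    rows = db.execute("""
        SELECT g.grid_user, SUM(l.cpu_time), SUM(l.memory_mb)
          FROM globus_jobs g JOIN lrm_jobs l ON g.lrm_id = l.lrm_id
         GROUP BY g.grid_user
    """).fetchall()
    print(rows)   # [('auser1', 3600.0, 2048)]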
A service interface needs to be developed to return the requested information to individual clients after querying the database with user-selected filters. These queries may cover usage details on a per-user basis and a per-resource basis. This service model provides the much-needed modularity, so that changes to individual clients and their access methods can be avoided even if the accounting database schema changes in the future.

4.8. Login and Compiler Services

Since the GARUDA grid is built on a Service Oriented Architecture, some of the essential facilities of GARUDA need to be exposed to the user through service components. The immediate requirement is for login and compiler services. Since GARUDA can potentially host many kinds of large resources and applications belonging to various user groups, there is a need for single sign-on, so that a user is not asked to go through the AuthZ mechanism for each resource within a session. Moreover, this service should be built to accommodate the various types of AuthN and AuthZ mechanisms that GARUDA may host in the future. It needs to take minimal input parameters from the user and return a single identifier which is recognized across GARUDA for authentication and authorization purposes.

As GARUDA is made up of resources pooled from partners with a variety of operating systems and LRMs, building applications is a significant part of application development for GARUDA. In order to hide the complexities arising from GARUDA's heterogeneous nature, a compiler service needs to be deployed so that applications can be built quickly. Apart from the normal features of a grid service, the compiler service is expected to provide features such as the ability to modify the makefile (even though it can generate a makefile on its own after getting the necessary requirements from the user), a resource selection facility to help the user select the best matching resource on which to build the application, and built-in intelligence to eliminate unsuitable resources based on availability and the user's requirements for special libraries, software, etc.

5. Fault Tolerance Mechanisms

Considering the heterogeneity and geographical distribution of GARUDA resources, it is very difficult to ensure that all resources are always available. In such a scenario, failover and fault tolerance become important so that users do not face any difficulty in using the resources. A very important aspect is that users should not be aware of the change and should not feel any hitches during their usage. Triggering a sequence of actions upon a service or resource failure, sending an alert and managing the entire process also play a vital role in a smoothly functioning operational GARUDA. Every individual tool or component of GARUDA can have its own fault tolerance mechanism; however, the mechanisms for some of the components that form an essential part of the architecture are captured here. All the tools developed as part of this architecture that require a centralized database will replicate it at a geographically different location with a proper synchronization mechanism. However, it is not mandatory that all the tools use physically the same database server, as the current architecture does not envisage providing the database itself as a service.

5.1. Information Service

For the Information Server there will be a backup Information Server, which also follows the pull mechanism and pulls the information data available with the Regional Information Servers. The same FQDN will be assigned to both machines.
The DNS server needs to be tweaked to resolve to the backup server once the first machine fails. As resource information in a grid is dynamic in nature, there will be no stale information on either server.

5.2. Resource Reservation System

The failover mechanism of the GARUDA reservation system needs to be dealt with in two parts. Failure of the grid-level and cluster-level reservation program components can be redressed easily, as these components are deployed in each and every cluster participating in GARUDA; to overcome such a failure, the user can log in to another cluster and continue the activity without facing any problems. The second part is failure of the reservation database system, which needs to be solved by replication methods; its correctness depends strongly on the synchronization mechanism that is deployed.

5.3. Monitoring, Alerts and Event Triggering

Operational GARUDA is deployed with a centralized Nagios, a public domain software, for comprehensive monitoring of all GARUDA infrastructure components, including applications, services, operating systems, network protocols, system metrics and network infrastructure. Nagios needs to be configured to alert GARUDA administrators via email and SMS; the multi-user notification escalation capabilities of Nagios ensure that alerts reach the attention of the right people. Nagios can remediate problems if event handlers are configured to automatically restart failed applications, services, servers and devices as soon as problems are detected. The extendable architecture of Nagios provides methods for easy integration of in-house developed and third-party applications; using this facility GARUDA can write its own plug-ins to gather information relevant to supporting QoS. Nagios also helps in meeting the agreed SLAs by providing historical records of outages, notifications and alert responses for later analysis.
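As a small illustration of the event-handler idea mentioned above, the sketch below is a handler that Nagios could invoke with the service state, state type and attempt number (the values of the standard macros an event-handler command would pass) and that restarts a service once the failure is confirmed. The service being handled, the restart command and its path are assumptions, not part of the GARUDA configuration.

    #!/usr/bin/env python
    # Sketch of a Nagios event handler: Nagios would call it with the service
    # state, state type and current attempt (via the $SERVICESTATE$,
    # $SERVICESTATETYPE$ and $SERVICEATTEMPT$ macros). The restart command and
    # the service being handled are illustrative assumptions.
    import subprocess, sys

    def main():
        state, state_type, attempt = sys.argv[1], sys.argv[2], int(sys.argv[3])
        # Act only when the failure is confirmed (HARD CRITICAL state),
        # mirroring the restart-on-detection behaviour described in 5.3.
        if state == "CRITICAL" and state_type == "HARD":
            subprocess.run(["/etc/init.d/globus-container", "restart"],  # assumed path
                           check=False)

    if __name__ == "__main__":
        main()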