GEMS Deployment Planning Guide
Product Version: 1.4
Doc Rev 3.0
Last Updated: 27-Mar-15

Good Enterprise Mobility Server™

Table of Contents

Purpose and Scope
Prerequisites
Pre-Deployment Considerations
Microsoft Windows Server Considerations
Database Server
Hardware
Use Profile Definitions (per server instance) for Push (Mail) Notification
Use Profile Definitions (per server instance) for Presence
Use Profile Definitions (per server instance) for Connect
Docs Service Scalability Guidelines
Good Proxy Connections
Scalability
High Availability
Disaster Recovery
Scaling Factors
RTO and RPO
Physical Deployment
Simplest Deployment
Typical Deployment
High Availability (HA)
GEMS-HA Design Principles
HA for Instant Messaging
Load Distribution
Referral
HA for Presence
Load Distribution
HA for Push Notifications
HA for the Docs Service
Basic Deployment
HA Deployment
HA Failover Process/Behavior Summary
Additional HA Considerations
Disaster Recovery (DR)
DR Failback Process/Behavior
General Scenario
Connect-Specific DR Scenario
Phased Approach Recommendation
Deployment with Good Dynamics
Network Separation
Server Instance Configuration in Good Control
Server-Side Services
Conclusion
Appendix A – Upgrading from Good Connect Classic
Upgrade Scenario 1: Parallel Server (Recommended)
Pertinent Considerations in this Scenario
Upgrade Scenario 2: Repurpose Existing Server
Pertinent Considerations in this Scenario
Appendix B – Migrating Your Good Share Database to GEMS-Docs
Client App Support Considerations
Migrating with Continued Support for Good Share
Migrating to Good Work Only
Noteworthy Feature Differences (GEMS-Docs versus Good Share)
Appendix C – Hardware Used for Testing GEMS

Purpose and Scope

Good Enterprise Mobility Server™ (GEMS) is the designated consolidation of servers currently supported by Good. The purpose of this document is to identify the key planning factors that will influence the performance, reliability, and scalability of your deployed GEMS configuration, as well as to offer guidance on high availability and disaster recovery options. The guidance presented herein is intended to help ensure the highest possible levels of reliability, sustainability, performance, predictability, and end-user availability.

The target audience for this guide includes enterprise IT managers and administrators charged with evaluating technology and network infrastructure, as well as those responsible for making corresponding business decisions.

This document does not discuss general GEMS and supporting network installation and software configuration tasks. Rather, it focuses on infrastructure configuration topics that require careful consideration when you are first planning your GEMS deployment. For both general and specific installation and configuration guidance and best practices, see the GEMS Installation and Configuration Guide. First, however, a discussion centered on the basics of physical deployment will be helpful.

Prerequisites

The planning information in this document is predicated on the following software releases:

• Good Enterprise Mobility Server (GEMS) – v1.2
• Good Control (GC) – v1.7.38.19
• Good Proxy – v1.7.38.14
• Good Connect Client – v2.3 SR 9
• Good Work Client – v1.2

General knowledge of GEMS and the Good Dynamics platform, along with Windows Server environments employing Microsoft Lync, Exchange and Active Directory, is likewise required to effectively plan your GEMS deployment.

Pre-Deployment Considerations

Before attempting to deploy GEMS, you may also need to plan for upgrades to the supporting environment. Is your existing change management process sufficient and are all the required tools handy? If not, you'll need to plan for these as well.
In addition, your in-house support team may need to have aspects of its training upgraded. Other key factors in the deployment of GEMS include the Microsoft Windows Server version and the machine hosting GEMS, available RAM, number of CPUs, Microsoft Lync Server version, Microsoft Exchange version, Microsoft SQL Server edition, and the roles and responsibilities of the IT staff supporting these servers and other vital components of your production network.

Microsoft Windows Server Considerations

Because GEMS uses Microsoft's Unified Communications Managed API (UCMA) to integrate Microsoft Lync with the GEMS Connect and Presence services, the OS version required to run GEMS Connect-Presence depends on the version of Microsoft Lync deployed. Per guidance from Microsoft, use the following guidelines to determine the version of MS Windows Server supported by GEMS Connect-Presence:

• For MS Lync 2010 deployments, use Windows Server in one of these 64-bit versions:
    o 2008 R2
    o 2008 R2 SP1
• For MS Lync 2013 deployments, use Windows Server in one of these 64-bit versions:
    o 2008 R2 SP1
    o 2012 R2
• To host the Push Notification Service (PNS) only, use Windows Server in one of these 64-bit versions:
    o 2012 R2
    o 2008 R2 SP1

Database Server

A relational database is required for the GEMS Connect and Push Notification services, but not the Presence service. This database can be part of your existing environment or newly installed. GEMS supports Microsoft SQL Server in the versions and editions listed below. In all cases, the database must be installed and prepared before starting GEMS installation. This means the necessary SQL scripts included in the GEMS installation zip file must be executed before beginning GEMS installation proper.
The following versions of MS SQL Server are supported:

• SQL Server 2008 (Express/Standard/Enterprise)
• SQL Server 2008 R2 (Express/Standard/Enterprise)
• SQL Server 2012 (Standard)
• SQL Server 2012 SP1 (Enterprise)

Microsoft has visual and command line tools to assist with database and schema creation, such as Microsoft SQL Server Management Studio or sqlcmd. Note that, although SQL Server Express is installed and set up with little effort, it has limited resources. For most enterprises, Microsoft SQL Server Standard or Enterprise editions are recommended.

Hardware

The recommended hardware specifications for each GEMS machine running any combination of the services offered are listed in the following table:

Component    Specification
CPU          4 vCPU
Memory       16 GB RAM
Storage      50 GB HDD

The specifications listed above are considered sufficient to handle the majority of use cases. Your specific enterprise environment, combined with your particular traffic and use requirements, is the key consideration in determining the actual hardware to implement. Hardware configurations used in testing by Good are listed in Appendix C.

Use Profile Definitions (per server instance) for Push (Mail) Notification

The Mail Push Notification service uses Exchange Web Services (EWS) to watch for messages sent and received. A user profile is characterized by the number of messages sent and received by a user in a typical eight-hour day.

Profile    Messages sent/received per mailbox per day    Activated devices supported per server
Light      50-100                                        40,000
Medium     100-200                                       20,000
Heavy      200-400                                       5,000

For details regarding the user profile used for scale testing, refer to the Microsoft Load Gen Profile to determine which profile best suits your needs.
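As a rough planning aid, the profile table above can be turned into a first-pass server count. The following is a minimal sketch, not a Good-supplied tool: the function names are hypothetical, the handling of boundary values (100 and 200 appear in two profiles) is an assumption, and the result does not include any spare instances for high availability.

```python
import math

# Activated devices supported per server, copied from the profile table above.
PUSH_DEVICES_PER_SERVER = {"light": 40_000, "medium": 20_000, "heavy": 5_000}


def classify_profile(messages_per_day: int) -> str:
    """Map messages sent/received per mailbox per day to a profile name.

    Assumption: boundary values fall into the lighter profile, anything
    under 50 counts as light, and anything over 400 counts as heavy.
    """
    if messages_per_day <= 100:
        return "light"
    if messages_per_day <= 200:
        return "medium"
    return "heavy"


def push_servers_needed(activated_devices: int, profile: str) -> int:
    """Smallest number of servers whose combined capacity covers the devices."""
    return math.ceil(activated_devices / PUSH_DEVICES_PER_SERVER[profile])


if __name__ == "__main__":
    profile = classify_profile(150)
    print(profile, push_servers_needed(45_000, profile))
```

For 45,000 activated devices on the Medium profile this yields three servers, since two servers cover only 40,000 devices.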
The results of testing conducted by Good¹ reveal:

Metric                  Medium Profile    Heavy Profile
GEMS CPU Utilization    7%                29%
GEMS IOPS               5                 4
SQL CPU Utilization     25%               25%
SQL IOPS                40                45

¹ Good lab test results are reported for the 90th percentile. The 90th percentile is a measure of statistical distribution. Whereas the median is the statistical value for which 50% of the actual results were higher and 50% were lower, the 90th percentile reports the value for which 90% of the data points are smaller and 10% are greater. 90th percentile performance metrics are obtained by sorting test result values in increasing order, then taking the largest value among the first 90% of entries.

Use Profile Definitions (per server instance) for Presence

Since Presence is exposed as a Good Dynamics Server-Side Service, it can be used for many applications and the load will vary depending on the characteristics of the application invoking the Presence service. Refer to the following table to gauge the load you can place on a server hosting the Presence service.

Profile    Active Devices (%)    Activated devices subscribed per server
Medium     20%                   40,000
Heavy      50%                   20,000

The Good Work client also uses the GEMS Presence service. Planning for a larger profile is recommended when sizing for a Good Work deployment due to the higher activity inherent in an email-centric application. The Heavy profile results reported here represent each active device subscribing to 100 contacts.

Metric                                     Heavy Contact Profile
GEMS Presence Service CPU Utilization      9.8%
GEMS Memory                                3.5 GB

The Presence service does not use SQL, so there is negligible disk I/O activity. Hence, only CPU and memory test results are reflected in the above use profile for Presence.

Use Profile Definitions (per server instance) for Connect

Here, a profile is characterized by the amount of activity generated by users against enterprise Lync deployments.
Profile    Active Devices (%)    Activated devices supported per server
Light      5%                    15,000
Medium     10%                   10,000
Heavy      15%                   5,000

The activity used for scale testing followed general guidelines published in the Microsoft Lync 2010 Capacity Planning for Mobility guidance, wherein a user has 60-80 contacts and each user initiates ≈4 IM sessions, each lasting ≈6 minutes, with 1 message sent every 30 seconds during a session. For a more detailed explanation of user profile and activity testing, see Microsoft Lync 2010 Capacity Planning for Mobility.

Resource Consumption During GEMS-Connect Load Tests (4-Core, 16 GB GEMS-Connect)

Profile    CPU    Memory    Disk IOPS
Light      55%    8.4 GB    0.000218 MBps/read, 0.000398 MBps/write
Heavy      70%    9.2 GB    0.016115245 MBps/read, 0.000379 MBps/write

Note: For 10,000 activated devices (containers) and a medium or average 10% concurrency, the DB size will be no more than 1 GB. IOPS is negligible.

General Performance for Connect, Presence, and Push (Mail) configured on the same machine

Due to the modular design of GEMS, you can configure and run all or any of the GEMS services on the same machine or on different machines. As with all distributed systems, performance will suffer without careful load balancing. One exception should always be made for production environments: do not run SQL Server on the same machine with other GEMS components.

For lighter loads, or fewer users (under 10,000), Connect, Presence and Mail Push Notifications can be configured to run on the same physical machine with a low or medium load as defined in the profiles above. Refer to the general performance outline below to determine the best configuration of (a) all services on the same machine or (b) using dedicated servers for each service to optimize performance for your particular traffic and load requirement(s).
Generally, the actual use profile per GEMS instance for most enterprises will fall somewhere between Light and Heavy. Light profile testing¹ conducted by Good on the recommended hardware configuration running all three services revealed the following metrics.

Metric                  Light Profile
GEMS CPU Utilization    60%
GEMS IOPS               17
SQL CPU Utilization     32%
SQL IOPS                55

¹ Again, Good lab test results are reported for the 90th percentile. The 90th percentile is a measure of statistical distribution. Whereas the median is the statistical value for which 50% of the actual results were higher and 50% were lower, the 90th percentile reports the value for which 90% of the data points are smaller and 10% are greater.

Docs Service Scalability Guidelines

The scalability of the Docs service is strongly influenced by maximum peak concurrency and end-user mobile usage patterns. Accordingly, Good's guidelines for scalability are based on three concurrency profiles: High, Medium, and Low. As a baseline for the "Medium" concurrency profile we assume a maximum peak concurrency of 10%, the figure used in Microsoft's Capacity Planning for Windows SharePoint Services guide, which is inclusive of both mobile and web traffic. We then conservatively assume that a "High" concurrency system will have greater mobile usage concurrency than Microsoft's guidelines, while the "Low" concurrency system will have lower mobile usage concurrency than Microsoft's guidelines. In practice, we do expect that mobile usage will have generally lower maximum peak concurrency than the overall SharePoint system, since the latter includes both mobile and web traffic.

Docs Server

Based on this approach and assumptions, the Docs Server scalability guidelines are set forth in the table below.
When planning their individual deployments, customers should measure their actual current SharePoint maximum peak concurrency and then use that as the baseline for determining which of these concurrency profiles best fits their environment.

GEMS Docs Service Scalability

Concurrency     # of users per server    Max concurrent users
High (12%)      33,000                   4,000
Medium (10%)    40,000                   4,000
Low (8%)        50,000                   4,000

Because GEMS can run multiple services on the same server, actual capacity is contingent on the collective services being deployed in a given environment. As more social capabilities are added and File Explorer is made to work across multiple Good Dynamics apps, customers who enable these features or use apps that leverage the file explorer service may see higher concurrency for the Docs service.

Docs Database

Good estimates that the initial database for large deployments will need 1 GB for data files and 1 MB for the Redo log, if auditing is turned on. Present estimates suggest that data files will grow by 2 GB per month for every 1000 users. Actual usage patterns will vary between organizations, so your actual usage should be monitored over the first few days of production and adjusted as necessary. Due to the modest hardware costs involved, Good advises allocating at least 10 GB of disk space for data files. Very favorable response times were experienced during testing with 7200 RPM disks. Additional recommendations include limiting auditing to all user operations except file browsing, and purging audit data periodically.

Note: In running performance tests we use simulation clients. These simulation clients open a connection to the Docs server on the recommended hardware profile and execute the same operations that a mobile device would execute: Upload, Download, Browse Files, Browse Folders, Delete Files, and Update Files. All these operations are performed at a variable, random time gap of 5 to 15 seconds.
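The pacing of such a simulation client can be sketched as follows. This is an illustration only, not Good's test harness: the operation names mirror the list in the note, while the uniform distribution of gaps, the equal weighting of operations, and the function name are assumptions. The sketch builds a schedule rather than sleeping, so it runs instantly.

```python
import random

# Operation names taken from the note above.
OPERATIONS = [
    "upload", "download", "browse_files",
    "browse_folders", "delete_file", "update_file",
]


def simulated_schedule(num_operations, seed=None, min_gap=5.0, max_gap=15.0):
    """Return a list of (elapsed_seconds, operation) pairs for one client.

    Assumption: gaps are drawn uniformly between min_gap and max_gap and
    every operation is equally likely; the source only states that the gap
    is variable and random between 5 and 15 seconds.
    """
    rng = random.Random(seed)
    elapsed = 0.0
    schedule = []
    for _ in range(num_operations):
        elapsed += rng.uniform(min_gap, max_gap)
        schedule.append((elapsed, rng.choice(OPERATIONS)))
    return schedule


if __name__ == "__main__":
    for when, operation in simulated_schedule(5, seed=1):
        print(f"{when:7.1f}s  {operation}")
```

Passing a seed makes a run repeatable, which helps when comparing results across test configurations.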
The test data uses files of 1 KB, 5 KB, 50 KB, 100 KB, 500 KB, 10 MB and 100 MB, with the total size of the data set being 1.34 GB. The SharePoint tests are performed on a SharePoint farm with two SharePoint 2013 servers talking to the same remote SQL Server 2008 server. The pseudo user profiles are added to Active Directory and divided into security groups with 100 users in each group. On the Docs server, the users/user groups are divided into 4 policies. The SQL Server used by the Docs server is on a remote machine.

Good Proxy Connections

From the perspective of the Good Proxy (GP) server, GEMS is an application server. Any traffic relayed from GEMS to the GP server will consume a concurrent connection session on the GP server. Consequently, it's important to understand how the individual services in the GEMS machine interact with the GP server.

• Connect: 1 active device requires 3 connections
• Presence: 1 active device requires 1 connection
• Push (Mail) Notification: 1 active device requires 1 connection for EWS

Scalability

GEMS scales linearly. For this reason, and given the specifications cited, you can create additional capacity by adding more GEMS machines. You will then need to scale out the database and Good Proxy resources accordingly to account for the additional capacity. See Scaling Factors below for best practices on utilization measurement.

High Availability

Hardware failure, data corruption, and physical site destruction all pose threats to GEMS services availability. You improve availability by identifying the points at which these services can fail. Increasing availability means reducing the probability of failure. Ultimately, availability is a function of whether a particular service is functioning properly. Think of availability as a continuum, ranging from 100 percent (a completely fault-tolerant system/service that never goes offline) to 0 percent (never available/never works).
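Returning to the per-device connection counts listed under Good Proxy Connections, the concurrent GP connections generated by a GEMS deployment can be estimated with simple arithmetic. The sketch below is illustrative: the function names are hypothetical, and the example load (including treating 20,000 push devices as active) is an assumed scenario rather than a measured one.

```python
# Good Proxy connections consumed per active device, from the list above.
GP_CONNECTIONS_PER_ACTIVE_DEVICE = {"connect": 3, "presence": 1, "push": 1}


def active_devices(activated: int, active_percent: int) -> int:
    """Active devices for a given activated population and concurrency (%)."""
    return activated * active_percent // 100


def gp_connections(active_by_service: dict) -> int:
    """Total concurrent GP connections for the given active devices per service."""
    return sum(
        GP_CONNECTIONS_PER_ACTIVE_DEVICE[service] * count
        for service, count in active_by_service.items()
    )


if __name__ == "__main__":
    # Assumed scenario: Medium Connect and Presence profiles from this guide,
    # plus 20,000 devices registered for push notifications.
    load = {
        "connect": active_devices(10_000, 10),   # 1,000 active devices
        "presence": active_devices(40_000, 20),  # 8,000 active devices
        "push": 20_000,
    }
    print(gp_connections(load))  # 3*1000 + 8000 + 20000 = 31000
```

An estimate like this gives a starting figure for sizing Good Proxy capacity when GEMS machines are added.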
Well-planned HA systems and networks typically have redundant hardware and software that makes them available despite failures. Well-designed high availability systems avoid single points of failure. Any hardware or software component that can fail has a redundant component of the same type. When failures occur, the failover process moves processing performed by the failed component to the backup component. This process remasters system-wide resources, recovers partial or failed transactions, and restores the system to normal, preferably within a matter of microseconds. The more transparent failover is to users, the higher the availability of the system.

In any event, you cannot manage what you cannot measure, so two planning elements are vital before anything else. The first is determining the hardware required to manage and deliver the IT services in question, the basis for which is outlined above. Adequately allowing for growth and measuring as accurately as possible the number of devices, traffic and load likely to be placed on GEMS and its services offers the best indication of the server hardware and supporting infrastructure likely to be required.

Concentrating solely on GEMS with Connect and Presence and its supporting architecture, the first objective in setting the goals of a high availability/disaster recovery (HA/DR) investment strategy is to develop a cost justification model for the expense required to protect each component. If the expense exceeds the value provided by the application and data furnished to the business, plus the cost to recover it, then optimizing the protection architecture to reduce this expense is an appropriate course of action. See High Availability (HA) below for a general discussion of HA options and alternatives.

Disaster Recovery

Your data is your most valuable asset for ensuring ongoing operations and business continuity.
Disasters, unpredictable by nature, can strike anywhere at any time with little or no warning. Recovering both data and applications from a disaster can be stressful, expensive, and time consuming, particularly for those who have not taken the time to think ahead and prepare for such possibilities. However, when disaster strikes, those who have prepared and made recovery plans survive with comparatively minimal loss and/or disruption of productivity.

Establishing a recovery site for failover if your primary site is struck by a disaster is crucial. Good recommends mirroring your entire primary site configuration at the DR site, complete with the provision for synchronous byte-level replication of your SQL databases. With synchronous replication, the replicated copy is up to date if the primary system fails. To avoid a "User Resync" situation, the replica must also be highly protected. See Disaster Recovery (DR) below for a discussion of Good's DR recommendations for GEMS.

Scaling Factors

The scale of your GEMS deployment is largely dependent on the size of your enterprise and its IT logistics: number of sites, distance between sites, number and distribution of mobile users, traffic levels, latency tolerance, high availability (HA) requirements, and disaster recovery (DR) requirements.

With respect to HA/DR, two elements must be considered: applications and data. Most commonly, though not exclusively, HA refers to applications; i.e., GEMS Connect and Presence. With clustering, there is a failover server for each primary server (2xN). DR focuses on both applications and data availability. The primary driver of your DR solution is the recovery time objective (RTO). RTO is the maximum time and minimum service level within which a business process must be restored after a disaster to avert an unacceptable break in business continuity.
Before contemplating the optimal number of servers to be deployed, however, it's wise to first determine the right size of an individual server to meet your enterprise's "normal use" profile. There are a number of methods for projecting a traffic and use profile. Actual, real-world measurement is recommended and made easy using built-in Windows Performance Monitoring tools. Notwithstanding the method applied, it is important to remember that GEMS performance is governed by two principal factors: CPU utilization and available memory, the former being somewhat more critical than the latter.

RTO and RPO

For GEMS deployment planning purposes, the first step in defining your HA/DR planning objective is to balance the value of GEMS and the services it provides against the cost required to protect it. This is done by setting a recovery objective. This recovery objective includes two principal measurements:

• Recovery Time Objective (RTO): the duration of time and a service level within which the business process must be restored after a disaster (or disruption) to avoid unacceptable consequences associated with a break in business continuity. For instance, the RTO for a payroll function may be two days, whereas the RTO for mobile communications furnished by GEMS to close a sale could be a matter of minutes.
• Recovery Point Objective (RPO): the place in time (relative to the disaster) at which you plan to recover your data. Different business functions will and should have different recovery point objectives. RPO is expressed backward in time from the point of failure. Once defined, it specifies the minimum frequency with which backup copies must be made.

Obviously, if resources were fully abundant and/or free, then everything could have the best possible protection. Plainly, this is never the case. The intent of HA/DR planning is to ensure that available resources are allocated in an optimum fashion.
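The two recovery measurements above translate directly into simple checks. The following minimal sketch uses hypothetical function names and example values; it assumes the worst-case data loss equals the interval between backup copies, which is one reading of "minimum frequency with which backup copies must be made."

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True if backups are frequent enough for the recovery point objective.

    Assumption: in the worst case a failure occurs just before the next
    backup, so the data lost spans one full backup interval.
    """
    return backup_interval <= rpo


def meets_rto(estimated_recovery: timedelta, rto: timedelta) -> bool:
    """True if the estimated time to restore service fits within the RTO."""
    return estimated_recovery <= rto


if __name__ == "__main__":
    # Example values only: a payroll function with a two-day RTO versus
    # mobile communications with an RTO measured in minutes.
    recovery_estimate = timedelta(hours=4)
    print(meets_rto(recovery_estimate, timedelta(days=2)))      # True
    print(meets_rto(recovery_estimate, timedelta(minutes=10)))  # False
    print(meets_rpo(timedelta(minutes=15), timedelta(hours=1))) # True
```

The same four-hour recovery estimate satisfies the payroll objective but not the mobile communications objective, which is why different business functions warrant different levels of protection.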
Physical Deployment

A production deployment of GEMS requires a clustered configuration, plus consideration given to integration with the Good Dynamics server infrastructure and with your existing enterprise systems. Here, it's important to understand the definition of a "GEMS cluster" and an "instance" within that cluster.

An "instance" is any individual deployment of GEMS, with any combination of services provided by its Java tier and its .NET tier. An instance of GEMS usually runs on one physical machine. However, this is not mandatory. The same physical machine could be used to deploy multiple instances of GEMS with service endpoints that listen on different ports.

A GEMS cluster is just a group of instances. Within a GEMS cluster, each instance is identical in that they all expose the same services and share a common database. Instances in a cluster can be considered "active / active" in that there is no concept of a "passive" instance used for failover. Even so, instances in a cluster never communicate with each other or synchronize data.

All GEMS instances in a cluster are homogeneous in that they all expose exactly the same service(s). This means that when an application is configured in the GC with a list of server endpoints, any of these server endpoints can be expected to provide the same service used by the application. This strategy also promotes ease of horizontal scale/replication, as well as ease of hardware failure correction by swapping in pre-built spares.

Simplest Deployment

The simplest production deployment of GEMS in a corporate network (depicted in the accompanying diagram) comprises:

• One Microsoft Lync Server and a Microsoft Exchange server deployed in a corporate network and one database.
• A single GEMS cluster made up of two physical instances (for failover). This cluster provides all services (Presence, Instant Messaging, Push Notifications and Exchange Integration) for all device clients.
• One Good Proxy server (GP) with affinity configured to both instances in the GEMS cluster, along with only one Good Control (GC) server.

Typical Deployment

Expanding on the simplest configuration, a typical deployment, adhering to generally accepted IT practices, offers high availability (HA) service access within data centers, rather than geographically distributed disaster recovery (DR) sites between data centers.

Here, there are two geographical regions (UK and US) to which GEMS clusters are deployed, furnishing device clients access to the services provided by GEMS. Two Microsoft Lync Pools are deployed, one in each geographical region. Device clients in each region are provided access to the Presence service and Connect (IM) service by a GEMS cluster configured to use the Microsoft Lync Pool infrastructure in that region.

There is only one GEMS cluster in the UK region (Cluster #1), and it provides the Presence and Connect services. Two GEMS clusters (Clusters #2 and #3) are deployed in the US Region. Cluster #2 provides the Presence and Connect services for device clients in the US Region, whereas Cluster #3 provides the Email (Push) Notification service for device clients in both regions. In this example, only two physical instances are required for HA.

As shown in the diagram, there is a separate GP Cluster deployed in each region. GP servers in each cluster are configured to have affinity to the GEMS cluster(s) used by device clients in their region. Only one GC cluster is necessary. It is deployed in the US Region and used by the proxy servers in both GP clusters for both regions.

High Availability (HA)

Availability is measured in terms of outages, which are periods of time when the system is not available to users.
Your HA solution must provide as close to an immediate recovery point as possible while ensuring that the time of recovery is faster than a non-HA solution. Unlike with disaster recovery, where the entire system suffers an outage, your high availability solution can be customized to individual GEMS resources and services. HA for GEMS means that the runtime and Service APIs for Push Notifications, Presence, and Connect are unaffected from the perspective of a device client whenever any instance of GEMS goes down or any of its services stop working.

GEMS-HA Design Principles

Services provided by GEMS instances should not differ in their approach to:

i. Even distribution of work over instances
ii. Detection of instance failure
iii. Reallocation of work for existing users

Hence, the following design principles are followed for all services:

• Shared Storage – Achieves HA/DR by adopting a shared storage model and, where possible, services provided by GEMS instances are stateless so that device clients can select any GEMS instance regardless of where they may have been previously connected.
• Client-Side Load Balancing – Clients know the list of server endpoints in a GEMS cluster (with affinity to their GP cluster) and service requests are evenly distributed to those server endpoints via client-side load balancing.
• Heartbeat – Services on each instance are responsible for reporting their own health in the shared database.
• Elected Health Watcher pattern – One instance in the cluster is chosen through an election algorithm to watch the health of all the others, and then centrally coordinate work load distribution in response to a failed instance. All instances can be watchers and the election algorithm provides failover for watchers.
• User tables in Shared Storage – To aid failover, the database can be used to determine which instance in a GEMS cluster is currently being used to handle work for which end users.
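The heartbeat, elected watcher, and user table principles above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the GEMS implementation: the guide does not specify the election algorithm, the heartbeat timeout, or the reassignment policy, so the "lowest healthy instance name wins" election, the 30-second timeout, and the least-loaded reassignment are all hypothetical choices, as are the class and function names.

```python
from dataclasses import dataclass, field

HEARTBEAT_TIMEOUT = 30.0  # seconds; hypothetical value, not from the guide


@dataclass
class SharedDb:
    """Stand-in for the shared database used by all instances in a cluster."""
    heartbeats: dict = field(default_factory=dict)  # instance -> last heartbeat time
    user_owner: dict = field(default_factory=dict)  # user -> instance handling the user

    def report_heartbeat(self, instance: str, now: float) -> None:
        """Each instance reports its own health."""
        self.heartbeats[instance] = now

    def healthy(self, now: float) -> list:
        """Instances whose last heartbeat is within the timeout, sorted by name."""
        return sorted(
            name for name, seen in self.heartbeats.items()
            if now - seen <= HEARTBEAT_TIMEOUT
        )


def elect_watcher(db: SharedDb, now: float):
    """Assumed election rule: the healthy instance with the lowest name wins."""
    healthy = db.healthy(now)
    return healthy[0] if healthy else None


def reassign_failed_work(db: SharedDb, now: float) -> dict:
    """Move users owned by unhealthy instances to the least-loaded healthy one."""
    healthy = db.healthy(now)
    if not healthy:
        return {}
    load = {name: 0 for name in healthy}
    for owner in db.user_owner.values():
        if owner in load:
            load[owner] += 1
    moved = {}
    for user in sorted(db.user_owner):
        if db.user_owner[user] not in load:
            target = min(healthy, key=lambda name: (load[name], name))
            db.user_owner[user] = target
            load[target] += 1
            moved[user] = target
    return moved
```

Because every instance can evaluate the same shared state, any healthy instance can take over as watcher if the current one stops reporting, which mirrors the failover-for-watchers property described in the list.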
HA for Instant Messaging

Instant Messaging (IM) is provided by the GEMS-Connect service to the Good Connect client.

Load Distribution

Client devices are aware of a list of endpoints (server instances in the same GEMS cluster) which they can contact for the Connect (IM) service. Each user session is kept up to date in the database, including which server instance is currently handling the user session. If a server instance receives a request for a user it has not yet served, it first looks for any sessions the user may have in the database, and may respond with a 503 referral to a different instance that is already holding a live session for that user. The Connect client cooperates by obeying referral responses.

Referral

Server instances can be marked "offline" in the database due to a heartbeat failure or because another instance in the GEMS cluster has determined that it is offline. If the server of record for the user is offline, the newly contacted server can adopt the user session, dynamically establish a session with Lync on behalf of the user, and then transition the user to the new server. The offline status of the server has no effect on requests being routed to it, but it prevents referral to it by other servers of incoming requests.

HA for Presence

The Presence Service provided by GEMS consists of an HTTP service called by device clients and a "Lync Presence Provider" that integrates with a Lync Pool deployment (.NET). GEMS clusters used for the Presence and Connect Services will be specialized for this purpose, even though they are capable of supporting Push Notifications and Exchange integration. Put another way, the Presence service is deployed with Connect following the Connect deployment pattern with the Lync infrastructure.
Load Distribution

Device clients can use any instance in the GEMS cluster to establish multiple different Presence subscriptions; for example, matching a list of Contacts, Email participants, or a GAL Search. Moreover, multiple instances in the GEMS cluster can all reuse the same Lync Presence subscription. Presence subscriptions are not long lived and they are not suitable for storage in a database. Instead, they are stored in a persistent cache shared by all instances in the GEMS cluster, where they readily expire. The persistent cache is used to maintain a timestamp for each subscription which is used by the Presence service to determine what new presence information to provide to the client on request.

HA for Push Notifications

The High Availability objective for Push Notifications is that device clients should be able to register once for Push Notifications, and not be impacted by servers that manage those notifications going up and down. Even though device clients can be expected to eventually resubscribe, the GEMS HA design does not depend on them doing so.

Incoming push registrations are directed at random to any instance of GEMS in a cluster. There is no affinity to server instances for device clients based on mailbox. If the push registration already exists in the shared database, then it is assumed that one instance in the cluster is already managing an EWS Listener subscription for that user. No new action needs to be taken, except to reset the watermark of the push registration in the database for aging purposes.

The EWS Listener Service on each instance of GEMS is responsible for periodically updating its own health status in the shared database. If the EWS Listener Service for any one instance fails to refresh its own health status within an expected time window, then it is considered down.
One instance in the cluster is elected as a "Watcher" for this condition, whereupon it is responsible for instructing another instance to take over (and recreate) EWS subscriptions for user mailboxes that were attributed to the dead instance. This is done by updating Push Registrations in the shared database to reflect the new instance upon which the EWS Listener Service should manage those user mailboxes. When the dead instance comes back, it is just another instance that is ready to manage new push registrations.

HA for the Docs Service

The Docs server/Docs Configuration Console for GEMS 1.4 is designed to be deployed in a wide-distribution, high-availability environment, ensuring continuous uptime and automatic workload distribution across multiple servers. Docs HA is implemented by leveraging the clustering capabilities of GD app servers. GEMS-Docs HA can also provide horizontal scalability. Currently, a single Docs server can support up to 5000 users and 600 concurrent connections.

Docs servers can be deployed either as single-server nodes or in a cluster. Individual servers deployed in a cluster can be assigned server affinity. Clusters and affinities are configured in Good Control. However, the GD runtime does not implement server selection for Docs servers. Instead, the Good Work app is given access to a structured representation of the Docs server cluster configuration, a structure that includes server affinity, addresses, and port numbers.

Basic Deployment

The simplest (POC) deployment configuration for the Docs server via the requisite GEMS-GD infrastructure comprises a single server connecting to a remote SQL Server instance and document repositories.
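Because server selection for Docs is left to the client app rather than the GD runtime, it may help to picture how an app could choose a server from the cluster configuration it receives. The sketch below is an assumption-laden illustration only: the `DocsServer` structure, the `pick_docs_server` function, the host names, and the convention that a lower priority value means "preferred" are all hypothetical and do not describe the actual Good Work implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class DocsServer:
    """Hypothetical entry in the cluster configuration given to the app."""
    host: str
    port: int
    priority: int  # assumption: lower value = higher affinity/preference


def pick_docs_server(servers: Iterable[DocsServer],
                     is_reachable: Callable[[DocsServer], bool]
                     ) -> Optional[DocsServer]:
    """Return the most preferred reachable server, or None if all fail.

    `is_reachable` is a caller-supplied probe (for example, a connection
    attempt); it is injected here so the sketch stays self-contained.
    """
    for server in sorted(servers, key=lambda s: s.priority):
        if is_reachable(server):
            return server
    return None


if __name__ == "__main__":
    cluster = [
        DocsServer("docs2.example.com", 8443, priority=2),
        DocsServer("docs1.example.com", 8443, priority=1),
        DocsServer("docs3.example.com", 8443, priority=3),
    ]
    # Pretend the preferred server is down.
    down = {"docs1.example.com"}
    chosen = pick_docs_server(cluster, lambda s: s.host not in down)
    print(chosen.host if chosen else "no Docs server reachable")
```

Keeping the reachability probe separate from the ordering logic makes it easy to test the selection rule without any network access.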
HA Deployment

In addition to its supporting GEMS-GD infrastructure, a high-availability deployment of the Docs service consists of:

- 3 Docs servers
- remote SQL Server instance
- document repositories

All three Docs server instances access the same remote SQL database, which should ideally be installed in a SQL cluster. Designate a "master" Docs server for the performance of administrative tasks. Because the values configured are recorded in the database, they are automatically mirrored to the other server instances. The list of Docs servers must then be entered into Good Control. Each Good Work client will then receive the list and initiate a connection to a Docs server based on the designated priority/affinity.

Consult the latest edition of the GEMS Installation and Configuration Guide for comprehensive guidance.

HA Failover Process/Behavior Summary

GEMS can scale horizontally and offers N+1 redundancy. This offers the advantage of failover transparency in the event of a single component failure. The level of resilience is referred to as active/active (a.k.a. "hot") as backup components actively participate with the system during normal operation. Failover is generally transparent to the user as the backup components are already active within the system.

In adequately configuring your GEMS components for this redundancy, the following measures must be taken:

1. Configure additional GEMS machines in a cluster to use the same underlying SQL Server database. This is done through the GEMS Dashboard.
2. Configure the additional GEMS Hosts in Good Control. This configuration can happen in two locations within Good Control, depending on your deployment model.

Once configured, each client receives a list of supported GEMS servers during app start up of Good Connect/Good Work. The client will then choose a server from the list at random and continue to utilize that server for the life of the user's session.
A session constitutes an active login with the system and persists until the user manually signs out or a 24-hour period (configurable) elapses, whichever comes first. Should one server fail, the client will retry additional servers from the list until it can successfully log in. Any existing active user session will be seamlessly transferred to the new server.

Detailed HA configuration steps are available in the GEMS Installation and Configuration Guide.

Additional HA Considerations

After adding servers for HA, each client must update its policies in order for it to be aware of the new systems. Policy updates are automatically performed each time the client is launched or a new policy is detected. However, the update could be delayed if the Good Control (GC) server is overburdened with update requests. As of the current release, each GC server can process two policy updates per second. Thus, it is important to scale your GC servers to match your policy update requirements. If you are using server affinities, these settings will need to be adjusted to account for the new servers.

Disaster Recovery (DR)

DR is different from High Availability among instances of a GEMS cluster in that an entire cluster in one region has become unavailable and device clients need to be redirected to a GEMS cluster in a different region providing the same services.

The DR model for a GEMS cluster in a data center is to have another identically configured GEMS cluster in a different (failover) data center that shares the same storage through a replication strategy provided by the vendor of the database and file system. This is the same strategy prescribed by Good Dynamics for disaster recovery of a GC cluster.

Diagrammed below is a typical DR pattern using Primary and Standby data centers.
Although the GEMS cluster depicted focuses on Presence and Connect, the pattern should be identical for a GEMS cluster used for Push Notifications and Exchange integration.

Note: Virtual IP (VIP) is commonly employed by IT for failover, although, in the case of GEMS services, this is not mandated by Good Dynamics. This is because GP Clusters already have a "primary" and "secondary" configuration with respect to an application managed in Good Control. In the context of GEMS-Connect being the target app server, however, it is strongly recommended that you VIP the primary Good Proxy server in your primary GP cluster since Connect does not cache the list of GPs from the getGPServer list. Hence, if the primary GP to which you feed this API call goes down, the whole primary cluster is not seen and is effectively down, also rendering Connect in a down state. Moreover, there is no failover to a secondary cluster. Indeed, the only benefit a secondary cluster provides for a Connect configuration is scalability, not failover. Only failover within the primary cluster is supported, provided the primary GP (GD_HOST) is still up. See the recommended Connect-Specific DR Scenario.

A Load Balancer with Virtual IP is used to route device traffic to a GP cluster in the primary datacenter. This GP cluster has affinity to the GEMS instances in a GEMS cluster likewise located in the primary datacenter.

The Load Balancer is responsible for periodic health checks of the GEMS cluster in the primary datacenter. If the health check fails, then the Load Balancer initiates failover to the GEMS cluster in the standby datacenter. Device clients are then routed to a GP cluster with affinity to server instances in that GEMS cluster.

The database in the standby datacenter is replicated from the production database in the primary datacenter.
However, any state (such as Presence subscriptions and active Lync conversations) would be lost and must be recovered as clients submit subsequent requests.

Good Control server instances in both the standby datacenter and the primary datacenter are in the same GC cluster because they all use replicas of the same shared storage. The only difference is that GC server instances in the standby datacenter have affinity with the GP cluster in the same datacenter.

When the Health Check indicates that the primary datacenter is available once again, the Load Balancer will initiate failover back to the GEMS cluster in the primary datacenter.

With respect to push notifications, when a DR failover happens, device clients must resubscribe using the Push Notification Service (PNS) provided by the GEMS cluster in the standby datacenter. There is no expectation that EWS Listener subscriptions for existing users will be automatically recreated.

DR Failback Process/Behavior

Assuming the DR site is properly configured, failover should be transparent to the end user. As noted earlier, the client is aware of multiple GC, GP, and GEMS servers with which it can connect. In the event that the primary site goes offline, GEMS clients will try to connect to the services in the secondary site.

General Scenario

Before failing back, you must make sure that the secondary database is synchronized with the primary database. Update the DNS accordingly to remap infrastructure resources. From a client perspective, the user may need to quit and relaunch the app. In most cases, however, the process will be transparent to the end user, and the app will reconnect to the primary resources once they come back online.

Connect-Specific DR Scenario

The scenario recommended for Connect is an active/passive arrangement in which one site serves as a standby backup site should the primary site go down.
In this scenario the entire system is duplicated with the only shared component being the Good Control cluster. This means separate AD, SQL Server, Lync, and GEMS clusters. This of course assumes that you already have a separate Lync disaster recovery plan in place, which may mean manually moving impacted Lync users over to a backup Lync site.

In the scenario pictured above:

- Secondary sites should replicate the primary Good Dynamics database.
- The sites should NOT replicate the primary "GoodConnect" database in GEMS.

Consequently, in the event of the primary system going down, users will fail over to the backup site, routed through the backup Good Proxy servers for login to the backup GEMS servers, thus establishing a new Lync session, presuming impacted Lync users have been successfully migrated over. All existing conversations will still reside on the device. The only data potentially lost will be "offline messages", i.e., messages that are temporarily stored on the server while a user's device is in the background during an active conversation.

Phased Approach Recommendation

Clearly, the key to a successful GEMS disaster recovery event is proper planning. To this end, the following phased approach is recommended:

Phase 1 – Ensure and verify that all services are working properly in the primary site before introducing DR.
Phase 2 – Independent of GEMS, test and verify that the infrastructure is set up properly in the secondary site. This includes, but is not limited to, AD, SQL, and Lync.
Phase 3 – Add additional GC, GP, and GEMS machines in the secondary site as appropriate.
Phase 4 – Update configuration to include new GC, GP, and GEMS machines.
Phase 5 – Test a Failover/Failback.

Deployment with Good Dynamics

A number of factors bear consideration in appropriately deploying GEMS services with an existing or newly established GD infrastructure.
Network Separation

Good Control instances in a GC cluster do not need to be reachable by GEMS instances in a GEMS cluster. This may be desirable to an IT administrator since GC instances could be installed with high privilege service accounts to perform Kerberos Constrained Delegation (KCD) and may hold sensitive security tokens. In such cases, GC clusters and GEMS clusters can be deployed in different network zones separated by a trust boundary.

Server Instance Configuration in Good Control

Device clients are able to access GEMS instances in a GEMS cluster because each individual network endpoint for each instance in the cluster has been configured in a "Server List". This is the list of endpoints provided to a device client identified by its application ID. For example, a device client activated with a deployment of Good Control as configured below would be presented with three network endpoints to use for access to Services in a GEMS cluster.

Not shown here is the ability to associate user groups to each network endpoint. This permits assignment of users to a GEMS cluster accessed via the GP cluster in their region, as described earlier.

These network endpoints configured in the GC do not reflect any physical deployment topology for the actual server instances. IT departments rely on separate infrastructure for routing within the enterprise and across sites. In fact, an IT department may employ VPN, Router, Load Balancer or other infrastructure configuration behind each of these device-facing network endpoints. Note also that network endpoints configured in this way are implicitly whitelisted by the GC.

Server-Side Services

Service names for each service provided by GEMS are registered on the Good Dynamics Network along with a service definition. An "application" is then created in Good Control and has bound to it one or more Service Definitions.
In the example below there is an "application" called "com.g3.good.presence" and it has been bound to one server-side service called "G3 Presence Service". Note that the application concept here does not represent an app on a device. Rather, it is a construct that can be used to entitle user and group access to the service(s) that are bound to it.

Now, when a user who is entitled to this Application ID uses any GD application on their device, the device client is informed of this server-side service, plus all the network endpoints for it (via the "Application" entitlement in the GC), as illustrated above in Server Instance Configuration in Good Control.

Conclusion

In the most optimistic scenario, practically speaking, a GEMS cluster exposes all GEMS services and has two physical instances for failover, which makes for a simple system to manage. However, in large enterprises, IT organizations typically choose to deploy GEMS in a manner consistent with their existing enterprise systems, matching how Microsoft Lync and Exchange are deployed.

The deployment architecture and HA design principles for GEMS are, in essence, identical to those of Good Dynamics. This consistency becomes increasingly necessary as GEMS seeks to provide the runtime environment for GD Server-Side Services, and ultimately to replace the Application Server runtime environment for Good Control.

Appendix A – Upgrading from Good Connect Classic

Good Enterprise Mobility Server (GEMS) with Connect and Presence (CP) services is built on a different platform than the classic Good Connect server. As a result, there is no direct upgrade path from the classic Good Connect server to GEMS with Connect and Presence. For existing classic Good Connect server environments, please review the guidance that follows when upgrading to GEMS with Connect and Presence.
The guidance found here covers two of the most common upgrade scenarios. It is not intended to be a step-by-step upgrade procedure, but rather a general overview of the process as a whole. Knowledge of the classic Good Connect server is required. Where appropriate, cross-references to more detailed instructions are indicated.

Upgrade Scenario 1: Parallel Server (Recommended)

In this scenario a new server is provisioned for GEMS with Connect and Presence to run in parallel with the existing classic Good Connect Server. The benefit is that no service interruption is required on the existing Good Connect system while GEMS is deployed. The parallel server upgrade environment can be generally depicted as follows:

Pertinent Considerations in this Scenario

Good Dynamics
We recommend that you upgrade Good Control to v1.7.38.19 and Good Proxy to v1.7.38.14 in preparation for the installation of GEMS.

Service Account
The service account used for the classic Good Connect server can also be used for GEMS.

Database
A new schema (Oracle) or database (MS SQL) will need to be created for use by the new GEMS installation.

Microsoft Lync Configuration
Your existing classic Good Connect Lync application pool can be reused. However, the new GEMS machine must be added as a Trusted Application computer. If you are planning to use the Presence service as well, an additional Application ID will need to be created. Please see the GEMS Installation and Configuration Guide for details.

GEMS Host Machine SSL/TLS Certificate
The new GEMS machine will need its own (unique) SSL/TLS certificate. Please see the GEMS Installation and Configuration Guide for additional detail regarding setting up the SSL/TLS certificate.

Good Control Configuration
The “Good Connect” application configuration in Good Control will need to be updated to include the new GEMS-Connect service.
Caution: To minimize interruption to production users, Good Connect server affinities should be set up prior to updating the Good Connect application configuration. It is recommended that you set up two policies: one with user affinity to the classic Good Connect server, and another with affinity to GEMS-Connect. When you schedule your users to be switched over to the new server, make sure you ask them to sign out of their Connect client prior to the maintenance window.

Verification/Testing
Verify that clients can connect to the GEMS-Connect service. This can be done by assigning a user to a policy that contains the new GEMS-Connect service.

Moving Users
After testing is complete, all users can be moved to GEMS by updating the user’s policy set. Specifically, update the server affinity to point to the new GEMS machine. As mentioned above under Good Control Configuration, it is also recommended that when you schedule users to be switched over to the new server, you ask them to sign out of their Connect client prior to the maintenance window.

Classic Good Connect Server
After all users have been moved to the new GEMS machine, the old classic Good Connect server can be decommissioned or repurposed.

Upgrade Scenario 2: Repurpose Existing Server

In this scenario the existing classic Good Connect server will be repurposed for GEMS. As pointed out previously, a direct upgrade on the same machine running classic Good Connect is not possible. The existing classic Good Connect server software must be uninstalled before the GEMS software is installed. The benefit of this approach is that a new server is not needed. This means, however, that service on your production Good Connect server will be interrupted.
The existing server upgrade environment can be generally depicted as follows:

Pertinent Considerations in this Scenario

Good Dynamics
We recommend that you upgrade Good Control to v1.7.38.19 and Good Proxy to v1.7.38.14 in preparation for the installation of GEMS.

Service Account
The service account used for the classic Good Connect server can be used for GEMS.

Database
You will need to run the DDL/DML database scripts for Oracle or MS SQL to reset the schema or database used by the GEMS product.

Microsoft Lync Configuration
The existing classic Good Connect Lync application pool and Trusted Application Computer can be reused. Again, if you are planning to use the Presence service, an additional Application ID will need to be created. See the GEMS Installation and Configuration Guide for details.

GEMS Host Machine SSL Certificate
If the FQDN of the server did not change, the existing SSL certificate can be reused; however, if you are planning to use the Presence service, the certificate will need to be updated with a SAN to include the Presence service App ID. Consult the relevant section in the GEMS Installation and Configuration Guide for additional instructions.

Good Control Configuration
If the FQDN of the server did not change, then the “Good Connect” application configuration in Good Control can remain the same. Please ask users to sign out of Good Connect prior to the upgrade since their temporary session information on the server will be lost during the upgrade process.

Verification/Testing
Verify that both existing and newly provisioned clients can connect to the GEMS-Connect service.

Appendix B – Migrating Your Good Share Database to GEMS-Docs

A Good Share deployment can migrate/repurpose its database for the GEMS-Docs service to support existing user transition from the Good Share client to Good Work.
First, however, GEMS and the Docs Configuration Console must be installed in accordance with the guidance offered in the GEMS Installation and Configuration Guide for Administrators.

Client App Support Considerations

The following limitations must be considered in determining whether or not a migration is advisable:

- Good Share clients communicate with the Good Share server only; they are not supported by the GEMS-Docs service.
- Good Work Docs communicates with the GEMS-Docs service only; it is not supported by the Good Share server.

Given these inherent limitations, it is recommended that you continue to run your deployed Good Share servers in parallel with the GEMS-Docs service for a duration sufficient to conveniently transition your users from Good Share to Good Work. This is possible due to the common database schema shared between the GEMS 1.4 Docs service and Good Share server. Indeed, both can connect to the same database instance. Eventually, once all Good Share users have switched to Good Work, you can decommission your Good Share server deployment.

Hence, there are two migration scenarios you will want to consider:

(1) Migrating with continued Good Share support
(2) Migrating to Good Work only (no Good Share client support)

Each is covered in turn here.

Migrating with Continued Support for Good Share

As more fully discussed in the GEMS Installation and Configuration Guide, the GEMS-Docs service requires installation of the Docs Configuration Console. For purposes of migration/upgrade from Good Share server, the Docs Configuration Console must not be installed on the same machine on which Good Share server and/or the Good Share Web Console are running. You can, however, install both the GEMS-Docs service and the Docs Configuration Console on the same machine, provided it meets all system/hardware requirements.
To migrate to GEMS-Docs while continuing to support Good Share clients:

1. Install the GEMS-Docs Service in accordance with the procedure enumerated in the GEMS Installation and Configuration Guide.
   Note: If you are using Windows Authentication for the database, Good Technology Common Services must run under a user who has access to the Good Share database.
2. Install the Docs Configuration Console on a different machine than the one running either Good Share server or the Good Share Web Console. When prompted by the installer, enter information for the database currently being used by Good Share.
3. Launch the GEMS Dashboard, click on Docs, then click on Database and select the database being used by Good Share.
   Upon completion of Step 3, both the GEMS-Docs service and Good Share server should now be functional and sharing the same data. This means that policies, users, and data sources previously configured for Good Share should all be available in GEMS-Docs. Logged audit data continues to be available and reports can be generated from either the Good Share Web Console or the Docs Configuration Console.
4. When all Good Share users have switched to Good Work and Good Share clients are no longer being used, you can safely uninstall Good Share server and the Good Share Web Console.

Migrating to Good Work Only

If there is no need to support both Good Work and Good Share at the same time (i.e., concurrently), then the machine(s) used for Good Share can be repurposed in accordance with the following steps:

1. Install the GEMS-Docs Service in accordance with the procedure enumerated in the GEMS Installation and Configuration Guide. Again, if you are using Windows Authentication for the database, Good Technology Common Services must run under a user who has access to the Good Share database.
2. Uninstall Good Share server and the Good Share Web Console but do not remove the database.
3. Install the Docs Configuration Console. When prompted by the installer, enter information for the database previously used by Good Share.
4. Launch the GEMS Dashboard, click Docs, then click Database, and here also select the database previously used by Good Share.

Upon completion of Step 4, all previously configured policies, users, data sources and settings are now available to the GEMS-Docs service and configurable in the Docs Configuration Console.

Noteworthy Feature Differences (GEMS-Docs versus Good Share)

The following feature changes will be noticed when comparing GEMS-Docs to Good Share server:

- Open-in application list is now managed in the Good Control application policy for Good Work. Any Open-in lists created in Good Share must now be added in Good Control.
- Keep in-sync feature is not supported.
- Permissions in data sources not supported:
  - Allow Native email
  - Print
  - Open in
- KCD is not supported in GEMS-Docs 1.4.
- Security settings no longer supported:
  - Allow playing of media files – iOS only (stored outside of the secure container during playback)
  - Enable device to remember user password
  - Display event information for calendar alerts
  - Force user to save Pending Uploads

Appendix C – Hardware Used for Testing GEMS

The following computer hardware was used for PSR validation.
Component | Processor | Memory | OS
EWS Push (Mail) Notification | AMD Opteron 6234 2.4 GHz – 4vCPU | 16 GB | Microsoft Windows Server 2008 R2 Enterprise 64 bit
Connect | AMD Opteron 6234 2.4 GHz – 4vCPU | 16 GB | Microsoft Windows Server 2008 R2 Enterprise 64 bit
Presence | AMD Opteron 6234 2.4 GHz – 4vCPU | 16 GB | Microsoft Windows Server 2008 R2 Enterprise 64 bit
Connect, Presence, and EWS Push (Mail) configured on the same machine | AMD Opteron 6378 2.39 GHz – 4 cores Virt | 16 GB | Microsoft Windows Server 2008 R2 Enterprise 64 bit
SQL Server for GEMS | AMD Opteron 6234 2.4 GHz – 4vCPU | 8 GB | Microsoft Windows Server 2008 R2 Enterprise 64 bit / MS SQL Server 2008 R2

Note: This hardware profile was used for all GEMS PSR testing. All service configurations were tested running SQL Server on a separate machine.

Legal Notice

This document, as well as all accompanying documents for this product, is published by Good Technology Corporation (“Good”). Good may have patents or pending patent applications, trademarks, copyrights, and other intellectual property rights covering the subject matter in these documents. The furnishing of this, or any other document, does not in any way imply any license to these or other intellectual properties, except as expressly provided in written license agreements with Good. This document is for the use of licensed or authorized users only. No part of this document may be used, sold, reproduced, stored in a database or retrieval system or transmitted in any form or by any means, electronic or physical, for any purpose, other than the purchaser’s authorized use without the express written permission of Good. Any unauthorized copying, distribution or disclosure of information is a violation of copyright laws. While every effort has been made to ensure technical accuracy, information in this document is subject to change without notice and does not represent a commitment on the part of Good.
The software described in this document is furnished under a license agreement or nondisclosure agreement. The software may be used or copied only in accordance with the terms of those written agreements. The documentation provided is subject to change at Good’s sole discretion without notice. It is your responsibility to utilize the most current documentation available. Good assumes no duty to update you, and therefore Good recommends that you check frequently for new versions. This documentation is provided “as is” and Good assumes no liability for the accuracy or completeness of the content. The content of this document may contain information regarding Good’s future plans, including roadmaps and feature sets not yet available. It is stressed that this information is non-binding and Good creates no contractual obligation to deliver the features and functionality described herein, and expressly disclaims all theories of contract, detrimental reliance and/or promissory estoppel or similar theories.

Legal Information

© Copyright 2015. All rights reserved. All use is subject to license terms posted at www.good.com/legal. GOOD, GOOD TECHNOLOGY, the GOOD logo, GOOD FOR ENTERPRISE, GOOD FOR GOVERNMENT, GOOD FOR YOU, GOOD APPCENTRAL, GOOD DYNAMICS, SECURED BY GOOD, GOOD MOBILE MANAGER, GOOD CONNECT, GOOD SHARE, GOOD TRUST, GOOD VAULT, and GOOD DYNAMICS APPKINETICS are trademarks of Good Technology Corporation and its related entities. All third-party technology products are protected by issued and pending U.S. and foreign patents.