Beginners notes
Shivkanth Rohith
www.aryanrohith.cjb.net

Beginners notes
Foundation in ITIL Service Management

This document is a “Bonus” for Six Sigma Management Kit owners.

The Office of Government Commerce in the United Kingdom owns the ITIL content. However, the use of ITIL processes is permitted and encouraged. More information on ITIL can be found at www.itil.co.uk

The Art of Service (owners of this document) have a licensing arrangement in place with Ovitz Taylor Gates that permits the supply of this document with the Six Sigma Management Kit.

The full ITIL toolkit can be purchased by visiting: www.itil-toolkit.com
e-Learning courses: www.itsm-learning.com

The Art of Service support the purchase of further material supplied by The Stationery Office (a commercial outlet for Her Majesty’s Stationery Office (HMSO)). Such material can be purchased by visiting: www.itsmdirect.com.

This document has been modified slightly.

©The Art of Service Pty Ltd 2001, 2002.

1 Table of Contents

2 Start Here
3 Foundation Certificate in IT Service Management
3.1 EXIN Exam requirements specifications
3.1.1 The importance of IT Service Management and the IT Infrastructure
3.1.2 The Service Management processes and the interfaces between them
4 IT Service Management
4.1 Introduction into IT Service Management
4.1.1 ITIL Service Management
4.1.2 Business Alignment
4.1.3 Processes
4.1.4 Function versus process based
5 ITIL
5.1.1 History
5.1.2 Groups involved
6 Implementing ITIL Service Management
6.1 Introduction
6.2 Cultural change
6.3 Some of the do’s and don’ts
6.4 Further reading
7 The ITIL Service Management Processes
7.1 Service Delivery Set
7.1.1 Service Level Management
7.1.2 Financial Management for IT Services
7.1.3 Availability Management
7.1.4 Capacity Management
7.1.5 IT Service Continuity Management
7.2 Service Support Set
7.2.1 Service Desk
7.2.2 Incident Management
7.2.3 Problem Management
7.2.4 Change Management
7.2.5 Release Management
7.2.6 Configuration Management
8 Tools
8.1.1 Type of tools
8.1.2 The Cost of a Tool
9 Security Management
9.1 Introduction
9.1.1 Basic concepts
9.2 Objectives
9.2.1 Benefits
9.3 Process
9.3.1 Relationships with other processes
9.3.2 Security section of the Service Level Agreement
9.3.3 The security section of the Operational Level Agreement
9.4 Activities
9.4.1 Control - Information Security policy and organisation
9.4.2 Plan
9.4.3 Implement
9.4.4 Evaluate
9.4.5 Maintenance
9.4.6 Reporting
9.5 Process control
9.5.1 Critical success factors and performance indicators
9.5.2 Functions and roles
9.6 Problems and costs
9.6.1 Problems
9.6.2 Costs

© The Art of Service Pty Ltd 2002
‘All of the information in this document is subject to copyright. No part of this document may in any form or by any means (whether electronic or mechanical or otherwise) be copied, reproduced, stored in a retrieval system, transmitted or provided to any other person without the prior written permission of The Art of Service Pty Ltd, who owns the copyright.’

2 Start Here

This document is designed to answer many of the questions about IT Service Management and the ITIL Framework. The document has evolved over many years and offers the reader the chance to learn quickly, through reading and re-reading, a lot of the theory behind ITIL (IT Infrastructure Library). It provides answers, but it will also raise some questions for the reader. It is a beginner’s document. It tells stories.

3 Foundation Certificate in IT Service Management

Let’s begin with the end in mind. A lot of readers are interested in achieving certification in ITIL. This document, combined with online learning, will prepare you to sit for the “IT Service Management Foundations” exam.
This chapter discusses the examination that is set by EXIN. The exam is set and marked by this independent body. You can book and take your exam at any Prometric Testing Centre (see www.prometric.com to locate a Test Centre in your area).

3.1 EXIN Exam requirements specifications

3.1.1 The importance of IT Service Management and the IT Infrastructure

The candidate understands the importance of IT Service Management and the IT Infrastructure Library (ITIL). The candidate is able to indicate the importance of a methodical and systematic approach to information technology service:
• for users and customers of IT Service
• for suppliers of IT Service.

3.1.2 The Service Management processes and the interfaces between them

The candidate understands the Service Management processes and the interfaces between them. The candidate is able to:
• Mention the benefits of the Service Management processes for an organisation
• Distinguish between ITIL processes and organisational units
• Indicate which elements are needed for the ITIL processes.

4 IT Service Management

4.1 Introduction into IT Service Management

Most organisations now understand the benefits of having Information Technology (IT) throughout their corporate structure. Few realise the potential of truly aligning the IT department’s objectives with the business objectives. More and more organisations are starting to recognise IT as being crucial to the service delivery to their customers. When the IT services are crucial to the organisation, you need to be absolutely positive that the IT group adds value and delivers consistent services.

With this in mind as the ultimate goal for the IT organisation, we should look at the organisation’s objectives. To achieve these overarching organisational objectives, the organisation has business processes in place. These business processes can be anything: sales, admin support, financial processes, etc.
Information systems and technology are fundamental to achieving these business objectives, because they enable the activities to be carried out in an effective and efficient manner.

Historically, these processes delivered products and services to clients in an off-line environment (the ‘brick-and-mortar’ companies). The IT organisation provided support to the back-office and admin processes. IT performance was measured internally, as the external clients were only indirectly influenced by the IT performance.

Today, with online service delivery, the IT component of the service delivery can be much stronger. The way of delivering the service is IT based, and therefore both internal and external clients measure the performance of the IT group.

Consistent service delivery is more important than an occasional glimpse of brilliance. The internal clients (business processes) and external clients need the IT services to be available and need to be able to expect consistent performance.

IT Service Management is a means to enable the IT group to provide reliable Information Systems to meet the requirements of the business processes, irrespective of the way these services are delivered to the external customers. This in turn enables the organisation to meet its Business Objectives.

Definition: IT Service Management provides effective and efficient process driven management of the quality of IT services.

4.1.1 ITIL Service Management

Any organisation that delivers IT services to its customers with the goal of supporting the business processes needs some sort of structure to achieve that. Historically, that structure was based around functions and technical capabilities. Today, with the ever-increasing speed of change and the need for flexibility, that is no longer an option.
That is why IT organisations are looking for alternatives:
• TQM processes and continuous improvement projects
• Cobit as a control mechanism
• CMM for control and structure in software (and system) development
• ITIL for operational and tactical management of service delivery

Which framework, model or tool you use depends heavily on the company: ‘horses for courses’ is the adage you need to keep in mind.

For many IT organisations, ITIL is a very good way of managing service delivery and of performing the IT activities in end-to-end processes.

Further reading on other models and frameworks:
Cobit: http://www.isaca.org/cobit.htm
CMM: http://www.sei.cmu.edu/cmm/cmm.html
EFQM: http://www.efqm.org/new_website/
SixSigma: http://www.ge.com/sixsigma/
Deming: http://www.deming.org
The Business Balanced ScoreCard:
British Standards Institution: http://www.bsi.org.uk
The Balanced scorecard: http://www.balancedscorecard.org/basics/bsc1.html

4.1.2 Business Alignment

By implementing IT Service Management in your IT organisation you support the IT objective of delivering those services that are required by the business. You can’t do this without aligning your strategy with the business strategy. You can’t deliver effective IT services without knowing about the demands, needs and wishes of your customer. This is why IT Service Management supports the IT organisation in the business alignment of its IT activities and service delivery.

4.1.3 Processes

IT Service Management helps the IT organisation to manage the service delivery by organising the IT activities into end-to-end processes. These processes cross the functional areas within the IT group and improve efficiency.

A process is a series of activities carried out to convert an input into an output. We can associate the input and output of each of the processes with quality characteristics and standards to provide information about the results to be obtained by the process.
This produces chains of processes which show what goes into the organisation and what the result is, as well as monitoring points in the chains to monitor the quality of the products and services provided by the organisation.

Processes can be measured for effectiveness (did the process achieve its goal?) and efficiency (did the process use the optimum amount of resources to achieve its goal?). The measurement points are at the input, the activities or the output side of the process.

The standards for the output of each process have to be defined such that, if each process complies with its process standard, the complete chain of processes meets the corporate objective. If the result of a process meets the defined standard, then the process is effective. If the activities in the process are also carried out with the minimum required effort and cost, then the process is efficient. The aim of process management is to use planning and control to ensure that processes are effective and efficient.

4.1.4 Function versus process based

Most businesses are hierarchically organised. They have departments, which are responsible for a group of employees. There are various ways of structuring departments, for example by customer, product, region or discipline. IT services generally depend on several departments, customers or disciplines. For example, if there is an IT service to provide users with access to an accounting program on a central computer, this will involve several disciplines. The computer centre has to make the program and database accessible, the data and telecommunications department has to make the computer centre accessible, and the PC support department has to provide users with an interface to access the application.

Processes that span several departments can monitor the quality of a service by monitoring certain aspects of quality, such as availability, capacity, cost and stability.
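The effectiveness and efficiency tests described for processes, applied to a quality aspect such as availability, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `ProcessMeasurement`, its field names and the idea that a "minimum cost" figure is known in advance are all hypothetical and do not come from ITIL, and the sketch assumes that a higher measured value is better.

```python
from dataclasses import dataclass


@dataclass
class ProcessMeasurement:
    """One measurement of a process; every field name here is illustrative."""
    process: str
    measured_output: float  # quality measured at the output side, e.g. availability in %
    agreed_standard: float  # the defined standard for that output (higher is better)
    actual_cost: float      # effort and cost actually spent
    minimum_cost: float     # assumed estimate of the minimum effort and cost required


def is_effective(m: ProcessMeasurement) -> bool:
    """Effective: the result of the process meets the defined standard."""
    return m.measured_output >= m.agreed_standard


def is_efficient(m: ProcessMeasurement) -> bool:
    """Efficient: effective, and carried out at no more than the minimum required cost."""
    return is_effective(m) and m.actual_cost <= m.minimum_cost
```

A process owner could run both checks on each reporting period's figures; a process that meets its standard but overspends would come out as effective but not efficient.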
A service organisation will then try to match these quality aspects with the customer’s demands. The structure of such processes can ensure that good data is available about the provision of services, so that the planning and control of services can be improved.

A process is a logically related series of activities for the benefit of a defined objective.

We can study each process separately to optimise its quality. The process manager is responsible for the process results (i.e. is the process effective?). The logical combination of activities results in clear transfer points where the quality of processes can be monitored. In a restaurant, for example, we can separate responsibility for purchasing and cooking, so that the chefs do not have to purchase anything and possibly spend too much on fresh ingredients that do not add value.

The management of the organisation can provide control on the basis of the quality of the process, as demonstrated by data from the results of each process. In most cases, the relevant performance indicators and standards will already be agreed upon. The day-to-day control of the process can then be left to the process manager. The process owner will assess the results based on a report of performance indicators and whether they meet the agreed standard. Without clear indicators, it would be difficult for a process owner to determine whether the process is under control and whether planned improvements are being implemented.

Functional oriented organisations often find it difficult to respond to rapidly changing markets.

Functional structured organisation

Characteristics are:
• Fragmented
• Aimed at vertical and functional matters
• Many control activities
• Emphasises high/low relationships
• Encourages relation dependence
• Exchange loyalty/security
• Information is ‘secret’

Disadvantages are:
• Walls; no further than here
• Customer focused? “We (the IT department) know what’s good for you.”
• Steering people instead of steering activities
• Communication? (Because we have to; we are after all one company)
• Political decision making (“the political arenas”)
• Who is IT? Who is responsible for IT?

Process based organisation

Characteristics:
• Entire tasks
• Aimed at horizontal processes towards client
• Control, as this adds value
• Emphasises interdependence and uniting leadership
• Emphasises interdependence of independent persons
• “Security” is sought in individually added value
• Information is accessible

Advantages are:
• No boundaries, but interconnections
• Customer focused: what is the added value?
• Steering activities instead of steering people
• Communication because it is useful (fulfilling the needs of the customer)
• Decision making is matching & customising
• IT service provision is a process

5 ITIL

The IT Infrastructure Library is a set of books with good practice processes on how to manage IT service delivery. The library consists of the following books and CD-ROMs:
• Service Delivery
• Service Support
• Security Management
• The Business Perspective
• Applications Management
• ICT Infrastructure Management
• Planning to implement Service Management

The Service Support, Service Delivery and Security Management books are regarded as the core of the framework. These books cover the processes you will need to deliver customer-focused IT services according to your customers’ needs, demands and wishes. The framework helps the IT group to be flexible and reliable enough to ensure a consistent IT Service Delivery.

The other books in the library support the core processes.

[Figure: the ITIL books. Planning to Implement Service Management, The Business Perspective, Service Support, Service Delivery, ICT Infrastructure Management, Security Management, Applications Management]

5.1.1 History

During the late 1980s the CCTA (Central Computer and Telecommunications Agency) in the UK started to work on what is now known as the Information Technology Infrastructure Library (ITIL).

Large companies and government agencies in Europe adopted the framework very quickly in the early 1990s, and the ITIL framework has since become known as an industry best practice. ITIL has become the standard in delivering IT Services for all types of organisations. Both government and non-government organisations benefit from the process driven approach, regardless of the size of the IT shop.

ITIL is used globally; the majority of IT organisations in the following countries use it as their way of delivering IT services:
• UK
• The Netherlands
• Germany
• France
• USA
• South Africa
• Australia

Further reading on ITIL:
ITIL website: http://www.itil.co.uk
OGC website: http://www.ogc.gov.uk
Buy the ITIL books: www.itsmdirect.com
Examination boards:
EXIN: http://www.exin-exams.com
ISEB: http://www.bcs.org.uk/iseb/
ITIL Portal: http://www.itil-itsm-world.com/

In 2000 the British Treasury set up the OGC (Office of Government Commerce) to deal with all commercial activities within the government. This also includes all activities formerly done by the CCTA (Central Computer and Telecommunications Agency). Even though the CCTA no longer exists, we still mention it in this syllabus because it was the original developer of the ITIL framework.

In 2000, Microsoft used ITIL as the basis of their Microsoft Operations Framework (MOF) to support the launch of their ‘Datacentre’ product.

In 2001, ITIL version 2 was released with the Service Support book and the Service Delivery book. The other books (and CD-ROMs) are currently being published.
5.1.2 Groups involved

ITIL is a Public Domain framework, meaning that even though the copyright rests with OGC, every organisation can use the books to implement the processes in their own organisation. This has also supported the growth in the number of supporting services like training, tools and consultancy services. The important part is that the framework is independent of any of the vendors.

EXIN and ISEB are the examination bodies that organise and control the entire certification scheme. They guarantee that the personal certification is fair and honest and independent from the organisations that delivered the course. EXIN is based in the Netherlands and ISEB is part of the British Computer Society. Both bodies give out accreditations for training organisations to guarantee a consistent level of quality in the course delivery.

The personal certification is the only type of independent certification with regard to ITIL Service Management. There is no independent tool certification or organisational certification (yet).

People and organisations that wish to discuss their experience with ITIL Service Management implementation can become a member of the IT Service Management Forum. The ITSMf should be independent, just like ISEB and EXIN, to stimulate the best practice component of ITIL and to support the sharing of ‘war stories’ and tips. There is an ITSMf chapter in every country that is actively involved with ITIL Service Management.

EXTRA READING (elective)

Case study: Service Management implementation: British Telecom

The Emergence of BT

British Telecom (BT) is an international private sector company operating in the field of telecommunications. From 1912, telecommunications was part of the Post Office and held in public ownership. It was originally nationalised to ensure the provision of an integrated telegraphic and telephonic service.
British Telecom was split off from the Post Office in 1981 as a prelude to its own privatisation three years later. The aim was to make it easier for the management of the two organisations to focus on the business strategies of their respective operations. Since 1981 BT has undergone major changes, first with privatisation in 1984 and then because of Project Sovereign in the early 1990s. What follows concentrates on the build-up to, and changes associated with, Project Sovereign from the late 1980s. It is arguable, however, that this represents some continuation of the earlier corporate restructuring that surrounded privatisation.

The climate for these changes continues to be shaped by several significant factors, including: the development of new technology, which has changed the nature of telecommunications work; the opening up of the market for telecommunications to competition; and the requirement for BT to be able to exploit new international markets for information technology.

BT no longer enjoys the monopoly it once had. At home, competition from Mercury, the cable industry, and an increasing number of niche telephone operators is taking its toll. For example, it is estimated that 40,000 customers a month are being lost to the cable companies, who offer cheaper calls, connections and rentals, as well as clearer lines and the advantages of new technology. Cable firms claim to have won 470,000 customers in the three years since they were permitted to offer telephone services. Internationally, BT's rivals, such as AT&T and France Telecom, are battling for the custom of the multinationals that want one supplier to service all their telecommunication needs.

As well as new competitors such as Mercury and the cable companies attacking BT on price, the regulatory regime is also becoming harsher. OFTEL have recently stated that prices on BT's basic services must now be kept to 7.5% below the rate of inflation.
Although many of the same pressures affect BT's rivals, BT argues that it suffers most because it maintains a network that runs the length and breadth of the UK.

Symptoms

Project Sovereign and change of culture in BT

BT has launched several initiatives to transform the company; the most significant was Project Sovereign, which involved both adopting total quality management values and encouraging employees to focus on customers’ needs.

In March 1990, the Chairman of BT announced fundamental changes to the organisational structure of BT. These emerged from the findings of the Scoop Project, undertaken by a team of BT Senior Managers, into how the company should tackle the telecommunications business of the 1990s. This suggested that BT’s existing structure would prevent the company from achieving its full potential. Based largely on geography rather than customers or markets, it lacked the flexibility and integration necessary to meet changing market conditions. The plan for change was called Project Sovereign because, according to the Chairman, it was:

“The single most important thing that we are ever likely to do because it puts the customer at the top of our organisation”.

Over the following 12 months the old structure of BT, based on geographical districts and product divisions, gave way to three customer facing divisions: a Business Communications Division to serve the needs of business customers; a Personal Communications Division to meet the requirements of the individual customer; and a Special Business Division to manage a range of facilities such as Yellow Pages and Cellnet.
These new ‘customer facing’ Divisions were to be supported by a division with responsibility for the co-ordination of BT’s portfolio of products and services; a Worldwide Networks Division to plan, build, operate and maintain the switching and transmission network in the UK and globally; and a Development and Procurement Division to provide a broad spectrum of technical services including research, development and design, fabrication, testing, problem-solving, planning and consultancy. In addition, this last division was given responsibility for developing and managing a group wide supplier strategy and procurement service. Finally, functions running across the business, such as strategy, finance and personnel, were to be handled by Group HQ.

Figure 1 summarises the organisation structure introduced under Project Sovereign.

Customers/Markets
Business Communications
Personal Communications
Special Businesses
Products and Services Management
World Wide Networks
Development and Procurement
Group HQ (Strategy, Finance, Personnel, etc.)

Figure 1. Organisational structure on 1 April 1991.

In a briefing session on the changes, given by BT Directors to top BT managers, the factors that shaped the new structure, and its essential features, emerged. The new BT required radical change and a more flexible approach to organisation and management. The Group Finance Director explained that:

“BT is seeking a fundamental change in its approach to the markets it serves - it is a massive change and not tinkering at the edges. Moreover, it is ultimately linked to a change in our cost base - a leaner organisation and flatter management structure.
The most successful telecommunications company in the world has to be flexible around a low cost-base.”

In a presentation entitled “Management and Culture”, BT’s UK Managing Director argued that surveys in the company showed that many employees wanted radical change and the opportunity to contribute more to the company. The key, he suggested, would be providing the managerial leadership that would release and make use of these pent-up energies. However, he noted that a substantial reduction in the number of managers would be inevitable in order to “flatten” the company’s multi-layered pyramidal organisation and quicken response between management and workforce. Project Sovereign has reduced the total number of layers of management in BT from twelve to six.

One of the first changes to take effect with Sovereign was the integration of the UK and international marketing and sales organisation under a unified management structure. This was intended as a very visible indicator of the customer orientation of Project Sovereign. This was stressed by the Sales and Marketing Director:

“This change allows our people to direct their energies towards meeting customer needs ... and nowhere is this more true than with the marketing and sales community that we will be bringing together into one structure. This group of people will be at the forefront of the organisational change, demonstrating our determination and commitment to put the customer first.”

In summary, Project Sovereign was to lead to:
• A flexible market-driven structure for business and personal customers, with the necessary technical and commercial functions to meet their needs.
• A single interface for all BT’s customers, backed up by systems and specialised support.
• An increased sales revenue through protection of BT’s largest business customers, capturing the substantial opportunities available in the smaller business market and from the information-intensive personal customer.
• An integrated approach for business customers, to meet their rapidly expanding international needs.
• Consistency for both business and personal customers, with the target to deliver uniform excellence across the customer base.
• Integrated product management, removing duplication, eliminating gaps and managing potential conflicts to the benefit of customers and the company.

Cultural change and the impact on employment in BT

BT, in common with many other telecommunications companies, has been faced with having to make the transition from a vast state-owned, public-service bureaucracy to a flexible and responsive, private sector, high-technology company. The challenge of transforming an organisation that was rumoured to have more vehicles than the Red Army is huge. Not only do work practices have to be changed, but the culture also has to be shifted so that the staff are focused on providing services demanded by customers. In practice, this “cultural shift” has involved a huge reduction in staff numbers. Many employees who did not like the new ethos, or found it difficult to adapt, simply left in one or other of the company's generous redundancy programmes.

As a consequence of the increased competitive pressures outlined above, costs have had to be reduced and the company has to move faster and become more flexible. BT, like many other companies, is finding that speed is now a crucial competitive weapon and that several layers of management can slow down the decision making process. Another source of pressure on management jobs stems directly from the job losses lower down the hierarchy.
Technology has had an important role here, as the company no longer needs as many operators, and consequently, it no longer needs as many managers to supervise them. Similarly, as junior staff have left, the company has found itself with too many managers at higher levels. BT currently employs 32,000 managers to organise its workforce of 153,000. The ratio of managers to staff has fallen to one to five, although BT plans to lower that to one manager to ten workers.

When the company was privatised in 1984 it employed 244,000 people, and it reached a peak of 245,700 employees in 1990. In 1994 it employed 185,000 people. Between 1992 and 1994 almost 28,000 engineers and telephone operators, more than 5,000 middle managers and 876 senior managers accepted voluntary redundancy.

The voluntary redundancy schemes, Release 92 and 93, proved to be runaway successes. For example, BT indicated that it wanted to shed 20,500 jobs under Release 92. John Steele, BT's personnel director, later commented in a newspaper article:

“We were told we would never get rid of 20,000 jobs in a year … We were told it's impossible, it's never been done before … In fact, as many as 45,000 applied to go with 85 per cent of the formal complaints, registered after the redundancy programme, coming from those who were not selected to leave.”

Release 92 in particular proved to be almost too successful. Incentive payments to leave before the end of July, worth 25 per cent of salary, caused 19,500 staff to leave on the same day, almost causing the company's administration to collapse under the weight of work and causing severe disruption to customer services. More than 16,000 people were rejected for severance, and courses to maintain morale were instituted in those areas, mainly the operator service and telephone customer contact jobs, that were particularly oversubscribed.
Redundancy payments range from 40 weeks' pay for 10 years of service to a maximum of 72 weeks' pay for 14 years. Staff who were with the company before 1971 get three years' pay. An engineer with long service could qualify for a payment of £60,000 while a manager on £35,000 could receive more than £100,000. Most schemes involved a combination of lump sum and early pension entitlement. The average package of benefits for those leaving cost BT about £38,000, although a small number of senior managers are believed to have received more than £200,000. The scheme cost an estimated £1.15bn. Although BT has not revealed a final price of the scheme, it did explain that even with more than 40 per cent more people being accepted for severance the total bill did not rise in proportion as ‘the unit cost was lower than BT expected’.

Publicly BT has always said that it tried to make the package of benefits attractive to all ages. However, in 1993, unions representing workers at BT claimed that the company had introduced a grading system that could be used to target older staff. The system was based upon an individual's annual performance assessment, their attendance records and whether or not disciplinary action had been taken against them. The proposal that age should form part of the points system was dropped by BT after complaints from the National Communications Union and the Society of Telecom Executives. At that time 15,300 people under 45 had taken severance; of the older workers, 4,000 were aged 45-50 and 10,000 were 50-60. The company's standard retirement age is 60.

In January 1994 the chairman, Sir Iain Vallance, was able to state:

“It is now almost exactly 10 years since BT was privatised, the natural turnover of staff means that many staff have never worked in the public sector.
We now have the younger more responsive workforce that this industry needs.”

In 1992 and 1993 the majority of the cuts were voluntary and had fallen on lower and middle management, operators, clerical staff and engineers. However, in March 1994 BT announced that 6,900 senior managers out of BT's 32,000 management-grade employees would be targeted for ‘significant compulsory reductions’ as not enough were volunteering to leave under its job reduction programme. It also announced that it was considering compulsory redundancies for its most senior managers as part of a drive to lose 35 of its 170 managers at director level. All managers at this level earn over £50,000 a year, with the average wage being £76,000 a year.

Although the logic of de-layering and downsizing may be clear, it seems that more senior managers are reluctant to go and, until recently, BT has been loath to sack them. The reluctance to accept voluntary redundancy may be because senior managers have more agreeable and better-paid jobs, or it may be because they are more worried about finding another job elsewhere, although figures from the International Labour Office [xv] show that only 3% of professionals and only 5% of managers in the UK are unemployed in comparison to 13% of plant and machine operators and 13% of craft workers.

The Customer Service System (CSS).

In the late 1970s, BT (then part of the GPO) had six mainframe sites and very little local computing. When BT split from the Post Office in the early 1980s, BT began to set up local computing centres. Each area had set up its own databases and soon similar information was being held in two or even three different places. The cost of attempting to keep all of this information consistent and up to date was spiralling out of control. The decision was taken to centralise all of the information at a district level, and telephone areas began to be amalgamated into districts.
This process marked the beginnings of CSS. Essentially CSS was conceived as a “Customer Service System”, i.e. the aim was to provide a single source for all of the information relating to a single customer. An operator could access all relevant information for that customer by telephone number, address or name and could call up the customer's details on to a screen almost instantaneously.

After privatisation BT looked at several systems from a number of vendors and began setting up CSSs in 1985. People from the districts were seconded to the headquarters and a National User Group (NUG) was formed to aid the setting up of the various CSSs. At a national level, setting up CSS involved creating 29 databases for the 29 separate districts that existed at that time, with each database containing information about 12 specific areas such as customer records, line records, billing information, etc. Each CSS needed to hold the details of over 1 million customers and service around 2000 terminals. The installation of CSS into the various districts was spread over several years.

The CSS system was initially created as a system to support operations within individual districts, and although CSS was almost universally acknowledged as a great help in providing customer support, many felt that the management information aspect of the system appeared to have been added almost as an afterthought. While CSS was efficient in certain respects, such as tracking installations or repair work, it did not really provide the information needed by managers or, if it did, it was in a form that was of little use to them.

The announcement of Project Sovereign in 1990 and the move to three new ‘customer facing’ divisions, Personal Customers (PC), Business Customers (BC) and Special Businesses (SB), provided a new impetus to the development of CSS, although this was not without problems. Previously every CSS had been set up on the basis that all customers were treated in more or less the same way.
There was no distinction made between, for example, business customers and personal customers. The amount of time and effort that had gone into setting up the existing CSS, and the technical problems of splitting the 29 existing databases into three (one for BC, one for PC and one for SB), meant that another way had to be found to make the existing Information Systems work within the new company structure. The solution was to keep the same databases but give more access to people who required it. Thus the decision to create three new divisions meant that, in order for CSS to continue to function, managers, even at relatively junior levels, had to have the ability to ‘have visibility of other work areas’.

Previously there had been a bureaucratic system of passwords and access codes that limited access to the various areas of a CSS. However, once the principle of greater visibility was established, improvements in technology opened other areas for organisational change. It now became possible to automatically switch information between districts so that a person based in one district could update or amend records held in another district. Similarly, it also became possible to monitor workloads and allocate resources across the functional and geographic divisions in BT.

Although all of this was now technically possible, some organisational problems remained as, in the past, BT had relied on local expertise and each region had done things in a slightly different way. CSS provided an infrastructure that was relatively tightly controlled in terms of what it allowed a manager to do. However, in order to bring about some of the proposed new changes it would, in some senses, need to be even more tightly controlled, as every region would now have to operate in the same way. The need to ensure consistency between regions led to some dissatisfaction with the speed with which the system could be changed or modified.
In the past, when the system needed to be changed or updated, this could often be accommodated at a local level; now, however, each change or update needed to be worked out and agreed across the whole of the national network.

6 Implementing ITIL Service Management

6.1 Introduction

ITIL Service Management is something that impacts the entire IT organisation. Implementing end-to-end processes can have a big impact on the way things are done and can create a lot of uncertainty and resistance among staff. For these reasons, it is important to implement ITIL Service Management with a step-by-step approach that takes things slowly but steadily.

Developing ITIL processes is a fairly easy job to do… making sure everybody understands the processes and uses them is more difficult and requires serious planning. It is advisable to use a project management approach to ITIL Service Management implementation and stay focused on the end result.

6.2 Cultural change

10% of the implementation project will be about process design and the more instrumental things in organisational change; 90% will be about cultural change and personal motivation of staff to use the end-to-end processes as the better way to do business. People (YOU!) will feel vulnerable and out of control, the perfect breeding ground for resistance… know that it is coming and work with it. The most important thing in this stage of the ITIL implementation is to keep the focus on the reason why your organisation needs ITIL Service Management in the first place.
6.3 Some of the do’s and don’ts

DO:
• Perform a feasibility study first
• Use what is already good in the organisation
• Take it slowly
• Stay focused
• Appoint a strong project manager with end-to-end focus to drive this implementation program
• Be brave enough to ask for help when you need it
• Keep in mind that you are dealing with personal issues
• Keep communicating WHY your organisation needs this
• Measure your successes continuously
• Enjoy the milestones and share them with the IT group

DON’T:
• Try to mature all the processes at the same time
• Start with a tool
• Start without management commitment and/or budget
• Force ITIL upon people
• ‘ITILISE’ your organisation, keep thinking…
• Rush, take your time to do it well
• ‘Do ITIL’ without a reason
• Blindly follow the herd
• Pretend you are a Greenfield site

6.4 Further reading

The OGC book: ‘Best Practice for Planning to Implement Service Management’.

7 The ITIL Service Management Processes

The following diagram represents the best-known components of ITIL: Service Delivery and Service Support. These processes are then discussed further.

7.1 Service Delivery Set

7.1.1 Service Level Management

This process provides the contact point (or hinge) between the IT organisation and the customer. Within the ITIL books, ‘the customer’ is defined as being the person who pays for the services. It should therefore be someone with decision-making authority, e.g. a business manager. Service Level Management is the process that ensures that the IT organisation knows what services it can deliver and arranges for the IT group and the customer to agree on the levels of service that need to be delivered.
It also ensures that the IT group can consistently deliver these services to the customer by monitoring the service achievements on an ongoing basis and reporting these to the customer.

Extra reading

To report or not to report

A lot of the organisations that start implementing Service Level Management fall into the trap of over-reporting. Everything is monitored, and all results are reported back to the client. Negotiate the reporting strategy with your customer during the SLA negotiations. A report is only valuable if your clients use it for their own work.

Another pitfall is that some people only report when things are going wrong. The image you build with an agreement like that is a negative one. The client only hears from IT when there is a problem or when service levels aren’t met. ALWAYS report on the positive things as well!

It’s OK to say NO…

Often, when you start implementing Service Level Management in your organisation, you’ll find that you can’t deliver a lot of the user’s requests. You can’t deliver because you don’t have the underpinning processes in place, you don’t have enough budget, or for many other reasons. And that’s OK, as long as you discuss it with your clients. Service Level Management is all about managing the expectations of your clients.

Internal and external agreements

The beauty of implementing ITIL is that everybody in the organisation speaks the same language, and therefore you need to be very strict with your choice of words. A Service Level Agreement is an internal agreement with your clients, and an agreement with an external party is an underpinning contract. Don’t talk about service level agreements with vendors and suppliers because that confuses everybody.

7.1.2 Financial Management for IT Services

When Service Level Management agrees with the customer on Service Levels, it has to know how much money is involved in delivering this service.
This is especially so when the cost for IT services is charged on to the customer. Financial Management creates awareness of the total cost of the service both within the IT group and with the customers, and provides opportunities to increase the efficiency of the IT organisation. This information comes from Financial Management for IT Services. It basically deals with 3 areas:
- Budgets
- IT Accounting
- Charging

The sub-process of charging is implemented subject to the company policy on internal invoicing structures. Financial Management needs input from all other processes with regard to the activities that form part of the service delivery. It will also deliver input to the other processes, e.g. financial information for the cost-benefit analysis within Problem Management and Change Management.

EXTRA READING

Since when did IT and finance merge?

Q. Why is business financial management important to the IT professional? Isn't that the CFO's responsibility?

Competent financial management is critical to the success and very survival of a wide variety of organizations. In the technology community, it is common to select the chief financial officer or the chief information officer for advancement to the CEO position. For the CIO professional looking for a promotion or a greater understanding of the IT arena, an understanding of the basics of financial management has become invaluable.

The goal of business financial management is to maximize the value of the firm. Successful financial management requires a balance of a number of factors, and there are no simple rules or solution algorithms that will ensure financial success under all circumstances. The overall goal toward which corporate financial and IT managers should strive is the maximization of earnings per share, subject to considerations of business and financial risk, timing of earnings, and dividend policy.
The fundamental principles of accounting, analytical techniques for interpretation of financial data, basic budgeting concepts, financial planning and control, and the analysis of long-term investment opportunities are applicable to IT as well as finance. Financial and IT professionals who can profitably harness the principles and techniques of financial and information resources will be able to manage their organizations more effectively than their competitors.

Exchanging wooden dollars?

Many organisations decide not to do a physical charge-out to their internal clients because it would only add to the administrative procedures within the organisation. Financial management in this type of organisation is used to gain insight into the cost involved in the delivery of IT services and to raise awareness among the clients. Instead of invoicing the client, a monthly report is sent out to update the client on their IT expenditure. Even though charging is not used, financial management has a lot of benefits for organisations like this.

7.1.3 Availability Management

Contrary to what many believe, Availability Management deals with ensuring that the SERVICE is available according to the levels agreed with the customer. It does not set out to guarantee the maximum level of availability, nor is it about reporting on server availability as a stand-alone issue. This process ultimately links all IT components together and manages the links and weaknesses between the IT components and IT areas to ensure the availability of the service delivery to the customer.

Availability Management works closely together with Capacity Management, as you can’t ensure the availability of the service when the capacity is insufficient to deliver the IT service. There is also a close link to Problem Management, as the availability expert will be part of many problem teams.
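To make the distinction between service availability and stand-alone server uptime concrete, here is a minimal sketch in Python. It assumes availability is expressed as the percentage of agreed service time during which the service was usable. The function names, the 99.5% target and the outage figures are illustrative assumptions, not values taken from the ITIL books.

```python
def service_availability(agreed_minutes: float, downtime_minutes: float) -> float:
    """Return availability as a percentage of the agreed service time."""
    if agreed_minutes <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime_minutes <= agreed_minutes:
        raise ValueError("downtime must lie between 0 and the agreed service time")
    return (agreed_minutes - downtime_minutes) / agreed_minutes * 100.0


def meets_agreed_level(availability_pct: float, target_pct: float) -> bool:
    """Compare measured availability with the level agreed with the customer."""
    return availability_pct >= target_pct


if __name__ == "__main__":
    # Hypothetical agreement: 5 days x 10 hours x 4 weeks, expressed in minutes.
    agreed = 5 * 10 * 60 * 4
    # Hypothetical outages during agreed hours, in minutes.
    # Downtime outside the agreed hours is deliberately not counted.
    outages = [45, 30]
    pct = service_availability(agreed, sum(outages))
    print(f"availability: {pct:.3f}%")
    print("meets hypothetical 99.5% target:", meets_agreed_level(pct, 99.5))
```

Only downtime inside the agreed service hours counts against the service here, which is why a server outage at night may have no effect on the reported figure.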
EXTRA READING

Distributed Availability

Introduction

Distributed availability is the ability to ensure predictable, reliable user access to key applications and computing resources. Maintaining distributed availability in client/server environments is a complex and expensive proposition, yet it plays a critical role in maximizing your investment in client/server computing.

Businesses that have embraced and deployed client/server computing face major challenges such as keeping mission-critical applications up and running. This document addresses why this is difficult, compares different approaches to solving this challenge and describes why Tivoli's automated solution to distributed availability is uniquely suited for enterprise client/server management.

Distributed Availability

Client/server computing puts mission-critical computing resources such as systems, databases and applications in the hands of those who need them most, end users. However, without the proper management tools, it is difficult to maximize the availability of these crucial resources.

Providing distributed availability gives information technology (IT) staff an efficient, automated way to ensure the high availability of key computing resources. It allows IT staff to meet its service-level commitments. It ensures that company personnel who depend on uninterrupted access to mission-critical business applications are more productive. It allows organizations or individual business units to minimize lost revenue opportunities that can occur when mission-critical applications and computing resources are not up and running efficiently.

Why It's Difficult

Leading global corporations have embraced distributed client/server computing as their vehicle for implementing mission-critical applications. However, distributed environments and applications are much more complex and dynamic than their mainframe predecessors.
It is easier to track the performance level of computing resources in a centralized data centre environment because there are only a few large systems, typically from one vendor and located in one place. In contrast, the distributed client/server environment has a large number of dispersed, heterogeneous systems that grow and change at a rapid rate. For example, IT staff are often unaware that individual workgroups have installed their own systems or reconfigured their environments.

With many remote sites, different types of systems from multiple vendors and a constantly changing environment, the challenge is to provide application and computer resource availability without requiring system administration experts at each remote site. Providing distributed availability includes the following activities:
− Automatic configuration and deployment of policy-based resource monitors and actions
− Automatic problem detection through distributed monitoring
− Automatic preventive and corrective actions

Traditional Solutions

The major problem with most of today's monitoring tools is that they are not scalable: they simply do not work in a large, distributed client/server environment. While a mature set of mainframe performance tools exists, these tools only work well for monitoring a centralized computing shop. They are simply not designed to handle enterprise-wide monitoring.

With the advent of LAN computing, a second type of monitoring tool has emerged. LAN-based monitoring tools typically focus on presenting many, often hundreds, of alerts per system. These tools work within one LAN segment and assume that a server or workgroup administrator exists for each LAN. This scenario is suitable for small computing environments but does not work for an enterprise with hundreds of LANs at many remote locations.
Typical problems with LAN-based monitoring tools include:
− Many tools focus solely on alerting
− Filtering capabilities are often absent, causing alert-information overload
− IT staff must handle each alert manually
− Tools must be configured individually on each monitored machine at the remote site
− Many tools cannot initiate automatic corrective actions
− Most tools are not extensible, restricting the addition of monitors or corrective actions
− Many tools are SNMP-based. These tools work well monitoring network devices but are dependent on a central collection point, are not secure and not scalable. Further, they generate polling and network traffic.
− Most existing monitoring tools have severe limitations in distributed client/server computing environments, especially in the areas of scalability and efficient central configuration.

The limitations of existing monitoring tools make it difficult to monitor a large, dynamic computing environment.

The Right Approach for Client/Server

Tools that work for mainframe or for individual LAN environments do not provide a scalable solution for large, distributed computing environments.
To meet the unique requirements of client/server computing, the solution must:
− Configure and deploy monitoring agents from a central site
− Be easy to configure so that deployment does not become labour-intensive
− Provide alarm filtering at the source
− Avoid generating an excessive amount of network traffic
− Provide a method of automatically correcting repetitive problems at the source
− Alert staff only for serious exception conditions
− Allow you to build policy for monitoring a group of similar systems or applications
− Allow you to build policy for corrective actions on a group of similar systems or applications
− Be easily extensible, allowing you to add customised or third-party monitors
− Provide an easy method to launch third-party debug or performance tools on an exception condition

Conclusion

Providing distributed availability in the client/server environment can be a complex and expensive proposition; yet it plays a pivotal role in the maximisation of an organisation's investment in client/server computing. Without the proper management tools, it is difficult to maximise the availability of these distributed resources.

7.1.4 Capacity Management

Capacity Management has the task of ensuring that the right amount of capacity is at the right location, for the right customer, at the right time and at the right cost. It may sound a bit weird, but this captures the essence of Capacity Management.

Capacity Management optimises the amount of capacity needed to deliver a consistent level of service to the customers, without over-delivering to one customer and under-delivering to another, or wasting money on excess capacity that is unnecessary. Capacity Management has a very close relationship to Availability Management, Configuration Management and Service Level Management.

EXTRA READING

Why a Capacity Plan?
With cheap hardware prices, capacity planning may seem unimportant; you can always upgrade later. A simple guess-timation of the capacity of the system should be sufficient, right? Why give this subject any more thought?

There are two main concerns that make capacity planning critical. The first is the rate of technical change in the distributed computing sector. We now measure progress in "Internet years", each equivalent to about 90 days of a calendar year. The second is that, with Internet/Intranet at the helm, today’s systems are primarily being developed within a 3-tier architecture. This rapid change, coupled with the increase in complexity of 3-tier architecture, is causing system designers to pay closer attention to capacity.

Five years ago, a C/S designer could roll out a new system with a rough estimate of capacity and performance. The system could then be tuned or more capacity added before all of the users had been converted to the new system. The process was reasonable because the systems were typically not mission-critical.

Today, there’s no time for this approach. Once systems are in place they become an integral part of the overall C/S design. Downing the system for upgrades becomes increasingly expensive in both time and resources. In addition, the added complexity of the environment typically requires more care, due to the interdependency between various C/S application components.

Capacity planning is driven purely by financial considerations. Proper capacity planning can significantly reduce the overall cost of ownership of a C/S system. Although formal capacity planning takes time, internal and external staff resources, and software and hardware tools, the potential losses incurred without capacity planning are staggering. Lost productivity of end users in critical business functions, overpaying for network equipment or services, and the costs of upgrading systems already in production more than justify the cost of capacity planning.
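As an alternative to the rough "guess-timation" described above, the following minimal Python sketch estimates how many months remain before a system reaches a planning limit. It is only an illustration: the compound monthly growth model, the 80% headroom threshold, the function name and all figures are assumptions, not part of the original article.

```python
import math
from typing import Optional


def months_until_exhausted(current_load: float,
                           capacity: float,
                           monthly_growth: float,
                           headroom: float = 0.8) -> Optional[int]:
    """Estimate months until load reaches headroom * capacity.

    Assumes load grows by a fixed fraction every month (compound growth).
    Returns 0 if the limit is already reached, None if load is not growing.
    """
    if current_load <= 0 or capacity <= 0:
        raise ValueError("load and capacity must be positive")
    limit = capacity * headroom
    if current_load >= limit:
        return 0
    if monthly_growth <= 0:
        return None
    # Solve current_load * (1 + g)^n >= limit for the smallest whole n.
    return math.ceil(math.log(limit / current_load) / math.log(1 + monthly_growth))


if __name__ == "__main__":
    # Hypothetical figures: 400 concurrent users today, sized for 1000,
    # growing 5% per month, planning limit at 80% of capacity.
    print(months_until_exhausted(400, 1000, 0.05))
```

A result of a few months signals that the upgrade should be budgeted now, while a result of several years suggests that buying more capacity today would be money spent on excess capacity.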
7.1.5 IT Service Continuity Management

Availability Management focuses on ensuring the availability of the service as part of the daily routine. IT Service Continuity Management prepares for the worst-case scenario, to offer an appropriate disaster recovery option when a disaster occurs.

The ultimate choice of option is made by the customer as part of the SLA. All options are possible, but they all have a different price tag.

In the current global situation, a structured approach to IT Service Continuity Management becomes more and more important. The business processes rely more on the IT Services, and the IT components are more under ‘attack’ than ever before.

What constitutes a disaster is something the IT organisation needs to define with the customer, as this might be different for every organisation.

EXTRA READING

Availability Solutions: What Do You Need?
(source: www.contingencyplanning.com)
by: Bill Merchantz
Pages: 22-25; September, 2002

There are many options available to you when selecting a managed availability solution. The following review of products, services, and technologies will help you choose the best one for your needs.

The previous articles in this series have provided the foundation for choosing a managed availability solution. As they explained, the search for availability solutions begins with a thorough understanding of the nature of the problem. How much does an hour of downtime cost? Are your needs application-centric or data-centric? What is the absolute and relative importance of recovery time objectives (RTO) and recovery point objectives (RPO) in your business? The answers to these and other questions determine the appropriate size of your managed availability investment and where you should direct it.
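The questions above (cost per hour of downtime, RTO and RPO) can be turned into a small calculation. The Python sketch below is a hedged illustration only: the class and function names, the cost figure and the objectives are hypothetical, and it assumes that data can be restored only to the most recent backup, so the worst-case data loss equals the interval between backups.

```python
from dataclasses import dataclass


@dataclass
class RecoveryObjectives:
    rto_hours: float  # recovery time objective: maximum tolerable outage
    rpo_hours: float  # recovery point objective: maximum tolerable data loss


def outage_cost(hourly_downtime_cost: float, outage_hours: float) -> float:
    """Cost of an outage, assuming cost accrues linearly per hour."""
    return hourly_downtime_cost * outage_hours


def meets_objectives(objectives: RecoveryObjectives,
                     expected_recovery_hours: float,
                     backup_interval_hours: float) -> bool:
    """Check a candidate solution against the agreed RTO and RPO.

    Worst-case data loss is taken to be the full backup interval.
    """
    return (expected_recovery_hours <= objectives.rto_hours
            and backup_interval_hours <= objectives.rpo_hours)


if __name__ == "__main__":
    objectives = RecoveryObjectives(rto_hours=4, rpo_hours=1)
    # Hypothetical option A: nightly backup, restore takes about 6 hours.
    print(meets_objectives(objectives, expected_recovery_hours=6,
                           backup_interval_hours=24))
    # Hypothetical downtime cost of 20,000 per hour for a 6-hour outage.
    print(outage_cost(20_000, 6))
```

Comparing the outage cost with the price of each option gives a first indication of how large the availability investment can reasonably be.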
Once the objectives and scope of your solution are defined, a wide range of services, foundation technologies, and products are available with which to build it. In fact, some of the foundation technologies, such as journaling, may already be available in your database management or operating systems but have been deemed, until now, unnecessary or inappropriate for implementation in your systems. When developing a managed availability solution, all options and approaches should be carefully considered in order to build the optimum solution.

Services

Assessment: The first step in managing availability is to assess the current environment and seek availability options. Using an outside vendor to provide this service adds an objective viewpoint and a wealth of preexisting knowledge, skills, and experience.

Planning: In addition to a high-level plan that describes the overall direction and desired end goal, a program also needs separate plans for each step along the way.

Education, Training, and Documentation: Human error is a major contributor to unplanned downtime. Education, training, and documentation on the use and maintenance of each system and its components can help reduce this type of problem.

Auditing: Business requirements constantly evolve in ways that directly impact internal systems. Add staff turnover into the equation and the result is a volatile environment that can create unexpected availability exposures. A partial- or full-system failure will certainly highlight these new vulnerabilities, but a less-costly approach is ongoing vigilance through regular audits of the managed availability plan and its implementation.

Foundation Technologies

Backup and Restore: The most basic level of data protection is to back up data so that it can be easily restored. Backups are generally performed on a regular schedule, such as nightly.
Journaling: Database journaling allows a destroyed database to be recovered up to the last committed transaction. Journaling eliminates the problem of “orphan data”, that is, data added or updated since the last backup preceding a destruction event. Of course, like backup data sets, journals must be stored separately (preferably remotely) from the production database to ensure that both are not destroyed in the same disaster.

Commitment Control: The commitment-control function of most commercial database-management software protects the integrity of transactions in general by rolling back to their previous state any specific transactions that were not completed at the point of an outage. The complete series of updates can then be reentered upon system recovery.

Products

Uninterruptible Power Supplies: Power outages are the most common cause of abrupt system failures. Therefore, uninterruptible power supplies (UPSs) go a long way toward reducing downtime frequency. When configured properly, they can also reduce the duration of downtime. When a primary power outage occurs, the UPS can maintain system operation long enough to save main storage, thus protecting data and simplifying system startup when the power returns.

Fault-Tolerant Hardware: Some hardware includes internal redundancy, along with the means to monitor each component’s operating status and seamlessly switch to a backup component when necessary. Inoperable components can typically be replaced without stopping the system.

RAID (Redundant Arrays of Independent Disks): RAID spreads enough information across multiple disks to allow the disk subsystem controller to recalculate any missing information in case of a disk failure. It does not, however, protect against the failure of other disk-related hardware, such as a controller, an I/O processor, or a bus.

Disk Mirroring: When properly configured, disk mirroring can eliminate single points of failure.
Often used in combination with RAID 5, this approach requires that data be concurrently written to each unit in a set of identical disks, incurring minimal CPU overhead or increase in system complexity. However, mirroring all data requires at least twice as many disks.

Alternate Communication Paths: Core business functions often depend on a variety of systems and sites. For example, a company may take orders at one facility and fulfill them at a remote warehouse, with order data transmitted between the two over communication lines. If the line is disrupted, orders will have to be sent to the warehouse manually or, worse, not transmitted at all. The only way to combat these exposures is to maintain alternate communication paths.

Redundant Systems: A multiple-systems approach that continuously maintains real-time replica systems offers the highest possible level of availability. It duplicates applications, user data, and system data on two or more systems that may be geographically dispersed. In addition, it can quickly switch users from the primary system to a backup system when necessary. Thus, unlike backups and disk mirroring, this approach can eliminate all types of user-processing interruptions.

Redundant systems can be coupled with third-party software assuming full responsibility for system resiliency. Or, they can be tightly coupled in a cluster, with the operating system assuming primary responsibility for monitoring and managing the cluster.

Although a backup system can be located in the same building or on the same site as the primary system, it is highly recommended that the backup system be located in a geographically distant location so that an event or disaster affecting a wide area, such as a storm, flood, or earthquake, will not affect both systems.
Because such disasters are rare, most companies justify the investment in redundant systems based not on disaster recovery but on their ability to provide continuous operations. In a single-system environment, database saves and reorganizations may shut down applications or, at best, severely impair their performance. Worse, hardware and software upgrades can shut down applications for hours or even days.

These problems are avoided in a redundant-system environment. Users are simply switched to the backup system whenever the primary system requires maintenance. Furthermore, database saves and read-only access tasks, such as query and reporting jobs, can be done on the backup system, thus eliminating their burden on the production environment.

When considering a replication solution as part of a full managed availability solution, include these questions in your evaluation worksheet:
• Does the solution allow for seamless, real-time replication of data to a backup system?
• Does the solution also replicate system objects? If not, it may not be possible to recreate the application and user environment on the backup system. If, for example, a user profile is changed on the primary system, that change should be reflected on the backup.
• Is the replication technology application-independent? If not, costly and error-prone application modifications will be required.
• Does the solution provide system monitoring to automatically detect failures?
• If the solution provides monitoring, can it automatically initiate a failover to the backup system in case of an outage on the primary system, without the need for any user or operator intervention?
• Does the solution offer a way to quickly and easily switch to a backup system, with minimal user interruption, when maintenance is required on the primary system?
• Does the solution enable activities normally associated with planned downtime, such as software and hardware upgrades or database reconfigurations, to be conducted in the background while applications stay in production and users stay online?

Blended Products/Services

Third-Party Recovery Sites: The purchase and maintenance of a complete set of duplicate hardware and software and the off-site facility to house them is cost-prohibitive for many companies. Third-party recovery sites, which share costs among a number of customers, provide an affordable alternative.

Data Vaulting: Journaling does not fully protect orphan data, since a disaster that destroys a system would likely also destroy the journal files in the system. Data vaulting eliminates this vulnerability by capturing changes made to production databases and immediately transmitting them over a network to a recovery site. If no problems occur during the day, the vaulted data can be discarded when a new backup tape arrives at the recovery site. Should a disaster occur at the primary site, the vaulted data can be applied to the recovery site’s database after the most recent backup tape has been loaded. The result is a database that is current up to the point of failure.

Solutions that Fit

Because of the diversity of data topologies, hardware, software, and application platforms, few elements in the managed availability toolkit can be delivered as a shrink-wrapped offering. A purely off-the-shelf product would have to be so generic as to be far from optimal in any installation. Therefore, companies should look for a total solution that includes methodologies, software, processes, training, support, and services that allow them to effectively tailor and adapt the solution to their unique environments.

Foundation Questions

Clearly, business needs and technology architecture determine the optimal managed availability solution.
However, planners should ask some general questions about each of the options they evaluate:
• Is the solution a comprehensive, end-to-end offering built upon a proven methodology, incorporating the software, services, support, documentation, and overall expertise necessary to provide full managed availability?
• Does the solution manage all operation-critical elements of application availability (e.g., data, objects, security, work management, and user connectivity) and server requirements (e.g., environment configuration and power)?
• Is the solution proven and supported by the company’s platform manufacturer(s) and/or application developer(s)?
• Does the solution support new and emerging technologies?
• Is the solution capable of handling the full transaction volumes and complexity of the company’s systems? Is it scalable and flexible enough to keep pace with growth?
• How long has the solution been in the market? What is its market share?
• Has the solution been rigorously proven in a number of real-life installations? Can reference sites be visited?

Addressing these questions can help planners ensure that they select the solutions that best fulfill their organizations’ requirements. The next article in this series will take the process one step further by providing some suggestions on evaluating solution providers.

About the Author
Bill Merchantz is the founder, president, and CEO of Lakeview Technology, a provider of managed availability software solutions. He has many years of hands-on experience as a customer/manager of IT solutions.

7.2 Service Support Set

7.2.1 Service Desk

The end-users need a single point of contact with the IT organisation as well. The business users / end-users need IT services to improve the efficiency of their own business processes. When they can’t use the IT services, they have trouble achieving their objectives.
The Service Desk should be the single point of contact for all end-users. This is where ALL questions, issues and requests are logged and recorded.

The type of Service Desk you need depends on the requirements of your customer base. After all, they need to pay for the service!

- You can choose one of the skill levels:
Call Centre
Unskilled Service Desk
Skilled Service Desk
Expert Service Desk

- And combine it with an organisational structure:
Centralised Service Desk
Distributed Service Desk
Virtual Service Desk
Split Function Service Desk

EXTRA READING

Question: How do you calculate the cost of a call?

Jana Johnson - 3/26/01 3:26:00 AM
How do you calculate your cost of call or cost of a service request? What types of costs do you include in the total and how often are the costs measured?

Answers:

Pam Erskine - 3/28/01 7:25:00 AM
Several years ago, I purchased a white paper from Help Desk Institute, which reviewed different calculations for cost per call. While the paper is somewhat outdated, it does give you an idea of some basics. Our cost per call includes: salaries, benefits, utilities, rent, insurance, phones, training, office supplies, etc. Basically we include any cost related to the support centre. We complete the calculation quarterly.

Justin Farmer - 3/29/01 2:46:00 PM
Evidently the META and Gartner Groups have done studies to show that password reset calls cost somewhere between 25-35 dollars per call that lasts an average of 11.5-14.5 minutes. Furthermore, depending on security policies, an employee has an average of 3-4 reset requests per year. So, the average cost in that area is fairly easily definable.

Alec Norrie - 4/1/01 9:45:00 AM
Suggest you take a look at the "Interactive Help Desk tool" recently posted in the HDI Superstore. The tool calculates the cost per call for given Service Levels very quickly and accurately.
Ronald Franklin - 4/2/01 11:36:00 AM
In addition to the departmental costs that can apply to all departments, a few other factors must be taken into consideration. These include length of call, quality of assistance received, whether the issue was adequately resolved, and whether or not the user must call back for additional assistance. For example, if the user must call back and has to rehash the situation, costs escalate and are difficult to control. If, however, the support technician is continually and adequately trained, even if he must stay on the phone for some time to resolve the call, this will result in a lower number of calls, a higher number of satisfied users, and an overall lower help desk budget.

Dominick Miliano - 4/4/01 3:29:00 PM
We use a simple calculation that works well for us. We take the entire dollar value of my annual budget and divide it by the number of customer contacts we have. This includes all inbound e-mails, voice mails and phone calls. In addition, we add in the number of outbound phone calls we make to get a problem resolved. Using just the inbound calls, we average $13-$16/call. If we add in the outbound calls (a very large number, representing a significant amount of time and therefore money) we are under $9.00/customer contact. This calculation method may not be "according to the book," but we have used it for years and as I said earlier, it works for us.

Dick Szymanski - 4/10/01 12:35:00 PM
One thing most help desk organizations do NOT do in calculating cost per call is include the support cost beyond the help desk (i.e., second and third level support, whether captive or outsourced). This leads to misleading and even detrimental management information.
In the extreme, the lowest cost per call would be seen in the help desk that dispatched the most of its service incidents the quickest, thus saving greatly on its own labour and phone costs while driving total costs up as the inefficiencies from buck-passing creep in. Any cost per unit measurement that focuses only on a portion of the whole is worthless without the full picture.

James Cundall - 4/11/01 4:59:00 PM
One individual mentions the cost for handling a password reset. One suggestion for handling such calls would be to give your users the option to do the reset themselves via an IVR, which is what we do. This lowers the cost per such call to less than a dollar, versus a cost in the $7.50 range when it is done through the help desk. Our IVR is our first level of support for the client, with the Response Centre being the 2nd level and so on. Our cost per call is calculated based on the total monthly expense for running the Response Centre, which includes salaries, fringe, supplies, etc. This, divided by the total number of calls handled, gives us a cost per call that we can manage to.

Brandon Caudle - 4/11/01 5:17:00 PM
Combining several statements already posted, the budget of ALL support levels that are involved in Incident Resolution (Level 1, 2 etc.) divided by the number of Incidents may deliver a cost per call. A further breakdown may be used in which Level 1 calls are calculated using their average resolution time, Level 2 calls are calculated using their resolution time, etc. This would lead to a more accurate calculation of cost per call, depending on level of resolution. However, outbound calls are not included in this case either.

Gail Freeman - 6/5/01 9:48:00 AM
In a previous note Justin Farmer cites some statistics from Gartner and Meta reports that I found very interesting.
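The calculations discussed in this thread can be illustrated with a short sketch. It is only an illustration under stated assumptions: the `SupportLevel` structure, the function names and all figures are hypothetical and were chosen to keep the arithmetic simple; none of them come from the posters or from any HDI tool. It shows the basic "total cost divided by contacts" calculation, the effect of including second-level support costs, and how counting outbound calls changes the result.

```python
from dataclasses import dataclass


@dataclass
class SupportLevel:
    """One tier of support (hypothetical structure, for illustration only)."""
    name: str
    period_cost: float       # salaries, benefits, rent, phones, training, supplies...
    incidents_resolved: int  # incidents closed at this level in the same period


def cost_per_contact(total_cost: float, contacts: int) -> float:
    """Total cost for a period divided by the contacts handled in that period."""
    if contacts <= 0:
        raise ValueError("contacts must be a positive number")
    return total_cost / contacts


def blended_cost_per_incident(levels: list[SupportLevel]) -> float:
    """Cost per incident across ALL support levels, not just the help desk."""
    total_cost = sum(level.period_cost for level in levels)
    total_incidents = sum(level.incidents_resolved for level in levels)
    return cost_per_contact(total_cost, total_incidents)


# Hypothetical annual figures, chosen only to make the arithmetic easy to follow.
levels = [
    SupportLevel("Level 1 (help desk)", period_cost=300_000, incidents_resolved=30_000),
    SupportLevel("Level 2 (specialists)", period_cost=180_000, incidents_resolved=2_000),
]

for level in levels:
    per_incident = cost_per_contact(level.period_cost, level.incidents_resolved)
    print(f"{level.name}: {per_incident:.2f} per incident")

print(f"All levels combined: {blended_cost_per_incident(levels):.2f} per incident")

# Counting outbound calls as contacts lowers the per-contact figure
# without changing the budget, so state which definition is being used.
help_desk_budget = 300_000
inbound, outbound = 30_000, 20_000
print(f"Inbound only: {cost_per_contact(help_desk_budget, inbound):.2f}")
print(f"Inbound + outbound: {cost_per_contact(help_desk_budget, inbound + outbound):.2f}")
```

With these made-up numbers the help desk alone looks cheap per incident, while the combined figure is noticeably higher, which is the distortion that the posts about second- and third-level support warn against.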
Source: www.thinkhdi.com

7.2.2 Incident Management

The ultimate aim of this process is to get the end-users and clients back to work as soon as possible. It is symptom-driven, and the only concerns are speed of response and the continuation of the business process.

Incident Management uses information from the Problem Management process (workarounds and Known Errors) and the Configuration Management process (linking Incidents to Configuration Items).

A large component of Incident Management is the administration and tracking of the incident itself.

EXTRA READING

Scenario: Your company has been hit with a new virus that your anti-virus software is not detecting. Widespread incidents are being reported to your Help Desk. Every attempt to clean it only spreads it further to your employees, customers and business partners, resulting in confidential data being sent outside your company. What do you do?

Tongue in cheek, but a good Incident example:

End User calls tech support: "My computer is making a racket."
What were you doing when it started making the noise? fish asks.
"It wasn't even turned on," user says.
Are you sure you didn't just turn off the monitor? fish asks. Check the lights on the box.
"Nope, no lights."
And you're sure it's making a noise?
"Sure, listen," user says, and holds the phone toward the machine -- which is distinctly making a noise.

When the support tech arrives at the user's office, the PC is piled so high with papers that the top of the CPU can't be seen -- but it's definitely making a fair bit of noise. The tech digs through the papers to unearth the box. Finally he gets down to bare metal -- and there, on top of the PC's case, is the user's beeper. It's set to vibrate mode. Problem solved.

But not forgotten. Six weeks later, another call: same user, same complaint. Do you know where your beeper is? Support asks.
"Oh. Err, never mind ..."
Source: Computerworld

7.2.3 Problem Management

Where Incident Management is only concerned with speed, Problem Management wants to get rid of the Known Errors in the IT Infrastructure. Everything you do within this process is focused on:
Finding what the Known Error is (Problem Control diagnosis)
Identifying alternative solutions for the removal of the Known Error (Error Control)
Raising RFCs to request that the removal happen
Checking, after a change is performed, that the Known Error is really gone

In order to do this, Problem Management needs to work closely together with Incident Management, Change Management, Configuration Management and Availability Management.

EXTRA READING

Getting to the Root Problem

Event correlation and root-cause analysis tools promise a lot, but real-world results are mixed, say users.

Automated network management software - sophisticated stuff that promises an unprecedented ability to monitor a corporate network - is on the horizon.

But skeptical IT managers say the tools still aren't smart enough. They want artificial intelligence that can diagnose a network problem and get it right at least seven out of 10 times.

Such automation relies on event correlation and root-cause analysis tools. The concept behind the tools is simple: keep track of network devices and relationships, automatically fix minor problems and refer more complex snafus to the network manager.

But skeptical IT managers are demanding proof of better automation, bedrock interoperability and broader usefulness before they will buy such tools.

"We've looked at these tools," says Tom Revak, domain architect at pharmaceutical company GlaxoSmithKline PLC in Research Triangle Park, N.C. But "until the artificial intelligence exists that can automatically update a dynamically changing network, it's just one more pretty little map of what could be."
Historically, users have been "skeptical that software can meaningfully achieve what human expertise has achieved," says Dennis Drogseth, an analyst at IT consultancy Enterprise Management Associates Inc. in Boulder, Colo. Drogseth researched and wrote the consultancy's report on root-cause analysis and event correlation that was released in December.

Who's Using It
Of 135 companies surveyed by Enterprise Management Associates, 60% are doing event correlation.
Those 81 companies use it to:
Speed problem resolution: 33%
Improve service delivery: 27%
Reduce staff, overhead: 24%
They're spending:
More than $200,000: 27%
$100,000 to $200,000: 16%
$10,000 to $100,000: 44%
They consider acceptable a rate of:
75% to 90% accuracy: 43%
More than 90% accuracy: 40%
But how satisfied are they?
Very: 48%
Somewhat
Source: Enterprise Management Associates Inc., Boulder, Colo.

Exactly, says Kristina Victoreen, senior network engineer at the University of Pennsylvania in Philadelphia. "We tried building on the autodiscovery that Spectrum does, but we spent more time fixing what it had discovered," Victoreen says. "The guys who do [the model building] found it was quicker to build the topological model of the network by hand, which is very time-consuming." Spectrum is a management tool from Aprisma Management Technologies in Durham, N.H.

The tools "need to sense when something out of the norm occurs, such as a critical deadline that forces people to work around the clock, and normally noncritical failures become critical and require immediate response," says Revak. "If they can't do this automatically, the administrative overhead greatly outweighs the return on investment."

The moment of truth for users seems to come when the software tools "can successfully automate problem diagnostics 70% of the time or better," Drogseth says in the report. At that point, "users believe they are justified in the investment."

"That 70% mark is being met today by most of the better products," Drogseth says. The benefits can be substantial, he says: smoother-running networks, better service-level delivery, reduced staff requirements and lower overhead.

These benefits, together with advancements in the software and a reduction in the costs of deployment, are driving an increase in the use of root-cause analysis and event-correlation tools, Drogseth says in the report.

Users have viewed automation for root-cause analysis and event correlation as "more work than it's worth, requiring too much labor and knowledge for rules to be appropriately defined for a specific environment," says Drogseth.

"There's no way we could manage without it," says Chris Vecchiolla, IT project manager at Royal Caribbean Cruises Ltd. in Miami. Each of Royal Caribbean's 18 ocean liners has an IT staff of two people, but most systems management is handled remotely from Miami via satellite.

Royal Caribbean uses Compaq Insight Manager and Unicenter from Islandia, N.Y.-based Computer Associates International Inc. to manage and monitor "about 170 items, such as SCSI card failure and out-of-threshold notices on servers," Vecchiolla says.

Escalating alarms notify both onboard and Miami-based IT staffers of problems. When the system detects a virus, it automatically destroys it and notifies onboard IT staff of the action via a banner on a monitor, Vecchiolla says. But should a server exceed a predetermined threshold, Miami staff could be paged to handle the problem, he says.

Because Royal Caribbean's ships cruise around the globe through every time zone, remote management from Miami sometimes occurs while onboard staffers are off-duty. When the Miami staff works on a ship's systems, "Unicenter automatically picks it up and generates a banner that goes to the onboard systems manager that tells them the date, time, workstation accessed, what was done," Vecchiolla says. "The [onboard IT staffers] like that a lot."

Drogseth says that more than half of the enterprise-level companies with which he spoke are beginning with automation's "lowest common denominator, alarm deduplication."

If a server goes down, each attempt by any user or device to access it can generate a separate alarm, which doesn't describe the root cause of the problem. Deduplication lets a network manager see a single, server-down alarm instead.

A Kansas City, Mo.-based unit of Boston-based financial services company State Street Corp. isn't doing root-cause analysis, but it does use Spectrum for alarm deduplication, says David Lembke, State Street's network services manager.

The University of Pennsylvania has been using Spectrum for five years to reduce the number of alarms reported by a single event, Victoreen says. "It works reasonably well, assuming we've built the model correctly," she says. But "it turns out [that] a big map with red dots flashing to show alarms is not that useful for us."

Drogseth says that in his interviews with 40 midsize to large companies, most IT managers said they know they must start automating, because networks have grown too large and complex to manage without automation tools.

Though Revak is skeptical, "that's not to say we're not interested," he says.

"We're rethinking trying to model all of our networks and maybe moving to trap aggregators or event correlation engines," Victoreen says.

IT managers are looking beyond the network focus that most vendors have stressed and are seeing extended uses for the tools, such as to support performance, help desk functions, inventory and asset management, change management, and security. Not all tools support all such extensions, Drogseth says.

Glossary:
Advanced correlative intelligence: A problem-isolation method cloaked in secrecy by most root-cause analysis tool vendors. This is where language is most likely to become obscure or insubstantial.
Event correlation: Examines the relationship among events across an IT infrastructure to narrow the search for the cause of a problem.
Object data store: Knowledge specific to devices, applications and connections that provides a database of codified detail for understanding objects and their relationships. An extensive object data store can contain object performance data for use in modeling routine interactions across device types such as servers and routers.
Polling and instrumentation: Provide ongoing event data about infrastructure availability, performance and topology. They can include common availability metrics, as well as CPU utilization or even remote monitoring.
Presentation and context: Encompass
Root cause analysis: Isolates the cause of failure or poor performance.
Topology: The map of where things are. It can detail both the physical (Layer 2) and logical (Layer 3) network, and move on up the Open Systems Interconnection stack to include configuration information relevant to systems and applications.

Vendors of most of the tools also claim some kind of predictive capabilities. A network that learns over time can not only help prevent problems, but it can also increase job satisfaction by releasing IT staffers from grunt work and calling on them only for more difficult questions.

But where most such artificial intelligence efforts fall short is in detecting subtle changes, Revak says. "Through repeated small changes, the norm [can] shift very near the failure point, setting up a significant failure situation for the next small deviation from the newly established norm," he says.
Predictive capabilities vary greatly, and not all are based on sophisticated artificial intelligence techniques, Drogseth says. At the bottom of the range is basic linear trending. An algorithm can determine how long it will take a server to reach capacity if usage increases by, say, 20% per month. At the other end are sophisticated tools like CA's neural network-based Neugents, Drogseth says. A Neugent can look at historical data about network resource usage and a company's business cycle, says a CA spokeswoman. By aggregating and correlating data on network infrastructure and business relationships, the Neugent might predict that a server would reach capacity in six weeks but drop back to 30% in the seventh week, she says. Royal Caribbean plans to implement CA's Neugent for Windows NT networks, which "will take us to another level of management," Vecchiolla says. Before the root-cause analysis industry achieves that new level of management, however, it must hurdle the stumbling block of standards, Drogseth says. Part of why the University of Pennsylvania isn't "getting as much back as we'd hoped for is that we have a lot of different software from different vendors, and they have a lot of different proprietary schemes and interfaces," Victoreen says. "The systems management industry must develop the standards and interoperability capabilities required for the tools in the event, problem, change, configuration, inventory, workload management, capacity [planning], performance [monitoring] and security areas to work together," Revak says. "Each of these disciplines contains some part of the overall equation." Root-cause analysis and event-correlation tools aren't layered onto a network so much as they are woven into its fabric, Drogseth says. De facto standards such as Java, HTML and XML help provide a cooperative interface between different vendors' products. 
But true interoperability demands a common thread, "a standard structure of data maintained in the object store" - a database of network devices, applications and relationships, Drogseth says.

"At its most esoteric, standards refers to that platonically perfect state that never gets achieved," he says. "What we're seeing is some adoption of some standards by some vendors."

Users should look for vendor partnerships to ease root-cause tool deployment and management, he says.

That's easier now than several years ago, when one New York-based financial services firm began doing root-cause analysis. "We had to build a lot of things ourselves because they weren't available at the time," says the firm's IT vice president, Gary Butler.

"We're using the [System Management ARTS Inc.] correlation engine, and we're feeding it with data from Tibco's smart agents" and San Francisco-based Micromuse Inc.'s NetCool presentation software, says Butler. "We can't always find the root cause 100% of the time, but we can at least find the more serious event, and that keeps us from wasting time with all the symptoms."

Revak says, "As the industry matures, the best bet is for companies to focus on developing their event infrastructure technology - a prerequisite for any advanced management - their people and their processes. Technology is not the most important. [Vendors] dislike it when I say this, but most important are the people and the processes" and the relationship between them.

Source: Computerworld

7.2.4 Change Management

As a spider in the web, Change Management needs to know everything about every change. This way, you are in full control of the changes to your IT infrastructure.

Change Management is NOT about performing changes risk-free; it is about performing changes with a minimal risk OR a consciously taken risk to the levels of service.
It is therefore important to involve the clients or client representatives in the change management process.

All projects start through Change Management, as all projects wish to change something in the IT Infrastructure: they either enhance the current infrastructure or add a new component to it.

Change Management is more than just Change Control

The Change Management process starts with the RFC being raised and is in control from the assessment and acceptance of the RFC until the acceptance of the change during the Post Implementation Review.

All other processes issue RFCs to Change Management for necessary upgrades to improve their effectiveness and efficiency.

Change Management needs information from all other processes in order to perform the risk assessment, an important part of the

EXTRA READING

Five Intranet Management Problems You Can Solve With a Change Management System
(Source: www.earthweb.com)

Remarks made around the office:
"I updated that document yesterday, but it's not appearing on the Intranet. What happened?"
"The new version's wrong! Did we back up the old one?"
"Chris and I were editing the same file at the same time, but I finished first. His changes overwrote mine!"
"Who made that change?"
"Only the Webmaster is allowed to change any files, so your changes might be implemented by Christmas."

If you hear statements like these on a regular basis, you may be suffering from change out of control. Changes in files are necessary and beneficial, but they also have to be controlled. People in a development environment know all about this problem, and have devised a sound way of managing the volume of changes to software files. The discipline of managing, controlling, and automating changes in your files is called software change management.

The Five Problems

You only need to go as far as your Webmaster to learn all about the volume of change in a Web or Intranet environment.
Just like their developer cousins, Webmasters struggle to manage the magnitude of change to a Web site. Now, with new integrated authoring and browsing tools like those available from Netscape Communications, making changes to a Web site is as easy as saving a file and pressing a "one-button publish." This technology opens the Web up to a whole new breed of content providers making changes to a site.

Before the technology, changes had to go through the Webmaster because of the difficulty in executing the changes. Now, these new tools have empowered everyone to make their own changes. And consequently, new change problems have emerged. The following are some examples of common change problems affecting Intranet sites.

1. Your update didn't appear.
Even though you've made a change to a document, the old version remains on the Web. Until the new version appears, the time you spent updating it is wasted. If you can't find the new version, you will have to redo the work.

This problem typically happens because two copies of the same file exist and the file you updated is not the file copied to the Web site. Say you make a private copy "just in case." A commendable precaution, but prone to problems. This private copy will often bypass scripts and automated update tools. You may not notice that two long pathnames differ in a single component, and you alter the wrong file. With two copies of a file, mistakes will happen. Or to put it another way, duplicate files diverge.

You can solve this problem by controlling both how you change the original source document and how you make those changes available on your Intranet site. Some kind of approval or promotion mechanism is most useful. One of the first rules of change management states that you store each source file in only one place.

2. You can't restore an old version.
You need to examine an old version, but you can't.
The backup has failed, or the changes were made during the time period between backups, or no backup exists for that directory. You usually discover this problem when you realize someone has made a mistake in the new version or released information before its time. For example, suppose you plan to change your internal purchasing system and, with it, the related forms on the Web site. If the new forms are incompatible with the old forms, then the new pages on the Web won't work with the old system. If you haven't put the new system in place yet, chaos will follow.

Prevention is the best solution to this problem. One clear method of prevention requires having a backup methodology and an archive of all versions of the files on the Web server. Then, when a mistake does occur, the older version can quickly replace the mistaken version. This ideal archiving system is simple, automatic (or nearly so) and easy to use.

3. Multiple authors, multiple overwrites.

Two people opening the same file at the same time for editing creates an opportunity for file overwrites. In this situation, one person may save a file every minute while the other makes changes over a couple of hours before saving. Consequently, that file is a free-fire zone. As soon as the second person saves the file, all of the first person's changes disappear. Neither is careless, but a single save command has erased two hours of work.

A change management system saves changes sequentially. It allows anyone to read a file at any time but controls the ability to save changes to that file. If you want to change the file, you must lock it first. This lock warns others that the file is being changed and prevents them from making changes. Then, when you finish changing the file, you unlock it, making it available for someone else to lock, edit, and unlock.

4. You don't know who changed what or when.

You discover a mistake in a document; you have no way of knowing who changed the file.
With most systems you have difficulty determining who changed a file and when. Even if a file system records who last changed a file and at what time, that's usually all the information the system provides. This kind of system rarely keeps a history of all the changes or of the reasons for those changes.

Since a proper change management system keeps a record of all the changes, it's easy to add who changed the files and when to the record. Ideally, you will enter a reason for the change like, "Fred added a new section to the white paper on change management."

On an Intranet site, every object should have its own history. A single page may contain many different objects (graphical images, audio files, applets, etc.), each with its own history. For example, if your corporate logo changes, regular visitors to your home page will recognize the change. However, the change will actually take place in the image file, not in the home page itself. A record of changes to the home page won't help track changes to the logo.

5. Your Webmaster does all the work.

Organizations solve the problem of managing change to a Web site by assigning the responsibility for change to their Webmaster. If an organization funnels all the work through a single person, the Webmaster, then it automatically assumes that he or she is accountable for the accuracy of the content. That funnel soon turns into a bottleneck, and the Webmaster who must do everything soon finds he or she has no time to do anything.

A change management system approaches change in a more reasonable way. It evens out responsibility for change by enabling multiple, authorized people to make changes to Web content. Much of the work involved in putting files on a Web site is quite simple. It involves changing content, copying files, and confirming the links are correct.
Users could make these kinds of changes for themselves, rather than requiring the Webmaster to do it, were it not for the possibility of mistakes. By protecting the files on your Web site against unwanted changes, a sound change management policy empowers users to make changes to documents in a structured, managed way. People can then work without the fear of losing data, and the Webmaster can have a moment or two of peace of mind.

Advanced features

As already mentioned, some users find the basic features of version control are not enough for their needs. Typically, these people deal with the files rather than their content. They include team leaders, project managers and Web site managers. The problems faced by these more sophisticated users include:

Finding differences: A user often needs to visualize the differences between any two versions of a file.

Object re-use: Some components of the Web pages on a site are standard for the site (e.g., graphics and logos) and should be re-used as much as possible. Other parts are solutions to common problems and the program should re-use these as well. Can the software handle groups of files?

Synchronizing groups of files: Files change at different rates; a team or project leader must ensure that particular sets of file revisions are kept together.

Security: Large development projects require enhanced security. Some project maintenance actions should be restricted to qualified or approved users.

Approving content: If the Intranet is used as a staging ground for documents that will later appear on the external Web site, there must be some way of distinguishing approved files and documents from those not approved, provisionally approved or under review.

Team management: A Webmaster or a team leader cares about all the changes made to a group of files and all the files locked for editing at any one time. This person requires a "group-view" of the file system that differs from the narrowly focused view of a single user.
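The lock, edit, unlock cycle described under problem 3, together with the per-file change history from problems 2 and 4, can be sketched roughly as follows. This is only an illustrative in-memory model, not the behaviour of any particular change management product: the names ManagedFile, Revision and LockError, and the user names, are assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


class LockError(Exception):
    """Raised when a user acts on a file without holding its lock."""


@dataclass
class Revision:
    author: str         # who changed the file
    reason: str         # why it was changed
    content: str        # full content of this version
    saved_at: datetime  # when it was saved


@dataclass
class ManagedFile:
    """Hypothetical file under change management: one lock, full history."""
    name: str
    locked_by: Optional[str] = None
    history: List[Revision] = field(default_factory=list)

    def read(self) -> str:
        # Anyone may read the latest version at any time.
        return self.history[-1].content if self.history else ""

    def lock(self, user: str) -> None:
        # Only one user may hold the lock at a time.
        if self.locked_by is not None and self.locked_by != user:
            raise LockError(f"{self.name} is locked by {self.locked_by}")
        self.locked_by = user

    def save(self, user: str, content: str, reason: str) -> None:
        # Saving requires the lock; every save is kept as a new revision.
        if self.locked_by != user:
            raise LockError(f"{user} must lock {self.name} before saving")
        self.history.append(
            Revision(user, reason, content, datetime.now(timezone.utc))
        )

    def unlock(self, user: str) -> None:
        if self.locked_by != user:
            raise LockError(f"{user} does not hold the lock on {self.name}")
        self.locked_by = None

    def restore(self, index: int) -> str:
        # Older versions are never overwritten, so they can be examined.
        return self.history[index].content
```

With this model, a second author who tries to save while the first holds the lock gets a LockError instead of silently overwriting two hours of work, and every revision records who made it and why.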
A Policy for Change

Begin with a sound policy when applying a change management system to Web development. Policy is the rationale behind tools and procedures, and without a sound one, the tools won't be nearly as helpful as they could be.

When drawing up a policy, consider both the organizational structure and the logical structure of the Web site or sites:

− Are people organized into teams, departments or, perhaps, in a flat structure? How does that reflect the Web site?
− Are files under a version control of any sort?
− Is there both an Intranet and Internet site? If so, how do the content standards differ?

Once you determine the site's structure and needs, consider the process by which documents get to the site: the approval cycle. Different organizations have different procedures. The common history of a published piece of dynamic content is:

− New content is generated. This may mean modifying old content or creating new content.
− Content is approved. The person who created the content may be the only approving body, or there may be a formal review process.
− The Web page is tested. Again, the person who created it may handle this strictly, or there may be an official tester or testing body.
− The new content is made public.

The steps may occur in a slightly different sequence, but these four steps happen. Some may even happen more than once. If the content is not approved, new content must be generated again.

You must also ask specific questions when formulating a policy that identifies responsibility at different stages in the process of documents getting to the site.

1. Who approves documents or file content?

In order to streamline the use of new content on a Web site, there needs to be an approval process in place. The approval process will depend upon the nature of the content.
For example, an art director may have approval over the use of images or logos, especially if they apply across the entire organization, and a development team leader may have to approve a new Java application.

2. Who will test the changed pages to ensure they are correct? How will the test be done?

This is an important part of the approval and validation process. Web pages on the Internet represent an organization, and mistakes do not reflect well even if they're relatively minor ones such as spelling errors or incorrect links.

3. Who has access? In other words, who can replace or write to the files?

Even on an internal site it may be useful to establish a process for approval and access. On the external site, perhaps only the Webmaster may change files, or perhaps the Webmaster and team leaders. In most organizations, the person who places the file on the external server essentially approves it for publication.

Conclusion

The World Wide Web is almost synonymous with change. Essentially, every time you save a document you create a new version of a Web document or applet: you make a change. Mastering the Web depends upon managing that change properly. The ability to manage change depends, in turn, upon an effective version control system. Version control reduces lost information, eliminates the confusion and duplication of effort that create bottlenecks, and keeps productivity up.

The growth of the Intranet has brought in a new breed of content providers: people not interested in learning the technical details of version control. Therefore, version control must be made convenient, easy and invisible to them. The key lies in integrating version control tightly with the Web server.

Of course, basic version control isn't enough for Webmasters or professional developers who develop complicated applications for the Web or organize large Web sites.
These people require advanced version control to deal with internal and external servers, firewalls, multiple source files and approval cycles.

Advanced version control is built upon a sound policy and understanding of the site's needs. It requires additional features such as extensive audit trails, access control, and organizational tools such as projects and sandbox directories. With a sound policy and the right tools, both the new breed of content providers and the old suppliers can manage change effortlessly and with integrity.

7.2.5 Release Management

As soon as Change Management has finished with the development of the change, it hands it over to Release Management to release the change into the appropriate environment.

Release Management performs version control and controls the movement of software from the development environment to the test environment and into production.

Release Management also manages the Definitive Software Library (DSL), in which it stores all master copies of the software CIs. The Definitive Hardware Store (DHS) is a physical storage area with the authorised spare parts and other components of the non-software CIs. All products that go into the DSL or DHS need to be checked for damage and viruses before they are stored.

Release Management needs to make sure that the parties involved (INCLUDING THE SERVICE DESK!) are aware of the upcoming release and that the appropriate notifications have been sent out.

EXTRA READING

Example of a release notice to end-users:

KDE 2.2 Release Plan

The following is the outline for our KDE 2.2 release schedule. The 2.2 release, unlike the 2.1.1 release, will come with new functionality and improvements in many areas of KDE. We aim for the following improvements among others:

- Better printing framework
- IMAP support in kmail

All dates given here are subject to revision, but we will try our best to stick to them if possible.
Waldo Bastian is acting as release coordinator for the 2.2 releases. KOffice 1.1 will be released independently of KDE 2.2. The KOffice 1.1 release schedule can be found here.

KDE 2.2 final

Monday July 16: The HEAD branch goes into deep-freeze. All commits should be posted for review _BEFORE_ committing.
Sunday July 29: Last day for translations, documentation, icons and critical bugfix commits.
Monday July 30: The HEAD branch of CVS is tagged KDE_2_2_RELEASE and the day is spent testing the release a final time.
Tuesday July 31: The tarballs are made and released to the packagers. The rest of the week is spent testing the tarballs, packaging and writing the announcement and changelog.
Friday August 3: The source and binary packages are uploaded to ftp.kde.org to give some time for the mirrors to get them.
Monday August 6: Announce KDE 2.2.

7.2.6 Configuration Management

How can you manage your IT Service Delivery in a consistent manner when you don’t know what you have? An important and revealing question… Lots of IT organisations don’t know what they own, or what they use to deliver the IT services to their customers.

Configuration Management is the process that keeps a repository of all the Configuration Items (CIs) you need in order to deliver the IT Services. CIs include, but are not limited to, hardware items, software items, SLAs and Disaster Recovery Plans.

The most important aspect of Configuration Management is that it also manages the relationships between the CIs, so you can do a quality risk assessment and impact analysis.

Configuration Management makes sure that the information about the CIs, stored in the CMDB, is accurate and up to date. All other processes rely on the quality of this one!

EXTRA READING

Building a CM Database: Nine Years at Boeing
Susan Grosjean
Boeing Electronic Products

A Boeing organization developed an Oracle-based database to track problems during the life cycle of the Boeing 777 airplane.
Over nine years, it has evolved from mainframe to web implementation as technology has become available. This article reviews some basic approaches to developing configuration management databases and includes resulting lessons learned.

The Need for a Database System

Boeing Electronic Products (EP) designs and develops some of the avionics [1] for Boeing Commercial Airplanes (BCAG). About 1990, when the 777 airplane program was getting started [2], EP was instructed to develop an electronic database to track and record problems encountered during design and development of EP electronics. BCAG wanted a database it could access. After the 777 was fielded, all airplanes were to use this database.

The existing problem reporting system was paper-based. One paper copy of each problem report (PR) was stamped "original." Multiple copies were made for each person assigned to work the problem. Each person would record his or her response by writing on the hard copy. Periodically, the Integrated Product Team (IPT) leader, responsible for a given piece of avionics, would call together all members of the team, known as the Engineering Review Board. These meetings could last for hours, trying to take all inputs and create one original PR. An easier way was needed to consolidate board members' input.

Requirements Definition and Implementation

Engineers, IPT leaders, and other potential customers did not seem to know what they wanted/needed in a problem reporting system. All the organizational requirements were surveyed (EP Configuration Management Plan, RTCA/DO-178B "Software Considerations in Airborne Systems and Equipment Certification," EP Software Quality Assurance procedures, etc.). Using the PR as a guide, the existing process was captured in a flowchart representation with a text description of each block. A requirements document was developed from this process document.
The numbering scheme for the PRs was used in conjunction with automation of the problem reporting process. A two-field scheme was used. The first field was an Avionics Product (LRU) [3] designation such as 7WEU (777 Airplane Warning Electronics Unit). The IPT leaders and auditors needed a list of all PRs for a desired component. A search on this first field would provide that list. The second field was a four-digit number that was assigned sequentially. Once a PR receives this two-field dash number, it cannot be deleted, so there is no break in the numbering sequence. This simplified the IPT leader's control and made any audit process easier.

Four text fields, as well as signature blocks, were associated with each PR. Each text field needed to accommodate extensive data (20,000 characters each). In the PR heading were smaller fields for pertinent data needed for tracking and reporting. The entry of valid data to these fields was enforced through the use of pull-down lists that provided all valid options for a given field.

Because of the size and quantity of PRs, the speed and capacity of a mainframe was needed. EP chose to use Oracle on a VAX mainframe rather than an IBM due to the availability of VAX programmers in EP. EP was unable to find an existing user interface to meet its requirements. User access to the mainframe was via dumb monochrome monitors or by loading an access icon onto the personal computers.

A major implementation complication was the need to have five variations of the PR (i.e. variations on the fields and associated process). The struggle to agree on a uniform PR form and process was futile; each user organization wanted its own little corner of the world. The result was five different PR databases. Maintaining requirements, processes, procedures, updates, and training for five different databases was a nightmare.
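The two-field numbering scheme described above can be illustrated with a minimal sketch. It is not the Boeing implementation: the PRRegistry class, the "LRU-NNNN" string format, and the choice to run one sequence per LRU designation are assumptions made for this example (the article does not say whether the four-digit sequence was kept per component or across all PRs). The "ABCD" designation in the usage note is likewise hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class PRRegistry:
    """Hypothetical registry issuing two-field problem report numbers."""

    MAX_SEQUENCE = 9999  # the second field is a four-digit number

    def __init__(self) -> None:
        # Titles per LRU designation, in the order the PRs were opened.
        self._titles: Dict[str, List[str]] = defaultdict(list)

    def open_pr(self, lru: str, title: str) -> str:
        """Assign the next sequential number for the given LRU."""
        titles = self._titles[lru]
        if len(titles) >= self.MAX_SEQUENCE:
            raise ValueError(f"four-digit sequence exhausted for {lru}")
        titles.append(title)
        return f"{lru}-{len(titles):04d}"

    def list_prs(self, lru: str) -> List[str]:
        """Search on the first field: every PR number for one component."""
        count = len(self._titles.get(lru, []))
        return [f"{lru}-{n:04d}" for n in range(1, count + 1)]

    # There is deliberately no delete method: once a number has been
    # issued it stays in the registry, so the sequence has no gaps.
```

For example, opening two PRs against 7WEU yields 7WEU-0001 and 7WEU-0002, a first PR against a hypothetical "ABCD" unit yields ABCD-0001, and list_prs("7WEU") returns the gap-free list an auditor would want to see.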
Throughout the PR process (see Figure 1), e-mail messages were to be triggered to inform personnel and to assign tasks. There are now 43 possible messages for each PR. For instance, after an originator of a PR has completed the appropriate "title" and "description" fields, the system sends the PR as an e-mail message to those responsible for the product (component) in question. The length of these e-mail messages was a challenge, since they included the 20,000-character-length text fields. This will be solved by e-mailing only a brief message that includes a URL to the web site, which the assignee can access to see the complete PR and perform the applicable function.

The system also automatically changes the status of the PR as it moves through the phases shown in Figure 1. Portions of a PR are approved in each phase. After approval, the status changes and fields are locked, preventing changes by unauthorized persons. Engineers, configuration management specialists, and IPT leaders have different levels of authorization.

User Acceptance of the System

After implementation of the requirements and extensive testing, the database was given to the users. A program directive and a users manual were written. Hands-on training was also conducted; it was not an intensive course, but was intended to acquaint the user with the database and convey some of the rationale used for its creation.

Function Keys

The users' first reaction was, "This database is not user friendly." No one liked using the function keys to move around the database. Many keyboards did not support the function keys we were using, and corresponding control keys (for example, ^B or F10 = Exit) had to be developed. These keystrokes were displayed at the bottom of each screen.

Unlike the paper PR, the electronic version forced the entry of required information. The IPT leaders did not like the fact that the database forced everyone to do business the same way, and that it had checks and balances in place to ensure this practice.
But users liked the information the system provided: the position of a given PR in the life cycle, e-mail notification of PR status, who was assigned to work PR related tasks, and visibility into process bottlenecks. If a PR had languished on someone's desk for months, the IPT leader needed to know that. Sometimes the IPT leader was surprised to find that the languishing occurred on his or her own desk. Canned reports included listings of all PRs by responsible IPT leader or by assignee, and totals of the quantity of PRs by open and closed status.

Windows

Three years ago, Oracle offered Developer/2000, allowing a more user-friendly front end. We needed to retain the VAX, as by now all four text fields had increased to 65,000 characters each. With high-level management backing during the implementation of the front end, all Seattle-based groups adopted a single PR form. The groups also agreed on using basically the same process.

The Parts Engineering Organization in Seattle levied additional requirements, as it wanted a way to track and record problem reports involving obsolete parts. Since single part obsolescence is often common to several LRUs, a process enhancement was created to generate multiple corresponding PRs. A parent PR (the first PR written) is created and duplicate copies are made for each impacted LRU. These child copies retain a link to the parent. When the child PRs are closed, the parent PR can be closed.

Requirements expanded beyond Seattle to Irving, Texas, our manufacturing site. Irving has a problem-reporting system also implemented in Oracle. A requirement to transmit engineering problems between the Seattle and Irving databases was accomplished by developing an interface that transfers the PR from their database to Seattle's. The Seattle system now has approximately 1,000 users with about 40,000 recorded problems.
The database tracks and creates reports for any type of problem encountered by EP (product, test, obsolete parts, production) as well as lessons learned, action items, corrective actions, and PRs against the database itself.

Intranet

Access via the Boeing web is partially implemented. This provides access from more locations but with slower response times than those experienced by users who access the system directly using the software application windows front end. The database is migrated off the VAX to a Unix operating system.

Lessons Learned

You might consider commercial off-the-shelf tools that may meet most of your requirements or investigate other organizations or divisions using a tool similar to your requirements. Implementing this system has been a lot of work spread sporadically over nine years. Remember, EP had in-house Oracle developers; many organizations do not have similar resources.

Establish a team to consider new requirements. One result of nine years of experience is EP's use of an Engineering Review Board that helps determine if a change or enhancement will benefit all users. This database has a PR type, which enables the users to write a PR against the PR system. These PRs include problems encountered and enhancements.

Designate a focal point to write and transmit requirements to the database programmer. This should be one or two people who are familiar with the database and who interface well with the programmer. Once approved by the Engineering Review Board, the focal point writes or rewrites the requirements to ensure the programmer easily understands them. The change/enhancement is added to the users manual and, if the change/enhancement is substantial, training is scheduled.

Consider the importance of your testing approach. For the first four years, a separate test database was maintained. Later, the test database was dropped because of the work required to maintain it. Testing is now performed on the production database.
This has been a cost-effective compromise that has resulted in few bugs.

Make sure there is room to grow. In this case, text fields have grown from 20,000 characters to 65,000. Other organizations may decide to use the database once it has proven itself. Also, the more you give users, the more they seem to want.

Sponsorship can make it easier. Once senior management saw the benefit of using a standard process, a single PR, and a single database, EP no longer had to maintain separate requirements, processes, procedures, and updates.

Technology improvements can make it easier. By providing pull-down and pop-up lists within the PR fields, diverse users could live with the generic PR.

Training is essential to the effectiveness of the database and the user. It saves users hours of time by not having to struggle with an unknown element, and gives them hands-on experience.

Help lines and training manuals are important in helping users further understand what they are expected to do. Help lines appear throughout most of the screens to inform and ensure that the PR is completed accurately. The training manual for this database is set up for step-by-step instructions with visual aids.

Summary

Boeing's need for a CM database has been outlined. Defining the database requirements in specific terms for design and implementation presented some challenges. The implementation presented user interface challenges. The system has evolved to meet new requirements and to exploit new technology.

About the Author

Susan C. Grosjean is a configuration management specialist with the Electronic Products Group. She has 20 years of configuration management experience at Boeing. Prior to developing the database described in this article, she developed status accounting databases for the B-1B and B-2 bombers.
Boeing Commercial Airplane Group

Notes

1. Electrical and electronic devices in aviation, including the software embedded in those devices.
2.
For BCAG, the 777 airplanes represented an unprecedented software challenge in terms of the size and complexity of the airplane's systems. The 2.5 million lines of newly developed software were approximately six times more than any previous Boeing commercial airplane development program. Including commercial-off-the-shelf and optional software, the total size is more than 4 million lines of code. This software was implemented in 79 different systems produced by various suppliers throughout the world. CrossTalk January 1996, "Software Development for the Boeing 777."
3. Line replaceable unit is the terminology for the circuit board, drawer, or cabinet that can be removed and replaced when hardware failure occurs.
4. Class I Changes are called Engineering Change Proposals (ECPs) and consist of changes that are so extensive that additional customer funds must be contracted to implement the change. Changes smaller than Class I are called Class II.

8 Tools

Every organisation uses tools to support the IT organisation in delivering the services and performing the processes.

The type of tool you need is fully dependent on what you want to get out of it. For small organisations, the tool can be an Excel spreadsheet. This can be more than sufficient. For larger organisations, you will probably be looking at other commercial tools.

Try to design your processes before you start looking for a tool. This gives you the opportunity to write a thorough functional and technical specification before talking to the vendors.

A lot of research can be done using the Internet. Many (tool) vendors publish lists of appropriate tools together with their specifications. Some of these sites are:

http://www.tools2manage-it.com (Vendor: Pink Elephant)
http://www.independenttool.nl/ click on: ‘marktoverzicht’ (top left corner) to find a clickable list of tools.
(sponsored by Computer Associates)

8.1.1 Type of tools

There are many tools on the market, and they are all slightly different from each other. The following list gives an example of some of the tools that organisations use:

Service Desk Tools / Support Tools
- Heat
- Infra
- Remedy
- Compuware
- Peregrine Service Centre

System Management tools
- HP Openview
- Qualiparc
- CA Unicentre

8.1.2 The Cost of a Tool

When you have decided on a tool, you really need to have a close look at the cost. Most of the expenses are in getting the tool to work for you…

The following are some examples of the implementation cost of any tool:
- Initial Purchase Price
- Additional Licenses
- Maintenance contract
- Warranty
- Training of staff in using the tool
- Implementation expenses (consultancy hours)
- Fine-tuning the tool to your needs (consultancy and engineering hours)
- Updating the internal processes to fit the tool (consultancy and internal staff hours)

EXTRA READING

Field notes: Choosing a tool that fits.

Varian Semiconductor Equipment Associates Inc. has plants in more than 40 locations in the U.S., Europe and the Far East, but the Gloucester, Mass.-based manufacturer had no way to monitor connections between facilities, says Troy Preble, manager of networks and technology. "Limited visibility meant managing the network whenever users called," says Preble.

The company investigated several options, including outsourcing and using management framework products from IBM's Tivoli and HP before deciding on the WebNM network management suite from Somix Technologies Inc. in Sanford, Maine.

Installing the suite took three days, including two spent on training. Immediately, Varian discovered one T1 line constantly operating at full capacity and an overprovisioned connection to Japan, which was running at less than 60% of capacity. Varian added a second T1 and cut back on the international line, drastically cutting costs and improving service.
In addition to monitoring WAN connections, Varian also uses WebNM to monitor server uptime, performance statistics, hard-drive and CPU utilization, as well as monitoring all switches and routers.

Preble acknowledges that WebNM doesn't have the full range of features available in a package from one of the big network management framework vendors but says the package met all of Varian's needs at a lower cost. "For what we wanted to do, WebNM cost one-tenth what we would have spent on Tivoli or OpenView," he says.

Source: Computerworld

9 Security Management

9.1 Introduction

Business processes can no longer operate without a supply of information. In fact, more and more business processes consist purely of one or more information systems. Information Security Management is an important activity, which aims to control the provision of information, and to prevent unauthorised use of information. For many years, Information Security Management was largely ignored. However, this is changing. Security is now considered one of the main management challenges for the coming years. The interest in this discipline is increasing because of the growing use of the Internet and e-commerce in particular.

More and more businesses are opening electronic gateways into their business. This introduces the risk of intrusion. What risks do we want to cover, and what measures should we take now and in the next budgeting round? Senior Management has to take decisions, and these decisions can only be taken if a thorough risk analysis is undertaken. This analysis should provide input to Security Management to determine the security requirements.

These requirements affect IT service providers and should be laid down in Service Level Agreements. Security Management aims to ensure that the security aspects of services are provided at the level agreed with the customer at all times. Security is now an essential quality aspect of management.
Security Management integrates security in the IT organisation from the service provider’s point of view. The Code of Practice for Information Security Management (BS 7799) provides guidance for the development, introduction and evaluation of security measures.

9.1.1 Basic concepts

Security Management comes under the umbrella of Information Security, which aims to ensure the safety of information. Safety refers to not being vulnerable to known risks, and avoiding unknown risks where possible. The tool to provide this is security. The aim is to protect the value of the information. This value depends on confidentiality, integrity and availability.

• Confidentiality: protecting information against unauthorised access and use.
• Integrity: accuracy, completeness and timeliness of the information.
• Availability: the information should be accessible at any agreed time. This depends on the continuity provided by the information processing systems.

Secondary aspects include privacy (confidentiality and integrity of information relating to individuals), anonymity, and verifiability (being able to verify that the information is used correctly and that the security measures are effective).

9.2 Objectives

In recent decades, almost all businesses have become more dependent on information systems. The use of computer networks has also grown, not only within businesses but also between them, and between businesses and the world outside. The increasing complexity of IT infrastructure means that businesses are now more vulnerable to technical failures, human error, intentional human acts, hackers and crackers, computer viruses, etc. This growing complexity requires a unified management approach. Security Management has important ties with other processes. Other ITIL processes, under the supervision of Security Management, carry out some security activities.
Security Management has two objectives:
• To meet the security requirements of the SLAs and other external requirements arising from contracts, legislation and externally imposed policies.
• To provide a basic level of security, independent of external requirements.

Security Management is essential to maintaining the uninterrupted operation of the IT organisation. It also helps to simplify Information Security Service Level Management, as it is much more difficult to manage a large number of different SLAs than a limited number.

The process input is provided by the SLAs, which specify security requirements, possibly supplemented by policy documents and other external requirements. The process also receives information about relevant security issues in other processes, such as security incidents. The output includes information about the achieved implementation of the SLAs, including exception reports and routine security planning.

At present, many organisations deal with Information Security at the strategic level in information policy and information plans, and at the operational level by purchasing tools and other security products. Insufficient attention is given to the active management of Information Security, the continuous analysis and translation of policies into technical options, and ensuring that the security measures continue to be effective when the requirements and environment change. The consequence of this missing link is that, at the tactical management level, significant investments are made in measures that are no longer relevant, at a time when new, more effective measures ought to be taken. Security Management aims to ensure that effective Information Security measures are taken at the strategic, tactical and operational levels.

9.2.1 Benefits

Information Security is not a goal in itself; it aims to serve the interests of the business or organisation.
Some information and information services will be more important to the organisation than others. Information Security must be appropriate to the importance of the information. Tailor-made security is developed by striking a balance between the security measures on the one hand, and the value of the information and the threats in the processing environment on the other. An effective information supply, with adequate Information Security, is important to an organisation for two reasons:
• Internal reasons: an organisation can only operate effectively if correct and complete information is available when required. The level of Information Security should be appropriate for this.
• External reasons: the processes in an organisation create products and services, which are made available to the market or society, to meet defined objectives. An inadequate information supply will lead to substandard products and services, which cannot be used to meet the objectives and which will threaten the survival of the organisation. Adequate Information Security is an important condition for having an adequate information supply. The external significance of Information Security is therefore determined in part by the internal significance.

Security can provide significant added value to an information system. Effective security contributes to the continuity of the organisation and helps to meet its objectives.

9.3 Process

Organisations and their information systems change. Checklists such as the Code of Practice for Information Security Management are static and do not adequately address rapid changes in IT. For this reason, Security Management activities must be reviewed continuously to ensure their effectiveness. Security Management amounts to a never-ending cycle of plan, do, check, and act. The activities undertaken by Security Management, or undertaken in other processes under the control of Security Management, are discussed below. Figure 21 shows the Security Management cycle.
The customer's requirements appear at the top right, as input to the process. The security section of the Service Level Agreement defines these requirements in terms of the security services and the level of security to be provided. The service provider communicates these agreements to his organisation in the form of a Security Plan, defining the security standards or Operational Level Agreements. This plan is implemented, and the implementation is evaluated. The plan and its implementation are then updated. Service Level Management reports about these activities to the customer. Thus, the customer and the service provider together form a complete cyclical process. The customer can modify his requirements on the basis of the reports, and the service provider can adjust the plan or its implementation on the basis of these observations, or aim to change the agreements defined in the SLA. The control function appears in the middle of Figure 21. This diagram will now be used to discuss the Security Management activities.

Figure 23: Security Management Cycle

9.3.1 Relationships with other processes

Security Management has links with the other ITIL processes. This is because the other processes undertake security-related activities. These activities are carried out in the normal way, under the responsibility of the relevant process and process manager. However, Security Management gives instructions about the structure of the security-related activities to the other processes. Normally, these agreements are defined after consultation between the Security Manager and the other process managers.

Configuration Management

In the context of Information Security, Configuration Management is primarily relevant because it can classify Configuration Items. This classification links the CI with specified security measures or procedures.
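The link between a CI's classification and a set of security measures can be pictured as a simple lookup. The following is only a minimal sketch under stated assumptions: the three-step `Level` scale, the measure names, and the rule of selecting measures by the highest-rated aspect are all invented for illustration and are not prescribed by ITIL or by the Code of Practice.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Set


class Level(IntEnum):
    """Hypothetical three-step scale; a real scale is agreed with the customer."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Classification:
    """Required confidentiality, integrity and availability of a CI."""
    confidentiality: Level
    integrity: Level
    availability: Level


@dataclass
class ConfigurationItem:
    name: str
    classification: Classification


# Illustrative measure sets per classification level (names are assumptions).
MEASURES = {
    Level.LOW: {"baseline"},
    Level.MEDIUM: {"baseline", "access logging"},
    Level.HIGH: {"baseline", "access logging", "encrypted storage media"},
}


def required_measures(ci: ConfigurationItem) -> Set[str]:
    """Return the measure set for the CI's highest-rated aspect (assumed rule)."""
    c = ci.classification
    top = max(c.confidentiality, c.integrity, c.availability)
    return set(MEASURES[top])
```

In this sketch a CI rated high on any one aspect receives the full high-level measure set; an organisation could equally keep separate measure sets per aspect.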
The classification of a CI indicates its required confidentiality, integrity and availability. This classification is based on the security requirements of the SLA. The customer of the IT organisation determines the classification, as only the customer can decide how important the information or information systems are to the business processes. The customer bases the classification on an analysis of the extent to which the business processes depend on the information systems and the information. The IT organisation then associates the classification with the relevant CIs. The IT organisation must also implement this set of security measures for each classification level. These sets of measures can be described in procedures. Example: ‘Procedure for handling storage media with personal data’. The SLA can define the sets of security measures for each classification level. The classification system should always be tailored to the customer's organisation. However, to simplify management it is advisable to aim for one unified classification system, even when the IT organisation has more than one customer. In summary, classification is a key issue. The CMDB should indicate the classification of each CI. This classification links the CI with the relevant set of security measures or procedure. Incident Management Incident Management is an important process for reporting security incidents. Depending on the nature of the incident, security incidents may be covered by a different procedure than other Incidents. It is therefore essential that Incident Management recognise security incidents as such. Any Incident, which may interfere with achieving the SLA security requirements, is classified as a security incident. It is useful to include a description in the SLA of the type of Incidents to be considered as security incidents. An Incident that interferes with achieving the basic internal security level (baseline) is also always classified as a security incident. 
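The two rules above for recognising a security incident can be written as a single check. This is a hedged sketch only: the `Incident` record, its two flags, and the procedure names returned by `route` are hypothetical and would in practice come from the incident registration tool and the SLA.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    summary: str
    affects_sla_security: bool  # may interfere with SLA security requirements
    affects_baseline: bool      # may interfere with the internal security baseline


def is_security_incident(incident: Incident) -> bool:
    """An Incident is a security incident if either rule applies."""
    return incident.affects_sla_security or incident.affects_baseline


def route(incident: Incident) -> str:
    """Pick the procedure to start (procedure names are illustrative)."""
    if is_security_incident(incident):
        return "security incident procedure"
    return "standard incident procedure"
```

For example, a lost backup tape that breaches only the internal baseline would still be routed to the security incident procedure.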
Incident reports are generated not only by users, but also by the management process, possibly on the basis of alarms or audit data from the systems. It is clearly essential that Incident Management recognise all security incidents. This is to ensure that the appropriate procedures are initiated for dealing with security incidents. It is advisable to include the procedures for different types of security incidents in the SLA plans and to practice the procedure. It is also advisable to agree a procedure for communicating about security incidents. It is not unusual for panic to be created by rumours blown out of proportion. Similarly, it is not unusual for damage to result from a failure to communicate in time about security incidents. It is advisable to route all external communications related to security incidents through the Security Manager.

Problem Management

Problem Management is responsible for identifying and solving structural security failings. A Problem may also introduce a security risk. In that case, Problem Management must involve Security Management in resolving the Problem. Finally, the solution or workaround for a Problem or Known Error must always be checked to ensure that it does not introduce new security problems. This verification should be based on compliance with the SLA and internal security requirements.

Change Management

Change Management activities are often closely associated with security because Change Management and Security Management are interdependent. If an acceptable security level has been achieved and is managed by the Change Management process, then it can be ensured that this level of security will also be provided after Changes. There are a number of standard operations to ensure that this security level is maintained. Each RFC is associated with a number of parameters, which govern the acceptance procedure.
The urgency and impact parameters can be supplemented by a security parameter. If an RFC can have a significant impact on Information Security then more extensive acceptance tests and procedures will be required. The RFC should also include a proposal for dealing with security issues. Again, this should be based on the SLA requirements and the basic level of internal security required by the IT organisation. Thus, the proposal will include a set of security measures, based on the Code of Practice.

Preferably, the Security Manager (and possibly also the customer’s Security Officer) should be a member of the Change Advisory Board (CAB). Nevertheless, the Security Manager need not be consulted for all Changes. Security should normally be integrated with routine operations. The Change Manager should be able to decide if they or the CAB need input from the Security Manager. Similarly, the Security Manager need not necessarily be involved in the selection of measures for the CIs covered by the RFC. This is because the framework for the relevant measures should already exist. Any questions should only relate to the way in which the measures are implemented.

Any security measures associated with a Change should be implemented at the same time as the Change itself, and be included in the tests. Security tests differ from normal functional tests. Normal tests aim to investigate if defined functions are available. Security tests not only address the availability of security functions, but also the absence of other, undesirable functions as these could reduce the security of the system.

In terms of security, Change Management is one of the most important processes. This is because Change Management introduces new security measures in the IT infrastructure, together with Changes to the IT infrastructure.

Release Management

All new versions of software, hardware, data communications equipment, etc.
should be controlled and rolled out by Release Management. This process will ensure that:
• The correct hardware and software are used.
• The hardware and software are tested before use.
• The introduction is correctly authorised using a Change.
• The software is legal.
• The software is free from viruses and no viruses are introduced during its distribution.
• The version numbers are known, and recorded in the CMDB by Configuration Management.
• The rollout is managed effectively.

This process also uses a regular acceptance procedure, which should include Information Security aspects. It is particularly important to consider security aspects during testing and acceptance. This means that the security requirements and measures defined in the SLA should be complied with at all times.

Service Level Management

Service Level Management ensures that agreements about the services to be provided to customers are defined and achieved. The Service Level Agreements should also address security measures. The objective is to optimise the level of service provided. Service Level Management includes a number of related security activities, in which Security Management plays an important role:
1. Identification of the security needs of the customer. Naturally, determining the security needs is the responsibility of the customer as these needs are based on their business interests.
2. Verifying the feasibility of the customer's security requirements.
3. Proposing, discussing and defining the security level of the IT services in the SLA.
4. Identifying, developing and defining the internal security requirements for IT services (Operational Level Agreements).
5. Monitoring the security standards (OLAs).
6. Reporting on the IT services provided.

Security Management provides input and support to Service Level Management for activities 1 - 3. Security Management carries out activities 4 and 5.
Security Management and other processes provide input for activity 6. The Service Level Manager and the Security Manager decide in consultation who actually undertakes the activities. When defining an SLA it is normally assumed that there is a general basic level of security (baseline). Additional security requirements of the customer should be clearly defined in the SLA.

Availability Management

Availability Management addresses the technical availability of IT components in relation to the availability of the service. The quality of availability is assured by continuity, maintainability and resilience. Availability Management is the most important process related to availability. As many security measures benefit both availability and the security aspects confidentiality and integrity, effective coordination of the measures between Availability Management, IT Service Continuity Management, and Security Management is essential.

Capacity Management

Capacity Management is responsible for the best possible use of IT resources, as agreed with the customer. The performance requirements are based on the qualitative and quantitative standards defined by Service Level Management. Almost all Capacity Management activities affect availability and therefore also Security Management.

IT Service Continuity Management

IT Service Continuity Management ensures that the impact of any contingencies is limited to the level agreed with the customer. Contingencies need not necessarily turn into disasters. The major activities are defining, maintaining, implementing, and testing the contingency plan, and taking preventive action. Because of the security aspects, there are ties with Security Management. On the other hand, failure to fulfil the basic security requirements may itself be considered a contingency.

9.3.2 Security section of the Service Level Agreement

The Service Level Agreement (SLA) defines the agreements with the customer.
The Service Level Management process is responsible for the SLA (see also Chapter 11). The SLA is the most important driver for all ITIL processes. The IT organisation indicates to what extent the requirements of the SLA are achieved, including security requirements. The security elements addressed in the SLA should correspond to the security needs of the customer. The customer should identify the significance of all business processes. These business processes depend on IT services, and therefore on the IT organisation. The customer determines the security requirements on the basis of a risk analysis.

The security elements are discussed between the representative of the customer and the representative of the service provider. The service provider compares the customer's Service Level Requirements with their own Service Catalogue, which describes their standard security measures (the Security Baseline). The customer may have additional requirements. The customer and provider compare the Service Level Requirements and the Service Catalogue. The security section of the SLA can address issues such as the general Information Security policy, a list of authorised personnel, asset protection procedures, restrictions on copying data, etc.

9.3.3 The security section of the Operational Level Agreement

The Operational Level Agreement is another important document. It describes the services provided by the service provider. The provider must associate these agreements with responsibilities within the organisation. The Service Catalogue gives a general description of the services. The Operational Level Agreement translates these general descriptions into all services and their components, and the way in which the agreements about the service levels are assured within the organisation. Example: the Service Catalogue refers to ‘managing authorisations per user and per individual’.
The Operational Level Agreement details this for all relevant services provided by the IT organisation. In this way, the implementation of the measure is defined for the departments providing UNIX, VMS, NT, Oracle services, etc. Where possible, the customer's Service Level Requirements are interpreted in terms of the provider's Service Catalogue, and additional agreements are concluded where necessary. Such additional measures exceed the standard security level.

When drafting the SLA, measurable Key Performance Indicators (KPI) and criteria must also be agreed for Security Management. KPIs are measurable parameters (metrics), and performance criteria are set at achievable levels. In some cases it will be difficult to agree on measurable security parameters. This is easier for availability, which can generally be expressed numerically. However, this is much more difficult for integrity and confidentiality. For this reason, the security section of the SLA normally describes the required measures in abstract terms. The Code of Practice for Information Security Management is used as a basic set of security measures. The SLA also describes how performance is measured. The IT organisation (service provider) must regularly provide reports to the user organisation (customer).

EXTRA READING

Central Command Releases Its Annual Computer Security Survey Results for 2002

Virus protection concerns continue to increase among P2P users; Cyber-warfare likely according to respondents

MEDINA, Ohio, September 24, 2002 - Central Command, Inc., a leading provider of PC anti-virus software and computer security services, announced today the findings of its annual security survey. The survey, reflecting up-to-date industry trends, was e-mailed to 943,026 PC users worldwide and explored individuals' computer security settings and behaviours with known computer security vulnerabilities.
With a 7% response rate, the survey provides valuable insight on the constant battle with computer viruses. Given the horrific events of September 11, 2001, multiple questions were given seeking user opinion on the world's war on terror in regards to online security. The survey surprisingly showed that nearly three-fourths of the respondents felt that some form of cyber-warfare is likely to occur in the near future. In a related question, 67% strongly feel that their respective country is not yet prepared to combat against such a major threat. The top response rate came out of North America. The results also displayed a significant increase in virus awareness. For the first time, responses showed that users are beginning to understand the importance of protecting themselves from computer viruses and other forms of malicious software (malware). When asked about the handling of email attachments from an unknown source, the results showed that 58% of the respondents would delete the attachment immediately and 41% expressed they practiced extreme caution when viewing any attachment regardless of the sender. "These numbers mark a great improvement from a year ago. We have relentlessly preached about the risks associated with email attachments and we're finally seeing that message sticking in users minds. Unfortunately, the lesson doesn't end with email-based worms as other types of applications are now being targeted," said Steven Sundermeier, product manager at Central Command, Inc. On a similar note, 29% claim to have all the latest security patches installed, up 15% from a year ago. As file-sharing applications like KaZaA, iMesh, Napster continue to increase in popularity so does the number of malicious programs (ie. trojans, worms) written to exploit these networks. Of the 65% of surveyed users who regularly use these types of applications, over 39% responded that they were unaware of any dangers associated with file sharing and the security holes they can open. 
An increase in P2P instant messaging software as the preferred form of online communication raised concern about the future of virus writing trends in these channels.

The number one reported virus, according to the survey, was Worm/Klez.E,G. Nearly one-fourth of home and office PC users responding claimed Klez infected their system over the last year. One in every ten responses claimed to have had a W32/Nimda infection. "This statistic closely mirrors the number of virus occurrences confirmed through our Emergency Virus Response Team. Based on the percentage of all tracked viruses, our top five tracked viruses for the summer of 2002 included Worm/Klez.E,G (52.1%), W32/Elkern.C (13.8%), W32/Yaha.E (9.6%), Worm/W32.Sircam (5.7%), and W32/Nimda (2.4%)." Central Command noted that infection reports, factoring out the top five infectors, have dropped 3.2% over the past year.

The full results of the survey are published on Central Command's website at: http://www.centralcommand.com/safesurvey2002.html

Vexira Antivirus starts at $29.95, and a free 30-day trial version is available for download or by contacting Central Command at +1-330-723-2062.

About Central Command: A leader in the anti-virus industry, Central Command, Inc., a privately held company, serves home PC users and industrial, financial, government, education and service firms with virus protection software, services, and information. The company services customers in over 91 countries and is headquartered in Medina, Ohio. Visit Central Command online at (www.centralcommand.com) or call 1-330-723-2062 for more information.

Central Command, EVRT, Vexira, and Emergency Virus Response Team are trademarks of Central Command, Inc. All other trademarks, trade names, and products referenced herein are property of their respective owners.
9.4 Activities

9.4.1 Control - Information Security policy and organisation

The Control activity in the centre of Figure 21 is the first sub process of Security Management and relates to the organisation and management of the process. This includes the Information Security management framework. This framework describes the sub processes: the definition of security plans, their implementation, evaluation of the implementation, and incorporation of the evaluation in the annual security plans (action plans). The reports provided to the customer, via Service Level Management, are also addressed.

This activity defines the sub processes, security functions, and roles and responsibilities. It also describes the organisational structure, reporting arrangements, and line of control (who instructs whom, who does what, how the implementation is reported). The following measures from the Code of Practice are implemented by this activity.

Policy:
• Policy development and implementation, links with other policies.
• Objectives, general principles and significance.
• Description of the sub processes.
• Allocating functions and responsibilities for sub processes.
• Links with other ITIL processes and their management.
• General responsibility of personnel.
• Dealing with security incidents.

Information Security organisation:
• Management framework.
• Management structure (organisational structure).
• Allocation of responsibilities in greater detail.
• Setting up an Information Security Steering Committee.
• Information Security coordination.
• Agreeing tools (e.g. for risk analysis and improving awareness).
• Description of the IT facilities authorisation process, in consultation with the customer.
• Specialist advice.
• Cooperation between organisations, internal and external communications.
• Independent EDP audit.
• Security principles for access by third parties.
• Information Security in contracts with third parties.
9.4.2 Plan

The Planning sub process includes defining the security section of the SLA in consultation with Service Level Management, and the activities in the Underpinning Contracts related to security. The objectives in the SLA, which are defined in general terms, are detailed and specified in the form of an Operational Level Agreement. An OLA can be considered as the security plan for an organisational unit of the service provider, and as a specific security plan, for example for each IT platform, application and network.

The Planning sub process not only receives input from the SLA but also from the service provider's policy principles (from the Control sub process). Examples of these principles include: ‘Every user should be uniquely identifiable’, and ‘A basic security level is provided to all customers, at all times.’

The Operational Level Agreements for Information Security (specific security plans) are drafted and implemented using the normal procedures. This means that, should activities be required in other processes, there will have to be coordination with these processes. Change Management, using input provided by Security Management, makes any required Changes to the IT infrastructure. The Change Manager is responsible for the Change Management process.

The Planning sub process is discussed with Service Level Management to define, update and comply with the security section of the SLA. The Service Level Manager is responsible for this coordination.

The SLA should define the security requirements, where possible in measurable terms. The security section of the agreement should ensure that all the customer's security requirements and standards can be verifiably achieved.

9.4.3 Implement

The Implementation sub process aims to implement all the measures specified in the plans. The following checklist can support this sub process.
Classification and management of IT resources:
• Providing input for maintaining the CIs in the CMDB.
• Classifying IT resources in accordance with agreed guidelines.

Personnel security:
• Tasks and responsibilities in job descriptions.
• Screening.
• Confidentiality agreements for personnel.
• Training.
• Guidelines for personnel for dealing with security incidents and observed security weaknesses.
• Disciplinary measures.
• Increasing security awareness.

Managing security:
• Implementation of responsibilities, implementation of job separation.
• Written operating instructions.
• Internal regulations.
• Security should cover the entire life cycle; there should be security guidelines for system development, testing, acceptance, operations, maintenance and phasing out.
• Separating the development and test environments from the production environment.
• Procedures for dealing with incidents (handled by Incident Management).
• Implementation of recovery facilities.
• Providing input for Change Management.
• Implementation of virus protection measures.
• Implementation of management measures for computers, applications, networks and network services.
• Handling and security of data media.

Access control:
• Implementation of access and access control policy.
• Maintenance of access privileges of users and applications to networks, network services, computers, and applications.
• Maintenance of network security barriers (firewalls, dial-in services, bridges and routers).
• Implementation of measures for the identification and authentication of computer systems, workstations and PCs on the network.

9.4.4 Evaluate

An independent evaluation of the implementation of the planned measures is essential. This evaluation is needed to assess the performance and is also required by customers and third parties.
The results of the Evaluation sub process can be used to update the agreed measures in consultation with the customers, and also for their implementation. The results of the evaluation may suggest changes, in which case an RFC is defined and submitted to the Change Management process.

There are three forms of evaluation:
• Self-assessments: primarily implemented by the line organisation of the processes.
• Internal audits: undertaken by internal EDP auditors.
• External audits: undertaken by external EDP auditors.

Unlike self-assessments, audits are not undertaken by the same personnel who act in the other sub processes. This is to ensure that the responsibilities are separated. An Internal Audit department may undertake audits. Evaluations are also carried out in response to security incidents.

The main activities are:
• Verifying compliance with the security policy and implementation of security plans.
• Performing security audits on IT systems.
• Identifying and responding to inappropriate use of IT resources.
• Undertaking the security aspects of other EDP audits.

9.4.5 Maintenance

Security requires maintenance, as the risks change due to changes in the IT infrastructure, organisation and business processes. Security maintenance includes the maintenance of the security section of the SLA and maintenance of the detailed security plans (Operational Level Agreements). Maintenance is carried out on the basis of the results of the Evaluation sub process and an assessment of changes in the risks. The resulting proposals can either be introduced into the Planning sub process, or included in the maintenance of the SLA as a whole. In either case, the proposals can result in including activities in the annual security plan. Any changes are subject to the normal Change Management process.

9.4.6 Reporting

Reporting is not a sub process, but an output of the other sub processes.
Reports are produced to provide information about the achieved security performance and to inform the customers about security issues. These reports are generally required under agreement with the customer. Reporting is important, both to the customer and to the service provider. Customers must be informed correctly about the efficiency of the efforts (e.g. with respect to implementing security measures), and the actual security measures. The customer is also informed about any security incidents. A list with some suggestions for reporting options is included below.

Examples of scheduled reports and reportable events:

The Planning sub process
• Reports about the extent of compliance with the SLA and agreed Key Performance Indicators for security.
• Reports about Underpinning Contracts and any problems associated with them.
• Reports about Operational Level Agreements (internal security plans) and the provider's own security principles (e.g. in the baseline).
• Reports about annual security plans and action plans.

The Implementation sub process
• Status reports about the implementation of Information Security. This includes progress reports about the implementation of the annual security plan, possibly a list of measures which have been implemented or are yet to be implemented, training, outcome of additional risk analyses, etc.
• A list of security incidents and responses to these incidents, optionally a comparison with the previous reporting period.
• Identification of incident trends.
• Status of the awareness programme.

The Evaluation sub process
• Reports about the sub process as such.
• Results of audits, reviews, and internal assessments.
• Warnings, identification of new threats.
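One of the reporting options listed above, the comparison of security incidents with the previous reporting period, can be sketched in a few lines. This is an illustrative assumption rather than a prescribed report format: the function name `incident_trend` and the incident categories are hypothetical, and each incident is represented only by its category label.

```python
from collections import Counter
from typing import Dict, Iterable


def incident_trend(previous: Iterable[str], current: Iterable[str]) -> Dict[str, int]:
    """Return the change in incident count per category between two periods.

    A positive value means more incidents of that category in the current
    period; a negative value means fewer.
    """
    prev, curr = Counter(previous), Counter(current)
    categories = sorted(set(prev) | set(curr))
    # Counter returns 0 for categories absent from one of the periods.
    return {cat: curr[cat] - prev[cat] for cat in categories}
```

Called with the category labels of last period's and this period's security incidents, the result highlights which categories are rising and could feed the "identification of incident trends" item.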
Specific reports

To report on security incidents defined in the SLA, the service provider must have a direct channel of communication to a customer representative (possibly the Corporate Information Security Officer) through the Service Level Manager, Incident Manager or Security Manager. A procedure should also be defined for communication in special circumstances. Apart from the exception in the event of special circumstances, reports are communicated through Service Level Management.

9.5 Process control

9.5.1 Critical success factors and performance indicators

The critical success factors are:
• Full management commitment and involvement.
• User involvement when developing the process.
• Clear and separated responsibilities.

The Security Management performance indicators correspond with the Service Level Management performance indicators, in so far as these relate to security issues covered by the SLA.

9.5.2 Functions and roles

In small IT organisations, one person may manage several processes, while in large organisations several people will work on one process, such as Security Management. In the latter case there is normally one person appointed as Security Manager. The Security Manager is responsible for the effective operation of the Security Management process. Their counterpart in the customer's organisation is the Information Security Officer, or Corporate Information Security Officer.

9.6 Problems and costs

9.6.1 Problems

The following issues are essential to the successful implementation of Security Management:
• Commitment: security measures are rarely accepted immediately; resistance is more common than acceptance. Users resent losing certain privileges due to security measures, even if these facilities are not essential to their work. This is because the privileges give them a certain status. A special effort will therefore have to be made to motivate users, and to ensure that management complies with the security measures.
In the field of Security Management in particular, management must set an example (‘walk the talk’ and ‘lead by example’). If there are no security incidents, then management may be tempted to reduce the Security Management budget.
• Attitude: information systems are not insecure due to technical weaknesses, but due to the failure to use the technology. This is generally related to attitude and human behaviour. This means that security procedures must be integrated with routine operations.
• Awareness: awareness, or rather communication, is a key concept. There sometimes appears to be a conflict of interest between communication and security: communication paves the road, while security creates obstacles. This means that implementing security measures requires the use of all communication methods to ensure that users adopt the required behaviour.
• Verification: it should be possible to check and verify security. This concerns both the measures introduced, and the reasons for taking these measures. It should be possible to verify that the correct decisions have been taken in certain circumstances. For example, it should be possible to verify the authority of the decision-makers.
• Change Management: when Changes are assessed, the verification of continued compliance with the basic level of security frequently wanes over time.
• Ambition: when an organisation wants to do everything at once, mistakes are often made. When introducing Security Management, the implementation of technical measures is much less important than organisational measures. Changing an organisation requires a gradual approach and will take a long time.
• Lack of detection systems: new systems, such as the Internet, were not designed for security and intruder detection.
This is because developing a secure system takes more time than developing a non-secure system, and conflicts with the business requirements of low development costs and a short time-to-market.

9.6.2 Costs

Securing the IT infrastructure demands personnel, and therefore money, to take, maintain and verify measures. However, failing to secure the IT infrastructure also costs money (cost of lost production; cost of replacement; damage to data, software, or hardware; loss of reputation; fines or compensation relating to failure to fulfil contractual obligations). As always, a balance will have to be struck.