NFV Quality Management Framework Proposal
Eric Bauer
May 11th 2015
COPYRIGHT © 2015 ALCATEL-LUCENT. ALL RIGHTS RESERVED.

Summary
• ETSI is driving standards for network function virtualization (NFV), including “Network Function Virtualization Service Quality Metrics” (published in December ’14)
• TM Forum is driving standards for SLA management, including GB917 “SLA Management Handbook, Volume 2, Concepts and Principles” and TR178 “Enabling End-to-End Cloud SLA Management”
• QuEST Forum drives the TL 9000 Measurements Handbook, which defines objective and quantitative measurements (e.g., outages) for quality management of telecom networks and equipment
• The NFV Strategic Initiative was chartered by QuEST Forum’s Executive Board, and that group is now working on a draft “NFV Quality Management Framework” which enables objective and quantitative prediction, control and quality improvement of NFV-based services and applications
  - This presentation covers the 4/29/15 review draft of that Framework

QuEST Forum Executive Board NFV Strategic Initiative Team

The Network Function Virtualization Vision
[Figure from ETSI NFV Whitepaper, http://portal.etsi.org/nfv/nfv_white_paper.pdf: "Today…" versus "Tomorrow…". Callouts: a.k.a., cloud-based applications, or virtualized network functions (VNFs); VNF Service Provider Organization; NFV Infrastructure Service Provider Organization]

Fundamental Changes due to NFV
1. Decoupling Software from Hardware
2. Shared compute, memory, storage and networking infrastructure
3. Automated Resource and Application Lifecycle Management
4. Automated Network Service Lifecycle Management
5. Dynamic Operation
6. Increasingly Complex Multivendor Environment
[Figure, "Today…" panel: PNF 1, PNF 2]
[Figure, continued: "Tomorrow…" panel with VNF 1 and VNF 2; software (SW) and operating system (OS) stacks on hardware (HW); panel captions "Traditional Network Function Deployment" and "Virtual Network Function Deployment"; NFV MANO; NFV Infrastructure]

Objectives of the NFV Quality Measurement Framework
NFV quality measurements should…
1. Be quantitative and objective
2. Be future proof
3. Recognize different resource service quality expectations
4. Recognize different application and service architectures
5. Support leading service quality indicators
6. Enable side-by-side physical and virtual quality comparisons
7. Follow the principle of simplicity (parsimony)

Sample Motivational Scenario
1. One organization offers VoLTE on NFV infrastructure shared by other tenants (e.g., EPC, IP-TV, enterprise communications)
2. VoLTE provider buys best-in-breed VNFs from several suppliers
3. VoLTE provider and partners write the policies and descriptors that configure and chain VNFs…
4. NFV Infrastructure service provider buys COTS servers, storage and switches from several equipment suppliers
5. NFV Infrastructure service provider’s systems from other suppliers automatically apply policies and descriptors to configure and operate VoLTE provider’s (and other tenants’) service….
How does one rapidly localize, drive root cause analysis and decide on corrective actions in this shared, decoupled, flexible, dynamic and multi-vendor environment?
Underlay figure from “Network Functions Virtualization (NFV); Architectural Framework,” GS NFV 002 V1.2.1 (2014-12)

End-to-End Quality Management Framework (TR178 Style View)
Callout: Primary functionality of VNF drives TL product category assignment
[Figure labels: Cloud Service User; Customer Role; Application Service; CSP: Network Provider; Provider Role; Integrator Role; Network Service; Virtual Machine and Virtual Networking Services]
[Figure labels, continued: Cloud Service Customer (CSC); From Cloud Service Developers (a.k.a., VNF Suppliers); VNF Service for VNF 1 through VNF N; Functional Component as-a-Service; MANO Service; NFVI Service; a Customer Role and a Provider Role on each service; Automated Lifecycle Management and Orchestration Services; Cloud Service Provider (CSP): NFV Infrastructure, Management & Orchestration, Functional Component (Entity) as-a-Service]

NFV Quality Management Framework
Callout: NFV Quality Measurement Framework enables objective and quantitative quality measurements across NFV reference points to support end-to-end quality and SLA management
[Figure labels: Cloud Service User; Application Service; CSP: Network Provider; Network Service; Cloud Service Customer (CSC); Integrator Role; From Cloud Service Developers (a.k.a., VNF Suppliers); VNF-1 SLA, VNF-N SLA, FC aaS SLA, MANO SLA and INFRA SLA, each between a Customer Role and a Provider Role; VNF 1 through VNF N; Cloud Service Provider (CSP): NFV Infrastructure, Management & Orchestration, Functional Component as-a-Service]

Outage Metrics in NFV Quality Measurement Framework
Callout: (User) Service Impact Outages (SO)
[Figure labels: Cloud Service User; Customer Role; User Service SLA]
[Figure labels, continued: Provider Role; Integrator Role; CSP: Network Provider; Cloud Service Customer (CSC); Network Service; From Cloud Service Developers (a.k.a., VNF Suppliers); VNF-1 SLA, VNF-N SLA, FC aaS SLA, MANO SLA and INFRA SLA, each between a Customer Role and a Provider Role; VNF 1 through VNF N; Cloud Service Provider (CSP): NFV Infrastructure, Management & Orchestration, Functional Component as-a-Service]
Callouts:
• Primarily Service Impact Outage (SO)
• Likely to be Service Impact Outage (SO) ([NFV_SQM] TcaaS outage)
• (Primarily) Network Element Impact Outages (SONE)

Transaction Metrics in NFV Quality Measurement Framework
[Figure labels: Cloud Service Customer (CSC); Cloud Service User; User Service SLA; Integrator Role; CSP: Network Provider; Network Service; From Cloud Service Developers (a.k.a., VNF Suppliers); VNF-1 SLA, VNF-N SLA, FC aaS SLA, MANO SLA and INFRA SLA, each between a Customer Role and a Provider Role; VNF 1 through VNF N; Cloud Service Provider (CSP): NFV Infrastructure, Management & Orchestration, Functional Component (Entity) as-a-Service]
Callouts:
• Reliability and Service Latency
• Lifecycle Management Action Reliability and Latency
• Fault Notification Accuracy (Reliability) and Timeliness (Latency)
• Virtual Machine and Virtual Network Reliability and Latency

Transaction Quality Model
[Figure labels: Input Event; Lifecycle Management Request; Failure Event; Processing; Non-instantaneous processing action; Output; Correct Response; Incorrect Response; Unacceptably Late Response; Latency]
Operation Latency is the elapsed time between the triggering event and the corresponding correct or incorrect response
Operation Reliability is the ratio of correct responses to the sum of correct, incorrect, unacceptably late and no
response events
[Figure label: No Response]
Incorrect Response DPM + Unacceptably Late DPM + Non-Response DPM = Operation Reliability DPM
DPM = defective operations per million requests

Measure Operational Quality (Reliability and Latency) Across Standard NFV Reference Points
Examples:
• Network Service Lifecycle Management (7.1.2)
• Network Service Fault Management (7.1.5)
• VNF Lifecycle Management (7.2.4)
• Virtualized Resources Management (7.3.3)
• Virtualized Resources Fault Management (7.3.5)
• VNF Fault Management (7.2.8)
• Virtual Machine and Virtual Networking Reliability and Latency
Section numbers in callouts (e.g., 7.1.2) are from “Network Functions Virtualization (NFV); Management and Orchestration,” GS NFV-MAN 001 V1.1.1 (2014-12)

Proposed Lifecycle Management Errors
[TL_9000] Procedural Error: An error that is the direct result of human intervention or error.
(Proposed) Lifecycle Management Error: An error that is the direct result of policy, management, or orchestration.
Contributing factors can include but are not limited to…

a) Procedural error: deviations from accepted practices or documentation
   Example of elevated VNF service risk: Failing to continuously enforce anti-affinity placement rules for VNFCs can lead to both primary and protecting VNFC instances appearing in a single NFV infrastructure failure group
b) Procedural error: inadequate training
   Lifecycle management error: not generally applicable
c) Procedural error: unclear, incorrect, or out-of-date documentation
   Lifecycle management error: faulty or out of date: automation scripts; service, VNF or resource descriptors; etc
   Example of elevated VNF service risk: Proper execution of faulty or out of date scripts can produce faulty and higher risk (e.g., simplex) VNF configurations
d) Procedural error: inadequate or unclear displays, messages, or signals
   Lifecycle management error: inadequate, insufficient or stale FCAPS input data
   Example of elevated VNF service risk: Inadequate, insufficient or stale performance information can produce faulty elastic capacity management decisions
e) Procedural error: inadequate or unclear hardware labeling
   Lifecycle management error: not generally applicable
f) Procedural error: miscommunication
   Lifecycle management error: not generally applicable
g) Procedural error: non-standard configurations
   Example of elevated VNF service risk: Configuring non-standard third party software to monitor, manage, backup or control a VNF instance.
h) Procedural error: insufficient supervision or control
   Example of elevated VNF service risk: Failing to diligently monitor alarms and correct unsuccessful VNF repair actions can leave impacted VNF simplex exposed
i) Procedural error: user characteristics such as mental attention, physical health, physical fatigue, mental health, and substance abuse.
   Lifecycle management error: faulty execution of policy by a management or orchestration element
   Example of elevated VNF service risk: Faulty execution of automation scripts can produce faulty and higher risk (e.g., simplex) VNF configurations
•  Lifecycle management error: tardy execution of lifecycle management action
   Example of elevated VNF service risk: Insufficient automated management and orchestration capacity or other causes result in late execution of VNF repair, capacity change or other lifecycle management action, thereby prolonging service risk to target VNF.
•  Lifecycle management error: risky operational policies
   Example of elevated VNF service risk: Failing to maintain sufficient spare application capacity online can yield poor user service quality when unforecast surges in offered workload occur during capacity change lead time intervals

Standardizing definition of lifecycle management errors enables richer conversations about roles, responsibilities and accountabilities…before outages occur

Outlook
• 4/29/15 draft of NFV Quality Management Framework will be reviewed on 5/21/15 at QuEST Forum meeting; goal is to baseline this non-normative document 3Q15
• ETSI NFV work item in support of NFV Quality Management Framework will be considered later this month
• QuEST Forum will continue to work with TM Forum, ETSI NFV, NIST and appropriate other SDOs to enable standardized, objective and quantitative metrics to facilitate rapid and accurate fault localization, root cause analysis, and end-to-end SLA management

Questions?
Eric.Bauer@alcatel-lucent.com
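
Worked example (Transaction Quality Model)
The reliability arithmetic on the Transaction Quality Model slide can be illustrated with a minimal Python sketch. Everything named here (OperationCounts, dpm, operation_reliability_dpm, operation_reliability) and the sample counts are hypothetical and not taken from the Framework draft. The sketch also assumes that every request ends in exactly one of the four outcomes, so the "per million requests" denominator is the sum of the four counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationCounts:
    """Hypothetical tally of request outcomes observed at one reference point."""
    correct: int
    incorrect: int
    unacceptably_late: int
    no_response: int

    @property
    def total(self) -> int:
        # Assumption: each request yields exactly one of the four outcomes.
        return (self.correct + self.incorrect
                + self.unacceptably_late + self.no_response)


def dpm(defects: int, total: int) -> float:
    """Defective operations per million requests."""
    if total <= 0:
        raise ValueError("total requests must be positive")
    return defects * 1_000_000 / total


def operation_reliability_dpm(counts: OperationCounts) -> float:
    """Incorrect Response DPM + Unacceptably Late DPM + Non-Response DPM."""
    return (dpm(counts.incorrect, counts.total)
            + dpm(counts.unacceptably_late, counts.total)
            + dpm(counts.no_response, counts.total))


def operation_reliability(counts: OperationCounts) -> float:
    """Ratio of correct responses to all response outcomes."""
    if counts.total <= 0:
        raise ValueError("total requests must be positive")
    return counts.correct / counts.total


# Made-up sample: one million lifecycle management requests.
example = OperationCounts(correct=999_950, incorrect=20,
                          unacceptably_late=25, no_response=5)
print(operation_reliability_dpm(example))  # 50.0
print(operation_reliability(example))      # 0.99995
```

Keeping the three defect classes as separate DPM terms mirrors the slide's sum and lets each term be reported separately for a given reference point before they are added together.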