Slide 1
Rocky K. C. Chang, Edmond Chan, Waiting Fok, and Weichao Li
The Hong Kong Polytechnic University
Hung Hom, Kowloon, Hong Kong
APRICOT 2010

Slide 2
• Problem statement
• Measurement system
• Measurement methodology
• Interesting findings
• Conclusions

Slide 3
Source: http://www.jucc.edu.hk/jucc/harnet.html

Slide 4
• Wide area network linking up eight tertiary institutions in HK
• Managed by the Joint Universities Computer Centre (JUCC)
  − Coordinates IT services of mutual interest
• Provides a high-speed optical backbone network and Internet connectivity
  − Bulk tendering and selection of Internet service provider (PCCW Wharf)

Slide 5
• Collect reliable performance data for operation and planning purposes.
  − Justifications for service upgrade
• Evaluate the fairness of resource sharing among the eight institutions.
  − Achieve some kind of "fairness".
• Improve the quality of network services.
  − Less optimal routes
  − Fault locations

Slide 6
Problem statement; Measurement system; Measurement methodology; Interesting findings; Conclusions

Slide 7
• Operating since 1 Jan 2009
• Measurement side
  − OneProbe: provides around-the-clock path-quality monitoring
  − Planetopus: a measurement management platform
• User side
  − Web-based report on measurement results
  − Ad hoc performance diagnosis

Slide 8
[Diagram] Labels: User side; Measurement side.
Institutions: HKU, CUHK, PolyU, CityU, BU, HKUST, LU, HKIED.
OneProbe nodes: OneProbe @HKU, OneProbe @CUHK, OneProbe @PolyU, OneProbe @CityU, OneProbe @BU, OneProbe @HKUST, OneProbe @LU, OneProbe @HKIED.
Planetopus, database, etc.
40+ web servers selected by the JUCC.

Slides 9-12
(no text on slides)

Slide 13
Problem statement; Measurement system; Measurement methodology; Interesting findings; Conclusions

Slide 14
• Continuous monitoring
• Configurable sampling rate and pattern
• Low overhead
• User-chosen websites
• TCP data-path measurement
• Middlebox friendly
• Multi-metric measurement
[Diagram] OneProbe metrics: RTT, RTT jitter, forward loss, reverse loss, forward re-ordering, reverse re-ordering, round-trip capacity.

Slide 15
• Deploying measurement tasks
• Monitoring resource usage
• Secure measurement data collection
• Measurement data management

Slide 16
(no text on slide)

Slide 17
Problem statement; Measurement system; Measurement methodology; Interesting findings; Conclusions

Slide 18
(no text on slide)

Slide 19
• Strong and diurnal correlation between RTT and reverse-path packet loss

Slide 20
• No correlation between RTT and reverse-path loss

Slide 21
• Good effect of a forward-route change

Slide 22
The three fault events according to public information:
• 9 Aug, 1:37am (HKT), and 12 hours later
• 12 Aug, 10:50am (HKT)
  − EAC, APCN2
• 17 Aug, 2:20pm (HKT)
  − FNAL/RNAL

Slide 23
Path | 9 Aug | 12 Aug | 17 Aug
Australasia - NLA | Diurnal RTT burst: 1200ms, up to 12 Aug; loss burst: 50%, 8 hrs | X | X
Japan - Nissan | X | Rv loss: 30%, 17 hrs | RTT burst: 1800ms, 7 hrs
Taiwan - TANET | RTT increase; Fw loss increase | RTT increase: 60ms; diurnal Rv loss: 10~50%, 22 hrs | Diurnal Rv loss burst: 10~50%, 17+ hrs
US - Citibank | X | X | RTT burst: 1800ms, 7 hrs; Rv loss: 30%, 13 hrs
Finland - Nokia | X | X | Connectivity lost: 12 hrs; Rv loss: 50%, 1.5 days
Korea - KREONET | X | X | RTT increase to 400ms

Slides 24-25
(no text on slides)

Slide 26
Affected by the 9 Aug fault:
• RTT peaks of 1200ms up to 12 Aug
• 50%+ burst of losses at 2pm-10pm on 9 Aug
Path: PCCW → Pacnet → TransactSDN(AU) → NLA
Marked event: 9 Aug 13:37 (HKT)

Slide 27
Affected by the 12 & 17 Aug faults:
• Burst of Rv loss (30%) from 12 Aug 10am to 13 Aug 3am
• RTT burst of 1800ms on 17 Aug, 2-9pm
Path: PCCW → Equinix → NTT(US/JP) → OCN(JP)
Marked events: 12 Aug 10:50 (HKT); 17 Aug 14:20 (HKT)

Slide 28
Affected by the 12 & 17 Aug faults:
• RTT increased by 60ms since 12 Aug 15:00
• Diurnal Rv loss (10~50%) for 22 hrs since 12 Aug 16:20 and for 17+ hrs since 17 Aug 21:40
Path: HKIX → ChungHwaTel → TANET
Marked events: 12 Aug 10:50 (HKT); 17 Aug 14:20 (HKT)

Slide 29
Affected by the 17 Aug fault:
• RTT burst of 1800ms
• Reverse-path loss up to 40%
• From 17 Aug 2pm to 18 Aug 3am
Path: PCCW → BNA → AT&T
Marked event: 17 Aug 14:20 (HKT)

Slide 30
Affected by the 17 Aug fault:
• Connectivity lost (OneProbe, TCPTraceroute) from 17 Aug 2pm to 18 Aug 2am
• Rv loss burst up to 50% until 20 Aug 4pm
Path: PCCW → BNA → GBLX(US) → Nokia(Finland)
Marked event: 17 Aug 14:20 (HKT), connection lost

Slide 31
Affected by the 17 Aug fault:
• RTT increased from 40ms to 400ms since 17 Aug 14:20
• RTT burst of 400ms around 12 Aug 22:00 to 22:30
Path: HARNET → ASGC (TW) → KREONET
Marked events: 12 Aug 10:50 (HKT); 17 Aug 14:20 (HKT)

Slide 32
• Deploying and managing a distributed measurement system is very challenging.
  − A reliable, non-cooperative measurement method
  − A measurement management platform
• But such a system, if deployed and managed correctly, is very useful.
  − More information obtained from contrasting for performance and fault diagnosis
  − Currently monitoring the impact of switching to a new provider

Slide 33
(no text on slide)
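
Note on the RTT vs. reverse-path loss findings: the talk reports a strong correlation on one path and none on another, but the transcript does not say how the correlation was computed. As a minimal sketch only, assuming a plain Pearson coefficient over time-aligned samples, the check could look like the following. The function name `pearson` and the sample values are hypothetical and are not data from the talk.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation information
    return cov / math.sqrt(var_x * var_y)


# Hypothetical time-aligned samples for one path (NOT measurements from the talk).
rtt_ms = [40, 42, 41, 120, 300, 280, 60, 41]
rv_loss_pct = [0, 0, 0, 10, 30, 25, 2, 0]

r = pearson(rtt_ms, rv_loss_pct)
print(f"RTT vs. reverse-path loss: r = {r:.2f}")
```

A value of r near 1 would match the "strong correlation" case and a value near 0 the "no correlation" case. Detecting the diurnal pattern itself would need more than this, for example comparing the same hours across days.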