D3.2 Traffic Models
Project Deliverable CELTIC TRAMMS CP4-025

TRAMMS – TRAFFIC MEASUREMENTS AND MODELS IN MULTISERVICE NETWORKS
DELIVERABLE D3.2 - TRAFFIC MODELS

Editor full name: Perényi Marcell / Kåre Gustafsson
Editor affiliation: BUTE/EABS
Editor email address: perenyim@tmit.bme.hu / kare.gustafsson@ericsson.com
Contributors: Tord Westholm (EABS), Andreas Aurelius (Acreo), Felipe Mata (UAM), Iñigo Sedano (ROB), Jens Andersson (LTH), Tamás Éltető (BUTE), Sándor Molnár (BUTE)
Identifier: Deliverable D3.2
Class: Report
Version:
Version Date: 02/04/2009
Distribution: Public
Responsible Partner: EABS

D3.2 TRAFFIC MODELS Public 1 (74)

TRAMMS PROJECT

The Celtic TRAMMS project (http://projects.celtic-initiative.org/tramms/) measures and analyzes IP traffic in European access networks. TRAMMS aims to increase insight into the nature of the data traffic in today’s and tomorrow’s IP networks. In order to cope with the demands from emerging applications, the architecture of the underlying networks must be laid out with deep knowledge of the applications that are used and the traffic that will flow through the networks. The idea behind the concept of a converged infrastructure is that a single network should support (in principle) all applications. It will have to carry traffic from different terminals and a great variety of applications. Traditionally, lack of knowledge regarding traffic patterns in multiservice IP networks has been compensated for by massive over-provisioning of resources in order to decrease the likelihood of QoS violations. Understanding the user traffic patterns and how they aggregate on different levels gives a competitive advantage when deploying broadband networks and applications, since the investment costs will be lower. The main objective of TRAMMS is to model traffic in multi-service IP networks, and to use the models as input for capacity planning of tomorrow’s networks.
The models will be built upon data acquired with advanced traffic measurements on the application level with deep packet/deep flow inspection in different parts of Europe, combined with bottleneck analysis and interdomain routing analysis. The project characterizes the traffic generated by end-users in fixed access network infrastructures by specifying traffic parameters to be measured and analysed in the different test sites, jointly evaluating the results and developing traffic models built upon them. Based on the traffic models, dimensioning rules for capacity planning of IP networks will be created. To achieve the goals of TRAMMS, work will be performed in the following main areas:
• Traffic measurements in fixed metro/access and wireless access networks.
• Traffic analysis and models for fixed metro/access and wireless access networks.

ABSTRACT

Traffic measurements made in two municipal networks in Sweden, in one Spanish residential network and in the RedIRIS network in Spain are presented and analyzed in this deliverable. The usage of different applications in the four networks is described, as well as the locality, the sources and the geographical destination of the traffic.

TABLE OF CONTENT

TRAMMS PROJECT
ABSTRACT
TABLE OF CONTENT
ABBREVIATIONS
1 EXECUTIVE SUMMARY
2 STATE OF THE ART
3 MEASUREMENTS, TOOLS AND NETWORKS
4 NETWORK TRAFFIC LOCALITY
4.1 OUTGOING DIRECTION
4.1.1 Packet count
4.1.2 Byte count
4.1.3 Flow count
4.2 INCOMING DIRECTION
4.2.1 Packet count
4.2.2 Byte count
4.2.3 Flow count
4.3 BOTH DIRECTIONS
4.3.1 Packet count
4.3.2 Byte count
4.3.3 Flow count
4.4 DISCUSSION OF THE RESULTS
5 MAIN TRAFFIC DESCRIPTOR
5.1 DAILY AND WEEKLY PROFILES
5.1.1 Daily profiles Spanish network
5.1.2 Comparison between the daily profiles of the different networks
5.2 UL/DL VOLUME AND PACKET
5.2.1 Spanish network
5.3 SUBSCRIBER CLUSTERS
5.3.1 Separation using the total traffic volume
5.3.2 Separation using cluster analysis
5.3.3 Clustering of users by setting up traffic limits
5.3.4 Analysis of “minimal users”
5.3.5 Separation using cluster analysis (Swedish network)
5.3.6 Separation using the traffic volume of popular applications
5.4 SUBSCRIBER ACTIVITIES
5.5 APPLICATION VOLUME, PACKET AND SESSION SHARE
5.5.1 Comparison of applications usage in different networks and technologies
5.5.2 Applications with high user penetration xviii
5.5.3 Traffic volume distribution xviii
6 APPLICATION CHARACTERISTICS
6.1 WEB VIDEO ON DEMAND
6.1.1 YouTube content popularity analysis
6.2 VIDEO STREAMING
6.3 WEB TRAFFIC ANALYSIS
6.3.1 Top domain analysis
6.4 P2P FILE SHARING
6.5 P2P TELEPHONY AND VOIP
6.5.1 Skype traffic
6.5.2 MSN Messenger (Windows Live Messenger) traffic
7 CONCLUSIONS/DISCUSSION
7.1 DESCRIPTION OF AGGREGATE TRAFFIC
7.2 APPLICATION USAGE
7.3 CLUSTERING OF USERS
8 REFERENCES

ABBREVIATIONS

AC Autonomous Community
ADSL Asymmetric Digital Subscriber Line
AL Aggregation Level
CMTS Cable Modem Termination System
CoS Class of Service
DiffServ Differentiated Services
DL Downlink
DPI Deep Packet Inspection
DRDL Datastream Recognition Definition Language
DRG Digital Residential Gateway
DSL Digital Subscriber Line
FBM Fractional Brownian Motion
FTP File Transfer Protocol
FTTC Fiber To The Cabinet
FTTH Fiber To The Home
Gbps Gigabits per second
GGSN Gateway GPRS Support Node
GPRS General Packet Radio Service
IntServ Integrated Services
IPTV Internet Protocol TV
MRTG Multi Router Traffic Grapher
P2P Peer to peer
PoP Point of presence
POTS Plain Old Telephony Service
QoS Quality of Service
RSVP Resource ReSerVation Protocol
SLA Service Level Agreement
VoIP Voice over IP

1 EXECUTIVE SUMMARY

Internet usage is evolving from the traditional WWW usage (i.e. downloading web pages) to triple-play usage, where households may have all their communication services (telephony, data, TV) through their broadband access connection.
The challenge is to design IP access networks so that they can deliver services with strict QoS demands, such as IPTV, while also having capacity for traffic that is demanded by the users but unwanted from the operator's perspective, for example file sharing. One important part in meeting this research challenge is to identify and monitor Internet usage. The traffic patterns and applications need to be investigated and reported on. The experience from earlier traffic measurements is that it is no longer sufficient to investigate aggregated traffic at the IP level. In order to capture user behavior and traffic patterns in IP access networks, the measurements need to be performed close to the users and need to be able to identify specific applications.

Traffic modeling is tightly coupled both to traffic measurements and to engineering and techno-economics. In most cases there is either a theoretical or a practical problem motivating and defining the modeling, and depending on the problem, the model may take very different shapes. For example, when studying queuing disciplines, detailed dynamic models are needed for reliable results, while for network planning and capacity dimensioning, high-level traffic models in combination with estimates of the traffic evolution are needed. However, independent of the type of model, traffic measurements are a common denominator that provides input for the model parameters. Without measurements the parameters are very likely to be wrong or not detailed enough.

It should be pointed out that we model user data traffic, i.e. traffic going to and from subscriber clients. This includes application signaling traffic that is normally considered as user data from a network traffic handling point of view. Thus, there are many traffic types not considered here, such as link control traffic, mobility traffic for cellular networks, operation and maintenance traffic, etc.
The main traffic descriptors aim at describing the traffic profile either of a subscriber/subscriber line or at some aggregate level of subscribers/subscriber lines, while the application level model describes the traffic characteristics of individual applications or application types/classes.

In this report, traffic measurements from four different networks were collected and analyzed. Two of the networks are in Spain, one commercial and one university network, and two are in Sweden:
• The first Swedish municipal network is an open fibre-based network with approximately 2600 FTTH and 200 DSL customers. The FTTH customers represent many social and ethnic groups, while the DSL customers constitute a more homogeneous group of Swedish middle class living in single family houses.
• The second Swedish municipal network is an FTTH network with 350 IPTV users. This was used only to study user IPTV behavior.
• The commercial Spanish network contains both fixed and wireless access networks. The wireline part consists of a fibre network to the cabinet (FTTC), and the last mile consists of Cable Modem Termination System (CMTS) and ADSL. The wireless access is a combined GPRS and UMTS system.
• The Spanish university network RedIRIS interconnects and allows Internet access to more than 300 institutions with 2.7 million users. The network is SDH-based with link speeds from 2.5 Gbps up to 10 Gbps.

The current document (Deliverable 3.2) contains several sections describing new results and achievements since the previous deliverable (D3.1) was published. A number of sections were updated and extended significantly. Nevertheless, D3.2 also includes a summary of some of the results and findings presented in D3.1, focusing on the major conclusions.

Chapter 2 (State of the Art) gives a brief overview of the network capacity planning methods used in today’s networks to assure Quality of Service (QoS).
Chapter 3 describes the measurement tools and techniques used to capture traffic in different networks and measurement points. It also discusses the steps of the data processing, including anonymization.

Chapter 4 contains an analysis of network traffic locality that has been performed with measurements in the RedIRIS network. Traffic sent to and received from six universities within the RedIRIS network has been analyzed. The mapping of IP addresses to the related countries made use of MaxMind i, the free public database for geographic localization of IP addresses, which has an accuracy of 99.5%.

Chapter 5 (Main Traffic Descriptors) focuses on properties of network traffic and users on an aggregate level. Section 5.1 presents a comparison between daily profiles of the different networks along with an analysis of the average traffic rate per active user in the Spanish CMTS network. Section 5.2 summarizes the findings (presented in D3.1) on the relationship between the uplink and the downlink traffic volumes per subscriber in the Spanish network. Section 5.3 extends the cluster analysis work (started in D3.1) with new clustering results considering users over a certain traffic limit. Furthermore, it contains a characterization of “minimal users” and a cluster analysis of the Swedish subscribers. Section 5.4 analyzes the number of active MAC addresses for the total traffic and for some popular application categories. Finally, Section 5.5 compares the share of important application groups in different technologies and networks. It also investigates applications with the highest user penetration and the share of the most popular applications in the traffic volume.

Chapter 6 (Application Characteristics) concentrates on individual applications (or application categories) instead of the aggregate traffic.
Section 6.1 contains an analysis of the traffic of the popular web-based video sharing websites (namely YouTube and Metacafe). It also presents a content popularity study of YouTube videos as well as user activity and traffic intensity charts. Section 6.2 investigates the characteristics of video streaming traffic. Section 6.3 studies the distribution of web traffic among different websites and domains. The study reveals the total traffic of websites distributing traffic between several servers and sub-domains for load balancing purposes. The findings about the characteristics of P2P file sharing applications are summarized in Section 6.4. Section 6.5 focuses on VoIP (and instant messaging) applications, e.g. Skype and MSN Messenger. The section unveils findings about weekly fluctuation of Skype traffic in the Spanish fixed and mobile networks, user rankings according to generated traffic, and a detailed daily profile of hosts using Skype in the Swedish network.

Finally, Chapter 7 collects the most important findings and conclusions of the document.

2 STATE OF THE ART

Network capacity planning is the process of determining the amount of resources needed in every link of a network in order to guarantee certain Quality of Service (QoS) constraints. Usually, the resources are the bandwidth of the different links and the QoS constraints are defined in order to satisfy users’ performance requirements. Two different approaches have appeared to deal with this problem. In the first approach, protocols and architectures that guarantee the QoS constraints are used to reserve bandwidth for every new stream or to give priority to some streams over other ones. This alternative makes efficient use of the resources at the expense of complexity in management and maintenance. In the second approach, the overprovisioning alternative, links are dimensioned with more bandwidth than is needed for the aggregate stream, which wastes resources but is easy to handle.
Two techniques have become popular for dimensioning networks without wasting resources, namely IntServ ii and DiffServ iii. IntServ (Integrated Services) reserves bandwidth by means of the RSVP protocol to ensure QoS. A new application can make a reservation of a required amount of bandwidth, and if there is enough bandwidth along the path between the origin of the application and the end, then the application is guaranteed those resources in the network and the target QoS. DiffServ (Differentiated Services) uses a byte in the IP header to set a QoS level for an IP packet. Routers make use of this information to prioritize traffic with a higher QoS level over traffic with a lower QoS level. The main difference between IntServ and DiffServ is that DiffServ offers a relative level of QoS, because the QoS of a given Class of Service (CoS) depends on the amount of traffic of the other CoS, while IntServ gives a QoS that does not depend on the remaining traffic.

Although the former point of view seems to meet the requirements of a network provider, where the resources are efficiently managed to achieve the desired level of QoS, thus reducing the investment in the network, this is in reality not the case. The equipment used in the network must be more complex in order to maintain these architectures, which increases equipment investment. This also results in additional cost in operation and management of the network. Network operators must be trained to configure and manage the different classes of service, and installers of the equipment have to be instructed how to properly configure the routers. These reasons make the overprovisioning approach more attractive than would be thought at first.

The usual approach to bandwidth overprovisioning follows several stages. In the first stage, the performance parameters and their target values are determined.
These targets are commonly agreed on in a Service Level Agreement (SLA), which is a contract where the client and the network operator formally define the level of service that the network operator is obliged to give to the user. In the second stage, measurements are made in order to analyze the current capacity and whether the QoS targets are met. In the third stage, a network model together with the actual measurements is used to predict or estimate the future demand for bandwidth. Later, a validation of the model is needed in order to assess the correctness of the predictions already made. Finally, conclusions are drawn from the model and the data, and the amount of resources is determined.

Following this approach, what operators commonly do is measure the average used bandwidth over a link and then apply the following rule of thumb:

$C = d \cdot M$ (Equation 1)

where C is the link capacity that satisfies the requirements, M is the average of the traffic load and d is some overprovisioning constant, which is larger than 1. In this naive approach, the measurements can be of several kinds, but it is sufficient to have MRTG iv values with a granularity of 5 minutes. The constant d is commonly much greater than 1 because the operators want to take into account the fluctuations of the traffic about the mean value (burstiness). The model in this case is as simple as considering the load to be a constant M over time, taking M for instance as the average load during the busiest hour of a period of time. It has been reported v that present networks are very lightly utilized, less than 40 % utilization even in highly loaded days, resulting in a capacity about 30 times the average traffic rate.

More complex approaches to network dimensioning have appeared in the literature, for instance the one of Fraleigh et al. vi In this work, the QoS target is defined by means of delay between POPs (Points-of-Presence).
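As an aside before continuing with the delay-based approach, the rule of thumb in Equation 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy load trace and the value of d are hypothetical, and taking M as the highest mean over a sliding one-hour window of 5-minute samples is one possible reading of "the average load during the busiest hour", not a method given in the deliverable.

```python
from statistics import mean


def busy_hour_mean(samples_mbps, samples_per_hour=12):
    """Highest mean load over any sliding one-hour window.

    With 5-minute samples (the MRTG granularity mentioned above),
    one hour corresponds to 12 consecutive samples.
    """
    if len(samples_mbps) < samples_per_hour:
        raise ValueError("need at least one hour of samples")
    return max(
        mean(samples_mbps[i:i + samples_per_hour])
        for i in range(len(samples_mbps) - samples_per_hour + 1)
    )


def rule_of_thumb_capacity(samples_mbps, d, samples_per_hour=12):
    """Equation 1: C = d * M, with M taken as the busy-hour mean load."""
    if d <= 1:
        raise ValueError("the overprovisioning constant d must exceed 1")
    return d * busy_hour_mean(samples_mbps, samples_per_hour)


# Toy trace (made-up values): three hours at 10, 40 and 20 Mbps.
samples = [10.0] * 12 + [40.0] * 12 + [20.0] * 12
capacity = rule_of_thumb_capacity(samples, d=2.5)  # 100.0 Mbps
```

The choice of d carries all the knowledge about burstiness in this scheme, which is why the more elaborate models discussed next try to derive the required margin from measured traffic statistics instead.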
A backbone network consists of a set of nodes (POPs) that are connected via high speed links. The QoS requirement is of the form

$P[d(i,j) > D_{target}] < \tau$ (Equation 2)

i.e. the probability of having a delay between POP i and POP j greater than the target delay is less than a given threshold τ. Modeling the traffic load with a Gaussian process, specifically a two-scale Fractional Brownian Motion (FBM) process, they compute the delay distribution for a single queue, and using this result eventually compute the end-to-end queueing delay through a network. After assuming that the characteristics of a traffic demand remain the same throughout the network and that the delays at each queue are independent (these assumptions were first validated in Fraleigh’s thesis vii), they compute the end-to-end queuing delay as the convolution of the queuing delays of the single queues that connect both ends. The traffic measurements contained in this paper are packet level measurements from the Sprint IP network, containing the arrival time, packet size and the first 40 bytes of every transmitted packet. These measurements were used to derive and validate the above described model, and also to compute the values for the capacities of every link connecting POPs. To achieve this, they solve a Capacity Assignment problem taking into account the traffic demand between POPs. The results of this work are that for links with high capacity (greater than 1 Gbps), utilization can reach 80%-90% and still meet the delay requirements, and that with an extra capacity between 5%-15% an end-to-end delay requirement in the Sprint IP network of 4 ms is satisfied.

Another interesting work in capacity planning was published in 2006 by Hans van den Berg et al. viii. The authors define the QoS requirement using the following formula

$P[A(T) \geq CT] \leq \gamma$ (Equation 3)

i.e. the probability that the amount of traffic offered in [0, T], A(T), for small T, is greater than the maximum amount of traffic that can be carried by the link over that period of time (CT) is less than a given threshold γ. Here we can see that the QoS target relates directly to the capacity C of the link. The model that they use for the traffic is that the amount of traffic offered in [0, T] follows a Gaussian distribution

$A(T) \sim \mathrm{Norm}(\rho T, \nu(T))$ (Equation 4)

with load ρ Mbps and variance ν(T) Mbit². Using this model, they show that the capacity of a link follows the formula

$C = \rho + \alpha \sqrt{\rho}$ (Equation 5)

Here, ρ is estimated in intervals of length T, and α is a parameter that depends on the ratio between peak bandwidth and mean bandwidth, the time interval T, the target probability γ and the mean service time (supposing an M/G/∞ queue). Surprisingly, they demonstrate that the value of α does not depend on the arrival rate, so an increase in the number of users in the network affects the capacity only through the increase of the average load ρ. For the calculation of ρ they use MRTG measurements with a granularity of 5 minutes. They estimate α from Equation 5 so that it is satisfied for 99 % of the data, but they point out that it is possible to compute it theoretically if flow-level measurements are made.

3 MEASUREMENTS, TOOLS AND NETWORKS

The analysis in this report is based on measurements performed in access networks in Sweden and Spain. The residential user measurements include the access technologies DSL and FTTH (Swedish municipal network), CMTS and Mobile (Spanish operator network). Measurements were also performed in the Spanish National Academic and Research Network (NARN), RedIRIS. Details on the networks and subscribers are found in the TRAMMS Deliverable D3.1 ix.
The tools used in the measurements are x
• PacketLogic
• Cisco NetFlow
• Wireshark
• Traffic databases xi xii xiii

PacketLogic and Cisco NetFlow are described in D3.1 ix. Wireshark is a passive software solution that can be used for real-time monitoring or for non-real-time analysis of captured files. In this report a larger focus has been put on packet level measurements than in D3.1. Thus one new measurement technique is introduced: packet capture via firewall rules. In this technique, a firewall rule is created that matches certain criteria, e.g. URL visited, application used, etc. Packets that match the criteria are dumped to pcap files xiv. These files are anonymized and post-processed with Wireshark, or with Python or a similar programming language, to extract statistics from the files. The traffic databases mentioned in the bullets above are a way of retrieving the traffic data from the PacketLogic tool and storing it in a database to make analysis easy and fast. Thus, the information in the databases is traffic data per household, per IP number and per application.

4 NETWORK TRAFFIC LOCALITY

The analysis of network traffic locality has been performed with measurements in the RedIRIS network. Traffic sent to and received from six universities within the RedIRIS network has been analyzed. The mapping of IP addresses to the related countries made use of MaxMind xv, the free public database for geographical localization of IP addresses, which has an accuracy of 99.5%.

To clarify the description of the results, they have been split into three cases: one for the incoming direction of traffic, another for the outgoing direction and a last one for both directions. The incoming direction must be understood as the traffic whose destination IP address belongs to one of the universities under study.
On the other hand, the outgoing traffic is the traffic that is generated in the universities, so the source IP address belongs to one of the universities. The results of the both directions case are the aggregation of all the traffic analyzed in the former cases. In all the cases three different measurements have been performed, namely packet count, byte count and flow count. This is done to circumvent (and also illustrate when possible) the traffic behavior commonly referred to as “the elephants and mice phenomenon”, that is, that a very small proportion of the flows carries the largest part of the information.

The chapter is structured as follows. In the following three sections we present the results of the aforementioned cases, and in the last section we discuss the obtained results.

4.1 Outgoing direction

Here we present the results obtained from the network traffic locality analysis in the outgoing direction for the three measurements described above.

4.1.1 Packet count

In Figure 4-1 we show the percentage of packets per destination sent from the universities. The figure shows the thirteen countries that receive the largest amount of packets (each accounting for at least 1% of the total number of packets), the not classified packets (i.e. the packets whose IP addresses did not match any entry in the database) and the percentage of packets that were sent to other countries not shown in the graph (more than 12% of the traffic). As can be seen, most of the traffic is sent to locations within Spain (nearly 45% of the total number of packets) and the second most visited country is the United States, accounting for nearly 20% of the packet count. After them there is a huge jump in percentage, with Germany being the third most visited country with less than 4% of the packets. It is also worth mentioning that the percentage of not classified packets was nearly negligible (less than 0.1%).

Figure 4-1: Percentage of packets per destination for the traffic from RedIRIS.
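The per-destination shares plotted in Figure 4-1 can be approximated with a short aggregation over flow records. The following Python sketch is illustrative only: the record layout, the `country_shares` function, the toy lookup table and all traffic values are assumptions made for this example (the lookup callable merely stands in for a geolocation database such as the one used in the study). It computes the share per country for any of the three measurements and groups countries below the 1% threshold into an "Other" bucket.

```python
from collections import defaultdict

NOT_CLASSIFIED = "Not classified"
OTHER = "Other"


def country_shares(flows, lookup_country, metric="packets", min_share=1.0):
    """Percentage of traffic per destination country.

    flows          -- iterable of dicts with 'dst_ip', 'packets', 'bytes'
    lookup_country -- callable mapping an IP string to a country code,
                      or None when the address is not in the database
    metric         -- 'packets', 'bytes' or 'flows'
    min_share      -- countries below this percentage go into 'Other'
    """
    if metric not in ("packets", "bytes", "flows"):
        raise ValueError("metric must be 'packets', 'bytes' or 'flows'")
    totals = defaultdict(float)
    for flow in flows:
        country = lookup_country(flow["dst_ip"]) or NOT_CLASSIFIED
        # Every record counts once for the flow metric.
        totals[country] += 1.0 if metric == "flows" else flow[metric]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    shares, other = {}, 0.0
    for country, value in totals.items():
        pct = 100.0 * value / grand_total
        if country == NOT_CLASSIFIED or pct >= min_share:
            shares[country] = pct
        else:
            other += pct
    if other > 0:
        shares[OTHER] = other
    return shares


# Hypothetical lookup table and flow records (documentation IP ranges).
GEO_TABLE = {"192.0.2.1": "ES", "198.51.100.1": "US", "203.0.113.1": "DE"}
FLOWS = [
    {"dst_ip": "192.0.2.1", "packets": 600, "bytes": 60_000},
    {"dst_ip": "198.51.100.1", "packets": 390, "bytes": 900_000},
    {"dst_ip": "203.0.113.1", "packets": 5, "bytes": 20_000},
    {"dst_ip": "203.0.113.99", "packets": 5, "bytes": 20_000},
]
by_packets = country_shares(FLOWS, GEO_TABLE.get, metric="packets")
by_flows = country_shares(FLOWS, GEO_TABLE.get, metric="flows")
```

In this toy data the packet shares are dominated by two destinations while each record weighs equally in the flow count, which is the kind of difference between measurements that the elephants and mice phenomenon produces.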
As Spain and the United States jointly account for more than 65% of the analyzed packets, we show in Figure 4-2 the percentage of traffic that is sent to the Top 15 countries after removing Spain and the United States (that is, the percentage of traffic sent to the 15 most visited countries without taking into account Spain and the United States) in order to remove their masking effect.

Finally, we have placed this information in a color coded map. Spain and the United States have been removed from this map in order not to make the color scale meaningless. We have also excluded the countries which account for less than $10^{-3}$ % of the packets. The results are shown in Figure 4-3.

4.1.2 Byte count

In Figure 4-4 we present results analogous to those shown in Figure 4-1. As in Figure 4-1, most of the bytes are sent to Spain and the United States (accounting again for more than 65% of the total number of bytes), but in this case the United States percentage is halved. There are also thirteen countries to which at least 1% of the bytes are sent, but these are not the same as those for the packet count.

Figure 4-2: Percentage of packets of the 15 most visited countries excluding Spain & USA.

Figure 4-3: Map plot of Outgoing Packet Locality.

We again remove Spain and the United States and show the results in Figure 4-5. We find approximately the same countries as for the packet count, and the percentage of bytes of the remaining countries is also similar.

Figure 4-4: Percentage of bytes per destination of the traffic from RedIRIS.

Figure 4-5: Percentage of bytes for the 15 most visited countries excluding Spain & USA.
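The Top 15 views (Figures 4-2 and 4-5) and the threshold used for the maps amount to a filter followed by a ranking. A minimal Python sketch, assuming the per-country percentages are already available as a dictionary; the function name `top_countries` and the sample values are hypothetical. The text does not say whether the percentages are renormalised after Spain and the United States are removed, so this sketch keeps the original percentages.

```python
def top_countries(shares, excluded=("ES", "US"), n=15, min_share=0.0):
    """Rank countries by share after dropping the dominant ones.

    shares    -- dict mapping country code to percentage of traffic
    excluded  -- country codes removed to avoid masking the rest
    n         -- number of countries to keep
    min_share -- minimum percentage for a country to be listed
    """
    remaining = {
        country: pct
        for country, pct in shares.items()
        if country not in excluded and pct >= min_share
    }
    ranked = sorted(remaining.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Made-up percentages for illustration only.
example_shares = {"ES": 45.0, "US": 20.0, "DE": 3.9, "FR": 3.0, "XX": 0.0005}
top_two = top_countries(example_shares, n=2)
map_countries = top_countries(example_shares, n=len(example_shares), min_share=1e-3)
```

Here `top_two` keeps only the two largest remaining countries, while `map_countries` applies the $10^{-3}$ % threshold used for the map plots and therefore drops the entry with a share of 0.0005 %.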
Finally, in Figure 4-6 we show the geographic plot of the countries that account for more than 10^-3 % of the bytes sent from the universities under study.

Figure 4-6: Map plot of Outgoing Byte Locality.

4.1.3 Flow count

In Figure 4-7 we show the results of the analysis of the traffic locality by flows. Similarly to the previous measurements, Spain and the United States account for the majority of the flows, but in this case their joint percentage does not reach the 65% observed before. There is also another difference from the previous results: in this case there are fourteen countries that account for at least 1% of the total number of flows.

Figure 4-7: Percentage of flows per destination for the traffic from RedIRIS.

Figure 4-8 corresponds to the Top 15 countries when we measure the number of flows. As in the former cases, the countries are not exactly the same, and neither are their percentages of flows.

Figure 4-8: Percentage of flows of the 15 most visited countries after removing Spain & USA.

Finally, as was done in the packet and byte cases, we show in Figure 4-9 the countries that account for more than 10^-3 % of the flows on a world map.

4.2 Incoming direction

We now proceed to the results obtained when analysing the incoming traffic, i.e. the flows whose destination was one of the universities under study.

4.2.1 Packet count

Results equivalent to those presented in Section 4.1.1 are shown here for traffic in the incoming direction. Figure 4-10 shows the countries that contribute more than 1% of the total number of packets. Compared with Figure 4-1, there are fewer countries contributing more than 1% of the packet count in the incoming direction than in the outgoing direction.
However, the set of countries that contribute more than 1% of the total number of packets in the incoming direction is contained in the equivalent set for the outgoing direction. In Figure 4-11 we present the results for the 15 most contributing countries excluding Spain and the United States because, as in the previous cases, these two account for more than 60% of the total number of packets. The sets of countries in both directions are not the same, but the difference appears among the less contributing countries of the Top 15. Finally, we show this information on a world map in Figure 4-12.

Figure 4-9: Map plot of Outgoing Flow Locality.
Figure 4-10: Percentage of packets per destination of the traffic to RedIRIS.
Figure 4-11: Percentage of packets of the 15 most contributing countries excluding Spain & USA.
Figure 4-12: Map plot of Incoming Packet Locality.

4.2.2 Byte count

Figure 4-13 shows the percentage of bytes sent from the most contributing countries (those that sent at least 1% of the bytes). First, it is worth mentioning that in this case the most contributing country is not Spain but the United States, although each has nearly 30% of the total bytes. Another difference compared with Figure 4-4 is the number of countries that contribute more than 1% of the total, which is smaller here than in Figure 4-4.

Figure 4-13: Percentage of bytes per destination of the traffic to RedIRIS.

In Figure 4-14 we present a figure analogous to Figure 4-5.
There are noticeable differences between them: in the incoming direction, the most contributing countries contribute more than they did in Figure 4-5 (the less contributing countries of the incoming direction contribute hardly 1% of the remaining traffic, excluding Spain and the United States). Finally, as in the previous cases, we present a map of the location of the traffic sources in Figure 4-15.

Figure 4-14: Percentage of bytes of the 15 most contributing countries excluding Spain & USA.
Figure 4-15: Map plot of Incoming Byte Locality.

4.2.3 Flow count

In Figure 4-16 we present the countries that contribute at least 1% of the total number of flows. Comparing it with Figure 4-7, it can be seen that the percentages accounted for by Spain and the United States are nearly the same, and so is the percentage of the remaining countries. The only slight difference is that Brazil is one of the most contributing countries in the incoming direction, which it was not in the outgoing direction. The rest of the countries are the same in both cases, with small changes in their order.

Figure 4-17 shows the 15 most contributing countries, again after the removal of Spain and the United States. It can be seen that the percentages of the countries are nearly the same as in Figure 4-8, and the only difference between the two is that in the incoming direction Brazil has replaced Switzerland. Finally, Figure 4-18 shows a map with the geographic location of the flows represented on a color intensity scale.

Figure 4-16: Percentage of flows per destination of the traffic to RedIRIS.
Figure 4-17: Percentage of flows of the 15 most contributing countries excluding Spain & USA.
Figure 4-18: Map plot of Incoming Flow Locality.

4.3 Both directions

In this section we present the results of the joint analysis of traffic in both the incoming and the outgoing directions. The discussion of the results is deferred to the following section.

4.3.1 Packet count

We present here the results of analyzing the geographic locality by packet count. Figure 4-19 shows the percentages of the countries that contribute more than 1% in both directions. Figure 4-20 shows the 15 most contributing countries after removing the bias introduced by Spain and the United States, whose percentages are far larger than those of the other countries. Finally, Figure 4-21 displays the countries that account for more than 10^-3 % of the total number of packets on a world map with color intensities.

4.3.2 Byte count

For the byte count in both directions, Figure 4-22 shows the countries that contribute at least 1% of the bytes in both directions, Figure 4-23 presents the fifteen most contributing countries after removing Spain and the United States, and finally Figure 4-24 presents this information on a world map.

4.3.3 Flow count

The last study we present in this chapter is the analysis of the locality of the traffic for both the incoming and outgoing directions when measuring the number of flows. Figure 4-25 presents the countries that account for more than 1% of the total number of flows, Figure 4-26 shows the fifteen most contributing countries without taking into account Spain and the United States, as they jointly account for more than half of the flows, and finally Figure 4-27 presents this information on a world map.

Figure 4-19: Percentage of packets per destination in both directions.
Figure 4-20: Percentage of packets of the 15 most contributing countries after removing Spain & USA.
Figure 4-21: Map plot of both directions packet Locality.
Figure 4-22: Percentage of bytes per destination in both directions.
Figure 4-23: Percentage of bytes of the 15 most contributing countries excluding Spain & USA.
Figure 4-24: Map plot of both directions byte Locality.
Figure 4-25: Percentage of flows per destination in both directions.
Figure 4-26: Percentage of flows of the 15 most contributing countries excluding Spain & USA.
Figure 4-27: Map plot of both directions flow Locality.

4.4 Discussion of the results

In this section we analyze the results presented in the previous sections. We give a more in-depth description of these results and provide some insights into what we measure.

First of all, it is worth recalling the results of D3.1 for the daily and weekly profiles of the RedIRIS network. That analysis showed that in the RedIRIS network the outgoing traffic is greater than the incoming traffic during night hours, but smaller during working hours. It is also worth mentioning that the average incoming traffic was slightly greater than the average outgoing traffic. We refer the reader to D3.1 for further reading.

The majority of the packets, approximately 40%, are sent and received within Spain. This is reasonable because all the universities connected to the network are located in Spain, and it is logical that most of the Internet traffic is sent to and received from Spanish sites. In the RedIRIS network, P2P traffic is negligible. Instead, the largest traffic volumes come from email or web services.
This traffic is mostly exchanged between Spanish users in the first case, and generated by Spanish users visiting web sites in the second case. Human social interaction studies support these results [xvi].

In second place we find the United States, which is responsible for about 20% of the total packet traffic volume. As the United States is responsible for most research developments and is the world leader in the information society, it is understandable that users looking for first-hand information search within United States sites. Moreover, the majority of the most visited web pages are hosted in the United States, which supports these results.

In third place we can put the group of the most important countries in the European Community, for example France, Italy, the United Kingdom and Germany. From a research perspective, the most common kind of project, leaving national projects aside, is the European project, such as this one. This requires the users of the RedIRIS network to communicate with these countries and keep up to date with their news, which explains the high percentage of traffic to and from them, nearly 20% of the total traffic.

In fourth position we find the Latin American countries, for instance Mexico, Chile and Argentina. As the official language there is Spanish, it is common to be directed to a web page in one of these countries when looking for information in Spanish. Moreover, there are many researchers in Spain who come from Latin American countries, which also helps explain the percentage of traffic from this region, nearly 15% of the total traffic volume.

Finally, we find that nearly all countries are present in the study. Although their individual percentages of the traffic are very small (less than 10^-3 % of the traffic), they jointly account for nearly 5% of the traffic. This traffic can be defined as sporadic.
In the section describing the incoming traffic, it was found that the United States, and not Spain as expected, was the largest contributor to the byte count (see Figure 4-13). This may look anomalous compared with the flow count and packet count for the same direction, where Spain has a greater percentage of traffic than the United States. It should not be considered an anomaly, however, as it is an example of a well-studied phenomenon, the aforementioned “elephants and mice phenomenon”. This phenomenon [xvii] is very common in real networks, where a small percentage of the users account for most of the traffic. In our case we find “elephants” in the United States traffic: there are flows that contain a similar number of packets, but whose packets carry a very large payload. So although we have more flows and packets from Spain, the ones that come from the United States account for a greater percentage of the total number of bytes. The impact of these “elephants” is not so large when both the incoming and the outgoing directions are taken into account: for all the studied metrics, Spain then accounts for the greater percentage.

5 MAIN TRAFFIC DESCRIPTOR

5.1 Daily and weekly profiles

5.1.1 Daily profiles Spanish network

In this section the daily profile of the average traffic per active user in the Spanish fixed network (CMTS) is investigated. The measurements used in this analysis were taken from 2008-03-07 to 2008-03-30. The CMTS daily profile in the Spanish fixed network is shown in Figure 5-1.

[Figure 5-1 plot: traffic rate (Mbps) versus hour of the day; series IN, OUT and TOTAL]
Figure 5-1. CMTS daily profile (Spanish Network measurement from 2008-03-07 to 2008-03-30)

The number of active users for each hour of the day was calculated. A subscriber was considered to be active if it generated any uplink or downlink traffic within the hour. This was done for all the days of the measurement period and then averaged.
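The active-user count described above can be sketched as follows. This is a minimal illustration rather than the project's actual processing: the record layout (day, hour, subscriber, uplink bytes, downlink bytes), the function name and the sample records are assumptions made for the example.

```python
from collections import defaultdict


def mean_active_users_per_hour(records):
    """Average number of active subscribers for each hour of the day.

    A subscriber is active in an hour if it generated any uplink or
    downlink traffic in that hour. `records` holds tuples of
    (day, hour, subscriber, uplink_bytes, downlink_bytes).
    """
    active = defaultdict(set)  # (day, hour) -> set of active subscribers
    days = set()
    for day, hour, subscriber, up_bytes, down_bytes in records:
        days.add(day)
        if up_bytes > 0 or down_bytes > 0:
            active[(day, hour)].add(subscriber)

    per_hour = [0.0] * 24
    for (_day, hour), subscribers in active.items():
        per_hour[hour] += len(subscribers)

    if not days:
        return per_hour
    # Average over all measured days, including days with no activity.
    return [count / len(days) for count in per_hour]


# Hypothetical records covering two days.
example_records = [
    ("2008-03-07", 9, "a", 100, 0),
    ("2008-03-07", 9, "b", 0, 0),    # no traffic: not active
    ("2008-03-07", 9, "a", 0, 50),   # same subscriber, counted once
    ("2008-03-08", 9, "a", 10, 10),
    ("2008-03-08", 9, "c", 0, 5),
]
profile = mean_active_users_per_hour(example_records)
print(profile[9])
```

Dividing the hourly traffic rate by this hourly count gives the average traffic rate per active user.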
The resulting number of active users per hour is plotted in Figure 5-2.

[Figure 5-2 plot: number of active users versus hour of the day]
Figure 5-2. Number of active users per hour of the day (Spanish Network CMTS measurement from 2008-03-07 to 2008-03-30)

Using the data from Figure 5-1 and Figure 5-2, the average traffic rate per active user for each hour of the day was calculated. The results are shown in Figure 5-3.

[Figure 5-3 plot: average traffic rate per active user (bps, axis scale x 10^5) versus hour of the day; series IN, OUT and TOTAL]
Figure 5-3. Daily average traffic rate per active user (Spanish Network CMTS measurement from 2008-03-07 to 2008-03-30)

It can be noted that the average traffic rate per active user remains stable during the daytime. During the night (from 1 a.m. to 7 a.m.), however, the average traffic rate per active user increases significantly (by around 60%). This suggests that heavy users make up a much larger share of the active users during that time.

5.1.2 Comparison between the daily profiles of the different networks

In this analysis the following measurements were considered (only fixed networks):
− DSL in Swedish municipal network No. 1, measurements from 2007-12-10 to 2007-01-30.
− FTTH in Swedish municipal network No. 1, measurements from 2007-10-01 to 2007-11-06.
− RedIRIS university, measurements from 2008-06-11 to 2008-07-22.
− CMTS in Spanish Network, measurements from 2008-03-07 to 2008-03-30.

The daily traffic pattern of each network was normalized by dividing by its maximum value. Then the average over the networks was calculated and normalized, again by dividing by its maximum value. This was done separately for the downlink, uplink and total traffic.
The results are shown in Figure 5-4, Figure 5-5 and Figure 5-6.

[Figure 5-4 plot: normalized downlink traffic versus hour of the day; series RedIRIS, CMTS, FTTH, DSL and Average]
Figure 5-4. Normalized daily downlink traffic pattern for the different networks and normalized average daily downlink traffic pattern

[Figure 5-5 plot: normalized uplink traffic versus hour of the day; series RedIRIS, CMTS, FTTH, DSL and Average]
Figure 5-5. Normalized daily uplink traffic pattern for the different networks and normalized average daily uplink traffic pattern

[Figure 5-6 plot: normalized total traffic versus hour of the day; series RedIRIS, CMTS, FTTH, DSL and Average]
Figure 5-6. Normalized daily total traffic pattern for the different networks and normalized average daily total traffic pattern

These figures show that the Spanish fixed network and the Swedish network, despite using different access technologies (CMTS, DSL, FTTH), have similar daily traffic patterns, both for the uplink and the downlink traffic. However, the RedIRIS academic network shows a very different daily traffic pattern for the downlink traffic. The amount of downlink traffic is less constant in the RedIRIS network than in the other networks (10% of the maximum traffic at 5 a.m.), and from 1 p.m. to 9 p.m. it decreases while it increases in the other networks.

The main conclusion is that the shape of the daily traffic pattern depends on the subscriber type of the network (residential, enterprise, academic) and that there is a common daily traffic pattern for the networks that have mainly residential users. Taking into account the results shown in Deliverable D3.1, it can be deduced that the amount of downlink and uplink traffic depends on the access technology (CMTS, DSL or FTTH).
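The normalization used for the comparison in Section 5.1.2 (divide each daily profile by its maximum, average over the networks, then normalize the average again) can be sketched as follows. The function names and the four-sample profiles are hypothetical and chosen only to keep the example short; real profiles would have one value per hour, and every profile is assumed to be non-empty and of equal length.

```python
def normalize(profile):
    """Scale a profile so that its maximum value becomes 1."""
    peak = max(profile)
    if peak <= 0:
        return list(profile)
    return [value / peak for value in profile]


def normalized_average(profiles):
    """Normalize each network, average per time slot, normalize again.

    `profiles` maps a network name to its list of traffic values.
    """
    normalized = [normalize(p) for p in profiles.values()]
    mean = [sum(values) / len(values) for values in zip(*normalized)]
    return normalize(mean)


# Hypothetical, shortened profiles (arbitrary units).
example_profiles = {
    "CMTS": [200, 400, 300, 100],
    "DSL": [10, 20, 40, 30],
}
average = normalized_average(example_profiles)
print(average)
```

Because each network is scaled to its own peak first, a network with a large absolute volume does not dominate the average; only the shape of the daily pattern is compared.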
5.2 UL/DL volume and packet

This section describes the UL/DL traffic volume and packet share for each measurement set.

5.2.1 Spanish network

In Deliverable 3.1 a sample of 6000 subscribers was studied to obtain the relationship between the downlink and the total traffic volume per subscriber (Spanish Network measurement from 2008-03-07 to 2008-03-30). This analysis was done for the fixed access network, and it was found that the common profile is to have much more downlink than uplink traffic:
1. The percentage of households where the downlink traffic volume was similar to the uplink traffic volume was 25.95%.
2. The percentage of households where the downlink traffic volume was greater than the uplink traffic volume was 58.99%. Within this group, the households where the downlink traffic volume was much greater than the uplink traffic volume made up 34.31% of the total.
3. The percentage of households where the uplink traffic volume was greater than the downlink traffic volume was 15.05%. Within this group, the households where the uplink traffic volume was much greater than the downlink traffic volume made up 2.92% of the total.

However, as seen in Deliverable D3.1 section 5.1.2 “Daily profiles Spanish network”, in the early hours of the day the uplink traffic volume is greater than the downlink traffic volume, probably due to P2P application usage. Therefore there may be a relationship between the type of subscriber (heavy user, applications used) and the downlink/uplink ratio of the subscriber, which can be explored in further subscriber cluster analysis.

5.3 Subscriber clusters

This section contains the summary and conclusions of the cluster analysis results presented in Deliverable 3.1, along with new results from an analysis assuming traffic limits and an analysis of minimal users.
The section analyses different Spanish network measurements in order to identify population groups that behave similarly from different points of view. Section 5.3.1 considers the total traffic volume per MAC address in the uplink and downlink directions. The analysis in Section 5.3.2 considers the total downlink traffic volume per MAC address and the number of observed applications per MAC address using cluster analysis. Section 5.3.3 performs cluster analysis assuming traffic limits. Section 5.3.4 analyses “minimal users”. Finally, Section 5.3.6 is based on the traffic volume of several popular applications. Sections 5.3.1, 5.3.2 and 5.3.6 summarize the conclusions of Deliverable 3.1, while Sections 5.3.3 and 5.3.4 present new results.

A few general conclusions can be drawn from the different kinds of analyses detailed below. First of all, there is a small group of subscribers who generate a huge amount of traffic, mainly P2P, while there are other subscribers whose traffic demands are much more moderate and in whose traffic mix web browsing is more significant, though they also use P2P applications. Another general conclusion is that the traffic of the heavy users seems to be constant over the repeated measurements, while the moderate subscribers seem to gradually increase their activity.

5.3.1 Separation using the total traffic volume

The exploratory data analysis for identifying user groups was performed on the Spanish Network measurements detailed below. The main statistic analysed is defined as follows:

Average incoming/outgoing daily traffic volume per MAC address
This value is calculated as the average of the incoming/outgoing daily traffic volumes generated by a MAC address during the measurement.

Table 5-1 shows an example of the basic statistics for the aggregated traffic volume of the different MAC addresses. The population here consists of 5801 different MAC addresses.
By comparing the results, we could observe a small increase in the maximum traffic. Note that the mean traffic was more or less the same in all measurements.

                                                  N      Minimum   Maximum   Mean       Std. Deviation
Average incoming daily traffic per MAC address    5801   0         19.2 GB   502.1 MB   1.1 GB
Average outgoing daily traffic per MAC address    5801   0         7.0 GB    437.0 MB   874.3 MB

Table 5-1 Descriptive statistics of the average daily traffic of MAC addresses (Spanish Network measurement from 2008-02-19 to 2008-02-29)

Table 5-2 shows the deciles (every 10th percentile) of the distribution of the average incoming and outgoing daily traffic per MAC address. In the low percentile region, the downlink (incoming) traffic is more significant than the uplink (outgoing) traffic. For example, 30% of the MAC addresses downloaded less than 36 MB of traffic, while the bottom 30% uploaded less than 6 MB. However, at the 90th percentile and above the uplink and downlink traffic volumes are quite balanced: 1.3 GB for downlink and 1.4 GB for uplink. One can also observe the different scales in the traffic volumes for both directions: the median (50%) is at 113 MB / 46 MB, which is quite small compared to the mean traffic volume of 502 MB / 437 MB. This indicates that the distribution of the daily average traffic volume has a significant tail.

Percentile   Average incoming daily traffic   Average outgoing daily traffic
             per MAC address                  per MAC address
10           1.7 MB                           0.2 MB
20           14.7 MB                          2.0 MB
30           35.6 MB                          5.7 MB
40           64.7 MB                          15.1 MB
50           113.2 MB                         45.7 MB
60           195.8 MB                         126.7 MB
70           351.4 MB                         298.2 MB
80           648.0 MB                         635.2 MB
90           1337.7 MB                        1372.9 MB

Table 5-2 10 percentile values of the average daily traffic volume per MAC addresses (Spanish Network measurement from 2008-02-19 to 2008-02-29)

The comparison of further measurements showed that the 20-70 percentile regions, i.e. the low traffic segments, increased their traffic volume over the measurements.
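A decile summary of the kind shown in Table 5-2, together with the comparison of median and mean, can be computed as in the sketch below. The sample values are hypothetical and are not taken from the measurements, and the standard-library quantile method used here is an assumption; it may differ in detail from the percentile definition of the statistics package used for the report.

```python
import statistics


def decile_summary(volumes_mb):
    """Deciles, median and mean of per-subscriber daily traffic volumes."""
    cut_points = statistics.quantiles(volumes_mb, n=10)  # 9 values: 10%..90%
    return {
        "deciles": dict(zip(range(10, 100, 10), cut_points)),
        "median": statistics.median(volumes_mb),
        "mean": statistics.fmean(volumes_mb),
    }


# Hypothetical daily volumes in MB: many light users and one heavy user.
volumes_mb = [1, 2, 3, 5, 8, 13, 21, 34, 55, 5000]
summary = decile_summary(volumes_mb)
print(summary["median"], summary["mean"])
```

A mean far above the median, as in this toy sample, is the same signature of a long tail that the text reads off Table 5-1 and Table 5-2.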
Unlike the low traffic segments, the traffic of the heavy users seems to have stayed at a constant level, around 1.2-1.3 GB.

Figure 5-7 shows the complementary cumulative distribution function of the number of subscribers as a function of the downlink/uplink traffic volume in the same Spanish Network measurement. Since Table 5-1 and Table 5-2 indicate that the subscribers generate traffic on significantly different scales, the complementary cumulative distribution function is plotted on a linear-logarithmic scale in order to visualize the distribution over several orders of magnitude. Figure 5-8 shows the uplink traffic volume versus downlink traffic volume pairs for the Spanish Network measurement from 2008-02-19 to 2008-02-29. As was seen in Table 5-2, Figure 5-8 shows that the traffic is quite downlink dominated for MAC addresses with low traffic volumes, while it is quite symmetric for MAC addresses with high traffic volumes.

[Figure 5-7 plot: probability versus average traffic volume per day per MAC address (logarithmic axis); series for the downlink and uplink directions]
Figure 5-7: Complementary cumulative distribution function of users based on generated traffic (Lin-log scale, Spanish Network measurement from 2008-02-19 to 2008-02-29)

[Figure 5-8 plot: average uplink traffic volume per MAC address versus average downlink traffic volume per MAC address, both on logarithmic axes]
Figure 5-8 The relation of the uplink and downlink traffic per MAC address (Log-log scale, Spanish Network measurement from 2008-02-19 to 2008-02-29)

5.3.2 Separation using cluster analysis

The aim of this subsection is to divide the total population into groups (clusters) using cluster analysis.
For this reason, the following statistic is also computed for each MAC address, in addition to the average traffic volume statistics:

Number of active applications per MAC address
This is the number of applications identified by PacketLogic per MAC address that have nonzero traffic volume.

Table 5-3 shows an example result of the cluster analysis (Two Step Cluster in SPSS) that was run on the population of MAC addresses. The average downlink daily traffic volume per MAC address and the number of active applications per MAC address were used as continuous variables. The cluster analysis identified three distinct clusters in both measurements.

Cluster   N      % of Total
1         1768   30.5%
2         3645   62.8%
3         388    6.7%
Total     5801   100.0%

Table 5-3 Cluster distribution for Spanish Network measurement from 2008-02-19 to 2008-02-29

Cluster    Average downlink daily traffic   Number of active applications
           volume per MAC address           per MAC address
1          67.9 MB                          14.4
2          372.3 MB                         36.3
3          3700.0 MB                        42.8
Combined   502.1 MB                         30.1

Table 5-4 Cluster centroids for Spanish Network measurement from 2008-02-19 to 2008-02-29

Table 5-4 shows the coordinates of the cluster centroids. The centroid of Cluster 3 indicates that this cluster contains MAC addresses with a rather high average downlink traffic volume; these can be identified as “heavy users” or “high profile” subscribers. Table 5-3 shows that this cluster contains 6.7% of the population in the first Spanish measurement. The corresponding ratio is 6.6% in the second Spanish measurement.

Another cluster contains subscribers who generate a considerably limited traffic volume and at the same time use a limited number of applications. The share of this cluster is around 30% in both the first and the second measurement. We suggest referring to these subscribers as “low profile” subscribers.
The centroid of the remaining cluster (Cluster 2) shows that its subscribers generate a considerably smaller downlink traffic volume than the heavy users. However, the members of Cluster 2 use a large set of applications, almost 40, while the members of Cluster 1 use far fewer. That is, there is a group whose members generate a relatively low traffic volume but use many applications. We suggest referring to these subscribers as “medium profile” subscribers.

After comparing the cluster centroids of the low profile subscribers in the two measurements, we found that they are quite similar. The average traffic of the low profile subscribers is 68 MB in the first measurement and again around 60 MB in the second one. The number of observed applications seems to be 10-20 for this set. That is, the population of the low profile subscribers seems to behave in a similar manner over the measurement periods.

When examining the cluster centroids of the medium profile subscribers, it was seen that their traffic increased in the second measurement compared to the first one. Also based on the analysis of the cluster centroids, it can be seen that the centroids show similar traffic volumes for the measurements. We note that the number of observed applications for medium and high profile subscribers seems to stay around 40 in all the measurements.

The structure of the different clusters can be seen in Figure 5-9. The figure shows the average traffic per MAC address versus the number of applications used by the individual subscribers. There is a small population of heavy users and two larger populations of medium and low profile users. The three populations are distinctly divided.
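To illustrate how subscribers can be grouped on the two variables used above (average daily downlink volume and number of active applications), the sketch below uses plain k-means (Lloyd's algorithm). This is a swapped-in technique chosen because it fits in a few lines of standard-library code; it is not the SPSS TwoStep procedure used for the report, and its results need not match. The log transform, the standardization, the deterministic initialization and the sample subscribers are all assumptions made for the example.

```python
import math
import statistics


def standardize(column):
    """Z-score a list of values (population standard deviation)."""
    mean = statistics.fmean(column)
    spread = statistics.pstdev(column) or 1.0  # avoid division by zero
    return [(value - mean) / spread for value in column]


def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    ordered = sorted(points)
    # Deterministic start: k points spread evenly over the sorted sample.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: math.dist(point, centroids[j]))
            for point in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical subscribers: (average daily downlink MB, active applications).
subscribers = [
    (5, 8), (10, 10), (20, 12),            # low traffic, few applications
    (300, 35), (400, 36), (500, 37),       # moderate traffic, many applications
    (3000, 42), (4000, 43), (5000, 44),    # heavy traffic, many applications
]
log_volume = standardize([math.log10(mb) for mb, _ in subscribers])
app_count = standardize([apps for _, apps in subscribers])
points = list(zip(log_volume, app_count))

centroids, labels = kmeans(points, k=3)
print(labels)
```

The logarithm is applied because the traffic volumes span several orders of magnitude; without it, the distance computation would be dominated by the few largest volumes.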
[Figure 5-9 plot: average incoming daily traffic per MAC address (MB, logarithmic axis) versus number of active applications per MAC address; points marked by TwoStep cluster number 1, 2 and 3]
Figure 5-9. The average traffic and application number of the different clusters

A natural question is which applications generate most of the traffic in the different groups. Table 5-5 shows this list together with the average traffic volumes of the corresponding applications in the first Spanish measurement. It can be seen that HTTP traffic is quite significant for the low and medium profile subscribers, while different P2P traffic types appear in first place for the medium and high profile subscribers. Nevertheless, we note that marginal P2P traffic can also be observed in the group of low profile subscribers. Similar conclusions can be drawn from the same list in other measurements.
Rank  Cluster 1                                 Cluster 2                                 Cluster 3
1     Encapsulated                   22.1 MB    eDonkey                        139.8 MB   eDonkey                        1692.2 MB
2     HTTP                           15.5 MB    HTTP                           60.8 MB    BitTorrent transfer            625.4 MB
3     eDonkey                        11.1 MB    BitTorrent transfer            46.9 MB    Unknown                        286.3 MB
4     HTTP media stream              6.4 MB     eDonkey encrypted              28.4 MB    eDonkey encrypted              269.0 MB
5     Unknown                        4.1 MB     HTTP media stream              24.8 MB    HTTP                           240.5 MB
6     BitTorrent transfer            3.3 MB     Unknown                        17.0 MB    BitTorrent encrypted transfer  141.5 MB
7     POP3                           0.8 MB     Ares                           9.9 MB     Encapsulated                   71.3 MB
8     Untracked                      0.6 MB     BitTorrent encrypted transfer  7.4 MB     HTTP media stream              53.5 MB
9     eDonkey encrypted              0.4 MB     Encapsulated                   3.5 MB     Ares                           47.1 MB
10    SSL v3                         0.4 MB     Ares tcp                       3.2 MB     Thunder UDP                    46.0 MB
11    BitTorrent encrypted transfer  0.3 MB     Pando                          2.4 MB     Pando                          38.3 MB
12    Soulseek                       0.2 MB     BitTorrent KRPC                2.2 MB     Untracked                      29.3 MB
13    PPLive                         0.2 MB     Kademlia                       2.0 MB     BitTorrent KRPC                19.3 MB
14    TFTP transfer                  0.2 MB     Untracked                      1.7 MB     QQ live                        17.0 MB
15    RTMP                           0.2 MB     POP3                           1.5 MB     PPStream                       15.9 MB

Table 5-5 Most voluminous traffic types of the clusters for Spanish Network measurement from 2008-02-19 to 2008-02-29

5.3.3 Clustering of users by setting up traffic limits

This subsection is an extension of the previous one. We were interested in how the clustering results change if we set a lower limit on the generated traffic per user, thus excluding those users whose traffic was low in the measurement interval. Table 5-6 shows the descriptive statistics of the total traffic considering different lower bounds on the total traffic.

Descriptive statistics for the total traffic
Lower limit on    Number     Minimum    Maximum    Mean      Std. Deviation   Number of
traffic per day   of users                                                    applications
0 MB              5801       0.0 GB     19.2 GB    0.5 GB    1.1 GB           312
10 MB             5137       10.07 MB   19.40 GB   1.05 GB   1.58 GB          164
30 MB             4847       30.00 MB   19.72 GB   1.18 GB   1.67 GB          131
50 MB             4577       50.10 MB   19.69 GB   1.28 GB   1.73 GB          114

Table 5-6.
Descriptive statistics of the total traffic considering different lower bounds on the total traffic

It can be concluded from the table that by setting a limit on the daily traffic, the number of users meeting the criterion drops sharply. There are 664 users generating less than 10 MB of traffic per day, which is a surprisingly high number. (The traffic statistics of this user group are analyzed in detail in Section 5.3.4.) There are 954 users under 30 MB and 1224 under 50 MB. As a trivial consequence, the average traffic per user also rises. The standard deviation of the generated traffic per user increases significantly as well.

Similar to the number of users, a considerable drop can be seen in the number of applications. Only 164 applications (out of the 312 that can be found in the measurement) generate more than 10 MB of traffic, 131 generate more than 30 MB, and only 114 applications exceed the 50 MB limit of daily average traffic. Without a traffic limit, the numbers of applications in the “minimal”, “medium” and “heavy user” clusters were 14, 36 and 42 respectively. As expected, after setting a traffic limit, these numbers fell to 2, 6 and 10.

Cluster centroids
          Lower limit 0 MB                                   Lower limit 10 MB
          traffic_mean_sum          APPL                     traffic_mean_sum          APPL
Cluster   Mean         Std. Dev.    Mean     Std. Dev.       Mean         Std. Dev.    Mean     Std. Dev.
1         67.9 MB      202.9 MB     14.42    7.68            207.0 MB     270.8 MB     2.53     1.08
2         372.3 MB     429.1 MB     36.32    7.32            941.0 MB     656.2 MB     6.69     1.55
3         3700.0 MB    2223.8 MB    42.79    10.20           3831.4 MB    2377.7 MB    10.57    2.80

Cluster centroids
          Lower limit 30 MB                                  Lower limit 50 MB
          traffic_mean_sum          APPL                     traffic_mean_sum          APPL
Cluster   Mean         Std. Dev.    Mean     Std. Dev.       Mean         Std. Dev.    Mean     Std. Dev.
1         374.14 MB    0.39 GB      2.42     1.09            400.90 MB    398.00 MB    1.92     0.81
2         1761.36 MB   0.97 GB      6.36     1.58            1688.50 MB   933.43 MB    5.02     1.21
3         6448.76 MB   2.49 GB      8.78     2.97            5607.78 MB   2627.18 MB   7.85     2.29

Table 5-7.
Cluster centroids in the cases of different lower bounds on traffic per day Cluster distribution 0 MB 10 MB 30 MB 50 MB N % N % N % N % 1 1768 30,477 2228 43,372 2922 60,285 2577 56,303 2 3645 62,833 2154 41,931 1661 34,269 1628 35,569 3 388 6,688 755 14,697 264 5,447 372 8,128 Table 5-8. Distribution of users between clusters D3.2 Traffic Models Public 35 (74) Project Deliverable CELTIC TRAMMS CP4-025 Cluster Size TwoStep Cluster Number 1 2 3 0 MB lower limit on traffic per day 10 MB lower limit on traffic per day Cluster Size Cluster Size TwoStep Cluster Number TwoStep Cluster Number 1 1 2 2 3 3 30 MB lower limit on traffic per day 50 MB lower limit on traffic per day Figure 5-10: Pie diagram of the distribution of users between clusters Table 5-7 shows the cluster centroids in the cases of different lower bounds on traffic per day, while Table 5-8 contains the distribution of users between clusters. The clustering process classifies more users as “minimal (light) users” at the expense of the “medium users”, because the real minimal users were practically filtered out. By shifting the cluster centroids the average traffic of “minimal users” increased. The percentage of heavy users did not change. 
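The percentages in Table 5-8 follow directly from the user counts per cluster. The following is a minimal sketch, assuming only the counts printed in Table 5-8; the names `CLUSTER_COUNTS`, `cluster_shares` and `summarize` are illustrative and not part of the original analysis, which was performed with a TwoStep clustering tool.

```python
# User counts per cluster (clusters 1, 2, 3), copied from Table 5-8.
# Keys are the lower limits on traffic per day in MB.
CLUSTER_COUNTS = {
    0: [1768, 3645, 388],
    10: [2228, 2154, 755],
    30: [2922, 1661, 264],
    50: [2577, 1628, 372],
}


def cluster_shares(counts):
    """Return the percentage of users that falls into each cluster."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]


def summarize(cluster_counts):
    """Map each lower limit to its total user count and cluster shares."""
    rows = {}
    for limit, counts in sorted(cluster_counts.items()):
        rows[limit] = {"users": sum(counts), "shares": cluster_shares(counts)}
    return rows


if __name__ == "__main__":
    for limit, row in summarize(CLUSTER_COUNTS).items():
        shares = ", ".join(f"{s:.1f}%" for s in row["shares"])
        print(f"{limit:>2} MB limit: {row['users']} users, shares: {shares}")
```

The totals per limit reproduce the user counts of Table 5-6, which is a quick consistency check between the two tables.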
Lower limit: 0 MB
TOTAL | Cluster 1 | Cluster 2 | Cluster 3
eDonkey 204.4 MB | Encapsulated 22.1 MB | eDonkey 139.8 MB | eDonkey 1692.2 MB
BitTorrent transfer 72.3 MB | HTTP 15.5 MB | HTTP 60.8 MB | BitTorrent transfer 625.4 MB
HTTP 59.0 MB | eDonkey 11.1 MB | BitTorrent transfer 46.9 MB | Unknown 286.3 MB
eDonkey encrypted 36.0 MB | HTTP media stream 6.4 MB | eDonkey encrypted 28.4 MB | eDonkey encrypted 269.0 MB
Unknown 31.1 MB | Unknown 4.1 MB | HTTP media stream 24.8 MB | HTTP 240.5 MB
HTTP media stream 21.1 MB | BitTorrent transfer 3.3 MB | Unknown 17.0 MB | BitTorrent encrypted transfer 141.5 MB
BitTorrent encrypted transfer 14.2 MB | POP3 0.8 MB | Ares 9.9 MB | Encapsulated 71.3 MB
Encapsulated 13.7 MB | Untracked 0.6 MB | BitTorrent encrypted transfer 7.4 MB | HTTP media stream 53.5 MB
Ares 9.4 MB | eDonkey encrypted 0.4 MB | Encapsulated 3.5 MB | Ares 47.1 MB
Pando 4.1 MB | SSL v3 0.4 MB | Ares tcp 3.2 MB | Thunder UDP 46.0 MB
Thunder UDP 3.8 MB | BitTorrent encrypted transfer 0.3 MB | Pando 2.4 MB | Pando 38.3 MB
Untracked 3.2 MB | Soulseek 0.2 MB | BitTorrent KRPC 2.2 MB | Untracked 29.3 MB
BitTorrent KRPC 2.7 MB | PPLive 0.2 MB | Kademlia 2.0 MB | BitTorrent KRPC 19.3 MB
Ares tcp 2.6 MB | TFTP transfer 0.2 MB | Untracked 1.7 MB | QQ live 17.0 MB
Kademlia 1.5 MB | RTMP 0.2 MB | POP3 1.5 MB | PPStream 15.9 MB

Lower limit: 10 MB
TOTAL | Cluster 1 | Cluster 2 | Cluster 3
eDonkey 373.9 MB | HTTP 52.9 MB | eDonkey 368.8 MB | eDonkey 1363.0 MB
BitTorrent transfer 164.5 MB | eDonkey 43.7 MB | BitTorrent transfer 104.4 MB | BitTorrent transfer 807.5 MB
HTTP 93.2 MB | HTTP media stream 37.1 MB | HTTP 93.0 MB | eDonkey encrypted 309.0 MB
eDonkey encrypted 78.2 MB | Encapsulated 28.1 MB | eDonkey encrypted 74.1 MB | Unknown 241.2 MB
Unknown 59.9 MB | Unknown 6.6 MB | HTTP media stream 64.4 MB | HTTP 212.6 MB
HTTP media stream 55.7 MB | BitTorrent transfer 4.8 MB | Unknown 51.5 MB | BitTorrent encrypted transfer 163.7 MB
BitTorrent encrypted transfer 30.5 MB | eDonkey encrypted 3.9 MB | Ares 24.7 MB | Ares 111.8 MB
Ares 27.7 MB | POP3 2.8 MB | BitTorrent encrypted transfer 15.3 MB | Untracked 95.1 MB
Encapsulated 23.6 MB | Adobe Update Manager 2.7 MB | Untracked 15.1 MB | HTTP media stream 85.8 MB
Untracked 21.3 MB | Untracked 2.2 MB | Encapsulated 10.0 MB | Encapsulated 49.1 MB
Pando 8.3 MB | RTSP media stream 2.2 MB | Ares tcp 8.2 MB | Thunder UDP 42.4 MB
Thunder UDP 7.8 MB | RTMP 2.2 MB | PPLive 6.9 MB | Pando 38.1 MB
Ares tcp 6.6 MB | Ares 2.0 MB | Adobe Update Manager 6.8 MB | BitTorrent KRPC 27.5 MB
BitTorrent KRPC 6.0 MB | PPLive 1.5 MB | RTSP media stream 6.2 MB | Ares tcp 19.9 MB
PPLive 6.0 MB | BitTorrent tracker 1.4 MB | RTMP 6.2 MB | FTP transfer 19.6 MB

Lower limit: 30 MB
TOTAL | Cluster 1 | Cluster 2 | Cluster 3
eDonkey 413.9 MB | eDonkey 110.3 MB | eDonkey 649.01 MB | eDonkey 2.30 GB
BitTorrent transfer 178.8 MB | HTTP 85.3 MB | BitTorrent transfer 269.51 MB | BitTorrent transfer 1.34 GB
HTTP 122.5 MB | HTTP media stream 51.9 MB | eDonkey encrypted 157.11 MB | eDonkey encrypted 0.51 GB
eDonkey encrypted 91.8 MB | Encapsulated 24.3 MB | HTTP 148.86 MB | Unknown 0.46 GB
Unknown 69.6 MB | BitTorrent transfer 22.5 MB | Unknown 106.65 MB | HTTP 0.37 GB
HTTP media stream 67.4 MB | eDonkey encrypted 16.6 MB | HTTP media stream 88.16 MB | BitTorrent encrypted transfer 0.29 GB
BitTorrent encrypted transfer 34.4 MB | Unknown 13.2 MB | Ares 58.59 MB | Untracked 0.21 GB
Ares 31.3 MB | Ares 7.2 MB | BitTorrent encrypted transfer 51.40 MB | Encapsulated 0.13 GB
Encapsulated 25.6 MB | Untracked 3.8 MB | Untracked 30.75 MB | Ares 0.13 GB
Untracked 24.0 MB | Adobe Update Manager 3.3 MB | Ares tcp 14.83 MB | Thunder UDP 0.12 GB
Thunder UDP 9.2 MB | RTSP media stream 3.1 MB | Pando 14.27 MB | HTTP media stream 0.11 GB
Pando 9.1 MB | RTMP 2.8 MB | PPLive 11.78 MB | Pando 0.06 GB
Ares tcp 7.4 MB | FTP transfer 2.4 MB | BitTorrent KRPC 11.60 MB | BitTorrent KRPC 0.04 GB
BitTorrent KRPC 6.6 MB | POP3 2.3 MB | Encapsulated 10.80 MB | NNTP 0.04 GB
PPLive 6.4 MB | PPLive 2.0 MB | FTP transfer 9.64 MB | QQ live 0.03 GB

Lower limit: 50 MB
TOTAL | Cluster 1 | Cluster 2 | Cluster 3
eDonkey 449.1 MB | eDonkey 127.7 MB | eDonkey 628.8 MB | eDonkey 1888.7 MB
BitTorrent transfer 192.1 MB | HTTP 99.6 MB | BitTorrent transfer 240.1 MB | BitTorrent transfer 1166.7 MB
HTTP 144.0 MB | HTTP media stream 56.5 MB | HTTP 168.7 MB | eDonkey encrypted 461.8 MB
eDonkey encrypted 102.3 MB | Encapsulated 28.3 MB | eDonkey encrypted 156.8 MB | Unknown 405.3 MB
Unknown 76.0 MB | BitTorrent transfer 21.1 MB | Unknown 105.2 MB | HTTP 343.9 MB
HTTP media stream 75.5 MB | eDonkey encrypted 16.0 MB | HTTP media stream 92.3 MB | BitTorrent encrypted transfer 259.6 MB
BitTorrent encrypted transfer 38.0 MB | Unknown 10.0 MB | Ares 55.9 MB | Untracked 177.9 MB
Ares 34.2 MB | Ares 7.6 MB | BitTorrent encrypted transfer 46.7 MB | HTTP media stream 134.6 MB
Encapsulated 27.1 MB | RTSP media stream 2.9 MB | Untracked 25.8 MB | Ares 123.9 MB
Untracked 25.2 MB | Untracked 2.7 MB | Ares tcp 14.7 MB | Thunder UDP 94.7 MB
Thunder UDP 10.2 MB | RTMP 2.5 MB | Pando 13.1 MB | Encapsulated 82.1 MB
Pando 9.8 MB | FTP transfer 2.4 MB | Encapsulated 12.7 MB | Pando 58.2 MB
Ares tcp 7.9 MB | POP3 1.8 MB | PPLive 10.5 MB | BitTorrent KRPC 38.5 MB
BitTorrent KRPC 6.6 MB | Zattoo TCP 1.8 MB | BitTorrent KRPC 9.2 MB | NNTP 27.4 MB
PPLive 6.4 MB | Ares tcp 1.7 MB | FTP transfer 8.3 MB | FTP transfer 26.1 MB
Table 5-9. Top applications per cluster assuming different lower bounds on traffic per day

Figure 5-11 The average traffic and the number of used applications per user on a 2D plot of those generating at least 10 MB average traffic per day (x-axis: traffic_mean_sum, 0 to 20 GB; y-axis: APPL; points colored by TwoStep cluster number 1 to 3)

Figure 5-12 The average traffic and the number of used applications per user on a 2D plot of those generating at least 30 MB average traffic per day (x-axis: traffic_mean_sum, 0 to 20 GB; y-axis: APPL; points colored by TwoStep cluster number 1 to 3)

Figure 5-11 and Figure 5-12 show the clustering results on a 2D plane assuming a traffic limit of 10 and 30 MB, respectively. The dimensions are the average generated traffic and the average number of applications. Each point represents one user. The clusters are indicated by different colors.

Table 5-9 shows the top applications per cluster assuming different lower bounds on average traffic per day. Despite the traffic limits, we observed the same contributing applications, but with higher average traffic. Regarding the light users, the higher the traffic limit, the further HTTP traffic shifts backward in the ranking of the top applications, because users generating the least amount of traffic are filtered out. Nonetheless, HTTP traffic still remains important. Interestingly, P2P applications become more dominant as the traffic limit increases. At the same time, new applications also show up among the top 15 applications (e.g., RTSP media stream, BitTorrent tracker). Interestingly, the “Adobe Update Manager” application also earned a position in the top 15 by generating about 3 MB of traffic per day on average. Considering the heavy users, the list of top applications remained essentially unchanged irrespective of the traffic limit.

5.3.4 Analysis of “minimal users”

Those users who generated less than 10 MB of daily traffic on average are regarded as “minimal users”. This section gives insight into the statistics of the “minimal user” group. Table 5-10 contains the percentile values, in 10% steps, of the average daily traffic volume per MAC address in the case of “minimal users”, while Figure 5-13 shows the complementary cumulative distribution function of the traffic generated by light users. Both suggest that the traffic of minimal users is distributed unevenly (just like the total traffic).
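The two statistics used in this section, percentiles in 10% steps and the complementary cumulative distribution function (CCDF), can be computed from a list of per-user daily traffic volumes. The sketch below is only an illustration: the function names are hypothetical, and the percentile uses plain linear interpolation, which may differ slightly from the method of the statistics package used for Table 5-10.

```python
def percentile(values, p):
    """Linear-interpolation percentile (p in 0..100) of a non-empty list."""
    data = sorted(values)
    if len(data) == 1:
        return data[0]
    pos = (len(data) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def deciles(values):
    """Return the 10%, 20%, ..., 90% percentiles as a dict."""
    return {p: percentile(values, p) for p in range(10, 100, 10)}


def ccdf(values):
    """Empirical CCDF: list of (x, P[X > x]) for each distinct x."""
    data = sorted(values)
    n = len(data)
    points = []
    for i, x in enumerate(data):
        # Only emit a point at the last occurrence of a repeated value.
        if i + 1 < n and data[i + 1] == x:
            continue
        points.append((x, (n - (i + 1)) / n))
    return points


if __name__ == "__main__":
    # Hypothetical daily volumes in MB for a handful of users.
    sample = [0.2, 0.9, 1.5, 2.1, 3.4, 4.6, 5.9, 7.3, 9.0, 9.8]
    print(deciles(sample))
    print(ccdf(sample))
```

A strongly uneven distribution shows up as a CCDF that drops quickly for small volumes and then decays slowly.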
Statistics: traffic_mean_sum
N Valid | 5801
N Missing | 0
Percentile 10 | 0.88 MB
Percentile 20 | 2.12 MB
Percentile 30 | 3.47 MB
Percentile 40 | 4.64 MB
Percentile 50 | 5.92 MB
Percentile 60 | 7.34 MB
Percentile 70 | 9.01 MB
Percentile 80 | 11.23 MB
Percentile 90 | 15.22 MB
Table 5-10 10% percentile values of the average daily traffic volume per MAC address in the case of “minimal users”

Figure 5-13: Complementary cumulative distribution function of the traffic generated by light users

Application | N | Minimum | Maximum | Mean | Std. Deviation
HTTP | 5801 | 0.00 MB | 9.94 MB | 2.11 MB | 2.16 MB
HTTP media stream | 5801 | 0.00 MB | 9.68 MB | 0.65 MB | 1.02 MB
SSL v3 | 5801 | 0.00 MB | 8.16 MB | 0.54 MB | 0.83 MB
Unknown | 5801 | 0.00 MB | 9.64 MB | 0.53 MB | 1.09 MB
Undetermined | 5801 | 0.00 MB | 9.49 MB | 0.42 MB | 1.09 MB
POP3 | 5801 | 0.00 MB | 7.71 MB | 0.33 MB | 0.79 MB
eDonkey encrypted | 5801 | 0.00 MB | 9.71 MB | 0.33 MB | 0.90 MB
Kademlia | 5801 | 0.00 MB | 9.94 MB | 0.21 MB | 0.82 MB
eDonkey | 5801 | 0.00 MB | 9.71 MB | 0.16 MB | 0.66 MB
ICMP | 5801 | 0.00 MB | 9.15 MB | 0.15 MB | 0.54 MB
BitTorrent KRPC | 5801 | 0.00 MB | 9.98 MB | 0.12 MB | 0.62 MB
Untracked | 5801 | 0.00 MB | 9.68 MB | 0.12 MB | 0.37 MB
Encapsulated | 5801 | 0.00 MB | 9.15 MB | 0.11 MB | 0.47 MB
RTMP | 5801 | 0.00 MB | 4.84 MB | 0.11 MB | 0.34 MB
PPLive | 5801 | 0.00 MB | 4.74 MB | 0.10 MB | 0.31 MB
Table 5-11. Descriptive statistics of applications used by “minimal users”

Table 5-11 shows statistics of the applications used by “minimal users”. The table suggests that most minimal users browse the web (HTTP and SSL-secured HTTP) and read emails. These applications are followed by file sharing and P2P applications, which means that even light users use them.

5.3.5 Separation using cluster analysis (Swedish network) xviii

The cluster measurement data present household usage based on the number of unique applications used together with the amount of data transferred. The bandwidth axis uses a logarithmic scale in order to resolve users with low bandwidth usage and to prevent the extreme bandwidth users from dominating the graph.
Below, inbound data refers to data that a household downloads and outbound data to data that leaves the household. We separate the two directions to detect differences and similarities in the shape of the clusters. One can then identify common user habits and extreme cases.

Figure 5-14 The graph shows a number of households, plotted based on the number of applications used and inbound bandwidth consumed. Technology is FTTH. Measurement from the Swedish network No. 1 between 2007-09-01 00:00 and 2007-10-01 00:00. Total number of households in measurement is 2081

Figure 5-14 and Figure 5-15 show a measurement from the FTTH part of the Swedish municipal network No. 1, measured during 30 days between 2007-09-01 00:00 and 2007-10-01 00:00. Traffic is separated by direction: Figure 5-14 describes inbound traffic and Figure 5-15 represents outbound traffic. The upper 10% of the households are colored red based on their high bandwidth consumption. Similarly, the lower 10% of the households are colored blue based on their low bandwidth consumption. The total number of households measured was 2081. The upper boundary was calculated to be 2 GB of inbound data per day and household, and the lower boundary was approximately 4 MB of total data per day and household. For outbound data, the boundaries were calculated to be 9 GB and 4 MB.

We see a distinct difference between inbound and outbound volume. To a large extent, this difference is dominated by P2P file sharing from computers left on around the clock. Another characteristic of both inbound and outbound traffic in the FTTH measurement, which is also present in the DSL measurement shown in Figure 5-16, is the slope from users with few applications and low bandwidth to users with many applications and high bandwidth. We also see that the application and protocol usage is quite high: the majority of households use more than 20 applications or protocols.
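The coloring rule described above (upper and lower 10% of the households by bandwidth consumption) can be expressed as a simple rank-based classification. This is a minimal sketch under stated assumptions: the function name, the label strings and the input format (a mapping from a household identifier to its daily volume in MB) are hypothetical, and ties are broken by sort order rather than by any rule from the original tool.

```python
def classify_households(daily_volume_mb, lower_pct=10, upper_pct=90):
    """Label each household 'low', 'high' or 'mid' by its rank in volume.

    Households below the lower_pct rank are 'low' (blue in the figures),
    households at or above the upper_pct rank are 'high' (red).
    """
    ranked = sorted(daily_volume_mb.items(), key=lambda kv: kv[1])
    n = len(ranked)
    n_low = n * lower_pct // 100
    n_high = n * (100 - upper_pct) // 100
    labels = {}
    for i, (household, _volume) in enumerate(ranked):
        if i < n_low:
            labels[household] = "low"
        elif i >= n - n_high:
            labels[household] = "high"
        else:
            labels[household] = "mid"
    return labels


if __name__ == "__main__":
    # Hypothetical data: 20 households with volumes of 1..20 MB per day.
    volumes = {f"hh{i}": float(i) for i in range(1, 21)}
    labels = classify_households(volumes)
    print([h for h, lab in labels.items() if lab == "high"])
```

The volume of the first household labeled 'high' and of the last household labeled 'low' then correspond to the upper and lower boundaries quoted in the text.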
One reason for the high number of applications and protocols per household is that PacketLogic separates the protocols of one specific application. For example, Skype has seven different sub-protocols, while common web browsing is listed as four different applications: HTTP, HTTP media streaming, SSL v2 and SSL v3.

Figure 5-15 The graph shows a number of households, plotted based on the number of applications used and outbound bandwidth consumed. Technology is FTTH. Measurement from the Swedish network No. 1 between 2007-09-01 00:00 and 2007-10-01 00:00. Total number of households in measurement is 2081.

Looking further into Figure 5-15, there are many households quite high on the data axis. To put that scale in perspective, every household consuming more than 10^6 KB (about 1 GB) uses at least the amount of data needed to download or upload one full-length movie per day during one month. As we see in Figure 5-16, where the technology is DSL instead of FTTH, fewer people reach that extreme level. This might depend on many different things, including the type and age of the people in the households and the bandwidth of the Internet connection. In the DSL case, we have not divided the traffic into inbound and outbound graphs because we could not find any distinct differences when doing so. Hence, Figure 5-16 represents the total data measured. The DSL measurements were performed between 2008-02-02 00:00 and 2008-02-20 00:00 and the total number of households in the measurement was 104. The same color indexing was used in Figure 5-16 as in Figure 5-15, but the boundaries for the DSL case of total data were calculated to be 2 GB and 5 MB for the upper and lower boundary, respectively.

Figure 5-16 The graph shows a number of households, plotted based on the number of applications used and total bandwidth consumed. Technology is DSL. Measurement from the Swedish network No. 1 between 2008-02-02 00:00 and 2008-02-20 00:00. Total number of households in measurement is 104.

5.3.6 Separation using the traffic volume of popular applications

Figure 5-17, Figure 5-18, Figure 5-19 and Figure 5-20 show the distribution of the number of users (MAC addresses) as a function of traffic volume for some popular applications.

Both Figure 5-17 and Figure 5-18 show that the complementary cumulative distribution functions of the FTP control and FTP data traffic volumes have linear parts on a log-log scale, indicating that the traffic volume distributions follow a power-law (polynomial) relation over two or three orders of magnitude.

Figure 5-17: Complementary cumulative distribution function of MAC addresses based on generated FTP control traffic (50% of the MAC addresses do not generate FTP control traffic) (x-axis: FTP control average daily traffic volume, 0.001 MB to 10 MB, logarithmic; y-axis: probability, logarithmic)

Figure 5-18: Body and tail of the complementary cumulative distribution function of MAC addresses based on generated FTP data traffic (43% of the MAC addresses do not generate FTP data traffic) (two panels; x-axis: FTP data average daily traffic volume, up to 1000 MB, logarithmic; y-axis: probability, logarithmic)

Figure 5-19 shows the complementary cumulative distribution function of the traffic volume of MAC addresses that generate HTTP traffic. It can be seen that approximately 80% of the MAC addresses generate less than 10 MB of HTTP traffic on average during a day. Nevertheless, there are a few MAC addresses that generate more than 1 GB of HTTP traffic on average. This does not seem to be consistent with the rest of the population, and it might be the result of some data transfer applications running over HTTP connections rather than real web browsing.
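The linear log-log behavior noted for the FTP distributions can be checked numerically by fitting a straight line to the logarithms of the CCDF points. The sketch below is only a rough diagnostic under stated assumptions: `loglog_slope` is a hypothetical helper, it uses an ordinary least-squares fit (not a method taken from the deliverable), and the fit should be restricted to the range where the plot looks linear.

```python
import math


def loglog_slope(points):
    """Least-squares slope of log10(P) versus log10(x).

    `points` is an iterable of (x, P[X > x]) pairs; pairs with a
    non-positive x or P are skipped because their logarithm is undefined.
    """
    logs = [(math.log10(x), math.log10(p)) for x, p in points if x > 0 and p > 0]
    if len(logs) < 2:
        raise ValueError("need at least two usable points")
    mean_x = sum(lx for lx, _ in logs) / len(logs)
    mean_y = sum(ly for _, ly in logs) / len(logs)
    sxx = sum((lx - mean_x) ** 2 for lx, _ in logs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in logs)
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic CCDF points lying exactly on P[X > x] = x ** -1.5.
    synthetic = [(10.0 ** k, (10.0 ** k) ** -1.5) for k in range(4)]
    print(loglog_slope(synthetic))
```

A slope that stays roughly constant over two or three decades of traffic volume supports the power-law interpretation; strong curvature would argue against it.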
Figure 5-19: Complementary cumulative distribution function of MAC addresses based on generated HTTP traffic (5% of the MAC addresses do not generate HTTP traffic) (x-axis: HTTP average daily traffic volume, 0.01 MB to 10000 MB, logarithmic; y-axis: probability, 0 to 1)

Figure 5-20 shows the complementary cumulative distribution function of the traffic volume of MAC addresses that generate P2P traffic. One can see that approximately 80% of the MAC addresses generate less than 100 MB of data on average during a day. The remaining 20% contains the “heavy users”, whose average daily traffic volume is in the order of gigabytes.

Figure 5-20: Complementary cumulative distribution function of MAC addresses based on generated P2P traffic (BitTorrent, eDonkey, BitTorrent encrypted, eDonkey encrypted, PPLive) (x-axis: average daily traffic volume of P2P file transfer, 1 MB to 100000 MB, logarithmic; y-axis: probability, 0 to 1)

This subsection showed the distribution of the population of MAC addresses based on the traffic volume of three popular applications. There is a group containing MAC addresses with an “average” behavior and another group containing MAC addresses with “heavy user” behavior. It is a matter of further analysis whether the groups of “average HTTP” and “average P2P” MAC addresses, and the groups of “heavy user HTTP” and “heavy user P2P” MAC addresses, correlate with each other. It is also a matter of further analysis whether these groups correlate with the groups defined by the cluster analysis in Section 5.3.1.

5.4 Subscriber activities

In Deliverable 3.1 we analyzed the number of active MAC addresses contributing to the total traffic and to some popular application categories, namely FTP, HTTP and P2P traffic. Here we summarize the main conclusions.
The number of active MAC addresses was available on a daily basis in an 11-day measurement. A MAC address is assumed to be active if at least one byte of traffic is generated by that MAC address during the measurement interval.

The user activity was quite steady during every day of the measurement, though some minor drops in user activity were observed during the weekend. This observation applies to the total traffic and to the individual application categories as well.

Regarding FTP traffic, the daily number of active MAC addresses ranged between 300 and 500, which is 5-9% of the total population. On the other hand, 2900 MAC addresses generated FTP traffic at least once during the 11 days, which means 50% penetration. The difference between the two values suggests that many of the MAC addresses generated FTP traffic on only a few days of the measurement and did not return later.

Considering HTTP traffic, the number of daily users was around 3500-4500 (60-77%), while the total penetration (meaning those users who generated HTTP traffic at least once) was 5500 (95% of the population). The numbers indicate that, compared to the case of FTP, more MAC addresses returned and generated HTTP traffic several times during the measurement.

For file sharing traffic we observed practically no drop in activity during the weekend days. This may be because P2P applications can work autonomously, without human control. The number of daily users varies between 4100 and 4200 (70-72%), compared to a total penetration of 5300 (92% of the population). This implies that the ratio of returning subscribers is even higher than for HTTP traffic.

The user activity statistics again show the significant difference between HTTP (mostly web traffic) and file sharing traffic: file sharing traffic is steady in time, though its penetration is lower than that of web surfing.
Web traffic, on the other hand, varies more in time and has a higher total penetration. In the case of file sharing traffic, the difference between the daily and the total penetration is the smallest, suggesting that file sharing applications are “always on”.

5.5 Application volume, Packet and Session share

5.5.1 Comparison of applications usage in different networks and technologies

In order to avoid differences in the amount of unrecognized traffic between the Swedish network No. 1 and the Spanish network, only the traffic from the following application groups has been considered for this comparison: web browsing, P2P file sharing and multimedia streaming. In all the networks and technologies most of the traffic belongs to one of these groups. Table 5-12, Table 5-13 and Table 5-14 show the share of the application groups in the downlink, uplink and total traffic for the different technologies and networks:

Downlink traffic
Application group | Swedish network No.1 FTTH | Swedish network No.1 DSL | Spanish network CMTS | Spanish network GGSN
Web Browsing | 7.06% | 20.62% | 12.10% | 60.49%
P2P File Sharing | 88.27% | 65.98% | 80.85% | 26.56%
Multimedia streaming | 4.67% | 13.40% | 7.05% | 12.95%
Table 5-12. Share of the application groups in the downlink traffic for different technologies and networks (only web browsing, P2P file sharing and multimedia streaming traffic are considered, measurements in the period from October 2007 to March 2008).

Uplink traffic
Application group | Swedish network No.1 FTTH | Swedish network No.1 DSL | Spanish network CMTS | Spanish network GGSN
Web Browsing | 0.41% | 2.65% | 1.86% | 20.70%
P2P File Sharing | 99.39% | 96.74% | 96.97% | 76.51%
Multimedia streaming | 0.20% | 0.61% | 1.17% | 2.79%
Table 5-13. Share of the application groups in the uplink traffic for different technologies and networks (only web browsing, P2P file sharing and multimedia streaming traffic are considered, measurements in the period from October 2007 to March 2008).
Total traffic
Application group | Swedish network No.1 FTTH | Swedish network No.1 DSL | Spanish network CMTS | Spanish network GGSN
Web Browsing | 2.38% | 13.28% | 7.42% | 50.02%
P2P File Sharing | 96.07% | 78.65% | 88.22% | 39.71%
Multimedia streaming | 1.55% | 8.07% | 4.36% | 10.27%
Table 5-14. Share of the application groups in the total traffic for different technologies and networks (only web browsing, P2P file sharing and multimedia streaming traffic are considered, measurements in the period from October 2007 to March 2008).

As far as the uplink traffic is concerned, in the fixed networks (FTTH, DSL and CMTS) P2P file sharing is responsible for about 97% or more of the traffic (96.7% to 99.4%) regardless of the technology. In the case of downlink traffic, in the fixed networks (FTTH, DSL and CMTS) P2P file sharing generates a large share of the traffic that depends on the technology (from 66% to 88%). Approximately 60% of the rest of the downlink traffic corresponds to web browsing and 40% to multimedia streaming.

Regarding the mobile network (GGSN), the amount of web browsing traffic is roughly five times higher than that of multimedia streaming. Compared to the fixed networks, the P2P file sharing traffic in the mobile network is lower in the uplink (77% of the traffic) and much lower in the downlink (27% of the traffic). In the downlink direction the mobile network traffic is mainly web browsing (61% of the traffic).

5.5.2 Applications with high user penetration xviii

User penetration is a very important factor when looking at network traffic. Applications with high user penetration often load themselves automatically when the computer is started and run in the background when the user is not actively using the computer.

An excerpt from a database measurement in the Swedish municipal network No. 1 between 2007-09-01 00:00 and 2007-10-01 00:00 is displayed in Table 5-15. During this time, the total number of households seen sending or receiving traffic was 2081. The Internet access technology used was FTTH. Not surprisingly, the HTTP protocol sits at the top of the list.
HTTP, as used by the World Wide Web, is widely adopted by Internet users. SSL also has a place near the top, probably because most e-business and web based e-mail sites use it for security. Some protocols, like SSL, are actually separated by the PacketLogic into SSL v2 and SSL v3. In Table 5-15 and Table 5-16, they are merged into one. BitTorrent, being the most popular P2P file sharing protocol in Sweden, can also be found near the top. The e-mail protocol POP3 is found rather far down on the list. Reasons for this include that web based e-mail is increasing in popularity and that a newer protocol called IMAP is often used instead of POP3.

The HTTP protocol is further up on the list than the DNS protocol in the FTTH measurement. Users of HTTP and the World Wide Web mainly use domain names instead of IP addresses. Translation from a domain name to an IP address involves the DNS protocol. The explanation for the rather low usage of DNS compared with that of HTTP is that in the FTTH network, users have the possibility to choose their ISP. Different ISPs use different setups for their DNS servers. Since some ISPs' DNS servers are located on the user side of the PacketLogic measurement point, not all DNS traffic passes the measurement equipment, and this traffic thus does not contribute to the measurement.

In Table 5-16, which shows the top 13 used applications or protocols for the DSL customers in the Swedish municipal network No. 1, HTTP and SSL are as widely used as in Table 5-15. In comparison with Table 5-15, the signature of Windows update had been added to the signature database by Procera Networks, and we find it at 95%, which gives a good hint about the usage of Microsoft Windows in this rather small population. The total number of hosts active during the measurement interval was 104.
The measurements for the results in Table 5-16 were done between 2008-02-02 00:00 and 2008-02-20 00:00.

Number of active households | Percent | Application or protocol
2068 | 99.3% | HTTP
1975 | 94.9% | SSL
1911 | 91.8% | ICMP
1850 | 88.9% | HTTP media stream
1795 | 86.3% | BitTorrent
1794 | 86.2% | NTP
1769 | 85.0% | DNS
1768 | 85.0% | SOAP over HTTP
1630 | 78.3% | Ares
1593 | 76.6% | eDonkey
1571 | 75.5% | MSN messenger
1287 | 61.8% | RTP
1273 | 61.2% | Napster
1239 | 60.0% | RTSP media stream
805 | 38.7% | Skype
752 | 36.1% | POP3
Table 5-15 User penetration results from the traffic database. The measurements are from the Swedish network No. 1 between 2007-09-01 00:00 and 2007-10-01 00:00. The total number of households was 2081 during the time and the Internet access technology was FTTH.

Number of active households | Percent | Application or protocol
104 | 100.0% | HTTP
104 | 100.0% | DNS
102 | 98.1% | SSL
99 | 95.1% | Windows update
94 | 90.4% | HTTP media stream
85 | 81.7% | NTP
83 | 79.8% | ICMP
73 | 70.2% | BitTorrent
69 | 66.3% | RTP
63 | 60.6% | MSN messenger
58 | 55.8% | RTSP media stream
54 | 51.9% | POP3
51 | 49.0% | Microsoft Online Crash Analysis
Table 5-16 User penetration results from the traffic database. The measurements are from the Swedish network No. 1 between 2008-02-02 00:00 and 2008-02-20 00:00. The total population was 104 and the Internet access technology was DSL.

5.5.3 Traffic volume distribution xviii

Breaking the traffic down by application and protocol is an important step that makes traffic analysis easier. Both capacity planning for new access networks and QoS configuration are examples of tasks that can benefit from volume distribution analysis. There are several ways to display volume measurements, e.g. pie charts, tables and bar charts. We have chosen to use bar charts since a lot of information can be extracted from them, mostly regarding inbound and outbound traffic volumes.
Figure 5-21 and Figure 5-22 show results from measurements in the Swedish municipal network No. 1. In Figure 5-21, the names BT trans and BT enc refer to BitTorrent transfer and BitTorrent encrypted transfer, respectively. HTTP ms stands for HTTP media stream. In Figure 5-22, DC trans means Direct Connect transfer, which is a P2P file sharing application. Looking at Figure 5-21 and Figure 5-22, the protocol that clearly dominates the bandwidth is, unsurprisingly, BitTorrent. Many research articles in the area of traffic measurements report P2P file sharing as the most bandwidth intense category. In Sweden, unlike in some other countries, BitTorrent is the near-standard way of file sharing. For example, TRAMMS partners in Spain report that eDonkey and Direct Connect are the most popular file sharing applications, as shown in TRAMMS D3.1. From Figure 5-21, compared with other traffic measurements, it can be seen that HTTP media stream is becoming rather bandwidth intense. YouTube and similar video sites are probably the explanation, as they are attracting more viewers.

Figure 5-21 The graph shows traffic divided into application and protocol categories based on volume. Technology is DSL. Measurement from the Swedish network No. 1 between 2008-02-02 00:00 and 2008-02-20 00:00

Figure 5-22 The graph shows traffic divided into application and protocol categories based on volume. Technology is FTTH. Measurement from the Swedish network No. 1 between 2007-09-01 00:00 and 2007-10-01 00:00.

6 APPLICATION CHARACTERISTICS

6.1 Web Video on Demand

In this subsection we investigate the traffic of two popular web based video sharing systems, YouTube and Metacafe xix. Compared to D3.1, our analysis was extended with a content popularity analysis presented in Section 6.1.1.
YouTube, which is described in a number of publications xx,xxi,xxii, is a video sharing website where users can upload, view and share video clips. YouTube was created in February 2005 by three former PayPal employees. The service uses Adobe Flash technology to display a wide variety of user-generated video content, including movie clips, TV clips and music videos, as well as amateur content such as video-blogging and short original videos. In October 2006, Google Inc. acquired the company for US$1.65 billion in Google stock. Unregistered users can watch most videos on the site, while registered users are permitted to upload an unlimited number of videos. Some videos are available only to users of age 18 or older.

Few statistics are publicly available regarding the number of videos on YouTube. However, in July 2006, the company revealed that more than 100 million videos were being watched every day, and 2.5 billion videos were watched in June 2006. 50,000 videos were being added per day in May 2006, and this increased to 65,000 by July. In January 2008 alone, nearly 79 million users watched over 3 billion videos on YouTube.

YouTube's video playback technology is based on Flash Player 9 (Adobe, formerly Macromedia) and uses the Sorenson Spark H.263 video codec. YouTube files contain an MP3 audio stream. By default, it is encoded in mono at a bit rate of 64 kbps sampled at 22 kHz, giving an audio bandwidth of around 10 kHz. The default bit rate delivers acceptable but not hi-fi audio quality.
Figure 6-1: Comparison of YouTube and Metacafe web-based video sharing systems (Spanish Network measurement (GGSN+CMTS) from 2008-02-19 to 2008-02-29)

Figure 6-2: Comparison of YouTube and Metacafe web-based video sharing systems (Spanish Network measurement (GGSN+CMTS) from 2008-03-07 to 2008-03-30)

Figure 6-3: Comparison of YouTube and Metacafe web-based video sharing systems (Spanish Network measurement (GGSN+CMTS) from 2008-02-19 to 2008-02-29)

Figure 6-4: Comparison of YouTube and Metacafe web-based video sharing systems (Spanish Network measurement (GGSN+CMTS) from 2008-03-07 to 2008-03-30)

Figure 6-1 shows the total traffic generated by the YouTube and Metacafe websites in both the downlink and the uplink direction. YouTube generates several orders of magnitude more traffic than Metacafe, which indicates that it is far more popular among Spanish users. Figure 6-2 shows the result of a second, 24-day measurement. Although this measurement was longer, the traffic ratios are very similar. Interestingly, the daily fluctuation of the video sharing traffic changes quite unpredictably according to Figure 6-4. It shows no clear weekly periodicity.

In fact, YouTube is more popular than not only Metacafe but any other website, generating almost 600 GB of downlink traffic in a period of 11 days, which is about 14.8% of the total downlink web traffic and 2.0% of the total downlink traffic. The traffic seems to be somewhat smaller on weekdays and larger on the weekend (the 23rd and 24th of February appear as a peak in the curve). The calculated downlink per uplink traffic ratio is 43.14 and 47.31 for YouTube and Metacafe, respectively.

6.1.1 YouTube content popularity analysis

The aim of this analysis is to investigate when users are viewing YouTube videos and what the most popular contents are.
This way we can draw conclusions about user activity, user behavior, traffic intensity and content popularity. Naturally, we are not focusing on the content of the videos themselves, but want to determine certain properties (e.g., intensity, popularity distribution) of the collection of videos. To achieve this, we set up a special firewall rule in PacketLogic, which filtered out all HTTP GET requests containing the following query string: “http://www.youtube.com/watch?v=”. The equals sign is followed by the 11-character YouTube content ID, which is the subject of the investigation. PacketLogic logged and dumped all IP packets containing the given pattern. By processing the trace, it is possible to extract the content IDs and the times when the videos were viewed. The latter are determined by the packet arrival times. The output of the packet dump processing is a text file containing only the content IDs and times; this text file is later loaded into a database system for further analysis. Figure 6-5 shows the number of viewed videos per hour throughout the 16-day measurement, which is an estimate of the user activity (and of the traffic intensity as well). The user activity seems more intense on weekdays and lower on the weekend (confirmed by Figure 6-6 as well). In Figure 6-6 the final day of the measurement is cut so that it contains samples of exactly two weeks; this way no distortion is introduced in the sampling. We applied the same technique when we calculated the daily distribution of the access times (see Figure 6-7 below).
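The extraction step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-request-per-line log layout ("<ISO timestamp> <request URL>"), the function names and the sample data are hypothetical and do not reproduce the PacketLogic dump format; the only facts taken from the text are the query string and the 11-character content ID.

```python
import re
from collections import Counter
from datetime import datetime

# The 11-character content ID follows "watch?v=" in the request URL.
# Assumption: IDs use the URL-safe alphabet A-Z, a-z, 0-9, "-" and "_".
CONTENT_ID = re.compile(r"youtube\.com/watch\?v=([A-Za-z0-9_-]{11})")


def parse_views(lines):
    """Yield (timestamp, content_id) for every line holding a watch request.

    Each line is assumed to look like "<ISO timestamp> <request URL>";
    this layout is hypothetical, not the PacketLogic dump format.
    """
    for line in lines:
        stamp, _, url = line.strip().partition(" ")
        match = CONTENT_ID.search(url)
        if match:
            yield datetime.fromisoformat(stamp), match.group(1)


def views_per_hour(views):
    """Count viewed videos per (date, hour) bucket."""
    return Counter((ts.date(), ts.hour) for ts, _ in views)


# Made-up sample requests; the third one is not a watch request.
sample = [
    "2008-11-17T16:31:05 http://www.youtube.com/watch?v=abcdefghijk",
    "2008-11-17T16:45:10 http://www.youtube.com/watch?v=ABCDEFGHIJK&feature=related",
    "2008-11-17T17:02:00 http://www.youtube.com/results?search_query=test",
    "2008-11-17T17:20:30 http://www.youtube.com/watch?v=abcdefghijk",
]
views = list(parse_views(sample))
hourly = views_per_hour(views)
```

The hourly counter corresponds to the kind of series plotted in Figure 6-5; bucketing on the hour alone (without the date) would give a daily profile instead.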
Figure 6-5: Number of viewed YouTube videos per hour (Swedish network measurement (DSL) from 2008-11-17 to 2008-12-02)
(Axes: number of YouTube videos viewed per hour versus time, from Mon, 17 Nov 2008 16:31 to Tue, 2 Dec 2008 09:56)
Figure 6-6: Weekly distribution of viewed YouTube videos per day (Swedish network measurement (DSL) from 2008-11-17 16:31 to 2008-12-01 16:34)
Figure 6-7 shows the daily distribution of the viewed videos; user activity is clearly higher in the afternoon and evening hours. The busy hours start at around 4 PM, which is the typical time when people return home from work. The peak hours can be observed around 6-7 PM. The measured population is known to consist mostly of home users, so this kind of behavior was expected. Statistics of YouTube users and traffic intensity may therefore be different in another environment, even though employees in a business environment may also watch YouTube videos during working hours.
Figure 6-7: Daily distribution of viewed YouTube videos per hour (Swedish network measurement (DSL) from 2008-11-17 to 2008-12-02)
(Axes: number of YouTube videos viewed per 15 minutes versus time of the day)
Finally, Figure 6-8 shows the popularity distribution of the YouTube videos. The videos were ranked according to the number of times they had been watched. Figure 6-8 (left) shows the popularity distribution on a linear-linear scale, while the right subfigure shows the same on a linear-logarithmic scale. They suggest an exponential-like decrease in popularity. Consequently, the popularity of the content is far from uniform: a limited number of videos are extremely popular, while the others are watched rarely.
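The ranking behind a popularity plot such as Figure 6-8 can be sketched as follows. The sketch assumes that the content IDs of all logged views are available as a plain list; the function names and the sample data are made up for illustration and are not taken from the measurement.

```python
from collections import Counter


def rank_by_views(content_ids):
    """Return [(content_id, views), ...] sorted from most to least viewed."""
    return Counter(content_ids).most_common()


def top_share(ranking, k):
    """Fraction of all views that falls on the k most popular videos."""
    total = sum(views for _, views in ranking)
    return sum(views for _, views in ranking[:k]) / total


# Made-up data: four videos with 6, 3, 2 and 1 views.
ids = ["a"] * 6 + ["b"] * 3 + ["c"] * 2 + ["d"]
ranking = rank_by_views(ids)
```

Plotting the view counts against the rank, once with a linear and once with a logarithmic rank axis, gives the two views described in the text; `top_share` is one simple way to quantify how uneven the popularity is.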
Figure 6-8: Ranking of YouTube videos according to popularity on linear-linear scale (left) and logarithmic-linear scale (right) (Swedish network measurement (DSL) from 2008-11-17 to 2008-12-02)
(Axes: number of views versus YouTube videos (ranked))
6.2 Video streaming
PacketLogic defines precisely which applications belong to the “Streaming Media” category; we decided to use the same classification. According to PacketLogic the “Streaming Media” category contains the following applications and subcategories (organized in a hierarchy):
• Audio
o Last.fm client
o social.fm
• Peer-to-Peer
o MySee
o P2P-Radio
o PeerCast
o RawFlow
o SopCast
o TVUPlayer
o TvAnts tcp
o TvAnts udp
• Video
o Abacast
o EBS lecture
o HTTP RealPlayer stream
o HTTP media stream
o Joost
o Live Delivery Network
o LocationFree player
o MMS
o Miro
o Octoshape
o Octoshape
o Octoshape discovery
o PPLive
o PPStream
o RTCP
o RTMP
o RTMPT
o RTP
o RTSP
o RTSP media stream
o Radegast
o STTV
o Slingbox media stream
o SpotLife
o StreamerOne
• Toys
o Chumby
o Nabaztag
In Deliverable 3.1 we investigated the characteristics of video streaming traffic in different access networks (CMTS and GGSN measurement points). The general observation was (as Figure 6-9 and Figure 6-10 suggest) that there is a significant difference between the two access types in terms of data volume. Figure 6-9 and Figure 6-10 show the total amount of video-streaming traffic for the downlink and uplink directions. The downlink traffic exceeds the uplink traffic significantly. After analyzing several measurements made in the same CMTS network at different times, we found that the daily amounts of data do not differ significantly; they vary between 120 and 180 GB in the downlink direction.
The amount of video-streaming traffic generated in mobile networks (at the GGSN measurement point) is a fraction of the traffic measured at the CMTS measurement point.
Figure 6-9: Fluctuation of video-streaming video traffic (Spanish Network measurement (CMTS) from 2008-03-10 to 2008-03-23)
(Axes: inbound and outbound traffic in GB versus time in days)
Figure 6-10: Fluctuation of video-streaming video traffic (Spanish Network measurement (GGSN) from 2008-02-19 to 2008-02-29)
(Axes: inbound and outbound traffic in GB versus time in days)
6.3 Web traffic analysis
In this subsection, the distribution of web traffic among different websites and domains is investigated. Large web services often distribute the traffic load between numerous web servers. These servers usually belong to the same domain, but since the request URLs of the servers are different, it is not straightforward to calculate the aggregated traffic. We calculated the aggregated traffic by grouping the request URLs based on the domain substring (like “facebook”) and the top domain part (like “com”). The host part and other sub-domains are omitted. In this way we could determine the quantity of interest, namely the amount of traffic generated by each service. Table 6-1 shows the top 30 websites generating the largest amount of inbound traffic in the 24-day measurement. YouTube is clearly the number one web service, generating around 1.3 TB of traffic during the 24 days. The results tell us that web-based file sharing services (e.g., megaupload.com, rapidshare.com) also generate significant traffic.
Such services work in the following way: the user uploads the desired file to a website, and the system sends out e-mails to those with whom the user wants to share the content. Adult content sites, primarily video sharing sites, are apparently also very popular. The Google search engine is in 6th place with almost 164 GB of data in 24 days, which is surprising, since this is a service providing mostly textual content. However, the considerable traffic volume can easily be explained by the extreme popularity of the search engine. Social networking websites also seem very popular among users. Tuenti xxiii is a Madrid-based social networking website that has been referred to as the “Spanish Facebook”. Tuenti is targeted at the Spanish audience. The site is currently accessible only to those who have been invited. Myspace.com is the most popular international social networking site, as suggested by the fact that myspacecdn.com (the location where MySpace stores photos) is at position 22. As expected, some local websites (in this case Spanish sites, since the investigated measurement was made in Spain) are also in the top list. Besides the local social networking site, other popular sites include elcorreodigital.com and elmundo.es, which provide news and media content. The official websites of the software giants Microsoft and Apple also generated large traffic volumes, mostly by offering software downloads to customers. Microsoft’s dedicated software update site, which provides the updates for all Windows-based computers, generated the 3rd largest traffic volume in the network. Microsoft’s search engine Live.com (which offers various other services as well) has position 19 in the top list. Microsoft’s information portal msn.com is at position 29. The traffic of the Panda Security software company’s website (pandasoftware.com) likely stems from antivirus update requests.
Rank  Site name            Traffic inbound (MB)
1     youtube.com          1299373.5
2     megaupload.com       820211.2
3     windowsupdate.com    222035.8
4     megarotic.com        210257.2
5     rapidshare.com       187761.1
6     google.com           163999.3
7     llnwd.net            142528.0
8     redtube.com          107147.6
9     microsoft.com        102370.9
10    playstation.net      100670.2
11    youporn.com          89200.9
12    dailymotion.com      85900.5
13    tuenti.com           69744.5
14    apple.com            66917.4
15    veoh.com             66325.6
16    gigasize.com         60016.6
17    elcorreodigital.com  56595.0
18    edgesuite.net        53447.4
19    live.com             48652.6
20    pandasoftware.com    47963.2
21    elmundo.es           45829.0
22    myspacecdn.com       45188.3
23    xvideos.com          41006.6
24    porkolt.com          40401.9
25    pornhub.com          37253.4
26    photobucket.com      37151.5
27    pajilleros.com       35581.3
28    imageshack.us        33271.2
29    msn.com              31696.8
30    ytimg.com            31530.4
Table 6-1, Top 30 web domains (web services) ranked in the order of generated inbound traffic (Spanish Network measurement from 2008-03-07 to 2008-03-30)
6.3.1 Top domain analysis
This section summarizes the results and conclusions of the domain analysis presented in Deliverable 3.1. Figure 6-11, as an example from the 1st Spanish network measurement, shows the top 5 domains based on the generated downlink traffic. (The uplink traffic is negligible in the case of HTTP traffic: users download about 20 times more web traffic than they upload.) The largest part of the HTTP traffic is provided by the .com domain and other international domains (.net, .org). The Spanish domain (.es) is in the top 5 as well, since the measurement was carried out in Spain. The percentage of unknown HTTP traffic (containing IP addresses that could not be resolved to domain names) is relatively high. This traffic may have been generated not by websites but by other applications using the HTTP protocol.
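The grouping of request hosts by domain and top domain described in Section 6.3 can be sketched as below. This is a simplified illustration, not the procedure used in the measurement: it assumes that the last two labels of the host name identify the service, which is wrong for suffixes such as "co.uk" and for unresolved IP addresses, and the host names and volumes are hypothetical.

```python
from collections import defaultdict


def service_key(host):
    """Reduce a host name to "domain.tld", dropping host and sub-domain labels.

    Assumption: the last two labels identify the service. This naive rule is
    wrong for suffixes such as "co.uk"; a public-suffix list would be needed.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def aggregate(records):
    """Sum inbound megabytes per service and per top domain."""
    per_service = defaultdict(float)
    per_tld = defaultdict(float)
    for host, megabytes in records:
        key = service_key(host)
        per_service[key] += megabytes
        per_tld[key.rsplit(".", 1)[-1]] += megabytes
    return dict(per_service), dict(per_tld)


# Hypothetical hosts and volumes (MB), for illustration only.
records = [
    ("www.youtube.com", 120.0),
    ("v7.cache.youtube.com", 900.5),
    ("static.ak.facebook.com", 40.0),
    ("www.elmundo.es", 15.0),
]
per_service, per_tld = aggregate(records)
```

Sorting `per_service` by value yields a ranking of the kind shown in Table 6-1, while `per_tld` corresponds to the top domain view of Figure 6-11.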
Figure 6-11: Top 5 domains ranked by generated downlink web traffic (Spanish Network measurement from 2008-02-19 to 2008-02-29)
(Axes: total inbound traffic in GB per domain; domains shown: com, other, net, .es, org)
Figure 6-12 shows the subsequent domains ranked by the generated download traffic; it contains mainly the domains of European countries.
Figure 6-12: Subsequent domains ranked by generated downlink web traffic (Spanish Network measurement from 2008-02-19 to 2008-02-29)
Figure 6-13 shows the contribution of all domains to the total downlink web traffic. The most significant part of the traffic is related to international domains and the local (Spanish) domain. The downlink and uplink profiles are not significantly different.
Figure 6-13, Share of domains in the total generated downlink traffic (Spanish Network measurement from 2008-03-07 to 2008-03-30)
(Pie chart: distribution of total inbound traffic among domains)
6.4 P2P file sharing
There are several popular file sharing applications generating significant traffic volumes (often including illegal content) on the Internet. In this subsection the aggregated file sharing traffic and the traffic of the most popular applications (e.g., eDonkey, BitTorrent and Gnutella) are investigated in detail in several types of access networks. This section summarizes the analysis results from Deliverable 3.1.
Gnutella xxiv is a file sharing network supported by several clients with varying capabilities. Gnutella is a fully distributed system based on P2P technology. The network is never completely stable, since peers are constantly joining and leaving the system. In the original version of Gnutella all clients were regarded as equal, and search requests were performed by flooding the search message all over the overlay network. These features made searching and data downloading quite unreliable and ineffective and raised scalability problems. This observation inspired the development of distributed hash tables (which are much more scalable but support only exact-match rather than keyword search) and the introduction of ultra-peers and leaf-nodes.
The eDonkey network is a decentralized, server-based, peer-to-peer file sharing network used primarily to exchange audio files, video files and computer software. Like most file sharing networks, it is decentralized: files are not stored on a central server but are exchanged directly between users based on the peer-to-peer principle. The most recent eDonkey clients support both server-based and DHT-based searching.
Direct Connect is a peer-to-peer file sharing protocol. Direct Connect (DC) clients connect to a central hub and can download files directly from one user to another. DC hubs are central servers to which clients connect, thus the networks are not as decentralized as Gnutella or FastTrack. Hubs provide information about the clients, as well as file searching and chat capabilities. File transfers are done directly between clients, in true peer-to-peer fashion.
BitTorrent is a P2P file sharing communications protocol. BitTorrent is a method of distributing large amounts of data widely without the original distributor incurring the entire costs of hardware, hosting, and bandwidth resources.
Instead, when data is distributed using the BitTorrent protocol, each recipient supplies pieces of the data to newer recipients, reducing the cost and burden on any given individual source, providing redundancy against system problems, and reducing dependence on the original distributor. Usage of the protocol accounts for significant traffic on the Internet. There are numerous compatible BitTorrent clients, written in a variety of programming languages and running on a variety of computing platforms.
FastTrack is a P2P protocol used by the Kazaa file sharing program (and its variants, Grokster and iMesh). In 2003, FastTrack was the most popular file sharing network, being mainly used for the exchange of MP3 music files. Popular features of FastTrack are the ability to resume interrupted downloads and to download segments of one file simultaneously from multiple peers. The search for a certain keyword is also optimal in the sense that, if the search is not stopped or timed out, FastTrack finds a source if one exists. The network had approximately 2.4 million concurrent users at its peak in 2003; afterwards the number of users decreased significantly.
We analyzed several measurements originating from different access networks and different technologies (e.g., ADSL network, FTTH network, mobile network). The figures include the daily and weekly fluctuation of the traffic and of specific applications. The figures included in this section are a small selection of those in Deliverable 3.1.
Figure 6-14: Fluctuation of P2P file sharing traffic (Spanish Network measurement (CMTS) from 2008-03-10 to 2008-03-23)
(Axes: inbound and outbound traffic in GB versus time in days)
Figure 6-14 and Figure 6-15 show the fluctuation of file sharing traffic in CMTS and GGSN networks.
Naturally, significantly less traffic is generated in the mobile network, for the reasons mentioned earlier: the high cost of data transfer and the lack of bandwidth.
Figure 6-15: Fluctuation of P2P file sharing traffic (Spanish Network measurement (GGSN) from 2008-02-19 to 2008-02-29)
(Axes: inbound and outbound traffic in GB versus time in days)
Figure 6-16: Average daily fluctuation of P2P file sharing traffic (Swedish municipal network No1 measurement (FTTH) from 2007-08-20 to 2007-10-21)
(Axes: inbound, outbound and total traffic in GB versus time in hours)
Figure 6-16 shows a huge difference between upload and download traffic in the FTTH network: the upload/download ratio is about 3. In the case of file sharing, the volume of uploaded data is typically larger than the volume of downloaded data. This is especially true for peers with high uplink bandwidth. In a typical scenario these peers serve other peers (who generally have a relatively fast downlink but a slow uplink).
A clear weekly profile seems to appear in Figure 6-17. The traffic load is higher on the weekend and lower on weekdays. The uplink traffic seems to show an increasing trend throughout the measurement, though we cannot give a sound explanation for this phenomenon.
Figure 6-17: Fluctuation of P2P file sharing traffic (Swedish municipal network No1 measurement (FTTH) from 2007-08-20 to 2007-10-21)
(Axes: inbound and outbound file sharing traffic volume per day in GB versus time in days)
We can also draw some general (and partly obvious) conclusions based on the measurements:
• The higher the access speed, the more traffic is generated. We have the strong impression that if more bandwidth is offered to the user, it will be used mostly for file sharing, or it will not be utilized at all.
• The higher the access speed, the larger the difference between upload and download traffic in favor of upload. This behavior can be explained by the general working of P2P file sharing networks. Peers with high uplink bandwidth are rare. Therefore the exceptional peers with a high uplink will be well utilized, because all other peers will download data from them.
We also analyzed some specific, most voluminous P2P file sharing applications in terms of number of users, transferred data in the uplink and downlink directions, daily and weekly profiles, and ranking of users according to traffic volume. We investigated the following applications, which turned out to generate the largest traffic volume: eDonkey, BitTorrent and Gnutella. According to the results, the most popular file sharing application in the measurement was eDonkey, generating 32.48 TB of total traffic, 2.95 TB daily (1.39 TB downlink, 1.56 TB uplink). BitTorrent finished in second place with 11.03 TB of total traffic, 1.00 TB daily (0.52 TB downlink, 0.48 TB uplink). In the case of BitTorrent the upload/download ratio, contrary to eDonkey, was below 1.0. Traffic volumes and shares of the total traffic were similar in the measurements, since only a short time elapsed between them. The 3rd most voluminous application was Gnutella, but its traffic was negligible compared to the others.
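The daily volumes and upload/download ratios quoted above follow directly from the totals of the 11-day measurement listed in Table 6-2. The short sketch below redoes that arithmetic; the function name is illustrative, and it assumes 1 TB = 1000 GB.

```python
DAYS = 11  # 2008-02-19 to 2008-02-29 inclusive

# (downlink GB, uplink GB) as listed in Table 6-2
TRAFFIC_GB = {
    "eDonkey": (15336, 17144),
    "BitTorrent": (5755, 5277),
    "Gnutella": (36, 69),
}


def daily_profile(down_gb, up_gb, days=DAYS):
    """Return daily downlink/uplink volume in TB and the upload/download ratio.

    Assumption: 1 TB = 1000 GB.
    """
    return {
        "down_tb_per_day": down_gb / days / 1000,
        "up_tb_per_day": up_gb / days / 1000,
        "up_down_ratio": up_gb / down_gb,
    }


profiles = {app: daily_profile(*volumes) for app, volumes in TRAFFIC_GB.items()}
for app, p in profiles.items():
    print(f"{app}: {p['down_tb_per_day']:.2f} TB/day down, "
          f"{p['up_tb_per_day']:.2f} TB/day up, ratio {p['up_down_ratio']:.2f}")
```

With the Table 6-2 figures this gives about 1.39 and 1.56 TB per day for eDonkey (ratio about 1.12) and about 0.52 and 0.48 TB per day for BitTorrent (ratio about 0.92).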
When comparing the number of active users (e.g., Figure 6-18), eDonkey is seen to have more than twice as many users as BitTorrent, while the number of Gnutella users is rather small. In general the number of users seems to be steady during the first measurement interval. We could see similar numbers in all measurements. No clear weekly profile can be recognized in the figures. The most important facts are summarized in Table 6-2.
Figure 6-18: Fluctuation of the number of BitTorrent and eDonkey users (Spanish Network measurement (CMTS) from 2008-02-19 to 2008-02-29)
(Axes: number of users versus time in days)
We ranked BitTorrent and eDonkey users according to the generated traffic; the ranking of BitTorrent users (Figure 6-19) showed a clear exponentially decreasing tail distribution, while the ranking of eDonkey users did not follow a clear trend. It was also interesting that the penetration of both applications was surprisingly high: about 1800 users used BitTorrent (31%), while almost 3300 users used eDonkey (56%) at least once during the measurement.
Figure 6-19: Ranking of BitTorrent users (Spanish Network measurement (CMTS) from 2008-02-19 to 2008-02-29)
(Axes: traffic in GB on a logarithmic scale versus rank of users)
Table 6-2, Table 6-3 and Table 6-4 show the basic traffic profiles of the file sharing applications in the two Spanish network measurements. BitTorrent and eDonkey are clearly the most popular applications, generating significant traffic volume. The daily traffic of an average BitTorrent or eDonkey user is also considerable.
Direct Connect and the declining Kazaa generate only minor traffic in the Spanish measurement. Gnutella seems to have few users in Spain. However, the average daily traffic generated per user is not negligible. This may be due to the fact that Gnutella is popular elsewhere, outside Spain.
Application     Traffic downlink (GB)  Traffic uplink (GB)  Total traffic (GB)  Total number of users
EDonkey         15336                  17144                32481               3246
BitTorrent      5755                   5277                 11032               1759
Kazaa           1                      1                    1                   119
Gnutella        36                     69                   106                 70
Direct Connect  0                      0                    0                   2
Table 6-2, Traffic comparison of file sharing applications (Spanish Network measurement (CMTS) from 2008-02-19 to 2008-02-29)
Application     Traffic downlink (GB)  Traffic uplink (GB)  Total traffic (GB)  Total number of users
EDonkey         18752                  19487                38240               3289
BitTorrent      6262                   5783                 12045               1814
Kazaa           2                      1                    2                   131
Gnutella        43                     74                   117                 95
Direct Connect  0                      0                    0                   2
Table 6-3, Traffic comparison of file sharing applications (Spanish Network measurement (CMTS) from 2008-03-10 to 2008-03-23)
                Measurement 2008-02-19 to 2008-02-29             Measurement 2008-03-10 to 2008-03-23
Application     Users/day  Traffic/user/day (downlink, MB)       Users/day  Traffic/user/day (downlink, MB)
Edonkey         1559       893                                   1404       951
BitTorrent      643        813                                   551        809
Gnutella        18         185                                   17         177
Kazaa           15         4                                     15         7
Direct Connect  0          0                                     0          53
Table 6-4, Comparison of file-sharing application in terms of average number of daily users and daily traffic
6.5 P2P telephony and VoIP
Skype is currently the most popular P2P VoIP network; the number of registered users went beyond 100 million in 2006. Users can initiate and receive voice and video calls to and from other Skype users, or even PSTN users using SkypeOut/SkypeIn. Moreover, instant messaging (chat) and file transfer are also supported within the Skype infrastructure.
The overlay network contains a huge number of peers (or ordinary nodes), some of which are promoted to super nodes. Super nodes are the switching elements in the overlay network and are (among other things) responsible for maintaining the Global Index distributed directory, which allows users to find each other. There are also some dedicated components of the network (e.g., login servers, update servers and buddy-list servers). These central entities are operated by Skype. All communication between the Skype network entities is strongly encrypted. More detailed information about Skype network entities, operation and identification techniques can be found in papers xxv,xxvi.
6.5.1 Skype traffic
Several types of Skype traffic are recognized by the PacketLogic traffic analyzer (Figure 6-20). Among them, the P2P component is the dominant part. TCP is mainly used for file transfer only, because Skype tries to avoid the usage of TCP for voice and video transfer.
Figure 6-20: Components of Skype (downlink) traffic (Spanish Network measurement from 2008-02-19 to 2008-02-29)
(Axes: inbound traffic in GB per Skype traffic component; components shown: discovery, login, version check, Hub2Hub, P2P, SSL, TCP, UDP)
Comparing Figure 6-21 and Figure 6-22, it can be observed that the amount of Skype traffic is significantly lower in the mobile network (GGSN measurement point) than in the fixed cable TV network (CMTS measurement point). This observation is interesting, since Skype is well applicable and even cost effective in mobile networks for two main reasons. First, the bandwidth provided by (high-speed) mobile networks (3G and HSDPA) is sufficient for Skype. Second, the cost of Skype usage is calculated according to the transmitted amount of data, which is charged on a per megabyte basis or even on a flat rate basis (depending on the tariff package). Even if it is charged on a per megabyte basis, it can still be cheaper than traditional calls (charged on a rather expensive per minute basis).
According to Figure 6-22, however, the amount of Skype traffic is still negligible. Weekly periodicity can be seen neither in Figure 6-21 nor in Figure 6-22. The upload/download ratio fluctuates around 1.0; in theory it should be close to one, considering that a single Skype call usually generates symmetric traffic.
Figure 6-21: Fluctuation of total Skype traffic (Spanish GGSN Network measurement from 2008-01-01 to 2008-01-31)
(Axes: inbound and outbound Skype traffic in GB versus time in days)
Skype traffic appears to be more significant in fixed networks (Figure 6-22). In the first, 11-day measurement, Skype generates a daily average traffic of 4.02 GB in the downlink and 3.55 GB in the uplink direction. Assuming that the total traffic consists of voice calls only, this amount of data would correspond to about 200 hours of speech when calculating with the old Skype voice codec, or about 100 hours when calculating with the latest codec. The equivalent average Minutes of Usage (MOU) per user would be 1.76 and 0.88, respectively.
Figure 6-22: Fluctuation of Skype traffic (Spanish CMTS Network measurement from 2008-02-19 to 2008-02-29)
(Axes: inbound and outbound Skype traffic in GB versus time in days)
The number of Skype users (Figure 6-23) was quite stable during the whole measurement period and varied around 290 per day, which means a penetration of 5%. According to the measurement, about 2200 users generated Skype traffic throughout the 11-day measurement, which means a total penetration of 38%.
However, for many users PacketLogic detected only minor daily traffic, which is rounded to zero but still included in the log, making the penetration higher (in contrast, zero activity is not included in the traffic log at all).
Figure 6-23: Fluctuation of Skype users (Spanish Network measurement from 2008-02-19 to 2008-02-29)
(Axes: number of users versus time in days)
Figure 6-24 shows the ranking of Skype users according to the generated traffic. The straight curve over the whole range suggests a clear exponential tail of the generated traffic volume distribution.
Figure 6-24: Rank of Skype users according to the generated (downlink) traffic (Spanish Network measurement from 2008-02-19 to 2008-02-29)
(Axes: traffic in GB on a logarithmic scale versus rank of users)
The daily traffic of an average Skype user and the average number of Skype users per day are shown in Table 6-5, along with the same statistics for other multimedia applications for comparison. Figure 6-25 shows the traffic pattern of active and inactive users of Skype in the Swedish municipal network No. 1. The measurements were made with the PacketLogic appliance. PacketLogic differentiates between users whose average usage during a 5-minute interval is below 1 kbps and users who use more than 1 kbps, so this limit was chosen to separate active from inactive users. That is, a user who has generated more than 1 kbps on average during a 5-minute interval is classified as active. Two graphs are shown in Figure 6-25: the upper line describes the number of users that have been seen running Skype in every measured 5-minute period, but not using more than 1 kbps of Skype bandwidth.
The lower line shows the number of users that use Skype and generate more than 1 kbps of Skype traffic during the measured 5 minutes. The measurement was conducted over 10 days. In the graph, we see that in the peak hour over 130 users were logged in to their Skype account, and 10 were active, out of the total of 3687 IP addresses investigated. This means that 8% of the logged-in Skype users use more than 1 kbps of Skype traffic during a 5-minute average. xviii
Figure 6-25 Average daily traffic pattern showing active and total number of users seen using Skype during a 10 day measurement in the FTTH part of the Swedish network No. 1.
6.5.2 MSN Messenger (Windows Live Messenger) traffic
Windows Live Messenger (formerly called MSN Messenger) is the instant messaging solution of Microsoft. MSN (unlike Skype) is a centralized system; it does not use P2P technology. In addition to instant messaging it offers other features, such as voice calls, video conferencing and file transfer. Several 3rd party clients have also been released for other platforms; however, they may not support all the features. According to Figure 6-26 the amount of traffic produced by MSN has the same order of magnitude as the traffic of Skype. MSN generates an average daily traffic of 3.51 GB downlink and 3.24 GB uplink.
Figure 6-26: Fluctuation of total MSN Messenger (Windows Live Messenger) traffic (Spanish Network measurement from 2008-02-19 to 2008-02-29)
(Axes: traffic in GB versus time in days)
Comparing Figure 6-23 and Figure 6-27, it can be seen that MSN has more than twice as many users as Skype. MSN has 732 users (12.6% of the total number of users) on an average day, with small variance.
About 3800 users generated MSN traffic throughout the 11-day measurement, which means a total penetration of 65.6%. This is unexpectedly high, even though MSN is considered to be probably the most popular online application. The daily number of MSN users was found to be similar in all measurements.

Figure 6-27: Fluctuation of the number of Windows Live Messenger users (Spanish Network measurement from 2008-02-19 to 2008-02-29) [plot: number of users vs. time (days)]

The ranking of Windows Live Messenger users in Figure 6-28 again shows an exponential decrease.

Figure 6-28: Ranking of Windows Live Messenger (MSN Messenger) users according to generated total traffic (Spanish Network measurement from 2008-01-01 to 2008-01-31) [plot "Ranking of Windows Live Messenger users": traffic (log GB) vs. rank of users]

The daily traffic of an average MSN user and the average daily number of MSN users are shown in Table 6-5, along with the same statistics of other multimedia applications for comparison.

                   Spanish Network measurement              Spanish Network measurement
                   from 2008-02-19 to 2008-02-29            from 2008-03-10 to 2008-03-23
Application        Users/day   Traffic/user/day             Users/day   Traffic/user/day
                               (downlink, MB)                           (downlink, MB)
Skype              290.10      13.8                         253.50      14.3
Yahoo Messenger    9.7         6.5                          8.8         6.3
MSN (Win. Live)    731.91      4.8                          567.71      5.2

Table 6-5: Comparison of multimedia applications in terms of average number of daily users and daily traffic

Yahoo Messenger is also listed in the table, though its traffic was negligible both in terms of volume and number of users.

Using the definition that an active MSN Messenger user is one that consumes more than 1 kbps of network traffic solely for the MSN Messenger application, we constructed an average daily traffic pattern graph.
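The active/inactive split behind such average daily pattern graphs can be illustrated with a minimal sketch. It is not the PacketLogic implementation: the record layout (user, day, 5-minute slot index, bytes of application traffic in that interval, one record per user and interval), the function names, and the reading of 1 kbps as 1000 bit/s are all assumptions made for illustration.

```python
from collections import defaultdict

INTERVAL_SECONDS = 300   # 5-minute measurement interval, as in the text
THRESHOLD_BPS = 1000     # 1 kbps, assuming 1 kbps = 1000 bit/s


def is_active(bytes_in_interval):
    """True if the average rate over one interval exceeds the threshold."""
    rate_bps = bytes_in_interval * 8 / INTERVAL_SECONDS
    return rate_bps > THRESHOLD_BPS


def daily_pattern(records):
    """Average number of seen and active users per 5-minute slot of the day.

    records: iterable of (user_id, day, slot, bytes_in_interval), where slot
    is the index of the 5-minute interval within the day (hypothetical layout,
    assuming one record per user and interval).
    Returns {slot: (avg_seen, avg_active)}, averaged over the days present.
    """
    seen = defaultdict(set)     # (day, slot) -> users with any application traffic
    active = defaultdict(set)   # (day, slot) -> users above the 1 kbps threshold
    days = set()
    for user, day, slot, nbytes in records:
        days.add(day)
        seen[(day, slot)].add(user)
        if is_active(nbytes):
            active[(day, slot)].add(user)

    n_days = len(days)
    pattern = {}
    for slot in sorted({s for _, s in seen}):
        total_seen = sum(len(seen.get((d, slot), ())) for d in days)
        total_active = sum(len(active.get((d, slot), ())) for d in days)
        pattern[slot] = (total_seen / n_days, total_active / n_days)
    return pattern
```

For a given slot, the share of active users is avg_active / avg_seen. With the peak-hour Skype figures quoted earlier (10 active out of over 130 logged-in users), this ratio comes to roughly 8%.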
The measurement in Figure 6-29 is from the Swedish municipal network No. 1. The measurement period was two weeks, and the values were averaged over each 5-minute period of the day. Two things are notable in the figure: few MSN Messenger users use more than 1 kbps of transferred data, and the average number of online users is quite high during the early hours of the day. Apparently, those computers were left on during the night, probably for file sharing. During the peak hour at 20:00, almost 500 users are logged in to their MSN Messenger account. However, no more than 10 of them are active, meaning that only about 2% of the MSN Messenger users use more than 1 kbps of MSN Messenger traffic as a 5-minute average. The measurement period was between 2007-09-01 00:00 and 2007-09-11 00:00 [xviii].

Figure 6-29: Average daily traffic pattern showing the number of users consuming more than and less than 1 kbps using the MSN application on the FTTH network. Measurements from the Swedish municipal network No. 1.

7 CONCLUSIONS/DISCUSSION

7.1 Description of Aggregate Traffic

According to the Spanish network measurement, heavy users seem to be active throughout the whole day. They dominate the most during the night hours, when light users leave the network. The average traffic rate per active user is stable during the daytime, but increases significantly during the night hours.

By comparing several daily profiles from different countries and technologies, it can be concluded that the shape of the daily traffic pattern depends on the subscriber type of the network (residential, enterprise, academic), and that there is a common daily traffic pattern for networks that have mainly residential users. Previous experiments suggest that the amount of downlink and uplink traffic depends on the access technology (CMTS, DSL or FTTH).
In the Swedish network we can see that all households generate a lot of file sharing traffic. The FTTH traffic is very asymmetric, with more uplink traffic than downlink traffic. The DSL households show a more balanced traffic pattern, and in the evening they have a more classical Internet traffic pattern, with mostly downlink traffic. Also, the FTTH households seem to use their broadband access mainly for file sharing applications such as BitTorrent, whereas the DSL households have less file sharing and thereby a larger share of HTTP traffic. The CMTS traffic in the commercial Spanish network is also downlink dominated, but not as strongly as the DSL traffic in the Swedish network. The network with the most pronounced downlink dominance is the wireless access network in the commercial Spanish network. In terms of access capacity, the faster the access links, the larger the uplink share of the total traffic.

As far as the Spanish network is concerned, the traffic volume studies have shown that the daily profile is quite different for the fixed and mobile networks. In the fixed network there is one peak (in the evening) and one valley (in the morning), whereas in the mobile network there are several peaks and valleys throughout the day. Comparing the daily profile of the Spanish fixed network with the other fixed access networks under study, we can see that the CMTS access network is quite similar to the DSL network in Sweden. In both cases the downlink traffic accounts for most of the total traffic; however, during the early hours of the day the amount of uplink traffic is larger than the downlink traffic.

7.2 Application usage

Regarding application usage, it should be noted that Peer-to-Peer applications are mainly responsible for the traffic volume in the Spanish fixed network. eDonkey is the main application in use, in contrast to the Swedish networks, where BitTorrent is clearly dominant.
The user activity statistics show a significant difference between HTTP (mostly web traffic) and file sharing traffic in the Spanish network. File sharing traffic is steady in time, though its penetration is lower than that of web surfing. Web traffic, on the other hand, varies more in time and has a higher total penetration. For file sharing traffic, the difference between the daily and the total penetration is the smallest, suggesting that file sharing applications are “always on”.

The comparative study of application usage in different networks and technologies reveals several interesting findings. As far as the uplink traffic is concerned, in the fixed networks (FTTH, DSL and CMTS) P2P file sharing is responsible for more than 97% of the considered traffic regardless of the technology. For the downlink traffic in the fixed networks, P2P file sharing generates a large share of the traffic, which depends on the technology (from 66% to 88%).

The application penetration analysis in the Swedish network shows that the HTTP protocol (used by the World Wide Web) is at the top of the list. BitTorrent comes second, as the most popular P2P file sharing protocol in Sweden. The results suggest that the IMAP mail access protocol is more popular than its older POP3 counterpart. The penetration of Windows Update (95%) gives a good hint about the usage of Microsoft Windows in this rather small population.

We investigated the distribution of web traffic among different websites and domains in the Spanish network. The results show that YouTube and other web-based file sharing services dominate the web traffic in terms of traffic volume. Adult content sites, the Google search engine, and social networking websites are also very popular. According to the domain statistics of web traffic (Spain), international domains (.com, .org, .net) carry the highest portion of web traffic.
In addition, the local domain (.es) also appeared in the top 5. YouTube, on its own, produced about 2% of the total traffic and 15% of web traffic. The YouTube content popularity analysis also exposed several interesting findings. We regarded the “number of viewed videos per hour” as a good estimate of the user activity and of the traffic intensity as well. The user activity seems more intense on weekdays and lower on the weekend. The user activity is also higher in the afternoon and evening hours. The rank curve indicates that the popularity of the content is uneven: a limited number of videos are extremely popular, while others are rarely watched.

VoIP and instant messaging applications also have significant penetration in the population (e.g., Skype has a daily penetration of 5%, while MSN has 13%). However, they produce much less traffic than the file sharing and video distribution applications. In the Spanish network we observed much less Skype traffic in the mobile network than in the fixed cable TV network. The Skype traffic analysis suggests that around 8% of the logged-on Skype users generate more than 1 kbps of traffic, which may indicate ongoing voice calls. MSN Messenger was found to be the most popular instant messaging application in the Spanish and Swedish networks. However, only 2% of the MSN Messenger users used more than 1 kbps of MSN Messenger traffic, which may show that MSN Messenger is used mainly for presence and chatting, which do not require high bandwidth.

7.3 Clustering of users

Based on the clustering of subscribers, a few general conclusions can be made. First of all, there is a small group of subscribers who generate a huge amount of (mainly P2P) traffic, while there are subscribers whose traffic demands are much more moderate and in whose traffic mix web browsing has a larger share, although they use P2P applications as well.
Another general conclusion is that the traffic of the heavy users seems to be constant over the repeated measurements, while the moderate subscribers seem to gradually increase their activity.

Some fairly natural conclusions can be drawn from the experiment in which a daily traffic limit was set up. The clustering process classified more users as “minimal (light) users” at the expense of the “medium users”, because the real minimal users were practically filtered out in advance. The percentage of heavy users did not change. The number of users meeting the criterion dropped sharply, the average traffic per user rose, and the standard deviation of the generated traffic per user increased significantly as well.

Regarding the light users, the higher the traffic limit, the further HTTP traffic moves down in the ranking of the top applications. P2P applications are more dominant as the traffic limit is increased. Some new, interesting applications showed up among the top 15 characteristic applications. Considering the heavy users, the list of top applications remained unchanged irrespective of the traffic limits.

The analysis of minimal users suggests that their traffic is distributed unevenly (just like the total traffic). Most minimal users browse the web (HTTP and HTTP secure) and read emails. Even light users use P2P applications. The analysis of popular applications (HTTP, P2P applications, FTP) shows that there are huge differences between users in the traffic generated by a given application, i.e., traffic is uneven at the aggregate level and also at the level of individual applications.

8 REFERENCES

i http://www.maxmind.com/app/geolitecountry
ii R. Braden, D. Clark, and S. Shenker. “Integrated Services in the Internet Architecture: An Overview”, IETF RFC 1633 (1994).
iii S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss.
“An Architecture for Differentiated Services”, IETF RFC 2475 (1998).
iv MRTG website: http://oss.oetiker.ch/mrtg/
v A. Odlyzko. “Data networks are lightly utilized, and will stay that way”, Review of Network Economics, vol. 2, no. 3, pp. 210-237, Sept. 2003.
vi C. Fraleigh, F. Tobagi, and C. Diot. “Provisioning IP Backbone Networks to Support Latency Sensitive Traffic”, Proc. of IEEE Infocom, San Francisco, USA, 2003.
vii C. Fraleigh. “Provisioning Internet Backbone Networks to Support Latency Sensitive Applications”, Ph.D. thesis, Stanford University, June 2002.
viii J. van den Berg, M. Mandjes, R. van de Meent, A. Pras, F. Roijers, and P. Venemans. “QoS-aware bandwidth provisioning for IP links”, Computer Networks 50, 631-647 (2006).
ix TRAMMS Deliverable “D3.1 – Traffic Characterization”, May 2008.
x Procera Networks home page, http://www.proceranetworks.com
xi http://www.cisco.com/go/netflow
xii http://en.wikipedia.org/wiki/Netflow
xiii http://www.wireshark.org/
xiv http://www.tcpdump.org/
xv http://www.maxmind.com/app/geolitecountry
xvi R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas. “Self-similar community structure in organizations”.
xvii K. Papagiannaki, N. Taft, S. Bhattacharyya, P. Thiran, K. Salamatian, and C. Diot. “A pragmatic definition of elephants in Internet backbone traffic”, Proc. Internet Measurement Workshop, 2002.
xviii T. Bonnedal. “Traffic Measurement and Analysis in Fixed and Mobile Broadband Access Networks”, Master Thesis, LTH, to be published (2009).
xix Metacafe website: http://www.metacafe.com/
xx P. Gill, M. Arlitt, Z. Li, and A. Mahanti. “YouTube Traffic Characterization: A View From the Edge”, Proc. of the ACM SIGCOMM Internet Measurement Conference (IMC), San Diego, USA, Oct. 2007.
xxi Xu Cheng, Cameron Dale, Jiangchuan Liu. “Understanding the Characteristics of Internet Short Video Sharing: YouTube as a Case Study”, cs.NI Networking and Internet Architecture (cs.MM Multimedia), 2007.
xxii Michael Zink, Kyoungwon Suh, Yu Gu, Jim Kurose. “Watch global, cache local: YouTube network traffic at a campus network: measurements and implications”, Multimedia Computing and Networking 2008.
xxiii Wikipedia, The free encyclopaedia, www.wikipedia.org
xxiv Wikipedia, The free encyclopaedia, www.wikipedia.org
xxv Marcell Perényi, András Gefferth, Trang D. Dang and Sándor Molnár. “Skype Traffic Identification”, Proc. 50th IEEE Global Communications Conference (GLOBECOM), pages 399-404, Washington, DC, USA, 2007.
xxvi Marcell Perényi and Sándor Molnár. “Enhanced Skype Traffic Identification”, Proc. 2nd Int. Conf. on Performance Evaluation Methodologies and Tools (VALUETOOLS), Nantes, France, 2007.