Communication Systems Peer-To-Peer Networking
Prof. Dr.-Ing. Lars Wolf
IBR (Institut für Betriebssysteme und Rechnerverbund) – TU Braunschweig
Mühlenpfordtstraße 23, 38106 Braunschweig, Germany
Email: wolf@ibr.cs.tu-bs.de
Kommunikationssysteme: Peer-to-peer (l5_p2p_e.fm, 6 August 2002)

Scope
[Figure: course map over the layers L1 Physical Layer (Bitübertragung), L2 Data Link Layer (Sicherung), L3 Network Layer (Vermittlung), L4 Transport Layer (Transport) and L5 Application Layer (Anwendung); topics: LAN, MAN, High-Speed LAN; WAN: ISDN & ATM; Internet: IP; Mobile IP; Internet: TCP, UDP; Applications: Web, Telnet, Files, Email, P2P; IP-Tel: Signal. H.323, SIP; Media Data Flow: RT(C)P; Transitions & Addressing; MM COM - QoS specific; Security; Mobile Communications; Other Lectures of "ET/IT" & Computer Science]
Complementary Courses: Multimedia Systems, Distributed Systems, Mobile Communications, Security, Web, Mobile+UbiComp, QoS

Overview
1. Motivation
2. Peer To Peer Networking (Basics)
3. First Generation P2P Protocols/Applications
  3.1 File Sharing with Central Server
  3.2 Processing Sharing with Central Server
  3.3 Decentral File Sharing
  3.4 Anonymous File Sharing
4. Second Generation P2P Networks
  4.1 Decentral File Sharing with Supernodes
  4.2 File Sharing with Charging
  4.3 Distributed Search Engine
  4.4 P2P Virus Protection
5. GRID Computing
6. Problems in P2P Networking
7. Annex: Literature about P2P

1. Motivation

Introduction
One of the newest buzzwords in networking is Peer-to-Peer (P2P). Is P2P a hype?
• 40 million Napster users in 2 years
• strong presence at international networking conferences
• strong support by industry (e.g. Intel, Sun, Deutsche Bank)
• major traffic source, e.g. in October 2001 at TU Munich: ~40% P2P, ~45% Web

Evolution of Internet Computing Paradigms (1)
1st generation (since the beginning of the Internet):
• permanent IP addresses
• static domain name system (DNS) mapping
• always connected
• limited, specialized, centralized applications
• protocols: Telnet, FTP, Gopher, ...
⇒ World Wide Access
2nd generation (since the 1990s):
• WWW & graphical browsers
• dynamic IP addresses / NAT* (network address translation) / roaming users
• heterogeneous applications
• asymmetric, server-based services
• protocol: HTTP
⇒ World Wide Web
* NAT (network address translation):
• clients in a (company) network do not get a global IP address, only a local one, e.g. 192.168.XX.XX
• the gateway has a global IP address and translates the local addresses
⇒ it is impossible to open a connection from outside to a client in a NAT network

Evolution of Internet Computing Paradigms (2)
2000 – the P2P revolution?
3rd generation (since 2000):
• more collaboration and personalized applications
• powerful edge devices (peers)
• instant networking
• protocols/applications: Napster, Gnutella, Fasttrack, Freenet, ...

2. Peer To Peer Networking (Basics)

Definition of P2P Networking
C. Shirky: "Peer-to-peer (P2P) is a class of applications that takes advantage of resources – storage, cycles, human presence – available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, peer-to-peer nodes must operate outside the DNS and have significant or total autonomy from central servers."
Litmus test for P2P:
1. Does it treat variable connectivity as the norm? E.g. does it support dial-up users with variable IP addresses?
2. Does it give the nodes at the edges of the network significant autonomy? E.g. is storage/processing done by autonomous end systems?
⇒ If the answer to both is YES, the application is P2P; otherwise it is not.
See Andy Oram: Peer-To-Peer / Harnessing the Power of Disruptive Technologies, O'Reilly 2001

P2P Architecture Characteristics
• nodes act both as clients and servers: "SERVer + cliENT = SERVENT"
• P2P applications operate outside the domain name system (DNS), e.g.
  • Napster/Fasttrack user names
  • dynamic IP addresses
  • ...
• P2P applications operate in unstable environments
  • e.g. dial-up connections; users disconnect and change their IP address often
• P2P applications are easy to use and well integrated

Why P2P Networking?
New services at the edge of the network
Unused resources
• assume e.g. a large business with 2000 desktop computers:
  • storage space: 2000 × 10 GB = 20 TB spare storage space
  • processing power: 2000 × 600 MHz × 5 ops/cycle = 6 trillion ops/sec spare processing power
First approaches for
• Distributed Collaboration/Communication
  • P2P groupware, P2P content generation
  • P2P instant messaging
  • online games
• Distributed Storage
  • P2P file sharing, online backups
• Distributed Computing
  • P2P CPU cycle sharing
  • distributed simulation
• Distributed Search Engines, Intelligent Agents

Why P2P Networking? (2)
Cost effectiveness
• reduces centralized management resources
• optimizes computing, storage and communication resources
• rapid deployment
P2P applications/protocols tailored to the users' needs
• Napster's success depended to a great extent on its ease of use
Highly attractive content
• users share their content with other users ⇒ attractive content
• copyrights are usually not respected ⇒ cheap content
Group collaboration is superior for business processes, which
• grow organically, are non-uniform and highly dynamic
• involve largely manual, ad-hoc, iterative and document-intensive work
• are often distributed, not centralized
• are not understood from beginning to end by any single person/organisation

The Principles of P2P and the Internet
IP resp. the Internet (and, in the same way, P2P networks)
• form an overlay network (politically and technically) over the underlying telecom infrastructure
• introduced their own addressing scheme (in P2P networks e.g. user names)
• emphasized fault tolerance
• are based on the end-to-end principle: as much intelligence as possible in the end nodes
⇒ P2P and the Internet are based on very much the same principles (but on different layers)
⇒ P2P can learn a lot from the corresponding Internet solutions (routing algorithms etc.)

P2P networks form an overlay network on top of the Internet IP network
[Figure: overlay network nodes (P2P) on top of basic network nodes (IP)]

3. First Generation P2P Protocols/Applications

3.1 File Sharing with Central Server
Napster, see e.g. http://www.napster.com/
• first famous P2P file-sharing tool
• for downloading mp3 files only
• central P2P network
  • decentral storage (content at the edges)
  • central server (file index, search engine)
  • file transfer between clients
• decentralization as a tool, not a goal
• users as providers and consumers
• Napster user namespace instead of
  • fixed IP addresses
  • domain name system (DNS)

3.2 Processing Sharing with Central Server
SETI@home, see e.g. http://setiathome.ssl.berkeley.edu/
• first famous P2P network for sharing processing power for the massively parallel analysis of radio signals
• SETI = Search for Extraterrestrial Intelligence
• about 50 GB of data per day coming in from the Arecibo telescope
• distributed via a central server to millions of processing clients
• client = screensaver (using idle CPU cycles)
• uses the unused processing power of desktop computers
• architecture similar to Napster
• started 1998; until October 2000, 4×10^20 floating point operations were performed (the largest computation ever performed)
• similar idea: GRID computing

3.3 Decentral File Sharing
Gnutella, see e.g. http://gnutella.wego.com/, http://www.gnutella.co.uk/, http://dss.clip2.com/GnutellaProtocol04.pdf
• decentral file-sharing tool
• main differences to Napster:
  • no central server
  • all kinds of files are shared
  • beyond file sharing (more flexibility)

Gnutella: Architecture
Characteristics
• each node keeps a database of known and connected nodes
• message broadcasting for node discovery and search requests
• each message can be identified by a globally unique ID (GUID)
• flooding (to all connected nodes) is used to distribute information
• nodes recognize messages they have already forwarded by their GUID and do not forward them twice
Functionality of the Protocol
• connecting
  • PING message: actively discover hosts on the network
  • PONG message: answer to a PING message; includes information about one connected Gnutella servent
• search
  • QUERY message: searching the distributed network
  • QUERY HIT message: response to a QUERY message (can contain several matching files of one servent)
• data transfer
  • HTTP is used to transfer files (HTTP GET)
  • PUSH message: to circumvent firewalls

Gnutella: Connecting
[Figure: servent 1 and servent 2 exchange PING and PONG messages]
In order to connect to a Gnutella network, the servent must initially know (at least) one member node of the network and connect to it
• these first member nodes must be found by other means (IRC, Web, ...)
• nowadays host caches are usually used

Gnutella: Connecting (2)
[Figure: servent 1 and servent 2 linked by Gnutella connections (TCP on a specified port)]
The servent connects to a number of nodes it has received PONG messages from and thus becomes part of the Gnutella net.

Gnutella: Searching
[Figure: servent 1 sends a QUERY, which servent 2 passes on]
The QUERY message is distributed to all connected nodes (flooding):
• a node that receives a QUERY message increases the HOP count field of the message and forwards it to all nodes (except the one it received it from) if
  • HOP <= TTL (Time To Live) and
  • a QUERY message with this GUID was not received before
• the node also checks whether it can answer this QUERY with a QUERY HIT message (i.e. whether it has files available that match the search criteria)

Gnutella: Searching (2)
[Figure: servent 2 returns a QUERY HIT to servent 1]
• the QUERY HIT message contains the IP address of the sender plus information about one or more files that match the search criteria. It is not flooded but passed back along the same path the QUERY took.
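The flooding rules above, and the way a QUERY HIT retraces the path of its QUERY, can be sketched as a small simulation. This is a toy model under our own assumptions, not the Gnutella wire protocol: the network is an in-memory graph, messages are delivered in FIFO order, and the servent names, files and the helpers `build_network`, `flood_query` and `route_back` are hypothetical. We read the TTL rule as being applied on arrival: the receiving node increases HOP, drops the message if HOP exceeds the TTL or the GUID is already known, and otherwise answers (if it has matching files) and forwards it. Storing, per GUID, the neighbour a QUERY came from serves both as the duplicate filter and as the reverse route for the QUERY HIT.

```python
from __future__ import annotations

import uuid
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Servent:
    """Toy Gnutella node (SERVer + cliENT); names and files are hypothetical."""
    name: str
    files: set[str]
    neighbours: list[str] = field(default_factory=list)
    # GUID -> neighbour the QUERY arrived from (None at the originator).
    # Used as duplicate filter and as reverse route for QUERY HITs.
    seen: dict[str, str | None] = field(default_factory=dict)


def build_network(edges, files):
    """Create servents and a bidirectional Gnutella connection per (a, b) pair."""
    net = {name: Servent(name, set(names)) for name, names in files.items()}
    for a, b in edges:
        net[a].neighbours.append(b)
        net[b].neighbours.append(a)
    return net


def route_back(net, guid, start):
    """Follow the stored reverse route from a responding servent to the originator."""
    path = [start]
    prev = net[start].seen[guid]
    while prev is not None:
        path.append(prev)
        prev = net[prev].seen[guid]
    return path


def flood_query(net, origin, keyword, ttl):
    """Flood a QUERY from `origin`; return (hits, number of QUERY messages sent)."""
    guid = uuid.uuid4().hex
    net[origin].seen[guid] = None
    # Each queued message: (receiver, sender, HOP value it carries).
    queue = deque((n, origin, 0) for n in net[origin].neighbours)
    hits, messages = [], 0
    while queue:
        name, sender, hops = queue.popleft()
        messages += 1
        node = net[name]
        hops += 1                      # receiver increases the HOP count
        if hops > ttl or guid in node.seen:
            continue                   # beyond TTL or GUID already known: drop
        node.seen[guid] = sender       # remember where the QUERY came from
        if any(keyword in f for f in node.files):
            # QUERY HIT is passed back hop by hop along the reverse route
            hits.append((name, route_back(net, guid, name)))
        for n in node.neighbours:
            if n != sender:            # forward to all nodes except the sender
                queue.append((n, name, hops))
    return hits, messages
```

For example, with connections A-B, A-C, B-C and C-D and a matching file only on D, `flood_query(net, "A", "german", ttl=3)` returns the hit `("D", ["D", "C", "A"])` after 5 QUERY messages: B and C each receive a second copy of the QUERY over the B-C connection and drop it because they already know its GUID.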
Gnutella: Data Transfer through Firewalls (PUSH)
• if the servent with the file is behind a firewall, then
  • the downloading servent cannot initiate a connection
  • it can instead send a PUSH message, asking the other servent to initiate an HTTP connection to it and push the file over that connection
  • this does not work if both servents are behind firewalls

Gnutella Problem: Free Riders
Free riders
• exist in big anonymous communities
• are selfish individuals who opt out of a voluntary contribution to the community's social welfare (e.g. by not sharing any files but downloading from others)
• ... and get away with it

Gnutella Problem: Scalability Issues
Gnutella suffers from a range of scalability issues due to its decentral approach and the flooding of messages.
Frequency distribution of Gnutella messages (Portmann et al.):
QUERY       54.8%
PONG        26.9%
PING        14.8%
QUERY HIT    2.8%
PUSH         0.7%
⇒ 41.7% of the messages (PING + PONG) are just for network discovery

Gnutella Problem: Scalability Issues (2)
Average bandwidth usage (kbps) per node for peer discovery:
PING rate   500 nodes   4000 nodes   8000 nodes
1/min       4.8         68.2         194.9
1/sec       288         4090.4       11694.5
Average bandwidth usage (kbps) per node for search messages:
Search rate 500 nodes   4000 nodes   8000 nodes
1/min       2.5         36.8         127.0
1/sec       151.0       2211.0       7617.3
A TTL (Time To Live) of 4 hops for the PING messages leads to a known topology of roughly 8000 nodes! The TTL in the original Gnutella client was 7!
⇒ Low-bandwidth clients easily use up all their available bandwidth for passing on the PING and QUERY messages of other users, leaving no bandwidth for up- and downloads.
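Why a TTL of only 4 hops already yields a known topology of thousands of nodes can be seen from a rough upper bound. The sketch below is our own back-of-the-envelope model, not the method of the cited measurements: it assumes an overlay without cycles in which every servent keeps the same number of connections, so the originator's neighbours are reached at hop 1 and each further hop multiplies the frontier by (degree - 1), since a servent never sends a message back to where it came from. The function name is hypothetical, and the degree of 10 used below is an illustrative assumption, not a figure from the slides.

```python
def reachable_servents(degree: int, ttl: int) -> int:
    """Upper bound on distinct servents a PING or QUERY reaches within `ttl` hops.

    Assumes (hypothetically) a cycle-free overlay where every servent keeps
    exactly `degree` connections, so every forwarded copy reaches a servent
    that has not seen the message before.
    """
    total = 0
    for hop in range(1, ttl + 1):
        # hop 1: the originator's `degree` neighbours; each later hop
        # multiplies by `degree - 1` (no forwarding back to the sender)
        total += degree * (degree - 1) ** (hop - 1)
    return total
```

With 10 connections per servent, `reachable_servents(10, 4)` gives 10 + 90 + 810 + 7290 = 8200 servents, the same order as the roughly 8000 nodes quoted above; with the original TTL of 7, `reachable_servents(10, 7)` grows to 5,978,710. Real overlays contain cycles, so fewer distinct servents are reached, while the duplicate copies still consume bandwidth.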
Gnutella: Data Transfer via HTTP
• data transfer is not part of the Gnutella protocol!
• HTTP GET is used instead

Gnutella Problem: Free Riders (2)
Study results (Adar/Huberman):
• 70% of the Gnutella users share no files
• 90% answer no queries
Solutions
• incentives for sharing
  • servents only accept connections / forward messages from servents that share content
  • but: how to verify? quality of the content?
• micropayment
  • see Mojo Nation

Gnutella Problem: Overlay Topology Design
[Figure: a search for a German movie in a Gnutella overlay; node groups sharing German, British and Japanese movies; legend: Gnutella connection, Gnutella node, IP route, IP node]

Gnutella Problem: Overlay Topology Design (2)
[Figure: the same search for a German movie in an alternative overlay topology over the German, British and Japanese movie groups]
⇒ This overlay network has a better topology.

Gnutella: Related Work
Scalability
• M. Portmann et al.: The Cost of Peer Discovery and Searching in the Gnutella Peer-to-peer File Sharing Protocol
• J. Guterman: Gnutella to the Rescue? Not so Fast, Napster fiends.
• K. Sripanidkulchai, Carnegie Mellon University: The popularity of Gnutella queries and its implications on scalability
• Clip2.com: Bandwidth Barriers to Gnutella Network Scalability
Free Riders
• E. Adar, B. Huberman: Free Riding on Gnutella
• P. Golle et al.: Incentives for Sharing in Peer-to-Peer Networks
Overlay Topology
• G. Pandurangan et al.: Building Low-Diameter P2P Networks
• G. Pandurangan et al.: Building P2P networks with good topological properties

3.4 Anonymous File Sharing
e.g. Freenet, see http://freenet.sourceforge.net/
Documents are encrypted, cut into several pieces and stored on several machines
• provides anonymity for users
  • the author of a document cannot be identified
• prohibits censorship of documents
  • as document pieces are copied onto several machines
• removes any single point of failure or control
  • decentral network
• provides plausible deniability for node operators
  • as they cannot read the content on their disks

4. Second Generation P2P Networks
Currently the second generation of P2P networks is emerging:
Performance improvements
• decentral networks with supernodes
Charging for content and services
• battling free riders
Adaptation to business applications
• e.g. distributed search engine
• e.g. virus protection

4.1 Decentral File Sharing with Supernodes
e.g. Fasttrack, see www.fasttrack.nu
Developer: Fasttrack
Clients: Morpheus (www.musiccity.com), KaZaA (www.kazaa.com), Grokster (www.grokster.com), GiFT/OpenFT (gift.sourceforge.net)
Properties:
• currently the most successful P2P network
• neither completely central nor decentral:
  • distributed supernodes (nodes with high-performance network connections) take the role of the central server
  • supernodes answer the search messages of the other nodes
  • one or more supernodes can drop out without problems
• additionally, the communication between nodes is encrypted

4.2 File Sharing with Charging
e.g. Mojo Nation, see http://www.mojonation.net/
• extends Freenet
• defines electronic money (mojos)
• mojos are earned by offering resources to the Mojo Nation network:
  • storage space
  • processing power
  • bandwidth
• mojos can be spent on
  • searching
  • downloading

4.3 Distributed Search Engine
Example: a document is revised over time, but only Rev 1.0 (t=1) and Rev 1.3 (t=4) are posted to the document server:
Time   Current revision   P2P search finds   Server search finds
t=1    Rev 1.0            Rev 1.0            Rev 1.0
t=2    Rev 1.1            Rev 1.1            Rev 1.0
t=3    Rev 1.2            Rev 1.2            Rev 1.0
t=4    Rev 1.3            Rev 1.3            Rev 1.3
⇒ P2P search yields better results (the current revision instead of the last one posted to the server)
Other business applications: see JXTA, Groove

4.4 P2P Virus Protection
Example:
[Figure: Peers 1 to 4, each with a strange ILOVEYOU.vbs file, along a time axis t; a Virus Warning is issued]

5. GRID Computing
Similar idea, similar concept as in P2P:
For many scientific applications, high-performance data processing centers are needed. They are expensive to provide and often do not offer enough performance. Thus the idea was born to interconnect the existing data processing centers into the GRID distributed processing center.
                          P2P                                    GRID
History                   sharing MP3 files & illegal content    saving costs for data processing centers
Users                     mainly private users                   scientists
Typical service           file sharing                           processing sharing
Typical transfer volume   small (MP3) to medium (video)          huge (often terabytes)
Typical problems          huge number of users causes            transferring huge amounts of data
                          scalability issues

6. Problems in P2P Networking
Interoperability
• there are a number of incompatible protocols
• there are no standards and there is no dominating protocol (yet)
Performance & Scalability
• see Gnutella example
• much more communication overhead than in client-server systems
• bandwidth is usually scarce
• if a peer is unreachable, TCP/IP can take up to several minutes to time out the connection
Free Riders
• see Gnutella example
• incentives for sharing content, for not misbehaving, ...
Copyright Management
• how can I be sure that I am not downloading illegal stuff?
• how can the rights of the owners of intellectual property be enforced?

Problems in P2P Networking (2)
Trust
• how can I be sure that the document/software/movie... I am downloading is the one it was announced as?
• how can I be sure that it is unchanged and does not contain viruses etc.?
• which community members can I trust?
• trust can be increased by
  • decreasing the number of people that must be trusted
  • reducing the risk
• components: message digest functions, digital certificates etc.
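The "message digest functions" component of trust can be illustrated by checking a downloaded file against a digest announced for it. This is a minimal sketch under stated assumptions: SHA-256 is our choice (the slides name no specific function), the announced digest is assumed to reach the user over a channel they already trust (e.g. signed with a digital certificate), and the helper names are hypothetical.

```python
import hashlib
import hmac
from typing import BinaryIO, Iterable, Iterator


def read_chunks(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield a file's content piece by piece, so large downloads need not fit in memory."""
    while chunk := stream.read(chunk_size):
        yield chunk


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Hex SHA-256 digest over all chunks, in order."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def matches_announced_digest(chunks: Iterable[bytes], announced_hex: str) -> bool:
    """True if the downloaded bytes hash to the digest announced for the file.

    Only meaningful if the announced digest comes from a trusted source:
    a peer that lies about the file can just as easily lie about its digest.
    """
    return hmac.compare_digest(sha256_hex(chunks), announced_hex.strip().lower())
```

A typical call would be `with open("movie.avi", "rb") as f: ok = matches_announced_digest(read_chunks(f), announced)`, where the file name and `announced` are placeholders. A match shows that the file is unchanged with respect to what was announced; it says nothing about whether the announcer deserves trust.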
For message digests and digital certificates, see the slides KNII: Security.
⇒ Potential solution: reputation mechanisms
• see eBay, PGP web of trust, ...
• eBay collects feedback about each eBay participant; users are encouraged to post feedback about the trade and rate their trading partner
• someone considering a trade can look into the trading partner's eBay record
• difficult in a decentral network
• tradeoff with anonymity

Problems in P2P Networking (3)
Other security aspects
• authentication?
• protection against denial-of-service attacks?
• protection against tampering with files, search messages etc.?
• virus protection?
• P2P protocols are built to circumvent firewalls
• ...
⇒ There are a number of important unsolved problems in P2P networking

7. Annex: Literature about P2P
• Andy Oram: Peer-To-Peer / Harnessing the Power of Disruptive Technologies, O'Reilly 2001
• Kurt Tutschku: Management of Peer-To-Peer Networks
• Gnutella Protocol Specification: http://dss.clip2.com/GnutellaProtocol04.pdf
• Krishna Kant, Vijay Tewari: On the Potential of Peer-to-Peer Computing
• c't Computermagazin: P2P & file-sharing reports in issues 06/2001 & 26/2001
• DFN Symposium presentation: www.dfn.de/projekte/symposium01/oertel_symposium2001_p2p/