SOSP 2015 slides
Transcription
SOSP 2015 slides
Vuvuzela a scalable private messaging system David Lazar Jelle van den Hooff, Matei Zaharia, Nickolai Zeldovich Motivation Alice Bob (Oncologist) Encryption Z28gUGF0cmlvdHMhCg c2VhaGF3a3Mgc3Vjawo Alice Bob (Oncologist) Problem: metadata Z28gUGF0cmlvdHMhCg Ex-boyfriend Pfizer Lawyer c2VhaGF3a3Mgc3Vjawo Alice Bob (Oncologist) Hospital Lawyer Snowden NY Times AA Guardian White House Goal: hide metadata Vuvuzela Ex-boyfriend Pfizer Lawyer AA Alice Bob (Oncologist) Hospital Lawyer Snowden NY Times Guardian White House Goal: hide metadata Vuvuzela Ex-boyfriend Pfizer Lawyer AA Alice Bob (Oncologist) Hospital Lawyer Snowden NY Times Guardian White House Goal: scalability Vuvuzela Ex-boyfriend Pfizer Lawyer AA Alice Bob (Oncologist) Hospital Lawyer Snowden NY Times Guardian White House Tor is scalable Alice Bob Tor network Tor is insecure Alice Bob Tor network Tor is insecure Low-Cost Traffic Analysis of Tor Steven J. Murdoch and George Danezis University of Cambridge, Computer Laboratory 15 JJ Thomson Avenue, Cambridge CB3 0FD United Kingdom {Steven.Murdoch,George.Danezis}@cl.cam.ac.uk Users Get Routed: Traffic Correlation on Tor by Realistic Adversaries Abstract Other systems, based on the idea of a mix, were developed to Aaron carry low latency traffic. ISDNWacek mixes [33] 1 2 Johnson Chris Rob Jansen1 Micah Sherr2 Paul Syverson1 Tor is the second generation Onion Router, supporting propose a design that allows phone conversations to be 1 web-mixes [6] follow the same design pat2 the anonymous transport of TCP streams over the Interanonymised, and U.S. Naval Research Laboratory, Washington DC Georgetown University, Washington DC net. Its low latency makes it very suitable for common terns to{aaron.m.johnson, anonymise web traffic. A service paul.syverson}@nrl.navy.mil based on these rob.g.jansen, {cwacek, msherr}@cs.georgetown.edu tasks, such as web browsing, but insecure against trafficideas, the Java Anon Proxy (JAP)1 has been implemented and is running at the University of Dresden. These apanalysis attacks by a global passive adversary. We present proaches work in a synchronous fashion, which is not well new traffic-analysis techniques that allow adversaries with ABSTRACT The traffic correlation problem in Tor has seen much attention adapted for the asynchronous nature of widely deployed only a partial view of the network to infer which nodes are in the literature. Prior Tor security analyses often consider entropy We present the first analysis of the popular Tor anonymity network or similar statistical measures as metrics of the security provided TCP/IPthat networks being used to relay the anonymous streams and therefore Circuit Fingerprinting Attacks: indicates[8]. the security of typical users against reasonably realisby the system at a static point in time. In addition, while prior greatly reduce the anonymity provided by Tor. Furthermore, The Onion Routing project has been working on streamtic adversaries in the Tor network or in the underlying Internet. Our Passive Deanonymization of Tor Hiddento compromise Services metrics of security may provide useful information about overall we show that otherwise unrelated streams can be linked show that Tor users are faranonymous more susceptible level, results low-latency, high-bandwidth communiusage, they typically do not tell users how secure a type of behavindicated by prior Specific of the paper back to the same initiator. Our attack is feasible for the cationsthan [35]. Their latestwork. design and contributions implementation, ior is. Further, similar previous work has thus far only considered include (1) a model of various typical kinds of users, (2) an adveradversary anticipated by the Tor designers. Our theoretiTor [18], has many attractive features,†including forward seadversaries†that control either a subset of the members of the Tor † ‡§†⇤ ‡ model that ,includes Tor network relays,Dacier autonomous systems Albert Kwon AlSabah Lazar , Marc , and Srinivas network, Devadas cal attacks are backed up by experiments performed on the , Mashael curity sary and support for David anonymous servers. These features, a single autonomous system (AS), or a single Internet ex(ASes), Internet exchange points (IXPs), and groups of IXPs drawn deployed, albeit experimental, Tor network. Our techniques and its ease of use, have already made it very popular, and change point (IXP). These analyses have missed important charfrom empirical study, (3) metrics that indicate how secure users are † Massachusetts Institute of Technology, {kwonal,lazard,devadas}@mit.edu acteristics of the network, such as that a single organization often should also be applicable to any low latency anonymous a testing network, available for public use, already has 50model to over a period of time, (4) the most accurate topological controls several geographically diverse ASes or IXPs. That organiComputing Research network. These attacks highlight the relationship between ‡ Qatar nodes acting as onion routers (as of November 2004). date of ASes and IXPs as Institute, they relate tomdacier@qf.org.qa Tor usage and network conzation may have malicious intent or undergo coercion, threatening §figuration, the field of traffic-analysis and more traditional computer (5) a novel realistic Tor path simulator (TorPS), and (6) Qatar University, malsabah@qu.edu.qa Tor aims to protect the anonymity of its users from nonusers of all network components under its control. of security all the above. To show security issues, such as covert channel analysis. Our reglobalanalyses adversaries. Thismaking meansuse thatofthe adversary has thethat our Given the severity of the traffic correlation problem and its seapproach is useful to explore alternatives and not just Tor as cursearch also highlights that the inability to directly observe ability to observe and control some part of the network, but curity implications, we develop an analysis framework for evaluatrently deployed, we analyze published alternative pathservices senetwork links does not prevent an attacker from performing This paper sheds light on weaknesses in also As aa result, many are accessi-of various user behaviors on the live Tor network notcrucial its lection totality. Similarly, the adversary isTor. assumed tosensitive beancaingonly the security algorithm, Congestion-Aware We create empirical traffic-analysis: the adversary can use design the anonymising of hiddennetservices pable that allow us to break the ble through Tor.By Prominent examplesand include human show how to concretely apply this framework by performing a of controlling some fraction of Tor nodes. making model of Tor congestion, identify novel attack vectors, and show comprehensive evaluation of the security of the Tor network [41] work as an oracle to infer the traffic load on remote nodes service anonymity of hidden clients and operators pasrights and whistleblowing organizations such as Wikthese assumptions, the designers of Tor believe it is safe to that it too is more vulnerable than previously indicated. against the threat of complete deanonymization. To enable such an thatonly theminimal circuits,mixing paths of the ileaks andcells Globalleaks, tools for anonymous messagin order to perform traffic-analysis. sively. In particular, we show employ stream that are reanalysis, we develop a detailed model of a network adversary that established through the Torlayed, network, used lowering to commuing such as TorChat Bitmessage, and black markets therefore the latency overhead of theand comCategories andbeSubject includes (i) many the largest and most accurate system for AS path innicate with hidden servicesmunication. exhibit a very different like Descriptors Silkroad and Black Market Reloaded. Even ference yet applied to Tor and (ii) a thorough analysis of the threat C.2.0 [Computer-Communication Networks]: General—Secuhavior compared to a general This circuit. Weofpropose two with non-hidden services, 1 Introduction choice threat model, its limitation of thelike ad-Facebook andofDuckDuckGo, Internet exchange points and IXP coalitions. We also develop rity and protection attacks, under two slightly versaries’ different threat models, that a subject recently have started in providing hidden versions of theirthat inform this analysis, considering the network realistic metrics powers, has been of controversy the could identify a hidden service client or operator using websites to provide stronger anonymity guarantees. topology as it evolves over time, for example, as new relays are anonymity community, yet most of the discussion has foAnonymous communication networks were first introthese weaknesses. We found that we can identify the Keywords introduced and others go offline. cused on assessing whether these restrictions of attackers’ duced by David Chaum in his seminal paper [10] describing That said, over the past few years, hidden services users’ involvement with hidden services withmetrics; more than Our analysis shows that 80% of all types of users may be deAnonymity; onion routing capabilities are ‘realistic’ or not.have We witnessed leave thisvarious discussion the mix as a fundamental building block for anonymity. A active attacks in the wild [12,by 28], anonymized a relatively moderate Tor-relay adversary within six 98% true positive rate and less than 0.1% false positive aside and instead show that traffic-analysis can be mix acts as a store-and-forward relay that hides the correresulting in attacks several takedowns [28]. To months. examineOur the results se- also show that against a single AS adversary rate with the first attack, and 99% true positive rate and successfully mounted against Tor even within this very re- Alice Bob Tor network Related work Pond Scalability Tor Riposte [Oakland 2015] Dissent [OSDI 2012] Privacy Contribution Tor Pond Scalability Vuvuzela Riposte [Oakland 2015] Dissent [OSDI 2012] Privacy Contribution • Vuvuzela: the first private messaging system that hides metadata from powerful adversaries for millions of users • Vuvuzela scales linearly with the number of users • Differential privacy for millions of messages per user for one million users • 37s end-to-end message latency • 60,000 messages / second throughput • Good match for private text-based messaging Vuvuzela overview • Handful of servers arranged in a chain • Users send/receive messages through the first server Alice Bob Charlie Server 1 • Server 2 Server 3 Last server decides who gets what messages and sends them back down the chain Vuvuzela’s two protocols Dialing protocol: Alice Initiate conversation session between two users Bob Conversation protocol: Charlie Exchange messages between two users Threat model • All but one server are compromised • Adversary is active (can knock users offline, tamper with messages, etc) • All users might be malicious (besides you and your friends) • PKI: users know each other’s keys Alice Bob Charlie Metadata privacy Scenario 1 Alice Bob Charlie Vuvuzela Scenario 2 Scenario 3 Alice Bob Charlie Alice Bob Charlie Vuvuzela Vuvuzela Metadata privacy Scenario 1 Alice Bob Charlie Scenario 2 Scenario 3 Alice Bob Charlie Alice Bob Charlie Vuvuzela Vuvuzela Vuvuzela ? ? ? 47D1FC9A… traffic analysis hacked servers Approach to scalable privacy • Use efficient cryptography to encrypt as much metadata as possible. • Add noise to metadata that we can’t “encrypt.” • Use differential privacy to reason about how much privacy the noise gives us. Dead drops prevent users from talking directly Alice Bob Charlie Dead drop: a place to leave a message that another user can pick up Talking via dead drops Alice Bob Charlie De ad Me dro ssa p: z ge: zp8 “Hi ns0 Bo b! nrxt3 Ho w’s g9efb it g 6c oin g?” Dead Mess drop: zzp 8ns0 age: nrxt3 “” g9ef b6c Conversation protocol Alice Bob Charlie De ad Me dro ssa p: z ge: zp8 “Hi ns0 Bo b! nrxt3 Ho w’s g9efb it g 6c oin g?” Dead Mess drop: zzp 8ns0 age: nrxt3 “” g9ef b6c Round 1 Conversation protocol Dead drop: Fsdd5vPML H3KARqE2a Message: “” Alice Bob a 2 E q R A K 3 H L M P v 5 d d s F : Dead drop ” ! s k n a h t , d o o g m ’ I “ : e g a s s Me Charlie Round 2 Conversation protocol Alice Bob Charlie Round 3 Conversation protocol Alice Bob Charlie Round 4 Messages are encrypted Alice Bob Charlie Dead drop: Fsdd5vPML H3KARqE2a Message: W CzdjL5wBNp JUtt9tE7… a 2 E q R A K 3 H L M P v 5 d d s F : j… Dead drop E g 6 P u 4 W q 8 k V s W Q 1 T j y : e g Messa Idle clients send cover traffic Alice Bob Charlie Dead drop: Fsdd5vPML H3KARqE2a Message: W CzdjL5wBNp JUtt9tE7… a 2 E q R A K 3 H L M P v 5 d d s F : j… Dead drop E g 6 P u 4 W q 8 k V s W Q 1 T j y : e g Messa Idle clients send cover traffic Alice Dead drop: Fsdd5vPML H3KARqE2a Message: W CzdjL5wBNp JUtt9tE7… Bob a 2 E q R A K 3 H L M P v 5 d d s F : j… Dead drop E g 6 P u 4 W q 8 k V s W Q 1 T j y : e g Messa Charlie h C r 7 U R E r v T T u O Z 6 0 y u : p o r d d Dea … 0 s O K 7 2 6 B e r 5 H G D p X w J : e g a s Mes Dead drops give privacy Alice Dead drop: Fsdd5vPML H3KARqE2a Message: W CzdjL5wBNp JUtt9tE7… Bob a 2 E q R A K 3 H L M P v 5 d d s F : j… Dead drop E g 6 P u 4 W q 8 k V s W Q 1 T j y : e g Messa Charlie h C r 7 U R E r v T T u O Z 6 0 y u : p o r d d Dea … 0 s O K 7 2 6 B e r 5 H G D p X w J : e g a s Mes Dead drops give privacy Alice Dead drop: Fsdd5vPML H3KARqE2a Message: W CzdjL5wBNp JUtt9tE7… Bob a 2 E q R A K 3 H L M P v 5 d d s F : j… Dead drop E g 6 P u 4 W q 8 k V s W Q 1 T j y : e g Messa Charlie h C r 7 U R E r v T T u O Z 6 0 y u : p o r d d Dea … 0 s O K 7 2 6 B e r 5 H G D p X w J : e g a s Mes Mixnet hides origin of messages Alice Bob Charlie Mixnet hides origin of messages A Alice B Bob C Charlie Mixnet hides origin of messages Alice Bob B C A Charlie Mixnet hides origin of messages Alice Bob B C A Charlie Are we done yet? Alice Bob Charlie 2 1 Are we done yet? Alice 2 Bob Charlie 1 Challenge: dead drop counts reveal access patterns Demo! Let’s see why access counts are a problem. Solution: Each server adds noise Alice Bob Charlie Fake exchanges (noise) 1 2 1 1 2 What is noise? Fake singles Fake doubles Dead drop: RY9VjW4XROtTcbnZPaJ Message: Bzizd2loCIeXdIfHU33mds… Dead drop: 3nPki8GbZWfXRyw61wk Message: nE7yvLJLeiCvcD1Cu62… Dead drop: t53c81TtFdmBCzFLQ7Q Message: rCCnMCttJ8C8JMthLxN8… Dead drop: pavnHQmuegSmvXz6Y5 Message: IuA94shFx7okpZdBacjBg… Dead drop: 3nPki8GbZWfXRyw61wk Message: 4QjdRfoB7GoEEb0vtMjf… Dead drop: kt2JnceRb7ieU3M1k5Oj Message: mb4ZgDABTLTtm9rUZzV… Dead drop: kt2JnceRb7ieU3M1k5Oj Message: wYNxuyoOiP9Ffjr4LKtv38… Dead drop: LWnyE3AB2TTmUcCGL Message: k1bVsoTVlJQTEy92Vxd1o… Dead drop: LWnyE3AB2TTmUcCGL Message: mTLa2cdkKgzADt0oJm8s… Demo! Vuvuzela with noise is effective! Formalizing privacy guarantee Pr[ i | Alice talked to Bob] Alice Bob Vuvuzela ≈𝜺 × Pr[ i | not Alice talked to Bob] Alice Bob Vuvuzela (𝜀,𝛿) differential privacy, simplified Pr[ i | Alice talked to Bob] Alice Bob Vuvuzela ≤ 𝜺 × Pr[ i | not Alice talked to Bob] Alice Bob Vuvuzela Noise achieves DP • Let d be the number of dead drops with two accesses in a single round. • To make d differentially private, we need to make these distributions very close (indistinguishable): Pr[ d=x | not Alice talked to Bob] Probability Probability Pr[ d=x | Alice talked to Bob] 0 1 Dead drops with two messages 250 Dead drops with two messages Generating this distribution Pr[ d=x | not Alice talked to Bob] Probability Pr[ d=x | Alice talked to Bob] 250 Dead drops with two messages Constraints: • Can’t have negative dead drops • Distributions have to be close enough for differential privacy Generating this distribution Pr[ d=x | Alice talked to Bob] Pr[ d=x | not Alice talked to Bob] Probability Average noise is hundreds of fake messages 250 Dead drops with two messages Constraints: • Can’t have negative dead drops • Distributions have to be close enough for differential privacy Privacy degrades every round • Each round leaks metadata • We want differential privacy after sending many messages • This means adding more noise to support more messages. Vuvuzela’s approach to noise • More noise means privacy for more messages. • Add as much noise as possible, while still keeping the system practical. • Use differential privacy to compute how much privacy users get. • • Using composition theorem [Dwork & Roth 2014] We picked: 300,000 fake singles and 300,000 fake doubles per server per round. Privacy with 300,000 noise Pr[ i | Alice talked to Bob] ≤ 𝜺 × Pr[ i | not Alice talked to Bob] 9 8 7 6 � 5 4 3 2 1 10,000 100,000 1M 2M Messages Alice wants to keep private Eve is very evil • Alice sees previous graph and sends Eve many messages through Vuvuzela. • Will NSA arrest Alice for talking to Eve? • • Probably: using Vuvuzela is already suspicious Will a fair jury convict Alice of talking to Eve? • Unlikely: Vuvuzela observations are not damning evidence! Alice gets a fair trial • Jury is already 50% certain Alice did the crime (NSA is intimidating, other evidence, etc) • Beyond unreasonable doubt = 90% certainty Alice is innocent for millions of messages 90 Jury certainty % 89 88 86 83 80 75 67 50 10,000 100,000 1M 2M Messages Alice wants to keep private Implementation • 3,000 lines of Go • Untrusted entry server manages user connections • Entry server notifies clients when a new round starts • Available soon on Github: • github.com/davidlazar/vuvuzela Evaluation • Can Vuvuzela servers support a large number of users and messages? • Does Vuvuzela provide acceptable performance? Asymptotic performance • Noise is independent of number of users. • Performance is linear in number of users • Bandwidth, latency, CPU Setup Entry server Server 1 Server 2 36 cores per VM 10 Gbps links Client VMs Server 3 Acceptable end-to-end latency for text messaging End-to-end latency for conversation messages 60 s 50 s 40 s 30 s 20 s 10 s 0s 10 500,000 1M 1.5M Number of online users 2M Performance bottlenecks • CPU bound • • Dominated by mixnet operations High bandwidth cost • 166 MB/s for servers, 12 KB/s for clients • Can lower bandwidth by increasing latency linearly Conclusion • Problem: hide metadata in a secure and scalable way. • Approach: • Encrypt • Add as much metadata as possible. noise to obscure remaining metadata. • Formalized • Vuvuzela: • Scales privacy guarantee with differential privacy scalable private messaging without metadata linearly with number of users • Privacy for millions of messages per user → 37s latency • 60,000 messages / second of throughput What happens after 2M? • Privacy for lifetime of messages is unrealistic under this configuration • User’s should change their expectation to just expect privacy for a subset of messages • • Example: privacy just for important messages. • Example: privacy just for recent messages. User does not need to specify which subset of messages to keep private • Vuvuzela’s guarantee holds for any (small) subset of messages that the adversary cares about