apr_02_pal 5613KB May 23 2014 06:15:14 PM

Davis Social Links
S. Felix Wu
Computer Science Department
University of California, Davis
wu@cs.ucdavis.edu
http://www.cs.ucdavis.edu/~wu/
1
Urgent! Please contact me!
FROM:MR.CHEUNG PUI
Hang Seng Bank Ltd
Sai Wan Ho Branch
171 Shaukiwan Road
Hong Kong.
Please contact me on my personal box [puicheungcheungpui@yahoo.com]
Let me start by introducing myself. I am Mr. Cheung Pui, director of operations of
the Hang Seng Bank Ltd,Sai Wan Ho Branch. I have a obscured business
suggestion for you.
Before the U.S and Iraqi war our client Major Fadi Basem who was with the Iraqi
forces and also business man made a numbered fixed deposit for 18 calendar
months, with a value of Twenty Four millions Five Hundred Thousand United
State Dollars only in my branch. Upon maturity several notice was sent to him,…
2
1
http://www.ebolamonkeyman.com/cheung.htm
3
Pick your favorite Spam Filter(s)
4
2
Antispam Filters
• An arms race
– In some sense, “the vendors” are doing
reasonably well.
• Bind the “spams” to the “sources”
– IP addresses or Source Email addresses
– Spammers control a large number of bots
– Collaboration between yahoo.com and gmail.com
5
This was considered a spam!
6
3
This was considered a spam!
Sometimes, the cost of False Positive may be very high…
7
The Implication of FP’s
• Spam-filters have to be conservative…
• We will have some false negatives in our
own inboxes.
• We will use our own time to further filter..
– For me, 1~2 seconds per email
8
4
You have about 1 second to decide……
“Social Spams”
• They might not be spam: we often
overlook their social value!
13
Motivations
• What is the fundamental issue of “spams”?
– Is it something to do with the design of our
“basic communication mechanism”?
• Why can’t we explicitly utilize the “social
context” in our communication?
14
7
Davis Social Links
• What is the fundamental issue of “spams”?
– Is it something to do with the design of our
“basic communication mechanism”?
• Why can’t we explicitly utilize the “social
context”?
• Routable identity versus receiver control
• Trust & Reputation system in “L3”
15
Communicate: [A, D]
B
A
D
C
As long as “A” knows “D’s routable identity”
16
8
Hijackable Routable Identity
17
[A,D] + social context
B
A
D
C
“A” has to explicitly declare if there is any social
context under this communication activity with “D”!
18
9
The same message content
• “M” from Cheung Pui
• “M” from Cheung Pui via IETF mailing list
• “M” from Cheung Pui via Karl Levitt
19
Social Context
• “M” from Cheung Pui
Probably a spam
• “M” from Cheung Pui via IETF mailing list
Probably not interesting
• “M” from Cheung Pui via Karl Levitt
Better be more serious…
Either “M” is important, or
Karl’s machine has been subverted!
21
[A,D] + social context
??
B
A
D
C
“A” has to explicitly declare if there is any
social context under this communication activity
with “D”! But, “D” only cares if it is from
“C” or not!
22
11
Online Social Network
• What is an online social network?
– Realize and represent the human social
networks “explicitly” (from “somewhat vague,
fuzzy and implicit”)
– Promote “OSN Applications”
– Utilizing the “online” perspective to further
develop the human social network
• Representation, Application, Development
23
24
12
25
Who is Salma?
26
13
Who is Salma?
27
Who is Salma?
28
14
My message to Salma
29
The Social Path(s)
30
15
More Examples
31
CyrusDSL
• How do we accomplish these features?
• How do we realize these concepts scalably?
• How will this system work against spams?
32
16
Just a couple of issues …
• How to establish the social route?
– How would “A” know about “D” (or “D’s
identity”) ?
• How to maintain this “reputation network”?
– MessageReaper: A Feed-back Trust Control
System (Spear/Lang/Lu)
33
Social network analytical models
• Network Mathematics
– Random graph model (low diameter)
• Newman/Watts/Strogatz, 2002
– Small world model (high clustering coefficient)
• Watts/Strogatz, 1998
– Scale-free network (node degree distribution)
• Barabasi/Albert, 1999
• What is the right model for the network?
34
17
[A,D] + social context
??
B
D
A
C
“A” has to explicitly declare if there is any
social context under this communication activity
with “D”! But, “D” only cares if it is from
“C” or not!
35
Search on “OSN”
• How to get to
from
?
• The Small world model
– Six degrees of separation (Milgram, 1967)
– “existence of a short path”
– How to find the short path? (Kleinberg, 2000)
36
18
Routing in a Small World
• Common question: do short paths exist?
• Algorithmic question: assuming short paths exist,
how do people find them?
37
Kleinberg’s Model
• Kleinberg’s model:
– People are points on a two-dimensional
grid.
– “P”: grid edges (short range).
– “Q”: long-range contacts chosen
with the inverse r-th power
distribution.
– How to search?
• [S, T]
• Forward to the neighbor closest to T
– Works well only when r=2, p=q=1
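The greedy search on this slide can be sketched in Python. This is only a minimal illustration under stated assumptions: the grid is n x n, each node has its four grid neighbors plus exactly one long-range contact, and distance is Manhattan distance. The function names (`lattice_distance`, `build_long_range_contacts`, `greedy_route`) are hypothetical and not part of the talk.

```python
import random

def lattice_distance(u, v):
    """Manhattan distance between two grid points."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

def build_long_range_contacts(n, r=2.0, seed=0):
    """Give every node one long-range contact, chosen with
    probability proportional to distance ** -r."""
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(n) for y in range(n)]
    contacts = {}
    for u in nodes:
        others = [v for v in nodes if v != u]
        weights = [lattice_distance(u, v) ** -r for v in others]
        contacts[u] = rng.choices(others, weights=weights, k=1)[0]
    return contacts

def neighbors(u, n, contacts):
    """Short-range grid neighbors plus the single long-range contact."""
    x, y = u
    grid = [(x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < n and 0 <= y + dy < n]
    return grid + [contacts[u]]

def greedy_route(s, t, n, contacts):
    """Use only local information: always forward to the neighbor closest to t."""
    path = [s]
    while path[-1] != t:
        path.append(min(neighbors(path[-1], n, contacts),
                        key=lambda v: lattice_distance(v, t)))
    return path

contacts = build_long_range_contacts(8)
path = greedy_route((0, 0), (7, 7), 8, contacts)
print(len(path) - 1, "hops")
```

Because some grid neighbor is always one step closer to the target, the loop terminates, and long-range contacts can only shorten the route.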
38
19
Kleinberg’s Model
• Use only Local information, except the
distance to the target.
– However, what is the “global distance” in
cyberspace? The underlying assumption is that the
“edges” depend on the “relative distance”.
39
X, Y, and Z
• How will we tell whether the relative
distance between X&Y is smaller than between X&Z?
– X, Y, Z (assuming they are all direct friends of
each other)
• One simple idea: “Keyword intersection”
– KW(X), KW(Y), KW(Z)
– 1/(#[KW(a) ∩ KW(b)] + 1)
– Will this work? How about global distance?
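The keyword-intersection distance above is small enough to write out directly. The keyword sets below are made-up examples, not data from the talk.

```python
def keyword_distance(kw_a, kw_b):
    """1 / (number of shared keywords + 1): more overlap means a smaller distance."""
    return 1.0 / (len(set(kw_a) & set(kw_b)) + 1)

# Hypothetical keyword sets for three mutual friends.
kw_x = {"davis", "security", "networks"}
kw_y = {"davis", "security", "tennis"}
kw_z = {"chess"}

print(keyword_distance(kw_x, kw_y))  # two shared keywords
print(keyword_distance(kw_x, kw_z))  # no shared keywords
```

With no shared keywords the distance is 1, the maximum; it shrinks toward 0 as the overlap grows. The measure is purely pairwise, which is why the slide's question about global distance remains open.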
40
20
Similarity
41
Similarity
42
21
Kleinberg’s model
• Inherently assumes “routable identity”
– You have to know the Target identity, and you
also need to know the distance metric.
– And, then the search algorithm will get to it
probabilistically.
– The sender/receiver interface is very simple.
43
Social Route Discovery for A2D
??
B
A
D
C
Let’s assume A doesn’t have D’s “routable identity”
Or, “D” doesn’t have a globally unique identity!
Then, how can we do A2D?
44
22
Finding
??
B
A
D
C
A2D, while D is McDonald’s!
D would like “customers” to find the right route.
“idea: keyword propagation” e.g., “McDonald’s”
45
Announcing
B
D
K: “McDonald’s”
A
C
Hop-by-hop keyword propagation
46
23
Announcing
B
D
K: “McDonald’s”
K: “McDonald’s”
A
C
Hop-by-hop keyword propagation
47
Announcing
B
D
K: “McDonald’s”
K: “McDonald’s”
K: “McDonald’s”
A
C
Hop-by-hop keyword propagation
48
24
Announcing
B
D
K: “McDonald’s”
K: “McDonald’s”
K: “McDonald’s”
A
C
Hop-by-hop keyword propagation
And, I know I am doing FLOODING!!
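A minimal sketch of unrestricted hop-by-hop propagation shows why it amounts to flooding. The four-node topology is an assumption made for illustration; the transcript does not preserve the edges drawn on the slide.

```python
from collections import deque

def flood_keyword(graph, origin, keyword):
    """Propagate a keyword hop by hop with no policy: every reachable node learns it."""
    known = {node: set() for node in graph}
    known[origin].add(keyword)
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if keyword not in known[neighbor]:
                known[neighbor].add(keyword)
                queue.append(neighbor)
    return known

# Assumed topology: D announces, A is the eventual searcher.
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
known = flood_keyword(graph, "D", "McDonald's")
print(sorted(node for node, kws in known.items() if "McDonald's" in kws))
```

Every node ends up storing the keyword, so the state kept per node grows with the number of keywords announced anywhere in the network.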
49
Now Finding
Q: McDonald’s
B
D
K: “McDonald’s”
K: “McDonald’s”
K: “McDonald’s”
A
C
Search Keyword: “McDonald’s”
A might know D’s keyword via two channels
(1) Somebody else (2) From its friends
Questions: does D need an identity? Scalable?
50
25
Phishing/Hijacking is the default
Application Test
Q: McDonald’s
B
D
K: “McDonald’s”
K: “McDonald’s”
K: “McDonald’s”
A
C
Search Keyword: “McDonald’s”
Questions: is this the right Felix Wu’s?
55
Application Tests
• Example 1: credential-oriented
– “PKI certificate” as the keyword
– If you can sign or decrypt the message, you are
the ONE!
• Example 2: service-oriented
– Service/protocol/bandwidth support
• Example 3: offer-oriented
– Please send me your coupons/promotions!
56
28
“Routable Identity”
• Application identity =M=> Network identity
• Network identity =R=> Network identity
• Network identity =M=> Application identity
57
“App/Route Identity”
• Application identity =M=> Network identity
• Network identity =R=> Network identity
• Network identity =M=> Application identity
• Keywords =(MF-R)=> “Multiple Paths”
• Application identity selection
• Network route selection
58
29
Hijackable Routable Identity
59
Application Test ~ “Layer 3”
60
30
Finding
Application Test
Q: McDonald’s
B
D
K: “McDonald’s”
K: “McDonald’s”
K: “McDonald’s”
A
C
Search Keyword: “McDonald’s”
Questions: is this the right Felix Wu’s?
How to avoid/control flooding??
61
Scalability - Avoid the Flooding
• As it is, every keyword needs to be
propagated to all the nodes/links (although the
same keyword is propagated over the
same link only once, possibly with different
policies).
• The issue: “who should receive my
keywords?”
62
31
Community-Keyword Model
• A Social Peer, P, has 3 keyword sets:
– Attributes (ATTR)
– Original Keywords (OK)
– Propagating Keywords (PK)
63
Community-Keyword Model
• Attributes (ATTR)
– Keywords describing P (the social node)
– Decided/configured by the owner of P
• Original Keywords (OK)
– Keywords announced by P (the social node)
– Decided/configured by the owner of P
– Each keyword is associated with a propagation
policy (decided by the owner of P)
• Propagating Keywords (PK)
– From its own OK and other direct neighbors
– Each keyword is associated with a propagation
policy
64
66
33
in Community of Davis
??
B
A
D
C
Who should receive the keyword announcement for
“McDonald’s”?
67
as the Social Peer
• Attributes:
– {McDonald’s Express, 640 W Covell Blvd, # D,
Davis, (530) 756-8886, Davis Senior High
School, Community Park, North Davis}
68
34
as the Social Peer
• Attributes:
– {McDonald’s Express, 640 W Covell Blvd, # D,
Davis, (530) 756-8886, Davis Senior High School,
Community Park, North Davis}
• Original Keywords:
– {McDonald, Davis, California, DHS, North Davis,
Happy Meal, 50% off Tuesday, Lobster}
• Propagating Keywords:
– {McDonald, Davis, California, DHS, North Davis,
Happy Meal, 50% off Tuesday, Lobster, Anderson
Plaza, Save-Mart, Taqueria Guadalajara}
69
“Per-Keyword Policy”
• For each keyword, we will associate it with
a propagation policy: [T, N, A]
– T: Trust Value Threshold
– N: Hop counts left to propagate (-1 each step)
– A: Community Attributes
• Examples:
– [>0.66, 4, “Davis”] K via L1
– [>=0, ∞, ∅ ] K via L2
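One way to read the [T, N, A] policy is as a filter applied at every hop. The sketch below rests on assumptions not stated on the slide: trust is a per-link number, a neighbor qualifies only if it holds every attribute in A, and the threshold is compared with >= throughout (the slide writes both > and >=). The topology, trust values, and attribute sets are invented for illustration.

```python
import math
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    min_trust: float      # T: a link must have at least this much trust
    hops: float           # N: hops left to propagate (float so math.inf works)
    community: frozenset  # A: attributes a receiver must hold

def propagate(graph, trust, attrs, origin, policy):
    """Return the set of nodes that receive the keyword under the policy."""
    hops_left = {origin: policy.hops}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if hops_left[node] <= 0:
            continue  # hop budget exhausted at this node
        for neighbor in graph[node]:
            if neighbor in hops_left:
                continue  # already received the keyword
            if trust[frozenset((node, neighbor))] < policy.min_trust:
                continue  # link not trusted enough
            if not policy.community <= attrs[neighbor]:
                continue  # neighbor is outside the target community
            hops_left[neighbor] = hops_left[node] - 1
            queue.append(neighbor)
    return set(hops_left)

# Assumed chain D - C - A - B; only B lacks the "Davis" attribute.
graph = {"D": ["C"], "C": ["D", "A"], "A": ["C", "B"], "B": ["A"]}
trust = {frozenset(("D", "C")): 0.9,
         frozenset(("C", "A")): 0.8,
         frozenset(("A", "B")): 0.9}
attrs = {"D": {"Davis"}, "C": {"Davis"}, "A": {"Davis"}, "B": {"Sacramento"}}

davis_only = Policy(0.66, 4, frozenset({"Davis"}))
wide_open = Policy(0.0, math.inf, frozenset())
print(sorted(propagate(graph, trust, attrs, "D", davis_only)))
print(sorted(propagate(graph, trust, attrs, "D", wide_open)))
```

The second policy corresponds to [>=0, ∞, ∅]: it places no restriction at all, so propagation degenerates into flooding again.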
70
35
in Community of Davis
??
B
A
D
C
Who should receive the keyword announcement for
“McDonald’s”?
71
Scalability & Controllability
• McDonald’s doesn’t want to flood the whole
network
– It only wants to multicast to the “Target set”
of customers
• And it wants only this target set of users
to be able to use that particular keyword
to contact it.
– Receiver/owner controllability
72
36
Autonomous Community
• Each social entity configures a set of
“attributes” for itself.
• Some or all of the attributes will be
exchanged with certain neighbors.
73
Social/Community Attributes
??
B
A
D
C
Who should receive the keyword announcement for
“McDonald’s”? Answer:
74
37
Relevant Attribute/OK/PK
• ATTR = Davis
• OK = McDonald’s
• PK = McDonald’s
• The owner uses the “policy” to control the
flooding:
– K = McDonald’s
– [T > 0.66, N = 6, ATTR = “Davis”]
75
IP versus DSL
• IP address prefixes announced by BGP to
ALL the Autonomous Systems in the whole
Internet
– Every IP node can send packets to McDonald’s at
Davis (if we have a unique IP address)
• DSL will only announce “McDonald’s” (under
the control of McDonald’s express) within
the Davis social community
– Only the receivers of the announcement can use
the keyword to contact McDonald’s express!
76
38
Community-Keyword Model
• A Social Peer, P, has three keyword sets:
– Attributes (ATTR)
– Original Keywords (OK)
– Propagating Keywords (PK)
• Flooding Avoidance + Receiver/Owner
Control
77
[T >= 0, N = ∞, ATTR = ∅ ] K
• What is the consequence?
– Spam
– Denial of Service
• How to deal with it?
78
39
[T >= 0, N = ∞, ATTR = ∅ ] K
• Limited Resources on PK
– “P” can only remember up to M keywords in its
own PK
• Ordering Preference between Ki and Kj
– T(Ki) > T(Kj)
– N(Ki) < N(Kj)
– ATTR(Ki) ⊃ ATTR(Kj)
• Incentive Model
– P is willing to pay a price
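The bounded PK table can be sketched as "keep the M most preferred keywords". The slide lists three comparisons but not how they combine, so this sketch assumes that Ki is preferred over Kj when it has a higher T, then a smaller N, then more attributes. The superset relation is only a partial order, so attribute-set size stands in for it here. The entries are hypothetical.

```python
import math

def preference_key(entry):
    """Sort key: higher T first, then smaller N, then larger ATTR set (assumed order)."""
    keyword, trust_threshold, hops, attributes = entry
    return (-trust_threshold, hops, -len(attributes))

def keep_preferred(entries, m):
    """Retain at most m entries in the propagating-keyword table."""
    return sorted(entries, key=preference_key)[:m]

# Hypothetical PK entries: (keyword, T, N, ATTR).
entries = [
    ("McDonald's", 0.66, 4, {"Davis"}),
    ("free-offer", 0.0, math.inf, set()),
    ("DHS", 0.8, 2, {"Davis", "North Davis"}),
]
print([e[0] for e in keep_preferred(entries, 2)])
```

Under this ordering, a wide-open announcement such as [T >= 0, N = ∞, ATTR = ∅] is the first to be evicted when the table is full, which limits how far a spammer's keywords travel.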
79
Potential Problems
• Mostly only local contacts
– Local interests dominate
– Possible resource allocation for different
ATTRs within the same community
80
40
Community
• A connected graph of social nodes sharing
a set of community attributes
81
Community
??
B
A
D
C
82
41
Community Control:
D
C
E
Who should receive the keyword announcement for
“wu@cs.ucdavis.edu”? Answer:
Who should receive the keyword announcement for
“South Lake Tahoe Tournament”? Answer:
83
Community
• A connected graph of social nodes sharing
a set of community attributes
84
42
Community
??
B
A
D
C
85
Social/Community Attributes
??
B
A
D
C
Who should receive the keyword announcement for
“McDonald’s”? Answer:
but not ALL
86
43
Community
• A connected graph of social nodes sharing
a set of community attributes
• The community members can decide the
administrative policy within the community
– Membership maintenance
– Attribute setting
– Keyword propagation policy (e.g., allocation)
– Application-dependent policy
– Incentive model
87
Potential Problems
• Mostly only local contacts
– Local interests dominate
– Possible resource allocation for different
ATTRs within the same community
• “Reachability”
– How likely will my keywords be able to go
through to the community I want?
– I must be a direct friend of the community
– How can we set up “remote long range contact”?
88
44
Community Development
• How will each one of us set up our
Attributes and Original Keywords plus
policy such that together we can
communicate with each other “optimally”?
– A game-theoretic problem of network
formation
89
Community
??
B
A
D
C
90
45
Network Formation
??
B
A
D
C
91
Network Formation
??
B
A
D
C
What is B’s incentive in adding the new ATTR keyword?
92
46
Network Formation
??
B
A
D
C
If B adds
, then A will add
!
93
Network Formation
??
B
A
D
C
Both A & C: why would A & C be willing to establish a direct
friendship?
94
47
Open Issues
• What is the “value” of this social network?
• How would this “value” be distributed and
allocated to each individual peer?
95
What is the “value” difference?
B
A
D
C
B
A
D
C
96
48
“C can join
!“
B
A
D
C
B
A
D
C
97
“A alone can help C to join more
communities!“
B
A
D
C
B
A
D
C
98
49
Value Allocation for B ?
B
A
D
C
B
A
D
C
99
Nash Equilibrium with CS
B
A
0~30~30
Propagating
D
C
or not?
100
50
Three Person Coalition Game
$\Gamma^{nf}(N, v, \mu), \quad v = 60u_{1,2} + 60u_{1,3} + 60u_{2,3} - 108u_{1,2,3}$
Player 2 gets “44”!
Again, players 1 and 3 can
collaborate and break their
links with 2 to get “30” each,
up from merely “14”!
[Figure: link configurations among players 1, 2, and 3]
101
Value calculation
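Condition: $\sum_{S \in 2^N \setminus \{\emptyset\}} \kappa(S)\, v(S) \le v(N)$

\begin{align*}
\sum_{S \in 2^N \setminus \{\emptyset\}} \kappa(S)\, v(S)
&= \kappa(1,2) \times 60 + \kappa(1,3) \times 48 + \kappa(2,3) \times 30 + \kappa(1,2,3) \times 72 \\
&= \frac{1 - \kappa(1,2,3)}{2} \times (60 + 48 + 30) + \kappa(1,2,3) \times 72 \\
&= \frac{1 - \kappa(1,2,3)}{2} \times 138 + \kappa(1,2,3) \times 72 \\
&= 69 + 3 \times \kappa(1,2,3) \\
&\le 72 = v(N)
\end{align*}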
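The arithmetic on this slide can be checked numerically. The sketch assumes what the slide's substitution implies: each two-player coalition carries the same weight (1 − κ(1,2,3))/2, and single-player terms are left out as on the slide. The function name `weighted_sum` is hypothetical.

```python
def weighted_sum(kappa_n):
    """Left-hand side of the inequality for a given weight kappa(1,2,3)."""
    kappa_pair = (1 - kappa_n) / 2  # weight of each two-player coalition
    v = {"12": 60, "13": 48, "23": 30, "123": 72}
    return kappa_pair * (v["12"] + v["13"] + v["23"]) + kappa_n * v["123"]

for kappa_n in (0.0, 0.25, 0.5, 1.0):
    total = weighted_sum(kappa_n)
    # Matches the closed form 69 + 3 * kappa and never exceeds v(N) = 72.
    assert abs(total - (69 + 3 * kappa_n)) < 1e-9
    assert total <= 72 + 1e-9
    print(kappa_n, total)
```

The bound is tight only at κ(1,2,3) = 1, where all weight sits on the grand coalition.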
102
51
Open Issues
• What is the “value” of this social network?
• How would this “value” be distributed and
allocated to each individual peer?
• DSL, Facebook, LinkedIn didn’t define the
“game” for network formation and value
allocation.
– But, it is important to design the game such
that the OSN will eventually converge to a
state to best support the communities.
103
Social Network Games
104
52
Let’s come back to SPAM!
• How will the proposed DSL model handle
spam?
• Social network games can be another
major source of “social spam” that reduces the value of
our online social network.
105
106
53
wu@cs.ucdavis.edu +
??
B
D
K: “wu@…” + Policy
A
C
Who should receive the keyword announcement for
“wu@cs.ucdavis.edu”? Answer:
107
Even if “A” claims
??
B
D
K: “wu@…”
A
C
Who should receive the keyword announcement for
“wu@cs.ucdavis.edu”? Answer:
108
54
“B” can help…
??
B
D
K: “wu@…”
A
C
What is B’s incentive? What is B’s risk?
109
Message Value & Prioritization
Link Ranks
Reputation
Incentives
Other Trust Metrics
Application IDS
[good, bad] messages
110
55
111
MessageReaper
• A Feedback Control Trust/Reputation
system
• Trust needs to be maintained along the
route path!
112
56
Reputation on Feed-back
??
B
A
D
C
“D” is the one to decide whether the message
from A/B/C is good or bad!
113
Trust Structure
114
57
Three Trust Values
• Ainit: a neighbor sending a message as the
first hop.
• Afwd: a neighbor sending a message without
being the first hop
• Art: a neighbor forwarding a message from
me which reaches the destination
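The three per-neighbor trust values can be held in a small record. The slide defines only what each value refers to; the update rule below (move a fixed fraction toward 1 on good feedback and toward 0 on bad feedback) is an illustrative assumption and not necessarily MessageReaper's actual rule. The class name, starting value of 0.5, and rate of 0.1 are likewise hypothetical.

```python
from dataclasses import dataclass

ROLES = ("a_init", "a_fwd", "a_rt")

@dataclass
class NeighborTrust:
    a_init: float = 0.5  # neighbor originated the message (first hop)
    a_fwd: float = 0.5   # neighbor forwarded someone else's message
    a_rt: float = 0.5    # neighbor relayed my message toward its destination

    def update(self, role, good, rate=0.1):
        """Nudge one trust value toward 1.0 (good) or 0.0 (bad). Assumed rule."""
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        target = 1.0 if good else 0.0
        current = getattr(self, role)
        setattr(self, role, current + rate * (target - current))

# The recipient judged a message that neighbor B originated to be bad.
trust_in_b = NeighborTrust()
trust_in_b.update("a_init", good=False)
print(trust_in_b)
```

Keeping the values separate lets a node keep forwarding for a neighbor whose own messages are poor, or the reverse.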
115
Example
116
58
117
118
59
119
1000 nodes, 20% bad
120
60
1000 nodes, 10%/40% bad
121
Increasing the Spammers
122
61
Orkut (15329 nodes)
123
Collusive Attacks
B
A
D
C
124
62
Robustness as OSN “Value”
B
A
D
C
B
A
D
C
125
Community-Oriented Networking
• DSL offers a way to dynamically identify
and establish social communities
– But, we still have a lot of open issues
• Facebook:
– Networks: email address dependent
– Groups: you have to use your existing social
network to invite.
126
63
Davis Social Links over Facebook
127
Smart Proxy
• Overlay Social Graph
• User-defined keywords
and attributes
• DSL server
• Trust Routing Protocol
DSL
Facebook
128
64
Sub-communities
• Social Graph
• User-defined keywords
and attributes
• DSL server
• Trust Routing Protocol
DSL
Facebook
129
Social Network Development
• Social Graph
• User-defined keywords
and attributes
• DSL server
• Trust Routing Protocol
DSL
Facebook
130
65
Component Interactions
Attributes
Keywords & Policies
DSL
Profiles
Social Graph,
Keywords
Facebook
131
Route Discovery & Messaging
Sender
Recipient
Keywords,
Message
Optimal
routes
DSL
Keywords,
Message
Previous Interaction
Outcomes, Shortest Paths
Basic Algorithm
MessageReaper
• Identify destination nodes
• Determine optimal paths
• Remove paths that violate
keyword policies
• If there is a path, store
message for recipient
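The four steps of the basic algorithm can be sketched as follows. The simplifications are assumptions: "optimal" is taken to mean fewest hops, only one shortest path per destination is considered, and a keyword policy is reduced to a minimum link trust and a maximum hop count. All names and sample data are hypothetical.

```python
from collections import deque

def shortest_path(graph, src, dst):
    """Breadth-first search; returns one fewest-hop path or None."""
    parents = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None

def deliver(graph, keywords, trust, sender, query, message, min_trust, max_hops):
    """Return {recipient: [(path, message), ...]} for messages that may be stored."""
    mailbox = {}
    for dest, dest_keywords in keywords.items():
        if dest == sender or query not in dest_keywords:
            continue  # step 1: identify destination nodes
        path = shortest_path(graph, sender, dest)  # step 2: optimal path
        if path is None:
            continue
        links = list(zip(path, path[1:]))
        # Step 3: drop paths that violate the (simplified) keyword policy.
        if len(links) > max_hops:
            continue
        if any(trust[frozenset(link)] < min_trust for link in links):
            continue
        mailbox.setdefault(dest, []).append((path, message))  # step 4: store
    return mailbox

graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
trust = {frozenset(pair): 0.9 for pair in (("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"))}
keywords = {"D": {"McDonald's"}}
print(deliver(graph, keywords, trust, "A", "McDonald's", "hello", 0.66, 4))
```

A fuller version would examine alternative paths when the shortest one violates a policy, rather than giving up after the first.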
132
66
Antispam email/IM
UCD Network
Keyword Policy:
All UCD Members get keyword
‘wu+Davis@cs.ucdavis.edu’
“Bypassing” Facebook
• When you send a
message…
– Via Facebook
– Via DSL
• Activity and Intensity
hiding via
Decentralization!
DSL
Facebook
138
69
DSL vs. Google
139
“Google”
• It’s about the “content”
– Data-centric networking.
• Input to the Engine
– A set of key words characterizing the target
document.
• Output
– A set of documents/links matching the
keywords
140
70
“DSL”
• It’s also about the “content”
– Application will decide the mechanism to
further the communication.
• Input to the Decentralized Engine
– A set of key words characterizing the target
document (plus the aggregation keywords).
• Output
– A set of DSL entities with the DSP (Davis
Social Path pointer) matching the keywords
141
DSL Search Engine
Receiver or
Content
Sender or
Reader
DSL Social World
We are not just connecting the IP addresses!
We are connecting all the contents that can be interpreted!
142
71
Google vs. DSL
• Google is essentially a “routing” framework
between the contents and their potential
consumers.
• Google decides how to extract the “key
words” from your (the owner) web page or
document.
• A DSL “owner/receiver to be” has the
complete control over that. A balance
between:
– How I would like others to know about me?
• And, I might want different folks to know me in
different ways!
– How I can differentiate myself from other Felix
Wu?
144
72
DSL is an old idea!
A
B
We, as humans, have been using similar
communication principles. Maybe it is a
good opportunity to rethink our
cyber communication system.
Identity is a per-application, context-oriented, and sometimes relative issue.
Forming cyber communities of interest for
applications.
A
F
F
F
B
145
LinkedIn: Get Introduced
146
73
Another one
147
DSL, Facebook, AL-BGP and GENI
http://www.geni.net/DSLport
AL-BGP over GENI/PlanetLab
Each DSL/FB user should
select a “closer” GENI
entrance as www.geni.net. In
other words, we might need to
set up DNS records correctly.
Facebook
148
74
DSL Architecture
Applications with Tests
DSL
AL-BGP
149
Link
Applications with Tests
2
1
3
4
150
75
AS-oriented Social Mapping
Applications with Tests
151
Control versus Data Path
Applications with Tests
control path
2
1
data path
152
76
Social-Control Routing
Applications with Tests
3
2
1
153
DSL is still an old idea!
A
B
Many applications already have “social
network like” structure to enable P2P
sharing across Internet.
e.g., media sharing, on-line game,
restaurant recommendation,…
Should we push these into a generic Social
Network layer-3 to support all the
applications?
A
F
F
F
B
154
77
A Different Internet?!
• Current Internet: every IP address will be
able to communicate with every other IP
address!
– Allow by Default
• DSL-based “Internet”: we have a large
number of “pairs” (two entities and their
corresponding direct social link)
– Deny by Default
155
The Physical Pipe
• Facebook, Overlay ~ no problem…
• Can we do better?
156
78
Comparison
• IP/email:
– Convergence to an absolute consistent state
– IP/email addresses are all you need, but the
controllability is biased toward the sender
• DSL:
– Convergence to a relative consistent state
– No global network identity. Every DSL entity
defines its own relative identity based on origin
keywords.
– Controllability is more balanced with other
application challenges.
157
Easy to Send & Receive
• Easy for both the good users and the
spammers. (fair simplicity)
• The spammers abuse the “sending” right,
while the good users have very limited
options to counter back.
– how easy can we change our email address?
– how often do we need to do that?
• A “receiver” or “the owner of the
identity” should have some control.
– But that also means a “burden” on the users.
158
79
159
Davis Social Links
• Peer-to-Peer System (P2P)
– How do humans communicate socially?
• Online Social Network (OSN)
– How to utilize OSN to enhance communication?
– How to have a more secure OSN?
• Autonomous Community (AC)
– How to build/develop more effective
community-based social networks?
160
80
Acknowledgement
A
A
B
F
F
F
B
161
81