Lysbilde 1

Transcription

Lysbilde 1
4/27/2015
Societal benefits vs. privacy: what
distributed secure multi-party
computation enable?
Privacy
Research
eHelse 2015
21 - 22 April
Oslo
Kassaye Yitbarek Yigzaw
UiT The Arctic University of Norway
Outline
•
•
•
•
•
Background
Introduction
Data utility
De-identification
Secure multi-party computation
21.04.15
Societal benefits vs. privacy aspects of data
reuse
2
1
4/27/2015
Background
• Electronic health data are being widely collected
• Administrative data, e.g. census, survey,
socioeconomic, and registry
• Invaluable resource to improve healthcare
systems´ effectiveness, efficiencies and quality of
care
• Enormous benefits for individuals and society in
general
Jutte DP et al. Administrative Record Linkage as a Tool for Public Health Research. Annual Review of Public
Health 2011
Geissbuhler A et al. Trustworthy reuse of health data: A transnational perspective. International Journal of
Medical Informatics 2013
21.04.15
Societal benefits vs. privacy aspects of data
reuse
3
Data reuse opportunities
• Why are healthcare costs increasing?
• What are the comparative benefits and risks of
prescription drugs?
• What is the evidence base for procedures?
• What explains variation in health care spending and
use?
• How do environmental factors affect disease
patterns?
• How can the health of minorities and special needs
groups be improved?
• What does this mean for patients like me?
Slide borrowed from “Michael G. Kahn. Learning Health Systems From Concept to National Deployment.
IEEE BHI2014 Conference”
21.04.15
Societal benefits vs. privacy aspects of data
reuse
4
2
4/27/2015
Introduction
• Individuals privacy concerns are the main challenge
• Healthcare institutions are also concerned about
their own privacy1
• Most ethical and legal regulations allow data reuse
through:
 Informed consent
 Consent waiver by ethics committee (e.g. REK) (under
certain conditions)
 Data de-identification
• Secure multi-party computation (SMC)
1El
Emam K, et al. Physician privacy concerns when disclosing patient data for public health purposes during a pandemic
influenza outbreak. BMC Public Health 2011;11:454.
21.04.15
Societal benefits vs. privacy aspects of data
reuse
5
Increase privacy protection and data utility
Privacy
Research
21.04.15
Societal benefits vs. privacy aspects of data
reuse
6
3
4/27/2015
Data utility
• Data utility is the value of a given data for research
• The data being available for research
• Analytical completeness is possible required
analyses that can be done on the data
• Analytical validity is accuracy and generalizability of
analyses results
21.04.15
Societal benefits vs. privacy aspects of data
reuse
7
Informed consent
Everybody
Systematic bias
Consented subjects
Potential
subjects
• Decreased generalizability of analyses results to the
general population
Bohensky MA et al. Data Linkage: A powerful research tool with potential problems. BMC Health Services Research 2010
21.04.15
Societal benefits vs. privacy aspects of data
reuse
8
4
4/27/2015
Informed consent
cont…
• Consent requires infeasible time and money for
large public health studies
• Consent alone protects autonomy, but does not
guarantees that released data will remain private1
1Taylor
P. Personal genomes: when consent gets in the way. Nature 2008
21.04.15
Societal benefits vs. privacy aspects of data
reuse
9
De-identification
• Data de-identification methods remove or modify
identifiers
• The aim is to prevent identity disclosure
• Often, there is probability of assigning correct
identity (re-identification) to records
• The more quasi-identifiers are removed or modified
 Less probability of re-identification
 Less analytical completeness or validity
21.04.15
Societal benefits vs. privacy aspects of data
reuse
10
5
4/27/2015
cont…
De-identification
• HIPAA safe harbor removes 18 identifiers
• Limited dataset removes 16 of the 18 identifiers,
except dates and some geographical data
• Safe harbor has less analytical completeness, e.g.
association between treatments and health
outcomes1
• Limited dataset has more probability of reidentification2
1Nass
SJ et al. Beyond the HIPAA Privacy Rule: Enhancing privacy, improving health through research.
National Academies Press; 2009
K et al. Evaluating re-identification risks with respect to the HIPAA privacy rule. JAMIA 2010
Societal benefits vs. privacy aspects of data
reuse
11
cont…
Probability of
re-identification
Non-public
Data utility
µ
Growth of public data
Probability of
re-identification
De-identification
µ
µ
control
on data recipient
21.04.15
Acceptable probability
of re-identification
2Benitez
Public
Ohm P. Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization. Rochester, NY: Social Science
Research Network; 2009
Emam KE at al. Anonymising and sharing individual patient data. BMJ 2015.
Benitez K et al. Evaluating re-identification risks with respect to the HIPAA privacy rule. JAMIA 2010
21.04.15
Societal benefits vs. privacy aspects of data
reuse
12
6
4/27/2015
De-identified data sharing
Lab
Hospital
Data
Data
Third Party
• Centralized storage
• Distributed storage
GP
Data
21.04.15
Societal benefits vs. privacy aspects of data
reuse
13
Secure multi-party computation (SMC)
Lab
Hospital
Data
Data
Secure multi-party
computation emulate
the third party
GP
Data
Lindell Y, et al. Secure multiparty computation for privacy-preserving data mining.
Journal of Privacy and Confidentiality 2009
21.04.15
Societal benefits vs. privacy aspects of data
reuse
14
7
4/27/2015
Secure multi-party computation (SMC)
• SMC ensures that no more private information is
revealed beyond the computation output
• Each data custodian has the capability of
determining what/who compute on their data
21.04.15
Societal benefits vs. privacy aspects of data
reuse
15
Secure summation protocol
 Individual value remain private
 The summation of an
institution’s values is also private
r2 = r1 + x2
n2
r3 = r2 + x3
Id
x3 = 40+52+68
x
1
52
2
58
3
70
{r2}
{r1}
n3
n1
x
7
40
8
52
9
Id
{r0}
{r3}
68
nc
x2 = 52+58+70
r1 = r0 + x1
Id
x
4
45
5
65
6
70
x1 = 45+65+70
r0 = rand()
SUM = (r3 – r0)
Andersen A, Yigzaw KY, Karlsen R. Privacy preserving health data processing. IEEE Healthcom, 2014
21.04.15
Societal benefits vs. privacy aspects of data
reuse
16
8
4/27/2015
Example applications
• SMC protocol for disease surveillance1
• SMC protocol for logistic regression tested for
detection of adverse drug events2
1El
2El
Emam K et al. A secure protocol for protecting the identity of providers when disclosing data for disease surveillance
JAMIA 2011
Emam K et al. A secure distributed logistic regression protocol for the detection of rare adverse drug events.
JAMIA 2013
21.04.15
Societal benefits vs. privacy aspects of data
reuse
17
History
• The SMC concept is pioneered by Yao (1982)
• Most SMC protocols were designed to show
feasibility
• During the last decade better SMC protocols and
implementations started to appear
• SMC protocols vary with privacy guaranty, efficiency
and scalability
• Strong privacy is often achieved using more complex
techniques, which are less efficient and scalable
Bogdanov D. Sharemind: programmable secure computations with practical applications. PhD Thesis.
Tartu University, 2013.
21.04.15
Societal benefits vs. privacy aspects of data
reuse
18
9
4/27/2015
Discussion
• SMC techniques do not modify or remove data
attributes and do not have selection bias, therefore
data utility is not affected
• More efficient and scalable techniques are being
developed
• Protects the privacy of both individuals and health
institutions
• Enable health institutions to maintain strong control
over their private data
• These could encourage more individuals and
institutions to allow data reuse
21.04.15
Societal benefits vs. privacy aspects of data
reuse
19
Acknowledgement
• PhD supervisors (Johan Gustav Bellika, Anders
Andersen, and Gunnar Hartvigsen)
• Meskerem Asfaw Hailemichael
• Tromsø Telemedicine Laboratory (TTL)
• UiT The Arctic University of Norway
• Norwegian Center for Telemedicine and Integrate
Care (NST)
21.04.15
Societal benefits vs. privacy aspects of data
reuse
20
10
4/27/2015
Kassaye Yitbarek Yigzaw
PhD candidate
UiT The Arctic University of Norway
kassaye.y.yigzaw@uit.no
21.04.15
Societal benefits vs. privacy aspects of data
reuse
21
http://www.panoramio.com/photo/10889343
11