Private traits and attributes are predictable from digital records of

Transcription

Private traits and attributes are predictable from digital records of
Private traits and attributes are predictable from
digital records of human behavior
Michal Kosinski et al.
The Psychometrics Centre, University of Cambridge, UK
Computational Social Media EPFL Course
Rémi Lebret
7th March 2014
Introduction
2 / 15
Digital records of individuals behavior
I
People may choose not to reveal certain pieces of information about
their lives.
→ sexual orientation, age, ...
I
This information might be predicted.
→ Predict pregnancies from customer shopping records.
I
It could lead to tragic outcome.
→ Revealing pregnancy of an unmarried woman to her family in a culture
where this is unacceptable.
⇒ Dangerous invasion of privacy.
3 / 15
Predicting individual traits and attributes
From:
I
People’s Web site browsing logs
Contents of personal Web sites
I
Music collections
I
Properties of Facebook or Twitter profiles
I
Language used by users
I
Location within a friendship network at Facebook
→ Predictive of sexual orientation
I
4 / 15
Facebook Likes
Mechanism used by Facebook users to express their positive association with
online content.
I
Very generic class of digital records.
→ Similar to Web search queries, Web browsing history, and credit card
purchases.
I
Currently public available by default.
→ Automatically and accurately estimate personal attributes using these
basic records.
5 / 15
Method
6 / 15
Design of the study
Selection of traits and attributes potentially private:
I
Sexual orientation
I
Ethnic origin
I
Political views
I
Religion
I
Personality
I
Intelligence
I
Satisfaction with life (SWL)
I
Substance use (alcohol, drugs, cigarettes)
I
User’s parents together at 21
I
Basic demographic attributes
7 / 15
Design of the study
Selection of traits and attributes potentially private:
I
Sexual orientation
I
Ethnic origin
I
Political views
I
Religion
I
Personality → International Personality Item Pool questionnaire
I
Intelligence
I
Satisfaction with life (SWL)
I
Substance use (alcohol, drugs, cigarettes)
I
User’s parents together at 21
I
Basic demographic attributes
7 / 15
Design of the study
Selection of traits and attributes potentially private:
I
Sexual orientation
I
Ethnic origin
I
Political views
I
Religion
I
Personality → International Personality Item Pool questionnaire
I
Intelligence → Raven’s Standard Progressive Matrices (SPM)
I
Satisfaction with life (SWL)
I
Substance use (alcohol, drugs, cigarettes)
I
User’s parents together at 21
I
Basic demographic attributes
7 / 15
Design of the study
Selection of traits and attributes potentially private:
I
Sexual orientation
I
Ethnic origin
I
Political views
I
Religion
I
Personality → International Personality Item Pool questionnaire
I
Intelligence → Raven’s Standard Progressive Matrices (SPM)
I
Satisfaction with life (SWL) → SWL scale (5-item instrument)
I
Substance use (alcohol, drugs, cigarettes)
I
User’s parents together at 21
I
Basic demographic attributes
7 / 15
Design of the study
Selection of traits and attributes potentially private:
I
Sexual orientation
I
Ethnic origin
I
Political views → Facebook profiles
I
Religion → Facebook profiles
I
Personality → International Personality Item Pool questionnaire
I
Intelligence → Raven’s Standard Progressive Matrices (SPM)
I
Satisfaction with life (SWL) → SWL scale (5-item instrument)
I
Substance use (alcohol, drugs, cigarettes)
I
User’s parents together at 21
I
Basic demographic attributes → Facebook profiles
7 / 15
Design of the study
Selection of traits and attributes potentially private:
I
Sexual orientation
I
Ethnic origin
I
Political views → Facebook profiles
I
Religion → Facebook profiles
I
Personality → International Personality Item Pool questionnaire
I
Intelligence → Raven’s Standard Progressive Matrices (SPM)
I
Satisfaction with life (SWL) → SWL scale (5-item instrument)
I
Substance use (alcohol, drugs, cigarettes) → Online survey
I
User’s parents together at 21 → Online survey
I
Basic demographic attributes → Facebook profiles
7 / 15
Design of the study
Selection of traits and attributes potentially private:
I
Sexual orientation
I
Ethnic origin → Visual inspection of profile pictures
I
Political views → Facebook profiles
I
Religion → Facebook profiles
I
Personality → International Personality Item Pool questionnaire
I
Intelligence → Raven’s Standard Progressive Matrices (SPM)
I
Satisfaction with life (SWL) → SWL scale (5-item instrument)
I
Substance use (alcohol, drugs, cigarettes) → Online survey
I
User’s parents together at 21 → Online survey
I
Basic demographic attributes → Facebook profiles
7 / 15
Design of the study
Selection of traits and attributes potentially private:
I
Sexual orientation → Facebook profile “Interested in” field
I
Ethnic origin → Visual inspection of profile pictures
I
Political views → Facebook profiles
I
Religion → Facebook profiles
I
Personality → International Personality Item Pool questionnaire
I
Intelligence → Raven’s Standard Progressive Matrices (SPM)
I
Satisfaction with life (SWL) → SWL scale (5-item instrument)
I
Substance use (alcohol, drugs, cigarettes) → Online survey
I
User’s parents together at 21 → Online survey
I
Basic demographic attributes → Facebook profiles
7 / 15
Sample
I
58,466 US Facebook users
I
Psychodemographic profile + list of their Likes
I
From myPersonality Facebook application
8 / 15
Prediction with Facebook Likes
9 / 15
Results
10 / 15
Prediction accuracy
(a) Accuracy of classification for
dichotomous attributes.
(b) Accuracy of regression for
numeric attributes and traits.
11 / 15
Amount of Data Available and Prediction Accuracy
I Between 1 and 700 Likes per individual → median = 68
12 / 15
Selected Most Predictive Likes
IQ
Extraversion
High
The Godfather
Mozart
Lord Of The Rings
Beerpong
Dancing
Cheerleading
Chris Tuker
Theatre
Openness
Outgoing & Active
Liberal & Artistic
Oscar Wilde
Leonardo Da Vinci
Leonard Cohen
Bauhaus
Low
Sephora
I Love Being A Mom
Harley Davidson
Shy & Reserved
Video Games
Programming
Role Playing Games
Manga
Minecraft
Conservative
NASCAR
ESPN2
The Bachelor
Justin Moore
13 / 15
Average Levels for several popular Likes
14 / 15
Average Levels for several popular Likes
14 / 15
Conclusion
I Variety of people’s personal attributes can be inferred using their Facebook
Likes.
+
I Improve numerous products
and services
I Permit assessment across
time to detect trends
I Open new doors for research
in human psychology
I No individual consent, no
notice
I Infer attributes that individual
intends not to share
I Threat to individual’s
well-being, freedom or even life
I Trust in online services could decrease without transparency and control
over users informations.
15 / 15