Part II – Introduction to SILC
Data Structure and Documentation
DwB Training Course on EU-SILC Longitudinal data
Paris, 19-21 February 2014
Heike Wirth
Aims of this session
• Introduce the rotational design
• Explain the concept of the selected respondent
• Explain the organisation of the data
• Point out some reading: Documents of priority
Illustration of the rotational design
Rotational design – Illustration [figure: initial sample, 2006]
Rotational design – Illustration cross-sectional [figure: 2006]
Rotational design – Illustration longitudinal [figure: from 2006; e.g. longitudinal data 2011]
Rotational design – empirical
Not equivalent to the number of years of participation
tab DB075 HHYNR
tab HHYNR YEAR
tab HHYCOUNT HHYNR
HHYNR (= number of household-years) and HHYCOUNT (= count of household-years) are not included in the data; both must be created.
Source: UDB_l11D_ver 2011-1 from 01-08-2013.dta; own calculations
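Since HHYNR and HHYCOUNT have to be created by the user, the following Python sketch shows one possible way to derive them. It rests on assumptions that are not stated on the slides: HHYNR is read as the total number of years in which a household appears in the household register, HHYCOUNT as the running index of each household-year (1 for the first observed year), and DB010, DB020 and DB030 are assumed to be the year, country and household ID variables of the D-file. The rows are invented for illustration.

```python
from collections import defaultdict

# Hypothetical extract of the household register (D-file): one row per household-year.
# DB010 = year, DB020 = country, DB030 = household ID are assumed variable names.
d_file = [
    {"DB010": 2009, "DB020": "B", "DB030": 40017100},
    {"DB010": 2010, "DB020": "B", "DB030": 40017100},
    {"DB010": 2011, "DB020": "B", "DB030": 40017100},
    {"DB010": 2010, "DB020": "A", "DB030": 40017100},
    {"DB010": 2011, "DB020": "A", "DB030": 40017100},
]


def add_household_year_counters(rows):
    """Return copies of the rows with HHYNR and HHYCOUNT added.

    HHYNR    = number of years in which the household is observed (assumed meaning)
    HHYCOUNT = position of the current year within those years, starting at 1
    Assumes each household appears at most once per year.
    """
    years_by_household = defaultdict(list)
    for row in rows:
        years_by_household[(row["DB020"], row["DB030"])].append(row["DB010"])

    result = []
    for row in rows:
        years = sorted(years_by_household[(row["DB020"], row["DB030"])])
        enriched = dict(row)
        enriched["HHYNR"] = len(years)
        enriched["HHYCOUNT"] = years.index(row["DB010"]) + 1
        result.append(enriched)
    return result


counted = add_household_year_counters(d_file)
for row in counted:
    print(row)
```

Households are identified by country plus household ID, because the same household ID can occur in more than one country.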
Observation Units
Concept of the selected respondent
Selected respondent
Collection unit/data source by type of information

Type of information | Observation unit | Survey countries | Register countries
Social exclusion, housing, childcare … | Household (HH) | HH respondent | Registers/HH-R
Basic demographic personal data | All HH members | HH respondent | Registers/HH-R
Basic personal data on education, labour information, income … | All HH members aged 16+ | All HH members aged 16+ | Registers/HH-R
Detailed personal data on health, access to health care, labour market activity … | All HH members aged 16+ or selected respondent | All HH members aged 16+ | Selected respondent (one person 16+ per household)
Example: PH030 – Limitation in activities because of health problems (register countries): (mainly) not selected respondents (see PH030_F)
Source: UDB_l11P_ver 2011-1 from 01-08-2013.dta
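A minimal Python sketch of how the flag variable might be used to separate selected respondents from other household members. The flag code -3 for "not selected respondent" is an assumption and should be checked against the description of PH030_F in the Guidelines. The rows and the `pers_id` field are invented for illustration.

```python
from collections import Counter

# Assumed flag code for "not selected respondent"; verify against PH030_F in the Guidelines.
NOT_SELECTED = -3

# Hypothetical extract of the personal data (P-file) for a register country.
p_file = [
    {"pers_id": 1, "PH030": 3, "PH030_F": 1},
    {"pers_id": 2, "PH030": None, "PH030_F": -3},
    {"pers_id": 3, "PH030": None, "PH030_F": -3},
    {"pers_id": 4, "PH030": None, "PH030_F": -1},
]

# Tabulate the flag first to see why PH030 is missing ...
flag_counts = Counter(row["PH030_F"] for row in p_file)

# ... then keep only persons who were eligible to answer the question.
selected = [row for row in p_file if row["PH030_F"] != NOT_SELECTED]

print(dict(flag_counts))
print([row["pers_id"] for row in selected])
```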
Organisation of the data
EU-SILC consists of 4 separate files for the cross-sectional data
• Household Register file
• Household Data file
• Personal Register file
• Personal Data file
… and of the same 4 separate files for the longitudinal data
Household Files – longitudinal

Household Register (D-File)
• Includes every selected household (also those where the address could not be contacted or which could not be interviewed)
• > 19 variables: household identifier, sampling design information, region
• UDB_l11D_ver 2011-1 from 01-08-2013: N = 542 942 households

Household Data (H-File)
• Only households which have been contacted and completed a hh interview and in which at least one hh member has complete data in the personal data file
• > 180 variables (incl. flag variables & imputation factors): basic data, social exclusion, income, housing
• UDB_l11H_ver 2011-1 from 01-08-2013: N = 411 189 households
Personal Files – longitudinal

Personal Register (R-File)
• Every person currently living in the hh or temporarily absent. Longitudinal file: also persons registered in the R-File of the previous year or living at least 3 months in the hh during the income reference period.
• > 50 variables (incl. flag variables): basic information, e.g. relationship between household members
• UDB_l11R_ver 2011-1 from 01-08-2013: N = 1 079 261 persons

Personal Data (P-File)
• Only reference population (persons aged 16 and over) and only persons for whom the information could be completed by interview (personal/proxy) and/or register
• > 190 variables (incl. flag variables & imputation factors): e.g. demographic, income, work and unemployment
• UDB_l11P_ver 2011-1 from 01-08-2013: N = 879 720 persons
Depending on the research question:
• Use of separate datasets (Household Register, Personal Register, Household Data, Personal Data)
• … or a combination of different datasets
Organisation of the data
Within the cross-sectional data and within the longitudinal data, all 4 files (Household Register, Household Data, Personal Register, Personal Data) are linkable with each other. Cross-sectional and longitudinal data are not linkable with each other.
Cross-sectional data are also not linkable over time, from t to t+1 (HH-ID and related identification variables are randomized).
Organisation of the data
… combine different datasets – Key Variables
• In order to link (combine) the four files D, H, R and P with each other, every observation must have a unique link to the respective three other files
• This link is achieved by the following 4 key variables
(1) Year of Survey
(2) Country
(3) Household ID
(4) Personal ID
… combine different datasets – Key Variables per link
• Household Register and Household Data: Year of Survey, Country, Household ID
• Household files and Personal Register: Year of Survey, Country, Household ID
• Personal Register and Personal Data: Year of Survey, Country, Household ID, Personal ID
Organisation of the data
Household ID – Personal ID
• Household ID
  • Cross-sectional (max. 6 digits) = hh number 1-999999
  • Longitudinal (max. 8 digits) = hh number 1-999999 + split number; default split number = 00
• Personal ID
  • Cross-sectional = hh-id + personal number (max. 2 digits)
  • Longitudinal = hh number + default split number (00) + personal number
In the longitudinal survey the Personal ID never changes, even if the person moves to a different household.
In the cross-sectional survey, the Household ID and Personal ID may change from year to year.
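The composition of the longitudinal IDs can be sketched in Python as follows. The sketch assumes that the IDs are stored as integers and that both the split number and the personal number occupy exactly two digits, which is inferred from the digit counts given above (8 versus 6 digits for the household ID, max. 2 digits for the personal number). The function names are illustrative only.

```python
def split_household_id(hh_id):
    """Split a longitudinal household ID into (hh number, split number).

    Assumes the last two digits are the split number.
    """
    return hh_id // 100, hh_id % 100


def split_personal_id(pers_id):
    """Split a longitudinal personal ID into (hh number, split number, personal number).

    Assumes the last two digits are the personal number, preceded by a
    two-digit split number.
    """
    personal_number = pers_id % 100
    hh_number, split_number = split_household_id(pers_id // 100)
    return hh_number, split_number, personal_number


# A split-off household and a person who moved into it.
print(split_household_id(40017101))   # (400171, 1)
print(split_personal_id(4001710003))  # (400171, 0, 3)
```

Because the personal ID keeps the default split number 00, it still points to the original household number after a move, while the current household ID carries the new split number.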
The 4 key variables – illustration (longitudinal data)
year  country  hh_id     pers_id     year of birth
2010  A        40017100  4001710001  1937
2010  A        40017100  4001710002  1939
2011  A        40017100  4001710001  1937
2011  A        40017100  4001710002  1939
2009  B        40017100  4001710001  1953
2009  B        40017100  4001710002  1956
2009  B        40017100  4001710003  1982
2009  B        40017100  4001710004  1984
2009  B        40017100  4001710005  1985
2010  B        40017100  4001710001  1953
2010  B        40017100  4001710002  1956
2010  B        40017100  4001710003  1982
2010  B        40017100  4001710004  1984
2010  B        40017100  4001710005  1985
2010  B        40017101  4001710003  1982
2010  B        40017101  4001710004  1984
2011  B        40017100  4001710001  1953
2011  B        40017100  4001710002  1956
2011  B        40017100  4001710005  1985
2011  B        40017101  4001710002  1956
2011  B        40017101  4001710003  1982
2011  B        40017101  4001710004  1984
Combining information from two separate files at a 1:1 level
Combined data
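A 1:1 combination could look like the following Python sketch, here for the household register and the household data, matched on year, country and household ID. The field names `year`, `country`, `hh_id` and `region`, the region codes and the helper names are illustrative assumptions, not the variable names of the UDB files.

```python
# Hypothetical extracts: household register (D) and household data (H),
# each with at most one row per year, country and household ID.
d_file = [
    {"year": 2010, "country": "a", "hh_id": 6800, "region": "X1"},
    {"year": 2011, "country": "a", "hh_id": 6800, "region": "X1"},
    {"year": 2011, "country": "b", "hh_id": 6800, "region": "Y2"},
]
h_file = [
    {"year": 2010, "country": "a", "hh_id": 6800, "HX080": 0},
    {"year": 2011, "country": "a", "hh_id": 6800, "HX080": 0},
]


def household_key(row):
    """The three key variables that identify a household-year."""
    return (row["year"], row["country"], row["hh_id"])


def merge_one_to_one(left, right):
    """Left-join two files in which every key occurs at most once."""
    for rows in (left, right):
        keys = [household_key(row) for row in rows]
        if len(keys) != len(set(keys)):
            raise ValueError("key variables are not unique, not a 1:1 merge")
    right_by_key = {household_key(row): row for row in right}
    return [{**row, **right_by_key.get(household_key(row), {})} for row in left]


combined = merge_one_to_one(d_file, h_file)
for row in combined:
    print(row)
```

Households that appear only in the register (for example because no interview was completed) keep their register row without the household data variables.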
Combining information from two separate files at a 1:n level
Combined data
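For the 1:n case, one household row is attached to every person of that household. The following Python sketch uses the 2011 rows of household 7000 in country c from the example table in this deck. The field names `year`, `country`, `hh_id`, `pers_id` and the function name are illustrative assumptions.

```python
# Household-level file: one row per household-year (values from the example table).
h_file = [
    {"year": 2011, "country": "c", "hh_id": 7000, "HX080": 1},
]

# Person-level file: several rows per household-year; RX010 = age.
r_file = [
    {"year": 2011, "country": "c", "hh_id": 7000, "pers_id": 700001, "RX010": 45},
    {"year": 2011, "country": "c", "hh_id": 7000, "pers_id": 700002, "RX010": 37},
    {"year": 2011, "country": "c", "hh_id": 7000, "pers_id": 700003, "RX010": 5},
    {"year": 2011, "country": "c", "hh_id": 7000, "pers_id": 700004, "RX010": 0},
]


def attach_household_data(persons, households):
    """Copy the household row onto each person with the same year, country and hh_id."""
    households_by_key = {
        (h["year"], h["country"], h["hh_id"]): h for h in households
    }
    merged = []
    for person in persons:
        key = (person["year"], person["country"], person["hh_id"])
        # Person fields win if a name exists in both files.
        merged.append({**households_by_key.get(key, {}), **person})
    return merged


persons_with_hh = attach_household_data(r_file, h_file)
for row in persons_with_hh:
    print(row)
```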
Use of separate sub datasets
Create household level variables from personal level data, e.g.
• number of current household members
• persons < 18 in household
• age of the youngest child in household
• number of unemployed hh-members
• highest educational level in household
• …
Create new household level summary variables from person level information, e.g. household size, number of children, age of youngest child (< 18 years)
hhsize, numchild, ychild: new hh-level variables; HX080: added from hh-data

year  country  hh_id  pers_id  RX010  hhsize  numchild  ychild  HX080
2010  a        6800   680001   36     3       1         17      0
2010  a        6800   680002   35     3       1         17      0
2010  a        6800   680003   17     3       1         17      0
2011  a        6800   680001   36     3       0         .       0
2011  a        6800   680002   36     3       0         .       0
2011  a        6800   680003   18     3       0         .       0
2011  b        6800   680001   69     2       0         .       0
2011  b        6800   680002   73     2       0         .       0
2010  b        7000   700001   80     2       0         .       0
2010  b        7000   700002   80     2       0         .       0
2008  c        7000   700001   42     3       1         2       1
2008  c        7000   700002   34     3       1         2       0
2008  c        7000   700003   2      3       1         2       0
2009  c        7000   700001   43     3       1         3       0
2009  c        7000   700002   35     3       1         3       0
2009  c        7000   700003   3      3       1         3       0
2010  c        7000   700001   44     3       1         4       1
2010  c        7000   700002   36     3       1         4       1
2010  c        7000   700003   4      3       1         4       1
2011  c        7000   700001   45     4       2         0       1
2011  c        7000   700002   37     4       2         0       1
2011  c        7000   700003   5      4       2         0       1
2011  c        7000   700004   0      4       2         0       1
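The household-level summary variables can be derived from the person-level rows roughly as in this Python sketch. It uses a few rows from the example table. The tuple layout and the function name are illustrative assumptions, and a missing youngest-child age is represented by `None` instead of ".".

```python
from collections import defaultdict

# (year, country, hh_id, pers_id, RX010) taken from the example table; RX010 = age.
r_file = [
    (2010, "a", 6800, 680001, 36),
    (2010, "a", 6800, 680002, 35),
    (2010, "a", 6800, 680003, 17),
    (2011, "a", 6800, 680001, 36),
    (2011, "a", 6800, 680002, 36),
    (2011, "a", 6800, 680003, 18),
    (2011, "c", 7000, 700001, 45),
    (2011, "c", 7000, 700002, 37),
    (2011, "c", 7000, 700003, 5),
    (2011, "c", 7000, 700004, 0),
]


def household_summaries(rows, child_age_limit=18):
    """Aggregate person rows to one summary per year, country and household ID."""
    ages_by_household = defaultdict(list)
    for year, country, hh_id, _pers_id, age in rows:
        ages_by_household[(year, country, hh_id)].append(age)

    summaries = {}
    for key, ages in ages_by_household.items():
        child_ages = [age for age in ages if age < child_age_limit]
        summaries[key] = {
            "hhsize": len(ages),                                 # household size
            "numchild": len(child_ages),                         # persons < 18
            "ychild": min(child_ages) if child_ages else None,   # age of youngest child
        }
    return summaries


summaries = household_summaries(r_file)
for key, values in summaries.items():
    print(key, values)
```

The resulting dictionary is keyed by the three household-level key variables, so it can be attached back to the person rows or to the household data in the same way as any other household-level file.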
Some reading – Documents of priority
Guidelines_Doc65_2011.pdf
• General technical information on sample design, weights, etc.
• List of all variables included in the original EU-SILC data base
• Description of (cross-sectional and longitudinal) variables
DIFFERENCES BETWEEN DATA COLLECTED AND UDB.doc
• List of variables removed or added to Userdata Base (UDB)
• Methods of anonymisation
SILC L-2011 UDB PROBLEMS AND MODIFICATIONS.xls
National and EU Quality reports
• http://epp.eurostat.ec.europa.eu/portal/page/portal/income_social_inclusion_living_conditions/quality
Guidelines_Doc65_2011.pdf: examples shown (Source: Guidelines_Doc65_2011.pdf unless noted)
• Flag variable HH020_F
• Flag variable HH021_F
• Cross-sectional data 2011 (Source: UDB_c11H_ver 2011-2 from 01-08-13.dta)
• Longitudinal data 2011: old (HH020), new (HH021) (Source: UDB_l11H_ver 2011-1 from 01-08-2013.dta)
• Example: variable included in the cross-sectional and longitudinal data
• Example: variable included in the cross-sectional data only
• Example: variable included in the longitudinal data only
• Example: selected respondent
DIFFERENCES BETWEEN DATA COLLECTED AND UDB.doc: examples shown
• Differences between data collected and Userdata Base (cross-sectional file)
• Differences between data collected and Userdata Base (longitudinal file)
Source: L2011 DIFFERENCES BETWEEN DATA COLLECTED AND UDB.doc
SILC L-2011 UDB PROBLEMS AND MODIFICATIONS.xls: example shown
Quality reports
• National quality reports
• E.g. Austria: Final Quality Report Relating to the EU-SILC Operation 2007-2010
Source: Austria, Final Quality Report Relating to the EU-SILC Operation 2007-2010, p. 7
THANK YOU 
heike.wirth@gesis.org