Newtonbrook Secondary School (Nov. 29, 2007)

“Lies, damned lies and Statistics”
Georges Monette, York University
Newtonbrook Secondary School, November 2007
There are three kinds of lies:
lies, damned lies and statistics
– Benjamin Disraeli, Prime Minister of Great Britain
(1868, 1874-1880)
Newer research suggests Disraeli might have said:
“Lies … damned lies – and Statistics”
Statistical thinking will one day
be as necessary
for efficient citizenship
as the ability
to read and write.
– H. G. Wells
It is easy to lie with statistics.
It is hard to tell the truth without it.
– Andrejs Dunkels
Some examples of the types of
analyses that can lead to lies:
Life expectancy (years), 1998 and 2000

Country | 1998 | 2000
Afghanistan | 46.8 | 45.9
Albania | 68.6 | 71.6
Algeria | 68.9 | 69.7
Andorra | 83.5 | 83.5
Angola | 47.9 | 38.3
Antigua and Barbuda | 71.2 | 70.5
Argentina | 74.1 | 75.1
Armenia | 66.7 | 66.4
Australia | 79.9 | 79.8
Austria | 77.3 | 77.7
Azerbaijan | 63.3 | 62.9
The Bahamas | 74.0 | 71.1
Bahrain | 75.0 | 73.0
Bangladesh | 56.7 | 60.2
Barbados | 74.8 | 73.0
Belarus | 68.3 | 68.0
Belgium | 77.4 | 77.8
Belize | 69.0 | 70.9
Benin | 53.6 | 50.2
Bhutan | 52.3 | 52.4
Bolivia | 60.9 | 63.7
Bosnia and Herzegovina | 63.0 | 71.5
Botswana | 40.1 | 39.3
Brazil | 64.4 | 62.9
Brunei | 71.7 | 73.6
Bulgaria | 72.0 | 70.9
Burkina Faso | 46.1 | 46.7
Burundi | 45.6 | 46.2
Cambodia | 48.0 | 56.5
Cameroon | 51.4 | 54.8
Canada | 79.2 | 79.4
Cape Verde | 70.5 | 68.9
Central African Republic | 46.8 | 44.0
Chad | 48.2 | 50.5
Chile | 75.2 | 75.7
China | 69.6 | 71.4
Colombia | 70.1 | 70.3
Comoros | 60.4 | 60.0
Congo, Republic of the | 47.1 | 47.4
Congo, Democratic Republic of the | 49.3 | 48.8
Costa Rica | 75.9 | 75.8
Cote d'Ivoire | 46.2 | 45.2
Croatia | 73.8 | 73.7
Cuba | 75.6 | 76.2
Cyprus | 76.8 | 76.7
Czech Republic | 74.1 | 74.5
Denmark | 76.3 | 76.5
Djibouti | 51.1 | 50.8
Dominica | 77.8 | 73.4
Dominican Republic | 69.7 | 73.2
Ecuador | 71.8 | 71.1
Egypt | 62.1 | 63.3
El Salvador | 69.7 | 69.7
Equatorial Guinea | 53.9 | 53.6
Eritrea | 55.3 | 55.8
Estonia | 68.5 | 69.5
Ethiopia | 40.9 | 45.2
Fiji | 66.3 | 67.9
Finland | 77.2 | 77.4
France | 78.5 | 78.8
Gabon | 56.5 | 50.1
The Gambia | 53.9 | 53.2
Georgia | 64.8 | 64.5
Germany | 77.0 | 77.4
Ghana | 56.8 | 57.4
Greece | 78.3 | 78.4
Grenada | 71.4 | 64.5
Guatemala | 66.0 | 66.2
Guinea | 46.0 | 45.6
Guinea-Bissau | 49.1 | 49.0
Guyana | 62.8 | 64.0
Haiti | 51.4 | 49.2
Honduras | 65.0 | 69.9
Hungary | 70.8 | 71.4
Iceland | 78.8 | 79.4
India | 62.9 | 62.5
Indonesia | 62.5 | 68.0
Iran | 68.3 | 69.7
Iraq | 66.5 | 66.5
Ireland | 76.2 | 76.8
Israel | 78.4 | 78.6
Italy | 78.4 | 79.0
Jamaica | 75.4 | 75.2
Japan | 80.0 | 80.7
Jordan | 72.8 | 77.4
Kazakhstan | 63.6 | 63.2
Kenya | 47.6 | 48.0
Kiribati | 62.6 | 59.8
Korea, North | 51.3 | 70.7
Korea, South | 74.0 | 74.4
Kuwait | 76.8 | 74.5
Kyrgyz Republic | 63.8 | 63.4
Laos | 53.7 | 53.1
Latvia | 67.1 | 68.4
Lebanon | 70.6 | 71.3
Lesotho | 54.0 | 50.8
Liberia | 59.5 | 51.0
Libya | 65.4 | 75.5
Liechtenstein | 78.0 | 78.8
Lithuania | 68.8 | 69.1
Luxembourg | 77.5 | 77.1
Macedonia | 72.8 | 73.8
Madagascar | 52.9 | 55.0
Malawi | 36.6 | 37.6
Malaysia | 70.4 | 70.8
Maldives | 67.6 | 62.2
Mali | 47.0 | 46.7
Malta | 77.6 | 77.9
Marshall Islands | 64.5 | 65.5
Mauritania | 50.0 | 50.8
Mauritius | 70.9 | 71.0
Mexico | 71.6 | 71.5
Federated States of Micronesia | 68.3 | 68.6
Moldova | 64.3 | 64.5
Monaco | 78.4 | 78.8
Mongolia | 61.5 | 67.3
Morocco | 68.5 | 69.1
Mozambique | 45.4 | 37.5
Myanmar (Burma) | 54.5 | 54.9
Namibia | 41.5 | 42.5
Nauru | 66.7 | 60.8
Nepal | 57.8 | 57.8
Netherlands | 78.0 | 78.3
New Zealand | 77.6 | 77.8
Nicaragua | 66.6 | 68.7
Niger | 41.5 | 41.3
Nigeria | 53.6 | 51.6
Norway | 78.2 | 78.7
Oman | 71.0 | 71.8
Pakistan | 59.1 | 61.1
Palau | 67.5 | 68.6
Panama | 74.5 | 75.5
Papua New Guinea | 58.1 | 63.1
Paraguay | 72.2 | 73.7
Peru | 70.0 | 70.0
Philippines | 66.4 | 67.5
Poland | 72.8 | 73.2
Portugal | 75.7 | 75.8
Qatar | 73.9 | 72.4
Romania | 70.5 | 69.9
Russia | 65.0 | 67.2
Rwanda | 41.9 | 39.3
Saint Kitts and Nevis | 67.6 | 70.7
Saint Lucia | 71.6 | 72.3
Samoa | 69.5 | 69.2
San Marino | 81.4 | 81.1
Sao Tome and Principe | 64.3 | 65.3
Saudi Arabia | 70.0 | 67.8
Senegal | 57.4 | 62.2
Serbia | n/a | 72.4
Seychelles | 70.8 | 70.4
Sierra Leone | 48.6 | 45.3
Singapore | 78.5 | 80.1
Slovakia | 73.2 | 73.7
Slovenia | 75.2 | 74.9
Solomon Islands | 71.8 | 71.3
Somalia | 46.2 | 46.2
South Africa | 55.5 | 51.1
Spain | 77.6 | 78.8
Sri Lanka | 72.6 | 71.8
Sudan | 56.0 | 56.6
Suriname | 70.6 | 71.4
Swaziland | 38.5 | 40.4
Sweden | 79.2 | 79.6
Switzerland | 78.9 | 79.6
Syria | 67.8 | 68.5
Taiwan | 76.8 | 76.4
Tajikistan | 64.5 | 64.1
Tanzania | 46.4 | 52.3
Thailand | 69.0 | 68.6
Togo | 58.8 | 54.7
Tonga | 69.5 | 67.9
Trinidad and Tobago | 70.5 | 68.0
Tunisia | 73.1 | 73.7
Turkey | 72.8 | 71.0
Turkmenistan | 61.3 | 60.9
Tuvalu | 63.9 | 66.3
Uganda | 42.6 | 42.9
Ukraine | 65.8 | 66.0
United Arab Emirates | 74.9 | 74.1
United Kingdom | 77.2 | 77.7
United States | 76.1 | 77.1
Uruguay | 75.5 | 75.2
Uzbekistan | 64.1 | 63.7
Vanuatu | 61.0 | 60.6
Venezuela | 72.7 | 73.1
Vietnam | 67.7 | 69.3
Yemen | 59.5 | 59.8
Zambia | 37.1 | 37.2
Zimbabwe | 39.2 | 37.8
Annual consumption of cigarettes per capita

Country | Cigarette consumption
Hungary | 2515
Japan | 2510
USA | 2020
South Africa | 1950
UK | 1700
France | 1690
USSR | 1650
Brazil | 1200
Philippines | 1150
Venezuela | 950
Zaire | 150
India | 100
[Figure (two slides): scatter plot of life expectancy (years, axis 50 to 80) against cigarettes per capita (axis 0 to 2,500), one point per country]
Increasing cigarette consumption by 1,000 cigarettes per capita per year
corresponds to an increase in life expectancy of 6.8
years.
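A slope like "6.8 years per 1,000 cigarettes" is what an ordinary least-squares line through the country points produces. A small Python sketch of that calculation follows; the function name `ols_slope` and the sample numbers are hypothetical, for illustration only, and are not the data behind the slide.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y on x.

    slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical (cigarettes per capita, life expectancy) pairs, NOT the slide's data
cigarettes = [100, 900, 1700, 2500]
life_expectancy = [55.0, 62.0, 68.0, 74.0]

# Rescale to "years per 1,000 cigarettes per capita per year"
years_per_1000 = 1000 * ols_slope(cigarettes, life_expectancy)
print(f"{years_per_1000:.1f} years per 1,000 cigarettes")
```

The slope only summarizes how the countries line up against each other; by itself it says nothing about what would happen to any one person who starts smoking.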
All it takes is 3 cigarettes a day
to add 7 years to your life!
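The "3 cigarettes a day" headline is just the fitted slope pushed through some arithmetic: 3 a day is 1,095 a year, and 1.095 × 6.8 ≈ 7.4 years. A Python sketch of that deliberately naive extrapolation, using only the slope quoted on the slide (the constant names are mine):

```python
CIGS_PER_DAY = 3
DAYS_PER_YEAR = 365
SLOPE_YEARS_PER_1000 = 6.8  # fitted slope quoted on the slide

# 3 a day for a year
cigs_per_year = CIGS_PER_DAY * DAYS_PER_YEAR

# Naively read the between-country slope as an effect on one person
naive_gain = cigs_per_year / 1000 * SLOPE_YEARS_PER_1000

print(f"{cigs_per_year} cigarettes/year -> {naive_gain:.1f} 'extra' years")
# prints: 1095 cigarettes/year -> 7.4 'extra' years
```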
[Figure (two slides): the same scatter plot with points labeled by country, first Japan, Hungary, India, Zaire and the US, then also France, UK, Philippines, Russia, Brazil, South Africa and Venezuela]
Maybe it isn’t smoking that’s responsible for
higher life expectancies.
Correlation is not necessarily causation.
Two solutions:
Solution 1: (the ideal solution)
Experiment
Correlation is not necessarily causation
unless you are analyzing an experiment
Sir Ronald A. Fisher
laid the foundations of
Experimental Design
ca. 1925 to 1940
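What lets an experiment support causal conclusions is random assignment, one of Fisher's central ideas: because chance alone decides who gets the treatment, other variables (measured or not) tend to balance out between the groups, so a difference in outcomes can be attributed to the treatment. A minimal Python sketch of a randomized split; the function `randomize` and the subject labels are hypothetical.

```python
import random


def randomize(units, seed=None):
    """Randomly split units into a treatment group and a control group.

    With an odd number of units, the control group gets the extra one.
    """
    rng = random.Random(seed)  # seeded so the assignment can be reproduced
    shuffled = list(units)
    rng.shuffle(shuffled)      # chance, not the experimenter, decides the order
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


subjects = [f"subject_{i}" for i in range(10)]  # hypothetical subjects
treatment, control = randomize(subjects, seed=2007)
print("treatment:", treatment)
print("control:  ", control)
```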
Solution 2: (the usual solution)
Use observational data with care
Often we can’t afford to experiment:
- too costly
- too risky
- too long
- data already on hand
We use observational data and try to control for
the possible effects of other variables by measuring
them and using statistical controls,
e.g. with stratification or multiple regression
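To see why measuring a third variable matters, here is a small simulation (hypothetical data, assuming NumPy is available). A variable `z` drives both `x` and `y`, while `x` has no effect on `y` at all. Regressing `y` on `x` alone gives a strong slope; adding `z` in a multiple regression makes it vanish.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical simulation: z influences both x and y; x has NO effect on y.
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)
y = 2.0 * z + 0.5 * rng.normal(size=n)


def fit(*columns):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    design = np.column_stack([np.ones(n), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


naive = fit(x)[1]        # close to 1.6: x looks strongly related to y
adjusted = fit(x, z)[1]  # close to 0: the association disappears once z is controlled
print(f"slope of y on x alone: {naive:.2f}; controlling for z: {adjusted:.2f}")
```

Stratification does the same job differently, by comparing `x` and `y` only within groups that share similar values of `z`. Either way, you can only control for variables you thought of and measured.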
‘Correlation is not causation
but sometimes it’s a darn good hint of causation’
– Anon.
Problem: Knowing when
Caution:
Taking a hard line “correlation is not causation”
may be as problematic as seeing causation in every
correlation.
R. A. Fisher, the father of experimental design,
delayed public action against smoking in the 1950s
when he sided with the tobacco companies,
objecting that the evidence against smoking was
only observational.
For many important issues, we only have
observational data.
We need to develop better concepts to assess
observational evidence wisely.
This is a major challenge for modern Statistics.
New York Times, November 18, 2007
Another common way of thinking
that can mislead:
[Figure (built up over several slides): life expectancy (years, axis 60 to 100) by year, 1998 to 2006, for Canada, the US, China, and Bosnia and Herzegovina]
So:
To have a very long life,
move to Bosnia-Herzegovina
Sometimes you
should use the last
value.
Sometimes the
best prediction
comes from
projecting the
trend.
Usually:
something in
between.
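The three options can be written as one formula: a weighted blend of the last value and the straight-line trend. The sketch below applies it to Bosnia and Herzegovina's figures from the life expectancy table (63.0 in 1998, 71.5 in 2000), projecting six years ahead to 2006. The function name and the weight of 0.5 for "something in between" are illustrative assumptions, not a recommended method.

```python
def forecast(y_start, y_end, span_years, years_ahead, weight):
    """Blend 'use the last value' (weight=0) with 'project the trend' (weight=1).

    The trend is the straight line through two observations span_years apart.
    """
    last = y_end
    trend = y_end + (y_end - y_start) / span_years * years_ahead
    return (1 - weight) * last + weight * trend


# Bosnia and Herzegovina: 63.0 (1998) and 71.5 (2000), projected to 2006
for w in (0.0, 0.5, 1.0):
    print(w, forecast(63.0, 71.5, span_years=2, years_ahead=6, weight=w))
# prints:
# 0.0 71.5
# 0.5 84.25
# 1.0 97.0
```

Projecting the trend promises 97 years by 2006, a "very long life" indeed, while the last value gives 71.5. Choosing the weight is the hard part: it depends on how much of the recent change you believe will persist.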
Do statistics lie?
OR is it misuse of statistics that lies?
Statistics ≠ calculations
Statistics = calculations + statistical reasoning
If you take away reasoning, you’re not really doing
statistics
Statistics done right ≠ lies
Why I like being a statistician:
John W. Tukey:
“The best thing about
being a statistician is
that you get to play in
everyone's backyard.”
Some practical advice from a
statistician
– A few things I’ve learned recently
Gambling is OK
if you own the casino
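The reason is expected value. Casino bets are priced so that the player's expected gain per bet is negative; over thousands of bets, the casino's average profit per bet settles close to that expected value. As an assumed illustration (roulette is not on the slides), a single-number roulette bet pays 35 to 1:

```python
from fractions import Fraction


def expected_value(p_win, net_payout, stake=1):
    """Expected gain per bet: win net_payout with probability p_win, else lose stake."""
    return p_win * net_payout - (1 - p_win) * stake


# Assumed example: single-number roulette bet paying 35 to 1
american = expected_value(Fraction(1, 38), 35)  # 38 pockets (0 and 00)
european = expected_value(Fraction(1, 37), 35)  # 37 pockets (single 0)
print(american, european)  # prints: -1/19 -1/37
```

On average the player loses about 5.3 cents per dollar bet on an American wheel and about 2.7 cents on a European one. That is the casino's side of the same calculation.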
If you are a mathematician
don’t drive a motorcycle.
But if you’re an English major
it’s not as bad.
Don’t use a cell phone
while you drive
Observational data with a very clever
analysis:
Redelmeier and Tibshirani found a
clever solution to this problem:
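Their published study (New England Journal of Medicine, 1997) used a case-crossover design: each driver who had a collision serves as his or her own control, comparing phone use in the minutes just before the crash with phone use during a comparable period on an earlier day. With one control period per driver, the simplest odds-ratio estimate uses only the discordant drivers: those who phoned only before the crash, divided by those who phoned only in the control period. The sketch below shows that simplified calculation; the function name and the counts are hypothetical, and the published analysis was more involved.

```python
def matched_pair_odds_ratio(pairs):
    """Conditional odds ratio for 1:1 matched (case-crossover) data.

    Each pair is (phoned_in_hazard_window, phoned_in_control_window).
    Drivers exposed in both windows, or in neither, carry no information.
    """
    b = sum(1 for hazard, control in pairs if hazard and not control)
    c = sum(1 for hazard, control in pairs if control and not hazard)
    if c == 0:
        raise ValueError("no drivers exposed only in the control window")
    return b / c


# Hypothetical drivers, NOT the study's data
pairs = ([(True, False)] * 20 + [(False, True)] * 5
         + [(True, True)] * 3 + [(False, False)] * 72)
print(matched_pair_odds_ratio(pairs))  # prints: 4.0
```

Because each driver is compared with himself or herself, fixed traits such as age, driving skill or habits cancel out, which is what makes this observational analysis so clever.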
Phoning vs drinking:
Politics may be
good for your health
[Figure: effect on log distress by year, 1994 to 1998, for four groups: ROC, Qc French Only, Qc French Multi, Qc Other]
Politics may also be bad for your
health
depending on where you live, what
languages you speak
and on the outcome of the next
referendum
A Newtonbrook alumnus who might be worth a
project in Statistics or Probability:
Howie Mandel
Let’s Make a Deal
If you like doing statistics
you can be certified!
The Statistical Society of Canada now
offers the “A.Stat.” accreditation
for graduates who have taken
specified statistics and other courses
Two levels:
• Professional (P.Stat.)
• Associate (A.Stat.)
What Statisticians do:
• Health and Medicine
• Finance, Banking, Insurance
• Business and Industry
• Education
• Government
Health and Medicine
Biostatistics
Clinical Trials
Drug Monitoring
Epidemiology
Genetics
Pharmaceutical research
Public Health
Business and Industry
Actuaries for Insurance and Pensions
Agriculture
Banking: e.g. methods to assess risk
Chemistry
Computer Science
Economics
Finance
Manufacturing
Market Research
Quality Improvement and Reliability
Government:
Statistics Canada
Environment
Forestry
Government Regulation
Law
National Defense
Population Research
Risk Assessment
Watch:
www.ssc.ca
A highly recommended book:
Thank you
Georges Monette
georges@yorku.ca
http://www.math.yorku.ca/~georges
Notes:
R. A. Fisher and Tobacco:
R. A. Fisher and the Role of a Statistical Consultant
J. H. Bennett
Journal of the Royal Statistical Society. Series A (Statistics in Society), Vol. 154, No. 3 (1991), pp. 443-445
doi:10.2307/2983153
Extracts from R. A. Fisher's letters referring to the responsibilities of statistical consultants are considered along with his view of his own role as a scientific consultant to the Tobacco Manufacturers' Standing Committee in the late 1950s. Contrary to a recent suggestion that Fisher may have been ‘misrepresenting data on lung cancer while acting as an adviser to the tobacco industry’, his letters show that he was very deeply concerned about the possible misrepresentation to consumers of an alleged statistical result. Further, Fisher believed that it is ‘only by giving students the opportunity of making fine distinctions in the logic of the subject that they can learn to recognize the difference between honest and dishonest work in statistical practice’.
When Genius Errs: R. A. Fisher and the Lung Cancer Controversy
Paul D. Stolley
Clinical Epidemiology Unit, University of Pennsylvania School of Medicine, 220-L Nursing Education Building, Philadelphia, PA 19104-6095
American Journal of Epidemiology, Vol. 133, No. 5 (1991), pp. 416-425
R. A. Fisher's work on lung cancer and smoking is critically reviewed. The controversy is placed in the context of his career and personality. Although Fisher made invaluable contributions to the field of statistics, his analysis of the causal association between lung cancer and smoking was flawed by an unwillingness to examine the entire body of data available and prematurely drawn conclusions. His views may also have been influenced by personal and professional conflicts, by his work as a consultant to the tobacco industry, and by the fact that he was himself a smoker.
Scraps:
Some inferences need tags attached:
CAUTION
THIS INFERENCE WAS BASED ON OBSERVATIONAL
DATA AND MUST BE REVISED WHEN BETTER
EVIDENCE BECOMES AVAILABLE
DO NOT DETACH THIS TAG UNDER PENALTY OF LAW