Newtonbrook Secondary School (Nov. 29, 2007)
Transcription
“Lies, damned lies and Statistics”
Georges Monette, York University
Newtonbrook Secondary School, November 2007

There are three kinds of lies: lies, damned lies and statistics
– Benjamin Disraeli, Prime Minister of Great Britain (1868, 1874-1880)

Newer research suggests Disraeli might have said: “Lies … damned lies – and Statistics”

Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.
– H. G. Wells

It is easy to lie with statistics. It is hard to tell the truth without it.
– Andrejs Dunkels

Some examples of the types of analyses that can lead to lies:

Country: Life Expectancy in 1998, Life Expectancy in 2000
Afghanistan: 46.8, 45.9
Albania: 68.6, 71.6
Algeria: 68.9, 69.7
Andorra: 83.5, 83.5
Angola: 47.9, 38.3
Antigua and Barbuda: 71.2, 70.5
Argentina: 74.1, 75.1
Armenia: 66.7, 66.4
Australia: 79.9, 79.8
Austria: 77.3, 77.7
Azerbaijan: 63.3, 62.9
The Bahamas: 74.0, 71.1
Bahrain: 75.0, 73.0
Bangladesh: 56.7, 60.2
Barbados: 74.8, 73.0
Belarus: 68.3, 68.0
Belgium: 77.4, 77.8
Belize: 69.0, 70.9
Benin: 53.6, 50.2
Bhutan: 52.3, 52.4
Bolivia: 60.9, 63.7
Bosnia and Herzegovina: 63.0, 71.5
Botswana: 40.1, 39.3
Brazil: 64.4, 62.9
Brunei: 71.7, 73.6
Bulgaria: 72.0, 70.9
Burkina Faso: 46.1, 46.7
Burundi: 45.6, 46.2
Cambodia: 48.0, 56.5
Cameroon: 51.4, 54.8
Canada: 79.2, 79.4
Cape Verde: 70.5, 68.9
Central African Republic: 46.8, 44.0
Chad: 48.2, 50.5
Chile: 75.2, 75.7
China: 69.6, 71.4
Colombia: 70.1, 70.3
Comoros: 60.4, 60.0
Congo, Republic of the: 47.1, 47.4
Congo, Democratic Republic of the: 49.3, 48.8
Costa Rica: 75.9, 75.8
Cote d'Ivoire: 46.2, 45.2
Croatia: 73.8, 73.7
Cuba: 75.6, 76.2
Cyprus: 76.8, 76.7
Czech Republic: 74.1, 74.5
Denmark: 76.3, 76.5
Djibouti: 51.1, 50.8
Dominica: 77.8, 73.4
Dominican Republic: 69.7, 73.2
Ecuador: 71.8, 71.1
Egypt: 62.1, 63.3
El Salvador: 69.7, 69.7
Equatorial Guinea: 53.9, 53.6
Eritrea: 55.3, 55.8
Estonia: 68.5, 69.5
Ethiopia: 40.9, 45.2
Fiji: 66.3, 67.9
Finland: 77.2, 77.4
France: 78.5, 78.8
Gabon: 56.5, 50.1
The Gambia: 53.9, 53.2
Georgia: 64.8, 64.5
Germany: 77.0, 77.4
Ghana: 56.8, 57.4
Greece: 78.3, 78.4
Grenada: 71.4, 64.5
Guatemala: 66.0, 66.2
Guinea: 46.0, 45.6
Guinea-Bissau: 49.1, 49.0
Guyana: 62.8, 64.0
Haiti: 51.4, 49.2
Honduras: 65.0, 69.9
Hungary: 70.8, 71.4
Iceland: 78.8, 79.4
India: 62.9, 62.5
Indonesia: 62.5, 68.0
Iran: 68.3, 69.7
Iraq: 66.5, 66.5
Ireland: 76.2, 76.8
Israel: 78.4, 78.6
Italy: 78.4, 79.0
Jamaica: 75.4, 75.2
Japan: 80.0, 80.7
Jordan: 72.8, 77.4
Kazakhstan: 63.6, 63.2
Kenya: 47.6, 48.0
Kiribati: 62.6, 59.8
Korea, North: 51.3, 70.7
Korea, South: 74.0, 74.4
Kuwait: 76.8, 74.5
Kyrgyz Republic: 63.8, 63.4
Laos: 53.7, 53.1
Latvia: 67.1, 68.4
Lebanon: 70.6, 71.3
Lesotho: 54.0, 50.8
Liberia: 59.5, 51.0
Libya: 65.4, 75.5
Liechtenstein: 78.0, 78.8
Lithuania: 68.8, 69.1
Luxembourg: 77.5, 77.1
Macedonia: 72.8, 73.8
Madagascar: 52.9, 55.0
Malawi: 36.6, 37.6
Malaysia: 70.4, 70.8
Maldives: 67.6, 62.2
Mali: 47.0, 46.7
Malta: 77.6, 77.9
Marshall Islands: 64.5, 65.5
Mauritania: 50.0, 50.8
Mauritius: 70.9, 71.0
Mexico: 71.6, 71.5
Federated States of Micronesia: 68.3, 68.6
Moldova: 64.3, 64.5
Monaco: 78.4, 78.8
Mongolia: 61.5, 67.3
Morocco: 68.5, 69.1
Mozambique: 45.4, 37.5
Myanmar (Burma): 54.5, 54.9
Namibia: 41.5, 42.5
Nauru: 66.7, 60.8
Nepal: 57.8, 57.8
Netherlands: 78.0, 78.3
New Zealand: 77.6, 77.8
Nicaragua: 66.6, 68.7
Niger: 41.5, 41.3
Nigeria: 53.6, 51.6
Norway: 78.2, 78.7
Oman: 71.0, 71.8
Pakistan: 59.1, 61.1
Palau: 67.5, 68.6
Panama: 74.5, 75.5
Papua New Guinea: 58.1, 63.1
Paraguay: 72.2, 73.7
Peru: 70.0, 70.0
Philippines: 66.4, 67.5
Poland: 72.8, 73.2
Portugal: 75.7, 75.8
Qatar: 73.9, 72.4
Romania: 70.5, 69.9
Russia: 65.0, 67.2
Rwanda: 41.9, 39.3
Saint Kitts and Nevis: 67.6, 70.7
Saint Lucia: 71.6, 72.3
Samoa: 69.5, 69.2
San Marino: 81.4, 81.1
Sao Tome and Principe: 64.3, 65.3
Saudi Arabia: 70.0, 67.8
Senegal: 57.4, 62.2
Serbia: n/a, 72.4
Seychelles: 70.8, 70.4
Sierra Leone: 48.6, 45.3
Singapore: 78.5, 80.1
Slovakia: 73.2, 73.7
Slovenia: 75.2, 74.9
Solomon Islands: 71.8, 71.3
Somalia: 46.2, 46.2
South Africa: 55.5, 51.1
Spain: 77.6, 78.8
Sri Lanka: 72.6, 71.8
Sudan: 56.0, 56.6
Suriname: 70.6, 71.4
Swaziland: 38.5, 40.4
Sweden: 79.2, 79.6
Switzerland: 78.9, 79.6
Syria: 67.8, 68.5
Taiwan: 76.8, 76.4
Tajikistan: 64.5, 64.1
Tanzania: 46.4, 52.3
Thailand: 69.0, 68.6
Togo: 58.8, 54.7
Tonga: 69.5, 67.9
Trinidad and Tobago: 70.5, 68.0
Tunisia: 73.1, 73.7
Turkey: 72.8, 71.0
Turkmenistan: 61.3, 60.9
Tuvalu: 63.9, 66.3
Uganda: 42.6, 42.9
Ukraine: 65.8, 66.0
United Arab Emirates: 74.9, 74.1
United Kingdom: 77.2, 77.7
United States: 76.1, 77.1
Uruguay: 75.5, 75.2
Uzbekistan: 64.1, 63.7
Vanuatu: 61.0, 60.6
Venezuela: 72.7, 73.1
Vietnam: 67.7, 69.3
Yemen: 59.5, 59.8
Zambia: 37.1, 37.2
Zimbabwe: 39.2, 37.8

Annual consumption of cigarettes per capita

Country: Cigarette Consumption
Hungary: 2515
Japan: 2510
USA: 2020
South Africa: 1950
UK: 1700
France: 1690
USSR: 1650
Brazil: 1200
Philippines: 1150
Venezuela: 950
Zaire: 150
India: 100

[Figure: scatterplot of life expectancy (50 to 80) against cigarettes per capita (0 to 2500)]

Increasing cigarette consumption by 1,000 per year corresponds to an increase in life expectancy of 6.8 years.

All it takes is 3 cigarettes a day to add 7 years to your life!

[Figure: the same scatterplot with the countries labelled: Japan, France, US, UK, Hungary, Philippines, Russia, Brazil, India, South Africa, Zaire, Venezuela]

Maybe it isn’t smoking that’s responsible for higher life expectancies.

Correlation is not necessarily causation.
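The "6.8 years per 1,000 cigarettes" claim is the slope of a straight line fitted to the scatterplot. A minimal sketch of that calculation follows. It is not the speaker's own analysis: the slides do not say which life-expectancy values were plotted, so the sketch assumes the "Life Expectancy in 2000" column of the table, pairs USSR with Russia and Zaire with Congo, Democratic Republic of the, and should not be expected to reproduce 6.8 exactly. The function name least_squares is made up for illustration.

```python
# Cigarettes per capita per year, copied from the slide's table.
cigarettes = {
    "Hungary": 2515, "Japan": 2510, "USA": 2020, "South Africa": 1950,
    "UK": 1700, "France": 1690, "USSR": 1650, "Brazil": 1200,
    "Philippines": 1150, "Venezuela": 950, "Zaire": 150, "India": 100,
}

# ASSUMPTION: life expectancy in 2000 taken from the country table.
# ASSUMPTION: "USSR" uses the Russia row and "Zaire" uses the
# "Congo, Democratic Republic of the" row.
life_expectancy = {
    "Hungary": 71.4, "Japan": 80.7, "USA": 77.1, "South Africa": 51.1,
    "UK": 77.7, "France": 78.8, "USSR": 67.2, "Brazil": 62.9,
    "Philippines": 67.5, "Venezuela": 73.1, "Zaire": 48.8, "India": 62.5,
}


def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


countries = list(cigarettes)
xs = [cigarettes[c] for c in countries]
ys = [life_expectancy[c] for c in countries]

slope, intercept = least_squares(xs, ys)

# Slope is in years of life expectancy per cigarette per year;
# rescale it to "per 1,000 cigarettes per year" as on the slide.
years_per_1000 = slope * 1000

# 1,000 cigarettes a year expressed as cigarettes per day.
cigarettes_per_day = 1000 / 365

print(f"Years of life expectancy per 1,000 cigarettes: {years_per_1000:.1f}")
print(f"1,000 cigarettes per year is about {cigarettes_per_day:.1f} per day")
```

The arithmetic is correct and the slope is positive, which is exactly the point of the slide: the calculation says nothing about whether richer countries both smoke more and live longer for other reasons.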
Two solutions:

Solution 1: (the ideal solution) Experiment

Correlation is not necessarily causation unless you are analyzing an experiment.

Sir Ronald A. Fisher laid the foundations of Experimental Design ca. 1925 to 1940.

Solution 2: (the usual solution) Use observational data with care

Often we can’t afford to experiment:
- too costly
- too risky
- too long
- data already on hand

We use observational data and try to control for the possible effects of other variables by measuring them and using statistical controls, e.g. with stratification or multiple regression.

‘Correlation is not causation but sometimes it’s a darn good hint of causation’ – Anon.

Problem: Knowing when

Caution: Taking a hard line “correlation is not causation” may be as problematic as seeing causation in every correlation.

R. A. Fisher, the father of experimental design, delayed public action against smoking in the 1950s when he sided with the tobacco companies objecting that the evidence against smoking was only observational.

For many important issues, we only have observational data. We need to develop better concepts to assess observational evidence wisely.
This is a major challenge for modern Statistics.

[New York Times, November 18, 2007]

Another common way of thinking that can mislead:

[Figure: life expectancy (60 to 100) against year (1998 to 2006), built up one country at a time: Canada, US, China, Bosnia and Herzegovina]

So: To have a very long life, move to Bosnia-Herzegovina

Sometimes you should use the last value.
Sometimes the best prediction comes from projecting the trend.
Usually: something in between.

Do statistics lie? OR is it misuse of statistics that lies?

Statistics ≠ calculations
Statistics = calculations + statistical reasoning
If you take away reasoning, you’re not really doing statistics
Statistics done right ≠ lies

Why I like being a statistician:

John W. Tukey: “The best thing about being a statistician is that you get to play in everyone's backyard.”

Some practical advice from a statistician – A few things I’ve learned recently

Gambling is OK if you own the casino

If you are a mathematician don’t drive a motorcycle. But if you’re an English major it’s not as bad.
Don’t use a cell phone while you drive

Observational data with a very clever analysis: Redelmeier and Tibshirani found a clever solution to this problem:

Phoning vs drinking:

Politics may be good for your health

[Figure: effect on log distress (-0.2 to 0.3) by year (94 to 98) for four groups: ROC, Qc French Only, Qc French Multi, Qc Other]

Politics may also be bad for your health depending on where you live, what languages you speak and on the outcome of the next referendum

A Newtonbrook alumnus who might be worth a project in Statistics or Probability: Howie Mandel

Let’s Make a Deal

If you like doing statistics you can be certified!

The Statistical Society of Canada now offers the “A. Stat.” accreditation for graduates who have taken specified statistics and other courses

Two levels:
• Professional (P.Stat.)
• Associate (A.Stat.)

What Statisticians do:
• Health and Medicine
• Finance, Banking, Insurance
• Business and Industry
• Education
• Government

Health and Medicine
• Biostatistics
• Clinical Trials
• Drug Monitoring
• Epidemiology
• Genetics
• Pharmaceutical research
• Public Health

Business and Industry
• Actuaries for Insurance and Pensions
• Agriculture
• Banking: e.g. methods to assess risk
• Chemistry
• Computer Science
• Economics
• Finance
• Manufacturing
• Market Research
• Quality Improvement and Reliability

Government: Statistics Canada
• Environment
• Forestry
• Government Regulation
• Law
• National Defense
• Population Research
• Risk Assessment

Watch: www.ssc.ca

A highly recommended book:

Thank you

Georges Monette
georges@yorku.ca
http://www.math.yorku.ca/~georges

Notes:

R. A. Fisher and Tobacco:

R. A. Fisher and the Role of a Statistical Consultant
J. H. Bennett
Journal of the Royal Statistical Society. Series A (Statistics in Society), Vol. 154, No. 3 (1991), pp. 443-445
doi:10.2307/2983153

Extracts from R. A. Fisher's letters referring to the responsibilities of statistical consultants are considered along with his view of his own role as a scientific consultant to the Tobacco Manufacturers' Standing Committee in the late 1950s. Contrary to a recent suggestion that Fisher may have been ‘misrepresenting data on lung cancer while acting as an adviser to the tobacco industry’, his letters show that he was very deeply concerned about the possible misrepresentation to consumers of an alleged statistical result. Further, Fisher believed that it is ‘only by giving students the opportunity of making fine distinctions in the logic of the subject that they can learn to recognize the difference between honest and dishonest work in statistical practice’.

When Genius Errs: R. A. Fisher and the Lung Cancer Controversy
Paul D. Stolley
Clinical Epidemiology Unit, University of Pennsylvania School of Medicine, 220-L Nursing Education Building, Philadelphia, PA 19104-6095
American Journal of Epidemiology, Vol. 133, No. 5: 416-425
Copyright © 1991 by The Johns Hopkins University School of Hygiene and Public Health

R. A. Fisher's work on lung cancer and smoking is critically reviewed. The controversy is placed in the context of his career and personality. Although Fisher made invaluable contributions to the field of statistics, his analysis of the causal association between lung cancer and smoking was flawed by an unwillingness to examine the entire body of data available and prematurely drawn conclusions. His views may also have been influenced by personal and professional conflicts, by his work as a consultant to the tobacco industry, and by the fact that he was himself a smoker.

Text Scraps:

Some inferences need tags attached:

CAUTION
THIS INFERENCE WAS BASED ON OBSERVATIONAL DATA AND MUST BE REVISED WHEN BETTER EVIDENCE BECOMES AVAILABLE
DO NOT DETACH THIS TAG UNDER PENALTY OF LAW