Arby’s sandwiches (from a while ago) Arby’s sandwiches (2012
10/1/12 (Oct. 1)

Statistic for the Day
Probability of correctly predicting the sex of a baby at birth 19 or more times out of 21: about 1/10,000
Probability that someone, somewhere has done this: nearly certain! (e.g., This American Life story)

Assignment: Read Chapter 10

Arby’s Sandwiches (from a while ago)

 #   Sandwich                           Weight (g)   Calories
 1   Big Montana                           309          590
 2   Giant Roast Beef                      224          450
 3   Regular Roast Beef                    154          320
 4   Beef ‘n Cheddar                       195          440
 5   Super Roast Beef                      230          440
 6   Junior Roast Beef                     125          270
 7   Chicken Breast Fillet                 233          500
 8   Chicken Bacon ‘n Swiss                209          550
 9   Roast Chicken Club                    228          470
10   Market Fresh Turkey Ranch Bacon       379          830
11   Market Fresh Ultimate BLT             293          780
12   Market Fresh Roast Beef Swiss         357          780
13   Market Fresh Roast Ham Swiss          357          700
14   Market Fresh Roast Turkey Swiss       357          720
15   Market Fresh Chicken Salad            322          770

Arby’s sandwiches (2012 update)
Updated name labels shown on the slide: Max, Classic, Classic, Mid, Crispy, Crispy, Grand Turkey.
Calories for sandwiches 1-15 (earlier value -> 2012 value where the figure changed):
590; 450 -> 580; 320 -> 350; 440; 440; 270 -> 210; 500 -> 510; 550 -> 610; 470 -> 490; 830 -> 800; 780; 780; 700; 720 -> 700; 770

Research Question: At Arby’s, are calories related to the weight of the sandwich?

Let’s try using tools from previous chapters first:
Observational study
• Response = calories
• Explanatory variable = small or large sandwich
Small sandwich means less than 225 grams (n = 6)
Large sandwich means more than 225 grams (n = 4)

This is where we consider the new topic of Chapter 10:
We can refine the explanatory variable and get more information about the relationship between calories and weight: rather than split it into small and large, keep the numerical values of the explanatory variable.
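The Statistic for the Day can be checked with a short calculation. The sketch below assumes each prediction is an independent 50/50 guess (an assumption, not something the slide states) and adds up the binomial probabilities for 19, 20, and 21 correct guesses. The number of people making predictions, `n_people`, is a made-up figure used only to illustrate why "someone, somewhere" succeeding is nearly certain.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """P(at least k correct out of n independent guesses, each correct with probability p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Probability that one person gets 19 or more of 21 right by pure guessing.
p_one = prob_at_least(19, 21)
print(f"P(19 or more correct out of 21) = {p_one:.6f}")
print(f"That is roughly 1 in {1 / p_one:,.0f}")

# ASSUMPTION: a hypothetical number of people who each try 21 predictions.
n_people = 100_000

# Probability that at least one of them reaches 19 or more correct.
p_someone = 1 - (1 - p_one) ** n_people
print(f"P(at least one of {n_people:,} people does it) = {p_someone:.5f}")
```

With guessing alone the chance for any one person is on the order of 1 in 10,000, but across a large enough group the chance that at least one person manages it is very close to 1.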
Arby’s sandwiches (2012 update), continued
Weight in grams for sandwiches 1-15 (earlier value -> 2012 value where the figure changed):
309; 224 -> 281; 154; 195; 230 -> 210; 125 -> 87; 233 -> 221; 209 -> 205; 228 -> 233; 379 -> 344; 293; 357; 357; 357 -> 326; 322

Arby’s sandwiches (from a while ago)
[Plot: Calories for Small and Large sandwiches]
There seems to be a difference. (Is it statistically significant? That question comes later in the course!)

(Note: when we keep the numerical values of the explanatory variable, we can no longer think of it as identifying which subpopulation the observation belongs to.)

Arby’s Sandwiches
[Scatterplot: Calories (vertical axis) vs. Weight (horizontal axis)]
This type of plot, with two measurements per subject, is called a scatterplot (see p. 166).

The correlation measures the strength of the linear relationship between weight and calories.
Correlation = 0.95

Facts about Correlation:
• We use the letter “r” to denote the correlation coefficient.
• The correlation coefficient is a measure of the strength of the linear relationship between the two variables in a scatterplot.
• The value of r must always be between −1 and 1:
  a. r = 0 means no linear relationship.
  b. Positive r means the two variables tend to increase together (with r = 1 meaning a perfect linear relationship).
  c. Negative r means that one variable tends to decrease as the other increases (with r = −1 meaning a perfect linear relationship).

Arby’s Sandwiches
[Scatterplot with fitted line: Calories vs. Weight]
The best-fitting line through the data is called the regression line.
How should we describe this line?

Formula for a regression line
Remember your algebra! The equation for a line is
y = (intercept) + (slope)(x)
or, in this case,
calories = (intercept) + (slope)(weight)
So all we need to describe the line is the intercept and the slope.
The intercept is 41 in this case and the slope is 2.1.
cal = 41 + (2.1)(wt)
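The facts about correlation listed above can be checked on a few tiny data sets. The sketch below computes r from its usual definition. The three `y` lists are made-up illustrative values, not data from the slides: one rises in a perfect line, one falls in a perfect line, and one follows a curve with no linear trend.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r for paired measurements xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data (NOT from the slides).
x = [1, 2, 3, 4, 5]
rising = [3, 5, 7, 9, 11]   # exactly on an upward line
falling = [10, 8, 6, 4, 2]  # exactly on a downward line
curved = [4, 1, 0, 1, 4]    # y = (x - 3)^2: related to x, but not linearly

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
r_curved = pearson_r(x, curved)

print("rising :", r_rising)
print("falling:", r_falling)
print("curved :", r_curved)
```

The curved example is worth a second look: y depends completely on x, yet r comes out as 0, because r only measures the strength of the linear relationship.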
In this class, you don’t need to know how to calculate the slope and intercept (but see p. 195 if you like formulas).

calories = 41 + (2.1)(weight in grams)
(intercept = 41, slope = 2.1)

For example, if you have a 200g sandwich, on the average you expect to get about:
41 + (2.1)(200) = 41 + 420 = 461 calories
For a 350g sandwich:
41 + (2.1)(350) = 41 + 735 = 776 calories

Interpretation of slope: Expected increase in response for every unit increase (increase of one) in explanatory.
For every extra gram of weight, you expect an increase of 2.1 calories in your Arby’s sandwich.

Weight vs. Ideal Weight
Question: What is the relationship between weight and ideal weight?
We’ll use SP2004 data.
Compare with case study 10.2, page 193.

Men and Women Combined
[Scatterplot: Ideal Weight (vertical axis) vs. Weight (horizontal axis)]
Dotted red line: Weight = Ideal Weight (not a regression line; rather, it’s a line for comparison purposes)

Men and Women Combined
Correlation = .867
R-squared = .752
S = 15.17
The green line is the regression line: Ideal weight = 25.6 + 0.78 Weight
Dotted red line: Weight = Ideal Weight

Men Only
[Scatterplot: Ideal Weight (vertical axis) vs. Weight (horizontal axis)]
Correlation = .850
R-squared = .723
S = 12.36
What does it mean when the lines cross at 169 pounds?
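One way to think about the crossing question is to find where a regression line meets the comparison line Ideal Weight = Weight. The sketch below uses the Men Only and Women Only lines given in these slides (Ideal weight = 66.2 + 0.61 Weight and Ideal weight = 56.1 + 0.50 Weight). The function names and the two sample weights, 150 and 200 pounds, are illustrative choices, not part of the lecture.

```python
def predicted_ideal(intercept, slope, weight):
    """Ideal weight predicted by the regression line for a given actual weight."""
    return intercept + slope * weight


def crossing_weight(intercept, slope):
    """Weight where the regression line meets the line Ideal Weight = Weight.

    Setting w = intercept + slope * w and solving gives w = intercept / (1 - slope).
    """
    if slope == 1:
        raise ValueError("the lines are parallel, so there is no single crossing point")
    return intercept / (1 - slope)


# Coefficients as printed on the slides (rounded).
men_cross = crossing_weight(66.2, 0.61)
women_cross = crossing_weight(56.1, 0.50)
print("Men:   lines cross near", round(men_cross, 1), "pounds")
print("Women: lines cross near", round(women_cross, 1), "pounds")

# Illustrative weights on either side of the men's crossing point.
for w in (150, 200):
    ideal = predicted_ideal(66.2, 0.61, w)
    direction = "above" if ideal > w else "below"
    print(f"A {w} lb man: predicted ideal weight {ideal:.1f} lb ({direction} actual)")
```

The crossing points come out near 169.7 and 112.2 pounds, in line with the 169 and 112 quoted on the slides given that the coefficients are rounded. Because both slopes are less than 1, the predicted ideal weight is above the actual weight for people lighter than the crossing point and below it for people heavier than the crossing point.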
Men Only
Green regression line: Ideal weight = 66.2 + 0.61 Weight
Dotted red line: Weight = Ideal Weight

Women Only
[Scatterplot: Ideal Weight (vertical axis) vs. Weight (horizontal axis)]
Correlation = .831
R-squared = .691
S = 8.20
Green regression line: Ideal weight = 56.1 + 0.50 Weight
The lines cross at 112 pounds.

Mean weight, mean ideal weight, and difference:

             Spring 2001                  Fall 2008
             Wt.    Ideal Wt.   Diff.     Wt.    Ideal Wt.   Diff.
Comb.        146    138          8        154    146          8
Men          175    171          4        174    172          2
Women        132    122         10        138    126         12

This pattern remained fairly steady over many years of STAT 100: Men on average are about 0-5 pounds heavier than their ideal, whereas women on average are about 10-12 pounds heavier than their ideal.
Note, however, that the regression lines tell a more complete story!

A weighty puzzle: SP 2001 vs. FA 2008 in STAT 100

               SP 2001 Mean Weight    FA 2008 Mean Weight
Combined       146                    154
Men            175                    174
Women          132                    138
Percent men    32%                    43%

Notice: Combined mean weight is 8 pounds heavier in 2008. But women are only 6 pounds heavier on average, and men are actually lighter. How is this possible?
The answer is related to Simpson’s paradox.
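The puzzle can be explored by treating the combined mean as a weighted average of the men's and women's means. The sketch below uses the rounded figures from the table above and assumes the class consists only of the men and women counted in "Percent men". The function and variable names are illustrative.

```python
def combined_mean(frac_men, mean_men, mean_women):
    """Overall mean as a weighted average of the two group means."""
    return frac_men * mean_men + (1 - frac_men) * mean_women


# Rounded values from the slide's table.
semesters = {
    "SP 2001": {"frac_men": 0.32, "men": 175, "women": 132},
    "FA 2008": {"frac_men": 0.43, "men": 174, "women": 138},
}

combined = {
    name: combined_mean(s["frac_men"], s["men"], s["women"])
    for name, s in semesters.items()
}

for name, value in combined.items():
    print(f"{name}: combined mean weight is about {value:.1f} pounds")

change_combined = combined["FA 2008"] - combined["SP 2001"]
change_men = semesters["FA 2008"]["men"] - semesters["SP 2001"]["men"]
change_women = semesters["FA 2008"]["women"] - semesters["SP 2001"]["women"]
print(f"Change: combined {change_combined:+.1f}, men {change_men:+d}, women {change_women:+d}")
```

The weighted averages come out at roughly 145.8 and 153.5 pounds, close to the 146 and 154 on the slide (the percentages are rounded). The combined mean rises by more than either group's mean because the share of men, the heavier group, grew from 32% to 43%.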