Arby's sandwiches (from a while ago); Arby's sandwiches (2012 update)

10/1/12
Oct. 1 Statistic for the Day:
Probability of correctly predicting the sex of
a baby at birth 19 or more times out of 21:
About 1/10,000
Probability that someone, somewhere has done this: Nearly
certain!
(e.g., This American Life story)
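A quick check of the 1/10,000 figure, as a minimal Python sketch. It assumes each prediction is an independent 50/50 guess. The function names and the pool of 100,000 guessers are illustrative assumptions, not part of the slide.

```python
from math import comb

def prob_at_least(k_min: int, n: int, p: float = 0.5) -> float:
    """Binomial probability of k_min or more successes in n independent tries."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

def prob_someone_succeeds(n_guessers: int, p_single: float) -> float:
    """Chance that at least one of n_guessers independent people succeeds."""
    return 1 - (1 - p_single) ** n_guessers

# One guesser: 19 or more correct out of 21, guessing at random (assumption).
p_single = prob_at_least(19, 21)
print(f"one guesser: {p_single:.6f} (about 1 in {1 / p_single:,.0f})")

# Hypothetical pool of 100,000 people who each try the same thing.
print(f"someone, somewhere: {prob_someone_succeeds(100_000, p_single):.5f}")
```

Under these assumptions the single-guesser probability is 232/2,097,152, a little more than 1 in 10,000, and with a large enough pool of guessers at least one success becomes very likely.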
Assignment: Read Chapter 10
Arby’s sandwiches (2012 update)
(Earlier value first, 2012 value second; a dash means no update is shown.)

 #  Sandwich                         2012 name          Weight (g)      Calories
                                                        earlier  2012   earlier  2012
 1  Big Montana                      -                  309      -      590      -
 2  Giant Roast Beef                 Max                224      281    450      580
 3  Regular Roast Beef               Classic            154      -      320      350
 4  Beef ‘n Cheddar                  Classic            195      -      440      -
 5  Super Roast Beef                 Mid                230      210    440      -
 6  Junior Roast Beef                -                  125      87     270      210
 7  Chicken Breast Fillet            Crispy             233      221    500      510
 8  Chicken Bacon ‘n Swiss           Crispy             209      205    550      610
 9  Roast Chicken Club               Grand Turkey Club  228      233    470      490
10  Market Fresh Turkey Ranch Bacon  -                  379      344    830      800
11  Market Fresh Ultimate BLT        -                  293      -      780      -
12  Market Fresh Roast Beef Swiss    -                  357      -      780      -
13  Market Fresh Roast Ham Swiss     -                  357      -      700      -
14  Market Fresh Roast Turkey Swiss  -                  357      326    720      700
15  Market Fresh Chicken Salad       -                  322      -      770      -

Arby’s sandwiches (from a while ago)

 #  Sandwich                         Weight (g)  Calories
 1  Big Montana                      309         590
 2  Giant Roast Beef                 224         450
 3  Regular Roast Beef               154         320
 4  Beef ‘n Cheddar                  195         440
 5  Super Roast Beef                 230         440
 6  Junior Roast Beef                125         270
 7  Chicken Breast Fillet            233         500
 8  Chicken Bacon ‘n Swiss           209         550
 9  Roast Chicken Club               228         470
10  Market Fresh Turkey Ranch Bacon  379         830
11  Market Fresh Ultimate BLT        293         780
12  Market Fresh Roast Beef Swiss    357         780
13  Market Fresh Roast Ham Swiss     357         700
14  Market Fresh Roast Turkey Swiss  357         720
15  Market Fresh Chicken Salad       322         770

Research Question: At Arby’s, are calories related to
the weight of the sandwich?
Let’s try using tools from previous chapters first:
Observational study
• Response = calories
• Explanatory variable = small or large sandwich
Small sandwich means less than 225 grams (n = 6)
Large sandwich means more than 225 grams (n = 4)
[Figure: calories for Large and Small sandwiches; axis: Calories, 200 to 800]
There seems to be a difference.
(Is it statistically significant? That question comes later
in the course!)

This is where we consider the new topic of Chapter 10:
We can refine the explanatory variable and get more
information about the relationship between calories
and weight:
Rather than split it into small and large,
keep the numerical values of the explanatory variable.
(Note: when we do this, we can no longer think of the
explanatory variable as identifying which subpopulation the
observation belongs to.)
[Figure: "Arby's Sandwiches", Calories (200 to 800) plotted against Weight (100 to 350)]
This type of plot, with two measurements per subject, is
called a scatterplot (see p. 166).

[Figure: "Arby's Sandwiches", Calories plotted against Weight; Correlation=0.95]
The correlation measures the strength of the
linear relationship between weight and calories.

Facts about Correlation:
•  We use the letter “r” to denote the correlation coefficient.
•  The correlation coefficient is a measure of the strength of
the linear relationship between the two variables in a
scatterplot.
•  The value of r must always be between −1 and 1:
a.  r=0 means no linear relationship.
b.  Positive r means the two variables tend to increase
together (with r=1 meaning a perfect linear relationship)
c.  Negative r means that one variable increases while the
other decreases (with −1 meaning a perfect linear
relationship)

[Figure: "Arby's Sandwiches", Calories plotted against Weight, with the fitted line]
The best-fitting line through the data is called the
regression line.
How should we describe this line?

Formula for a regression line
Remember your algebra! The equation for a line is
y = (intercept) + (slope)(x)
or, in this case,
calories = (intercept) + (slope)(weight)
So all we need to describe the line is the intercept and
the slope.

[Figure: "Arby's Sandwiches", Calories plotted against Weight, with the regression line]
The intercept is 41 in this case and the slope is 2.1.
cal = 41 + (2.1)(wt)
In this class, you don’t need to know how to calculate the
slope and intercept (but see p. 195 if you like formulas).
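To make the correlation coefficient concrete, here is a minimal Python sketch that computes r for the 15 sandwiches in the older Arby's table (weights and calories copied from that table). The function name is illustrative. The plotted data may come from a different version of the menu, so the result need not match the 0.95 shown on the plot exactly.

```python
from math import sqrt

# Weight (g) and calories for the 15 sandwiches in the older table.
weights = [309, 224, 154, 195, 230, 125, 233, 209, 228, 379, 293, 357, 357, 357, 322]
calories = [590, 450, 320, 440, 440, 270, 500, 550, 470, 830, 780, 780, 700, 720, 770]

def pearson_r(x, y):
    """Correlation coefficient r between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(weights, calories)
print(f"r = {r:.2f}")
```

A value close to 1 indicates a strong positive linear relationship: heavier sandwiches tend to have more calories.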
calories = 41 + (2.1)(weight in grams)
(intercept = 41, slope = 2.1)
For example, if you have a 200g sandwich,
on the average you expect to get about:
41 + (2.1)(200) = 41 + 420 = 461 calories
For a 350g sandwich:
41 + (2.1)(350) = 41 + 735 = 776 calories

calories = 41 + (2.1)(weight in grams)
For every extra gram of weight, you expect an
increase of 2.1 calories in your Arby’s sandwich.
Interpretation of slope: Expected increase in the response
for every unit increase (increase of one) in the explanatory variable.
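The two predictions and the slope interpretation can be reproduced with a few lines of Python. This is only a sketch using the rounded intercept (41) and slope (2.1) from the slides; the function and constant names are illustrative.

```python
INTERCEPT = 41.0  # calories predicted at weight 0, from the slide
SLOPE = 2.1       # extra calories per extra gram, from the slide

def predicted_calories(weight_g: float) -> float:
    """Predicted calories from the regression line for a sandwich of given weight."""
    return INTERCEPT + SLOPE * weight_g

for weight in (200, 350):
    print(f"{weight} g -> about {predicted_calories(weight):.0f} calories")

# Slope interpretation: one extra gram changes the prediction by the slope.
extra = predicted_calories(201) - predicted_calories(200)
print(f"one extra gram adds about {extra:.1f} calories")
```

These are predictions "on the average": an individual 200 g sandwich can have more or fewer calories than the line suggests.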
Weight vs. Ideal Weight
We’ll use SP2004 data.
Question: What is the relationship between weight
and ideal weight?
Compare with case study 10.2, page 193

Men and Women Combined
[Figure: Ideal Weight plotted against Weight, both axes 100 to 250]
Dotted red line: Weight = Ideal Weight
(not a regression line; rather, it’s a line for
comparison purposes)

Men and Women Combined
[Figure: Ideal Weight plotted against Weight, with regression line and comparison line]
S=15.17
R-squared = .752
Correlation = .867
The green line is the regression line:
Ideal weight = 25.6 + 0.78 Weight
Dotted red line: Weight = Ideal Weight

[Figure: Men Only, Ideal Weight plotted against Weight]
Men Only
[Figure: Ideal Weight plotted against Weight, with regression line and comparison line]
S=12.36
R-squared = .723
Correlation = .850
Green regression line:
Ideal weight = 66.2 + 0.61 Weight
Dotted red line: Weight = Ideal Weight
What does it mean when the lines cross at 169 pounds?

Women Only
[Figure: Ideal Weight plotted against Weight, with regression line and comparison line]
S=8.20
R-squared = .691
Correlation = .831
Green regression line:
Ideal weight = 56.1 + 0.50 Weight
The lines cross at 112 pounds.

         Spring 2001 Mean           Fall 2008 Mean
         Wt.   Ideal Wt.   Diff.    Wt.   Ideal Wt.   Diff.
Comb.    146   138         8        154   146         8
Men      175   171         4        174   172         2
Women    132   122         10       138   126         12
This pattern remained fairly steady over many years of
STAT 100: Men on average are about 0-5 pounds heavier
than their ideal, whereas women on average are about
10-12 pounds heavier than their ideal.
Note, however, that the regression lines tell a more
complete story!
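One way to see what the regression lines add: find the weight at which each fitted line meets the comparison line Weight = Ideal Weight. A minimal Python sketch, using the rounded coefficients from the Men Only and Women Only plots; the function names are illustrative.

```python
# (intercept, slope) of "Ideal weight = intercept + slope * Weight", from the plots.
LINES = {"men": (66.2, 0.61), "women": (56.1, 0.50)}

def predicted_ideal(weight: float, intercept: float, slope: float) -> float:
    """Ideal weight predicted by the regression line for a given actual weight."""
    return intercept + slope * weight

def crossing_weight(intercept: float, slope: float) -> float:
    """Weight where the regression line meets the line Ideal Weight = Weight.

    Solves weight = intercept + slope * weight for weight.
    """
    return intercept / (1 - slope)

for group, (intercept, slope) in LINES.items():
    cross = crossing_weight(intercept, slope)
    print(f"{group}: lines cross near {cross:.1f} pounds")
```

With the rounded coefficients the crossings come out near 170 pounds for men and near 112 pounds for women, in line with the 169 and 112 on the plots. Because both slopes are less than 1, the predicted ideal weight is above actual weight for people lighter than the crossing point and below it for people heavier than the crossing point, which a single mean difference cannot show.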
A weighty puzzle: SP 2001 vs. FA 2008 in STAT 100

              SP 2001        FA 2008
              Mean Weight    Mean Weight
Combined      146            154
Men           175            174
Women         132            138
Percent men   32%            43%
Notice: Combined mean weight is 8 pounds heavier in 2008.
But women are only 6 pounds heavier on
average, and men are actually lighter. How is this
possible?
The answer is related to Simpson’s paradox.
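The puzzle can be checked numerically: the combined mean is a weighted average of the men's and women's means, with weights given by the share of each group. A minimal Python sketch using the rounded values from the table; the function name and the "same mix" scenario are illustrative assumptions.

```python
def combined_mean(share_men: float, mean_men: float, mean_women: float) -> float:
    """Overall mean weight as a weighted average of the two group means."""
    return share_men * mean_men + (1 - share_men) * mean_women

sp2001 = combined_mean(0.32, 175, 132)  # about 145.8, matches the 146 in the table
fa2008 = combined_mean(0.43, 174, 138)  # about 153.5, close to the 154 in the table

# Hypothetical: FA 2008 group means, but with the SP 2001 share of men (32%).
fa2008_same_mix = combined_mean(0.32, 174, 138)  # about 149.5

print(f"SP 2001 combined: {sp2001:.1f}")
print(f"FA 2008 combined: {fa2008:.1f}")
print(f"FA 2008 with SP 2001 mix: {fa2008_same_mix:.1f}")
```

Holding the share of men at 32% gives a smaller increase than the one observed, so part of the rise in the combined mean comes from the class having a larger share of men, who are heavier on average.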