Guided5 200 ans - Ilvento Web page

STAT 200
Guided Exercise 5
1. Sampling Distribution Exercise. In the lecture I show the sampling distribution for the mean of a die
based on a sample of three rolls. I could do this easily because it was possible mathematically to work out
all the different combinations of three rolls and then work out the sampling distribution. You can help me
do the same for the sampling distribution of 5 rolls (n=5). This is not required, but I would like you to do
it if you have a die handy.
I want you to do the following:
1. Roll the die 5 times and record each face
2. Calculate the mean of the five rolls
3. Calculate the median of the five rolls
Then repeat this 10 times. Each group will contribute 10 samples to our sampling distribution for the
mean of a die with sample size n = 5. I suggest you develop a division of labor to speed things up - one
person rolls, another records, a third or fourth calculates the mean and median. Please turn your
answers in.
Sample   Roll 1   Roll 2   Roll 3   Roll 4   Roll 5   Mean   Median
1
2
3
4
5
6
7
8
9
10
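If no die is handy, the same procedure can be simulated. The following is a minimal sketch, assuming a fair six-sided die; the function name `roll_samples` and the seed value are illustrative choices, not part of the exercise.

```python
import random
from statistics import mean, median


def roll_samples(num_samples=10, n=5, seed=None):
    """Simulate num_samples samples of n rolls of a fair die.

    Returns a list of (rolls, sample mean, sample median) tuples.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(num_samples):
        rolls = [rng.randint(1, 6) for _ in range(n)]  # faces 1..6, equally likely
        results.append((rolls, mean(rolls), median(rolls)))
    return results


# One group's contribution: 10 samples of n = 5 rolls each.
samples = roll_samples(num_samples=10, n=5, seed=200)
for i, (rolls, m, med) in enumerate(samples, start=1):
    print(i, rolls, f"mean={m:.1f}", f"median={med}")
```

Pooling the means from many such runs approximates the sampling distribution of the mean for n = 5.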
2. Alpha is the Greek letter α. In inferential statistics, alpha represents the probability of making an error
when we make an inference from a sample to the population. In most cases α is the probability of finding a
sample with our z-score value or beyond - i.e., the area in the tail of the distribution, either positive or
negative. In a confidence interval, we divide α by 2 to spread this error across both tails.
We need to calculate the values of α for different Confidence Intervals. Fill in the following table. It will
require you to find the z value that is associated with an α/2 level of probability in the Standard Normal
Table in the book.
Hint: the standard normal table shows the probability between 0 and a z-value, and α/2 is the probability beyond
that value. It is easiest to find the probability .5 - α/2 in the table and then read the z-value that corresponds to it.
For example, in the first row α/2 = .125, so find the probability .5 - .125 = .375 (or something very
close to it) and read the z-value of 1.15 that corresponds to this probability.
Confidence Level 100(1 - α)    α      α/2     Z
75%                            .25    .125    1.15
80%                            .20    .100    1.28
90%                            .10    .050    1.645
95%                            .05    .025    1.96
99%                            .01    .005    2.575
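The table lookup can be cross-checked in software. This is a sketch using Python's standard-library `statistics.NormalDist`; the helper name `z_for_confidence` is an illustrative choice. Note that `inv_cdf` works with the cumulative probability from the far left tail, so the code asks for 1 - α/2 rather than the .5 - α/2 used with the printed table.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Return the z-value that leaves alpha/2 in each tail for a given confidence level."""
    alpha = 1 - level
    # Cumulative area up to z is 1 - alpha/2 (everything except the upper tail).
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.75, 0.80, 0.90, 0.95, 0.99):
    alpha = 1 - level
    print(f"{level:.0%}  alpha={alpha:.2f}  alpha/2={alpha / 2:.3f}  z={z_for_confidence(level):.3f}")
```

The computed values agree with the table to the precision a printed standard normal table allows.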
3. An experiment was conducted at MIT on the effect of melatonin on inducing sleep. Young male
volunteers were either given melatonin or a placebo. They were then placed in a dark room at midday and
told to close their eyes for 30 minutes. The length of time it took them to fall asleep was recorded.
With the placebo, the researchers found it took on average 15 minutes to fall asleep. We will assume this
is the population parameter for young males. That is, µ = 15.
Here is the stem and leaf plot and the JMP descriptives for a random sample of 40 young men who were
given melatonin. Note: the bottom values in the stem and leaf are 1.5 minutes, 1.6 minutes and so forth.
a. Summarize the descriptive statistics of the data and note any outliers in the data (there will be more on
this in part e).
The mean is 5.55 minutes. The median is slightly less at 5.05 minutes.
The standard deviation is 2.94 minutes which reflects a fair amount of spread in the data – the
Coefficient of Variation is 53%.
There are two values that are considerably larger than the rest at 15.6 and 16.2 minutes, both of
which are more than 3.4 standard deviations away from the mean.
b. Calculate the Standard Error of the data:
$S.E. = \sqrt{\frac{s^2}{n}} = \frac{s}{\sqrt{n}} = \frac{2.941}{6.3246} = .465$
c. Construct a 95% Confidence Interval for this data. Use the z-value you calculated from problem 2, or you
could use a t-value.
Z-value Approach
5.553 ± 1.96*(.465) = 5.553 ± .911
4.642 to 6.464
t-value Approach
5.553 ± 2.023*(.465) = 5.553 ± .941
4.612 to 6.494
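The arithmetic in parts b and c can be reproduced with a short script. This is a sketch that takes the sample values from the answers above (n = 40, mean 5.553, s = 2.941) and the critical values 1.96 and 2.023 as given; the helper name `interval` is an illustrative choice.

```python
from math import sqrt

n = 40          # sample size
xbar = 5.553    # sample mean (minutes)
s = 2.941       # sample standard deviation (minutes)

se = s / sqrt(n)  # standard error of the mean


def interval(center, critical, std_err):
    """Return (lower, upper) for center +/- critical * std_err."""
    margin = critical * std_err
    return center - margin, center + margin


z_lo, z_hi = interval(xbar, 1.96, se)    # z-value approach
t_lo, t_hi = interval(xbar, 2.023, se)   # t-value approach, t taken from the answer above

print(f"SE = {se:.3f}")
print(f"95% CI (z): {z_lo:.3f} to {z_hi:.3f}")
print(f"95% CI (t): {t_lo:.3f} to {t_hi:.3f}")
```

The t interval is slightly wider because the t critical value exceeds 1.96 for a sample of this size.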
d. Calculate the Z-score for this sample mean as if it were part of a sampling distribution with µ = 15. That
is, as if the sample was one of many samples from a population that was similar to the placebo group. You
will need to use the standard error as the denominator in this calculation. Interpret this value - did the
melatonin seem to work?
$z = \frac{5.5 - 15}{5/\sqrt{40}} = -12.017$
This result is based on using population values of µ = 15 and σ = 5.
This is a very large z-value (in absolute terms)! The probability of getting a z-value of
-12.017 or smaller (on out into the left hand tail) is considerably less than .001.
If instead I used the sample estimate of s = 2.941, so that the standard error is .465, my
result would be:
z = (5.553 - 15)/.465
z = -20.32
This is a very large z-value (in absolute terms)! The probability of getting a z-value of -20.32 or
smaller (on out into the left hand tail) is considerably less than .001.
It sure looks like the melatonin worked!!!! But here is how we will express it:
It was a rare event to get a random sample of 40 young men with a mean of 5.55 if it really came from
a population with a mean of 15.
We have evidence to support that our sample is different from the placebo group.
With a low probability of being wrong, we have evidence that melatonin does lead to a reduction
in time to get to sleep.
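Both z-scores in part d follow the same formula, so they can be checked together. This is a sketch using the values stated above (µ = 15, σ = 5 or s = 2.941, n = 40); the function name `z_for_mean` is an illustrative choice, and the tail probability comes from the standard-library `NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def z_for_mean(xbar, mu, sd, n):
    """z-score of a sample mean: (xbar - mu) divided by the standard error sd / sqrt(n)."""
    return (xbar - mu) / (sd / sqrt(n))


z_pop = z_for_mean(5.5, 15, 5, 40)         # using the population value sigma = 5
z_samp = z_for_mean(5.553, 15, 2.941, 40)  # using the sample estimate s = 2.941

# Probability of a z-value this small or smaller (left-hand tail).
p_left = NormalDist().cdf(z_pop)

print(f"z with sigma = 5:     {z_pop:.3f}")
print(f"z with s = 2.941:     {z_samp:.2f}")
print("left-tail probability below .001:", p_left < 0.001)
```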
e. There are two extreme values in the data. What do you think about these? Are they unusual?
For the observation 15.6
Z = (15.6 – 5.55)/2.94 = 3.42
For the observation 16.2
Z = (16.2 – 5.55)/2.94 = 3.62
Both observations are more than 3 standard deviations from the mean. This is very rare. Perhaps
something different happened with these two subjects – it is worth investigating.
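The same check can be applied to any observation. This is a small sketch using the rounded mean (5.55) and standard deviation (2.94) from the answer above; the function name `z_score` and the cutoff of 3 are taken from the "more than 3 standard deviations" reasoning, not from a formal outlier rule.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


mean_time = 5.55  # sample mean (minutes)
sd_time = 2.94    # sample standard deviation (minutes)

for observation in (15.6, 16.2):
    z = z_score(observation, mean_time, sd_time)
    flag = "more than 3 SDs from the mean" if abs(z) > 3 else "within 3 SDs"
    print(f"{observation}: z = {z:.2f} ({flag})")
```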
4. Pond's Age-Defying Complex is a cream with alpha-hydroxy acid, a product that is advertised to
improve the skin. In a study, 130 women over age 40 used a cream with alpha-hydroxy acid for 22
weeks. At the end of the study period the women were examined by dermatologists and 72 were
determined to exhibit skin improvement. This is a proportion problem.
a. Calculate a 95% Confidence Interval for the proportion of women who exhibited improvement.
Remember, the formula for the confidence interval for a proportion is:
$p \pm Z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}}$
p = 72/130 = .554
q = .446
Standard Error = [(.554)(.446)/130]^.5 = .0436 rounded to .044
$.554 \pm 1.96 \sqrt{\frac{.554(1 - .554)}{130}}$
.554 ± 1.96(.044)
.554 ± .0855
.4684 to .6393
b. Does there seem to be support from this study that the cream improved more than half of the women
who use it?
No, we do not have sufficient evidence that more than half of the women using the cream would
show improvement. Even though nearly 55% of the sample showed improvement, based on our
confidence interval, .5 is within the confidence interval.
It is possible that fewer than 50% of all women who use the cream would show improvement.
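The interval in part a and the comparison with .5 in part b can be sketched in a few lines. The counts (72 of 130) and the z-value of 1.96 come from the problem; the function name `proportion_ci` is an illustrative choice. The code keeps p unrounded, which is why the endpoints match .4684 and .6393.

```python
from math import sqrt


def proportion_ci(successes, n, z):
    """Return (p, standard error, lower, upper) for a large-sample CI on a proportion."""
    p = successes / n
    se = sqrt(p * (1 - p) / n)
    margin = z * se
    return p, se, p - margin, p + margin


p, se, lower, upper = proportion_ci(successes=72, n=130, z=1.96)

print(f"p = {p:.3f}, SE = {se:.4f}")
print(f"95% CI: {lower:.4f} to {upper:.4f}")
print("Interval contains .5:", lower <= 0.5 <= upper)
```

Because .5 falls inside the interval, the sample does not rule out a population proportion of one half or less.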
5. A study was done that looked at the effect of an oil spill on plant growth. Random plots were
selected within the oil spill and counts were made of the number of plants growing in the plot. Later this will
be compared to random plots not affected by the spill. For now we will focus on the oil spill plots, 40 in
total. For now we will treat this as a large sample and you can use a z-value for the confidence interval.
The information from JMP is given below. All the information you need for a confidence interval is given
below.
a. Briefly summarize the data using measures of central tendency and variability.
The mean number of plants per plot was 26.925, or nearly 27 plants per plot. The median was slightly lower at
26 plants per plot. Based on the histogram and the stem and leaf plot, the distribution appears symmetrical and
mound shaped. An assumption of normality of the distribution is reasonable. The range of the data is from a
low of 5 plants to a high of 52. The standard deviation is 9.882. The Coefficient of Variation is 36.7%,
indicating a moderate amount of spread in this variable.
b. Construct a 99% C.I. for the number of plants in the oil spill plots. Remember, the formula for a
confidence interval is:
$\bar{X} \pm Z_{\alpha/2} \frac{s}{\sqrt{n}}$
The standard error is given in the JMP output as 1.563. We will use a z-value of 2.575 for the 99%
C.I.
26.925 ± 2.575*1.563
26.925 ± 4.025
22.900 to 30.950
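As a check on part b, the standard error can be recomputed from s = 9.882 and n = 40 instead of being read from the JMP output. This is a sketch; the variable names are illustrative, and the result differs from the answer above only in the third decimal place because s is rounded here.

```python
from math import sqrt

n = 40         # number of oil spill plots
xbar = 26.925  # mean plants per plot
s = 9.882      # standard deviation
z = 2.575      # z-value for a 99% confidence interval

se = s / sqrt(n)   # should be close to the 1.563 reported by JMP
margin = z * se
lower, upper = xbar - margin, xbar + margin

print(f"SE = {se:.3f}, margin = {margin:.3f}")
print(f"99% CI: {lower:.3f} to {upper:.3f}")
```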
c. Say what this confidence interval means in words.
We are 99% confident that the true mean lies between 22.900 and 30.950 plants per plot. By this
we mean if we took many samples and constructed a confidence interval for each sample, 99% of
them would contain the true population mean. It is highly likely that the true population mean value
is within our confidence interval.
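The "many samples" interpretation can be illustrated by simulation. This is a sketch under stated assumptions: the population is taken to be normal with a hypothetical mean of 27 and standard deviation of 10 (chosen only to resemble the plant data), and the function name `coverage_rate`, the number of trials, and the seed are all illustrative.

```python
import random
from math import sqrt
from statistics import mean, stdev


def coverage_rate(mu, sigma, n, z, trials, seed=None):
    """Fraction of simulated samples whose z-based CI contains the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        margin = z * stdev(sample) / sqrt(n)   # z * standard error from this sample
        if abs(mean(sample) - mu) <= margin:   # true mean lies inside the interval
            hits += 1
    return hits / trials


# Hypothetical population: mu = 27, sigma = 10; samples of 40 plots; 99% z-value.
rate = coverage_rate(mu=27, sigma=10, n=40, z=2.575, trials=2000, seed=5)
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The share should land near 99%, typically a little under, since a z-value is used with a sample standard deviation.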