Inference Based on a Single Sample
• We discuss point estimation, confidence interval estimation, and testing of statistical hypotheses.
• We also discuss reliability of estimation and sample size determination.
• We distinguish two cases depending on the sample size: large sample or small sample.
Copyright © by the Dept. of Statistics, University of Connecticut
• The population mean $\mu$ and the population proportion $p$ are unknown parameters. We make inferences about $\mu$ and $p$ using sample data.
• We estimate the unknown parameter $\mu$ based on the sample mean $\bar{X}$ obtained from a large sample ($n \ge 30$) or a small sample ($n < 30$).
• The sample proportion $\hat{p}$ estimates $p$.
• We also test hypotheses about $\mu$ and $p$.
• The population standard deviation $\sigma$ is usually also an unknown parameter. It is estimated by the sample standard deviation $s$.
• For simplicity, we sometimes assume $\sigma$ is known when we make inferences about $\mu$.
Def. 1: A point estimator of an unknown population parameter is a rule or formula that enables us to calculate, from sample data, a single number that estimates the parameter.
• A point estimator of the population mean $\mu$ is the sample mean $\bar{X}$.
• A point estimator of the population proportion $p$ is the sample proportion $\hat{p}$.
• A point estimator of the population standard deviation $\sigma$ is the sample standard deviation $s$.
We wish to estimate a population parameter such as $\mu$, the population mean, or $p$, the population proportion. Until now, we have only considered point estimates; recall that $\bar{X}$ estimates $\mu$.
A point estimate gives only a single value to represent the corresponding parameter. Usually, we wish to have an interval of plausible values.
Confidence Intervals
Def. 2: An interval estimator (or confidence interval) is
a formula that tells us how to use sample data to
calculate an interval that estimates a population
parameter.
In general, a confidence interval for a parameter has the
form
Point est. ±(critical value)(std. error of point est.)
• Interpretation: If the assumptions are met and we repeatedly calculate confidence intervals for the parameter using different samples, then $100(1-\alpha)\%$ of those intervals will capture the unknown population parameter.
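The repeated-sampling interpretation can be checked by simulation. The sketch below is an illustration only and is not part of the original slides: it assumes a Normal population with hypothetical values mu = 10 and sigma = 2, draws many samples of size 50, and reports the fraction of intervals of the form (point estimate) ± 1.96 × (std. error) that capture mu. The function name `coverage` and every parameter value are assumptions chosen for the example.

```python
import math
import random
import statistics

def coverage(mu=10.0, sigma=2.0, n=50, reps=2000, z=1.96, seed=0):
    """Fraction of simulated intervals xbar +/- z*s/sqrt(n) that contain mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        half_width = z * statistics.stdev(sample) / math.sqrt(n)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

print(coverage())  # expected to be close to 0.95
```

The printed fraction varies slightly with the seed and the number of repetitions, which is itself a reminder that the 95% figure describes long-run behaviour rather than any single interval.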
• Associated with an interval estimate is a confidence coefficient that measures the confidence we place in the estimator.
• 95% confident $\Rightarrow$ 0.95 confidence coefficient.
  99% confident $\Rightarrow$ 0.99 confidence coefficient.
Def. 3: The confidence coefficient is the probability
that an interval estimator captures the population
parameter if this estimator is used repeatedly a very
large number of times.
• The confidence level is the confidence coefficient expressed as a percentage.
• Let $\theta$ denote the population parameter.
  Confidence coeff. = 0.95 $\Rightarrow$ in the long run, 95% of the interval estimates will contain $\theta$.
• Denote the confidence coefficient by $1-\alpha$.
• Common confidence levels are 99%, 95% and 90%.
• $1-\alpha = 0.95 \Rightarrow \alpha = 0.05$
Confidence Intervals for $\mu$
Case I. Large Sample Size ($n \ge 30$)
• We construct a large-sample $100(1-\alpha)\%$ confidence interval (C.I.) for $\mu$.
• The sampling distribution of $\bar{X}$ is Normal with mean $\mu$ and std. dev. $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
• $\bar{X}$ has an exact Normal distribution if the population distribution is itself $N(\mu, \sigma^2)$.
• If the population distribution is NOT Normal, then since $n$ is large, the CLT allows us to treat $\bar{X}$ as approximately Normal.
• When $\sigma$ is unknown, we may replace it with the sample std. dev. $s$ without much penalty.
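• Given $n$, $\bar{X}$, $\alpha$, and $s$ or $\sigma$, we compute the $100(1-\alpha)\%$ confidence interval as
$\bar{X} \pm z_{\alpha/2}\,\sigma_{\bar{X}} = \bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$, i.e.,
$\left(\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)$
Note: If the population distribution is Normal and $\sigma$ is known, $n$ may be less than 30, since the sampling distribution of $\bar{X}$ will be Normal (by Theorem 1).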
• Suppose $\sigma$ is not known and suppose $n \ge 30$. Replace $\sigma$ by the sample standard deviation $s$. Then the $100(1-\alpha)\%$ C.I. for $\mu$ is
$\bar{X} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}$
Recall
$s = \sqrt{\frac{\sum x_i^2 - (\sum x_i)^2/n}{n-1}}$
• and the shortcut formula:
$s = \sqrt{\frac{\sum x_i^2 - n\bar{x}^2}{n-1}}$
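As a quick check that the computing formula for $s$ agrees with the usual definition, the following sketch compares it with Python's built-in `statistics.stdev`. The data values and the function name `stdev_shortcut` are hypothetical and serve only as an illustration.

```python
import math
import statistics

def stdev_shortcut(xs):
    """Sample std. dev. via s = sqrt((sum(x^2) - (sum x)^2 / n) / (n - 1))."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

data = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical data, for illustration only
print(stdev_shortcut(data), statistics.stdev(data))  # the two values should agree
```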
$z_{\alpha/2}$ Values for Commonly Used Confidence Levels

100(1 - α)%    α      α/2     z_{α/2}
90%            .10    .05     1.645
95%            .05    .025    1.96
99%            .01    .005    2.575
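The tabulated critical values can also be computed instead of looked up. The sketch below uses `NormalDist.inv_cdf` from Python's standard `statistics` module; the helper name `z_critical` is an assumption made for this example. The computed 99% value is about 2.576, which tables commonly round to 2.575 or 2.58.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2} for a confidence coefficient 1 - alpha."""
    alpha = 1.0 - confidence
    # z_{alpha/2} cuts off an upper-tail area of alpha/2 under N(0, 1)
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

The same function works for levels that are not in the table, such as 93% or 97%.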
Example 1: $n = 80$ observations are taken from a $N(\mu, \sigma^2)$ population. Both $\mu$ and $\sigma^2$ are unknown. We are given $\bar{X} = 14.1$, $s = 2.6$.
(a) Find a 95% confidence interval for $\mu$.
Here, $1-\alpha = 0.95 \Rightarrow \alpha = 1 - .95 = 0.05$.
The 95% large-sample C.I. for $\mu$ is
$\bar{X} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}$,
where, from Table IV, when $\alpha/2 = .025$, $z_{\alpha/2} = z_{.025} = 1.96$.
Substitute $\bar{X} = 14.1$, $s = 2.6$ into the C.I. formula:
$14.1 \pm (1.96)\frac{2.6}{\sqrt{80}} = 14.1 \pm (1.96)(.2907) = 14.1 \pm .5698$
$= (14.1 - .5698,\ 14.1 + .5698) = (13.5302,\ 14.6698)$.
Interpretation: If the assumptions have been met and we repeatedly calculate the C.I. using different random samples, 95% of those confidence intervals will capture $\mu$.
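The calculation in part (a) can be reproduced with a few lines of code. This is a minimal sketch; the function name `z_interval` is an assumption, and the critical value 1.96 is passed in directly as read from the table.

```python
import math

def z_interval(xbar, s, n, z):
    """Large-sample C.I.: xbar +/- z * s / sqrt(n)."""
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Values from Example 1(a)
low, high = z_interval(xbar=14.1, s=2.6, n=80, z=1.96)
print(round(low, 4), round(high, 4))
```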
(b) Find a 99% confidence interval for $\mu$.
The C.I. is
$\bar{X} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}$.
We must find $z_{\alpha/2}$ when $1-\alpha = 0.99 \Rightarrow \alpha = 1 - .99 = .01$, i.e., $\alpha/2 = 0.005$.
From Table IV, $z_{.005} = 2.575$.
The large-sample 99% C.I. for $\mu$ is
$\bar{X} \pm (2.575)\frac{s}{\sqrt{n}} = 14.1 \pm (2.575)\frac{2.6}{\sqrt{80}} = 14.1 \pm 0.749 = (13.351,\ 14.849)$.
(c) Compare the intervals in (a) and (b).
95% C.I.: (13.5302, 14.6698)
99% C.I.: (13.351, 14.849)
The 99% C.I. is wider than the 95% C.I.
• If $1-\alpha$ increases, the confidence interval becomes wider when $s$ and $n$ remain fixed.
(d) Do the confidence intervals in (a) and (b) depend on population normality?
NO. Since $n = 80 > 30$, by the Central Limit Theorem, $\bar{X}$ is approximately $N\left(\mu, \frac{\sigma^2}{n}\right)$.
So the large-sample C.I. is valid even if the population is not normally distributed.
When $n$ is large ($n \ge 30$), the CLT guarantees that the sampling distribution of $\bar{X}$ is approximately normal.
Example 2: Suppose that instead of drawing $n = 80$ observations from a population which is $N(\mu, \sigma^2)$, we have $n = 32$. Suppose $\bar{X} = 14.1$, $s = 2.6$.
a) Find a 95% confidence interval for $\mu$.
Recall that $z_{\alpha/2}$ when $1-\alpha = 0.95$ was $z_{.025} = 1.96$.
The large-sample 95% C.I. for $\mu$ is
$14.1 \pm (1.96)\frac{2.6}{\sqrt{32}} = 14.1 \pm .9009 = (13.1991,\ 15.0009)$
b) Compare the 95% C.I. for $\mu$ when $n = 80$ and $n = 32$.
C.I. when $n = 80$: (13.5302, 14.6698)
C.I. when $n = 32$: (13.1991, 15.0009)
• For fixed $1-\alpha$ and $s$,
reduce the sample size $\Rightarrow$ C.I. becomes wider
increase the sample size $\Rightarrow$ C.I. becomes narrower
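Both effects, a wider interval for a higher confidence level and a narrower one for a larger sample, come from the half-width $z_{\alpha/2}\,s/\sqrt{n}$. The sketch below tabulates that half-width for the sample sizes of Examples 1 and 2; the function name `half_width` is an assumption, and the critical values 1.96 and 2.575 are taken from the table of commonly used values.

```python
import math

def half_width(z, s, n):
    """Half-width of the large-sample C.I.: z * s / sqrt(n)."""
    return z * s / math.sqrt(n)

s = 2.6  # sample standard deviation used in Examples 1 and 2
for n in (32, 80):
    for label, z in (("95%", 1.96), ("99%", 2.575)):
        print(f"n={n}, {label}: +/- {half_width(z, s, n):.4f}")
```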
Example 3: Find the confidence level corresponding to the following confidence intervals for $\mu$.
a) $\bar{X} \pm (1.96)\frac{\sigma}{\sqrt{n}}$.
We know $1.96 = z_{\alpha/2}$. Hence, we must find $\alpha/2$ and then $100(1-\alpha)\%$.
$P(Z > 1.96) = 0.5 - P(0 < Z < 1.96) = 0.5 - 0.475 = 0.025$
So, $\alpha/2 = .025 \Rightarrow \alpha = 2(.025) = 0.05 \Rightarrow 1-\alpha = 1 - 0.05 = 0.95$.
The confidence level is $100(1-\alpha)\% = 95\%$.
b) Given $\bar{X} \pm (1.645)\frac{\sigma}{\sqrt{n}}$. Find the confidence level.
$P(Z > 1.645) = 0.5 - P(0 < Z < 1.645) = 0.5 - 0.45 = 0.05$
So, $\alpha/2 = 0.05 \Rightarrow \alpha = 2(.05) = 0.10 \Rightarrow 1-\alpha = 1 - 0.10 = 0.90$.
The confidence coefficient is 0.90. The confidence level is 90%.
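The table lookup in Example 3 can be mirrored in code by going from a critical value back to a confidence coefficient. This is a sketch using the standard library's `NormalDist.cdf`; the function name `confidence_level` is an assumption.

```python
from statistics import NormalDist

def confidence_level(z):
    """Confidence coefficient 1 - alpha for an interval xbar +/- z * sigma/sqrt(n)."""
    tail = 1.0 - NormalDist().cdf(z)  # P(Z > z) = alpha / 2
    return 1.0 - 2.0 * tail

print(round(confidence_level(1.96), 3), round(confidence_level(1.645), 3))
```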
Example 4: A random sample of 110 lightning flashes in a certain region resulted in an average radar echo duration of 0.81 sec and a sample standard deviation of 0.34 sec. Compute a 99% confidence interval for the true average echo duration $\mu$.
Example 5: A random sample of 36 smoking members
of a labor union revealed that the mean number of
cigarettes smoked during a 5 day period per person was
75. The standard deviation was 12 cigarettes. Construct
a 99% confidence interval for the true mean number of
cigarettes smoked in the population.
Sample Size Determination
We show how to determine n in two situations.
Situation 1:
• Suppose we want to estimate $\mu$ to within some bound $B$, with confidence $1-\alpha$.
• Suppose we know the population standard deviation $\sigma$.
• We must determine what sample size $n$ is required to achieve this bound $B$.
Recall:
Bound $B = z_{\alpha/2}\,\sigma_{\bar{X}} = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$, where $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.
Then, solving for $n$, we get
$n = \frac{(z_{\alpha/2})^2\,\sigma^2}{B^2}$
Situation 2:
• Suppose we want a confidence interval of width $w$ when estimating $\mu$ with a $100(1-\alpha)\%$ confidence level.
• Suppose that $\sigma$ is known.
• We must determine $n$ to achieve the width $w$.
We are given that $z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = \frac{w}{2}$.
Solving for $n$, we get
$n = \frac{4(z_{\alpha/2})^2\,\sigma^2}{w^2}$
Example 6: We wish to estimate $\mu$ to within a bound $B = 0.2$, with confidence level $1-\alpha = 0.95$, when $\sigma^2 = 6.1$. Find the sample size $n$.
We saw under Situation 1 that
$n = \frac{(z_{\alpha/2})^2\,\sigma^2}{B^2}$
$1-\alpha = .95 \Rightarrow \alpha = .05 \Rightarrow \alpha/2 = .025$
From Table IV, $z_{\alpha/2} = z_{.025} = 1.96$.
Substituting into the formula,
$n = \frac{(1.96)^2(6.1)}{(.2)^2} = 585.84$
i.e., $n = 586$ (after rounding up).
Example 7: Suppose it costs you $10 to draw a sample of size n = 1. You have a budget of $1,200.
a) Suppose you want to estimate $\mu$ with 95% confidence, with $w = 4$, given $\sigma = 12$. Find $n$.
We saw that $n = \frac{4(z_{\alpha/2})^2\,\sigma^2}{w^2}$.
$1-\alpha = .95 \Rightarrow \alpha/2 = .025 \Rightarrow$ from Table IV, $z_{\alpha/2} = 1.96$.
Substituting into the formula,
$n = \frac{4(1.96)^2(12)^2}{(4)^2} = 138.30$
We need n = 139 samples (after rounding up), at a cost of 139 × $10 = $1,390.
The budget, however, will not cover this.
b) What happens in (a) if $1-\alpha = .90$?
Now, $\alpha = .10 \Rightarrow \alpha/2 = .05$; from Table IV, $z_{\alpha/2} = 1.645$.
We saw that
$n = \frac{4(1.645)^2(12)^2}{(4)^2} = 97.42$
So you need n = 98 samples (after rounding up).
Cost = 98 × $10 = $980.
The budget covers this.
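The two sample-size situations differ only in whether the target is a bound $B$ or a total width $w = 2B$, so one can be written in terms of the other. The sketch below reworks Example 7 under that observation; the function names `n_for_bound` and `n_for_width` are assumptions, and rounding up is done with `math.ceil` because a fractional observation is not possible.

```python
import math

def n_for_bound(z, sigma, bound):
    """Smallest n with z * sigma / sqrt(n) <= bound (Situation 1)."""
    return math.ceil((z * sigma / bound) ** 2)

def n_for_width(z, sigma, width):
    """Smallest n whose C.I. has total width <= width (Situation 2)."""
    return n_for_bound(z, sigma, width / 2.0)

cost_per_obs, budget = 10, 1200  # figures from Example 7
for label, z in (("95%", 1.96), ("90%", 1.645)):
    n = n_for_width(z, sigma=12, width=4)
    cost = n * cost_per_obs
    print(label, n, cost, "within budget" if cost <= budget else "over budget")
```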
Case II. Small Sample Size ($n < 30$)
Since $n$ is small, we have two problems:
1. The CLT does not apply.
2. The sample std. dev. $s$ is a poor estimate of the population std. dev. $\sigma$.
To proceed with inference, we must
1. assume that the data come from a Normal distribution, and
2. use the Student's t-table.
Given $n$, $\bar{X}$, $s$, and $\alpha$, we compute the $100(1-\alpha)\%$ confidence interval as
$\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$
where $t_{\alpha/2,\,n-1}$ is the $100(1-\alpha/2)$th percentile of the Student's t-distribution with $n-1$ degrees of freedom.
Notes about the Student's t-distribution
• t-table values are larger than the normal table values for the same $\alpha$.
• Small-sample confidence intervals are therefore wider than large-sample intervals for the same $\alpha$ and the same $s$.
• As the degrees of freedom increase, the t-distribution and the normal distribution get closer.
• Percentiles in the t-table for the row corresponding to infinite ($\infty$) degrees of freedom match the percentiles of the normal distribution.
• Note that the degrees of freedom we use are related to the denominator in the formula for the sample standard deviation.
Example 8:
Chronic exposure to asbestos fiber is a well-known health
hazard. Construction workers who had been exposed to asbestos over a
prolonged period were examined. Among the data is pulmonary
compliance (cm^3/cm H2O) for each of 16 subjects 8 months after the
exposure period. (Pulmonary compliance is a measure of lung elasticity, or
how effectively the lungs are able to inhale and exhale.)
167.9 180.8 184.8 189.8
207.2 208.4 226.3 194.8
200.2 201.9 227.7 228.5
206.9 232.4 239.8 258.6
Construct a 95% confidence interval for the mean pulmonary compliance
of construction workers exposed to asbestos.
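One way to set up this calculation is sketched below. It applies the small-sample formula $\bar{X} \pm t_{\alpha/2,\,n-1}\,s/\sqrt{n}$ to the 16 observations, assuming (as the small-sample method requires) that the data come from a Normal population. The critical value $t_{.025,\,15} = 2.131$ is an assumed value read from a standard t-table, since Python's standard library has no t quantile function; the function name `t_interval` is also an assumption.

```python
import math
import statistics

# Pulmonary compliance data from Example 8 (n = 16)
data = [167.9, 180.8, 184.8, 189.8, 207.2, 208.4, 226.3, 194.8,
        200.2, 201.9, 227.7, 228.5, 206.9, 232.4, 239.8, 258.6]

T_025_15 = 2.131  # t_{.025, 15}: assumed value taken from a standard t-table

def t_interval(xs, t_crit):
    """Small-sample C.I.: xbar +/- t * s / sqrt(n)."""
    n = len(xs)
    xbar = statistics.mean(xs)
    half_width = t_crit * statistics.stdev(xs) / math.sqrt(n)
    return xbar - half_width, xbar + half_width

low, high = t_interval(data, T_025_15)
print(f"xbar = {statistics.mean(data):.2f}, s = {statistics.stdev(data):.2f}")
print(f"95% C.I.: ({low:.2f}, {high:.2f})")
```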
Example 9: A random sample of 16 complaints filed by customers of a
large department store revealed a mean time required for processing of 30
minutes and a standard deviation of 8 minutes. Construct a 95%
confidence interval for the true mean time required to process customer
complaints. Assume that the population of processing times is
approximately normally distributed.
Confidence Interval for p
For example,
a. What proportion of college graduates in the U.S.A. read mystery novels avidly? Out of $n = 225$ students sampled, $X = 120$ are avid mystery readers. Then,
$\hat{p} = \frac{X}{n} = \frac{120}{225} = 0.5333$
b. What proportion of the country supports abortion? A survey of $n = 1025$ people shows that $X = 500$ support abortion. Then,
$\hat{p} = \frac{X}{n} = \frac{500}{1025} = 0.4878$
• The aim is to make inferences about $p$. This is a binomial parameter problem.
• Sample proportion $= \hat{p} = \frac{\text{Number of successes}}{\text{Number of trials}}$. $\hat{p}$ is a random variable.
• The sampling distribution of $\hat{p}$ is the probability distribution of the r.v. $\hat{p}$.
• The mean of the sampling distribution of $\hat{p}$ is $p$.
• The std. dev. of the sampling distribution of $\hat{p}$ is $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$.
• For large samples, the sampling distribution of $\hat{p}$ is approximately Normal, provided $n > 30$, $np \ge 5$ and $np(1-p) \ge 5$.
• A check is to see whether the interval $(\hat{p} - 3\sigma_{\hat{p}},\ \hat{p} + 3\sigma_{\hat{p}})$ contains 0 or 1. If this interval does not include 0 or 1, then the large sample approximation may be used.
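The three-standard-deviation check is easy to automate. The sketch below assumes, as an approximation, that the unknown $p$ in the standard deviation is replaced by the sample proportion; the function name `large_sample_ok` and the second test case are hypothetical.

```python
import math

def large_sample_ok(p_hat, n):
    """True if (p_hat - 3*se, p_hat + 3*se) lies strictly inside (0, 1)."""
    # Assumption: the unknown p is estimated by p_hat when computing the std. dev.
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - 3.0 * se > 0.0 and p_hat + 3.0 * se < 1.0

print(large_sample_ok(0.5333, 225))  # mystery-novel example: 120 out of 225
print(large_sample_ok(0.02, 40))     # hypothetical small sample with an extreme proportion
```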
The $100(1-\alpha)\%$ confidence interval for $p$ is
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$
where $\hat{q} = 1 - \hat{p}$.
Example 10: Construct a 95% confidence interval for $p$ based on a random sample with $n = 400$, $\hat{p} = .42$.
We first check whether the condition for the approximate normal distribution of $\hat{p}$ is met.
$\sigma_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{(.42)(.58)}{400}} = .0247$
$(\hat{p} - 3\sigma_{\hat{p}},\ \hat{p} + 3\sigma_{\hat{p}})$
$= (.42 - 3(.0247),\ .42 + 3(.0247))$
$= (.42 - .0741,\ .42 + .0741)$
$= (.3459,\ .4941)$
The interval does not include 0 or 1 $\Rightarrow$ the large sample approximation may be used.
The 95% C.I. for $p$ is
$\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}}$.
From Table IV, $z_{\alpha/2} = 1.96$, so a 95% C.I. for $p$ is
$.42 \pm (1.96)(.0247)$
$= .42 \pm .0484$
$= (.3716,\ .4684)$
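The interval in Example 10 can be reproduced with the short sketch below. The function name `proportion_interval` is an assumption, and the critical value 1.96 is supplied directly from the table.

```python
import math

def proportion_interval(p_hat, n, z):
    """Large-sample C.I.: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Values from Example 10
low, high = proportion_interval(p_hat=0.42, n=400, z=1.96)
print(round(low, 4), round(high, 4))
```

The same function applies to other confidence levels once the matching critical value is supplied.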
Example 11: A pizza parlor is considering replacing its oven with a
new one. The new oven is particularly suited for baking large (16-in)
pizzas. Let p denote the proportion of all pizzas ordered that are large. A
random sample of n = 150 pizza orders yielded 120 pizzas that were large.
Construct a 97% confidence interval for the true proportion of large pizzas
at this restaurant.
Example 12:
In a random sample of 200 college students, 10 stated
that they own a motorcycle. Construct a 99% confidence interval for the
true proportion owning motorcycles.
Example 13: A hatchery examined a random sample of 574 eggs from
a supplier and found 10 to be infertile. Construct a 93% confidence interval
for the true proportion of infertile eggs.
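Sample Size Determination for Inference about $p$
The $100(1-\alpha)\%$ C.I. for $p$ is
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$
The width and bound of this C.I. are
Width (W) $= 2\,z_{\alpha/2}\,\sigma_{\hat{p}} = 2\,z_{\alpha/2}\sqrt{\frac{pq}{n}}$
Bound (B) $= z_{\alpha/2}\sqrt{\frac{pq}{n}}$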
To obtain a $100(1-\alpha)\%$ C.I. for $p$ within a bound $B$, we need a sample size of
$n = \frac{(z_{\alpha/2})^2\,pq}{B^2}$
We do not know $p$ and $q$, so we replace them by their estimates. Then
$n = \frac{(z_{\alpha/2})^2\,\hat{p}\hat{q}}{B^2}$
Example 14: We wish to estimate the proportion $p$ of TV viewers who watch a prime-time comedy show on May 24. We expect that $\hat{p} \approx .30$. Find $n$ so that the width of the confidence interval is at most .01.
$W = .01 \Rightarrow B = \frac{W}{2} = .005$
so that, using $z_{\alpha/2} = 1.96$ (95% confidence),
$n = \frac{(z_{\alpha/2})^2\,\hat{p}\hat{q}}{B^2} = \frac{(1.96)^2(.3)(.7)}{(.005)^2} = 32269.44 \approx 32270$
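The sample-size calculation for a proportion follows the same pattern as for a mean. This is a minimal sketch; the function name `n_for_proportion` is an assumption, and the prior guess for $p$ must come from outside the sample, as it does in Example 14.

```python
import math

def n_for_proportion(z, p_guess, bound):
    """Smallest n with z * sqrt(p*q/n) <= bound, using a prior guess for p."""
    return math.ceil(z ** 2 * p_guess * (1.0 - p_guess) / bound ** 2)

# Values from Example 14: width .01 means bound .005
print(n_for_proportion(z=1.96, p_guess=0.30, bound=0.005))
```

Because $pq$ is largest at $p = 0.5$, a guess closer to 0.5 always yields a larger, more conservative sample size.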