MAT 141 – Statistics Page 1 Section 3.4 (Sullivan 4e)

These data sets will be used in Sections 3.4 and 3.5.
You should enter the following data sets in your calculator:
• For Sections 3.4 and 3.5:
  o Fortune 30 CEO Five-Year Compensation Data (Table 2a)
  o Fortune 30 CEO Ages (Table 2b)
• For Section 3.5:
  o Snow Thrower Prices (Tables 4a and 4b)
The data do not need to be sorted.
Table 1a: Exam scores (Section 001) (N=27)
μ=75.7, σ=15.6
36 40 43 58 62 65 67 72 73
78 78 79 79 80 81 83 84 85
85 86 86 89 90 90 90 92 94
Table 2a: Fortune 30 CEO Five-Year Compensation (N=20)
μ=65.90, σ=47.48 (Millions of dollars)
1.5 5.8 14.1 25.8 26.5 37.6 38.8 40.3 44.7 45.6
53.4 53.8 55.3 95.2 110.0 117.5 120.5 127.8 130.2 173.6
Table 1b: Exam scores (Section 002) (N=24)
μ=71.8, σ=18.4
26 41 49 50 52 57 60 63
67 72 72 74 75 79 80 83
84 85 85 89 90 94 96 99
Table 2b: Fortune 30 CEO Ages (N=30)
μ=59.8, σ=6.4 (Years)
51 51 52 52 54 54 55 55 56 57
57 57 58 58 58 59 60 60 60 62
62 63 64 64 64 64 65 67 75 80
geoffrey.krader@morton.edu
kradermath.jimdo.com
02/2014
Table 3: All-in-One Inkjet Printers (n=55)
Cost to print one page of text (cents)
3 | 7 means 3.7¢
 0 | 9
 1 | 1 1
 2 | 1 6 7 7
 3 | 0 1 1 2 3 4 5 5 6 9
 4 | 2 4 4 5 5 5 5 5 6 7 7 8 8 8 8 8 9 9
 5 | 0 1 2 3 3 4 7 7
 6 | 0 0 1 2 3 4 5 6
 7 | 1 8
 8 | 3
 9 |
10 |
11 |
12 | 6
Table 4a: Two-Stage Gas Snow Throwers
Model                        Price
Craftsman (Sears) 88700      $600
Yard Machines S6FEE          $700
Husqvarna 524ST              $700
Craftsman (Sears) 88790      $950
Ariens 8524LE                $1,000
Yard-Man E5KLF               $1,100
Toro Power Max 828LXE        $1,250
Troy-Bilt Storm 10030        $1,300
Craftsman (Sears) 888111     $1,300
Simplicity 9560E             $1,300
Frontier STO927              $1,800
Honda HS928WAS               $2,080
Table 4b: Single-Stage Gas Snow Throwers
Model                        Price
Craftsman 88140              $300
Yard-Man 285                 $400
Yard Machines S260           $400
Troy-Bilt Squall 521         $500
Ariens 522                   $500
Toro CCR 2450 GTS 38515      $540
Honda Harmony HS520AS        $750
Toro Snow Commander 38602    $900
Source: Consumer Reports, October 2004
Learning Outcomes
After we cover Section 3.4, you should be able to:
1. List several measures of position, and describe how measures of position differ from
measures of central tendency and measures of dispersion.
2. Describe what is meant by the z-score and find the z-score for a given data point.
3. Describe what is meant by percentile.
4. Describe what is meant by quartile, and find the quartiles for a given data set.
a. Describe the relationship between quartiles, percentiles and the median.
5. Describe what is meant by interquartile range (IQR), and calculate the IQR for a given
data set.
6. Use the shape of the distribution to determine the most appropriate measure of
dispersion: standard deviation or IQR.
7. Describe what is meant by an outlier, and use the IQR to identify outliers in a given data
set.
a. Describe how to handle outliers in a statistical study (e.g., when should outliers
be eliminated from a data set?)
Numerical Summaries of Data


• To describe the distribution of a variable:
  o Measures of Central Tendency – Describe a “typical” data value, the “middle” of the data set.
  o Measures of Dispersion – Describe the “spread” of the data set.
• To describe the location of individual data points within the data set:
  o Measures of Position – Describe where a data point is located within the distribution.
NOTE: We will also use measures of position to define an additional measure of dispersion.
z-score
The z-score describes the position of a data point within the data set as the number of standard
deviations from the mean.
Calculating the z-score:
For populations: $z = \frac{X - \mu}{\sigma}$
For samples: $z = \frac{X - \bar{X}}{s}$
z>0  Data point lies to the right of (i.e., above) the mean.
z<0  Data point lies to the left of (i.e., below) the mean.
z=0  Data point lies at the mean.
[Figure: Sample I (41, 44, 45, 47, 47, 48, 51, 53, 58, 66; $\bar{X} = 50.0$, $s = 7.4$) plotted above a z-score axis running from -3 to 3.]
EXAMPLE: Calculating z-scores
In Sample I, above, calculate the z-scores for the following data points:
• X=58
• X=44
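As a quick check of the sample formula, the following Python sketch recomputes the mean and standard deviation of Sample I and then standardizes its smallest and largest values. The helper name `z_score` is an illustrative choice, not something the handout defines.

```python
from statistics import mean, stdev

# Sample I, read from the figure in this section.
sample_i = [41, 44, 45, 47, 47, 48, 51, 53, 58, 66]


def z_score(x, center, spread):
    """Return how many standard deviations x lies from the center."""
    return (x - center) / spread


x_bar = mean(sample_i)  # sample mean
s = stdev(sample_i)     # sample standard deviation (divides by n - 1)

z_low = z_score(min(sample_i), x_bar, s)   # negative: below the mean
z_high = z_score(max(sample_i), x_bar, s)  # positive: above the mean

print(f"mean = {x_bar:.1f}, s = {s:.1f}")
print(f"z(41) = {z_low:.2f}, z(66) = {z_high:.2f}")
```

The signs of the two results match the rule above: the smallest value has a negative z-score and the largest has a positive one.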
MAT 141 – Statistics
Section 3.4 (Sullivan 4e)
Page 5
EXAMPLE: Using z-scores to Compare Data Points in Different Data Sets
The scores of two students are circled. Recall that the two sections used different tests (one was
multiple-choice, the other was free-response), so it may not be appropriate to compare an
individual exam score from one section with an individual exam score in the other section.
However, we can use z-scores to see which student did better relative to the rest of his/her
class.
z-scores may be used to compare the relative location of data points in different data sets.
The circled scores are 85 (Section 001, Table 1a) and 84 (Section 002, Table 1b).
Section 001: μ=75.7, σ=15.6
$z = \frac{85 - 75.7}{15.6} = \frac{9.3}{15.6} \approx 0.60$
Section 002: μ=71.8, σ=18.4
$z = \frac{84 - 71.8}{18.4} = \frac{12.2}{18.4} \approx 0.66$
Which student scored better relative to the rest of his/her class?
( ) Student in Section 001 (Score=85)
( ) Student in Section 002 (Score=84)
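The same comparison can be scripted. This minimal sketch uses the population formula with the rounded μ and σ printed with Tables 1a and 1b; the dictionary layout and the name `z_score` are assumptions made for illustration.

```python
def z_score(x, mu, sigma):
    """Population z-score: standard deviations between x and the mean."""
    return (x - mu) / sigma


# Summary statistics as printed with Tables 1a and 1b.
sections = {
    "Section 001": {"score": 85, "mu": 75.7, "sigma": 15.6},
    "Section 002": {"score": 84, "mu": 71.8, "sigma": 18.4},
}

z_scores = {
    name: z_score(info["score"], info["mu"], info["sigma"])
    for name, info in sections.items()
}

for name, z in z_scores.items():
    print(f"{name}: z = {z:.2f}")
```

Because each z-score is measured in standard deviations of its own section, the two values can be compared directly even though the exams differ.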
Percentiles
• Percentiles divide the values of the variable (written in ascending order) into 100 equal groups.
• The k-th percentile, Pk, is a number that separates the bottom k% of the data from the upper (100 – k)%.
• Percentiles may or may not be one of the data points.
• There are 99 (not 100) percentiles.
• Percentiles are counting numbers (i.e., 1, 2, 3, 4, …, 99). There are no fractional or decimal percentiles (e.g., there is no P62.5).
[Figure: number line from L to H marking P1, P10, P20, P25, P50 (median), and P75.]
EXAMPLE: Percentiles
• If you score at the 80th percentile on a test, how does your score compare to the other scores?
• If your height is at the 30th percentile for your age group, how does your height compare to the height of other people your age?
Caution:
• There is no 0-th or 100-th percentile.
• Percentile does not mean percent. If you get 72% of the questions correct on an exam, your percentile depends on how the other students did.
• Percentiles represent the boundaries between the 100 equally-sized groups of data points; percentiles are not “bins” into which data points are placed. (For example, you can be at the 40th percentile, not in the 40th percentile.)
Percentiles (cont’d)
Finding the number that corresponds to a given percentile
Data: 2 3 5 6 8 11 13 15 18 20
P50 = 50th percentile = 9.5 (separates lower 50% from upper 50%)
P60 = 60th percentile = 12 (separates lower 60% from upper 40%)
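The handout does not state the rule behind these two values. One common textbook convention, assumed here, places the k-th percentile at position i = (k/100)(n + 1) in the sorted data and averages the two neighboring values when i is not a whole number. Under that assumption the sketch below reproduces P50 = 9.5 and P60 = 12; `value_at_percentile` is a hypothetical helper name.

```python
def value_at_percentile(data, k):
    """Value at the k-th percentile, using position i = (k/100)(n + 1).

    Assumed convention: if i is a whole number, take the i-th sorted
    value; otherwise average the values on either side of i.
    """
    if not 1 <= k <= 99:
        raise ValueError("k must be a counting number from 1 to 99")
    values = sorted(data)
    n = len(values)
    # Integer arithmetic avoids floating-point error in k/100 * (n + 1).
    whole, remainder = divmod(k * (n + 1), 100)
    if remainder == 0:
        if not 1 <= whole <= n:
            raise ValueError("too few data points for this percentile")
        return values[whole - 1]
    if whole < 1 or whole + 1 > n:
        raise ValueError("too few data points for this percentile")
    return (values[whole - 1] + values[whole]) / 2


data = [2, 3, 5, 6, 8, 11, 13, 15, 18, 20]
p50 = value_at_percentile(data, 50)
p60 = value_at_percentile(data, 60)
print(p50, p60)
```

Other conventions exist, and calculators or software may return slightly different values for the same data set.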
EXAMPLE: Percentiles
Use the table below to answer the following questions.
• Interpret the 95th percentile for household income.
• Interpret the 20th percentile for household income.
• What can you say about a household whose annual income is $52,000?
US Household Income, 2012
P95    $191,156
P90    $146,000
P80    $104,096
P50    $51,017
P20    $20,599
Source: Income, Poverty, and Health Insurance Coverage in the United States: 2012, US Census Bureau.
EXAMPLE: Percentile Charts
Use the percentile chart to answer the following questions:
• What is the median BMI for a 9-year-old boy?
• 90% of 9-year-old boys have a BMI between __________ and __________.
[Figure: percentile chart of Body Mass Index (BMI) by age for boys. Source: US Department of Health and Human Services, Health Resources and Services Administration]
The two dots show the Body Mass Index of a single boy whose level of physical activity has
decreased because of asthma.
• What was his BMI and percentile at age 13?
• What was his BMI and percentile at age 15?
• What would his BMI be at age 15 if his percentile had remained the same?
• Describe what typically happens to the BMI of boys as they get older.
• What can you say about the dispersion of BMI as boys get older?
Quartiles
Some percentiles have special names:
P25 = Q1 (first quartile)
P50 = Q2 (second quartile) = Median
P75 = Q3 (third quartile)
[Figure: number line from L to H marking P1, P10, P20, P25 (Q1), P50 (Q2, median), and P75 (Q3).]
Process for Finding Quartiles
Data: 2 5 6 8 9 11 13 15 20
Q2 = 9     Also known as the median or P50.
Q1 = 5.5   Also known as P25. It’s the median of the points below the median.
Q3 = 14.0  Also known as P75. It’s the median of the points above the median.
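The process for finding quartiles can be sketched in a few lines of Python. The function name `quartiles` is illustrative. For an even number of data points the sketch assumes the sorted data are simply split into a lower half and an upper half, a case the slide does not show.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves process.

    Q1 is the median of the points below the median and Q3 is the
    median of the points above it. When n is odd, the median itself
    belongs to neither half.
    """
    values = sorted(data)
    n = len(values)
    lower = values[: n // 2]        # points below the median
    upper = values[(n + 1) // 2 :]  # points above the median
    return median(lower), median(values), median(upper)


q1, q2, q3 = quartiles([2, 5, 6, 8, 9, 11, 13, 15, 20])
print(q1, q2, q3)
```

For the nine data points on the slide this gives Q1 = 5.5, Q2 = 9, and Q3 = 14.0. Calculators and statistical software may use slightly different quartile rules.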
EXAMPLE: Quartiles
Use Tables 2a and 2b to find the quartiles for Fortune 30 CEO Five-Year Compensation and Age.
        Five-Year Compensation    Age
Q1      ______________________    __________
Q2      ______________________    __________
Q3      ______________________    __________
Interquartile Range (IQR) – A Resistant Measure of Dispersion
In Section 3.2 we learned two measures of dispersion, neither of which is resistant to extreme
values:
• Range
  o Based on only two data points (high and low values).
  o Does not describe the spread of data points in between.
  o Very sensitive to extreme values.
• Standard Deviation
  o Based on all data points, not just the two most extreme values.
  o Still sensitive to extreme values (but less sensitive than the range).
For variables with a skewed distribution (where there are frequently extreme values on the left
or right of the distribution), we use a different measure of spread that is resistant to extreme
values.
Interquartile Range (IQR) = Q3 – Q1
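To see what "resistant" means in practice, the sketch below compares the range, the sample standard deviation, and the IQR for the nine-point data set from the quartile example and for a hypothetical variant in which the largest value is replaced by 200. The helper `iqr` and the modified data set are assumptions made for illustration.

```python
from statistics import median, stdev


def iqr(data):
    """Interquartile range, Q3 - Q1, with quartiles as medians of halves."""
    values = sorted(data)
    n = len(values)
    q1 = median(values[: n // 2])
    q3 = median(values[(n + 1) // 2 :])
    return q3 - q1


original = [2, 5, 6, 8, 9, 11, 13, 15, 20]
# Hypothetical variant: the largest value is replaced by an extreme one.
extreme = original[:-1] + [200]

for label, data in (("original", original), ("extreme", extreme)):
    print(
        f"{label}: range = {max(data) - min(data)}, "
        f"s = {stdev(data):.1f}, IQR = {iqr(data)}"
    )
```

The range and the standard deviation both grow sharply when the extreme value is introduced, while the IQR stays at 8.5.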
EXAMPLE: Fortune 30 CEO Data
Calculate the IQR for:
• CEO Five-Year Compensation
• CEO Age
Summary: Measures of Central Tendency and Measures of Dispersion
Shape of Distribution      Measure of Central Tendency   Measure of Dispersion
Roughly symmetric          Mean                          Standard deviation
Skewed (left or right)     Median                        Interquartile range
The median and the interquartile range are resistant measures.
Outliers
Outliers are data points that are unusually small or unusually large compared to the rest of the
data set.
• In skewed distributions, it is not unusual to find outliers in the tails.
• However, outliers may occur in any distribution, including symmetric distributions.
How to Determine Whether a Data Set Includes Outliers
• Find Q1 and Q3.
• Use Q1 and Q3 to calculate the Interquartile Range (IQR).
• Calculate the upper fence (UF) and lower fence (LF).
  o UF = Q3 + 1.5(IQR)
  o LF = Q1 – 1.5(IQR)
Outliers are defined to be any data points that lie outside the fences.
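The steps above can be sketched in Python and applied to the Table 3 printer data. The data are decoded from the stem-and-leaf plot, and the quartiles are taken as medians of the lower and upper halves. The name `find_outliers` is a hypothetical helper, not part of the handout.

```python
from statistics import median

# Table 3 decoded from the stem-and-leaf plot (3 | 7 means 3.7 cents).
stem_and_leaf = {
    0: [9],
    1: [1, 1],
    2: [1, 6, 7, 7],
    3: [0, 1, 1, 2, 3, 4, 5, 5, 6, 9],
    4: [2, 4, 4, 5, 5, 5, 5, 5, 6, 7, 7, 8, 8, 8, 8, 8, 9, 9],
    5: [0, 1, 2, 3, 3, 4, 7, 7],
    6: [0, 0, 1, 2, 3, 4, 5, 6],
    7: [1, 8],
    8: [3],
    12: [6],
}
costs = sorted(
    (10 * stem + leaf) / 10
    for stem, leaves in stem_and_leaf.items()
    for leaf in leaves
)


def find_outliers(data):
    """Return (lower fence, upper fence, outliers) using the 1.5(IQR) rule."""
    values = sorted(data)
    n = len(values)
    q1 = median(values[: n // 2])        # median of the lower half
    q3 = median(values[(n + 1) // 2 :])  # median of the upper half
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers


lf, uf, outliers = find_outliers(costs)
print(f"LF = {lf:.1f}, UF = {uf:.1f}, outliers = {outliers}")
```

Only data points strictly outside the fences are flagged; a value that falls exactly on a fence is not treated as an outlier in this sketch.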
EXAMPLE: All-in-One Printers – Text Cost Per Page
Use Table 3 to determine whether the highest data point (12.6 cents per page) is considered an
outlier.
Q1 = 3.5
Q3 = 5.7
IQR = Q3 – Q1 = 5.7 – 3.5 = 2.2
Lower fence: Q1 – 1.5(IQR) = 0.2
Upper fence: Q3 + 1.5(IQR) = 9.0
[Figure: number line from 0 to 13 cents marking Q1, Q2, Q3, and the lower and upper fences.]
Working With Outliers
A single outlier will impact the mean and standard deviation. Later in the course we will learn
that the mean and standard deviation are used in inferential statistics to draw conclusions about
data. Therefore, it is important to identify outliers – and sometimes eliminate them from the
data set – in order to avoid faulty conclusions.
In order to decide whether to eliminate an outlier, it is useful to understand why a data point is
so unusually extreme.
If the outlier occurs for some special reason, you may want to eliminate it from the data set.
EXAMPLE (All-in-One Printers):
• Measurement or typographical errors.
• Broken printer.
• Obsolete printer (i.e., no longer representative of the population of printers).
If there is no special explanation, removing the outlier is a judgment call. (From time to time,
some data points may be unusually high or low.)
EXAMPLE (All-in-One Printers):
• This printer just happens to be more costly than the rest.
EXAMPLE: Fortune CEO Data
Use Tables 2a and 2b to determine whether the Fortune 30 CEO Five-Year Compensation data
set or the Age data set contains any outliers.