MAT 141 – Statistics Page 1 Section 3.4 (Sullivan 4e)
These data sets will be used in Sections 3.4 and 3.5. You should enter the following data sets in your calculator:

For Sections 3.4 and 3.5:
o Fortune 30 CEO Five-Year Compensation Data (Table 2a)
o Fortune 30 CEO Ages (Table 2b)
For Section 3.5:
o Snow Thrower Prices (Tables 4a and 4b)

The data do not need to be sorted.

Table 1a: Exam scores (Section 001) (N=27) μ=75.7, σ=15.6
36 40 43 58 62 65 67 72 73
78 78 79 79 80 81 83 84 85
85 86 86 89 90 90 90 92 94

Table 1b: Exam scores (Section 002) (N=24) μ=71.8, σ=18.4
26 41 49 50 52 57 60 63 67 72 72 74
75 79 80 83 84 85 85 89 90 94 96 99

Table 2a: Fortune 30 CEO Five-Year Compensation (N=20) μ=65.90, σ=47.48 (Millions of dollars)
1.5 5.8 14.1 25.8 26.5 37.6 38.8 40.3 44.7 45.6
53.4 53.8 55.3 95.2 110.0 117.5 120.5 127.8 130.2 173.6

Table 2b: Fortune 30 CEO Ages (N=30) μ=59.8, σ=6.4 (Years)
51 51 52 52 54 54 55 55 56 57
57 57 58 58 58 59 60 60 60 62
62 63 64 64 64 64 65 67 75 80

Table 3: All-in-One Inkjet Printers (n=55)
Cost to print one page of text (cents)
3 | 7 means 3.7¢
 0 | 9
 1 | 1 1
 2 | 1 6 7 7
 3 | 0 1 1 2 3 4 5 5 6 9
 4 | 2 4 4 5 5 5 5 5 6 7 7 8 8 8 8 8 9 9
 5 | 0 1 2 3 3 4 7 7
 6 | 0 0 1 2 3 4 5 6
 7 | 1 8
 8 | 3
 9 |
10 |
11 |
12 | 6

Table 4a: Two-Stage Gas Snow Throwers
Model                        Price
Craftsman (Sears) 88700      $600
Yard Machines S6FEE          $700
Husqvarna 524ST              $700
Craftsman (Sears) 88790      $950
Ariens 8524LE                $1,000
Yard-Man E5KLF               $1,100
Toro Power Max 828LXE        $1,250
Troy-Bilt Storm 10030        $1,300
Craftsman (Sears) 888111     $1,300
Simplicity 9560E             $1,300
Frontier STO927              $1,800
Honda HS928WAS               $2,080

Table 4b: Single-Stage Gas Snow Throwers
Model                        Price
Craftsman 88140              $300
Yard-Man 285                 $400
Yard Machines S260           $400
Troy-Bilt Squall 521         $500
Ariens 522                   $500
Toro CCR 2450 GTS 38515      $540
Honda Harmony HS520AS        $750
Toro Snow Commander 38602    $900

Source: Consumer Reports, October 2004

geoffrey.krader@morton.edu kradermath.jimdo.com 02/2014

Learning Outcomes
After we cover Section 3.4, you should be able to:
1. List several measures of position, and describe how measures of position differ from measures of central tendency and measures of dispersion.
2. Describe what is meant by the z-score and find the z-score for a given data point.
3. Describe what is meant by percentile.
4. Describe what is meant by quartile, and find the quartiles for a given data set.
   a. Describe the relationship between quartiles, percentiles and the median.
5. Describe what is meant by interquartile range (IQR), and calculate the IQR for a given data set.
6. Use the shape of the distribution to determine the most appropriate measure of dispersion: standard deviation or IQR.
7. Describe what is meant by an outlier, and use the IQR to identify outliers in a given data set.
   a. Describe how to handle outliers in a statistical study (e.g., when should outliers be eliminated from a data set?).

Numerical Summaries of Data
To describe the distribution of a variable:
o Measures of Central Tendency – Describe a “typical” data value, the “middle” of the data set.
o Measures of Dispersion – Describe the “spread” of the data set.
To describe the location of individual data points within the data set:
o Measures of Position – Describe where a data point is located within the distribution.

NOTE: We will also use measures of position to define an additional measure of dispersion.

z-score
The z-score describes the position of a data point within the data set as the number of standard deviations from the mean.

Calculating the z-score:
For populations: z = (X – μ) / σ
For samples: z = (X – X̄) / s

z > 0: Data point lies to the right of (i.e., above) the mean.
z < 0: Data point lies to the left of (i.e., below) the mean.
z = 0: Data point lies at the mean.

Sample I: 41 44 45 47 47 48 51 53 58 66 (X̄ = 50.0, s = 7.4)

[Figure: Sample I plotted on a number line, with a matching z-score axis marked from z = -3 to z = 3.]

EXAMPLE: Calculating z-scores
In Sample I, above, calculate the z-scores for the following data points:
X = 58
X = 44

EXAMPLE: Using z-scores to Compare Data Points in Different Data Sets
The scores of two students are circled (85 in Section 001 and 84 in Section 002). Recall that the two sections used different tests (one was multiple-choice, the other was free-response), so it may not be appropriate to compare an individual exam score from one section with an individual exam score in the other section. However, we can use z-scores to see which student did better relative to the rest of his/her class.
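As a minimal sketch of this comparison, the Python snippet below applies the population formula z = (data point – mean) / standard deviation to the two circled exam scores, using the means and standard deviations stated for Tables 1a and 1b. The function name z_score and the variable names are illustrative choices, not part of the handout.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / sd

# Section 001: score 85, with mu = 75.7 and sigma = 15.6 (Table 1a).
z_section_001 = z_score(85, 75.7, 15.6)

# Section 002: score 84, with mu = 71.8 and sigma = 18.4 (Table 1b).
z_section_002 = z_score(84, 71.8, 18.4)

print(f"Section 001: z = {z_section_001:.2f}")  # Section 001: z = 0.60
print(f"Section 002: z = {z_section_002:.2f}")  # Section 002: z = 0.66
```

Both z-scores are positive, so both students scored above their class mean. The larger z-score marks the student who sits farther above the mean, measured in standard deviations of his/her own class.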
z-scores may be used to compare the relative location of data points in different data sets.

Exam scores (Section 001, Table 1a): μ = 75.7, σ = 15.6
z = (85 – 75.7) / 15.6 = 9.3 / 15.6 = 0.60

Exam scores (Section 002, Table 1b): μ = 71.8, σ = 18.4
z = (84 – 71.8) / 18.4 = 12.2 / 18.4 = 0.66

Which student scored better relative to the rest of his/her class?
( ) Student in Section 001 (Score=85)
( ) Student in Section 002 (Score=84)

Percentiles
Percentiles divide the values of the variable (written in ascending order) into 100 equal groups.
o The k-th percentile, Pk, is a number that separates the bottom k% of the data from the upper (100 – k)%.
o Percentiles may or may not be one of the data points.
o There are 99 (not 100) percentiles.
o Percentiles are counting numbers (i.e., 1, 2, 3, 4, …, 99).
o There are no fractional or decimal percentiles (e.g., there is no P62.5).

[Figure: number line from the lowest value (L) to the highest value (H), marking P1, P10, P20, P25, P50 (Median), and P75.]

EXAMPLE: Percentiles
If you score at the 80th percentile on a test, how does your score compare to the other scores?
If your height is at the 30th percentile for your age group, how does your height compare to the height of other people your age?

Caution:
o There is no 0-th or 100-th percentile.
o Percentile does not mean percent. If you get 72% of the questions correct on an exam, your percentile depends on how the other students did.
o Percentiles represent the boundaries between the 100 equally-sized groups of data points; percentiles are not “bins” into which data points are placed. (For example, you can be at the 40th percentile, not in the 40th percentile.)

Percentiles (cont’d)
Finding the number that corresponds to a given percentile

Data: 2 3 5 6 8 11 13 15 18 20
o P50 = 50th percentile = 9.5. Separates the lower 50% from the upper 50%.
o P60 = 60th percentile = 12. Separates the lower 60% from the upper 40%.

EXAMPLE: Percentiles
Use the table below to answer the following questions.
o Interpret the 95th percentile for household income.
o Interpret the 20th percentile for household income.
o What can you say about a household whose annual income is $52,000?

US Household Income 2012
P95    $191,156
P90    $146,000
P80    $104,096
P50    $51,017
P20    $20,599
Source: Income, Poverty, and Health Insurance Coverage in the United States: 2012, US Census Bureau.

EXAMPLE: Percentile Charts
Use the percentile chart to answer the following questions:
What is the median BMI for a 9-year-old boy?
90% of 9-year-old boys have a BMI between __________ and __________.

[Figure: BMI percentile chart for boys. Source: US Department of Health and Human Services, Health Resources and Services Administration.]

The two dots show the Body Mass Index of a single boy whose level of physical activity has decreased because of asthma.
o What were his BMI and percentile at age 13?
o What were his BMI and percentile at age 15?
o What would his BMI be at age 15 if his percentile had remained the same?

Describe what typically happens to the BMI of boys as they get older.
What can you say about the dispersion of BMI as boys get older?

Quartiles
Some percentiles have special names:
o P25 = Q1
o P50 = Q2 = Median
o P75 = Q3

[Figure: number line from L to H marking P1, P10, P20, P25, P50, and P75, with Q1, Q2 (Median), and Q3 labeled at P25, P50, and P75.]

Process for Finding Quartiles
Data: 2 5 6 8 9 11 13 15 20
o Q2 = 9. Also known as the median or P50.
o Q1 = 5.5. Also known as P25. It is the median of the points below the median.
o Q3 = 14.0. Also known as P75. It is the median of the points above the median.

EXAMPLE: Quartiles
Use Tables 2a and 2b to find the quartiles for Fortune 30 CEO Five-Year Compensation and Age.

      Five-Year Compensation    Age
Q1    __________                __________
Q2    __________                __________
Q3    __________                __________

Interquartile Range (IQR) – A Resistant Measure of Dispersion
In Section 3.2 we learned two measures of dispersion, neither of which is resistant to extreme values:

Range
o Based on only two data points (high and low values).
o Does not describe the spread of data points in between.
o Very sensitive to extreme values.
Standard Deviation
o Based on all data points, not just the two most extreme values.
o Still sensitive to extreme values (but less sensitive than the range).

For variables with a skewed distribution (where there are frequently extreme values on the left or right of the distribution), we use a different measure of spread that is resistant to extreme values.

Interquartile Range (IQR) = Q3 – Q1

EXAMPLE: Fortune 30 CEO Data
Calculate the IQR for:
o CEO Five-Year Compensation
o CEO Age

Summary: Measures of Central Tendency and Measures of Dispersion

Shape of Distribution     Measure of Central Tendency    Measure of Dispersion
Roughly symmetric         Mean                           Standard deviation
Skewed (left or right)    Median                         Interquartile range

The median and the interquartile range are resistant measures.

Outliers
Outliers are data points that are unusually small or unusually large compared to the rest of the data set.
In skewed distributions, it is not unusual to find outliers in the tails. However, outliers may occur in any distribution, including symmetric distributions.

How to Determine Whether a Data Set Includes Outliers
1. Find Q1 and Q3.
2. Use Q1 and Q3 to calculate the Interquartile Range (IQR).
3. Calculate the upper fence (UF) and lower fence (LF).
   o UF = Q3 + 1.5(IQR)
   o LF = Q1 – 1.5(IQR)
4. Outliers are defined to be any data points that lie outside the fences.

EXAMPLE: All-in-One Printers – Text Cost Per Page
Use Table 3 to determine whether the highest data point (12.6 cents per page) is considered an outlier.
Q1 = 3.5
Q3 = 5.7
IQR = 5.7 – 3.5 = 2.2
Q1 – 1.5(IQR) = 0.2
Q3 + 1.5(IQR) = 9.0

[Figure: number line from 0 to 13 cents per page, marking Q1, Q2, Q3, and the two fences.]

Working With Outliers
A single outlier will impact the mean and standard deviation. Later in the course we will learn that the mean and standard deviation are used in inferential statistics to draw conclusions about data. Therefore, it is important to identify outliers – and sometimes eliminate them from the data set – in order to avoid faulty conclusions.

In order to decide whether to eliminate an outlier, it is useful to understand why a data point is so unusually extreme.
o If the outlier occurs for some special reason, you may want to eliminate it from the data set.
o If there is no special explanation, removing the outlier is a judgment call. (From time to time, some data points may be unusually high or low.)

EXAMPLE (All-in-One Printers)
Possible explanations for the unusually high cost per page:
o Measurement or typographical errors.
o Broken printer.
o Obsolete printer (i.e., no longer representative of the population of printers).
o This printer just happens to be more costly than the rest.

EXAMPLE: Fortune CEO Data
Use Tables 2a and 2b to determine whether the Fortune 30 CEO Five-Year Compensation data set or the Age data set contains any outliers.
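The quartile and fence procedure from this section can be sketched in Python as follows. This is a minimal illustration, assuming the median-of-halves convention used in this handout (Q1 and Q3 are the medians of the points below and above the median); calculators and software packages may use slightly different quartile conventions. The function names quartiles, fences, and outliers are hypothetical names chosen for the example.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]                 # points below the median
    upper = values[len(values) - half:]   # points above the median
    return median(lower), median(values), median(upper)

def fences(q1, q3):
    """Return (LF, UF) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(data):
    """Return the data points that lie outside the fences."""
    q1, _, q3 = quartiles(data)
    lower_fence, upper_fence = fences(q1, q3)
    return [x for x in data if x < lower_fence or x > upper_fence]

# The nine-point data set from the "Process for Finding Quartiles" example.
sample = [2, 5, 6, 8, 9, 11, 13, 15, 20]
print(quartiles(sample))   # (5.5, 9, 14.0)
print(outliers(sample))    # []

# Printer example: Q1 = 3.5 and Q3 = 5.7 as given in this section.
lower_fence, upper_fence = fences(3.5, 5.7)
print(12.6 > upper_fence)  # True
```

For an odd number of data points the median itself is left out of both halves, which matches the nine-point example (Q1 = 5.5, Q3 = 14.0). The same outliers function can be applied to the data in Tables 2a and 2b once they are typed in as lists.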