Chapter 23. Two Categorical Variables: The Chi-Square Test
STAT 145

The new test addresses a general question: is there a relationship between two categorical variables? Let's consider the following example along with the theory.

A sample of 70 girls and boys was taken at random from a population of children (perhaps of a particular age of interest), and the numbers of right-handed boys, right-handed girls, left-handed boys, and left-handed girls were recorded as shown below:

Observed        Girls   Boys   row Total
Left-handed       12      6       18
Right-handed      24     28       52
column Total      36     34       70

This is a two-way table, because we have two categorical variables:
• handedness (with 2 categories)
• sex (with 2 categories)
The table's dimension is 2 × 2 (look at the number of categories in each variable).

Is there evidence that, in the sampled population, handedness is related to sex? (Use $\alpha = 0.05$.)

Step 1: State Hypotheses.
$H_0$: In the sampled population, there is NO relationship between handedness and sex.
$H_a$: In the sampled population, there is a relationship between handedness and sex.
Note: The alternative hypothesis is called "many-sided" because it allows any kind of difference (thus it is neither one-sided nor two-sided).

Step 2: Compute Expected counts.
The expected count in any cell of a two-way table when $H_0$ is true is

$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$

Expected        Girls                  Boys                   row Total
Left-handed     18×36/70 = 9.26        18×34/70 = 8.74        9.26 + 8.74 = 18
Right-handed    52×36/70 = 26.74       52×34/70 = 25.26       26.74 + 25.26 = 52
column Total    9.26 + 26.74 = 36      8.74 + 25.26 = 34      70

Step 3: Compute test statistic.
Draw an SRS from a large population and make a two-way table of the sample counts for two categorical variables.
To test the null hypothesis $H_0$ that there is no relationship between the row and column variables in the population, calculate the chi-square statistic

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} \quad \text{or} \quad \chi^2 = \sum_{\text{all cells}} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

The chi-square test rejects $H_0$ when $\chi^2$ is large. P-values are calculated from a chi-square distribution (see Table D).

Think of $\chi^2$ as a measure of the distance of the observed counts from the expected counts. It is always zero or positive, and it is zero only when the observed counts are exactly equal to the expected counts.

Contribution of each cell to the chi-square test statistic, $(O_{ij} - E_{ij})^2 / E_{ij}$:

                Girls                            Boys
Left-handed     (12 − 9.26)²/9.26 = 0.81         (6 − 8.74)²/8.74 = 0.86
Right-handed    (24 − 26.74)²/26.74 = 0.28       (28 − 25.26)²/25.26 = 0.30

$$\chi^2 = \sum_{\text{all cells}} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 0.81 + 0.28 + 0.86 + 0.30 = 2.25$$

Step 4: P-value and Conclusion.
The degrees of freedom for the chi-square test for this two-way table are $(r-1) \times (c-1)$, where $r$ is the number of rows and $c$ is the number of columns. In this particular problem (handedness and sex): $df = (2-1) \times (2-1) = 1$.
Using Table D, $2.07 < \chi^2 = 2.25 < 2.71$ on the line where $df = 1$, which implies that $0.10 < P\text{-value} < 0.15$. Since the P-value is greater than 0.05 (because we compare to $\alpha = 0.05$), we CAN'T reject $H_0$. We conclude that the data do not give sufficient evidence of a relationship between handedness and sex.

Cell counts required for the chi-square test
You can safely use the chi-square test with P-values from the chi-square distribution when no more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater.

The chi-square distributions are a family of distributions that take only positive values and are skewed to the right. A specific chi-square distribution is specified by giving its degrees of freedom.
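The arithmetic of the handedness example (expected counts, chi-square statistic, degrees of freedom) can be double-checked with a short Python sketch. The function names below are illustrative only and are not part of the course materials. The P-value shortcut using the complementary error function is an assumption that holds only for df = 1; it stands in for the Table D lookup and is not a general method.

```python
import math

# Observed counts from the handedness example
# (rows: left-handed, right-handed; columns: girls, boys).
observed = [[12, 6],
            [24, 28]]

def expected_counts(table):
    """Expected count per cell under H0: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

def p_value_df1(chi2):
    """Right-tail area of the chi-square distribution with df = 1 ONLY.

    For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))

expected = expected_counts(observed)
chi2 = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1) * (c - 1)
p = p_value_df1(chi2)

print([[round(e, 2) for e in row] for row in expected])
# [[9.26, 8.74], [26.74, 25.26]]
print(round(chi2, 2), df)   # 2.25 1
print(0.10 < p < 0.15)      # True, consistent with the Table D bracket
```

The printed values agree with the hand computation: the same four expected counts, a statistic of 2.25 with 1 degree of freedom, and a P-value inside the bracket read from Table D.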
The chi-square test for a two-way table with $r$ rows and $c$ columns uses critical values from the chi-square distribution with $(r-1)(c-1)$ degrees of freedom. The P-value is the area under the density curve of this chi-square distribution to the right of the value of the test statistic.

Problem 1. A study of the career plans of young women and men sent questionnaires to all 722 members of the senior class in the College of Business Administration at the University of Illinois. One question asked which major within the business program the student had chosen. Here are the data from the students who responded:

[data table not recovered]

This is an example of a single sample classified according to two categorical variables (gender and major).
a) Test the null hypothesis that there is no relationship between the gender of students and their choice of major. Give a P-value and state your conclusion. (Use $\alpha = 0.05$.)
b) Verify that the expected cell counts satisfy the requirement for use of chi-square.

Problem 2. Is there a relationship between being a pet owner and being happy? To answer this question, a psychologist asks randomly selected individuals about their level of happiness and whether or not they own any pet. The psychologist's observed results are listed in the table:

                 Happy   Not Happy but not Sad   Sad   row Total
Pet Owner          75             19              18      112
Not Pet Owner      36             10              33       79
column Total      111             29              51      191

a) How many pet owners would be expected to be happy if there is no relationship between pet ownership and happiness? (Round the answer to the nearest whole number.)
b) If there is no relationship between pet ownership and happiness, the number of people who are not pet owners who are expected to be sad is 24. What is the contribution of this cell to the chi-square test statistic? (Round the answer to the nearest whole number.)
c) Suppose that the chi-square value for this test was 8.5. What is the corresponding P-value? And what can we conclude?
d) To check the conditions for inference for this test of significance, we check for an SRS from the population as well as … (continue).
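The cell-count requirement stated earlier (no more than 20% of expected counts below 5, and every expected count at least 1) can also be checked mechanically. The sketch below is a minimal illustration in Python. The function names are hypothetical, and the `sparse` table is made-up data used only to show a failing case. It is applied to the handedness table from the worked example, not to the problems above.

```python
def expected_counts(table):
    """Expected count per cell under H0: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def cell_counts_ok(table):
    """True when at most 20% of expected counts are below 5
    and every expected count is 1 or greater."""
    expected = [e for row in expected_counts(table) for e in row]
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    return min(expected) >= 1 and share_below_5 <= 0.20

# Handedness example: the smallest expected count is about 8.74.
handedness = [[12, 6],
              [24, 28]]
print(cell_counts_ok(handedness))  # True

# Hypothetical sparse table (invented for illustration):
# the top-left expected count is 3 * 4 / 46, well below 1.
sparse = [[1, 2],
          [3, 40]]
print(cell_counts_ok(sparse))      # False
```

The check works on expected counts, not observed counts, which is why the function recomputes them from the row and column totals first.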