Statistics Term Project Final Draft

Chad Joel Harrison
Statistics 1040
Sanborn
29 April 2015
Skittles Term Project
For my Statistics 1040 class at Salt Lake Community College we have been given
an assignment to do a statistical analysis of the amount of each color (red, orange, yellow,
purple, and green) of Skittles candies in a 2.71 bag. The class in total recorded data from
40 different bags of Skittles and the information is broken down into some different
statistical categories as follows.
[Figure: Skittles Data Pie Chart. Legend: Red, Orange, Yellow, Green, Purple. Slice labels: 20%, 19%, 19%, 22%, 20%.]
[Figure: Skittles Data Pareto Chart. Bars in order: Orange, Purple, Yellow, Red, Green. Vertical axis: 400 to 550.]
The data I gathered seem to fall right in line with the totals from the class data.
My orange total was a little low compared to the class data, where orange was the
largest group. Coming into this project, having never attempted something like this
before, I would have imagined that the colors would all be evenly distributed.
            Red  Orange  Yellow  Green  Purple
My Data      11      12      10     12      16
Class Data  457     541     487    452     500
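The shares behind the pie chart can be recomputed from the counts in the table. The following is a minimal sketch in Python, not the Excel workflow used for the project; the variable and function names are illustrative.

```python
# Color counts copied from the table: my single bag and the class total.
my_counts = {"Red": 11, "Orange": 12, "Yellow": 10, "Green": 12, "Purple": 16}
class_counts = {"Red": 457, "Orange": 541, "Yellow": 487, "Green": 452, "Purple": 500}


def shares(counts):
    """Return each color's share of the total, as a percentage."""
    total = sum(counts.values())
    return {color: 100 * n / total for color, n in counts.items()}


my_total = sum(my_counts.values())
class_total = sum(class_counts.values())
my_shares = shares(my_counts)
class_shares = shares(class_counts)

print(f"My bag: {my_total} candies, class: {class_total} candies")
for color in class_counts:
    print(f"{color:<7} class {class_shares[color]:5.1f}%   my bag {my_shares[color]:5.1f}%")
```

Comparing the two columns of output makes it easy to see which colors in a single bag sit above or below the class share.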
Class Mean Candies in a bag – 60.9
Standard Deviation – 1.93
5 Number Summary
Min – 54
Q1 – 60
Q2 (Median) – 61
Q3 – 62
Max – 64
Column  Std. dev.  Median  Min  Max  Q1  Q3  n   Mean    IQR
Total   1.92       61      54   64   60  62  40  60.925  2
[Figure: Skittles Data Histogram. Horizontal axis: Amount of Skittles (54 to 64). Vertical axis: Frequency (0 to 15).]
The shape of the data is skewed left, as can be seen in the histogram. The
total number of Skittles I had in my bag is a fair representation of the mean number of
Skittles per bag from the class data. The color counts from my bag do not follow the
class pattern as closely: purple was my highest count by a clear margin, while for the
class it was second to orange. Likewise, orange made up a smaller share of my bag than of
the class total, where it was the largest group.
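The left skew can also be checked from the 5 number summary alone. The sketch below applies the common 1.5 × IQR fence rule, which is an assumption on my part; the project does not state that this rule was used.

```python
# Values taken from the 5 number summary of candies per bag.
minimum, q1, median, q3, maximum = 54, 60, 61, 62, 64

iqr = q3 - q1                  # interquartile range
lower_fence = q1 - 1.5 * iqr   # values below this are flagged as unusually low
upper_fence = q3 + 1.5 * iqr   # values above this are flagged as unusually high

has_low_outlier = minimum < lower_fence
has_high_outlier = maximum > upper_fence

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print(f"Low outlier present: {has_low_outlier}, high outlier present: {has_high_outlier}")
```

Under this rule the minimum of 54 falls below the lower fence while the maximum of 64 stays inside the upper fence, which is consistent with a longer tail on the left.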
The categorical data in this project are represented by the colors of the
candies. The quantitative data are represented by the values, which are the totals or
amounts of each color. The 5 number summary would look pretty silly if I tried to
explain it using just the colors. I need both the categorical and quantitative information
together for most of the data to be interpreted correctly. The pie charts are a good
representation of categorical data and quantitative data working together, whereas the
histogram and 5 number summary represent the quantitative data only.
Confidence Intervals
A confidence interval is a range of values, computed from a sample, that we use to
estimate the true value of a population parameter. We will take the class data from the
40 bags of Skittles and use it to estimate the proportion of one color, the mean number
of candies per bag, and the standard deviation for Skittles bags of the same size. Using
the software Statdisk, I will lay out the results of the different confidence intervals
we looked at.
Purple Skittle 95% Confidence Interval Proportion
Margin of error, E = 0.016033
95% Confidence Interval (using normal approx):
0.1891373 < p < 0.2212033
With these data we are 95% confident that the true proportion of purple Skittles among
all Skittles is between 18.91% and 22.12%.
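The Statdisk output for the purple proportion can be reproduced by hand with the normal approximation. This is a minimal sketch, assuming the 500 purple candies out of 2437 total from the class data; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, p_hat - margin, p_hat + margin


# 500 purple candies out of 2437 in the class sample.
p_hat, margin, low, high = proportion_ci(500, 2437, 0.95)
print(f"p-hat = {p_hat:.4f}, E = {margin:.6f}")
print(f"{low:.7f} < p < {high:.7f}")
```

Here the sample size is the number of candies, not the number of bags, because each candy is one observation of color.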
Mean Candies Per Bag 99% Confidence Interval
Margin of error, E = 0.1007831
99% Confident the population mean is within the range:
60.79922 < mean < 61.00078
This result says that we are 99% confident that the population mean number of Skittles
per bag is between 60.80 and 61.00.
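As a rough check on the margin of error, the sketch below uses E = z·s/√n with a normal critical value, which is only an approximation to the t-based value Statdisk reports. The function name is illustrative. With the class standard deviation of 1.93, the reported margin of about 0.10 appears to correspond to n = 2437 (the total candy count); treating the 40 bags as the sample size would give a wider margin.

```python
from math import sqrt
from statistics import NormalDist


def mean_margin_of_error(sample_sd, n, confidence):
    """Approximate margin of error for a mean, using a normal critical value."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sample_sd / sqrt(n)


class_sd = 1.93  # class standard deviation of candies per bag

# Sample size taken as the total number of candies (assumed from the reported output).
margin_candies = mean_margin_of_error(class_sd, 2437, 0.99)
# Sample size taken as the number of bags.
margin_bags = mean_margin_of_error(class_sd, 40, 0.99)

print(f"n = 2437: E is about {margin_candies:.4f}")
print(f"n = 40:   E is about {margin_bags:.4f}")
```

Since the mean describes candies per bag, the number of bags is the natural sample size, so the wider interval may be the more appropriate one.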
Standard Deviation 98% Confidence Interval
98% Confidence Interval for the St. Dev.:
1.867668 < SD < 1.996436
These data tell us that we can be 98% confident that the population standard deviation
of Skittles per bag is between 1.87 and 2.00, after rounding.
Hypothesis Testing
Hypothesis testing is where we make a claim about some characteristic of the
population we are analyzing and then use the data collected from our sample to test
that claim. Using the Statdisk software, we will test the given hypotheses.
Hypothesis Test
20% of all Skittles Candies are GREEN
Ho: p = 0.20 (20% of Skittles are green)
Ha: p ≠ 0.20
Sample proportion: 0.1854739
Test Statistic, z: -1.7927
Critical z: ±2.5758
P-Value: 0.0730
99% Confidence interval: 0.1651932 < p < 0.2057546
Fail to reject the null hypothesis that 20%, or 1/5, of Skittles are green. There is not
sufficient evidence to warrant rejection of the claim that 20% of Skittles are green.
The 99% confidence interval from the test, roughly 17% to 21%, contains 20%.
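The test statistic and P-value for the green claim can be recomputed directly. This is a sketch of a two-sided one-proportion z test, assuming 452 green candies out of 2437 from the class data and a 0.01 significance level (inferred from the reported critical z of ±2.5758); the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-sided z test of the claim that the population proportion equals p0."""
    p_hat = successes / n
    # The standard error uses the hypothesized proportion p0, not p-hat.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * NormalDist().cdf(-abs(z))
    return p_hat, z, p_value


p_hat, z, p_value = one_proportion_z_test(452, 2437, 0.20)
alpha = 0.01  # assumed significance level
reject_null = p_value < alpha

print(f"p-hat = {p_hat:.7f}, z = {z:.4f}, P-value = {p_value:.4f}")
print("Reject Ho" if reject_null else "Fail to reject Ho")
```

Because the P-value is larger than the significance level, the sketch reaches the same decision as the Statdisk output.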
Hypothesis Test
The Mean Number of Candies in a Bag of Skittles is 56
Ho: µ = 56 Skittles per bag
Ha: µ ≠ 56 Skittles per bag
Alternative Hypothesis: µ not equal to µ(hyp)
t Test
Test Statistic, t: 125.3333
Critical t: ±1.9609
P-Value: 0.0000
95% Confidence interval: 60.82334 < µ < 60.97666
Reject the null hypothesis that the mean number of Skittles in a bag is 56. There is
sufficient evidence to warrant rejection of this claim. The estimated mean number of
Skittles in a bag is between 60.8 and 61.
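The t statistic follows from t = (x̄ − µ) / (s / √n). The sketch below assumes the class mean of 60.9 and standard deviation of 1.93; the function name is illustrative. The reported statistic appears to correspond to n = 2437, so the version with n = 40 bags is shown alongside it for comparison.

```python
from math import sqrt


def t_statistic(sample_mean, hypothesized_mean, sample_sd, n):
    """Test statistic for a one-sample t test of a population mean."""
    return (sample_mean - hypothesized_mean) / (sample_sd / sqrt(n))


# Sample size taken as the total number of candies (assumed from the reported output).
t_candies = t_statistic(60.9, 56, 1.93, 2437)
# Sample size taken as the number of bags.
t_bags = t_statistic(60.9, 56, 1.93, 40)

print(f"n = 2437: t = {t_candies:.4f}")
print(f"n = 40:   t = {t_bags:.4f}")
```

Either value is far beyond the critical value, so the decision to reject the claim of 56 candies per bag does not depend on which sample size is used.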
All of the confidence intervals and hypothesis tests were performed using Statdisk
with the data we collected in class. To my surprise, most of the data collected by the
class fell right in line with the confidence intervals and hypotheses we tested. Some of
the values from my single bag, however, fell outside the intervals.
My purple Skittles were 26% of my total sample, but the 95% confidence interval for
the proportion of purple Skittles was 18.9% to 22.1%. Also, the standard deviation from
my bag of Skittles is 2.28, which is higher than the range of 1.87 to 2.0 given by the
standard deviation interval. However, my total candies per bag was right on par with the
confidence interval for the mean.
When we take the larger sample of all the data collected by the class, the results
fall within all of the confidence intervals we computed. The standard deviation was 1.93,
which was within the estimated range. Purple Skittles accounted for 20% of the class
sample, which is in the estimated range. Mean candies per bag was 60.9, which is also
right in line with the estimated range.
All of the counts were performed by students in class, which leaves a wide
variety of possibilities for human error, but any such error would only slightly throw
off the counts of the individual colors. Also, technology is only as good as the person
entering the data into the computer or calculator, which also falls under the human
error category. The only improvement I can see to the sampling method we used would have
been to have someone else double check the counts of the individual samples, the input
of the data into Excel, and the entry of data for the calculations needed to complete
the confidence intervals and hypothesis tests.
During the course of this project I learned over and over again how helpful technology was as I dove deeper and deeper into the world of statistics. I feel like I only began to unlock the potential of what you can complete using tools like Excel and Statdisk. As helpful as these tools are, they also add a different element to the process of completing the assignment: I had to make sure that all the completed calculations made sense, and that I didn't let human error taint any of the work that I had done. Thankfully the internet is full of tutorials on how to use Excel to make different graphs and formulas.
The problem solving skills I used throughout this project mostly revolved around the use of technology. I was making sure to double and triple check everything I was entering into Excel and Statdisk. After a while I didn't even have to look at the original sheet where I collected my data and transferred data from the class over. I will probably be able to remember for the next few months that the total number of candies collected in the class sample was 2437.
I learned that even though some parts of this assignment, this class, and the homework seem at times tedious and time consuming, there is a need for statistics in many different aspects of the world we live in. I recently started a new job working for Two Men and a Truck as a Quality Control Manager, and there are statistics all over that place. Through the data that Two Men and a Truck has collected over the decades they have been in business, they can give you a very good estimate of how much a move will cost, how much of the truck you will use, how many guys it will take to complete the job, and how much they can compensate a customer who happens to have something damaged during their move.
Math is not one of my favorite subjects, which is why my last few semesters at Salt Lake Community College have been dedicated to finishing the math classes I had been putting off until the end of my Associate's Degree, but I have gained a greater appreciation for math, including statistics, and the way it helps us in our everyday lives.