MATH 3070: Test I: Data sheet
Billionaires Data. Fortune magazine publishes a list of the world's billionaires each year.
The 1992 list includes 233 individuals, with mean value 2.68 and median 1.8 (billion
dollars). We also obtained the standard deviation 3.32, the lower quartile 1.3 and the
upper quartile 3. The relative frequency histogram of their wealth (in billions of dollars)
is shown below.
Hubble Data. In 1929 Edwin Hubble investigated the relationship between distance and
velocity of extra-galactic nebulae (celestial objects). He published the data about how
galaxies are moving away from us no matter which direction we look, and hypothesized
the so-called “Hubble's law” as follows:
Recession velocity = (Hubble's constant) * Distance
Given here are two plots which were produced from 23 data points Hubble published in
1929. The correlation coefficient was 0.8 between two variables. The intercept −40 and
the slope 454 were estimated for the regression line.
Port-wine Stain Treatment Data. The flash-pumped pulsed-dye laser was used for the
treatment of port-wine stains in children. Four age groups were determined, and the
improvement of each patient was measured for the reduction in color. Box plots are given
here.
Given here is the ANOVA table obtained from the above measurement data.
Source    df    SS          MS
Group      3    45.04602    15.01534
Error     85    1285.802    15.12709
Total     88    1330.848
MATH 3070: Test I
NAME:
Question 1. Answer the following questions regarding the world's billionaires in 1992
(see Billionaires Data).
1) Describe the shape of the wealth distribution. Are there any potential outliers?
It is skewed to the right. There are two potential outliers.
2) Explain when the empirical rule is applicable. Does the empirical rule apply for the
wealth distribution of the billionaires? The empirical rule is applicable when a
distribution is symmetric and bell-shaped. Thus, the empirical rule does not apply for
the wealth distribution.
3) What percentage of billionaires have wealth between 1.3 and 3 billion
dollars? The values 1.3 and 3 are the lower and upper quartiles, so the interval between
them is the interquartile range, which contains 50% of the data.
4) What are the values of the measures of center for the billionaires' wealth? Which
measure of center would you recommend for the billionaires' wealth? Justify
your answer. Mean 2.68 and median 1.8. We recommend the median because the wealth
distribution is skewed to the right.
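The answers to items 2) and 4) can be checked informally from the summary statistics on the data sheet. The sketch below is only an illustration, not part of the test: it uses the mean, median, and standard deviation quoted in Billionaires Data, and the variable names are hypothetical. A mean well above the median, and a "mean minus one standard deviation" value below zero for a quantity that cannot be negative, are both consistent with a right-skewed distribution.

```python
# Summary statistics quoted on the data sheet (billions of dollars).
mean_wealth = 2.68
median_wealth = 1.8
sd_wealth = 3.32

# In a right-skewed distribution the mean is pulled above the median.
mean_exceeds_median = mean_wealth > median_wealth

# Under the empirical rule, about 68% of values would lie within one
# standard deviation of the mean. Here the lower end of that interval is
# negative, which is impossible for wealth, so a symmetric bell shape
# is not a plausible description of these data.
lower_one_sd = mean_wealth - sd_wealth
upper_one_sd = mean_wealth + sd_wealth

print(mean_exceeds_median)
print(round(lower_one_sd, 2), round(upper_one_sd, 2))
```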
Question 2. The temperatures (in degrees Fahrenheit) in 7 different cities on New Year's
Day are listed below.
78  33  49  65  26  29  68
5) Find the median temperature. Sorted, the values are 26, 29, 33, 49, 65, 68, 78; the
middle (4th) value is 49.
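As a quick check of item 5), the median can be computed by sorting the seven temperatures and taking the middle one. This is a minimal sketch using Python's standard library; the variable names are illustrative only.

```python
from statistics import median

# Temperatures (degrees Fahrenheit) listed in Question 2.
temperatures = [78, 33, 49, 65, 26, 29, 68]

# With an odd number of values, the median is the middle value
# of the sorted list (here the 4th of 7).
ordered = sorted(temperatures)
middle = ordered[len(ordered) // 2]

print(ordered)
print(middle, median(temperatures))
```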
Question 3. Assume that Z is a standard normal variable. Answer the following
questions.
6) Find the probability that Z is between 0.47 and 1.45. 0.9265 – 0.6808 = 0.2457
7) What value separates the lowest 90% from the largest 10%? The z-value with
cumulative area 0.90 lies between 1.28 and 1.29. Thus, it can be approximated by 1.28, 1.29, or 1.285.
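The table lookups in items 6) and 7) can be reproduced with `statistics.NormalDist` from the Python standard library. This is a sketch for checking the arithmetic, not the method required on the test. The computed values may differ from the table-based answers in the last decimal place because the table entries are rounded.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Item 6: P(0.47 < Z < 1.45) is a difference of cumulative probabilities.
p_between = Z.cdf(1.45) - Z.cdf(0.47)

# Item 7: the value with area 0.90 to its left (the 90th percentile).
z_90 = Z.inv_cdf(0.90)

print(round(p_between, 4))
print(round(z_90, 3))
```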
Question 4. The annual precipitation amounts are normally distributed with a mean of
107 inches and a standard deviation of 10 inches.
8) What is the probability that the annual precipitation will exceed 120 inches? The
z-score is (120 – 107)/10 = 1.3. Thus, the probability is 1 – 0.9032 = 0.0968.
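The calculation in item 8) standardizes 120 inches and takes the upper-tail area. The sketch below checks it with the standard library, using the mean and standard deviation stated in Question 4; the variable names are illustrative.

```python
from statistics import NormalDist

# Annual precipitation model stated in Question 4 (inches).
precipitation = NormalDist(mu=107, sigma=10)

# Standardize the threshold, then take the area to the right of it.
z = (120 - 107) / 10
p_exceed = 1 - precipitation.cdf(120)

print(z)
print(round(p_exceed, 4))
```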
Question 5. The annual yields of various investment options are normally distributed
with mean 5% and standard deviation 5%, and are assumed to be independent.
9) If one chooses a single investment option, what is the probability that the annual yield
is more than 7%? The z score is (7 – 5)/5 = 0.4. Thus, the probability is 1 – 0.6554 =
0.3446
10) A fund combines 16 different investment options and guarantees their average annual
yield at the end of the term. What is the probability that the yield of the combined
investment fund is more than 7%? According to the central limit theorem, the average
annual yield has standard deviation 5 / [square root of 16] = 5/4 = 1.25. The z-score
is z = (7 – 5)/1.25 = 1.6, and therefore the probability is 1 – 0.9452 = 0.0548.
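Items 9) and 10) differ only in the standard deviation used: 5 for a single option, and 5 divided by the square root of 16 for the average of 16 independent options. The sketch below computes both probabilities side by side using the parameters from Question 5; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 5.0, 5.0, 16  # mean %, standard deviation %, number of options

# Item 9: a single investment option.
single = NormalDist(mu, sigma)
p_single = 1 - single.cdf(7)

# Item 10: the average of n independent options has the same mean
# but standard deviation sigma / sqrt(n).
sd_average = sigma / sqrt(n)
average = NormalDist(mu, sd_average)
p_average = 1 - average.cdf(7)

print(sd_average)
print(round(p_single, 4), round(p_average, 4))
```

Averaging leaves the mean at 5% but shrinks the spread, so a yield above 7% becomes much less likely for the combined fund than for a single option.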
Question 6. Answer the following questions regarding Hubble Data.
11) Which variable, Recession velocity or Distance, should be the explanatory variable?
Distance should be the explanatory variable according to the Hubble’s law equation.
12) Does Hubble's law seem appropriate from the data? Justify your answer. No.
The estimated intercept (−40) is not zero, and therefore you cannot write “Recession velocity = (slope)
* Distance.”
13) After removing the 16th data point, the correlation coefficient was 0.82. The intercept
−58 and the slope 449 were estimated. Does the result indicate a stronger association
between Recession velocity and Distance? Justify your answer. Yes. The correlation
coefficient 0.82 indicates a slightly stronger relationship in comparison with the previous
value of 0.80.
14) Is the 16th data point influential? Justify your answer. No. Removing it changes the
slope only slightly (from 454 to 449) and the intercept only modestly (from −40 to −58).
15) Hubble's constant is now thought to be about 75. Do the data published in
1929 support this current value of Hubble's constant? Justify your answer. No, since the
slope coefficient estimated from the data published in 1929 was about 449.
Question 7. Answer the following questions regarding Port-wine Stain Treatment
Data.
16) Which of the four age groups has the lowest improvement? Justify your answer. The
group (age 18 or above) at the top of the box plots has the lowest median.
17) The objective of the study is to evaluate whether the treatment of port-wine stains
was more effective for younger children than for older ones. Write your observation
based on the box plots in Port-wine Stain Treatment Data.
In the box plots, the four groups do not appear to differ greatly in
improvement. The box plots also give no indication that the variances differ
among the four groups.
18) Explain what analysis of variance does in comparison of groups.
It helps to see whether different groups share the same mean value or not.
19) Here we obtained the ANOVA table (see Port-wine Stain Treatment Data). Calculate the ratio of mean
squares. 15.01534 / 15.12709 = approximately 0.99
20) Does it change your opinion stated above, or reaffirm it? Extend your conclusion of
study based on the ANOVA table.
From the output the ratio of mean squares is 0.99, which is not larger than one. Thus,
there is not a significant difference in the mean improvement for the four groups. We
cannot conclude that treatment of port-wine stains at an early age is more effective than
treatment at a later age.
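The arithmetic behind items 19) and 20) can be reproduced from the sums of squares and degrees of freedom in the ANOVA table. Each mean square is SS divided by df, and the ratio compares the between-group mean square with the error mean square. This is a minimal sketch with illustrative variable names. It does not compute a p-value, which would require the F distribution.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_group, df_group = 45.04602, 3
ss_error, df_error = 1285.802, 85

# Mean square = sum of squares / degrees of freedom.
ms_group = ss_group / df_group
ms_error = ss_error / df_error

# Ratio of mean squares (between-group variation relative to error).
f_ratio = ms_group / ms_error

# Consistency check: the Group and Error rows add up to the Total row.
ss_total = ss_group + ss_error
df_total = df_group + df_error

print(round(ms_group, 5), round(ms_error, 5))
print(round(f_ratio, 2))
print(round(ss_total, 3), df_total)
```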