02 - Descriptive Statistics

5/8/2014
PSYC 381
Arlo Clark-Foos
• Descriptive Statistics
– Organize
– Summarize
– Graphing
• New Terms:
– Raw scores
– Distribution
• Frequency Tables
– Visual depiction of data that shows how often each value occurred.
• Typically used for discrete variables.
– Steps in Creating…
• Find highest and lowest scores
• Create 2 columns, one for the value and one for the frequency
• List full range of values, including those with frequency of 0
• Count the # of scores at each value, record that in frequency column.
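The steps above can be sketched in a few lines of Python. This is only an illustration: the function name frequency_table and the sample responses are made up here, not taken from the slides.

```python
from collections import Counter

def frequency_table(scores):
    """Return (value, frequency) rows from the highest value down to the lowest."""
    counts = Counter(scores)                      # count the scores at each value
    lowest, highest = min(scores), max(scores)    # find highest and lowest scores
    # List the full range of values, including those with a frequency of 0.
    return [(value, counts[value]) for value in range(highest, lowest - 1, -1)]

# Hypothetical responses: days per week with 2+ hours of TV.
days = [7, 5, 5, 3, 0, 7, 2, 5, 1, 7]
print("Value  Frequency")
for value, freq in frequency_table(days):
    print(f"{value:>5}  {freq:>9}")
```

Counter returns 0 for values that never occur, which is what keeps the empty rows (here 6 and 4) in the table.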
• Frequency Tables
– Example
– On the board, record the number of days each week that you spend watching 2+ hours of television
[Table: Days with 2+ hours of TV (7 to 0) | Frequency]
• Expanding Frequency Tables to Percentages
– Cumulative Percentage
• The percentage of individuals who have scores at a given value or lower.
• Calculating: Count how many scores fall at or below a given value. Divide this number by the total number of scores, then multiply by 100 to express it as a percentage.
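A minimal sketch of the percent and cumulative percent calculation, assuming the same kind of whole-number scores as the TV example. The function name percent_table and the data are hypothetical.

```python
from collections import Counter

def percent_table(scores):
    """Rows of (value, frequency, percent, cumulative percent), highest value first."""
    counts = Counter(scores)
    total = len(scores)
    rows = []
    for value in range(max(scores), min(scores) - 1, -1):
        # Count how many scores fall at or below this value.
        at_or_below = sum(1 for s in scores if s <= value)
        rows.append((value,
                     counts[value],
                     100 * counts[value] / total,   # percent
                     100 * at_or_below / total))    # cumulative percent
    return rows

# Hypothetical responses: days per week with 2+ hours of TV.
days = [7, 5, 5, 3, 0, 7, 2, 5, 1, 7]
for value, freq, pct, cum in percent_table(days):
    print(f"{value:>5} {freq:>5} {pct:>7.1f} {cum:>7.1f}")
```

The cumulative percent for the highest value is always 100, which is a quick check that the table was built correctly.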
• Expanding Frequency Tables to Percentages
[Table: Days with 2+ hours of TV (7 to 0) | Frequency | Percent | Cumulative Percent]
• Grouped Frequency Tables
– Reports the frequencies within a given interval rather than the frequencies for a specific value.
– When to use instead of frequency tables…
• Continuous, interval/ratio (scale) variables
• When data cover a huge range
– Determining the # of intervals…(5-10?)
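One way to build a grouped frequency table is sketched below. The interval width of 10, the exam scores, and the function name grouped_frequency_table are assumptions for illustration. The slides only suggest aiming for roughly 5 to 10 intervals.

```python
def grouped_frequency_table(scores, width):
    """Count scores in equal-width intervals [lower, upper)."""
    # Start the first interval at a multiple of the width (an assumed convention).
    lower = (min(scores) // width) * width
    table = {}
    while lower <= max(scores):
        upper = lower + width
        table[(lower, upper)] = sum(1 for s in scores if lower <= s < upper)
        lower = upper
    return table

# Hypothetical exam scores.
scores = [52, 67, 71, 74, 78, 81, 83, 85, 88, 90, 94, 99]
for (lower, upper), freq in grouped_frequency_table(scores, 10).items():
    print(f"{lower}-{upper - 1}: {freq}")
```

With a width of 10 these scores fall into 5 intervals, the low end of the suggested range.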
• Grouped Frequency Tables, example.
[Table: Number of Siblings (5 to 0) | Frequency]
• Histograms
– Typically used to depict interval data with the values of the variable on the x-axis and the frequencies on the y-axis (similar to a bar graph).
• Histogram
– Example
– Constructing from a Frequency Table
• Draw x-axis and label with variable of interest and full range
of values
• Draw y-axis and label it “frequency”
• Draw a bar for each value as high as the frequency for that
value, as represented on the y-axis
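A rough text-only sketch of the same construction, assuming whole-number scores. It draws the bars sideways (values down the page, frequency as bar length) so no plotting library is needed. The function name text_histogram and the data are hypothetical.

```python
from collections import Counter

def text_histogram(scores):
    """Return one line per value; each '#' stands for one score at that value."""
    counts = Counter(scores)
    lines = []
    # Cover the full range of values, including values nobody reported.
    for value in range(min(scores), max(scores) + 1):
        lines.append(f"{value:>3} | " + "#" * counts[value])
    return lines

# Hypothetical data: number of siblings reported by 10 students.
siblings = [0, 1, 1, 2, 1, 3, 2, 1, 0, 5]
print("\n".join(text_histogram(siblings)))
```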
• Frequency Polygons
– Line graphs with the x-axis representing values (or midpoints of intervals) and the y-axis representing frequencies.
• Start the same as a Histogram
• Instead of drawing bars to represent frequency, place a
dot at the appropriate frequency for each value or
interval of values
• Connect the dots
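For grouped data, the dots sit at the interval midpoints. A small sketch of where they would go, assuming the intervals are stored as (lower, upper) pairs. The function name polygon_points and the intervals are made up for illustration.

```python
def polygon_points(grouped):
    """Turn {(lower, upper): frequency} into (midpoint, frequency) dots, left to right."""
    return [((lower + upper) / 2, freq)
            for (lower, upper), freq in sorted(grouped.items())]

# Hypothetical grouped frequencies.
grouped = {(50, 60): 1, (60, 70): 1, (70, 80): 3, (80, 90): 4, (90, 100): 3}
print(polygon_points(grouped))
```

Connecting these points in order with straight lines gives the frequency polygon.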
• Normal Distribution
– A specific frequency distribution that forms a bell-shaped, symmetric, unimodal curve
• Similar to a frequency polygon with infinite
observations, thus a smooth curve instead of
connected dots
• Skewness
– How much one of the tails of the distribution is pulled away from the center
– Floor effects & ceiling effects
• Kurtosis
– The degree to which a curve’s width and thickness of its tails deviate from the normal curve
• Mesokurtic (normal), Leptokurtic (tall and thin), Platykurtic (short and fat)
[Figure labels: Platykurtic, Leptokurtic]
• Central Tendency
– Best represents the center of a data set, the particular value that all the other data seem to be gathering around. Usually at the high point of the histogram.
– Mean, Median, Mode
• There is no best measure for all data!
• Mean
– Arithmetic average of a group of scores
– M = ΣX/N
• Median (mdn)
– The middle score of all the scores in a sample when the scores are arranged in ascending order.
5 3 6 9 11 28 3 1 15
1 3 3 5 6 9 11 15 28
Median = 6
9 45 32 27 16 3 89 12
3 9 12 16 27 32 45 89
Median = 21.5 (the mean of the two middle scores, 16 and 27)
• Mode
– The most common score of all the scores in a sample
• Unimodal, Bimodal, Multimodal
5 3 6 9 11 28 3 1 15
1 3 3 5 6 9 11 15 28
Mode = 3
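The two median examples and the mode example can be checked with a short sketch. The helper names median and modes are illustrative. The data are the score lists shown on the slides.

```python
from collections import Counter

def median(scores):
    """Middle score after sorting; mean of the two middle scores if N is even."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

def modes(scores):
    """All values tied for the highest frequency (one value if unimodal)."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

odd = [5, 3, 6, 9, 11, 28, 3, 1, 15]
even = [9, 45, 32, 27, 16, 3, 89, 12]
print(median(odd))    # 6
print(median(even))   # 21.5
print(modes(odd))     # [3]
```

Returning a list from modes leaves room for bimodal and multimodal data, where more than one value shares the top frequency.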
• Outliers
– An extreme score that is either very high or very low in comparison with the rest of the scores in the distribution.
• Example from book (1e):
– Tuition Increases & Outliers
• Which is the best measure?
3 9 12 2 16 2 17 5 11 45 89 32 1 96
1 2 2 3 5 9 11 12 16 17 32 45 89 96
Mode = 2
Median = 11.5
Mean = 24.29
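To see how much extreme scores pull each measure, the sketch below recomputes the three values with Python's standard statistics module, then again with the two highest scores left out. Treating 89 and 96 as the outliers is an assumption made here for illustration. The slides do not say which scores count as outliers.

```python
import statistics

scores = [3, 9, 12, 2, 16, 2, 17, 5, 11, 45, 89, 32, 1, 96]

def summarize(data):
    """Mode, median, and mean (rounded to 2 decimals) for a list of scores."""
    return {
        "mode": statistics.mode(data),
        "median": statistics.median(data),
        "mean": round(statistics.mean(data), 2),
    }

full = summarize(scores)
# Assumed outliers: the two scores of 89 or higher.
trimmed = summarize([s for s in scores if s < 89])

print(full)     # {'mode': 2, 'median': 11.5, 'mean': 24.29}
print(trimmed)  # {'mode': 2, 'median': 10.0, 'mean': 12.92}
```

The mean drops by almost half while the median moves only from 11.5 to 10 and the mode stays at 2, which is why the median is often preferred when a distribution has outliers.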
• Range = Xhighest - Xlowest
– Does not tell us much, other than
absolute spread
• How close to the mean?
• How far from the mean is the typical
score?
• Variance & Standard Deviation
– The typical amount that the scores in a sample vary, or deviate, from the mean.
– Variance: $SD^2$ or $s^2$ or $\sigma^2$
• $SD^2 = \frac{\sum (X - M)^2}{N}$
• Sum of Squares (SS): The sum of squared deviations from the mean
– Standard Deviation: $SD$ or $s$ or $\sigma$
• $SD = \sqrt{\frac{\sum (X - M)^2}{N}}$
• Variance & Standard Deviation
– Steps in Calculating
1. Find the mean of the data
2. Subtract the mean from each individual score
3. Square each of these numbers
4. Sum all of these squared numbers (Sum of Squares)
5. Divide the resulting number by the number of scores (N)
6. If calculating SD, take the square root of this number
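The six steps translate almost line for line into code. This is a sketch with hypothetical scores and a made-up function name. It divides by N, as on the slides, rather than by N - 1.

```python
import math

def variance_and_sd(scores):
    """Variance and standard deviation using N in the denominator."""
    n = len(scores)
    mean = sum(scores) / n                       # 1. find the mean
    deviations = [x - mean for x in scores]      # 2. subtract the mean from each score
    squared = [d ** 2 for d in deviations]       # 3. square each deviation
    sum_of_squares = sum(squared)                # 4. sum of squares (SS)
    variance = sum_of_squares / n                # 5. divide by N
    sd = math.sqrt(variance)                     # 6. square root gives SD
    return variance, sd

# Hypothetical scores with a mean of 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_and_sd(scores))  # (4.0, 2.0)
```

Dividing by N matches statistics.pvariance and statistics.pstdev in Python's standard library. statistics.variance and statistics.stdev divide by N - 1 instead and will give slightly larger values.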
• Interquartile Range (IQR)
– The difference between the first and third quartiles of a data set.
– 1st Quartile: 25th percentile
• Median of lower half of distribution
– 3rd Quartile: 75th percentile
• Median of upper half of distribution
1. Calculate the median
2. For the lower half, calculate another median (Q1)
3. For the upper half, calculate another median (Q3)
IQR = Q3 - Q1
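A sketch of the median-of-halves method, applied to the 14 scores from the outlier example. One assumption is labeled in the code: when N is odd, the middle score is left out of both halves. Textbooks and software differ on this convention, so other tools may report slightly different quartiles. The function names are illustrative, and the sketch expects at least two scores.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def iqr(scores):
    """Return (Q1, Q3, IQR) using the median of each half."""
    ordered = sorted(scores)
    half = len(ordered) // 2
    # Assumption: with an odd N the middle score belongs to neither half.
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    q1, q3 = median(lower), median(upper)
    return q1, q3, q3 - q1

scores = [3, 9, 12, 2, 16, 2, 17, 5, 11, 45, 89, 32, 1, 96]
print(iqr(scores))  # (3, 32, 29)
```

Because Q1 and Q3 ignore the extreme ends of the distribution, the IQR of 29 is far less affected by the scores of 89 and 96 than the range of 95.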
• Ages of everyone in class
• Decide the best way to portray this data graphically, then
graph it neatly (use a ruler if you have to)
– Describe the shape of this distribution
• Calculate Measures of Central Tendency
– Which is the best for this data set?
• Calculate Measures of Variability (including IQR)
• Show all of your work!