STATISTICS TUTORIAL

Using the SPSS Application
Mardhani Riasetiawan, M.Eng
mardhani@gmail.com
Mardhani.blog.ugm.ac.id
Sources/References
•  PASW Statistics (SPSS), Information Technology Services,
California State University, Los Angeles.
www.youtube.com/mycsula
•  M Riasetiawan, Pengolahan Data Statistika untuk
Penelitian, unpublished tutorial
Statistic Windows - SPSS
Statistic - PSPP
Statistic - SOFA
Data View
Element: Description
Variable: Each column represents a variable. Any survey questionnaire item or test item can be a variable. Commonly defined variable types are numeric or string. When defining variables as numeric, users need to specify decimal places. Variable names can be up to 64 characters long and must start with a letter. Make variable names meaningful and easily recognizable.
Case: Each row represents a case. The participants in the study can be cases. For example, if 100 participants are involved in your study, then 100 cases (or rows) of information should be generated. Responses to the question items should be entered consistently from left to right for each participant.
Cell: A cell is an intersection between cases and variables. Each response to a survey question should be entered in a cell for each participant according to the defined variable data types.
Variable View
Element: Description
Variable Name: PASW Statistics will initially give a default variable name (var00001) that users can change. It is recommended to assign a brief and meaningful name to variables (e.g., “Name,” “Gender,” and “GPA”).
Variable Type: The variable type determines how the cases are entered. Generally, text-based characters are of “String” type and number-based characters are of “Numeric” type. For example, if a user has a variable called “Name,” then its variable type should be “String.” Similarly, a variable named “GPA” should be a “Numeric” type with (normally two) decimal places.
Value Labels: Value labels allow users to describe what the variable name stands for. For example, if a variable has been defined as “Fav,” most likely others may not know what it stands for. To avoid misinterpretation, value labels can be utilized to clearly define variable names.
STEP1. CREATING DATA
1.1 Define variables
•  Click the Variable View tab
•  Type the variable name
•  Define the variable type
•  Set the appropriate value labels
1.2 Entering data
•  Go to the Data View tab.
•  Enter the data according to the tabulation being used
Cell Editor
Sample
•  Enter the following data into the PASW application.
•  1. Open the file sample.xls
•  2. Define the variable names and variable types
•  3. Enter the data
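The relationship between variables, cases, and cells can also be sketched outside PASW. The Python below is only an illustration, assuming a hypothetical Variable class and made-up cases; it is not how PASW stores data, but it shows how a variable type and value labels constrain what may be entered in a cell.

```python
from dataclasses import dataclass, field

@dataclass
class Variable:
    """One column definition, loosely mirroring a row of Variable View."""
    name: str
    var_type: str                      # "Numeric" or "String"
    decimals: int = 0                  # only meaningful for Numeric variables
    value_labels: dict = field(default_factory=dict)

# Hypothetical variables, reusing the example names "Name", "Gender", "GPA".
variables = [
    Variable("Name", "String"),
    Variable("Gender", "Numeric", 0, {0: "Female", 1: "Male"}),
    Variable("GPA", "Numeric", 2),
]

# Each case is one row; each cell holds one response for one variable.
cases = [
    {"Name": "Ani", "Gender": 0, "GPA": 3.45},
    {"Name": "Budi", "Gender": 1, "GPA": 3.10},
]

def check_case(case, variables):
    """Return a list of problems found in one case (empty list means valid)."""
    problems = []
    for var in variables:
        value = case.get(var.name)
        if var.var_type == "String" and not isinstance(value, str):
            problems.append(f"{var.name}: expected text")
        if var.var_type == "Numeric" and not isinstance(value, (int, float)):
            problems.append(f"{var.name}: expected a number")
        if var.value_labels and value not in var.value_labels:
            problems.append(f"{var.name}: {value!r} has no value label")
    return problems

for case in cases:
    print(case["Name"], check_case(case, variables))
```

Checking cells against the variable definitions before analysis catches entry errors such as an undefined code for a labeled variable.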
STEP2. DESCRIPTIVE ANALYSIS
2.1 analyze the variables
•  Variable:
•  Sample name
•  IGSN
•  Latitude
•  Longitude
•  Elevation
•  Location
•  Lithology
•  Decide which variables will be analyzed
•  Key: high homogeneity, main components of the research
2.2 frequencies
•  Frequency analysis is a descriptive statistical method that shows the number of occurrences of each response chosen by the respondents. When using frequency analysis, PASW Statistics can also calculate the mean, median, and mode to help users analyze the results and draw conclusions.
Figure 1 - Frequency Analysis from Analyze Menu
•  Select the Analyze menu
•  Run the Frequencies analysis
Figure 2 - Frequencies Dialog Box
•  Choose the research components to analyze
•  Set the statistics options
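As a rough illustration of what a frequency analysis reports, the sketch below counts occurrences and computes the mean, median, and mode with the Python standard library. The responses are hypothetical and the layout only approximates a PASW frequency table.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical responses to one survey item coded 1-5.
responses = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]

counts = Counter(responses)   # number of occurrences of each response
total = len(responses)

print("Value  Frequency  Percent")
for value in sorted(counts):
    print(f"{value:>5}  {counts[value]:>9}  {100 * counts[value] / total:>7.1f}")

print("Mean:", mean(responses))
print("Median:", median(responses))
print("Mode:", mode(responses))
```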
2.3. Crosstab
•  Crosstabs are used to examine the relationship between
two variables
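A crosstab is, at its core, a table of counts for every combination of two variables. The following is a minimal sketch with hypothetical data (gender and favorite subject are assumed variables, not part of the tutorial data set).

```python
from collections import Counter

# Hypothetical cases: one (gender, favorite subject) pair per respondent.
cases = [
    ("Female", "Math"), ("Male", "Math"), ("Female", "Art"),
    ("Female", "Math"), ("Male", "Art"), ("Male", "Math"),
]

table = Counter(cases)                        # cell counts keyed by (row, column)
row_values = sorted({row for row, _ in cases})
col_values = sorted({col for _, col in cases})

# Print the table with row totals.
print(f"{'':<8}" + "".join(f"{col:>6}" for col in col_values) + f"{'Total':>7}")
for row in row_values:
    row_counts = [table[(row, col)] for col in col_values]
    print(f"{row:<8}" + "".join(f"{n:>6}" for n in row_counts) + f"{sum(row_counts):>7}")
```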
2.4 Data Manipulation
•  SELECT CASES
•  If you have two or more subject groups in your data and you want to analyze each group in isolation, you can use the select cases option.
•  SPLITTING A FILE
•  To answer the third research question, we need to split the file. You can analyze one particular group of subjects using the select cases option. However, if you wish to compare the response or performance differences by groups within one variable, it is best to use the split files option.
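The difference between selecting cases and splitting a file can be sketched in a few lines of Python. The group names and scores are hypothetical: selecting keeps one group only, while splitting repeats the same analysis for every group.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cases with a grouping variable and a score.
cases = [
    {"group": "control", "score": 70},
    {"group": "treatment", "score": 82},
    {"group": "control", "score": 74},
    {"group": "treatment", "score": 90},
]

# Select Cases: keep only the cases that meet a condition.
selected = [c for c in cases if c["group"] == "treatment"]
print("Selected cases:", selected)

# Split File: run the same analysis separately for every group.
scores_by_group = defaultdict(list)
for c in cases:
    scores_by_group[c["group"]].append(c["score"])

group_means = {g: mean(scores) for g, scores in scores_by_group.items()}
print("Mean score per group:", group_means)
```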
•  FIND AND REPLACE
•  In PASW Statistics, the Find and Replace function is more efficient to use. Users can use Find and Replace in Data View. However, only the Find function is available for users in Variable View.
•  Reporting
•  Once the statistical analysis is complete, the final step is
to create a report. In the report, you may include PASW
Statistics output (e.g., graphs and tables) for supporting
your analysis. Using the Copy and Paste functions, the
tables/graphs generated in PASW Statistics can be copied
from the Output Viewer window and pasted into a
Microsoft Word document without having to create new
tables or graphs.
STEP3. STATISTICAL TEST
3.1 tests of significance
•  Statistical Tests
•  Statistics is a set of mathematical techniques used to
summarize research data and determine whether the data
supports a proposed hypothesis. PASW Statistics
includes tools that can be used to analyze variables and
determine the strength and nature of the relationship
between two variables and whether the means (averages)
of two data sets (samples) are statistically the same or
different.
•  Tests of Significance
•  The following examples are sample research questions
that can be answered using PASW Statistics analytical
methods
Correlations
•  CORRELATIONS
•  A correlation is a statistical device that measures the strength or degree of a supposed linear association between two or more variables. One of the more common measures used is the Pearson correlation, which estimates a relationship between two interval variables.
H0: There is no difference between…..
H1: There is a significant difference between …..
Click the Analyze menu, point to Correlate, and select Bivariate…. The Bivariate Correlations dialog box opens.
Select the variables in the list box on the left.
Click the transfer arrow button to move them to the Variables: list box.
Select the Pearson check box and the Two-tailed option if necessary.
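For readers who want to see what the Pearson coefficient measures, here is a minimal sketch that computes r from its definition. The function name pearson_r and the data (hours studied and test scores) are assumptions for illustration; PASW additionally reports a significance value, which this sketch does not compute.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: covariation of x and y scaled by their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical interval variables.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

r = pearson_r(hours, score)
print(f"r = {r:.3f}")   # values near +1 or -1 indicate a strong linear association
```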
•  PAIRED-SAMPLES T TEST
•  A Paired-Samples T Test is used to test if an observed difference between two means is statistically significant. To run a t test, the data should meet the following assumptions:
•  1) it has a normal distribution,
•  2) it is a large data set, and
•  3) it has no outliers.
•  If any of these assumptions are not met, then a nonparametric test should be used.
Click the Analyze menu, point to Compare Means, and select Paired-Samples T Test…. The Paired-Samples T Test dialog box opens.
The observed mean difference is -4.5172. Since the value of t is -3.820 at p = .001,
the mean difference (-4.5172) between “pretest” and “posttest” is statistically
significant. According to the Sig. of 0.001 (which is less than 0.05), the null hypothesis
is rejected.
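The t value in a paired-samples test comes from the per-case differences. The sketch below uses hypothetical pretest and posttest scores (not the data behind the output above) and an assumed helper name paired_t; it returns the mean difference, the t statistic, and the degrees of freedom, and leaves the p-value lookup to PASW or a t table.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Paired-samples t: mean of the differences divided by its standard error."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = mean(diffs)
    std_error = stdev(diffs) / sqrt(n)   # stdev is the sample standard deviation
    return mean_diff, mean_diff / std_error, n - 1

# Hypothetical scores for the same five participants.
pretest = [10, 12, 9, 14, 11]
posttest = [12, 15, 10, 17, 13]

mean_diff, t, df = paired_t(pretest, posttest)
print(f"mean difference = {mean_diff:.2f}, t = {t:.3f}, df = {df}")
```

A negative mean difference here simply means the posttest scores are higher than the pretest scores, as in the output discussed above.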
•  INDEPENDENT-SAMPLES T TEST
•  An Independent-Samples T Test is used to determine the
likelihood that two independent data samples came from
populations that have identical means. If this were true,
then the difference between the means should be equal to
zero. The null hypothesis in this case would be that the
two means are equal.
In Data View, click the Analyze menu, point to Compare Means, and select Independent-Samples T Test…. The Independent-Samples T Test dialog box opens.
Select the variable in the list box on the left. Click the transfer arrow button to move the variable to the Test Variable(s): list box.
Select the other variable in the list box on the left. Click the transfer arrow button to move the variable to the Grouping Variable: list box.
Click the Define Groups… button. The Define Groups dialog box opens.
Enter [0] in the Group 1: box, enter [1] in the Group 2: box, and then click the Continue button.
Click the OK button. The Output Viewer window opens with several tables, including an Independent-Samples Test table.
The mean difference in seedlings sprouted between the two treatments (light and
dark) was -2.900. The value of t, which is -3.179, was statistically significant
(p=0.005). Therefore, the null hypothesis is rejected.
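A minimal sketch of the independent-samples t statistic follows, assuming equal variances in both groups (the pooled form). The seedling counts are hypothetical and are not the data behind the output above; the function name independent_t is also an assumption.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group1, group2):
    """Pooled-variance t statistic for two independent samples."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    mean_diff = mean(group1) - mean(group2)
    return mean_diff, mean_diff / std_error, n1 + n2 - 2

# Hypothetical seedlings sprouted under two treatments (group 0 and group 1).
light = [4, 5, 6, 5]
dark = [7, 8, 9, 8]

mean_diff, t, df = independent_t(light, dark)
print(f"mean difference = {mean_diff}, t = {t:.3f}, df = {df}")
```

Under the null hypothesis the two means are equal, so the mean difference should be close to zero and t should be small in absolute value.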
•  Multiple Response Sets
•  Very often, a survey will contain questions where the
respondent is allowed to select more than one answer.
Managing such questions in PASW Statistics can produce
some difficulty. Each response in a multiple response
question should be coded as a separate variable and then
grouped under a multiple response set of variables. The
multiple response set can then be analyzed using
frequency counts or crosstabs.
In Data View, click the Analyze menu, point to Multiple Response, and select Define Variable Sets…
Make sure the Dichotomies option is selected and enter [1] in the Counted value: box.
•  MULTIPLE RESPONSE FREQUENCIES
•  It is possible to obtain the answer by running a frequency
analysis for each of the airline variables. The result of
such an analysis will only provide an overall raw
frequency for each response and will not allow percentage
comparisons between the different airlines. A frequency
analysis that uses a multiple response set will provide an
appropriate response with concise output.
Click the Analyze menu, point to Multiple Response, and select Frequencies…. The Multiple Response Frequencies dialog box opens.
As seen in the Output Viewer window, there were 18 people surveyed and 44
total responses generated. Of the 44 total responses, United was selected
most often with 12 responses (representing 27.3% – the largest portion of the
total responses).
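The percentages in a multiple response frequency table can be reproduced by hand. The sketch below is an illustration with hypothetical data: each airline is a separate dichotomy variable, 1 is the counted value, and the variable names other than United are assumptions.

```python
# Hypothetical dichotomy variables: 1 = airline selected, 0 = not selected.
cases = [
    {"united": 1, "delta": 0, "american": 1},
    {"united": 1, "delta": 1, "american": 0},
    {"united": 0, "delta": 1, "american": 0},
    {"united": 1, "delta": 0, "american": 0},
]

COUNTED_VALUE = 1
set_variables = ["united", "delta", "american"]   # the multiple response set

# Count how many cases selected each option.
counts = {v: sum(1 for c in cases if c[v] == COUNTED_VALUE) for v in set_variables}
total_responses = sum(counts.values())

# Percent of responses uses the total number of selections as the base;
# percent of cases uses the number of people surveyed.
pct_of_responses = {v: 100 * n / total_responses for v, n in counts.items()}
pct_of_cases = {v: 100 * n / len(cases) for v, n in counts.items()}

for v in set_variables:
    print(f"{v:<10} N={counts[v]}  {pct_of_responses[v]:.1f}% of responses  "
          f"{pct_of_cases[v]:.1f}% of cases")
```

Because one person can select several options, the percent of cases can add up to more than 100, while the percent of responses always adds up to 100.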
•  MULTIPLE RESPONSE CROSSTABS
•  Without the use of a multiple response set, each airline
would have to be analyzed against the variable that the
passengers used to identify themselves as being afraid of
flying. This would require the use of a crosstab analysis.
However, the overall results would not allow for easy
comparison between each of the airlines. The best way to
answer the question would be to include the multiple
response set into a crosstab analysis.
Click the Analyze menu, point to Multiple Response, and select Crosstabs…. The Multiple Response Crosstabs dialog box opens.
Select the “option1” variable as the Row(s): variable and the “option2” multiple response set as the Column(s): variable.
Select the “option1” variable after it is designated as the Row(s): variable. The Define Ranges… button becomes active.
Click the Define Ranges… button. The Multiple Response Crosstabs: Define Variable Ranges dialog box opens.
STEP4. REGRESSION
4.1 simple regression
•  Simple Regression
•  Simple regression estimates how the value of one
dependent variable (Y) can be predicted based on the
value of one independent variable (X).
Scatter plot
•  SCATTER PLOT
•  A scatter plot displays the nature of the relationship
between two variables. It is recommended to run a scatter
plot before performing a regression analysis to determine
if there is a linear relationship between the variables. If
there is no linear relationship (i.e., points on a graph are
not clustered in a straight line), there is no need to run a
simple regression.
•  PREDICTING VALUES OF DEPENDENT VARIABLES
•  Since it is known that a linear relationship exists between
the two variables, the regression analysis can be
performed to predict this year’s sales.
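To make the prediction step concrete, the following sketch fits a least-squares line and uses it to predict a new value. The data (years of experience and sales) are hypothetical, and fit_line is an assumed helper name rather than anything provided by PASW.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x, plus R squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical data: years of experience (X) and sales in thousands (Y).
years = [1, 2, 3, 4, 5]
sales = [12, 15, 21, 24, 28]

slope, intercept, r_squared = fit_line(years, sales)
predicted = intercept + slope * 6   # predicted sales for 6 years of experience

print(f"sales = {intercept:.2f} + {slope:.2f} * years, R squared = {r_squared:.3f}")
print(f"predicted sales at 6 years: {predicted:.1f}")
```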
4.2 Multiple Regression
•  Multiple regression estimates the coefficients of the linear
equation when there is more than one independent
variable that best predicts the value of the dependent
variable. For example, it is possible to predict a
salesperson’s total annual sales (the dependent variable)
based on independent variables such as age, education,
and years of experience.
•  PREDICTING VALUES OF DEPENDENT VARIABLES
•  The previous section demonstrated how to predict this
year’s sales (the dependent variable) based on one
independent variable (number of years of experience) by
using simple regression analysis.
Click the Analyze menu, point to Regression, and select Linear…. The Linear Regression dialog box opens.
“R Square” = “.976” indicates that this model explains almost 98% of the variance in this year’s sales.
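How the coefficients of a multiple regression are estimated can be sketched with the normal equations. This is an illustration only: the data are hypothetical and were constructed to lie exactly on the plane y = 5 + 2*x1 + 3*x2, the helper names solve and fit_multiple are assumptions, and PASW uses its own numerical routines.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_multiple(rows, y):
    """Least squares via the normal equations; returns [intercept, b1, b2, ...]."""
    design = [[1.0] + list(r) for r in rows]           # add the constant term
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical predictors (x1, x2) and a dependent variable y.
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [13, 12, 23, 22, 30]

coefficients = fit_multiple(predictors, y)
print("intercept, b1, b2 =", [round(c, 3) for c in coefficients])
```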
4.3 Polynomial
•  Polynomial Regression
•  This type of regression involves fitting a dependent
variable (Yi) to a polynomial function of a single
independent variable (Xi)
•  REGRESSION ANALYSIS
•  To look at the growth relationship between weight and
age:
•  Click the Analyze menu, point to Regression, and select Curve Estimation…. The Curve Estimation dialog box opens to define the parameters of the analysis (see Figure 22).
•  Transfer the “option1” variable to the Dependent(s): box and the “option2” variable to the Independent Variable: box.
•  NOTE: The weight (dependent) variable is what is being predicted using the age (independent) variable.
•  Deselect the Plot models check box.
•  Select the Display ANOVA table check box.
•  Under Models, deselect the Linear check box and select the Cubic check box.
•  Click the OK button.
Analyzing the Results
This cubic model has an R² of 99.567%.
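For reference, the cubic model and the R² statistic can be written out as follows. This is the standard textbook form, with b0 to b3 as the estimated coefficients, e_i as the residual, and Ŷ_i as the fitted value; the symbols other than Y_i and X_i are notation assumed here, not taken from the PASW output.

```latex
\[
Y_i = b_0 + b_1 X_i + b_2 X_i^2 + b_3 X_i^3 + e_i ,
\qquad
R^2 = 1 - \frac{\sum_i \left(Y_i - \hat{Y}_i\right)^2}{\sum_i \left(Y_i - \bar{Y}\right)^2}
\]
```

An R² close to 1 means the fitted curve leaves very little of the variation in the dependent variable unexplained.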
•  Chart Editing
•  During the final stage of research, enhancing the appearance of charts and figures can help readers understand what may otherwise seem to be confusing statistics. Editing a chart directly in PASW Statistics saves the time and effort of copying and pasting an object from one program to another to modify its features. The following steps explain some useful methods to enhance the appearance of a chart.
•  ADDING A LINE TO THE SCATTER PLOT
•  Adding a straight line to fit the scattered pattern of a data
chart can help emphasize the linear relationship among
the data.
1.  Click the Graphs menu, point to Legacy Dialogs, and select Scatter/Dot….
2.  Select the Simple Scatter option, and then click the Define button.
3.  Transfer the “option1” variable to the X Axis: box and the “option2” variable to the Y Axis: box,
and then click the OK button. A chart appears in the Output Viewer window.
4.  Double-click the chart in the Output Viewer window to modify it. The Chart Editor window
opens.
5.  Right-click a chart marker and select Add Fit Line at Total from the shortcut menu.
6.  Under Fit Method, select the Cubic option, and then click the Apply button.
7.  Close the Chart Editor window.
STEP5. CHI-SQUARE, ANOVA
•  Chi-Square
•  The Chi-Square (χ²) test is a statistical tool used to examine differences between nominal or categorical variables. The Chi-Square test is used in two similar but distinct circumstances:
•  To estimate how closely an observed distribution matches an expected distribution (also known as the Goodness-of-Fit test).
•  To determine whether two random variables are independent.
5.1 chi-square
•  CHI-SQUARE TEST FOR GOODNESS-OF-FIT
•  This procedure can be used to perform a hypothesis test
about the distribution of a qualitative (categorical) variable
or a discrete quantitative variable having only a finite number of
possible values. It analyzes whether the observed
frequency distribution of a categorical or nominal variable
is consistent with the expected frequency distribution.
•  Click the Open button on the Data Editor toolbar. The Open Data dialog box opens.
•  Click the Data menu and select Weight Cases.
•  Select the Weight cases by option.
•  Select the “Average Daily Discharges [discharge]” variable and transfer it to the Frequency Variable: box.
•  Click the OK button.
•  Click the Analyze menu, point to Nonparametric Tests, and select Chi-Square.
Reporting the analysis results:
H0: Rejected in favor of H1.
H1: Patients do not leave the hospital at a constant rate.
Explanation: Figure 4 indicates that the calculated χ² statistic, for six degrees of freedom, is 29.389. Additionally, it
indicates that the significance value (0.000) is less than the usual threshold value of 0.05. This suggests that the null
hypothesis, H0 (patients leave the hospital at a constant rate), can be rejected in favor of the alternate hypothesis, H1
(patients leave the hospital at different rates during the week).
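The χ² statistic itself is simple to compute by hand. The sketch below uses hypothetical discharge counts for seven days (not the data behind Figure 4) and assumes equal expected frequencies, which corresponds to the null hypothesis of a constant rate; chi_square_gof is an assumed helper name.

```python
def chi_square_gof(observed, expected=None):
    """Goodness-of-fit statistic: sum of (observed - expected)^2 / expected."""
    if expected is None:
        # Null hypothesis of a constant rate: every category is equally likely.
        expected = [sum(observed) / len(observed)] * len(observed)
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1   # degrees of freedom = categories - 1

# Hypothetical discharges per day, Sunday through Saturday.
discharges = [40, 60, 55, 50, 45, 70, 30]

statistic, df = chi_square_gof(discharges)
print(f"chi-square = {statistic:.3f}, df = {df}")
```

For six degrees of freedom the critical value at the 0.05 level is about 12.59, so a statistic above that value leads to rejecting the null hypothesis.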
5.2 ANOVA
•  One-Way Analysis of Variance
•  One-way analysis of variance (One-Way ANOVA)
procedures produce an analysis for a quantitative
dependent variable affected by a single factor
(independent variable). Analysis of variance is used to test
the hypothesis that several means are equal. This
technique is an extension of the two-sample t test. It can
be thought of as a generalization of the pooled t test.
Instead of two populations (as in the case of a t test),
there are more than two populations or treatments.
In Data View, click the Analyze menu, point to Compare Means, and select One-Way ANOVA….
Result
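The F statistic reported in a one-way ANOVA table compares the variation between group means with the variation within groups. Here is a minimal sketch with three hypothetical treatment groups; one_way_anova is an assumed helper name.

```python
from statistics import mean

def one_way_anova(groups):
    """Return the F statistic with its between- and within-groups df."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical measurements for three treatments.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]

f_value, df_between, df_within = one_way_anova(groups)
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```

With 2 and 6 degrees of freedom the critical F value at the 0.05 level is about 5.14, so a larger F suggests that not all group means are equal.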
•  Post Hoc Tests
•  In ANOVA, if the null hypothesis is rejected, then it is concluded that there are differences between the means (µ1, µ2,…, µa). It is useful to know specifically where these differences exist. Post hoc testing identifies these differences. Multiple comparison procedures look at all possible pairs of means and determine if each individual pairing is the same or statistically different. In an ANOVA with a treatments, there will be a(a-1)/2 possible unique pairings, which could mean a large number of comparisons.
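The number of pairwise comparisons grows quickly with the number of treatments. The short sketch below lists the unique pairs for four hypothetical treatments and checks the count against the a(a-1)/2 formula.

```python
from itertools import combinations

# Hypothetical treatment labels.
treatments = ["A", "B", "C", "D"]

# Every unique pair of treatments that a post hoc test would compare.
pairs = list(combinations(treatments, 2))
a = len(treatments)

print(pairs)
print("number of comparisons:", len(pairs), "=", a * (a - 1) // 2)
```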
In Data View, click the Analyze menu, point to Compare Means, and select One-Way ANOVA….
Click the Post Hoc… button. The One-Way ANOVA: Post Hoc Multiple Comparisons dialog box opens.
•  Two-Way Analysis of Variance
•  Two-way analysis of variance (Two-Way ANOVA) is an extension of the one-way analysis of variance. The difference is that instead of running the test with a single independent variable, two independent variables are used. There are several advantages to using two variables over a one-variable design: a two-variable design is more efficient, and it also helps increase the statistical power of the result.