STATISTICS TUTORIAL
Using the SPSS Application
Mardhani Riasetiawan, M.Eng
mardhani@gmail.com
Mardhani.blog.ugm.ac.id

Sources/references
• PASW Statistics (SPSS), Information Technology Services, California State University, Los Angeles. www.youtube.com/mycsula
• M Riasetiawan, Pengolahan Data Statistika untuk Penelitian, unpublished tutorial

Statistics - SPSS (Windows)
Statistics - PSPP
Statistics - SOFA

Data View
Element: Description
• Variable: Each column represents a variable. Any survey questionnaire item or test item can be a variable. Commonly defined variable types are numeric or string. When defining variables as numeric, users need to specify decimal places. Variable names can be up to 64 characters (bytes) long and must start with a letter. Make variable names meaningful and easily recognizable.
• Case: Each row represents a case. The participants in the study can be cases. For example, if 100 participants are involved in your study, then 100 cases (or rows) of information should be generated. Responses to the question items should be entered consistently from left to right for each participant.
• Cell: A cell is the intersection of a case and a variable. Each response to a survey question should be entered in a cell for each participant according to the defined variable data type.

Variable View
Element: Description
• Variable Name: PASW Statistics initially gives a default variable name (var00001) that users can change. It is recommended to assign a brief and meaningful name to variables (e.g., “Name,” “Gender,” and “GPA”).
• Variable Type: The variable type determines how the cases are entered. Generally, text-based values are of “String” type and number-based values are of “Numeric” type. For example, if a user has a variable called “Name,” then its variable type should be “String.” Similarly, a variable named “GPA” should be a “Numeric” type with (normally two) decimal places.
• Value Labels: Value labels allow users to describe what the variable name stands for.
For example, if a variable has been defined as “Fav,” others most likely will not know what it stands for. To avoid misinterpretation, value labels can be used to clearly define variable names.

STEP1. CREATING DATA
1.1 Define variables
• Click the Variable View tab.
• Type the variable name.
• Define the variable type.
• Set the appropriate value labels.
1.2 Entering data
• Go to the Data View tab.
• Enter the data according to the tabulation being used.
Cell Editor sample
• Enter the following data into the PASW application:
• 1. Open the file sample.xls
• 2. Set the variable name and variable type
• 3. Enter the data

STEP2. DESCRIPTIVE ANALYSIS
2.1 Analyze the variables
• Variables:
  • Sample name
  • IGSN
  • Latitude
  • Longitude
  • Elevation
  • Location
  • Lithology
• Decide which variables will be analyzed.
• Key: high homogeneity, main components of the research
2.2 Frequencies
• Frequency analysis is a descriptive statistical method that shows the number of occurrences of each response chosen by the respondents. When using frequency analysis, PASW Statistics can also calculate the mean, median, and mode to help users analyze the results and draw conclusions.
Figure 1 - Frequency Analysis from Analyze Menu
• Select the Analyze menu.
• Run the Frequencies analysis.
Figure 2 - Frequencies Dialog Box
• Choose the research components to be analyzed.
• Choose the statistics options.
2.3 Crosstab
• Crosstabs are used to examine the relationship between two variables.
2.4 Data Manipulation
• SELECT CASES
• If you have two or more subject groups in your data and you want to analyze each group in isolation, you can use the select cases option.
• SPLITTING A FILE
• To answer the third research question, we need to split the file. You can analyze one particular group of subjects using the select cases option. However, if you wish to compare the response or performance differences by groups within one variable, it is best to use the split files option.
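
The descriptive operations in this step (frequencies, crosstabs, select cases, and split file) can also be sketched outside PASW Statistics. The following is a minimal illustration in plain Python, not the tutorial's own procedure. The field names follow the variable list in 2.1, but every data value is invented for the example and is not taken from sample.xls.

```python
from collections import Counter, defaultdict
from statistics import mean, median, mode

# Hypothetical sample records (values invented for illustration only).
samples = [
    {"lithology": "basalt", "location": "north", "elevation": 120.0},
    {"lithology": "granite", "location": "north", "elevation": 340.0},
    {"lithology": "basalt", "location": "south", "elevation": 95.0},
    {"lithology": "basalt", "location": "south", "elevation": 120.0},
    {"lithology": "shale", "location": "north", "elevation": 210.0},
]

# Frequencies: number of occurrences of each value of one variable.
frequencies = Counter(row["lithology"] for row in samples)

# Mean, median, and mode of a numeric variable.
elevations = [row["elevation"] for row in samples]
summary = {
    "mean": mean(elevations),
    "median": median(elevations),
    "mode": mode(elevations),
}

# Crosstab: count of cases for each (row variable, column variable) pair.
crosstab = Counter((row["location"], row["lithology"]) for row in samples)

# Select cases: keep only the cases that satisfy a condition.
north_only = [row for row in samples if row["location"] == "north"]

# Split file: repeat the same analysis separately for each group.
groups = defaultdict(list)
for row in samples:
    groups[row["location"]].append(row["elevation"])
mean_by_location = {name: mean(values) for name, values in groups.items()}
```

Select cases filters the data down to one group, while split file keeps all groups and reports the same statistic for each, which is why the text recommends split file for comparing groups.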
• FIND AND REPLACE
• In PASW Statistics, the Find and Replace function is an efficient way to locate and change values. Users can use Find and Replace in Data View. However, only the Find function is available in Variable View.
• Reporting
• Once the statistical analysis is complete, the final step is to create a report. In the report, you may include PASW Statistics output (e.g., graphs and tables) to support your analysis. Using the Copy and Paste functions, the tables and graphs generated in PASW Statistics can be copied from the Output Viewer window and pasted into a Microsoft Word document without having to create new tables or graphs.

STEP3. STATISTICAL TEST
3.1 Tests of significance
• Statistical Tests
• Statistics is a set of mathematical techniques used to summarize research data and determine whether the data supports a proposed hypothesis. PASW Statistics includes tools that can be used to analyze variables, to determine the strength and nature of the relationship between two variables, and to determine whether the means (averages) of two data sets (samples) are statistically the same or different.
• Tests of Significance
• The following examples are sample research questions that can be answered using PASW Statistics analytical methods.
Correlations
• CORRELATIONS
• A correlation is a statistical measure of the strength or degree of a supposed linear association between two or more variables. One of the more common measures is the Pearson correlation, which estimates the relationship between two interval variables.
H0: There is no difference between…..
H1: There is a significant difference between …..
• Click the Analyze menu, point to Correlate, and select Bivariate…. The Bivariate Correlations dialog box opens.
• Select the variables in the list box on the left. Click the transfer arrow button to move them to the Variables: list box.
• Select the Pearson check box and the Two-tailed option if necessary.
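
To show what the Pearson option computes, here is a minimal sketch in plain Python using the standard definition of the coefficient. The function name `pearson_r` and the two small data lists are assumptions made for this illustration; they do not come from the tutorial or from PASW Statistics.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical interval-scale data (invented for illustration only).
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

r = pearson_r(hours, score)

# t statistic for testing whether the correlation differs from zero,
# with n - 2 degrees of freedom.
t = r * math.sqrt((len(hours) - 2) / (1 - r ** 2))
```

A value of r near +1 or -1 indicates a strong linear association and a value near 0 indicates a weak one. The two-tailed significance reported by the dialog box corresponds to comparing this t statistic against the t distribution with n - 2 degrees of freedom.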
• PAIRED-SAMPLES T TEST
• A Paired-Samples T Test is used to test whether an observed difference between two means is statistically significant. To run a t test, the following assumptions should be met. The data:
• 1) has a normal distribution,
• 2) is a large data set, and
• 3) has no outliers.
If any of these assumptions is not met, then a nonparametric test should be used.
• Click the Analyze menu, point to Compare Means, and select Paired-Samples T Test…. The Paired-Samples T Test dialog box opens.
The observed mean difference is -4.5172. Since the value of t is -3.820 at p < .001, the mean difference (-4.5172) between “pretest” and “posttest” is statistically significant. According to the Sig. of 0.001 (which is less than 0.05), the null hypothesis is rejected.
• INDEPENDENT-SAMPLES T TEST
• An Independent-Samples T Test is used to determine the likelihood that two independent data samples came from populations that have identical means. If this were true, then the difference between the means should be equal to zero. The null hypothesis in this case would be that the two means are equal.
• In Data View, click the Analyze menu, point to Compare Means, and select Independent-Samples T Test…. The Independent-Samples T Test dialog box opens.
• Select the variable in the list box on the left. Click the transfer arrow button to move the variable to the Test Variable(s): list box.
• Select the other variable in the list box on the left. Click the transfer arrow button to move the variable to the Grouping Variable: list box.
• Click the Define Groups… button. The Define Groups dialog box opens.
• Enter [0] in the Group 1: box, enter [1] in the Group 2: box, and then click the Continue button.
• Click the OK button. The Output Viewer window opens with several tables, including an Independent-Samples Test table.
The mean difference in seedlings sprouted between the two treatments (light and dark) was -2.900. The value of t, which is -3.179, was statistically significant (p=0.005).
Therefore, the null hypothesis is rejected.
• Multiple Response Sets
• Very often, a survey will contain questions where the respondent is allowed to select more than one answer. Managing such questions in PASW Statistics can be difficult. Each response in a multiple response question should be coded as a separate variable and then grouped under a multiple response set of variables. The multiple response set can then be analyzed using frequency counts or crosstabs.
• In Data View, click the Analyze menu, point to Multiple Response, and select Define Variable Sets…
• Make sure the Dichotomies option is selected and enter [1] in the Counted value: box.
• MULTIPLE RESPONSE FREQUENCIES
• It is possible to obtain the answer by running a frequency analysis for each of the airline variables. However, the result of such an analysis will only provide an overall raw frequency for each response and will not allow percentage comparisons between the different airlines. A frequency analysis that uses a multiple response set will provide an appropriate answer with concise output.
• Click the Analyze menu, point to Multiple Response, and select Frequencies…. The Multiple Response Frequencies dialog box opens.
As seen in the Output Viewer window, there were 18 people surveyed and 44 total responses generated. Of the 44 total responses, United was selected most often with 12 responses (27.3%, the largest portion of the total responses).
• MULTIPLE RESPONSE CROSSTABS
• Without the use of a multiple response set, each airline would have to be analyzed against the variable that the passengers used to identify themselves as being afraid of flying. This would require a separate crosstab analysis for each airline, and the overall results would not allow for easy comparison between the airlines. The best way to answer the question is to include the multiple response set in a crosstab analysis.
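
The idea behind a multiple response set with dichotomy coding can be sketched in plain Python as follows. This is only an illustration of the counting logic, not how PASW Statistics implements it. The respondent records, the variable name `afraid`, and the airline names other than United are hypothetical and are not the tutorial's survey data.

```python
from collections import Counter

# Hypothetical survey: each airline is its own dichotomy variable
# (1 = selected, 0 = not selected); "afraid" marks fear of flying.
respondents = [
    {"afraid": 0, "united": 1, "delta": 1, "american": 0},
    {"afraid": 1, "united": 1, "delta": 0, "american": 0},
    {"afraid": 0, "united": 0, "delta": 1, "american": 1},
    {"afraid": 1, "united": 1, "delta": 0, "american": 1},
]
airline_set = ["united", "delta", "american"]  # the multiple response set
COUNTED_VALUE = 1  # same role as the Counted value: box

# Multiple response frequencies: responses per airline and share of all responses.
counts = Counter()
for person in respondents:
    for airline in airline_set:
        if person[airline] == COUNTED_VALUE:
            counts[airline] += 1
total_responses = sum(counts.values())
percent_of_responses = {a: 100 * counts[a] / total_responses for a in airline_set}

# Multiple response crosstab: row variable against every item in the set.
crosstab = Counter()
for person in respondents:
    for airline in airline_set:
        if person[airline] == COUNTED_VALUE:
            crosstab[(person["afraid"], airline)] += 1
```

Because every airline is counted against the same total number of responses, the percentages are directly comparable across airlines, which separate per-variable frequency tables do not provide.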
• Click the Analyze menu, point to Multiple Response, and select Crosstabs…. The Multiple Response Crosstabs dialog box opens.
• Select the “option1” variable as the Row(s): variable and the “option2” multiple response set as the Column(s): variable.
• Select the “option1” variable after it is designated as the Row(s): variable. The Define Ranges… button becomes active.
• Click the Define Ranges… button. The Multiple Response Crosstabs: Define Variable Ranges dialog box opens.

STEP4. REGRESSION
4.1 Simple Regression
• Simple Regression
• Simple regression estimates how the value of one dependent variable (Y) can be predicted based on the value of one independent variable (X).
Scatter plot
• SCATTER PLOT
• A scatter plot displays the nature of the relationship between two variables. It is recommended to create a scatter plot before performing a regression analysis to determine whether there is a linear relationship between the variables. If there is no linear relationship (i.e., the points on the graph are not clustered along a straight line), there is no need to run a simple regression.
• PREDICTING VALUES OF DEPENDENT VARIABLES
• Since it is known that a linear relationship exists between the two variables, the regression analysis can be performed to predict this year’s sales.
4.2 Multiple Regression
• Multiple regression estimates the coefficients of the linear equation that best predicts the value of the dependent variable when there is more than one independent variable. For example, it is possible to predict a salesperson’s total annual sales (the dependent variable) based on independent variables such as age, education, and years of experience.
• PREDICTING VALUES OF DEPENDENT VARIABLES
• The previous section demonstrated how to predict this year’s sales (the dependent variable) based on one independent variable (number of years of experience) by using simple regression analysis.
• Click the Analyze menu, point to Regression, and select Linear….
The Linear Regression dialog box opens.
“R Square” = “.976” indicates that this model explains almost 98% of the variation in this year’s sales.
4.3 Polynomial
• Polynomial Regression
• This type of regression involves fitting a dependent variable (Yi) to a polynomial function of a single independent variable (Xi).
• REGRESSION ANALYSIS
• To look at the growth relationship between weight and age:
• Click the Analyze menu, point to Regression, and select Curve Estimation…. The Curve Estimation dialog box opens to define the parameters of the analysis (see Figure 22).
• Transfer the “option1” variable to the Dependent(s): box and the “option2” variable to the Independent Variable: box. NOTE: The weight (dependent) variable is what is being predicted using the age (independent) variable.
• Deselect the Plot models check box.
• Select the Display ANOVA table check box.
• Under Models, deselect the Linear check box and select the Cubic check box.
• Click the OK button.
Analyzing the Results
This cubic model has an R² of 99.567%.
• Chart Editing
• During the final stage of research, enhancing the appearance of charts and figures can help readers understand what may otherwise seem to be confusing statistics. Editing the chart directly saves the time and effort of copying and pasting an object from one program to another to modify its features. The following steps explain some useful methods to enhance the appearance of a chart.
• ADDING A LINE TO THE SCATTER PLOT
• Adding a straight line to fit the scattered pattern of a data chart can help emphasize the linear relationship among the data.
1. Click the Graphs menu, point to Legacy Dialogs, and select Scatter/Dot….
2. Select the Simple Scatter option, and then click the Define button.
3. Transfer the “option1” variable to the X Axis: box and the “option2” variable to the Y Axis: box, and then click the OK button. A chart appears in the Output Viewer window.
4. Double-click the chart in the Output Viewer window to modify it.
The Chart Editor window opens.
5. Right-click a chart marker and select Add Fit Line at Total from the shortcut menu.
6. Under Fit Method, select the Cubic option, and then click the Apply button.
7. Close the Chart Editor window.

STEP5. CHI-SQUARE, ANOVA
• Chi-Square
• The Chi-Square (χ²) test is a statistical tool used to examine differences between nominal or categorical variables. The Chi-Square test is used in two similar but distinct circumstances:
• To estimate how closely an observed distribution matches an expected distribution, also known as the Goodness-of-Fit test.
• To determine whether two random variables are independent.
5.1 Chi-square
• CHI-SQUARE TEST FOR GOODNESS-OF-FIT
• This procedure can be used to perform a hypothesis test about the distribution of a qualitative (categorical) variable or a discrete quantitative variable having only a finite number of possible values. It analyzes whether the observed frequency distribution of a categorical or nominal variable is consistent with the expected frequency distribution.
• Click the Open button on the Data Editor toolbar. The Open Data dialog box opens.
• Click the Data menu and select Weight Cases.
• Select the Weight cases by option.
• Select the “Average Daily Discharges [discharge]” variable and transfer it to the Frequency Variable: box.
• Click the OK button.
• Click the Analyze menu, point to Nonparametric Tests, and select Chi-Square.
Reporting the analysis results:
H0: Rejected in favor of H1.
H1: Patients do not leave the hospital at a constant rate.
Explanation: Figure 4 indicates that the calculated χ² statistic, for six degrees of freedom, is 29.389. Additionally, it indicates that the significance value (0.000) is less than the usual threshold value of 0.05. This suggests that the null hypothesis, H0 (patients leave the hospital at a constant rate), can be rejected in favor of the alternative hypothesis, H1 (patients leave the hospital at different rates during the week).
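
The goodness-of-fit statistic reported above can be reproduced by hand. The sketch below is a minimal plain-Python illustration of the formula (sum of (observed - expected)² / expected). The seven daily counts are an assumption: they are illustrative values chosen so that the statistic comes out close to the 29.389 quoted in the explanation, and they are not data shown in this tutorial.

```python
# Illustrative observed discharge counts for the seven days of the week
# (an assumption for this sketch, not data taken from the tutorial).
observed = [44, 78, 90, 94, 89, 110, 84]

# H0: patients leave at a constant rate, so each day has the same expected count.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1

# Critical value of the chi-square distribution for df = 6 at alpha = 0.05.
CRITICAL_VALUE_DF6_ALPHA_05 = 12.592
reject_h0 = chi_square > CRITICAL_VALUE_DF6_ALPHA_05
```

Comparing the statistic with a tabulated critical value leads to the same decision as checking that the significance value is below 0.05. The sketch uses the critical-value route only because the standard library has no chi-square distribution function.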
5.2 ANOVA
• One-Way Analysis of Variance
• One-way analysis of variance (One-Way ANOVA) procedures produce an analysis for a quantitative dependent variable affected by a single factor (independent variable). Analysis of variance is used to test the hypothesis that several means are equal. This technique is an extension of the two-sample t test and can be thought of as a generalization of the pooled t test. Instead of two populations (as in the case of a t test), there are more than two populations or treatments.
• In Data View, click the Analyze menu, point to Compare Means, and select One-Way ANOVA….
• Post Hoc Tests
• In ANOVA, if the null hypothesis is rejected, then it is concluded that there are differences between the means (µ1, µ2,…, µa). It is useful to know specifically where these differences exist. Post hoc testing identifies these differences. Multiple comparison procedures look at all possible pairs of means and determine whether the means in each pair are the same or statistically different. In an ANOVA with a treatments, there will be a(a-1)/2 possible unique pairings, which could mean a large number of comparisons.
• In Data View, click the Analyze menu, point to Compare Means, and select One-Way ANOVA….
• Click the Post Hoc… button. The One-Way ANOVA: Post Hoc Multiple Comparisons dialog box opens.
• Two-Way Analysis of Variance
• Two-way analysis of variance (Two-Way ANOVA) is an extension of the one-way analysis of variance. The difference is that instead of running the test with a single independent variable, two independent variables (factors) are used. There are several advantages to using more than one variable over a one-variable design. Among them, a two-variable design is more efficient and helps increase the statistical power of the result.
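
As a rough illustration of the one-way ANOVA and post hoc ideas in section 5.2, the sketch below computes the F statistic from the between-groups and within-groups sums of squares and lists the a(a-1)/2 treatment pairs that post hoc testing would compare. The function name `one_way_anova_f` and the three small treatment samples are hypothetical. PASW Statistics additionally reports the significance value, which this sketch does not compute.

```python
from itertools import combinations


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per treatment."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-groups sum of squares: spread of observations around their group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three treatments (invented for illustration).
treatments = {"A": [4, 5, 6], "B": [6, 7, 8], "C": [9, 10, 11]}
f_stat, df_between, df_within = one_way_anova_f(list(treatments.values()))

# Post hoc testing compares every unique pair of treatments: a * (a - 1) / 2 pairs.
pairs = list(combinations(treatments, 2))
```

A large F value means the group means differ by more than the variation within groups would suggest. With three treatments there are 3 unique pairs, and the count grows quickly as treatments are added, which is why the text warns about a large number of comparisons.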