Reviews on the Determining Sample Size using Statistical Method


Bog-Ja Jo 1, Hee-Hwa Oh 2, Kyoungho Choi *3
1 Department of Business Administration, California International Business University, qhr67@yahoo.co.kr
2 School of Management Administration, Chonbuk National University, ohheehwa@hanmail.net
*3, Corresponding Author, Department of Basic Medical Science, Jeonju University, ckh414@jj.ac.kr
Abstract
This paper introduces the importance of sample size determination and several software and web programs for it. It also examines problems with power analysis in previously published papers. The results are as follows. Most studies did not perform a power analysis before conducting their clinical trials. As a result, their analyses were less reliable because of low power. This is a matter of serious concern. The domestic (Korean) and overseas software and web programs introduced in this study will be useful for researchers who need to determine sample sizes.
Keywords: Sample size determination, power analysis, type I error
1. Introduction
Nowadays, statistics is used in almost all academic areas, e.g. the social sciences, the natural sciences, and the health and medical sciences. However, when sample sizes for analyses in research or experiments are determined, often only the type I error (α) is considered, or sample sizes are chosen without any such consideration at all. These cases are especially noticeable in the social sciences, where a conclusion about a hypothesis is usually drawn at a conventional significance level (usually 5%) after a convenient sample size has been chosen. In this case, the analyses become less reliable because their power is unknown. By contrast, the health and medical fields use methods for choosing reliable sample sizes. As the need for clinical trials grows, particularly in the health field, both α and power (1 − β) are often considered when calculating sample sizes for clinical trials [15]. Separately, [3] compiled tables of the sample sizes needed for statistical analyses. However, such tables are limited in the sample sizes and analyses they can present; and according to [15], calculating sample sizes was troublesome because it required complicated calculations involving α and power (1 − β). To simplify these calculations, software that determines the sample sizes needed for clinical trials was developed. Furthermore, with improvements in Internet technology and the availability of CGI and PHP, web program services also became available [10]. Various fee-charging and free software and web program services, together with the range of sample size calculations each supports, are thoroughly described in [5]. For domestic (Korean) research, [12] provides web programs and a web site that calculate the sample size needed for mean and proportion tests. Such studies are very useful, although there are few studies on this topic. However, they provide less varied information than overseas ones.
The goal of this paper is to introduce the practical use of software that accurately calculates the sample size under given conditions (hypothesis form, significance level, effect size, etc.) when conducting clinical trials. [12] and GPower3.1 [6] will be used as the domestic and overseas software, respectively. Furthermore, both programs will be used to identify problems in previously published studies in the domestic health and medical fields and to offer suggestions for addressing them. In the second chapter, the process of calculating sample sizes is examined; although there are various methods, they follow similar logic. The practical use of the GPower3.1 software is introduced in the third chapter. Problems with the calculation of sample size and power in previously published papers are discussed in the fourth chapter. In the fifth chapter, based on the information provided in the previous chapters, suggestions are made for calculating sample sizes with statistical methods.
Journal of Convergence Information Technology (JCIT)
Volume 8, Number 12, July 2013
doi:10.4156/jcit.vol8.issue12.32
2. Sample Size Determination Methods
The process of deriving sample sizes from statistical theory is as follows. First, define the null and alternative hypotheses and decide the test type (one-sided or two-sided). Second, choose the appropriate statistical test based on the measurement scales of the independent and dependent variables in the hypothesis. Third, set the expected size of the difference (the effect size) and, if necessary, consider the reliability of the measurement. Fourth, decide α and β (or power). Fifth, compute the necessary number of samples from the appropriate equation. The equations for calculating sample sizes vary with the study design; the following are those given by [12]. Let X1, X2, ⋯, Xn be a random sample from N(μ, σ²) with σ known, and consider the mean test of the null hypothesis H0: μ = μ0 against the alternative hypothesis H1: μ ≠ μ0. The sample size n that satisfies significance level α and power of at least 1 − β against the alternative μ = μ1 is given by equation (1), rounded up to the next integer. Under that alternative hypothesis, the power can be written as equation (2). Here $z_{\alpha/2}$ and $z_{\beta}$ denote the upper α/2 and β quantiles of the standard normal distribution, Z is a standard normal random variable, and equation (2) neglects the negligible probability of rejecting in the opposite tail.

$$n = \frac{\sigma^2\,(z_{\alpha/2} + z_{\beta})^2}{(\mu_1 - \mu_0)^2} \tag{1}$$

$$1 - \beta \approx 1 - \Pr\!\left(Z > -z_{\alpha/2} + \frac{|\mu_1 - \mu_0|\sqrt{n}}{\sigma}\right) \tag{2}$$

When the effect size δ = |μ1 − μ0|/σ is used, the required sample size is given by equation (3).

$$n = \left(\frac{z_{\alpha/2} + z_{\beta}}{\delta}\right)^2 \tag{3}$$
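Equations (1) to (3) are simple enough to evaluate directly. The following Python sketch, which uses only the standard library's `statistics.NormalDist`, computes the one-sample sample size from equation (3) (equivalently equation (1) with δ = |μ1 − μ0|/σ) and the power from equation (2). The function names and the illustrative values δ = 0.5, α = 0.05 and power 0.8 are our own choices, not part of [12]. Because these equations assume a known σ, the result may come out a little smaller than t-test based answers from programs such as GPower3.1.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def upper_quantile(p: float) -> float:
    """Return z_p, the upper-p quantile of N(0, 1), so that Pr(Z > z_p) = p."""
    return STD_NORMAL.inv_cdf(1.0 - p)


def one_sample_n(delta: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Equation (3): n = ((z_{alpha/2} + z_beta) / delta)^2, rounded up.

    delta = |mu1 - mu0| / sigma with sigma known; the test is two-sided.
    Passing delta this way also reproduces equation (1).
    """
    beta = 1.0 - power
    z_sum = upper_quantile(alpha / 2) + upper_quantile(beta)
    return math.ceil((z_sum / delta) ** 2)


def one_sample_power(delta: float, n: int, alpha: float = 0.05) -> float:
    """Equation (2): 1 - Pr(Z > -z_{alpha/2} + delta * sqrt(n)).

    As in equation (2), the tiny rejection probability in the opposite
    tail of the two-sided test is ignored.
    """
    shift = -upper_quantile(alpha / 2) + delta * math.sqrt(n)
    return STD_NORMAL.cdf(shift)  # 1 - Pr(Z > shift) equals Pr(Z <= shift)


# Illustrative values only: delta = 0.5, alpha = 0.05, target power 0.8.
n_required = one_sample_n(delta=0.5, alpha=0.05, power=0.8)
print(n_required, round(one_sample_power(0.5, n_required), 3))
```

With these settings the sketch returns n = 32, and evaluating equation (2) at n = 31 falls just short of 0.8, which is why the result is rounded up rather than to the nearest integer.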

Next, assume that X1, X2, ⋯, Xn1 is a random sample from the normal distribution N(μ1, σ1²) and that Y1, Y2, ⋯, Yn2 is a random sample from N(μ2, σ2²), with the Xi and Yj mutually independent. For convenience, assume n1 = n2 = n and σ1 = σ2 = σ. For significance level α, the null hypothesis H0: μ1 − μ2 = 0 and the alternative hypothesis H1: μ1 − μ2 ≠ 0, the per-group sample size n that attains power 1 − β is given by (4). In this case, the power is expressed by equation (5), where μ1 and μ2 denote the population means under the alternative hypothesis.

$$n = \frac{2\sigma^2\,(z_{\alpha/2} + z_{\beta})^2}{(\mu_1 - \mu_2)^2} \tag{4}$$

$$1 - \beta \approx 1 - \Pr\!\left(Z > -z_{\alpha/2} + \frac{|\mu_1 - \mu_2|\sqrt{n}}{\sigma\sqrt{2}}\right) \tag{5}$$
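The two-sample equations (4) and (5) can be sketched the same way. The means 10 and 12 and the common σ = 4 below are hypothetical planning values (so δ = 0.5), and n is the per-group size under the equal-n, equal-variance assumption stated above; this is an illustration of the formulas, not the implementation used by [12].

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def two_sample_n(mu1: float, mu2: float, sigma: float,
                 alpha: float = 0.05, power: float = 0.8) -> int:
    """Equation (4): per-group n = 2 sigma^2 (z_{alpha/2} + z_beta)^2 / (mu1 - mu2)^2."""
    z_half_alpha = STD_NORMAL.inv_cdf(1.0 - alpha / 2)  # z_{alpha/2}
    z_beta = STD_NORMAL.inv_cdf(power)                  # z_beta, because 1 - beta = power
    n = 2.0 * sigma ** 2 * (z_half_alpha + z_beta) ** 2 / (mu1 - mu2) ** 2
    return math.ceil(n)


def two_sample_power(mu1: float, mu2: float, sigma: float, n: int,
                     alpha: float = 0.05) -> float:
    """Equation (5): 1 - Pr(Z > -z_{alpha/2} + |mu1 - mu2| sqrt(n) / (sigma sqrt(2)))."""
    z_half_alpha = STD_NORMAL.inv_cdf(1.0 - alpha / 2)
    shift = -z_half_alpha + abs(mu1 - mu2) * math.sqrt(n) / (sigma * math.sqrt(2.0))
    return STD_NORMAL.cdf(shift)


# Hypothetical planning values: alternative means 10 and 12, common sigma 4.
n_per_group = two_sample_n(10.0, 12.0, 4.0)
print(n_per_group, round(two_sample_power(10.0, 12.0, 4.0, n_per_group), 3))
```

For these values the sketch gives 63 per group (126 in total); 62 per group would give power slightly below 0.8.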
In addition, two related studies should be mentioned. [14] summarizes tests for superiority in clinical trials using equations (1) and (2), and [1] estimates sample sizes using the relative risk and odds ratio when comparing proportions in two populations. Their formulas are expressed a little differently, but the method of deriving sample sizes is very similar. However, such formulas have one drawback: they require a fairly complicated calculation process.
3. Software Introduction for Sample Size Determination
When determining the sample size for a clinical trial, it is very important to consider the form of the hypothesis, the significance level, the effect size, the power, etc. However, for researchers who are not familiar with statistics, deriving the sample size from equation (1) is neither easy nor convenient. For this reason, software (or a web program) that calculates the sample size accurately and quickly is very useful. Thus, based on [5], this chapter introduces the characteristics of free and fee-charging software and web program services and their methods for calculating sample sizes. Moreover, the practical use of the free program [12] and of GPower3.1 is introduced.
3.1 Free and Fee-Charging Software and Web Program Services
(1) http://www.tulane.edu/%7Edunlap/psylib.html
• powmr.exe: computes power for multiple regressions
• power.exe: computes power for one-way ANOVA
• powr.exe: computes power for simple correlation
(2) http://www.psycho.uni-duessldorf.de/aap/projects/gpower/
• performs power analyses for some common statistical tests (t-test, F-test, chi-square)
(3) http://pages.infinit.net/rlevesqu/Syntax/SampleSize/SampleSizeForProportions.txt
• samples for proportions (SPSS syntax file)
(4) http://support.sas.com/faq/042/FAQ04291.html
• macro-power and sample size (SAS module for comparing two proportions)
(5) nQuery Advisor: http://www.statsol.ie/
(6) Power and Precision (logistic regression): http://power-analysis.com/
(7) Statistical Power Analysis: http://www.statsoft.com/products/power_an.html
(8) http://www.danielsoper.com/statcalc/calc05.aspx
• effect size calculator for multiple regression
• type II error calculator
(9) http://home.ubalt.edu/ntsbarsh/Business-stat/otherapplets/SampleSize.htm#rproptyp
• sample size for the test of one and two proportions
(10) http://www.math.yorku.ca/SCS/Online/power/
• power analysis for ANOVA designs
3.2 Introduction to Domestic and Foreign Software
Domestic programs for calculating sample size and power became available through web services [10], which made [12] accessible. The services are offered free of charge at http://pluto.hallym.ac.kr/zsize and calculate both sample size and power. They cover the one-sample mean test (specified either by the effect size or by the hypothesized means), the independent two-sample mean test (likewise specified either by the effect size or by the hypotheses), and proportion tests (the one-sample proportion test and the independent two-sample proportion test). Example 5 on page 254 of [16] is used to illustrate program [12]. It asks for the minimum sample size that satisfies a significance level of 0.2 and power of 0.8 for an effect size of 0.2. This can be solved by hand using equation (4), but, as Figure 1 shows, using [12] gives the answer more quickly and accurately.
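As a rough hand calculation, assume that the example is a two-sided independent two-sample test and that "effect size 0.2" means δ = |μ1 − μ2|/σ = 0.2, so that equation (4) reduces to n = 2(z_{α/2} + z_β)²/δ². With the stated significance level 0.2 and power 0.8, this gives about 226 per group; the output of [12] in Figure 1 may differ if the program makes other assumptions.

$$n = \frac{2\,(z_{0.1} + z_{0.2})^2}{0.2^2} = \frac{2\,(1.2816 + 0.8416)^2}{0.04} \approx \frac{2 \times 4.508}{0.04} \approx 225.4 \;\Rightarrow\; n = 226 \text{ per group}$$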
Figure 1. Calculation of sample size using the web program by [12]
Figure 2. The distribution-based approach of test specification in GPower 3
Figure 3. The power plot window of GPower 3.1
As the only such service provided domestically, [12] is very convenient, but it is limited to two topics: the mean test and the proportion test. By contrast, GPower3.1, introduced next, is free software that can calculate sample size and power for a wide variety of tests. For this introduction, example 6 on page 255 of [16] is used. It asks for the minimum sample size that satisfies a significance level of 0.05 and power of 0.9. The example comes from a study of the correlation between cotinine (reflecting smokers' amount of smoking) and bone density; in this study, a negative correlation between blood CO concentration and bone mineral density must be presumed. Entering the given information as Input Parameters in GPower3.1 and calculating yields the Output Parameters shown in Figure 2. As Figure 2 shows, GPower3.1 provides a great deal of information. The "Protocol of Power Analyses" tab displays the results of the analyses neatly, and the "X-Y Plot for a Range of Variables" tab plots how the sample size changes with various parameters (Figure 3).
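A correlation example like this one can be approximated without GPower3.1 by the widely used Fisher z approximation. The sketch below implements that approximation, not GPower3.1's own procedure, so its answers may differ from Figure 2 by a few subjects. The presumed correlation of −0.3 in the usage line is hypothetical, because the value used in [16] is not stated in the text.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def correlation_n(rho: float, alpha: float = 0.05, power: float = 0.8,
                  two_sided: bool = True) -> int:
    """Approximate n for testing H0: rho = 0 against a presumed correlation rho.

    Fisher z approximation: atanh(r) is roughly normal with standard error
    1 / sqrt(n - 3), which gives n = ((z_a + z_beta) / atanh(|rho|))^2 + 3,
    where z_a is z_{alpha/2} for a two-sided test and z_alpha for a one-sided one.
    """
    tail = alpha / 2 if two_sided else alpha
    z_a = STD_NORMAL.inv_cdf(1.0 - tail)
    z_beta = STD_NORMAL.inv_cdf(power)
    c = math.atanh(abs(rho))  # Fisher-transformed effect size
    return math.ceil(((z_a + z_beta) / c) ** 2 + 3)


# Hypothetical presumed correlation of -0.3, tested one-sided (as the
# negative-correlation presumption suggests) with alpha = 0.05 and power 0.9.
print(correlation_n(-0.3, alpha=0.05, power=0.9, two_sided=False))
```

Using |ρ| makes the sketch symmetric in the sign of the presumed correlation; the one-sided option reflects a directional hypothesis such as the negative correlation presumed in the example.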
4. Empirical Analysis
In significance tests of mean differences, attention is generally concentrated on the type I error (α). The type II error (β), on the other hand, receives little attention, so large β errors occasionally occur. In general, β depends on the effect size (the difference between the two group means in units of σ), the sample size and α, as equation (5) shows. To reduce β, one can increase the effect size, increase the sample size, or accept a larger α (with everything else fixed, a smaller α increases β). Accordingly, calculating the sample size for a significance test of a mean difference between groups involves both α and β. [3] and [11] have provided grounds for determining sample size, in which α is usually fixed at 0.05 or 0.01 and β at 0.2. This is not a standard but a convenient set-up [2]. Nonetheless, we found no previously published health or medical study that calculated its sample size with these matters in mind. Thus, studies whose sample size was not adequately planned are easy to find. For example, [9] recruited 14 people in their 20s, randomly assigned them to groups, and analyzed the results at a 5% significance level. However, if the effect size is presumed to be δ=0.5, GPower3.1 readily shows that the minimum sample size satisfying α=0.05 and β=0.2 is n=34. Yet this study was performed with only 14 participants, less than half of what is required, resulting in power of only 0.57. Therefore, the reliability of this analysis result is low. Table 1 shows a few other studies that appear to lack such preparation. In the table, the required minimum sample sizes are based on α=0.05 and β=0.2, following [2].
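The dependence of β on the effect size, the sample size and α can be checked numerically with equation (5). The sketch below assumes the two-sided, equal-variance, equal-n setting of that equation; the parameter values are illustrative only and are not taken from the studies discussed here.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def type_ii_error(delta: float, n: int, alpha: float) -> float:
    """beta from equation (5), with delta = |mu1 - mu2| / sigma and n per group."""
    z_half_alpha = STD_NORMAL.inv_cdf(1.0 - alpha / 2)
    power = STD_NORMAL.cdf(-z_half_alpha + delta * math.sqrt(n / 2.0))
    return 1.0 - power


# Illustrative settings only; each line changes one factor from the baseline.
baseline = type_ii_error(delta=0.5, n=30, alpha=0.05)
settings = {
    "baseline (delta=0.5, n=30, alpha=0.05)": baseline,
    "larger effect size (delta=0.8)": type_ii_error(0.8, 30, 0.05),
    "larger sample size (n=60)": type_ii_error(0.5, 60, 0.05),
    "larger alpha (0.10)": type_ii_error(0.5, 30, 0.10),
    "smaller alpha (0.01)": type_ii_error(0.5, 30, 0.01),
}
for label, beta in settings.items():
    print(f"{label:40s} beta = {beta:.3f}")
```

Only the direction of each change matters here: raising δ, n or α lowers β, while lowering α raises β, so with n and δ fixed a stricter α is paid for with a larger β.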
Table 1. Underprepared example articles related to sample size determination
Author           | Test parameter               | Sample size | Power | Required minimum sample size
Gong et al. [7]  | r (correlation coefficient)  | 40          | 0.610 | 82
Jeon et al. [8]  | μ1 − μ2 (independent)        | 153         | 0.802 | 128
Choi et al. [4]  | μ1 − μ2 (paired)             | 17          | 0.706 | 27
Lee [13]         | μ1 − μ2 (independent)        | 61          | 0.450 | 204
[7] performed a study to validate the correlation between static muscle endurance time and joint range of motion. Presuming a medium effect size of 0.3, GPower3.1 gives n=82 as the minimum sample size that achieves power 0.8. In reality, a sample of n=42 was used, which resulted in power of 0.610. On the other hand, [8] used n=153 to test whether satisfaction with physical therapy among parents of disabled children differs by gender. In this case, the study used more samples than necessary relative to the common standard of power 0.8, because n=128 would have been enough to achieve power 0.8. Moreover, [4] examined whether groups differ using paired samples. However, this study, too, had a small sample size, which yielded an actual power of 0.706. What these studies have in common is that power analysis was omitted from the research design. Unlike them, [13] presents a power analysis in its methods: the minimum sample size was set at 30 based on Cohen [3]'s power analysis. However, this is the sample size for a paired test. For an independent test, the sample size would have to be n=204 to satisfy [13]'s presumptions of effect size δ=0.36 and power 0.7. Consequently, the much smaller sample of n=61 yielded power of only 0.450 and failed to meet the goal.
5. Conclusion and Suggestion
Nowadays, almost all studies in the health and medical fields use statistical methods to draw general conclusions. However, researchers often lack basic statistical knowledge and neglect sample size determination (power analysis) when designing their research. In this situation, statistical software (usually SPSS, SAS, etc.) is used to obtain results, but researchers pay little attention to whether the power requirement is satisfied. This is a matter of concern. Statistical software only reports the results of the requested procedures; it does not indicate whether the sample size satisfies the power requirement. According to statistical theory, when the sample size is small, the chance of retaining the null hypothesis increases even when it is false. Therefore, researchers should be willing to increase the sample size while keeping α fixed; doing so diminishes β. When the sample size is too small, β increases and can lead to false conclusions. Thus, power analysis must be used to determine the sample size. This study introduced the practical use of domestic and overseas software that takes into account the hypothesis type, significance level, effect size, etc. to calculate quickly and easily the sample size required for statistical analysis in clinical trials. It also used empirical analysis to examine problems in the calculation of sample size and power in previously published studies. The findings are as follows: first, most studies omitted power analysis before conducting clinical trials and chose sample sizes for convenience instead of computing the minimum sample size that satisfies the power requirement; second, most analyses had low power and therefore low reliability. Statistical analysis is an inference process that estimates parameters from samples. For results to be reliable, enough samples must be gathered, as theory dictates. In this process, the domestic and overseas software and programs introduced in this paper will be very useful. We expect GPower3.1 and the other software introduced here to be widely used for determining sample size.
6. References
[1] Cho, S. K., Kang, W. and Che, S. S., “Sample size calculation based on the inference of relative
risk and odds ratio”, Journal of The Korean Society of Health Information and Statistics, vol. 36, no.
1, pp.109-121, 2011.
[2] Cho, N., Statistical errors and traps, C. A. J. Press, Seoul, Korea, 2001.
[3] Cohen, J., Statistical power analysis for the behavioral sciences, 2/e, Lawrence Erlbaum Associates,
Publishers, Hillsdale, NJ, 1988.
[4] Choi, Y., Koh, Y. and Kang, Y., “Effects of dance-movement therapy program on stress, anxiety and
depression reduction of middle aged women”, Korean Public Health Research, vol. 36, no. 1, pp.9-16,
2010.
[5] Dattalo, P., Determining sample size, Oxford University Press, Oxford, NY, 2008.
[6] Faul, F., Erdfelder, E., Lang, A. G. and Buchner, A., “G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences”, Behavior Research Methods, vol. 39, no. 2, pp. 175-191, 2007.
[7] Gong, W., Lee, S. and Lee, Y., “The effect of cervical ROM muscle endurance on cervical joint mobilization of normal adults”, Journal of the Korean Society of Physical Medicine, vol. 5, no. 1, pp. 7-13, 2010.
[8] Jeon, J. K. and Kim, B. H., “A survey of satisfaction of parents with handicapped children at physical therapy services-on the basis of Jeon-nam areas-1”, The Korean Academy of Physical Therapy Science, vol. 18, no. 3, pp. 41-51, 2011.
[9] Jung, S., “The effect comparison of Mckenzie exercise and conservative physical therapy on acute
neck pain”, Journal of the Korean Society of Sports Physical Therapy, vol. 7, no. 1, pp. 9-16, 2011.
[10] Kang, H. and Sim, S., “Implementation of estimation and inference on the web”, Communications of the Korean Statistical Society, vol. 7, no. 3, pp. 913-926, 2000.
[11] Kirk, R. E., Statistics: An introduction, 5/e, Thomson Wadsworth, Belmont, CA, 2008.
[12] Lee, C., Kang, H. and Sim, S., “An implementation of the sample size and the power for testing mean
and proportion”, Journal of the Korean Data and Information Science Society, vol. 23, no. 1, pp.53-61,
2010.
[13] Lee, J., “The effect of exercises program on sprit and sleep of old aged women”, CAU Nursing
Journal, vol. 14, pp.21-27, 2010.
[14] Lim, C. Y. and Kwak, M., “The satisfaction considerations for superiority trial”, Journal of The Korean Society of Health Information and Statistics, vol. 36, no. 2, pp. 200-204, 2011.
[15] Rosner, B., Fundamentals of biostatistics, 7/e, Brooks/Cole Cengage Learning, Boston, MA, 2010.
[16] Shin, Y. and Ahn, Y., Medical research methodology, Seoul National University Press, Seoul, Korea, 2008.