Concepts in sample size determination ABSTRACT Umadevi K Rao

[Downloaded free from http://www.ijdr.in on Friday, March 22, 2013, IP: 125.16.60.178] || Click here to download free Android application for this journal
Review Article
Concepts in sample size determination
Umadevi K Rao
Department of Oral and Maxillofacial Pathology, Ragas Dental College and Hospital, 2/102 East Coast Road, Uthandi, Chennai, India
Received: 12-05-11
Review completed: 11-11-11
Accepted: 11-05-12
ABSTRACT
Investigators involved in clinical, epidemiological or translational research have the drive to publish their results so that they can extrapolate their findings to the population. This begins with the preliminary step of deciding the topic to be studied, the subjects and the type of study design. In this context, the researcher must determine how many subjects would be required for the proposed study. Thus, the number of individuals to be included in the study, i.e., the sample size, is an important consideration in the design of many clinical studies. The sample size should be based on the expected difference in the outcome between the groups studied (as in an analytical study), on the accepted level of statistical significance and on the statistical power required to test a hypothesis. The accepted risk of a type I error, or alpha value, which by convention is set at 0.05 in biomedical research, defines the cutoff point at which the P value obtained in the study is judged significant or not. The power in clinical research is the likelihood of finding a statistically significant result when a true effect exists, and it is typically set at 80% or more. Adequate planning is necessary because even the most rigorously executed study may fail to answer the research question if the sample size is too small. Conversely, a study with too large a sample size will be difficult to conduct and will waste time and resources. Thus, the goal of sample size planning is to estimate an appropriate number of subjects for a given study design. This article describes the concepts in estimating the sample size.
Key words: Epidemiology, sample size, statistics
Investigators involved in clinical, epidemiological or translational research have the drive to publish their results so that they can extrapolate their findings to the population. This begins with the preliminary step of deciding the topic to be studied, the subjects and the type of study design. In this context, the researcher must determine how many subjects would be required for the proposed study. Thus, the number of individuals to be included in the study, i.e., the sample size, is an important consideration in the design of many clinical studies.[1] Estimating it is necessary because even the most rigorously executed study may fail to answer the research question if the sample size is too small. Conversely, a study with too large a sample size will waste time and resources. Thus, the goal of sample size planning is to estimate an appropriate number of subjects for a given study design. To put the sample size in the proper context, it is important to review some aspects of research methodology, such as the research question, the choice of study subjects, the variables, the study design and the P value. The importance of sample size estimation and the parameters that determine the sample size are discussed here.
Address for correspondence:
Dr. Umadevi K Rao
E-mail: umadevikrao@yahoo.co.in
Website: www.ijdr.in
PMID: ***
DOI: 10.4103/0970-9290.107385
Indian Journal of Dental Research, 23(5), 2012
IMPORTANCE OF SAMPLE SIZE
Let us consider the study titled “Bonded versus banded first molar attachments: A randomized controlled clinical trial”, published in the Journal of Orthodontics in 2007.[2] The purpose of this study was to compare the clinical failure rates of bonded molar tubes with those of cemented bands during fixed appliance therapy. The study concluded that first molar tubes bonded with Rely-a-Bond composite showed a significantly higher first-time failure rate (33.7%) than bands cemented with intact glass ionomer cement (GIC) (18.8%), a difference of nearly 15 percentage points (33.7% - 18.8% = 14.9%). The main finding was that the failure rate of bonded molar tubes was significantly higher, almost twice that seen for bands, and that the survival time of the bonded tubes was almost half that of the bands. This study is an example of an analytical, or comparative, research study in which the proportions of some characteristic are measured in two or more comparison groups. Alternatively, the variable of interest, i.e., the parameter to be studied, can be a mean that is compared across two or more groups.
The authors designed the study to demonstrate a minimum difference of 15% in first molar attachment failure rates between bonded molar tubes and cemented molar bands. They chose a difference of 15% between the two groups because they considered it clinically meaningful. Had the investigators not had an adequate sample size, they could not have demonstrated this 15% difference by statistical analysis, even if it truly existed. Sample size estimation gives information regarding the feasibility of the research design and the scope of the variables that can be included. It should be carried out early in the design phase of a study, when major changes are still possible. Apart from being necessary for a meaningful study, sample size determination is an essential part of a study protocol for submission to ethical committees, research funding bodies and peer-reviewed journals.[3]
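As a rough illustration of how a 15% difference translates into a sample size, the sketch below applies the usual normal-approximation formula for comparing two independent proportions. It assumes, purely for illustration, planning failure rates of 18.8% and 33.7% (the rates reported above; the trial's own planning assumptions are not given here), a two-sided α of 0.05 and a power of 0.80. The function name `n_per_group_two_proportions` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_two_proportions(p1: float, p2: float,
                                alpha: float = 0.05,
                                power: float = 0.80) -> int:
    """Hypothetical helper: subjects per group to detect p1 vs p2.

    Two-sided z-test, normal approximation with the pooled proportion
    under H0 and separate proportions under H1; no continuity correction.
    """
    std_normal = NormalDist()                    # mean 0, SD 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)           # about 0.84 for power = 0.80
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)  # round up to whole subjects


# Illustrative planning values only: 18.8% (bands) vs 33.7% (bonded tubes)
n = n_per_group_two_proportions(0.188, 0.337)
print(f"about {n} subjects per group")
```

Under these assumptions the formula gives about 136 subjects per group. A continuity correction, a different test or different planning rates would change the figure.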
Research question
All studies should start with a research question that addresses what the investigator would like to know. The research question is the objective of the study, the uncertainty that the investigator wants to resolve; it often begins with a general concern that must be narrowed down to a concrete, researchable issue. Good research questions can arise from the published medical literature, where gaps in knowledge are often highlighted. They may also come from applying new concepts or methods to old issues and from ideas that emerge from teaching.
Choosing the study subjects
In clinical research, it is often impossible to study the entire population of interest, so a subset, referred to as a sample, is studied instead. There are many types of sampling methods; in probability sampling, every member of the population has a known, non-zero chance of being included in the sample, and in simple random sampling that chance is the same for every member. Sampling enables the researcher to draw inferences about a large population by studying a sample at an affordable cost in time and effort. As an initial step in designing a study, the researcher must conceptualize the target population. This can be achieved by formulating a specific set of inclusion criteria (which establish the demographic and clinical characteristics of subjects appropriate to the research question) and exclusion criteria (which eliminate subjects who are not appropriate for the study).
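A minimal sketch of simple random sampling, the probability sampling method in which every eligible member has the same chance of selection. The sampling frame, the age-based eligibility rule and the helper names are hypothetical; real inclusion and exclusion criteria would come from the study protocol.

```python
import random

# Hypothetical sampling frame: (patient_id, age) for 500 clinic patients
sampling_frame = [(f"P{i:03d}", 15 + i % 70) for i in range(500)]


def is_eligible(record) -> bool:
    """Hypothetical criteria: include adults aged 20-65, exclude all others."""
    _, age = record
    return 20 <= age <= 65


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member has the same chance of selection."""
    rng = random.Random(seed)  # seeded so the draw can be reproduced and audited
    return rng.sample(population, n)


target_population = [r for r in sampling_frame if is_eligible(r)]
sample = simple_random_sample(target_population, 50, seed=2012)
print(len(target_population), "eligible;", len(sample), "sampled")
```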
Variables
A variable is a characteristic that varies from one study subject to another. While designing a study, it is important to decide which variables will be studied.
The validity of a study depends on how well the variables designed for the study represent the phenomenon of interest. For instance, how well does a fasting blood sugar level or salivary sugar level represent the control of diabetes? Does unstimulated salivary flow rate define xerostomia, or does the extent of mouth opening help in the clinical staging of oral submucous fibrosis? The two types of variables are continuous variables and categorical variables. Continuous variables have quantified intervals on an infinite scale of values, e.g., salivary flow rate and age. A scale that has a finite number of intervals is termed discrete, e.g., the number of cigarettes smoked per day and parity. Discrete variables that are ordered (i.e., arranged in sequence from few to many) and that have a considerably large number of possible values resemble continuous variables for practical purposes of measurement and analysis.[4] Categorical variables are those that are not suitable for quantification and are often measured by classifying them into categories. Categorical variables with two possible values (e.g., dead or alive) are termed dichotomous or binary. Categorical variables with more than two categories can be either nominal variables, which have categories that are not ordered (e.g., blood groups: type A blood is neither more nor less than type B or O), or ordinal variables, which have categories with an intrinsic order (e.g., severe, moderate and mild dysplasia).
In considering the association between two variables, the one that precedes (or is presumed on biologic grounds to be antecedent) is called the predictor or independent variable; the other is called the outcome, dependent or response variable. For example, if a study is designed to determine the efficacy of probiotics in reducing oral Candida, the predictor variable is probiotics and the outcome variable is oral Candida.
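The variable types above map loosely onto data types. The sketch below, with hypothetical class and field names, records one subject and shows the practical difference between ordinal categories (which can be ranked) and nominal categories (which cannot). It is an analogy for illustration, not a coding standard.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class BloodGroup(Enum):
    """Nominal variable: categories with no intrinsic order."""
    A = "A"
    B = "B"
    AB = "AB"
    O = "O"


class Dysplasia(IntEnum):
    """Ordinal variable: categories with an intrinsic order."""
    MILD = 1
    MODERATE = 2
    SEVERE = 3


@dataclass
class SubjectRecord:
    """One hypothetical study subject, one field per variable type."""
    salivary_flow_ml_per_min: float  # continuous
    cigarettes_per_day: int          # discrete
    alive: bool                      # dichotomous (binary)
    blood_group: BloodGroup          # nominal categorical
    dysplasia: Dysplasia             # ordinal categorical


def can_be_ranked(a, b) -> bool:
    """True if the two category values support an ordering comparison."""
    try:
        _ = a < b
    except TypeError:  # plain Enum members define no ordering
        return False
    return True


subject = SubjectRecord(0.3, 10, True, BloodGroup.O, Dysplasia.MODERATE)
print(can_be_ranked(Dysplasia.MILD, Dysplasia.SEVERE))  # True: ordinal
print(can_be_ranked(BloodGroup.A, BloodGroup.B))        # False: nominal
```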
Study design
An important step in designing research is to decide whether the researcher will take a passive role, observing the events taking place in the study subjects, as in an observational study [Table 1], or will apply an intervention to the study subjects and examine its effects, as in an experimental study such as a clinical trial.
P value
In any research study, the data collected are analyzed using appropriate statistical tests, which yield a P value. The P value is the probability of obtaining, by chance alone, a test statistic as large as or larger than the one obtained in the study if the null hypothesis is true. The null hypothesis, stated at the beginning of the study, is rejected in favor of its alternative if the obtained P value is less than α, the predetermined level of statistical significance. If the obtained P value is greater than α, the null hypothesis is not rejected; this does not prove that the null hypothesis is true.
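A small sketch of the decision rule, assuming the test statistic is approximately standard normal under the null hypothesis (a z-test); t, chi-square and other tests compute their P values from different distributions. The helper names are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal test statistic under H0."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p: float, alpha: float = 0.05) -> str:
    """Hypothetical helper applying the pre-set significance level alpha."""
    return "reject H0" if p < alpha else "fail to reject H0"


for z in (1.0, 1.96, 2.58):
    p = two_sided_p_value(z)
    print(f"z = {z:.2f}  P = {p:.4f}  -> {decide(p)}")
```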
PARAMETERS THAT DETERMINE THE SAMPLE SIZE
Hypothesis
Table 1: Examples of common clinical observational research designs used to study whether chewing areca nut causes oral submucous fibrosis
Cross-sectional study
Key feature: A group is examined at one point in time. This design measures disease while exposure status is measured simultaneously in a given population.*
Example: The researcher examines the group of subjects, observing the prevalence of OSF in those who have and those who do not have the habit of chewing areca nut.
Case-control study
Key feature: Two groups, cases and disease-free controls, are compared for their exposure status, and the risk of exposure in cases and controls is also compared.*
Example: The researcher examines a group of subjects with OSF (the cases), compares them with a group of those who do not have OSF (the controls) and questions them about the habit of chewing areca nut.
Cohort study
Key feature: Two disease-free groups, one exposed and the other not exposed to the risk factor, are followed over a period of time, and the risk of the disease in the exposed and the unexposed is compared.*
Example: The researcher examines a cohort of subjects for many years, observing the incidence of OSF in subjects with and without the areca nut chewing habit.
OSF = oral submucous fibrosis. (Reviewed and modified)
*Modified from: Cummings SR, Browner WS, Grady D, Hearst N, Newman TB. Designing Clinical Research: An Epidemiologic Approach. 2nd edition, Philadelphia: Lippincott Williams and Wilkins. 2001
The research hypothesis summarizes the elements of the study: the sample, the sample size, the design, and the predictor and outcome variables. The primary purpose of stating the hypothesis is to establish the basis for tests of statistical significance. A hypothesis is essential for comparative studies such as our example above, i.e., studying which type of molar attachment has the lower clinical failure rate. A hypothesis can be either null or alternative.[5] In our example, the null hypothesis would be that there is no difference in the long-term failure rates between bonded tubes and cemented bands as first molar attachments, and the alternative hypothesis would be that there is a difference in the clinical failure rate. The alternative hypothesis cannot be tested directly; it is accepted by default if the test of statistical significance rejects the null hypothesis. The alternative hypothesis can be one-sided or two-sided. A one-sided alternative hypothesis specifies the direction of the effect (bonded tube first molar attachments will have a higher long-term failure rate), whereas a two-sided alternative states that there will be a difference, which can go in either direction. When two-sided statistical tests are used, the P value includes the probabilities of committing a type I error in each of the two directions, which is about twice as great as in one direction only (a one-sided P value of 0.05 usually corresponds to a two-sided P value of 0.1). For the same power, a smaller sample size is required to test a one-sided hypothesis, and power is lost if you change to a two-sided hypothesis with the same sample size. After stating the hypothesis, we proceed with the study and perform the appropriate statistical test. Based on the level of statistical significance α, which is decided before the study, we either fail to reject the null hypothesis or reject it and accept the alternative hypothesis.
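To see why a one-sided hypothesis needs fewer subjects, the sketch below compares the critical values of a z-test and the ratio of sample sizes for comparing two means with a known standard deviation, where n is proportional to (z_crit + z_β)². The function names are hypothetical, and α = 0.05 and power = 0.80 are assumed.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def critical_z(alpha: float, two_sided: bool) -> float:
    """Critical value a z statistic must exceed to be significant at alpha."""
    tail = alpha / 2 if two_sided else alpha  # two-sided splits alpha over both tails
    return STD_NORMAL.inv_cdf(1 - tail)


def one_vs_two_sided_ratio(alpha: float = 0.05, power: float = 0.80) -> float:
    """Hypothetical helper: n(one-sided) / n(two-sided), two means, known SD.

    In that formula n is proportional to (z_crit + z_beta) ** 2, so the
    effect size and SD cancel out of the ratio.
    """
    z_beta = STD_NORMAL.inv_cdf(power)
    one_sided = (critical_z(alpha, two_sided=False) + z_beta) ** 2
    two_sided = (critical_z(alpha, two_sided=True) + z_beta) ** 2
    return one_sided / two_sided


print(round(critical_z(0.05, two_sided=False), 3))  # one-sided cutoff
print(round(critical_z(0.05, two_sided=True), 3))   # two-sided cutoff
print(round(one_vs_two_sided_ratio(), 2))
```

The one-sided cutoff at α = 0.05 (about 1.645) equals the two-sided cutoff at α = 0.10, and under these assumptions the one-sided design needs roughly a fifth fewer subjects.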
Effect size: Minimum expected difference
The likelihood that a well-designed study will be able to detect an association between a predictor and an outcome variable depends on the actual magnitude of the association in the target population. If it is large, it will be easy to detect in the sample. If the association is small, it will be difficult to detect in the sample, and a large sample size will be required. The effect size is the smallest difference between comparison groups that the researcher would like the study to demonstrate. Selecting an appropriate effect size is the most difficult aspect of sample size planning.[3] The investigator should first try to find data from prior studies in the related area to make an informed guess about a reasonable effect size. Alternatively, one can choose the smallest effect size that, in one's opinion, would be clinically meaningful. When data are not available, it may be necessary to do a pilot study. The discussion of trade-offs between sample size and effect size requires both the technical skills of the statistician and the scientific knowledge of the researcher.
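A sketch of how strongly the effect size drives the sample size, using the normal-approximation formula for comparing two means, n per group = 2(z₁₋α/₂ + z₁₋β)² / d², where d is the standardized effect size (the minimum difference divided by the common standard deviation). The function name is hypothetical; a two-sided α of 0.05 and a power of 0.80 are assumed.

```python
import math
from statistics import NormalDist


def n_per_group_two_means(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Hypothetical helper: subjects per group, two-sided two-sample z-test.

    d is the standardized effect size (difference in means / common SD).
    Normal approximation; a t-test needs slightly more subjects.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


for d in (0.8, 0.5, 0.25):
    print(d, n_per_group_two_means(d))
```

Because n depends on 1/d², halving the effect size from 0.5 to 0.25 roughly quadruples the required sample size (about 63 versus 252 per group).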
α, β and power
When the researchers have completed the study, they use statistical tests to try to reject the null hypothesis in favor of its alternative, much in the same way that a prosecuting attorney tries to convince a jury to reject innocence in favor of guilt. Depending on whether the null hypothesis is true or false in the target population, and assuming that the study is free of bias, four situations are possible:
• The jury convicts a defendant who is guilty (correct decision).
• The jury acquits a defendant who is innocent (correct decision).
• The jury convicts a defendant who is innocent (type I error).
• The jury acquits a defendant who is guilty (type II error).
Similarly, in research, in two of these four situations the findings in the sample and the reality in the population are concordant, and the researcher's inference will be correct. In the other two situations, either a type I or a type II error has been made, and the inference will be incorrect. Thus, after stating the hypothesis and while planning the study, the researcher should establish in advance the probability of making a type I error (α, also called the level of statistical significance: rejecting the null hypothesis when it is actually true) and of making a type II error (β: failing to reject the null hypothesis when it is actually false).[6] A type I error is conventionally regarded as more serious than a type II error, so the significance criterion (α) is normally set at 0.05.[7] As the significance criterion becomes more stringent (smaller α), the sample size necessary to detect the minimum difference increases.
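The meaning of α can be checked by simulation. In the sketch below (hypothetical function name; normally distributed data with a known SD of 1 and a two-sided two-sample z-test are assumed), both groups are drawn from the same population, so the null hypothesis is true and every significant result is a type I error. The observed rejection rate should be close to the chosen α.

```python
import math
import random
from statistics import NormalDist


def simulated_type_i_error_rate(alpha: float = 0.05, n_per_group: int = 30,
                                n_studies: int = 4000, seed: int = 1) -> float:
    """Hypothetical helper: share of simulated studies rejecting a true H0.

    Both groups come from the same N(0, 1) population, so every
    'significant' result is a type I error.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    se_diff = math.sqrt(2 / n_per_group)          # SE of difference in means, SD = 1
    false_positives = 0
    for _ in range(n_studies):
        mean_a = sum(rng.gauss(0, 1) for _ in range(n_per_group)) / n_per_group
        mean_b = sum(rng.gauss(0, 1) for _ in range(n_per_group)) / n_per_group
        if abs(mean_a - mean_b) / se_diff >= z_crit:
            false_positives += 1
    return false_positives / n_studies


for a in (0.10, 0.05, 0.01):
    print(a, simulated_type_i_error_rate(alpha=a))
```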
Power
The probability of committing a type II error is given by the beta (β) value, whereas the probability of avoiding such an error is termed the statistical power of the study.[8] Power equals 1 - β. It is the chance of observing an association of a given size or greater in a sample if one is actually present in the population. If β is set at 0.10, then the researcher has decided to accept a 10% chance of missing an association of a given effect size. This represents a power of 0.90, that is, a 90% chance of finding an association of that size. The higher the desired power, the larger the required sample size. In clinical research, statistical power is customarily set at 0.80. In any study design, once the research question is framed, the hypothesis specified and data collected on the variables of interest, a statistical test is applied to test the hypothesis. This involves determining whether or not there is a significant difference between the means or proportions observed in the comparison groups. The sample size needed for the test to detect such a difference depends on several factors: the desired statistical power, the size of the difference earlier specified as clinically meaningful and the level of statistical significance. Statistical power is the probability that a statistical test will indicate a significant difference when one truly exists. In a study comprising two groups of individuals, the power must therefore be sufficient to detect a statistically significant difference between the two study groups.
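A sketch of a power calculation for comparing two means, assuming a two-sided z-test and a standardized effect size d: power ≈ Φ(d·√(n/2) - z₁₋α/₂), ignoring the negligible chance of rejecting in the wrong direction. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def power_two_means(n_per_group: int, d: float, alpha: float = 0.05) -> float:
    """Hypothetical helper: approximate power of a two-sided two-sample z-test.

    d is the standardized effect size; d * sqrt(n / 2) is the expected
    z statistic when the true difference equals d standard deviations.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(abs(d) * math.sqrt(n_per_group / 2) - z_alpha)


for n in (20, 63, 100):
    print(n, round(power_two_means(n, d=0.5), 2))
```

With d = 0.5 and α = 0.05, about 63 subjects per group give a power of roughly 0.80, and power rises as the sample grows.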
Calculating the sample size
Three main factors, α, power and effect size, must be considered in calculating the appropriate sample size. To calculate the sample size, we must state the null and alternative hypotheses and select the appropriate statistical test, which in turn depends on the types of predictor and outcome variables used in the study. Choose a reasonable effect size, set α (normally 0.05) and β (normally 0.20, i.e., a power of 0.80), and use the appropriate formula given in statistics textbooks[9,10] or software programs such as EpiInfo[11] or nQuery.[12] Online calculators are also available for sample size estimation. Additional considerations in calculating sample size for analytical studies include adjusting for potential dropouts (a simple calculation), stating whether the hypothesis is one-sided or two-sided and determining the ratio of cases to controls.
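The dropout adjustment mentioned above is commonly done by dividing the calculated sample size by the expected retention rate. A minimal sketch with a hypothetical function name; the 136 per group and 10% dropout in the example are illustrative numbers only.

```python
import math


def inflate_for_dropout(n_required: int, dropout_rate: float) -> int:
    """Hypothetical helper: number to enrol so about n_required complete.

    Divides by the expected retention (1 - dropout_rate).
    """
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    # round() removes floating-point noise before rounding up to whole subjects
    return math.ceil(round(n_required / (1 - dropout_rate), 9))


print(inflate_for_dropout(136, 0.10))
```

Dividing by (1 - dropout rate) makes the expected number of completers reach the target, whereas multiplying by (1 + dropout rate) falls slightly short.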
Sample size for descriptive studies
Descriptive studies (including studies of diagnostic tests) do not compare different groups, so the concepts of power and hypothesis testing are not applicable. Such studies present means and proportions in their results and commonly report confidence intervals, a range of values around the sample mean or proportion that are consistent with the data. A confidence interval is a measure of the precision of a sample estimate. Thus, when determining sample size for descriptive studies, we must specify the desired confidence level and width of the confidence interval. The sample size can then be calculated from the formulas.
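A sketch of precision-based sample size for descriptive studies, using the standard normal-approximation formulas n = z²p(1 - p)/d² for a proportion and n = (zσ/d)² for a mean, where d is the desired half-width of the confidence interval. The function names and example values are hypothetical; for small samples or proportions near 0 or 1, other interval methods may be preferable.

```python
import math
from statistics import NormalDist


def _z(confidence: float) -> float:
    """Two-sided critical value for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def n_for_proportion(p: float, half_width: float, confidence: float = 0.95) -> int:
    """Hypothetical helper: n so the CI for a proportion is about p +/- half_width."""
    return math.ceil(_z(confidence) ** 2 * p * (1 - p) / half_width ** 2)


def n_for_mean(sd: float, half_width: float, confidence: float = 0.95) -> int:
    """Hypothetical helper: n so the CI for a mean is about mean +/- half_width.

    Assumes the standard deviation sd is known in advance (e.g. from a pilot).
    """
    return math.ceil((_z(confidence) * sd / half_width) ** 2)


print(n_for_proportion(0.5, 0.05))  # prevalence near 50%, +/- 5 points
print(n_for_mean(10, 2))            # SD 10 units, +/- 2 units
```

For example, estimating a prevalence expected to be around 50% to within ±5 percentage points with 95% confidence needs 385 subjects; halving the half-width roughly quadruples that number.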
OPTIONS TO MINIMIZE THE SAMPLE SIZE AND MAXIMIZE THE POWER
After arriving at a sample size, the researcher should check that it is a feasible number with which to carry out the research in all respects. If the sample is too large to be practical, the following methods can be adopted:
• Continuous variables can be used, as they allow smaller sample sizes than dichotomous variables.
• Paired measurements, one at baseline and another at the conclusion of the study, can be used for sample size calculation in experimental or cohort studies; analyzing the change within each subject removes much of the variability between subjects.
• Precise variables allow a smaller sample size in both analytic and descriptive studies, because they reduce variability. An outcome with large variability requires more subjects to detect a difference that truly exists.
• A more common outcome should be used in calculating the sample size, as this increases the power of the study.
• Unequal group sizes can be studied, because enrolling additional individuals in one group increases power. It is usually easier to add individuals to the control group than to the case group.
• Expanding the minimum expected difference will help to reduce the sample size.[13]
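For the option of unequal groups, a widely used approximation is that with c controls per case, the number of cases needed is n·(c + 1)/(2c), where n is the per-group size for a 1:1 design. The sketch below (hypothetical function name, illustrative n = 100) shows that the saving in cases flattens out beyond about three or four controls per case.

```python
import math


def cases_needed(n_equal: int, controls_per_case: float) -> int:
    """Hypothetical helper: cases required with c controls per case.

    Approximation: n_cases = n_equal * (c + 1) / (2 * c), where n_equal is
    the per-group size for a 1:1 design.
    """
    c = controls_per_case
    return math.ceil(n_equal * (c + 1) / (2 * c))


for c in (1, 2, 3, 4, 8):
    cases = cases_needed(100, c)
    print(f"1:{c}  cases = {cases}  controls = {math.ceil(cases * c)}")
```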
SUMMARY
Determining the sample size is an important part of the design of both analytic and descriptive studies. The sample size is an estimate of the number of subjects required to detect an association of a given effect size and variability, at a specified likelihood of making type I (false-positive) and type II (false-negative) errors. The maximum probability of making a type I error is called α, and that of making a type II error, β. The quantity (1 - β) is power, the chance of observing an association of a given size or greater in a sample if one is actually present in the population. A study that concludes without significant results may in fact simply lack adequate power. To achieve the desired aim in research studies concerned with establishing a difference between groups, or in those conducted to estimate a quantity, appropriate sample size planning is mandatory. It is always advisable to consult a statistician, as microbiologic surveys, studies of medical tests, surveys with differential sampling probabilities and other special situations may require more complex techniques to arrive at an appropriate sample size.
ACKNOWLEDGEMENT
I thank the management, Principal and staff of the Department of Oral and Maxillofacial Pathology of Ragas Dental College and Hospital, Chennai, for their support. I extend my heartfelt gratitude to Prof. Greenspan J, Prof. Greenspan D and Prof. Shiboski CH, UCSF, California, USA, for their mentorship.
REFERENCES
1. Lenth RV. Some practical guidelines for effective sample-size determination. Am Stat 2001;55:187-93.
2. Banks P, Macfarlane TV. Bonded versus banded first molar attachments: A randomized controlled trial. J Orthod 2007;34:128-36.
3. Macfarlane TV. Sample size determination for research projects. J Orthod 2003;30:99-100.
4. Hulley SB, Martin JN, Cummings SR. Planning the measurements: Precision and accuracy. In: Cummings SR, Browner WS, Grady D, Hearst N, Newman TB, editors. Designing Clinical Research: An Epidemiologic Approach. 2nd ed. Philadelphia: Lippincott Williams and Wilkins; 2001. p. 38.
5. Eng J. Sample size estimation: How many individuals should be studied? Radiology 2003;227:309-13.
6. Browner WS, Newman TB, Hulley SB. Getting ready to estimate sample size: Hypotheses and underlying principles. In: Cummings SR, Browner WS, Grady D, Hearst N, Newman TB, editors. Designing Clinical Research: An Epidemiologic Approach. 2nd ed. Philadelphia: Lippincott Williams and Wilkins; 2001. p. 56.
7. Maggard MA, O'Connell JB, Liu JH, Etzioni DA, Ko CY. Sample size calculations in surgery: Are they done correctly? Surgery 2003;134:275-9.
8. Freedman KB, Back S, Bernstein J. Sample size and statistical power of randomized, controlled trials in orthopaedics. J Bone Joint Surg 2001;83:397-402.
9. Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. London: John Wiley and Sons; 1981.
10. Bland M. An Introduction to Medical Statistics. Oxford: Oxford University Press; 1995.
11. EpiInfo 6. Database and statistics software for public health professionals. Geneva: CDC and WHO; 1997.
12. Elashoff JD. nQuery Advisor version 4.0 user's guide. Los Angeles, CA: Statistical Solutions; 2000.
13. Browner WS, Newman TB, Hulley SB. Getting ready to estimate sample size: Hypotheses and underlying principles. In: Cummings SR, Browner WS, Grady D, Hearst N, Newman TB, editors. Designing Clinical Research: An Epidemiologic Approach. 2nd ed. Philadelphia: Lippincott Williams and Wilkins; 2001. p. 76.
How to cite this article: Rao UK. Concepts in sample size determination.
Indian J Dent Res 2012;23:660-4.
Source of Support: Nil, Conflict of Interest: None declared.