George Menexes, Iannis Papadimitriou
POWER ANALYSIS AND SAMPLE SIZE DETERMINATION IN THE CONTEXT
OF CORRESPONDENCE ANALYSIS
George Menexes1, Iannis Papadimitriou2
University of Macedonia, Thessaloniki, Greece
Correspondence Analysis is considered mainly a descriptive technique designed to analyze simple two-way and multi-way tables containing some kind of correspondence or association between the rows and the columns. In the case of two variables, when the data have been collected with a simple random sampling scheme, the statistical significance of the total inertia of the contingency table can be tested by means of the χ² distribution. If the total inertia is considered an index of global effect size, then the observed power of the χ² test of independence can be estimated. Within the framework of Statistical Power Analysis proposed by Cohen, the minimum required sample size can be estimated a priori, so that a predetermined effect size can be detected as statistically significant by the χ² test at significance level α with power 1 − β. In this study we introduce the concept of dynamic inertia and propose a methodology for an a priori and post hoc Power Analysis of the χ² test, using the square root of the total inertia and the dynamic inertia as measures of effect size. With the proposed methodology, the minimum required sample size can be estimated for a survey or an experimental study in which Correspondence Analysis will be applied. Finally, a generalization of the method to the multivariate case is attempted.
Keywords: Chi-square test, effect size, total inertia, dynamic inertia
0 INTRODUCTION
A conceptual presentation of the methodology used in Correspondence Analysis (CA) can be
provided in a great variety of ways (see Benzécri 1992, Greenacre 1993, Gifi 1996, Nishisato
1980, Israëls 1987). That is probably the main reason why this method “surfaced” on several
occasions during the 20th century (Michailidis & De Leeuw, 1998; Clausen, 1998). The CA is
mainly considered to be a descriptive method used to examine the relationship between two or
more categorical variables. The corresponding results provide information, similar in nature
to that produced by the techniques applied in factor analysis, and allow us to study the
structure of the variables included in the analysis. One characteristic of this method is that the
data from one sample is treated as if it represents the whole population under study.
Nevertheless, in the case of two variables, where the relevant data have been collected by simple random sampling, the statistical significance of the contingency table’s total inertia can be tested via the χ² distribution (Lebart, et al., 1977; Lebart, et al., 2000). This is why many authors (Weller & Romney, 1990; Greenacre, 1993; Van de Geer, 1993; Blasius, 1994; Micheloud, 1997; Clausen, 1998) combine the application of CA with the χ² test of independence.
¹ E-mail: mariston@hol.gr, gmenexes@uom.gr
² E-mail: iannis@uom.gr
By tradition, statistical hypothesis testing in scientific research has shown an obvious preference for statistical significance as the criterion for rejecting or not rejecting the null hypothesis H0 (Huck, 2000), with the result that more emphasis has been placed on controlling Error Type I. In recent years, however, and particularly following Cohen’s studies (1962, 1965, 1988) on Statistical Power Analysis in the Behavioral Sciences, researchers have begun to turn their attention also to Error Type II and to the need to analyze the power of statistical tests (Cohen, 1988; Murphy & Myors, 1998). This need has extended beyond the Behavioral Sciences to other scientific fields as well (see Muller, et
al., 1992; Hubbard & Armstrong, 1992; Verma & Goodale, 1995; Buhl-Mortensen, 1996;
Thomas & Juanes, 1996; Heidelbaugh & Nelson, 1996; Meyer & Mark, 1996; Miller, et al.,
1997; Sheppard, 1999; Evans & Viengkham, 2001; Foster, 2001; Di Stefano, 2001; Nutahara,
et al., 2001).
Hallahan & Rosenthal (1996) and Sheppard (1999) present an informative
introduction to the theory of Power Analysis of statistical tests, while Thomas & Krebs (1997)
attempt to make a comparative presentation of the software used for Power Analysis. In fact,
Gatti & Harwell (1998) recommend the use of software for Power Analysis as opposed to
traditional Power Charts, which do not facilitate the reading of the numerical values and can
lead to false estimations, due to the frequently required linear interpolation.
Within the framework of Statistical Power Analysis recommended by Cohen (1988), it is also possible to make an a priori estimate of the minimum required sample size, so that the χ² test of independence for two categorical variables, at significance level α and with power 1 − β, can detect a predefined Effect Size (ES) as statistically significant. Power Analysis for the χ² test has also been studied by other researchers (Meng & Chapman, 1966; Nathan, 1972; Guenther, 1977; Lachin, 1977). The methodologies they
propose seem to resolve the problem on a local basis (e.g. for specific sampling or
experimental designs and specific alternative hypotheses) and not through a general
methodological framework, like Cohen’s proposal, which, as we shall see later on, also allows
us to connect Power Analysis to the total inertia of the contingency table of two categorical
variables.
In the present paper, we consider total inertia as an index of effect size, on the basis of which we can estimate the observed power of the χ² test of independence. We introduce the concept of dynamic inertia of a contingency table with two categorical variables and propose a methodology for an a priori and post hoc Power Analysis of the χ² test, using the square root of the total inertia and the dynamic inertia as indexes of effect size. With the proposed methodology, it is possible to estimate the minimum required sample size for a survey or an experimental study whose data will be analyzed with Correspondence Analysis.
In Chapters 1-4, we present the basic notation and concepts related to statistical hypothesis testing (Error Type I, Error Type II, Error Type II½, Power, Observed Significance Level). Chapter 5 summarizes the main arguments on which criticism of statistical hypothesis testing is based. Chapter 6 briefly presents a priori and post hoc Power Analysis within the framework recommended by Cohen. Chapter 7 gives an overview of the Power Analysis methodology for the χ² test of independence in the case of two categorical variables, and of the relation between the total inertia of the relevant contingency table and the effect size defined by Cohen for the χ² test of independence. In Chapter 8, we suggest ways of predetermining the effect size and highlight the relationships between total inertia and other contingency coefficients. In Chapter 9 we introduce the concept of dynamic inertia of a contingency table with two categorical variables. Chapter 10 attempts to generalize the method for calculating sample size to the case of multiple variables. In Chapter 11, we present the Power Analysis methodology for the χ² goodness-of-fit test, used to determine the number of significant factorial axes to retain after CA is applied to a contingency table with two variables. Finally, Chapter 12 includes numerical examples (the calculations were made with MS-Excel and the add-in PiFace).
1 THE STATISTICAL SIGNIFICANCE OF TOTAL INERTIA
Let us suppose that X and Y are two categorical variables with k and l categories respectively. We denote by:
F: the k × l contingency table of absolute frequencies with elements $f_{ij}$ (i = 1,…,k and j = 1,…,l), which expresses the joint distribution of the variables X and Y
$f_{i+}$: the marginal absolute frequency of row i, i = 1,…,k, in table F
$f_{+j}$: the marginal absolute frequency of column j, j = 1,…,l, in table F
N: the grand total of table F, with
$$N = \sum_{i} f_{i+} = \sum_{j} f_{+j} = \sum_{i}\sum_{j} f_{ij}$$
P: the k × l correspondence table whose elements are the elements of table F divided by N, i.e. the elements of table P are given by
$$p_{ij} = \frac{f_{ij}}{N}, \quad i = 1,\dots,k,\; j = 1,\dots,l$$
$r_i$: the mass (or weight) of row i in table P, where
$$r_i = \sum_{j} p_{ij} = \sum_{j} \frac{f_{ij}}{N} = \frac{f_{i+}}{N}, \quad i = 1,\dots,k$$
$c_j$: the mass (or weight) of column j in table P, where
$$c_j = \sum_{i} p_{ij} = \sum_{i} \frac{f_{ij}}{N} = \frac{f_{+j}}{N}, \quad j = 1,\dots,l$$
It is a known fact (see Greenacre, 1993) that the total inertia $I_F$ of table F expresses a generalized variance and, more specifically, the weighted average of the squared χ² distances of the row profiles (or, equivalently, of the column profiles) from their center of gravity. To calculate $I_F$, we can also use the following formulae (Blasius & Greenacre, 1994):
$$I_F = \sum_{i}\sum_{j} \frac{(p_{ij} - r_i c_j)^2}{r_i c_j} \qquad [1.1]$$
$$I_F = \frac{Q}{N} \qquad [1.2]$$
From [1.2]:
$$Q = N I_F \qquad [1.3]$$
In [1.2], Q is the χ² statistic that corresponds to table F, calculated as follows:
$$Q = \sum_{i}\sum_{j} \frac{\left(f_{ij} - \frac{f_{i+} f_{+j}}{N}\right)^2}{\frac{f_{i+} f_{+j}}{N}} = N \sum_{i}\sum_{j} \frac{(p_{ij} - r_i c_j)^2}{r_i c_j} \qquad [1.4]$$
When the data have been collected through simple random sampling, then, on the basis of [1.3] and provided that the preconditions for applying the χ² test of independence are met (see Lancaster, 1969), the statistical significance of the quantity $N I_F$ can be tested through the χ² distribution with (k−1)(l−1) degrees of freedom (Lebart, et al., 1977; Lebart, et al., 2000).
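The computations in [1.1] to [1.4] can be sketched in a few lines of Python. The sketch below is our own illustration, not part of the original methodology: it assumes a table with no empty rows or columns, uses a hypothetical 2 × 2 table, and evaluates the χ² upper tail through the regularized incomplete gamma function (series and continued-fraction forms) using only the standard library. The names `chi2_sf` and `independence_test` are hypothetical.

```python
import math


def chi2_sf(x: float, df: float) -> float:
    """Upper tail P(X >= x) of the central chi-square distribution, i.e. the
    regularized upper incomplete gamma function Q(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    log_prefactor = a * math.log(y) - y - math.lgamma(a)
    if y < a + 1.0:
        # Power series for the lower tail P(a, y); its complement is the upper tail.
        term = total = 1.0 / a
        n = 0
        while term > 1e-16 * total:
            n += 1
            term *= y / (a + n)
            total += term
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Continued fraction (modified Lentz method) for the upper tail Q(a, y).
    tiny = 1e-300
    b = y + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def independence_test(table):
    """Return (I_F, Q, df, p_value) for a k x l table of absolute frequencies.
    Assumes every row total and column total is positive."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    inertia = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            p_ij, r_i, c_j = f / n, row_tot[i] / n, col_tot[j] / n
            inertia += (p_ij - r_i * c_j) ** 2 / (r_i * c_j)  # formula [1.1]
    q = n * inertia  # formula [1.3]: Q = N * I_F
    df = (len(table) - 1) * (len(table[0]) - 1)
    return inertia, q, df, chi2_sf(q, df)


# Hypothetical 2 x 2 table of absolute frequencies.
i_f, q, df, p_value = independence_test([[10, 20], [30, 40]])
print(f"I_F = {i_f:.5f}, Q = {q:.4f}, df = {df}, p = {p_value:.4f}")
```

In practice a library routine such as SciPy's `scipy.stats.chi2.sf` would normally replace the hand-written tail function; it is written out here only to keep the sketch self-contained.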
2 ERROR TYPE I AND ERROR TYPE II
In any statistical test, the decision related to the rejection of H0 can be correct or false. An erroneous decision is reached when:
a) H0 is rejected when in reality it is true. We then say that an Error Type I, or error of the first kind, is committed. The probability of committing an Error Type I is designated by α and is the conditional probability:
$$\alpha = P(\text{rejection of } H_0 \mid H_0 \text{ true}) \qquad [2.1]$$
b) H0 is not rejected when in reality it is false. We then say that an Error Type II, or error of the second kind, is committed. The probability of committing an Error Type II is designated by β and is the conditional probability:
$$\beta = P(\text{non-rejection of } H_0 \mid H_0 \text{ false}) \qquad [2.2]$$
When an H0 is tested, the value chosen for α expresses the maximum acceptable probability of committing an Error Type I. This probability is known as the significance level and must be determined by the researcher before sampling or carrying out an experiment, so that the results of the statistical analyses do not affect its value (Hinkle, et al., 1988; Kachigan, 1991; Cohen, 1988). Thus, the value of α should not be chosen after preliminary data analyses, nor should it be modified in order to bring about the rejection or non-rejection of specific null hypotheses. Furthermore, the significance level α expresses the probability of committing an Error Type I only when: a) the measurements are valid and reliable, and b) the preconditions for applying the corresponding statistical test are met.
In practice, the conventional (arbitrary) values α = 0.10, α = 0.05 or α = 0.01 are traditionally used (Hinkle, et al., 1988; Kirk, 1995; Hopkins, 1997; Hair, et al., 1995; Huck, 2000). For example, if a test is carried out at significance level α = 0.05 (5%) and H0 is in fact true, then in 100 similar cases or 100 repetitions of the experiment about 5 erroneous decisions are theoretically expected, i.e. about 5 rejections of an H0 that is actually correct. The significance level therefore expresses an error rate that is related mainly to the statistical procedure and not to the value of the test statistic (e.g. t, F and χ²) (Lohninger, 1999).
The probability of a true H0 not being rejected is determined by the significance level α:
$$1 - \alpha = P(\text{non-rejection of } H_0 \mid H_0 \text{ true}) \qquad [2.3]$$
The probability of a false H0 being rejected is determined by the probability β and is called the power of the statistical test:
$$1 - \beta = P(\text{rejection of } H_0 \mid H_0 \text{ false}) \qquad [2.4]$$
Relations [2.3] and [2.4] express the probability of a correct decision in a statistical test.
Therefore, in order to reach relatively safe and reliable conclusions from the available data, a statistical test should keep both α and β small. However, for a fixed sample size, any attempt to reduce one risk causes the other to increase. On a practical level, we attempt to reduce whichever risk is considered more important. One way of reducing both risks simultaneously is to increase the sample size (Zar, 1996); however, this is not always feasible due to natural, technical, financial and ethical restrictions.
Still, which of the two errors is the more important? The answer is relative and depends on many factors, such as the general purpose and specific goals of the research, its theoretical framework, and the researcher’s knowledge or other justifications. In any case, however, any decision to reject or not reject a hypothesis should take both α and β into account.
Statistical practice includes several conventions regarding the predefinition of α and β. For example, many scientists who use statistical testing in their research set α = 0.05 and β = 0.20. This means that they consider the risk of committing an Error Type I more serious than the risk of an Error Type II. If we calculate the ratio
P(committing an Error Type II) / P(committing an Error Type I)
with α = 0.05 and β = 0.20, we obtain 0.20 / 0.05 = 4. In this case, an Error Type I is considered 4 times more serious, or critical, than an Error Type II. If β = 0.20, then the power is 1 − β = 0.80. Other scientists, in turn, set 0.80 as the minimum acceptable power of a statistical test and, if the test has smaller power, they either do not carry out or redesign their research (Kirk, 1995).
3 ERROR TYPE II½
In order to avoid erroneous conclusions, particular attention should be given to the fact that a test’s failure to reveal a statistically significant result (e.g. difference, effect, correlation) does not mean that the difference, effect or correlation does not exist in the corresponding populations. This erroneous conclusion is often referred to as Error Type II½ (Kritzer, 1996). It is a logical error, or fallacy, that arises when, within a hypothetical syllogism, the antecedent is inferred from the truth of the consequent (i.e. affirming the consequent) (Dometrius, 1992; Kargopoulos & Raftopoulos, 1998).
4 OBSERVED SIGNIFICANCE LEVEL (p-VALUE)
The observed significance level of a statistical test is the probability of observing a value of the test statistic that is greater than or equal to the value given by the sample, provided that H0 is true, i.e.:
$$p = P(|Z| \ge |z| \mid H_0 \text{ true}) \qquad [4.1]$$
where Z is the random variable that corresponds to the test statistic and z the statistic’s value for the specific sample (e.g. t, F and χ²).
The value of the observed significance level, which is based on the data, supports the decision on whether or not to reject H0. If a test’s observed significance level is smaller than or equal to the predefined significance level α, then H0 is rejected at significance level α (Dometrius, 1992; Kirk, 1995; Kinnear & Gray, 1999). If the observed significance level is greater than α, then H0 is not rejected. The observed significance level expresses the probability that a result greater than or equal to the observed one occurs “by chance” if H0 is true (Bryman & Cramer, 1999). It is also the lowest significance level at which H0 can be rejected. It should be noted, however, that in all cases the true state of H0 remains unknown.
5 CRITICISM CONCERNING THE H0 SIGNIFICANCE TESTS
The Null-Hypothesis Significance-Test Procedure (NHSTP) became the subject of criticism as early as the 1950s, and since then various writers have addressed the issue periodically (Yates, 1951; Kish, 1959; Rozeboom, 1960; Bakan, 1966; Morrison & Henkel, 1970; Pratt, 1976; Cox, 1977; Carver, 1978; Parkhurst, 1985; Guttman, 1985; Oakes, 1986; Chatfield, 1991; Loftus, 1991; Yoccoz, 1991; Schmidt, 1996). The main arguments against the NHSTP can be summarized as follows:
• The statistical significance of a result can be brought about by an appropriate choice of the sample size and the significance level α.
• H0 can never be true.
• Statistical significance does not allow conclusions about the inverse probability of the hypothesis, i.e. the probability that H0 is true given the available data.
• Statistical significance provides no information about the values of population parameters.
• Error Type II is unjustifiably overlooked.
• Statistical significance cannot be used to draw conclusions about the practical or clinical significance of a result.
• The binary logic of the NHSTP (H0 is either rejected or not) does not conform to the fact that knowledge is acquired one step at a time.
• The procedure carries the risk of statistical and logical errors, as well as misconceptions (Menexes & Oikonomoy, 2002).
However, in most cases the criticism is directed not at the statistical procedure itself but at researchers’ misconceptions and statistical illiteracy, which lead to the misuse and misinterpretation of the results of H0 significance tests.
6 POWER ANALYSIS
Power Analysis is commonly carried out during the planning stage of a study or an experiment, i.e. before data collection (a priori), and is used to estimate the probability that a false H0 will be rejected. In other words, through Power Analysis we assess the degree of confidence we can place in the test’s “ability” to actually produce statistically significant results. The power of a statistical test depends mainly on three factors (Cohen, 1988; Murphy & Myors, 1998):
a) The significance level α,
b) The sample size n, and
c) The Effect Size (ES).
Effect Size can be generally defined as the extent or magnitude of the phenomenon under study (Cohen & Cohen, 1983). It is a measure of the degree to which a phenomenon is present (Cohen, 1965). From a different viewpoint, ES can also be considered the degree to which the observed result deviates from H0 (Kramer & Rosenthal, 1999). Each statistical test has its own ES, which can be measured in two ways (Cohen, 1988; Kramer & Rosenthal, 1999; Murphy & Myors, 1998): a) as a difference, standardized or not (e.g. Cohen's d, Hedges' g, Glass' delta), or b) as a correlation or contingency index (e.g. r, r² and related indices).
Together with power, the three above-mentioned factors constitute a closed system, in the sense that if three of the system’s four elements are known and fixed, then the fourth is fully determined (Cohen & Cohen, 1983). More specifically, for given (fixed) n and ES the power of the test increases with α, for given α and ES the power increases with n, and for given α and n the power increases with ES. The aim of Power Analysis is to balance these four parameters appropriately, taking into account both the theoretical and practical objectives of the research and the resources (e.g. financial, technological) available to the researcher. This balance should not conflict with the moral-ethical restrictions of the research.
On a practical level, Power Analysis can also answer the following two basic questions:
a) At significance level α and power 1 − β, what is the minimum sample size n required for the statistical test to detect an ES d as statistically significant? In such a case, d (e.g. 0.20) is an estimate of the minimum ES that is of practical or clinical significance to the researcher and is worth detecting as statistically significant.
b) Given the sample size n, the significance level α and the observed ES, what is the power of the statistical test?
The answer to question a) is the a priori approach to power analysis, while the answer to question b) is the post hoc approach.
7 POWER ANALYSIS ACCORDING TO COHEN FOR A CONTINGENCY TABLE WITH TWO VARIABLES
Let us suppose that F is a contingency table of absolute frequencies of two categorical variables X and Y with k and l categories respectively. The general element of table F is $f_{ij}$, with i = 1,…,k and j = 1,…,l (for the notation of the present Chapter, see also Chapter 1). Let us also suppose that the preconditions for applying the χ² test of independence are met. The test then carried out is the following:
H0: X and Y are independent
vs
Ha: not H0 (at significance level α)
The null hypothesis H0 is rejected if $Q \ge \chi^2_{(k-1)(l-1);\alpha}$, where Q is the χ² statistic that corresponds to table F and $\chi^2_{(k-1)(l-1);\alpha}$ is the critical value of the χ² distribution at significance level α with (k−1)(l−1) degrees of freedom (d.f.).
Remark 1. If H0 is true, then Q asymptotically follows the χ² distribution with (k−1)(l−1) d.f. If, however, Ha is true, then Q has the limiting non-central χ² distribution, with non-centrality parameter λ and (k−1)(l−1) d.f. (Cochran, 1952; Chapman & Nam, 1968; Lachin, 1977; Guenther, 1977). References providing more information on the non-central χ² distribution can be found in Patnaik (1949), Sankaran (1963), Guenther (1964) and Han (1975).
In general, the following holds for the non-centrality parameter λ (Lachin, 1977):
$$\lambda = n f(\theta_0, \theta_a) \qquad [7.1]$$
where n is the sample size and f a function of the parameter vectors $\theta_0$ and $\theta_a$ involved in the χ² test under H0 and Ha respectively. From a different perspective, f can be considered the degree to which the observed result deviates from the condition stated by H0, and it is therefore a function of the test’s corresponding ES. From [7.1], we find that:
$$n = \frac{\lambda}{f(\theta_0, \theta_a)} \qquad [7.2]$$
Therefore, if the parameter λ and the corresponding ES are estimated, then [7.2] can be used to calculate the minimum sample size required, at significance level α and power 1 − β, for the χ² test to detect the corresponding ES as statistically significant.
7.1 Post-hoc Power Analysis
Taking into account [2.2] and Remark 1, the observed Error Type II probability $\beta_{obs}$ is estimated as follows:
$$\beta_{obs} = P\left(Q < \chi^2_{(k-1)(l-1);\alpha} \mid H_a \text{ true}\right) = P\left(\chi^2_{nc(k-1)(l-1)} < \chi^2_{(k-1)(l-1);\alpha}\right) \qquad [7.3]$$
where $\chi^2_{nc(k-1)(l-1)}$ denotes a variable following the non-central χ² distribution with parameter λ and (k−1)(l−1) d.f. Due to [2.4], the observed power of the χ² test is given by the following expression:
$$1 - \beta_{obs} = P\left(\chi^2_{nc(k-1)(l-1)} \ge \chi^2_{(k-1)(l-1);\alpha}\right) \qquad [7.4]$$
In order to calculate $\beta_{obs}$ and $1 - \beta_{obs}$, it is necessary to have an estimate of the parameter λ. According to Cohen (1988),
$$\lambda = n w^2, \qquad [7.5]$$
where n is the sample size and w an estimate of the ES, as defined by Cohen for the χ² test of independence with two variables. The ES w is generally estimated using the following formula:
$$w = \sqrt{\sum_{i=1}^{kl} \frac{(p_{1i} - p_{0i})^2}{p_{0i}}} \qquad [7.6]$$
where $p_{1i}$ and $p_{0i}$ are the relative frequencies of cell i of the contingency table under Ha and H0 respectively. In the case of table F in particular, taking into account formula [1.1], formula [7.6] can also be written as:
$$w = \sqrt{\sum_{i}\sum_{j} \frac{(p_{ij} - r_i c_j)^2}{r_i c_j}} = \sqrt{I_F} \qquad [7.7]$$
The w index ranges between 0, which signifies independence of the two variables, and the maximum value $\sqrt{s}$, where s = min(k−1, l−1) (Cohen, 1988). The limiting maximum value of w signifies perfect association between the two variables.
From [7.7]:
$$w^2 = I_F \qquad [7.8]$$
In addition, from [1.2] (with n = N) it is obvious that:
$$w^2 = \frac{Q}{n} \qquad [7.9]$$
Due to [1.3] and [7.8], formula [7.5] can be written as follows:
$$\lambda = n w^2 = n I_F = Q \qquad [7.10]$$
Therefore, the observed power of the χ² test of independence is given by the following relation:
$$1 - \beta_{obs} = P\left(\chi^2_{nc(k-1)(l-1)}(nI_F) \ge \chi^2_{(k-1)(l-1);\alpha}\right) = P\left(\chi^2_{nc(k-1)(l-1)}(Q) \ge \chi^2_{(k-1)(l-1);\alpha}\right) \qquad [7.11]$$
where the argument in parentheses is the non-centrality parameter. In practice, [7.11] is computed numerically with the aid of non-central χ² distribution tables (Haynam, et al., 1970) or with software such as SAS and SPSS, which include specific functions for all the relevant calculations.
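A minimal, standard-library sketch of the post-hoc calculation [7.11] follows, assuming the usual representation of the non-central χ² distribution as a Poisson(λ/2) mixture of central χ² distributions with (k−1)(l−1) + 2j d.f. The function names and the study figures (a 3 × 4 table, n = 200, $I_F$ = 0.06) are hypothetical; for routine work the tables cited above or a library function (e.g. SciPy's `scipy.stats.ncx2.sf`) would be the safer choice.

```python
import math


def chi2_cdf(x: float, df: float) -> float:
    """Central chi-square CDF P(df/2, x/2) from the incomplete-gamma power series
    (adequate for the moderate arguments that occur in power calculations)."""
    if x <= 0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
    total, n = term, 0
    while term > 1e-16 * total:
        n += 1
        term *= y / (a + n)
        total += term
    return min(total, 1.0)


def chi2_ppf(p: float, df: float) -> float:
    """Central chi-square quantile by bisection; chi2_ppf(1 - alpha, df) = chi^2_{df; alpha}."""
    lo, hi = 0.0, df + 20.0 * math.sqrt(2.0 * df) + 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ncx2_sf(x: float, df: float, nc: float) -> float:
    """Upper tail of the non-central chi-square as a Poisson(nc/2) mixture of
    central chi-squares with df + 2j degrees of freedom."""
    half = nc / 2.0
    cdf = weight_sum = 0.0
    j = 0
    while weight_sum < 1.0 - 1e-12 and j < 10_000:
        w = math.exp(-half - math.lgamma(j + 1) + (j * math.log(half) if half > 0 else 0.0))
        cdf += w * chi2_cdf(x, df + 2 * j)
        weight_sum += w
        j += 1
    return 1.0 - cdf


def observed_power(q: float, k: int, l: int, alpha: float = 0.05) -> float:
    """Observed power of the chi-square test of independence, formula [7.11],
    with the non-centrality parameter estimated by lambda = n * I_F = Q."""
    df = (k - 1) * (l - 1)
    critical = chi2_ppf(1.0 - alpha, df)
    return ncx2_sf(critical, df, q)


# Hypothetical study: a 3 x 4 table, n = 200, observed total inertia I_F = 0.06.
n, inertia = 200, 0.06
print(f"observed power = {observed_power(n * inertia, 3, 4):.3f}")
```

For a 2 × 2 table (one degree of freedom) the result can be checked by hand, since the non-central χ² with one d.f. is the square of a normal variable with mean √λ.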
7.2 A Priori Power Analysis
Using [7.2] and [7.10], it is possible to calculate a priori the minimum required sample size, given an estimate of the parameter λ and of the ES w. In this case, the sample size can be deduced from the following formula (rounded up to the next integer):
$$n = \frac{\lambda}{w^2} = \frac{\lambda}{I_F} \qquad [7.12]$$
For the non-central χ² distribution, the values of the parameter λ(α, 1−β, u) that correspond to power 1 − β and significance level α, with u degrees of freedom, can be found in tables (Haynam, et al., 1970; Pearson & Hartley, 1972) or calculated with relevant software. The problem lies in providing a predefined estimate of w or $I_F$ that has clinical or practical significance within the framework of the research to be carried out.
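Formula [7.12] can be sketched by solving numerically for λ(α, 1−β, u) and dividing by $I_F$. The sketch below includes small χ² helper functions so that it runs on its own, assuming the Poisson-mixture form of the non-central χ² distribution; `required_lambda` (a hypothetical name) finds λ by bisection, which is valid because power increases with λ. The planned effect and table dimensions in the usage loop are illustrative, and results should be checked against published tables before use.

```python
import math


def chi2_cdf(x: float, df: float) -> float:
    """Central chi-square CDF P(df/2, x/2) from the incomplete-gamma power series
    (adequate for the moderate arguments that occur in power calculations)."""
    if x <= 0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
    total, n = term, 0
    while term > 1e-16 * total:
        n += 1
        term *= y / (a + n)
        total += term
    return min(total, 1.0)


def chi2_ppf(p: float, df: float) -> float:
    """Central chi-square quantile by bisection; chi2_ppf(1 - alpha, df) = chi^2_{df; alpha}."""
    lo, hi = 0.0, df + 20.0 * math.sqrt(2.0 * df) + 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ncx2_sf(x: float, df: float, nc: float) -> float:
    """Upper tail of the non-central chi-square as a Poisson(nc/2) mixture of
    central chi-squares with df + 2j degrees of freedom."""
    half = nc / 2.0
    cdf = weight_sum = 0.0
    j = 0
    while weight_sum < 1.0 - 1e-12 and j < 10_000:
        w = math.exp(-half - math.lgamma(j + 1) + (j * math.log(half) if half > 0 else 0.0))
        cdf += w * chi2_cdf(x, df + 2 * j)
        weight_sum += w
        j += 1
    return 1.0 - cdf


def required_lambda(df: int, alpha: float, power: float) -> float:
    """lambda(alpha, 1 - beta, u): non-centrality at which the test reaches the target power."""
    critical = chi2_ppf(1.0 - alpha, df)
    lo, hi = 0.0, 1.0
    while ncx2_sf(critical, df, hi) < power:  # bracket the solution
        hi *= 2.0
    for _ in range(60):  # power increases with lambda, so bisection applies
        mid = 0.5 * (lo + hi)
        if ncx2_sf(critical, df, mid) < power:
            lo = mid
        else:
            hi = mid
    return hi


def min_sample_size(inertia: float, k: int, l: int,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Formula [7.12]: n = lambda / I_F (= lambda / w**2), rounded up."""
    lam = required_lambda((k - 1) * (l - 1), alpha, power)
    return math.ceil(lam / inertia)


# Planned effect w = 0.30, i.e. I_F = 0.09, at alpha = 0.05 and power 0.80.
for k, l in [(2, 2), (3, 3), (3, 4)]:
    print(f"{k} x {l} table: n >= {min_sample_size(0.09, k, l)}")
```

Because the degrees of freedom grow with the table dimensions, λ, and hence the required n, increases for larger tables at the same w.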
8 PREDEFINING THE EFFECT SIZE
A predefinition of w or $I_F$ can be achieved either through pilot studies or through meta-analyses of previous comparable studies on the same research subject. In addition, Cohen’s conventions can be used regarding what can be considered a “small”, “medium” or “large” effect size within the framework of the χ² test of independence (see Table 1).
Table 1: Cohen’s Conventions And Correspondence Between w And $I_F$
            Small ES    Medium ES    Large ES
w           0.10        0.30         0.50
$I_F$       0.01        0.09         0.25
Of even greater interest is the relation between w or $I_F$ and the contingency coefficients based on the statistic Q. There are two basic reasons for this: a) these indices can be calculated relatively easily from the published results of other comparable studies, and b) they express the magnitude or degree of association between the variables on a scale bounded by 0 and 1. Two of the most commonly used association indices for contingency tables with two variables are the Contingency Coefficient C and Cramer’s V.
8.1 The Relation Of w And $I_F$ To The Contingency Coefficient C
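It is known (Reynolds, 1984) that the C coefficient is given by the formula:
$$C = \sqrt{\frac{Q}{Q + n}} \qquad [8.1]$$
Due to [7.9] and [7.8], formula [8.1] can also be written as:
$$C = \sqrt{\frac{Q/n}{Q/n + n/n}} = \sqrt{\frac{w^2}{w^2 + 1}} = \sqrt{\frac{I_F}{I_F + 1}} \qquad [8.2]$$
From [8.2]:
$$w = \sqrt{\frac{C^2}{1 - C^2}} = \sqrt{I_F} \qquad [8.3]$$
Therefore,
$$I_F = \frac{C^2}{1 - C^2} \qquad [8.4]$$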
Remark 2. The maximum value of the C coefficient depends on the dimensions of the
contingency table, which means that a direct comparison between C indexes from tables with
varying dimensions is not feasible (Cohen, 1988).
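The conversions [8.2] to [8.4] are simple enough to script, which is convenient when the only information available from an earlier study is a published C. A minimal sketch with hypothetical function names and a hypothetical published value; in line with Remark 2, it assumes the published C comes from a table of the same dimensions as the planned one.

```python
import math


def inertia_from_c(c: float) -> float:
    """Formula [8.4]: I_F = C^2 / (1 - C^2)."""
    return c * c / (1.0 - c * c)


def w_from_c(c: float) -> float:
    """Formula [8.3]: w = sqrt(C^2 / (1 - C^2)) = sqrt(I_F)."""
    return math.sqrt(inertia_from_c(c))


def c_from_inertia(inertia: float) -> float:
    """Formula [8.2]: C = sqrt(I_F / (I_F + 1))."""
    return math.sqrt(inertia / (inertia + 1.0))


# Hypothetical contingency coefficient reported by a previous study.
c_published = 0.25
print(f"I_F = {inertia_from_c(c_published):.4f}, w = {w_from_c(c_published):.4f}")
```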
8.2 The Relation Of w And $I_F$ To Cramer’s V Coefficient
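Cramer’s V index is given by the formula (Reynolds, 1984):
$$V = \sqrt{\frac{Q}{n s}}, \qquad [8.5]$$
where s = min(k−1, l−1).
Due to [7.9] and [7.8], formula [8.5] can also be written as follows:
$$V = \frac{w}{\sqrt{s}} = \sqrt{\frac{I_F}{s}} \qquad [8.6]$$
From [8.6]:
$$w = V\sqrt{s} \qquad [8.7]$$
Hence,
$$\sqrt{I_F} = V\sqrt{s} \;\Rightarrow\; I_F = V^2 s \qquad [8.8]$$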
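Similarly, [8.5] to [8.8] translate a published Q (or V) into w and $I_F$. A minimal sketch with hypothetical names and a hypothetical published result (Q = 36 from n = 400 in a 3 × 4 table, so s = 2):

```python
import math


def inertia_from_v(v: float, k: int, l: int) -> float:
    """Formula [8.8]: I_F = V^2 * s, with s = min(k - 1, l - 1)."""
    return v * v * min(k - 1, l - 1)


def w_from_v(v: float, k: int, l: int) -> float:
    """Formula [8.7]: w = V * sqrt(s)."""
    return v * math.sqrt(min(k - 1, l - 1))


def v_from_q(q: float, n: int, k: int, l: int) -> float:
    """Formula [8.5]: V = sqrt(Q / (n * s))."""
    return math.sqrt(q / (n * min(k - 1, l - 1)))


# Hypothetical published result: Q = 36.0 from n = 400 in a 3 x 4 table.
v = v_from_q(36.0, 400, 3, 4)
print(f"V = {v:.4f}, I_F = {inertia_from_v(v, 3, 4):.4f}, w = {w_from_v(v, 3, 4):.4f}")
```

In this example V ≈ 0.212, while $I_F$ = Q/n = 0.09 and w = 0.30, the "medium" values of Table 1.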
9 DYNAMIC INERTIA OF A CONTINGENCY TABLE WITH TWO VARIABLES
The concept of inertia plays a core role in Data Analysis. More specifically, the total inertia of a contingency table with two variables expresses a generalized variance, but can simultaneously be viewed as an index of the information contained in the table. In addition, it has been shown above that total inertia, owing to its association with w, can also be regarded as an index of effect size, which expresses the degree or magnitude of the association between two categorical variables. In the case of two variables, the maximum value of total inertia is equal to s, where s = min(k−1, l−1).
Definition: If $I_F$ is the observed inertia of the contingency table F and $I_{max} = s$ the maximum possible inertia of F, subject only to the constraint imposed by the number of rows and columns of the table, for a given sample size n, then we define as the dynamic inertia $I_D$ of table F the ratio:
$$I_D = \frac{I_F}{I_{max}} \qquad [9.1]$$
Dynamic inertia expresses the observed inertia as a proportion of the maximum possible inertia. Through formulae [1.2] and [8.5], formula [9.1] can also be written as:
$$I_D = \frac{Q}{n s} = V^2 \qquad [9.2]$$
Therefore, the dynamic inertia of the contingency table is equal to the square of Cramer’s V coefficient. The quantity $I_D \times 100$ expresses $I_F$ as a percentage (%) of $I_{max}$. For scientists who use Data Analysis methods in their research and are familiar with the concept of inertia, dynamic inertia can offer an alternative approach to predefining effect size. The formulae relating $I_D$ to w and $I_F$ are given below.
From formulae [8.7] and [9.2]:
$$w = \sqrt{s I_D} \qquad [9.3]$$
From formula [9.2]:
$$I_F = s I_D \qquad [9.4]$$
Remark 3. In practical applications, for an a priori determination of sample size, it is useful to carry out a sensitivity analysis, setting lower and upper limits for the ES, together with a cost-benefit analysis, in order to balance the available resources with the research objectives.
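Remark 3 suggests a sensitivity analysis over a range of plausible effect sizes. The sketch below, assuming the Poisson-mixture form of the non-central χ² distribution and using hypothetical names and figures, converts a planned dynamic inertia into $I_F$ via [9.4] and w via [9.3], and then applies [7.12] for each value; a cost-benefit comparison would then be made against the resulting sample sizes.

```python
import math


def chi2_cdf(x: float, df: float) -> float:
    """Central chi-square CDF P(df/2, x/2) from the incomplete-gamma power series."""
    if x <= 0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
    total, n = term, 0
    while term > 1e-16 * total:
        n += 1
        term *= y / (a + n)
        total += term
    return min(total, 1.0)


def chi2_ppf(p: float, df: float) -> float:
    """Central chi-square quantile by bisection."""
    lo, hi = 0.0, df + 20.0 * math.sqrt(2.0 * df) + 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ncx2_sf(x: float, df: float, nc: float) -> float:
    """Non-central chi-square upper tail as a Poisson(nc/2) mixture of central ones."""
    half = nc / 2.0
    cdf = weight_sum = 0.0
    j = 0
    while weight_sum < 1.0 - 1e-12 and j < 10_000:
        w = math.exp(-half - math.lgamma(j + 1) + (j * math.log(half) if half > 0 else 0.0))
        cdf += w * chi2_cdf(x, df + 2 * j)
        weight_sum += w
        j += 1
    return 1.0 - cdf


def required_lambda(df: int, alpha: float, power: float) -> float:
    """Non-centrality lambda(alpha, 1 - beta, u) found by bisection."""
    critical = chi2_ppf(1.0 - alpha, df)
    lo, hi = 0.0, 1.0
    while ncx2_sf(critical, df, hi) < power:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if ncx2_sf(critical, df, mid) < power:
            lo = mid
        else:
            hi = mid
    return hi


def sample_size_from_dynamic_inertia(i_d: float, k: int, l: int,
                                     alpha: float = 0.05, power: float = 0.80) -> int:
    """I_F = s * I_D (formula [9.4]), then n = lambda / I_F (formula [7.12]), rounded up."""
    s = min(k - 1, l - 1)
    lam = required_lambda((k - 1) * (l - 1), alpha, power)
    return math.ceil(lam / (s * i_d))


# Sensitivity analysis for a hypothetical 3 x 4 table (s = 2):
# planned dynamic inertia between 1% and 10% of the maximum inertia.
for i_d in (0.01, 0.025, 0.05, 0.10):
    w = math.sqrt(2 * i_d)  # formula [9.3] with s = 2
    print(f"I_D = {i_d:.3f}  w = {w:.3f}  n >= {sample_size_from_dynamic_inertia(i_d, 3, 4)}")
```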
10 CALCULATING SAMPLE SIZE IN THE CASE OF MULTIPLE VARIABLES
In the case of q categorical variables, CA is usually applied to the generalized contingency table B (Burt’s table). According to Menexes & Papadimitriou (2004), the part of the total inertia of table B that expresses the pair-wise associations (contingencies) between the variables, and which is worth investigating, is the interesting inertia I, defined by the following formula:
$$I = \frac{\sum_{c=1}^{g} I_c}{q(q-1)/2}, \qquad g = \frac{q(q-1)}{2}, \qquad [10.1]$$
where $\sum_{c=1}^{g} I_c$ is the sum of the inertias of the q(q−1)/2 different simple contingency tables formed by the q variables taken in pairs.
In view of this observation, we could claim that CA in fact analyzes a “package” of bivariate associations or, in other words, the pair-wise interactions of all variables. And if we consider total inertia as a measure of the information contained in the relevant table, or as an overall effect size, then we could claim that in each case we are analyzing an average effect size index, which results from bivariate and not multivariate relations.
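Formula [10.1] can be illustrated by computing the interesting inertia directly from raw records, one pair-wise table at a time, rather than from the Burt table itself. A minimal sketch with hypothetical data and names; in the toy data X and Y coincide (pair-wise inertia 1) while Z is unrelated to both (inertia 0), so the interesting inertia is 1/3.

```python
from collections import Counter
from itertools import combinations


def pair_inertia(xs, ys):
    """Total inertia I_c = Q / N of the simple contingency table of two variables."""
    n = len(xs)
    joint, row, col = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    q = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            q += (joint[(a, b)] - expected) ** 2 / expected
    return q / n


def interesting_inertia(records):
    """Formula [10.1]: average of the inertias of the q(q-1)/2 pair-wise tables."""
    columns = list(zip(*records))
    pairs = list(combinations(range(len(columns)), 2))
    return sum(pair_inertia(columns[i], columns[j]) for i, j in pairs) / len(pairs)


# Hypothetical records of q = 3 categorical variables (X, Y, Z).
records = [("a", "a", "u"), ("a", "a", "v"), ("b", "b", "u"), ("b", "b", "v")]
print(interesting_inertia(records))
```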
In conclusion, pair-wise pre-testing of the associations between the variables to be analyzed with CA is clearly a prerequisite. When the sample is derived from simple random sampling, the χ² test of independence can be applied to the q(q−1)/2 pair-wise associations of the variables, with an adjustment of the significance level α, for example according to Bonferroni (Girden, 1992; Brown & Melamed, 1990); otherwise, the goodness of fit of a suitable loglinear model can be tested.
Therefore, in the case of multiple variables, the minimum required sample size can be calculated for each pair of variables in isolation, and the highest of the q(q−1)/2 resulting values can then be selected, so that the sample-size requirements of all tests are fulfilled. Naturally, such a process calls for a predefined ES for each bivariate association.
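The pair-wise procedure above can be sketched as follows, assuming a Bonferroni-adjusted level α/g with g = q(q−1)/2 and a planned w for every pair. The variable names, numbers of categories and planned effect sizes are hypothetical, and the non-central χ² is evaluated as a Poisson mixture using only the standard library.

```python
import math


def chi2_cdf(x: float, df: float) -> float:
    """Central chi-square CDF P(df/2, x/2) from the incomplete-gamma power series."""
    if x <= 0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
    total, n = term, 0
    while term > 1e-16 * total:
        n += 1
        term *= y / (a + n)
        total += term
    return min(total, 1.0)


def chi2_ppf(p: float, df: float) -> float:
    """Central chi-square quantile by bisection."""
    lo, hi = 0.0, df + 20.0 * math.sqrt(2.0 * df) + 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ncx2_sf(x: float, df: float, nc: float) -> float:
    """Non-central chi-square upper tail as a Poisson(nc/2) mixture of central ones."""
    half = nc / 2.0
    cdf = weight_sum = 0.0
    j = 0
    while weight_sum < 1.0 - 1e-12 and j < 10_000:
        w = math.exp(-half - math.lgamma(j + 1) + (j * math.log(half) if half > 0 else 0.0))
        cdf += w * chi2_cdf(x, df + 2 * j)
        weight_sum += w
        j += 1
    return 1.0 - cdf


def required_lambda(df: int, alpha: float, power: float) -> float:
    """Non-centrality lambda(alpha, 1 - beta, u) found by bisection."""
    critical = chi2_ppf(1.0 - alpha, df)
    lo, hi = 0.0, 1.0
    while ncx2_sf(critical, df, hi) < power:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if ncx2_sf(critical, df, mid) < power:
            lo = mid
        else:
            hi = mid
    return hi


def multivariable_sample_size(categories, planned_w, alpha=0.05, power=0.80):
    """Per pair, n = lambda / w**2 at the Bonferroni level alpha / g, with
    g = q(q-1)/2 pairs; returns the largest n and the per-pair values."""
    names = sorted(categories)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    adjusted_alpha = alpha / len(pairs)
    per_pair = {}
    for a, b in pairs:
        df = (categories[a] - 1) * (categories[b] - 1)
        lam = required_lambda(df, adjusted_alpha, power)
        per_pair[(a, b)] = math.ceil(lam / planned_w[(a, b)] ** 2)
    return max(per_pair.values()), per_pair


# Hypothetical design: three variables and a planned w for each pair.
categories = {"X": 2, "Y": 3, "Z": 4}
planned_w = {("X", "Y"): 0.30, ("X", "Z"): 0.30, ("Y", "Z"): 0.25}
n_max, per_pair = multivariable_sample_size(categories, planned_w)
print(per_pair, "-> required n =", n_max)
```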
11 POWER ANALYSIS OF THE GOODNESS OF FIT TEST
During the application of CA to the k × l contingency table F with two variables, the table’s absolute frequencies are reconstructed through the transition formula (Benzécri, 1992):
$$f_{ij} = \frac{f_{i+} f_{+j}}{n}\left(1 + \sum_{p=1}^{r} \sqrt{\lambda_p}\, u_{ip} v_{jp}\right), \quad i = 1,\dots,k,\; j = 1,\dots,l \qquad [11.1]$$
where $\lambda_p$ is the eigenvalue (inertia) of factorial axis p and $\{u_{ip}, v_{jp}\}$ are the standard coordinates of the rows and columns of table F (Greenacre, 1993), respectively, on factorial axis p, where p = 1,…,r and r = min(k−1, l−1).
The χ² goodness-of-fit test (Cohen, 1988) can be used to compare the observed frequencies f_ij of table F, obtained from a random sample of size n, with the expected frequencies μ_ij specified by the null hypothesis H_p, which states that only the first p eigenvalues differ significantly from zero (Saporta & Tambrea, 1993). The weighted least-squares estimates of the expected frequencies μ_ij are the values given by the transition formula when only the first p terms of the sum are used (Andersen, 1991). The goodness-of-fit test can then be carried out with the statistic:
$$Q_p = \sum_{i=1}^{k}\sum_{j=1}^{l} \frac{(f_{ij} - \mu_{ij})^2}{\mu_{ij}} \qquad [11.2]$$
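To make the transition formula [11.1] and the statistic Q_p concrete, the following pure-Python sketch performs a CA of a small table (eigen-decomposition of SᵀS by Jacobi rotations, where S is the matrix of standardized residuals), rebuilds the expected frequencies from the first p axes, and evaluates Q_p. It also evaluates the variant with denominator f_i· f_·j / n recommended by Malinvaud in Remark 4 below, which can be checked to equal n times the sum of the discarded eigenvalues. The 3 × 4 table and all function names are hypothetical, chosen only for illustration; in practice a library SVD would replace the hand-written Jacobi routine.

```python
import math


def jacobi_eigh(a, sweeps=100):
    """Eigenpairs of a small symmetric matrix by cyclic Jacobi rotations,
    returned as (eigenvalue, eigenvector) sorted by decreasing eigenvalue."""
    m = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(m)] for i in range(m)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(m) for j in range(m) if i != j) < 1e-24:
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if abs(a[p][q]) < 1e-18:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c, s = 1 / math.sqrt(t * t + 1), t / math.sqrt(t * t + 1)
                for mat in (a, v):  # columns p, q of A and V: M <- M J
                    for row in mat:
                        x, y = row[p], row[q]
                        row[p], row[q] = c * x - s * y, s * x + c * y
                for k in range(m):  # rows p, q of A: A <- J^T A
                    x, y = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * x - s * y, s * x + c * y
    return sorted(((a[i][i], [row[i] for row in v]) for i in range(m)), key=lambda e: -e[0])


def ca(f):
    """Masses, principal inertias lambda_p and standard coordinates u, v of a k x l table."""
    n = sum(map(sum, f))
    k, l = len(f), len(f[0])
    r = [sum(row) / n for row in f]                      # row masses f_i./n
    c = [sum(row[j] for row in f) / n for j in range(l)]  # column masses f_.j/n
    s = [[(f[i][j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(l)] for i in range(k)]
    sts = [[sum(s[i][a] * s[i][b] for i in range(k)) for b in range(l)] for a in range(l)]
    lam, u, v = [], [], []
    for value, vec in jacobi_eigh(sts)[: min(k - 1, l - 1)]:  # the r non-trivial axes
        lam.append(value)
        v.append([vec[j] / math.sqrt(c[j]) for j in range(l)])
        u.append([sum(s[i][j] * vec[j] for j in range(l)) / math.sqrt(value * r[i]) for i in range(k)])
    return n, r, c, lam, u, v


def q_statistics(f, p):
    """Q_p of [11.2] and Malinvaud's variant, with expected values from [11.1] on p axes."""
    n, r, c, lam, u, v = ca(f)
    q = q_mod = 0.0
    for i in range(len(r)):
        for j in range(len(c)):
            e = n * r[i] * c[j]  # f_i. f_.j / n
            mu = e * (1 + sum(math.sqrt(lam[t]) * u[t][i] * v[t][j] for t in range(p)))
            q += (f[i][j] - mu) ** 2 / mu
            q_mod += (f[i][j] - mu) ** 2 / e
    return q, q_mod, n * sum(lam[p:])


table = [[20, 10, 5, 5], [10, 20, 10, 5], [5, 10, 20, 30]]  # hypothetical 3 x 4 counts
for p in range(3):
    q, q_mod, n_tail = q_statistics(table, p)
    print(p, round(q, 4), round(q_mod, 4), round(n_tail, 4))
```

For p = 0 all three printed values coincide (Q_0 is n times the total inertia); for p = r = 2 the reconstruction is exact and Q_2 is zero up to rounding.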
When p = 0, i.e. when the two variables are independent, and provided the preconditions for the test are satisfied, the statistic Q_0 can be compared with the critical value of the χ² distribution with (k−1)(l−1) degrees of freedom at significance level α. If p = 1, the statistic Q_1 asymptotically follows the χ² distribution with (k−2)(l−2) d.f. In general, under H_p, the statistic Q_p can be compared with the critical value of the χ² distribution with (k−p−1)(l−p−1) degrees of freedom (Andersen, 1991; Saporta & Tambrea, 1993; Rao, 1995). The χ² test can therefore be used to check whether the first p factorial axes are sufficient to reconstruct table F. In practice, a series of tests is carried out, starting with p = 0, until the null hypothesis H_p is "accepted" at significance level α. Our interest therefore focuses on calculating the observed power of each test, so as to avoid committing Type I and Type II errors. The observed power of the χ² goodness-of-fit test is calculated in exactly the same manner as for the test of independence (Cohen, 1988).
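The sequential procedure can be sketched in Python as follows. The χ² tail probability is computed from the power series of the regularized incomplete gamma function using only the standard library; the function names are hypothetical. Q_0 = 14.32 is the value of Example 1 below (a 3 × 3 table), while Q_1 = 1.20 is a made-up value used for illustration only.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a central chi-square with df degrees of freedom, via the power
    series of the regularized lower incomplete gamma function (fine for moderate x)."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1.0
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a + 1))
    return max(0.0, 1.0 - lower)


def number_of_axes(q_values, k, l, alpha=0.05):
    """Test H_0, H_1, ... in turn and stop at the first p whose Q_p is not significant
    on (k-p-1)(l-p-1) d.f.; returns that p and the trail of (p, Q_p, df, p-value)."""
    trail = []
    for p, q in enumerate(q_values):
        df = (k - p - 1) * (l - p - 1)
        if df <= 0:  # all r = min(k-1, l-1) axes already retained
            break
        pval = chi2_sf(q, df)
        trail.append((p, q, df, pval))
        if pval >= alpha:  # H_p "accepted": p axes suffice
            return p, trail
    return len(trail), trail


# Q_0 from Example 1 (3 x 3 table); Q_1 = 1.20 is a hypothetical value
p_hat, trail = number_of_axes([14.32, 1.20], k=3, l=3)
for row in trail:
    print(row)
print("axes retained:", p_hat)
```

Each step that "accepts" H_p should be accompanied by its observed power, for the reasons given above.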
Remark 4. The statistic Q_p has the disadvantage that, when the observed frequencies f_ij are small, the estimated frequencies μ_ij can even take negative values, in which case the test cannot be applied. For such cases, Malinvaud (1987) recommended using the quantity f_i· f_·j / n as the denominator of the fraction in formula [11.2]. This leads to a modified statistic Q′_p, given by:
$$Q'_p = \sum_{i=1}^{k}\sum_{j=1}^{l} \frac{(f_{ij} - \mu_{ij})^2}{f_{i\cdot}\, f_{\cdot j}/n} = n\left(\lambda_{p+1} + \lambda_{p+2} + \dots + \lambda_r\right) \qquad [11.2']$$
The statistic Q′_p is therefore a function of the remaining r − p eigenvalues, and it also asymptotically follows the χ² distribution with (k−p−1)(l−p−1) degrees of freedom.
12
NUMERICAL EXAMPLES
In the following examples, the required numerical calculations were carried out with the Excel add-in π-face, which is available at:
ftp://ftp.stat.uiowa.edu/pub/rlenth/PiFace/.
Excel was chosen as the calculation platform for two main reasons: a) it is widely used, and b) it provides the option of creating scenarios and What-If Analysis. Among other functions, π-face includes the following:
1) Chi2PowerG
2) Chi2PowerNC
The Chi2Power function accepts as arguments the desired significance level α of the χ² test, the non-centrality parameter and the corresponding degrees of freedom. It returns the observed power of the test and can therefore be used in a post hoc Power Analysis. The Chi2PowerNC function accepts as arguments the desired significance level α of the χ² test, the desired power of the test and the corresponding degrees of freedom. It returns the non-centrality parameter and can therefore be used in an a priori Power Analysis.
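π-face itself is not reproduced here. As a hedged stand-in, the following standard-library Python sketch computes the same two quantities: the power of the χ² test for a given non-centrality parameter (the role of Chi2Power) and, by bisection, the non-centrality parameter that yields a requested power (the role of Chi2PowerNC). It relies on the standard representation of the non-central χ² distribution as a Poisson mixture of central χ² distributions; the function names are hypothetical and are not π-face's API.

```python
import math


def chi2_cdf(x, df):
    """Central chi-square CDF: regularized lower incomplete gamma P(df/2, x/2) by its series."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = total = 1.0
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    return min(1.0, total * math.exp(-z + a * math.log(z) - math.lgamma(a + 1)))


def chi2_critical(alpha, df):
    """Upper-alpha critical value of the central chi-square, by bisection."""
    lo, hi = 0.0, 2.0 * df + 10.0
    while chi2_cdf(hi, df) < 1 - alpha:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if chi2_cdf(mid, df) < 1 - alpha else (lo, mid)
    return (lo + hi) / 2


def ncx2_sf(x, df, nc):
    """P(X > x) for a non-central chi-square: Poisson(nc/2) mixture of central chi-squares."""
    half = nc / 2
    if half == 0:
        return 1 - chi2_cdf(x, df)
    total = 0.0
    for j in range(int(half + 12 * math.sqrt(half) + 30)):
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        total += weight * (1 - chi2_cdf(x, df + 2 * j))
    return total


def chi2_power(alpha, nc, df):
    """Power of the chi-square test at level alpha for non-centrality nc (cf. Chi2Power)."""
    return ncx2_sf(chi2_critical(alpha, df), df, nc)


def chi2_power_nc(alpha, power, df):
    """Non-centrality parameter giving the requested power (cf. Chi2PowerNC)."""
    crit = chi2_critical(alpha, df)
    lo, hi = 0.0, 1.0
    while ncx2_sf(crit, df, hi) < power:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if ncx2_sf(crit, df, mid) < power else (lo, mid)
    return (lo + hi) / 2


print(round(chi2_power(0.05, 14.32, 4), 3))    # Example 1 (the text reports 0.875)
print(round(chi2_power_nc(0.10, 0.99, 6), 2))  # Example 2 (the text reports 24.65)
```

The series-based incomplete gamma is adequate for the moderate arguments met in these examples; a numerical library would be preferable for extreme degrees of freedom or non-centrality values.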
Example 1: Post hoc Power Analysis
Let us suppose we have two categorical variables X and Y with three categories each. The sample size is n = 80, and the χ² test gives Q = 14.32, which for 4 d.f. is statistically significant at α = 0.05 (p < 0.05). The question is how to calculate the observed power of the χ² test at α = 0.05. The observed inertia I of the corresponding two-way contingency table can be calculated with formula [1.2] and equals I = 0.179. Formula [7.7] shows that this inertia corresponds to an effect size w = 0.423 (tending towards a large ES according to Cohen's conventions).
Formula [7.10] can be used to estimate the non-centrality parameter, which equals 14.32. From [7.11], with the help of the Chi2Power function, the observed power of the test is estimated at 0.875. Consequently, the probability that the χ² test identifies an ES equal to the observed one as statistically significant at α = 0.05 is approximately 87.5%.
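The arithmetic of Example 1 can be retraced in a few lines. Formulae [1.2], [7.7] and [7.10] are not reproduced in this section; the sketch assumes the standard relations I = Q/n, w = √I and non-centrality = n·w², which agree with the values quoted above. The power step itself requires the non-central χ² distribution (Chi2Power in π-face) and is not repeated here.

```python
import math

n, q, df = 80, 14.32, 4   # Example 1: sample size, chi-square statistic, degrees of freedom

inertia = q / n            # total inertia I = Q / n (assumed form of [1.2])
w = math.sqrt(inertia)     # effect size w = sqrt(I) (assumed form of [7.7])
nc = n * w ** 2            # non-centrality n * w^2 (assumed form of [7.10]); equals Q here

print(f"I = {inertia:.3f}, w = {w:.3f}, non-centrality = {nc:.2f}")
# I = 0.179, w = 0.423, non-centrality = 14.32
```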
Example 2: A priori Power Analysis
Let us suppose that, during the planning stage of a study, attention is focused on testing the association between two categorical variables X and Y with three and four categories respectively. Past experience has shown that an ES corresponding to a minimum inertia of 0.04 is considered clinically or practically significant, given the objectives and theoretical framework of the study. The problem in this case is how to estimate the minimum required sample size n so that the χ² test (6 d.f.), at α = 0.10 with power 0.99, can identify the predefined ES as statistically significant.
From formulae [7.7], [8.6] for s = 2 and [9.2], inertia I = 0.04 corresponds to w = 0.20, to Cramér's index V = 0.14 and to a dynamic inertia I_D = 0.019. The Chi2PowerNC function gives the non-centrality parameter 24.65, and formula [7.12] then gives a required sample size of n = 617 sampling units. If the desired power decreases to 0.95, the estimated sample size is n = 447, and for a power of 0.90 it is n = 367.
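Similarly, the a priori calculation of Example 2 reduces to simple arithmetic once the non-centrality parameter is known. Formulae [7.7], [8.6] and [7.12] are not reproduced in this section; the sketch assumes w = √I, Cramér's V = √(I/s) and n = non-centrality / w² rounded up, relations that reproduce the figures quoted above. The dynamic inertia of [9.2] is not computed, because its formula is not given here.

```python
import math

inertia, s = 0.04, 2        # clinically significant inertia; s = min(k, l) - 1 for a 3 x 4 table
nc = 24.65                  # non-centrality from Chi2PowerNC (alpha = 0.10, power = 0.99, 6 d.f.)

w = math.sqrt(inertia)      # effect size w (assumed form of [7.7])
v = math.sqrt(inertia / s)  # Cramer's V (assumed form of [8.6])
n = math.ceil(nc / w ** 2)  # minimum sample size, rounded up to whole sampling units ([7.12] assumed)

print(f"w = {w:.2f}, V = {v:.2f}, n = {n}")  # w = 0.20, V = 0.14, n = 617
```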
In this example, the clinically significant ES could equally be predefined through the V index or through the dynamic inertia. The size of the sample that we eventually attempt to collect will also depend on the resources available; the likely effect of other restrictions, unrelated to the subject of the research, should therefore be taken into account during the planning stage.
Example 3: Calculation of sample size in the case of three variables
Let us suppose that, during the planning stage of a study, attention is focused on testing the pair-wise associations among three categorical variables X, Y and Z, with three, four and five categories respectively. Past experience has shown that for the pair (X, Y) an ES corresponding to a minimum dynamic inertia of 0.005 is considered clinically or practically significant, while the corresponding ES for the pairs (X, Z) and (Y, Z) is 0.045 and 0.085 respectively. The problem is how to estimate the minimum required sample size for each of the three χ² tests, at α = 0.05 with power 0.80. The relevant data and results are presented in Table 2.
The a priori Power Analysis approach can be used to calculate the sample size for each pair of variables. The last column of Table 2 shows that a sample size of n = 1503 fulfils the requirements of all three χ² tests. Because three χ² tests have to be carried out, the significance level can be adjusted according to Bonferroni so that the cumulative Type I error rate (see Huck, 2000) does not exceed 0.05. In this case, the significance level of each test could be predefined at α = 0.05/3 ≈ 0.0167, and the relevant calculations can then follow.
Table 2: Sample Size for Each Associated Pair
Pair      s   Clinically significant ES (dynamic inertia I_D)   Sample size n (α = 0.05, power = 0.80)
(X, Y)    2   0.005                                             1503
(X, Z)    2   0.045                                             152
(Y, Z)    3   0.085                                             68
13
CONCLUSIONS
In the case of two categorical variables whose data have been collected by simple random sampling, Correspondence Analysis can be combined with the χ² test of independence. In the present paper we have introduced the concept of dynamic inertia and proposed a methodology for estimating both the observed power of the χ² test (post hoc) and the minimum required sample size (a priori) in a sample survey or experimental study whose data will be analyzed with CA. To develop the proposed methodology, we have treated the total and dynamic inertia of a two-way contingency table as alternative effect size indices and built on the Power Analysis framework for χ² tests proposed by Cohen.
Finally, the following risk must also be pointed out: an experienced statistical analyst who intentionally misuses Statistics can design a study so as to balance Type I and Type II errors and, through an appropriate choice of α, power, effect size and sample size, steer the conclusions towards certain "desirable" results.
14
REFERENCES
Andersen, E. (1991). The Statistical Analysis of Categorical Data. Berlin Heidelberg: Springer-Verlag.
Bakan, D. (1966). The test of significance in psychological research. Psychological
Bulletin, 66, 423-437.
Benzécri, J.-P. (1992). Correspondence Analysis Handbook. New York: Marcel
Dekker, Inc.
Blasius, J. & Greenacre, M. (1994). Computation of Correspondence Analysis. In:
M. Greenacre and J. Blasius (eds), Correspondence Analysis in the Social
Sciences. Recent Developments and Applications. London: Academic Press.
Blasius, J. (1994). Correspondence Analysis in Social Science Research. In: M.
Greenacre and J. Blasius (eds), Correspondence Analysis in the Social
Sciences. Recent Developments and Applications. London: Academic Press.
Brown, S. & Melamed, L. (1990). Experimental Design and Analysis. Newbury
Park, CA: Sage.
Bryman, A. & Cramer, D. (1999). Quantitative Data Analysis with SPSS Release 8
for Windows: A Guide for Social Scientists. London and New York:
Routledge.
Buhl-Mortensen, L. (1996). Type-II Statistical Errors in Environmental Science
and the Precautionary Principle. Marine Pollution Bulletin, Vol. 32, No. 7,
528-531.
Carver, P. (1978). The case against statistical testing. Harvard Educational
Review, 48, 378-399.
Chapman, D. & Nam, J. (1968). Asymptotic Power of Chi-Square Tests for Linear
Trends in Proportions. Biometrics, Vol. 24, No. 2, 315-327.
Chatfield, C. (1991). Avoiding statistical pitfalls. Statistical Science, 6, 240-268.
Clausen, S.-E. (1998). Applied Correspondence Analysis: An Introduction.
Thousand Oaks, CA: Sage.
Cochran, W. (1952). The χ² Test of Goodness of Fit. The Annals of Mathematical
Statistics, Vol. 23, No. 3, 315-345.
Cohen, J. & Cohen, P. (1983). Applied Multiple Regression/Correlation Analysis
for the Behavioral Sciences. New Jersey: Lawrence Erlbaum Associates,
Inc.
Cohen, J. (1962). The Statistical Power of Abnormal-Social Psychological
Research: A Review. Journal of Abnormal and Social Psychology, 65, 145-153.
Cohen, J. (1965). Some Statistical Issues in Psychological Research. In: B.
Wolman (ed.), Handbook of Clinical Psychology. New York: McGraw-Hill.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New
Jersey: Lawrence Erlbaum Associates, Inc.
Cox, R. (1977). The role of significance tests. Scandinavian Journal of Statistics,
4, 49-70.
Di Stefano, J. (2001). Power analysis and sustainable forest management. Forest
Ecology and Management, 154, 141-153.
Dometrius, N. (1992). Social Statistics Using SPSS. New York: HarperCollins
Publishers, Inc.
Evans, T. & Viengkham, O. (2001). Inventory time-cost and statistical power: a
case study of a Lao rattan. Forest Ecology and Management, 150, 313-322.
Foster, J. (2001). Statistical power in forest monitoring. Forest Ecology and
Management, 151, 211-222.
Gatti, G. & Harwell, M. (1998). Advantages of Computer Programs Over Power
Charts for the Estimation of Power. Journal of Statistics Education, Vol. 6,
No. 3. Available at: http://www.amstat.org/publications/jse/v6n3/gatti.html
Gifi, A. (1996). Nonlinear Multivariate Analysis. Chichester: John Wiley & Sons
Ltd.
Girden, E. (1992). ANOVA Repeated Measures. Newbury Park, CA: Sage.
Greenacre, M. (1993). Correspondence Analysis in Practice. London: Academic
Press.
Guenther, W. (1964). Another Derivation of the Non-Central Chi-Square
Distribution. Journal of the American Statistical Association, Vol. 59, No.
307, 957-960.
Guenther, W. (1977). Power and Sample Size for Approximate Chi-Square Tests.
The American Statistician, Vol. 31, No. 2, 83-85.
Guttman, L. (1985). The illogic of statistical inference for cumulative science.
Applied Stochastic Models and Data Analysis, 1, 3-10.
Hair, J., Anderson, R., Tatham, R. & Black, W. (1995). Multivariate Data Analysis
With Readings. New Jersey: Prentice-Hall International, Inc.
Hallahan, M. & Rosenthal, R. (1996). Statistical Power: Concepts, Procedures, and
Applications. Behav. Res. Ther., Vol. 34, No. 5/6, 489-499.
Han, C-P. (1975). Some Relationships Between Noncentral Chi-Squared and
Normal Distributions. Biometrika, Vol. 62, No. 1, 213-214.
Haynam, G.E., Govindarajulu, Z. & Leone, F.C. (1970). Tables of the Cumulative
Non-central Chi-Square Distribution. In: H. L. Harter and D. B. Owen
(eds), Selected Tables in Mathematical Statistics, Vol. I. Chicago: Markham
Publishing Co.
Heidelbaugh, S. & Nelson, W. (1996). A power analysis of methods for assessment
of change in seagrass cover. Aquatic Botany, 53, 227-233.
Hinkle, D., Wiersma, W. & Jurs, S. (1988). Applied Statistics for the Behavioral
Sciences. Boston: Houghton Mifflin Company.
Hopkins, W. (1997). A New View of Statistics.
Available at: http://www.sportsci.org/resource/stats/index.html.
Hubbard, R. & Armstrong, S. (1992). Are Null Results Becoming an Endangered
Species in Marketing? Marketing Letters (3:2), 127-136.
Huck, S. (2000). Reading Statistics and Research. New York: Addison Wesley
Longman, Inc.
Israëls, A. (1987). Eigenvalue techniques for Qualitative Data. Leiden: DSWO
Press.
Kachigan, S. (1991). Multivariate Statistical Analysis: A Conceptual Introduction.
NY: Radius Press.
Kargopoulos, P. & Raftopoulos, T. (1998). The Science of Logic & The Art of
Thinking. Thessaloniki: Vanias Publishing House.
Kinnear, P. & Gray, C. (1999). SPSS for Windows Made Simple. East Sussex:
Psychology Press Ltd.
Kirk, R. (1995). Experimental Design: Procedures for the Behavioral Sciences.
Pacific Grove, CA: Brooks/Cole Publishing Company, ITP.
Kish, L. (1959). Some statistical problems in research design. American
Sociological Review, 24, 328-338.
Kramer, S. & Rosenthal, R. (1999). Effect Sizes and Significance Levels in
Small-Sample Research. In: R. Hoyle (ed.), Statistical Strategies for Small
Sample Research. Thousand Oaks: Sage Publications, Inc.
Kritzer, B. (1996). Surviving Statistical Spitting Matches. A Professional
Development Seminar presentation for Senior Staff of the National
Conference of State Legislatures, Madison, Wisconsin, October 10, 1996.
Available at: http://www.polisci.wisc.edu/~kritzer/misc/legstaff/legstaff.htm
Lachin, J. (1977). Sample Size Determinations for r × c Comparative Trials.
Biometrics, Vol. 33, No. 2, 315-324.
Lancaster, H. O. (1969). The Chi-Squared Distribution. John Wiley & Sons, Inc.
Lebart, L., Morineau, A. & Piron, M. (2000). Statistique Exploratoire
Multidimensionnelle. Paris: Dunod.
Lebart, L., Morineau, A. & Tabard, N. (1977). Techniques de la Description
Statistique: méthodes et logiciels pour l’analyse des grands tableaux. Paris:
Dunod.
Loftus, R. (1991). On the tyranny of hypothesis testing in the social sciences.
Contemporary Psychology, 36, 102-105.
Lohninger, H. (1999). Teach Me Data Analysis: Single User Edition, [Computer
program manual]. New York: Springer.
Malinvaud, E. (1987). Data Analysis in applied socio-economic statistics with
special consideration of correspondence analysis. Marketing Science
Conference Proceedings. HEC-ISA, Jouy-en-Josas.
Menexes, G. & Oikonomoy, A. (2002). Errors and Misconception in Statistical
Hypothesis Testing. Workbooks of Data Analysis, 2, 52-64. (in Greek)
Menexes, G. & Papadimitriou, I. (2004). Relations of inertia in simple
contingency, generalized contingency (Burt) and indicator matrices for two
or more variables. Workbooks of Data Analysis, 4, 42-69. (in Greek)
Meng, R. & Chapman, D. (1966). The Power of Chi Square Tests for Contingency
Tables. Journal of the American Statistical Association, Vol. 61, No. 316,
965-975.
Meyer, T. & Mark, M. (1996). Statistical Power and Implications of Meta-Analysis
for Clinical Research in Psychosocial Oncology. Journal of Psychosomatic
Research, Vol. 41, No. 5, 409-413.
Michailidis, G. & De Leeuw, J. (1998). The Gifi System of Descriptive
Multivariate Analysis. Statistical Science, Vol. 13, No. 4, 307-336.
Micheloud, F.-X. (1997). Jean Paul Benzécri’s Correspondence Analysis.
Available at:
http://www.micheloud.com/FXM/COR/E/index.htm
Miller, J., Daly, J., Wood, M., Roper, M. & Brooks, A. (1997). Statistical power
and its subcomponents-missing and misunderstood concepts in empirical
software engineering research. Information and Software Technology, 39,
285-295.
Morrison, E. & Henkel, E. (1970). Significance tests in behavioral research:
Skeptical conclusions and beyond. In: D. E. Morrison and R. E. Henkel
(eds), The Significance Test Controversy: A Reader. Chicago: Aldine.
Muller, K., LaVange, L., Landersman-Ramey, S. & Ramey, C. (1992). Power
Calculations for General Linear Multivariate Models Including Repeated
Measures Applications. Journal of the American Statistical Association,
Vol. 87, No. 420, 1209-1226.
Murphy, K. & Myors, B. (1998). Statistical Power Analysis: A Simple and General
Model for Traditional and Modern Hypothesis Tests. New Jersey: Lawrence
Erlbaum Associates, Inc.
Nathan, G. (1972). On the Asymptotic Power of Tests for Independence in
Contingency Tables from Stratified Samples. Journal of the American
Statistical Association, Vol. 67, No. 340, 917-920.
Nishisato, S. (1980). Analysis of Categorical Data: Dual Scaling and its
Applications. Toronto: University of Toronto Press.
Nutahara, H. et al. (2001). A simple computerized program for the calculation of
the required sample size necessary to ensure statistical accuracy in medical
experiments. Computer Methods and Programs in Biomedicine, 65, 133-139.
Oakes, M. (1986). Statistical Inference: A Commentary for the Social and
Behavioral Sciences. Chichester: John Wiley & Sons, Inc.
Pagano, M. & Gauvreau, K. (2000).
Parkhurst, F. (1985). Interpreting failure to reject a null hypothesis. Bulletin of the
Ecological Society of America, 66, 301-302.
Patnaik, P. (1949). The Non-Central χ² and F Distributions and their Applications.
Biometrika, Vol. 36, No. 1/2, 202-232.
Pearson, E.S. & Hartley, H.O. (eds) (1972). Biometrika Tables for Statisticians 2.
London: Cambridge University Press.
Pratt, W. (1976). A discussion of the question: for what use are tests of hypotheses
and tests of significance. Communications in Statistics, Series A5, 779-787.
Rao, R. (1995). The Use of Hellinger Distance in graphical Displays. In: E.-M.
Tiit, T. Kollo and H. Niemi (eds), New Trends in Probability and Statistics
Vol. 3 Multivariate Statistics and Matrices in Statistics, Zeist: VSP BV and
Vilnius: TEV Ltd., (VSP/TEV).
Reynolds, H.T. (1984). Analysis of nominal data. Thousand Oaks, CA: Sage
Publications, Inc.
Rozeboom, W. (1960). The fallacy of the null-hypothesis significance test.
Psychological Bulletin, 57, 416-428.
Sankaran, M. (1963). Approximations to the Non-Central Chi-Square Distribution.
Biometrika, Vol. 50, No. 1/2, 199-204.
Saporta, G. & Tambrea, N. (1993). About the Selection of the Number of
Components in Correspondence Analysis. In: J. Janssen and C. Skiadas
(eds), Applied Stochastic Models and Data Analysis. Singapore: World
Scientific.
Schmidt, L. (1996). Statistical significance testing and cumulative knowledge in
psychology: implications for training of researchers. Psychological
Methods, 1(2), 115-129.
Sheppard, C. (1999). How Large should my Sample be? Some Quick Guides to
Sample Size and the Power of Tests. Marine Pollution Bulletin, Vol. 38,
No. 6, 439-447.
Thomas, L. & Juanes, F. (1996). The importance of statistical power analysis: an
example from Animal Behaviour. Anim. Behav., 52, 856-859.
Thomas, L. & Krebs, C. (1997). A Review of Statistical Power Analysis Software.
Bulletin of the Ecological Society of America, Vol. 78(2). Available at:
http://sustain.forestry.ubc.ca/cacb/power/review/powrev.html
Van de Geer, J. (1993). Multivariate Analysis of Categorical Data: Applications.
Thousand Oaks, CA: Sage Publications, Inc.
Verma, R. & Goodale, J. (1995). Statistical power in operations management
research. Journal of Operations Management, 13, 139-152.
Weller, S. & Romney, A.K. (1990). Metric Scaling: Correspondence Analysis.
Newbury Park, CA: Sage.
Yates, F. (1951). The influence of Statistical Methods for Research Workers on the
development of the science of statistics. Journal of the American Statistical
Association, 46, 19-34.
Yoccoz, N. G. (1991). Use, overuse, and misuse of significance tests in evolutionary
biology and ecology. Bulletin of the Ecological Society of America, 72, 106-111.
Zar, J. (1996). Biostatistical Analysis. New Jersey: Prentice-Hall International, Inc.