Chapter 10 from Everitt

Chapter 10
Generalized Estimating Equations: Epileptic Seizures and Chemotherapy
10.1 Description of data
In a clinical trial reported by Thall and Vail (1990), 59 patients with epilepsy were randomized to groups receiving either the anti-epileptic drug progabide or a placebo in addition to standard chemotherapy. The number of seizures was counted over four two-week periods. In addition, a baseline seizure rate was recorded for each patient, based on the eight-week prerandomization seizure count. The age of each patient was also recorded. The main question of interest is whether the treatment progabide reduces the frequency of epileptic seizures compared with placebo. The data are shown in Table 10.1. (These data also appear in Hand et al., 1994.)
Table 10.1 Data in epil.dta
(The table body is not legible in this transcription. The variables used later in the chapter are subj, id, y1 to y4, treat, baseline, and age.)
10.2 Generalized estimating equations
In this chapter we consider an approach to the analysis of longitudinal data that is very different from the random effects modeling described in the previous chapter. Instead of attempting to model the dependence between responses on the same individuals as arising from between-subject heterogeneity represented by random intercepts and possibly random slopes, we will concentrate on estimating the marginal mean structure, treating the dependence as a nuisance.
10.2.1 Normally distributed responses
If we suppose that a normally distributed response is observed on each individual at $T$ time points, then the basic regression model for longitudinal data becomes (cf. equation (3.3))

\[ y_i = X_i \beta + \epsilon_i, \qquad (10.1) \]

where $y_i' = (y_{i1}, y_{i2}, \ldots, y_{iT})$, $\epsilon_i' = (\epsilon_{i1}, \ldots, \epsilon_{iT})$, $X_i$ is a $T \times (p + 1)$ design matrix, and $\beta' = (\beta_0, \ldots, \beta_p)$ is a vector of regression parameters. The residual terms are assumed to have a multivariate normal distribution with a covariance matrix of some particular form that is a function of (hopefully) a small number of parameters. Maximum likelihood estimation can be used to estimate both the parameters in (10.1) and the parameters structuring the covariance matrix (details are given in Jennrich and Schluchter, 1986). The latter are often not of primary interest (they are often referred to as nuisance parameters), but using a covariance matrix that fails to match that of the repeated measurements can lead to inefficient estimates and invalid standard errors for the parameters that are of concern, namely the $\beta$ in (10.1).

If each non-replicated element of the covariance matrix is treated as a separate parameter, giving an unstructured covariance matrix, and if there are no missing data, then this approach is essentially equivalent to multivariate analysis of variance for longitudinal data (see Everitt, 2001). However, it is often more efficient to impose some meaningful structure on the covariance matrix. The simplest (and most unrealistic) structure is independence, with all off-diagonal elements (the covariances) equal to zero and typically all diagonal elements (the variances) equal to each other. Another commonly used simple structure, known as compound symmetry (for example, see Winer, 1971), requires that all covariances are equal and all variances are equal. This is just the correlation structure of the linear random intercept model described in the previous chapter, except that the random intercept model also requires that the correlation be positive.

Other correlation structures include autoregressive structures, where the correlations decrease with the distance between time points. Whatever the assumed correlation structure, all models may be estimated by maximum likelihood.
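As a concrete illustration of these structures, the sketch below writes out the correlation matrices for $T = 4$ time points. The symbol $\rho$ for the common correlation parameter and the subscripts on $R$ are introduced here for illustration only; the autoregressive matrix shown is the first-order case, in which the correlation between time points $r$ and $s$ is $\rho^{|r-s|}$.

```latex
\[
R_{\mathrm{ind}} =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix},
\qquad
R_{\mathrm{cs}} =
\begin{pmatrix}
1 & \rho & \rho & \rho \\
\rho & 1 & \rho & \rho \\
\rho & \rho & 1 & \rho \\
\rho & \rho & \rho & 1
\end{pmatrix},
\qquad
R_{\mathrm{ar}} =
\begin{pmatrix}
1 & \rho & \rho^2 & \rho^3 \\
\rho & 1 & \rho & \rho^2 \\
\rho^2 & \rho & 1 & \rho \\
\rho^3 & \rho^2 & \rho & 1
\end{pmatrix}
\]
```

Independence and compound symmetry treat every pair of occasions alike, whereas in the autoregressive matrix the entries shrink as one moves away from the diagonal.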
Unfortunately, it is generally not straightforward to specify a multivariate model for non-normal responses. One solution, discussed in the previous chapter, is to induce residual dependence among the responses using random effects. An alternative approach is to give up the idea of a model altogether by using generalized estimating equations (GEE) as introduced by Liang and Zeger (1986). Generalized estimating equations are essentially a multivariate extension of the quasi-likelihood approach discussed in Chapter 7 (see also Wedderburn, 1974). In GEE the parameters are estimated using "estimating equations" resembling the score equations for maximum likelihood estimation of the linear model described in the previous section. These estimating equations only require specification of a link and variance function and a correlation structure for the observed responses conditional on the covariates. As in the quasi-likelihood approach, the parameters can be estimated even if the specification does not correspond to any statistical model.

The regression coefficients represent marginal effects, i.e., they determine the population average relationships. Liang and Zeger (1986) show that the estimates of these coefficients are valid even when the correlation structure is incorrectly specified. Correct inferences can be obtained using robust estimates of the standard errors based on the sandwich estimator for clustered data (e.g., Binder, 1983; Williams, 2000). The parameters of the correlation matrix, referred to as the working correlation matrix, are treated as nuisance parameters. However, Lindsey and Lambert (1998) and Crouchley and Davies (1999) point out that estimates are no longer consistent if "endogenous" covariates such as baseline responses are included in the model. Fortunately, inclusion of the baseline response as a covariate does yield consistent estimates of treatment effects in clinical trial data such as the epilepsy data considered here (see Crouchley and Davies, 1999) as long as the model does not contain a baseline by treatment interaction.
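For readers who want to see the form these estimating equations take, one common way of writing them is sketched below. The symbols $\mu_i$, $D_i$, $V_i$, $A_i$, $R(\alpha)$, $\phi$, and $N$ are introduced here for illustration and follow the usual textbook presentation of the approach; they are not defined elsewhere in this chapter.

```latex
\[
\sum_{i=1}^{N} D_i' \, V_i^{-1} \, (y_i - \mu_i) = 0,
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta'},
\qquad
V_i = \phi \, A_i^{1/2} \, R(\alpha) \, A_i^{1/2}
\]
% mu_i     : vector of marginal means for subject i, linked to X_i beta by the link function
% A_i      : diagonal matrix holding the variance function evaluated at each mean
% R(alpha) : working correlation matrix
% phi      : dispersion (scale) parameter
```

With an identity link, constant variance function, and $R(\alpha)$ equal to the identity matrix, $D_i = X_i$ and the equations reduce to the ordinary least squares normal equations $\sum_i X_i'(y_i - X_i\beta) = 0$.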
There are some important differences between GEE and random effects modeling. First, while random effects modeling is based on a statistical model and typically maximum likelihood estimation, GEE is just an estimation method that is not based on a statistical model. Second, there is an important difference in the interpretation of the regression coefficients. In random effects models, the regression coefficients represent conditional or subject-specific effects for given values of the random effects. For GEE, on the other hand, the regression coefficients represent marginal or population averaged effects. As we saw in the thought disorder data in the previous chapter, conditional and marginal relationships can be very different. Either may be of interest; for instance, patients are likely to want to know the subject-specific effect of treatments, whereas health economists may be interested in population averaged effects. Whereas random effects models allow the marginal relationship to be derived, GEE does not allow derivation of the conditional relationship. Note that conditional and marginal relationships are the same if an identity link is used and, in the case of random intercept models (no random coefficients), if a log link is specified (see Diggle et al., 2002). Third, GEE is often preferred because, in contrast to the random effects approach, the parameter estimates are consistent even if the correlation structure is misspecified (although this is true only if the mean structure is correctly specified). Fourth, while maximum likelihood estimation of a correctly specified model is consistent if data are missing at random (MAR), this is not the case for GEE, which requires that responses are missing completely at random (MCAR), or that missingness depends only on the covariates included in the model. See Hardin and Hilbe (2002) for a thorough introduction to GEE.
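The remark about the log link can be checked with a short calculation. As a sketch, assume a random intercept model with a log link in which the random intercept $u_i$ is normally distributed with variance $\sigma^2$; the symbols $u_i$, $\sigma^2$, and $x_{ij}$ (the covariate vector for subject $i$ at occasion $j$) are introduced here for illustration.

```latex
\[
E(y_{ij} \mid u_i) = \exp(\beta' x_{ij} + u_i),
\qquad u_i \sim N(0, \sigma^2)
\]
\[
E(y_{ij}) = \exp(\beta' x_{ij}) \, E\{\exp(u_i)\}
          = \exp\!\left(\beta' x_{ij} + \tfrac{1}{2}\sigma^2\right)
\]
```

Averaging over the random intercept therefore shifts only the constant term by $\sigma^2/2$; the coefficients of the covariates are the same in the conditional and the marginal relationship.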
10.3 Analysis using Stata
The generalized estimating equations approach, as described in Liang and Zeger (1986), is implemented in Stata's xtgee command. The main components which have to be specified are:
• the assumed distribution of the response variable (given the covariates), specified in the family() option (this determines the variance function),
• the link between the response variable and its linear predictor, specified in the link() option, and
• the structure of the working correlation matrix, specified in the correlation() option.
In general, it is not necessary to specify the link() option since, as for the glm command, the default link is the canonical link for the specified family.
Since the xtgee command will often be used with the family(gauss) option, together with the identity link function, we will illustrate this option on the post-natal depression data used in the previous two chapters before moving on to deal with the epilepsy data in Table 10.1.
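A minimal sketch of how the three components fit together in a single call is given below. It uses simulated data rather than any data set from this book, so the variable names id, time, x, y, and u, the seed, and the sample sizes are all hypothetical. The panel structure is declared with xtset, which is assumed to be available in the Stata version being used.

```stata
* Simulated longitudinal data: 50 subjects with 4 occasions each (illustration only)
clear
set seed 12345
set obs 50
generate id = _n
* Subject-level effect, created before expanding so it is constant within subject
generate u = rnormal()
expand 4
bysort id: generate time = _n
generate x = rnormal()
generate y = 1 + 0.5*x + u + rnormal()

* Declare the subject and time identifiers
xtset id time

* The three components: family(), link(), and corr()
xtgee y x, family(gaussian) link(identity) corr(exchangeable)

* Inspect the estimated working correlation matrix
estat wcorrelation
```

Because the simulated subject effect and the residual both have variance 1, the estimated exchangeable correlation should come out in the neighborhood of 0.5, although the exact value depends on the random draws.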
10.3.1 Post-natal depression data
The data are obtained using
    infile subj group dep0 dep1 dep2 dep3 dep4 dep5 dep6 ///
        using depress.dat, clear
    reshape long dep, i(subj) j(visit)
    mvdecode _all, mv(-9)
To begin, we fit a model that regresses dep on group, visit, their interaction, and visit squared as in the previous chapter, but under the unrealistic assumption of independence. The necessary command written out in its fullest form is
    generate gr_vis = group*visit
    generate vis2 = visit^2
    xtgee dep group visit gr_vis vis2, i(subj) t(visit) ///
        corr(indep) link(iden) family(gauss)
(see Display 10.1). Here, the fitted model is simply a multiple regression model for 356 observations which are assumed to be independent of one another; the estimated scale parameter is just the residual mean square, and the deviance is equal to the residual sum of squares. The estimated regression coefficients and their associated standard errors indicate that the group by visit interaction is significant at the 5% level. However, treating the observations as independent is unrealistic and will almost certainly lead to poor estimates of the standard errors. Standard errors for between-subject factors (here group) are likely to be underestimated because we are treating observations from the same subject as independent, thus increasing the apparent sample size; standard errors for within-subject factors (here visit, gr_vis, and vis2) are likely to be overestimated since we are not controlling for residual between-subject variability.

GEE population-averaged model             Number of obs      =       356
Group variable:                  subj     Number of groups   =        61
Link:                        identity     Obs per group: min =         2
Family:                      Gaussian                    avg =       5.8
Correlation:              independent                    max =         7
                                          Wald chi2(4)       =    269.51
Scale parameter:             26.89935     Prob > chi2        =    0.0000

Pearson chi2(356):            9576.17     Deviance           =   9576.17
Dispersion (Pearson):        26.89935     Dispersion         =  26.89935

(The coefficient table is not legible in this transcription. One row survives without its variable label: Coef. .3904383, Std. Err. .079783, z 4.89, P>|z| 0.000, 95% Conf. Interval .2340665 to .5468102.)

Display 10.1

We therefore now abandon the assumption of independence and estimate a correlation matrix having compound symmetry (i.e., constraining the correlations between the observations at any pair of time points to be equal). Such a correlation structure is specified using corr(exchangeable), or the abbreviated form corr(exc). The model can be fitted as follows:
    xtgee dep group visit gr_vis vis2, i(subj) t(visit) ///
        corr(exc) link(iden) fam(gauss)
Instead of specifying the subject and time identifiers using the options i() and t(), we can also declare the data as being of the form xt (for cross-sectional time series) as follows:
    iis subj
    tis visit
and omit the i() and t() options from now on. Since both the link and the family correspond to the default options, the same analysis may be carried out using the shorter command
    xtgee dep group visit gr_vis vis2, corr(exc)
(see Display 10.2). After estimation, estat wcorrelation reports the estimated working correlation matrix
    estat wcorrelation, format(%6.4g)
which is shown in Display 10.3. Here the format() option was used to reduce the number of decimal places and therefore avoid rows of the matrix wrapping over two lines.
Note that the standard error for group has increased whereas those for visit, gr_vis, and vis2 have decreased, as expected. The estimated within-subject correlation matrix is compound symmetric. This structure is frequently not acceptable since correlations between pairs of observations widely separated in time will often be lower than for observations closer together. This pattern was apparent from the scatterplot matrix given in Chapter 8.
To allow for such a pattern of correlations among the repeated observations, we can move to an autoregressive structure. For example, in a first-order autoregressive specification the correlation between time points $r$ and $s$ is assumed to be $\rho^{|r-s|}$. The necessary instruction for fitting the previously considered model but with this first-order autoregressive structure for the correlations is
    xtgee dep group visit gr_vis vis2, corr(ar1)
GEE population-averaged model             Number of obs      =       356
Group and time vars:       subj visit     Number of groups   =        61
Link:                        identity     Obs per group: min =         2
Family:                      Gaussian                    avg =       5.8
Correlation:                    AR(1)                    max =         7
                                          Wald chi2(4)       =    213.85
Scale parameter:             27.10...     Prob > chi2        =    0.0000

(The coefficient table for dep, with columns Coef., Std. Err., z, P>|z|, and [95% Conf. Interval], is not legible in this transcription.)

Display 10.4
The estimates of the regression coefficients and their standard errors in Display 10.4 have changed, but not substantially. The estimated within-subject correlation matrix may again be obtained using
    estat wcorrelation, format(%6.4g)
(see Display 10.5), which has the expected pattern in which correlations decrease substantially as the separation between the observations increases.
A Handbook of Statistical Analyses Using Stata

Estimated within-subj correlation matrix R:
(The matrix entries are not legible in this transcription.)

Display 10.5

Other correlation structures are available for xtgee, including the option correlation(unstructured) in which no constraints are placed on the correlations. (This is essentially equivalent to multivariate analysis of variance for longitudinal data, except that the variance is assumed to be constant over time.) It might appear that using this option would be the most sensible one to choose for all data sets. This is not, however, the case since it necessitates the estimation of many nuisance parameters. This can, in some circumstances, cause problems in the estimation of those parameters of most interest, particularly when the sample size is small and the number of time points is large.
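To give a sense of scale, the number of distinct correlations in an unstructured working correlation matrix for $T$ time points can be counted directly. The worked figure below takes $T = 7$, the number of visits (dep0 to dep6) in the depression data.

```latex
\[
\#\{\text{correlations, unstructured}\} = \frac{T(T-1)}{2},
\qquad
T = 7: \quad \frac{7 \times 6}{2} = 21
\]
```

By comparison, the exchangeable and first-order autoregressive structures each involve a single correlation parameter, whatever the value of $T$.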
We now analyze the epilepsy data using a similar model as for the depression data, but using the Poisson distribution and log link. The data are available in a Stata file epil.dta and can be read using
    use epil, clear
We will treat the baseline measure as one of the responses:
    generate y0 = baseline
Some useful summary statistics can be obtained using
    summarize y0 y1 y2 y3 y4 if treat==0
    summarize y0 y1 y2 y3 y4 if treat==1
(see Displays 10.6 and 10.7).

    Variable |  Obs        Mean   Std. Dev.        Max
          y1 |   31    8.580645    18.24067        102
          y2 |   31    8.419355    11.85986         65
          y3 |   31    8.129032    13.89421         72
          y4 |   31    6.709677    11.26408         63

Display 10.7 (progabide group; the y0 row, the Min column, and all of Display 10.6 are not legible in this transcription)

We see that the number of observations is constant over time, so there appears to be no dropout. The means and standard deviations of y0 are larger than for the other responses because seizures were counted over an 8-week period at baseline and over 2-week periods at the subsequent visits. The largest value of y1 in the progabide group seems out of step with the other maximum values and may indicate an outlier. Some graphics of the data may be useful for investigating this possibility further, but first it is convenient to reshape the data from its present "wide" form to the "long" form. We now reshape the data as follows:
    reshape long y, i(subj) j(visit)
    sort subj treat visit
    list in 1/12, clean
(Listing of the first 12 observations, with columns subj, visit, id, y, treat, baseline, and age; the first three subjects have id 104, 106, and 107. The individual values are not reliably legible in this transcription.)

Display 10.8
Perhaps the most useful graphical display for investigating the data is a set of graphs of individual response profiles. Since we are planning to fit a Poisson model with the log link to the data, we take the log transformation before plotting the response profiles. (We need to add a positive number, say 1, because some seizure counts are zero.)
    generate ly = log(y+1)
However, the baseline measure represents seizure counts over an 8-week period, compared with 2-week periods for each of the other time points. We therefore divide the baseline count by 4:
    replace ly = log(y/4+1) if visit==0
and then plot the log-counts:
    twoway connect ly visit if treat==0, by(subj, ///
        style(compact)) ytitle("Log count")
    twoway connect ly visit if treat==1, by(subj, ///
        style(compact)) ytitle("Log count")
The resulting graphs are shown in Figures 10.1 and 10.2. There is no obvious improvement in the progabide group. Subject 49 had more epileptic fits overall than any other subject and might perhaps be considered an outlier (see Exercise 10.2).

Figure 10.2: Response profiles in the treated group.
As discussed in Chapter 7, the most plausible distribution for count data is often the Poisson distribution. The Poisson distribution is specified in xtgee models using the option family(poisson). The log link is implied (since it is the canonical link). The baseline counts were obtained over an 8-week period whereas all subsequent counts are over 2-week periods. To model the seizure rate in counts per week, we must therefore use the log observation period $\log(p_i)$ as an offset (a covariate with regression coefficient set to 1). The model for the mean count $\mu_{ij}$ then becomes

\[ \log(\mu_{ij}) = \beta' x_{ij} + \log(p_i), \]

so that the rate is modeled as

\[ \mu_{ij} / p_i = \exp(\beta' x_{ij}). \]
We can compute the required offset using
    generate lnobs = cond(visit==0,ln(8),ln(2))
Following Diggle et al. (2002), we will allow the log rate to change by a treatment group-specific constant after the baseline assessment. The necessary covariates, an indicator for the post-baseline visits and an interaction between that indicator and treatment group, are created using
    generate post = visit>0
    generate tr_post = treat*post
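It may help to write out what this parameterization implies for each group and period. The sketch below assumes the linear predictor contains age, treat, post, and tr_post; the coefficient labels $\beta_0, \ldots, \beta_4$ are introduced here for illustration.

```latex
\[
\log(\text{rate}) = \beta_0 + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{treat}
                  + \beta_3\,\mathrm{post} + \beta_4\,\mathrm{treat}\times\mathrm{post}
\]
\[
\begin{array}{lll}
                  & \text{baseline (post = 0)}         & \text{post-baseline (post = 1)} \\
\text{placebo}    & \beta_0 + \beta_1\,\mathrm{age}           & \beta_0 + \beta_1\,\mathrm{age} + \beta_3 \\
\text{progabide}  & \beta_0 + \beta_1\,\mathrm{age} + \beta_2 & \beta_0 + \beta_1\,\mathrm{age} + \beta_2 + \beta_3 + \beta_4
\end{array}
\]
```

The change in log rate from baseline is $\beta_3$ in the placebo group and $\beta_3 + \beta_4$ in the progabide group, so the interaction coefficient $\beta_4$ is the difference in change between the two groups, which is the treatment effect of interest.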
We will also control for the age of the patients. The summary tables for the seizure data (Displays 10.6 and 10.7) provide strong empirical evidence that there is overdispersion (the variances are greater than the means), and this can be incorporated using the scale(x2) option to allow the dispersion parameter $\phi$ to be estimated (see also Chapter 7).
    iis subj
    xtgee y age treat post tr_post, corr(exc) family(pois) ///
        offset(lnobs) scale(x2)
GEE population-averaged model             Number of obs      =       295
Group variable:                  subj     Number of groups   =        59
Link:                             log     Obs per group: min =         5
Family:                       Poisson                    avg =       5.0
Correlation:             exchangeable                    max =         5
                                          Wald chi2(4)       =      5.43
Scale parameter:             18.48008     Prob > chi2        =    0.2458

           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
         age |  -.0322513   .0148644    -2.17   0.030     -.061385   -.0031176
       treat |  -.0177873    .201945    -0.09   0.930    -.4135922    .3780176
        post |   .1107981   .1500635     0.74   0.460     -.183321    .4049173
     tr_post |  -.1036807    .213317    -0.49   0.627    -.5217742     .314413
       _cons |   2.265255   .4400816     5.15   0.000     1.402711    3.127799
       lnobs |   (offset)
(Standard errors scaled using square root of Pearson X2-based dispersion)

Display 10.9
The output assuming an exchangeable correlation structure is given in Display 10.9, and the estimated correlation matrix is obtained using
    xtcorr
(see Display 10.10).

Estimated within-subj correlation matrix R:
(The matrix entries are not legible in this transcription.)

Display 10.10

In Display 10.9, the parameter $\phi$ is estimated as 18.5, indicating severe overdispersion in these data. We briefly illustrate how important it was to allow for overdispersion by omitting the scale(x2) option:
    xtgee y age treat post tr_post, corr(exc) family(pois) ///
        offset(lnobs)
GEE population-averaged model             Number of obs      =       295
Group variable:                  subj     Number of groups   =        59
Link:                             log     Obs per group: min =         5
Family:                       Poisson                    avg =       5.0
Correlation:             exchangeable                    max =         5
                                          Wald chi2(4)       =    100.38
Scale parameter:                    1     Prob > chi2        =    0.0000

(The coefficient table is not legible in this transcription.)

Display 10.11
The results given in Display 10.11 show that the standard errors are now much smaller than before. Even if overdispersion had not been suspected, this error could have been detected by using the vce(robust) option (see Chapter 7):
    xtgee y age treat post tr_post, corr(exc) family(pois) ///
        offset(lnobs) vce(robust)
The results of the robust regression in Display 10.12 are remarkably similar to those of the overdispersed Poisson model, suggesting that the latter is a reasonable "model" for the data.
GEE population-averaged model             Number of obs      =       295
Group variable:                  subj     Number of groups   =        59
Link:                             log     Obs per group: min =         5
Family:                       Poisson                    avg =       5.0
Correlation:             exchangeable                    max =         5
                                          Wald chi2(4)       =      6.85
Scale parameter:                    1     Prob > chi2        =    0.1442

                         (Std. Err. adjusted for clustering on subj)

(The coefficient table, with columns Coef., Semi-robust Std. Err., z, P>|z|, and [95% Conf. Interval] and rows including treat, post, and tr_post, is not legible in this transcription.)

Display 10.12
The estimated coefficient of tr_post represents the estimated difference in the change in log seizure rate from baseline to post randomization between the placebo and progabide groups. In the placebo group there is an increase in the log seizure rate of 0.1108, and in the progabide group there is an increase of only 0.007 (= 0.1108 - 0.1037). However, the difference is not significant (p = 0.63). The exponential of the interaction coefficient gives an estimated incidence rate ratio, here the ratio of the relative increase in seizure rate for the treated patients compared with the control patients. The exponentiated coefficient and the corresponding confidence interval can be obtained directly using the eform option in xtgee:
    xtgee y age treat post tr_post, corr(exc) ///
        family(pois) offset(lnobs) scale(x2) eform
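The arithmetic behind the exponentiated interaction coefficient can also be checked by hand. The sketch below is an illustration only: it plugs in the coefficient and standard error of tr_post as reported in Display 10.9 and uses a normal-theory 95% interval; the scalar names b_trpost and se_trpost are hypothetical.

```stata
* Coefficient and standard error of tr_post as reported in Display 10.9
scalar b_trpost  = -.1036807
scalar se_trpost = .213317

* Rate ratio and normal-theory 95% confidence limits on the ratio scale
display "rate ratio:  " exp(b_trpost)
display "lower limit: " exp(b_trpost - invnormal(.975)*se_trpost)
display "upper limit: " exp(b_trpost + invnormal(.975)*se_trpost)
```

The three values should be close to 0.90, 0.59, and 1.37, which correspond to a relative increase that is about 10% lower in the treated group, with limits from about 41% lower to 37% higher.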
GEE population-averaged model             Number of obs      =       295
Group variable:                  subj     Number of groups   =        59
Link:                             log     Obs per group: min =         5
Family:                       Poisson                    avg =       5.0
Correlation:             exchangeable                    max =         5
                                          Wald chi2(4)       =      5.43
Scale parameter:             18.48008     Prob > chi2        =    0.2458

           y |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
         age |   .9682632   .0143927    -2.17   0.030     .9404613    .9968873
       treat |     .98237   .1983847    -0.09   0.930     .6612706    1.459389
        post |   1.117168   .1676464     0.74   0.460     .8325009    1.499179
     tr_post |   .9015131    .192308    -0.49   0.627     .5934667    1.369456
       lnobs |   (offset)
(Standard errors scaled using square root of Pearson X2-based dispersion)

Display 10.13

The results in Display 10.13 indicate that the relative increase in seizure rate is 10% lower in the treated group compared with the control group, with a 95% confidence interval from 41% lower to 37% higher.

However, before interpreting these estimates, we should perform some diagnostics. Standardized Pearson residuals can be useful for identifying potential outliers (see equation (7.9)). These can be found by first using the predict command to obtain predicted counts, subtracting these from the observed counts, and dividing by the estimated standard deviation $\sqrt{\hat{\phi}\hat{\mu}_{ij}}$, where $\hat{\phi}$ is the estimated dispersion parameter:
    quietly xtgee y treat baseline age visit, corr(exc) ///
        family(pois) scale(x2)
    predict pred, mu
    generate stpres = (y-pred)/sqrt(e(chi2_dis)*pred)
Boxplots of these residuals at each visit are obtained using
    sort visit
    graph box stpres, medtype(line) over(visit, ///
        relabel(1 "visit 1" 2 "visit 2" 3 "visit 3" ///
        4 "visit 4"))
The resulting graph is shown in Figure 10.3. Pearson residuals greater than 4 are certainly a cause for concern, so we can check which subjects they belong to using
    list subj id if stpres>4
(The resulting listing of subj and id is not legible in this transcription.)
Subject 49 appears to be an outlier due to extremely large counts, as we saw in Figure 10.2. Subject 25 also has an unusually large count at visit 3. It would be a good idea to repeat the analysis without subject 49 to see how much the results are affected by this unusual subject (see Exercise 10.2). This can be viewed as a sensitivity analysis.
Figure 10.3: Standardized Pearson residuals.
10.4 Exercises
10.1 Treatment of post-natal depression
1. For the depression data, compare the results of GEE with a compound symmetric structure with ordinary linear regression where standard errors are corrected for the within-subject correlation using:
a. the options vce(robust) cluster(subj), to obtain the sandwich estimator for clustered data (see help for regress), and
b. bootstrapping, by sampling subjects with replacement. This may be achieved using the bootstrap prefix, together with the option cluster(subj).
10.2 Epileptic seizures and chemotherapy
1. Explore other possible correlation structures for the seizure data in the context of a Poisson model. Examine the robust standard errors in each case.
2. Repeat the above analyses, but excluding subject 49 (who appears to be an outlier). Compare the results.
10.3 Thought disorder and schizophrenia
1. For the thought disorder data discussed in the previous chapter, estimate the effect of early, month, and their interaction on the logit of thought disorder using GEE with an exchangeable correlation structure. Use robust standard errors.
2. Interpret the estimates.
3. Plot the predicted probability over time for early onset women (using graph twoway function, see Section R.3.2), and compare the curve with the curves in Figure 9.10.
10.4 Driver education
In a randomized experiment to investigate if driver education reduces the number of collisions and traffic violations of teenagers (Stock et al., 1983), eligible high school students were randomized to three groups: safe performance curriculum (SPC), pre-driver license curriculum (PDL), and control. Whereas the SPC was a 70-hour state-of-the-art program, the PDL was a 30-hour course containing only the minimum training required to pass the driving test. The control group received no training through the school system and was taught by the parents and/or private training schools only. During three years of follow-up, the occurrence of collisions and moving violations was obtained using records from the state Department of Motor Vehicles. (The data are from Davis, 2002.)
The variables in drivers.dta are:
• program: group (string variable with values SPC, PDL, and Control)
• gender: gender (string variable with values Male and Female)
• col1 to col3: indicator for at least one collision or moving violation during years 1 to 3
• num: number of times the response-covariate pattern occurred
1. Prepare the data for analysis using GEE. (Hint: make sure to expand the data first using expand num, then reshape to long.)
2. Investigate the effect of time, program, gender, and the program by gender interaction on the odds of at least one collision or moving violation using generalized estimating equations with a logit link and unstructured correlations. Use robust standard errors throughout this exercise.
3. Perform a Wald test for the interaction terms and remove them if the test is not significant at the 5% level.
4. By inspecting the estimated correlation matrix, choose the correlation structure that appears to be most appropriate and estimate the model with that correlation structure.
5. Interpret the odds ratio estimates for the final model.