THE EFFECT OF TRANSMISSION MODE ON GENETIC DIVERSITY
The Pennsylvania State University
The Graduate School
Biology Department

THE EFFECT OF TRANSMISSION MODE ON GENETIC DIVERSITY IN ZUCCHINI YELLOW MOSAIC VIRUS

A Dissertation in Biology
by
Heather Simmons

© 2011 Heather Simmons

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

December 2011

The Dissertation of Heather Simmons was reviewed and approved* by the following:

Andrew G. Stephenson, Distinguished Professor of Biology and Assistant Department Head for Research, Dissertation Co-Advisor
Edward C. Holmes, Professor of Biology and Eberly College of Science Distinguished Senior Scholar, Dissertation Co-Advisor
Andrew Read, Professor of Biology and Entomology, Eberly College of Science Distinguished Senior Scholar, Chair of Committee
Fred Gildow, Professor of Plant Pathology and Head of Plant Pathology Department
Michael Axtell, Associate Professor of Biology
Douglas Cavener, Professor and Head of Biology Department

*Signatures are on file in the Graduate School

ABSTRACT

This dissertation consists of six chapters: an introduction, four data chapters and a conclusion. In the introduction I provide general background and information on the study system, zucchini yellow mosaic virus (ZYMV), and one of its host species, Cucurbita pepo ssp. texana (a wild gourd). This section also includes background on the methods that I have used, namely Bayesian coalescent and tree-building methods.

The first study (chapter two) was motivated by the fact that plant RNA viruses were considered more genetically stable than animal RNA viruses. Animal RNA viruses are assumed to achieve extremely high levels of genetic diversity as a result of their high mutation rates, rapid replication rates and large population sizes.
However, it was believed that the same did not hold true for plant RNA viruses, owing to a combination of lower mutation rates, weaker immune selection, and genetic bottlenecks during systemic movement through the plant and during horizontal transmission by aphids. Therefore, we used a Bayesian coalescent approach to estimate the mean rate of nucleotide substitution for the coat protein (CP) of Pennsylvanian ZYMV samples at 5.0 × 10⁻⁴ subs/site/year (4 × 10⁻⁴ to 8 × 10⁻⁴), which is within the range of those found for animal RNA viruses. As scant data were available on the timescale of the evolution of this virus within the Cucurbitaceae (squash, melon, cucumber), we used the same approach to estimate the time to the most recent common ancestor for the lineages of ZYMV we sampled at approximately 400 years (HPD: 119-771 years), with a possible origin in Asia. In addition, we found evidence in support of purifying selection (dN/dS = 0.108). We also undertook an analysis of phylogeographical structure and found in situ evolution of ZYMV within individual countries, suggesting intermittent movement of ZYMV across geographic boundaries.

Since we had established that the substitution rate estimate was in accord with those previously observed in animal RNA viruses, we sought to determine in the second study (chapter three) whether plant RNA viruses exhibit quantifiable intra-host genetic diversity. Most studies of plant viral genetic diversity had focused on the inter-host level, and the few that had considered intra-host genetic diversity had produced inconsistent results. In addition, it was believed that population bottlenecks associated with both aphid-vectored transmission and systemic movement through the plant drastically reduced the effective population size.
Although there had been some in vitro work on the effect of population bottlenecks on viral genetic diversity in plant viruses, little work had been conducted in natural systems. Therefore, to assess intra-host genetic diversity, as well as the effect of the aphid-induced population bottleneck on viral genetic diversity, we generated intra-host sequence data for the CP gene of ZYMV from two horizontally transmitted populations: one aphid-vectored and the other mechanically inoculated (to avoid aphid-related bottlenecks). We also sampled multiple time points from individual plants to assess intra-host viral genetic diversity. We determined that despite the relatively frequent generation of mutations, most of these occurred only transiently, as they were deleterious and tended to be purged rapidly from the population. There appeared to be more population structure in the aphid-vectored clones, as indicated by multiple clones bearing the same mutations, the presence of a distinct sub-lineage, and several clones being more than one mutational step away from the consensus sequence. We also observed possible evidence of complementation occurring in trans. Unlike most comparable studies, we quantified the error rate associated with the RT-PCR procedure used in this study. In doing so, we determined that it was high enough to account for a portion of the mutations detected, indicating that future intra-host studies of this nature should quantify the extent to which detected mutations are artificially induced.

Although the CP is the most frequently studied protein of ZYMV, it is a multifunctional protein that is not the sole protein involved in aphid transmission. Therefore, we decided to undertake full genome sequencing of these samples in the third study (chapter four). We had sequenced a limited number of clones with conventional cloning and Sanger sequencing (we averaged 35 clones per sample), but it was extremely difficult to detect minor variants with this method.
Thus, we used deep sequencing to increase coverage and uncover mutations present in the population at low frequencies. We used the same aphid-vectored and mechanically inoculated samples from the previous study with a few modifications: we increased the number of time points in the field samples, and increased the number of serial passages in the mechanical transmission experiment. We found that mutations persist during inter-host transmission events in both the aphid-vectored and mechanically inoculated populations, suggesting that the vector-imposed bottleneck is not as extreme as previously supposed. Likewise, we found that mutations persist intra-host over time, indicating that systemic bottlenecks may not constrain viral genetic variation as severely as previously suggested. In addition, differential selective pressures as a result of transmission mode were suggested by the presence of minor alleles that move to fixation in the aphid-vectored plants, but remain as low frequency alleles in the mechanically inoculated plants. We determined that the high level of coverage obtained during deep sequencing makes it the preferred method for detecting low frequency variants in the population.

The fourth study (chapter five) was prompted by the results I obtained while procuring vertically transmitted samples of ZYMV for sequencing, which showed that the seed transmission rate of ZYMV was three orders of magnitude greater than the most commonly cited rate (0.047%). Whether or not seed transmission occurred in ZYMV was a controversial issue, as the rates in the literature ranged from 0 to 18.9%. Therefore, to definitively determine the seed transmission rate of ZYMV in C. pepo, we measured the seed transmission rate of this virus by visual inspection, RT-PCR, and antibody tests. We found a seed transmission rate of 1.6% using RT-PCR, and showed that vertically infected C.
pepo plants are capable of initiating horizontal ZYMV infections, both mechanically and via an aphid vector (Myzus persicae). Thus, it appears that ZYMV-infected seeds may act as viral reservoirs, thereby accounting for the current geographic distribution of ZYMV. We also found that vertical ZYMV infection in C. pepo results in virtually symptomless infection and that antibody tests failed to detect vertical ZYMV infection, suggesting that current methods used to detect seed-borne variants of this viral pathogen need to be modified.

This dissertation explores the nucleotide substitution rate of ZYMV, the patterns and extent of viral genetic diversity within individual hosts, the effect of transmission mode on this diversity, and the vertical transmission rate of this virus. As a group, these studies reveal the underlying mechanisms of an emerging RNA virus and will aid in managing this devastating crop pathogen. In addition, these studies highlight the need to consider how methodological choices may impact viral population genetic results and, by extension, data interpretation.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGMENTS
Chapter 1 Introduction
    The study systems
    Methods
Chapter 2 Rapid evolutionary dynamics of Zucchini yellow mosaic virus
    Abstract
    Introduction
    Methods
    Results and Discussion
Chapter 3 Rapid turnover of intra-host genetic diversity in Zucchini yellow mosaic virus
    Abstract
    Introduction
    Methods
    Results
    Discussion
Chapter 4 Deep sequencing reveals persistence of intra- and inter-host genetic diversity in natural and greenhouse populations of Zucchini yellow mosaic virus
    Abstract
    Introduction
    Methods
    Results
    Discussion
Chapter 5 Experimental verification of seed transmission of Zucchini yellow mosaic virus
    Abstract
    Introduction
    Methods
    Results
    Discussion
Chapter 6 Discussion
References

LIST OF FIGURES

Figure 1-1: Diagram depicting different cell types that can be infected by the virus from Principles of Plant Virology
Figure 2-1: Maximum likelihood tree of 55 ZYMV CP sequences
Figure 3-1: Experimental design of study
Figure 3-2: Minimum spanning tree of the sequences
Figure 3-3: Spatial distribution of mutations in the CP gene from both the field and greenhouse experiments
Figure 4-1: Schematic representation of the field experimental design showing the spatial relationship between individual plants
Figure 4-2: Representative simulation of the resampling of Illumina reads to estimate the effect of coverage on the detection threshold of minor alleles
Figure 4-3: Effect of coverage on the probability of detecting the ZYMV coat protein alleles
Figure 4-4: Variation in allele frequency over time and space of ZYMV variants
Figure 4-5: Distribution of mutations across the ZYMV genome under field and greenhouse conditions
Figure 5-1: Minimum-spanning tree of the seed clones

LIST OF TABLES

Table 2-1: Bayesian estimates of population dynamic and evolutionary parameters of the CP gene of ZYMV
Table 3-1: Summary of the ZYMV CP sequences from each infected plant under aphid-vectored (field) and mechanically-inoculated (greenhouse) transmission
Table 4-1: Summary of genome coverage statistics of Illumina sequence data
Table 4-2: Summary of the 27 variants found in more than one sample
ACKNOWLEDGMENTS

First and foremost, I would like to thank my advisor Dr. Andrew Stephenson for his support over the past five years. In particular, I am extremely grateful that he had enough faith in me to allow me to pursue my ideas, no matter how ridiculous; without that faith this thesis would not have been possible. I feel honored and privileged to have had the opportunity to work with Dr. Edward Holmes, from whom I have learned more than I could possibly begin to list in the allotted space. It has been a real pleasure to have the opportunity to work with Dr. Fred Gildow, to whom I am indebted both personally and professionally for his advice, as well as the use of his lab, greenhouse and resources. I would also like to thank Drs. Andrew Read and Michael Axtell for their invaluable insights, comments and contributions to this thesis.

In addition to my committee, I have been extremely fortunate to be surrounded by a tremendous network of collaborators, colleagues and friends. I would like to thank Dr. Stephen Schaeffer, who has always been happy to provide advice, equipment and freezer space. I am extremely grateful for the expertise and help that I obtained from Tony Omesis and William Sackett: Tony for maintaining my endless experiments in the greenhouse, and William for his advice, maintenance of my plants and aphids, as well as for teaching me how to perform transmission tests. I am deeply appreciative of Kari Peter for taking me under her wing and I am indebted to Siobain Duffy and Ben Dickins for their advice and help throughout my PhD. I am extremely grateful to my undergrads, Melinda Bothe and Sarah Scanlon, who slaved away performing thousands of mini preps and RNA extractions. I would also like to thank the 314 office crew (Miruna Sasu, Andre Wallace, Lindsey Swierk, Renee Rosier and honorary office crew member Dominique Cowart) for their support and kindness. To my friend and colleague, Joseph Dunham, I would not have made it this far without you.
Last but not least, I would like to thank my husband, Aaron Parker, for his incredible support and patience throughout this process, and most of all for not divorcing me, and my son Bradley, who has had to sacrifice so much while I have been in school: thank you for not disowning me.

Chapter 1 Introduction

As a result of rapid replication rates, large population sizes, and high mutation rates, populations of RNA viruses are thought to exhibit extremely high levels of genetic diversity. Understanding the patterns of intra-host viral diversity is key to understanding the underlying evolutionary mechanisms in RNA viruses, as high levels of genetic diversity have been linked to the capacity of these viruses to evade host resistance mechanisms (Feuer et al., 1999; Lech et al., 1996), switch hosts (Jerzak et al., 2008), and alter virulence (Acosta-Leal et al., 2011). Estimates of the rates of molecular evolution in RNA viruses range from 10⁻² to 10⁻⁵ nucleotide substitutions per site, per year (subs/site/year) (Duffy et al., 2008). When I began my dissertation project, it was believed that plant RNA viruses evolved more slowly than their animal counterparts (Blok et al., 1987; Fraile et al., 1997; Kim et al., 2005; Marco & Aranda, 2005; Rodriguez-Cerezo et al., 1991). This was thought to be due to weaker immune-mediated selection, lower mutation rates and the effects of population bottlenecks on the viral population (Garcia-Arenal et al., 2001). Hence, the first goal of this dissertation is to examine the assumption that plant viruses evolve more slowly than their animal counterparts by computing the mean substitution rate for the coat protein (CP) of Zucchini yellow mosaic virus (ZYMV).
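To give a sense of scale for the rate range quoted above, the following minimal sketch converts a per-site rate into expected substitutions per genome per year. It uses the approximate ZYMV genome length of ~9,600 nt given later in this chapter; the function name is hypothetical, and applying one uniform rate to every site of the genome is a simplifying assumption made only for illustration.

```python
# Illustrative arithmetic only: scales a per-site substitution rate to a
# whole genome. Assumes (for illustration) the same rate at every site.

GENOME_LENGTH_NT = 9_600  # approximate ZYMV genome length in nucleotides


def subs_per_genome_per_year(rate_per_site: float,
                             length_nt: int = GENOME_LENGTH_NT) -> float:
    """Expected substitutions per genome per year for a per-site rate."""
    if rate_per_site < 0 or length_nt < 0:
        raise ValueError("rate and length must be non-negative")
    return rate_per_site * length_nt


# Bounds of the range reported for RNA viruses (Duffy et al., 2008).
slowest = subs_per_genome_per_year(1e-5)
fastest = subs_per_genome_per_year(1e-2)

if __name__ == "__main__":
    print(f"slowest reported rate: {slowest:.3f} subs/genome/year")
    print(f"fastest reported rate: {fastest:.1f} subs/genome/year")
```

Under these assumptions the reported range spans roughly 0.1 to 96 substitutions per genome per year, three orders of magnitude between the slowest and fastest estimates.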
Although consensus sequences are valuable for inferring phylogenetic relationships between populations, they are less informative about intra-host genetic diversity because the consensus sequence represents the average viral diversity within a population, typically the most prevalent viral strains, and so masks the diversity of individual virions. In addition, most studies of genetic diversity in plant RNA viruses had been conducted at the inter-host level, and those that had examined intra-host viral genetic diversity reported conflicting results. For instance, Turturo et al. (2005) observed limited (<0.1%) intra-host genetic diversity in Grapevine leafroll-associated virus 3, whereas Teycheney (2005) observed higher levels of intra-host diversity in Banana mild mosaic virus, with divergence levels of more than 15% in a third of the sequences obtained. Intermediate levels of nucleotide diversity (ranging from 0 to 2.4%) were found by Jridi et al. (2006) in Plum pox virus measured over 13 years in a Prunus tree.

Although the population sizes achieved by plant RNA viruses are expected to be extremely high (e.g. 10¹¹ to 10¹² virions per infected leaf in Tobacco mosaic virus) (Garcia-Arenal et al., 2003), it is believed that the effective population sizes (Ne) are significantly lower (Garcia-Arenal et al., 2001), mostly as the result of population bottlenecks. Population bottlenecks are thought to occur at several stages in the viral life cycle: during vector transmission, during systemic movement through the plant (as the virus moves from cell to cell and tissue to tissue), and as the virus enters the germ line. In fact, several studies report extremely low numbers of virions being transmitted per transmission event.
Moury et al. (2007), using an in vitro system, reported that on average only 0.5-3.2 Potato virus Y virions are transmitted per aphid; Ali et al. (2006) determined that the number of virions transmitted from mechanically infected squash plants to healthy plants via aphids (Aphis gossypii and Myzus persicae) was three on average for both aphid species. Betancourt et al. (2008), using Cucumber mosaic virus (CMV), estimated that only one or two complete genomes of this multipartite virus are transmitted by aphids. Similarly drastic population bottlenecks have been reported during systemic movement. For instance, Sacristan et al. (2003), using Tobacco mosaic virus (TMV), estimated the founding population in a new leaf after systemic movement within tobacco to be between two and 20 virions, and French & Stenger (2003) determined that approximately four virions of Wheat streak mosaic virus appeared to be involved in the invasion of new tillers of wheat. Likewise, Li & Roossinck (2004) reported similar results from examining the movement of 12 experimental mutants of CMV in tobacco: an average of seven mutants were found in the eighth leaf and an average of five in the 15th leaf (distance from the inoculated leaf). Genetic bottlenecks have also been observed in cell-to-cell movement of Soil-borne wheat mosaic virus, where Miyashita & Kishino (2010) determined the cell-to-cell bottleneck to be ~6 virions for the initial movement from the infected cell and ~5 virions in subsequent movements. Therefore, severe bottlenecks appear to be a common modifier of plant viral populations and are likely to have a large impact on virus evolution.

Although several studies have explored the effect of artificially induced population bottlenecks, very little work had been done to assess the effects of population bottlenecks as they occur in nature (Li & Roossinck, 2004).
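To make the scale of these bottlenecks concrete, the sketch below computes the chance that a variant present at frequency f in the donor population is carried through a bottleneck of n founding virions. It assumes founders are drawn independently and at random, so the probability is 1 - (1 - f)^n. The function name, the example frequencies and the random-sampling assumption are illustrative and are not taken from the studies cited above; the founder numbers (1, 3 and 20) simply echo the ranges those studies report.

```python
# Illustrative sketch: probability that a variant survives a transmission
# bottleneck, ASSUMING founders are sampled independently and at random.


def prob_variant_survives(freq: float, founders: int) -> float:
    """Probability that at least one of `founders` virions carries a
    variant present at frequency `freq` in the donor population."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must lie between 0 and 1")
    if founders < 0:
        raise ValueError("founders must be non-negative")
    return 1.0 - (1.0 - freq) ** founders


if __name__ == "__main__":
    # Founder numbers echo the reported ranges; frequencies are hypothetical.
    for founders in (1, 3, 20):
        for freq in (0.01, 0.10, 0.50):
            p = prob_variant_survives(freq, founders)
            print(f"founders={founders:2d} freq={freq:.2f} P(survives)={p:.3f}")
```

Under this simple model, a variant at 10% frequency passes a three-virion bottleneck only about 27% of the time, which is why bottlenecks of this size are expected to strip low-frequency variants from the population.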
In addition, a comparative study undertaken by Schneider and Roossinck (2001) showed that mutation frequencies tended to be higher in plant protoplasts than in intact plants, indicating that in vitro studies are not necessarily representative of in planta conditions. Thus, the second goal of this dissertation is to assess the impact of population bottlenecks on intra- and inter-host genetic diversity in plants growing under greenhouse and field conditions.

The Study Systems

Zucchini yellow mosaic virus

Zucchini yellow mosaic virus (ZYMV), a member of the family Potyviridae, is a single-stranded, positive-sense RNA virus. Although ZYMV was initially discovered in Italy in 1973, it was not formally described until 1981 (Lisa et al., 1981). Remarkably, within the next two decades, this virus achieved a worldwide distribution and is thus considered to be an emerging virus (Desbiez & Lecoq, 1997). Although the virus is present in temperate, subtropical and tropical regions, few potential reservoirs have been identified. Natural infection appears to be limited to members of the Cucurbitaceae, and the virus has been reported in wild cucurbits in the United States, Jordan and Sudan; however, no natural reservoirs have been reported in temperate regions (Desbiez & Lecoq, 1997). Symptom severity is dependent upon the time of infection: the younger the plant is when infection occurs, the more severe the resulting symptoms. In addition, the strain of ZYMV and the environmental conditions, particularly temperature, appear to affect symptom severity (Desbiez & Lecoq, 1997). ZYMV infection often results in severe stunting of the entire plant, as well as a distinctive yellow mottling of the leaves, and infected leaves often exhibit blistering and laciniation (Desbiez & Lecoq, 1997). The fruits of ZYMV-infected plants are often mottled and distorted and, although they are edible, they tend to be unmarketable.
Cucurbit (squash, melon and cucumber) production in the United States alone is estimated to be worth $1.5 billion per annum, and cucurbits rank among the 15 most important agricultural crops in the United States (Cantliffe et al., 2007). Given that ZYMV has the capacity to reduce agricultural yields by up to 94%, it is an extremely significant crop pathogen (Blua & Perring, 1989).

Virus transmission

ZYMV is transmitted by aphids in a non-persistent manner. In this mode, also known as non-circulative or stylet-borne transmission (Nault, 1997), the virions remain on the stylet of the aphid, and the aphid is believed to act as a “flying syringe”. Acquisition and inoculation occur during a brief (< 1 min) epidermal puncture that is part of a gustatory-based food selection process (Nault & Styer, 1972; Powell & Hardie, 2000). The intracellular portion of the aphid probe has been divided into three sub-phases (II-1, II-2 and II-3) (Powell et al., 1995). Aphids are thought to acquire virions during a brief (5-10 seconds) intracellular probe of either epidermal or mesophyll cells (Lopez-Abella & Bradley, 1969; Powell, 1991) in II-3 (Martin et al., 1997). Viral inoculation is thought to occur while the aphids are ejecting watery saliva during the first intracellular puncture (II-1) (Powell, 2005). Ejection of watery saliva continues until a mesophyll or epidermal cell is punctured, at which point it is believed that the watery saliva may switch to gelling saliva (Martin et al., 1997). The virions are thought to associate with the distal third of the maxillary stylet (Wang et al., 1996). The virus is transmitted by what is termed the helper strategy, which differs from the capsid strategy in that the coat protein (CP) does not interact directly with the aphid stylet; rather, the CP interacts with the aphid mouthpart through an intermediary called the Helper Component protein (HC-Pro).
Therefore, transmission occurs when the DAG motif on the CP interacts with the PTK region of the HC-Pro and a secondary motif on the HC-Pro (KLSC) interacts with the stylet. The key difference between the helper and capsid strategies is that in the helper strategy, the HC-Pro and virion can be picked up separately, with the effect that a given HC-Pro can transmit a virion from a completely different plant or even from a different leaf of the same host (Pirone & Blanc, 1996). This may have significant implications for the maintenance of genetic diversity in the viruses that use this strategy. To date, 26 aphid species have been shown to be capable of transmitting ZYMV (Katis et al., 2006), although with differing efficiencies. The two most efficient transmitters of ZYMV in laboratory and field tests have been shown to be Myzus persicae and Aphis gossypii, with 41% and 35% efficiencies, respectively (Castle et al., 1992). The aphid vector remains viruliferous for a very limited time period (approximately five hours at 21°C) after acquisition of the virus (Fereres et al., 1992), which suggests that aphids may not be directly involved in the long distance dissemination of ZYMV. This, in combination with the current worldwide distribution of ZYMV and the fact that there are no known reservoirs of ZYMV in temperate regions, raises the possibility that vertical transmission of ZYMV may be instrumental in the dissemination of this virus. Thus, the third goal of this dissertation is to assess the rate of seed transmission in ZYMV and to determine if vertically infected plants are capable of initiating horizontal infections.

Seed transmission within the Potyviridae is not uncommon, but how the virus enters the germ line is currently unknown. However, there is some evidence in Pea seed-borne mosaic virus (PSbMV) that the virus uses the suspensor as a mode of entry into the embryonic tissues.
Once fertilization has occurred, the zygote undergoes an asymmetrical cell division, resulting in a small apical cell, which will become the embryo, and a larger basal cell (commonly called the suspensor) (Wang & Maule, 1994). The function of the suspensor is to provide nutrients for the growing embryo from the endosperm. During the early stages of seed development in pea, the suspensor appears to be anchored close to the micropyle (a tiny opening in the ovule through which the pollen tube enters) and to maintain close contact with the endosperm wall (Wang & Maule, 1994) (Fig. 1-1). It is believed that the virus moves from the maternal cells in the micropyle to the endospermic cytoplasm and embryonic suspensor, from which it invades the embryo (Roberts et al., 2003).

Genomic organization and protein function

The ZYMV genome is ~9,600 nt long, with a viral encoded protein (VPg) covalently linked to the 5′ end and a polyadenylated 3′ end. The spatial arrangement is typical of the Potyviridae, and protein functions are listed in genomic order. P1 encodes a proteinase and, along with the third protein (P3), is the least conserved region in the viral genome. In addition, P1 has been shown to enhance amplification and movement of the virus (Urcuqui-Inchima et al., 2001). Due to the low conservation of sequence identity between potyviruses, it is believed P1 may be involved in host-virus interactions (Shukla et al., 1991). The HC-Pro is required for aphid transmission, and has proteinase activity that is responsible for cleaving the HC-P3 junction (Shukla et al., 1991; Urcuqui-Inchima et al., 2001). The HC-Pro is believed to be involved in viral amplification, synergism and symptom development, and is a suppressor of post-transcriptional gene silencing (PTGS), or RNA interference (RNAi) (Gal-on, 2007). It has been proposed that the HC-Pro may also function to aid the entry and exit of the virus into and out of the host vascular system (Urcuqui-Inchima et al., 2001).
The P3 protein, as a result of the lack of sequence homology, is not well characterized (Shukla et al., 1991; Urcuqui-Inchima et al., 2001), which may suggest a virus-specific function (Shukla et al., 1991). However, it has been suggested that this protein may play a role in both virus amplification and plant pathogenicity (Urcuqui-Inchima et al., 2001). It is believed that the P3-6K1 complex may encode a pathogenicity determinant (Urcuqui-Inchima et al., 2001). The cylindrical inclusion (CI) protein acts as an RNA helicase, unwinding the RNA duplex, and may also be involved in cell-to-cell movement of the virus (Shukla et al., 1991; Urcuqui-Inchima et al., 2001). The function of the 6K2 protein has not yet been established; however, mutated 6K2 genes have been introduced into the genome of another potyvirus, tobacco etch virus (TEV), and have been shown to be either detrimental or lethal to the virus. It has also been proposed that the 6K2 protein anchors the replication apparatus to ER-like membranes (Urcuqui-Inchima et al., 2001). The small nuclear inclusion protein (NIa) acts as a proteinase (Shukla et al., 1991; Urcuqui-Inchima et al., 2001), and it has been suggested that it may also possess a nuclear localization function (Urcuqui-Inchima et al., 2001). The VPg is believed to act as a primer for viral synthesis, as well as protecting the mRNA from attack by exonucleases (Shukla et al., 1991). The large nuclear inclusion protein (NIb) is the RNA-dependent polymerase for the virus. The coat (or capsid) protein (CP) is involved in encapsidation of the viral RNA, vector transmission (Shukla et al., 1991; Urcuqui-Inchima et al., 2001), regulation of viral RNA amplification, as well as cell-to-cell and systemic movement (Urcuqui-Inchima et al., 2001). It is believed that the CP may function in host specificity (Shukla et al., 1991).

Entry into the cell, translation and replication

For the virus to gain entry into the host cell, the cell wall must be physically penetrated; for ZYMV this occurs via the aphid stylet. Once the virus has gained entry into the cell, uncoating is bidirectional, occurring first and more rapidly from the 5′ end (with 70% of the viral RNA being uncovered within 3 minutes) and more slowly from the 3′ end (Wu & Shaw, 1996). As ZYMV is a positive-sense RNA virus, it is infectious once uncoated and can be directly translated. Although the ZYMV RNA lacks a cap structure at the 5′ end, this region is believed to contain two regulatory regions, which are thought to direct cap-independent translation (Niepel & Gallie, 1999) through interactions with the poly-A tail (Gallie, 2001). The VPg functions to repress translation of capped messengers by proteolysis of eIF4G (a factor necessary for translation of capped mRNAs) (Sachs et al., 1997). The eukaryotic translation machinery is heavily biased to express only the 5′ open reading frame. For the entire genome to be expressed, the genome therefore encodes a single open reading frame (ORF) that codes for a large polyprotein precursor, which is processed into 10 putative proteins by three virus-encoded proteases: the first protein (P1), the helper component protein (HC-Pro) and the small nuclear inclusion protein (NIa) (Gal-On, 2007). The proteases allow for two levels of regulation: first, through the rate of proteolysis and, second, through the efficiency of cleavage site recognition (Merits et al., 2002). Expression as a single ORF results in equimolar amounts of each protein, which is not always desirable, especially in the case of the polymerase. Thus potyviruses are thought to transport their excess replication proteins (NIa and NIb) to the nucleus, where they are subsequently sequestered (Restrepo et al., 1990). There are two stages of replication.
First, the positive strand is copied into a negative strand and, second, the negative strand is copied multiple times into positive strands. Once the parental viral RNA is translated, the replicase proteins are available. At this point the parental strand forms a replication complex with the newly synthesized viral proteins, and replication begins at the 3′ end of the parental RNA. Replication is believed to be primed by the VPg in both the negative and positive directions. Once formed, the negative strand serves as a template for positive strand formation. The association of the negative strand with several growing positive strands is called the intermediary complex, and free negative strands are not typically found in the cell (Astier et al., 2007). The 6K2 has been proposed to anchor the replication apparatus (Urcuqui-Inchima et al., 2001) to the replication site, which for the genus Potyvirus is believed to be associated with the endoplasmic reticulum (ER) (Martin et al., 1995). The ER is thought to form vesicles that protect the replication complex from host defense responses (Ahlquist et al., 2003).

Cell-to-cell and systemic movement

For infection to occur, the virus must be capable of moving both from cell to cell and from organ to organ. Any infection that is halted in the first infected cells, termed subliminal infection, will not result in systemic infection (Furusawa & Okuno, 1978). In potyviruses at least four proteins are involved in virus movement: the CP, the HC-Pro, the CI and the VPg. It is believed that the CP binds to the viral RNA and is involved in altering the size exclusion limit of the plasmodesmata (thin streams of cytoplasm that pass through the cell walls of adjacent plant cells and allow communication between them), thus facilitating cell-to-cell movement of the virus. This phenomenon is believed to be transient and follows the infection front (Heinlein et al., 1995; Oparka et al., 1997).
The HC-Pro is also thought to increase plasmodesmal permeability (Rojas et al., 1997), and the CI is believed to guide the CP-RNA complex to the plasmodesmata (Rodríguez-Cerezo et al., 1997). It is currently unknown how the VPg is involved in viral movement, but mutated VPgs in Turnip mosaic virus have been shown to reduce both cell-to-cell and systemic movement (Dunoyer et al., 2004). Although long distance movement of plant viruses has not been studied as extensively as cell-to-cell movement, it is clear that for the Potyviridae the CP is necessary for long distance spread within a plant. However, it has proved extremely difficult to tease apart the independent roles that this protein plays in long distance vs. localized spread of the virus. For systemic infection to occur, the virus must enter the vascular tissue. The virus moves from the mesophyll cells through a series of cell types (the perivascular parenchyma, the phloem parenchyma and the companion cells) and finally into the sieve tube elements, which are series of cells joined end-to-end that form a continuous tube through which carbon metabolites are transported from the “source” leaves to the “sink” immature leaves (Fig. 1-1).

Figure 1-1: Diagram from Principles of Plant Virology - Genome, Pathogenicity, Virus Ecology. © 2007, Science Publishers (English version)

Phylogeny

At least 25 strains of ZYMV have been identified (Desbiez & Lecoq, 1997). Phylogenies of ZYMV (based on the coat protein) indicate three clusters of isolates, exclusive of the more distant Singapore and Réunion Island isolates (Zhao et al., 2003). The first cluster, group I, includes the majority of the European isolates, as well as some Japanese and Chinese isolates, and a Californian strain. Group II isolates are all from Asia (South Korea, Taiwan, Hangzhou and Japan), while group III includes several Chinese isolates.
Of particular interest is that the members of group III differ from the other two clusters in terms of the symptoms that they cause. The group III viruses cause severe mosaic symptoms on the leaves, but not the fruits, whereas groups I and II induce severe symptoms on both the leaves and fruits (Zhao et al., 2003).

Cucurbita pepo ssp. texana

Cucurbita pepo ssp. texana (the Texas gourd, or free-living squash) is a monoecious, annual vine with indeterminate growth and reproduction. It is native to northern Mexico, Texas, and the states along the Mississippi River from southern Illinois southward. It is thought that this subspecies is either an early escape from cultivation or the wild progenitor of cultivated squashes (Decker & Wilson, 1987; Decker-Walters, 1990; Decker-Walters et al., 2002; Lira et al., 1995). It is cross-compatible with all cultivated squash and pumpkins, as well as annual Cucurbita taxa from Mexico (Arriaga et al., 2006). C. pepo is considered to be the optimal host for the maintenance of ZYMV (Gal-On, 2007).

Methods

Maximum likelihood tree building

In chapter two, we use 55 consensus sequences of the CP: six that we generated from samples obtained from our experimental fields in Pennsylvania, and 49 sequences from around the world that were deposited in GenBank. To determine the evolutionary relationships amongst these samples, we generated a maximum likelihood (ML) tree using the PAUP* package (Swofford, 2003). ML is a method in which a hypothesis about evolutionary history is evaluated in terms of the probability that the proposed model of evolution and the hypothesized tree would give rise to the observed set of sequences (Page & Holmes, 2007). ML methods are generally considered to be more accurate than other tree building methods. However, there are disadvantages to this method.
It is very computationally intensive and is highly dependent on the model of evolution (Huelsenbeck, 1995). Therefore, to determine which model of evolution best fit our data for inferring the tree, we used MODELTEST, a program that compares 56 models of nucleotide substitution and selects the best model based on the data (Posada & Crandall, 1998).

Minimum spanning tree building

Most traditional tree building methods require a fair amount of variance between the sequences in order to accurately reconstruct relationships (Huelsenbeck & Hillis, 1993), but the clonal data generated in chapter three displayed very little variance. In fact, ~90% of the sequences obtained were identical to the consensus and three of the twenty samples had no mutations whatsoever. Therefore, I opted to use a minimum spanning tree approach to determine the population structure of these sequences. The program I used, TCS, is based on a method developed by Templeton et al. (1992) that uses statistical parsimony to infer population level genealogies from samples with very low variance (Clement et al., 2000). After collapsing identical sequences into haplotypes, the program calculates the frequency of each haplotype. These frequencies are then used to estimate haplotype outgroup probabilities. An absolute distance matrix is calculated for all pairwise comparisons, and the probability of parsimony is calculated for these pairwise differences with a 95% probability cut-off. The number of mutational differences between a pair of sequences determines the number of mutational connections (steps) between them. These connections are then used to output the resulting minimum spanning tree or network (Clement et al., 2000).

BEAST

We used the BEAST package (Drummond & Rambaut, 2007) to ascertain the rate of nucleotide substitution per site, as well as the time to the most recent common ancestor (TMRCA) of the ZYMV CP sequences in chapter two.
Time structure is a requirement for this analysis, and I was only able to acquire a year of collection for a subset of the CP sequences from GenBank. As a result, only 35 of the 55 sequences were used in this analysis. The BEAST program models the rate of molecular evolution for each branch of the phylogenetic tree using the Bayesian Markov chain Monte Carlo (MCMC) approach. This approach uses the Metropolis-Hastings algorithm to approximate the posterior distribution. It searches along a chain of hypothetical trees and provides an estimate of the probability that a given tree is correct (Lakner et al., 2008).

Sanger sequencing

I used Sanger sequencing to generate the clonal data in chapters three and five, as well as the consensus sequences in chapter two. After PCR amplification and purification of the sample(s) of interest, reverse strand synthesis is performed on these copies, starting from a known primer sequence located upstream of the desired sequence, in a mixture of deoxynucleotides (dNTPs) and dideoxynucleotides (ddNTPs). The dNTPs are the standard A, C, G and T building blocks of DNA, and the ddNTPs are modified nucleotides that lack the hydroxyl group at the 3′ carbon of the sugar, preventing a phosphodiester bond from forming with the phosphate group of another dNTP or ddNTP. The polymerization reaction is terminated when a ddNTP is incorporated instead of a dNTP; therefore, the mixture of both types of nucleotides causes extension to be terminated at random positions in a non-reversible fashion, resulting in molecules of different lengths. After denaturing and clean up, the molecules are sorted by molecular weight using capillary electrophoresis, and the fluorescent label attached to the ddNTP is read out sequentially in the order created by the sorting step (Kircher & Kelso, 2010).
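
The chain-termination logic described above can be illustrated with a short, idealized Python sketch. It is not a model of any real sequencing chemistry or instrument: the function names and the example template are hypothetical, and the sketch assumes that exactly one terminated fragment is recovered for every possible length, whereas a real reaction yields many copies of each length at random.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesize_strand(template):
    """Return the strand built against the template, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def terminated_fragments(new_strand):
    """Idealized reaction: one fragment stops at every possible position.

    Each fragment is a prefix of the new strand; its final base stands in
    for the fluorescently labeled ddNTP that halted extension.
    """
    return [new_strand[:length] for length in range(1, len(new_strand) + 1)]


def read_by_size(fragments):
    """Sort fragments by length (the electrophoresis step) and read each
    fragment's terminal, labeled base in that order."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))


template = "ATGGCCTTAGCA"  # hypothetical template; any A/C/G/T string works
fragments = terminated_fragments(synthesize_strand(template))
random.Random(0).shuffle(fragments)  # fragments are unordered before sorting
recovered = read_by_size(fragments)
```

Under these assumptions, `recovered` equals the newly synthesized strand, which is why sorting by size followed by reading the terminal label is sufficient to infer the sequence.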

Illumina sequencing

As cloning-free DNA amplification is possible through high throughput sequencing technologies such as Illumina/Solexa, in chapter four I decided to undertake a deep sequencing approach on the samples generated in chapter two, for two reasons. The first reason is that aphid transmission of ZYMV involves more than one protein (the HC-Pro and the CP). The second reason is that the level of coverage obtained with cloning and Sanger sequencing was fairly low (we averaged 35X per sample in chapter three). Illumina sequencing parallelizes the sequencing process, with the result that millions of reads can be produced at one time (Morozova & Marra, 2008). During library preparation, two different adapters are added to the 3′ and 5′ ends of each molecule. On the surface of the flow cell (the solid surface of the sequencer), there are two populations of immobilized oligonucleotides that are complementary to the two different single-stranded adapter ends of the sequencing library. These hybridize to the single-stranded DNA fragments, thus attaching them to the flow cell. The molecule is then bent over and hybridized to a complementary adapter, thus creating a “bridge” that serves as the template for complementary strands. Bridge amplification is the process of bending and reverse synthesis, whereby reverse strand synthesis starts from the hybridized portion, such that the new strand is covalently bound to the flow cell. When the new strand bends over and attaches to another short nucleic acid sequence complementary to the second adapter sequence attached to the free end of the strand, it is then used to synthesize a second covalently bound reverse strand, and so on. Once the amplification step is completed, the flow cell will contain ~40 million clusters, each of which contains ~1000 clonal copies of a single template molecule.
The process uses a sequencing-by-synthesis concept that is similar to the Sanger sequencing process: the incorporation reaction is halted after each base, the label of the incorporated base is read, and the sequencing reaction then continues with the incorporation of the next base. Illumina uses reversible terminators carrying removable fluorescent molecules, together with DNA polymerases that incorporate these terminators into the chain. The terminators are fluorescently labeled with a different color for each base, so that the sequence is inferred as the color is read at each nucleotide step (Kircher & Kelso, 2010).

Chapter 2

Rapid evolutionary dynamics of Zucchini yellow mosaic virus

Abstract

Zucchini yellow mosaic virus (ZYMV) is an economically important virus of cucurbit crops. However, little is known about the rate at which this virus has evolved within members of the family Cucurbitaceae, or the timescale of its epidemiological history. Herein, we present the first analysis of the evolutionary dynamics of ZYMV. Using a Bayesian coalescent approach we show that the coat protein of ZYMV has evolved at a mean rate of 5.0 × 10−4 nucleotide substitutions per site, per year. Notably, this rate is equivalent to those observed in animal RNA viruses. Using the same approach we show that the lineages of ZYMV sampled here have an ancestry that dates back no more than 800 years, suggesting that human activities have played a central role in the dispersal of ZYMV. Finally, an analysis of phylogeographical structure provides strong evidence for the in situ evolution of ZYMV within individual countries.

Introduction

Zucchini yellow mosaic virus (ZYMV), first isolated in 1973 and described in 1981 (Lisa et al., 1981), is the cause of one of the most economically important diseases of the family Cucurbitaceae, naturally infecting plants in more than 50 countries (Desbiez & Lecoq, 1997).
Symptoms include yellowing, stunting, leaf deformations, and misshapen and discoloured fruits, which often renders the fruits unmarketable, drastically reducing agricultural yields (Blua & Perring, 1989; Desbiez & Lecoq, 1997; Gal-On, 2007). Although ZYMV is widespread, few viral reservoirs have been identified, particularly in temperate regions (Desbiez & Lecoq, 1997). ZYMV is a single-stranded, positive-sense RNA virus of the family Potyviridae. The primary mode of transmission is via aphids in a non-persistent manner. Although 10 aphid species have been reported as vectors (Katis et al., 2006; Lisa et al., 1981), a wider range of potential aphid vectors has been identified under experimental conditions (Blackman & Eastop, 2000; Katis et al., 2006). While aphid transmission is undoubtedly the main route of spread for ZYMV, infrequent seed transmission has also been proposed (Robinson et al., 1993; Schrijnwerkers et al., 1991), the epidemiological importance of which is uncertain (Johansen et al., 1994). ZYMV has a genome of 9593 nt arranged as a single open reading frame encoding a polyprotein precursor that is processed into 10 putative proteins (Gal-On, 2007). Of these, the coat protein (CP) is involved in the encapsidation of viral RNA, vector transmission (Shukla et al., 1991; Urcuqui-Inchima et al., 2001), the regulation of viral RNA amplification, and cell-to-cell and systemic movement (Urcuqui-Inchima et al., 2001). Transmission occurs as a result of the interaction between the aphid stylet, the CP and the HC-Pro protein (Pirone & Blanc, 1996), such that some mutations in CP and HC-Pro disrupt viral transmission (Gal-On, 2007; Pirone & Blanc, 1996; Shukla et al., 1991; Urcuqui-Inchima et al., 2001). The CP is also extensively used as a tool to infer the phylogenetic relationships among viral isolates (Rybicki & Shukla, 1992; Shukla et al., 1991).
A variety of studies have explored the extent and structure of genetic diversity in ZYMV, particularly within a biogeographical context. Analysis of a 250 nt fragment of 160 viral isolates sampled from 23 geographical areas revealed two major groups of ZYMV, denoted A and B, with the former divided into three clusters (Desbiez et al., 2002). A subsequent analysis of the CP revealed three main groups of isolates with differing geographical distributions (Zhao et al., 2003). Group I included the majority of European isolates, as well as some from China and Japan, and a single Californian isolate. Group II was exclusively composed of viruses from Asia, while group III included several Chinese isolates. Notably, while group I and II isolates resulted in mosaic symptoms on leaves and fruit distortion, group III viruses did not cause symptoms on the fruit, but induced severe mosaic symptoms on the leaves (Zhao et al., 2003). More phylogenetically distant ZYMV isolates were observed in Singapore and Réunion (and other islands in the Indian Ocean representing group B of Desbiez et al., 2002), which likely reflects their biogeographical separation (Gal-On, 2007; Zhao et al., 2003). More localized phylogeographical studies have revealed that viruses can diffuse within specific localities, such as Central Europe (Glasa & Pittnerova, 2006; Glasa et al., 2007; Tobias & Palkovics, 2003), perhaps mediated by the local spread of aphids. However, isolates sampled from adjoining locations are not always related (Pfosser & Baumann, 2002), suggesting that biogeographical structure may, to some extent, be determined by the international trading of infected seeds (Desbiez et al., 2002; Tobias & Palkovics, 2003). There has also been considerable interest in using sequence data from plant RNA viruses to infer evolutionary dynamics.
Although a combination of intrinsically high rates of mutation, rapid replication and large population sizes is thought to provide RNA viruses with abundant genetic variation, some plant RNA viruses appear more genetically stable than their animal counterparts (Garcia-Arenal et al., 2001, 2003). This could be due to a combination of intrinsically lower rates of mutation (Malpica et al., 2002) and a reduced fixation rate of advantageous non-synonymous mutations because of weaker immune selection (Garcia-Arenal et al., 2001). Similarly, genetic bottlenecks play a major role in structuring genetic diversity during both systemic infection (French & Stenger, 2003; Li & Roossinck, 2004; Sacristan et al., 2003) and horizontal transmission by aphids (Ali et al., 2006). Despite the agricultural importance of ZYMV, there has been little work documenting either the rate of molecular evolution of this virus or the age of the sampled genetic diversity, reflected in the time to the most recent common ancestor (TMRCA). However, this information is central to understanding the evolutionary dynamics of plant RNA viruses in general, and particularly whether they exhibit reduced rates of evolutionary change, which in turn may have major implications for their ability to emerge in new host species. Cucurbita pepo ssp. texana is an annual monoecious vine that is native to northern Mexico, Texas, and the lower Mississippi River drainage area. It is thought to be either the wild progenitor of the cultivated squashes (C. pepo ssp. pepo) or an early escape from cultivation (Decker & Wilson, 1987; Decker-Walters, 1990; Decker-Walters et al., 2002; Lira et al., 1995).

Methods

ZYMV infection of plants collected during the 2006 growing season was determined immunologically (DAS-ELISA test kit; Agdia). Leaf tissue from infected plants was then homogenized in liquid nitrogen and RNA extracted using a Qiagen RNeasy Plant Mini kit.
First-strand cDNA was synthesized from the extracted RNA using the Superscript III First-Strand kit (Invitrogen). The target cDNA was then amplified directly via PCR and sequenced. The CP-specific primers used for the cDNA, PCR and sequencing steps were: forward, 5′-AAGATTGGCACGCTA-3′; reverse, 5′-CGGTAAATATTAGAATTAGCTCG-3′. All sequences generated here have been submitted to GenBank and assigned accession numbers EU371645–EU371650. The six ZYMV CP sequences newly acquired here were combined with 49 collected from GenBank (accession numbers available from the authors on request), producing a total dataset of 55 CP sequences, 815 nt in length. To determine the evolutionary relationships among all 55 sequences we employed the maximum-likelihood (ML) method available within the PAUP* package (Swofford, 2003). The best-fit model of nucleotide substitution was determined by MODELTEST (Posada & Crandall, 1998) as TIM+I+Γ4, and this was used as the basis for tree bisection-reconnection branch-swapping (parameter values available from the authors on request). A bootstrap resampling approach (1000 replications), employing the ML substitution model, was used to assess the support for individual nodes. To determine the strength of phylogenetic clustering by country of virus isolation we employed a parsimony character mapping approach (Carrington et al., 2005). Each ZYMV sequence was therefore assigned a character state reflecting its country (or continent) of origin. Given the ML phylogeny for these sequences, the minimum number of state changes needed to produce the observed distribution of country character states was estimated using parsimony (excluding ambiguous changes). To determine the expected number of changes under the null hypothesis of complete mixing among countries, the states of all isolates were randomized 1000 times.
The difference between the mean number of observed and expected state changes indicates the level of geographical isolation, with statistical significance assessed by comparing the total number of observed state changes to the number expected under random mixing. All analyses were performed using PAUP* (Swofford, 2003). The rate of nucleotide substitution per site, as well as the TMRCA of the ZYMV CP sequences, was estimated using the Bayesian Markov chain Monte Carlo approach implemented in the BEAST package (Drummond & Rambaut, 2007). This approach analyses the distribution of tip times on millions of plausible sampled phylogenies, so that estimates are set within a rigorous statistical framework. As this analysis requires time-structured data, where the date of sampling of each isolate is known, it was restricted to a subset of 35 CP sequences for which the year of sampling was available, representing a 22-year period from 1984 to 2006. In the case of eight Chinese viruses, sampling dates were only known to the nearest two possible years. To account for this uncertainty, analyses were repeated using the different sampling times available. We also compared the demographical models of a constant population size and exponential population growth, employing both strict and relaxed (uncorrelated lognormal) molecular clocks. Bayes factors were used to determine the best supported model. Because the TIM+I+Γ4 substitution model is unavailable in the BEAST package, the closely related GTR+I+Γ4 model was used in its place. The extent of statistical uncertainty in parameter estimates is reflected in the 95% highest probability density values.
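
As an illustration of how a 95% highest probability density (HPD) interval can be read from MCMC output, the following Python sketch finds the narrowest interval that contains a given fraction of the sampled values. It is a generic approximation written for this explanation, not the routine used by BEAST; the function name `hpd_interval` and the toy sample values are hypothetical.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Return the narrowest (low, high) interval holding `mass` of the samples.

    Assumes a unimodal posterior, for which the narrowest interval
    approximates the highest-density region.
    """
    values = sorted(samples)
    n = len(values)
    if n == 0:
        raise ValueError("at least one sample is required")
    # Number of samples the interval must contain; the small offset guards
    # against floating-point noise in mass * n.
    k = max(1, math.ceil(mass * n - 1e-9))
    best = None
    for start in range(n - k + 1):
        low, high = values[start], values[start + k - 1]
        if best is None or (high - low) < (best[1] - best[0]):
            best = (low, high)
    return best


# Toy, skewed set of sampled rates (not real data from this study).
toy_rates = [4.1e-4, 4.6e-4, 4.8e-4, 5.0e-4, 5.1e-4,
             5.3e-4, 5.6e-4, 6.0e-4, 6.9e-4, 1.5e-3]
low, high = hpd_interval(toy_rates, mass=0.9)
```

With a skewed sample such as this one, the interval excludes the long upper tail rather than trimming both ends equally, which is the main difference between an HPD interval and an equal-tailed interval.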
Finally, site-specific selection pressures in the 55 CP dataset were estimated as the ratio of non-synonymous (dN) to synonymous (dS) substitutions per site (ratio dN/dS) using both the single likelihood ancestor counting (SLAC) and fixed effects likelihood (FEL) methods, available at the Datamonkey facility (Kosakovsky Pond & Frost, 2005).

Results and Discussion

In accord with other studies of the phylogeography of ZYMV, distinct clusters of viral isolates are apparent in the ML tree of 55 CP sequences (Fig. 2-1). These clusters represent: (i) a large group of isolates sampled from a variety of locations in Asia (China, Japan, Korea and Taiwan), Europe and the Middle-East (Austria, Germany, Israel, Italy, Hungary and Slovenia), and USA, and previously denoted as groups I and II; (ii) China (previously denoted group III); and (iii) Singapore and Réunion Island (previously unclassified). We found no compelling evidence for the existence of group II isolates (from Asia), as these fell within the phylogenetic diversity of group I viruses, and suggest that those isolates from Singapore and Réunion Island are so phylogenetically distinct that they should be assigned to their own group. A number of inferences can be made from this spatial pattern. First, the greatest level of genetic diversity, including the deepest phylogenetic split, is seen in Asia (particularly China), including the presence of one clade of viruses that has only been observed (to date) in China. Although this is compatible with the lineages of ZYMV sampled here having an origin in Asia, this will need to be confirmed with a larger sample of isolates. Second, other than a virus sampled in Florida in 1984, all other USA isolates, sampled between 1992 and 2006 and including those newly obtained from Pennsylvania, have a single common ancestor (Fig. 2-1).
Although the sample size is small, this suggests that there has been some in situ evolution of ZYMV in the USA since this time, without the importation of new viral material. Our parsimony analysis of geographical structure also revealed a strongly significant clustering by country of origin compared with that expected by chance alone (P<0.001). A similarly strong clustering was observed by continent (Americas, Asia, Europe and the Middle-East, Indian Ocean; P<0.001). Hence, although ZYMV is able to cross geographical boundaries, as indicated by the many countries represented within groups I/II, such gene flow is not sufficiently frequent to eradicate geographical structure. More generally, this strong spatial clustering suggests that there is little vertical transmission of ZYMV through cultivated cucurbits, because commercial seeds of cultivated species are likely to be frequently transported across national borders.

Figure 2-1: ML tree of 55 ZYMV CP sequences. For viruses where the year of sampling is available, these dates are given in parentheses. Those viruses sampled as part of this study are shaded grey. The group nomenclature depicted represents that previously proposed for ZYMV (Zhao et al., 2003). The tree is drawn to a scale of 0.05 nt substitutions per site and bootstrap values (>90%) are shown next to the relevant nodes. The tree is mid-point rooted for clarity only.

The best supported evolutionary model for the CP of ZYMV under our Bayesian coalescent analysis was that of exponential population growth under a relaxed molecular clock (Table 2-1). Under this model the mean rate of evolutionary change for ZYMV was 5.0 × 10−4 nucleotide substitutions per site, per year.
Similar rates were obtained under different demographical and molecular clock models, incorporating the different possible sampling times for those viruses where the exact year of sampling was unknown, and using a range of prior values for the substitution rate, indicating that they are robust (results available from the authors on request). This high evolutionary rate falls within the normal range observed in RNA viruses, most of which represent animal RNA viruses (Jenkins et al., 2002; Hanada et al., 2004). As such, we find no evidence that ZYMV evolves any slower than animal RNA viruses that are subject to the same error-prone replication.

Table 2-1: Bayesian estimates of population dynamic and evolutionary parameters of the CP gene of ZYMV. HPD, Highest probability density (95 %).

Although repeated population bottlenecks undoubtedly influence the genetic structure of viral populations in the short-term (Li & Roossinck, 2004), they will have no effect on long-term evolutionary rates if most substitutions are selectively neutral. Similarly, although a weaker immune response against plant RNA viruses will reduce the rate at which some non-synonymous mutations accumulate (García-Arenal et al., 2001), the fact that these normally constitute a minor fraction of the total number of nucleotide substitutions means that they are unlikely to have a major impact on long-term evolutionary rates. In support of this we found no evidence for positive selection acting on the CP of ZYMV using either the SLAC or FEL methods; the predominant evolutionary pressure was that of negative (purifying) selection, with a mean dN/dS of 0.108 and 106 of 271 codons negatively selected under the SLAC method. This agrees with previous studies of the CPs of plant RNA viruses, which indicate that they are subject to relatively strong purifying selection (Chare & Holmes, 2004).
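
The distinction between synonymous and non-synonymous changes that underlies dN/dS can be made concrete with a small Python sketch. This is a deliberately crude count of codon differences between two aligned sequences, not the SLAC or FEL method and not a dN/dS estimate (which additionally normalizes by the numbers of synonymous and non-synonymous sites and works on a phylogeny). The function name `classify_differences` and the example sequences are hypothetical; the sketch assumes upper-case DNA, the standard genetic code, and an in-frame, gap-free alignment.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_differences(seq_a, seq_b):
    """Count single-nucleotide codon differences by their coding effect."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal in length")
    counts = {"synonymous": 0, "nonsynonymous": 0, "skipped": 0}
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diff == 0:
            continue
        if n_diff > 1 or codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            # Multiple hits or ambiguous bases need path averaging; skip here.
            counts["skipped"] += 1
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


# Made-up sequences: GCT -> GCC keeps alanine, AAA -> AGA swaps lysine for arginine.
example = classify_differences("ATGGCTAAA", "ATGGCCAGA")
```

A dN/dS value well below 1, such as the 0.108 reported here, indicates that non-synonymous changes of the second kind accumulate far more slowly per available site than synonymous changes of the first kind.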
Further, the lack of positive selection suggests that experimental passage has not had a major impact on our analyses. Although the rapid evolutionary rates observed here for ZYMV will need to be verified for a wider range of plant RNA viruses, the implication from this work is that mutational and replicatory dynamics are similar across a broad range of RNA viruses. Such high rates of evolutionary change also lead to a recent TMRCA for the isolates of ZYMV analysed here (Table 2-1). Although there is a relatively large date range because of the inherent sampling error on this analysis (119–771 years), these dates clearly indicate that the spread of this virus has been recent. Indeed, these dates broadly coincide with important ecological changes that may have assisted the spread of ZYMV, including (i) an increase in the number of hectares of worldwide cucurbit cultivation; (ii) the cultivation of cucurbits in novel areas with few wild Cucurbitaceae, facilitating viral transfer from a non-cucurbitaceous plant to the cultivated cucurbits (as observed in a contemporary setting; Perring et al., 1992), and (iii) the cultivation, in close proximity, of cucurbit crops with diverse origins, which allowed the virus to jump to new genera of the family Cucurbitaceae. Overall, our study highlights the utility of gene sequence data to reveal key aspects of the epidemiological history of plant RNA viruses.

Chapter 3

Rapid turnover of intra-host genetic diversity in Zucchini yellow mosaic virus

Abstract

Genetic diversity in RNA viruses is shaped by a variety of evolutionary processes, including the bottlenecks that may occur at inter-host transmission. However, how these processes structure genetic variation at the scale of individual hosts is only partly understood.
We obtained intra-host sequence data for the coat protein (CP) gene of Zucchini yellow mosaic virus (ZYMV) from two horizontally transmitted populations (one via aphid, the other without), with multiple samples from individual plants. We show that although mutations are generated relatively frequently within infected plants, attaining similar levels of genetic diversity to that seen in some animal RNA viruses (mean intra-sample diversity of 0.02%), most mutations are likely to be transient, deleterious, and purged rapidly. We also observed more population structure in the aphid transmitted viral population, including the same mutations in multiple clones, the presence of a sub-lineage, and evidence for the short-term complementation of defective genomes.

Introduction

Determining the extent and structure of genetic variation in RNA viruses is central to understanding the mechanisms that shape their evolution. The high levels of genetic diversity that characterize many RNA viruses have been linked to their ability to adapt rapidly to changing environments including new host species (Holmes, 2009; Jerzak et al., 2008; Woolhouse et al., 2001), and to evade mechanisms of host resistance (Feuer et al., 1999; Lech et al., 1996). Most estimates of the rate of molecular evolution in animal RNA viruses fall within approximately one order of magnitude of a mean rate of 1 × 10−3 nucleotide substitutions per site, per year (subs/site/year; Duffy et al., 2008). In contrast, it has previously been suggested that plant RNA viruses are characterized by lower rates of evolutionary change, in some cases by several orders of magnitude (Blok et al., 1987; Fraile et al., 1997; Kim et al., 2005; Marco and Aranda, 2005; Rodríguez Cerezo et al., 1991).
This major difference in evolutionary dynamics has been attributed to intrinsically lower mutation rates, weaker immune-mediated positive selection, and the frequent occurrence of population bottlenecks (García-Arenal et al., 2001, 2003). However, more recent analyses using longitudinally sampled gene sequence data have resulted in substitution rate estimates in accord with those previously observed in animal RNA viruses, at least in the short term (Fargette et al., 2008; Gibbs et al., 2008, 2010; Pagán and Holmes, 2010). As a case in point, we previously reported a mean evolutionary rate of 5 × 10^-4 subs/site/year for the coat protein (CP) of Zucchini yellow mosaic virus (ZYMV) (Simmons et al., 2008). Most studies of genetic diversity in plant viruses have been conducted at the inter-host level. However, if plant RNA viruses do evolve as rapidly as suggested by the analysis of epidemiological-scale sequence data then we would also expect them to exhibit measurable genetic diversity at the intra-host scale. Those studies undertaken to date have found varying levels of intra-host variation. Turturo et al. (2005) observed limited (<0.1%) intra-host genetic diversity in Grapevine leafroll-associated virus, while Jridi et al. (2006) noted that the nucleotide diversity of Plum pox virus measured over 13 years in a Prunus tree ranged from 0 to 2.4%. Rather higher levels of intra-host diversity were observed in Banana mild mosaic virus, with divergence levels of more than 15% in a third of the sequences obtained (Teycheney et al., 2005). Determining the extent and patterns of intra-host genetic diversity in plant RNA viruses is central to revealing the fundamental processes of viral evolution. Large-scale population bottlenecks are thought to result in effective population sizes for RNA viruses that are several orders of magnitude lower than census population numbers (García-Arenal et al., 2001).
Indeed, population bottlenecks have been documented during aphid transmission in Cucumber mosaic virus (Ali et al., 2006; Betancourt et al., 2008) and Potato virus Y (Moury et al., 2007). Systemic bottlenecks (that occur as the virus moves from cell-to-cell and tissue-to-tissue) may reduce effective population sizes even further (French and Stenger, 2003; Sacristán et al., 2003; Li and Roossinck, 2004; Miyashita and Kishino, 2010). In these circumstances genetic drift is predicted to play a major role in the substitution dynamics of mutant alleles. However, little is known about the frequency and impact of population bottlenecks in natural virus populations (Li and Roossinck, 2004). As an exception, the extent of genetic diversity in Citrus tristeza virus transmitted via aphids was reduced by an order of magnitude compared to that found in the sweet orange (Citrus sinensis) host (Nolasco et al., 2008). ZYMV was first isolated in 1973 in Italy, and since this time the virus has been found in more than 50 countries as a naturally occurring infection of the Cucurbitaceae (Desbiez and Lecoq, 1997; Desbiez et al., 2002). Viral symptoms include a distinctive yellow mottling in the leaves, stunting of the plant, and severe deformities in the fruits and leaves (Desbiez and Lecoq, 1997; Gal-On, 2007). Production of cucurbits in the United States is valued at approximately $1.5 billion per annum (Cantliffe et al., 2007), and as ZYMV infection can reduce agricultural yields by up to 94% (Blua and Perring, 1989), it is one of the most economically significant agricultural pathogens in cultivated cucurbits (squash, melon and cucumber). ZYMV is a member of the Potyviridae family of positive-sense, single-stranded encapsidated RNA viruses. The ∼9.5 kb viral genome encodes a single polyprotein precursor that is cleaved into ten putative proteins (Gal-On, 2007).
Transmission occurs primarily via aphids in a non-persistent manner (Lisa et al., 1981) and, to date, 26 aphid species have been shown to transmit ZYMV (Katis et al., 2006). The viral coat protein (CP) is multifunctional and involved in cell-to-cell and systemic movement, the regulation of viral RNA amplification (Urcuqui-Inchima et al., 2001), encapsidation of the RNA, vector transmission (Urcuqui-Inchima et al., 2001; Shukla et al., 1991), and perhaps host specificity (Shukla et al., 1991). ZYMV transmission is the result of an interaction between the stylet of the aphid, the helper component protein (HC-Pro), and the conserved DAG (Asp-Ala-Gly) region of the CP (Pirone and Blanc, 1996). The highly variable N-terminus region of the CP is exposed on the surface of the coat protein and is thought to contain virus-specific epitopes. The core region and C-terminus are more conserved, although the last ten amino acids of the C-terminus may be exposed on the viral surface (Gal-On, 2007). To obtain a better understanding of the patterns and processes of plant virus evolution at the scale of individual hosts, we analyzed the intra-host genetic diversity of ZYMV in Cucurbita pepo ssp. texana (a wild gourd) under two distinct modes of transmission: aphid-vectored and mechanically-inoculated (i.e. without aphids). The aphid-vectored experiment was conducted in an experimental field and resulted in two types of data: a time series as the virus evolves within the host over the course of the infection, and epidemiological-scale data following the spread of the virus as it was transmitted by aphids between hosts during the growing season. Because the number of transmission events is not controlled, these data recapitulate the natural spread of the virus. Using data of the first type the extent of the bottleneck imposed by the aphid during transmission can be estimated.
The second type of data allowed us to determine if mutations are transmitted between individuals or are generated anew within each individual. In the mechanical inoculation experiment, carried out in a greenhouse, ZYMV was serially passaged across four generations by mechanical inoculation. By comparing these data to those from the field study we were able to compare viral genetic diversity with and without the aphid-imposed bottleneck. To assess the effect of intra-host systemic bottlenecks, half of the fifth and eighth leaves from each mechanically-inoculated individual were used separately to inoculate another individual. This follows the design of two earlier studies which showed that the number of mutant clones present in a leaf decreased as a function of distance from the original inoculum source, presumably as a result of systemic bottlenecks (Li and Roossinck, 2004; Ali and Roossinck, 2010).

Methods

Field experiment

The field experiment was conducted at The Pennsylvania State University Agriculture Experiment Station at Rock Springs, Pennsylvania, USA, using Cucurbita pepo ssp. texana (a wild gourd). One 0.4-hectare field was laid out as a grid labeled A-L and 1–15, with approximately six meters between plants and 180 plants per field (Fig. 3-1a). In 2007 individual F-8 (located in the middle of one of the fields) was mechanically inoculated with ZYMV, the consensus sequence of which has been deposited in GenBank (accession number EU371649). When the inoculated plant, CF8, exhibited viral symptoms a leaf was collected. Plant labels are as follows: the first character, C, designates that the sample was collected from the field; the following letter and number (in this case F8) designate the plant coordinates within the field grid; and the number in parentheses denotes the order in which samples were collected from an individual plant.
As neighboring plants became infected, leaf samples were collected so that a leaf sample was gathered every two weeks from each individual that displayed disease symptoms from the onset of visible symptoms until the host plant died (approximately 9 weeks in total). Presence of ZYMV was detected immunologically using DAS-ELISA (Agdia, IN) and confirmed by polymerase chain reaction (PCR) and sequencing of the viral CP. The DAS-ELISA results not only confirmed the presence of ZYMV in the field plants but also revealed that only one of the plants (CE7) was co-infected with another potyvirus. Leaf samples from confirmed ZYMV-infected plants were stored at −80 °C. Although samples were collected from all of the infected plants in the field, eleven samples, representing six individual plants, were selected for sequencing. One plant (CF7) was sampled at three time points (August 4th, August 28th, September 13th); three plants were sampled at two time points (CE7 on September 13th and September 20th, CE8 on August 8th and September 13th, and CG7 on August 30th and September 20th); and clonal sequences were sampled only once from two plants (CF8 and CG6; Table 1).

Greenhouse experiment

Two individual plants were mechanically inoculated in January of 2008 with a ZYMV sample taken from the first diseased individual from the 2007 season (CF8). The mechanical inoculations were performed in the greenhouse using carborundum powder (500 gm). The infectious tissue was prepared from infected plant tissue diluted in a phosphate buffer (0.1 M Na2H/KH2PO4 buffer) in a 1:3 ratio. The carborundum powder was dusted on the surface of the leaf, and the inoculum was then applied with a pestle to the leaf surface.
When the plants displayed disease symptoms and exhibited at least an additional eight leaves of growth from the inoculation site (typically 4–5 weeks), half of the fifth and eighth leaves (distance from the first inoculated leaf) were each used separately to inoculate another individual and so on through four generations (Fig. 3-1b). The infection rate of the mechanical inoculations was 100%. The other half of each leaf was stored at −80 °C. We generated clonal sequence data from nine samples representing one transmission chain. In summary, clones were generated from the fifth and eighth leaves of individual A, the fifth and eighth leaves of individual C (which was infected from the fifth leaf of A), the fifth leaf of individual G (which was infected from the fifth leaf of C), the fifth and eighth leaves of individual H (which was infected from the eighth leaf of C), and the fifth leaf of individual O (which was infected from the fifth leaf of G). In addition, we sequenced one sample (fifth leaf of K) from the third generation from the eighth leaf of A.

Figure 3-1: Experimental design of the current study. (a) Field experiments. The schematic shows the position of the field plants relative to each other. Plant labels are as follows: The first digit C designates that the sample was collected from the field, the next two digits designate the plant coordinates within the field grid, and the number in parentheses denotes the number of samples collected from an individual plant. The boxed images that occur between the sampled field plants are of Aphis gossypii (cotton aphid), which serves to indicate that the spread of infection in the field occurred naturally (i.e. was aphid vectored). (b) Greenhouse experiments. The first field infected plant was used to infect plant A, the fifth leaf of which was used to infect C. The fifth leaf of C was used to infect G and the eighth leaf to infect H.
The fifth leaf of G was used to infect O, and K was infected from the third generation from the eighth leaf of A.

RNA isolation, PCR analysis, cloning and sequencing

RNA was isolated from frozen leaf samples using the RNeasy® Plant Mini Kit (Qiagen, CA). First-strand cDNA was synthesized from the extracted RNA following the protocol provided by the supplier using the SuperScript™ III First-Strand kit (Invitrogen, CA). The target cDNA was then amplified directly via PCR using Phusion® High-Fidelity PCR Master Mix (Finnzyme, MA). Although we used a high-fidelity polymerase to reduce the number of ‘mutations’ introduced during the experimental procedure, it is impossible to fully eliminate RT-introduced errors (see Results Section). Prior to cloning with the TOPO® TA Cloning® Kit (Invitrogen, CA), each sample was purified using the QIAquick PCR Purification Kit (Qiagen, CA) and an A overhang was added to each sample. Before submitting samples for sequencing at The Pennsylvania State University Nucleic Acid Facility, each sample was purified with the QIAprep Spin Miniprep Kit (Qiagen, CA). The CP-specific primers used for the cDNA and PCR steps were: forward: AAGTGAATTGGCACGCTA; reverse: CGGTAAATATTAGAATTACGTCG. To ensure that mutations were valid each clone was sequenced in forward and reverse and manually aligned. Any mutations occurring in one direction only were discarded. T7 forward and M13 reverse primers were used for clone sequencing. All sequences generated here have been submitted to GenBank and assigned accession numbers HM768168–HM768204.

Sequence analysis

All ZYMV sequences were manually aligned using Se-Al (2.0a11; kindly provided by Andrew Rambaut, University of Edinburgh) and trimmed to cover the coat protein region: from the CP start codon until the stop codon, for a total of 849 nucleotides (nt).
Counts of the number of mutations in each sample were undertaken manually, while pairwise genetic distances were estimated using MEGA (version 3) (Kumar et al., 2004). Because of the very small number of mutations observed we utilized uncorrected genetic (p) distances. As the number of cloned sequences varies across individual plants or time points we performed a chi-squared goodness of fit test (using R 2.10.1; 2008) to correct for the number of mutations compared to the number of sequences. To estimate the number of nonsynonymous (dN) and synonymous substitutions (dS) per site (ratio dN/dS), itself a measure of selection pressure, we used the Single Likelihood Ancestor Counting (SLAC) algorithm employing the MG94 × HY85 3 × 4 substitution model in HyPhy (Kosakovsky Pond et al., 2005). Finally, minimum spanning trees for the field and greenhouse populations were estimated separately using the statistical parsimony approach available in the TCS 1.21 program (Clement et al., 2000).

Results

To determine the extent and structure of intra-host viral genetic diversity in ZYMV we sequenced clones from 20 viral samples representing both the greenhouse and field populations. In total, we obtained 706 clonal sequences, with an average of 35 sequences per leaf sample. Approximately 90% of the clones sequenced were identical to the consensus sequence. Pairwise genetic distances ranged from 0 to 0.11%, with an overall mean of 0.02% for the field and greenhouse populations combined (Table 3-1).

Table 3-1: Summary of the ZYMV CP sequences from each infected plant under aphid-vectored (field) and mechanically-inoculated (greenhouse) transmission.

Mutational spectrum in the field plants

We generated a total of 378 clones from 11 field samples. Of these, 329 had no mutations and therefore matched the consensus sequence generated from the first-infected field plant.
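As an illustration of the uncorrected (p) distance named in the sequence analysis methods, the following Python sketch computes the mean pairwise p-distance among a set of aligned clones. It is a minimal sketch rather than MEGA's implementation: the function names and the toy ten-nucleotide alignment are hypothetical, and gaps or ambiguous bases are not treated specially.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance: proportion of aligned sites that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def mean_pairwise_p_distance(clones: list[str]) -> float:
    """Average p-distance over all unordered pairs of clones."""
    pairs = list(combinations(clones, 2))
    if not pairs:
        raise ValueError("at least two clones are required")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment: three consensus clones and one single-site mutant.
toy_clones = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC"]
toy_mean = mean_pairwise_p_distance(toy_clones)
print(f"mean pairwise p-distance: {toy_mean:.3f}")
```

Because most clones in a sample match the consensus, most pairwise comparisons contribute a distance of zero, which is why mean intra-sample values can be very small even when several mutant clones are present.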
Clones from two of the individual plants, including the first inoculated plant – CF8 and CE7(2) – exhibited no mutations. Overall, there were a total of 47 mutated sequences and 23 different mutations, 18 of which were singletons (occurred in one sequence only). This represents a mutational frequency of 1.47 × 10^-4 mutations per nucleotide site. Ten of the mutations were synonymous; two sequences exhibited the same silent mutation, and 13 sequences from individual CG7 at time point 1 showed a change from a TAG stop codon to a TAA stop codon. There were 13 nonsynonymous mutations, three of which were found in multiple clones. Notably, one of these nonsynonymous mutations (TTG to TAG) resulted in a premature stop codon and was found in seven (19.4%) of the clones from plant CG7(1). A minimum spanning tree showing the structure of this genetic diversity is shown in Fig. 3-2a. Although most mutations are only one step away from the consensus, clear population structure was present in the form of three clones being two mutational steps away from the consensus, a number of mutations present in multiple clones, and in one case a mutant clone (at position 849) itself possessing a descendant mutation (at position 786). The latter is indicative of a distinct sub-lineage, although one that is only found at a single time-point in a single plant. Although we cannot exclude the possibility of a ZYMV infection other than our primary inoculant, given the low level of genetic diversity and the fact that ∼90% of the sequenced clones match the consensus this seems extremely unlikely. DAS-ELISA tests undertaken by Agdia revealed that only one of the samples, CE7, was co-infected with another virus, in this case Watermelon mosaic virus-2 (WMV-2). There appears to be no significant difference in mean pairwise genetic divergence, or mean dN/dS, between this sample and the other field samples (Table 3-1).
Previous work has suggested 36 of the 42 amino acids of the N-terminus of the CP can be altered with no apparent effect on the viral life-cycle and hence are highly variable (Gal-On, 2007). In our study, only five of the total of 23 mutations occurred in this region, three of which were nonsynonymous. However, when correcting for sequence length we observed no significant difference in the number of mutations between the N-terminus and the rest of the CP (p = 0.2618). We also observed no mutations in the conserved DAG region known to be involved in aphid transmission. Finally, the number of unique mutations did not differ significantly over time within individuals (CF7: p = 0.944; CE7: p = 0.0578; CG7: p = 0.345; CE8: p = 0.418). However, the total number of mutated sequences within an individual over time was significantly different for two individuals (CE7: p = 0.0339 and CG7: p = 0.0077 applying the same correction).

Mutational spectrum in the greenhouse plants

A total of 328 clones were generated from the nine greenhouse plants, 301 of which had no mutations and so matched the consensus sequence of the first-infected field plant. Only one individual plant, from the third generation, exhibited no mutations. There were a total of 24 mutated sequences and 18 different mutations, 17 of which were singletons, representing an error frequency of 8.7 × 10^-5 mutations/site. Seven of the mutations were synonymous, and 11 were nonsynonymous, one of the latter being found in seven clones. One stop codon mutation was found in one sequence. Notably, none of the mutations were the same between transmission events. Three of the 18 mutations were found in the highly variable N-terminus region of the CP, although we again observed no mutations in the conserved DAG region.
Finally, comparing the fifth and eighth leaves within a plant, we found that the number of mutations was the same between them in plant A, increased from one to five in plant C, and increased from two to four in plant H. Crucially, however, we identified no shared mutations between sequenced clones from the fifth and the eighth leaves, indicative of a rapid population turnover. Indeed, the minimum spanning tree of these data is striking in its marked lack of population structure, such that all the mutations are only one step away from the consensus (although one is present in seven clones; Fig. 3-2b).

Figure 3-2: Minimum spanning tree of the sequences collected here. (a) Field experiments. (b) Greenhouse experiments. The numbers along the branches represent the nucleotide position at which each mutation occurred. The number of clones with a particular mutation is one unless otherwise noted within the oval. Plants are labeled as in Fig. 3-1.

As the aphid vector was removed in the greenhouse experiment we might expect the extent of purifying selection to be stronger in the field than the greenhouse. However, we observed no marked difference in mean dN/dS ratios among these populations; a value of 0.54 (CI 95%: 0.23–0.84) was observed in the field compared to 0.66 (CI 95%: 0.34–1.13) in the greenhouse. The high dN/dS values (>1) observed in some individual samples likely reflect a large sampling error on the small number of mutations observed. Finally, it is notable that we observed no clear difference in the spatial distribution of mutations along the CP between the two experimental conditions (Fig. 3-3).

Figure 3-3: Spatial distribution of mutations in the CP gene from both the field and greenhouse experiments. The numbers below the horizontal line represent nucleotide positions.
Mutations introduced during the experimental procedure

The error rate for the reverse transcriptase (RT) enzyme used here is reported as 2.9 × 10^-5 mutations/site/replication (personal communication, Invitrogen). Given our sequenced target region of 849 nt, the expected number of mutations per cDNA copy of the CP gene is therefore 0.0246 (2.9 × 10^-5 mutations/site/replication × 849 sites × single round of replication). We cloned 706 of these cDNA copies, leading to an overall expectation of 17.37 mutations among our 706 clones. The overall error rate, including both the Phusion polymerase and RT error rates, is 0.0377 mutations per CP copy (calculated using the Phusion error rate provided by Finnzymes and the RT error rate given above). Accounting for both the RT enzyme and polymerase error rates, we would expect the total number of artefactual mutations to be ∼27. The actual number of mutations observed in our data was 71. Although it is clear that our data contain a number of artefactual mutations, as is likely to be true of any study of intra-host genetic variation in RNA viruses, many of the mutations observed here will be bona fide, especially as the reported error rate for RNA-dependent RNA polymerase is greater than that of RT (Drake et al., 1998). In addition, we used great caution when calling mutations and only counted those that were present in both the forward and reverse alignments, and in some cases sequenced both directions twice. Hence, our reported introduced RT error rate is likely to be conservative. As such, it is highly unlikely that mutations found in more than one clone (frequency >1) are artefactual, including the stop codon mutation in plant CG7(1).
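The expected number of artefactual mutations described above can be reproduced with a few lines of arithmetic. The Python sketch below uses only the figures quoted in the text; the variable names are illustrative, and the combined per-copy error rate of 0.0377 is taken as given because the underlying Phusion error rate is not stated here.

```python
# Figures quoted in the text above.
RT_ERROR_RATE = 2.9e-5           # mutations/site/replication (reverse transcriptase)
CP_LENGTH_NT = 849               # sequenced coat protein region, in nucleotides
N_CLONES = 706                   # total clones sequenced
COMBINED_ERRORS_PER_COPY = 0.0377  # RT plus PCR polymerase, as reported
OBSERVED_MUTATIONS = 71

# Expected RT-introduced mutations per cDNA copy (single round of replication).
rt_errors_per_copy = RT_ERROR_RATE * CP_LENGTH_NT

# Expected artefactual mutations across all clones.
expected_rt_only = rt_errors_per_copy * N_CLONES
expected_combined = COMBINED_ERRORS_PER_COPY * N_CLONES

# Mutations in excess of the combined artefact expectation.
excess = OBSERVED_MUTATIONS - expected_combined

print(f"RT errors per copy:        {rt_errors_per_copy:.4f}")
print(f"Expected from RT alone:    {expected_rt_only:.2f}")
print(f"Expected from RT + PCR:    {expected_combined:.1f}")
print(f"Observed minus expected:   {excess:.1f}")
```

Computed without intermediate rounding, the RT-only expectation comes out at about 17.38; the 17.37 quoted in the text follows from rounding the per-copy value to 0.0246 before multiplying by 706.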
Discussion

Although the level of intra-host diversity we report for ZYMV (mean=0.02%) is on average less than that recently observed in intra-host studies of animal influenza viruses using similar methodologies, there was considerable overlap among estimates and fewer clones were analyzed in this case (Hoelzer et al., 2010; Iqbal et al., 2009; Murcia et al., 2010). For example, a study of 2366 sequences of equine influenza virus resulted in a mean intra-host diversity of 0.04% (range 0.01–0.12% among samples) (Murcia et al., 2010). Hence, ZYMV appears to exhibit mutational dynamics broadly similar to those observed in some rapidly evolving animal RNA viruses, as expected given the intrinsically error-prone nature of replication with RNA-dependent RNA polymerase. The possibility of artificially induced mutations should therefore be explored for those plant RNA viruses in which far higher levels of intra-host genetic diversity are observed. It is also striking that most mutations in ZYMV are transient in nature, only being observed at a single sampling point. Indeed, we observed no mutations that were shared between time points from individual plants. Although a certain proportion of the mutations observed are clearly artefactual and an inherent outcome of the experimental procedures employed, particularly singleton mutations which should be treated with caution, our results are compatible with the notion that the majority of intra-host mutations in ZYMV are deleterious and removed by purifying selection between sampling times. The relatively high number of stop codon mutations observed supports this hypothesis, as does the marked difference in mean dN/dS values within (∼0.6; herein) and between (0.108; Simmons et al., 2008) hosts.
A similar turnover of apparently transient deleterious mutations has been observed in a number of animal RNA viruses (Holmes, 2003, 2009; Hoelzer et al., 2010; Murcia et al., 2010), is supported by experimental studies of fitness distributions in RNA viruses (Sanjuán et al., 2004), and may therefore be a common component of intra-host viral genetic diversity. Despite this, it is notable that some short-lived population structure was present in the field samples – manifest as clones that differed in multiple mutations from the consensus, the same mutations present in multiple clones, and at least one distinct viral sub-lineage – yet not so in the greenhouse experiment. It is therefore possible that transmission mode impacts the structure of viral genetic diversity, even at the scale of individual plants, although this is evidently an issue that needs to be reassessed with a far larger number of clones than generated here. Importantly, the discontinuity of mutations within individuals over time extends to transmission: no lineages were shared between individuals during aphid transmission. This suggests that the bottleneck imposed by the aphid is substantial, although it is also possible that our sample size is insufficient to sample minor lineages. As the aphid-imposed bottleneck is absent from the greenhouse experiment we might have expected to see more lineages transferred between hosts in this case. That this does not appear to be the case from the data generated here suggests that the intra- and inter-plant population bottlenecks are generally severe enough to remove most genetic variation. In addition, that the number of unique mutations did not increase during serial passaging in the greenhouse indicates that the aphid-imposed bottleneck is not the only factor restricting genetic diversity, although this will clearly need to be explored further using a larger number of serial passages.
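The caveat that the sample size may be insufficient to sample minor lineages can be made concrete with a simple binomial calculation. The Python sketch below is illustrative only: it assumes that clones are independent random draws from the intra-host population, with no cloning or amplification bias, and the function names and the example variant frequency of 1% are hypothetical. The figure of 35 clones is the average number of sequences per leaf sample reported in the Results.

```python
import math


def detection_probability(frequency: float, n_clones: int) -> float:
    """Probability that at least one of n_clones carries a variant
    present at the given frequency, assuming independent sampling."""
    return 1.0 - (1.0 - frequency) ** n_clones


def clones_needed(frequency: float, target: float = 0.95) -> int:
    """Smallest number of clones giving at least the target probability
    of sampling the variant once or more."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - frequency))


# Average clones per leaf sample in this chapter; 1% is a hypothetical frequency.
p_detect = detection_probability(frequency=0.01, n_clones=35)
n_required = clones_needed(frequency=0.01, target=0.95)

print(f"P(detect 1% variant with 35 clones) = {p_detect:.2f}")
print(f"Clones needed for 95% detection     = {n_required}")
```

Under these assumptions a lineage at 1% frequency would be missed in most samples of this size, so the absence of shared mutations between samples is compatible both with severe bottlenecks and with limited sampling depth.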
Irrespective of sample size, the existence of strong population bottlenecks means that genetic drift will play a major role in substitution dynamics. One of the most striking observations of our study was that seven clones sampled from one leaf at one time point from one field plant contained the same stop codon mutation. Such a high frequency of what is likely to be a deleterious mutation is suggestive of the action of transient complementation, although this will require future experimental verification. Indeed, that the stop codon mutation was not found at later time-points in this individual argues against both recurrent mutation and polymerase read-through as both would be expected to have longer-term effects. Complementation has previously been reported in experimental infections of plant viruses (Fraile et al., 2008; Osbourn et al., 1990). For example, a mutant Tobacco mosaic virus with a frameshift and premature stop codon mutation in the CP was fully complemented in transgenic plants that expressed the wild-type CP gene (Holt and Beachy, 1991). Complementation has also been documented during viral co-infections, including truncated CP mutants of Pepper huasteco virus that were complemented by coinfection with Taino tomato mottle virus (Guevara-González et al., 1999). Not only is viral co-infection a frequent occurrence in nature, but the use of transgenic squash is now commonplace in agricultural settings. Complementation in these circumstances could theoretically lead to the inhibition of gene silencing (Qu et al., 2003; Thomas et al., 2003), the correction of defects in movement (Callaway et al., 2004), and perhaps even the expansion of host range (Latham and Wilson, 2008; Spitsin et al., 1999). Given the threat that RNA viruses such as ZYMV pose to staple crop production worldwide, the frequency and consequences of complementation in natural populations of plant viruses clearly need to be investigated in greater detail.
Chapter 4
Deep sequencing reveals persistence of intra- and inter-host genetic diversity in natural and greenhouse populations of Zucchini yellow mosaic virus

Abstract

The genetic diversity in populations of RNA viruses is likely to be strongly modulated by their life-histories, including mode of transmission. However, how transmission mode shapes patterns of intra- and inter-host genetic diversity, particularly when acting in combination with de novo mutation, population bottlenecks, and the selection of advantageous mutations, is still poorly understood. To address these issues, we performed in-depth next generation sequencing of Zucchini yellow mosaic virus (ZYMV) in a wild gourd, Cucurbita pepo ssp. texana, under two conditions: aphid-vectored and mechanically inoculated, achieving an average coverage of ~9000X. We show that mutations persist during inter-host transmission events in both the aphid-vectored and mechanically inoculated populations, suggesting that the vector-imposed transmission bottleneck is not as extreme as previously supposed. Similarly, mutations were found to persist within individual hosts, arguing against strong systemic bottlenecks. Strikingly, mutations were seen to go to fixation in the aphid-vectored plants, suggestive of a major fitness advantage, but remained at low frequency in the mechanically inoculated plants. Overall, this study highlights the utility of next generation sequencing in providing high resolution data capable of revealing the nature of viral evolution, particularly as the full spectrum of genetic diversity within a population may not be uncovered without sequence coverage of at least 2,500X.

Introduction

Understanding the factors that generate and maintain genetic diversity is the central goal of evolutionary genetics.
Plant pathogenic RNA viruses are ideally suited for the study of the determinants of genetic variation because of their extremely high mutation rates, which result from the lack of error correction associated with replication by an RNA-dependent RNA polymerase, and their rapid replication (Duffy et al., 2008). This capacity to generate genetic diversity is central to the capacity of RNA viruses to break down host resistance mechanisms (Acosta-Leal et al., 2010; Feuer et al., 1999; Lech et al., 1996), to adapt to new niches (Roossinck, 1997), including new hosts (Jerzak et al., 2008), and for changes in virulence (Acosta-Leal et al., 2011). For any RNA virus, the extent and structure of the genetic variation that occurs within individual hosts is due to a combination of de novo mutation, genetic diversity generated through mixed infection, natural selection, and stochastic processes such as genetic drift and the population bottlenecks that occur both within and among hosts. However, the roles played by these differing processes in shaping intra-host genetic variation are uncertain. For example, given the extremely large census population sizes that plant RNA viruses can achieve (e.g., in Tobacco mosaic virus, TMV, this has been documented to reach 10^11–10^12 virions per infected leaf; García-Arenal et al., 2003), it might be expected that selection would act efficiently within hosts. However, several studies indicate that the effective population size (Ne) of RNA viruses in nature is several orders of magnitude lower than the census population number (García-Arenal et al., 2001; Hughes, 2009), and the duration of infection in a single host may be of insufficient length to enable natural selection to fix beneficial mutations. As such, stochastic processes may be more important determinants of genetic diversity at the intra-host level. Population bottlenecks may be particularly important in plant RNA viruses.
Such bottlenecks are thought to occur during two processes: between-host vector transmission and systemic movement within the plant. For example, the number of virions transmitted from mechanically infected squash plants to healthy plants via aphids (Aphis gossypii and Myzus persicae) has been estimated to be on average three virions for both aphid species (Ali et al., 2006), and even lower numbers have been observed in Cucumber mosaic virus (CMV) (Betancourt et al., 2008). Similar drastic population bottlenecks have been reported during systemic movement. For instance, estimates of the founding population in a new leaf after systemic movement during TMV infection ranged between two and 20 virions (Sacristán et al., 2003), and only four virions of Wheat streak mosaic virus appear to be involved in the invasion of new tillers of wheat (French & Stenger, 2003). Population bottlenecks have also been observed on a cellular level. For example, using Soil-borne wheat mosaic virus, Miyashita & Kishino (2010) determined the cell-to-cell bottleneck to be ~6 virions for the initial movement from the infected cell and ~5 virions in subsequent movements. Although these studies suggest that population bottlenecks are likely to have major effects on plant virus evolution, to date there has been no analysis of the impact of population bottlenecks using extremely high coverage data of viral genomes, particularly as produced through next generation sequencing. Due to its very high levels of coverage, next generation sequencing represents an excellent tool for detecting alleles present at low frequencies.
Therefore, to gain a deeper understanding of the extent of intra-host genetic diversity in plant RNA viruses and the processes that have generated this variation, we used deep sequencing techniques to analyze the extent of genetic variation, and particularly the effect of population bottlenecks, in Zucchini yellow mosaic virus (ZYMV) infecting its natural host Cucurbita pepo ssp. texana (a wild gourd). ZYMV is one of the most studied viruses of the family Potyviridae. The virus infects wild and agronomically important members of the plant family Cucurbitaceae (squash, melon and cucumber), causing symptoms that include yellowing and stunting of the plant, as well as severe leaf and fruit deformities (Desbiez & Lecoq, 1997). This emerging RNA virus attained worldwide distribution within two decades of its description (Lisa et al., 1981), and the importance of ZYMV as a crop pathogen is underscored by the fact that it has been shown to reduce agricultural yields by up to 94% (Blua & Perring, 1989). ZYMV has a single-stranded positive-sense RNA genome of approximately 9,600 nt, with a polyadenylated 3’ end and a virus-encoded protein (VPg) covalently linked to the 5’ end. A single open reading frame codes for a large polyprotein precursor that is processed into 10 putative proteins by three virally encoded proteases (P1, HC-Pro and NIa) (Gal-on, 2007). As is common given the compact genomes typical of RNA viruses, these proteins are multi-functional and as such are expected to be under fairly strong selective constraints (Holmes, 2003). Transmission of ZYMV primarily occurs via aphids in a non-persistent manner (Pfosser & Baumann, 2002; Urcuqui-Inchima et al., 2001), with 26 aphid species shown to be capable of transmitting the virus (Katis et al., 2006).
Viral transmission depends on two conserved regions of the HC-Pro: the KITC/KLSC region (which interacts with the aphid stylet) and the PTK region (which interacts with the conserved DAG region in the CP) (Urcuqui-Inchima et al., 2001). This has been termed the ‘helper strategy’, as the HC-Pro acts as a bridge between the CP and the aphid stylet, and it differs from the ‘capsid strategy’, whereby the capsid protein interacts directly with the aphid mouthparts (Pirone & Blanc, 1996). In addition, vertical transmission via seed has been shown to occur in Cucurbita pepo at low rates (1.6%; Simmons et al., 2011). To determine the extent and structure of genetic diversity in intra-host populations of ZYMV, and particularly how this diversity is likely to be shaped by population bottlenecks, we undertook deep sequencing of ZYMV populations infecting C. pepo ssp. texana under two modes of horizontal transmission: aphid-vectored and mechanically inoculated (i.e. without aphids). From the aphid-vectored experiment, we produced both epidemiological-scale data, from which we can determine the extent of the bottleneck imposed by the aphid during inter-host transmission, and data on intra-host genetic variation over the course of infection. As a new leaf sample was collected at each time point, we were able to determine not only the mutational spectrum maintained within individual plants over time, but also how intra-host viral genetic diversity is affected by bottlenecks during systemic movement. ZYMV was also mechanically inoculated across eight generations in a serial passaging experiment carried out in a greenhouse. Comparison of these data with those from the field study allowed us to analyze, uniquely, the evolution of viral genetic diversity with and without the aphid-imposed bottleneck.

Methods

Field experiment

The field experiment was conducted using C. pepo ssp.
texana at The Pennsylvania State University Agriculture Experiment Station at Rock Springs, Pennsylvania, USA. One 0.4-hectare field with 180 plants was laid out as a grid labeled A-L and 1-15, with approximately six meters between plants. In 2007, the plant situated in the middle of the field, F-8, was mechanically inoculated with ZYMV that we had isolated during a previous field season (the consensus sequence of the CP has been deposited in GenBank under accession number EU371649) (Simmons et al., 2008). Plants are labeled as follows: the first letter and number, for example F8, designate the plant coordinates within the field grid, and the number in parentheses denotes the order in which samples were collected from an individual plant. When the initially inoculated plant, F8, exhibited viral symptoms, a leaf was collected. As neighboring plants became infected, a leaf sample was collected on a weekly basis from each plant from the onset of visible symptoms until host death (~9 weeks). Presence of ZYMV in the leaf samples was detected immunologically using DAS-ELISA (Agdia, IN) and confirmed by RT-PCR, and the samples were subsequently stored at -80°C. Although samples were collected from all of the infected plants in the field, a subset of samples that were spatially related to F8 was selected, so that a total of sixteen samples representing six individual plants were used for next generation sequencing. This subset included one plant that was sampled at four time points: F8 (July 24th, August 8th, August 13th and August 28th); two plants sampled at three time points: F7 (August 30th, September 13th and September 20th) and G7 (August 30th, September 6th and September 20th); and three plants sampled at two time points: E7 (September 13th and September 20th), E8 (September 13th and September 20th), and G6 (September 20th and September 30th) (Fig 4-1).
Figure 4-1: Schematic representation of the field experimental design showing the spatial relationship between individual plants. The first two characters designate the plant coordinates within the field grid, and the number in parentheses denotes the number of samples collected from an individual plant. F8 (4) in the bottom right hand corner is the original inoculant. The arrows represent transmission events by aphids.

Greenhouse experiment

Two individual plants were mechanically inoculated in a greenhouse at The Pennsylvania State University in January of 2008 with a ZYMV sample taken from the first diseased individual from the 2007 season (F8). Inoculum was prepared from infected plant tissue diluted in a phosphate buffer (0.1 M Na2H/KH2PO4 buffer) in a 1:3 v/v ratio. Carborundum powder (500gm) was then rubbed on the surface of the leaf, and the inoculum subsequently applied to the leaf surface with a pestle. When the plants displayed disease symptoms and exhibited at least an additional eight leaves of growth from the inoculation site (typically 4 to 5 weeks), half of the fifth leaf of each plant was used to inoculate another individual. This process was repeated up to the eighth generation. The other half of each leaf was stored at -80°C, and subsequently used for sequencing.

RNA isolation and RT-PCR

RNA was isolated from frozen leaf samples using the RNeasy® Plant Mini Kit (Qiagen, CA). First-strand cDNA was synthesized from the extracted RNA using five genome-specific primers, which were designed based on the reference strain, with the SuperScript™ III First-Strand Synthesis kit (Invitrogen, CA) following the protocol provided by the supplier. The target cDNA was then amplified directly via PCR using Phusion® High-Fidelity PCR Master Mix (Finnzyme, MA). PCR was conducted following the manufacturer’s protocols with HF PCR buffer and 5 µl of first-strand product in a 50 µl total reaction volume.
The following PCR conditions were used: 98°C for 1 min, followed by 20 cycles of 98°C for 10 s, 58°C for 20 s and 72°C for 1 min 20 s, with a final 5 min extension at 72°C and a hold at 4°C. The five primer pairs were designed with 560, 19, 141, and 151 bp overlaps between adjacent amplicons across the genome. Primers (positions refer to the reference strain NC_003224.1):
ZYMC_F1 (nt 27-50): AGAAATCAACGAACAAGCAGACGA
ZYMC_R1 (nt 2199-2219): GCAACATCCATCAACGAAGGC
ZYMC_F2 (nt 1689-1708): GGGGGAAAGAGGGTATCATT
ZYMC_R2 (nt 3956-3973): CCAAGGGGCGTGTAGGTT
ZYMC_F3 (nt 3956-3974): TGAACCTACACGCCCCTTG
ZYMC_R3 (nt 6070-6088): TGCCCTTGCCCATAAAATA
ZYMC_F4 (nt 5947-5970): GACGAAAGCACCCATACAGACATA
ZYMC_R4 (nt 7808-7826): TGACCGACCCACCAATCCT
ZYMV_F5-2 (nt 5947-5970): GGTGGTTGGGATAGATTGATGAG
ZYMV_R5-2 (nt 9515-9534): TCCGACAGGACTACGGCATT
These primers allowed for coverage of 99% of the viral genome. Amplicon lengths were 2192, 2314, 2134, 1879, and 1859 bp. The five PCR products per viral sample were pooled and gel extracted using the Zymoclean Gel Recovery kit (Zymo Research, CA) to remove background amplification product, after which the purified samples were quantified using a Qubit fluorometer (Invitrogen, CA).

Illumina Library Construction

Once quantified, samples were sheared using NEBNext dsDNA Fragmentase (New England Biolabs, MA) following the manufacturer’s recommendations. Approximately 300 ng of pooled product was used for shearing to a desired size range of 100-300 bp. The reaction was terminated by adding 5 µl of cold 0.5 M EDTA and cleaned with the DNA Clean & Concentrator-5 kit (Zymo Research, CA). The fragmented samples were used for library construction following the protocol of Mortazavi et al. (2008), starting at blunt-end repair. The following exceptions were made: each cleaning step was conducted using the DNA Clean & Concentrator kit, and blunt-end repair and ligation reactions were conducted using reagents from NEB.
Samples were amplified and indexes were incorporated following standard indexing protocols: 98°C for 1 min, followed by a total of 18 cycles of 98°C for 10 s, 65°C for 30 s and 72°C for 30 s, with a final 5 min extension at 72°C and a hold at 4°C. Samples were then PCR purified, quantified, and diluted to 10 nM concentration for Illumina sequencing. DNA sequencing was performed at the University of Southern California on an Illumina GAIIx with multiplexing (12 samples per lane for the first two lanes and eight on the last lane) for a total of three lanes on the same flow cell.

Read accuracy and the identification of variant sites

We used a standard workflow for identification of variant sites on Galaxy (Goecks et al., 2010; Blankenberg et al., 2010) that can be accessed at http://usegalaxy.org/heteroplasmy. We altered the workflow by increasing the maximum edit distance to seven, and the minimum allowable coverage to a highly conservative value of 500X. The reads were mapped to the ZYMV reference genome (NC_003224.1) using the Burrows-Wheeler alignment mapper (Li & Durbin, 2009), and subsequently transformed and filtered using Galaxy tools. Strand bias was accounted for by requiring that any variant found at a site be validated on both strands in order to be considered a true variant. To control for mapping quality we excluded any sites that had a quality score of less than 30 as compared with the Illumina-supplied control (PhiX 174). According to Illumina, with this quality score the inferred base call accuracy is 99.9%. To control for methodological errors introduced as a result of the experimental procedures we took an extremely conservative approach, excluding (i) any mutations that were present at a frequency of less than 1% and (ii) any sites where the coverage was less than 500X.
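The filtering criteria described above (coverage of at least 500X, allele frequency of at least 1%, and support on both strands) can be illustrated with a minimal Python sketch. This is not the Galaxy workflow used in the study: the `SiteCounts` structure, the function name `call_variants` and the example counts are hypothetical, and the sketch assumes that base calls with a quality score below 30 have already been removed upstream.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

MIN_COVERAGE = 500      # minimum coverage per site (from the text)
MIN_FREQUENCY = 0.01    # minimum allele frequency (from the text)


@dataclass
class SiteCounts:
    """Hypothetical per-site base counts, split by read strand."""
    position: int
    forward: Dict[str, int]
    reverse: Dict[str, int]


def call_variants(sites: List[SiteCounts]) -> List[Tuple[int, str, float]]:
    """Return (position, base, frequency) for non-major alleles passing all filters."""
    variants = []
    for site in sites:
        # Combine forward and reverse counts for each base.
        totals = {b: site.forward.get(b, 0) + site.reverse.get(b, 0) for b in "ACGT"}
        coverage = sum(totals.values())
        if coverage < MIN_COVERAGE:
            continue  # site excluded: coverage below 500X
        major = max(totals, key=totals.get)
        for base, count in totals.items():
            if base == major or count == 0:
                continue
            frequency = count / coverage
            # Strand check: the allele must be seen on both strands.
            on_both_strands = site.forward.get(base, 0) > 0 and site.reverse.get(base, 0) > 0
            if frequency >= MIN_FREQUENCY and on_both_strands:
                variants.append((site.position, base, frequency))
    return variants


if __name__ == "__main__":
    # Illustrative counts only; these are not data from the study.
    example = [SiteCounts(7317, {"C": 480, "T": 12}, {"C": 500, "T": 8})]
    print(call_variants(example))
```

Keeping the thresholds as named constants makes it easy to see how relaxing either cutoff would change the set of reported variants.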
All nucleotide sequences generated here have been submitted to GenBank and assigned accession numbers JN192405 to JN192428.

Mutation analysis

The consensus ZYMV sequence for each sample was manually aligned to the ZYMV reference strain using Se-Al (2.0a11; kindly provided by Andrew Rambaut, University of Edinburgh). Counts of the number of mutations in each sample were undertaken manually. To determine if there was an association between fluctuation in mutation frequency and time point, we performed a chi-square test of independence using the statistical package SPSS 13.0 (SPSS Inc., Chicago, USA). To test if the number of mutations per individual sample was significantly different between samples, we used both a two-sample t-test and a Mann-Whitney test in the R software package (R 2.12.1; 2011), with which we also analyzed the spatial distribution of mutations using a Mann-Whitney U test. Given that the frequency of ‘minor’ alleles (i.e. those < 50% in the population) was known, we used a binomial distribution (in the software package R) to determine the probability of uncovering that minor allele at increasing levels of coverage (number of reads). In addition, we resampled our Illumina data at progressively lower levels of coverage in order to determine how lower coverage levels can bias the discovery of true minor alleles. We ran a simulation (in R) in which we re-sampled our Illumina data at each base position in the genome. As we had excluded any variants that occurred at less than 1% frequency, we calculated the minimum threshold as the 99th percentile of a binomial distribution. Not only did this analysis indicate the coverage level at which all variants would be uncovered, but it also revealed how at low levels of coverage the discovery of true minor alleles tends to be biased.

Results

Genome Coverage

Twenty-four samples were successfully sequenced: 16 aphid-vectored and eight mechanically inoculated.
The proportion of the genome that was sequenced ranged from 76.5% to 95.4% with an average of 83.7% (Table 4-1). After filtering, coverage ranged from 2,243 to 12,507 reads per individual sample, with the average coverage being 9,236. Given the high levels of coverage attained for a relatively large number of samples, we used these data as a baseline to run simulations in which we re-sampled the Illumina reads using a 1% cutoff to determine the coverage level at which all variants in the population would be revealed. This analysis suggested that at very low levels of coverage (10X or less) variants tend to be oversampled, leading to an overestimate of the number of mutations. In contrast, coverage levels from 25X to 1000X lead to an underestimation of the mutational spectrum. For all 24 samples, saturation, defined as the ability to sample all variants in that population, was reached at ~2,500X coverage (Fig 4-2). Since we averaged 9,236X coverage, we are confident that we have successfully uncovered the majority of the variants in our populations.

Table 4-1: Summary of genome coverage statistics of Illumina sequence data. For the field samples, the first two characters designate the plant coordinates within the field grid, and the number in parentheses denotes the number of samples collected from an individual plant.
1 Total number of reads obtained for each sample
2 Total number of reads that mapped to the ZYMV reference strain allowing for a mismatch of 7
3 Level of coverage obtained before filtering
4 Level of coverage obtained after filtering
5 Proportion of the genome for which we obtained coverage after filtering

Figure 4-2: Representative simulation of the resampling of Illumina reads to estimate the effect of coverage on the detection threshold of minor alleles. All samples used in the simulations produced comparable results, and all variants were uncovered by ~2,500X coverage.
The dashed red line indicates the number of variants within the sample, so that points above the line indicate oversampling and those below undersampling.

To further determine the power of our Illumina coverage to detect low frequency alleles, we performed a bootstrap resampling analysis using the minor alleles found in the coat protein gene (CP). This region was chosen as we had previously cloned and Sanger sequenced the CP of these samples (Simmons et al., 2011). Six CP mutations were uncovered in the current study, none of which were detected in the previous study. Four were sampled only once, ranging from 1.7% to 4.6% in allele frequency (nucleotide positions 8547 (1.7%), 8631 (4.6%), 9009 (3.4%) and 9358 (4.3%)). The other two were found in more than one sample, with allele frequencies averaging 9.7% (8715) and 4.3% (9355). Accordingly, we found the level of coverage needed to detect at least one read for each allele frequency to be: 1.7% ~250X; 2.1% ~200X; 3.4% ~150X; 4.3% and 4.6% ~100X; and 9.7% ~50X (Fig 4-3). Hence, attaining sufficient coverage is extremely important for detecting low frequency variants in a population, and for obtaining an accurate characterization of genetic diversity in viral populations.

Figure 4-3: Effect of coverage on the probability of detecting the ZYMV coat protein alleles uncovered in this study. Probabilities were estimated assuming a binomial distribution. Each color represents a different mutation, labeled with its position in the genome and allele frequency in parentheses.

Frequency and pattern of nucleotide variants

A total of 93 variants (i.e. polymorphic mutations at a frequency >1%) were found across the data set as a whole: 66 were found in a single sample, and 27 were found in at least two samples. Two of the 27 were found only within the same individual, and 24 were found in more than one individual, suggesting that these mutations were spread between hosts.
Among the full set of variants, 31/66 and 3/27 were nonsynonymous mutations (Table 4-2). In addition, 48/66 and 8/27 were unique to the field samples; 18/66 and 1/27 were unique to the greenhouse samples; and 18/27 were shared between both experimental conditions. A chi-square test in which all 93 variants were considered indicated that the overall number of mutations generated in the field was significantly higher than in the greenhouse (χ² = 29.17; P < 1 x 10^-4). However, when the number of mutations per individual was compared between the greenhouse and the field, no significant difference was detected (P = 0.494 by two-sample t-test; P = 0.346 by Mann-Whitney test).

Table 4-2: Summary of the 27 variants found in more than one sample. The numbers at each nucleotide position indicate how many samples within each group have a given mutation.

Strikingly, among the 93 variants detected, 11 were present at every time point in an individual, or in all eight of the greenhouse samples. This indicates that these mutations are maintained during the course of infection and hence through any intra-host bottlenecks that have occurred. These comprised: two mutations in F8 (2205 and 7688); five mutations in F7 (1704, 7317, 7821, 7824 and 9463); six mutations in E7 (2205, 7317, 7688, 7821, 7824 and 9533); four mutations in G7 (6294, 7688, 8508 and 8517); two mutations in E8 (2205 and 7688); and one mutation in G6 (7688). For these conserved variants, we used a chi-square test of independence to determine whether they experienced changes in allele frequency over time. Interestingly, we observed an association between time point and allele frequency in all cases (P < 1 x 10^-4), such that allele frequencies have increased rapidly through time, as expected if they are selectively advantageous.
In addition, seven mutations were present in at least one time point in every single field plant (nt positions: 1254, 2205, 4626, 7317, 7688, 7821 and 9463), indicating that these variants are maintained during inter-host transmission and hence through any population bottlenecks that have occurred at these times. All but one of these mutations (1254) were also found in at least one greenhouse sample. In the greenhouse samples, three mutations were shared across serial passages (1701, 1704 and 7688). The average number of mutations between our samples and the reference strain NC_003224.1 (a Taiwanese isolate) is 464 (5.78%), which is compatible with previous studies using consensus sequences (Simmons et al., 2008). We also compared the variants found in this study to the other 24 full-length ZYMV genomes published on GenBank. Of the 66 mutations observed in a single sample in this study, 25 were present in the GenBank sequences, as were 16 of the 27 polymorphic variants, including all seven mutations that were found to be present at least once in every individual, suggesting that these variants may exist as polymorphic sites in natural populations.

Variation in Allele Frequency

Of the 27 variants present in more than one sample, we found two cases (positions 2205 and 7317) in which the originally ‘minor’ allele (defined as initially less than 50% frequency; in these cases 35.2% and 1.6%, respectively) approached fixation in later samples (both allele frequencies reached 98%; Fig 4-4, a and b). In addition, these fixation events occurred rapidly, taking only 59 days in both cases. Interestingly, these same two nucleotide positions are present as polymorphic sites in the 25 ZYMV full genome sequences on GenBank (2205 in 11/25 and 7317 in 6/25), suggesting that these sites may be polymorphic in nature and may confer a selective advantage in some host genotypes (or host species) or under some environmental conditions.
The latter idea is supported by the fact that allele frequency changes appear to be affected by environmental conditions. For instance, the minor allele at nucleotide position 7317 increased to fixation in both time and space in the field. However, after an initial decrease, the frequency remained constant through transmission events in the greenhouse, where environmental conditions are relatively constant: in the first greenhouse sample the allele frequency was 19%, it dropped to 2% in the subsequent host, and it averaged 2.5% in the remaining hosts. Although not as striking, a similar trend was observed at nucleotide position 7688 (data not shown). We also observed an additional two cases in which the minor variant increased as the virus spread in the field from the original inoculant, although it did not approach fixation (nt 1254 and 9533) (Fig 4-4, c & d). At position 1254 the frequency in the original inoculant is 2.6% and subsequently increases to 28.6%, while at position 9533 the frequency increases from 1.2% to 15%.

Figure 4-4: Variation in allele frequency over time and space of ZYMV variants at nucleotide positions 2205 (A), 7317 (B), 1254 (C) and 9533 (D). The 3D graphics show changes in allele frequency (y-axis) during within-host infection. The x-axis shows variation over time, or intra-host variation. The z-axis shows variation over space, or between-host variation. The data correspond to the field experiment.

Spatial Distribution of Mutations

We used a bootstrap method to infer whether mutations were spatially clustered across the genome compared to a null model, which assumed random mutation placement. Bootstrap distributions and null distributions were calculated for the index of dispersion statistic, and then compared using the Mann-Whitney U test (using R 2.12.1; 2011). Interestingly, field mutations showed evidence of significant spatial clustering (P < 1 x 10^-4).
In contrast, there was no significant spatial clustering of mutations in the greenhouse samples (P ~ 1) (Fig 4-5). We looked at the number of mutations per gene region and, using a chi-square goodness-of-fit test (in R), determined that the number of mutations per gene region was greater than would be expected by chance in only two regions: NIb in the field samples and HC-Pro in the greenhouse samples. We also found one region in the greenhouse samples (CI) in which the number of mutations was less than would be expected by chance alone, although these results are strongly dependent on the level of coverage attained. Despite the relatively high number of mutations observed, those genomic regions previously suggested to constitute conserved domains in ZYMV were also conserved in our analysis, indicating that mutations in these regions are strongly deleterious and removed rapidly within hosts. For instance, all of the regions known to be necessary for aphid transmission (the PTK and KLSC regions in the HC-Pro, and the DAG region in the CP) were conserved in our samples.

Figure 4-5: Distribution of mutations across the ZYMV genome under field and greenhouse conditions. The length of the ticks indicates the relative number of samples with that mutation.

Discussion

Although population bottlenecks are expected to be strong both within and between hosts, nearly 30% of the variants we detected within our viral populations were found in more than one sample, either within the same or a different plant. As such, the population bottlenecks that shape the evolution of plant RNA viruses may not be as severe as previously suggested, although this will clearly vary in a virus-specific manner.
Of equal importance was the observation that some of the initially ‘minor’ alleles rapidly went to fixation in the aphid-vectored plants, but remained at low frequency in the mechanically inoculated plants, suggesting that they are strongly selectively advantageous in the former environment. The dramatic increase in allele frequency for some of these alleles in the aphid-vectored plants (e.g. 1.6% to 98% at nucleotide position 7317) was observed in more than one plant. This result argues strongly for natural selection and against genetic drift as the main mechanism generating the differences between the allele frequencies in the greenhouse and field populations, as the latter process is expected to result in fixation events over much longer time-scales. The average time for fixation of a neutral mutation in a haploid population is Ne x generation time, which will generally equate to timescales measured in years, whereas the change in allele frequency recorded here has occurred over a time period of only two months. Also of interest in this context was the observation that regions known to be involved in aphid transmission were conserved in all of the samples analyzed in our study. Hence, the natural selection we observed is unlikely to be directly linked to transmission events, although it may be indirectly linked through host-virus or host-vector interactions rather than vector-virus interactions. Specifically, it is believed that compositional differences in saliva among aphid species may result in differential viral transmission (Pirone & Perry, 2002). There is also evidence that the virus may manipulate host factors to increase the plant’s attractiveness to potential vectors by modulating color changes associated with infection (Ajayi & Dewar, 1983) and olfactory cues in the form of volatile compounds (Ngumbi et al., 2007; Medina-Ortega et al., 2009; Mauck et al., 2010), as well as by altering the mechanisms involved in virus acquisition.
In addition, host factors may be involved in optimizing vector transmission. For example, in Cauliflower mosaic virus (CaMV), virus inclusion bodies have been shown to control aphid-mediated transmission (Espinoza et al., 1991; Khelifa et al., 2007). Although little is known about the specific mechanisms underlying these processes, it is possible that the differences in selection pressures found in the present study may be due to the absence of the aphid vector in the greenhouse experiment. This possibility notwithstanding, the effect of other environmental differences between the field and the greenhouse experiments on allele frequencies should also be investigated. For example, the greenhouse environment is relatively stress free, as the plants are watered regularly, maintained within a narrow range of temperatures, have ample room and light, and are sprayed regularly with insecticide to prevent herbivory. This is in direct contrast to our field plants, which are subjected to the vagaries of nature and experience a variety of biotic and abiotic stresses, such as drought, herbivory and competition. As the transmission events undertaken in the greenhouse represent a release from the aphid vector, and hence a release from the large population bottleneck imposed by aphid transmission, and the inoculum dose was large (half a leaf, which ensures inoculation at saturation), we might expect the amount of genetic diversity being transmitted between greenhouse plants to be significantly greater than in the field. It is therefore surprising that our results indicated that greater genetic diversity is transmitted in the field experiment. Indeed, an average of only 0.5-3.2 Potato virus Y virions are transmitted per aphid in in vitro experimental systems (Moury et al., 2007), with similar numbers reported in vivo (Betancourt et al., 2008).
However, these estimates do not consider the huge number of aphids that may be involved in transmitting the virus, and which could potentially overwhelm the population bottlenecks induced by single transmission events. In support of this, experiments using suction traps found that although aphid population size tends to fluctuate both in terms of year and location, very high population numbers can be achieved (Katis et al., 2006), with up to 40,000 aphids being counted in one location in one year (range 2,179 - 41,851). Similarly, studies have revealed up to four alatae and 400 apterous (non-winged) aphids per leaf per time point on C. pepo (zucchini) (Hooks et al., 1998). As the incidence of ZYMV has been shown to be correlated with total aphid numbers (Basky et al., 2001), the effect of aphid population size on the effective population size of viral populations in individual plants clearly needs to be examined in more detail. It is also possible that the lack of severe bottlenecks in this study may be due in part to the fact that helper-dependent transmission, such as occurs with ZYMV, may be less prone to severe bottlenecks than transmission where the virions interact directly with the aphid stylet. Specifically, the HC-Pro and virion do not have to be acquired simultaneously. As long as the helper protein is capable of interacting with the aphid stylet, it can assist in the transmission of virions acquired from other parts of the host or even from different hosts, thus ameliorating the effect of the population bottleneck. This is in direct contrast with viruses that interact directly with the vector (Pirone & Blanc, 1996). Thus, it is possible that multiple aphids transmitting the virus between hosts, as well as the fact that ZYMV is vector transmitted via the HC-Pro, maintained levels of genetic diversity in our study. The genetic resolution we have achieved in this study is clearly a reflection of the deep amplicon sequencing used here.
A previous study using some of the same samples, for which cloning and Sanger sequencing of the CP region was undertaken, revealed that no mutations were transmitted between individuals or within plants (Simmons et al., 2011), in marked contrast to the results obtained here. Our simulations revealed that a coverage level of ~2,500X is needed to reach saturation and detect all of the variants present in our populations (assuming a 1% cutoff). We also determined that detecting an allele that comprises ~10% of the population at least once requires approximately 50X coverage, and that detecting an allele present at 1.7% frequency at least once requires a minimum coverage of 250X. Given that in our previous study we averaged 35 clones per sample, it is not surprising that we were unable to uncover these mutations. More than two thirds of the mutations observed in this study were observed in a single sample only (66 out of 93). Thus, although there is some transmission of variants both inter- and intra-host, the majority of the mutations generated were not transmitted either inter- or intra-host. Whether this is the result of population bottlenecks restricting viral genetic diversity, purifying selection acting on the viral population, or some combination of both still needs to be determined. However, the majority of single nucleotide substitutions in RNA viruses are likely to be deleterious (Sanjuan et al., 2004). Hence, given that approximately half of these mutations (31/66) are nonsynonymous (compared to 3/27 mutations found in more than one sample), and that we previously estimated the mean dN/dS ratio among these populations to be ~0.6 (the coat protein region only) (Simmons et al., 2011), it is probable that many of the mutations that occurred in only one sample are also deleterious and will subsequently be purged from the population.
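The coverage requirements quoted in this discussion (roughly 50X for an allele at ~10% frequency and roughly 250X for an allele at 1.7%) are consistent with a simple binomial argument: if each read independently carries the allele with probability f, the chance of seeing it at least once in n reads is 1 - (1 - f)^n. The Python sketch below illustrates that calculation. It is not the R code used in the study, the function names are hypothetical, and the 99% detection target is an assumption chosen for illustration, since the exact threshold behind the quoted coverage values is not stated here.

```python
import math


def detection_probability(allele_frequency: float, coverage: int) -> float:
    """Probability that at least one of `coverage` reads carries the allele,
    assuming independent (binomial) sampling of reads."""
    return 1.0 - (1.0 - allele_frequency) ** coverage


def min_coverage(allele_frequency: float, target: float = 0.99) -> int:
    """Smallest coverage at which the detection probability reaches `target`.
    The default target of 0.99 is an illustrative assumption."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - allele_frequency))


if __name__ == "__main__":
    # Allele frequencies reported for the coat protein variants in this chapter.
    for f in (0.017, 0.034, 0.043, 0.046, 0.097):
        n = min_coverage(f)
        print(f"f = {f:.3f}: coverage {n}, P(detect) = {detection_probability(f, n):.4f}")
```

Under the same assumptions, about 35 reads (comparable to the 35 clones per sample of the earlier Sanger study) would detect a 1.7% allele less than half of the time, which agrees with the explanation given above for why those variants were previously missed.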
Overall, our study reveals that, although the majority of the mutations generated within viral populations may be deleterious, some mutations are clearly transmitted both within and among hosts and despite the presence of population bottlenecks. Hence, although stochastic processes must clearly play a role in structuring viral populations, these may be insufficient to negate the action of natural selection. This latter point is dramatically highlighted by the fact that we uncovered minor allele variants that approached fixation in time and space, strongly suggesting that they are selectively advantageous. These findings therefore attest to a complex pattern of changing genetic diversity in an emerging RNA virus, and will contribute to a more complete understanding of the dynamics of evolutionary change with implications for the management of emerging viral diseases.

Chapter 5

Experimental verification of seed transmission of Zucchini yellow mosaic virus

Abstract

Within two decades of its discovery, Zucchini yellow mosaic virus (ZYMV) achieved a global distribution. However, whether or not seed transmission occurs in this economically significant crop pathogen is controversial, and the relative impact of seed transmission on the epidemiology of ZYMV remains unclear. Using reverse transcription polymerase chain reaction, we observed a seed transmission rate of 1.6% in Cucurbita pepo subsp. texana and show that seed-infected C. pepo plants are capable of initiating horizontal ZYMV infections, both mechanically and via an aphid vector (Myzus persicae). We also provide evidence that ZYMV-infected seeds may act as effective viral reservoirs, partially accounting for the current geographic distribution of ZYMV. Finally, the observation that ZYMV infection of C.
pepo seeds results in virtually symptomless infection, coupled with our finding that an antibody test failed to detect vertically transmitted ZYMV in infected seed, highlights the urgent need to standardize current detection methods for seed infection.

Introduction

Since the discovery of Zucchini yellow mosaic virus (ZYMV) in Italy in 1973, and its subsequent description in 1981 (Lisa et al., 1981), this emerging RNA virus has spread rapidly and achieved an effectively global distribution (Debiez & Lecoq 1997). Although a number of explanations have been put forward to account for the widespread geographic distribution and persistence of this virus, including the international trading of infected fruit, plants, or seeds, as well as overwintering in alternative hosts and noncolonizer aphids, the mechanisms underlying the rapid dissemination and persistence of ZYMV remain unclear (Lecoq et al., 2003). ZYMV is a single-stranded positive-sense RNA virus of the family Potyviridae that can result in yellowing and stunting of the plant, as well as severe leaf and fruit deformities that can reduce yields by up to 94% (Blua & Perring, 1989). Given that cucurbit (squash, melon, and cucumber) production in the United States alone is estimated to be worth approximately $1.5 billion per year (Cantliffe et al., 2007), the economic significance of this crop pathogen is enormous. Understanding the epidemiology and evolution of ZYMV is therefore central to controlling this devastating crop disease. Viral transmission generally occurs in one of two ways: horizontally, which is the transmission of the virus between unrelated hosts, or vertically, which is the transmission of the virus from parent to offspring. ZYMV is horizontally transmitted in a nonpersistent manner by at least 26 aphid species (Katis et al., 2006).
Transmission occurs as a result of an interaction between the stylet of the aphid, the helper component protein (HC-Pro), and the conserved DAG (Asp-Ala-Gly) region of the coat protein (CP) (Pirone & Blanc, 1996). However, the current worldwide distribution of ZYMV is unlikely to have resulted from aphid transmission alone, particularly as the aphid vector remains viruliferous for a very limited time period (~5 h at 21°C) after acquisition of the virus (Fereres et al., 1992). Hence, it has been suggested that the long-distance spread of ZYMV may be the result of vertical transmission via infected seeds rather than horizontal transmission by aphids (Davies & Mizuki, 1986; Debiez & Lecoq, 1997; Fletcher et al., 2000; Lecoq et al., 2003; Schrijnwerkers et al., 1991; Tobias & Palkovics, 2003). Whether or not seed transmission of ZYMV occurs remains controversial. This controversy is due in part to the fact that the reported rates of seed transmission in cucurbits range from 0 to 18.9% (Davies & Mizuki, 1986; Debiez & Lecoq, 1997; Fletcher et al., 2000; Gleason, 1990; Lecoq et al., 2003; Muller et al., 2006; Riedle-Bauer et al., 2002; Robinson et al., 1993; Schrijnwerkers et al., 1991; Tobias & Palkovics, 2003). Accurately determining the rate of seed transmission of ZYMV is of fundamental importance for understanding the epidemiology of this major plant-pathogenic virus and for developing and implementing strategies to control it. Some of the reported variation in the estimates of seed transmission rates in ZYMV undoubtedly results from differences in detection methods. For instance, using an enzyme-linked immunosorbent assay (ELISA)-based method, Davis and Mizuki (1986) found 18% (246 of 1,299) of Cucurbita pepo (Black Beauty zucchini) seedlings to be infected with ZYMV. Similarly, Fletcher et al. (2000), using DAS-ELISA, observed seed transmission rates of 3.5% for ZYMV in C. maxima Duchesne (buttercup squash).
However, their results should be interpreted with caution because they also observed a 2% transmission rate of ZYMV in their controls (possibly as a result of virus particles remaining on the seed coat). Muller et al. (2006), using DAS-ELISA, detected ZYMV in two of 1,000 asymptomatic Cucumis sativus L. (cucumber), C. pepo L., and C. maxima Duchesne (pumpkin) seedlings that grew from seeds from infected plants, while ZYMV was detected in 1.4% (15 of 1,031) of seedlings of C. pepo var. styriaca (naked seed pumpkin mutant) using a combination of both DAS-ELISA and reverse transcription–polymerase chain reaction (RT-PCR) (Pirone & Blanc, 1996). More recently, Lecoq et al. (2003) mention unpublished data in which no seed transmission of ZYMV was observed in 70,000 seedlings from various Cucurbitaceae. Other studies suggest that there is only minimal, if any, seed transmission of ZYMV in Cucumis melo L. (melon) (Gleason, 1990), and that ZYMV transmission through seeds is probably of no epidemiological importance (Muller et al., 2006). Finally, interpretations of rates of seed transmission for ZYMV also differ. For instance, Robinson et al. (1993) found seed transmission rates of only 0.07% in various cucurbits and concluded that seed transmission does not occur in ZYMV, while Schrijnwerkers et al. (1991) found a seed transmission rate of 0.05% in C. pepo (zucchini) and concluded that seed transmission does occur in ZYMV. Similarly, Tobias and Palkovics (2003) reported symptomatic infections of <0.5% seeds of ZYMV-infected plants from C. pepo var. styriaca (hull-less seeded oil pumpkin seeds) and concluded that seed transmission does occur at very low rates. To determine what contribution seed transmission makes to the epidemiology of ZYMV, we used C. pepo subsp. texana (wild gourd) as a model system and measured the seed transmission rate of ZYMV by visual inspection, RT-PCR, and antibody tests (ImmunoStrips; Agdia, Elkhart, IN).
Seed transmission of ZYMV is only epidemiologically significant if vertically infected plants are capable of initiating additional infections via horizontal transmission. To test for horizontal transmission, we assayed the ability of the vertically infected plants to initiate infection via mechanical inoculation and tested for the ability of an aphid vector (Myzus persicae (Sulzer)) to nonpersistently transmit ZYMV from vertically infected plants to healthy seedlings.

Methods

Field experiment

We harvested approximately 6,000 seeds (count estimated by weight) at the end of the 2008 growing season from ZYMV-infected C. pepo subsp. texana plants growing in four experimental fields at The Pennsylvania State University Agricultural Research Farm at Rock Springs, PA. The 0.4-ha experimental fields were laid out with 180 plants per field with approximately 6 m between plants. A healthy texana plant that was mechanically inoculated with ZYMV was placed in the middle of each field to serve as a virus source, and the virus was subsequently spread to neighboring plants via aphids. The seeds were extracted in 4% hydrochloric acid and washed in a 10% bleach solution to ensure that any viral infection that occurred was not simply the result of virus on the seed coat, but rather the result of embryonic infection. The seeds were then germinated in flats in a greenhouse. At the third true leaf stage, ZYMV infection was determined visually, and if no symptoms were present the seedling was discarded. Based on visual symptoms showing only slight leaf deformations, two out of 3,195 plants had ZYMV, which was verified by RNA extraction, RT-PCR, sequencing, and cloning. In fact, the symptoms were so mild that they could have been easily overlooked or considered to be normal in appearance.
We subsequently pooled an additional 281 symptomless seedlings in groups of 10 for a total of 29 groups: 28 with 10 seedlings apiece and, as there was a final single seedling, one group consisting of that seedling alone. These 29 groups were tested for ZYMV via RT-PCR. When a group consisting of 10 seedlings tested positive for infection, this result was taken to mean that one of 10 plants was infected. As this interpretation could have underestimated the number of infected seedlings, our estimate of the seed transmission rate is conservative. Because we knew the proportion of samples that tested negative, we used both the binomial distribution and the Poisson distribution to estimate the probability that more than one seedling would test positive in the same sample. At the end of the 2009 season, we again collected fruits from field plants that had been naturally infected with ZYMV via aphid transmission. Although all of the plants displayed classic visible signs of ZYMV foliar infection such as deformed, stunted leaves with yellow mottling, the majority of the fruits showed no symptoms and appeared healthy. The seeds were extracted and cleaned as described above, and seeds from individual plants were pooled. The seeds were planted in flats in the greenhouse. At the third true leaf stage, a leaf tissue sample was collected from each of 2,336 seedlings and frozen at –80°C for analysis. Samples were pooled into batches of 10 for extraction, cDNA synthesis, and PCR. Two plants that tested positive by RT-PCR were also tested for ZYMV using ImmunoStrips as per the manufacturer’s protocol. The ZYMV ImmunoStrip is polyclonal and able to detect a number of isolates, including the CT, USDA, SJBCA, CA, IT, NY, FL, and Z18 strains.

RNA isolation, PCR analysis, cloning and sequencing

RNA was isolated from frozen leaf samples using the RNeasy Plant Mini Kit (Qiagen, Valencia, CA).
First-strand cDNA was synthesized from the extracted RNA using the Superscript III First-Strand kit (Invitrogen, Carlsbad, CA) as per the manufacturer’s protocol, and the target cDNA was then amplified directly via PCR using Phusion High-Fidelity PCR Master Mix (Finnzymes, Espoo, Finland; distributed by New England Biolabs, Ipswich, MA). PCR amplification was performed for 35 cycles (Step 1: 98°C for 1 min, Step 2: 98°C for 10 s, Step 3: 64°C for 20 s (minus 1°C every cycle), Step 4: 72°C for 40 s, Step 5: cycle to step 2 for 2 cycles, Step 6: 98°C for 10 s, Step 7: 62°C for 20 s, Step 8: 72°C for 40 s, Step 9: cycle to step 6 for 31 cycles) followed by a final extension for 5 min at 72°C. The coat protein (CP) specific primers used for the cDNA synthesis and PCR were: forward: AAGTGAATTGGCACGCTA; reverse: CGGTAAATATTAGAATTACGTCG. To verify that the PCR product was indeed ZYMV, four samples were submitted for sequencing at the Penn State Genomics Core Facility (The Pennsylvania State University, University Park, PA). Each sample was purified with the QIAprep Spin Miniprep Kit (Qiagen). Two samples were cloned using the TOPO TA Cloning Kit (Invitrogen), prior to which each sample was purified using the QIAquick PCR Purification Kit (Qiagen) and an A overhang was added to each sample. Approximately 40 clones were submitted from each sample for sequencing. T7 forward and M13 reverse primers were used for clone sequencing. To ensure that mutations were valid, each clone was sequenced in forward and reverse, and manually aligned in the Se-Al (2.0a11) package kindly provided by Andrew Rambaut (University of Edinburgh, UK). Any mutations occurring in only one direction were discarded. This resulted in 71 reliably sequenced clones. The sequences were then trimmed to cover the majority of the CP region: from the CP start codon to nucleotide (nt) 773. All sequences generated have been submitted to GenBank and assigned accession numbers (HQ543133 to HQ543139).
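The pooled RT-PCR testing described in the field experiment can be turned into a per-seedling infection estimate with a standard group-testing calculation based on the proportion of negative pools. The sketch below is one common way to do this and is not necessarily the exact calculation used in this chapter. The function names are hypothetical, and the example input of 234 pools (2,336 seedlings in batches of 10) is an assumption, since the text does not state the number of pools; the 36 positives are taken from the 2009 result reported in this chapter.

```python
import math


def _negative_fraction(positive_pools: int, total_pools: int) -> float:
    """Fraction of pools that tested negative, with basic input checks."""
    if total_pools <= 0 or not 0 <= positive_pools < total_pools:
        raise ValueError("need 0 <= positive_pools < total_pools")
    return 1.0 - positive_pools / total_pools


def pooled_prevalence_binomial(positive_pools: int, total_pools: int,
                               pool_size: int) -> float:
    """Per-seedling infection probability p, assuming each seedling is
    infected independently, so that P(pool negative) = (1 - p) ** pool_size."""
    negative = _negative_fraction(positive_pools, total_pools)
    return 1.0 - negative ** (1.0 / pool_size)


def pooled_prevalence_poisson(positive_pools: int, total_pools: int,
                              pool_size: int) -> float:
    """Per-seedling infection rate, assuming the number of infected seedlings
    per pool is Poisson, so that P(pool negative) = exp(-rate * pool_size)."""
    negative = _negative_fraction(positive_pools, total_pools)
    return -math.log(negative) / pool_size


# Assumed inputs: 36 positive pools out of 234 pools of 10 seedlings.
binomial_estimate = pooled_prevalence_binomial(36, 234, 10)
poisson_estimate = pooled_prevalence_poisson(36, 234, 10)
print(f"binomial: {binomial_estimate:.4f}, Poisson: {poisson_estimate:.4f}")
```

With these assumed inputs, both estimates are close to 1.7% per seedling, only slightly above the value obtained by counting each positive pool as a single infected seedling (36 of 2,336, about 1.5%), which suggests that pools containing more than one infected seedling were rare.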
Mechanical inoculation

To determine if mechanical transmission can occur from seed-infected plants, we grew six healthy seedlings (noninfection determined via RT-PCR), which we mechanically inoculated with ZYMV-infected tissue from three vertically infected plants. Each infected plant was used to inoculate two healthy seedlings. A ~3 cm² piece of infected leaf tissue was ground in liquid nitrogen prior to being diluted in a phosphate buffer (0.1 M Na2H/KH2PO4 buffer) in a 1:3 ratio. Carborundum powder was dusted on the surface of the leaf, and the inoculum was then applied with a pestle to the leaf surface.

Horizontal transmission from vertically infected plants

We used Myzus persicae to determine if an aphid vector could transmit the virus from a vertically infected plant to healthy seedlings. As a positive control, we assayed a mechanically inoculated infected plant that displayed severe ZYMV symptoms. A leaf was cut into seven portions, and ~25 aphids were allowed to feed on six of these in the dark for 30 min; the seventh served as a negative control. The leaf portions were then placed on noninfected seedlings (noninfection was checked by RT-PCR) at the first true leaf stage, and these plants were left overnight. The following day, the plants were sprayed with Endeavor (pymetrozine) (Syngenta, Guelph, ON) diluted per the manufacturer’s protocol (0.34 g/liter) and applied at a rate of 1 liter per 46.5 m² to kill the aphid populations, and the seedlings were left in the spray chamber overnight before being returned to an aphid-free greenhouse. After approximately 3 weeks, a leaf sample was collected from each seedling, and infection was determined by RT-PCR. The same procedure as described above was then used to test if horizontal transmission could occur from seven seed-infected plants using 10 to 60 aphids per leaf portion.
Each plant was used to infect six healthy seedlings (noninfection was checked by RT-PCR), with an additional healthy plant serving as a control, for a total of 42 seedlings.

Results

Immunostrips testing

The two seed-infected plants that tested positive for ZYMV via RT-PCR tested negative using an antibody test (ImmunoStrips from Agdia). In contrast, the mechanically inoculated positive control plant tested positive using the same test.

Seed transmission rate

In 2008, two individual seedlings and four of 281 (1.42%) samples were infected with ZYMV (verified by RT-PCR). In 2009, 36 of 2,336 (1.54%) were infected. Hence, a total of 42 of 2,619 samples, or 1.6%, were infected by seed transmission. Using a binomial distribution, we estimated the probability that an individual seedling would test positive to be 1.66%, while under a Poisson distribution we estimated the same probability to be 1.67%. Thus, we believe that our estimate of a seed transmission rate of 1.6% accurately reflects the data.

Horizontal transmission from vertically infected plants

We used three seed-infected plants to mechanically inoculate a total of six healthy seedlings (two apiece). From these, we found four (66.67%) ZYMV-infected seedlings using RT-PCR. When seven ZYMV-infected plants derived from infected seed were used as source plants for the aphid transmission tests, the number of seedlings subsequently inoculated with virus by the aphids was three out of 42 (7.14%). This seedling infection rate of 7.14% was verified by RT-PCR, and none of seven control plants fed on by nonviruliferous aphids became infected. In contrast, when one mechanically infected ZYMV plant was used as a source plant for an aphid transmission test, four of the eight (50%) became infected.

Genetic diversity of ZYMV

We generated a total of 71 coat protein (CP) clones from two vertically transmitted plants.
Within this sample we found a total of seven mutations, three from one plant (designated seed-2 (S2)) and four from the other (S1). Five of the mutations were singletons (i.e., only observed once in the alignment), and the other was observed in two clones from the same plant (S1). Four mutations were synonymous and three were non-synonymous. A minimum spanning tree, displaying the mutations observed, how they differed from the consensus, as well as a marked absence of phylogenetic structure (i.e., all mutations are one step away from the consensus), was estimated using the statistical parsimony approach available in the TCS 1.21 program (Clement et al., 2000) (Fig 5-1).

Figure 5-1: Minimum-spanning tree of the seed clones. Numbers along branches represent the nucleotide position at which each mutation occurred. Number of clones with a particular mutation is one unless otherwise noted within the oval. S1 and S2 designate seed samples one and two, and the next two digits the clone number.

It is theoretically possible that the sequences we obtained could be the result of escaped CP transgenes from deregulated transgenic squash rather than due to seed infections (H. Lecoq, personal communication). However, after cloning and sequencing two of the ZYMV samples, we found that all 71 clone sequences contained 22 amino acids from the protein immediately preceding the CP (the nuclear inclusion b). As the transgene consists of the CP alone, sequencing a portion of the nuclear inclusion b precludes the possibility that the obtained sequences were derived from an escaped transgene.

Discussion

We observed a seed transmission rate of 1.6% for ZYMV in C. pepo subsp. texana, as well as evidence that vertically infected plants can act as reservoirs for horizontal transmission.
This rate is theoretically high enough for infected seeds to constitute a viable route by which ZYMV epidemics are initiated and hence may be partially responsible for the current geographic distribution of this devastating crop pathogen. Indeed, trace seed infection (0.001) in lettuce mosaic potyvirus has been shown to be sufficient to affect lettuce production due to the subsequent spread of the virus by aphids (Johansen et al., 1994). We therefore believe it is plausible that a seed transmission rate of 1.6% may be able to initiate yearly epidemics. Also of note is that the DAG motif, which is known to be involved in aphid transmission (Gal-On, 2007), was not mutated in any of the cloned sequences. Notably, the infected plants were essentially symptomless. Other than the occasional leaf curling on the first true leaves, which could also occur as the result of mechanical damage when emerging from the seed coat, the plants looked healthy and displayed no mottling, yellowing, or stunting as normally seen with ZYMV infection. This finding may account for the low transmission rates reported by authors who used visual inspection as their primary ZYMV detection method, which would have led to an underestimation of the true transmission rate. For example, Gleason (1990) determined infection based on visual symptoms and reported that only three of 6,800 C. melo (melon) seedlings displayed the typical ZYMV symptoms of foliar distortion, mosaic and stunting. The lack of obvious viral symptoms also implies that it may be difficult to identify the source of a ZYMV epidemic. In addition, it is possible that healthy-appearing, seed-infected seedlings might be involved in the global spread of the virus. Lecoq et al. (2003) demonstrated that melon fruits displaying light disease symptoms of ZYMV infection were capable of transmitting virus via aphids at a 5% rate. It is possible that apparently healthy-looking fruits may also be instrumental in disseminating this virus.
However, we previously determined a very strong spatial clustering of ZYMV by country of origin (Simmons et al., 2008). This suggests that although there is international gene flow of ZYMV, it does not completely disrupt biogeographic structure, which is itself more suggestive of intermittent gene flow via the international seed trade than seed transmission via cultivated cucurbits. Infected seed as a reservoir of ZYMV is further supported by the observations that overwintering sources of ZYMV are scarce, especially in temperate regions (Lecoq et al., 2003), and there are few if any alternative hosts of ZYMV (Pirone & Blanc, 1996). In addition, Schrijnwerkers et al. (1991) found that seed transmission rates tend to vary depending upon the age at which the plant becomes infected with ZYMV, with plants infected at an earlier growth stage producing more infected seed. Thus, reservoirs of ZYMV may not remain constant over time, which would explain the observation that ZYMV epidemics often skip years (Grafton-Cardwell et al., 1996; Lecoq et al., 2003; Luis-Atreaga et al., 1998; Rubies-Antonell et al., 1996). That the ImmunoStrips tested negative while we were able to detect ZYMV via RT-PCR may help to explain the conflicting vertical transmission rates found in the literature. Given that the ImmunoStrip is polyclonal, it is unlikely that the negative result is due to strain differences. As we only detected a small number of mutations in the clones, it is possible that the inability of the ImmunoStrips to detect ZYMV may be the result of lower virus titers in the seed-infected samples. However, as we only sequenced 773 nucleotides (out of 849 from the CP start codon to the CP stop codon) of the CP, it is possible that the CP gene may have accumulated a sufficient number of mutational differences that antibodies are no longer able to react with it.
The failure to detect vertically transmitted ZYMV using non-PCR techniques, coupled with our finding that vertically acquired ZYMV can be horizontally transmitted, has implications for the international seed trade. Currently, three major organizations publish standardized testing methods for seed health: the International Seed Testing Association (ISTA), the International Seed Health Initiative (ISHI), and, in the U.S., the National Seed Health System (NSHS). Of the 14 approved methods for virus detection in seeds, three use indicator plants while the remainder use ELISA testing (Munkvold, 2009). We therefore suggest that one of the primary objectives for control strategies for ZYMV should be the establishment of a standardized testing protocol for the detection of vertical infection in seeds.

Chapter 6

Discussion

This thesis explores the effect of transmission mode on the genetic diversity and epidemiology of an emerging RNA virus, Zucchini yellow mosaic virus (ZYMV). Collectively these studies provide information on the evolutionary rate of ZYMV, the manner in which viral lineages are transmitted among hosts, the magnitude of transmission and systemic bottlenecks, the amount and patterns of genetic diversity that are generated during infection, as well as the rate of vertical transmission and its effect on the epidemiology of this virus. Chapters two, three and four examine the genetic variation and underlying mechanisms of evolution in populations of ZYMV at different scales ranging from the between population level to the within individual level. The first study examined ZYMV at the between population level and revealed that the nucleotide substitution rate of this plant RNA virus fell within the range of those that have been observed for animal RNA viruses, which was contrary to the prevailing thought.
The scope and depth of chapters three and four were qualitatively different from all previous studies on evolving populations, as the then-current literature lacked information on intra-host plant RNA viral diversity, the effects of population bottlenecks on viral genetic diversity in vivo, and on how these two aspects of the viral lifecycle interact with one another to shape evolution. These studies describe the patterns and amount of intra-host genetic diversity in ZYMV and elucidate how this diversity is affected by the population bottleneck imposed by the host plant during systemic movement. In addition, these studies examine how this intra-host genetic variation is affected by the genetic bottleneck imposed by the aphid during inter-host transmission. These studies reveal that most intra-host mutations are deleterious and thus tend to be removed rapidly from the population. This is further supported by a comparison of the dN/dS ratio calculated at the between population level (chapter two) with that calculated at the within individual level (chapter three). That there appeared to be more purifying selection acting at the between population level (~0.1) than within individual hosts (~0.6) strongly suggests that a large proportion of the mutations that are generated within hosts are not maintained at the population level. The third study revealed that both inter- and intra-host population bottlenecks are not as extreme as had been previously hypothesized. The fourth and final study revealed that the vertical transmission rate of ZYMV is 1.6% and that seed transmission of ZYMV may be instrumental in the worldwide dissemination of this virus. These studies not only consider genetic diversity at increasingly finer scales in terms of moving from the between population level to the within individual level, but also at increasingly deeper levels of coverage.
The first study examined the evolutionary dynamics of ZYMV using consensus sequences derived from populations from widely divergent geographic regions. However, as consensus sequence data mask the sequence variation among individual genomes, in chapter three I generated clones from ZYMV-infected plants from our experimental fields, as well as from serially passaged greenhouse samples, in order to gain a deeper understanding of inter- and intra-host genetic diversity. From these samples I averaged ~35 clones per sample for 20 samples. Since this level of coverage would not allow for the detection of low frequency alleles within these viral populations, I undertook next generation sequencing of these same samples (with some modifications) in chapter four and achieved an average coverage level of ~9,000X. In chapter three I found that no mutations were transmitted between or within hosts; however, in chapter four, with the deeper level of coverage obtained, I found that mutations do in fact persist both inter- and intra-host. A comparison of the methods used in chapters three and four, as well as the analyses in chapter four, suggests that the level of coverage obtained is an extremely important parameter for uncovering the full extent of genetic variation within a population. Conventional sequencing is limited by practical constraints such as time and finances, with the effect that achieving this level of coverage is highly unlikely and only a relatively small number of individuals from any one population are typically sampled at any one time. Furthermore, I show that the error rate inherent in the RT-PCR procedure may skew the results obtained from mutation analyses of RNA viruses. In the second study, I estimated that approximately 40% of the mutations uncovered could be due to procedural error associated with the reverse transcriptase and the PCR step.
There is considerable variation in the reported rates of intra-host genetic variation in plant RNA viruses, and it is possible that these discrepancies could be the result of artefactual mutations. It would be extremely difficult, if not impossible, to determine which mutations are real and which are the result of procedural error, particularly in the case of singletons; however, this problem can be compensated for by increasing the depth of sequencing and obtaining higher levels of coverage. Thus, because of the high levels of coverage they achieve, deep sequencing approaches may be a superior choice for elucidating genetic variation within viral populations. These analyses reveal that results, and by extension the conclusions derived from these results, can be strongly biased by the choice of methods used to generate the data. As a further case in point, in chapter five we revealed that the vertical transmission rate of ZYMV, and its impact on the epidemiology of this virus, was a controversial issue mostly as a result of variation in detection methods. A survey of the literature indicated that detection methods included visual inspection, antibody testing and RT-PCR testing, resulting in estimates of vertical transmission that ranged from 0 to 18.9%. I determined that vertical infections were often symptomless; thus, visual detection would naturally lead to many false negatives. Likewise, I discovered that antibody testing failed to detect vertically acquired viral infections that were detectable via RT-PCR. This possibly accounts for the range in vertical transmission rates recorded in the literature, and highlights the need to standardize methods for detecting viral infections in crop seeds. In chapter four I demonstrated that the population bottlenecks imposed as the viral population moves through the plant, both cell-to-cell and organ-to-organ, were not as severe as had been previously suggested.
This was evidenced by the persistence of mutations within individual plants over the course of infection. However, this phenomenon needs to be investigated in greater detail, particularly as our results appear to be contrary to those published to date. Thus, an assessment of the genetic diversity of the viral populations via Illumina sequencing as the virus moves from leaf to leaf within a plant would be instrumental in teasing apart the population bottleneck imposed on the virus by the host plant during systemic movement. Given the extremely high levels that aphid populations can achieve in agricultural fields, it is highly probable that population bottlenecks imposed by both the aphid vector during transmission and by the host plant as the virus moves systemically may be overwhelmed by the sheer number of transmission events both between individual plants as well as within the same plant. This is hinted at by the fact that there was more population structure in the clones derived from the aphid vectored samples compared to the mechanically inoculated samples in chapter three, and underscores the need to study these systems in nature. These findings also highlight the problems inherent in applying in vitro results to in vivo systems. The phylogeographic analysis undertaken in the first study hinted that the manner in which ZYMV was globally distributed required a mechanism that could both efficiently transport the virus across geographic boundaries and effectively initiate horizontal infections. I found significant clustering by country of origin, as well as by continent, which suggests that although movement of ZYMV can and does occur, it is not frequent enough to disrupt this geographical structure. In chapter two, I assumed that vertical transmission was probably not the cause of this movement, particularly as most of the literature at the time suggested that ZYMV-infected seeds were of little to no epidemiological significance.
However, in chapter five, I demonstrated that infected seeds not only result in infected plants but that this infection could be subsequently transmitted by aphids, suggesting that infected seeds may act as reservoirs for this viral infection. Consequently, the sale and transport of seeds could be responsible for the current geographic distribution of this crop pathogen. It had been previously suggested that the international movement of infected fruits and/or seedlings may also contribute to the global dissemination of ZYMV (Lecoq et al., 2003). Given that ZYMV is such an economically devastating crop pathogen, it is important for its global management to determine the relative contribution of the transportation of infected fruits and seedlings versus infected seed to the epidemiology of ZYMV.
In chapter four I determined that population bottlenecks as a result of vector transmission and systemic movement through the plant were not as severe as previously hypothesized. However, we did not address the population bottleneck imposed on the viral population as the virus enters the germ line during vertical transmission. Given that infected seeds may act as reservoirs of ZYMV and may contribute to the worldwide dissemination of this virus, an assessment of how vertical transmission affects viral genetic diversity, as well as the effect of the genetic bottleneck on this diversity while entering the germ line, would be an informative next step. Illumina sequencing of the vertically transmitted samples could potentially reveal whether there are significant differences in genetic variation between the vertically and horizontally transmitted populations. I determined in chapter five that vertically infected C. pepo plants are virtually symptomless, which could be due either to lower viral titers or to genetic changes related to vertical transmission.
However, it is currently not clear if this is simply due to viral titer levels, if there is an underlying genetic mechanism, or if both mechanisms act in combination. Therefore, it is important to determine if vertical transmission rates increase over time and if Illumina sequencing of vertically transmitted samples reveals an underlying genetic cause of this decrease in symptoms.
As vertically transmitted pathogens are dependent on their host successfully producing infected offspring, host fecundity is considered to be more important in vertical transmission than in horizontal transmission (Froissart et al., 2010). Thus, one might assume that the fecundity of vertically infected C. pepo plants would be significantly higher than that of horizontally infected plants, especially over several generations. To date I have observed that the germination rate of seeds harvested from horizontally infected fruits appears to be significantly lower than that of seeds from healthy fruits. However, how this compares to vertically transmitted seeds is currently unknown. In addition, as we know the transmission rate, an estimation of the germination rate of infected seeds in comparison to healthy seeds would aid in managing the spread of this viral pathogen.
There is some evidence to suggest that viruses may manipulate host factors to increase the plant’s attractiveness to potential vectors by modulating olfactory cues in the form of volatile compounds (Ngumbi et al., 2007; Medina-Ortega et al., 2009; Mauck et al., 2010). Given that the virus is being transmitted vertically, it may not influence volatiles in the same manner as its horizontally transmitted counterparts. Therefore, an assessment of the volatiles emitted by vertically infected plants and how they compare to those emitted by horizontally infected plants, as well as healthy plants, may yield a deeper understanding of not only how the virus manipulates host behavior but how this behavior influences the aphid vector.
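As a rough numerical illustration of the point made earlier in this chapter, that greater sequencing depth helps to separate genuine low-frequency variants from procedural error, the following Python sketch computes the probability that error alone would produce at least a given number of variant reads at a site. It is only a sketch under stated assumptions: errors are treated as independent between reads, the per-base error rate of 0.001 is a hypothetical placeholder rather than a value measured in this work, and the function names and depths are illustrative.

```python
from math import exp, lgamma, log, log1p


def log_binom_pmf(n: int, i: int, p: float) -> float:
    """Log of the Binomial(n, p) probability of exactly i successes."""
    log_choose = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return log_choose + i * log(p) + (n - i) * log1p(-p)


def prob_error_only(depth: int, min_variant_reads: int, error_rate: float) -> float:
    """P(at least min_variant_reads erroneous reads) at a site covered by `depth` reads.

    Assumes independent errors at a constant rate (a simplification).
    The upper tail is summed directly to avoid loss of precision.
    """
    tail = sum(
        exp(log_binom_pmf(depth, i, error_rate))
        for i in range(min_variant_reads, depth + 1)
    )
    return min(1.0, tail)


ERROR_RATE = 0.001  # hypothetical per-base error rate, not a measured value

# A variant present in 1% of reads, observed at three hypothetical depths.
results = {}
for depth in (100, 1_000, 10_000):
    variant_reads = depth // 100
    results[depth] = prob_error_only(depth, variant_reads, ERROR_RATE)
    print(f"depth={depth:>6}  variant reads={variant_reads:>4}  "
          f"P(error alone)={results[depth]:.3g}")
```

Under these assumptions, error alone would produce a singleton at roughly one site in ten at a depth of 100, whereas the same 1% frequency supported by proportionally more reads at greater depth becomes very unlikely to arise from error alone.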
This dissertation provides insight into the genetic variation of an RNA virus as it is transmitted between and within host plants. As deep sequencing technologies become increasingly affordable, it will become possible to expand the number of viral populations sequenced via both vertical and horizontal modes of transmission, thereby increasing our understanding of how genetic diversity is modulated by population bottlenecks. This is of paramount importance in terms of our capacity to predict, and perhaps limit, the spread of plant RNA viruses. Moreover, knowledge gained by studying plant RNA viruses, which are amenable to experimental manipulation at the field scale, may yield key insights into the tempo of evolution and the evolution of virulence in emerging RNA viruses.
REFERENCES
Acosta-Leal, R., Bryan, B. K. and Rush, C. M. 2010. Host effect on the genetic diversification of Beet necrotic yellow vein virus single-plant populations. Phytopathology 100:1204-1212.
Acosta-Leal, R., Duffy, S., Xiong, Z., Hammond, R. and Elena, S. 2011. Advances in Plant Virus Evolution: Translating Evolutionary Insights into Better Disease Management. Phytopathology. In Press. DOI: 10.1094/PHYTO-01-11-0017
Ahlquist, P., Noueiry, A., Lee, W., Kushner, D. and Dye, B. 2003. Host factors in positive-strand RNA virus genome replication. J. Virol. 77:8181-8186.
Ajayi, O. and Dewar, A.M. 1983. The effect of barley yellow dwarf virus on field populations of the cereal aphids, Sitobion avenae and Metopolophium dirhodum. Ann. Appl. Biol. 103:1-11.
Ali, A., Li, H., Schneider, M. L., Sherman, D. J., Grey, S., Smith, D. and Roossinck, M. J. 2006. Analysis of genetic bottlenecks during horizontal transmission of Cucumber Mosaic Virus. J. Virol. 80:8345-8350.
Ali, A. and Roossinck, M.J. 2010. Genetic bottlenecks during systemic movement of Cucumber Mosaic virus may vary in different host plants. Virology. 404:279-283.
Arriaga, L., Huerta, E., Lira-Saade, R., Moreno, E. and Alarcón, J.
2006. Assessing the risk of releasing transgenic Cucurbita spp. in Mexico. Agric Ecosyst & Environ 112:291-299.
Astier, S., Albouy, J., Maury, Y., Robaglia, C. and Lecoq, H. 2007. Principles of Plant Virology – Genome, Pathogenicity, Virus Ecology. Science Publishers.
Atreya, C. D., Raccah, B. and Pirone, T. P. 1990. A point mutation in the coat protein abolishes aphid transmissibility of a potyvirus. Virology. 178:161-165.
Basky, Z., Perring, T. and Tobias, I. 2001. Spread of zucchini yellow mosaic potyvirus in squash in Hungary. Journal of Applied Entomology. 125:1439-0418.
Betancourt, M., Fereres, A., Fraile, A., and Garcia-Arenal, F. 2008. Estimation of the effective number of founders that initiate an infection after aphid transmission of a multipartite plant virus. J. Virol. 82:12416-12421.
Blackman, R. L. and Eastop, V. F. 2000. Aphids of the World’s Crops: an Identification and Information Guide, 2nd edn. London, UK: John Wiley & Sons.
Blankenberg, D., Von Kuster, G., Coraor, N., Ananda, G., Lazarus, R., Mangan, M., Nekrutenko, A. and Taylor, J. 2010. "Galaxy: a web-based genome analysis tool for experimentalists". Current Protocols in Molecular Biology. 2010 Jan; Chapter 19:Unit 19.10.1-21.
Blok, J., Mackenzie, A., Guy, P. and Gibbs, A.J. 1987. Nucleotide sequence comparisons of turnip yellow mosaic virus isolates from Australia and Europe. Arch. Virol. 97:283-295.
Blua, M. and Perring, T. 1989. Effect of Zucchini Yellow Mosaic Virus on development and yield of cantaloupe Cucumis melo. Plant. Dis. 73:317-320.
Callaway, A.S., George, C.G. and Lommel, S.A. 2004. A Sobemovirus coat protein gene complements long-distance movement of a coat protein-null Dianthovirus. Virology 330:186-195.
Cantliffe, D.J., Shaw, N.L. and Stoffella, P.J. 2007. Current trends in cucurbit production in the U.S. Acta. Hort. 731:473-478.
Carrington, C. V. F., Foster, J. E., Pybus, O. G., Bennett, S. N. and Holmes, E. C. 2005.
Invasion and maintenance of dengue virus type 2 and type 4 in the Americas. J Virol 79:14680–14687.
Castle, S.J., Perring, T.M., Farrar, C.A. and Kishaba, A.N. 1992. Field and laboratory transmission of watermelon mosaic virus 2 and zucchini yellow mosaic virus by various aphid species. Phytopathology 8: 235-240.
Chare, E. R. and Holmes, E. C. 2004. Selection pressures in the capsid genes of plant RNA viruses reflect mode of transmission. J Gen Virol 85:3149–3157.
Clement, M., Posada, D. and Crandall, K.A. 2000. TCS: a computer program to estimate gene genealogies. Mol. Ecol. 9:1657-1660.
Davies, R.F. and Mizuki, M.K. 1986. Seed transmission of zucchini yellow mosaic virus. Phytopathology. 76:1073.
Decker, D. S. and Wilson, H. D. 1987. Allozyme variation in Cucurbita pepo complex: C. pepo var. ovifera vs. C. texana. Syst Bot 12:263–273.
Decker-Walters, D. S. 1990. Evidence for multiple domestication of Cucurbita pepo. In Biology and Utilization of the Cucurbitaceae, pp. 96–101. Edited by D. M. Bates, R. W. Robinson and C. Jeffrey. Ithaca, NY: Cornell University Press.
Decker-Walters, D. S., Straub, J. E., Chung, S. M., Nakata, E. and Quemada, H. D. 2002. Diversity in free-living populations of Cucurbita pepo Cucurbitaceae as assessed by random amplified polymorphic DNA. Syst Bot 27:19–28.
Desbiez, C. and Lecoq, H. 1997. Zucchini yellow mosaic virus. Plant Path. 46:809-829.
Desbiez, C., Wipf-Scheibel, C. and Lecoq, H. 2002. Biological and serological variability, evolution and molecular epidemiology of Zucchini yellow mosaic virus Island of Martinique. Plant Dis. 80:203-207.
Drake, J.W. and Holland, J.J. 1999. Mutation rates among RNA viruses. Proc. Natl. Acad. Sci. USA 96:13910-13913.
Drummond, A. J. and Rambaut, A. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7:214.
Duffy, S., Shackelton, L.A. and Holmes, E.C. 2008. Rates of evolutionary change in viruses: patterns and determinants. Nat. Rev. Genet. 9:267-276.
Dunoyer, P., Thomas, C., Harrison, S., Revers, F. and Maule, A. 2004. A cysteine-rich plant protein potentiates Potyvirus movement through an interaction with the virus genome-linked protein VPg. J Virol 78:2301–2309.
Espinoza, A. M., Medina, V., Hull, R. and Markham, P. G. 1991. Cauliflower mosaic virus gene II product forms distinct inclusion bodies in infected plant cells. Virology 185:337–344.
Fargette, D., Pinel, A., Rakotomalala, M., Sangu, E., Traoré, O., Sérémé, D., Sorho, F., Issaka, S., Hébrard, E., Séré, Y., Kanyeka, Z. and Konaté, G. 2008. Rice Yellow Mottle Virus, an RNA plant virus, evolves as rapidly as most RNA animal viruses. J. Virol. 82:3584–3589.
Fereres, A., Blua, M. J. and Perring, T. M. 1992. Retention and Transmission Characteristics of Zucchini Yellow Mosaic Virus by Aphis gossypii and Myzus persicae Homoptera: Aphididae. J. Econ. Entomol. 85:759-765.
Feuer, R., Boone, J.D., Netski, D., Morzunov, S.P. and St. Jeor, S.C. 1999. Temporal and spatial analysis of Sin Nombre virus quasispecies in naturally infected rodents. J. Virol. 73:9544-9554.
Fraile, A., Sacristán, S. and García-Arenal, F. 2008. A quantitative analysis of complementation of deleterious mutants in plant virus populations. Spanish J. Ag. Res. 6:195-200.
Fraile, A., Escriu, F., Aranda, M.A., Malpica, J.M, Gibbs, A.J. and García-Arenal, F. 1997. A century of tobamovirus evolution in an Australian population of Nicotiana glauca. J. Virol. 71:8316-8320.
French, R. and Stenger, D. C. 2003. Evolution of Wheat streak mosaic virus: dynamics of population growth within plants may explain limited variation. Annu. Rev. Phytopathol. 41:199–214.
Froissart, R., Doumayrou, J., Vuillaume, F., Alizon, S. and Michalakis, Y. 2010. The virulence–transmission trade-off in vector-borne plant viruses: a review of non-existing studies. Proc. R. Soc. B. 365:1907-1918.
Furusawa, I. and Okuno, T. 1978. Infection with BMV of mesophyll protoplasts isolated from five plant species. J. Gen. Virol.
40:489-491.
Gaille, D. 2001. Translational control of cellular and viral mRNA. Plant Mol. Bio. 32:145-148.
Gal-On, A. 2007. Zucchini yellow mosaic virus: insect transmission and pathogenicity – the tails of two proteins. Mol. Plant Pathol. 8:139–150.
García-Arenal, F., Fraile, A. and Malpica, J.M. 2001. Variability and genetic structure of plant virus populations. Annu. Rev. Phytopathol. 39:157-186.
García-Arenal, F., Fraile, A. and Malpica, J.M. 2003. Variation and evolution of plant virus populations. Int. Microbiol. 6:225-232.
Gibbs, A.J., Fargette, D., García-Arenal, F. and Gibbs, M.J. 2010. Time – the emerging dimension of plant virus studies. J. Gen. Virol. 91:13-22.
Gibbs, A.J., Ohshima, K., Phillips, M.J. and Gibbs, M.J. 2008. The prehistory of potyviruses: their initial radiation was during the dawn of agriculture. PLoS ONE 3, e2523.
Glasa, M. and Pittnerova, S. 2006. Complete genome sequence of a Slovak isolate of Zucchini yellow mosaic virus ZYMV provides further evidence of a close molecular relationship among Central European ZYMV isolates. J Phytopathol 154:436–440.
Glasa, M., Svoboda, J. and Nováková, S. 2007. Analysis of the molecular and biological variability of Zucchini yellow mosaic virus isolates from Slovakia and Czech Republic. Virus Genes 35:415–421.
Gleason, L. 1990. Absence of Transmission of Zucchini Yellow Mosaic Virus from Seeds of Pumpkin. Plant Dis. 74:828.
Goecks, J., Nekrutenko, A., Taylor, J. and The Galaxy Team. 2010. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11:R86.
Grafton-Cardwell, E.E., Perring, T.M., Smith, R.F., Valencia, J. and Ferrar, C. A. 1996. Occurrence of mosaic virus in melons in the Central Valley of California. Plant Dis. 80:1092-1097.
Guevara-González, R. G., Ramos, P. L. and Rivera-Bustamante, R. F. 1999. Complementation of coat protein mutants of pepper huasteco geminivirus in transgenic tobacco plants.
Phytopathology 89:540-545.
Hall, J.S., French, R., Morris, T.J. and Stenger, D.C. 2001. Structure and temporal dynamics of populations within wheat streak mosaic virus isolates. J. Virol. 75:10231-10243.
Hanada, K., Suzuki, Y. and Gojobori, T. 2004. A large variation in the rates of synonymous substitution for RNA viruses and its relationship to a diversity of viral infection and transmission modes. Mol Biol Evol 21:1074–1080.
Heinlein, M., Epel, B. L., Padgett, H. S. and Beachy, R. N. 1995. Interaction of tobamovirus movement proteins with the plant cytoskeleton. Science 270:1983–1985.
Hoelzer, K., Murcia, P., Baillie, G.J., Wood, J.L.N., Metzger, S., Osterrieder, K., Dubovi, E.J., Holmes, E.C. and Parrish, C.R. 2010. Intra-host evolutionary dynamics of canine influenza virus in naïve and partially immune dogs. J. Virol. 84:5329-5335.
Holmes, E.C. 2003. Patterns of intra- and inter-host nonsynonymous variation reveal strong purifying selection in dengue virus. J. Virol. 77:11296-11298.
Holmes, E.C. 2009. The Evolution and Emergence of RNA Viruses. Oxford Series in Ecology and Evolution. Series edited by PH Harvey & RM May. Oxford University Press, Oxford.
Holt, C. A. and Beachy, R. N. 1991. In vivo complementation of infectious transcripts from mutant tobacco mosaic virus cDNAs in transgenic plants. Virology 181:109–117.
Hooks, C.R.R., Valenzuela, H.R. and Defrank, J. 1998. Incidence of pests and arthropod natural enemies in zucchini grown with living mulches. Agriculture, Ecosystems and Environment 69:217-231.
Huelsenbeck, J.P. 1995. Performance of phylogenetic methods in simulation. Syst Biol 44:17-48.
Huelsenbeck, J.P. and Hillis, D.M. 1993. Success of phylogenetic methods in the four-taxon case. Syst. Biol. 42:247-264.
Huet, H., Gal-On, A., Meir, E., Lecoq, H. and Raccah, B. 1994. Mutations in the helper component HC gene of zucchini yellow mosaic virus ZYMV affect aphid transmissibility. J. Gen. Virol. 75:1407–1414.
Hughes, A. 2009.
Small Effective Population Sizes and Rare Nonsynonymous Variants in Potyviruses. Virology 10:127-134.
Iqbal, M., Xiao, H., Baillie, G., Warry, A., Essen, S.C., Londt, B., Brookes, S. M., Brown, I. H. and McCauley, J. W. 2009. Within-host variation of avian influenza viruses. Phil. Trans. R. Soc. Lond. B. 364:2739-2747.
Jridi, C., Martin, J-F., Marie-Jeanne, V., Labonne, G. and Blanc, S. 2006. Distinct viral populations differentiate and evolve independently in a single perennial host plant. J. Virol. 80:2349-2357.
Jenkins, G. M., Rambaut, A., Pybus, O. G. and Holmes, E. C. 2002. Rates of molecular evolution in RNA viruses: a quantitative phylogenetic analysis. J Mol Evol 54:156–165.
Jerzak, G.V.S., Brown, I., Shi, P., Kramer, L.D. and Ebel, G.D. 2008. Genetic diversity and purifying selection in West Nile virus populations are maintained during host switching. Virology 374:256-260.
Johansen, E., Edwards, M.C., and Hampton, R.O. 1994. Seed Transmission of Viruses: Current Perspectives. Annu. Rev. Phytopathol. 32:363-86.
Katis, N.I., Tsitsipsi, J.A., Lykouressis, D.P., Papapanayotou, A., Kokinis, G.M., Perdikis, D.C. and Manoussopoulos, I.N. 2006. Transmission of Zucchini yellow mosaic virus by colonizing and non-colonizing aphids in Greece and new aphid vectors of the virus. J. Phytopathol. 154:293-302.
Khelifa, M., Journou, S., Krishnan, K., Gargani, D., Espérandieu, P., Blanc, B. and Drucker, M. 2007. Electron-lucent inclusion bodies are structures specialized for aphid transmission of cauliflower mosaic virus. J. Gen. Virol. 88:2872-2880.
Kim, T., Youn, M.Y., Min, B.E., Choi, S.H., Kim, M. and Ryu, K.H. 2005. Molecular analysis of quasispecies of Kyuri green mottle mosaic virus. Virus Res. 110:161–167.
Kircher, M. and Kelso, J. 2010. High-throughput DNA sequencing – concepts and limitations. BioEssays. 32:524–536.
Kosakovsky Pond, S.L., Frost, S.D.W. and Muse, S.V. 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21:676-679.
Kumar, S., Tamura, K. and Nei, M. 2004. MEGA3: Integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform. 5:150-163.
Lakner, C., van der Mark, P., Huelsenbeck, J., Larget, B. and Ronquist, F. 2008. Efficiency of Markov chain Monte Carlo tree proposals in Bayesian phylogenetics. Syst. Biol. 57:86-103.
Latham, J.R. and Wilson, A.K. 2008. Transcomplementation and synergism in plants: implications for viral transgenes? Mol. Plant. Path. 9:85-103.
Lech, W. J., Wang, G., Yang, Y. L., Chee, Y., Dorman, K., McCrae, D., Lazzeroni, L. C., J., Erickson, W., Sinsheimer, J. S. and Kaplan, A. H. 1996. In vivo sequence diversity of the protease of human immunodeficiency virus type 1: presence of protease inhibitor-resistant variants in untreated subjects. J. Virol. 70:2038-2043.
Lecoq, H., Desbiez, C., Wipf-Scheibel, C. and Girard, M. 2003. Potential Involvement of Melon Fruit in the Long Distance Dissemination of Cucurbit Potyviruses. Plant Dis. 87:955-959.
Li, H. and Durbin, R. 2009. Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform. Bioinformatics. 25:1754-1760.
Li, H. and Roossinck, M.J. 2004. Genetic bottlenecks reduce population variation in an experimental RNA virus population. J. Virol. 78:10582-10587.
Lira, R., Andres, T. C. and Nee, M. 1995. Cucurbita. Pages 1-115 in R. Lira, (ed). Systematic and ecogeographic studies on crop genepools. Volume 9. International Plant Genetic Resources Institute. Mexico City and Rome.
Lisa, V., Boccardo, G., D’Agostino, G., Dellavalle, G. and D’Aquilio, M. 1981. Characterization of a potyvirus that causes Zucchini yellow mosaic. Phytopathology. 71:667–672.
Lopez-Abella, D., Bradley, R. H. E. and Harris, K. F. 1988. Correlation between stylet paths made during superficial probing and the ability of aphids to transmit nonpersistent viruses. Adv Dis Vector Res 5:251–285.
Luis-Arteaga, M., Alvarez, J.M., Alonso-Prados, J.L., Bernal, J.J., Garcia-Arenal, F., Lavina, A., Batlle, A. and Moriones, E. 1998. Occurrence, distribution and relative incidence of mosaic viruses infecting field-grown melon in Spain. Plant Dis. 82:979-982.
Malpica, J. M., Fraile, A., Moreno, I., Obies, C. I., Drake, J. W. and Garcia-Arenal, F. 2002. The rate and character of spontaneous mutations in an RNA virus. Genetics 162:1505–1511.
Marco, C. F. and Aranda, M. A. 2005. Genetic diversity of a natural population of Cucurbit yellow stunting disorder virus. J. Gen. Virol. 86:815–822.
Martin, B., Collar, J. L., Tjallingii, W. F. and Fereres, A. 1997. Intracellular ingestion and salivation by aphids may cause the acquisition and inoculation of non-persistently transmitted plant viruses. J Gen Virol 78:2701–2705.
Mauck, K.E., De Moraes, C.M. and Mescher, M.C. 2010. Deceptive chemical signals induced by a plant virus attract insect vectors to inferior hosts. Proc. Natl. Acad. Sci. USA 107:3600–3605.
Medina-Ortega, K. J., Bosque-Perez, N. A., Ngumbi, E., Jimenez-Martinez, E. S. and Eigenbrode, S. D. 2009. Rhopalosiphum padi Hemiptera: Aphididae responses to volatile cues from Barley yellow dwarf virus-infected wheat. Environ. Entomol. 38:836–845.
Merits, A., Rajamaki, M., Lindholm, P., Runeberg-Roos, P., Kekarainen, T.M., Puustinen, P., Makelainen, K., Valkonen, J. and Saarma, M. 2002. Proteolytic processing of potyviral proteins and polyprotein processing intermediate in insects and plant cells. J. Gen. Virol. 83:1211-1221.
Miyashita, S. and Kishino, H. 2010. Estimation of the size of genetic bottlenecks in cell-to-cell movement of Soil-borne wheat mosaic virus and the possible role of the bottlenecks in speeding up selection of variations in trans acting genes or elements. J Virol. 84:1828-1837.
Morozova, O. and Marra, M.A. 2008. Applications of next-generation sequencing technologies in functional genomics.
Genomics 92:255–64.
Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. and Wold, B. 2008. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 5:621-628.
Moury, B., Fabre, F. and Senoussi, R. 2007. Estimation of the number of virus particles transmitted by an insect vector. Proc. Natl. Acad. Sci. USA 104:17891-17896.
Muller, C., Brother, H., Von Bargen, S. and Buttner, C. 2006. Zucchini yellow mosaic virus – incidence and sources of virus infection in field-grown cucumbers and pumpkins in the Spreewald, Germany. J. Plant Dis and Prot. 113:252-258.
Munkvold, G.P. 2009. Seed pathology progress in academia and industry. Annu. Rev. Phytopathol. 47:285-311.
Murcia, P., Baillie, G.J., Daley, J., Elton, D., Jervis, C., Mumford, J.A., Newton, R., Parrish, C.R., Hoelzer, K., Dougan, G., Parkhill, J., Lennard, N., Ormond, D., Moule, S., Whitwham, A., McKinley, T.J., McCauley, J.W., Holmes, E.C., Grenfell, B.T. and Wood, J.L.N. 2010. The intra- and inter-host evolutionary dynamics of equine influenza virus. J. Virol. 84:6943-6954.
Nault, L. R. 1997. Arthropod transmission of plant viruses: a new synthesis. Ann Entomol Soc Am 90:521–541.
Nault, L. R. and Styer, W. E. 1972. Effects of sinigrin on host selection by aphids. Entomol Exp Appl 15:423–437.
Niepel, M. and Gallie, D.R. 1999. Identification and characterization of the functional elements within the tobacco etch virus 5’ leader required for cap-independent translation. J. Virol. 73:9080-9088.
Nolasco, G., Fonseca, F. and Silva, G. 2008. Occurrence of genetic bottlenecks during citrus tristeza virus acquisition by Toxoptera citricida under field conditions. Arch. Virol. 153:259-271.
Ngumbi, E., Eigenbrode, S. D., Bosque-Perez, N. A., Ding, H. and Rodriguez, A. 2007. Myzus persicae is arrested more by blends than by individual compounds elevated in headspace of PLRV-infected potato. J. Chem. Ecol. 33:1733–1747.
Oparka, K.J., Prior, D.A.M., Santa Cruz, S., Padgett, H.S. and Beachy, R.N. 1997.
Gating of epidermal plasmodesmata is restricted to the leading edge of expanding infection sites of tobacco mosaic virus. Plant J. 12:781–789.
Osbourn, J.K., Sarkar, S. and Wilson, M.A. 1990. Complementation of coat protein-defective TMV mutants in transgenic tobacco plants expressing TMV coat protein. Virology 179:921-925.
Pagán, I. and Holmes, E.C. 2010. Long-term evolution of the Luteoviridae: time-scale and mode of virus speciation. J. Virol. 84:6177-6187.
Page, R.D.M. and Holmes, E.C. 2007. Molecular Evolution. A phylogenetic Approach. Blackwell Publishing.
Perring, T. M., Farrar, C. A., Mayberry, K. and Blua, M. J. 1992. Research reveals pattern of cucurbit virus spread. Calif. Agric. 46:35–39.
Pfosser, M. F. and Baumann, H. 2002. Phylogeny and geographical differentiation of Zucchini yellow mosaic virus isolates Potyviridae based on molecular analysis of the coat protein and part of the cytoplasmic inclusion protein genes. Arch. Virol. 147:1599–1609.
Pirone, T. P. and Blanc, S. 1996. Helper-dependent vector transmission of plant viruses. Annu. Rev. Phytopathol. 34:227–247.
Posada, D. and Crandall, K. A. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14:817–818.
Powell, G. 1991. Cell membrane punctures during epidermal penetration by aphids: consequences for the transmission of two potyviruses. Ann. Appl. Biol. 119:13–321.
Powell, G. 2005. Intracellular salivation is the aphid activity associated with inoculation of nonpersistently transmitted viruses. J. Gen. Virol. 86:469–472.
Powell, G. and Hardie, J. 2000. Host-selection behavior by genetically identical aphids with different plant preferences. Physiol. Entomol. 25:54–62.
Powell, G., Pirone, T. and Hardie, J. 1995. Aphid stylet activities during potyvirus acquisition from plants and an in vitro system that correlate with subsequent transmission. Eur. J. Plant Pathol. 101:411–420.
Qu, F., Ren, T. and Morris, T.J. 2003.
The coat protein of turnip crinkle virus suppresses posttranscriptional gene silencing at an early initiation step. J. Virol. 77:511–522.
R Development Core Team. 2008. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0 http://www.R-project.org.
R Development Core Team. 2011. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0 http://www.R-project.org.
Riedle-Bauer, M., Suarez, B. and Reinprecht, H.J. 2002. Seed transmission and natural reservoirs of Zucchini yellow mosaic virus in Cucurbita pepo var. styriaca. J. Plant Dis and Prot. 1092:200-206.
Restrepo, M.A., Freed, D.D. and Carrington, J.C. 1990. Nuclear Transport of Plant Potyviral Proteins. The Plant Cell. 2:987-998.
Roberts, I.M., Wang, D., Thomas, C.L. and Maule, A.J. 2003. Seed transmission of Pea seed borne mosaic virus in pea exploits novel symplastic pathways and is, in part, dependent upon chance. Protoplasma 222:31-43.
Robinson, R.W., Provvidenti, R. and Shail, J.W. 1993. Tests for Seedborne Transmission of Zucchini Yellow Mosaic Virus. Hortscience. 287:694-696.
Rodríguez-Cerezo, E., Findlay, K., Shaw, J. G., Lomonossoff, G.P., Qiu, S.G., Linstead, P., Shanks, M. and Risco, C. 1997. The Coat and Cylindrical Inclusion Proteins of a Potyvirus Are Associated with Connections between Plant Cells. Virology. 236:296-306.
Rodríguez-Cerezo, E., Elena, S. F., Moya, A. and García-Arenal, F. 1991. High genetic stability in natural populations of the plant RNA virus tobacco mild green mosaic virus. J. Mol. Evol. 32:328–332.
Rojas, M. R., Zerbini, F. M., Allison, R. F., Gilbertson, R. L. and Lucas, W. J. 1997. Capsid protein and helper component-proteinase function as potyvirus cell-to-cell movement proteins. Virology 237:283–295.
Roossinck, M.J. 2007. Mechanisms of plant virus evolution. Annu. Rev. Phytopathol. 35:191-209.
Rubies-Antonell, C.
Ballante, M. and Turina, M. 1996. Virus infection in melon crops in Central-Northern Italy. Inform. Fitopathol. 7-8:6-10.
Ruiz-Jarabo, C. M., Arias, A., Baranowski, E., Escarmís, E. and Domingo, E. 2000. Memory in viral quasispecies. J. Virol. 74:3543-3547.
Rybicki, E. P. and Shukla, D. D. 1992. Coat protein phylogeny and systematics of potyviruses. Arch. Virol. Suppl 5:139–170.
Sachs, A.B., Sarnow, P. and Hentze, M.X. 1997. Starting at the beginning, middle, and end translation in Eucaryotes. Cell. 89:831-838.
Sacristán, S., Malpica, J. M., Fraile, A. and García-Arenal, F. 2003. Estimation of population bottlenecks during systemic movement of Tobacco mosaic virus in tobacco plants. J. Virol. 77:9906–9911.
Sanjuán, R., Moya, A. and Elena, S.F. 2004. The distribution of fitness effects caused by single-nucleotide substitutions in an RNA virus. Proc. Natl. Acad. Sci. USA 101:8396-8401.
Schneider, B.S. and Higgs, S. 2008. The enhancement of arbovirus transmission and disease by mosquito saliva is associated with modulation of the host immune response. Trans. Roy Soc. Trop. Med. H. 102:400-408.
Schneider, W. L. and Roossinck, M. J. 2001. Genetic diversity in RNA virus quasispecies is controlled by host-virus interactions. J. Virol. 75:6566-6571.
Schrijnwerkers, C. C. F. M., Huijberts, N. and Bos, L. 1991. Zucchini Yellow Mosaic virus; two outbreaks in the Netherlands and seed transmissibility. Neth J Plant Path. 97:187-91.
Shukla, D.D., Frenkel, M.J. and Ward, C.W. 1991. Structure and function of the potyvirus genome with special reference to the coat protein coding region. Canadian J. Plant Path. 13:178-191.
Simmons, H.E., Holmes, E.C. and Stephenson, A.G. 2008. Rapid evolutionary dynamics of zucchini yellow mosaic virus. J. Gen. Virol. 89:1081-1085.
Simmons H.E., Holmes E.C., Gildow, F.E., Bothe-Goralczyk, M.A. and Stephenson, A.G. 2011. Experimental verification of seed transmission in Zucchini yellow mosaic virus. Plant Dis. 95:751-4.
Simmons H.E., Holmes E.C.
and Stephenson, A.G. 2011. Rapid Turnover of Intra-Host Genetic Diversity in Zucchini yellow mosaic virus. Virus Res. 155:389-96.
Spitsin, S., Steplewski, K., Fleysh, N., Belanger, H., Mikheeva, T., Shivprasad, S., Dawson, W., Koprowski, H. and Yusibov, V. 1999. Expression of alfalfa mosaic virus coat protein in tobacco mosaic virus TMV deficient in the production of its native coat protein supports long-distance movement of a chimeric TMV. Proc. Natl. Acad. Sci. USA 96:2549–2553.
Swofford, D. L. 2003. PAUP*. Phylogenetic Analysis Using Parsimony *and other methods, version 4. Sunderland, MA: Sinauer Associates.
Teycheney, P-Y., Laboureau, N., Iskra-Caruana, M-L. and Candresse, T. 2005. High genetic variability and evidence for plant-to-plant transfer of Banana mild mosaic virus. J. Gen. Virol. 86:3179-3187.
Tobias, I. and Palkovics, L. 2003. Characterization of Hungarian isolates of zucchini yellow mosaic virus ZYMV, potyvirus transmitted by seeds of Cucurbita pepo var Styriaca. Pest Manag Sci 59:493–497.
Thomas, C.L., Leh, V., Lederer, C. and Maule, A.J. 2003. Turnip crinkle virus coat protein mediates suppression of RNA silencing in Nicotiana benthamiana. Virology 306:33–41.
Turturo, C., Saldarelli, P., Yafeng, D., Digiaro, M., Minafra, A., Savino, V. and Martelli, G.P. 2005. Genetic variability and population structure of Grapevine leafroll-associated virus 3 isolates. J. Gen. Virol. 86:217-224.
Urcuqui-Inchima, S., Haenni, A. and Bernardi, F. 2001. Potyvirus proteins: a wealth of functions. Virus Res. 74:157-175.
Wang, D. and Maule, A.J. 1994. A model for seed transmission of a plant virus: genetic and structural analyses of pea embryo invasion by pea seed-borne mosaic virus. The Plant Cell 6:777-787.
Wang, R. Y., Ammar, E. D., Thornbury, D. W., Lopez-Moya, J. J. and Pirone, T. P. 1996. Loss of potyvirus transmissibility and helper-component activity correlate with non-retention of virions in aphid stylets. J Gen Virol 77:861–867.
Woolhouse, M. E.
J., Taylor, L. H. and Haydon, D. T. 2001. Population biology of multihost pathogens. Science 292:1109–1112.
Wu, X. and Shaw, J. 1996. Bidirectional uncoating of the genomic RNA of a helical virus. Proc. Natl. Acad. Sci. USA. 93:2981-2984.
Zhao, M. F., Chen, J., Zheng, H.-Y., Adams, M. J. and Chen, J.-P. 2003. Molecular analysis of Zucchini yellow mosaic virus isolates from Hangzhou, China. J. Phytopathol. 151:307–311.
VITA: Heather Simmons
EDUCATION
Ph.D. 12/11 Department of Biology, The Pennsylvania State University. Advisor: Andrew Stephenson
B.S. 2006 Department of Biology, University of Oregon. Graduated cum laude with a 3.92 GPA
TEACHING EXPERIENCE
• Teaching Assistant, Biology 322 (Genetics), The Pennsylvania State University, Spring 2010
• Teaching Assistant, Bio220W (Populations and communities), The Pennsylvania State University, Spring 2007 and 2008
• Teaching Assistant, Animal Behavior, University of Oregon, Spring 2004
• Teaching Assistant, Freshman Biology and Anatomy and Physiology, New Mexico Junior College, 05/1998 – 05/1999
SELECTED SCHOLARSHIPS AND AWARDS
• Jeanette Ritter Mohnkern Graduate Student Scholarship for Outstanding Achievement in Doctoral Research (2011), Dept. of Biology, PSU
• Doctoral Dissertation Improvement Grant, NSF (2010 – 2012)
• Henry W. Popp Fellowship for Outstanding Graduate Student in Plant Sciences (2010), Dept. of Biology, PSU
• PSU Biology Department Travel Grant (to attend 6th Annual Virus Evolution Workshop), Biology Department, PSU (2010)
• Braddock Research Award, Eberly College of Science, PSU (2010)
• J. Ben and Helen D.
Hill Memorial Fund Award (2007, 2008, 2009, 2010)
• NSF Travel Grant to attend EEID (Ecology and Evolution of Infectious Diseases) workshop and conference (2009)
• Braddock Graduate Recognition Fellowship for Outstanding New Graduate Students, Eberly College of Science, PSU (2006)
• Para Talus Presidential Scholarship, University of Oregon (2006)
PEER-REVIEWED SCIENTIFIC PUBLICATIONS
• Simmons H.E., Holmes E.C., Gildow, F.E., Bothe-Goralczyk, M.A., & Stephenson, A.G. (2011). Experimental verification of seed transmission in Zucchini yellow mosaic virus. Plant Disease 95:751-4
• Simmons H.E., Holmes E.C., & Stephenson, A.G. (2011). Rapid Turnover of Intra-Host Genetic Diversity in Zucchini yellow mosaic virus. Virus Research. 155:389-96
• Simmons H.E., Holmes E.C., & Stephenson, A.G. (2008). Rapid evolutionary dynamics of zucchini yellow mosaic virus. J Gen Virol. 89:1081-5.
SCIENTIFIC MANUSCRIPTS IN PREPARATION
• Simmons H.E., Dunham, J.P., Stack, J.C., Dickins, B.J.A., Pagan, I.P., Holmes E.C., & Stephenson, A.G. Deep sequencing reveals persistence of intra- and inter- host genetic diversity in natural and greenhouse populations of Zucchini yellow mosaic virus. (To be submitted to Journal of General Virology