Mitochondrial DNA Analysis of Four Ethnic Groups of Afghanistan
John William Whale

This thesis is submitted in partial fulfilment of the requirements for the award of Master of Philosophy of the University of Portsmouth.

January 2012

It is nothing for one to know something unless another knows you know it.
Persian proverb

Abstract

Mitochondrial DNA (mtDNA) is a small genome, 16,569 base pairs in length, that is present in many copies within the mitochondria of a typical somatic cell. Mitochondrial DNA is also inherited uniparentally, through the maternal line. As a result, it is passed from mother to offspring essentially unchanged apart from new mutations, which enables the historical analysis of a population, a group of populations or a species. Mitochondrial DNA analysis examines both the coding and non-coding regions for the presence or absence of single nucleotide polymorphisms. When a particular combination of polymorphisms is present, the mitochondrial DNA can be assigned to a genetic group known as a haplogroup. The polymorphisms identified within the non-coding region (D-loop) define the mitochondrial DNA haplotype. Many haplogroups are region-specific, in that they are typically found among the populations of a particular geographical region, while the presence of haplogroups from adjacent regions can indicate changes to population structure through gene flow resulting from migrations. Afghanistan is a landlocked Central Asian country which has held a significant strategic position throughout history as a thoroughfare for ancient trade routes and human migrations. As a consequence, Afghanistan has a great diversity of ethnic groups. This study aimed to analyse the mitochondrial DNA genome to identify the haplogroup composition and distribution among four ethnic groups of Afghanistan, the Baluch, Hazaras, Pashtuns and Tajiks, which together account for ~80% of the total Afghani population.
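The assignment step described above, in which a sample is placed in a haplogroup when a particular combination of polymorphisms is present, can be sketched in Python as a simple subset test. This is a minimal illustration under stated assumptions, not the procedure used in this study (which relied on RFLP analysis and sequencing of specific sites): the haplogroup labels and marker names in the hypothetical `HYPOTHETICAL_MARKERS` table are placeholders rather than real mtDNA mutations, and the rule of preferring the match with the most defining markers is an assumption standing in for a full phylogenetic tree.

```python
from typing import Dict, FrozenSet, Optional, Set

# HYPOTHETICAL marker table: haplogroup label -> set of defining polymorphisms.
# The labels ("pos1A", ...) are placeholders, NOT real mtDNA mutations.
# Nested sets mimic a tree: R carries N's marker plus one of its own, and so on.
HYPOTHETICAL_MARKERS: Dict[str, FrozenSet[str]] = {
    "N": frozenset({"pos1A"}),
    "R": frozenset({"pos1A", "pos2G"}),
    "H": frozenset({"pos1A", "pos2G", "pos3T"}),
    "M": frozenset({"pos4C"}),
}


def assign_haplogroup(
    observed: Set[str],
    markers: Dict[str, FrozenSet[str]] = HYPOTHETICAL_MARKERS,
) -> Optional[str]:
    """Return the haplogroup whose defining polymorphisms are all present in
    `observed`, preferring the match with the most defining markers (the most
    derived branch). Return None if no haplogroup matches."""
    # A haplogroup is a candidate only if every one of its markers was observed.
    candidates = [name for name, required in markers.items() if required <= observed]
    if not candidates:
        return None
    # Assumption: the candidate with the largest marker set is the most specific.
    return max(candidates, key=lambda name: len(markers[name]))
```

For example, a sample carrying `{"pos1A", "pos2G"}` matches both N and R, and R is returned because it is the more derived branch; a sample carrying only `"pos2G"` satisfies no complete marker set and returns `None`.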
The population of Afghanistan has been little studied, and this study aimed to determine whether its haplogroup composition has been influenced by numerous demographic events. The Baluch, Pashtun and Tajik ethnic groups believe they have ancestry from west Eurasia and the Middle East, while the Hazaras believe they are of Mongol descent. This study therefore also aimed to identify whether the mtDNA haplogroups observed support the beliefs of the Afghani ethnic groups, and to provide indications of their ancestry. The mitochondrial DNA analysis shows that the Hazaras possess a large East Asian haplogroup contribution of 37.5%, while the Baluch, Pashtuns and Tajiks possess a much smaller contribution of less than 14.3%. Meanwhile, the Baluch, Pashtuns and Tajiks each have a large west Eurasian haplogroup contribution of at least 64.3%, while the Hazaras exhibit a west Eurasian haplogroup frequency of 40%. The Hazaras have the most diverse collection of haplogroups: only two of the seventeen haplogroups observed overall are absent from this ethnic group. The Pashtuns have the greatest HVS-I sequence diversity, as no haplotype is shared within the ethnic group. As a whole, the Afghani populations exhibit a high gene diversity (>0.98). Based on their mismatch distributions, the Hazara, Pashtun and Tajik populations appear to be expanding, or to have recently undergone an expansion; this is supported by a star-like phylogeny in a Median-Joining network. When the Afghani HVS-I sequences were analysed together with an additional 3,923 mitochondrial DNA HVS-I sequences from 62 populations, genetic barriers were observed separating the Iranian Baluch population from the Afghani Baluch, and the Afghani populations from the Pakistani, Indian Bhargava, Chinese and Mongol populations. The same analysis suggests that the Afghani ethnic groups studied share a greater affinity with west Eurasian and Central Asian populations than with populations of South Asia or East Asia.
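The gene diversity and mismatch distribution statistics referred to above can be illustrated with a short Python sketch. It assumes Nei's unbiased gene diversity, H = n/(n - 1) × (1 - Σ p_i²), where p_i is the frequency of the i-th distinct haplotype among n sequences, and builds a mismatch distribution by counting the nucleotide differences between every pair of aligned sequences. The sequences below are short made-up fragments, not HVS-I data from this study, and the function names are hypothetical; the analyses reported here were carried out with dedicated population-genetics software such as DnaSP.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List


def gene_diversity(sequences: List[str]) -> float:
    """Nei's unbiased gene (haplotype) diversity:
    H = n / (n - 1) * (1 - sum(p_i ** 2)), where p_i is the frequency of
    each distinct haplotype among the n sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(sequences)  # identical sequences = same haplotype
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def pairwise_differences(a: str, b: str) -> int:
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mismatch_distribution(sequences: List[str]) -> Dict[int, int]:
    """Count how many sequence pairs differ at 0, 1, 2, ... positions."""
    dist = Counter(pairwise_differences(a, b) for a, b in combinations(sequences, 2))
    return dict(sorted(dist.items()))


# Made-up aligned fragments (NOT data from this study).
example = ["ACGTA", "ACGTA", "ACGTT", "GCGTT"]
print(round(gene_diversity(example), 4))  # 0.8333
print(mismatch_distribution(example))     # {0: 1, 1: 3, 2: 2}
```

A smooth, unimodal mismatch distribution is commonly interpreted as a signature of population expansion, whereas a ragged, multimodal one is more typical of a population that has remained stable in size.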
The haplogroup analysis indicates that the Baluch, Pashtuns and Tajiks share some common ancestral heritage, while the Hazaras, owing to their greater East Asian lineage contribution, may be descendants of a major East Asian expansion, possibly from the line of Genghis Khan, and have experienced more recent maternal gene flow. These results illustrate the impact that historical expansions and migrations have had upon the Afghani population.

CONTENTS

Chapter One - Introduction; Anthropology and DNA
1.1 Origins of Modern Humans
1.1.1 Early Hominids
1.1.2 Modern Humans
1.2 Mitochondrial DNA (mtDNA)
1.3 Mitochondrial and Y-Chromosome Haplogroup Distribution
1.3.1 Mitochondrial DNA Variation
1.3.2 Y-Chromosome Variation
1.4 Aims
Chapter Two - Afghanistan
2.1 Geography
2.2 Climate
2.3 Population
2.4 Ethnicity and Language
2.5 Migrations
2.6 Refugees
2.7 Afghan Sub-Populations
2.7.1 Pashtuns
2.7.2 Tajiks
2.7.3 Hazara
2.7.4 Uzbeks
2.7.5 Other Ethnic Groups
2.7.6 Aimaqs
2.7.7 Baluch
2.7.8 Turkmens
2.7.9 Nuristanis
2.8 Historical Influence of Afghanistan’s Population
2.8.1 Prehistory
2.8.2 Aryan Migration
2.8.3 Persian Empire
2.8.4 Greek Rule
2.8.5 Yuezhi and the Kushan Empire
2.8.6 Arabs and Islam
2.8.7 Mongol Dynasty
2.8.8 Modern Era
Chapter Three - Materials and Methods
3.1 Materials
3.2 Precautionary Measures
3.3 Sample Collection
3.4 DNA Isolation
3.5 PCR
3.6 Agarose Gel Electrophoresis
3.7 Glycogen Precipitation of DNA (PCR Products)
3.8 Purification of PCR Products
3.9 DNA Extraction of PCR Products from Agarose Gels
3.10 RFLP Analysis
3.11 DNA Sequencing
3.11.1 Haplogroup Identification
3.11.2 Hypervariable Region I
Chapter Four - Results
4.1 PCR Amplifications
4.2 Haplogroup Characterisations using RFLP Analyses
4.3 Haplogroup Characterisations using DNA Sequencing
Chapter Five - Phylogeographic Analysis of Afghani mtDNAs
5.1 Phylogeography of Individual Haplogroups
5.1.1 African Haplogroup L3
5.1.2 The Early non-African Lineages
5.1.2.1 Haplogroup M*
5.1.2.2 Haplogroup N*
5.1.2.3 Haplogroup R*
5.1.3 The East Asian Lineages
5.1.3.1 Haplogroup C
5.1.3.2 Haplogroup D
5.1.3.3 Haplogroup G
5.1.3.4 Haplogroup Z
5.1.3.5 Haplogroup A
5.1.3.6 Haplogroup B
5.1.3.7 Haplogroup F
5.1.4 The West Eurasian Lineages
5.1.4.1 Haplogroup X
5.1.4.2 Haplogroup HV*
5.1.4.3 Haplogroup H
5.1.4.4 Haplogroup JT
5.1.4.5 Haplogroup J
5.1.4.6 Haplogroup T
5.2 Discussion
Chapter Six - MtDNA Diversity and Polymorphism in Afghani Populations
6.1 Previous mtDNA Studies on Afghani Populations
6.2 mtDNA HVS-I Region Sequencing
6.2.1 Variable Sites
6.2.2 Haplotype Distribution
6.2.3 Genetic Diversity
6.2.3.1 Gene Diversity
6.2.3.2 Nucleotide Diversity
6.2.3.3 Theta Estimators
6.2.3.4 Mismatch Distribution
6.3 Phylogenetic Network of the Afghani Population
6.4 Mitochondrial DNA Genetic Barriers between Afghans and Other Populations
Chapter Seven - Y-Chromosome Analysis of Afghani Ethnic Groups (Article: Haber et al., 2012)
Conclusion
References
Appendix I
Appendix II
Appendix III

Declaration

Whilst registered as a candidate for the above degree, I have not been registered for any other research award. The results and conclusions embodied in this thesis are the work of the named candidate and have not been submitted for any other academic award.

List of Tables

Chapter One - Introduction; Anthropology and DNA
Table 1.1: Haplogroup frequencies of west Asian haplogroups in India, Central Asia and the Caucasus (Kivisild et al., 1999).
Chapter Two - Afghanistan
Table 2.1: Afghanistan population estimates every five years since 1950 (UNPD, 2009).
Table 2.2: Population growth rate every five years since 1950, based on the estimated population of Afghanistan (UNPD, 2009).
Table 2.3: UNHCR statistics of displaced Afghanis as of January 2010 (UNHCR, 2011).
Chapter Three - Materials and Methods
Table 3.1: Volumes and final concentrations of mastermix reagents for polymerase chain reaction amplification of mtDNAs.
Table 3.2: Oligonucleotide sequences, their co-ordinates and the fragment sizes produced following PCR (Torroni et al., 1997).
Table 3.3: The primers used in this study, their co-ordinates and the fragment sizes produced following PCR (Palanichamy et al., 2004).
Table 3.4: Thermocycler conditions for the primer pairs as described by Torroni et al. (1997).
Table 3.5: Thermocycler conditions for the Palanichamy et al. (2004) primer pairs.
Table 3.6: Reaction mixes for the restriction digests with and without BSA.
Table 3.7: Co-ordinates and sequences of the forward and reverse oligonucleotides and the fragment size generated for HVS-I analysis.
Chapter Four - Results
Table 4.1: Size of DNA fragments for each haplogroup characterisation from RFLP analysis; a denotes primer pairs from Torroni et al. (1997), b denotes primer pairs from Palanichamy et al. (2004).
Table 4.2: Recognition sequences and cut sites of the enzymes used for the haplogroup assignment of samples. N = any base (A, C, G or T), R = either A or G, W = either A or T, Y = either C or T.
Table 4.3: SNP sites of haplogroups characterised via DNA sequencing.
Table 4.4: Sequencing results for the samples analysed for Haplogroups Q & Z.
Table 4.5: Sequence results for samples examined for the characteristic SNPs of Haplogroups S, W & X.
Table 4.6: Sequencing results of samples analysed for Haplogroups B & F.
Table 4.7: All samples and the Haplogroups to which they belong.
Chapter Five - Phylogeographic Analysis of Afghani mtDNAs
Table 5.1: Frequencies of the regional haplogroup lineages in the Afghanistan populations (%).
Table 5.2: Haplogroup frequencies within the Afghanistan populations (bold). For comparison, data from other publications has been included.
Table 5.3: East Asian haplogroup frequencies among the Hazara, Mongolians and Koreans.
Table 5.4: Frequency of haplogroups HV & H among Afghan, Iranian, Iraqi, Turkmen, Uzbek and Pakistani populations.
Chapter Six - MtDNA Diversity and Polymorphism in Afghani Populations
Table 6.1: Frequency and nucleotide positions of transitions, transversions and indels within the HVS-I sequences of the four Afghani ethnic groups.
Table 6.2: General data of the HVS-I polymorphisms among the four Afghani ethnic groups.
Table 6.3: Afghani ethnic group mtDNA HVS-I sequence data.
Table 6.4: Number of haplotypes (h), haplotype diversity (Hd) and nucleotide diversity (π) of the 4 Afghani populations using DnaSP ver. 5.10.
Table 6.5: The number of shared haplotypes between the Afghani ethnic groups in this study.
Table 6.6: Gene diversity of the Afghani populations in this study.
Table 6.7: Mean number of pairwise differences between the Afghani populations.
Table 6.8: AMOVA results of variance within and among the Afghani populations and additional populations.
Table 6.9: Pairwise differences between pairs of populations.
Table 6.10: FST p-values between pairs of populations.
Table 6.11: Estimators of female effective population size based upon the number of pairwise differences (θπ), the number of segregating sites (θS) and the number of observed haplotypes (θk).
Table 6.12: Tajima’s D statistic values for the Afghani populations using the total number of mutations and the total number of segregating sites and their statistical significance.
Table 6.13: Co-ordinate values for the Afghani populations and the additional 62 populations.

List of Figures

Chapter One - Introduction; Anthropology and DNA
Figure 1.1: Map of Kenya; Lake Turkana in the north-northwest region of Kenya.
Figure 1.2: Illustration of the Multi-Regional hypothesis of modern human evolution.
Figure 1.3: Illustration of the Out-of-Africa theory of modern human evolution.
Figure 1.4: Illustration of the Assimilation hypothesis for modern human evolution.
Figure 1.5: Diagrammatic view of mtDNA.
Figure 1.6: Simplified Y-Chromosome Consortium (YCC) haplogroup tree.
Chapter Two - Afghanistan
Figure 2.1: Political map of Afghanistan.
Figure 2.2: 34 provinces of Afghanistan and also its location in Central Asia.
Figure 2.3: The Population density of Afghanistan.
Figure 2.4: Distribution of Afghan ethnic groups.
Figure 2.5: Distribution of language groups spoken in Afghanistan.
Figure 2.6: Indo-European Language Tree illustrating the Centum and Satem branches.
Figure 2.7: (a) The Iranian languages spoken in the Dnieper-Ural region; (b) The Ural-Yenisei region and the eastern Iranian languages spoken and the location of the Central Asian BMAC culture (shaded); (c) The locations of the Afanasevo (shaded) and Andronovo (outlined) cultures of Central Asia and the Iran-India zone in the south.
Figure 2.8: Pashtun people from Afghanistan.
Figure 2.9: Tajik people from Afghanistan.
Figure 2.10: Hazara people from Afghanistan.
Figure 2.11: Uzbek people from Afghanistan.
Figure 2.12: Left: an Aimaq man from Afghanistan, Middle: a Baluch man from Afghanistan, Right: a Turkmen man from Afghanistan.
Chapter Three - Materials and Methods
Figure 3.1: Map of Iran and inset; Khorasan province and the three refugee camps near the cities of Mashad, Bojnurd and Birjand.
Chapter Four - Results
Figure 4.1: Amplification of nine overlapping primer pairs and three internal primer pairs as described in Torroni et al. (1997).
Figure 4.2: Amplification of the fifteen overlapping primer pairs as described in Palanichamy et al. (2004).
Figure 4.3: 2% agarose gel of HpaI restriction digests for Haplogroup L3 characterisation.
Figure 4.4: 2% agarose gel of HpaI restriction digests on Afghan DNAs.
Figure 4.5: 2% agarose gel of AluI digests for Haplogroup M characterisation.
Figure 4.6: 2% agarose gel of primer pair 9 (Palanichamy et al., 2004) PCR products and AluI digests for Haplogroup M assignment.
Figure 4.7: 2% agarose gel of PCR products and restriction digests for analysis of the Haplogroup M characteristic.
Figure 4.8: 2% agarose gel of PCR products (P) and digests (D) with MnlI for Haplogroup N characterisation.
Figure 4.9: 2% agarose gel of Haplogroup N characterisation.
Figure 4.10: 2% agarose gel of PCR products (P) and digest products (D) following incubation with HincII for Haplogroup C characterisation.
Figure 4.11: 2% agarose gel of PCR products and digest products following incubation with HincII for Haplogroup C characterisation.
Figure 4.12: 2% agarose gel of PCR product (P) and cleaved DNA products (D) following incubation with the endonuclease AluI for Haplogroup D characterisation.
Figure 4.13: 2% agarose gel of PCR amplifications (P) and endonuclease digestions of these amplifications (D) with HphI for Haplogroup E classification.
Figure 4.14: 2% agarose gel of PCR products (P) and digested amplifications (D) with the endonuclease HhaI for Haplogroup G assignment.
Figure 4.15: 2.5% agarose gel of DNAs digested with the endonuclease MboII for haplogroup R characterisation.
Figure 4.16: 2.5% agarose gel of amplified DNAs digested with the endonuclease MboII for the assignment of haplogroup R.
Figure 4.17: 2.5% agarose gel of amplified DNAs digested with MboII for characterisation of haplogroup R.
Figure 4.18: 2% agarose gel of amplified DNAs processed by the endonuclease HaeIII for haplogroup A classification.
Figure 4.19: 2% agarose gel of RFLP analysis using HaeII on amplified DNAs for haplogroup I classification.
Figure 4.20: 2% agarose gel of DNAs digested with the endonuclease HaeIII for haplogroup Y assignment.
Figure 4.21: 2.5% agarose gel of DNAs following incubation with the endonuclease MseI for haplogroup HV characterisation.
Figure 4.22: 2.5% agarose gel of DNAs digested with AluI for haplogroup H classification.
Figure 4.23: 2% agarose gel of digested DNAs following incubation with NlaIII for Haplogroup V assignment.
Figure 4.24: 2% agarose gel of digested DNAs following incubation with the endonuclease NlaIII for Haplogroup TJ classification.
Figure 4.25: 2% agarose gel of DNAs digested by the endonuclease BfaI for the characterisation of Haplogroup T.
Figure 4.26: 2% agarose gel of PCR products digested with the endonuclease BstNI for Haplogroup J classification.
Figure 4.27: 2% agarose gel of DNAs following incubation with the endonuclease HinfI for the assignment of Haplogroup Uk-group.
Figure 4.28: Sequence alignment of DNA samples assessed for the SNP at np. 5843 for the classification of Haplogroup Q.
Figure 4.29: Chromatogram of sample 7; arrow indicates the SNP site for Haplogroup Q, where in this case the nucleotide is an adenine.
Figure 4.30: Alignment of DNA sequences assessed for the polymorphism at np. 9090 that characterises Haplogroup Z.
Figure 4.31: Chromatogram of sample 41Z; arrow indicates the polymorphic nucleotide, which for this sample is a thymine.
Figure 4.32: Chromatogram of sample 113Z; arrow denotes the SNP site, which is a cytosine.
Figure 4.33: Alignment of DNA sequences analysed for the Haplogroup S polymorphism at np. 8404.
Figure 4.34: Chromatogram of sample 8S; arrow indicates the SNP site for haplogroup S, for which the nucleotide is a thymine.
Figure 4.35: DNA sequence alignment of samples examined for the polymorphism at np. 11947, which is characteristic of Haplogroup W.
Figure 4.36: Chromatogram of sample 110W; arrow signifies the polymorphic nucleotide that defines Haplogroup W, which here is an adenine.
Figure 4.37: Alignment of DNA sequences which were analysed for a SNP at np. 6371 for Haplogroup X.
Figure 4.38: Chromatogram of sample 15X; arrow identifies the polymorphic nucleotide, which is a thymine.
Figure 4.39: Chromatogram of sample 110X; arrow denotes the polymorphic nucleotide, which in this sequence is a cytosine.
Figure 4.40: DNA sequence alignments of samples assessed for the presence or absence of a 9bp deletion from np. 8281-8289 which, if present, is characteristic of Haplogroup B.
Figure 4.41: Chromatogram of sample 21B; arrow and bracket identify the 9bp sequence whose absence defines Haplogroup B.
Figure 4.42: Chromatogram of sample 35B; arrow denotes the position where the 9bp sequence would be, between the adenine and guanine bases.
Figure 4.43: Sequence alignment of DNA samples assessed for the haplogroup F SNP at np. 6392.
Figure 4.44: Chromatogram of sample 13F; arrow denotes the polymorphic nucleotide at the SNP site.
Figure 4.45: Chromatogram of sample 27F; arrow identifies the SNP site where, in this case, the nucleotide is a thymine.
Chapter Five - Phylogeographic Analysis of Afghani mtDNAs
Figure 5.1: An illustration of the mtDNA haplogroup tree; rooted by mtEve, derived haplogroups are connected to their parental haplogroups. Branches are flanked by the coding-region polymorphisms used for subsequent haplogroup determination. Haplogroups are colour-coded according to their geographical location.
Figure 5.2: Haplogroup L3 frequency map of Hazara (red) and Pashtun (blue) populations from Afghanistan and of neighbouring populations (grey).
Figure 5.3: Frequencies of macrohaplogroup M* in the Hazara (red), Baloch (green) and Pashtun (blue) populations and in neighbouring groups (grey).
Figure 5.4: Haplogroup N* frequencies in the Hazara (red) and Tajiks (yellow) and in surrounding west Eurasian populations.
Figure 5.5: Frequency of haplogroup R* in the Hazara (red), Tajiks (yellow) and Pashtun (blue) and elsewhere in western Eurasia (grey).
Figure 5.6: Frequencies of haplogroup C in the Hazara (red) population and neighbouring populations.
Figure 5.7: Haplogroup D frequencies among the Hazara (red), Baloch (green) and Tajik (yellow) populations from Afghanistan and from neighbouring populations (grey).
Figure 5.8: Comparison of haplogroup G frequencies in the Pashtuns (blue) of Afghanistan and in populations from Central Asia, the Iranian Plateau and the Caucasus region.
Figure 5.9: Haplogroup Z frequency within the Hazara (red) and among other populations in Central & East Asia.
Figure 5.10: Frequency map of haplogroup A within the Hazara (red) and Baloch (green) and in Central Asian and East Asian populations (grey).
Figure 5.11: Frequency of Haplogroup B among the Hazara (red) and Pashtuns (blue) in Afghanistan compared to populations in west Eurasia and East Asia (grey).
Figure 5.12: Alignment of the stretch of non-coding DNA between the COII and tRNALys genes. Samples 25, 35 and 133 have the 9bp deletion when compared to the rCRS and a known haplogroup B sequence.
Figure 5.13: Haplogroup F frequency among the Hazara (red) and Central and East Asian populations such as Turkmens, Kyrgyz, Mongols, Koreans and Chinese (grey).
Figure 5.14: Distribution of haplogroup X throughout western Eurasia and in the Hazara (red) and Baloch (green).
Figure 5.15: Haplogroup HV frequencies among the Hazara (red), Tajiks (yellow), Baloch (green) and Pashtuns (blue) and other west Eurasian populations (grey).
Figure 5.16: Haplogroup H frequency among the Hazaras (red), Tajiks (yellow), Baloch (green) and Pashtuns (blue) and in other regional populations (grey).
Figure 5.17: Haplogroup JT frequencies of the Hazara (red), Tajiks (yellow) and Baloch (green) populations from Afghanistan.
Figure 5.18: Haplogroup J frequency among the Baloch (green) and Pashtuns (blue) and the populations from west Eurasia and Central Asia (grey).
Figure 5.19: Haplogroup T frequency among the Hazara (red) and Baloch (green) populations of Afghanistan compared to other west Eurasian populations (grey) from Iran, Iraq, Turkmenistan, Uzbekistan and Tajikistan.
Figure 5.20: A skeleton version of the maximum parsimony tree of the mtDNA haplogroups present among the Afghani population, with relevant colours for different continental and regional lineages. Circle size is proportional to the frequency observed within the population.
Figure 5.21: Skeleton version of the maximum parsimony tree of the fifteen mtDNA haplogroups observed among the Hazara ethnic group.
Figure 5.22: Skeleton version of the maximum parsimony tree of the six mtDNA lineages present among the Tajik population of Afghanistan, with colours representative of the different regional lineages.
Figure 5.23: Skeleton version of the maximum parsimony tree of the mtDNA haplogroups present among the Baloch population; circle sizes are proportional to haplogroup frequency.
Figure 5.24: Skeleton version of the maximum parsimony tree of the eight mtDNA haplogroups observed among the Pashtun ethnic group, with different colours representing the regional lineages. Frequency of haplogroups is represented by circle size.
Figure 5.25: Location of the ethnic groups of Afghanistan.
Chapter Six - MtDNA Diversity and Polymorphism in Afghani Populations
Figure 6.1: Locations of six sub-populations of Uzbekistan with foreign ancestry (Irwin et al., 2009b).
Figure 6.2: Aligned mtDNA HVS-I sequences compared with the rCRS (top) using DNA Alignment; highlighted cells indicate polymorphic nucleotides.
Figure 6.3: Baluch population mismatch distribution (y axis = frequency); raggedness index (r): 0.0324.
Figure 6.4: Hazara population mismatch distribution (y axis = frequency); raggedness index (r): 0.0224.
Figure 6.5: Pashtun population mismatch distribution (y axis = frequency); raggedness index (r): 0.0121.
Figure 6.6: Tajik population mismatch distribution (y axis = frequency); raggedness index (r): 0.0270.
Figure 6.7: Median Joining network calculated from the HVS-I sequences of the Afghani population.
Figure 6.8: The first five genetic barriers in the HVS-I sequence data of 66 populations, including the four Afghani groups, compiled using the FST matrix as an input file in Barrier ver. 2.2.
Figure 6.9: The top ten genetic barriers using the FST matrix of the four Afghani ethnic groups and the 62 additional populations in Barrier ver. 2.2.
Abbreviations

AMH  Anatomically Modern Humans
BMAC  Bactria-Margiana Archaeological Complex
bp  Base Pairs
COI  Cytochrome Oxidase Subunit I
COII  Cytochrome Oxidase Subunit II
COIII  Cytochrome Oxidase Subunit III
CRS  Cambridge Reference Sequence
HVS  Hypervariable Segment
IE  Indo-European
Ky  Thousand Years
Kya  Thousand Years Ago
LGM  Last Glacial Maximum
Mb  Megabases (1,000,000 bases)
mtDNA  Mitochondrial DNA
np  Nucleotide Position
NRPY or NPY  Non-Recombining Portion of the Y-Chromosome
PAR  Pseudoautosomal Region
PCR  Polymerase Chain Reaction
rCRS  Revised Cambridge Reference Sequence
RFLP  Restriction Fragment Length Polymorphism
SNP  Single Nucleotide Polymorphism
Y-STR  Y-Chromosome Short Tandem Repeat
YBP  Years Before Present
YCC  Y-Chromosome Consortium

Acknowledgements

Firstly, I would like to thank my supervisor Dr. Maziar Ashrafian Bonab for having the patience to tolerate me for the duration of this study. I would also like to thank him for all the support and encouragement he has given me, especially during the low periods, and for sharing his knowledge with me. I would also like to thank the other members of my supervisory team, Dr. Julian Mitchell and Dr. Frank Schubert, for the support and advice they have given and for the many spontaneous ‘corridor meetings’; your help has been invaluable. This project would not have been possible without the financial and organisational support of IBBS and The Genographic Project; my thanks also go to you. I would also like to express my gratitude to the technical staff of King Henry Building, especially Dr. George Zouganelis, Ms Christine Hughes and Mr Christopher Baker. Your constant willingness to listen to my grievances has ensured my sanity throughout the difficult periods of this study. Additionally, I would like to thank the many other academic and research staff at the University of Portsmouth for their help and support.
Last, but certainly not least, I would like to thank my friends and family, especially my parents Christine and Tony, whose patience has been fully tested but who have always supported me throughout. They have given me the strength and determination on the tougher days to pick myself up again and continue. I would like to thank Jonathan Parrott for his support and for proof-reading some of my thesis chapters, never ceasing to find a mistake. Thank you all for your help.

Chapter One
Anthropology & DNA

1. Anthropology and DNA
1.1 Origins of Modern Humans
1.1.1 Early Hominids

For many years, scientists, especially biologists, and inquisitive humans in general have asked who walked the Earth before us, and where we come from. The theory of evolution, as recognised by the early evolutionary biologists Charles Darwin, Alfred Russel Wallace and Thomas Huxley, has helped us to answer these questions. Based on fossil identification and protein comparisons between Asian apes, African apes and humans, the early hominids are thought to have diverged from our nearest relatives, the African apes, some 5-7 million years ago (MYA) (Stoneking, 2008). Most hominid fossils dated to after 4.2 MYA that do not show the characteristics of the genus Homo are grouped into the genus that preceded it, Australopithecus (Jobling, Hurles & Tyler-Smith, 2004). Fossils belonging to this genus have only been found in Africa (Jobling, Hurles & Tyler-Smith, 2004). The oldest of these species is Australopithecus anamensis, found at Lake Turkana (Figure 1.1) and the nearby sites of Kanapoi and Allia Bay in northern Kenya. The fossils recovered at these sites comprise a mandible, molars and premolars, skull fragments, a tibia and a humerus (Cartmill & Smith, 2009), and have been dated to the early Pliocene, 3.9-4.2 MYA (Jobling, Hurles & Tyler-Smith, 2004; Cartmill & Smith, 2009).
Another early species was Australopithecus afarensis, which differs from the earlier A. anamensis in that the males are slightly smaller (Jobling, Hurles & Tyler-Smith, 2004). Remains of A. afarensis have been found at multiple sites along the Eastern Rift Valley, from Hadar, Ethiopia, in the north to Laetoli, Tanzania, in the south, dating to 3.0-3.9 MYA (Jobling, Hurles & Tyler-Smith, 2004; Cartmill & Smith, 2009). A species contemporaneous with A. afarensis was Australopithecus bahrelghazali, whose remains have been found near Koro Toro in central Chad, some 2,500 km west of the Eastern Rift Valley sites where A. afarensis was found (Jobling, Hurles & Tyler-Smith, 2004). These remains have been dated to 3.0-3.5 MYA (Jobling, Hurles & Tyler-Smith, 2004; Cartmill & Smith, 2009). Australopithecus bahrelghazali is morphologically similar to its eastern relatives and may even be a variant of A. afarensis (Jobling, Hurles & Tyler-Smith, 2004). The main feature distinguishing the two is that the mandibular symphysis, the ridge where the left and right halves of the mandible fuse, is more vertical in A. bahrelghazali (Cartmill & Smith, 2009). Its lower premolars also have three roots instead of the usual two, although this feature does not necessarily allow the two species to be told apart.

Figure 1.1: Map of Kenya; Lake Turkana in the north-northwest region of Kenya.

Shortly after A. afarensis and A. bahrelghazali, there appears to have been a transition from Australopithecus to the genus Homo. The species in question is Australopithecus habilis, also known as Homo habilis, some specimens of which have been dated to ~2.5 MYA (Jobling, Hurles & Tyler-Smith, 2004). It has been described as both a primitive Homo and an advanced Australopithecus (Cartmill & Smith, 2009).
This hominid is assigned to the genus Homo largely on the basis of partial skull and mandible fossils (Jobling, Hurles & Tyler-Smith, 2004). Homo habilis had a larger brain and smaller ‘cheek teeth’ than the australopithecines (Cartmill & Smith, 2009). However, some H. habilis fossils combine an Australopithecus-like brain with a Homo-like face, while others show the opposite combination, which has led to the claim that there is a sister species, Homo rudolfensis (Cartmill & Smith, 2009).

There is less confusion surrounding Homo erectus. The only complication is that the name H. erectus is sometimes reserved for non-African specimens, while African specimens are assigned to Homo ergaster (Jobling, Hurles & Tyler-Smith, 2004), although there is no significant difference between the two. To avoid confusion, Homo erectus is used here to refer to both. The oldest H. erectus fossils were found in eastern Africa at Koobi Fora, near Lake Turkana, and have been dated to approximately 1.8-1.9 MYA (Jobling, Hurles & Tyler-Smith, 2004); other fossils from this site have been dated to 1.7-1.8 MYA (Cartmill & Smith, 2009). As well as being first found in Africa, Homo erectus was also the earliest hominid found outside Africa: fossil evidence from Indonesia and China suggests that H. erectus may have been in East Asia as early as 1.8 MYA (Jobling, Hurles & Tyler-Smith, 2004). These Asian H. erectus had larger bodies than their African counterparts, which may have provided greater tolerance to heat stress and supported their migration. Small populations of Homo erectus may also have survived until as recently as 27 thousand years ago (KYA) (Jobling, Hurles & Tyler-Smith, 2004). The limb proportions and tooth size of H. erectus are similar to those of anatomically modern humans (AMH), Homo sapiens sapiens; however, its brain is smaller (Jobling, Hurles & Tyler-Smith, 2004).

Homo species that emerged after H. erectus but before AMHs include Homo heidelbergensis and Homo neanderthalensis. The former was less robust than Homo erectus but had a larger brain. Fossil evidence of Homo heidelbergensis, particularly a mandible, has been found across Europe (in Germany, Greece, Italy, Spain and the United Kingdom) and in Ethiopia in eastern Africa, and has been dated to ~1 MYA (Jobling, Hurles & Tyler-Smith, 2004). Given this wide distribution, H. heidelbergensis was probably an adaptable species, not too dissimilar to AMHs, able to inhabit several different regions under different environmental conditions. Homo neanderthalensis, more commonly known as the Neanderthal, emerged ~250 KYA and became extinct ~27 KYA, just before the Last Glacial Maximum (LGM). Neanderthals inhabited regions of Europe and western Asia, with fossils identified in France, Germany, Israel and Iraq. Skeletal and skull evidence shows that H. neanderthalensis had a large brain and well-defined brow ridges and was robustly built, and suggests that Neanderthals may in fact have derived from H. heidelbergensis (Jobling, Hurles & Tyler-Smith, 2004).

1.1.2 Modern Humans

The evolution of AMHs from earlier hominid species is contested, not in terms of which archaic human species we derive from, but of where geographically this occurred. Three hypotheses have been proposed: (i) the multi-regional hypothesis, (ii) the replacement hypothesis and (iii) the assimilation hypothesis. Conclusions about the evolutionary history of AMHs were initially based on fossil evidence (Schick & Toth, 1993), but now draw on archaeological, anthropological and genetic data from modern human populations (Stringer, 2002; Mellars, 2006). Early in their history, modern humans were one of at least three evolving human species: Homo erectus in Asia, Homo neanderthalensis in Europe/Eurasia and Homo sapiens in Africa (Klein, 2008).
The multi-regional hypothesis (Figure 1.2), or Regional Continuity model (Wolpoff et al., 1984), suggests that Homo erectus populations migrated out of Africa to various regions of the world more than 1 MYA (Nei, 1995) and gradually evolved into AMHs in each region, giving rise to the current worldwide distribution of modern humans. For example, Asian H. erectus evolved into Asian modern humans, African H. erectus into African modern humans, and so on. Wolpoff et al. (2000) state that this model does not imply parallel evolution, independent multiple origins or the simultaneous appearance of characteristics in different regions. It does, however, propose that the regional characteristics of modern humans have persisted since the time of their ancestors more than 1 MYA (Nei, 1995), which seems unlikely. Genetically, there is no evidence of a Homo neanderthalensis mitochondrial DNA (mtDNA) contribution to AMHs (Hodgson & Disotell, 2008): Neanderthal and modern human mtDNAs differ at far more polymorphic sites than any two modern human mtDNAs do. However, X-Chromosome sequences attributed to Homo erectus have been found in modern humans (Cox et al., 2008), providing some genetic support for the multi-regional hypothesis. The model does not rule out interbreeding between different H. erectus populations, but it suggests that breeding occurred mainly within isolated H. erectus groups. This hypothesis argues that "each inhabited region showed a continuous anatomic sequence leading to modern humans, and those non-African populations exhibited no special African influence" (Stringer, 2002).

Figure 1.2: Illustration of the Multi-Regional hypothesis of modern human evolution (Stoneking, 2008)

The Replacement hypothesis, or Out-of-Africa theory (Figure 1.3), is the main alternative to the multi-regional hypothesis.
This theory also identifies an African origin (whereas in the multi-regional hypothesis the African origin applies to Homo erectus rather than to AMHs), proposing that modern humans arose from an African H. erectus population ~100-200 KYA (Nei, 1995), or ~150 KYA (Forster & Matsumura, 2005), and not from the widespread H. erectus populations outside Africa. On this view, modern humans first expanded across Africa before migrating into the Middle East and subsequently throughout the world. Support for this hypothesis intensified with the introduction of genetic analyses, in particular of mtDNA and the Y-Chromosome, which are inherited unilaterally. Cann et al. (1987) collected 147 mtDNA samples from five population groups around the world (African, Asian, Australian, New Guinean and European) and performed Restriction Fragment Length Polymorphism (RFLP) analysis to estimate the level of genetic variation. After constructing a genealogical tree of the relationships between the mtDNA types, Cann et al. (1987) found that the deepest branch point split the tree into two groups: the first consisting only of African mtDNAs, and the second containing the mtDNAs of the rest of the world. Using a molecular clock, they also calculated that the common mtDNA ancestor of modern humans lived in eastern Africa 140-280 KYA (Cann et al., 1987). Vigilant et al. (1991) sequenced the hypervariable regions of mtDNA (HVS-I and HVS-II) from 189 individuals, 121 of whom were of African descent. Chimpanzee and human mtDNA sequences were used to calibrate the rate of mtDNA evolution, dating the common human ancestor to between 166 and 249 KYA (Vigilant et al., 1991). Other studies supporting the Out-of-Africa hypothesis place the ancestor of modern humans at a similar time depth, 230-298 KYA (Hasegawa & Horai, 1991; Ruvolo et al., 1993).
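The molecular-clock arithmetic behind such dates is simple under a strict clock, in which pairwise sequence divergence accumulates at a constant rate: the time since the common ancestor is the observed divergence divided by that rate. The sketch below illustrates this; the function name is hypothetical, and the divergence and rate values in the usage lines are illustrative placeholders, not figures reported by Cann et al. (1987) or Vigilant et al. (1991).

```python
def tmrca_years(divergence: float, pairwise_rate_per_myr: float) -> float:
    """Estimate the time to the most recent common ancestor (TMRCA) in years.

    Assumes a strict molecular clock (hypothetical helper, for illustration):
      divergence            -- fraction of sites differing between lineages (0.006 = 0.6%)
      pairwise_rate_per_myr -- fractional divergence accumulated per million years
                               between two lineages (twice the per-lineage rate)
    """
    if divergence < 0 or pairwise_rate_per_myr <= 0:
        raise ValueError("divergence must be >= 0 and the rate must be > 0")
    return divergence / pairwise_rate_per_myr * 1_000_000


# Illustrative inputs only: 0.6% divergence under a clock of 2-4% per million years.
slow_clock = tmrca_years(0.006, 0.02)  # slower clock gives an older date
fast_clock = tmrca_years(0.006, 0.04)  # faster clock gives a younger date
print(f"TMRCA between {fast_clock / 1000:.0f} and {slow_clock / 1000:.0f} KYA")
```

Because the estimate is inversely proportional to the assumed rate, halving the rate doubles the date, which is why mtDNA dates depend so heavily on calibration, such as the chimpanzee-human calibration used by Vigilant et al. (1991).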
The subsequent expansion(s) from Africa are thought to have taken either a northern route (via the Levant) or a southern route across the Horn of Africa. Recently, evidence has accumulated supporting the southern route as that of an initial, single successful migration (Kivisild et al., 1999; Forster & Matsumura, 2005; Macaulay et al., 2005; Mellars, 2006; Hudjashov et al., 2007; Chandrasekar et al., 2009; Kumar et al., 2009). A Levantine migration has been recognised, but it has been interpreted either as having had a lesser impact (Forster & Matsumura, 2005) or as having occurred more recently, 20-10 KYA (Winters, 2011). The Strait of Gibraltar has been identified as a third migration point from northern Africa, ~40-35 KYA, while Eurasia was still inhabited by Neanderthals (Winters, 2011). To date, the available genetic and archaeological data are generally interpreted as supporting a single AMH origin in east Africa (Liu et al., 2006).

Figure 1.3: Illustration of the Out-of-Africa theory of modern human evolution (Stoneking, 2008)

The Out-of-Africa hypothesis has received further support from Hudjashov et al. (2007), who showed through analysis of both the Y-Chromosome and mtDNA that Australian Aboriginals and Melanesians (from the region of Oceania that incorporates the islands around Australia, including Papua New Guinea) belong to the founder lineages (mtDNA lineages M and N, and Y-Chromosome lineages C and F) associated with the initial exit from Africa 50-70 KYA. Australian Aboriginals were also found to be closely related to the indigenous populations of Papua New Guinea and the rest of Melanesia; at some point they formed part of the same settlement group, which was later separated by the sea (Hudjashov et al., 2007). The Assimilation model (Figure 1.4) combines the two previous theories, in that AMHs "arose through the integration of an important African role with multiregional views" (Stringer, 2002).
The Assimilation model accepts an African origin for modern humans, but suggests that the roles of population migration and of the replacement of more archaic species have been overstated, and that archaic populations derived from H. erectus also contributed to AMHs. For instance, the Neanderthal genome has recently been found to share greater affinity with the AMHs of Eurasia than with those of Africa: an estimated 1-4% of Eurasian genomes derive from Neanderthals (Green et al., 2010). The Neanderthal genome was also found to be equally similar to the genomes of a French, an East Asian (Han Chinese) and a Papuan individual, indicating that admixture between AMHs and Neanderthals occurred shortly after the modern human migration from Africa but before the divergence of Europeans, East Asians and Papuans (Green et al., 2010).

Figure 1.4: Illustration of the Assimilation hypothesis for modern human evolution (Stoneking, 2008)

1.2 Mitochondrial DNA (mtDNA)

Since Watson & Crick (1953) described the structure of DNA as two right-handed helical chains coiled round the same axis and held together by purine and pyrimidine bases, scientists, and geneticists in particular, have been intensely interested in its function; once DNA had been identified in most cells, it was quickly recognised as an essential molecule. Within a typical somatic cell, many complex processes specific to that cell take place, for example the synthesis of a protein required for a particular function. The cell contains many organelles that carry out essential cellular processes. The control centre of the cell is the nucleus, where DNA is stored in the form of chromosomes; its information leaves the nucleus only in the form of mRNA. Nuclear DNA directs protein synthesis and, through the regulation and expression of genes, gives the cell its identity.
For example, pancreatic cells are instructed to synthesise insulin, a hormone that regulates blood-sugar levels, while in other cells this gene is switched off. The organelles inside a cell are often involved in protein synthesis or in the packaging and transport of proteins. The mitochondrion (pl. mitochondria) is a double-membrane-bound organelle whose inner membrane is extensively folded into cristae to maximise its surface area. It is unlike any other organelle in an animal cell: not only does it receive proteins synthesised elsewhere in the cell, but it also synthesises some of its own proteins from its own genome, mitochondrial DNA, which is separate from nuclear DNA (Borst, 1977). Mitochondrial DNA (mtDNA) is located within the mitochondrial matrix (Jobling, Hurles & Tyler-Smith, 2004; Butler, 2005) and, unlike nuclear DNA, which is involved in most if not all cellular processes, it contributes only to processes occurring inside the mitochondrion, such as oxidative phosphorylation and ATP synthesis. Mitochondria are widely accepted to have originated from a symbiosis between an ancestral cell and a bacterium (Anderson et al., 1981; Jobling, Hurles & Tyler-Smith, 2004). Human mitochondrial DNA (Figure 1.5) is a double-stranded circular molecule 16,569 base pairs (bp) in length (Jobling, Hurles & Tyler-Smith, 2004; Butler, 2005; Ebner et al., 2011).

Figure 1.5: Diagrammatic view of mtDNA.

Mitochondrial DNA is inherited unilaterally via the maternal line (Lightowlers et al., 1997; Jobling, Hurles & Tyler-Smith, 2004; Butler, 2005) and is present in nearly all cells. Biparental inheritance (an additional paternal contribution) has been observed in insects such as honeybees and Drosophila, and in mussels, yeast and mice (Meusel & Moritz, 1993; Kvist et al., 2003; Kraytsberg et al., 2004).
In honeybees, a paternal mtDNA contribution of up to 27% has been observed in eggs 12 hours after oviposition, although it becomes negligible by larval emergence (Meusel & Moritz, 1993). Paternal inheritance of mtDNA in humans has been observed at the blastocyst stage in some abnormal embryos (St. John et al., 2000), but its overall contribution has been negligible (~0.7%) (Kraytsberg et al., 2004). Uniparental inheritance may be attributed partly to copy number, a typical oocyte containing ~100,000 mtDNA genomes whereas spermatocytes contain only ~100 (Chen, X et al., 1995; Jobling, Hurles & Tyler-Smith, 2004), and partly to the selective destruction or inactivation of sperm mitochondria during early embryogenesis (Schwartz & Vissing, 2002). Within cells, mtDNA is present at high copy number, 10³-10⁴ copies (Lightowlers et al., 1997; Butler, 2005), most of which are identical to one another (Lightowlers et al., 1997). Cells with a high ATP demand, such as muscle and nerve cells, contain more mitochondria, and therefore more copies of mtDNA, than cells with a much lower demand. The mitochondrial genome contains 37 genes, encoding 13 proteins, 22 tRNAs and two rRNAs; the genes are contiguous, with few non-coding bases between them (Anderson et al., 1981). The major non-coding region within mtDNA is the D-loop, a 1,122 bp region (np 16024-576, spanning the junction between the last and first positions of the circular numbering) that houses the hypervariable segments: HVS-I (np 16024-16383; classically 16024-16365), HVS-II (np 57-372; classically 73-340) and HVS-III (np 438-574) (Butler, 2005). Since the late 1980s, mtDNA, in conjunction with Y-Chromosome and autosomal DNA, has been used in population genetics because it allows the evolutionary and historical lineages of a species to be traced.
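Because the genome is circular, the D-loop straddles the point where the numbering wraps from np 16569 back to np 1, so locating a polymorphism within it needs a wrap-aware interval check. The sketch below assigns a position to HVS-I, HVS-II or HVS-III using the modern coordinates given above (Butler, 2005); it assumes the D-loop spans np 16024-576, which is consistent with its stated 1,122 bp length, and all names in it are hypothetical.

```python
MTDNA_LENGTH = 16569  # rCRS length in base pairs

# Modern hypervariable-segment coordinates (inclusive, rCRS numbering).
HVS_REGIONS = {
    "HVS-I": (16024, 16383),
    "HVS-II": (57, 372),
    "HVS-III": (438, 574),
}
# Assumed D-loop bounds: wraps around the origin (np 16024..16569, then 1..576).
D_LOOP = (16024, 576)


def in_interval(pos: int, start: int, end: int) -> bool:
    """Inclusive interval test that also handles intervals wrapping past np 16569."""
    if start <= end:
        return start <= pos <= end
    return pos >= start or pos <= end


def interval_length(start: int, end: int) -> int:
    """Number of positions in an inclusive, possibly wrapping, interval."""
    if start <= end:
        return end - start + 1
    return (MTDNA_LENGTH - start + 1) + end


def classify_position(pos: int) -> str:
    """Return the hypervariable segment containing pos, or a coarser label."""
    if not 1 <= pos <= MTDNA_LENGTH:
        raise ValueError(f"position {pos} outside 1..{MTDNA_LENGTH}")
    for name, (start, end) in HVS_REGIONS.items():
        if in_interval(pos, start, end):
            return name
    if in_interval(pos, *D_LOOP):
        return "D-loop (outside HVS)"
    return "outside D-loop"


for np_ in (16129, 73, 500, 16500, 8251):
    print(np_, classify_position(np_))
print("D-loop length:", interval_length(*D_LOOP))  # 546 + 576 = 1122
```

Treating the wrapped interval as "at or after the start, or at or before the end" avoids special-casing the origin elsewhere in the code.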
Mitochondrial DNA has been used extensively because of its maternal inheritance via the ova (Giles et al., 1980), lack of recombination, high copy number per cell and high mutation rate (Olivo et al., 1983; Merriwether et al., 1991; Elson et al., 2001; Piganeau & Eyre-Walker, 2004; Torroni et al., 2006; Asari et al., 2007; Behar et al., 2007; Maji et al., 2008). The mitochondrial genome acquires mutations approximately ten times faster than nuclear DNA (Brown et al., 1979; Ingman & Gyllensten, 2001; Ebner et al., 2011). This high mutation rate is attributed to the absence of protective proteins such as histones, exposure to oxidative damage and limited repair mechanisms (Bogenhagen, 1999). The first mtDNA to be sequenced, the Cambridge Reference Sequence (CRS) (Anderson et al., 1981), was obtained from the placenta of an individual of European descent and shows the typical characteristics of European mtDNA, belonging to haplogroup H, sub-haplogroup H2 (Achilli et al., 2004). The sequence was reanalysed by Andrews et al. (1999) after other investigators (Brown et al., 1992; Howell et al., 1992) had identified discrepancies in it; in total, the re-analysis identified eighteen errors or rare polymorphisms, and the corrected sequence became the revised Cambridge Reference Sequence (rCRS). Haplogroups are defined by sets of slowly mutating markers (Jobling, Hurles & Tyler-Smith, 2004) and tend to be shared by people of the same geographic region. Wallace et al. (1999) found that many individuals from the same or similar populations or cultural backgrounds shared similar mtDNA sequences, which could be clustered together to form haplogroups. This is particularly evident among European populations (Hedman et al., 2007; Richard et al., 2007; Tetzlaff et al., 2007; Zimmerman et al., 2007), which share many of the same haplogroups.
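The principle that a haplogroup is recognised from a particular combination of polymorphisms relative to the reference sequence can be sketched in a few lines. This is a simplified illustration, not a real classification tool: the function names, the haplogroup labels ("HG1", "HG1a", "HG2") and their marker positions are hypothetical placeholders, the sequences are assumed to be pre-aligned without insertions or deletions, and real assignment relies on a curated phylogeny and must handle back-mutations and missing data, which this sketch ignores. Variants are written as position followed by the observed base (e.g. 73G), the usual notation for mtDNA polymorphisms relative to the rCRS.

```python
from typing import Dict, FrozenSet, Optional, Set


def call_variants(reference: str, sample: str, offset: int = 1) -> Set[str]:
    """Return substitutions as '<position><base>' strings (e.g. '4C').

    Assumes the sequences are already aligned and of equal length (no indels);
    offset is the reference position of the first base.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return {
        f"{i + offset}{s}"
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()))
        if r != s
    }


def assign_haplogroup(variants: Set[str],
                      definitions: Dict[str, FrozenSet[str]]) -> Optional[str]:
    """Return the haplogroup whose defining markers are all present,
    preferring the one with the most markers (the most specific branch)."""
    matches = [(len(markers), name) for name, markers in definitions.items()
               if markers <= variants]
    return max(matches)[1] if matches else None


# Hypothetical marker definitions (NOT real haplogroups); HG1a nests inside HG1.
DEFINITIONS = {
    "HG1": frozenset({"4C"}),
    "HG1a": frozenset({"4C", "9T"}),
    "HG2": frozenset({"2G"}),
}

ref = "ACGTACGTAC"
sample = "ACGCACGTTC"
found = call_variants(ref, sample)
print(sorted(found), assign_haplogroup(found, DEFINITIONS))
```

Preferring the definition with the most satisfied markers mimics descending a haplogroup tree to the deepest branch the sample supports.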
Since mtDNA was recognised as an essential tool in population genetics, the genome has been analysed extensively. Early analyses used a panel of restriction enzymes: AluI, AvaII, BamHI, DdeI, HaeII, HaeIII, HhaI, HincII, HinfI, HpaI, HpaII/MspI, MboI, RsaI and TaqI. Earlier studies using this approach, known as the 14-restriction-enzyme method, used the endonuclease HpaII (Torroni et al., 1992; 1993; 1994a), while later studies switched to MspI (Torroni et al., 1994b; 1996; 1997; 1998; 1999; Brown et al., 1998; Kivisild et al., 1999; Macaulay et al., 1999; Kivisild et al., 2003; Quintana-Murci et al., 2004; Alzualde et al., 2005); both enzymes recognise and cut the sequence C^CGG. In many studies using this method, additional polymorphic sites were also typed, most commonly AccI at np 14465 and 15254, HinfI at 12308, NlaIII at 4216 and 4577, and MseI at 14766 (Torroni et al., 1996; 1998; 1999; Brown et al., 1998; Kivisild et al., 1999; Macaulay et al., 1999; Quintana-Murci et al., 2004; Alzualde et al., 2005). More recently, the method has been modified so that restriction enzymes are used to test the sites of diagnostic polymorphisms for haplogroup identification (Torroni et al., 2001; Al-Zahery et al., 2003; Quintana-Murci et al., 2004; Tambets et al., 2004; Alzualde et al., 2005; Nasidze et al., 2006; Jin et al., 2009). Quite often, mtDNA genomes were also partially sequenced, mostly focussing on the hypervariable regions HVS-I and HVS-II (Kivisild et al., 1999; 2003; Nasidze & Stoneking, 2001; Torroni et al., 2001; Al-Zahery et al., 2003; Nasidze et al., 2004a; 2004b; 2005a; 2005b; 2006; 2007; Powell et al., 2007; Alshamali et al., 2008; Irwin et al., 2009a; 2009b; Jin et al., 2009), and now that DNA sequencing is more economical, whole-genome sequencing (Achilli et al., 2004; 2005; Fagundes et al., 2008; Chandrasekar et al., 2009; Kumar et al., 2009) has become a common method of mtDNA analysis.
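Because the RFLP approach scores whether a substitution creates or abolishes an enzyme's recognition site, its core can be illustrated by scanning a sequence for recognition motifs. The sketch below assumes the standard recognition sequences of four enzymes from the panel (HpaII/MspI CCGG, AluI AGCT, HinfI GANTC where N is any base, HaeIII GGCC), expands IUPAC ambiguity codes into a regular expression, and uses short made-up sequences rather than real mtDNA. It only locates sites; it does not model cut positions or fragment sizes, and the function names are hypothetical.

```python
import re
from typing import Dict, List

# IUPAC nucleotide codes that can appear in recognition sequences.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

# Recognition sequences for a few enzymes from the 14-enzyme panel.
ENZYMES = {
    "HpaII/MspI": "CCGG",
    "AluI": "AGCT",
    "HinfI": "GANTC",
    "HaeIII": "GGCC",
}


def find_sites(sequence: str, site: str) -> List[int]:
    """Return 1-based start positions of every (possibly overlapping) match.

    Only the given strand is scanned; this suffices for the palindromic
    sites listed above.
    """
    pattern = "".join(IUPAC[base] for base in site.upper())
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", sequence.upper())]


def rflp_profile(sequence: str) -> Dict[str, List[int]]:
    """Map each enzyme to the positions where its recognition site is present."""
    return {name: find_sites(sequence, site) for name, site in ENZYMES.items()}


reference = "AACCGGTTGAATCAAGCTT"
variant = "AACTGGTTGAATCAAGCTT"  # C->T at position 4 destroys the CCGG site
print(rflp_profile(reference))
print(rflp_profile(variant))
```

Comparing the two profiles shows the HpaII/MspI site disappearing after a single C-to-T change, which is the kind of presence/absence difference at a fixed np that the 14-enzyme surveys recorded.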
1.3 Mitochondrial and Y-Chromosome Haplogroup Distribution

1.3.1 Mitochondrial DNA Variation

Human mtDNA genomes vary widely across the world, with populations of similar descent or from the same geographical region sharing many of the same characteristics. In some cases, these characteristics can reveal historical events in a population's past, such as admixture with other populations or migrations. The Americas are dominated by five haplogroups (Schurr et al., 1990; Achilli et al., 2008; Fagundes et al., 2008): A, B and X among native North Americans, and B, C and D among South Americans. Haplogroup A is also abundant among Central American populations. Haplogroups A-D are also present in Asia, while X is found at low frequencies outside the Americas; in Europe it accounts for <5% of mtDNA diversity (Fagundes et al., 2008). Haplogroup X arrived in the Americas as part of a single founding population, which argues against multiple-migration theories such as the Solutrean hypothesis. This conclusion rests on the five founder haplogroups sharing a coalescence age of ~20 KYA (Fagundes et al., 2008), which would not be expected had a separate, later peopling event occurred. Further evidence against such theories is that Amerindian mtDNAs carry rare mutations found elsewhere only in Asia, indicating that the Americas were peopled via a migration through Asia (Schurr et al., 1990). The initial separation of the ancestral American population from Asian groups was accompanied by a population bottleneck in Beringia during the LGM, ~23-19 KYA, which reduced the female contribution to ~1,000 individuals. Towards the end of the LGM, ~18-15 KYA, the population expanded and migrated south, probably along the western coast of North America, as the 'opening of the ice-free corridor is dated no earlier than ~14KYA' (Fagundes et al., 2008). East Asia contains a number of minor populations alongside its larger ones.
Haplogroups B, F, M7 and R9 are abundant in Chinese populations; in Hong Kong, these four lineages make up over 50% of the mtDNA diversity (Irwin et al., 2009a). The Korean population and its neighbouring subpopulations (Manchurians, Korean-Chinese and Han from Beijing) each have high frequencies of haplogroups D (≥25.0%), M (≥15.0%), F (≥10.1%) and B (≥10.0%) (Jin et al., 2009). The haplogroup D frequencies in the other populations studied by Jin et al. (2009) were 18.8% (Vietnamese), 12.8% (Mongolian) and 7.5% (Thai). The Korean population also shows moderate frequencies of haplogroups A (8.4%) and G (7.3%), which are common in northeast Asia and southeast Siberia, whereas other lineages common in this region (C, Y and Z) make up <4% of the Korean mtDNA gene pool. On the basis of mtDNA analysis, Koreans are most genetically similar to populations within their own region of northeast Asia. The Xibe, another Chinese ethnic group, originated in north-eastern China but now inhabit a region of north-western China. Genetically, they resemble populations from their ancestral north-eastern homeland and are most similar to the Manchurian population of that region (Powell et al., 2007). In Europe, one haplogroup in particular dominates: haplogroup H (Mikkelsen et al., 2008). Its frequency increases towards the west of Europe, and there are also regional 'hotspots' where it is more frequent than in surrounding populations, including the Spanish Basques, northern Germany, Denmark, northern France and Great Britain (Achilli et al., 2004). Its average frequency in Europe is 40.5%, with many populations between 40% and 50% (Grignani et al., 2009), compared with 24.6% in the Caucasus, 18.4% among Middle Eastern populations and 10.6% within Asian groups (Achilli et al., 2004).
Haplogroup H has at least fifteen subgroups; two common ones, H1 and H3, occur at high frequencies in the Iberian Peninsula and surrounding populations, such as the Berbers of Morocco, and both have a coalescent age of ~11 KY (Achilli et al., 2004). Another common European haplogroup is U, in particular subgroup U5 (Kivisild et al., 1999); haplogroup U has a coalescent age of ~60 KY, implying an origin soon after the AMH exit from Africa (Kivisild et al., 1999; Achilli et al., 2005). A U5 subclade, U5b1b, was found in one Yakut individual and one Fulbe individual whose mtDNAs differed at only two coding-region and three control-region positions (Achilli et al., 2005). This is a surprising finding, since the Yakut are from Siberia and the Fulbe from Senegal. Sub-haplogroup U5 (coalescent age 41.4 ± 9.2 KY) is generally found at low frequencies in Europe and in Berber and other African populations, but reaches a high frequency (~48%) among the Saami of northern Scandinavia (Achilli et al., 2005), some of whose U5 lineages belong to subclade U5b1b. U5b1b shows a distribution pattern similar to that of the other major Saami haplogroup, V; together, these two lineages account for nearly 90% of Saami mtDNAs (Achilli et al., 2005). The coalescent age of U5b1b, 8.6 ± 2.4 KY, is similar to those of the common H sub-haplogroups H1 and H3 and of haplogroup V itself (Achilli et al., 2005). The U5b1b lineage therefore links the Berbers (and African groups known to have mixed with them, such as the Fulbe) with Europeans, who contributed their H1, H3, U5b1b and V lineages to the populations of northern Africa during the LGM (Achilli et al., 2005).
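Population relationships such as those described for the Caucasus below are often summarised with Neighbor-Joining trees built from pairwise genetic distances. The method repeatedly merges the pair of nodes that minimises a corrected distance criterion (Q) and estimates branch lengths as it goes. The sketch below is a minimal, unoptimised version for illustration: the function name is hypothetical, the five-taxon matrix is a standard toy example with hypothetical taxa a-e rather than the distances computed by Nasidze & Stoneking (2001), and real analyses would use an established phylogenetics package.

```python
from typing import List, Tuple


def neighbor_joining(names: List[str], dist: List[List[float]]) -> Tuple[str, list]:
    """Build an unrooted Neighbor-Joining tree from a symmetric distance matrix.

    Returns (newick, joins), where joins lists each merge as
    (left, right, branch_left, branch_right).
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [list(map(float, row)) for row in dist]  # working copy
    joins = []
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # total distance from each node
        # Choose the pair (i, j) that minimises the Q criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent node.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joins.append((nodes[i], nodes[j], li, lj))
        parent = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        # Distances from the parent to every remaining node.
        keep = [k for k in range(n) if k not in (i, j)]
        du = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [du[x]] for x, a in enumerate(keep)] + [du + [0.0]]
        nodes = [nodes[k] for k in keep] + [parent]
    # Resolve the last three nodes as an unrooted star.
    a, b, c = nodes
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});", joins


# Textbook toy matrix (hypothetical taxa a-e), not population data.
taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree, joins = neighbor_joining(taxa, matrix)
print(tree)
```

On this toy matrix, a and b are merged first because their Q value is the lowest; with real data, the input would be pairwise genetic distances between populations computed from their mtDNA sequences.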
The populations of the Caucasus region have recently been studied extensively, as the region is home to populations speaking languages from several different language families. Nasidze & Stoneking (2001) found that Caucasian populations are more similar to their geographical neighbours, despite linguistic differences, than to populations that share their language family but are geographically distant. A Neighbor-Joining tree also showed a close relationship between the Azerbaijanis, Armenians and Chechenians, who are all south Caucasian populations but speak an Altaic, an Indo-European and a North Caucasian language respectively. The Caucasian populations have also been identified as intermediate between West Asian and European populations (Nasidze et al., 2004a), and two Iranian groups (from Tehran and Isfahan) were found to be close to the Caucasian Avarians and Rutulians. Elsewhere, Iranians have been observed to occupy an intermediate position between Caucasian and East Asian populations (Shepard & Herrera, 2006). The Ossetian groups are also fairly similar to one another despite living on both the northern and southern slopes of the Caucasus Mountains, possibly indicating a common origin (Nasidze et al., 2004b). They speak a language of the Iranian branch but are surrounded by groups speaking Caucasian languages. The Kurdish groups of the region (Kurmanji speakers from Georgia and Turkey, Zazaki speakers from Turkey, and Kurds from eastern Turkey, Iran and Turkmenistan) are genetically similar to one another despite their linguistic and geographical differences, and are more similar to West Asian and European populations than to Caucasian and Central Asian groups (Nasidze et al., 2005a). The Kalmyks, an ethnic group living along the lower Volga River in Russia, are believed to have Mongolian ancestry.
They carry the COII-tRNALys 9 bp deletion at a frequency of ~7%, similar to the frequencies seen in the Korean, Mongolian and Buryat populations (Nasidze et al., 2005b). Since this deletion occurs at low frequencies among eastern Europeans, this supports a Mongolian ancestry. The Kalmyks also show some similarity to local Russian populations, indicating more recent maternal admixture (Nasidze et al., 2005b). The Gagauz are a linguistic enclave: a Turkic-speaking group that originated in Turkey before migrating to their current location in Moldova, where they are surrounded by Indo-European-speaking populations. The Moldovan population is similar to Europeans, while the Gagauz are intermediate between Europeans and Caucasians; they are more similar to Moldovans than to the Turkish population from which they originated (Nasidze et al., 2007). Recently, there has been greater focus on Indian and other Asian populations because of their rich anthropological history. The AMH expansion from Africa ~85 KYA is thought to have involved a small group of 500-2,000 women, which would explain why only two sublineages (the super-haplogroups M and N) emerged from Africa (Forster & Matsumura, 2005). In India, the most common lineage is haplogroup M, which is ubiquitous and accounts for >70% of Indian mtDNAs (Chandrasekar et al., 2009). It is also common among south Indian tribal and caste populations, and accounted for all but three lineages among the Chenchus (Kivisild et al., 2003). Haplogroup M is also frequent (30-55%) among populations along the southern coast of Pakistan and in northwest India (Quintana-Murci et al., 2004). In contrast, haplogroup M is at low frequency or absent west of the Indus Valley, and is infrequent among Central Asian populations (<12%) (Quintana-Murci et al., 2004).
Haplogroup U was identified as the second largest contributor to Indian mtDNAs (Kivisild et al., 1999) and is also the second most frequent haplogroup in Europe, although the predominant sub-haplogroups differ between the two regions: U5 in Europe and U2 in India. In Asia, the distribution of haplogroup U resembles that of haplogroup M, being concentrated in the Indo-Pakistan region (Quintana-Murci et al., 2004). The Indian U2 genotype differs from the west Asian U2 in that the latter carries a transversion at np 16129 that is absent from the Indian U2 lineage (Kivisild et al., 1999). The split between the two genotypes has been dated to 53 ± 4 KY, and the European U5 lineage is of similar age (Kivisild et al., 1999). Frequencies of west Asian haplogroups are lower in the Indian population than in Central Asia and the Caucasus, with two exceptions: haplogroup W, which is more frequent in India than in both Central Asia and the Caucasus, and the Uk-group, which is more frequent in India than in Central Asia but not the Caucasus (Table 1.1) (Kivisild et al., 1999).

Table 1.1: Haplogroup frequencies of west Asian haplogroups in India, Central Asia and the Caucasus (Kivisild et al., 1999).

| Haplogroup | India | Caucasus (Armenia & Georgia) | Central Asia |
| --- | --- | --- | --- |
| H | 1.8% | 24.8% | 14% |
| I | 0.7% | 1.8% | 1% |
| J | 0.5% | 6.7% | 2.5% |
| K | 0.2% | 8.2% | 0.5% |
| T | 1.8% | 11.8% | 3.5% |
| Uk-group | 13.1% | 21.2% | 8% |
| W | 2.2% | 0.9% | 1% |

India has been identified as a major region in the peopling of southeast Asia and Australia as part of the 'Southern Route' migration from Africa (Kivisild et al., 1999; Chandrasekar et al., 2009; Kumar et al., 2009). The southern route was an essential component of the rapid peopling of southern Asia and Australia, which had certainly been settled by ~46 KYA (Forster & Matsumura, 2005; Macaulay et al., 2005; Hudjashov et al., 2007). Chandrasekar et al. (2009) identified India as a site of initial settlement of AMHs following the exodus from Africa, and it may have been during this period that haplogroup U diverged.
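The comparison drawn from Table 1.1 can be checked mechanically. The sketch below only transcribes the table's frequencies (Kivisild et al., 1999) into a dictionary, with hypothetical variable and function names, and lists, for each comparison region, the haplogroups whose frequency is higher in India; it introduces no new data.

```python
# Frequencies (%) transcribed from Table 1.1 (Kivisild et al., 1999).
REGIONS = ("India", "Caucasus", "Central Asia")
TABLE_1_1 = {
    "H": (1.8, 24.8, 14.0),
    "I": (0.7, 1.8, 1.0),
    "J": (0.5, 6.7, 2.5),
    "K": (0.2, 8.2, 0.5),
    "T": (1.8, 11.8, 3.5),
    "Uk-group": (13.1, 21.2, 8.0),
    "W": (2.2, 0.9, 1.0),
}


def higher_in_india(table, region):
    """Haplogroups more frequent in India than in the given comparison region."""
    col = REGIONS.index(region)
    return [hg for hg, freqs in table.items() if freqs[0] > freqs[col]]


for region in REGIONS[1:]:
    print(f"Higher in India than in {region}: {higher_in_india(TABLE_1_1, region)}")
```

The script singles out W for the Caucasus, and the Uk-group and W for Central Asia, matching the two exceptions noted in the text.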
It was also found that populations from the Andaman Islands and Australia have ancestral maternal roots in India (Chandrasekar et al., 2009). Some individuals from Central Dravidian and Austro-Asiatic tribes share two basal synonymous mtDNA polymorphisms of haplogroup M42 (G8251A and A9156T) that are otherwise found only in Australian Aborigines (Kumar et al., 2009). This shared mtDNA lineage provides direct genetic evidence that Australia was populated by AMHs through South Asia via the 'Southern Route' (Kumar et al., 2009). Kumar et al. (2009) suggested an early colonisation of Australia ~60 KYA, which appears consistent with archaeological data. A study of the Iraqi population (Al-Zahery et al., 2003) found similarity to Iranian and other Middle Eastern populations, but dissimilarity to Arabian populations. The west Asian haplogroups (HV, H, V, J, T, K, Uk-group, I, X and W) account for 77.9% of Iraqi mtDNAs, similar to the frequencies seen in Iran (80.4%) and Syria (75.1%). These frequencies resemble those of European populations (>90%) more than those of Arabian populations (60.4%) (Al-Zahery et al., 2003). Iraqis can also be grouped with other west Asian populations (Lebanese, Turkish and Syrian) on the basis of Y-Chromosome data. These four populations have a high frequency of haplogroup J (41.9%-58.3%); other abundant haplogroups include R and E. Together, these three haplogroups reach a total frequency of 76.4% (Turks), 83.8% (Lebanese), 87.8% (Iraqis) and 90% (Syrians) (Al-Zahery et al., 2003). Among the Eurasian populations studied by Quintana-Murci et al. (2004), the sub-Saharan African lineages L1, L2 and L3 are absent, except among the Makrani of southern Pakistan, where they reach a high frequency of 39.4%. Eastern Eurasian lineages are represented by haplogroups A, B, F and N9a (from macrohaplogroup N) and C, D, G and Z (from macrohaplogroup M) (Quintana-Murci et al., 2004).
These eastern Eurasian lineages are widespread among northern and eastern Asians and, to a lesser extent, among Central Asians. Among the populations studied, the highest frequencies of these lineages were found in the Central Asian Turkmens (37%) and Uzbeks (31%), whereas Turkmen Kurds showed a frequency of only 9% (Quintana-Murci et al., 2004). This supports the finding by Nasidze et al. (2005a) that Kurdish groups are more similar to one another (and to west Asians) than to their geographical neighbours. These lineages are absent or rare among populations of Anatolia and the Caucasus, the Iranian plateau and the Indus Valley, again with one exception from Pakistan: the Hazaras, with a frequency of 35% (Quintana-Murci et al., 2004). The western Eurasian lineages (haplogroups HV, T, J, Uk-group, I, W and X) show the opposite pattern, being most frequent in populations of Anatolia, the Caucasus and the Iranian plateau (Quintana-Murci et al., 2004). In Uzbekistan, populations can be divided into two distinct groups: those with Uzbek ancestry and those with ancestry from a neighbouring country. Western and eastern Eurasian haplogroups dominate the populations with Uzbek ancestry (Karakalpakstan, Khorezm, Qashkadayra, Tashkent and Fergana), with a minor South Asian contribution (Irwin et al., 2009b). An even smaller African contribution is found in the westernmost populations (Karakalpakstan and Khorezm) but is absent from the other Uzbek groups (Irwin et al., 2009b). The populations with 'foreign' ancestry include those of Russian, Kazakhstani, Tajik, Turkmen and Afghan heritage. The mtDNA pool of the group with Russian ancestry is dominated by western Eurasian lineages (>90%), with a minor South Asian contribution and an even smaller eastern Eurasian input.
The populations with Kazakhstani, Tajik and Turkmen heritage all have an mtDNA composition made up mostly of eastern and western Eurasian haplogroups, with a minor South Asian contribution that is smallest in the Turkmen group. Finally, the mtDNA of the group with Afghan ancestry is dominated by western Eurasian haplogroups (~75%), followed by a moderate proportion of eastern Eurasian lineages and a very small South Asian contribution (Irwin et al., 2009b).

1.3.2 Y-Chromosome Variation

The Y-Chromosome is ~60 Mb in length and is inherited unilaterally via the paternal line (Jobling, Hurles & Tyler-Smith, 2004); it is therefore a valuable genetic tool for reconstructing male-driven demographic history. The chromosome consists of a short arm and a long arm, and only small regions at its tips recombine with the X chromosome. More than 90% of the Y-Chromosome does not recombine with the X; this region is known as the non-recombining portion of the Y-Chromosome (NRPY or NPY). The regions that do recombine are the pseudoautosomal regions (PARs), located at the tips of both arms: PAR1, at the tip of the short arm, is 2.6 Mb in length, while PAR2, at the tip of the long arm, is much smaller at 0.32 Mb (Jobling, Hurles & Tyler-Smith, 2004). Y-Chromosome variation is analysed by two main methods: the bi-allelic method and the multi-allelic method. The bi-allelic method identifies SNPs along the chromosome and is used to assign haplogroups; the mutation rate of bi-allelic markers is ~10^-8 per generation (Butler, 2005). The haplogroup tree (Figure 1.6) was developed by the Y-Chromosome Consortium (YCC) (Jobling & Tyler-Smith, 2003). The multi-allelic method, which types short tandem repeat (Y-STR) markers, gives greater resolution of Y-Chromosome variation and generates a haplotype profile; more than 200 Y-STR markers are known (Butler, 2005).
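Haplogroup assignment with bi-allelic markers amounts to walking down a tree, moving into a branch whenever the sample carries the derived allele of that branch's defining SNP. The sketch below illustrates this under stated assumptions: the `Node` class, the `assign_haplogroup` function and the tree itself are hypothetical and heavily simplified, built only from lineages named in this chapter (E-M78 is nested under E-M35, as it lies downstream of that marker); it is not the YCC tree or any published tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One haplogroup in a (hypothetical, simplified) Y-Chromosome tree."""
    name: str                       # haplogroup label, e.g. "E-M78"
    marker: str                     # defining bi-allelic SNP, e.g. "M78"
    children: List["Node"] = field(default_factory=list)


# Hypothetical tree using only lineages named in the text; the real YCC tree
# has many intermediate nodes between these lineages.
TREE: List[Node] = [
    Node("A-M91", "M91"),
    Node("B-M60", "M60"),
    Node("C3-M217", "M217"),
    Node("E-M35", "M35", [Node("E-M78", "M78")]),
    Node("J2-M172", "M172"),
    Node("L-M20", "M20"),
    Node("R1a1a-M17", "M17"),
    Node("R2a-M124", "M124"),
]


def assign_haplogroup(derived: Dict[str, bool],
                      nodes: Optional[List[Node]] = None,
                      current: str = "unassigned") -> str:
    """Descend the tree while a branch's defining SNP is in the derived state.

    `derived` maps SNP names to True (derived allele) or False (ancestral);
    untyped SNPs are absent and treated as not derived. Returns the most
    downstream haplogroup reached.
    """
    if nodes is None:
        nodes = TREE
    for node in nodes:
        if derived.get(node.marker, False):
            return assign_haplogroup(derived, node.children, node.name)
    return current


print(assign_haplogroup({"M35": True, "M78": True}))   # E-M78
print(assign_haplogroup({"M35": True, "M78": False}))  # E-M35
print(assign_haplogroup({"M17": False}))               # unassigned
```

Real assignment also has to handle conflicting or missing genotypes; this sketch simply takes the first derived branch it meets at each level.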
Unlike mtDNA nomenclature, in which the most ancestral (African) haplogroups are labelled L, the most ancestral Y-Chromosome haplogroups take letters from the beginning of the alphabet: A-M91 and B-M60. The most common haplogroup among Caucasian Europeans is R1b-P25 (Butler, 2005). Haplogroups L-M20, H-M69 and R2a-M124 are typically found among South Asian populations, particularly Indians, while haplogroups R1a1a-M17 and J2-M172 are both west Eurasian lineages (Haber et al., 2012; see also Chapter 7). A common East Asian lineage is haplogroup C3, which is thought to include the lineage of Genghis Khan (Zerjal et al., 2003; McElreavey & Quintana-Murci, 2005).
Figure 1.6: Simplified Y-Chromosome Consortium (YCC) haplogroup tree.
Because it records male-driven demographic history, such as migrations and invasions, the Y-Chromosome may show patterns that differ from those observed in mtDNA, particularly where women were not at the forefront of the invasions and expansions of the various empires throughout history. Y-Chromosome analysis can also provide information about a population that mtDNA cannot: it can be used to infer population structure from patrilineal surnames (Sykes & Irven, 2000; Jobling, 2001) and from language (Forster & Renfrew, 2011). A population's language can shift even when immigrant lineages make up as little as ~10% of its Y-Chromosomes. For example, some tribes of the Indian subcontinent, such as the Munda, speak Austroasiatic languages typical of East Asian populations; although their mtDNA consists predominantly of South Asian haplogroups, immigrant East Asian Y-Chromosome haplogroups are observed, suggesting that incoming males drove the shift in language (Forster & Renfrew, 2011). There is an increasing amount of Y-Chromosome data available from populations and sub-populations around the world.
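Many of the population comparisons reviewed below rest on genetic distances between haplogroup frequency profiles, such as pairwise FST. The cited studies may have used other estimators (for example, AMOVA-based FST computed from haplotype data); purely as an illustration, the sketch below computes Nei's GST, a simple FST analogue, from the haplogroup frequencies of two populations. The function names and the frequencies in the example are hypothetical.

```python
from typing import Dict


def gene_diversity(freqs: Dict[str, float]) -> float:
    """Expected heterozygosity (gene diversity): 1 - sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def pairwise_gst(pop_a: Dict[str, float], pop_b: Dict[str, float]) -> float:
    """Nei's G_ST between two populations from haplogroup frequencies.

    G_ST = (H_T - H_S) / H_T, where H_S is the mean within-population diversity
    and H_T is the diversity of the pooled (averaged) frequencies. Each input
    should sum to 1; a haplogroup missing from one population counts as 0.
    """
    haplogroups = set(pop_a) | set(pop_b)
    pooled = {h: (pop_a.get(h, 0.0) + pop_b.get(h, 0.0)) / 2 for h in haplogroups}
    h_s = (gene_diversity(pop_a) + gene_diversity(pop_b)) / 2
    h_t = gene_diversity(pooled)
    if h_t == 0.0:  # both populations fixed for the same haplogroup
        return 0.0
    return (h_t - h_s) / h_t


# Hypothetical haplogroup frequencies for two populations.
pop_a = {"J": 0.6, "R": 0.3, "E": 0.1}
pop_b = {"J": 0.2, "R": 0.5, "E": 0.3}
print(round(pairwise_gst(pop_a, pop_b), 3))  # 0.094
```

Values near 0 indicate similar haplogroup profiles and values near 1 indicate populations fixed for different haplogroups, which is the sense in which groups are "placed among" or "closer to" one another in the studies below.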
Based upon Y-Chromosome data, the Korean population appears to share close relationships with both northeast and southeast Asian populations, while the mtDNA evidence points to a greater similarity with northeast Asian groups, despite ~30% of the mtDNA gene pool being attributable to a south Asian origin (Jin et al., 2009). The Xibe, who now reside in north-western China, lie in an intermediate position between the main clusters of north-western and north-eastern populations, but are closely related (in a minor cluster) to the Manchurian and Hezhe groups, both located in northeast Asia (Powell et al., 2007). This indicates that the Xibe have not lost their north-eastern heritage but have perhaps begun to integrate with their local populations. Caucasian populations are similar to west Asian groups: based upon Y-Chromosome pairwise FST values, the Lebanese population can be placed among the Caucasian populations, and the Abkhazian group from west Georgia lies among Iranian groups (Nasidze et al., 2004a). Meanwhile, religious groups of Lebanon present Y-haplogroup frequencies that may be attributed to their religious origins: haplogroup J-M172 was most frequent among the Maronites (an Eastern Catholic Church in communion with Rome), J-M267 among Muslims and E-M35 among the Greek Orthodox (Haber et al., 2010). However, the differentiation between these groups appears to have been established before the major religions were adopted in the region; subsequent religious affiliation has reinforced their isolation (Haber et al., 2010). The Ossetian groups of the Caucasus show a north/south divide: the northern groups are more similar to one another, while the southern groups, which live along the southern slopes of the Caucasus Mountains, are more similar to other south Caucasian groups than they are to the northern Ossetians (Nasidze et al., 2004b).
The most frequent Y-haplogroup among the north Ossetians was haplogroup G, while haplogroup F was common among the south Ossetians; haplogroup E was found exclusively in south Ossetian Y-Chromosomes (Nasidze et al., 2004b). The mtDNA analysis indicated a common maternal origin, consistent with both groups speaking Iranian languages; the Y-Chromosome haplogroup differences, however, suggest that any common paternal origin may have been lost. Both north and south Ossetians are closely related to their neighbouring Caucasian groups but less so to one another (Nasidze et al., 2004b), indicating substantial paternal admixture from these Caucasian groups into the Ossetian populations. On the basis of Y-Chromosome data, the Kurdish populations from the Caucasus region are more closely related to west Asian and Caucasian groups, whereas the mtDNA data revealed a closer affinity with Europeans (Nasidze et al., 2005a). The Kurmanji and Zazaki speakers from Turkey are very close to west Asian and Iranian groups, while the Kurmanji speakers from Georgia lie in an intermediate position between two north Caucasian groups, the north Ossetians from Ardon and the Darginians (Nasidze et al., 2005a). The most frequent haplogroups among the Kalmyks of south-western Russia are C, C3c, K and P. Haplogroup C is common in both Central Asian and Mongolian populations but absent among eastern Europeans, and it has been proposed as the genetic lineage associated with Genghis Khan (McElreavey & Quintana-Murci, 2005). The C3c lineage is found among Mongolians and also among Kazakhs (Nasidze et al., 2005b). Haplogroup K is often found in the populations of this region (eastern Europeans, Central and East Asians), while haplogroup P is absent or present at low frequencies among these groups (Nasidze et al., 2005b).
The common eastern European lineage N3 was present in just one Kalmyk sample, indicating that, despite their geographical position, the Kalmyks have not intermingled with their eastern European or Caucasian neighbours, which supports their claimed Mongolian ancestry (Nasidze et al., 2005b). Among western Russian groups, the haplogroup with the greatest frequency is R1a1; other major haplogroups are I and N3, and together these three lineages account for 73.8-93.9% of the Y-Chromosomes in the sub-populations (Fechner et al., 2008). Haplogroup R1a1 is common in Eastern Europe and the Volga-Ural region, with frequencies increasing toward south-western Russia and the Caucasus, and is found at low frequencies in Western Europe (Fechner et al., 2008). Haplogroup N3 is often found in Eurasia, Northern and Eastern Europe and the Volga-Ural region, with higher frequencies in the Volga-Ural region than in Eastern Europe (Fechner et al., 2008). Overall, the European Russian Y-Chromosome composition is most similar to that of Eastern European and Volga-Ural groups (Fechner et al., 2008). The Gagauz of Moldova group with west Asian populations (such as the Lebanese, Syrians, Kurds and Iranians from Isfahan) and then with the south Caucasian groups (Armenians and Azerbaijanis), rather than with Moldovans and other Europeans, which are fairly close to one another, or with Turks and eastern Europeans (Nasidze et al., 2007). However, they share a closer genetic relationship with their geographical neighbours than with the populations that share their linguistic heritage (Nasidze et al., 2007). According to Y-Chromosome analysis, the Gilaki and Mazandarani populations from the South Caspian region of Iran are most similar to Caucasian groups (Azerbaijani and Armenian populations), followed by the west Asian populations of the region they now inhabit.
Haplogroups J2 and R1 were both frequent in these populations, together accounting for >50% of Gilaki and Mazandarani Y-Chromosomes (Nasidze et al., 2006). Both populations thus indicate a potential paternal origin in the south Caucasus before migration into, and integration within, the South Caspian region. In contrast, the mtDNA data suggest a greater similarity with west Asians than with Caucasian and European groups, and thus a resemblance to their geographical and linguistic neighbours (Nasidze et al., 2006). The Afghan population represents great ethnic, linguistic and cultural diversity (Lacau et al., 2011). A recent study found a Greek contribution to the Pashtuns living near the Pakistani border: the Most Recent Common Ancestor (MRCA) of a Pashtun and three Greek males ‘coincides with the time period’ in which Alexander the Great invaded and occupied Persia. This genetic link was identified via the E-M78 lineage (Lacau et al., 2011). Twenty-two haplotypes were identified among the Afghani population; eight were found in both northern and southern Afghanistan, while the remaining fourteen were found exclusively among southern Afghans (Lacau et al., 2011). The two regions are genetically distinct from one another, possibly because the Hindu Kush mountain range serves as a natural barrier between the populations of northern and southern Afghanistan, restricting admixture between them (Lacau et al., 2011). Previous studies of the human Y-Chromosome have included the Pakistani population, together with the Afghani Baluch, Hazara and Pashtun populations (Qamar et al., 2002), and also populations of Central Asia (Heyer et al., 2009). Both studies indicate that the populations share a common ancestry despite differences in ethnicity.
The study of Pakistani populations (Qamar et al., 2002) found that all populations exhibited a similar Y-haplogroup diversity and clustered with South Asian groups, while the study of populations from Turkmenistan, Uzbekistan, Kazakhstan, Kyrgyzstan and Tajikistan (Heyer et al., 2009) found greater variation within populations than among them. Another recent study of Y-STR data from four Afghani populations (Hazaras, Pashtuns, Tajiks and Uzbeks) identified 32 Y-Chromosome haplogroups among these four ethnic groups (Haber et al., 2012). The west Eurasian lineage R1a1a-M17 was found at higher frequencies among the Pashtuns (51.02%) and Tajiks (30.36%) than among the Uzbeks (17.65%) and Hazaras (6.67%). Haplogroup C3-M217 shows the inverse pattern: it is more abundant among the Uzbeks (41.18%) and Hazaras (33.33%) than among the Tajiks (3.57%) and Pashtuns (2.04%) (Haber et al., 2012). A Y-haplogroup C3c lineage was found in ~8% of males across sixteen populations from northeast China to Uzbekistan and has an MRCA dated to ~1,000 years ago (95% CI ~700-1,300 years) in Mongolia (Zerjal et al., 2003). This date corresponds with the expansion of the Mongol dynasty of Genghis Khan.

1.4 Aims

Afghanistan lies in a region of Central Asia that was once a crossroads for major trade routes and migrations, and it now exhibits a diversity of ethnic groups. The main aim of this study is to identify the composition and distribution of maternally inherited (mtDNA) haplogroups in four Afghani ethnic groups. We also aim to determine whether each ethnic group's beliefs about its own origin are supported by the mtDNA analysis. Additionally, by sequencing the HVS-I region, we aim to determine whether the Afghani populations share characteristics with adjacent populations and whether demographic processes have led to the emergence of any of these ethnic groups.
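Since the first aim is to estimate haplogroup frequencies in each ethnic group, it helps to recall that frequencies such as those reported by Haber et al. (2012) are sample proportions and carry sampling uncertainty that depends on sample size. The sketch below computes a proportion with an approximate 95% Wilson score interval; the `wilson_interval` function is an illustrative helper, and the counts in the example are hypothetical, not taken from any cited study.

```python
import math
from typing import Tuple


def wilson_interval(count: int, n: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (proportion, lower, upper) using the Wilson score interval.

    `count` is the number of individuals carrying a haplogroup and `n` the
    sample size; z = 1.96 gives an approximate 95% interval.
    """
    if n <= 0 or not 0 <= count <= n:
        raise ValueError("need n > 0 and 0 <= count <= n")
    p = count / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical example: 10 carriers of a haplogroup in a sample of 20 people.
p, lower, upper = wilson_interval(10, 20)
print(round(p, 3), round(lower, 3), round(upper, 3))  # 0.5 0.299 0.701
```

With only 20 individuals, an observed frequency of 50% is compatible with true frequencies from roughly 30% to 70%, which is worth bearing in mind when comparing haplogroup frequencies between ethnic groups. The Wilson interval is used here rather than the simple normal approximation because it behaves better for small samples and for frequencies near 0% or 100%.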
Chapter Two: Afghanistan

2. Afghanistan

The name Afghanistan is of Indo-Iranian origin, meaning ‘Land of Afghans’. The suffix -istan comes from Persian, meaning ‘country’, which itself derives from the Indo-Iranian ‘stanam’, meaning place or where one stands; this word in turn derives from Proto-Indo-European sta-no-, meaning “to stand” (Harper, 2010). The name Afghan was initially used by the Pashtun ethnic group for themselves, and was first recorded in 1030 AD (Harper, 2010). The flag of Afghanistan consists of three vertical stripes of black, red and green (left to right). In the centre of the flag lies the national emblem of Afghanistan, which features a mosque surrounded by sheaves of wheat and a scroll inscribed with the word ‘Afghanistan’. Below the mosque are numerals for the solar year 1298 (1919 in the Gregorian calendar), marking Afghanistan’s independence from British influence, while above it are an Arabic inscription of the Shahada (Muslim creed), the rays of the rising sun and the Takbir, an Arabic expression meaning ‘God is great’ (CIA, 2010).

2.1 Geography

The Islamic Republic of Afghanistan is the 41st largest country in the world, with an area of 652,230 km² (CIA, 2010), slightly larger than France. Afghanistan is a land-locked country in Central Asia, widely known for its long involvement in both international and civil conflict. It shares land borders with Iran to the west (~950 km), Turkmenistan (~750 km), Uzbekistan (<150 km) and Tajikistan (~1,200 km) to the north, Pakistan (~2,500 km) to the south and east, and a very short border with China (<80 km) in the far north-east (CIA, 2010). Its location links three major cultural and geographical regions: the Indian subcontinent to its southeast, Central Asia to its north and Iran to its west (Barfield, 2010).
Afghanistan has thirty-four provinces (Figure 2.2), which are listed with their population estimates in Table 2.3 (Islamic Republic of Afghanistan Central Statistics Organization (CSO), 2010). The capital and largest city is Kabul, in eastern Afghanistan; other large cities include Kandahar (south), Herat (west), Mazar-e Sharif (north) and Jalalabad (east of Kabul). Afghanistan is generally divided into three regions: the Central Highlands, the northern plains and the south-western plateau. The Central Highlands incorporate the Hindu Kush Mountains and their sub-ranges; the landscape here is rugged, with deep valleys between high peaks. The northern plains contain the most fertile land in Afghanistan and are consequently the most agricultural region. Extending from the Iranian border in the west to the Pamir Mountains, this densely populated region (“Afghanistan”, 2011) covers approximately 100,000 km² at approximately 600 metres above sea level. The south-western plateau is “a region of high plateaus, sandy deserts and semi-deserts” (“Afghanistan”, 2011) covering some 130,000 km² at ~900 metres above sea level; this semi-arid region includes the Registan and Margo deserts as well as the Helmand River. Figure 2.1: Political map of Afghanistan. Afghanistan has a varied landscape: dry plains in the north, south and southwest, while the Hindu Kush mountain range stretches across the country from the northeast toward the southwest, covering most of it. The highest peak in Afghanistan, Noshaq (also Nowshak), lies within the Hindu Kush near the north-eastern border with Pakistan and rises to 7,485 metres (~24,500 feet). Noshaq is the second-highest peak in the Hindu Kush after Tirich Mir (7,708 metres/~25,300 feet), which lies in the Chitral region of north-western Pakistan.
The Hindu Kush is a mountain system regarded as a sub-range of the Himalayas (Lacau et al., 2011). Stretching for approximately 600 miles, it forms the western end of the mountain chain made up of the Pamir Mountains, the Karakorum Mountains and the Himalayan range. The Hindu Kush decreases in height as it stretches westward across Afghanistan. Strictly speaking, the westernmost part of the Hindu Kush system is not the Hindu Kush proper but a number of smaller ranges extending out toward Herat (Barfield, 2010). These sub-ranges include the Koh-i Baba (west of Kabul), the Koh-i Hisar (west of the Koh-i Baba), the Safed Koh (Paropamisus) and Siah Koh (both near Herat) and the Chalap Dalan (southeast of Herat). The Torkestan Mountains extend northwest, while the Siah Koh extends northward and the Malmand and Khakbad ranges extend southwest. Figure 2.2: 34 provinces of Afghanistan and its location in Central Asia (ISAF; http://www.isaf.nato.int/map-usfora/index.php); inset: Afghanistan’s capital, Kabul, and the surrounding provinces. Afghanistan is home to several river systems, all of which rise in the mountains, but only one, the Kabul River (a tributary of the Indus River), ever reaches the ocean. The Murghab River is not confined within Afghanistan’s borders, flowing from the Koh-i Hisar into south-eastern Turkmenistan. The Helmand River rises in the Koh-i Baba, flows north of the Registan and through the Dasht-i Margo, and drains into the Hamun Lakes. The Khash, Harut and Farah rivers also drain into the Hamun Lakes, a group of three seasonal, salt-rich lakes (Hamun-e Helmand, Hamun-e Puzak and Hamun-e Sabari) in eastern Iran and south-western Afghanistan. Another significant lake is the Ab-i Istada, south of Ghazni, into which the Tarnak River flows.
Some rivers in Afghanistan are present only seasonally and often dry out before reaching the basin of another river (Barfield, 2010). Afghanistan’s northern border follows the Amu Darya River (formerly the Oxus River) for ~1,000 km, separating Afghanistan from Tajikistan and Uzbekistan. The south of Afghanistan is largely desert: the Dasht-i Margo and the Registan desert. Dasht is the Persian/Pashto word for ‘plain’ or ‘desert’, while Margo translates as ‘death’ or ‘dead’; the Dasht-i Margo is therefore known as the desert of death. The land in this desert is primarily rocky clay and sand mounds with salt marshes. The Dasht-i Margo lies approximately 900 metres above sea level within the Nimruz and Helmand provinces and spans an area of ~150,000 km². The Registan desert lies within the Helmand and Kandahar provinces and is primarily sandy.

2.2 Climate

Afghanistan’s climate varies considerably from one region to another, but the country generally has cold winters and very hot summers (CIA, 2010; “Weather & Climate in Afghanistan”, 2011; Petrov & Weinbaum, 2011). The mountains of the northeast have dry, cold winters, while the mountainous region near the Pakistan border receives some of the wetter weather systems associated with the monsoon on the Indian subcontinent. This region of eastern Afghanistan receives the most rainfall, largely because of the position of the mountains, and it is here that Afghanistan’s only natural forests are found (Barfield, 2010). In the southwest, daytime temperatures can reach 35°C, while temperatures of 49°C have been recorded in Jalalabad (Petrov & Weinbaum, 2011). In the mountains, January temperatures can fall to -15°C or below, and -24°C has been recorded in Kabul (Petrov & Weinbaum, 2011).
Afghanistan receives very little rainfall from June to October; most of the country’s precipitation falls between December and April, and snow falls in the highlands from December to March. Afghanistan receives an average of 316 mm of rainfall per annum; on average, September is the driest month of the year and March the wettest (“Afghanistan Climate”, n.d.). Kabul has an average relative humidity of 56.4%; February is the most humid month (77%) and August the least (33%) (“Afghanistan Climate”, n.d.).

2.3 Population

Afghanistan has never completed a census of its population: the first attempt, in 1979, was interrupted by the Soviet invasion and the ensuing conflict, and as a consequence accurate population data for the country are difficult to obtain. A further census was scheduled for 2008, but this was postponed for a further two years (Reuters, 2008; UN, 2008) and is now scheduled for 2011 to 2013, with data to be collected and supplied one province at a time (UN Statistics Division, 2010). Recent estimates of the Afghan population range from approximately 25 million (CSO, 2010) and 28.1 million (BBC, 2010) to 29.1 million (CIA, 2010; United Nations Population Fund (UNFPA), 2010), with a forecast of 37 million by 2015 (UN, 2002); life expectancy is estimated at 44 years for both men and women (BBC, 2010; CIA, 2010; UNFPA, 2010). Afghanistan is widely known to be an Islamic nation, with an estimated ~80% Sunni Muslims, ~19% Shi’a Muslims and ~1% other religions (CIA, 2010), including Buddhists, Hindus and Sikhs (Nielson, 2010). Table 2.1: Afghanistan population estimates every five years since 1950 (UNPD, 2009). Of the estimated population, 51.79% are male (15,079,000) and 48.21% are female (14,038,000) (United Nations Population Division (UNPD), 2009), and the average population density of Afghanistan is 45 people per km² (UNPD, 2009).
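As a quick arithmetic check of the figures above, the sketch below recomputes the sex ratio and population density from the quoted UNPD counts and the land area given in Section 2.1 (CIA, 2010). Combining the UNPD counts with the CIA area is our assumption; the UNPD density may have been computed from a slightly different area value.

```python
# Figures quoted in the text: UNPD (2009) population by sex, CIA (2010) area.
MALE = 15_079_000
FEMALE = 14_038_000
AREA_KM2 = 652_230

total = MALE + FEMALE                 # 29,117,000
male_pct = 100 * MALE / total
female_pct = 100 * FEMALE / total
density = total / AREA_KM2            # people per km²

print(f"total {total:,}; {male_pct:.2f}% male, {female_pct:.2f}% female")
print(f"density {density:.1f} people per km2")
```

The recomputed density is about 44.6 people per km², consistent with the quoted figure of 45 after rounding, and the total of ~29.1 million matches the CIA and UNFPA estimates.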
Based on the estimated population figures in Table 2.1, the Afghan population has grown overall by 27.99% since 1950 (UNPD, 2009), and the estimated population growth for 2010 is 3.45% (UNPD, 2009) or 2.47% (CIA, 2010). The population density of Afghanistan is shown in Figure 2.3. Table 2.2: Population growth rate every five years since 1950, based on the estimated population of Afghanistan (UNPD, 2009). Figure 2.3: Population density of Afghanistan.

2.4 Ethnicity & Language

Afghanistan is an ethnically diverse country, with several different peoples inhabiting its varied landscape (Figure 2.4). Pashtuns inhabit large areas of Afghanistan, mostly in the south, while Tajiks are common in the north and the Hazara in the central region. Estimates of the population show a variety of ethnic groups: Pashtun (42%), Tajik (27%), Hazara (9%), Uzbek (9%), Aimaq (4%), Turkmen (3%), Baloch (2%) and all other ethnicities (4%) (CIA, 2010). Alternatively, Weinbaum (2011) estimates that Pashtuns make up approximately 40% of the population, Tajiks ~25%, Hazara ~20%, Uzbeks ~5%, Aimaqs ~5% and Turkmen <5%. Despite the differences between these estimates, both sources identify the Pashtuns as the major ethnic group in Afghanistan, followed by the Tajiks, Hazara and Uzbeks. The differences probably reflect the lack of the official statistical data that a national census would provide. Given this ethnic diversity, it is unsurprising that a diverse range of languages is also spoken in Afghanistan (Figure 2.5). Figure 2.4: Distribution of Afghan ethnic groups. Although the Pashtuns are the dominant ethnic group in Afghanistan, only ~35% of the population speak their language, Pashto (CIA, 2010), while approximately 50% speak Dari, the Afghan dialect of Persian (CIA, 2010; Weinbaum, 2011). These two Indo-European languages are both official languages of Afghanistan.
Other languages spoken include Turkic languages (11%, mostly Uzbek and Turkmen) and Arabic, other Indo-European languages and other variants of Persian (4%) (CIA, 2010; Lewis, 2009). Ethnic groups in Afghanistan are not generally defined by the language they speak, as bilingualism is widespread: non-Pashtuns also speak Pashto, and Pashtuns may speak Dari (or a variant of it) (CIA, 2010; Weinbaum, 2011). Most of the languages spoken in Afghanistan are Indo-European in origin (Lewis, 2009). Many of the world’s languages descend from Proto-Indo-European, including the Germanic languages (German and English), the Italic languages (Latin and the Romance languages French, Italian and Spanish), the Baltic and Slavic languages and the Indo-Iranian languages, including Hindi and Farsi (Figure 2.6). Figure 2.5: Distribution of language groups spoken in Afghanistan (Source: Nielson, 2010). According to Lewis (2009), over thirty Indo-Iranian languages (a sub-branch of Indo-European, not counting dialects) are spoken in Afghanistan, most of them on the Indo-Aryan branch and the rest on the Iranian branch. Nuristani and Pashayi are two notable languages on the Indo-Aryan branch, while Kurdish, Balochi, Pashto, Munji, Dari, Aimaq and Hazaragi are notable examples on the Iranian branch. Only one Dravidian language is spoken in Afghanistan: Brahui, in Kandahar province of southern Afghanistan, which borders Pakistan (Lewis, 2009). The Dravidian languages are otherwise found mainly in southern India. Similarly, there is one Afro-Asiatic language in Afghanistan, Arabic, from the Semitic sub-branch, which is spoken by small communities in northern Afghanistan (Lewis, 2009). The remaining languages belong to the Altaic group, including Uzbek, Turkmen and Kyrgyz from the Turkic branch, and Mogholi from the Mongolic branch, which is spoken by a small community near Herat (Lewis, 2009).
Figure 2.6: Indo-European language tree illustrating the Centum and Satem branches (Short, 2007). The Indo-Iranian languages are heterogeneous, with many dialects (Fortson, 2009), and they stretch across a wide geographical area that is often divided into four regions (Figure 2.7): the Dnieper-Ural region, the Ural-Yenisei region, the Central Asian zone and the Greater Iran-India zone (Mallory, 2003). The Dnieper-Ural region saw the emergence of agricultural communities in the 5th millennium BC and the later introduction of wheeled vehicles (Mallory, 2003). Figure 2.7: (a) The Iranian languages spoken in the Dnieper-Ural region; (b) the Ural-Yenisei region, the eastern Iranian languages spoken there and the location of the Central Asian BMAC culture (shaded); (c) the locations of the Afanasevo (shaded) and Andronovo (outlined) cultures of Central Asia and the Iran-India zone in the south (Mallory, 2003). Some eastern Iranian languages, such as Ossetic, Scythian and Sarmatian, can be traced back to the Dnieper-Ural region. The Ural-Yenisei region shared archaeological and cultural similarities with the Dnieper-Ural region; its main culture was the Andronovo complex (Mallory, 2003). The Central Asian zone arose from communities that emerged during the Neolithic period. This zone was also home to eastern Iranian language speakers, and other Indo-Iranian traits can be found at Bactria-Margiana Archaeological Complex (BMAC) sites, such as the apparatus required for pressing haoma, a leafless vine that produces a milky juice (Mallory, 2003). The Greater Iran-India zone refers to the territories occupied by speakers of the Indo-Iranian languages; despite their cultural diversity and the large area they inhabited, its civilisations can be traced to indigenous or near-indigenous origins from the 7th millennium BC (Mallory, 2003). The Indo-Iranian language branch has two main sub-branches: Indic (Indo-Aryan) and Iranian.
An early example of the Iranian branch is Avestan, while its Indic counterpart is Sanskrit; the Rig Veda, an ancient collection of sacred Hindu hymns, was composed in Sanskrit (Fortson, 2009; Mallory, 2003). Approximately 8,000 years before present (YBP), the Elamite civilisation of the Fertile Crescent is thought to have spoken a Dravidian-family language, which spread eastwards to the Indus Valley and the Indian subcontinent together with agriculture (Quintana-Murci et al., 2004). This spread provides a plausible explanation for the presence of the Brahui language in southern Afghanistan. Later (~5,000 YBP), nomads of the Andronovo or Srubnaya cultures migrated into Iran and Afghanistan, probably bringing the Indo-Iranian language branch, which subsequently displaced the Dravidian languages in Iran and the surrounding region (Quintana-Murci et al., 2004).

2.5 Migrations

Modern Afghanistan is full of tribal and sub-tribal communities. Afghan towns are centres of trade, where pastoral and agricultural products from the more rural zones are exchanged for manufactured goods that are more widely available in towns and cities (Barfield, 2010). These towns and cities are often inhabited by multiple ethnicities, providing diverse local communities. Many Afghans, particularly those living in the more remote regions, are highly mobile, migrating, often seasonally, in search of work when opportunities are poor in their own regions. For example, it is not uncommon for tribe members to move from the agricultural plains into towns for the winter before returning ahead of the new season. Even the most remote regions have links to their regional urban centres (Barfield, 2010).

2.6 Refugees

Because of the ongoing conflict, many Afghans do not feel safe in the country, and many become refugees or seek asylum in neighbouring countries.
According to the UNHCR, the UN Refugee Agency, as of January 2010 (Table 2.3) there were just under 2.9 million Afghan refugees, with 1.7 million in Pakistan and another 933,500 in Iran. In addition, there were nearly 300,000 internally displaced Afghans and more than 30,000 asylum seekers. These are the official statistics; the actual number of displaced Afghans is likely to be much higher, as not all Afghans go through the appropriate channels. Since 2002, the UNHCR has helped 4.5 million Afghan refugees reintegrate into Afghanistan via the UNHCR Shelter programme (UNHCR, 2011).

Table 2.3: UNHCR statistics of displaced Afghanis as of January 2010 (UNHCR, 2011).

Type of Displacement       Number of Displaced Afghanis
Refugees                                      2,887,123
Asylum Seekers                                   30,412
Returned Refugees                                57,582
Internally Displaced Persons (IDPs)             297,129
Returned Internally Displaced Persons (IDPs)      7,225
Various                                               0
Total Population of Concern                   3,279,471

2.7 Afghan Sub-Populations

2.7.1 Pashtuns

The Pashtun peoples are considered to be ethnically Caucasian (“Afghans: Their History & Culture”, 2002) and are Sunni Muslims. They live mainly in south-eastern Afghanistan but are also found in north-western Pakistan and north-eastern Iran. They are an eastern Iranian ethno-linguistic group and speak Pashto (“Afghans: Their History & Culture”, 2002), an Indo-European language of the Iranian sub-branch (Short, 2007). Their traditional homeland lies in an area east, south and southwest of Kabul (Weinbaum, 2011), but they are not confined to one region of Afghanistan, as they also inhabit northern and western regions (around and near Herat) (Weinbaum, 2011).
Figure 2.8: Pashtun people from Afghanistan

Pashtuns follow a traditional code of cultural values and ethics known as Pashtunwali, which includes “badal; the right to seek revenge, nunawati; the right to seek refuge and live in peace, melmastya; hospitality and protection to guests, tureh; bravery, sabats; steadfastness, isteqamat; persistence, imamdari; righteousness, ghayrat; the right to defend one’s property and honour, and mamus; the right to defend the female family members”. Some of these traits can probably be identified in the current conflict in Afghanistan, particularly as the Taliban are predominantly Pashtun (BBC, 2010). Pashtuns are the largest ethnic group in Afghanistan, at approximately 38% of the total population (“Afghans: Their History & Culture”, 2002), and have been the dominant group since the 18th century (Barfield, 2010), as is perhaps reflected by the fact that the President, Hamid Karzai, is also a Pashtun. The origins of the Pashtuns are unknown; they probably arose from the intermingling of ancient populations with the subsequent invaders (“Afghans: Their History & Culture”, 2002) who inhabited the lands where the Pashtuns now live. The Pashtuns themselves, however, trace their lineage to Qais (Barfield, 2010). Within the Pashtun ethnic group there are four main descent groups (Barfield, 2010): i) the Durrani, descendants of Qais’s first son, found in the south and south-west; ii) the Ghilzais (the largest Pashtun group), descendants of Qais’s second son, but via his daughter, found in the east; iii) the Gurghusht, descendants of Qais’s third son; and iv) the Karlanri, claimed to be the descendants of an adopted child of uncertain origin, who live along the Afghanistan-Pakistan border, with the majority of the population on the Pakistani side (Barfield, 2010).
Beyond these sub-divisions of the Pashtun ethnicity, there are further divisions into tribes, to which individual families and communities belong. Pashtuns define themselves not only by their ethnicity but also by speaking Pashto and practising Pashtunwali (Barfield, 2010).

2.7.2 Tajiks
The Tajiks are the largest of the Dari-speaking peoples, inhabiting northern Afghanistan across the border from Tajikistan and extending into the Hindu Kush Mountains. They mostly inhabit Badakhshan province in north-eastern Afghanistan, although there are pockets of Tajik populations elsewhere (Weinbaum, 2011).

Figure 2.9: Tajik people from Afghanistan

Dari is a form of the Persian/Iranian language, and Tajiks are generally defined as non-tribal Persian speakers (Barfield, 2010; Weinbaum, 2011). They are Caucasian and are morphologically similar to Iranians (“Afghans: Their History & Culture”, 2002). The Tajik population makes up approximately 25% (“Afghans: Their History & Culture”, 2002) to 30% (Barfield, 2010) of the overall Afghan population and is mostly Sunni Muslim, although there are some Shi’a Muslims within the remote mountain populations. While most Tajiks reside within the mountain ranges of the north-east, there are also significant populations in Kabul, Herat and Mazar-e Sharif.

2.7.3 Hazaras
The Hazaras are also a Dari-speaking group (“Afghans: Their History & Culture”, 2002) and speak a dialect of Persian called Hazaragi (Farr, 2009; Barfield, 2010). Their language and religion (Shi’a Islam) suggest that the Hazaras were probably subject to Persian/Iranian influence or rule. The name Hazara is believed to derive from the Persian hezar, meaning thousand, perhaps a reference to a Mongol army unit (Farr, 2009).
The Hazaras are of Mongol descent and are believed to have arrived in Afghanistan in the 13th and 14th centuries (“Afghans: Their History & Culture”, 2002), sometime between 1229 and 1447 (Farr, 2009); Mongol words still survive in their modern-day vocabulary (Farr, 2009).

Figure 2.10: Hazara people from Afghanistan

They “represent the last remnants of the Mongol dynasties that came through Afghanistan in the early part of the 13th century” (Farr, 2009). Unlike other ethnic groups in Afghanistan, the Hazaras are contained entirely within the Afghan borders. A traditionally nomadic group, they are found in the mountains of central Afghanistan, in a homeland known as Hazarajat that extends south to Ghazni and west towards Herat (Farr, 2009; Barfield, 2010). Although centrally positioned within Afghanistan, Hazarajat is probably the most remote region of the country, owing to a combination of poor communication links and its position within the high mountains of the Hindu Kush (Weinbaum, 2011). Although most Hazaras live in this region, many have migrated elsewhere due to a lack of land (Weinbaum, 2011). The Hazaras constitute approximately 19% (“Afghans: Their History & Culture”, 2002) or 15% (Barfield, 2010) of the Afghan population, an estimated 2-3 million (Farr, 2009) or 5 million (“Afghans: Their History & Culture”, 2002) people, although many ethnic leaders suggest the number is closer to 8 million (Farr, 2009). They are believed to be descendants of the Mongol armies that conquered Iran (Barfield, 2010), perhaps of soldiers of Chagatai (a son of Genghis Khan and leader of the region in the early 13th century; Farr, 2009) who failed in their attempt to conquer the Indian subcontinent. In the attempt, they moved into the Hindu Kush but never advanced beyond it. The Hazaras themselves claim to be descendants of Genghis Khan or a close male relative (McElreavey & Quintana-Murci, 2005).
The “presence of the Y-Chromosomal Haplogroup C within the Hazaran population, and its absence from neighbouring populations is inferred as the genetic legacy of Genghis Khan” (McElreavey & Quintana-Murci, 2005). In their recent history, the Hazaras were placed at the bottom of the Afghan ethnic hierarchy, sold as slaves in the cities and targeted for persecution by the Taliban (Barfield, 2010), although the slave trade also saw the Hazara population proliferate.

2.7.4 Uzbeks
The Uzbeks, unlike the other ethnic groups mentioned, do not speak an Indo-European language. They speak Uzbek, a Turkic language of the Altaic group that is similar to Turkish and completely different from the Iranian languages (“Afghans: Their History & Culture”, 2002). They are mostly Sunni Muslims (UNHCR, 2003; Barfield, 2010) and are ethnically Turkic (“Afghans: Their History & Culture”, 2002; UNHCR, 2003), descended from nomadic tribes that arrived from Central Asia in waves (Barfield, 2010).

Figure 2.11: Uzbek people from Afghanistan

They arrived in Afghanistan in the 16th century, settling in the irrigated valleys and loess steppes and becoming farmers (Barfield, 2010). The Uzbeks are the largest of the Altaic groups (Weinbaum, 2011), with an estimated population of 1 million, approximately 6% of the total population (“Afghans: Their History & Culture”, 2002). They inhabit northern Afghanistan, across the border from Uzbekistan, south of the Amu Darya (formerly known as the Oxus) river; when the northern border of Afghanistan was redrawn, the Uzbek populations (as well as other Altaic groups) became, by definition, Afghans (“Afghans: Their History & Culture”, 2002; Barfield, 2010). They mostly inhabit Balkh province and are generally farmers (Weinbaum, 2011).
2.7.5 Other Ethnic Groups
As well as the ethnic groups described above, several others inhabit Afghanistan, including the Aimaqs (also Aimak), Baluch (also Beluch), Turkmens and Nuristanis, which together constitute approximately 12% of the total population (“Afghans: Their History & Culture”, 2002).

Figure 2.12: Left: an Aimaq man from Afghanistan; Middle: a Baluch man from Afghanistan; Right: a Turkmen man from Afghanistan

2.7.6 Aimaqs
The Aimaqs are tribal Central Asian peoples (“Afghans: Their History & Culture”, 2002) who speak Persian (Dari). They are Sunni Muslims, believed to be of Turkish descent (Barfield, 2010; Weinbaum, 2011). There are approximately 500,000 Aimaqs (Barfield, 2010), who have historically inhabited the mountainous region east of Herat (Weinbaum, 2011) and west of Hazarajat (home of the Hazaras), but have also occupied some of the steppes and desert lands north and east of Herat (Barfield, 2010).

2.7.7 Baluch
The Baluch are often described as extensions of the Iranian and Pakistani populations (Barfield, 2010). They inhabit south-western Afghanistan (“Afghans: Their History & Culture”, 2002), in and around the sparsely populated Kandahar region (Weinbaum, 2011), and speak their own language, Baluchi, which is related to Persian (Barfield, 2010). Many Baluch also speak Pashto (the Pashtun language), as they live in close proximity to the Pashtuns; often the distinguishing feature between the Baluch and the Pashtuns is not spoken language or descent but political allegiance to the Baluch Khans (Barfield, 2010). They are pastoral nomads and are known as smugglers linking Iran and India (Barfield, 2010).

2.7.8 Turkmens
The Turkmens are an Altaic group (like the Uzbeks); together, the Uzbeks and Turkmens constitute approximately 10% of the population of Afghanistan (Barfield, 2010).
They are Sunni Muslims and are an extension of the population of Turkmenistan, where the majority reside (“Afghans: Their History & Culture”, 2002; Barfield, 2010). They inhabit north-western Afghanistan, close to the Turkmenistan border, mostly in Balkh province alongside the Uzbek populations (Weinbaum, 2011). They speak a Turkic language, Turkmen (“Afghans: Their History & Culture”, 2002; Barfield, 2010), and are semi-nomadic, though more nomadic than the Uzbeks (Barfield, 2010; Weinbaum, 2011).

2.7.9 Nuristanis
The Nuristanis live in the mountains north-east of Kabul (Barfield, 2010; Weinbaum, 2011), inhabiting isolated valleys within Nuristan province. They are more culturally and linguistically distinct than the other ethnic groups in Afghanistan, having converted to Islam (Sunni) only recently, following their conquest in 1895, at the turn of the 20th century (“Afghans: Their History & Culture”, 2002; Barfield, 2010). Their languages are unrelated to any others in Afghanistan and differ even between tribes from separate valleys (Barfield, 2010).

2.8 Historical Influence on Afghanistan’s Population
Afghanistan has been ruled by various empires throughout history, often controlled by foreign invaders (Barfield, 2010), and together these empires have left a unique but complex ethnic, linguistic, tribal and cultural structure (Rasanayagam, 2003). Afghanistan has been invaded and conquered many times, in all likelihood because of its location and its accessibility to other, more prosperous regions such as India and Central Asia, and to trade routes such as the Silk Route (Barfield, 2010).

2.8.1 Prehistory
Following the dispersal of modern humans out of Africa approximately 60,000 YBP, the first region they would have encountered and settled was south-western Eurasia (Quintana-Murci, et al., 2004).
The earliest evidence of human settlement in Afghanistan has been dated to ~30,000 BC (“Afghanistan Online”, 2008; Colorado State University/Department of Defense (CSU), 2010; Dupree & Dupree, 2011). A sculpted head found at Aq Kupruk has been dated to ~20,000 BC (CSU, 2010), and the same site has also yielded evidence of Stone Age technology and culture dated to ~10,000 years ago (~8,000 BC) (Jacobson, 1979; Bednarik, 2010; Dupree & Dupree, 2011). By this later date, the domestication of plants and animals had begun in the foothills of the Hindu Kush, making this region of northern Afghanistan one of the earliest places where domestication occurred (CSU, 2010). The emergence of Neolithic settlements between 9,000 and 6,000 BC indicates the progressive spread of the knowledge required to cultivate plants and rear domesticated animals. By the late 4th millennium BC, cereal crops were in regular use (Jacobson, 1979). Two main theories have been proposed for the spread of agriculture: (i) the immigration of farmers with the required knowledge and technologies, as proposed by Gordon Childe (1925), and (ii) the acquisition of cultural traits by indigenous communities from passing non-indigenous migrants (Davison, 2006). Both mechanisms allow the gradual spread of agricultural knowledge.

2.8.2 Aryan Migration
The domestication of animals, in particular of the horse (~4,500-4,000 BC, in Iberia and on the Eurasian Steppe; Jansen, 2002; Anthony, 2007), revolutionised the ability to travel and migrate from one region to the next, enabling peoples to travel more quickly (Mallory, 2003) and to expand geographically in different directions (Zvelebil, 1980). One of these migrating peoples was the Aryans, believed to be one of the early Proto-Indo-European-speaking groups (Fortson, 2009).
They arrived and settled in northern Afghanistan sometime around 2,000-1,500 BC, while some continued migrating, heading west to settle in Iran or south into India and modern-day Pakistan (CSU, 2010). Their arrival and the coincident demise of contemporaneous established civilisations have ignited debate as to whether their movement was a migration or an invasion. It is quite possible that, as a result of their wide distribution, the Aryans were the main carriers of the Indo-European languages, promoting their proliferation via demic or cultural diffusion (Kumar, V, 2008) and thereby displacing the indigenous languages.

2.8.3 Persian Empire
Afghanistan has been ruled by the Persians on several occasions and has been part of Persia for most of its history. The first were the Medes (7th century-550 BC), a nomadic tribe from Iran that became independent and, owing to their wide distribution, ruled from Afghanistan in the east to Iraq in the west (CSU, 2010). During their reign, the religion of Zoroastrianism was founded in Balkh. The Medes preceded the Achaemenid Empire, which united the Iranians. Ruling from ~550 BC to ~330 BC, the Achaemenids held a vast empire stretching from Libya, Egypt and Saudi Arabia in the south, Turkey in the west, and the Balkans and the Black Sea in the north, to Afghanistan and Pakistan in the east. Afghanistan was split into satrapies: Arachosia (south; Kandahar), Aryana (west; Herat), Bactria (northern Afghanistan and southern Uzbekistan, Tajikistan and Turkmenistan), Drangiana (south and south-west; Sistan) and Gandhara (north-east and northern Pakistan) (CSU, 2010; Dupree & Dupree, 2011). Darius the Great, the Achaemenid ruler, spread the religion of Zoroastrianism throughout Afghanistan and the Achaemenid Empire (“Afghans: Their History & Culture”, 2002; CSU, 2010), a religion that is still practised today, albeit by far fewer people.
2.8.4 Greek Rule
The Achaemenid Empire was ended abruptly by the formidable warrior Alexander the Great. By 332 BC, he had conquered much of the Persian Empire, forcing the Achaemenid ruler of the time, Darius III, to flee from Persia as Alexander advanced east. Darius escaped to Afghanistan and joined his ally Bessus, the satrap of Bactria. However, Darius was murdered by Bessus, who then proclaimed himself King of the empire. After crushing Persia, Alexander invaded Afghanistan in 330 BC and quickly took Herat while pursuing Bessus, who had taken refuge in the mountains. Alexander eventually captured Bessus and gained control of most of the Afghan satrapies (Dupree & Dupree, 2011). He later attempted an invasion of India but failed to conquer it and returned to Persia. Alexander died a few years later, in 323 BC in Babylon, leaving behind Greek armies in Afghanistan (CSU, 2010; Dupree & Dupree, 2011). Alexander’s successor in the region was Seleucus, one of his officers (CSU, 2010), who took control of Bactria but ruled it from Babylon (Dupree & Dupree, 2011). The Seleucid Empire developed a Hellenistic culture in Afghanistan, and the Greeks would rule at least part of Afghanistan for a few hundred years to come. Like Alexander before him, Seleucus invaded India and also failed, thwarted by the Mauryan Emperor Chandragupta. The Seleucids ceded southern Afghanistan (the Achaemenid satrapies of Arachosia and Gandhara) in return for maintaining control north of the Hindu Kush (CSU, 2010; Dupree & Dupree, 2011). The Mauryans introduced Buddhism to Afghanistan and ruled there from 304 to 180 BC (CSU, 2010). The next period of Greek rule was the Graeco-Bactrian Empire (250-125 BC), which arose from the Seleucid dynasty (CSU, 2010). The Graeco-Bactrians established rule in Kabul, meanwhile forcing the Mauryans into Pakistan.
An Iranian dynasty also arose from the Seleucid Empire: the Parthians, who became independent from the Seleucids and took control of the Sistan and Kandahar regions (CSU, 2010). Their reign would stretch west to Syria (CSU, 2010). Later, the Parthians joined with the Scythians to become the Indo-Scythians.

2.8.5 Yuezhi & the Kushan Empire
A group of Central Asian nomads called the Yuezhi migrated into northern Afghanistan from western China. The Yuezhi united with other nomadic peoples from Central Asia, such as the Scythians (“Afghans: Their History & Culture”, 2002; Dupree & Dupree, 2011), and forced the Greeks south into the Kabul valley. The Yuezhi occupied Bactria for approximately 100 years before establishing the Kushan Empire in northern India. The Kushans expanded trade from China to Europe, using the Silk Route extensively; this route also carried Buddhism into China (“Afghans: Their History & Culture”, 2002; Dupree & Dupree, 2011).

2.8.6 Arabs & Islam
During the seventh century, Arab forces swept through Iran, defeating the Sassanids at Nehavand in 642, and began entering Afghanistan (“Afghans: Their History & Culture”, 2002; Dupree & Dupree, 2011), but faced difficulties in doing so: Herat was conquered in 652 AD, while Kabul was finally taken in 664 (CSU, 2010). The slower advance into Afghanistan, and consequently the slower conversion to Islam, was probably due to a combination of the varied and often harsh terrain and constant revolts by Afghan tribes (CSU, 2010). By the end of the 8th century, Arabs were governing the states of Herat, Samarkand, Kashgar and Sistan (CSU, 2010).

2.8.7 Mongol Dynasty
Genghis Khan began the Mongol invasion of Afghanistan in 1219 from the east (CSU, 2010). The invasion came about only as a result of the Khwarezmian Empire’s extremely violent rejection of Genghis Khan’s proposal of alliance (CSU, 2010).
As Genghis Khan swept through Afghanistan, many Afghan cities were, as a signal of his intent and displeasure, not only demolished but completely depopulated (CSU, 2010; Ali, Dupree & Dupree, 2011). Genghis Khan died in 1227 at the age of 65, but the Mongols occupied Afghanistan for a further 100 years, with their empire divided into four Khanates: northern and eastern Afghanistan became part of the Chagatai Khanate, while southern and western Afghanistan became part of the Ilkhanate (CSU, 2010).

2.8.8 Modern Era
Afghanistan finally gained its independence in the 18th century, led by Mirwais Khan (CSU, 2010; Ali, Dupree & Dupree, 2011); however, this was not to be the end of the invasions of Afghanistan. The newly independent Afghans even invaded Persia and controlled the region for a short period, from 1722 to 1725 (CSU, 2010). The Persians later invaded Afghanistan once more and again faced revolt from the Afghan tribes (CSU, 2010); the Afghans eventually reclaimed their land in 1747. In 1805, the Persians attacked Herat but this time were unsuccessful (CSU, 2010). Later in the 19th century, the British attempted to gain control over Afghanistan. The first Anglo-Afghan war began in 1839 (“Afghans: Their History & Culture”, 2002; CSU, 2010). On this occasion the Afghans defeated Britain, in 1842, maintaining control of their lands (“Afghans: Their History & Culture”, 2002; CSU, 2010). In 1859, Afghanistan lost land to the British when they gained Balochistan, which left Afghanistan landlocked (CSU, 2010). The northern border was also contested, this time by the Russians, who sought to move it southwards. The second Anglo-Afghan war began in 1878; this time Britain was more successful, gaining some of Afghanistan’s eastern territories, including Kurram, Khyber and Pishin (CSU, 2010).
Britain withdrew from Afghanistan in 1880 but retained control of Afghanistan’s foreign affairs (CSU, 2010). Just five years later, Russia moved its border south, taking Afghan lands north of the Oxus River (CSU, 2010). The third Anglo-Afghan war took place in 1919; the British were defeated and relinquished control over Afghan foreign affairs, and Afghanistan was independent once again (CSU, 2010; Ali, Dupree & Dupree, 2011). In 1979, Afghanistan was invaded again, this time by the Soviet Union, whose forces were eventually defeated and left in 1989 (“Afghans: Their History & Culture”, 2002; CSU, 2010; Dupree, Dupree & Weinbaum, 2011). In the years that followed, Afghanistan fell into severe civil war, culminating in the emergence of the Taliban, who enforced their extreme views upon the Afghan population (CSU, 2010). Afghanistan was soon invaded again, this time by the USA and its allies, in response to the 2001 attacks on New York and on the Pentagon building in Virginia, during which a fourth hijacked plane crashed in rural Pennsylvania, USA. The Taliban, targeted for their failure to disclose the whereabouts of Osama Bin Laden, were soon removed from government, and an interim government was established until sufficient stability could be sustained for democratic presidential elections to take place. The current President of the Islamic Republic of Afghanistan is Hamid Karzai, who was officially elected to office on October 9th 2004 following a brief spell as Chairman of the Interim Administration of Afghanistan (Office of the President, 2009). The US and its allies still maintain a strong military presence in Afghanistan today, attempting to suppress the remaining Taliban and any resurgence before they withdraw entirely.

Chapter Three: Materials and Methods

3.1 Materials
See Appendix 1 for the list of materials used in this project.
3.2 Precautionary Measures
Protective clothing (laboratory coats and disposable gloves) was worn at all times in the laboratory. Additional measures were taken when performing sensitive tasks or handling hazardous chemicals and equipment: workspaces were UV-irradiated and cleaned with absolute ethanol before and after procedures, the most contamination-sensitive techniques were undertaken within a clean cabinet, and a fume hood was used for highly hazardous chemicals such as powdered ethidium bromide. All procedures were COSHH-assessed and performed accordingly.

3.3 Sample Collection
The DNA samples were collected from three refugee camps, all located in Khorasan province (Figure 3.1) in north-eastern Iran, near the cities of Mashhad, Bojnurd and Birjand. The samples were collected by researchers from Mashhad University of Medical Sciences, Mashhad, Iran. Informed consent was provided by all participants, and the research was approved following a full ethical review at Mashhad University of Medical Sciences; examples of the forms used can be found in Appendix 2. All participants had at least three generations of ancestry in their country of birth and provided details of their geographical origin. Blood samples (8.5ml) were collected using PAXgene Blood DNA Tubes (Qiagen), which contain a reagent mix that stabilises and preserves the blood (and cells) and prevents coagulation.

3.4 DNA Isolation
This procedure was also performed by the researchers from Mashhad University of Medical Sciences, Mashhad, Iran. DNA was extracted from the blood using PAXgene Blood DNA Kits (Qiagen). The blood samples were transferred into processing tubes already containing lysis buffer and mixed by inversion to lyse the erythrocytes and white blood cells. Following centrifugation, the nuclei and mitochondria were pelleted, washed and resuspended in a digestion buffer.
Protein contaminants were removed by incubation with a protease. The DNA was then precipitated in isopropanol, washed with 70% ethanol and dried before resuspension in resuspension buffer (Qiagen, 2009 “PAXgene Blood DNA Kit”). See Appendix 1 for the protocol for this procedure.

Figure 3.1: Map of Iran (www.geology.com/world/iran-satellite-image.shtml); inset: Khorasan province and the refugee camps (circled) near the cities of Mashhad, Bojnurd and Birjand.

3.5 PCR
The Polymerase Chain Reaction (PCR), developed by Kary Mullis in 1984, enables the amplification of small quantities of DNA (Bartlett & Stirling, 2003). Each reaction requires a specific pair of oligonucleotides (primers), Taq polymerase, dNTPs, a magnesium cofactor (MgCl2) and a stabilising buffer. During PCR, the DNA undergoes a series of cycles, each of which (i) denatures the DNA, separating the two strands and thereby exposing the nucleotides; (ii) allows the primers to anneal to the specific sequences of exposed nucleotides that flank the DNA segment of interest; and (iii) extends the primers using free nucleotides (dNTPs), copying the target segment.

To prevent contamination, each PCR was prepared in an area separate from those where any other procedure took place. PCRs were set up in a Captair Bio PCR UV Cabinet. Before any work was carried out, pipettes and relevant consumables (pipette tips, microcentrifuge tubes and PCR tubes) were autoclaved and then exposed to constant UV irradiation for 30 minutes. Once PCR set-up was complete and the samples had been transferred to the thermocycler, the cabinet was emptied and UV-irradiated again for 30 minutes. Samples were amplified using two mtDNA-specific primer sets: a twelve-primer-pair set (Table 3.2) consisting of nine overlapping primer pairs and three internal primer pairs (Torroni et al., 1997), and a fifteen-primer-pair set (Table 3.3; Kong et al., 2003; Palanichamy et al., 2004).
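Because each cycle can at most double the number of copies of the target segment, yield grows geometrically with the number of cycles. The sketch below illustrates this under a simplifying assumption that is not part of the protocol described here: a constant per-cycle efficiency E, giving N_n = N_0(1 + E)^n. Real reactions plateau as primers and dNTPs are depleted, so the model overestimates yield in late cycles; the function name and example values are illustrative only.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealised PCR yield, N_n = N_0 * (1 + E) ** n.

    `efficiency` is an assumed constant per-cycle efficiency E
    (1.0 = perfect doubling). The plateau phase is ignored, so the
    result overestimates real yields at high cycle numbers.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Hypothetical starting point: 100 template copies taken through
    # 35 cycles, the cycle number used for HVS-I amplification (Section 3.11.2).
    for e in (1.0, 0.9, 0.8):
        print(f"E = {e:.1f}: {pcr_copies(100, 35, e):.2e} copies")
```

With perfect doubling, 10 cycles turn a single template copy into 1,024 copies; at E = 0.9 the same 10 cycles give roughly 613, which shows how strongly small efficiency losses compound.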
Each reaction tube contained PCR-grade H2O, template mtDNA and a mastermix. The volumes of the mastermix reagents and their final concentrations within each reaction tube, used for the amplification of all primer pairs, are given in Table 3.1.

Table 3.1: Volumes and final concentrations of mastermix reagents for polymerase chain reaction amplification of mtDNAs.

| Reagent | Volume Added (µl) | Final Concentration |
|---|---|---|
| GeneCraft 10x BioTherm PCR Buffer with 15mM MgCl2 | 2.5 | 1x & 1.5mM MgCl2 Δ |
| AB dNTP mix (10mM) | 1 | 0.4mM |
| Forward Primer (10µM) | 0.5 | 0.2µM |
| Reverse Primer (10µM) | 0.5 | 0.2µM |
| GeneCraft BioTherm DNA Polymerase (5 units/µl) | 0.1 | 0.5 unit |
| Total volume (µl) | 4.6 | - |

Δ A final concentration of 1.0mM MgCl2 is required for the amplification of primer pair 4 (Torroni et al., 1997).

The Taq polymerase used was BioTherm Taq polymerase supplied by GeneCraft, at a concentration of 5 units/µl. The storage buffer for the enzyme contains 10mM potassium phosphate buffer pH 7.0, 100mM NaCl, 0.5mM EDTA, 1mM DTT, 0.01% Tween 20 and 50% glycerol (v/v). The reaction buffer supplied to support BioTherm activity consists of 160mM (NH4)2SO4, 670mM Tris-HCl pH 8.8 (at 25°C), 15mM MgCl2 and 0.1% Tween 20.

Table 3.4: Thermocycler conditions for the primer pairs as described by Torroni et al. (1997)
Table 3.5: Thermocycler conditions for the Palanichamy et al. (2004) primer pairs

3.6 Agarose Gel Electrophoresis
This technique separates DNA molecules according to their size. The gel acts as a sieve through which the negatively charged DNA fragments migrate towards the positive electrode when an electric current is applied (Sambrook, 2001). Smaller fragments of DNA and RNA migrate faster, and therefore further, through the gel than larger fragments. Before use, the electrophoresis equipment was washed with detergent, rinsed with dH2O and then absolute ethanol, and left to dry. The gel casting plate and comb(s) were assembled and a water-tight seal made to prevent the liquid gel from leaking.
The gel was prepared by heating the agarose-TBE solution in a microwave until molten. Ethidium bromide (10mg/mL) was added to the molten gel and mixed gently, avoiding the formation of bubbles, to produce a uniformly stained gel, which was then poured into the gel casting plate. Once the gel had set, the water-tight seal was removed and 1x TBE buffer was poured into the electrophoresis tank until the gel was submerged. The electrodes were connected to the power pack and the gel was run at 200 volts for ~30 minutes. The gel was then removed from the electrophoresis tank and placed on a UV transilluminator to visualise the DNA bands, which were photographed.

3.7 Glycogen Precipitation of DNA (PCR Products)
One microliter (1µl) of glycogen solution (20µg/µl; Sigma Aldrich) was added to the PCR products, followed by 2-3 volumes of absolute ethanol. The mixture was transferred to a 1.5ml centrifuge tube and incubated at -20°C for a minimum of one hour. Following incubation, the samples were centrifuged for 20 minutes at 4°C. The supernatant was aspirated without disturbing the DNA pellet, which was then washed with 200µl 70% ethanol before centrifugation at 4°C for a further 5 minutes. The supernatant was aspirated, again without disturbing the pellet, and the pellet was dried under vacuum for 6 minutes before being resuspended in 20µl dH2O.

3.8 Purification of PCR Products
Some samples were purified by spin column rather than by precipitation, following the purification protocol from Macherey-Nagel. Two volumes of Buffer NT were added to the PCR product (e.g. 40µl Buffer NT to 20µl PCR product), and the mixture was transferred to a spin column sitting inside a 2ml collection tube. The sample was then centrifuged for 1 minute at 11,000 rpm and the flow-through discarded. Six hundred microliters (600µl) of Buffer NT3 were added to the spin column to wash the silica membrane, and the tube was centrifuged again at 11,000 rpm for 1 minute.
The flow-through was discarded again. The tubes were centrifuged for a further 2 minutes at 11,000 rpm to remove any remaining Buffer NT3 and dry the silica membrane. This time the 2ml collection tube itself was discarded and the spin column placed into a new 1.5ml centrifuge tube. Twenty microliters (20µl) of dH2O were added to the spin column, and the DNA was eluted by centrifuging the tube for 1 minute at 11,000 rpm. The spin column was then discarded and the centrifuge tube closed.

3.9 DNA Extraction of PCR Products from Agarose Gels
On occasion, some samples had to be extracted from an agarose gel following electrophoresis, for which the Macherey-Nagel DNA extraction protocol was followed. With the aid of a UV transilluminator, the required DNA band(s) were cut from the agarose gel using a clean scalpel, and each sample was placed into a separate 1.5ml centrifuge tube. Two hundred microliters (200µl) of Buffer NT were added to each centrifuge tube, and each tube was then placed into a heat block set at a constant temperature of 50°C. Every 2½ minutes, the tubes were removed from the heat block and vortexed briefly before being returned to incubate further; incubation and vortexing continued until the gel inside the tubes had completely dissolved. A NucleoSpin® Extract II Column was placed into a 2ml collection tube. The sample from the centrifuge tube was transferred into the spin column and centrifuged at 11,000 rpm for 1 minute to bind the DNA to the silica membrane, and the flow-through inside the collection tube was discarded. Seven hundred microliters (700µl) of Buffer NT3 were then added to the spin column to wash the silica membrane, and the column was centrifuged for 1 minute at 11,000 rpm. The flow-through in the collection tube was discarded once more. The sample was then centrifuged for 2 minutes at 11,000 rpm to remove any remaining Buffer NT3 and dry the silica membrane.
This time the collection tube was discarded and the spin column was placed into a new 1.5ml centrifuge tube. Twenty microliters (20µl) of Buffer NE were added to the spin column to elute the DNA, and the sample was centrifuged again for 1 minute at 11,000 rpm.
3.10 RFLP Analysis
Restriction Fragment Length Polymorphism (RFLP) analysis identifies the sites at which a restriction endonuclease cleaves DNA. Cleavage converts a circular molecule (e.g. a plasmid) into linear form, or a linear molecule into two or more fragments. A polymorphism that creates or abolishes a recognition site therefore changes the sizes of the resulting fragments. Here, the hierarchical method has been used, targeting the haplogroup-defining SNPs with specific restriction endonucleases (Tambets et al. 2004; Quintana-Murci et al. 2004).
Each digest contains reaction buffer, enzyme and dH2O in a reaction volume of 20µl (Table 3.6). Some enzymes (BstNI, HaeII, HhaI, HincII, MnlI, MseI and NlaIII) also require BSA. Samples are incubated at 37°C for 2 hours.
Table 3.6: Reaction mixes for the restriction digests with and without BSA
Without BSA:
Reagent | Volume added (µl) | Final concentration
10x NE Reaction Buffer (1, 2, 3, 4) | 2 | 1x
Enzyme | 1 | 5 units
dH2O | 5 | -
Purified PCR product | 12 | -
Total volume | 20 | -
With BSA:
Reagent | Volume added (µl) | Final concentration
10x NE Reaction Buffer (1, 2, 3, 4) | 2 | 1x
Enzyme | 1 | 5 units
10x BSA | 2 | 1x
Purified PCR product | 12 | -
dH2O | 3 | -
Total volume | 20 | -
3.11 DNA Sequencing
3.11.1 Haplogroup Identification
Some DNA samples required analysis beyond RFLP investigation: those for which the RFLP analysis route had been exhausted without a final haplogroup assignment. These mtDNAs were amplified and sequenced using the oligonucleotides L6337/H7406, L8215/H8345, L8215/H8861, L9794/H10356 and L11718/H12361 as described above. The amplified mitochondrial DNA segments were sent for commercial sequencing (GATC Biotech Ltd, London).
3.11.2 Hypervariable Region I
The mtDNA samples assigned to a haplogroup were amplified and sequenced across hypervariable segment I (HVS-I) of the mitochondrial DNA, using the oligonucleotides listed in Table 3.7.
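Both digest mixes in Table 3.6 contain 8µl of reagents per reaction (buffer, enzyme, water and, where needed, BSA), to which 12µl of purified PCR product is added. When many samples are digested with the same enzyme, these reagents could therefore be combined into a master mix. The Python sketch below scales the per-reaction volumes of Table 3.6 to a batch of reactions. The 10% pipetting overage and the names `MIX_WITHOUT_BSA`, `MIX_WITH_BSA` and `master_mix` are illustrative assumptions, not part of the protocol described above.

```python
"""Sketch: scale the per-reaction restriction digest mixes of Table 3.6.

The 10% overage is an assumed allowance for pipetting loss; it is not part
of the protocol described in section 3.10.
"""
from typing import Dict

# Per-reaction volumes (µl) from Table 3.6, excluding the 12 µl of purified
# PCR product, which is pipetted into each tube individually.
MIX_WITHOUT_BSA: Dict[str, float] = {
    "10x NE reaction buffer": 2.0,
    "enzyme (5 units)": 1.0,
    "dH2O": 5.0,
}
MIX_WITH_BSA: Dict[str, float] = {
    "10x NE reaction buffer": 2.0,
    "enzyme (5 units)": 1.0,
    "10x BSA": 2.0,
    "dH2O": 3.0,
}
PCR_PRODUCT_UL = 12.0
REACTION_VOLUME_UL = 20.0


def master_mix(per_reaction: Dict[str, float], n_reactions: int,
               overage: float = 0.10) -> Dict[str, float]:
    """Total volume (µl) of each reagent needed for n_reactions digests."""
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    # Sanity check: mix + PCR product must fill the 20 µl reaction exactly.
    total = sum(per_reaction.values()) + PCR_PRODUCT_UL
    if abs(total - REACTION_VOLUME_UL) > 1e-9:
        raise ValueError(f"reaction sums to {total} µl, expected {REACTION_VOLUME_UL} µl")
    factor = n_reactions * (1.0 + overage)
    return {reagent: round(volume * factor, 2) for reagent, volume in per_reaction.items()}


if __name__ == "__main__":
    # Example: 12 digests with a BSA-requiring enzyme such as HincII.
    for reagent, volume in master_mix(MIX_WITH_BSA, n_reactions=12).items():
        print(f"{reagent}: {volume} µl")
```

With either mix, 8µl of master mix would be dispensed into each tube before the 12µl of purified PCR product is added.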
Table 3.7: Co-ordinates and sequences of the forward and reverse oligonucleotides and the fragment size generated for HVS-I analysis.
Forward sequence 5’-3’ (co-ordinates) | Reverse sequence 5’-3’ (co-ordinates) | Fragment size (bp)
TCAAAGCTTACACCAGTCTTGTAAACC (15908-15926) | CCTGAAGTAGGAACCAGATG (16517-16498) | 590
The polymerase chain reaction consists of 35 cycles of denaturation at 95°C for thirty seconds, annealing at 55°C for thirty seconds and elongation at 72°C for one minute, followed by a final elongation step at 72°C for five minutes. The amplified mtDNA segments were purified using the glycogen precipitation method (see section 3.7 above) and sent for commercial sequencing (GATC Biotech Ltd, London).
Chapter Four
Results
4. Results
4.1 PCR Amplifications
All mtDNAs were amplified using a series of primer pairs generating fragments of ~300bp-~2.5Kb (Figure 4.1) and ~0.92Kb-~1.63Kb (Figure 4.2). The fragment size generated by each primer pair is given in the previous chapter (Tables 3.2 & 3.3).
Figure 4.1: Amplification of nine overlapping primer pairs and three internal primer pairs as described in Torroni et al. (1997). M = DNA Ladder; 2-Log DNA Ladder 0.1-10.0 Kb (New England Biolabs). Lane 2 = primer pair 1, Lane 3 = primer pair 2, Lane 4 = primer pair 3, Lane 5 = primer pair 4, Lane 6 = primer pair 5, Lane 7 = primer pair 6, Lane 8 = primer pair 7, Lane 9 = primer pair 8, Lane 10 = primer pair 9, Lane 11 = primer pair 10, Lane 12 = primer pair 11, Lane 13 = primer pair 12.
Figure 4.2: Amplification of the fifteen overlapping primer pairs as described in Palanichamy et al. (2004). M = DNA Ladder, as described above.
Lane 2 = primer pair 1, Lane 3 = primer pair 2, Lane 4 = primer pair 3, Lane 5 = primer pair 4, Lane 6 = primer pair 5, Lane 7 = primer pair 6, Lane 8 = primer pair 7, Lane 9 = primer pair 8, Lane 10 = primer pair 9, Lane 11 = primer pair 10, Lane 12 = primer pair 11, Lane 13 = primer pair 12, Lane 14 = primer pair 13, Lane 15 = primer pair 14, Lane 16 = primer pair 15.
Table 4.1: Size of DNA fragments for each haplogroup characterisation from RFLP analysis; (a) denotes primer pairs from Torroni et al. (1997), (b) denotes primer pairs from Palanichamy et al. (2004).
Haplogroup | PCR amplification | Enzyme | Characteristic SNP (np) | Positive sample | Negative sample | Additional fragments (approximate size)
L3 | Primer pair 10 (a) | HpaI | 3594 | 610bp | 487bp & 123bp | N/A
M | Primer pair 9 (b) | AluI | 10398 & 10400 | 201bp & 165bp | 366bp | ~590bp, ~470bp & 80bp
N | Primer pair 10 (b) | MnlI | 10873 | 243bp & 58bp | 301bp | ~230bp, ~110bp, ~100bp, ~60bp, ~50bp, ~30bp, ~20bp & ~10bp
C | Primer pair 12 (b) | HincII | 13263 | 1,229bp | 853bp & 375bp | ~100bp & ~50bp
D | Primer pair 5 (b) | AluI | 5178 | 594bp | 408bp & 186bp | ~130bp, ~80bp, ~60bp, ~50bp, ~40bp & ~30bp
E | Primer pair 12 (b) | HphI | 13626 | 239bp | 168bp & 71bp | ~650bp, ~270bp & ~220bp
G | Primer pair 4 | HhaI | 4833 | 301bp & 284bp | 585bp | ~830bp & ~80bp
R | Primer pair 12 (b) | MboII | 12705 | 401bp & 19bp | 420bp | ~280bp, ~230bp, ~200bp, ~160bp & ~80bp
A | Primer pair 1 (b) | HaeIII | 663 | 800bp & 290bp | 1,090bp | ~330bp
I | Primer pair 3 (a) | HaeII | 4529 | 1,378bp | 718bp & 660bp | N/A
Y | Primer pair 8 (b) | HaeIII | 8392 | 322bp | 181bp & 141bp | ~360bp, ~270bp, ~200bp, ~160bp & ~30bp
HV | Primer pair 14 (b) | MseI | 14766 | 228bp | 211bp & 17bp | ~670bp, ~450bp, ~60bp, ~40bp, ~10bp & ~5bp
H | Primer pair 17 (b) | AluI | 7028 | 188bp | 158bp & 30bp | ~420bp, ~370bp, ~170bp & ~20bp
V | Primer pair 3 (a) | NlaIII | 4580 | 1,025bp | 738bp & 287bp | ~320bp, ~30bp & ~10bp
TJ | Primer pair 3 (a) | NlaIII | 4216 | 377bp & 361bp | 738bp | ~320bp, ~290bp, ~30bp & ~10bp
T | Primer pair 3 (a) | BfaI | 4917 | 279bp & 39bp | 318bp | ~240bp, ~200bp, ~170bp, ~130bp, ~110bp, ~90bp, ~60bp, ~50bp & ~15bp
J | Primer pair 13 (b) | BstNI | 13078 | 1,018bp | 905bp & 113bp | N/A
Uk-group | Primer pair 11 (b) | HinfI | 12308 | 316bp & 137bp | 453bp | ~550bp, ~220bp, ~80bp & ~10bp
Table 4.2: Recognition sequences and cut sites of the enzymes used for the haplogroup assignment of samples. N = any base (A, C, G or T), R = either A or G, W = either A or T, Y = either C or T; ^ marks the cut site on the top strand.
Enzyme | Recognition sequence & cut site | For haplogroup assay
AluI | AG^CT | M, D, H
BfaI | C^TAG | T
BstNI | CC^WGG | J
HaeII | RGCGC^Y | I
HaeIII | GG^CC | A, Y
HhaI | GCG^C | G
HincII | GTY^RAC | C
HinfI | G^ANTC | Uk-group
HpaI | GTT^AAC | L3
HphI | GGTGA(N)8^ | E
MboII | GAAGA(N)8^ | R
MnlI | CCTC(N)7^ | N
MseI | T^TAA | HV
NlaIII | CATG^ | V, TJ
4.2 Haplogroup Characterisations using RFLP Analyses
The initial analysis required DNAs to be amplified using primer pair 10 (Torroni et al., 1997) and incubated with the restriction endonuclease HpaI for the assignment of Haplogroup L3 (Figures 4.3 & 4.4). Mitochondrial DNAs that are positive for this haplogroup (and for all downstream haplogroups) will retain the 610bp amplified fragment. Those that are negative will produce two fragments, one of 487bp and another of 123bp. No other fragments are generated by this assay.
Figure 4.3: 2% agarose gel of HpaI restriction digests for Haplogroup L3 characterisation. All samples retain the 610bp amplified fragment. M = DNA Ladder as previously mentioned.
Figure 4.4: 2% agarose gel of HpaI restriction digests on Afghan DNAs. All samples have the 610bp fragment; sample ‘PC’ is a positive control and has two fragments, 487bp & 123bp in size; M = DNA Ladder as previously described.
Following the analysis for Haplogroup L3, samples will be assessed for Haplogroup M (Figures 4.5-4.7) using the enzyme AluI. DNAs exhibiting the polymorphisms at nps 10398 and 10400 are positive for Haplogroup M and will generate fragments of 201bp and 165bp, while those that do not will generate a single fragment of 366bp.
In addition to these diagnostic fragments, other fragments of ~590bp, ~470bp and ~80bp will also be generated.
Figure 4.5: 2% agarose gel of AluI digests for Haplogroup M characterisation. ‘P’ denotes the PCR product while ‘D’ denotes the restriction digest; M = DNA Ladder as previously described. Samples D1, D3, D4 & D6 each have a 366bp fragment; samples D2 & D5 have both a 201bp & a 165bp fragment. Additional fragments of ~590bp, ~470bp and ~80bp are also present.
Figure 4.6: 2% agarose gel of primer pair 9 (Palanichamy et al., 2004) PCR products and AluI digests for Haplogroup M assignment. Samples D7, D8, D10, D11 & D12 each have the 366bp fragment, while sample D9 has the 201bp & 165bp fragments. All possess the additional fragments of ~590bp, ~470bp & ~80bp in size.
Figure 4.7: 2% agarose gel of PCR products and restriction digests for analysis of the Haplogroup M characteristic. M = DNA Ladder. Samples D13, D14 & D15 have the 366bp fragment; samples D16, D17 & D18 have the 201bp & 165bp fragments.
The samples identified as negative for Haplogroup M, by possessing a 366bp fragment, will be examined for the polymorphism at np 10873, which is characteristic of haplogroup N (Figures 4.8 & 4.9). Following digestion with MnlI, samples belonging to Haplogroup N or its downstream haplogroups will produce a fragment of 243bp. Samples that do not bear the polymorphism will have a 301bp fragment instead and will be classified as Haplogroup L3. This assay also generates fragments that do not affect haplogroup designation: ~230bp, ~110bp and ~100bp, together with smaller fragments of ~60bp, ~50bp, ~30bp, ~20bp and ~10bp.
Figure 4.8: 2% agarose gel of PCR products (P) and digests (D) with MnlI for Haplogroup N characterisation. PCR products are ~1.2Kb in size; sample D2 has a band at 301bp, while samples D1, D3 & D4 have the 243bp band.
Additional fragments of ~230bp, ~110bp and ~100bp are also visible.
Figure 4.9: 2% agarose gel of Haplogroup N characterisation; PCR products amplified with primer pair 10 (Palanichamy et al., 2004) and digested with MnlI. Samples D5, D7 & D8 have a 243bp fragment, while sample D6 has a band at 301bp instead. Fragments of ~230bp, ~110bp and ~100bp in size are also observed.
The samples identified as positive for Haplogroup M are then examined for the polymorphisms that define Haplogroups C, D, E and G. The assay for Haplogroup C (Figures 4.10 & 4.11) requires samples to be digested with the endonuclease HincII. This produces a single fragment of ~1.23Kb (1,229bp) if the sample bears the ‘C’ polymorphism, or two fragments of 853bp and 375bp if it does not. Additional fragments of ~100bp and ~50bp are also produced.
Figure 4.10: 2% agarose gel of PCR products (P) and digest products (D) following incubation with HincII for Haplogroup C characterisation. Sample D1 has bands at 853bp & 375bp; sample D2 has a single band ~1.23Kb (1,229bp) in size; M = DNA ladder. The additional fragment of ~100bp is observed.
Figure 4.11: 2% agarose gel of PCR products and digest products following incubation with HincII for Haplogroup C characterisation; samples D3 & D6 have a band of ~1.23Kb (1,229bp) and samples D4, D5 & D7 have bands of 853bp & 375bp. An additional band of ~100bp is observed in each digest.
Samples that did not generate the ~1.23Kb (1,229bp) fragment distinctive of Haplogroup C are then investigated for the polymorphism representative of Haplogroup D (Figure 4.12) using AluI. Samples will generate either a band of 594bp if positive or bands of 408bp and 186bp if negative. Additional fragments of ~130bp, ~80bp, ~60bp, ~50bp, ~40bp and ~30bp are also produced.
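The diagnostic patterns in these assays follow directly from whether a polymorphism creates or abolishes a recognition site. In the Haplogroup D assay, for example, the 408bp and 186bp fragments merge into a single 594bp fragment when the AluI site is lost. The Python sketch below reproduces this logic for a linear amplicon, using the IUPAC recognition sequences of Table 4.2. The function names and toy sequences are hypothetical and serve only to illustrate site loss; a real analysis would use the rCRS amplicon sequence. The sketch handles only palindromic sites. The type IIS enzymes HphI, MboII and MnlI would also require scanning the bottom strand and are out of scope.

```python
import re
from typing import List

# IUPAC codes appearing in Table 4.2, as regular-expression character classes.
IUPAC_REGEX = {"A": "A", "C": "C", "G": "G", "T": "T",
               "R": "[AG]", "Y": "[CT]", "W": "[AT]", "N": "[ACGT]"}
IUPAC_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A",
                    "R": "Y", "Y": "R", "W": "W", "N": "N"}


def is_palindromic(site: str) -> bool:
    """True if the recognition site equals its own reverse complement."""
    return site == "".join(IUPAC_COMPLEMENT[base] for base in reversed(site))


def digest(seq: str, site: str, cut_offset: int) -> List[int]:
    """Fragment lengths (5'->3' order) after cutting a linear amplicon.

    cut_offset is the number of bases from the start of the site to the
    top-strand cut, e.g. 2 for AluI (AG^CT) or 3 for HpaI (GTT^AAC).
    """
    if not is_palindromic(site):
        raise ValueError(f"{site} is not palindromic; both strands must be scanned")
    pattern = "".join(IUPAC_REGEX[base] for base in site)
    seq = seq.upper()
    # A zero-width lookahead also reports overlapping sites.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={pattern})", seq)]
    bounds = [0] + [c for c in cuts if 0 < c < len(seq)] + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Toy amplicon with two AluI sites (hypothetical sequence, not mtDNA).
wild_type = "A" * 100 + "AGCT" + "A" * 50 + "AGCT" + "A" * 30
# A G->A change inside the second site (AGCT -> AACT) abolishes it.
variant = wild_type[:155] + "A" + wild_type[156:]

print(digest(wild_type, "AGCT", 2))  # [102, 54, 32]
print(digest(variant, "AGCT", 2))    # [102, 86]
```

As with the 408bp + 186bp → 594bp pattern of the Haplogroup D assay, losing a site fuses the two flanking fragments (54 + 32 = 86) while the total length is unchanged.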
Figure 4.12: 2% agarose gel of PCR products (P) and cleaved DNA products (D) following incubation with the endonuclease AluI for Haplogroup D characterisation. Samples D1 & D7 have a 594bp band; samples D2, D3, D4, D5 & D6 have bands of 408bp & 186bp. The generic ~130bp band from this assay is also present in each digest.
The restriction endonuclease HphI is used to classify samples into Haplogroup E (Figure 4.13) based upon the presence or absence of a polymorphism at np 13626. Fragments of 168bp and 71bp signify the absence of the polymorphism. A 239bp fragment indicates its presence, and hence that the sample belongs to the haplogroup. Additional fragments of ~650bp, ~270bp and ~220bp are also generated.
Figure 4.13: 2% agarose gel of PCR amplifications (P) and endonuclease digestions of these amplifications (D) with HphI for Haplogroup E classification. No digest exhibits the characteristic 239bp band; all possess bands of 168bp & 71bp. The additional fragments of ~650bp, ~270bp and ~220bp are also observed in each digest.
All samples not yet assigned to haplogroups C, D or E are then assessed for the polymorphism specific to Haplogroup G (Figure 4.14). The endonuclease HhaI is used to identify the presence or absence of this polymorphism at np. 4833. Positive samples will have bands of 301bp and 284bp, while negative samples will have a 585bp band instead. This assay also produces non-diagnostic fragments of ~830bp and ~80bp.
Figure 4.14: 2% agarose gel of PCR products (P) and digested amplifications (D) with the endonuclease HhaI for Haplogroup G assignment. Only sample D5 has bands at 301bp & 284bp, while the other digests have a 585bp band. The additional fragment of ~830bp is present in all digests.
The samples recognised as having the characteristic polymorphism of Haplogroup N are now examined for the distinguishing polymorphism of Haplogroup R (Figures 4.15-4.17) at np.
12705 using the endonuclease MboII. After incubation, samples will possess either a 401bp band (positive for Haplogroup R) or a 420bp band (negative). Because these two bands differ by only 19bp, a 20bp DNA ladder was used to identify the band sizes accurately. This assay also generates additional fragments of ~280bp, ~230bp, ~200bp, ~160bp and ~80bp in size.
Figure 4.15: 2.5% agarose gel of DNAs digested with the endonuclease MboII for Haplogroup R characterisation. ‘L’ denotes the 20bp DNA ladder (20bp-1Kb). Samples 1 & 3 have a 401bp band; sample 2 has a 420bp band. Each digest also possesses the standard ~280bp, ~230bp, ~200bp and ~160bp bands generated by this assay.
Figure 4.16: 2.5% agarose gel of amplified DNAs digested with the endonuclease MboII for the assignment of Haplogroup R. L = 20bp DNA ladder. Samples 4, 6, 7 & 8 have a 401bp band; sample 5 has a 420bp band. The additional ~280bp, ~230bp, ~200bp and ~160bp fragments are also present.
Figure 4.17: 2.5% agarose gel of amplified DNAs digested with MboII for characterisation of Haplogroup R. Samples 9 & 12 have a 420bp band, while samples 10, 11 & 13 have a band 401bp in size. The additional bands (~280bp, ~230bp, ~200bp and ~160bp) are also present.
The samples that possessed the 420bp band, and are therefore negative for Haplogroup R, are then investigated for the polymorphisms of Haplogroups A, I and Y. For the assignment of Haplogroup A (Figure 4.18), samples were analysed with the endonuclease HaeIII. This generates either 800bp and 290bp bands if the sample is positive, or a ~1.1Kb (1,090bp) band if negative. One additional fragment, ~330bp in size, is generated by this restriction enzyme.
Figure 4.18: 2% agarose gel of amplified DNAs processed by the endonuclease HaeIII for Haplogroup A classification.
Samples 1, 7 & 8 have bands 800bp & 290bp in size, while samples 2, 3, 4, 5 & 6 have a band of ~1.1Kb (1,090bp). All samples also possess the standard ~330bp band from this assay; M = DNA ladder (100bp-10.0Kb).
Haplogroup I (Figure 4.19) is characterised by a polymorphism at np. 4529. Samples will generate bands of 718bp and 660bp if they are negative, or retain the amplified fragment of ~1.4Kb (1,378bp) if they are positive. No other fragments are generated.
Figure 4.19: 2% agarose gel of RFLP analysis using HaeII on amplified DNAs for Haplogroup I classification. All samples have bands of 718bp & 660bp in size; M = DNA ladder.
Samples are incubated with HaeIII again, this time to characterise Haplogroup Y (Figure 4.20). Samples exhibiting the Haplogroup Y polymorphism will produce a band of 322bp, while samples that do not will generate bands of 181bp and 141bp. The additional fragments produced by the restriction enzyme are ~360bp, ~250bp, ~200bp, ~160bp and ~30bp in size. A 50bp DNA ladder was used here to distinguish the diagnostic bands from the other bands present.
Figure 4.20: 2% agarose gel of DNAs digested with the endonuclease HaeIII for Haplogroup Y assignment. All samples have bands of 181bp & 141bp and possess the additional bands of ~360bp, ~250bp, ~200bp and ~160bp in size; DL = 50bp DNA ladder (50bp-650bp).
The samples recognised as positive for Haplogroup R are then analysed with MseI for the polymorphism at np. 14766, which is representative of Haplogroup HV (Figure 4.21). Positive samples are not cleaved at the polymorphic site and will generate a 228bp band, while negative samples will produce a 211bp band instead. As in the assay for Haplogroup R, a 20bp DNA ladder was used to determine the band sizes accurately.
In addition to these fragments, which determine whether or not a sample belongs to haplogroup HV, bands of ~670bp, ~450bp, ~60bp, ~40bp, ~10bp and ~5bp will be generated.
Figure 4.21: 2.5% agarose gel of DNAs following incubation with the endonuclease MseI for Haplogroup HV characterisation. Samples 1, 2, 3, 4, 5 & 8 have a 211bp band; samples 6, 7, 9 & 10 have a 228bp band; L = 20bp DNA ladder.
The samples identified as positive for Haplogroup HV are then analysed for Haplogroups H and V. The polymorphism at np. 7028 is representative of Haplogroup H, whereas a polymorphism at np. 4580 is characteristic of Haplogroup V. For Haplogroup H classification (Figure 4.22), samples were incubated with the endonuclease AluI, which will generate either a 188bp band if positive or a 158bp band if negative. Bands of ~420bp, ~370bp, ~170bp and ~20bp will also be present regardless of the outcome, so the 20bp DNA ladder was used again to differentiate between the bands. If the 158bp band is present, samples will then be incubated with NlaIII for Haplogroup V assignment (Figure 4.23). Positive samples will produce a ~1Kb (1,025bp) band, while negative samples will have bands of 738bp and 287bp. The assay for haplogroup V will also generate additional bands of ~320bp, ~30bp and ~10bp in size.
Figure 4.22: 2.5% agarose gel of DNAs digested with AluI for Haplogroup H classification. Samples 1, 2 & 3 have the 188bp band, while sample 4 has the 158bp band. Additional bands of ~420bp, ~370bp & ~170bp are also present; L = 20bp DNA ladder.
Figure 4.23: 2% agarose gel of digested DNAs following incubation with NlaIII for Haplogroup V assignment. All samples have bands 738bp & 287bp in size as well as the standard ~320bp band generated by this assay; M = DNA ladder.
The DNAs that were negative for Haplogroup HV are then examined for Haplogroup TJ.
Following incubation with NlaIII, samples will produce bands of 377bp and 361bp if the representative polymorphism is present, whereas a 738bp band indicates its absence. This assay also generates additional bands of ~320bp, 287bp, ~30bp and ~10bp in size.
Figure 4.24: 2% agarose gel of digested DNAs following incubation with the endonuclease NlaIII for Haplogroup TJ classification. Samples 5 & 8 both have bands 377bp & 361bp in size, while samples 1, 2, 3, 4, 6 & 7 have a band at 738bp. The additional bands of ~320bp & 287bp are also observed; M = DNA ladder.
The samples identified as positive for haplogroup TJ are then assessed for haplogroups T and J, while those that are negative are analysed for Haplogroup Uk-group. For Haplogroup T (Figure 4.25), samples are incubated with BfaI. This produces bands of 279bp and 39bp if a sample is positive, or a single 318bp band if it is negative. Additional bands of ~240bp, ~200bp, ~170bp, ~130bp, ~110bp, ~90bp, ~60bp, ~50bp and 13bp are also generated.
Figure 4.25: 2% agarose gel of DNAs digested by the endonuclease BfaI for the characterisation of Haplogroup T. Samples 1 & 2 have a band 318bp in size, while sample 3 has one at 279bp. The additional bands of ~240bp, ~200bp, ~170bp, ~130bp & ~110bp are also visible; DL = 50bp DNA ladder.
The assay for Haplogroup J (Figure 4.26) uses the endonuclease BstNI at np. 13078. In a negative sample, BstNI cleaves the amplicon, producing bands of 905bp and 113bp. In a positive sample, the amplicon retains its full length of ~1Kb (1,018bp). No other fragments are generated by this assay.
Figure 4.26: 2% agarose gel of PCR products digested with the endonuclease BstNI for Haplogroup J classification. Sample 1 has a band ~1Kb (1,018bp) in size, while sample 2 has bands of 905bp & 113bp in size; M = DNA ladder.
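The hierarchical order of the RFLP assays described in this section can be summarised as a decision tree. The Python sketch below encodes one reading of it. It assumes each assay result has already been scored from the gel as True (diagnostic polymorphism present) or False. The function name `classify` and the layout of the result dictionary are hypothetical. The function returns the RFLP-level call together with any haplogroups that, according to section 4.3, remain to be tested by sequencing. The label returned for samples lacking the L3 polymorphism is a placeholder, because that case does not arise in this study.

```python
from typing import Dict, List, Tuple


def classify(results: Dict[str, bool]) -> Tuple[str, List[str]]:
    """Walk the RFLP hierarchy of section 4.2.

    `results` maps an assay name (the haplogroup it tests for, e.g. "M")
    to True if the haplogroup-defining polymorphism was detected. Only the
    assays on the path actually taken need to be present.
    Returns (RFLP-level call, haplogroups still to be checked by sequencing).
    """
    def positive(assay: str) -> bool:
        if assay not in results:
            raise KeyError(f"no result for the Haplogroup {assay} assay")
        return results[assay]

    if not positive("L3"):
        return "outside L3 (not resolved by this scheme)", []
    if positive("M"):
        for hg in ("C", "D", "E", "G"):
            if positive(hg):
                return hg, []
        return "M", ["Q", "Z"]
    if not positive("N"):
        return "L3", []
    if not positive("R"):
        for hg in ("A", "I", "Y"):
            if positive(hg):
                return hg, []
        return "N", ["S", "W", "X"]
    if positive("HV"):
        if positive("H"):
            return "H", []
        return ("V", []) if positive("V") else ("HV", [])
    if positive("TJ"):
        for hg in ("T", "J"):
            if positive(hg):
                return hg, []
        return "TJ", []
    if positive("Uk-group"):
        return "Uk-group", []
    return "R", ["B", "F"]


# Example: a sample positive for N, R, HV and H.
print(classify({"L3": True, "M": False, "N": True, "R": True,
                "HV": True, "H": True}))  # ('H', [])
```

Raising an error for a missing result, rather than treating it as negative, reflects the hierarchical design: an assay further down the tree is only meaningful once every assay above it has been scored.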
The analysis of Haplogroup Uk-group (Figure 4.27) uses HinfI to detect the presence or absence of a polymorphism at np. 12308. Samples with the polymorphism can be assigned to Haplogroup Uk-group and will produce bands of 316bp and 137bp. Samples without it will generate a 453bp band and are not classified as belonging to this haplogroup. The additional fragments produced by the endonuclease are ~550bp, ~220bp, ~80bp and ~10bp in size.
Figure 4.27: 2% agarose gel of DNAs following incubation with the endonuclease HinfI for the assignment of Haplogroup Uk-group. All samples have a 453bp band, indicating the absence of the defining mutation for haplogroup Uk-group. The ~550bp & ~220bp bands are also observed; DL = 50bp DNA ladder.
4.3 Haplogroup Characterisations using DNA Sequencing
Once all RFLP analyses have been performed, samples that remain unassigned are assessed for further haplogroup-defining SNPs, selected according to their RFLP results. For example, outstanding samples that are positive for Haplogroup M are screened for SNPs at np. 5843 (Haplogroup Q) and np. 9090 (Haplogroup Z). Samples positive for Haplogroup N but negative for Haplogroup R are assessed for SNPs at np. 8404 (Haplogroup S), np. 11947 (Haplogroup W) and np. 6371 (Haplogroup X). Finally, samples positive for both Haplogroups N and R are examined for a 9bp deletion at nps. 8281-8289 (Haplogroup B) and a SNP at np. 6392 (Haplogroup F). Any samples still unassigned at this point are assigned to Haplogroup M, N or R, according to the last of these haplogroups for which they tested positive.
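The sequencing step can be expressed in the same way. The Python sketch below makes two simplifying assumptions. First, any departure from the rCRS base at a diagnostic site is taken to be the derived state. Second, the 9bp deletion is reported as nine gap characters, as in Table 4.6. The rCRS bases are those listed in Tables 4.4-4.6. The dictionaries and the function `sequencing_call` are hypothetical names for illustration.

```python
from typing import Dict

# Diagnostic sites screened by sequencing (section 4.3), with the rCRS base
# at each site as listed in Tables 4.4-4.6.
RCRS_BASE = {5843: "A", 9090: "T", 8404: "T", 11947: "A", 6371: "C", 6392: "T"}
SITE_HAPLOGROUP = {5843: "Q", 9090: "Z", 8404: "S", 11947: "W", 6371: "X", 6392: "F"}
# Sites screened for each unresolved RFLP call.
FOLLOW_UP_SITES = {"M": [5843, 9090], "N": [8404, 11947, 6371], "R": [6392]}
RCRS_8281_8289 = "CCCCCTCTA"  # deleted in Haplogroup B


def sequencing_call(rflp_call: str, bases: Dict[int, str],
                    seg_8281_8289: str = RCRS_8281_8289) -> str:
    """Refine an unresolved M, N or R call using sequenced diagnostic sites.

    `bases` maps each screened nucleotide position to the sample's base.
    Assumes any difference from the rCRS base is the derived state.
    """
    if rflp_call == "R" and seg_8281_8289 == "-" * len(RCRS_8281_8289):
        return "B"  # 9bp deletion at nps 8281-8289
    for site in FOLLOW_UP_SITES[rflp_call]:
        if bases[site].upper() != RCRS_BASE[site]:
            return SITE_HAPLOGROUP[site]
    return rflp_call  # no further haplogroup identified


print(sequencing_call("M", {5843: "A", 9090: "C"}))  # Z (sample 113)
print(sequencing_call("R", {}, "---------"))          # B (sample 25)
```

Applied to the rows of Tables 4.4-4.6, this rule yields the assignments listed in Table 4.7. For example, sample 15 (T6371) is assigned to X and sample 13 (C6392) to F. Samples with no derived base retain their M, N or R call.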
Table 4.3: SNP sites of haplogroups characterised via DNA sequencing
Haplogroup | SNP site
Q | 5843
Z | 9090
S | 8404
W | 11947
X | 6371
B | 9bp deletion (8281-8289)
F | 6392
Table 4.4: Sequencing results for the samples analysed for Haplogroups Q & Z.
Sample number | Q: rCRS nucleotide at SNP site | Q: nucleotide of sample at SNP site | Z: rCRS nucleotide at SNP site | Z: nucleotide of sample at SNP site
7 | A5843 | A5843 | T9090 | T9090
18 | A5843 | A5843 | T9090 | T9090
40 | A5843 | A5843 | T9090 | T9090
41 | A5843 | A5843 | T9090 | T9090
51 | A5843 | A5843 | T9090 | T9090
105 | A5843 | A5843 | T9090 | T9090
113 | A5843 | A5843 | T9090 | C9090
114 | A5843 | A5843 | T9090 | T9090
116 | A5843 | A5843 | T9090 | T9090
191 | A5843 | A5843 | T9090 | T9090
Table 4.5: Sequence results for samples examined for the characteristic SNPs of Haplogroups S, W & X.
Sample number | S: rCRS nucleotide at SNP site | S: nucleotide of sample at SNP site | W: rCRS nucleotide at SNP site | W: nucleotide of sample at SNP site | X: rCRS nucleotide at SNP site | X: nucleotide of sample at SNP site
8 | T8404 | T8404 | A11947 | A11947 | C6371 | C6371
15 | T8404 | T8404 | A11947 | A11947 | C6371 | T6371
28 | T8404 | T8404 | A11947 | A11947 | C6371 | T6371
30 | T8404 | T8404 | A11947 | A11947 | C6371 | C6371
31 | T8404 | T8404 | A11947 | A11947 | C6371 | C6371
99 | T8404 | T8404 | A11947 | A11947 | C6371 | T6371
102 | T8404 | T8404 | A11947 | A11947 | C6371 | T6371
110 | T8404 | T8404 | A11947 | A11947 | C6371 | C6371
117 | T8404 | T8404 | A11947 | A11947 | C6371 | C6371
142 | T8404 | T8404 | A11947 | A11947 | C6371 | C6371
Table 4.6: Sequencing results of samples analysed for Haplogroups B & F.
Sample number | B: rCRS at characteristic site | B: sequence of sample at characteristic site | F: rCRS nucleotide at SNP site | F: nucleotide of sample at SNP site
13 | A[CCCCCTCTA]G | A[CCCCCTCTA]G | T6392 | C6392
20 | A[CCCCCTCTA]G | A[CCCCCTCTA]G | T6392 | T6392
21 | A[CCCCCTCTA]G | A[CCCCCTCTA]G | T6392 | T6392
23 | A[CCCCCTCTA]G | A[CCCCCTCTA]G | T6392 | T6392
24 | A[CCCCCTCTA]G | A[CCCCCTCTA]G | T6392 | T6392
25 | A[CCCCCTCTA]G | A[---------]G | - | -
27 | A[CCCCCTCTA]G | A[CCCCCTCTA]G | T6392 | T6392
35 | A[CCCCCTCTA]G | A[---------]G | - | -
38 | A[CCCCCTCTA]G | A[CCCCCTCTA]G | T6392 | T6392
47 | A[CCCCCTCTA]G | A[CCCCCTCTA]G | T6392 | T6392
118 | A[CCCCCTCTA]G | A[CCCCCTCTA]G | T6392 | T6392
133 | A[CCCCCTCTA]G | A[---------]G | - | -
134 | A[CCCCCTCTA]G | A[CCCCCTCTA]G | T6392 | T6392
151 | A[CCCCCTCTA]G | A[CCCCCTCTA]G | T6392 | T6392
162 | A[CCCCCTCTA]G | A[CCCCCTCTA]G | T6392 | T6392
168 | A[CCCCCTCTA]G | A[CCCCCTCTA]G | T6392 | T6392
170 | A[CCCCCTCTA]G | A[CCCCCTCTA]G | T6392 | T6392
171 | A[CCCCCTCTA]G | A[CCCCCTCTA]G | T6392 | T6392
173 | A[CCCCCTCTA]G | A[CCCCCTCTA]G | T6392 | T6392
177 | A[CCCCCTCTA]G | A[CCCCCTCTA]G | T6392 | T6392
183 | A[CCCCCTCTA]G | A[CCCCCTCTA]G | T6392 | T6392
Based upon the results of both RFLP analysis and DNA sequencing, the samples can be placed into haplogroups (Table 4.7).
Table 4.7: All samples and the Haplogroups to which they belong.
Sample number | Haplogroup | Sample number | Haplogroup
1 | A | 113 | Z
2 | D | 114 | M
5 | C | 115 | D
6 | H | 116 | M
7 | M | 117 | N
8 | N | 118 | R
10 | C | 119 | A
11 | TJ | 120 | C
13 | F | 121 | H
15 | X | 122 | H
18 | M | 123 | TJ
19 | C | 124 | H
20 | R | 125 | H
21 | R | 128 | C
23 | R | 129 | HV
24 | R | 130 | D
25 | B | 131 | T
27 | R | 133 | B
28 | X | 134 | R
30 | N | 135 | C
31 | N | 136 | D
32 | H | 138 | J
33 | H | 139 | HV
34 | L3 | 140 | HV
35 | B | 141 | HV
38 | R | 142 | N
39 | L3 | 143 | H
40 | M | 145 | HV
41 | M | 148 | J
43 | J | 149 | H
44 | HV | 151 | R
47 | R | 162 | R
48 | H | 168 | R
49 | TJ | 170 | R
50 | T | 171 | R
51 | M | 172 | H
80 | HV | 173 | R
97 | T | 175 | H
98 | J | 176 | TJ
99 | X | 177 | R
100 | H | 183 | R
101 | D | 186 | H
102 | X | 187 | G
103 | A | 188 | H
104 | H | 189 | TJ
105 | M | 190 | H
106 | L3 | 191 | M
107 | L3 | 193 | D
108 | L3 | 198 | HV
109 | H | 200 | D
110 | N | |
Chapter Five
Phylogeographic Analysis of Afghani Mitochondrial DNAs
5.
Phylogeographic Analysis of Afghan mtDNAs
As mentioned in Chapter 1, and as has been extensively reported over the last two decades, populations that share a geographical region often share the same or similar genetic traits in the mitochondrial genome, known as haplogroups (Wallace et al., 1999; Al-Zahery et al., 2003; Hedman et al., 2007; Richard et al., 2007; Tetzlaff et al., 2007; Zimmerman et al., 2007; Jin et al., 2009). These haplogroups are often continent-specific and can also indicate historical migrations of modern humans. For instance, the Amerindian lineages (haplogroups A-D) can also be found among Asian populations (Schurr et al., 1990). The L haplogroups (L1, L2 & L3) are often referred to as the African lineages (Kivisild et al., 1999; Quintana-Murci et al., 2004; Butler, 2005). Groups M and U (in particular U2) are typical of South Asia (Kivisild et al., 1999; Quintana-Murci et al., 2004). Haplogroups HV, JT, U, K, N, I, W & X are common in west Eurasian populations (Kivisild et al., 1999; Quintana-Murci et al., 2004; Butler, 2005; Nasidze et al., 2006, 2007). Groups A, B, C, D, E, F, G, Z & M (in particular sub-group M7) are identified as east Eurasian/East Asian lineages (Kivisild et al., 1999; Quintana-Murci et al., 2004; Butler, 2005; Zlojutro et al., 2008; Irwin et al., 2009a). Among the populations of western Eurasia, such as those of Armenia, Georgia, Azerbaijan, Iraq and Iran, the most common haplogroups, found at frequencies ≥10%, are HV, H, J, T and U (Kivisild et al., 1999; Al-Zahery et al., 2003; Comas et al., 2004; Quintana-Murci et al., 2004; Derenko et al., 2007). In South Asia, haplogroups M and U are very common in India (Kivisild et al., 1999), and haplogroups H, M and U in Pakistan (Kivisild et al., 1999b; Quintana-Murci et al., 2004).
Among Central Asian populations, such as those in Turkmenistan, Uzbekistan, Tajikistan and Kyrgyzstan, frequent haplogroups include C, D, HV/H and U (Comas et al., 2004; Quintana-Murci et al., 2004; Derenko et al., 2007).
Table 5.1: Frequencies of the regional haplogroup lineages in the Afghanistan populations (%).
Population | East Asian lineages | West Eurasian lineages | South Asian lineages | African lineages
Hazara | 37.5 | 40.0 | 15.0 | 7.5
Tajik | 10.5 | 89.5 | 0.0 | 0.0
Baloch | 13.4 | 73.4 | 13.3 | 0.0
Pashtun | 14.3 | 64.3 | 7.1 | 14.3
Afghan (total) | 21.8 | 64.4 | 8.9 | 5.0
The mtDNA data identified 17 different haplogroups, each belonging to East Asian (A, B, C, D, F, G and Z), West Eurasian (HV, H, JT, J, T, N, R and X), African (L3) or South Asian (M) lineages. Approximately 65% of the lineages found belong to the West Eurasian collection of haplogroups (Table 5.1), indicating a greater affinity with West Eurasian populations than with any other region. This can be attributed to the significant population pressures applied to the Afghan population by multiple invasions and conquests of Afghan lands by western groups. Until recently, the region in which Afghanistan lies had for many years held a significant position as a thoroughfare for trade routes, human migrations and expansions (Barfield, 2010; Haber et al., 2012). The major migrations into and invasions of Afghanistan by ancient Persians, ancient Greeks and Indians, and more recently by the Arabs and Mongols, were likely driven by the desire to control a region made affluent by the trade routes (Barfield, 2010). The combination of these multiple expansions has produced a varied arrangement of ethnic groups, which themselves harbour a diverse collection of mtDNA types.
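The Afghan (total) row of Table 5.1 can be recomputed from the haplogroup assignments in Table 4.7. In the Python sketch below, the counts were tallied by hand from Table 4.7 (101 samples). R is grouped with the West Eurasian lineages because that grouping reproduces the published totals. The names `HAPLOGROUP_COUNTS`, `REGION` and `regional_frequencies` are illustrative only.

```python
from collections import Counter
from typing import Dict

# Haplogroup counts tallied from Table 4.7 (101 Afghan samples).
HAPLOGROUP_COUNTS = Counter({
    "A": 3, "B": 3, "C": 6, "D": 7, "F": 1, "G": 1, "Z": 1,
    "H": 18, "HV": 8, "J": 4, "T": 3, "TJ": 5, "N": 6, "X": 4, "R": 17,
    "M": 9, "L3": 5,
})

# Regional grouping of each haplogroup, as used for Table 5.1.
REGION = {
    **dict.fromkeys(["A", "B", "C", "D", "F", "G", "Z"], "East Asian"),
    **dict.fromkeys(["H", "HV", "J", "T", "TJ", "N", "X", "R"], "West Eurasian"),
    "M": "South Asian",
    "L3": "African",
}


def regional_frequencies(counts: Counter) -> Dict[str, float]:
    """Percentage of samples falling into each regional lineage group."""
    by_region: Counter = Counter()
    for haplogroup, n in counts.items():
        by_region[REGION[haplogroup]] += n
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in by_region.items()}


for region, pct in regional_frequencies(HAPLOGROUP_COUNTS).items():
    print(f"{region}: {pct:.1f}%")
```

This prints 21.8%, 64.4%, 8.9% and 5.0% for the East Asian, West Eurasian, South Asian and African groups, matching the Afghan (total) row. The same function could be applied to per-population counts to reproduce the other rows.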
5.1 Phylogeography of Individual Haplogroups
5.1.1 African Haplogroup L3
The first African populations of anatomically modern humans (AMHs), including 'Mitochondrial Eve', originated sometime between 100 and 200Kya (Disotell, 1999). One estimate places this at 192,400 years ago (95% confidence interval (CI) 151,600-233,600 years) (Soares et al., 2009). Haplogroup L3 has an East African origin (Salas et al., 2002) and is the haplogroup associated with the migration out of Africa. This migration has been dated to <100Kya (Disotell, 1999), ~60-80Kya (Salas et al., 2002) and ~55-70Kya (Soares et al., 2009). It gave rise to the macrohaplogroups M and N (and their subsequent descendant lineages) that are found in all non-Africans (Torroni et al., 2006; Gonder et al., 2007). The coalescent age of L3 has been estimated as ~71,000 years (95% CI 57,100-86,600 years; Soares et al., 2009) and 94.3 ±9.9Ky (Gonder et al., 2007). The coalescent age of a lineage is the estimated time at which its modern representatives coalesce, or merge, into their most recent common ancestor (MRCA). The L haplogroups, including L3, are usually found among sub-Saharan populations. For example, these haplogroups constitute 100% of the mtDNAs of pygmy populations from the Democratic Republic of Congo (formerly Zaire) and the Central African Republic, and ~67% of those of three groups from Senegal (Chen et al., 1995). The L haplogroups have been observed at high frequencies in north-western Africa, ranging from ~70% to ~90% in Algeria and Morocco, and at ~33% in Egypt (Nasidze et al., 2008). Their frequency decreases further toward the Near East: ~10% in Israel, Jordan and Iraq, ~5% in Syria, and absent in western Iran (Nasidze et al., 2008). L3 is absent from Asian populations and infrequent among the populations of South Asia and western Eurasia. It has been found at low frequencies in Galicia (north-western Spain) and Catalonia, 2.5% and 3% respectively (Alvarez-Iglesias et al., 2009).
The Baloch and Brahui populations of Balochistan in Pakistan both exhibit a frequency of 2.6% for haplogroup L3 (Figure 5.2), while the Makrani population along the south coast of Pakistan exhibits a much higher frequency of 27.3% (Quintana-Murci et al., 2004). The Hazara and Pashtun populations exhibit lower frequencies than the Makrani, 7.5% and 14.3% respectively, while this lineage is absent among the Tajik and Baloch populations of Afghanistan (Table 5.2). This infrequent African lineage was likely introduced into the Afghan population, and into some adjacent populations, as a consequence of the Arab invasion in the 7th century.
5.1.2 The Early non-African Lineages
5.1.2.1 Haplogroup M*
Macrohaplogroup M*, along with its sibling N*, accounts for all non-African mtDNAs. This haplogroup originated during the human migration from Africa 57-75Kya (Chandrasekar et al., 2009). Soares et al. (2009) provide age estimates of 49,400 years in South Asia (95% CI 39,000-62,200 years) and 60,600 years in East Asia (95% CI 47,300-74,300 years). Other estimates include 55-73Ky for the lineage among African populations (Chen et al., 1995) and 69.3 ±5.4Ky among Chinese populations (Kong et al., 2003). Haplogroup M* is considered a South Asian lineage owing to its substantial contribution to the Indian population, >70% (Chandrasekar et al., 2009) and ~60% (Disotell, 1999), and it exhibits lower frequencies in Central Asia and western Eurasia. It is believed to have arrived in the Indian subcontinent via the Southern Route migration from Africa (Disotell, 1999; Macaulay et al., 2005; Torroni et al., 2006; Chandrasekar et al., 2009; Kumar et al., 2009). Sub-group M7 is a common lineage in East Asian populations such as the Korean-Chinese and the Han (Beijing) of China, Mongolians and Koreans (Jin et al., 2009), and the Japanese (Asari et al., 2007).
The Hazara, Baloch and Pashtun populations of Afghanistan exhibit haplogroup M* at frequencies of 15%, 13.3% and 7.1% respectively (Figure 5.3). Frequencies of macrohaplogroup M* ranging from 26% to 64% have been found within the Indian subcontinent (Kivisild et al., 1999b; Quintana-Murci et al., 2004), while the frequencies found within the Afghan populations resemble those found elsewhere in Central Asia (Figure 5.3).

5.1.2.2 Haplogroup N*

Haplogroup N* is the second macrohaplogroup to have diverged from the African lineage L3. Its age has been estimated at 64.6 ±6.8Ky (Kong et al., 2003), and at 61,900 YBP in west Eurasia (95% CI 49,200-75,000 years), 71,200 YBP in South Asia (95% CI 55,800-87,100 years) and 58,200 YBP in East Asia (95% CI 44,100-72,800 years) (Soares et al., 2009). Haplogroup N* is the ancestor of many haplogroups found in Europe, the Middle East, Asia and the Americas (among the Amerindians). The lineage originated soon after, or possibly even during, the migration out of Africa, and is typically considered a south-west Eurasian lineage (Kivisild et al., 1999; Quintana-Murci et al., 2004; Nasidze et al., 2006, 2007). Haplogroup N* is fairly common in western Eurasia and is also present in Europe. A frequency of 5.3% has been reported in eastern Crete (Martinez et al., 2008), while haplogroups N*, I, W and X together constitute approximately 9% of the Finnish population (Hedman et al., 2007). Haplogroup N* is also found in the Near East and north-east Africa: ~13% in Egypt, ~10% in Israel, Syria and Jordan, ~5% in Iraq and ~23-44% among five Iranian populations (Nasidze et al., 2008). The Hazara exhibit a frequency of 7.5% and the Tajiks 10.5% for haplogroup N* (Figure 5.4). Elsewhere in Central Asia and western Eurasia, the frequency of N* ranges from 2.3% in the Tajiks of Tajikistan (Derenko et al., 2007) to 20% in the South Caspian region of Iran (Comas et al., 2004).
The higher frequencies appear in the more western populations rather than in Central or South Asia. Although haplogroup M* is abundant in South Asia, haplogroup N* is largely lacking from the South Asian mtDNA landscape, with frequencies of 2.6% and 2.9% in the Brahui of Balochistan and the Gujarati of north-western India, and 3% in both the Pakistani and Makrani populations; it occurs at 7.7% within the Han Chinese population (Yang et al., 2011). The frequencies exhibited in Central Asia are similar to those found within the Hazara and Tajiks, with the Uzbeks at 7.1% (Quintana-Murci et al., 2004) and the Turkmen population at 10% (Comas et al., 2004).

5.1.2.3 Haplogroup R*

As a descendant of macrohaplogroup N*, haplogroup R* also diverged soon after the migration from Africa. Along with macrohaplogroups M* and N*, R* is one of the ‘founder lineages’ for Eurasian settlement ~60-65Kya (Torroni et al., 2006). It has an estimated age of 59,100 years in west Eurasia (CI 47,100-74,100 YBP), 66,600 years in South Asia (CI 52,600-81,000 YBP) and 54,300 years in East Asia (CI 41,200-67,800 YBP) (Soares et al., 2009), and of 62.3 ±6.3Ky (Kong et al., 2003). Haplogroup R* is a typical west Eurasian and South Asian lineage, largely owing to its early divergence from N* in this region, and can be characterised by an MboII site gain at np 12704 caused by a transition at np 12705. Within the Finnish population, haplogroup R* makes up <3% of the maternal gene pool (Hedman et al., 2007). It has often been recorded in Central and South Asia and western Eurasia; however, its distribution is not uniform throughout these regions (Figure 5.5). The Karakalpaks present a frequency of 10% (Comas et al., 2004), while the lineage appears in 8.8% of mtDNAs in the Gujarati population of north-west India and in 1.8% of Georgians (Quintana-Murci et al., 2004). Within the Afghan populations, the Pashtuns exhibit 28.6%, the Tajiks 15.8% and the Hazara 7.5%.
Elsewhere, the highest frequency was found in the Uzbeks (20%), while in the south Caspian region, haplogroup X was found in 2.4% of Persians and 9.5% of Mazandrians. The presence of these three lineages within the Afghani populations and the adjacent populations of Iran, Central Asia and the Indian subcontinent may be attributable to this region being the initial territory in which haplogroups M*, N* and R* settled following the human emergence from Africa. Although the three lineages have similar coalescent ages, haplogroup M* is prominent among South Asian populations, particularly those of Andhra Pradesh in southern India, where its frequency has been recorded at 64% (Figure 5.3). Haplogroups N* and R* have similar distributions to each other (Figures 5.4-5.5), with higher frequencies among Iranian and Central Asian populations.

5.1.3 The East Asian Lineages

5.1.3.1 Haplogroup C

Haplogroup C, a derivative of macrohaplogroup M, is accepted to be an East Asian/Eurasian lineage (Quintana-Murci et al., 2004) that can also be found among the indigenous peoples of the Americas. The lineage is dated to 33-44Kya (Chen et al., 1995) and to 28,300 years before present (YBP) (95% CI 19,400-37,400 YBP) (Soares et al., 2009). It is generally found at low frequencies among Central Asian and west Eurasian populations (Quintana-Murci et al., 2004), but is more widespread within the populations of Siberia (Bermisheva et al., 2002) and has been reported at frequencies of 15-21.3% among Mongolians (Kivisild et al., 1999b; Derenko et al., 2007; Jin et al., 2009). It can be characterised by the absence of a HincII site at np 13259. In this study, haplogroup C was found exclusively among the Hazara population (Figure 5.6).
The frequency of haplogroup C among the Hazara of Afghanistan (15%) is similar to those found in eastern Asia among the Mongolians (17%) and Buryats (16.6%) (Derenko et al., 2007), Mongolians (15%) (Kivisild et al., 1999b), and Mongolians (21.3%) and Thais (10%) (Jin et al., 2009). Some Central Asian populations also exhibit high frequencies, such as the Bukharian Arabs of Uzbekistan (20%) and the Kyrgyz (30%) (Comas et al., 2004) and the Shugnan of Tajikistan (18.2%) (Quintana-Murci et al., 2004).

5.1.3.2 Haplogroup D

Like haplogroup C, haplogroup D is regarded as an East Asian lineage (Quintana-Murci et al., 2004) and is also a descendant of macrohaplogroup M. With an estimated age of 57.4 ±8.2Ky (Kong et al., 2003) and 48,300 years (95% CI 35,600-61,400 YBP) (Soares et al., 2009), it is older than haplogroup C. It is characterised by the absence of an AluI site at np 5176. Absent, or present at low frequencies, within South Asian and west Eurasian populations, haplogroup D reaches its highest frequencies within Central and East Asia, where it has been found at ~30-40% (Asari et al., 2007; Derenko et al., 2007; Zlojutro et al., 2007; Jin et al., 2009), while the Han population of China exhibits a frequency of 24.6% (Yang et al., 2011). The Baloch, Hazara and Tajiks exhibit frequencies of 6.7%, 10% and 10.5% respectively for haplogroup D. These frequencies are similar to those found within Central Asian populations, which range from 5% to 20% among the Uzbek groups (Comas et al., 2004; Quintana-Murci et al., 2004) and are 6.8% and 15% within the Tajiks (Derenko et al., 2007; Comas et al., 2004). The Baloch show a frequency similar to that of the Brahui of Balochistan (southern Afghanistan/south-west Pakistan) (Figure 5.7).
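Several haplogroups in this section are defined by the gain or loss of a restriction site, for example the HincII site at np 13259 for C and the AluI site at np 5176 for D. As an illustration only, the sketch below (assuming Python 3.9+) shows how the presence of a recognition site can be scanned for in a sequence fragment, and how a single transition can abolish one. The recognition sequences are the standard specificities of these enzymes. The function names and the toy fragments are hypothetical and are not real mtDNA sequence.

```python
import re

# IUPAC nucleotide codes as regular-expression character classes
# (only the codes needed by the motifs below).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "N": "[ACGT]",
}

# Standard recognition sequences of some enzymes named in this chapter.
RECOGNITION = {
    "AluI": "AGCT",
    "HincII": "GTYRAC",
    "HaeIII": "GGCC",
    "HhaI": "GCGC",
    "NlaIII": "CATG",
    "BstNI": "CCWGG",
}

def site_positions(seq: str, motif: str, first_np: int = 1) -> list[int]:
    """Return the position of the first base of every (possibly overlapping)
    occurrence of `motif` in `seq`, numbering the first base of `seq` as `first_np`."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A zero-width lookahead lets overlapping occurrences be reported.
    return [m.start() + first_np for m in re.finditer(f"(?={pattern})", seq.upper())]

def site_change(before: str, after: str, motif: str) -> str:
    """Report whether `motif` is present in one fragment but not the other."""
    had = bool(site_positions(before, motif))
    has = bool(site_positions(after, motif))
    if had and not has:
        return "site lost"
    if has and not had:
        return "site gained"
    return "no change"

# Hypothetical toy fragments (not real mtDNA sequence).
ancestral = "GGTTAGCTACCA"  # carries one AluI site (AGCT)
derived = "GGTTAGTTACCA"    # same fragment after a single C->T transition
print(site_positions(ancestral, RECOGNITION["AluI"]))
print(site_change(ancestral, derived, RECOGNITION["AluI"]))
```

In practice, haplogroup assignment combines many such markers and relies on their nesting in the phylogeny. A single site call, as here, only indicates whether one diagnostic marker is present.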
5.1.3.3 Haplogroup G

With an estimated age of 35,700 years (95% CI 25,500-46,300 years) (Soares et al., 2009), haplogroup G is another of the East Asian lineages (Quintana-Murci et al., 2004). As with the East Asian lineages described above, its frequencies among South Asians and west Eurasians are low, whereas it is more frequent within Central and East Asia. Within East Asia and neighbouring regions it frequently reaches or exceeds 10% of mtDNAs: Koreans – 7.7%, Han (Beijing) – 10%, Vietnamese – 16.7%, Mongolians – 17% (Jin et al., 2009), Han Chinese – 3.3% (Yang et al., 2011), Japanese – 8.8% (Asari et al., 2007), Mongolians – 10.6% and Buryats – 11.3% (Derenko et al., 2007) and southern Kazakhs – 20% (Comas et al., 2004). Haplogroup G can be characterised by an HhaI site gain at np 4831 generated by a transition at np 4833. In this study, haplogroup G was found exclusively within the Pashtun population, at a frequency of 7.1%. Within South Asia the lineage is less frequent, having been found in only 1% of Pakistani mtDNAs (Quintana-Murci et al., 2004), while its distribution in Central Asian groups appears more similar to that of the Pashtuns (Figure 5.8). Haplogroup G was found at 5% within the Karakalpaks (western Uzbekistan) and Tajiks and at 10% among eastern Uzbeks (Comas et al., 2004), and at 4.6% among Tajiks and 8.2% within the Kalmyks (south-western Russia) (Derenko et al., 2007).

5.1.3.4 Haplogroup Z

Haplogroup Z is the last of the East Asian lineages (Quintana-Murci et al., 2004) descending from haplogroup M* to have been identified within the Afghan populations. Soares et al. (2009) estimated its divergence age as 24,300 YBP (95% CI 15,400-33,600 years). Haplogroup Z is characterised by a transition at np 9090 within the ATPase6 gene.
The Z lineage shares a common origin with haplogroup C and is fairly frequent within Central and East Asia (Meinilä et al., 2001), especially among the indigenous populations of northern and eastern Siberia (Bermisheva et al., 2002; Ingman & Gyllensten, 2007a), but is uncommon in western Eurasians. The Korean, Thai and Vietnamese populations exhibit frequencies of 0.6%, 5% and 2.1% respectively (Jin et al., 2009). Ingman & Gyllensten (2007a) found that haplogroup Z is also present within the Volga-Ural region of Russia and at low frequencies among the Saami, indicating that the Z lineage was introduced into the Saami by northern Asian populations via the Volga-Ural region. The Hazara again show a similarity with the populations of Mongolia, as both have similar frequencies of haplogroup Z (Figure 5.9): 2.5% among the Hazara and 2.1% among Mongolians (Derenko et al., 2007; Jin et al., 2009). The Tajiks, to the north-east of Afghanistan, also exhibit a similar frequency (2.3%), while populations north of Afghanistan, such as the Karakalpaks and Kazakhs, show a slightly higher frequency of 5% (Comas et al., 2004). Like haplogroup C, haplogroup Z was found exclusively among the Hazaras, an observation that may be linked both to the two haplogroups sharing a common origin before coalescing to haplogroup M* and to the perceived East Asian and Mongol heritage of the Hazaras.

5.1.3.5 Haplogroup A

Haplogroup A, despite being a descendant of macrohaplogroup N*, is typically an East Asian lineage (Kivisild et al., 1999; Quintana-Murci et al., 2004) and is also one of the founder lineages of the Americas. Haplogroup A has an estimated age of 25-34Ky (Chen et al., 1995) and 29,200 years (95% CI 19,100-39,800 years) (Soares et al., 2009), and is characterised by a HaeIII site gain at np 663 generated by a transition at the same position.
Haplogroup A is usually infrequent or absent among west Eurasians and generally reaches its highest frequencies in East Asia, but it has also been found at a frequency of 13.2% in African-descendant tribes of Brazil (Carvalho et al., 2008). The Hazaras (5%) and the Baloch (6.7%) exhibit similar frequencies of haplogroup A (Figure 5.10), which are in turn similar to those of the surrounding populations. Within Central Asia, the Khoremian Arabs of western Uzbekistan present a frequency of 10%, the Dungan and Uighur populations of Kyrgyzstan 12.5% and 6.4% respectively, and the Tajik population 15% (Comas et al., 2004). A similar frequency (7.5%) was found in the Han population of China (Yang et al., 2011). Within South Asia, haplogroup A was found in 1% of the population of the Uttar Pradesh region of northern India (Kivisild et al., 1999b), while other populations, such as the Hunza Burusho of northern Pakistan, Turkmen, Turkmen Kurds, north-eastern Persians, Tajiks and the Shugnan of Tajikistan, presented frequencies of 2.3-3.1% (Quintana-Murci et al., 2004; Derenko et al., 2007). Three studies report frequencies of 3.9%, 13% and 4.3% in Mongolia (Kivisild et al., 1999b; Derenko et al., 2007; Jin et al., 2009).

5.1.3.6 Haplogroup B

Haplogroup B has an estimated age of 50.8 ±6.6Ky (Kong et al., 2003) and 50,700 years (CI 38,100-63,800 YBP) (Soares et al., 2009). It is a typical East Asian lineage and, along with lineages A, C and D, is one of the founder groups of the Amerindians (Kivisild et al., 1999; Quintana-Murci et al., 2004; Irwin et al., 2009a). Haplogroup B reaches its highest frequencies in the East Asian populations of China, Korea and Mongolia (Figure 5.11) and is less frequent within Central Asia and western Eurasia.
It was found in 0.9% of Iraqi mtDNAs (Al-Zahery et al., 2003), while Korean populations have frequencies of ~15% and ~20% (Jin et al., 2009; Derenko et al., 2007) and the Han Chinese 17.8% (Yang et al., 2011). Haplogroup B is typically characterised by a 9bp deletion at np 8281-8289, in a small section of non-coding DNA between the COII and tRNALys genes (Figure 5.12). The deletion removes one copy of the tandemly repeated sequence CCCCCTCTA within the mtDNA coding region. The same deletion is also present in some African populations that do not belong to haplogroup B; although it is shared by populations from two completely different regions, it has been shown to have arisen independently in each rather than to reflect descent from a common source (Soodyall et al., 1996). The Hazara have a frequency of 2.5% and the Pashtuns 7.1% for haplogroup B. Neighbouring populations from Central Asia exhibit similar frequencies of this lineage: Turkmen populations display 2.4% and 5% and Turkmen Kurds 6.3%, while in Uzbekistan the frequency is 5% among both the Bukharian Arabs and the Uzbeks but 10% among the Khoremian Uzbeks. The Lurs and north-eastern Persians also exhibit similar frequencies of 5.9% and 6.1% (Comas et al., 2004; Quintana-Murci et al., 2004). Meanwhile, East Asian populations such as the Mongolians, Chinese and Koreans exhibit this lineage more often. In Mongolia, haplogroup B occurs at frequencies of 8.5%, 9.7% and 15.3% (Jin et al., 2009; Kivisild et al., 1999b; Derenko et al., 2007). In two populations from eastern China, the Han (Beijing) and the Korean-Chinese, haplogroup B accounts for 10% and 11.8% of mtDNAs respectively, while it is found in 15.1% and 20.8% of Koreans (Jin et al., 2009; Derenko et al., 2007).
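The 9bp deletion described above can be detected by counting the tandem copies of the CCCCCTCTA unit. The sketch below is a minimal, hedged illustration in Python. It assumes that two tandem copies is the usual, non-deleted state and that one copy indicates the deletion. Other copy numbers are reported as unresolved rather than interpreted. The flanking sequence is hypothetical and the function names are illustrative. As noted above, the deletion alone does not establish haplogroup B, because it arose independently in some African populations.

```python
REPEAT = "CCCCCTCTA"  # the 9-bp unit named in the text

def max_tandem_copies(seq: str, unit: str = REPEAT) -> int:
    """Largest number of consecutive (tandem) copies of `unit` found anywhere in `seq`."""
    seq, best = seq.upper(), 0
    for start in range(len(seq)):
        copies, pos = 0, start
        while seq.startswith(unit, pos):
            copies += 1
            pos += len(unit)
        best = max(best, copies)
    return best

def classify_9bp(seq: str) -> str:
    """Simplified call, assuming two tandem copies is the usual (non-deleted) state."""
    copies = max_tandem_copies(seq)
    if copies == 2:
        return "no deletion"
    if copies == 1:
        return "9-bp deletion"
    return "unresolved"  # zero or more than two copies are not handled by this sketch

# Hypothetical flanking sequence; only the repeat unit comes from the text.
left, right = "GGATA", "ACAGT"
print(classify_9bp(left + REPEAT * 2 + right))
print(classify_9bp(left + REPEAT + right))
```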
5.1.3.7 Haplogroup F

Haplogroup F is the last of the East Asian lineages (Kivisild et al., 1999; Quintana-Murci et al., 2004; Irwin et al., 2009a) to be found among the Afghan populations. Its age has been estimated as 43,400 years (CI 32,200-55,000 YBP) (Soares et al., 2009) and 60.0 ±9.2Ky (Kong et al., 2003). The lineage is characterised by the absence of a Tsp509I site at np 6389 resulting from a transition at np 6392. Haplogroup F is present at higher frequencies within populations from East Asia and eastern Central Asia (Figure 5.13) and is rarely reported outside Asia. The Chinese Han present a frequency of 17.7% (Yang et al., 2011), while the Kalmyks of south-western Russia present a frequency of 5.5% (Derenko et al., 2007), higher than that of the populations immediately east of the Caspian Sea. In this study, haplogroup F was found exclusively among the Hazara, at a frequency of 2.5%. This is similar to the frequencies found in India (1-2%) and among some Uzbek and Turkmen populations (2.4%) (Kivisild et al., 1999b; Quintana-Murci et al., 2004). The lineage has been found at high frequencies among the Kyrgyz (15%), Kashmir (21%) and Uighur (25%) populations (Kivisild et al., 1999b; Comas et al., 2004), while neighbouring populations all exhibit frequencies <10%. High frequencies have also been reported among the Mongolian (14.9%) and Han (Beijing) (22.5%) populations (Jin et al., 2009). As these data show, the East Asian lineages are particularly common within East Asian and North Asian populations. Of these lineages, only haplogroup G has not been found among the Hazaras, indicating that this Afghani ethnic group has received significant East Asian genetic influence at some point, or during several periods, in its history.
This influence may be due to a combination of the Yuezhi invasion shortly before the 1st century BC, the rule of the Chinese Tang dynasty from AD 659-751 (Wilbur, 1962) and the more recent Mongol expansion early in the 13th century. The East Asian lineages account for 37.5% of Hazara mtDNAs, while among the Tajiks, Baloch and Pashtuns their contribution does not exceed 14.3% (Table 5.1). This indicates that the East Asian invasions and migrations did not have as significant an effect on these ethnic groups, and their acquisition of East Asian lineages may also be partly due to maternal admixture with the Hazaras.

5.1.4 The West Eurasian Lineages

5.1.4.1 Haplogroup X

Haplogroup X is recognised as a west Eurasian lineage (Kivisild et al., 1999; Quintana-Murci et al., 2004) with an origin in the Near East and west Eurasia (Reidla et al., 2003; Shlush et al., 2008), and it is also found among the Amerindians. It is estimated to have diverged 20.4 ±6.5Ky ago (Richards et al., 1998), with a 95% Credible Region (CR) of 13,700-26,600 YBP in the Near East and 17,000-30,000 YBP in Europe (Richards et al., 2000), or 31,800 years ago (CI 19,700-44,600 YBP) (Soares et al., 2009). It can be characterised by a transition at np 6371 in the COI gene and consists of two major sub-groups, X1 and X2 (Reidla et al., 2003). The former is confined to north and east Africa, while the latter is widespread throughout west Eurasia and is likely to have expanded peri-LGM or shortly post-LGM as conditions ameliorated (Reidla et al., 2003). The form of haplogroup X found among the Amerindians is a derivative of sub-group X2 (Reidla et al., 2003). Haplogroup X is found in Europe, the Near East, Central Asia and among the Amerindians of the Americas (Reidla et al., 2003); however, its frequency varies across these regions.
In Europe, haplogroup X has a frequency of 2.5% in the UK and USA (Herrnstadt et al., 2002), 0.8% in France and 0.9% in England, while the Orkney Islands have a frequency of 7.2%. Elsewhere in Europe it appears in Spain (4.2%), Greece and Turkey (both 4.4%). In the Middle East, X appears in Yemen (0.9%), Oman (1.3%), Saudi Arabia (1.5%), Syria (1.8%), Jordan (2%), Lebanon (5.8%) and the Israeli Druze (26.7%) (Reidla et al., 2003). In Eurasia and Central Asia, haplogroup X has been found at 0.2% in India and 5.5% in Armenia and Georgia (Kivisild et al., 1999); 1% in Andhra Pradesh (Kivisild et al., 1999b); 8.6% in Georgia (Quintana-Murci et al., 2004); and 7.6% in Georgia, 2.6% in Armenia, 2.7% in the North Caucasus, 0.2% in India and 0.6% among Uzbeks (Reidla et al., 2003). The Hazara exhibit a frequency of 7.5% and the Baloch 6.7% (Figure 5.14), which are most similar to those of the Kashmir (5%) and Turkmen Kurd (6.3%) populations. Elsewhere within the region (Iraq, Hunza Burusho, Turkmen Kurds, Shugnan and Iran), the frequency of haplogroup X is ~2.4%.

5.1.4.2 Haplogroup HV*

Haplogroup HV* is a common west Eurasian and European lineage (Kivisild et al., 1999; Quintana-Murci et al., 2004; Nasidze et al., 2006, 2007). It has an estimated age of 27,100 years (CI 19,600-34,800 YBP) (Soares et al., 2009), with a CR of 24,300-29,000 YBP in the Near East and 20,700-22,800 YBP in Europe (Richards et al., 2000). Haplogroup HV (together with its descendant lineages) accounts for ~45% of Finnish mtDNAs (Hedman et al., 2007) and, despite being a common west Eurasian and European lineage, is often found in Central Asia. It has been recorded in the Karakalpaks (25%) and the Uighurs (6.3%), and similar frequencies are exhibited in their neighbouring populations (Comas et al., 2004). This lineage can be characterised by the absence of an MseI site at np 14766 caused by a transition at the same position.
Haplogroup HV is one of only two haplogroups found in all four Afghan populations. It is present in 2.5% of the Hazara, 6.7% of the Baloch, 14.3% of the Pashtuns and 15.8% of the Tajiks (Figure 5.15). Similar frequencies were found among the nearby populations of Pakistan: 10.3% in the Baloch, 5.3% in the Brahui, 4% in the Pakistani and 6.1% in the Makrani. In western Eurasia, HV appears in 19.1% of Persians, 24.3% of Gilaki and 30% of Turkmens (Quintana-Murci et al., 2004). This distribution shows that HV is more frequent in west Eurasian populations.

5.1.4.3 Haplogroup H

As the most common haplogroup in Europe and the Near East (Kivisild et al., 1999; Richards et al., 2002; Quintana-Murci et al., 2004; Nasidze et al., 2006, 2007), haplogroup H has been studied extensively because of its frequency within European populations. Its estimated age is 20.5 ±2.5Ky (Richards et al., 1998), with a CR of 23,200-28,400 YBP in the Near East and 19,200-21,400 YBP in Europe (Richards et al., 2000), or 18,600 years (CI 14,700-22,600 YBP) (Soares et al., 2009). Haplogroup H can be identified by the absence of an AluI site at np 7025 caused by a transition at np 7028. In Europe, haplogroup H occurs at a frequency of 40-60% (Richards et al., 2000, 2002; Pereira et al., 2006; Roostalu et al., 2007), with Basque populations exhibiting the highest frequencies, ~60% (Richards et al., 2002), while across the Near East the westernmost populations display higher frequencies than those in the east. Within the Near East and Caucasus, the frequency of haplogroup H is reported to be 25-30% (Richards et al., 2002) and ~20% (Pereira et al., 2006), and 10-30% within the Near East, Caucasus and Central Asia (Roostalu et al., 2007). It was found at ~18% within the Egyptian population, ~33-37% in Israel, Syria and Jordan, ~27% in Iraq, and ~10-21% within Iran (Nasidze et al., 2008).
In the UK and USA, haplogroup H is found in 52% of mtDNAs (Herrnstadt et al., 2002), while in the Spanish regions of Galicia, Cantabria and Catalonia it accounts for ~44% and ~39% of the population (Alvarez-Iglesias et al., 2009). The distribution of haplogroup H among west Eurasian populations (Figure 5.16) is fairly uniform, with frequencies no lower than 10%, while in the Indian subcontinent haplogroup H is all but absent. In Iran, the frequency among the different populations ranges from 10% in the Kurdish group to 17.6% among the Lurs (Quintana-Murci et al., 2004), similar to the frequencies found among the Hazaras and Pashtuns (10% and 14.3%). The Central Asian populations of Uzbeks, Turkmen, Tajiks and Shugnan exhibit frequencies of 21.4%-29.5%, closer to those of the Baloch and Tajik populations of this study (26.7% and 36.8%).

5.1.4.4 Haplogroup JT

Haplogroup JT has an estimated age of 50,300 years (CI 38,400-62,500 YBP) (Soares et al., 2009). The JT lineage is the ancestor of haplogroups J and T and can be found among west Eurasian and Central Asian populations. Haplogroup JT itself is not often recorded, but its descendant lineages are present in European, Near Eastern and Central Asian populations. JT was recorded in 1% of modern Hungarians (Tömöry et al., 2007) and 9.7% of Yakuts (Zlojutro et al., 2008); however, the latter figure also includes haplogroups J and T and therefore does not represent the JT lineage alone. Haplogroup JT is characterised by the gain of an NlaIII site at np 4216 generated by a transition at the same position. Haplogroup JT was present among the Hazaras (2.5%), Tajiks (10.5%) and Baloch (13.3%) (Figure 5.17) but was either absent or not reported in the neighbouring populations.
5.1.4.5 Haplogroup J

Derived from JT, haplogroup J is also a typical west Eurasian lineage (Kivisild et al., 1999; Quintana-Murci et al., 2004; Nasidze et al., 2006, 2007) and is present in some European populations. The estimated age of haplogroup J is 32,600 years (CI 22,400-43,200 YBP) (Soares et al., 2009) or 28.0 ±4.0Ky (Richards et al., 1998), with a CR of 42,400-53,700 YBP in the Near East and 22,000-27,400 YBP in Europe (Richards et al., 2000). It is frequent within west Eurasian and Central Asian populations (Figure 5.18) and has been found in ~4% of the Finnish population (Hedman et al., 2007), while in the Near East it appears at slightly higher frequencies: ~7% in Egypt, ~12% in Israel, ~8% in Syria, ~6% in Jordan, ~13% in Iraq and ~10-30% in Iran (Nasidze et al., 2008). Haplogroup J was found at a very low frequency within India, 0.2% (Kivisild et al., 1999), but in western European (UK and USA) populations it features in 7.6% of mtDNAs (Herrnstadt et al., 2002). Haplogroup J is characterised by the absence of a BstNI site at np 13704 caused by a transition at np 13708. The Baloch exhibit a frequency of 13.3% and the Pashtuns 7.1%. The south Caspian populations of Iran exhibit frequencies of ~16-24%, while among the populations of Central Asia (Turkmen, Uzbeks, Tajiks) the frequency of this lineage is no greater than 5% in Tajikistan (5% Tajiks and 4.5% Shugnan), 7.1% in Uzbekistan (7.1% Uzbek, 5% Karakalpak and Uzbek) and 9.8% in Turkmenistan (9.8% and 5%) (Comas et al., 2004; Quintana-Murci et al., 2004).

5.1.4.6 Haplogroup T

Often identified as a west Eurasian lineage (Kivisild et al., 1999; Quintana-Murci et al., 2004; Nasidze et al., 2006, 2007), haplogroup T is also found in Europe and has an estimated age of divergence of 46.5 ±6.0Ky (Richards et al., 1998), with a CR of 41,900-52,900 YBP in the Near East and 33,100-40,200 YBP in Europe (Richards et al., 2000). Soares et al.
(2009) calculated an estimated age of 26,800 years (CI 18,100-35,800 YBP). Haplogroup T is characterised by a BfaI site gain at np 4914 generated by a transition at np 4917. Near Eastern populations in Egypt, Israel, Jordan, Syria, Iraq and Iran have been shown to exhibit frequencies of haplogroup T from ~4% to ~16% (Nasidze et al., 2008). Haplogroup T was also found within the Finnish population at a frequency of ~6% (Hedman et al., 2007). It is fairly frequent in western Eurasia, and its frequency decreases toward the east and south of Asia (Figure 5.19), with the exception of the Dungan population, which presents a frequency of 18.8% (Comas et al., 2004). Haplogroup T was also reported at a frequency of 10.6% in the United Kingdom and USA (Herrnstadt et al., 2002). The Hazara exhibit a frequency of 2.5% and the Baloch 6.7% for haplogroup T; these are relatively low compared with populations from Iran but similar to those of Central Asian populations, consistent with the lineage being much more frequent in western Eurasia than in Central Asia. Haplogroup T is found in 1% of Indians in Uttar Pradesh and Andhra Pradesh (Kivisild et al., 1999b).

The West Eurasian lineages form the largest component of the mtDNA pool of each Afghani population in this study, and the majority among those inhabiting the lowlands: the Baloch, Pashtuns and Tajiks. The terrain of the Afghan lowlands provides greater opportunity for gene flow of lineages from neighbouring or nearby populations into these populations and tribes. These migratory movements may have included the western invasions and reigns of the Persians and the Greeks. The West Eurasian contribution among the Hazaras (40%) is not as large as among the other ethnic groups (64.3-89.5%). This may be a consequence of their nomadic lifestyle and their isolation within Afghanistan's largest physical barrier, the Hindu Kush Mountains.
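The regional breakdown of the Hazara mtDNA pool and the haplogroup inventories of the four groups follow directly from the frequencies reported in section 5.1, so they can be tallied mechanically. A minimal sketch, assuming Python 3.9+. The data structures and names are illustrative. Grouping N* and R* with the West Eurasian lineages is an assumption made here, not a classification stated in this section. It is used because it reproduces the 40% West Eurasian and 37.5% East Asian figures quoted above. Frequencies are stored as integer tenths of a percent to avoid floating-point rounding.

```python
from collections import defaultdict

# Hazara haplogroup frequencies from section 5.1, in tenths of a percent.
HAZARA = {
    "L3": 75, "M*": 150, "N*": 75, "R*": 75, "C": 150, "D": 100, "Z": 25,
    "A": 50, "B": 25, "F": 25, "X": 75, "HV": 25, "H": 100, "JT": 25, "T": 25,
}

# Regional grouping (assumption: N* and R* counted as West Eurasian).
REGION = {
    "L3": "African", "M*": "South Asian",
    **{hg: "East Asian" for hg in ("A", "B", "C", "D", "F", "G", "Z")},
    **{hg: "West Eurasian" for hg in ("N*", "R*", "X", "HV", "H", "JT", "J", "T")},
}

# Haplogroups reported for each ethnic group in section 5.1.
PRESENT = {
    "Hazara": set(HAZARA),
    "Tajik": {"N*", "R*", "D", "HV", "H", "JT"},
    "Pashtun": {"L3", "M*", "R*", "G", "B", "HV", "H", "J"},
    "Baloch": {"M*", "D", "A", "X", "HV", "H", "JT", "J", "T"},
}

def regional_totals(freqs: dict[str, int]) -> dict[str, int]:
    """Sum frequencies (tenths of a percent) by regional category."""
    totals = defaultdict(int)
    for hg, f in freqs.items():
        totals[REGION[hg]] += f
    return dict(totals)

shared = set.intersection(*PRESENT.values())   # haplogroups found in every group
observed = set.union(*PRESENT.values())         # all haplogroups observed
others = set.union(*(s for g, s in PRESENT.items() if g != "Hazara"))
hazara_only = PRESENT["Hazara"] - others        # found only among the Hazara

print({region: t / 10 for region, t in regional_totals(HAZARA).items()})
print(sorted(shared), len(observed), sorted(hazara_only))
```

The same tallies support the counts given in the discussion that follows: seventeen haplogroups in total, HV and H shared by all four groups, and C, F and Z confined to the Hazara.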
5.2 Discussion

On its own, the presence of the haplogroups found in Afghanistan means very little without an explanation of their distribution. In total, seventeen haplogroups were found among the Afghani population (Figure 5.20), and of the four Afghani ethnic groups studied, the Hazara appear the most diverse. Of the seventeen haplogroups identified, fifteen were present within the Hazara (only haplogroups G and J were absent), while the Tajiks are the least diverse group, as their mtDNA pool contained only six haplogroups; the Pashtun and Baluch populations contain eight and nine different lineages respectively (Figures 5.21-5.24). Of the seventeen haplogroups observed, only two, the West Eurasian haplogroups HV and H, were present in every Afghani ethnic group (Pashtuns, Tajiks, Hazaras and Baluch). The Hazara have the largest collection of East Asian mtDNAs: of the seven East Asian haplogroups found within the Afghani population, six were observed among the Hazara (A, B, C, D, F and Z). Additionally, three of these lineages, C, F and Z, were observed only among the Hazaras. Skeleton trees were constructed (Figures 5.20-5.24) illustrating the haplogroups observed among the whole Afghani population and among the individual ethnic groups. Branch lengths connecting haplogroups do not represent the degree of variation between them; however, the circle size of each reported haplogroup is proportional to its observed frequency. For instance, among the entire Afghani population (Figure 5.20) the largest circle belongs to haplogroup H, which was observed most often (n=18), while haplogroups F, G and Z were each observed once within the dataset and therefore have the smallest circles.

Table 5.3: East Asian haplogroup frequencies among the Hazara, Mongolians, Han Chinese and Koreans.
| Lineage | Hazara | Mongolians (Kivisild et al., 1999b; Derenko et al., 2007; Jin et al., 2009) | Han Chinese (Yang et al., 2011) | Koreans (Derenko et al., 2007; Jin et al., 2009) |
|---|---|---|---|---|
| A | 5% | 3.9%, 4.3%, 13% | 7.5% | 6.8%, 8.4% |
| B | 2.5% | 8.5%, 9.7%, 15.3% | 17.8% | 15.1%, 20.8% |
| C | 15% | 15%, 17%, 21.3% | 5.4%* | 1%, 1.7% |
| D | 10% | 11%, 12.8%, 19% | 24.6% | 33.5%, 39.8% |
| F | 2.5% | 5.8%, 6.4%, 14.9% | 17.7% | 4.9%, 10.1% |
| Z | 2.5% | 2.1%, 2.1% | 5.4%* | 0.6% |

* Haplogroup CZ (C and Z reported together).

When compared with East Asian populations (Table 5.3), the Hazara present similar frequencies of the East Asian lineages, particularly to those reported from Mongolian populations. This suggests that the Hazara may carry some historical maternal genetic influence from Mongolia, which supports the Hazaras' belief in their Mongol ancestry. The mtDNA gene pools of the Tajiks, Baloch and Pashtuns contain low frequencies of East Asian, South Asian and African lineages, which implies a limited contribution from migrations, invasions or gene flow. The Hazaras also exhibit low frequencies of African and South Asian lineages, but their remaining mtDNAs are split almost evenly between East Asian (37.5%) and West Eurasian (40%) haplogroups. Haplogroups HV and H together constitute 12.5% of Hazara mtDNAs, a frequency somewhat lower than in the Tajiks, Pashtuns and Baloch and in nearby populations from Iran, Turkmenistan and Uzbekistan (Quintana-Murci et al., 2004) and Iraq (Al-Zahery et al., 2003) (Table 5.4).

Table 5.4: Frequency of haplogroups HV & H among Afghan, Iranian, Iraqi, Turkmen, Uzbek and Pakistani populations.
| Country | Population | Frequency of HV | Frequency of H | Frequency of HV & H |
|---|---|---|---|---|
| Afghanistan | Hazara | 2.5% | 10.0% | 12.5% |
| | Tajik | 15.8% | 36.8% | 52.6% |
| | Baloch | 6.7% | 26.7% | 33.4% |
| | Pashtun | 14.3% | 14.3% | 28.6% |
| Iran | Persian | 19.1% | 14.3% | 33.4% |
| | Gilaki | 24.3% | 13.5% | 37.8% |
| | Mazandrian | 9.5% | 14.3% | 23.8% |
| | Lurs | 11.8% | 17.6% | 29.4% |
| | Iranian Kurds | 20.0% | 10.0% | 30.0% |
| Pakistan | Hazara | 4.3% | 13.0% | 17.3% |
| | Baloch | 10.3% | 20.5% | 30.8% |
| | Iraq | 10.6% | 19.9% | 30.5% |
| | Turkmen | 4.8% | 22.0% | 26.8% |
| | Uzbek | 7.2% | 21.4% | 28.6% |

Figure 5.25: Location of the ethnic groups of Afghanistan.

The frequencies of these lineages, together with the contribution of the remaining West Eurasian lineages found among the Tajik, Baloch and Pashtun ethnic groups, indicate a likely origin for these groups within west Eurasia. The Hazara, however, are less likely to have a West Eurasian origin, given the strong East Asian contribution to their mtDNA gene pool. The differences between the Hazaras and the three other ethnic groups in this study may be due to barriers that prevent regular admixture between the groups. These barriers include physical ones, such as landscape, as well as religious, cultural and linguistic ones. The high frequency of East Asian lineages persisting within the Hazara population may be due to their inhabitancy of the Hindu Kush Mountains. This study has found that the ethnic groups inhabiting the Afghan lowlands both north and south of the Hindu Kush (Figure 5.25) exhibit substantially larger contributions of West Eurasian lineages, which suggests that the isolation of the Hazara within the mountain range inhibits the gene flow of these lineages. In addition, the ethnic groups practise different forms of Islam: the Hazaras are Shi'a Muslims, while the other ethnic groups are Sunni Muslims. Consequently, little admixture is likely to have occurred between the two denominations, as Shi'a Muslims have historically been persecuted and disparaged by Sunni Muslims.
Despite this, West Eurasian lineages make up 40% of the Hazara mtDNA genepool, and it is possible that, as a consequence of their history of persecution, the Hazaras encouraged the gene flow of these lineages into their ethnic group. By encouraging such gene flow, the Hazara may have been able to integrate themselves more easily into Afghani society. Alternatively, the Hazaras may once have been part of an integrated Afghani society but subsequently became isolated, as a consequence of practising the alternative Islamic denomination and of their assimilation of Mongols during the Mongol expansion of the 13th and 14th centuries.
Chapter Six
6. Mitochondrial DNA Diversity and Polymorphism in Afghani Populations
6.1 Previous mtDNA Studies on Afghani Populations
To date, there have been no previous mtDNA studies of the populations of Afghanistan. The Central Asian region in which Afghanistan lies has been a major crossroads for trade routes and migrations throughout history (Haber et al., 2012), but it has been studied far less than Europe or East Asia, where studies are numerous and increasing. Some studies have focussed on populations adjacent to Afghanistan, such as those in Iran, the Near East (including Iraq, Jordan, Israel and Syria), the Caucasus, South Asia, northern Asia, Turkmenistan, Kyrgyzstan, southern Kazakhstan, Tajikistan and Uzbekistan (Comas et al., 1998; 2004; Kivisild et al., 1999; Richards et al., 2000; Al Zahery et al., 2003; 2011; Palanichamy et al., 2004; Quintana-Murci et al., 2004; Derenko et al., 2007; Zlojutro et al., 2007; Irwin et al., 2009b). One study (Irwin et al., 2009b) examined eleven Uzbek sub-populations: five of Uzbek ancestry and six of foreign ancestry, one of which had its ancestry in Afghanistan.
This population of Afghani Uzbeks is located on Uzbekistan’s eastern border with Tajikistan (Figure 6.1). The group became established in Uzbekistan within the last century and was found to carry a large West Eurasian mtDNA contribution (Irwin et al., 2009b); however, its ethnicity is unknown, as is the reason for its presence in Uzbekistan. Its separation from Afghanistan may be a consequence of the former Soviet Union moving its border southwards. The study by Quintana-Murci et al. (2004) examined a number of populations from Iran, Pakistan and Central Asia. Some of these, such as the Kalash, Baluch, Brahui and Hazara, were located on Pakistan’s western border with Afghanistan, but the study was not expanded to incorporate Afghani populations, despite Afghanistan’s historical importance (Chapter 2). When Iranian populations have been studied in the past (Richards et al., 2000; Comas et al., 2004), their ethnicity has not been reported, and a country of Iran’s size (1,648,195 km²; CIA, 2011) certainly contains a number of different ethnic groups. For instance, Comas et al. (2004) report findings for an Iranian population from northern Iran, near the southern shore of the Caspian Sea; several ethnic groups live in this region, including the Mazandaranis and Gilakis, and the population may instead have been sampled in Tehran, Iran’s capital, which lies only ~200 km south of the Caspian Sea. The data from that study are also based on 16-20 samples from each of twelve populations; larger sample sizes would be needed to gauge the mtDNA landscape of these populations accurately.
From the published data for the populations studied within this region, three distinct groups can be identified:
i) the Iranian plateau, Turkmenistan, Uzbekistan and Tajikistan, whose populations have a large West Eurasian, a smaller East Asian and a low or absent South Asian contribution, although the populations of Turkmenistan, Uzbekistan and Tajikistan carry more East Asian lineages than those of Iran;
ii) South Asia, where the mtDNA genepool is dominated by South Asian lineages, supplemented by a small West Eurasian contribution, with East Asian and African lineages nearly absent (except for African lineages among the Makrani, 39.4%);
iii) the Kyrgyz populations, which have a large East Asian, a smaller West Eurasian and a near-absent South Asian contribution (Comas et al., 2004; Quintana-Murci et al., 2004).
Although no mtDNA data have been published for Afghani populations, two Y-chromosome studies have recently appeared (Lacau et al., 2011; Haber et al., 2012).
6.2 mtDNA HVS-I Region Sequencing
6.2.1 Variable Sites
In total, 87 mtDNA HVS-I sequences were obtained from the four Afghani ethnic groups. The sequences include all base pairs from nucleotide positions 16024-16365, numbered according to Anderson et al. (1981). All sequences were aligned against the rCRS (Figure 6.2), and the number of different haplotypes was identified for each subpopulation. Among the HVS-I sequences, 80 polymorphic sites were identified (Table 6.1). Table 6.1 also shows the number of substitutions at each nucleotide position and the positions and numbers of indels and transversions. The majority of the transitions observed involved pyrimidines (241/287), and 18 transversions were also observed. DnaSP version 5.10 was used for the basic analyses of the HVS-I sequence data. The polymorphisms identified in this study are defined relative to the CRS, which was itself observed in four individuals (Table 6.1).
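The two site classifications used in Tables 6.1 and 6.2 (transitions versus transversions, and singleton versus parsimony-informative sites) can be illustrated with a minimal Python sketch. It assumes pre-aligned sequences of equal length written with A, C, G and T, with "-" marking a gap; the function names are hypothetical, and this is not how DnaSP itself is implemented.

```python
# Sketch: classify HVS-I variation in the way Tables 6.1 and 6.2 report it.
# Assumptions (not from the thesis): sequences are pre-aligned, upper-case
# A/C/G/T strings of equal length, with "-" marking an alignment gap.
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref_base: str, alt_base: str) -> str:
    """Return 'none', 'indel', 'transition' or 'transversion'."""
    if ref_base == alt_base:
        return "none"
    if "-" in (ref_base, alt_base):
        return "indel"
    pair = {ref_base, alt_base}
    # Transitions stay within one chemical class (purine<->purine or
    # pyrimidine<->pyrimidine); transversions cross between the classes.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def classify_sites(seqs: list[str]) -> dict[int, str]:
    """Label each variable, gap-free column as 'singleton' or
    'parsimony-informative'.

    A column is parsimony-informative when at least two variants each occur
    in at least two sequences; any other variable column is a singleton site.
    Columns containing a gap are skipped, mirroring the gap-excluded site
    counts of Table 6.2.
    """
    labels: dict[int, str] = {}
    for col in range(len(seqs[0])):
        column = [s[col] for s in seqs]
        if "-" in column:
            continue
        counts = Counter(column)
        if len(counts) < 2:
            continue  # monomorphic (invariable) site
        repeated = sum(1 for c in counts.values() if c >= 2)
        labels[col] = "parsimony-informative" if repeated >= 2 else "singleton"
    return labels
```

For example, substitution_type("C", "T") returns "transition", and classify_sites(["ACGT", "ACGA", "TCGA", "TCCT"]) returns {0: 'parsimony-informative', 2: 'singleton', 3: 'parsimony-informative'}.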
The greatest number of variable sites observed within a single HVS-I haplotype, relative to the CRS, was 10 (sample 110_Haz). Table 6.2 shows the total numbers of monomorphic and polymorphic sites observed in each of the four Afghani ethnic groups, together with the numbers of singleton sites (variable sites with at least two variants, of which at most one occurs more than once; these are parsimony-uninformative) and parsimony-informative sites (sites with at least two variants, each present in at least two sequences) within the mtDNA HVS-I sequence. The forensic output and mismatch tables for the polymorphic sites and haplotypes identified among the four Afghani populations can be found in Appendix 3.
Table 6.1: Frequency and nucleotide positions of transitions, transversions and indels within the HVS-I sequences of the four Afghani ethnic groups. (Transversions in red bold; indels in bold).
HVS-I nucleotide position | Mutation frequency | HVS-I nucleotide position | Mutation frequency | HVS-I nucleotide position | Mutation frequency | HVS-I nucleotide position | Mutation frequency
16037.G | 1 | 16181.- | 1 | 16239.T | 1 | 16298.C | 8
16041.G | 1 | 16182.- | 3 | 16240.G | 3 | 16300.G | 1
16051.G | 1 | 16183.C | 9 | 16243.C | 1 | 16304.C | 6
16069.T | 5 | 16183.- | 2 | 16248.T | 2 | 16305.T | 1
16071.T | 5 | 16184.A | 1 | 16249.C | 3 | 16309.G | 1
16086.C | 1 | 16184.T | 2 | 16256.T | 3 | 16311.C | 13
16092.C | 2 | 16185.T | 1 | 16257.A | 1 | 16318.T | 1
16093.C | 4 | 16186.T | 1 | 16260.T | 2 | 16319.A | 8
16111.T | 3 | 16189.C | 18 | 16261.T | 7 | 16325.C | 4
16126.C | 7 | 16189.1C | 4 | 16262.T | 1 | 16327.T | 7
16129.A | 11 | 16192.T | 1 | 16265.G | 2 | 16335.G | 1
16134.T | 1 | 16193.1C | 17 | 16266.T | 3 | 16343.G | 1
16136.C | 2 | 16193.2C | 2 | 16270.T | 1 | 16344.T | 2
16140.C | 2 | 16201.T | 1 | 16271.C | 2 | 16352.C | 2
16145.A | 6 | 16209.C | 3 | 16274.A | 3 | 16353.T | 1
16148.T | 2 | 16217.C | 3 | 16278.T | 7 | 16354.T | 2
16163.G | 1 | 16222.T | 3 | 16288.C | 1 | 16356.C | 7
16172.C | 10 | 16223.T | 36 | 16289.G | 1 | 16357.C | 2
16173.T | 1 | 16224.C | 1 | 16290.T | 4 | 16362.C | 20
16174.T | 1 | 16227.G | 1 | 16291.T | 1 | Anderson | 4
16175.G | 1 | 16230.G | 2 | 16292.A | 3 | |
16176.T | 1 | 16232.A | 2 | 16294.T | 6 | |
16180.- | 1 | 16234.T | 2 | 16297.C | 3 | |
Table 6.2: General data of the HVS-I polymorphisms among the four Afghani ethnic groups.
Population | Baluch | Hazara | Pashtun | Tajik
Selected region | 16024-16365 | 16024-16365 | 16024-16365 | 16024-16365
Number of sites | 351 | 351 | 351 | 351
Total number of sites (excluding sites with gaps or missing data) | 341 | 338 | 341 | 342
Sites with alignment gaps or missing data | 10 | 13 | 10 | 9
Invariable (monomorphic) sites | 312 | 282 | 306 | 319
Variable (polymorphic) sites | 29 | 56 | 35 | 23
Total number of mutations (relative to CRS) | 29 | 57 | 35 | 23
Singleton variable sites | 12 | 26 | 26 | 17
Parsimony informative sites | 17 | 30 | 9 | 6
Singleton variable sites (two variants) | 12 | 25 | 26 | 17
Parsimony informative sites (two variants) | 17 | 30 | 9 | 6
Singleton variable sites (three variants) | 0 | 1 | 0 | 0
Parsimony informative sites (three variants) | 0 | 0 | 0 | 0
Further mtDNA data for the four Afghani ethnic groups, such as the numbers of loci and polymorphic sites, the numbers of transitions, transversions and indels, and the mean number of pairwise differences, are shown in Table 6.3.
Table 6.3: Afghani ethnic group mtDNA HVS-I sequence data.
Population | Baluch | Hazara | Pashtun | Tajik
Sample size | 15 | 40 | 14 | 18
No. of loci | 351 | 351 | 351 | 351
No. of polymorphic sites | 32 | 63 | 38 | 25
Sum of squared frequencies | 0.0756 | 0.0275 | 0.0714 | 0.0741
No. of observed transitions | 27 | 52 | 33 | 22
No. of observed transversions | 2 | 6 | 2 | 1
No. of observed substitutions | 29 | 58 | 35 | 23
No. of observed indels | 3 | 7 | 3 | 2
No. of observed sites with transitions | 27 | 52 | 33 | 22
No. of observed sites with transversions | 2 | 6 | 2 | 1
No. of observed sites with substitutions | 29 | 57 | 35 | 23
No. of observed sites with indels | 3 | 7 | 3 | 2
Nucleotide composition: C | 33.33% | 33.48% | 33.50% | 33.35%
Nucleotide composition: T | 22.29% | 22.15% | 22.19% | 22.20%
Nucleotide composition: A | 33.29% | 33.35% | 33.15% | 33.35%
Nucleotide composition: G | 11.09% | 11.02% | 11.17% | 11.09%
Mean number of pairwise differences | 7.295238 ±3.618935 | 7.641026 ±3.639090 | 7.208791 ±3.595280 | 3.980392 ±2.088066
6.2.2 Haplotype Distribution
The number of different haplotypes (h) observed in a population would be expected to increase with sample size, as the opportunity for further polymorphism variants and combinations increases. However, the shorter the sequence analysed, the fewer haplotypes can be distinguished, so as sample sizes increase, haplotype diversity would be expected to decrease. The haplotype diversity (Hd) of the four Afghani ethnic groups was calculated (Table 6.4) using DnaSP v.5.10. The Afghani populations are characterised by high haplotype diversities, with the lowest value among the Tajiks (0.980) and the highest among the Pashtuns (1.000). These Hd values are high (≥0.98) because few haplotypes are shared within each group, which may be due to the relatively small sample sizes. Of all 87 HVS-I sequences, 23 (26.4%) carry haplotypes found in two or more individuals, while 64 (73.6%) carry haplotypes found in only one individual. Haplotypes are shared between every pair of populations (Table 6.5); the greatest number is shared between the Pashtuns and Tajiks (6 haplotypes).
Table 6.4: Number of haplotypes (h), haplotype diversity (Hd) and nucleotide diversity (π) of the 4 Afghani populations using DnaSP ver. 5.10
Population | N | Number of haplotypes (h) | Haplotype diversity (Hd) | S.D. | Nucleotide diversity (π)* | S.D.* | Nucleotide diversity, Jukes & Cantor (πJC)
Baluch | 15 | 14 | 0.990 | 0.028 | 0.021207 | ±0.011795 | 0.01991
Hazara | 40 | 36 | 0.994 | 0.008 | 0.022148 | ±0.011718 | 0.02009
Pashtun | 14 | 14 | 1.000 | 0.027 | 0.020956 | ±0.011732 | 0.01954
Tajik | 18 | 16 | 0.980 | 0.028 | 0.011571 | ±0.006788 | 0.01109
S.D. = Standard Deviation; * = calculated by Arlequin ver. 3.5.1.2
Table 6.5: Number of shared haplotypes between the Afghani ethnic groups in this study
 | Baluch | Hazara | Pashtun | Tajik
Baluch | - | 3 | 2 | 2
Hazara | | - | 2 | 4
Pashtun | | | - | 6
Tajik | | | | -
6.2.3 Genetic Diversity
6.2.3.1 Gene Diversity
Gene diversity (H) is analogous to the expected heterozygosity of diploid data and is the probability that two sequences (haplotypes) selected at random from a sample will differ. The gene diversity of the four Afghani populations in this study (Table 6.6) was calculated using the following equation from Nei (1987):
$$H = \frac{n}{n-1}\left(1 - \sum_{i=1}^{k} p_i^2\right)$$
where n is the number of mtDNA HVS-I sequences, k is the number of haplotypes and p_i is the sample frequency of the i-th haplotype. The standard deviation of H is the square root of its sampling variance:
$$V(H) = \frac{2}{n(n-1)}\left\{2(n-2)\left[\sum_{i=1}^{k} p_i^3 - \left(\sum_{i=1}^{k} p_i^2\right)^2\right] + \sum_{i=1}^{k} p_i^2 - \left(\sum_{i=1}^{k} p_i^2\right)^2\right\}$$
The gene diversity was estimated using Arlequin ver. 3.5.1.2 (Excoffier & Lischer, 2010). Irwin et al. (2009b) calculated a genetic diversity of 0.943 for the Afghani population in Uzbekistan based on both HVS-I and HVS-II sequence data; compared with the HVS-I values for the Afghani populations in this study, this suggests that the population studied by Irwin et al. is less diverse, despite the longer mtDNA sequence analysed. This may be because the population in Uzbekistan has remained within its own ethnic group, without admixing with the indigenous or other displaced populations, thus restricting its pool of mtDNA haplotypes. Irwin et al. also found a high random-match probability (5.5%) for the Afghani population in Uzbekistan, indicating a greater number of shared haplotypes. The Afghan populations in this study exhibit high genetic diversities (≥0.9804 ±0.0284), as would be expected given Afghanistan’s position as a thoroughfare between the regions to its north, south, east and west.
Table 6.6: Gene diversity of the Afghani populations in this study
Afghani population | Gene diversity (H) ± S.D.
Baluch | 0.9905 ±0.0281
Hazara | 0.9974 ±0.0063
Pashtun | 1.000 ±0.0270
Tajik | 0.9804 ±0.0284
6.2.3.2 Nucleotide Diversity
Nucleotide diversity is the probability that ‘two randomly selected homologous nucleotides differ’ (Excoffier & Lischer, 2010) and is derived from the mean number of pairwise differences (π). It is calculated as follows (Tajima, 1983; Nei, 1987), is the equivalent of gene diversity at the nucleotide level, and is equally suitable for sequence and RFLP data:
$$\pi = \frac{n}{n-1}\sum_{i=1}^{k}\sum_{j=1}^{k} p_i p_j d_{ij}$$
where d_{ij} represents the number of mutations to have occurred since the divergence of haplotypes i and j, k is the number of haplotypes, p_i and p_j are the frequencies of haplotypes i and j, and n is the sample size. There are several methods for calculating the evolutionary distance d_{ij}: (i) the Jukes and Cantor (1969) method, which applies the same substitution rate to all nucleotides (A, C, G and T); (ii) models with different rates for transitions and transversions, as the former are more common; and (iii) models with a separate rate for each type of base substitution, which are most suitable for analyses between species, e.g. humans and other apes (Jobling, Hurles & Tyler-Smith, 2004). The mean ratio of haplotypes to sample size (k/n) for the Afghani populations in this study was 0.93, ranging from 0.89 among the Tajiks to 1.00 among the Pashtuns. The nucleotide diversity ranges from 0.011571 ±0.006788 to 0.022148 ±0.011718 (Table 6.4), while the mean number of pairwise differences (Table 6.7) ranges from 3.98 ±2.09 in the Tajiks to 7.64 ±3.64 in the Hazaras. The Afghani population of Uzbekistan (Irwin et al., 2009b) was found to have a π value of 11.3, greater than those found in this study, which may be attributable to the longer mtDNA sequences used in that study and/or to the sample sizes of the four Afghani ethnic groups in this study.
Table 6.7: Mean number of pairwise differences within the Afghani populations
Population | Mean number of pairwise differences (π) ± S.D.
Baluch | 7.295238 ±3.618935
Hazara | 7.641026 ±3.639090
Pashtun | 7.208791 ±3.595280
Tajik | 3.980392 ±2.088066
The high genetic diversity observed in this study may be explained by numerous settlement events by various populations migrating into Afghanistan and thus shaping its mtDNA landscape. Although Afghanistan houses several distinct ethnic groups, some admixture between them is evident here, as some haplotypes are shared across ethnic boundaries, which may be due to male-driven exogamy. AMOVA (Excoffier et al., 1992) was used to assess the mtDNA HVS-I sequence variation among the four ethnic populations, treated first as a single group and then as two groups divided by religion, i.e. by the denomination of Islam practised: Sunni Muslims (Baluch, Pashtuns and Tajiks) and Shi’a Muslims (Hazaras). AMOVA uses the variance of gene frequencies and the number of mutations between molecular haplotypes (Excoffier & Lischer, 2010). The AMOVA of the Afghani populations based on mtDNA HVS-I sequence data (Table 6.8) showed that variation within populations accounted for >98% of the genetic variance. The AMOVA was then run again, this time to analyse the genetic variation between the Afghani populations and 62 other populations, whose mtDNA HVS-I sequences came from populations of Europe, Africa, the Near and Middle East, Central and South Asia and East Asia.
Table 6.8: AMOVA results of variance within and among the Afghani populations and additional populations.
Variation (%) | Among groups | Among populations, within groups | Within populations | P-value
Religious groups | 1.89 | -0.07 | 98.18 |
All Afghanis | 1.30 | - | 98.70 | 0.08113 ±0.00735
66 populations | 7.78 | - | 92.22 | 0.00000 ±0.00000
Table 6.9: Pairwise differences between pairs of populations (1=Baluch, 2=Hazara, 3=Pashtun and 4=Tajik)
 | 1 | 2 | 3 | 4
1 | 0.00000 | | |
2 | 0.00804 | 0.00000 | |
3 | -0.00725 | 0.01179 | 0.00000 |
4 | 0.02113 | 0.02353 | 0.00865 | 0.00000
Table 6.10: FST p-values between pairs of populations (1=Baluch, 2=Hazara, 3=Pashtun and 4=Tajik)
 | 1 | 2 | 3 | 4
1 | * | | |
2 | 0.15315 ±0.0333 | * | |
3 | 0.55856 ±0.0485 | 0.13514 ±0.0389 | * |
4 | 0.08108 ±0.0252 | 0.07207 ±0.0121 | 0.31532 ±0.0529 | *
6.2.3.3 Theta Estimators
Several methods were used to estimate the population parameter θ = 2Nfµ, where Nf is the female effective population size and µ is the mutation rate. The mtDNA HVS-I region should exhibit the same mutation rate across the Afghani populations, so differences in θ reflect differences in the effective number of women who have contributed their mtDNA genomes to female offspring over past generations.
Theta π (θπ) is estimated from the mean number of pairwise differences between sequences, i.e. from nucleotide diversity; under neutrality and equilibrium its expected value equals θ, so larger values indicate a larger female effective population size.
Theta S (θS) is based on the number of segregating (polymorphic) sites (S), assuming an infinite-sites model of genome evolution. S depends on the number of DNA sequences: the number of segregating sites would be expected to increase as the population sample (number of DNA sequences) increases. θ can nevertheless be estimated from S independently of the number of sequences, provided that mutations are under no selection pressure and the randomly mating population is at equilibrium. θS is estimated using (Watterson, 1975):
$$\theta_S = \frac{S}{\sum_{i=1}^{n-1} \frac{1}{i}}$$
where S represents the total number of segregating sites and n is the number of mtDNA HVS-I sequences.
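The gene diversity equation of Section 6.2.3.1 and its variance, together with the pairwise-difference form of nucleotide diversity and the Jukes and Cantor correction of Section 6.2.3.2, can be checked with a short Python sketch. The haplotype counts used below are an assumption: they are inferred from the numbers of haplotypes and sums of squared frequencies in Tables 6.3 and 6.4 (Tajik: one haplotype carried by three individuals plus 15 singletons; Baluch: one haplotype shared by two individuals plus 13 singletons), not taken from raw data. With these counts the sketch reproduces the Tajik, Baluch and Pashtun values of Table 6.6. The function names are illustrative, not DnaSP or Arlequin code.

```python
# Sketch: gene diversity (Nei, 1987) with its standard deviation, the mean
# number of pairwise differences, and a Jukes-Cantor (1969) distance.
# Function names are illustrative; this is not DnaSP or Arlequin code.
import math
from itertools import combinations


def gene_diversity(counts: list[int]) -> tuple[float, float]:
    """Return (H, S.D.) from the number of copies of each haplotype."""
    n = sum(counts)
    p = [c / n for c in counts]
    sum_p2 = sum(x ** 2 for x in p)
    sum_p3 = sum(x ** 3 for x in p)
    h = n / (n - 1) * (1 - sum_p2)
    var = 2 / (n * (n - 1)) * (
        2 * (n - 2) * (sum_p3 - sum_p2 ** 2) + sum_p2 - sum_p2 ** 2
    )
    return h, math.sqrt(max(var, 0.0))  # guard against tiny negative rounding


def differences(a: str, b: str) -> int:
    """Differing sites between two aligned sequences, ignoring gap columns."""
    return sum(1 for x, y in zip(a, b) if x != y and "-" not in (x, y))


def mean_pairwise_differences(seqs: list[str]) -> float:
    """Average number of differences over all pairs of sequences, which
    equals n/(n-1) * sum_i sum_j p_i p_j d_ij with raw differences as d_ij."""
    pairs = list(combinations(seqs, 2))
    return sum(differences(a, b) for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance from the proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


# Haplotype counts inferred (assumption) from Tables 6.3-6.4:
# Tajik = one haplotype in three individuals plus 15 singletons.
tajik_h, tajik_sd = gene_diversity([3] + [1] * 15)
```

Rounded to four decimals, tajik_h and tajik_sd come out as 0.9804 and 0.0284, matching Table 6.6. The same expression for H is the one DnaSP reports as Hd, which is why the Baluch, Pashtun and Tajik values in Table 6.4 agree with Table 6.6.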
Theta k (θk) is calculated from the number of haplotypes in the sample, assuming the infinite-alleles model of genome evolution. The infinite-alleles model assumes that each new mutation within the DNA sequence has not previously arisen and therefore generates a new allele. θk describes the relationship between the number of haplotypes observed and the sample size. It was estimated by solving the following equation for θ (Ewens, 1972):
$$E(k) = \theta \sum_{i=0}^{n-1} \frac{1}{\theta + i}$$
where k is the number of haplotypes and n the number of DNA sequences. The theta estimators (Table 6.11) were calculated using Arlequin version 3.5.1.2 (Excoffier & Lischer, 2010).
Table 6.11: Estimators of female effective population size based upon the number of pairwise differences (θπ), the number of segregating sites (θS) and the number of observed haplotypes (θk).
Estimator | Baluch | Hazara | Pashtun | Tajik | Mean | S.D.
n | 15 | 40 | 14 | 18 | - | -
Theta π | 7.29524 | 7.64103 | 7.20879 | 3.98039 | 6.53136 | 1.71087
S.D. | 4.05764 | 4.04258 | 4.03581 | 2.33504 | 3.61777 | 0.85520
Theta S | 8.91879 | 13.40059 | 11.00583 | 6.68692 | 10.00303 | 2.87061
S.D. | 3.56456 | 4.20073 | 4.39099 | 2.64674 | 3.70076 | 0.78653
Theta k | 95.44546 | 363.89165 | * | 65.06110 | * | *
S.D. | ** | ** | * | ** | * | *
* Cannot be deduced when all haplotypes differ.
** 95% confidence interval limits for theta (k).
The calculated θk value of the Hazara was approximately 3.8 and 5.6 times greater than the values for the Baluch and Tajiks, respectively. The Baluch, Hazara and Pashtuns exhibit similar θπ values (7.209-7.641), approximately twice that observed for the Tajiks (3.980), while the θS values range from 6.687 in the Tajiks to 13.401 in the Hazaras. These estimators indicate that the Hazaras have the largest female effective population size; however, this may be a result of the larger Hazara sample (n = 40) compared with those of the Baluch, Pashtuns and Tajiks.
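The θS and θk estimators of Table 6.11 can be checked with a short Python sketch. θS needs only S and n, whereas θk has no closed form, so the sketch solves Ewens' equation numerically by bisection; that solver is an illustrative choice and may differ from the one Arlequin uses. Taking S as the total numbers of mutations in Table 6.2 (29, 57, 35 and 23) reproduces the θS values, and k = 14 with n = 15 and k = 16 with n = 18 (Table 6.4) give θk values close to those reported for the Baluch and Tajiks.

```python
# Sketch: Watterson's theta_S and Ewens' theta_k from summary counts.
# The bisection solver is an illustrative choice, not Arlequin's algorithm.


def harmonic(n: int) -> float:
    """a1 = sum of 1/i for i = 1 .. n-1."""
    return sum(1 / i for i in range(1, n))


def theta_s(segregating_sites: int, n: int) -> float:
    """Watterson (1975): theta_S = S / a1."""
    return segregating_sites / harmonic(n)


def expected_haplotypes(theta: float, n: int) -> float:
    """Ewens (1972): E(k) = theta * sum_{i=0}^{n-1} 1 / (theta + i)."""
    return sum(theta / (theta + i) for i in range(n))


def theta_k(k: int, n: int, rel_tol: float = 1e-12) -> float:
    """Solve E(k) = k for theta by bisection.

    E(k) increases monotonically with theta from 1 towards n, so a finite
    solution exists only for 1 < k < n; when every sequence is a distinct
    haplotype (k == n) theta_k cannot be deduced, as in Table 6.11.
    """
    if not 1 < k < n:
        raise ValueError("theta_k requires 1 < k < n")
    lo, hi = 0.0, 1.0
    while expected_haplotypes(hi, n) < k:  # bracket the root
        hi *= 2
    while hi - lo > rel_tol * hi:
        mid = (lo + hi) / 2
        if expected_haplotypes(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For instance, theta_s(29, 15) gives about 8.9188 for the Baluch, and theta_k(14, 14), the Pashtun case in which all haplotypes differ, raises ValueError, mirroring the "*" entry in Table 6.11.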
6.2.3.4 Mismatch Distribution
A mismatch distribution is a graphical representation of the numbers of pairwise differences between the haplotypes in a population; it is applied here to sequence data but can also be used with RFLP and microsatellite data. As well as illustrating the diversity within a population, it also indicates the population’s demographic history. For instance, a population at equilibrium will exhibit a ragged, multimodal distribution, whereas a population that has recently undergone demographic expansion will show a unimodal, bell-shaped distribution (Rogers & Harpending, 1992). To determine whether the four Afghani populations present multimodal or unimodal distributions, a raggedness index (r) was calculated for each distribution. This statistic is the sum of the squared differences between the frequencies of successive mismatch classes and is calculated as defined by Harpending (1994):
$$r = \sum_{i=1}^{d+1} \left(x_i - x_{i-1}\right)^2$$
where d is the greatest number of differences between alleles and x_i is the relative frequency of pairs showing i differences. Smooth, unimodal distributions, which indicate a historical population expansion, typically have lower raggedness values (less than 0.03 for sequence data) than multimodal distributions (Jobling, Hurles & Tyler-Smith, 2004).
Figure 6.3: Baluch population mismatch distribution (y axis = frequency); (Raggedness index (r): 0.0324)
Figure 6.4: Hazara population mismatch distribution (y axis = frequency); (Raggedness index (r): 0.0224)
The mismatch distributions for the Afghani populations, together with the expected curves under population equilibrium and expansion, and the raggedness indices were generated using DnaSP ver. 5.10 (Figures 6.3-6.6). The Hazaras, Pashtuns and Tajiks show the unimodal curve expected for expanding populations and have raggedness index values <0.03, while the Baluch distribution is less smooth and has a raggedness index >0.03 (0.0324).
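The mismatch distribution and Harpending's raggedness index described in Section 6.2.3.4 can be sketched in Python as follows. The sketch assumes aligned sequences and ignores gap columns when counting differences, takes x_{d+1} as zero in the sum (the class beyond the largest observed difference is empty), and does not model the expected equilibrium and expansion curves that DnaSP plots; the function names are hypothetical.

```python
# Sketch: mismatch distribution and Harpending's (1994) raggedness index.
# Assumption: gap columns ("-") are ignored when counting differences.
from itertools import combinations


def pairwise_differences(a: str, b: str) -> int:
    """Number of differing, gap-free sites between two aligned sequences."""
    return sum(1 for x, y in zip(a, b) if x != y and "-" not in (x, y))


def mismatch_distribution(seqs: list[str]) -> list[float]:
    """Relative frequency x_i of sequence pairs differing at i sites, i = 0..d."""
    diffs = [pairwise_differences(a, b) for a, b in combinations(seqs, 2)]
    d = max(diffs)
    return [diffs.count(i) / len(diffs) for i in range(d + 1)]


def raggedness(x: list[float]) -> float:
    """r = sum_{i=1}^{d+1} (x_i - x_{i-1})^2, taking x_{d+1} = 0."""
    padded = list(x) + [0.0]
    return sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, len(padded)))
```

For the three toy sequences "AAAA", "AAAT" and "AATT", mismatch_distribution gives approximately [0.0, 0.667, 0.333] and raggedness returns about 0.667, far above the 0.03 threshold quoted above, as expected for such a tiny, irregular sample.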
Figure 6.5: Pashtun population mismatch distribution (y axis = frequency); (Raggedness index (r): 0.0121)
Figure 6.6: Tajik population mismatch distribution (y axis = frequency); (Raggedness index (r): 0.0270)
Tajima’s D statistic (Tajima, 1989) tests the neutrality of a population and can thereby provide an indication of demographic processes. When a population is at equilibrium and evolving neutrally, D is expected to be zero. Positive D values can indicate balancing selection, while negative values can indicate population expansion. Each Afghani population presented a negative D value (Table 6.12), none of which was significant at P < 0.05, ranging from -1.796 for the Hazara population to -1.052 for the Baluch population. These statistics were calculated using DnaSP ver. 5.10.
Table 6.12: Tajima’s D statistic values for the Afghani populations using the total number of mutations and the total number of segregating sites and their statistical significance.
Population | Tajima’s D (total number of mutations) | Statistical significance | Tajima’s D (total number of segregating sites) | Statistical significance
Baluch | -1.05171 | P > 0.10 | -1.05171 | P > 0.10
Hazara | -1.79634 | 0.10 > P > 0.05 | -1.76290 | 0.10 > P > 0.05
Pashtun | -1.74649 | 0.10 > P > 0.05 | -1.74649 | 0.10 > P > 0.05
Tajik | -1.72745 | 0.10 > P > 0.05 | -1.72745 | 0.10 > P > 0.05
6.3 Phylogenetic Network of the Afghani Population
A Median Joining network was constructed (Figure 6.7) from the mtDNA HVS-I sequences of the Afghani ethnic groups. This network illustrates the relationships between the different HVS-I haplotypes identified within the Afghani populations, separating the haplotypes by the polymorphic sites each possesses. For instance, the haplotypes of samples 113 and 190 are identical along their shared branch, but beyond the node on that branch they diverge: sample 190 gains one polymorphism and sample 113 three polymorphisms, none of which is present in the other’s haplotype.
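The neutrality test summarised in Table 6.12 compares the two θ estimators: Tajima's D is the difference between θπ and Watterson's θS, scaled by its standard deviation. A minimal Python sketch with Tajima's (1989) constants is shown below; it takes n, S and the mean number of pairwise differences as inputs and is not expected to reproduce the DnaSP values in Table 6.12 exactly, since DnaSP's own handling of gaps and missing data is not modelled. The function name is illustrative.

```python
# Sketch: Tajima's (1989) D from the sample size n, the number of
# segregating sites S and the mean number of pairwise differences (pi).
import math


def tajimas_d(n: int, segregating_sites: int, pi: float) -> float:
    if n < 4 or segregating_sites == 0:
        raise ValueError("D needs n >= 4 and at least one segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    s = segregating_sites
    # Numerator: theta_pi minus Watterson's theta_S; denominator: its S.D.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

D is zero when π equals S/a1 and becomes negative when pairwise diversity falls short of the diversity implied by the number of segregating sites, the pattern read in the text as a sign of population expansion.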
Like some of the analyses described previously, the Median Joining network can indicate demographic processes, such as population expansion when the network presents a star-like phylogeny, i.e. one in which many branches radiate from a single common node. The Median Joining network of the Afghani HVS-I sequences exhibits a star-like phylogeny, suggesting that the population is expanding.
6.4 Mitochondrial DNA Genetic Barriers between Afghans and Other Populations
Anatomically modern humans, despite our ever-growing population and expansive distribution, exhibit lower genetic variation than other apes (Jobling, Hurles & Tyler-Smith, 2004). Human populations are nevertheless structured by linguistic, cultural, religious and geographical pressures, including separation by natural physical barriers such as oceans and seas, mountain ranges, lakes and deserts. The Afghani populations in this study each speak an Indo-European language: the Hazara and Tajiks speak Dari, a variety of Persian, and the Baluch speak Baluchi, another Iranian language (Farr, 2009; Barfield, 2010; Weinbaum, 2011), while the Pashtuns speak Pashto (Barfield, 2010). Using the Barrier program version 2.2 (Manni et al., 2004), the pairwise FST values derived from the HVS-I haplotypes of the Afghani populations and of 3923 HVS-I sequences from 62 additional populations (obtained from GenBank) were analysed to determine whether any genetic barriers could be identified between the populations in relation to their geographical positions. The program is likely to identify an abrupt genetic difference between a pair of populations separated by a large geographical gap, as the likelihood of admixture decreases as the distance between populations increases.
This highlights the importance of including data from intermediate populations to avoid the identification of false barriers. The first five genetic barriers identified using Barrier ver. 2.2 are shown in Figure 6.8 and the first ten barriers in Figure 6.9.
Figure 6.7: Median Joining network calculated from the HVS-I sequences of the Afghani populations
Table 6.13: Co-ordinate values for the Afghani populations and the 62 additional populations.
Number (in reference to Barrier test output) | Population | Longitude | Latitude | Number (in reference to Barrier test output) | Population | Longitude | Latitude
1 | AfgBal | 62,05 | 30,28 | 34 | Karelians | 33,50 | 63,44
2 | AfgHaz | 65,25 | 34,52 | 35 | Kashmiri | 76,51 | 34,08
3 | AfgPas | 65,71 | 31,61 | 36 | KazakhIrw | 70,20 | 41,78
4 | AfgTaj | 70,57 | 37,12 | 37 | KikuyuKenya | 36,55 | 01,16
5 | Abazinian | 39,21 | 44,04 | 38 | KyrgIrw | 71,67 | 40,99
6 | AkhaChina | 99,18 | 23,47 | 39 | LisuYChina | 96,57 | 26,01
7 | Albanian | 19,48 | 41,20 | 40 | MokshaSVRiver | 44,14 | 54,13
8 | Altai | 86,19 | 50,39 | 41 | MongoBarga | 106,53 | 47,55
9 | Armenia | 44,51 | 40,18 | 42 | Morocco | 06,83 | 34,01
10 | AzerbRep | 48,02 | 40,26 | 43 | MukriIndia | 74,57 | 15,02
11 | Banglad | 90,40 | 23,70 | 44 | NubiaNSudan | 30,41 | 20,07
12 | Basque | 01,59 | 42,41 | 45 | Pakistan | 73,03 | 31,40
13 | Berber | 03,67 | 32,48 | 46 | RussiaIrw | 31,26 | 58,54
14 | Cantonese | 110,26 | 21,12 | 47 | Saami | 21,48 | 65,31
15 | Chechenian | 45,42 | 43,19 | 48 | SaliIndia | 75,06 | 18,38
16 | Cherkessian | 41,44 | 43,53 | 49 | Sardinians | 09,04 | 40,06
17 | ChinaGuang | 113,26 | 23,12 | 50 | Somali | 45,02 | 02,45
18 | ChinaMongol | 111,74 | 40,84 | 51 | Spain | 03,70 | 40,41
19 | DaiSYChina | 101,15 | 21,55 | 52 | TajIrw | 67,38 | 39,20
20 | English | 00,44 | 51,47 | 53 | Tibetian | 91,11 | 29,64
21 | Estonian | 24,75 | 59,44 | 54 | Turkey | 32,85 | 39,92
22 | Ethiopia | 38,74 | 09,02 | 55 | TurkmenIrw | 66,83 | 37,83
23 | Finland | 24,93 | 60,16 | 56 | IndiaReddy | 79,16 | 18,01
24 | Georgia | 44,79 | 41,70 | 57 | IndiaChaturvedi | 77,56 | 26,40
25 | IndAdia | 95,04 | 28,61 | 58 | IndiaBrahmin | 75,15 | 31,30
26 | IndNagan | 94,10 | 25,67 | 59 | IndiaBhargava | 80,05 | 27,34
27 | IndNisha | 92,90 | 27,27 | 60 | IranAra | 48,40 | 30,39
28 | IndNorthern_Sikh | 75,22 | 31,15 | 61 | IranAZE | 48,17 | 38,05
29 | IndPashtuns | 80,03 | 27,34 | 62 | IranBaluch | 61,12 | 28,13
30 | IndSouthern_Sril | 80,46 | 07,29 | 63 | IranFars1 | 51,24 | 35,41
31 | Italy | 13,40 | 41,56 | 64 | IranGilak | 49,35 | 37,16
32 | Japan | 139,69 | 35,68 | 65 | IranJonobi | 55,54 | 28,18
33 | KabardinianNC | 43,31 | 43,25 | 66 | IranKord | 47,01 | 35,18
The first five barriers (Figure 6.8) are as follows:
1- The first barrier separates the Ethiopian, Kenyan and Somali populations of Africa (numbered 22, 37 and 50 on the map) from the North Sudanese (44 on the map), European, Near Eastern and West Eurasian, Central, South and East Asian populations.
2- The second barrier separates the Altai, Barga Mongol, Chinese Mongol, Japanese, Bangladeshi, Tibetan, Indian Nisha, Adia and Naga, Chinese Lisu, Akha and Dai, Cantonese and Guangdong populations (numbers 8, 41, 18, 32, 11, 53, 27, 25, 26, 39, 6, 19, 14 and 17 on the map) from the populations of Central and South Asia.
3- The third barrier isolates the Saami population (47 on the map) from all other European populations, such as the Finns and Estonians (23 and 21), Karelians (34), Russians (46) and English (20).
4- The fourth barrier separates the Kenyan and Somali populations from the Ethiopian population.
5- The fifth barrier isolates the Moroccan population (42 on the map) from the Berber population (13) and the Spanish, Basque, Italian and Sardinian populations of Europe (51, 12, 31 and 49).
The sixth to tenth genetic barriers based on the HVS-I sequence data (Figure 6.9) are as follows:
6- The sixth barrier isolates the Indian Bhargava population (59) in South Asia from the adjacent Indian Chaturvedi population (57) and the other South Asian populations.
7- The seventh barrier separates the Iranian Baluch population (62 on the map) from all other populations, including the Iranian Jonobi and Afghani populations (65, 1, 2, 3 and 4); it therefore separates the Iranian Baluch from the Afghani Baluch.
8- The eighth barrier isolates the Pakistani population (45) from the neighbouring Central Asian populations (1, 2, 3, 4, 36, 38, 52 and 55) and from the Kashmiri (35) and Indian Brahmin (58) populations to the east. In South Asia, it also separates the Saliya and Mukri populations of India (48 and 43) from the Indian Reddy (56) and Sri Lankan (30 on the map) populations.
9- The ninth barrier separates the Altai and Barga Mongols from the East Asian populations of the second barrier.
10- The tenth barrier separates the Bangladeshi, Tibetan and Indian Naga populations (numbered 11, 53 and 26 on the map) from the Indian Nisha and Adia (27 and 25) and the Chinese and Japanese populations.
The barrier analysis shows no significant genetic barrier between the Afghani populations when analysed with the 62 additional populations. However, a genetic barrier was identified between the Iranian Baluch population of south-eastern Iran and the Afghani Baluch of south-western Afghanistan. This barrier may be attributable to a geographical barrier: the Dasht-e Margo desert in Afghanistan’s south-west or the Hamun lakes in eastern Iran and south-western Afghanistan. A second barrier was identified that separates the Pakistani population from the Afghani populations to its west and from the Indian groups to its east. The barrier analysis also shows that, among the top ten genetic barriers determined from this mtDNA HVS-I sequence collection, there is no significant genetic barrier between the Afghani populations and the Central Asian populations.
Chapter Seven
Y-Chromosome Analysis of Afghani Ethnic Groups
Haber et al. (2012) Afghanistan’s Ethnic Groups Share a Y-Chromosomal Heritage Structured by Historical Events. PLoS One 7:e34288.
PLoS One
Afghanistan’s Ethnic Groups Share a Y-Chromosomal Heritage Structured by Historical Events
Marc Haber1,2, Daniel E. Platt3, Maziar Ashrafian Bonab4, Sonia C. Youhanna1, David F. Soria-Hernanz2,7, Begoña Martínez-Cruz2, Bouchra Douaihy1, Michella Ghassibe-Sabbagh1, Hoshang Rafatpanah5, Mohsen Ghanbari5, John Whale4, Oleg Balanovsky6, R. Spencer Wells7, David Comas2, Chris Tyler-Smith8, Pierre A. Zalloua1,9*, The Genographic Consortium”
1 The Lebanese American University, Chouran, Beirut, Lebanon.
2 Evolutionary Biology Institute, Pompeu Fabra University, Barcelona, Spain.
3 Bioinformatics and Pattern Discovery, IBM T.J. Watson Research Centre, Yorktown Heights, New York, United States of America.
4 Biological Sciences, School of Biological Sciences, University of Portsmouth, Portsmouth, United Kingdom.
5 Mashhad University of Medical Sciences, Mashhad, Iran.
6 Research Centre for Medical Genetics, Russian Academy of Medical Sciences, Moscow, Russia.
7 The Genographic Project, National Geographic Society, Washington D.C., United States of America.
8 Wellcome Trust Genome Campus, The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.
9 Harvard School of Public Health, Harvard University, Boston, Massachusetts, United States of America.
Abstract
Afghanistan has held a strategic position throughout history. It has been inhabited since the Paleolithic and later became a crossroad for expanding civilizations and empires. Afghanistan’s location, history, and diverse ethnic groups present a unique opportunity to explore how nations and ethnic groups emerged, and how major cultural evolutions and technological developments in human history have influenced modern population structures. In this study we have analyzed, for the first time, the four major ethnic groups in present-day Afghanistan: Hazara, Pashtun, Tajik and Uzbek, using 52 binary markers and 19 short tandem repeats on the non-recombinant segment of the Y-Chromosome.
A total of 204 Afghan samples were investigated along with more than 8,500 samples from surrounding populations important to Afghanistan's history through migrations and conquests, including Iranians, Greeks, Indians, Middle Easterners, East Europeans and East Asians. Our results suggest that all current Afghans largely share a heritage derived from a common unstructured ancestral population that could have emerged during the Neolithic revolution and the formation of the first farming communities. Our results also indicate that inter-Afghan differentiation started during the Bronze Age, probably driven by the formation of the first civilizations in the region, increasing inter-population genetic differences and giving the Afghans a unique genetic diversity in Central Asia.

Citation: Haber M, Platt DE, Ashrafian Bonab M, Youhanna SC, Soria-Hernanz DF, et al. (2012) Afghanistan's Ethnic Groups Share a Y-Chromosomal Heritage Structured by Historical Events. PLoS ONE 7(3): e34288. doi:10.1371/journal.pone.0034288
Editor: Manfred Kayser, Erasmus University Medical Center, The Netherlands
Received November 21, 2011; Accepted February 25, 2012; Published March 28, 2012
Copyright: © 2012 Haber et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This study is supported by the Waitt Family Foundation. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: Daniel E. Platt is an employee of IBM. With regard to the Genographic Consortium: Janet Ziegle is employed by Applied Biosystems, and Pandikumar Swamikrishnan, Asif Javed, Laxmi Parida and Ajay K. Royyuru are employed by IBM. There are no patents or products in development or marketed products to declare.
This does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.
* E-mail: pierre.zalloua@lau.edu.lb
Membership of the Genographic Consortium is provided in the Acknowledgments.

Introduction
Afghanistan is a landlocked country at the intersection of Central Asia, South Asia and the Middle East that has held a strategic position throughout history. It was a crossroads of ancient trade routes and human migrations. Before the ascendancy of waterborne trade between Europe and the Far East, the main east-west trade routes passed through its northern and southern plains and through its mountain passes. Paleolithic humans probably inhabited the caves of Afghanistan as long as 50,000 years ago (ya). In northern Afghanistan, flake tools found in Dara Dadil, Darra Chakhmakh and elsewhere indicate the probable existence of Middle Paleolithic industries [1]. Northern Afghanistan also lies in the region where the earliest agricultural communities developed, marked by domestication of the wheat/barley and sheep/goat/cattle complex that led to the Neolithic revolution (10,000-7,000 ya). This economy later supported the early urban Bronze Age civilizations of Central Asia at the Bactria-Margiana Archaeological Complex (4300-3700 ya) and of India in the Indus Valley (5300-3800 ya) [2]. It has been proposed that the decline of these early civilizations was accompanied by, or was the result of, expanding populations from the Eurasian steppe, which reached the Indian subcontinent in the late Harappan period [3]. The second and first millennia BCE were also marked by the influx of Iranian tribes, who later ruled Afghanistan as part of the Achaemenid Empire established by Cyrus the Great (550 BCE) [4]. The military might of the Achaemenids was destroyed by Alexander the Great, who brought Hellenic language and culture to the region.
During the next several centuries, control over Afghanistan was contested among the Seleucids, Bactrians, Parthians and the Indians of the Mauryan dynasty [5]. The first century CE brought a new invasion of Iranian tribes led by the Kushans, who adopted and spread Buddhism. After conquering most of Persia, Arab armies invaded Afghanistan, spreading Islam. Mongol and Turco-Mongol expansions brought turmoil to the region and periods of instability to Silk Road traffic [4], which later declined permanently with the establishment of European maritime trade systems. The present population of Afghanistan contains many diverse elements, the result of large-scale migrations and conquests that influenced its culture and demography. Pashtuns are the largest ethnic group in Afghanistan, accounting for about 42 percent of the population, with Tajiks (27%), Hazaras (9%), Uzbeks (9%), Aimaqs (4%), Turkmen (3%), Baluch (2%) and other groups (4%) making up the remainder [6]. In the present study, eight ethnic groups were examined, with a focus on the largest four:
- The Pashtuns traditionally lived a semi-nomadic lifestyle. They reside mainly in southern and eastern Afghanistan and in western Pakistan, and speak Pashto, a member of the Eastern Iranian languages.
- The Tajiks are a Persian-speaking ethnic group closely related to the Persians of Iran. Afghanistan holds the largest Tajik population outside their homeland of Tajikistan to the north.
- The Hazaras speak Persian with some Mongolian words. They believe they are descendants of the army of Genghis Khan, which invaded during the thirteenth century.
- The Uzbeks are a Turkic-speaking group who have traditionally lived a sedentary farming lifestyle in northern Afghanistan.
Previous theories about the origin of the Afghans are usually based on oral traditions or scanty historical information (Table S1). Few studies have explored the genetic structure of the Afghan people, and those that have were limited either to listing autosomal short tandem repeat (STR) frequencies [7,8] or to Y-chromosome STR analysis of a single ethnic group [9]. In this study, we present an extensive analysis of Y-chromosomal variation in the major ethnic groups of Afghanistan. We provide, for the first time, deep phylogenetic information on Afghan haplogroup membership, and we also analyze 19 Y-chromosomal STRs, allowing fine comparisons among populations. We use this information to explore whether the ethnic groups in Afghanistan reflect different social systems that arose in a common population, or whether cultural differences are founded on pre-existing genetic differences. We also seek to understand the genetic composition of modern Afghans in the context of surrounding populations and other possible source populations, identifying traces of historical movements that influenced the different ethnic groups, and exploring how the establishment of the first civilizations in the region affected present Afghan genetic diversity.

Materials and Methods
Ethics Statement
All participants recruited and genotyped in the present study had at least three generations of paternal ancestry in their country of birth, and provided details of their geographical origin and written consent for this study, which was approved by the IRB of the Lebanese American University.
Subjects and Comparative Datasets
The modern populations selected for this study were those from regions of ancient historical importance to Afghanistan through conquest or migration, including Iranians, Greeks and Indians, in addition to populations with more recent impacts, such as the Arab expansion in the 7th century and the East Asian invasions in the 13th and 14th centuries. We also included populations from the Pontic-Caspian steppe region, West Russia and East Europe, which were possibly involved in the Indo-European migrations that reached the Iranian plateau and northern India. A total of 8,706 samples were used in the analyses, including 204 newly genotyped samples from Afghanistan. The genotyping results and the subjects' paternal province and city or village of origin, when available, are listed in Table S2. The datasets used include Middle Easterners (2,720 samples) [10,11,12,13,14], Central/South Asians (1,335 samples) [15,16,17,18], East Asians (1,029 samples) [15,19], Caucasians (1,525 samples) [20], West Russians (545 samples) [21], Europeans (1,123 samples) [21,22,23,24,25] and Africans (222 samples) [26,27]. More details on the analyzed samples are listed in Table S3.

Genotyping
DNA was extracted from blood or buccal swabs using a standard phenol-chloroform protocol. Samples were genotyped on the Applied Biosystems 7900HT Fast Real-Time PCR System with a set of 52 highly informative, custom Y-chromosomal binary marker assays (Applied Biosystems, Foster City, CA) from the non-recombining portion of the Y chromosome, which define 32 different haplogroups. A total of 19 Y-chromosome STR loci were analyzed for each sample in two multiplexes on an Applied Biosystems 3130xl Genetic Analyzer. The first multiplex contained the standard 17 loci of the Yfiler™ PCR Amplification Kit (Applied Biosystems, Foster City, CA). The remaining two loci, DYS388 and DYS426, were genotyped in a custom multiplex.
STR alleles were named according to previous recommendations [28].

Statistical Analyses
Haplogroup Frequencies and Principal Component Analysis. Fisher's exact tests were performed on haplogroups versus populations to identify which haplogroups were significantly over- or under-represented in Afghanistan's ethnic groups. A principal component analysis (PCA) [29] was performed on relative haplogroup frequencies normalized within populations, centered, and without variance normalization. Because haplogroup resolution was not uniform across studies, the haplogroups were reduced to the most informative derived markers shared across studies.
Genetic Distances, Multidimensional Scaling and Barrier Analysis. Non-metric multidimensional scaling (MDS) [30] was performed using WST distances between populations computed by ARLEQUIN [31] on Y-STR loci DYS19, DYS389I, DYS389b, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635 and GATA H4. Monmonier's maximum difference algorithm [32] was implemented using Barrier [33]. The algorithm enables interpretation of microevolutionary processes in a geographic context by identifying genetic barriers that can be visualized on a map.
AMOVA. The significance of the population structures identified by Barrier was tested using AMOVA [34], implemented in ARLEQUIN [31]. We also tested whether geography or the Barrier structures better explained the present distribution of diversity. AMOVA uses a nested analysis of variance to partition genetic variation among groups of populations, among populations within groups, and within populations. First, populations were grouped according to their geographic location as follows:
1- Afghanistan: Pashtun, Tajik, Uzbek, Hazara.
2- East Europe: Belarus, West Russia.
3- Caucasus: Avar, Darginian, Lezgi, Abkhazian, Circassian.
4- Middle East and Europe: Greece, Turkey, Lebanon, Syria.
5- Iran: East Azerbaijan, Markazi, Mazandaran, Qazvin, Sistan and Baluchistan.
6- India: North, West, South.
Second, populations were grouped according to the identified barriers:
1- Pashtun, Tajik, North India, West India.
2- Hazara, Uzbek.
3- Caucasus: Avar, Darginian, Lezgi.
4- Caucasus: Circassian, Abkhazian.
5- Iran: East Azerbaijan, Markazi, Mazandaran, Qazvin, Sistan and Baluchistan.
6- Belarus, West Russia.
7- Middle East and Europe: Greece, Turkey, Lebanon, Syria.
Reduced Median Networks. Reduced median (RM) networks [35] of STR haplotypes within C-M130, R1a1a-M17, E1b1b1-M35 and B-M60 were calculated using a reduction threshold of 1, with no STR weighting.
BATWING. We applied BATWING [36] to compute candidate population splits in the modal tree among regional populations within and around Afghanistan, exploring multiple combinations of populations, in order to test whether the population separations identified by BARRIER also showed older splits. The Metropolis-Hastings algorithm tends to favor trees with larger likelihoods, under the assumption that all populations originally emerged from one population with no gene flow after each splitting event. This provides a very specific view of the genetic relationships among the populations, which can be compared and contrasted with other methods such as MDS or BARRIER [33]. The STRs used were those listed for the MDS above. The mutation rate priors were those proposed by Xue et al. [19], based on the rate estimates of Zhivotovsky et al. [37]. Mutation rates that appear to accumulate over multiple generations (an "evolutionary rate") differ from those that accumulate from generation to generation (a "genealogical rate") [38], a discrepancy that appears to remain unresolved. Nevertheless, the topology of the population splits that BATWING predicts is unaffected by the choice of rate, and the relative periods of isolation change only proportionately. Therefore, the population split trees still serve for comparison with BARRIER and other methods regardless of the mutation rate.
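The argument that the choice of mutation rate rescales every split time by the same factor, leaving the topology and the relative periods of isolation intact, can be illustrated with a much simpler estimator than BATWING. The following is a minimal sketch, not BATWING's coalescent MCMC: it assumes the stepwise mutation model and uses Goldstein et al.'s (δμ)² distance, whose expected value is roughly 2μt for two populations separated for t generations. The function names, the haplotypes and the two mutation rates are hypothetical, chosen only for illustration.

```python
def delta_mu_squared(pop_a, pop_b):
    """Goldstein et al.'s (delta mu)^2 between two populations.

    Each population is a list of haplotypes; each haplotype is a tuple of
    STR repeat counts, one entry per locus. The statistic is the squared
    difference in mean repeat number, averaged over loci.
    """
    n_loci = len(pop_a[0])
    total = 0.0
    for locus in range(n_loci):
        mean_a = sum(h[locus] for h in pop_a) / len(pop_a)
        mean_b = sum(h[locus] for h in pop_b) / len(pop_b)
        total += (mean_a - mean_b) ** 2
    return total / n_loci


def split_time_generations(dmu2, mu):
    """Under the stepwise mutation model E[(delta mu)^2] is roughly 2 * mu * t,
    so the split time t (in generations) is estimated as dmu2 / (2 * mu)."""
    return dmu2 / (2.0 * mu)


# Hypothetical two-locus haplotypes (repeat counts); not data from this study.
pop_a = [(14, 13), (15, 13)]
pop_b = [(16, 12), (16, 14)]
pop_c = [(14, 13), (14, 13)]

# Two hypothetical per-locus, per-generation mutation rates, a factor of 3 apart.
for mu in (0.001, 0.003):
    t_ab = split_time_generations(delta_mu_squared(pop_a, pop_b), mu)
    t_ac = split_time_generations(delta_mu_squared(pop_a, pop_c), mu)
    print(f"mu={mu}: t_ab={t_ab:.1f}, t_ac={t_ac:.1f}, ratio={t_ab / t_ac:.2f}")
```

Tripling μ divides both split times by three, but their ratio, and hence the ordering of the splits, is the same under both rates. This is the sense in which the relative periods of isolation are insensitive to the rate choice, while the absolute dates are not.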
Effective population sizes tend to scale inversely with the rates, with a slight effect from the effective population size prior. Use of the Zhivotovsky rates in earlier publications allows comparison with other studies that applied the same rates. The data were partitioned into multiple runs (Table S6). Independent computation of multiple trees with different subsets and groupings of populations should produce similar population splits and ages of population divisions across configurations. One caveat is that including other populations may lend more support to different candidate modal trees. Comparisons among multiple runs therefore provide a consistency check for convergence and stability: each run must agree with the others at the points of their shared topologies. Given agreement between BATWING runs, a composite tree comprising these runs, connected through shared branches, can be constructed. The Indian population structures resulted in slower equilibration than was seen among the other populations. After equilibration, the Indian populations showed older splits among themselves than the split between India as a whole and the other populations when India is pooled. This older split may have resulted partly from differences in weights among the candidate trees that the Metropolis-Hastings algorithm samples, based on likelihood ratios derived from population configurations that lead to different modal trees with different split times. Alternatively, the older split may have resulted from violations of the assumption of isolation after population splitting. These complications led us to run BATWING on the Indian populations separately from the western populations.

Results
Genotyping revealed 32 haplogroups among our samples from Afghanistan's ethnic groups.
Haplogroups R1a1a-M17, C3-M217, J2-M172 and L-M20 were the most frequent when Afghan ethnic groups were pooled, together comprising >66% of the chromosomes. Absolute and relative haplogroup frequencies are tabulated in Table S4. Haplogroup frequencies differed markedly across the major ethnic groups. In particular, the frequencies of haplogroup C3-M217, which is mainly found in East Asia, and haplogroup R1a1a-M17, which is found across Eurasia, varied substantially among the Afghan groups. C3-M217 was significantly more frequent (p = 4.55 × 10^-9) in Uzbeks (41.18%) and Hazaras (33.33%) than in Tajiks (3.57%) and Pashtuns (2.04%). Conversely, R1a1a-M17 was significantly more frequent (p = 3.00 × 10^-6) in Pashtuns (51.02%) and Tajiks (30.36%) than in Uzbeks (17.65%) and Hazaras (6.67%). RM networks of C3-M217 (Figure S1A) and R1a1a-M17 (Figure S1B) show that when a haplogroup was infrequent in an ethnic group, its haplotypes lay on branches not shared with other Afghans. This suggests that the underrepresented haplogroups are not the result of gene flow between the ethnic groups, but probably of direct assimilation from source populations. Haplogroups autochthonous to India [15] (L-M20, H-M69 and R2a-M124) were more frequent (p = 0.004) in Pashtuns (20.41%) and Tajiks (19.64%) than in Uzbeks (5.88%) and Hazaras (5%). E1b1b1-M35 was found in Hazaras (5%) and Uzbeks (5.88%) but not in Pashtuns or Tajiks. The RM network of E1b1b1-M35 (Figure S1C) shows that Afghanistan's lineages are related to those of Middle Easterners and Iranians. We also note the presence of the African haplogroup B-M60 only in Hazaras, with a relatively recent common founder from East Africa, as shown in the RM network (Figure S1D). PCA of haplogroup frequencies (Figure 1) also shows differences among Afghans. Although the worldwide populations mostly cluster according to geography, the Afghan groups appear to show more affinity to non-Afghans than to each other.
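The haplogroup comparisons reported above rest on Fisher's exact test. A minimal two-sided version for a 2 × 2 table can be written with the Python standard library alone: enumerate every table with the observed row and column totals, weight each by its hypergeometric probability, and sum the probabilities of those tables that are no more likely than the observed one. The helper names below are our own. The example counts are an assumption, back-calculated from the percentages above by supposing 17 Uzbeks, 60 Hazaras, 49 Pashtuns and 56 Tajiks (for instance, 7 of 17 is 41.18%) and pooling Uzbeks with Hazaras and Pashtuns with Tajiks; the text does not state the exact table used, so this is a sketch rather than a reproduction of the reported p-value.

```python
from math import comb


def hypergeom_pmf(k, row1, col1, n):
    """Probability that the top-left cell equals k, given fixed margins."""
    return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of every table with the same row and column
    totals whose probability does not exceed that of the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    p_value = 0.0
    for k in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p_k = hypergeom_pmf(k, row1, col1, n)
        # A small relative tolerance treats floating-point near-ties as ties.
        if p_k <= p_obs * (1 + 1e-7):
            p_value += p_k
    return min(p_value, 1.0)


# Rows: Uzbek + Hazara (n = 77), Tajik + Pashtun (n = 105).
# Columns: carries C3-M217, does not. Counts back-calculated (an assumption).
c3_table = (7 + 20, 10 + 40, 2 + 1, 54 + 48)
print(f"C3-M217: p = {fisher_exact_two_sided(*c3_table):.3g}")
```

With these assumed counts the p-value falls far below 0.001, in line with the significance reported for C3-M217. For larger contingency tables (haplogroups by more than two populations) a network or Monte Carlo version of the exact test would be needed instead of this 2 × 2 enumeration.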
Pashtuns and Hazaras in Afghanistan and Pakistan show affinity to members of the same ethnic group across the border. The Afghan Tajiks are equidistant from Central Asia and from Iran/Caucasus/West Russia. The Afghan Hazara, Afghan Uzbek and Pakistani Hazara lie between the East Asian cluster and the Middle East/Europe-Caucasus/West Russia cluster. More detail on the structure of the Afghan population appears in the MDS plot of WST distances (Figure 2B), which shows that the Afghan Pashtun and Tajik are closer to North and West Indians than to the other Afghan groups, the Hazara and Uzbek. This cluster also lies between East Europeans and Iranians, closer to the Iranians and especially to East Azerbaijan. Furthermore, Barrier (Figure 2A) shows that Barrier IV splits the Afghan populations, separating the Hazara and Uzbek from the Pashtun, Tajik and Indian populations. This creates groups of populations with less variation within groups (2.30%, p<0.001) and more variation among groups (10.48%, p<0.001) than when populations are grouped by region or country (within groups = 4.95%, p<0.001; among groups = 7.16%, p<0.001) (Table S5).

Figure 1. PCA derived from Y-chromosomal haplogroup frequencies. The two leading principal components display the variance. The superimposed biplot shows the contribution of each haplogroup as grey component loading vectors. doi:10.1371/journal.pone.0034288.g001

To explore the time depth at which the above structures emerged, we employed BATWING to create hypotheses on historical population splitting and coalescent events, reflecting the dominant ancestral genetic structures identified in BATWING's modal trees from which the current populations emerged (Table S6). The BATWING results showed that most of the regional splits occurred around 10 kya (95% CI 7,100-15,825) (Figure 3). These splits coincide with the expansions following the Last Glacial Maximum (LGM) that led to the Neolithic agricultural revolution.
During this period, Afghans, Iranians, Indians and East Europeans most likely emerged as distinct, unstructured populations. BATWING showed another, later wave of splits that may have created the inter-population structures. In Afghans, this second wave started 4.7 kya (95% CI 2,775-7,725), marking the start of civilization building and displacements, and these splits appear to have continued to nearly modern times. The BATWING results generally corroborated the geographical splits identified by BARRIER.

Figure 2. Population genetic structure vs geography. Genetic barriers (A) and MDS plot (B) based on the WST distances between populations derived from Y-STR data. doi:10.1371/journal.pone.0034288.g002

Discussion
This study describes for the first time the Y-chromosome diversity of the main ethnic groups in Afghanistan. We have explored the genetic composition of modern Afghans and correlated their genetic diversity with well-established historical events and movements of neighbouring populations. The data strongly show that continuous migrations and movements through Central Asia since at least the Holocene have created population structures that are today highly correlated with ethnicity in Afghanistan. A previous study of Pakistan [39], which included ethnic groups also present in Afghanistan (Baluch, Hazara, Pashtun), showed that Y-chromosome variation there was structured by geography rather than by ethnic affiliation. With the exception of the Hazara, all ethnic groups in Pakistan were shown to have similar Y-chromosome diversity, clustering with South Asians and lying close to Middle Eastern males. A Y-chromosome study [40] of populations from Turkmenistan, Uzbekistan, Kazakhstan, Kyrgyzstan and Tajikistan found greater diversity among populations that share the same ethnic group than among the ethnic groups themselves.
These observations support a common genetic ancestry for these populations irrespective of ethnicity. We have also found substantial differences among the various ethnic groups of Afghanistan. Intra-ethnic comparisons, however, could not be tested in this study, since information on tribe and clan affiliation was not available. The high genetic diversity observed among Afghanistan's groups has also been observed in other populations of Central Asia [41,42,43,44,45]. It is possibly due to the strategic location of this region and its harsh geography of mountains, deserts and steppes, which could have facilitated the establishment of social organizations within expanding populations and helped maintain genetic boundaries among groups that developed over time into distinct ethnicities.

Figure 3. Composite BATWING population splitting. The composite tree is constructed from data sets described in the text, based on the results displayed in Table S6, with a pruned leading topology and averaged times. Numbers indicate branch lengths measured in thousands of years. doi:10.1371/journal.pone.0034288.g003

The RM networks of the major common haplogroups show that the flow of paternal lineages among the various ethnic groups is very limited, consistent with the high level of endogamy practiced by these groups. Similar Y-chromosome results have been reported previously among Central Asian ethnic groups [40], but with less pronounced genetic differentiation in maternal lineages [40], most likely the result of endogamous practices that tolerated the assimilation of foreign females. The prevailing Y-chromosome lineage in Pashtuns and Tajiks (R1a1a-M17) has the highest observed diversity among populations of the Indus Valley [46]. R1a1a-M17 diversity declines toward the Pontic-Caspian steppe, where the mid-Holocene R1a1a7-M458 sublineage is dominant [46].
R1a1a7-M458 was absent in Afghanistan, suggesting that R1a1a-M17 does not support, as previously thought [47], expansions from the Pontic steppe [3] that brought the Indo-European languages to Central Asia and India. MDS and Barrier analysis identified a significant affinity between the Pashtun, Tajik, North Indian and West Indian populations, creating an Afghan-Indian population structure that excludes the Hazaras, Uzbeks and the South Indian Dravidian speakers. In addition, gene flow from India to Afghanistan, marked by the Indian lineages L-M20, H-M69 and R2a-M124, seems to involve mostly Pashtuns and Tajiks. This genetic affinity and gene flow suggest interactions that could have existed since at least the establishment of the region's first civilizations in the Indus Valley and at the Bactria-Margiana Archaeological Complex. Furthermore, BATWING results indicate that the Afghan populations split from Iranians, Indians and East Europeans at about 10.6 kya (95% CI 7,100-15,825), which marks the start of the Neolithic revolution and the establishment of farming communities. In addition, the Pashtuns split first from the rest of the Afghans around 4.7 kya (95% CI 2,775-7,725), a date marked by the rise of the region's Bronze Age civilizations. These dates suggest that the differentiation of social systems in Afghanistan could have been driven by the emergence of the first urban civilizations. However, the dates suggested by BATWING should be treated with care, since BATWING does not model gene flow or the differential assimilation of incoming migrations, which could alter the split times. Nevertheless, it was previously shown that the topologies and split times of BATWING's modal trees are insensitive to in-migration [13], which leaves BATWING timings robust to in-migrations and invasions that might otherwise be expected to reduce the split times [13].
On the other hand, the split times of BATWING's modal trees are very susceptible to subsequent migration between those populations. This means that the two major waves of splitting could have occurred earlier. However, since the RM networks of the major haplogroups show limited gene flow between the ethnic groups, and since the population structure suggested by MDS and Barrier links Pashtuns and Tajiks with populations from the historically connected [2] Bronze Age sites, the BATWING-suggested splits in Afghan populations at 4.7 kya (95% CI 2,775-7,725) are very probable. A previous study in Central Asia by Heyer et al. [40] also estimated dates for the emergence of ethnic groups that are significantly older than those known historically. These older dates suggest that the ethnic groups could have resulted from a fusion of different populations [40], or that ethnicities were established from already structured populations. BATWING's hypotheses model mutations and coalescent events, reflecting the ancestral structures from which the current populations emerged. Later expansions into the region would have assimilated the ancestral population, giving the Afghans genetics distinct from those of the expanding source populations, even though they share general genetic features. This is evident in the Afghan Hazara and Afghan Uzbek, who have long been associated with the expanding Mongols and Turco-Mongols. Although we found that at least a third to a half of their chromosomes are of East Asian origin, PCA places them between the East Asian and the Caucasus/Middle East/Europe clusters. Historical expansions and invasions appear to have contributed differentially to shaping Afghanistan's population structures. We found limited genetic evidence of expansions previously thought to have left specific imprints in current populations.
The E1b1b1-M35 lineages in some Pakistani Pashtuns were previously traced to a Greek origin, brought by Alexander's invasions [48]. However, the RM network of E1b1b1-M35 shows that Afghanistan's lineages are related to those of Middle Easterners and Iranians but not to those of populations from the Balkans. The Islamic invasion of the 7th century CE left an immense cultural impact on the region, with reports of Arabs settling in Afghanistan and mixing with the local population [49]. However, the genetic signal of this expansion is not clearly evident: some Middle Eastern lineages such as E1b1b1-M35 are present in Afghanistan, but the most prevalent lineage among Arabs (J1-M267) was found in only one Afghan subject. In addition, the three Afghans who identified their ethnicity as Arab carried lineages autochthonous to India. We also note that three Hazara subjects belonged to haplogroup B-M60, which is very rare outside Africa. The RM network shows that these subjects had a recent founding ancestor from East Africa, who could have been brought to Afghanistan through the slave trade. This shows that genetic ethnic boundaries have been selectively permeable; however, the historical rules of assimilation in this region are not yet clearly understood.

Table S1 Suggested origins of the main ethnic groups in Afghanistan. (DOC)
Table S2 Y-chromosome haplogroups and haplotypes in 204 unrelated individuals from Afghanistan. (XLS)
Table S3 Populations selected for this study. (XLS)
Table S4 Y-chromosome haplogroup frequencies in Afghanistan's ethnic groups. (XLS)
Table S5 AMOVA results, comparing populations grouped according to their country or region of origin with populations grouped according to Barrier structures. (DOC)
Table S6 BATWING topologies and dates with 95% confidence intervals of population splits derived from multiple combinations of population subsets. (XLS)

Language adoption and spread in Afghanistan also seem to have been a complex process.
The Afghan genetic structure groups the Hazara with the Uzbek, although they belong to two different language families. The Hazara, like the Pashtun and Tajik, speak a language of the Indo-Iranian group of the Indo-European family, whereas the Uzbek language belongs to the Turkic family. The form of Turkic spoken by the Uzbek appears to be a direct descendant of an extinct Turkic language that developed in the 15th century CE [50]. The dominant genetic component shared by the Uzbek and Hazara appears to have split more than 1 ky before this date. It is therefore possible that language differences in Afghanistan reflect a more recent cultural shift. In conclusion, Y-chromosome diversity in Afghanistan reveals major differences among its ethnic groups. However, we found that all Afghans largely share a heritage from a common ancestral population that emerged during the Neolithic revolution and remained unstructured until 4.7 kya (95% CI 2,775-7,725). The first genetic structures between the different social systems emerged during the Bronze Age, accompanied or driven by the formation of the first civilizations in the region. Later migrations and invasions into the region were differentially assimilated by the ethnic groups, increasing inter-population genetic differences and giving the Afghans a unique genetic diversity in Central Asia.

Supporting Information
Figure S1 Reduced median networks. (A) C-M130, (B) R1a1a-M17, (C) E1b1b1-M35 and (D) B-M60, showing STR haplotype distributions among populations; area is proportional to haplotype frequency, and color indicates population. Connecting lines represent putative phylogenetic relationships between haplotypes. (TIF)

Acknowledgements
We thank the sample donors for taking part in this study. We also thank Dr. Christopher Thornton and Mr. Brian Johnsrud for their insightful comments. CTS is supported by The Wellcome Trust. The Genographic Project is supported by funding from the National Geographic Society, IBM and the Waitt Family Foundation.
Members of the Genographic Consortium: Janet S. Ziegle (Applied Biosystems, Foster City, California, United States); Li Jin & Shilin Li (Fudan University, Shanghai, China); Pandikumar Swamikrishnan (IBM, Somers, New York, United States); Asif Javed, Laxmi Parida & Ajay K. Royyuru (IBM, Yorktown Heights, New York, United States); Lluis Quintana-Murci (Institut Pasteur, Paris, France); R. John Mitchell (La Trobe University, Melbourne, Victoria, Australia); Syama Adhikarla, ArunKumar GaneshPrasad, Ramasamy Pitchappan & Arun Varatharajan Santhakumari (Madurai Kamaraj University, Madurai, Tamil Nadu, India); Angela Hobbs & Himla Soodyall (National Health Laboratory Service, Johannesburg, South Africa); Elena Balanovska (Research Centre for Medical Genetics, Russian Academy of Medical Sciences, Moscow, Russia); Daniela R. Lacerda & Fabricio R. Santos (Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil); Pedro Paulo Vieira (Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil); Jaume Bertranpetit & Marta Mele (Universitat Pompeu Fabra, Barcelona, Spain); Christina J. Adler, Alan Cooper, Clio S. I. Der Sarkissian & Wolfgang Haak (University of Adelaide, South Australia, Australia); Matthew E. Kaplan & Nirav C. Merchant (University of Arizona, Tucson, Arizona, United States); Colin Renfrew (University of Cambridge, Cambridge, United Kingdom); Andrew C. Clarke & Elizabeth A. Matisoo-Smith (University of Otago, Dunedin, New Zealand); Matthew C. Dulik, Jill B. Gaieski, Amanda C. Owings, Theodore G. Schurr & Miguel G. Vilar (University of Pennsylvania, Philadelphia, Pennsylvania, United States).

Author Contributions

Conceived and designed the experiments: MH DP PZ. Performed the experiments: MH SY BD MGS. Analyzed the data: MH DP MAB DSH BMC. Contributed reagents/materials/analysis tools: MAB HR MG OB JW. Wrote the paper: MH PZ. Revised the manuscript: RSW DC CTS.

References

1. Dupree L (1964) Prehistoric Archeological Surveys and Excavations in Afghanistan: 1959-1960 and 1961-1963. Science 146: 638-640.
2. Dupree L (1980) Afghanistan. Princeton, NJ: Princeton University Press. 778 p.
3. Gimbutas M (1970) Proto-Indo-European Culture: The Kurgan Culture during the Fifth, Fourth, and Third Millennia B.C. In: Cardona G, Hoenigswald M, Senn A, eds. Indo-European and Indo-Europeans: Papers Presented at the Third Indo-European Conference at the University of Pennsylvania. Philadelphia, PA: University of Pennsylvania Press. pp 155-197.
4. Wilber D (1962) Afghanistan: Its people, its society, its culture. New Haven, CT: Hraf Press.
5. Elizabeth E, Sarkhosh CV (2007) From Persepolis to the Punjab: exploring ancient Iran, Afghanistan and Pakistan. London: British Museum Press.
6. Library of Congress. Federal Research Division (2001) Afghanistan: a country study. Baton Rouge, LA: Claitor's Pub. Division. xlv, 226 p.
7. Berti A, Barni F, Virgili A, Iacovacci G, Franchi C, et al. (2005) Autosomal STR frequencies in Afghanistan population. J Forensic Sci 50: 1494-1496.
8. Di Cristofaro J, Buhler S, Temori SA, Chiaroni J (2012) Genetic data of 15 STR loci in five populations from Afghanistan. Forensic Sci Int Genet 6(1): e44-45.
9. Lacau H, Bukhari A, Gayden T, La Salvia J, Regueiro M, et al. (2011) Y-STR profiling in two Afghanistan populations. Leg Med (Tokyo) 13: 103-108.
10. Alakoc YD, Gokcumen O, Tug A, Gultekin T, Gulec E, et al. (2010) Y-chromosome and autosomal STR diversity in four proximate settlements in Central Anatolia. Forensic Sci Int Genet 4: e135-137.
11. Cinnioglu C, King R, Kivisild T, Kalfoglu E, Atasoy S, et al. (2004) Excavating Y-chromosome haplotype strata in Anatolia. Hum Genet 114: 127-148.
12. El-Sibai M, Platt DE, Haber M, Xue Y, Youhanna SC, et al. (2009) Geographical structure of the Y-chromosomal genetic landscape of the Levant: a coastal-inland contrast. Ann Hum Genet 73: 568-581.
13. Haber M, Platt DE, Badro DA, Xue Y, El-Sibai M, et al. (2011) Influences of history, geography, and religion on genetic structure: the Maronites in Lebanon. Eur J Hum Genet 19: 334-340.
14. Zalloua PA, Platt DE, El Sibai M, Khalife J, Makhoul N, et al. (2008) Identifying genetic traces of historical expansions: Phoenician footprints in the Mediterranean. Am J Hum Genet 83: 633-642.
15. Sengupta S, Zhivotovsky LA, King R, Mehdi SQ, Edmonds CA, et al. (2006) Polarity and temporality of high-resolution y-chromosome distributions in India identify both indigenous and exogenous expansions and reveal minor genetic influences of Central Asian pastoralists. Am J Hum Genet 78: 202-221.
16. Yadav B, Raina A, Dogra TD (2011) Haplotype diversity of 17 Y-chromosomal STRs in Saraswat Brahmin Community of North India. Forensic Sci Int Genet 5: e63-70.
17. Balamurugan K, Suhasini G, Vijaya M, Kanthimathi S, Mullins N, et al. (2010) Y chromosome STR allelic and haplotype diversity in five ethnic Tamil populations from Tamil Nadu, India. Leg Med (Tokyo) 12: 265-269.
18. Thangaraj K, Naidu BP, Crivellaro F, Tamang R, Upadhyay S, et al. (2010) The influence of natural barriers in shaping the genetic structure of Maharashtra populations. PLoS One 5: e15283.
19. Xue Y, Zerjal T, Bao W, Zhu S, Shu Q, et al. (2006) Male demography in East Asia: a north-south contrast in human population expansion times. Genetics 172: 2431-2439.
20. Balanovsky O, Dibirova K, Dybo A, Mudrak O, Frolova S, et al. (2011) Parallel Evolution of Genes and Languages in the Caucasus Region. Mol Biol Evol doi: 10.1093/molbev/msr126.
21. Roewer L, Willuweit S, Kruger C, Nagy M, Rychkov S, et al. (2008) Analysis of Y chromosome STR haplotypes in the European part of Russia reveals high diversities but non-significant genetic distances between populations. Int J Legal Med 122: 219-223.
22. Bosch E, Calafell F, Gonzalez-Neira A, Flaiz C, Mateu E, et al. (2006) Paternal and maternal lineages in the Balkans show a homogeneous landscape over linguistic barriers, except for the isolated Aromuns. Ann Hum Genet 70: 459-487.
23. Rebala K, Tsybovsky IS, Bogacheva AV, Kotova SA, Mikulich AI, et al. (2011) Forensic analysis of polymorphism and regional stratification of Y-chromosomal microsatellites in Belarus. Forensic Sci Int Genet 5: e17-20.
24. Volgyi A, Zalan A, Szvetnik E, Pamjav H (2009) Hungarian population data for 11 Y-STR and 49 Y-SNP markers. Forensic Sci Int Genet 3: e27-28.
25. Kovatsi L, Saunier JL, Irwin JA (2009) Population genetics of Y-chromosome STRs in a population of Northern Greeks. Forensic Sci Int Genet 4: e21-22.
26. Batini C, Ferri G, Destro-Bisol G, Brisighelli F, Luiselli D, et al. (2011) Signatures of the pre-agricultural peopling processes in sub-Saharan Africa as revealed by the phylogeography of early Y chromosome lineages. Mol Biol Evol doi: 10.1093/molbev/msr089.
27. Gomes V, Sanchez-Diz P, Amorim A, Carracedo A, Gusmao L (2010) Digging deeper into East African human Y chromosome lineages. Hum Genet 127: 603-613.
28. Gusmao L, Butler JM, Carracedo A, Gill P, Kayser M, et al. (2006) DNA Commission of the International Society of Forensic Genetics (ISFG): an update of the recommendations on the use of Y-STRs in forensic analysis. Forensic Sci Int 157: 187-197.
29. Jolliffe I (1986) Principal Components Analysis, Second Edition. New York, NY: Springer.
30. Kruskal JB (1964) Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29: 1-27.
31. Excoffier L, Laval G, Schneider S (2005) Arlequin (version 3.0): An integrated software package for population genetics data analysis. Evol Bioinform Online 1: 47-50.
32. Monmonier M (1973) Maximum-difference barriers: An alternative numerical regionalization method. Geographical Analysis. pp 245-261.
33. Manni F, Guerard E, Heyer E (2004) Geographic patterns of (genetic, morphologic, linguistic) variation: How barriers can be detected by using Monmonier's algorithm. Human Biology 76: 173-190.
34. Excoffier L, Smouse PE, Quattro JM (1992) Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 131: 479-491.
35. Bandelt HJ, Forster P, Sykes BC, Richards MB (1995) Mitochondrial portraits of human populations using median networks. Genetics 141: 743-753.
36. Wilson IJ, Weale ME, Balding DJ (2003) Inferences from DNA data: population histories, evolutionary processes and forensic match probabilities. Journal of the Royal Statistical Society A 166, part 2: 155-201.
37. Zhivotovsky LA, Underhill PA, Cinnioglu C, Kayser M, Morar B, et al. (2004) The effective mutation rate at Y chromosome short tandem repeats, with application to human population-divergence time. Am J Hum Genet 74: 50-61.
38. Zhivotovsky LA, Underhill PA, Feldman MW (2006) Difference between evolutionarily effective and germ line mutation rate due to stochastically varying haplogroup size. Mol Biol Evol 23: 2268-2270.
39. Qamar R, Ayub Q, Mohyuddin A, Helgason A, Mazhar K, et al. (2002) Y-chromosomal DNA variation in Pakistan. Am J Hum Genet 70: 1107-1124.
40. Heyer E, Balaresque P, Jobling MA, Quintana-Murci L, Chaix R, et al. (2009) Genetic diversity and the emergence of ethnic groups in Central Asia. BMC Genet 10: 49.
41. Zerjal T, Wells RS, Yuldasheva N, Ruzibakiev R, Tyler-Smith C (2002) A genetic landscape reshaped by recent events: Y-chromosomal insights into central asia. Am J Hum Genet 71: 466-482.
42. Wells RS, Yuldasheva N, Ruzibakiev R, Underhill PA, Evseeva I, et al. (2001) The Eurasian heartland: a continental perspective on Y-chromosome diversity. Proc Natl Acad Sci USA 98: 10244-10249.
43. Chaix R, Austerlitz F, Khegay T, Jacquesson S, Hammer MF, et al. (2004) The genetic or mythical ancestry of descent groups: lessons from the Y chromosome. Am J Hum Genet 75: 1113-1116.
44. Perez-Lezaun A, Calafell F, Comas D, Mateu E, Bosch E, et al. (1999) Sex-specific migration patterns in Central Asian populations, revealed by analysis of Y-chromosome short tandem repeats and mtDNA. Am J Hum Genet 65: 208-219.
45. Martinez-Cruz B, Vitalis R, Segurel L, Austerlitz F, Georges M, et al. (2011) In the heartland of Eurasia: the multilocus genetic landscape of Central Asian populations. Eur J Hum Genet 19: 216-223.
46. Underhill PA, Myres NM, Rootsi S, Metspalu M, Zhivotovsky LA, et al. (2010) Separating the post-Glacial coancestry of European and Asian Y chromosomes within haplogroup R1a. Eur J Hum Genet 18: 479-484.
47. Semino O, Passarino G, Oefner PJ, Lin AA, Arbuzova S, et al. (2000) The genetic legacy of Paleolithic Homo sapiens sapiens in extant Europeans: a Y chromosome perspective. Science 290: 1155-1159.
48. Firasat S, Khaliq S, Mohyuddin A, Papaioannou M, Tyler-Smith C, et al. (2007) Y-chromosomal evidence for a limited Greek contribution to the Pathan population of Pakistan. Eur J Hum Genet 15: 121-126.
49. Emadi H (2005) Culture and customs of Afghanistan. Santa Barbara, CA: Greenwood. 284 p.
50. Johanson L (1998) A History of Turkic. In: Johanson L, Csato E, eds. The Turkic Languages. London: Routledge.

8. Conclusion

The work undertaken in this study aimed to investigate the mtDNA haplogroup composition and distribution among four ethnic groups of Afghanistan, and to determine whether this composition has been influenced by the demographic processes imposed upon Afghanistan throughout its history. The Afghani population has been omitted from the numerous population genetics studies of the past thirty years, largely because of the near-constant political and civil instability during this period. The results show that the majority of the Afghani mtDNA types identified belong to West Eurasian lineages (64.4%).
When the individual ethnic groups are analysed, the Hazara exhibit a large East Asian mtDNA contribution, at least 2½ times greater than that of any other ethnic group. This pattern is mirrored in the Y-chromosome lineages (Haber et al., 2012): the East Asian lineage C3-M217, inferred to derive from Genghis Khan (McElreavey and Quintana-Murci, 2005), is present in 33.33% of the Hazara, but in <4% of the Tajiks and Pashtuns. The presence and frequency of East Asian lineages in both the mtDNA and the Y-chromosome indicate extensive assimilation with East Asian tribes among the Hazaras, suggesting that their origins may lie in a source population in eastern Asia. Additionally, the divergence and isolation of the Hazaras from the rest of Afghanistan's ethnic groups may be attributed to a combination of their assimilation with Mongol or East Asian peoples, their practice of Shia Islam (whereas the other ethnic groups practise Sunni Islam) and their residence among the harsh topography of the Hindu Kush mountain range.

West Eurasian haplogroups make up the majority of the lineages observed in the Tajiks, Baluch and Pashtuns (>64%) and 40% of those in the Hazara. The common West Eurasian haplogroups HV and H are found in every ethnic group, ranging from 52.6% in the Tajik population to 12.5% in the Hazaras. Likewise, the common West Eurasian Y-haplogroup R1a1a is least frequent among the Hazaras and most frequent in the Pashtun population. This supports the traditional belief among the Baluch, Pashtuns and Tajiks in a West Eurasian origin, and may indicate a common ancestry of these groups, which have since diverged into distinct ethnic groups over subsequent generations.
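Comparisons such as the West Eurasian versus East Asian lineage fractions per ethnic group, or the ratio between two groups' East Asian contributions, come down to tallying haplogroup assignments by population. The Python sketch below illustrates that tally. It assumes a hypothetical, heavily simplified prefix-to-region table and uses placeholder samples; neither comes from this study, and real analyses assign regions at sub-clade level.

```python
from collections import Counter

# Hypothetical, deliberately simplified mapping from mtDNA haplogroup name
# prefixes to broad geographic lineage groups. Real studies assign regions at
# sub-clade level (e.g. U2a-c is usually counted as South Asian), so this
# table is illustrative only.
REGION_BY_PREFIX = {
    "HV": "West Eurasian", "H": "West Eurasian", "J": "West Eurasian",
    "T": "West Eurasian", "U": "West Eurasian", "K": "West Eurasian",
    "W": "West Eurasian", "X": "West Eurasian", "I": "West Eurasian",
    "N1": "West Eurasian", "R0": "West Eurasian",
    "A": "East Asian", "B": "East Asian", "C": "East Asian", "D": "East Asian",
    "F": "East Asian", "G": "East Asian", "Y": "East Asian", "Z": "East Asian",
    "M7": "East Asian", "M8": "East Asian",
    "M": "South Asian", "R": "South Asian",
}


def lineage_region(haplogroup: str) -> str:
    """Assign a haplogroup label to a region using the longest matching prefix."""
    for end in range(len(haplogroup), 0, -1):
        region = REGION_BY_PREFIX.get(haplogroup[:end])
        if region is not None:
            return region
    return "Other"


def region_proportions(samples):
    """Return {population: {region: fraction}} for (population, haplogroup) pairs."""
    counts = {}
    for population, haplogroup in samples:
        counts.setdefault(population, Counter())[lineage_region(haplogroup)] += 1
    return {
        population: {region: n / sum(tally.values()) for region, n in tally.items()}
        for population, tally in counts.items()
    }


# Placeholder samples for illustration only; these are NOT the study's data.
samples = [
    ("Pop A", "H1"), ("Pop A", "HV0"), ("Pop A", "J1"), ("Pop A", "D4"),
    ("Pop B", "C4"), ("Pop B", "D4"), ("Pop B", "H2"), ("Pop B", "M7b"),
]
props = region_proportions(samples)
ratio = props["Pop B"]["East Asian"] / props["Pop A"]["East Asian"]
print(props)
print(f"East Asian ratio, Pop B vs Pop A: {ratio:.1f}")
```

The function uses longest-prefix matching so that a label such as M7b is counted as East Asian instead of falling through to the generic M entry. In the same way, HV0 matches HV rather than H.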
The HVS-I sequence data show that the Afghani ethnic groups (excluding the Baluch) carry signatures of population expansion: smooth, bell-shaped mismatch distributions and a star-like phylogeny in the Median Joining network. Each ethnic group shows a high level of genetic diversity, reflected in the numerous unique haplotypes observed. A genetic barrier has also been identified separating the Iranian Baluch population of south-eastern Iran from the Afghani Baluch population of south-western Afghanistan. Additionally, Barrier analysis has highlighted that the Afghani populations have a greater affinity with West Eurasian (with the exception of the Iranian Baluch population) and Central Asian populations than with South or East Asian populations. The populations of Afghanistan possess a unique mtDNA structure that has likely been shaped by the combination of the country's extreme terrain and its strategic position in Central Asia, which encouraged numerous migrations, invasions and imperial expansions.

References

Achilli, A., Rengo, C., Magri, C., Battaglia, V., Olivieri, A., Scozzari, R., Cruciani, F., Zeviani, M., Briem, E., Carelli, V., Moral, P., Dugoujon, J-M., Roostalu, U., Loogväli, E-L., Kivisild, T., Bandelt, H-J., Richards, M., Villems, R., Santachiara-Benerecetti, A.S., Semino, O., Torroni, A. (2004) The Molecular Dissection of mtDNA Haplogroup H Confirms That the Franco-Cantabrian Glacial Refuge Was a Major Source for the European Gene Pool. American Journal of Human Genetics 75:910-918.
Achilli, A., Rengo, C., Battaglia, V., Pala, M., Olivieri, A., Fornarino, S., Magri, C., Scozzari, R., Babudi, N., Santachiara-Benerecetti, A.S., Bandelt, H-J., Semino, O., Torroni, A. (2005) Saami and Berbers – An Unexpected Mitochondrial DNA Link. American Journal of Human Genetics 76:883-886.
Achilli, A., Perego, U.A., Bravi, C.M., Coble, M.D., Kong, Q-P., Woodward, S.R., Salas, A., Torroni, A., Bandelt, H-J. (2008) The Phylogeny of the Four Pan-American MtDNA Haplogroups: Implications for Evolutionary and Disease Studies. PLoS One 3:e1764.
Afghanistan Climate, Temperature, Average Weather History, Rainfall/Precipitation, Sunshine (n.d.) Retrieved January 10th, 2011, from http://www.climatetemp.info/afghanistan/
Afghanistan Ethnic Groups Map (2009) Retrieved January 11th, 2011, from http://www.mapsofworld.com/afghanistan/afghanistan-ethnic-groups-map.html
Afghanistan Online: Chronological History of Afghanistan. (2008) Retrieved November 9th, 2010, from http://www.afghan-web.com/history/chron/index.html
Afghans: Their History and Culture. (2002) Retrieved November 15th, 2010, from http://www.cal.org/co/afghan/apeop.html
Alshamali, F., Brandstätter, A., Zimmermann, B., Parson, W. (2008) Mitochondrial DNA control region variation in Dubai, United Arab Emirates. Forensic Science International: Genetics 2:e9-10.
Alvarez-Iglesias, V., Mosquera-Miguel, A., Cerezo, M., Quintans, B., Zarrabeitia, M.T., Cusco, I., Lareu, M.V., Garcia, O., Perez-Jurado, L., Carracedo, A., Salas, A. (2009) New Population and Phylogenetic Features of the Internal Variation within Mitochondrial DNA Haplogroup R0. PLoS ONE 4:e5112.
Al-Zahery, N., Semino, O., Benuzzi, G., Magri, C., Passarino, G., Torroni, A., Santachiara-Benerecetti, A.S. (2003) Y-Chromosome and mtDNA Polymorphisms in Iraq, A Crossroad of the Early Human Dispersal and of post-Neolithic Migrations. Molecular Phylogenetics and Evolution 28:458-472.
Alzualde, A., Izagirre, N., Alonso, S., Alonso, A., de la Rua, C. (2005) Temporal Mitochondrial DNA Variation in the Basque Country: Influence of Post-Neolithic Events. Annals of Human Genetics 69:665-679.
Anderson, S., Bankier, A., Barrell, B., De Bruijn, M., Coulson, A., Drouin, J., Eperon, I., Nierlich, D., Roe, B., Sanger, F., Schreier, P., Smith, A., Staden, R., Young, I. (1981) Sequence and Organization of the Human Mitochondrial Genome. Nature 290:457-465.
Andrews, R., Kubacka, I., Chinnery, P., Lightowlers, R., Turnbull, D., Howell, N. (1999) Reanalysis and Revision of the Cambridge Reference Sequence for Human Mitochondrial DNA. Nature Genetics 23:147.
Anthony, D. (2007) The Horse, The Wheel and Language: How Bronze Age Riders from the Eurasian Steppes Shaped the Modern World. New Jersey: University Presses of California, Columbia and Princeton.
Asari, M., Umetsu, K., Adachi, N., Azumi, J., Shimizu, K., Shiono, H. (2007) Utility of Haplogroup Determination for Forensic mtDNA Analysis in the Japanese Population. Legal Medicine 9:237-240.
Barfield, T. (2010) Afghanistan: A Cultural and Political History. Princeton, New Jersey: Princeton University Press.
Bartlett, J., Stirling, D. (2003) A Short History of the Polymerase Chain Reaction. Methods in Molecular Biology 226:3-6.
BBC (2010) Afghanistan Country Profile. Retrieved November 10th, 2010, from http://news.bbc.co.uk/1/hi/world/south_asia/country_profiles/1162668.stm
Bednarik, R. (2010) An Overview of Asian Paleoart of the Pleistocene. IFRAO Symposium – Congress: Pleistocene Art of Asia (Pre-Acts).
Behar, D., Rosset, S., Blue-Smith, J., Balanovsky, O., Tzur, S., Comas, D., Mitchell, R.J., Quintana-Murci, L., Tyler-Smith, C., Wells, R.S., The Genographic Consortium. (2007) The Genographic Project Public Participation Mitochondrial DNA Database. PLoS Genetics 3:1083-1095.
Bermisheva, M.A., Tambets, K., Villems, R., Khusnutdinova, E.K. (2002) Diversity of Mitochondrial DNA Haplogroups in Ethnic Populations of the Volga-Ural Region. Molecular Biology 36:802-812.
Berniell-Lee, G., Plaza, S., Bosch, E., Calafell, F., Jourdan, E., Césari, M., Lefranc, G., Comas, D. (2008) Admixture and Sexual Bias in the Population Settlement of La Réunion Island (Indian Ocean). American Journal of Physical Anthropology 136:100-107.
Bogenhagen, D. (1999) Repair of mtDNA in Vertebrates. American Journal of Human Genetics 64:1276-1281.
Borst, P. (1977) Structure and Function of Mitochondrial DNA. Trends in Biochemical Science 2(2):31-34.
Brown, M., Voljavec, A., Lott, M., Torroni, A., Yang, CC., Wallace, D. (1992) Mitochondrial DNA Complex I and III Mutations Associated With Leber's Hereditary Optic Neuropathy. Genetics 130:163-173.
Brown, M., Hosseini, S., Torroni, A., Bandelt, HJ., Allen, J., Schurr, T., Scozzari, R., Cruciani, F., Wallace, D. (1998) mtDNA Haplogroup X: An Ancient Link between Europe/Western Asia and North America? American Journal of Human Genetics 63:1852-1861.
Brown, W., George, M., Wilson, A. (1979) Rapid Evolution of Animal Mitochondrial DNA. Proceedings of the National Academy of Sciences of the USA 76(4):1967-1971.
Burch, J. (2008) Reuters: Afghan Census Postponed for Two Years - U.N. Retrieved November 7th, 2010, from http://www.reuters.com/article/idUSISL267420080608
Butler, J. (2005) Forensic DNA Typing: Biology, Technology, and Genetics of STR Markers. 2nd Edition. London: Elsevier Academic Press.
Cann, R., Stoneking, M., Wilson, A. (1987) Mitochondrial DNA and Human Evolution. Nature 325:31-36.
Cartmill, M., Smith, F.H. (2009) The Human Lineage. Hoboken, New Jersey: John Wiley & Sons.
Carvalho, B.M., Bortolini, M.C., dos Santos, S.E.B., Ribeiro-dos-Santos, A.K.C. (2008) Mitochondrial DNA Mapping of Social-Biological Interactions in Brazilian Amazonian African-Descendant Populations. Genetics and Molecular Biology 31:12-22.
Chandrasekar, A., Kumar, S., Sreenath, J., Sarkar, B.N., Urade, B.P., Mallick, S., Bandopadhyay, S.S., Barua, P., Barik, S.S., Basu, D., Kiran, U., Gangopadhyay, P., Sahani, R., Prasad, B.V.R., Gangopadhyay, S., Lakshmi, G.R., Ravuri, R.R., Padmaja, K., Venugopal, P.N., Sharma, M-B., Rao, V.R. (2009) Updating Phylogeny of Mitochondrial DNA Macrohaplogroup M in India: Dispersal of Modern Human in South Asian Corridor. PLoS ONE 4:e7447.
Chen, Y-S., Torroni, A., Excoffier, L., Santachiara-Benerecetti, A.S., Wallace, D. (1995) Analysis of mtDNA Variation in African Populations Reveals the Most Ancient of All Human Continent-Specific Haplogroups. American Journal of Human Genetics 57:133-149.
Chen, X., Prosser, R., Simonetti, S., Sadlock, J., Jagiello, G., Schon, E. (1995) Rearranged Mitochondrial Genomes Are Present in Human Oocytes. American Journal of Human Genetics 57:239-247.
Childe, G. (1925) The Dawn of European Civilization. London: Kegan Paul Trench & Trubner.
CIA - The World Factbook: Afghanistan (2010) Retrieved November 7th, 2010, from https://cia.gov/library/publications/the-world-factbook/geos/af.html
CIA - The World Factbook: Iran (2011) Retrieved December 14th, 2011, from https://www.cia.gov/library/publications/the-world-factbook/geos/ir.html
Colorado State University/Department of Defense (2010) Afghanistan: Cultural Heritage at a Glance. Retrieved November 15th, 2010, from http://www.cemml.colostate.edu/cultural/09476/afgh02-01enl.html
Comas, D., Plaza, S., Wells, R.S., Yuldasheva, N., Lao, O., Calafell, F., Bertranpetit, J. (2004) Admixture, Migrations, and Dispersals in Central Asia: Evidence from Maternal DNA Lineages. European Journal of Human Genetics 12:495-504.
Cox, M., Mendez, F., Karafet, T., Pilkington, M., Kingan, S., Destro-Bisol, G., Strassmann, B., Hammer, M. (2008) Testing for Archaic Hominin Admixture on the X Chromosome: Model Likelihood for the Modern Human RRM2P4 Region from Summaries of Genealogical Topology Under the Structural Coalescent. Genetics 178:427-437.
Davison, K., Dolukhanov, P., Sarson, G., Shukurov, A. (2006) The Role of Waterways in the Spread of the Neolithic. Journal of Archaeological Science 33:641-652.
Derenko, M., Malyarchuk, B., Grzybowski, T., Denisova, G., Dambueva, I., Perkova, M., Dorzhu, C., Luzina, F., Lee, H.K., Vanecek, T., Villems, R., Zakharov, I. (2007) Phylogeographic Analysis of Mitochondrial DNA in Northern Asian Populations. American Journal of Human Genetics 81:1025-1041.
Disotell, T. (1999) Human Evolution: The Southern Route to Asia. Current Biology 9:R925-R928.
Dupree, L., Dupree, N. (2011) Afghanistan. In Encyclopædia Britannica. Retrieved January 8th, 2011, from http://www.britannica.com/EBchecked/topic/7798/Afghanistan
Ebner, S., Lang, R., Mueller, E., Eder, W., Oeller, M., Moser, A., Koller, J., Paulweber, B., Mayr, J., Sperl, W., Kofler, B. (2011) Mitochondrial Haplogroups, Control Region Polymorphisms and Malignant Melanoma: A Study in Middle European Caucasians. PLoS ONE 6:e27192.
Elson, J., Samuels, D., Turnbull, D., Chinnery, P. (2001) Random Intracellular Drift Explains the Clonal Expansion of Mitochondrial DNA Mutations with Age. American Journal of Human Genetics 68:802-806.
Ewens, W.J. (1972) The Sampling Theory of Selectively Neutral Alleles. Theoretical Population Biology 3:87-112.
Excoffier, L., Smouse, P., Quattro, J. (1992) Analysis of Molecular Variance Inferred from Metric Distances Among DNA Haplotypes: Application to Human Mitochondrial DNA Restriction Data. Genetics 131:479-491.
Excoffier, L., Lischer, H.E.L. (2010) Arlequin Suite ver 3.5: A New Series of Programs to Perform Population Genetics Analysis Under Linux and Windows. Molecular Ecology Resources 10:564-567.
Fagundes, N., Kanitz, R., Eckert, R., Valls, A., Bogo, M., Salzano, F., Smith, D.G., Silva Jr., W., Zago, M., Ribeiro-dos-Santos, A., Santos, S., Petzl-Erler, M.L., Bonatto, S. (2008) Mitochondrial Population Genomics Supports a Single Pre-Clovis Origin with a Coastal Route for the Peopling of the Americas. American Journal of Human Genetics 82:583-592.
Farr, G. (2009) The Hazara of Central Afghanistan. In B. Brower & B. R. Johnston (Eds), Disappearing Peoples?: Indigenous Groups and Ethnic Minorities in South and Central Asia (pp. 154-169). Walnut Creek, CA, USA: Left Coast Press.
Fechner, A., Quinque, D., Rychkov, S., Morozowa, I., Naumova, O., Schneider, Y., Willuweit, S., Zhukova, O., Roewer, L., Stoneking, M., Nasidze, I. (2008) Boundaries and Clines in the West Eurasian Y-Chromosome Landscape: Insights from the European Part of Russia. American Journal of Physical Anthropology 137:41-47.
Forster, P., Cali, F., Röhl, A., Metspalu, E., D'Anna, R., Mirisola, M., De Leo, G., Flugy, A., Salerno, A., Ayala, G., Kouvatsi, A., Villems, R., Romano, V. (2002) Continental and Subcontinental Distributions of mtDNA Control Region Types. International Journal of Legal Medicine 116:99-108.
Forster, P., Renfrew, C. (2011) Mother Tongue and Y Chromosomes. Science 333:1390-1391.
Forster, P., Matsumura, S. (2005) Did Early Humans Go North or South? Science 308:965-966.
Fortson, B. (2009) Indo-European Language and Culture, An Introduction. 2nd Edition. Chichester: John Wiley & Sons Ltd.
Fu, Y., Xie, C., Xu, X., Li, C., Zhang, Q., Zhou, H., Zhu, H. (2009) Ancient DNA Analysis of Human Remains from the Upper Capital City of Kublai Khan. American Journal of Physical Anthropology 138:23-29.
Giles, R., Blanc, H., Cann, H., Wallace, D. (1980) Maternal Inheritance of Human Mitochondrial DNA. Proceedings of the National Academy of Sciences of the USA 77(11):6715-6719.
Gonder, M., Mortensen, H., Reed, F., de Sousa, A., Tishkoff, S. (2007) Whole-mtDNA Genome Sequence Analysis of Ancient African Lineages. Molecular Biology and Evolution 24:757-768.
Green, R., Krause, J., Briggs, A.W., Maricic, T., Stenzel, U., Kircher, M., Patterson, N., Li, H., Zhai, W., Hsi-Yang Fritz, M., Hansen, N.F., Durand, E.Y., Malaspinas, A.S., Jensen, J.D., Marques-Bonet, T., Alkan, C., Prüfer, K., Meyer, M., Burbano, H.A., Good, J.M., Schultz, R., Aximu-Petri, A., Butthof, A., Höber, B., Höffner, B., Siegemund, M., Weihmann, A., Nusbaum, C., Lander, E.S., Russ, C., Novod, N., Affourtit, J., Egholm, M., Verna, C., Rudan, P., Brajkovic, D., Kucan, Z., Gusic, I., Doronichev, V.B., Golovanova, L.V., Lalueza-Fox, C., de la Rasilla, M., Fortea, J., Rosas, A., Schmitz, R.W., Johnson, P.L.F., Eichler, E.E., Falush, D., Birney, E., Mullikin, J.C., Slatkin, M., Nielsen, R., Kelso, J., Lachmann, M., Reich, D., Pääbo, S. (2010) A Draft Sequence of the Neanderthal Genome. Science 328:710-722.
Grignani, P., Turchi, C., Achilli, A., Peloso, G., Alu, M., Ricci, U., Robino, C., Pelotti, S., Carnevali, E., Boschi, I., Tagliabracci, A., Previdere, C. (2009) Multiplex mtDNA Coding Region SNP Assays for Molecular Dissection of Haplogroups U/K and J/T. Forensic Science International: Genetics 4:21-25.
Haber, M., Platt, D.E., Badro, D.A., Xue, Y., El-Sibai, M., Ashrafian Bonab, M., Youhanna, S.C., Saade, S., Soria-Hernanz, D.F., Royyuru, A., Spencer Wells, R., Tyler-Smith, C., Zalloua, P.A., The Genographic Consortium (2010) Influences of History, Geography, and Religion on Genetic Structure: the Maronites in Lebanon. European Journal of Human Genetics 19:334-340.
Haber, M., Platt, D.E., Ashrafian Bonab, M., Youhanna, S.C., Soria-Hernanz, D.F., Martinez-Cruz, B., Douaihy, B., Ghassibe-Sabbagh, M., Rafatpanah, H., Ghanbari, M., Whale, J., Balanovsky, O., Spencer Wells, R., Comas, D., Tyler-Smith, C., Zalloua, P.A., The Genographic Consortium (2012) Afghanistan's Ethnic Groups Share a Y-Chromosomal Heritage Structured by Historical Events. PLoS One 7:e34288.
Harpending, H.C. (1994) Signature of Ancient Population Growth in a Low-Resolution Mitochondrial DNA Mismatch Distribution. Human Biology 66:591-600.
Harper, D. (2010) Online Etymology Dictionary. Retrieved December 20th, 2010, from http://www.etymonline.com/index.php?search=Afghanistan&searchmode=none
Hasegawa, M., Horai, S. (1991) Time of the Deepest Root for Polymorphism in Human Mitochondrial DNA. Journal of Molecular Evolution 32:37-42.
Hedman, M., Brandstätter, A., Pimenoff, V., Sistonen, P., Palo, J., Parson, W., Sajantila, A. (2007) Finnish Mitochondrial DNA HVS-I and HVS-II Population Data. Forensic Science International 172:171-178.
Herrnstadt, C., Elson, J.L., Fahy, E., Preston, G., Turnbull, D.M., Anderson, C., Ghosh, S.S., Olefsky, J.M., Beal, M.F., Davis, R.E., Howell, N. (2002) Reduced-Median-Network Analysis of Complete Mitochondrial DNA Coding-Region Sequences for the Major African, Asian, and European Haplogroups. American Journal of Human Genetics 70:1152-1171.
Heyer, E., Balaresque, P., Jobling, M., Quintana-Murci, L., Chaix, R., Segurel, L., Aldashev, A., Hegay, T. (2009) Genetic Diversity and the Emergence of Ethnic Groups in Central Asia. BMC Genetics 10:49.
Hodgson, J., Disotell, T. (2008) No Evidence of a Neanderthal Contribution to Modern Human Diversity. Genome Biology 9(2):206.
Howell, N., McCullogh, D., Kubacka, I., Halvorson, S., Mackey, D. (1992) The Sequence of Human mtDNA: The Question of Errors versus Polymorphisms. American Journal of Human Genetics 50:1333-1337.
Hudjashov, G., Kivisild, T., Underhill, P., Endicott, P., Sanchez, J., Lin, A., Shen, P., Oefner, P., Renfrew, C., Villems, R., Forster, P. (2007) Revealing the prehistoric settlements of Australia by Y chromosome and mtDNA Analysis. Proceedings of the National Academy of Sciences of the USA 104:8726-8730.
Ingman, M., Gyllensten, U. (2001) Analysis of the Complete Human mtDNA Genome: Methodology and Inferences for Human Evolution. Journal of Heredity 92(6):454-461.
Ingman, M., Gyllensten, U. (2007a) A Recent Genetic Link Between Sami and the Volga-Ural Region of Russia. European Journal of Human Genetics 15:115-120.
International Security Assistance Force (ISAF) (n.d.) Afghanistan Provinces Map. Retrieved December 20th, 2010, from http://www.isaf.nato.int/map-usfora/index.php
Irwin, J., Saunier, J., Beh, P., Strouss, K., Painter, C., Parsons, T. (2009a) Mitochondrial DNA Control Region Variation in a Population Sample from Hong Kong, China. Forensic Science International: Genetics 3:e119-e125.
Irwin, J., Ikramov, A., Saunier, J., Bodner, M., Amory, S., Röck, A., O'Callaghan, J., Nuritdinov, A., Atakhodjaev, S., Mukhamedov, R., Parson, W., Parsons, T. (2009b) The mtDNA Composition of Uzbekistan: A Microcosm of Central Asian Patterns. International Journal of Legal Medicine 124:195-204.
Islamic Republic of Afghanistan Central Statistics Organization (CSO) (2010) Afghanistan Statistical Yearbook 2009-2010. Retrieved November 10th, 2010, from http://www.cso.gov.af/
Islamic Republic of Afghanistan Office of the President; Biography (2009) Retrieved November 7th, 2010, from http://www.president.gov.af/sroot_eng.aspx?id=166
Jacobson, J. (1979) Recent Developments in South Asian Prehistory and Protohistory. Annual Review of Anthropology 8:467-502.
Jansen, T., Forster, P., Levine, M., Oelke, H., Hurles, M., Renfrew, C., Weber, J., Olek, K. (2002) Mitochondrial DNA and the Origins of the Domestic Horse. Proceedings of the National Academy of Sciences of the USA 99(16):10905-10910.
Jin, HJ., Tyler-Smith, C., Kim, W. (2009) The Peopling of Korea Revealed by Analyses of Mitochondrial DNA and Y-Chromosomal Markers. PLoS ONE 4:e4210.
Jobling, M.A. (2001) In the Name of the Father: Surnames and Genetics. Trends in Genetics 17:353-357.
Jobling, M.A., Tyler-Smith, C. (2003) The Human Y-Chromosome: An Evolutionary Marker Comes of Age. Nature Reviews Genetics 4:598-612.
Jobling, M.A., Hurles, M.E., Tyler-Smith, C. (2004) Human Evolutionary Genetics: Origins, Peoples & Disease. Abingdon, UK: Garland Science.
Jukes, T., Cantor, C. (1969) Evolution of Protein Molecules. In H.N. Munro (Ed), Mammalian Protein Metabolism (pp. 21-132). New York: Academic Press.
Kivisild, T., Bamshad, M.J., Kaldma, K., Metspalu, M., Metspalu, E., Reidla, M., Laos, S., Parik, J., Watkins, W.S., Dixon, M.E., Papiha, S.S., Mastana, S.S., Mir, M.R., Ferak, V., Villems, R. (1999) Deep Common Ancestry of Indian and Western-Eurasian Mitochondrial DNA Lineages. Current Biology 9:1331-1334.
Kivisild, T., Kaldma, K., Metspalu, M., Parik, J., Papiha, S., Villems, R. (1999b) The Place of the Indian mtDNA Variants in the Global Network of Maternal Lineages and the Peopling of the Old World. In R. Deka, S. Papiha, R. Chakraborty (Eds.) Genomic Diversity: Applications in Human Population Genetics (pp. 135-152). New York: Kluwer Academic/Plenum Publishers.
Kivisild, T., Rootsi, S., Metspalu, M., Mastana, S., Kaldma, K., Parik, J., Metspalu, E., Adojaan, M., Tolk, H-V., Stepanov, V., Gölge, M., Usanga, E., Papiha, S.S., Cinnioglu, C., King, R., Cavalli-Sforza, L., Underhill, P.A., Villems, R. (2003) The Genetic Heritage of the Earliest Settlers Persists Both in Indian Tribal and Caste Populations. American Journal of Human Genetics 72:313-332.
Klein, R.G. (2008) Out of Africa and the Evolution of Human Behaviour. Evolutionary Anthropology 17:267-281.
Kolman, C., Sambuughin, N., Bermingham, E. (1996) Mitochondrial DNA Analysis of Mongolian Populations and Implications for the Origin of New World Founders. Genetics 142:1321-1334.
Kong, Q-P., Yao, Y. G., Sun, C., Bandelt, HJ., Zhu, C. L., Zhang, Y. P. (2003) Phylogeny of East Asian Mitochondrial DNA Lineages Inferred from Complete Sequences. American Journal of Human Genetics 73:671-676.
Kraytsberg, Y., Schwartz, M., Brown, T.A., Ebralidse, K., Kunz, W.S., Clayton, D.A., Vissing, J., Khrapko, K. (2004) Recombination of Human Mitochondrial DNA. Science 304:981.
Kumar, S., Reddy Ravuri, R., Koneru, P., Urade, BP., Sarkar, BN., Chandrasekar, A., Rao, VR. (2009) Reconstructing Indian-Australian Phylogenetic Link. BMC Evolutionary Biology 9:173-177.
Kumar, V., Reddy, AN., Babu, P., Nageswar, T., Thangaraj, K., Reddy, AG., Singh, L., Reddy, B. (2008) Molecular Genetic Study on the Status of Transitional Groups in Central India: Cultural Diffusion or Demic Diffusion? International Journal of Human Genetics 8(1-2):31-39.
Kvist, L., Martens, J., Nazarenko, A.A., Orell, M. (2003) Paternal Leakage of Mitochondrial DNA in the Great Tit (Parus major). Molecular Biology and Evolution 20(2):243-247.
Lacau, H., Bukhari, A., Gayden, T., La Salvia, J., Regueiro, M., Stojkovic, O., Herrera, R. (2011) Y-STR Profiling in Two Afghanistan Populations. Legal Medicine 13(2):103-108.
Lewis, PM. (Ed) (2009) Ethnologue: Languages of the World, Sixteenth Edition. Dallas, TX, USA: SIL International. Online Version: http://www.ethnologue.com/
Lightowlers, R., Chinnery, P., Turnbull, D., Howell, N. (1997) Mammalian Mitochondrial Genetics: Heredity, Heteroplasmy and Disease. Trends in Genetics 13:450-455.
Liu, H., Prugnolle, F., Manica, A., Balloux, F. (2006) A Geographically Explicit Genetic Model of Worldwide Human-Settlement History. American Journal of Human Genetics 79:230-237.
Macaulay, V., Richards, M., Hickey, E., Vega, E., Cruciani, F., Guida, V., Scozzari, R., Bonne-Tamir, B., Sykes, B., Torroni, A. (1999) The Emerging Tree of West Eurasian mtDNAs: A Synthesis of Control-Region Sequences and RFLPs. American Journal of Human Genetics 64:232-249.
Macaulay, V., Hill, C., Achilli, A., Rengo, C., Clarke, D., Meehan, W., Blackburn, J., Semino, O., Scozzari, R., Cruciani, F., Taha, A., Kassim Shaari, N., Maripa Raja, J., Ismail, P., Zainuddin, Z., Goodwin, W., Bulbeck, D., Bandelt, H-J., Oppenheimer, S., Torroni, A., Richards, M. (2005) Single, Rapid Coastal Settlement of Asia Revealed by Analysis of Complete Mitochondrial Genomes. Science 308:1034-1036.
Maji, S., Krithika, S., Vasulu, T. (2008) Distribution of Mitochondrial DNA Macrohaplogroup N in India with Special Reference to Haplogroup R and its Sub-Haplogroup U. International Journal of Human Genetics 8(1-2):85-96.
Mallory, J. (2003) Archaeological Models and Asian Indo-Europeans. In N. Sims-Williams (Ed) Indo-Iranian Languages and Peoples (pp. 19-42). Oxford: British Academy.
Manni, F., Guérard, E., Heyer, E. (2004) Geographic Patterns of (Genetic, Morphologic, Linguistic) Variation: How Barriers Can Be Detected by Using Monmonier's Algorithm. Human Biology 76:173-190.
Martinez, L., Mirabel, S., Luis, J.R., Herrera, R.J. (2008) Middle Eastern and European mtDNA Lineages Characterize Populations from Eastern Crete. American Journal of Physical Anthropology 137:213-223.
McElreavey, K., Quintana-Murci, L. (2005) A Population Genetics Perspective of the Indus Valley Through Uniparentally Inherited Markers. Annals of Human Biology 32:154-162.
Meinilä, M., Finnilä, S., Majamaa, K. (2001) Evidence for mtDNA Admixture Between the Finns and the Saami. Human Heredity 52:160-170.
Mellars, P. (2006) Going East: New Genetic and Archaeological Perspectives on the Modern Human Colonization of Eurasia. Science 313:796-800.
Merriwether, A., Clark, A., Ballinger, S., Schurr, T., Soodyall, H., Jenkins, T., Sherry, S., Wallace, D. (1991) The Structure of Human Mitochondrial DNA Variation. Journal of Molecular Evolution 33:543-555.
Meusel, M., Moritz, R. (1993) Transfer of Paternal Mitochondrial DNA During Fertilization of Honeybee (Apis mellifera L.) eggs. Current Genetics 24(6):539-543.
Mikkelsen, M., Rockenbauer, E., Sørensen, E., Rasmussen, M., Børsting, C., Morling, N. (2008) A Mitochondrial DNA SNP Multiplex Assigning Caucasians into 36 Haplo- and Subhaplogroups. Forensic Science International: Genetics Supplement Series 1:287-289.
Nasidze, I., Stoneking, M. (2001) Mitochondrial DNA Variation and Language Replacements in the Caucasus.
Proceedings of the Royal Society London Biological Sciences 268:1197-1206. Nasidze, I., Ling, E.Y.S., Quinque, D., Dupanloup, I., Cordaux, R., Rychkov, S., Naumova, O., Zhukova, O., Sarraf-Zadegan, N., Naderi, G.A., Asgary, S., Sardas, S., Farhud D.D., Sarkisian, T., Asadov, C., Kerimov, A., Stoneking, M. (2004a) Mitochondrial DNA and Y-Chromosome Variation in the Caucasus. Annuals of Human Genetics 68:205-221. Nasidze, I., Quinque, D., Dupanloup, I., Rychkov, S., Naumova, O., Zhukova, O., Stoneking, M. (2004b) Genetic Evidence Concerning the Origins of South and North Ossetians. Annuals of Human Genetics 68:588-599. Nasidze, I., Quinque, D., Ozturk, M., Bendukidze, N., Stoneking, M. (2005a) mtDNA and Y-Chromosome Variation in Kurdish Groups. Annuals of Human Genetics 69:401412. Nasidze, I., Quinque, D., Dupanloup, I., Cordaux, R., Kokshunova, L., Stoneking, M. (2005b) Genetic Evidence for the Mongolian Ancestry of Kalmyks. American Journal of Physical Anthropology 128:846-854. 212 Nasidze, I., Quinque, D., Rahmani, M., Ali Alemohamad, S., Stoneking, M. (2006) Concomitant Replacement of Language and mtDNA in South Caspian Populations of Iran. Current Biology 16:668-673. Nasidze, I., Quinque, D., Udina, I., Kunizheva, S., Stoneking, M. (2007) The Gagauz, a Linguistic Enclave, are not a Genetic Isolate. Annuals of Human Genetics 71:379-389. Nasidze, I., Quinque, D., Rahmani, M., Alemohamad, S.A., Stoneking, M. (2008) Close Genetic Relationship Between Semitic-speaking and Indo-European-speaking Groups in Iran. Annuals of Human Genetics 72:241-252. Nei, M. (1987) Molecular Evolutionary Genetics. Columbia University Press, New York, NY, USA. Nei, M. (1995) Genetic Support for the out-of-Africa theory of human evolution. Proceedings of the National Academy of Sciences of the USA 92:6720-6722. Nielson, P. (2010) The People and Cultures of Afghanistan. 
Retrieved January 11th, 2011, from http://www.suite101.com/content/the-people-of-afghanistan-a189542 Olivo, P., van de Walle, M., Laipis, P., Hauswirth, W. (1983) Nucleotide Sequence Evidence for Rapid Genotypic Shifts in the Bovine Mitochondrial DNA D-loop. Nature 306:400-402. Palanichamy, M., Sun, C., Agrawal, S., Bandelt, HJ., Kong, QP., Khan, F., Wang, CY., Chaudhuri, TK., Palla, V., Zhang, YP. (2004) Phylogeny of Mitochondrial DNA Macrohaplogroup N in India, Based on Complete Sequencing: Implication for the Peopling of South Asia. American Journal of Human Genetics 75:966-978. Pereira, L., Richards, M., Goias, A., Alonso, A., Albarran, C., Garcia, O., Behar, D., Gölge, M., Hatina, J., Al-Gazali, L., Bradley, D., Macaulay, V., Amorim, A. (2006) Evaluating the Forensic Informativeness of mtDNA Haplogroup H Sub-Typing on a Eurasian Scale. Forensic Science International 159:43-50. 213 Petrov, V., Weinbaum, M. (2011) Afghanistan In Encyclopædia Britannica. Retrieved January 8th, 2011, from http://www.britannica.com/EBchecked/topic/7798/Afghanistan Piganeau, G., Eyre-Walker, A. (2004) A Reanalysis of the Indirect Evidence for Recombination in Human Mitochondrial DNA. Heredity 92:282-288. Powell, G., Yang, H., Tyler-Smith, C., Xue, Y. (2007) The Population History of the Xibe in Northern China: A Comparison of Autosomal, mtDNA and Y-Chromosomal Analyses of Migration and Gene Flow. Forensic Science International: Genetics 1:115119. Qamar, R., Ayub, Q., Mohyuddin, A., Mazhar, K., Mansoor, A., Zerjal, T., TylerSmith, C., Mehdi, Q. (2002) Y-Chromosomal DNA Variation in Pakistan. American Journal of Human Genetics 70:1107-1124. Qiagen (2009) PAXgene Blood DNA Kit. 
Retrieved December 10th, 2010, from http://www.qiagen.com/products/genomicdnastabilizationpurification/paxgeneblooddnas ystem/paxgeneblooddnakit.aspx#Tabs=t1 Quintana-Murci, L., Chaix, R., Wells, S., Behar, D., Sayar, H., Scozzari, R., Rengo, C., Al-Zahery, N., Semino, O., Santachiara-Benerecetti, S., Coppa, A., Ayub, Q., Mohyuddin, A., Tyler-Smith, C., Mehdi, Q., Torroni, A., McElreavey, K. (2004) Where West Meets East: The Complex mtDNA Landscape of the Southwest and Central Asian Corridor. American Journal of Human Genetics 74:827-845. Rasanayagam, A. (2003) Afghanistan: A Modern History. London: I.B. Tauris. Reidla, M., Kivisild, T., Metspalu, E., Kaldma, K., Tambets, K., Tolk, H-V, Parik, J., Loogvali, E-L., Derenko, M., Malyarchuk, B., Bermisheva, M., Zhadanov, S., Pennarun, E., Gubina, M., Golubenko, M., Damba, L., Fedorova, S., Gusar, V., Grechanina, E., Mikerezi, I., Moisan, J-P., Chavantré, A., Khusnutdinova, E., Osipova, L., Stepanov, V., Voevoda, M., Achilli, A., Rengo, C., Rickards, O., De Stefano, G.F., Papiha, S., Beckman, L., Janicijevic, B., Rudan, P., Anagnou, N., Michalodimitrakis, C., Koziel, S., Usanga, E., Geberhiwot, T., Herrnstadt, C., 214 Howell, N., Torroni, A., Villems, R. (2003) Origin and Diffusion of mtDNA Haplogroup X. American Journal of Human Genetics 73:1178-1190. Richard, C., Pennarun, E., Kivisild, T., Tambets, K., Tolk, H-V., Metspalu, E., Reidla, M., Chevalier, S., Giraudet, S., Lauc, L., Pericic, M., Rudan, P., Claustres, M., Journel, H., Dorval, I., Muller, C., Villems, R., Chaventre, A., Moisan, JP. (2007) An mtDNA Perspective of French Genetic Variation. Annuals of Human Biology 34:68-79. Richards, M., Macaulay, V., Bandelt, H-J., Sykes, B. (1998) Phylogeography of Mitochondrial DNA in Western Europe. Annuals of Human Genetics 62:241-260. 
Richards, M., Macaulay, V., Hickey, E., Vega, E., Sykes, B., Guida, V., Rengo, C., Sellitto, D., Cruciani, F., Kivisild, T., Villems, R., Thomas, M., Rychkov, S., Rychkov, O., Rychkov, Y., Gölge, M., Dimitrov, D., Hill, E., Bradley, D., Romano, V., Cali, F., Vona, G., Demaine, A., Papiha, S., Triantaphyllidis, C., Stefanescu, G., Hatina, J., Belledi, M., Di Rienzo, A., Novelletto, A., Oppenheim, A., Norby, S., Al-Zaheri, N., Santachiara-Benerecetti, S., Scozzari, R., Torroni, A., Bandelt, H-J. (2000) Tracing European Founder Lineages in the Near Eastern mtDNA Pool. American Journal of Human Genetics 67:1251-1276.
Richards, M., Macaulay, V., Torroni, A., Bandelt, H-J. (2002) In Search of Geographical Patterns in European Mitochondrial DNA. American Journal of Human Genetics 71:1168-1174.
Rogers, A.R., Harpending, H. (1992) Population Growth Makes Waves in the Distribution of Pairwise Genetic Differences. Molecular Biology and Evolution 9:552-569.
Röhl, A., Brinkmann, B., Forster, L., Forster, P. (2001) An Annotated mtDNA Database. International Journal of Legal Medicine 115:29-39.
Roostalu, U., Kutuev, I., Loogvali, E-L., Metspalu, E., Tambets, K., Reidla, M., Khusnutdinova, E.K., Usanga, E., Kivisild, T., Villems, R. (2007) Origin and Expansion of Haplogroup H, the Dominant Human Mitochondrial DNA Lineage in West Eurasia: The Near Eastern and Caucasian Perspective. Molecular Biology and Evolution 24:436-448.
Ruvolo, M., Zehr, S., von Dornum, M., Pan, D., Chang, D., Lin, J. (1993) Mitochondrial COII Sequences and Modern Human Origins. Molecular Biology and Evolution 10:1115-1135.
Salas, A., Richards, M., De la Fe, T., Lareu, M-V., Sobrino, B., Sanchez-Diaz, P., Macaulay, V., Carracedo, A. (2002) The Making of the African mtDNA Landscape. American Journal of Human Genetics 71:1082-1111.
Schick, K., Toth, N. (1993) Making Silent Stones Speak: Human Evolution and the Dawn of Technology. New York: Simon & Schuster.
Schurr, T., Ballinger, S., Gan, Y-Y., Hodge, J., Merriwether, D.A., Lawrence, D., Knowler, W., Weiss, K., Wallace, D. (1990) Amerindian Mitochondrial DNAs Have Rare Asian Mutations at High Frequencies, Suggesting They Derived from Four Primary Maternal Lineages. American Journal of Human Genetics 46:613-623.
Schwartz, M., Vissing, J. (2002) Paternal Inheritance of Mitochondrial DNA. New England Journal of Medicine 347:576-580.
Shepard, E.M., Herrera, R.J. (2006) Iranian STR Variation at the Fringes of Biogeographical Demarcation. Forensic Science International 158:140-148.
Shlush, L., Behar, D., Yudkovsky, G., Templeton, A., Hadid, Y., Basis, F., Hammer, M., Itzkovitz, S., Skorecki, K. (2008) The Druze: A Population Genetic Refugium of the Near East. PLoS ONE 3:e2105.
Short, D. (2007) Indo-European Languages; Part 1: Centum Languages. Retrieved August 18th, 2009, from http://www.danshort.com/ie/iecentum.htm
Short, D. (2007) Indo-European Languages; Part 2: Satem Languages. Retrieved August 18th, 2009, from http://danshort.com/ie/iesatem.htm
Soares, P., Ermini, L., Thompson, N., Mormina, M., Rito, T., Röhl, A., Salas, A., Oppenheimer, S., Macaulay, V., Richards, M. (2009) Correcting for Purifying Selection: An Improved Human Mitochondrial Molecular Clock. American Journal of Human Genetics 84:740-759.
Soodyall, H., Vigilant, L., Hill, A.V., Stoneking, M., Jenkins, T. (1996) mtDNA Control-Region Sequence Variation Suggests Multiple Independent Origins of an “Asian-Specific” 9-bp Deletion in Sub-Saharan Africans. American Journal of Human Genetics 58:595-608.
St. John, J., Sakkas, D., Dimitriadi, K., Barnes, A., Maclin, V., Ramey, J., Barratt, C., De Jonge, C. (2000) Failure of Elimination of Paternal Mitochondrial DNA in Abnormal Embryos. The Lancet 355:200.
Stoneking, M. (2008) Human Origins: The Molecular Perspective. EMBO Reports 9:S46-S50.
Stringer, C. (2002) Modern Human Origins: Progress and Prospects. Philosophical Transactions of the Royal Society of London B: Biological Sciences 357:563-579.
Sykes, B., Irven, C. (2000) Surnames and the Y Chromosome. American Journal of Human Genetics 66:1417-1419.
Tajima, F. (1983) Evolutionary Relationship of DNA Sequences in Finite Populations. Genetics 105:437-460.
Tajima, F. (1989) Statistical Method for Testing the Neutral Mutation Hypothesis by DNA Polymorphism. Genetics 123:585-595.
Tambets, K., Rootsi, S., Kivisild, T., Help, H., Serk, P., Loogvali, EL., Tolk, HV., Reidla, M., Metspalu, E., Pliss, L., Balanovsky, O., Pshenichnov, A., Balanovska, E., Gubina, M., Zhadanov, S., Osipova, L., Damba, L., Voevoda, M., Kutuev, I., Bermisheva, M., Khusnutdinova, E., Gusar, V., Grechanina, E., Parik, J., Pennarun, E., Richard, C., Chaventre, A., Moisan, JP., Barac, L., Pericic, M., Rudan, P., Terzic, R., Mikerezi, I., Krumina, A., Baumanis, V., Koziel, S., Rickards, O., De Stefano, GF., Anagnou, N., Pappa, KI., Michalodimitrakis, E., Ferak, V., Furedi, S., Komel, R., Beckman, L., Villems, R. (2004) The Western and Eastern Roots of the Saami – the Story of Genetic “Outliers” Told by Mitochondrial DNA and Y Chromosomes. American Journal of Human Genetics 74:661-682.
Tetzlaff, S., Brandstatter, A., Wegener, R., Parson, W., Weirich, V. (2007) Mitochondrial DNA Population Data of HVS-I and HVS-II Sequences from a Northeast German Sample. Forensic Science International 172:218-224.
Tömöry, G., Csanyi, B., Bogacsi-Szabo, E., Kalmar, T., Czibula, A., Csosz, A., Priskin, K., Mende, B., Lango, P., Downes, C.S., Rasko, I. (2007) Comparison of Maternal Lineage and Biogeographic Analyses of Ancient and Modern Hungarian Populations. American Journal of Physical Anthropology 134:354-368.
Torroni, A., Schurr, T., Yang, C. C., Szathmary, E., Williams, R., Schanfield, M., Troup, G., Knowler, W., Lawrence, D., Weiss, K., Wallace, D. (1992) Native American Mitochondrial DNA Analysis Indicates That the Amerind and the Nadene Populations Were Founded by Two Independent Migrations. Genetics 130:153-162.
Torroni, A., Schurr, T., Cabell, M., Brown, M., Neel, J., Larsen, M., Smith, D., Vullo, C., Wallace, D. (1993) Asian Affinities and Continental Radiation of the Four Founding Native American mtDNAs. American Journal of Human Genetics 53:563-590.
Torroni, A., Neel, J., Barrantes, R., Schurr, T., Wallace, D. (1994a) Mitochondrial DNA “Clock” for the Amerinds and Its Implications for Timing Their Entry into North America. Proceedings of the National Academy of Sciences of the USA 91:1158-1162.
Torroni, A., Miller, J., Moore, L., Zamudio, S., Zhuang, J., Droma, T., Wallace, D. (1994b) Mitochondrial DNA Analysis in Tibet: Implications for the Origin of the Tibetan Population and Its Adaptation to High Altitude. American Journal of Physical Anthropology 93:189-199.
Torroni, A., Huoponen, K., Francalacci, P., Petrozzi, M., Morelli, L., Scozzari, R., Obinu, D., Savontaus, M. L., Wallace, D. (1996) Classification of European mtDNAs from an Analysis of Three European Populations. Genetics 144:1835-1850.
Torroni, A., Petrozzi, M., D’Urbano, L., Sellitto, D., Zeviani, M., Carrara, F., Carducci, C., Leuzzi, V., Carelli, V., Barboni, P., De Negri, A., Scozzari, R. (1997) Haplotype and Phylogenetic Analyses Suggest That One European-Specific mtDNA Background Plays a Role in the Expression of Leber Hereditary Optic Neuropathy by Increasing the Penetrance of Primary Mutations 11778 and 14484. American Journal of Human Genetics 60:1107-1121.
Torroni, A., Bandelt, HJ., D’Urbano, L., Lahermo, P., Moral, P., Sellitto, D., Rengo, C., Forster, P., Savontaus, M.L., Bonne-Tamir, B., Scozzari, R. (1998) mtDNA Analysis Reveals a Major Late Paleolithic Population Expansion from Southwestern to Northeastern Europe. American Journal of Human Genetics 62:1137-1152.
Torroni, A., Cruciani, F., Rengo, C., Sellitto, D., Lopez-Bigas, N., Rabionet, R., Govea, N., Lopez de Munain, A., Sarduy, M., Romero, L., Villamar, M., del Castillo, I., Moreno, F., Estivill, X., Scozzari, R. (1999) The A1555G Mutation in the 12S rRNA Gene of Human mtDNA: Recurrent Origins and Founder Events in Families Affected by Sensorineural Deafness. American Journal of Human Genetics 65:1349-1358.
Torroni, A., Bandelt, HJ., Macaulay, V., Richards, M., Cruciani, F., Rengo, C., Martinez-Cabrera, V., Villems, R., Kivisild, T., Metspalu, E., Parik, J., Tolk, HV., Tambets, K., Forster, P., Karger, B., Francalacci, P., Rudan, P., Janicijevic, B., Rickards, O., Savontaus, ML., Huoponen, K., Laitinen, V., Koivumäki, S., Sykes, B., Hickey, E., Novelletto, A., Moral, P., Sellitto, D., Coppa, A., Al-Zaheri, N., Santachiara-Benerecetti, A.S., Semino, O., Scozzari, R. (2001) A Signal, from Human mtDNA, of Postglacial Recolonization in Europe. American Journal of Human Genetics 69:844-852.
Torroni, A., Achilli, A., Macaulay, V., Richards, M., Bandelt, HJ. (2006) Harvesting the Fruit of the Human mtDNA Tree. Trends in Genetics 22(6):339-345.
United Nations Population Division (UNPD) (2009) World Population Prospects: The 2008 Revision Population Database. Retrieved January 11th, 2011, from http://esa.un.org/unpp/
United Nations Population Fund (UNFPA) (2010) From Conflict and Crisis to Renewal: Generations of Change; Demographic, Social and Economic Indicators. Retrieved November 10th, 2010, from http://www.unfpa.org/swp/2010/web/en/indicators.shtml
UNHCR - United Nations Refugee Agency (2003) Assessment for Uzbeks in Afghanistan. Retrieved November 17th, 2010, from http://www.unhcr.org/refworld/docid/469f3a521d.html and http://www.unhcr.org/cgi-bin/texis/vtx/page?page=49e486eb6
UNHCR (2011) 2011 UNHCR Country Operations Profile - Afghanistan. Retrieved January 6th, 2011, from http://www.unhcr.org/cgi-bin/texis/vtx/page?page=49e486eb6
United Nations Statistics Division (2010) 2010 World Population and Housing Census Programme. Retrieved November 15th, 2010, from http://unstats.un.org/unsd/demographic/sources/census/censusdates.htm
Vigilant, L., Stoneking, M., Harpending, H., Hawkes, K., Wilson, A. (1991) African Populations and the Evolution of Human Mitochondrial DNA. Science 253:1503-1507.
Wallace, D., Brown, M., Lott, M. (1999) Mitochondrial DNA Variation in Human Evolution and Disease. Gene 238:211-230.
Watson, J., Crick, F. (1953) A Structure for Deoxyribose Nucleic Acid. Nature 171:737-738.
Watterson, G. (1975) On the Number of Segregating Sites in Genetical Models without Recombination. Theoretical Population Biology 7:256-276.
Weather and Climate in Afghanistan (2011) Retrieved January 10th, 2011, from http://www.southtravels.com/asia/afghanistan/weather.html
Weinbaum, M. (2011) Afghanistan. In Encyclopædia Britannica. Retrieved January 8th, 2011, from http://www.britannica.com/EBchecked/topic/7798/Afghanistan
Wilber, D. (1962) Afghanistan: Its People, Its Society, Its Culture. New Haven: HRAF Press.
Winters, C. (2011) The Gibraltar Out of Africa Exit for Anatomically Modern Humans. Webmed Central 2(10):WMC002319.
Wolpoff, M., Wu, X., Thorne, A. (1984) Modern Homo sapiens Origins: A General Theory of Hominid Evolution Involving the Fossil Evidence from East Asia. In F. Smith, F. Spencer (Eds), The Origins of Modern Humans: A World Survey of the Fossil Evidence (pp. 411-483). New York: A.R. Liss.
Wolpoff, M., Hawks, J., Caspari, R. (2000) Multiregional, Not Multiple Origins. American Journal of Physical Anthropology 112:129-136.
Yang, Y., Zhang, P., He, Q., Zhu, Y., Yang, X., Lv, R., Chen, J. (2011) A New Strategy for the Discrimination of Mitochondrial DNA Haplogroups in Han Population. Journal of Forensic Sciences 56:586-590.
Zerjal, T., Xue, Y., Bertorelle, G., Wells, R.S., Bao, W., Zhu, S., Qamar, R., Ayub, Q., Mohyuddin, A., Fu, S., Li, P., Yuldasheva, N., Ruzibakiev, R., Xu, J., Shu, Q., Du, R., Yang, H., Hurles, M., Robinson, E., Gerelsaikhan, T., Dashnyam, B., Mehdi, Q., Tyler-Smith, C. (2003) The Genetic Legacy of the Mongols. American Journal of Human Genetics 72:717-721.
Zimmerman, B., Brandstätter, A., Duftner, N., Niederwieser, D., Spiroski, M., Arsov, T., Parson, W. (2007) Mitochondrial Control Region Population Data from Macedonia. Forensic Science International: Genetics 1:e4-e9.
Zlojutro, M., Tarskaia, L., Sorensen, M., Snodgrass, J.J., Leonard, W., Crawford, M. (2008) The Origins of the Yakut People: Evidence from Mitochondrial DNA Diversity. International Journal of Human Genetics 8:119-130.
Zvelebil, M. (1980) The Rise of the Nomads in Central Asia. In A. Sherratt (Ed). The Cambridge Encyclopedia of Archaeology (pp. 252-256). New York: Crown.

Appendices
Appendix 1: Materials
Appendix 2: Ethical Consent Forms
Appendix 3: mtDNA HVS-I Sequence Data

Appendix 1: Materials

Equipment
1-10µl pipettor
1-20µl pipettor
10-100µl pipettor
100-1,000µl pipettor
Techne TC-3000 Thermocycler
Applied Biosystems Veriti Thermocycler
Thermo Electron Corporation Px2 Thermal Cycler
Jouan BR4i Centrifuge
Thermo Scientific Heraeus Pico 17 Centrifuge
MSE MicroCentaur Centrifuge
Grant Heatblock
Eppendorf Concentrator 5301 Vacuum
Captair Bio PCR UV Cabinet
Priorclave Compact 40 Benchtop Autoclave
Stinol Fridge/Freezer
GeneFlash Syngene Bio Imaging UV Transilluminator
Bio-Rad PowerPac Basic Power Pack
Fisons Whirlimixer Vortex
Mettler PJ400 Scales
NanoDrop Spectrophotometer ND-1000
Stuart Scientific Magnetic Stirrer SM1
Proline Microwave
LEEC Heated Cabinet
Electrophoresis Tank

Consumables
0.5-10µl Pipettor tips
1-200µl Pipettor tips
100-1,000µl Pipettor tips
2ml Microcentrifuge tubes
1.5ml Microcentrifuge tubes
0.5ml Microcentrifuge tubes
0.2ml Domed PCR tubes
Macherey-Nagel NucleoSpin Extract II Kit

Restriction Enzymes
AluI
BfaI
BstNI
HaeII
HaeIII
HhaI
HincII
HinfI
HpaI
HphI
MboII
MnlI
MseI
NlaIII

Solutions
● 10x TBE: 54g Tris Base, 27.5g Boric Acid, 4.65g EDTA, added to 500ml dH2O.
● 10% Ammonium Persulphate (APS): 1g APS in 10ml dH2O.
● Ethidium Bromide (10mg/mL)
● Glycogen (20mg/ml): 0.2g in 20ml dH2O.

DNA Isolation Protocol

Appendix 2: Ethical Consent Forms

Appendix 3: mtDNA HVS-I Sequence Data

Forensic Format of HVS-I Sequence Haplotypes for the Afghani Ethnic Groups

Baluch:
Sample Number    Polymorphic Sites
43_Afghani_Bal   16126.C 16163.G 16186.T 16189.C 16294.T 16325.C
44_Afghani_Bal   16172.C 16183.C 16189.C 16193.1C
49_Afghani_Bal   16071.T
51_Afghani_Bal   16145.A 16176.T 16223.T 16261.T 16311.C
97_Afghani_Bal   16069.T 16126.C 16145.A 16172.C 16222.T 16261.T 16292.A 16344.T
98_Afghani_Bal   16069.T 16093.C 16126.C 16145.A 16240.G 16261.T
99_Afghani_Bal   16189.C 16189.1C 16193.1C 16223.T 16278.T 16311.C
100_Afghani_Bal  16354.T
101_Afghani_Bal  16093.C 16223.T 16362.C
103_Afghani_Bal  16182.- 16183.C 16189.C 16193.1C 16223.T 16290.T 16319.A 16362.C
104_Afghani_Bal  16256.T 16294.T 16352.C
114_Afghani_Bal  16129.A 16223.T
121_Afghani_Bal  16240.G 16256.T 16294.T 16352.C
122_Afghani_Bal  16300.G 16325.C 16362.C
123_Afghani_Bal  16071.T

Pashtun:
Sample Number    Polymorphic Sites
20_Afghani_Pas   16183.C 16189.C 16193.1C 16249.C 16265.G
25_Afghani_Pas   16140.C 16182.- 16183.C 16189.C 16193.1C 16193.2C 16217.C 16274.A 16335.G
33_Afghani_Pas   16184.T
34_Afghani_Pas   16183.C 16189.C 16193.1C 16223.T 16278.T
38_Afghani_Pas   16136.C 16174.T 16248.T 16266.T 16304.C 16325.C 16356.C
39_Afghani_Pas   16223.T 16289.G
47_Afghani_Pas   16309.G 16318.T 16343.G 16362.C
80_Afghani_Pas   16192.T 16217.C 16357.C
138_Afghani_Pas  16069.T 16126.C 16145.A 16222.T 16261.T
139_Afghani_Pas  16217.C
162_Afghani_Pas  16266.T 16304.C 16311.C 16356.C
186_Afghani_Pas  Anderson
187_Afghani_Pas  16223.T 16227.G 16262.T 16278.T 16294.T 16362.C
191_Afghani_Pas  16129.A 16223.T

Hazara:
Sample Number    Polymorphic Sites
1_Afghani_Haz    16223.T 16290.T 16319.A 16362.C
2_Afghani_Haz    16223.T 16362.C
5_Afghani_Haz    16129.A 16223.T 16298.C 16319.A 16327.T
6_Afghani_Haz    16362.C
7_Afghani_Haz    16223.T 16311.C
8_Afghani_Haz    16111.T 16129.A 16223.T 16257.A 16261.T
10_Afghani_Haz   16223.T 16297.C 16298.C 16327.T 16357.C
11_Afghani_Haz   16071.T
13_Afghani_Haz   16172.C 16183.C 16189.C 16193.1C 16232.A 16249.C 16304.C 16311.C
15_Afghani_Haz   16183.- 16189.C 16193.1C 16223.T 16278.T
18_Afghani_Haz   16129.A 16223.T 16297.C
19_Afghani_Haz   16223.T 16288.C 16298.C 16327.T
28_Afghani_Haz   16189.C 16193.1C 16223.T 16278.T
40_Afghani_Haz   16069.T 16126.C 16145.A 16172.C 16261.T 16292.A 16344.T
41_Afghani_Haz   16069.T 16126.C 16145.A 16172.C 16222.T 16261.T
102_Afghani_Haz  16183.C 16189.C 16193.1C 16223.T 16278.T
105_Afghani_Haz  16051.G 16086.C 16291.T 16305.T 16353.T
106_Afghani_Haz  16129.A 16189.C 16189.1C 16193.1C 16223.T 16248.T 16297.C
107_Afghani_Haz  16093.C 16223.T 16230.G 16234.T 16311.C 16362.C
108_Afghani_Haz  16111.T 16136.C 16223.T 16260.T 16298.C
109_Afghani_Haz  16209.C 16230.G 16256.T
110_Afghani_Haz  16037.G 16041.G 16172.C 16183.- 16189.C 16193.1C 16232.A 16249.C 16304.C 16311.C
113_Afghani_Haz  16185.T 16209.C 16260.T 16298.C
115_Afghani_Haz  16223.T 16294.T 16362.C
116_Afghani_Haz  16223.T 16239.T 16240.C 16274.A 16311.C 16319.A
117_Afghani_Haz  16224.C 16311.C 16362.C
118_Afghani_Haz  16129.A 16175.G 16180.- 16181.- 16189.C 16189.1C 16193.1C 16193.2C 16311.C
119_Afghani_Haz  16189.C 16193.1C 16223.T 16290.T 16319.A 16362.C
120_Afghani_Haz  16129.A 16223.T 16298.C 16319.A 16327.T
124_Afghani_Haz  16270.T
125_Afghani_Haz  16304.C
128_Afghani_Haz  16223.T 16298.C 16327.T
129_Afghani_Haz  16182.- 16183.C 16189.C 16193.1C 16319.A 16362.C
130_Afghani_Haz  16092.C 16129.A 16148.T 16223.T 16271.C 16362.C
131_Afghani_Haz  16126.C 16292.T 16294.T
133_Afghani_Haz  16111.T 16140.C 16183.C 16189.C 16193.1C 16234.T 16243.C
135_Afghani_Haz  16093.C 16129.A 16223.T 16298.C 16327.T
136_Afghani_Haz  16092.C 16129.A 16148.T 16223.T 16271.C 16362.C
151_Afghani_Haz  16356.C
168_Afghani_Haz  16311.C 16356.C 16362.C

Tajik:
Sample Number    Polymorphic Sites
30_Afghani_Taj   16223.T 16290.T 16319.A 16362.C
32_Afghani_Taj   Anderson
134_Afghani_Taj  16189.C 16189.1C 16193.1C 16223.T 16278.T 16311.C
140_Afghani_Taj  16356.C
142_Afghani_Taj  16201.T 16209.C 16223.T 16265.G
143_Afghani_Taj  Anderson
145_Afghani_Taj  16172.C 16184.A
149_Afghani_Taj  16274.A
170_Afghani_Taj  16134.T 16172.C 16356.C
173_Afghani_Taj  16266.T 16304.C 16311.C 16356.C
175_Afghani_Taj  Anderson
176_Afghani_Taj  16071.T 16172.C
188_Afghani_Taj  16327.T
189_Afghani_Taj  16071.T 16362.C
190_Afghani_Taj  16185.T 16354.T
193_Afghani_Taj  16173.T 16223.T 16362.C
198_Afghani_Taj  16325.C
200_Afghani_Taj  16172.C 16223.T 16362.C

Mismatch Format of HVS-I Sequence Haplotypes for the Afghani Ethnic Groups
[Figures: Baluch, Pashtun, Hazara, Tajik]
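The forensic-format listing records each HVS-I haplotype as its differences from the Anderson (Cambridge) reference sequence: `16126.C` is a C at position 16126, `16193.1C` is a C inserted after position 16193, `16182.-` is a deletion at position 16182, and `Anderson` denotes a sequence identical to the reference over the region read. As a minimal sketch of how these strings relate to a mismatch distribution (the distribution of pairwise differences between sequences; Rogers and Harpending 1992), the Python below (3.9 or later) parses the notation and counts differences for every pair of samples. The function names and the token pattern are assumptions made for illustration. The sketch counts every listed site, including length variants in the 16184-16193 poly-C tract that some analyses exclude, and it is not intended to reproduce the analysis behind the mismatch figures.

```python
import re
from collections import Counter
from itertools import combinations

# A site is (reference position, insertion index). Index 0 marks a substitution
# or deletion at that position; 1, 2, ... mark bases inserted after it.
Site = tuple[int, int]

# Assumed pattern for the notation above: "16126.C", "16193.1C", "16182.-".
_TOKEN = re.compile(r"^(\d+)\.(\d*)([ACGTN-])$")


def parse_haplotype(text: str) -> dict[Site, str]:
    """Return {site: base} for every listed difference from the reference."""
    text = text.strip()
    if text == "Anderson":  # identical to the reference sequence
        return {}
    variants: dict[Site, str] = {}
    for token in text.split():
        match = _TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognised token: {token!r}")
        position, insertion, base = match.groups()
        variants[(int(position), int(insertion or 0))] = base
    return variants


def pairwise_differences(a: dict[Site, str], b: dict[Site, str]) -> int:
    """Count sites at which two haplotypes carry different states.

    A site missing from one haplotype takes the reference state, so two
    different non-reference bases at one position count as one difference.
    """
    return sum(1 for site in a.keys() | b.keys() if a.get(site) != b.get(site))


def mismatch_distribution(haplotypes: list[dict[Site, str]]) -> Counter:
    """Histogram {number of differences: number of pairs} over all pairs."""
    return Counter(
        pairwise_differences(a, b) for a, b in combinations(haplotypes, 2)
    )


def mean_pairwise_differences(distribution: Counter) -> float:
    """Average number of differences per pair of sequences."""
    pairs = sum(distribution.values())
    return sum(k * n for k, n in distribution.items()) / pairs


if __name__ == "__main__":
    # Tajik samples 30, 32, 140 and 173, copied from the listing above.
    tajik = [
        "16223.T 16290.T 16319.A 16362.C",
        "Anderson",
        "16356.C",
        "16266.T 16304.C 16311.C 16356.C",
    ]
    dist = mismatch_distribution([parse_haplotype(h) for h in tajik])
    print(sorted(dist.items()))  # [(1, 1), (3, 1), (4, 2), (5, 1), (8, 1)]
    print(round(mean_pairwise_differences(dist), 2))  # 4.17
```

For these four Tajik sequences the six pairs differ at 1, 3, 4, 4, 5 and 8 sites. Keying differences by site rather than by the full token is a deliberate choice: haplotypes such as 98_Afghani_Bal (16240.G) and 116_Afghani_Haz (16240.C) then differ once at position 16240, not twice.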