F - 8th International Biocuration Conference
The 8th International Biocuration Conference, April 23-26, 2015, Beijing, China
Challenges and Practices of Big Data in Life Science

《Genomics study driven by biological questions》
Yixue Li
Institute of Biochemistry & Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences
yxli@sibs.ac.cn

“For such a large number of problems there will be some animal of choice, or a few such animals, on which it can be most conveniently studied”
August Krogh, Am J Physiol. 90(2), pp. 243-251 (1929)
August Krogh was a Danish physiologist who received the 1920 Nobel Prize in Physiology or Medicine.

“Science is a gamble, then you need to win” is the key!
Dr. Takashi Gojobori, 2014, Jeju, Korea

Dog genomics study for deciphering mechanisms of adaptation to high-altitude hypoxia

Tibetans
• Average elevation of 4,000 meters
• Oxygen level is about 60% of that at sea level
• Cold climate and limited resources
• Sustained increase in cerebral blood flow
• Lower hemoglobin concentration
• Less susceptibility to chronic mountain sickness

Andeans / Ethiopians
• Hemoglobin concentration is higher
• Temporary and reversible acclimatization
• Increased oxygen level in hemoglobin
• Around 3,000 to 3,500 metres
• Elevated hemoglobin levels do not increase the oxygen content of hemoglobin
Beall, PNAS, 2002

Erythrocytosis is a common symptom of chronic mountain sickness and leads to high blood viscosity and cardiovascular disorders, whereas a lower hemoglobin level may provide a protective mechanism for people living in the highlands.

Pre-genome scan results
• HIF (hypoxia-inducible factor) pathway
  – Tibetans
• Metabolic pathways
  – Yak
  – Tibetan antelope
Simonson et al., Science, 2010; Beall et al., PNAS, 2010; Qiu et al., Nat. Genet., 2012

Although many studies have focused on wildlife and human highlanders, few have examined domesticated animals that migrated to the plateau with humans.

Results about EPAS1/HIF2α
• 31 SNPs were found in the intronic region of EPAS1, which encodes a transcription factor also called HIF2α. These EPAS1 SNPs were in high linkage disequilibrium and correlated significantly with hemoglobin concentration in the Tibetan population (196 Tibetans and 84 Han individuals from HapMap3; Beall et al., 2010).
• Because all of these SNPs are located in introns of EPAS1, the detailed functional link between genotype and phenotype remains unclear. What type of selection acted on the hypoxia-inducible factor (HIF) pathway during human high-altitude adaptation is still an open question.

Focus on domesticated animals that migrated to the plateau with humans (Tibetans).
Tibetans vs. Tibetan Mastiff
Increased blood flow (Tibetans) vs. ? (Tibetan Mastiff)
Genome-wide association study vs. whole-genome sequencing vs. Illumina genotyping chips

Samples and Data
• We sampled six dog breeds from continuous altitudes along the “Ancient Tea Horse Road” in southwestern China.
• Each dog was sampled from a different village to avoid potential kinship.
• The sex ratio was kept at 1:1 for each breed.
• In total, 60 dogs from six dog breeds were sequenced.
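The EPAS1 result summarized above rests on linkage disequilibrium (LD) between SNPs. As a minimal sketch of how pairwise LD is commonly quantified, the Python function below computes r² for two biallelic SNPs, assuming phased haplotypes with alleles coded 0/1. The function name `ld_r2` and the toy inputs are hypothetical illustrations, not part of the study's pipeline.

```python
from typing import Sequence


def ld_r2(snp_a: Sequence[int], snp_b: Sequence[int]) -> float:
    """Pairwise LD (r^2) between two biallelic SNPs.

    Assumes phased haplotypes: snp_a[i] and snp_b[i] are the alleles
    (coded 0 or 1) carried by haplotype i at SNP A and SNP B.
    """
    if len(snp_a) != len(snp_b) or not snp_a:
        raise ValueError("need two equal-length, non-empty haplotype vectors")
    n = len(snp_a)
    p_a = sum(snp_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(snp_b) / n  # frequency of allele 1 at SNP B
    # frequency of haplotypes carrying allele 1 at both SNPs
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom
```

For example, `ld_r2([0, 0, 1, 1], [0, 0, 1, 1])` gives 1.0 (the two SNPs are perfectly linked), while `ld_r2([0, 0, 1, 1], [0, 1, 0, 1])` gives 0.0. Unphased genotype data would first need haplotype estimation, which this sketch leaves out.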
Breed (abbreviation) | History | Sample size | Location | Altitude
Tibetan Mastiff (TM) | Ancient | 10 | Cuomei, Tibet, China (n = 4); Yushu, Qinghai, China (n = 4); Diqing, Yunnan, China (n = 2) | 5,100 m; 4,200 m; 3,300 m
Diqing indigenous dog (DQ) | Ancient | 10 | Diqing, Yunnan, China | 3,300 m
Lijiang indigenous dog (LJ) | Ancient | 10 | Lijiang, Yunnan, China | 2,400 m
Kunming dog (KM) | Modern | 10 | Kunming, Yunnan, China | 1,800 m
German Shepherd (GS) | Modern | 10 | Kunming, Yunnan, China | 1,800 m
Yingjiang indigenous dog (YJ) | Ancient | 10 | Yingjiang, Yunnan, China | 800 m

From raw reads to SNPs (per-individual and per-breed analysis)
Raw reads → Read QC (QC report) → Read mapping (mapping report) → SNP calling (depth report) → SNP filtering (SNP report) → SNP annotation (annotation report)

Workflow for population genetics data analysis
• Diversity: population polymorphism, allele frequency, demographic events, LD
• PCA, population structure, tree: evolutionary relationships, ancestry
• FST, selective sweep mapping: diversity reduction, increased LD, selective targets

Whole-genome FST mapping
• We performed a whole-genome FST scan and focused on regions with extreme FST values (Z(FST) > 5).
• 28 unique autosomal regions containing 141 candidate genes were identified.
• Five of these genes (EPAS1, MSRB3, HBB, CDK2 and GNB1) belong to the GO categories ‘response to oxygen levels’ and ‘response to oxidative stress’.
FST: genetic differentiation among populations

HIF pathways
• The region with the strongest differentiation contains EPAS1, which encodes hypoxia-inducible factor (HIF) 2α.
• Network analysis indicated that the other candidate hypoxia-response genes we identified would all be regulated by the HIF signaling pathway, suggesting an essential role of EPAS1 in the adaptation of high-altitude dogs.
• Interestingly, EPAS1 was also identified as a selective target in Tibetan people.

Amino acid conservation
• Among the four non-synonymous mutations, one (G305S) occurred in the PAS domain, which is essential for the activity of EPAS1.
• G305S also affects a highly conserved residue, which is invariant among all the vertebrates we examined.

Structural and functional effects of G305S
• G305S occurs in a beta sheet, which could affect the thermodynamic stability of the domain.
• Functional-effect prediction indicates that only G305S is deleterious, while the other three mutations are tolerated.

Physiological association
• We tested the association between the variant G305S and hematologic parameters in DQ, the high-altitude breed in which enough homozygotes (n = 40) and heterozygotes (n = 29) could be collected.
• Although no evident relationship with hemoglobin concentration was found, homozygotes with two mutant alleles (AA) showed decreased vascular resistance compared with heterozygotes (GA).
Zhen W. et al., Genome Research, 2013

Camel genomics study of mechanisms that may prevent type 2 diabetes
• Camels store energy as fat in their humps and abdomen, enabling them to survive long periods without food or water.
• Their body temperature may vary from 34 to 41 ℃ over the course of the day.
• Blood glucose levels in camels (6-8 mmol·l⁻¹) are about twice those in other ruminants.
• Camels tolerate a high dietary salt intake, consuming eight times more salt than cattle and sheep.
• The Camelidae are the only mammals that can produce heavy-chain antibodies (HCAbs).
Kaske, M., Elmahdi, B., Engelhardt, W. & Sallmann, H. P. Insulin responsiveness of sheep, ponies, miniature pigs and camels: results of hyperinsulinemic clamps using porcine insulin. J. Comp. Physiol. B 171, 549–556 (2001).
GENOME ANNOTATION PIPELINE
• Repeats: scaffolds masked with RepeatMasker (repeat elements; repeat-masked sequences)
• ncRNA prediction: tRNA (tRNAscan), rRNA (SILVA), miRNA (miRBase); repeat- and ncRNA-masked sequences
• Protein-coding gene prediction: ab initio (Augustus, GenScan), EST (dromedary) and homology (genBlastA) evidence integrated with EvidenceModeler
• Functional annotation: InterProScan (domain/family), KAAS, GO, KEGG

Genome data visualization
• Genome, comparative genomics, population genomics, functional genomics (transcriptome, proteome, metabolome)
• Interactive display, dynamic display, integrated display

THE ACCELERATED EVOLUTION OF PATHWAYS IN CAMELS
We estimated dN/dS ratios for camel genes and their closest cattle orthologs, using the human ortholog as an outgroup. Genes evolving significantly faster in camels than in cattle were identified and mapped to KEGG pathways.

RAPIDLY EVOLVING GENES (Human, Cattle, Camel)
• Rapidly evolving genes, as measured by an increased dN/dS ratio, may be under adaptive selection or relaxed purifying selection.
• In total, 2,730 genes were found to evolve significantly faster in camels than in cattle, using human orthologs as outgroups.
• These rapidly evolving genes are enriched in metabolic pathways and in signaling pathways that regulate metabolic processes, and some of them are also cancer related.

INSULIN SIGNALING PATHWAY
• Physiological experiments suggested that the high blood glucose level in camels may be caused by their strong insulin resistance.
• Our research shows that a significantly large number of rapidly evolving genes in camels are involved in the insulin signaling pathway, which may change its sensitivity to insulin.
• Does a unique CYP2-CYP4 metabolic module help camels tolerate hyperglycemia at the population level?

Copy number variation of the P450 family between camels and other mammals
A total of 60 members of the P450 family were found in the camel genome and carefully annotated.
A remarkable gene number variation between camels and other mammals was found in the CYP2J, CYP4A and CYP4F subfamilies: camels have 11 copies of CYP2J, more than cattle (4) and humans (1). In contrast, camels have only one copy of CYP4A and two copies of CYP4F, fewer than cattle (3 and 7, respectively) and humans (2 and 6, respectively). Phylogenetic analysis of the CYP2 and CYP4 families supported the expansion of CYP2J and the contraction of CYP4A/F in the camel lineage.

Cytochrome P450 family (gene copy numbers)
Species | CYP2 (family) | CYP2E | CYP2J | CYP4 (family) | CYP4A | CYP4F
Cattle | 23 | 1 | 4 | 13 | 3 | 7
Horse | 31 | 1 | 1 | 15 | 3 | 7
Human | 20 | 1 | 1 | 12 | 2 | 6
Camel | 27 | 2 | 11 | 7 | 1 | 2

EETs
• 19(S)-HETE was demonstrated to be a potent vasodilator of renal preglomerular vessels that stimulates water reabsorption.
• The activity of CYP2J is regulated by a high-salt diet, and its suppression can lead to high blood pressure. Camels are known to take in large amounts of salt without apparently developing hypertension, perhaps because they have more copies of CYP2J genes.
Does a naturally adapted CYP2J-CYP4A duplex system enable camels to avoid metabolic syndrome?
Zhen W. et al., Nature Communications, 2012

Rapidly evolving genes found in the camel genome
• P450 gene family CYP2/CYP4 → 20-HETE/EETs; EETs promote tumor metastases and enhance angiogenesis in and around primary tumors
• Diabetes/high blood glucose
• Mutations in oncogenes (negative selection), tumor suppressor genes (positive selection) and genes involved in insulin signaling pathways (negative selection)

Thanks for your attention!