Reconciling the spectrum of Sagittarius A* with a two-temperature plasma model
letters to nature

Received 7 July; accepted 21 September 1998.

Acknowledgements. We thank SmithKlineBeecham Pharmaceuticals for penem inhibitor; R. M. Sweet for use of beamline X12C (NSLS, Brookhaven National Laboratory); G. Petsko for the ethylmercury phosphate; M. N. G. James for access to equipment for characterization of earlier crystal forms of SPase; and S. Mosimann and S. Ness for discussions. This work was supported by the Medical Research Council of Canada, the Canadian Bacterial Diseases Network of Excellence, and British Columbia Medical Research Foundation grants to N.C.J.S. M.P. is funded by an MRC of Canada post-doctoral fellowship, N.C.J.S. by an MRC of Canada scholarship, and R.E.D. by the NIH and the American Heart Association.

Correspondence and requests for materials should be addressed to N.C.J.S. (e-mail: natalie@byron.biochem.ubc.ca).

errata

Reconciling the spectrum of Sagittarius A* with a two-temperature plasma model
Rohan Mahadevan
Nature 394, 651–653 (1998)
A misleading typographical error was introduced into the second sentence of the bold introductory paragraph of this Letter: the word "infrared" should be "inferred".

Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence
S. T. Cole, R. Brosch, J. Parkhill, T. Garnier, C. Churcher, D. Harris, S. V. Gordon, K. Eiglmeier, S. Gas, C. E. Barry III, F. Tekaia, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies, K. Devlin, T. Feltwell, S. Gentles, N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, A. Krogh, J. McLean, S. Moule, L. Murphy, K. Oliver, J. Osborne, M. A. Quail, M.-A. Rajandream, J. Rogers, S. Rutter, K. Seeger, J. Skelton, R. Squares, S. Squares, J. E. Sulston, K. Taylor, S. Whitehead & B. G. Barrell
Nature 393, 537–544 (1998)

As a result of an error during film output, Table 1 was published with some symbols missing. The correct version can be found at http://www.sanger.ac.uk and is reproduced again here (following pages). Also, in Fig. 2, we incorrectly labelled Rv0649 as fadD37 instead of fabD2. Two of the genes for mycolyl transferases were inverted: Rv0129c encodes antigen 85C and not 85C′ as stated, whereas Rv3803c codes for the secreted protein MPT51 and not antigen 85C (Infect. Immun. 59, 372–382; 1991); Rv3803c is now designated fbpD. We thank Morten Harboe and Harald Wiker for drawing this to our attention. The sequence of Rv0746 from M. bovis BCG-Pasteur presented in Fig. 5b was incorrect and should have shown a 16-codon deletion instead of 29, as indicated here:

H37Rv  ...GSGAPGGAGGAAGLWGTGGAGGAGGSSAGGGGAGGAGGAGGWLLGDGGAGGIGGAST...
          ::::::::::::::::::::                :::::::::::::::::::::
BCG    ...GSGAPGGAGGAAGLWGTGGA----------------GGAGGWLLGDGGAGGIGGAST...

Nature © Macmillan Publishers Ltd 1998
NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com

article

Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence

S. T. Cole*, R. Brosch*, J. Parkhill, T. Garnier*, C. Churcher, D. Harris, S. V. Gordon*, K. Eiglmeier*, S. Gas*, C. E. Barry III†, F. Tekaia‡, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies, K. Devlin, T. Feltwell, S. Gentles, N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, A. Krogh§, J. McLean, S. Moule, L. Murphy, K. Oliver, J. Osborne, M. A. Quail, M.-A. Rajandream, J. Rogers, S. Rutter, K. Seeger, J. Skelton, R. Squares, S. Squares, J. E. Sulston, K. Taylor, S. Whitehead & B. G. Barrell

Sanger Centre, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
* Unité de Génétique Moléculaire Bactérienne, and ‡ Unité de Génétique Moléculaire des Levures, Institut Pasteur, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
† Tuberculosis Research Unit, Laboratory of Intracellular Parasites, Rocky Mountain Laboratories, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Hamilton, Montana 59840, USA
§ Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark

Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus. The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve our understanding of the biology of this slow-growing pathogen and to help the conception of new prophylactic and therapeutic interventions. The genome comprises 4,411,529 base pairs, contains around 4,000 genes, and has a very high guanine + cytosine content that is reflected in the biased amino-acid content of the proteins. M. tuberculosis differs radically from other bacteria in that a very large portion of its coding capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families of glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation.

Despite the availability of effective short-course chemotherapy (DOTS) and the Bacille Calmette-Guérin (BCG) vaccine, the tubercle bacillus continues to claim more lives than any other single infectious agent (ref. 1).
Recent years have seen increased incidence of tuberculosis in both developing and industrialized countries, the widespread emergence of drug-resistant strains and a deadly synergy with the human immunodeficiency virus (HIV). In 1993, the gravity of the situation led the World Health Organisation (WHO) to declare tuberculosis a global emergency in an attempt to heighten public and political awareness. Radical measures are needed now to prevent the grim predictions of the WHO becoming reality.

The combination of genomics and bioinformatics has the potential to generate the information and knowledge that will enable the conception and development of new therapies and interventions needed to treat this airborne disease and to elucidate the unusual biology of its aetiological agent, Mycobacterium tuberculosis.

The characteristic features of the tubercle bacillus include its slow growth, dormancy, complex cell envelope, intracellular pathogenesis and genetic homogeneity (ref. 2). The generation time of M. tuberculosis, in synthetic medium or infected animals, is typically ~24 hours. This contributes to the chronic nature of the disease, imposes lengthy treatment regimens and represents a formidable obstacle for researchers. The state of dormancy in which the bacillus remains quiescent within infected tissue may reflect metabolic shutdown resulting from the action of a cell-mediated immune response that can contain but not eradicate the infection. As immunity wanes, through ageing or immune suppression, the dormant bacteria reactivate, causing an outbreak of disease often many decades after the initial infection (ref. 3). The molecular basis of dormancy and reactivation remains obscure but is expected to be genetically programmed and to involve intracellular signalling pathways.

The cell envelope of M. tuberculosis, a Gram-positive bacterium with a G + C-rich genome, contains an additional layer beyond the peptidoglycan that is exceptionally rich in unusual lipids, glycolipids and polysaccharides (refs 4, 5). Novel biosynthetic pathways generate cell-wall components such as mycolic acids, mycocerosic acid, phenolthiocerol, lipoarabinomannan and arabinogalactan, and several of these may contribute to mycobacterial longevity, trigger inflammatory host reactions and act in pathogenesis. Little is known about the mechanisms involved in life within the macrophage, or the extent and nature of the virulence factors produced by the bacillus and their contribution to disease.

NATURE | VOL 393 | 11 JUNE 1998

It is thought that the progenitor of the M. tuberculosis complex, comprising M. tuberculosis, M. bovis, M. bovis BCG, M. africanum and M. microti, arose from a soil bacterium and that the human bacillus may have been derived from the bovine form following the domestication of cattle. The complex lacks interstrain genetic diversity, and nucleotide changes are very rare (ref. 6). This is important in terms of immunity and vaccine development as most of the proteins will be identical in all strains and therefore antigenic drift will be restricted. On the basis of the systematic sequence analysis of 26 loci in a large number of independent isolates (ref. 6), it was concluded that the genome of M. tuberculosis is either unusually inert or that the organism is relatively young in evolutionary terms.

Since its isolation in 1905, the H37Rv strain of M. tuberculosis has found extensive, worldwide application in biomedical research because it has retained full virulence in animal models of tuberculosis, unlike some clinical isolates; it is also susceptible to drugs and amenable to genetic manipulation.
An integrated map of the 4.4 megabase (Mb) circular chromosome of this slow-growing pathogen had been established previously and ordered libraries of cosmids and bacterial artificial chromosomes (BACs) were available (refs 7, 8).

Organization and sequence of the genome

Sequence analysis. To obtain the contiguous genome sequence, a combined approach was used that involved the systematic sequence analysis of selected large-insert clones (cosmids and BACs) as well as random small-insert clones from a whole-genome shotgun library. This culminated in a composite sequence of 4,411,529 base pairs (bp) (Figs 1, 2), with a G + C content of 65.6%. This represents the second-largest bacterial genome sequence currently available (after that of Escherichia coli; ref. 9). The initiation codon for the dnaA gene, a hallmark for the origin of replication, oriC, was chosen as the start point for numbering. The genome is rich in repetitive DNA, particularly insertion sequences, and in new multigene families and duplicated housekeeping genes. The G + C content is relatively constant throughout the genome (Fig. 1), indicating that horizontally transferred pathogenicity islands of atypical base composition are probably absent. Several regions showing higher than average G + C content (Fig. 1) were detected; these correspond to sequences belonging to a large gene family that includes the polymorphic G + C-rich sequences (PGRSs).

Genes for stable RNA. Fifty genes coding for functional RNA molecules were found. These molecules were the three species produced by the unique ribosomal RNA operon, the 10Sa RNA involved in degradation of proteins encoded by abnormal messenger RNA, the RNA component of RNase P, and 45 transfer RNAs. No 4.5S RNA could be detected.
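The overall G + C content and its variation along the chromosome, as described under "Sequence analysis", can be illustrated with a short sliding-window calculation. The following is a minimal sketch only: it is not the software used for the analysis, and the function names, window size and toy sequence are assumptions introduced for illustration.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def gc_windows(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """G + C fraction of each full window, as (start offset, fraction) pairs."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence (hypothetical; not taken from the H37Rv genome).
toy = "GGCCGGCCAATTGGCC"
print(round(gc_fraction(toy), 3))
for start, frac in gc_windows(toy, window=8, step=4):
    print(start, round(frac, 3))
```

Applied to a real chromosome with a window of several kilobases, a profile of this kind would be expected to stay close to the genome-wide value, with local peaks in regions such as those containing PGRS sequences.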
The rrn operon is situated unusually as it occurs about 1,500 kilobases (kb) from the putative oriC; most eubacteria have one or more rrn operons near to oriC to exploit the gene-dosage effect obtained during replication (ref. 10). This arrangement may be related to the slow growth of M. tuberculosis. The genes encoding tRNAs that recognize 43 of the 61 possible sense codons were distributed throughout the genome and, with one exception, none of these uses A in the first position of the anticodon, indicating that extensive wobble occurs during translation. This is consistent with the high G + C content of the genome and the consequent bias in codon usage. Three genes encoding tRNAs for methionine were found; one of these genes (metV) is situated in a region that may correspond to the terminus of replication (Figs 1, 2). As metV is linked to defective genes for integrase and excisionase, perhaps it was once part of a phage or similar mobile genetic element.

Insertion sequences and prophages. Sixteen copies of the promiscuous insertion sequence IS6110 and six copies of the more stable element IS1081 reside within the genome of H37Rv (ref. 8). One copy of IS1081 is truncated. Scrutiny of the genomic sequence led to the identification of a further 32 different insertion sequence elements, most of which have not been described previously, and of the 13E12 family of repetitive sequences which exhibit some of the characteristics of mobile genetic elements (Fig. 1). The newly discovered insertion sequences belong mainly to the IS3 and IS256 families, although six of them define a new group. IS1561 and IS1552 show extensive similarity to insertion sequence elements found in Nocardia and Rhodococcus spp., suggesting that they may be widely disseminated among the actinomycetes. Most of the insertion sequences in M. tuberculosis H37Rv appear to have inserted in intergenic or non-coding regions, often near tRNA genes (Fig. 1).
Many are clustered, suggesting the existence of insertional hot-spots that prevent genes from being inactivated, as has been described for Rhizobium (ref. 11). The chromosomal distribution of the insertion sequences is informative as there appears to have been a selection against insertions in the quadrant encompassing oriC and an overrepresentation in the direct repeat region that contains the prototype IS6110. This bias was also observed experimentally in a transposon mutagenesis study (ref. 12).

At least two prophages have been detected in the genome sequence and their presence may explain why M. tuberculosis shows persistent low-level lysis in culture. Prophages phiRv1 and phiRv2 are both ~10 kb in length and are similarly organized, and some of their gene products show marked similarity to those encoded by certain bacteriophages from Streptomyces and saprophytic mycobacteria. The site of insertion of phiRv1 is intriguing as it corresponds to part of a repetitive sequence of the 13E12 family that itself appears to have integrated into the biotin operon. Some strains of M. tuberculosis have been described as requiring biotin as a growth supplement, indicating either that phiRv1 has a polar effect on expression of the distal bio genes or that aberrant excision, leading to mutation, may occur. During the serial attenuation of M. bovis that led to the vaccine strain M. bovis BCG, the phiRv1 prophage was lost (ref. 13). In a systematic study of the genomic diversity of prophages and insertion sequences (S.V.G. et al., manuscript in preparation), only IS1532 exhibited significant variability, indicating that most of the prophages and insertion sequences are currently stable. However, from these combined observations, one can conclude that horizontal transfer of genetic material into the free-living ancestor of the M. tuberculosis complex probably occurred in nature before the tubercle bacillus adopted its specialized intracellular niche.

Figure 1 Circular map of the chromosome of M. tuberculosis H37Rv. The outer circle shows the scale in Mb, with 0 representing the origin of replication. The first ring from the exterior denotes the positions of stable RNA genes (tRNAs are blue, others are pink) and the direct repeat region (pink cube); the second ring inwards shows the coding sequence by strand (clockwise, dark green; anticlockwise, light green); the third ring depicts repetitive DNA (insertion sequences, orange; 13E12 REP family, dark pink; prophage, blue); the fourth ring shows the positions of the PPE family members (green); the fifth ring shows the PE family members (purple, excluding PGRS); and the sixth ring shows the positions of the PGRS sequences (dark red). The histogram (centre) represents G + C content, with <65% G + C in yellow, and >65% G + C in red. The figure was generated with software from DNASTAR.

Figure 2 Linear map of the chromosome of M. tuberculosis H37Rv showing the position and orientation of known genes and coding sequences (CDS). We used the following functional categories (adapted from ref. 20): lipid metabolism (black); intermediary metabolism and respiration (yellow); information pathways (pink); regulatory proteins (sky blue); conserved hypothetical proteins (orange); proteins of unknown function (light green); insertion sequences and phage-related functions (blue); stable RNAs (purple); cell wall and cell processes (dark green); PE and PPE protein families (magenta); virulence, detoxification and adaptation (white). For additional information about gene functions, refer to http://www.sanger.ac.uk.

Genes encoding proteins. 3,924 open reading frames were identified in the genome (see Methods), accounting for ~91% of the potential coding capacity (Figs 1, 2).
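As a rough plausibility check on these figures, the quoted genome length, open reading frame count and coding fraction can be combined into a mean ORF length and a gene density. This is back-of-the-envelope arithmetic only; it assumes that "potential coding capacity" can be approximated by the full genome length, which the text does not state explicitly, and the variable names are introduced here for illustration.

```python
# Values quoted in the text.
GENOME_LENGTH_BP = 4_411_529   # composite sequence length
ORF_COUNT = 3_924              # open reading frames identified
CODING_FRACTION = 0.91         # approximate share of coding capacity used

# Assumption: coding capacity is approximated by the whole genome length.
coding_bp = GENOME_LENGTH_BP * CODING_FRACTION
mean_orf_bp = coding_bp / ORF_COUNT
orfs_per_kb = ORF_COUNT / (GENOME_LENGTH_BP / 1_000)

print(f"coding sequence: about {coding_bp / 1e6:.2f} Mb")
print(f"mean ORF length: about {mean_orf_bp:.0f} bp")
print(f"gene density: about {orfs_per_kb:.2f} ORFs per kb")
```

Under that assumption the mean ORF comes out at roughly 1 kb, with a little under one gene per kilobase.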
A few of these genes appear to have in-frame stop codons or frameshift mutations (irrespective of the source of the DNA sequenced) and may either use frameshifting during translation or correspond to pseudogenes. Consistent with the high G + C content of the genome, GTG initiation codons (35%) are used more frequently than in Bacillus subtilis (9%) and E. coli (14%), although ATG (61%) is the most common translational start. There are a few examples of atypical initiation codons, the most notable being the ATC used by infC, which begins with ATT in both B. subtilis and E. coli (refs 9, 14). There is a slight bias in the orientation of the genes (Fig. 1) with respect to the direction of replication as ~59% are transcribed with the same polarity as replication, compared with 75% in B. subtilis. In other bacteria, genes transcribed in the same direction as the replication forks are believed to be expressed more efficiently (refs 9, 14). Again, the more even distribution in gene polarity seen in M. tuberculosis may reflect the slow growth and infrequent replication cycles. Three genes (dnaB, recA and Rv1461) have been invaded by sequences encoding inteins (protein introns) and in all three cases their counterparts in M. leprae also contain inteins, but at different sites (ref. 15; S.T.C. et al., unpublished observations).

Protein function, composition and duplication. By using various database comparisons, we attributed precise functions to ~40% of the predicted proteins and found some information or similarity for another 44%. The remaining 16% resembled no known proteins and may account for specific mycobacterial functions. Examination of the amino-acid composition of the M. tuberculosis proteome by correspondence analysis (ref. 16), and comparison with that of other microorganisms whose genome sequences are available, revealed a statistically significant preference for the amino acids Ala, Gly, Pro, Arg and Trp, which are all encoded by G + C-rich codons, and a comparative reduction in the use of amino acids encoded by A + T-rich codons such as Asn, Ile, Lys, Phe and Tyr (Fig. 3). This approach also identified two groups of proteins rich in Asn or Gly that belong to new families, PE and PPE (see below). The fraction of the proteome that has arisen through gene duplication is similar to that seen in E. coli or B. subtilis (~51%; refs 9, 14), except that the level of sequence conservation is considerably higher, indicating that there may be extensive redundancy or differential production of the corresponding polypeptides. The apparent lack of divergence following gene duplication is consistent with the hypothesis that M. tuberculosis is of recent descent (ref. 6).

General metabolism, regulation and drug resistance

Metabolic pathways. From the genome sequence, it is clear that the tubercle bacillus has the potential to synthesize all the essential amino acids, vitamins and enzyme co-factors, although some of the pathways involved may differ from those found in other bacteria. M. tuberculosis can metabolize a variety of carbohydrates, hydrocarbons, alcohols, ketones and carboxylic acids (refs 2, 17). It is apparent from genome inspection that, in addition to many functions involved in lipid metabolism, the enzymes necessary for glycolysis, the pentose phosphate pathway, and the tricarboxylic acid and glyoxylate cycles are all present. A large number (~200) of oxidoreductases, oxygenases and dehydrogenases is predicted, as well as many oxygenases containing cytochrome P450, that are similar to fungal proteins involved in sterol degradation.
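The amino-acid composition comparison reported under "Protein function, composition and duplication" starts from per-residue frequencies across a proteome. The sketch below shows only that first counting step, under stated assumptions: correspondence analysis itself is not implemented, the function name is hypothetical, and the two input fragments are short stretches of the Rv0746 sequence shown in this article, used purely as toy input rather than as a representative proteome.

```python
from collections import Counter
from typing import Dict, Iterable


def aa_frequencies(proteins: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each residue across a set of protein sequences.

    A trailing '*' (stop symbol) is ignored. Returns an empty dict if no
    residues are supplied.
    """
    counts: Counter = Counter()
    for protein in proteins:
        counts.update(protein.upper().rstrip("*"))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in counts.items()}


# Toy input: two fragments of the glycine-rich PE-PGRS protein Rv0746.
toy_proteome = ["GSGAPGGAGGAAGLWGT", "NAGSPGTGGAGGLLLGQNGLNGLP*"]
freqs = aa_frequencies(toy_proteome)
for aa, freq in sorted(freqs.items(), key=lambda item: -item[1]):
    print(aa, round(freq, 3))
```

Frequency vectors of this kind, one per organism, would be the input to a multivariate method such as the correspondence analysis cited in the text.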
Under aerobic growth conditions, ATP will be generated by oxidative phosphorylation from electron transport chains involving a ubiquinone cytochrome b reductase complex and cytochrome c oxidase. Components of several anaerobic phosphorylative electron transport chains are also present, including genes for nitrate reductase (narGHJI), fumarate reductase (frdABCD) and possibly nitrite reductase (nirBD), as well as a new reductase (narX) that results from a rearrangement of a homologue of the narGHJI operon. Two genes encoding haemoglobin-like proteins, which may protect against oxidative stress or be involved in oxygen capture, were found. The ability of the bacillus to adapt its metabolism to environmental change is significant as it not only has to compete with the lung for oxygen but must also adapt to the microaerophilic/anaerobic environment at the heart of the burgeoning granuloma.

Regulation and signal transduction. Given the complexity of the environmental and metabolic choices facing M. tuberculosis, an extensive regulatory repertoire was expected. Thirteen putative sigma factors govern gene expression at the level of transcription initiation, and more than 100 regulatory proteins are predicted (Table 1). Unlike B. subtilis and E. coli, in which there are >30 copies of different two-component regulatory systems (ref. 14), M. tuberculosis has only 11 complete pairs of sensor histidine kinases and response regulators, and a few isolated kinase and regulatory genes. This relative paucity in environmental signal transduction pathways is probably offset by the presence of a family of eukaryotic-like serine/threonine protein kinases (STPKs), which function as part of a phosphorelay system (ref. 18). The STPKs probably have two domains: the well-conserved kinase domain at the amino terminus is predicted to be connected by a transmembrane segment to the carboxy-terminal region that may respond to specific stimuli.
Several of the predicted envelope lipoproteins, such as that encoded by lppR (Rv2403), show extensive similarity to this putative receptor domain of STPKs, suggesting possible interplay. The STPKs probably function in signal transduction pathways and may govern important cellular decisions such as dormancy and cell division, and although their partners are unknown, candidate genes for phosphoprotein phosphatases have been identified.

Drug resistance. M. tuberculosis is naturally resistant to many antibiotics, making treatment difficult (ref. 19). This resistance is due mainly to the highly hydrophobic cell envelope acting as a permeability barrier (ref. 4), but many potential resistance determinants are also encoded in the genome. These include hydrolytic or drug-modifying enzymes such as β-lactamases and aminoglycoside acetyl transferases, and many potential drug–efflux systems, such as 14 members of the major facilitator family and numerous ABC transporters. Knowledge of these putative resistance mechanisms will promote better use of existing drugs and facilitate the conception of new therapies.

[Figure 3 plot: amino acids and organism abbreviations positioned on the first factorial axis, F1 (55.2%), and the second factorial axis, F2 (22.6%).]

Figure 3 Correspondence analysis of the proteomes from extensively sequenced organisms as a function of amino-acid composition. Note the extreme position of M. tuberculosis and the shift in amino-acid preference reflecting increasing G + C content from left to right. Abbreviations used: Ae, Aquifex aeolicus; Af, Archaeoglobus fulgidus; Bb, Borrelia burgdorferi; Bs, B. subtilis; Ce, Caenorhabditis elegans; Ec, E. coli; Hi, Haemophilus influenzae; Hp, Helicobacter pylori; Mg, Mycoplasma genitalium; Mj, Methanococcus jannaschii; Mp, Mycoplasma pneumoniae; Mt, M. tuberculosis; Mth, Methanobacterium thermoautotrophicum; Sc, Saccharomyces cerevisiae; Ss, Synechocystis sp. strain PCC6803. F1 and F2, first and second factorial axes (ref. 16).

[Figure 4a diagram labels: host membranes; fadD1-36, fadE1-36, echA1-21, fadB2-5, fadA2-6; β-oxidation (fadA/fadB); accA1/accD1, accA2/accD2; citric acid cycle; glyoxylate shunt; NADH, FADH2, ATP, CO2; purines, pyrimidines, porphyrins, amino acids, glucose; cell wall and mycobacterial lipids.]

Figure 4 Lipid metabolism. a, Degradation of host-cell lipids is vital in the intracellular life of M. tuberculosis. Host-cell membranes provide precursors for many metabolic processes, as well as potential precursors of mycobacterial cell-wall constituents, through the actions of a broad family of β-oxidative enzymes encoded by multiple copies in the genome. These enzymes produce acetyl CoA, which can be converted into many different metabolites and fuel for the bacteria through the actions of the enzymes of the citric acid cycle and the glyoxylate shunt of this cycle. b, The genes that synthesize mycolic acids, the dominant lipid component of the mycobacterial cell wall, include the type I fatty acid synthase (fas) and a unique type II system which relies on extension of a precursor bound to an acyl carrier protein to form full-length (~80-carbon) mycolic acids. The cma genes are responsible for cyclopropanation. c, The genes that produce phthiocerol dimycocerosate form a large operon and represent type I (mas) and type II (the pps operon) polyketide synthase systems. Functions are colour coordinated.
Figure 4b gene key (gene map drawn on a scale of 0 to 8 kb)

Gene    Function
fas     Fatty acid synthase, produces C16–C26 acyl-CoA esters
fabD    Malonyl-CoA:AcpM acyltransferase, AcpM loading
acpM    Acyl carrier protein, meromycolate precursor transport
kasA    Ketoacyl acyl carrier protein synthase, chain elongation
kasB    Ketoacyl acyl carrier protein synthase, chain elongation
accD    Acetyl-CoA carboxylase, malonyl-CoA synthesis
mabA    3-oxo-acyl-acyl carrier protein reductase, reduces KasA/B product
inhA    Enoyl-acyl carrier protein reductase
cmaA1   Cyclopropane mycolic acid synthase 1, distal-position specific
cmaA2   Cyclopropane mycolic acid synthase 2, proximal-position specific

Figure 4c gene key (gene map of ppsA to ppsE and mas drawn on a scale of 0 to 40 kb)

Gene    Function
mas     Four rounds extension of C16,18 using Me-malonyl CoA
ppsA    Extension of C18 with malonyl CoA, partial reduction
ppsB    Extension with malonyl CoA, partial reduction
ppsC    Extension with malonyl CoA, complete reduction
ppsD    Extension with Me-malonyl CoA, partial reduction
ppsE    Extension with malonyl CoA, partial reduction, decarboxylation

Lipid metabolism

Very few organisms produce such a diverse array of lipophilic molecules as M. tuberculosis. These molecules range from simple fatty acids such as palmitate and tuberculostearate, through isoprenoids, to very-long-chain, highly complex molecules such as mycolic acids and the phenolphthiocerol alcohols that esterify with mycocerosic acid to form the scaffold for attachment of the mycosides. Mycobacteria contain examples of every known lipid and polyketide biosynthetic system, including enzymes usually found in mammals and plants as well as the common bacterial systems.
The biosynthetic capacity is overshadowed by the even more remarkable radiation of degradative, fatty acid oxidation systems and, in total, there are ~250 distinct enzymes involved in fatty acid metabolism in M. tuberculosis compared with only 50 in E. coli (ref. 20).
Fatty acid degradation. In vivo-grown mycobacteria have been suggested to be largely lipolytic, rather than lipogenic, because of the variety and quantity of lipids available within mammalian cells and the tubercle (ref. 2; Fig. 4a). The abundance of genes encoding components of fatty acid oxidation systems found by our genomic approach supports this proposition, as there are 36 acyl-CoA synthases and a family of 36 related enzymes that could catalyse the first step in fatty acid degradation. There are 21 homologous enzymes belonging to the enoyl-CoA hydratase/isomerase superfamily of enzymes, which rehydrate the nascent product of the acyl-CoA dehydrogenase. The four enzymes that convert the 3-hydroxy fatty acid into a 3-keto fatty acid appear less numerous, mainly because they are difficult to distinguish from other members of the short-chain alcohol dehydrogenase family on the basis of primary sequence. The five enzymes that complete the cycle by thiolysis of the β-ketoester, the acetyl-CoA C-acetyltransferases, do indeed appear to be a more limited family. In addition to this extensive set of dissociated degradative enzymes, the genome also encodes the canonical FadA/FadB β-oxidation complex (Rv0859 and Rv0860). Accessory activities are present for the metabolism of odd-chain and multiply unsaturated fatty acids.

Figure 5 The PE and PPE protein families. a, Classification of the PE and PPE protein families. b, Sequence variation between M. tuberculosis H37Rv and M. bovis BCG-Pasteur in the PE-PGRS encoded by open reading frame (ORF) Rv0746.
Panel a, representatives of the PE family: PE domain (~110 aa) followed by a PGRS region, (GGAGGA)n, of 0 to >1,400 aa, or by unique sequence of 0 to ~500 aa. Representatives of the PPE family: PPE domain (~180 aa) followed by an MPTR region, (NxGxGNxG)n, of ~200 to >3,500 aa, by a region of ~200 to ~400 aa containing the motif GxxSVPxxW, or by unique sequence of 0 to ~400 aa.
Panel b, amino-acid alignment of the PGRS region of Rv0746 from H37Rv and BCG: a 29-aa segment is present only in H37Rv and a 46-aa segment only in BCG; the remaining residues shown are identical.

Fatty acid biosynthesis. At least two discrete types of enzyme system, fatty acid synthase (FAS) I and FAS II, are involved in fatty acid biosynthesis in mycobacteria (Fig. 4b). FAS I (Rv2524, fas) is a single polypeptide with multiple catalytic activities that generates several shorter CoA esters from acetyl-CoA primers (ref. 5) and probably creates precursors for elongation by all of the other fatty acid and polyketide systems.
FAS II consists of dissociable enzyme components which act on a substrate bound to an acyl-carrier protein (ACP). FAS II is incapable of de novo fatty acid synthesis but instead elongates palmitoyl-ACP to fatty acids ranging from 24 to 56 carbons in length (refs 17, 21). Several different components of FAS II may be targets for the important tuberculosis drug isoniazid, including the enoyl-ACP reductase InhA (ref. 22), the ketoacyl-ACP synthase KasA and the ACP AcpM (ref. 21). Analysis of the genome shows that there are only three potential ketoacyl synthases: KasA and KasB are highly related, and their genes cluster with acpM, whereas KasC is a more distant homologue of a ketoacyl synthase III system. The number of ketoacyl synthase and ACP genes indicates that there is a single FAS II system. Its genetic organization, with two clustered ketoacyl synthases, resembles that of type II aromatic polyketide biosynthetic gene clusters, such as those for actinorhodin, tetracycline and tetracenomycin in Streptomyces species (ref. 23). InhA seems to be the sole enoyl-ACP reductase and its gene is co-transcribed with a fabG homologue, which encodes 3-oxoacyl-ACP reductase. Both of these proteins are probably important in the biosynthesis of mycolic acids.
Fatty acids are synthesized from malonyl-CoA and precursors are generated by the enzymatic carboxylation of acetyl- (or propionyl-)CoA by a biotin-dependent carboxylase (Fig. 4b). From study of the genome we predict that there are three complete carboxylase systems, each consisting of an α- and a β-subunit, as well as three β-subunits without an α-counterpart. As a group, all of the carboxylases seem to be more related to the mammalian homologues than to the corresponding bacterial enzymes. Two of these carboxylase systems (accA1, accD1 and accA2, accD2) are probably involved in degradation of odd-numbered fatty acids, as they are adjacent to genes for other known degradative enzymes.
They may convert propionyl-CoA to succinyl-CoA, which can then be incorporated into the tricarboxylic acid cycle. The synthetic carboxylases (accA3, accD3, accD4, accD5 and accD6) are more difficult to understand. The three extra β-subunits might direct carboxylation to the appropriate precursor or may simply increase the total amount of carboxylated precursor available if this step were rate-limiting.
Synthesis of the paraffinic backbone of fatty and mycolic acids in the cell is followed by extensive postsynthetic modifications and unsaturations, particularly in the case of the mycolic acids (refs 24, 25). Unsaturation is catalysed either by a FabA-like β-hydroxyacyl-ACP dehydrase, acting with a specific ketoacyl synthase, or by an aerobic terminal mixed function desaturase that uses both molecular oxygen and NADPH. Inspection of the genome revealed no obvious candidates for the FabA-like activity. However, three potential aerobic desaturases (encoded by desA1, desA2 and desA3) were evident that show little similarity to related vertebrate or yeast enzymes (which act on CoA esters) but instead resemble plant desaturases (which use ACP esters). Consequently, the genomic data indicate that unsaturation of the meromycolate chain may occur while the acyl group is bound to AcpM.
Much of the subsequent structural diversity in mycolic acids is generated by a family of S-adenosyl-L-methionine-dependent enzymes, which use the unsaturated meromycolic acid as a substrate to generate cis and trans cyclopropanes and other mycolates. Six members of this family have been identified and characterized (ref. 25) and two clustered, convergently transcribed new genes are evident in the genome (umaA1 and umaA2). From the functions of the known family members and the structures of mycolic acids in M. tuberculosis, it is tempting to speculate that these new enzymes may introduce the trans cyclopropanes into the meromycolate precursor.
In addition to these two methyltransferases, there are two other unrelated lipid methyltransferases (Ufa1 and Ufa2) that share homology with cyclopropane fatty acid synthase of E. coli (ref. 25). Although cyclopropanation seems to be a relatively common modification of mycolic acids, cyclopropanation of plasma-membrane constituents has not been described in mycobacteria. Tuberculostearic acid is produced by methylation of oleic acid, and may be synthesized by one of these two enzymes.
Condensation of the fully functionalized and preformed meromycolate chain with a 26-carbon α-branch generates full-length mycolic acids that must be transported to their final location for attachment to the cell-wall arabinogalactan. The transfer and subsequent transesterification are mediated by three well-known immunogenic proteins of the antigen 85 complex (ref. 26). The genome encodes a fourth member of this complex, antigen 85C′ (fbpC2, Rv0129), which is highly related to antigen 85C. Further studies are needed to show whether the protein possesses mycolyltransferase activity and to clarify the reason behind the apparent redundancy.
Polyketide synthesis. Mycobacteria synthesize polyketides by several different mechanisms. A modular type I system, similar to that involved in erythromycin biosynthesis (ref. 23), is encoded by a very large operon, ppsABCDE, and functions in the production of phenolphthiocerol (ref. 5). The absence of a second type I polyketide synthase suggests that the related lipids phthiocerol A and B, phthiodiolone A and phthiotriol may all be synthesized by the same system, either from alternative primers or by differential postsynthetic modification. It is physiologically significant that the pps gene cluster occurs immediately upstream of mas, which encodes the multifunctional enzyme mycocerosic acid synthase (MAS), as their products phthiocerol and mycocerosic acid esterify to form the very abundant cell-wall-associated molecule phthiocerol dimycocerosate (Fig. 4c).
Members of another large group of polyketide synthase enzymes are similar to MAS, which also generates the multiply methyl-branched fatty acid components of mycosides and phthiocerol dimycocerosate, abundant cell-wall-associated molecules (ref. 5). Although some of these polyketide synthases may extend type I FAS CoA primers to produce other long-chain methyl-branched fatty acids such as mycolipenic, mycolipodienic and mycolipanolic acids or the phthioceranic and hydroxyphthioceranic acids, or may even show functional overlap (ref. 5), there are many more of these enzymes than there are known metabolites. Thus there may be new lipid and polyketide metabolites that are expressed only under certain conditions, such as during infection and disease.
A fourth class of polyketide synthases is related to the plant enzyme superfamily that includes chalcone and stilbene synthase (ref. 23). These polyketide synthases are phylogenetically divergent from all other polyketide and fatty acid synthases and generate unreduced polyketides that are typically associated with anthocyanin pigments and flavonoids. The function of these systems, which are often linked to apparent type I modules, is unknown. An example is the gene cluster spanning pks10, pks7, pks8 and pks9, which includes two of the chalcone-synthase-like enzymes and two modules of an apparent type I system. The unknown metabolites produced by these enzymes are interesting because of the potent biological activities of some polyketides such as the immunosuppressor rapamycin.
Siderophores. Peptides that are not ribosomally synthesized are made by a process that is mechanistically analogous to polyketide synthesis (refs 23, 27). These peptides include the structurally related iron-scavenging siderophores, the mycobactins and the exochelins (refs 2, 28), which are derived from salicylate by the addition of serine (or threonine), two lysines and various fatty acids and possible polyketide segments.
The mbt operon, encoding one apparent salicylate-activating protein, three amino-acid ligases, and a single module of a type I polyketide synthase, may be responsible for the biosynthesis of the mycobacterial siderophores. The presence of only one nonribosomal peptide-synthesis system indicates that this pathway may generate both siderophores and that subsequent modification of a single ε-amino group of one lysine residue may account for the different physical properties and function of the siderophores (ref. 28).

Immunological aspects and pathogenicity
Given the scale of the global tuberculosis burden, vaccination is not only a priority but remains the only realistic public health intervention that is likely to affect both the incidence and the prevalence of the disease (ref. 29). Several areas of vaccine development are promising, including DNA vaccination, use of secreted or surface-exposed proteins as immunogens, recombinant forms of BCG and rational attenuation of M. tuberculosis (ref. 29). All of these avenues of research will benefit from the genome sequence as its availability will stimulate more focused approaches. Genes encoding ~90 lipoproteins were identified, some of which are enzymes or components of transport systems, and a similar number of genes encoding preproteins (with type I signal peptides) that are probably exported by the Sec-dependent pathway. M. tuberculosis seems to have two copies of secA. The potent T-cell antigen Esat-6 (ref. 30), which is probably secreted in a Sec-independent manner, is encoded by a member of a multigene family. Examination of the genetic context reveals several similarly organized operons that include genes encoding large ATP-hydrolysing membrane proteins that might act as transporters. One of the surprises of the genome project was the discovery of two extensive families of novel glycine-rich proteins, which may be of immunological significance as they are predicted to be abundant and potentially polymorphic antigens.
The PE and PPE multigene families. About 10% of the coding capacity of the genome is devoted to two large unrelated families of acidic, glycine-rich proteins, the PE and PPE families, whose genes are clustered (Figs 1, 2) and are often based on multiple copies of the polymorphic repetitive sequences referred to as PGRSs, and major polymorphic tandem repeats (MPTRs), respectively (refs 31, 32). The names PE and PPE derive from the motifs Pro–Glu (PE) and Pro–Pro–Glu (PPE) found near the N terminus in most cases (ref. 33). The 99 members of the PE protein family all have a highly conserved N-terminal domain of ~110 amino-acid residues that is predicted to have a globular structure, followed by a C-terminal segment that varies in size, sequence and repeat copy number (Fig. 5). Phylogenetic analysis separated the PE family into several subfamilies. The largest of these is the highly repetitive PGRS class, which contains 61 members; members of the other subfamilies share very limited sequence similarity in their C-terminal domains (Fig. 5). The predicted molecular weights of the PE proteins vary considerably as a few members contain only the N-terminal domain, whereas most have C-terminal extensions ranging in size from 100 to 1,400 residues. The PGRS proteins have a high glycine content (up to 50%), which is the result of multiple tandem repetitions of Gly–Gly–Ala or Gly–Gly–Asn motifs, or variations thereof.
The 68 members of the PPE protein family (Fig. 5) also have a conserved N-terminal domain that comprises ~180 amino-acid residues, followed by C-terminal segments that vary markedly in sequence and length. These proteins fall into at least three groups, one of which constitutes the MPTR class characterized by the presence of multiple, tandem copies of the motif Asn–X–Gly–X–Gly–Asn–X–Gly.
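The two sequence properties quoted for these families, glycine content and tandem copies of short motifs, are simple to measure on a protein sequence. The following is a minimal sketch in Python, not the analysis used for the genome annotation: the function names and the regular-expression encodings of the motifs are assumptions made for illustration, with "." standing for X (any residue). The example fragment is transcribed from the first H37Rv alignment block of Fig. 5b.

```python
import re


def glycine_fraction(seq: str) -> float:
    """Fraction of residues in a one-letter protein sequence that are glycine."""
    seq = seq.upper()
    return seq.count("G") / len(seq) if seq else 0.0


def count_motif(seq: str, pattern: str) -> int:
    """Count matches of a regular expression in seq, allowing overlaps."""
    # A zero-width lookahead consumes no characters, so matches that
    # overlap (as tandem repeats often do) are each counted once.
    return len(re.findall(f"(?=(?:{pattern}))", seq.upper()))


# Assumed encodings of the motifs named in the text.
PGRS_TRIPLET = "GG[AN]"   # Gly-Gly-Ala or Gly-Gly-Asn
MPTR_MOTIF = "N.G.GN.G"   # Asn-X-Gly-X-Gly-Asn-X-Gly

if __name__ == "__main__":
    # Fragment of the Rv0746 PGRS region (H37Rv), from Fig. 5b.
    fragment = "GSGAPGGAGGAAGLWGTGGAGGAGGSSAGGGGAGGAGGAGGWLLGDGGAGGIGGAST"
    print("glycine fraction:", round(glycine_fraction(fragment), 2))
    print("GGA/GGN triplets:", count_motif(fragment, PGRS_TRIPLET))
    print("MPTR motifs:", count_motif(fragment, MPTR_MOTIF))
```

Counting with a lookahead rather than a plain search is a deliberate choice here: adjacent copies of the MPTR motif can share their terminal Asn or Gly, and a non-overlapping search would undercount them.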
The second subgroup contains a characteristic, well-conserved motif around position 350, whereas the third contains proteins that are unrelated except for the presence of the common 180-residue PPE domain.
The subcellular location of the PE and PPE proteins is unknown and in only one case, that of a lipase (Rv3097), has a function been demonstrated. On examination of the protein database from the extensively sequenced M. leprae (ref. 15), no PGRS- or MPTR-related polypeptides were detected but a few proteins belonging to the non-MPTR subgroup of the PPE family were found. These proteins include one of the major antigens recognized by leprosy patients, the serine-rich antigen (ref. 34). Although it is too early to attribute biological functions to the PE and PPE families, it is tempting to speculate that they could be of immunological importance. Two interesting possibilities spring to mind. First, they could represent the principal source of antigenic variation in what is otherwise a genetically and antigenically homogeneous bacterium. Second, these glycine-rich proteins might interfere with immune responses by inhibiting antigen processing.
Several observations and results support the possibility of antigenic variation associated with both the PE and the PPE family proteins. The PGRS member Rv1759 is a fibronectin-binding protein of relative molecular mass 55,000 (ref. 35) that elicits a variable antibody response, indicating either that individuals mount different immune responses or that this PGRS protein may vary between strains of M. tuberculosis. The latter possibility is supported by restriction fragment length polymorphisms for various PGRS and MPTR sequences in clinical isolates (ref. 33). Direct support for genetic variation within both the PE and the PPE families was obtained by comparative DNA sequence analysis (Fig. 5).
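Differences of this kind can be read off a pairwise alignment by locating runs of gap characters in each aligned sequence. The sketch below, in Python, is an illustration only and not the comparison procedure used by the authors: the function names are hypothetical, and the example strings are the first alignment block of Fig. 5b, in which the BCG sequence carries a gap of 29 positions relative to H37Rv.

```python
from itertools import groupby


def gap_runs(aligned: str, gap: str = "-"):
    """Return (start, length) for each run of gap characters in an aligned sequence."""
    runs, pos = [], 0
    for is_gap, group in groupby(aligned, key=lambda c: c == gap):
        n = len(list(group))
        if is_gap:
            runs.append((pos, n))
        pos += n
    return runs


def indels(ref: str, other: str):
    """Segments absent from `other` (deletions) and absent from `ref` (insertions)."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    # A gap in `other` marks residues present only in `ref`, and vice versa.
    return {"deletions": gap_runs(other), "insertions": gap_runs(ref)}


if __name__ == "__main__":
    # First alignment block of Fig. 5b (Rv0746, PGRS region).
    h37rv = "GSGAPGGAGGAAGLWGTGGAGGAGGSSAGGGGAGGAGGAGGWLLGDGGAGGIGGAST"
    bcg = "GSGAPGGAGGAAGLWGT" + "-" * 29 + "GGAGGIGGAST"
    print(indels(h37rv, bcg))
```

Positions are zero-based alignment columns, so a run reported as (17, 29) starts after the first 17 aligned residues and spans 29 columns.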
The gene for the PE–PGRS protein Rv0746 of BCG differs from that in H37Rv by the deletion of 29 codons and the insertion of 46 codons. Similar variation was seen in the gene for the PPE protein Rv0442 (data not shown). As these differences were all associated with repetitive sequences they could have resulted from intergenic or intragenic recombinational events or, more probably, from strand slippage during replication (ref. 32). These mechanisms are known to generate antigenic variability in other bacterial pathogens (ref. 36).
There are several parallels between the PGRS proteins and the Epstein–Barr virus nuclear antigens (EBNAs). Members of both polypeptide families are glycine-rich, contain extensive Gly–Ala repeats, and exhibit variation in the length of the repeat region between different isolates. The Gly–Ala repeat region of EBNA1 functions as a cis-acting inhibitor of the ubiquitin/proteasome antigen-processing pathway that generates peptides presented in the context of major histocompatibility complex (MHC) class I molecules (refs 37, 38). MHC class I knockout mice are very susceptible to M. tuberculosis, underlining the importance of a cytotoxic T-cell response in protection against disease (refs 3, 39). Given the many potential effects of the PPE and PE proteins, it is important that further studies are performed to understand their activity. If extensive antigenic variability or reduced antigen presentation were indeed found, this would be significant for vaccine design and for understanding protective immunity in tuberculosis, and might even explain the varied responses seen in different BCG vaccination programmes (ref. 40).
Pathogenicity. Despite intensive research efforts, there is little information about the molecular basis of mycobacterial virulence (ref. 41). However, this situation should now change as the genome sequence will accelerate the study of pathogenesis as never before, because other bacterial factors that may contribute to virulence are becoming apparent.
Before the completion of the genome sequence, only three virulence factors had been described (ref. 41): catalase-peroxidase, which protects against reactive oxygen species produced by the phagocyte; mce, which encodes macrophage-colonizing factor (ref. 42); and a sigma factor gene, sigA (aka rpoV), mutations in which can lead to attenuation (ref. 41). In addition to these single-gene virulence factors, the mycobacterial cell wall (ref. 4) is also important in pathology, but the complex nature of its biosynthesis makes it difficult to identify critical genes whose inactivation would lead to attenuation.
On inspection of the genome sequence, it was apparent that four copies of mce were present and that these were all situated in operons, comprising eight genes, organized in exactly the same manner. In each case, the genes preceding mce code for integral membrane proteins, whereas mce and the following five genes are all predicted to encode proteins with signal sequences or hydrophobic stretches at the N terminus. These sets of proteins, about which little is known, may well be secreted or surface-exposed; this is consistent with the proposed role of Mce in invasion of host cells (ref. 42). Furthermore, a homologue of smpB, which has been implicated in intracellular survival of Salmonella typhimurium, has also been identified (ref. 43).
Among the other secreted proteins identified from the genome sequence that could act as virulence factors are a series of phospholipases C, lipases and esterases, which might attack cellular or vacuolar membranes, as well as several proteases. One of these phospholipases acts as a contact-dependent haemolysin (N. Stoker, personal communication). The presence of storage proteins in the bacillus, such as the haemoglobin-like oxygen captors described above, points to its ability to stockpile essential growth factors, allowing it to persist in the nutrient-limited environment of the phagosome.
In this regard, the ferritin-like proteins, encoded by bfrA and bfrB, may be important in intracellular survival as the capacity to acquire enough iron in the vacuole is very limited.

Methods
Sequence analysis. Initially, ~3.2 Mb of sequence was generated from cosmids (ref. 8) and the remainder was obtained from selected BAC clones (ref. 7) and 45,000 whole-genome shotgun clones. Sheared fragments (1.4–2.0 kb) from cosmids and BACs were cloned into M13 vectors, whereas genomic DNA was cloned in pUC18 to obtain both forward and reverse reads. The PGRS genes were grossly underrepresented in pUC18 but better covered in the BAC and cosmid M13 libraries. We used small-insert libraries (ref. 44) to sequence regions prone to compression or deletion and, in some cases, obtained sequences from products of the polymerase chain reaction or directly from BACs (ref. 7). All shotgun sequencing was performed with standard dye terminators to minimize compression problems, whereas finishing reactions used dRhodamine or BigDye terminators (http://www.sanger.ac.uk). Problem areas were verified by using dye primers. Thirty differences were found between the genomic shotgun sequences and the cosmids: twenty were due to sequencing errors and ten to mutations in cosmids (1 error per 320 kb). Less than 0.1% of the sequence was from areas of single-clone coverage, and ~0.2% was from one strand with only one sequencing chemistry.
Informatics. Sequence assembly involved PHRAP, GAP4 (ref. 45) and a customized Perl script that merges sequences from different libraries and generates segments that can be processed by several finishers simultaneously. Sequence analysis and annotation were managed by DIANA (B.G.B. et al., unpublished). Genes encoding proteins were identified by TB-parse (ref. 46) using a hidden Markov model trained on known M.
tuberculosis coding and noncoding regions and translation-initiation signals, with corroboration by positional base preference. Interrogation of the EMBL, TREMBL, SwissProt, PROSITE (ref. 47) and in-house databases involved BLASTN, BLASTX (ref. 48), DOTTER (http://www.sanger.ac.uk) and FASTA (ref. 49). tRNA genes were located and identified using tRNAscan and tRNAscan-SE (ref. 50). The complete sequence, a list of annotated cosmids and linking regions can be found on our website (http://www.sanger.ac.uk) and in MycDB (http://www.pasteur.fr/mycdb/).

Received 15 April; accepted 8 May 1998.

1. Snider, D. E. Jr, Raviglione, M. & Kochi, A. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.) 2–11 (Am. Soc. Microbiol., Washington DC, 1994).
2. Wheeler, P. R. & Ratledge, C. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.) 353–385 (Am. Soc. Microbiol., Washington DC, 1994).
3. Chan, J. & Kaufmann, S. H. E. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.) 271–284 (Am. Soc. Microbiol., Washington DC, 1994).
4. Brennan, P. J. & Draper, P. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.) 271–284 (Am. Soc. Microbiol., Washington DC, 1994).
5. Kolattukudy, P. E., Fernandes, N. D., Azad, A. K., Fitzmaurice, A. M. & Sirakova, T. D. Biochemistry and molecular genetics of cell-wall lipid biosynthesis in mycobacteria. Mol. Microbiol. 24, 263–270 (1997).
6. Sreevatsan, S. et al. Restricted structural gene polymorphism in the Mycobacterium tuberculosis complex indicates evolutionarily recent global dissemination. Proc. Natl Acad. Sci. USA 94, 9869–9874 (1997).
7. Brosch, R. et al. Use of a Mycobacterium tuberculosis H37Rv bacterial artificial chromosome library for genome mapping, sequencing and comparative genomics. Infect. Immun. 66, 2221–2229 (1998).
8. Philipp, W. J. et al.
An integrated map of the genome of the tubercle bacillus, Mycobacterium tuberculosis H37Rv, and comparison with Mycobacterium leprae. Proc. Natl Acad. Sci. USA 93, 3132–3137 (1996).
9. Blattner, F. R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1462 (1997).
10. Cole, S. T. & Saint-Girons, I. Bacterial genomics. FEMS Microbiol. Rev. 14, 139–160 (1994).
11. Freiberg, C. et al. Molecular basis of symbiosis between Rhizobium and legumes. Nature 387, 394–401 (1997).
12. Bardarov, S. et al. Conditionally replicating mycobacteriophages: a system for transposon delivery to Mycobacterium tuberculosis. Proc. Natl Acad. Sci. USA 94, 10961–10966 (1997).
13. Mahairas, G. G., Sabo, P. J., Hickey, M. J., Singh, D. C. & Stover, C. K. Molecular analysis of genetic differences between Mycobacterium bovis BCG and virulent M. bovis. J. Bacteriol. 178, 1274–1282 (1996).
14. Kunst, F. et al. The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature 390, 249–256 (1997).
15. Smith, D. R. et al. Multiplex sequencing of 1.5 Mb of the Mycobacterium leprae genome. Genome Res. 7, 802–819 (1997).
16. Greenacre, M. Theory and Application of Correspondence Analysis (Academic, London, 1984).
17. Ratledge, C. R. in The Biology of the Mycobacteria (eds Ratledge, C. & Stanford, J.) 53–94 (Academic, San Diego, 1982).
18. Av-Gay, Y. & Davies, J. Components of eukaryotic-like protein signaling pathways in Mycobacterium tuberculosis. Microb. Comp. Genomics 2, 63–73 (1997).
19. Cole, S. T. & Telenti, A. Drug resistance in Mycobacterium tuberculosis. Eur. Resp. Rev. 8, 701S–713S (1995).
20. Riley, M. & Labedan, B. in Escherichia coli and Salmonella (ed. Neidhardt, F. C.) 2118–2202 (ASM, Washington, 1996).
21. Mdluli, K. et al. Inhibition of a Mycobacterium tuberculosis β-ketoacyl ACP synthase by isoniazid. Science 280, 1607–1610 (1998).
22. Banerjee, A. et al.
inhA, a gene encoding a target for isoniazid and ethionamide in Mycobacterium tuberculosis. Science 263, 227–230 (1994).
23. Hopwood, D. A. Genetic contributions to understanding polyketide synthases. Chem. Rev. 97, 2465–2497 (1997).
24. Minnikin, D. E. in The Biology of the Mycobacteria (eds Ratledge, C. & Stanford, J.) 95–184 (Academic, London, 1982).
25. Barry, C. E. III et al. Mycolic acids: structure, biosynthesis, and physiological functions. Prog. Lipid Res. (in the press).
26. Belisle, J. T. et al. Role of the major antigen of Mycobacterium tuberculosis in cell wall biogenesis. Science 276, 1420–1422 (1997).
27. Marahiel, M. A., Stachelhaus, T. & Mootz, H. D. Modular peptide synthetases involved in nonribosomal peptide synthesis. Chem. Rev. 97, 2651–2673 (1997).
28. Gobin, J. et al. Iron acquisition by Mycobacterium tuberculosis: isolation and characterization of a family of iron-binding exochelins. Proc. Natl Acad. Sci. USA 92, 5189–5193 (1995).
29. Young, D. B. & Fruth, U. in New Generation Vaccines (eds Levine, M., Woodrow, G., Kaper, J. & Cobon, G. S.) 631–645 (Marcel Dekker, New York, 1997).
30. Sorensen, A. L., Nagai, S., Houen, G., Andersen, P. & Anderson, A. B. Purification and characterization of a low-molecular-mass T-cell antigen secreted by Mycobacterium tuberculosis. Infect. Immun. 63, 1710–1717 (1995).
31. Hermans, P. W. M., van Soolingen, D. & van Embden, J. D. A. Characterization of a major polymorphic tandem repeat in Mycobacterium tuberculosis and its potential use in the epidemiology of Mycobacterium kansasii and Mycobacterium gordonae. J. Bacteriol. 174, 4157–4165 (1992).
32. Poulet, S. & Cole, S. T. Characterisation of the polymorphic GC-rich repetitive sequence (PGRS) present in Mycobacterium tuberculosis. Arch. Microbiol. 163, 87–95 (1995).
33. Cole, S. T. & Barrell, B. G. in Genetics and Tuberculosis (eds Chadwick, D. J. & Cardew, G., Novartis Foundation Symp. 217) 160–172 (Wiley, Chichester, 1998).
34. Vega-Lopez, F. et al.
Sequence and immunological characterization of a serine-rich antigen from Mycobacterium leprae. Infect. Immun. 61, 2145–2153 (1993).
35. Abou-Zeid, C. et al. Genetic and immunological analysis of Mycobacterium tuberculosis fibronectin-binding proteins. Infect. Immun. 59, 2712–2718 (1991).
36. Robertson, B. D. & Meyer, T. F. Genetic variation in pathogenic bacteria. Trends Genet. 8, 422–427 (1992).
37. Levitskaya, J. et al. Inhibition of antigen processing by the internal repeat region of the Epstein-Barr virus nuclear antigen-1. Nature 375, 685–688 (1995).
38. Levitskaya, J., Sharipo, A., Leonchiks, A., Ciechanover, A. & Masucci, M. G. Inhibition of ubiquitin/proteasome-dependent protein degradation by the Gly-Ala repeat domain of the Epstein-Barr virus nuclear antigen 1. Proc. Natl Acad. Sci. USA 94, 12616–12621 (1997).
39. Flynn, J. L., Goldstein, M. A., Treibold, K. J., Koller, B. & Bloom, B. R. Major histocompatability complex class-I restricted T cells are required for resistance to Mycobacterium tuberculosis infection. Proc. Natl Acad. Sci. USA 89, 12013–12017 (1992).
40. Bloom, B. R. & Fine, P. E. M. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.) 531–557 (Am. Soc. Microbiol., Washington DC, 1994).
41. Collins, D. M. In search of tuberculosis virulence genes. Trends Microbiol. 4, 426–430 (1996).
42. Arruda, S., Bomfim, G., Knights, R., Huima-Byron, T. & Riley, L. W. Cloning of an M. tuberculosis DNA fragment associated with entry and survival inside cells. Science 261, 1454–1457 (1993).
43. Baumler, A. J., Kusters, J. G., Stojikovic, I. & Heffron, F. Salmonella typhimurium loci involved in survival within macrophages. Infect. Immun. 62, 1623–1630 (1994).
44. McMurray, A. A., Sulston, J. E. & Quail, M. A. Short-insert libraries as a method of problem solving in genome sequencing. Genome Res. 8, 562–566 (1998).
45. Bonfield, J. K., Smith, K. F. & Staden, R. A new DNA sequence assembly program. Nucleic Acids Res. 24, 4992–4999 (1995).
46. Krogh, A., Mian, I. S. & Haussler, D. A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Res. 22, 4768–4778 (1994).
47. Bairoch, A., Bucher, P. & Hofmann, K. The PROSITE database, its status in 1997. Nucleic Acids Res. 25, 217–221 (1997).
48. Altschul, S., Gish, W., Miller, W., Myers, E. & Lipman, D. A basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
49. Pearson, W. & Lipman, D. Improved tools for biological sequence comparisons. Proc. Natl Acad. Sci. USA 85, 2444–2448 (1988).
50. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic DNA. Nucleic Acids Res. 25, 955–964 (1997).

Acknowledgements. We thank Y. Av-Gay, F.-C. Bange, A. Danchin, B. Dujon, W. R. Jacobs Jr, L. Jones, M. McNeil, I. Moszer, P. Rice and J. Stephenson for advice, reagents and support. This work was supported by the Wellcome Trust. Additional funding was provided by the Association Française Raoul Follereau, the World Health Organisation and the Institut Pasteur. S.V.G. received a Wellcome Trust travelling research fellowship.

Correspondence and requests for materials should be addressed to B.G.B. (barrell@sanger.ac.uk) or S.T.C. (stcole@pasteur.fr). The complete sequence has been deposited in EMBL/GenBank/DDBJ as MTBH37RV, accession number AL123456.

Table 1. Functional classification of Mycobacterium tuberculosis protein-coding genes
I. Small-molecule metabolism
A. Degradation
1.
Carbon compounds
Rv0186 bglS β-glucosidase
Rv2202c cbhK carbohydrate kinase
Rv0727c fucA L-fuculose phosphate aldolase
Rv1731 gabD1 succinate-semialdehyde dehydrogenase
Rv0234c gabD2 succinate-semialdehyde dehydrogenase
Rv0501 galE1 UDP-glucose 4-epimerase
Rv0536 galE2 UDP-glucose 4-epimerase
Rv0620 galK galactokinase
Rv0619 galT galactose-1-phosphate uridylyltransferase C-term
Rv0618 galT' galactose-1-phosphate uridylyltransferase N-term
Rv0993 galU UTP-glucose-1-phosphate uridylyltransferase
Rv3696c glpK ATP:glycerol 3-phosphotransferase
Rv3255c manA mannose-6-phosphate isomerase
Rv3441c mrsA phosphoglucomutase or phosphomannomutase
Rv0118c oxcA oxalyl-CoA decarboxylase
Rv3068c pgmA phosphoglucomutase
Rv3257c pmmA phosphomannomutase
Rv3308 pmmB phosphomannomutase
Rv2702 ppgK polyphosphate glucokinase
Rv0408 pta phosphate acetyltransferase
Rv0729 xylB xylulose kinase
Rv1096 - carbohydrate degrading enzyme
2. Amino acids and amines
Rv1905c aao D-amino acid oxidase
Rv2531c adi ornithine/arginine decarboxylase
Rv2780 ald L-alanine dehydrogenase
Rv1538c ansA L-asparaginase
Rv1001 arcA arginine deiminase
Rv0753c mmsA methylmalonate semialdehyde dehydrogenase
Rv0751c mmsB methylmalonate semialdehyde oxidoreductase
Rv1187 rocA pyrroline-5-carboxylate dehydrogenase
Rv2322c rocD1 ornithine aminotransferase
Rv2321c rocD2 ornithine aminotransferase
Rv1848 ureA urease γ subunit
Rv1849 ureB urease β subunit
Rv1850 ureC urease α subunit
Rv1853 ureD urease accessory protein
Rv1851 ureF urease accessory protein
Rv1852 ureG urease accessory protein
Rv2913c - probable D-amino acid aminohydrolase
Rv3551 - possible glutaconate CoA-transferase
3.
Fatty acids
Rv2501c accA1 acetyl/propionyl-CoA carboxylase, α subunit
Rv0973c accA2 acetyl/propionyl-CoA carboxylase, α subunit
Rv2502c accD1 acetyl/propionyl-CoA carboxylase, β subunit
Rv0974c accD2 acetyl/propionyl-CoA carboxylase, β subunit
Rv3667 acs acetyl-CoA synthase
Rv3409c choD cholesterol oxidase
Rv0222 echA1 enoyl-CoA hydratase/isomerase superfamily
Rv0456c echA2 enoyl-CoA hydratase/isomerase superfamily
Rv0632c echA3 enoyl-CoA hydratase/isomerase superfamily
Rv0673 echA4 enoyl-CoA hydratase/isomerase superfamily
Rv0675 echA5 enoyl-CoA hydratase/isomerase superfamily
Rv0905 echA6 enoyl-CoA hydratase/isomerase superfamily (aka eccH)
Rv0971c echA7 enoyl-CoA hydratase/isomerase superfamily
Rv1070c echA8 enoyl-CoA hydratase/isomerase superfamily
Rv1071c echA9 enoyl-CoA hydratase/isomerase superfamily
Rv1142c echA10 enoyl-CoA hydratase/isomerase superfamily
Rv1141c echA11 enoyl-CoA hydratase/isomerase superfamily
Rv1472 echA12 enoyl-CoA hydratase/isomerase superfamily
Rv1935c echA13 enoyl-CoA hydratase/isomerase superfamily
Rv2486 echA14 enoyl-CoA hydratase/isomerase superfamily
Rv2679 echA15 enoyl-CoA hydratase/isomerase superfamily
Rv2831 echA16 enoyl-CoA hydratase/isomerase superfamily
Rv3039c echA17 enoyl-CoA hydratase/isomerase superfamily
Rv3373 echA18 enoyl-CoA hydratase/isomerase superfamily, N-term
Rv3374 echA18' enoyl-CoA hydratase/isomerase superfamily, C-term
Rv3516 echA19 enoyl-CoA hydratase/isomerase superfamily
Rv3550 echA20 enoyl-CoA hydratase/isomerase superfamily
Rv3774 echA21 enoyl-CoA hydratase/isomerase superfamily
Rv0859 fadA β oxidation complex, β subunit (acetyl-CoA C-acetyltransferase)
Rv0243 fadA2 acetyl-CoA C-acetyltransferase
Rv1074c fadA3 acetyl-CoA C-acetyltransferase
Rv1323 fadA4 acetyl-CoA C-acetyltransferase (aka thiL)
Rv3546 fadA5 acetyl-CoA C-acetyltransferase
Rv3556c fadA6 acetyl-CoA C-acetyltransferase
Rv0860 fadB β oxidation complex, α subunit (multiple activities)
Rv0468 fadB2 3-hydroxyacyl-CoA dehydrogenase
Rv1715 fadB3 3-hydroxyacyl-CoA dehydrogenase
Rv3141 fadB4 3-hydroxyacyl-CoA dehydrogenase
Rv1912c fadB5 3-hydroxyacyl-CoA dehydrogenase
Rv1750c fadD1 acyl-CoA synthase
Rv0270 fadD2 acyl-CoA synthase
Rv3561 fadD3 acyl-CoA synthase
Rv0214 fadD4 acyl-CoA synthase
Rv0166 fadD5 acyl-CoA synthase
Rv1206 fadD6 acyl-CoA synthase
Rv0119 fadD7 acyl-CoA synthase
Rv0551c fadD8 acyl-CoA synthase
Rv2590 fadD9 acyl-CoA synthase
Rv0099 fadD10 acyl-CoA synthase
Rv1550 fadD11 acyl-CoA synthase, N-term
Rv1549 fadD11' acyl-CoA synthase, C-term
Rv1427c fadD12 acyl-CoA synthase
Rv3089 fadD13 acyl-CoA synthase
Rv1058 fadD14 acyl-CoA synthase
Rv2187 fadD15 acyl-CoA synthase
Rv0852 fadD16 acyl-CoA synthase
Rv3506 fadD17 acyl-CoA synthase
Rv3513c fadD18 acyl-CoA synthase
Rv3515c fadD19 acyl-CoA synthase
Rv1185c fadD21 acyl-CoA synthase
Rv2948c fadD22 acyl-CoA synthase
Rv3826 fadD23 acyl-CoA synthase
Rv1529 fadD24 acyl-CoA synthase
Rv1521 fadD25 acyl-CoA synthase
Rv2930 fadD26 acyl-CoA synthase
Rv0275c fadD27 acyl-CoA synthase
Rv2941 fadD28 acyl-CoA synthase
Rv2950c fadD29 acyl-CoA synthase
Rv0404 fadD30 acyl-CoA synthase
Rv1925 fadD31 acyl-CoA synthase
Rv3801c fadD32 acyl-CoA synthase
Rv1345 fadD33 acyl-CoA synthase
Rv0035 fadD34 acyl-CoA synthase
Rv2505c fadD35 acyl-CoA synthase
Rv1193 fadD36 acyl-CoA synthase
Rv0131c fadE1 acyl-CoA dehydrogenase
Rv0154c fadE2 acyl-CoA dehydrogenase
Rv0215c fadE3 acyl-CoA dehydrogenase
Rv0231 fadE4 acyl-CoA dehydrogenase
Rv0244c fadE5 acyl-CoA dehydrogenase
Rv0271c fadE6 acyl-CoA dehydrogenase
Rv0400c fadE7 acyl-CoA dehydrogenase
Rv0672 fadE8 acyl-CoA dehydrogenase (aka aidB)
Rv0752c fadE9 acyl-CoA dehydrogenase
Rv0873 fadE10 acyl-CoA dehydrogenase
Rv0972c fadE12 acyl-CoA dehydrogenase
Rv0975c fadE13 acyl-CoA dehydrogenase
Rv1346 fadE14 acyl-CoA dehydrogenase
Rv1467c fadE15 acyl-CoA dehydrogenase
Rv1679 fadE16 acyl-CoA dehydrogenase
Rv1934c fadE17 acyl-CoA dehydrogenase
Rv1933c fadE18 acyl-CoA dehydrogenase
Rv2500c fadE19 acyl-CoA dehydrogenase (aka mmgC)
Rv2724c fadE20 acyl-CoA dehydrogenase
Rv2789c fadE21 acyl-CoA dehydrogenase
Rv3061c fadE22 acyl-CoA dehydrogenase
Rv3140 fadE23 acyl-CoA dehydrogenase
Rv3139 fadE24 acyl-CoA dehydrogenase
Rv3274c fadE25 acyl-CoA dehydrogenase
Rv3504 fadE26 acyl-CoA dehydrogenase
Rv3505 fadE27 acyl-CoA dehydrogenase
Rv3544c fadE28 acyl-CoA dehydrogenase
Rv3543c fadE29 acyl-CoA dehydrogenase
Rv3560c fadE30 acyl-CoA dehydrogenase
Rv3562 fadE31 acyl-CoA dehydrogenase
Rv3563 fadE32 acyl-CoA dehydrogenase
Rv3564 fadE33 acyl-CoA dehydrogenase
Rv3573c fadE34 acyl-CoA dehydrogenase
Rv3797 fadE35 acyl-CoA dehydrogenase
Rv3761c fadE36 acyl-CoA dehydrogenase
Rv1175c fadH 2,4-dienoyl-CoA reductase
Rv0855 far fatty acyl-CoA racemase
Rv1143 mcr α-methyl acyl-CoA racemase
Rv1492 mutA methylmalonyl-CoA mutase, β subunit
Rv1493 mutB methylmalonyl-CoA mutase, α subunit
Rv2504c scoA 3-oxo acid:CoA transferase, α subunit
Rv2503c scoB 3-oxo acid:CoA transferase, β subunit
Rv1136 - probable carnitine racemase
Rv1683 - possible acyl-CoA synthase
4. Phosphorus compounds
Rv2368c phoH ATP-binding pho regulon component
Rv1095 phoH2 PhoH-like protein
Rv3628 ppa probable inorganic pyrophosphatase
Rv2984 ppk polyphosphate kinase
B. Energy metabolism
1. Glycolysis
Rv1023 eno enolase
Rv0363c fba fructose bisphosphate aldolase
Rv1436 gap glyceraldehyde 3-phosphate dehydrogenase
Rv0489 gpm phosphoglycerate mutase I
Rv3010c pfkA phosphofructokinase I
Rv2029c pfkB phosphofructokinase II
Rv0946c pgi glucose-6-phosphate isomerase
Rv1437 pgk phosphoglycerate kinase
Rv1617 pykA pyruvate kinase
Rv1438 tpi triosephosphate isomerase
Rv2419c - putative phosphoglycerate mutase
Rv3837c - putative phosphoglycerate mutase
2.
Pyruvate dehydrogenase
Rv2241 aceE pyruvate dehydrogenase E1 component
Rv3303c lpdA dihydrolipoamide dehydrogenase
Rv2497c pdhA pyruvate dehydrogenase E1 component α subunit
Rv2496c pdhB pyruvate dehydrogenase E1 component β subunit
Rv2495c pdhC dihydrolipoamide acetyltransferase
Rv0462 - probable dihydrolipoamide dehydrogenase
3. TCA cycle
Rv1475c acn aconitate hydratase
Rv0889c citA citrate synthase 2
Rv2498c citE citrate lyase β chain
Rv1098c fum fumarase
Rv1131 gltA1 citrate synthase 3
Rv0896 gltA2 citrate synthase 1
Rv3339c icd1 isocitrate dehydrogenase
Rv0066c icd2 isocitrate dehydrogenase
Rv0794c lpdB dihydrolipoamide dehydrogenase
Rv1240 mdh malate dehydrogenase
Rv2967c pca pyruvate carboxylase
Rv3318 sdhA succinate dehydrogenase A
Rv3319 sdhB succinate dehydrogenase B
Rv3316 sdhC succinate dehydrogenase C subunit
Rv3317 sdhD succinate dehydrogenase D subunit
Rv1248c sucA 2-oxoglutarate dehydrogenase
Rv2215 sucB dihydrolipoamide succinyltransferase
Rv0951 sucC succinyl-CoA synthase β chain
Rv0952 sucD succinyl-CoA synthase α chain
4. Glyoxylate bypass
Rv0467 aceA isocitrate lyase
Rv1915 aceAa isocitrate lyase, α module
Rv1916 aceAb isocitrate lyase, β module
Rv1837c glcB malate synthase
Rv3323c gphA phosphoglycolate phosphatase
5. Pentose phosphate pathway
Rv1445c devB glucose-6-phosphate 1-dehydrogenase
Rv1844c gnd 6-phosphogluconate dehydrogenase (Gram –)
Rv1122 gnd2 6-phosphogluconate dehydrogenase (Gram +)
Rv1446c opcA unknown function, may aid G6PDH
Rv2436 rbsK ribokinase
Rv1408 rpe ribulose-phosphate 3-epimerase
Rv2465c rpi phosphopentose isomerase
Rv1448c tal transaldolase
Rv1449c tkt transketolase
Rv1121 zwf glucose-6-phosphate 1-dehydrogenase
Rv1447c zwf2 glucose-6-phosphate 1-dehydrogenase
6. Respiration
a. aerobic
Rv0527 ccsA cytochrome c-type biogenesis protein
Rv0529 ccsB cytochrome c-type biogenesis protein
Rv1451 ctaB cytochrome c oxidase assembly factor
Rv2200c ctaC cytochrome c oxidase chain II
Rv3043c ctaD cytochrome c oxidase polypeptide I
Rv2193 ctaE cytochrome c oxidase polypeptide III
Rv1542c glbN hemoglobin-like, oxygen carrier
Rv2470 glbO hemoglobin-like, oxygen carrier
Rv2249c glpD1 glycerol-3-phosphate dehydrogenase
Rv3302c glpD2 glycerol-3-phosphate dehydrogenase
Rv0694 lldD1 L-lactate dehydrogenase (cytochrome)
Rv1872c lldD2 L-lactate dehydrogenase
Rv1854c ndh probable NADH dehydrogenase
Rv3145 nuoA NADH dehydrogenase chain A
Rv3146 nuoB NADH dehydrogenase chain B
Rv3147 nuoC NADH dehydrogenase chain C
Rv3148 nuoD NADH dehydrogenase chain D
Rv3149 nuoE NADH dehydrogenase chain E
Rv3150 nuoF NADH dehydrogenase chain F
Rv3151 nuoG NADH dehydrogenase chain G
Rv3152 nuoH NADH dehydrogenase chain H
Rv3153 nuoI NADH dehydrogenase chain I
Rv3154 nuoJ NADH dehydrogenase chain J
Rv3155 nuoK NADH dehydrogenase chain K
Rv3156 nuoL NADH dehydrogenase chain L
Rv3157 nuoM NADH dehydrogenase chain M
Rv3158 nuoN NADH dehydrogenase chain N
Rv2195 qcrA Rieske iron-sulphur component of ubiQ-cytB reductase
Rv2196 qcrB cytochrome β component of ubiQ-cytB reductase
Rv2194 qcrC cytochrome b/c component of ubiQ-cytB reductase
b. anaerobic
Rv2392 cysH 3'-phosphoadenylylsulfate (PAPS) reductase
Rv2899c fdhD affects formate dehydrogenase-N
Rv2900c fdhF molybdopterin-containing oxidoreductase
Rv1552 frdA fumarate reductase flavoprotein subunit
Rv1553 frdB fumarate reductase iron sulphur protein
Rv1554 frdC fumarate reductase 15kD anchor protein
Rv1555 frdD fumarate reductase 13kD anchor protein
Rv1161 narG nitrate reductase α subunit
Rv1162 narH nitrate reductase β chain
Rv1164 narI nitrate reductase γ chain
Rv1163 narJ nitrate reductase δ chain
Rv1736c narX fused nitrate reductase
Rv2391 nirA probable nitrite reductase/sulphite reductase
Rv0252 nirB nitrite reductase flavoprotein
Rv0253 nirD probable nitrite reductase small subunit
c.
Electron transport
Rv0409 ackA acetate kinase
Rv1623c appC cytochrome bd-II oxidase subunit I
Rv1622c cydB cytochrome d ubiquinol oxidase subunit II
Rv1620c cydC ABC transporter
Rv1621c cydD ABC transporter
Rv2007c fdxA ferredoxin
Rv3554 fdxB ferredoxin
Rv1177 fdxC ferredoxin 4Fe-4S
Rv3503c fdxD probable ferredoxin
Rv3029c fixA electron transfer flavoprotein β subunit
Rv3028c fixB electron transfer flavoprotein α subunit
Rv3106 fprA adrenodoxin and NADPH ferredoxin reductase
Rv0886 fprB ferredoxin, ferredoxin-NADP reductase
Rv3251c rubA rubredoxin A
Rv3250c rubB rubredoxin B
7. Miscellaneous oxidoreductases and oxygenases 171
8. ATP-proton motive force
Rv1308 atpA ATP synthase α chain
Rv1304 atpB ATP synthase α chain
Rv1311 atpC ATP synthase e chain
Rv1310 atpD ATP synthase β chain
Rv1305 atpE ATP synthase c chain
Rv1306 atpF ATP synthase b chain
Rv1309 atpG ATP synthase γ chain
Rv1307 atpH ATP synthase δ chain
C. Central intermediary metabolism
1. General
Rv2589 gabT 4-aminobutyrate aminotransferase
Rv3432c gadB glutamate decarboxylase
Rv1832 gcvB glycine decarboxylase
Rv1826 gcvH glycine cleavage system H protein
Rv2211c gcvT T protein of glycine cleavage system
Rv1213 glgC glucose-1-phosphate adenylyltransferase
Rv3842c glpQ1 glycerophosphoryl diester phosphodiesterase
Rv0317c glpQ2 glycerophosphoryl diester phosphodiesterase
Rv3566c nhoA N-hydroxyarylamine o-acetyltransferase
Rv0155 pntAA pyridine transhydrogenase subunit α1
Rv0156 pntAB pyridine transhydrogenase subunit α2
Rv0157 pntB pyridine transhydrogenase subunit β
Rv1127c ppdK similar to pyruvate, phosphate dikinase
2. Gluconeogenesis
Rv0211 pckA phosphoenolpyruvate carboxykinase
Rv0069c sdaA L-serine dehydratase 1
3.
Sugar nucleotides
Rv1512 epiA nucleotide sugar epimerase
Rv3784 epiB probable UDP-galactose 4-epimerase
Rv1511 gmdA GDP-mannose 4,6 dehydratase
Rv0334 rmlA glucose-1-phosphate thymidyltransferase
Rv3264c rmlA2 glucose-1-phosphate thymidyltransferase
Rv3464 rmlB dTDP-glucose 4,6-dehydratase
Rv3634c rmlB2 dTDP-glucose 4,6-dehydratase
Rv3468c rmlB3 dTDP-glucose 4,6-dehydratase
Rv3465 rmlC dTDP-4-dehydrorhamnose 3,5-epimerase
Rv3266c rmlD dTDP-4-dehydrorhamnose reductase
Rv0322 udgA UDP-glucose dehydrogenase/GDP-mannose 6-dehydrogenase
Rv3265c wbbL dTDP-rhamnosyl transferase
Rv1525 wbbl2 dTDP-rhamnosyl transferase
Rv3400 - probable β-phosphoglucomutase
4. Amino sugars
Rv3436c glmS glucosamine-fructose-6-phosphate aminotransferase
5. Sulphur metabolism
Rv0711 atsA arylsulfatase
Rv3299c atsB probable arylsulfatase
Rv0663 atsD probable arylsulfatase
Rv3077 atsF probable arylsulfatase
Rv0296c atsG probable arylsulfatase
Rv3796 atsH probable arylsulfatase
Rv1285 cysD ATP:sulphurylase subunit 2
Rv1286 cysN ATP:sulphurylase subunit 1
Rv2131c cysQ homologue of M. leprae cysQ
Rv3248c sahH adenosylhomocysteinase
Rv3283 sseA thiosulfate sulfurtransferase
Rv2291 sseB thiosulfate sulfurtransferase
Rv3118 sseC thiosulfate sulfurtransferase
Rv0814c sseC2 thiosulfate sulfurtransferase
Rv3762c - probable alkyl sulfatase
D. Amino acid biosynthesis
1. Glutamate family
Rv1654 argB acetylglutamate kinase
Rv1652 argC N-acetyl-γ-glutamyl-phosphate reductase
Rv1655 argD acetylornithine aminotransferase
Rv1656 argF ornithine carbamoyltransferase
Rv1658 argG arginosuccinate synthase
Rv1659 argH arginosuccinate lyase
Rv1653 argJ glutamate N-acetyltransferase
Rv2220 glnA1 glutamine synthase class I
Rv2222c glnA2 glutamine synthase class II
Rv1878 glnA3 probable glutamine synthase
Rv2860c glnA4 probable glutamine synthase
Rv2918c glnD uridylyltransferase
Rv2221c glnE glutamate-ammonia-ligase adenyltransferase
Rv3859c gltB ferredoxin-dependent glutamate synthase
Rv3858c gltD small subunit of NADH-dependent glutamate synthase
Rv3704c gshA possible γ-glutamylcysteine synthase
Rv2427c proA γ-glutamyl phosphate reductase
Rv2439c proB glutamate 5-kinase
Rv0500 proC pyrroline-5-carboxylate reductase
2. Aspartate family
Rv3708c asd aspartate semialdehyde dehydrogenase
Rv3709c ask aspartokinase
Rv2201 asnB asparagine synthase B
Rv3565 aspB aspartate aminotransferase
Rv0337c aspC aspartate aminotransferase
Rv2753c dapA dihydrodipicolinate synthase
Rv2773c dapB dihydrodipicolinate reductase
Rv1202 dapE succinyl-diaminopimelate desuccinylase
Rv2141c dapE2 ArgE/DapE/Acy1/Cpg2/yscS family
Rv2726c dapF diaminopimelate epimerase
Rv1293 lysA diaminopimelate decarboxylase
Rv3341 metA homoserine o-acetyltransferase
Rv1079 metB cystathionine γ-synthase
Rv3340 metC cystathionine β-lyase
Rv1133c metE 5-methyltetrahydropteroyltriglutamate-homocysteine methyltransferase
Rv2124c metH 5-methyltetrahydrofolate-homocysteine methyltransferase
Rv1392 metK S-adenosylmethionine synthase
Rv0391 metZ o-succinylhomoserine sulfhydrylase
Rv1294 thrA homoserine dehydrogenase
Rv1296 thrB homoserine kinase
Rv1295 thrC homoserine synthase
3. Serine family
Rv0815c cysA2 thiosulfate sulfurtransferase
Rv3117 cysA3 thiosulfate sulfurtransferase
Rv2335 cysE serine acetyltransferase
Rv0511 cysG uroporphyrin-III c-methyltransferase
Rv2847c cysG2 multifunctional enzyme, siroheme synthase
Rv2334 cysK cysteine synthase A
Rv1336 cysM cysteine synthase B
Rv1077 cysM2 cystathionine β-synthase
Rv0848 cysM3 putative cysteine synthase
Rv1093 glyA serine hydroxymethyltransferase
Rv0070c glyA2 serine hydroxymethyltransferase
Rv2996c serA D-3-phosphoglycerate dehydrogenase
Rv0505c serB probable phosphoserine phosphatase
Rv3042c serB2 C-term similar to phosphoserine phosphatase
Rv0884c serC phosphoserine aminotransferase
4.
Aromatic amino acid family
Rv3227 aroA 3-phosphoshikimate 1-carboxyvinyl transferase
Rv2538c aroB 3-dehydroquinate synthase
Rv2537c aroD 3-dehydroquinate dehydratase
Rv2552c aroE shikimate 5-dehydrogenase
Rv2540c aroF chorismate synthase
Rv2178c aroG DAHP synthase
Rv2539c aroK shikimate kinase I
Rv3838c pheA prephenate dehydratase
Rv1613 trpA tryptophan synthase α chain
Rv1612 trpB tryptophan synthase β chain
Rv1611 trpC indole-3-glycerol phosphate synthase
Rv2192c trpD anthranilate phosphoribosyltransferase
Rv1609 trpE anthranilate synthase component I
Rv2386c trpE2 anthranilate synthase component I
Rv3754 tyrA prephenate dehydrogenase
5. Histidine
Rv1603 hisA phosphoribosylformimino-5-aminoimidazole carboxamide ribonucleotide isomerase
Rv1601 hisB imidazole glycerol-phosphate dehydratase
Rv1600 hisC histidinol-phosphate aminotransferase
Rv3772 hisC2 histidinol-phosphate aminotransferase
Rv1599 hisD histidinol dehydrogenase
Rv1605 hisF imidazole glycerol-phosphate synthase
Rv2121c hisG ATP phosphoribosyltransferase
Rv1602 hisH amidotransferase
Rv2122c hisI phosphoribosyl-AMP cyclohydrolase
Rv1606 hisI2 probable phosphoribosyl-AMP 1,6 cyclohydrolase
Rv0114 - similar to HisB
6. Pyruvate family
Rv3423c alr alanine racemase
7. Branched amino acid family
Rv1559 ilvA threonine deaminase
Rv3003c ilvB acetolactate synthase I large subunit
Rv3470c ilvB2 acetolactate synthase large subunit
Rv3001c ilvC ketol-acid reductoisomerase
Rv0189c ilvD dihydroxy-acid dehydratase
Rv2210c ilvE branched-chain-amino-acid transaminase
Rv1820 ilvG acetolactate synthase II
Rv3002c ilvN acetolactate synthase I small subunit
Rv3509c ilvX probable acetohydroxyacid synthase I large subunit
Rv3710 leuA α-isopropyl malate synthase
Rv2995c leuB 3-isopropylmalate dehydrogenase
Rv2988c leuC 3-isopropylmalate dehydratase large subunit
Rv2987c leuD 3-isopropylmalate dehydratase small subunit
E. Polyamine synthesis
Rv2601 speE spermidine synthase
F. Purines, pyrimidines, nucleosides and nucleotides
1. Purine ribonucleotide biosynthesis
Rv1389 gmk putative guanylate kinase
Rv3396c guaA GMP synthase
Rv1843c guaB1 inosine-5'-monophosphate dehydrogenase
Rv3411c guaB2 inosine-5'-monophosphate dehydrogenase
Rv3410c guaB3 inosine-5'-monophosphate dehydrogenase
Rv1017c prsA ribose-phosphate pyrophosphokinase
Rv0357c purA adenylosuccinate synthase
Rv0777 purB adenylosuccinate lyase
Rv0780 purC phosphoribosylaminoimidazolesuccinocarboxamide synthase
Rv0772 purD phosphoribosylamine-glycine ligase
Rv3275c purE phosphoribosylaminoimidazole carboxylase
Rv0808 purF amidophosphoribosyltransferase
Rv0957 purH phosphoribosylaminoimidazolecarboxamide formyltransferase
Rv3276c purK phosphoribosylaminoimidazole carboxylase ATPase subunit
Rv0803 purL phosphoribosylformylglycinamidine synthase II
Rv0809 purM 5'-phosphoribosyl-5-aminoimidazole synthase
Rv0956 purN phosphoribosylglycinamide formyltransferase I
Rv0788 purQ phosphoribosylformylglycinamidine synthase I
Rv0389 purT phosphoribosylglycinamide formyltransferase II
Rv2964 purU formyltetrahydrofolate deformylase
2. Pyrimidine ribonucleotide biosynthesis
Rv1383 carA carbamoyl-phosphate synthase subunit
Rv1384 carB carbamoyl-phosphate synthase subunit
Rv1380 pyrB aspartate carbamoyltransferase
Rv1381 pyrC dihydroorotase
Rv2139 pyrD dihydroorotate dehydrogenase
Rv1385 pyrF orotidine 5'-phosphate decarboxylase
Rv1699 pyrG CTP synthase
Rv2883c pyrH uridylate kinase
Rv0382c umpA probable uridine 5'-monophosphate synthase
3. 2'-deoxyribonucleotide metabolism
Rv0321 dcd deoxycytidine triphosphate deaminase
Rv2697c dut deoxyuridine triphosphatase
Rv0233 nrdB ribonucleoside-diphosphate reductase B2 (eukaryotic-like)
Rv3051c nrdE ribonucleoside diphosphate reductase α chain
Rv1981c nrdF ribonucleotide reductase small subunit
Rv3048c nrdG ribonucleoside-diphosphate small subunit
Rv3053c nrdH glutaredoxin electron transport component of NrdEF system
Rv3052c nrdI NrdI/YgaO/YmaA family
Rv3247c tmk thymidylate kinase
Rv2764c thyA thymidylate synthase
Rv0570 nrdZ ribonucleotide reductase, class II
Rv3752c - probable cytidine/deoxycytidylate deaminase
4. Salvage of nucleosides and nucleotides
Rv3313c add probable adenosine deaminase
Rv2584c apt adenine phosphoribosyltransferases
Rv3315c cdd probable cytidine deaminase
Rv3314c deoA thymidine phosphorylase
Rv0478 deoC deoxyribose-phosphate aldolase
Rv3307 deoD probable purine nucleoside phosphorylase
Rv3624c hpt probable hypoxanthine-guanine phosphoribosyltransferase
Rv3393 iunH probable inosine-uridine preferring nucleoside hydrolase
Rv0535 pnp phosphorylase from Pnp/MtaP family 2
Rv3309c upp uracil phosphoribosyltransferase
5. Miscellaneous nucleoside/nucleotide reactions
Rv0733 adk probable adenylate kinase
Rv2364c bex GTP-binding protein of Era/ThdF family
Rv1712 cmk cytidylate kinase
Rv2344c dgt probable deoxyguanosine triphosphate hydrolase
Rv2404c lepA GTP-binding protein LepA
Rv2727c miaA tRNA δ(2)-isopentenylpyrophosphate transferase
Rv2445c ndkA nucleoside diphosphate kinase
Rv2440c obg Obg GTP-binding protein
Rv2583c relA (p)ppGpp synthase I
G. Biosynthesis of cofactors, prosthetic groups and carriers
1. Biotin
Rv1568 bioA adenosylmethionine-8-amino-7-oxononanoate aminotransferase
Rv1589 bioB biotin synthase
Rv1570 bioD dethiobiotin synthase
Rv1569 bioF 8-amino-7-oxononanoate synthase
Rv0032 bioF2 C-terminal similar to B. subtilis BioF
Rv3279c birA biotin apo-protein ligase
Rv1442 bisC biotin sulfoxide reductase
Rv0089 - possible bioC biotin synthesis gene
2. Folic acid
Rv2763c dfrA dihydrofolate reductase
Rv2447c folC folylpolyglutamate synthase
Rv3356c folD methylenetetrahydrofolate dehydrogenase
Rv3609c folE GTP cyclohydrolase I
Rv3606c folK 7,8-dihydro-6-hydroxymethylpterin pyrophosphokinase
Rv3608c folP dihydropteroate synthase
Rv1207 folP2 dihydropteroate synthase
Rv3607c folX may be involved in folate biosynthesis
Rv0013 pabA p-aminobenzoate synthase glutamine amidotransferase
Rv1005c pabB p-aminobenzoate synthase
Rv0812 pabC aminodeoxychorismate lyase
3. Lipoate
Rv2218 lipA lipoate biosynthesis protein A
Rv2217 lipB lipoate biosynthesis protein B
4. Molybdopterin
Rv3109 moaA molybdenum cofactor biosynthesis, protein A
Rv0869c moaA2 molybdenum cofactor biosynthesis, protein A
Rv0438c moaA3 molybdenum cofactor biosynthesis, protein A
Rv3110 moaB molybdenum cofactor biosynthesis, protein B
Rv0984 moaB2 molybdenum cofactor biosynthesis, protein B
Rv3111 moaC molybdenum cofactor biosynthesis, protein C
Rv0864 moaC2 molybdenum cofactor biosynthesis, protein C
Rv3324c moaC3 molybdenum cofactor biosynthesis, protein C
Rv3112 moaD molybdopterin converting factor subunit 1
Rv0868c moaD2 molybdopterin converting factor subunit 1
Rv3119 moaE molybdopterin-converting factor subunit 2
Rv0866 moaE2 molybdopterin-converting factor subunit 2
Rv3322c moaE3 molybdopterin-converting factor subunit 2
Rv0994 moeA molybdopterin biosynthesis
Rv3116 moeB molybdopterin biosynthesis
Rv2338c moeW molybdopterin biosynthesis
Rv1681 moeX weak similarity to E. coli MoaA
Rv1355c moeY weak similarity to E. coli MoeB
Rv3206c moeZ probably involved in molybdopterin biosynthesis
Rv0865 mog molybdopterin biosynthesis
5. Pantothenate
Rv1092c coaA pantothenate kinase
Rv2225 panB 3-methyl-2-oxobutanoate hydroxymethyltransferase
Rv3602c panC pantoate-β-alanine ligase
Rv3601c panD aspartate 1-decarboxylase
6. Pyridoxine
Rv2607 pdxH pyridoxamine 5'-phosphate oxidase
7. Pyridine nucleotide
Rv1594 nadA quinolinate synthase
Rv1595 nadB L-aspartate oxidase
Rv1596 nadC nicotinate-nucleotide pyrophosphatase
8. Thiamine
Rv0423c thiC thiamine synthesis, pyrimidine moiety
Rv0422c thiD phosphomethylpyrimidine kinase
Rv0414c thiE thiamine synthesis, thiazole moiety
Rv0417 thiG thiamine synthesis, thiazole moiety
Rv2977c thiL probable thiamine-monophosphate kinase
9. Riboflavin
Rv1940 ribA GTP cyclohydrolase II
Rv1415 ribA2 probable GTP cyclohydrolase II
Rv1412 ribC riboflavin synthase α chain
Rv2671 ribD probable riboflavin deaminase
Rv2786c ribF riboflavin kinase
Rv1409 ribG riboflavin biosynthesis
Rv1416 ribH riboflavin synthase β chain
Rv3300c - probable deaminase, riboflavin synthesis
10. Thioredoxin, glutaredoxin and mycothiol
Rv0773c ggtA putative γ-glutamyl transpeptidase
Rv2394 ggtB γ-glutamyltranspeptidase precursor
Rv2855 gorA glutathione reductase homologue
Rv0816c thiX equivalent to M. leprae ThiX
Rv1470 trxA thioredoxin
Rv1471 trxB thioredoxin reductase
Rv3913 trxB2 thioredoxin reductase
Rv3914 trxC thioredoxin
11.
Menaquinone, PQQ, ubiquinone and other terpenoids
Rv2682c dxs 1-deoxy-D-xylulose 5-phosphate synthase
Rv0562 grcC1 heptaprenyl diphosphate synthase II
Rv0989c grcC2 heptaprenyl diphosphate synthase II
Rv3398c idsA geranylgeranyl pyrophosphate synthase
Rv2173 idsA2 geranylgeranyl pyrophosphate synthase
Rv3383c idsB transfergeranyl, similar geranyl pyrophosphate synthase
Rv0534c menA 4-dihydroxy-2-naphthoate octaprenyltransferase
Rv0548c menB naphthoate synthase
Rv0553 menC o-succinylbenzoate-CoA synthase
Rv0555 menD 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase
Rv0542c menE o-succinylbenzoic acid-CoA ligase
Rv3853 menG S-adenosylmethionine: 2-demethylmenaquinone
Rv3397c phyA phytoene synthase
Rv0693 pqqE coenzyme PQQ synthesis protein E
Rv0558 ubiE ubiquinone/menaquinone biosynthesis methyltransferase
12. Heme and porphyrin
Rv0509 hemA glutamyl-tRNA reductase
Rv0512 hemB δ-aminolevulinic acid dehydratase
Rv0510 hemC porphobilinogen deaminase
Rv2678c hemE uroporphyrinogen decarboxylase
Rv1300 hemK protoporphyrinogen oxidase
Rv0524 hemL glutamate-1-semialdehyde aminotransferase
Rv2388c hemN oxygen-independent coproporphyrinogen III oxidase
Rv2677c hemY' protoporphyrinogen oxidase
Rv1485 hemZ ferrochelatase
13. Cobalamin
Rv2849c cobA cob(I)alamin adenosyltransferase
Rv2848c cobB cobyrinic acid a,c-diamide synthase
Rv2231c cobC aminotransferase
Rv2236c cobD cobinamide synthase
Rv2064 cobG precorrin reductase
Rv2065 cobH precorrin isomerase
Rv2066 cobI CobI-CobJ fusion protein
Rv2070c cobK precorrin reductase
Rv2072c cobL probable methyltransferase
Rv2071c cobM precorrin-3 methylase
Rv2062c cobN cobalt insertion
Rv2208 cobS cobalamin (5'-phosphate) synthase
Rv2207 cobT nicotinate-nucleotide-dimethylbenzimidazole transferase
Rv0254c cobU cobinamide kinase
Rv0255c cobQ cobyric acid synthase
Rv3713 cobQ2 possible cobyric acid synthase
Rv0306 - similar to BluB cobalamin synthesis protein R. capsulatus
14. Iron utilization
Rv1876 bfrA bacterioferritin
Rv3841 bfrB bacterioferritin
Rv3215 entC probable isochorismate synthase
Rv3214 entD weak similarity to many phosphoglycerate mutases
Rv2895c viuB similar to proteins involved in vibriobactin uptake
Rv3525c - similar to ferripyochelin binding protein
H. Lipid biosynthesis
1. Synthesis of fatty and mycolic acids
Rv3285 accA3 acetyl/propionyl CoA carboxylase α subunit
Rv0904c accD3 acetyl/propionyl CoA carboxylase β subunit
Rv3799c accD4 acetyl/propionyl CoA carboxylase β subunit
Rv3280 accD5 acetyl/propionyl CoA carboxylase β subunit
Rv2247 accD6 acetyl/propionyl CoA carboxylase β subunit
Rv2244 acpM acyl carrier protein (meromycolate extension)
Rv2523c acpS CoA:apo-[ACP] pantetheinephosphotransferase
Rv2243 fabD malonyl CoA-[ACP] transacylase
Rv0649 fabD2 malonyl CoA-[ACP] transacylase
Rv1483 fabG1 3-oxoacyl-[ACP] reductase (aka MabA)
Rv1350 fabG2 3-oxoacyl-[ACP] reductase
Rv2002 fabG3 3-oxoacyl-[ACP] reductase
Rv0242c fabG4 3-oxoacyl-[ACP] reductase
Rv2766c fabG5 3-oxoacyl-[ACP] reductase
Rv0533c fabH β-ketoacyl-ACP synthase III
Rv2524c fas fatty acid synthase
Rv1484 inhA enoyl-[ACP] reductase
Rv2245 kasA β-ketoacyl-ACP synthase (meromycolate extension)
Rv2246 kasB β-ketoacyl-ACP synthase (meromycolate extension)
Rv1618 tesB1 thioesterase II
Rv2605c tesB2 thioesterase II
Rv0033 - possible acyl
carrier protein
Rv1344 - possible acyl carrier protein
Rv1722 - possible biotin carboxylase
Rv3221c - resembles biotin carboxyl carrier
Rv3472 - possible acyl carrier protein
2. Modification of fatty and mycolic acids
Rv3391 acrA1 fatty acyl-CoA reductase
Rv3392c cmaA1 cyclopropane mycolic acid synthase 1
Rv0503c cmaA2 cyclopropane mycolic acid synthase 2
Rv0824c desA1 acyl-[ACP] desaturase
Rv1094 desA2 acyl-[ACP] desaturase
Rv3229c desA3 acyl-[ACP] desaturase
Rv0645c mmaA1 methoxymycolic acid synthase 1
Rv0644c mmaA2 methoxymycolic acid synthase 2
Rv0643c mmaA3 methoxymycolic acid synthase 3
Rv0642c mmaA4 methoxymycolic acid synthase 4
Rv0447c ufaA1 unknown fatty acid methyltransferase
Rv3538 ufaA2 unknown fatty acid methyltransferase
Rv0469 umaA1 unknown mycolic acid methyltransferase
Rv0470c umaA2 unknown mycolic acid methyltransferase
3. Acyltransferases, mycoloyltransferases and phospholipid synthesis
Rv2289 cdh CDP-diacylglycerol phosphatidylhydrolase
Rv2881c cdsA phosphatidate cytidylyltransferase
Rv3804c fbpA antigen 85A, mycolyltransferase
Rv1886c fbpB antigen 85B, mycolyltransferase
Rv3803c fbpC1 antigen 85C, mycolyltransferase
Rv0129c fbpC2 antigen 85C', mycolyltransferase
Rv0564c gpdA1 glycerol-3-phosphate dehydrogenase
Rv2982c gpdA2 glycerol-3-phosphate dehydrogenase
Rv2612c pgsA CDP-diacylglycerol-glycerol-3-phosphate phosphatidyltransferase
Rv1822 pgsA2 CDP-diacylglycerol-glycerol-3-phosphate phosphatidyltransferase
Rv2746c pgsA3 CDP-diacylglycerol-glycerol-3-phosphate phosphatidyltransferase
Rv1551 plsB1 glycerol-3-phosphate acyltransferase
Rv2482c plsB2 glycerol-3-phosphate acyltransferase
Rv0437c psd putative phosphatidylserine decarboxylase
Rv0436c pssA CDP-diacylglycerol-serine o-phosphatidyltransferase
Rv0045c - possible dihydrolipoamide acetyltransferase
Rv0914c - lipid transfer protein
Rv1543 - probable fatty-acyl CoA reductase
Rv1627c - lipid carrier protein
Rv1814 - possible C-5 sterol desaturase
Rv1867 - similar to acetyl CoA synthase/lipid carriers
Rv2261c - apolipoprotein N-acyltransferase-a
Rv2262c - apolipoprotein N-acyltransferase-b
Rv3523 - lipid carrier protein
Rv3720 - C-term similar to cyclopropane fatty acid synthases
I. Polyketide and non-ribosomal peptide synthesis
Rv2940c mas mycocerosic acid synthase
Rv2384 mbtA mycobactin/exochelin synthesis (salicylate-AMP ligase)
Rv2383c mbtB mycobactin/exochelin synthesis (serine/threonine ligation)
Rv2382c mbtC mycobactin/exochelin synthesis
Rv2381c mbtD mycobactin/exochelin synthesis (polyketide synthase)
Rv2380c mbtE mycobactin/exochelin synthesis (lysine ligation)
Rv2379c mbtF mycobactin/exochelin synthesis (lysine ligation)
Rv2378c mbtG mycobactin/exochelin synthesis (lysine hydroxylase)
Rv2377c mbtH mycobactin/exochelin synthesis
Rv0101 nrp unknown non-ribosomal peptide synthase
Rv1153c omt PKS o-methyltransferase
Rv3824c papA1 PKS-associated protein, unknown function
Rv3820c papA2 PKS-associated protein, unknown function
Rv1182 papA3 PKS-associated protein, unknown function
Rv1528c papA4 PKS-associated protein, unknown function
Rv2939 papA5 PKS-associated protein, unknown function
Rv2946c pks1 polyketide synthase
Rv1660 pks10 polyketide synthase (chalcone synthase-like)
Rv1665 pks11 polyketide synthase (chalcone synthase-like)
Rv2048c pks12 polyketide synthase (erythronolide synthase-like)
Rv3800c pks13 polyketide synthase
Rv1342c pks14 polyketide synthase (chalcone synthase-like)
Rv2947c pks15 polyketide synthase
Rv1013 pks16 polyketide synthase
Rv1663 pks17 polyketide synthase
Rv1372 pks18 polyketide synthase
Rv3825c pks2 polyketide synthase
Rv1180 pks3 polyketide synthase
Rv1181 pks4 polyketide synthase
Rv1527c pks5 polyketide synthase
Rv0405 pks6 polyketide synthase
Rv1661 pks7 polyketide synthase
Rv1662 pks8 polyketide synthase
Rv1664 pks9 polyketide synthase
Rv2931 ppsA phenolpthiocerol synthesis (pksB)
Rv2932 ppsB phenolpthiocerol synthesis (pksC)
Rv2933 ppsC phenolpthiocerol synthesis (pksD)
Rv2934 ppsD phenolpthiocerol synthesis (pksE)
Rv2935 ppsE phenolpthiocerol synthesis (pksF)
Rv2928 tesA thioesterase
Rv1544 - probable ketoacyl reductase
J. Broad regulatory functions
1. Repressors/activators
Rv1657 argR arginine repressor
Rv1267c embR regulator of embAB genes (AfsR/DndI/RedD family)
Rv1909c furA ferric uptake regulatory protein
Rv2359 furB ferric uptake regulatory protein
Rv2919c glnB nitrogen regulatory protein
Rv2711 ideR iron dependent repressor, IdeR
Rv2720 lexA LexA, SOS repressor protein
Rv1479 moxR transcriptional regulator, MoxR homologue
Rv3692 moxR2 transcriptional regulator, MoxR homologue
Rv3164c moxR3 transcriptional regulator, MoxR homologue
Rv0212c nadR similar to E. coli NadR
Rv0117 oxyS transcriptional regulator (LysR family)
Rv1379 pyrR regulatory protein pyrimidine biosynthesis
Rv2788 sirR iron-dependent transcriptional repressor
Rv3082c virS putative virulence regulating protein (AraC/XylS family)
Rv3219 whiB1 WhiB transcriptional activator homologue
Rv3260c whiB2 WhiB transcriptional activator homologue
Rv3416 whiB3 WhiB transcriptional activator homologue
Rv3681c whiB4 WhiB transcriptional activator homologue
Rv0023 - putative transcriptional regulator
Rv0043c - transcriptional regulator (GntR family)
Rv0067c - transcriptional regulator (TetR/AcrR family)
Rv0078 - transcriptional regulator (TetR/AcrR family)
Rv0081 - transcriptional regulator (ArsR family)
Rv0135c - putative transcriptional regulator
Rv0144 - putative transcriptional regulator
Rv0158 - transcriptional regulator (TetR/AcrR family)
Rv0165c - transcriptional regulator (GntR family)
Rv0195 - transcriptional regulator (LuxR/UhpA family)
Rv0196 - transcriptional regulator (TetR/AcrR family)
Rv0232 - transcriptional regulator (TetR/AcrR family)
Rv0238 - transcriptional regulator (TetR/AcrR family)
Rv0273c - putative transcriptional regulator
Rv0302 - transcriptional regulator (TetR/AcrR family)
Rv0324 - putative transcriptional regulator
Rv0328 - transcriptional regulator (TetR/AcrR family)
Rv0348 - putative transcriptional
regulator
Rv0377 transcriptional regulator (LysR family)
Rv0386 transcriptional regulator (LuxR/UhpA family)
Rv0452 putative transcriptional regulator
Rv0465c transcriptional regulator (PbsX/Xre family)
Rv0472c transcriptional regulator (TetR/AcrR family)
Rv0474 transcriptional regulator (PbsX/Xre family)
Rv0485 transcriptional regulator (ROK family)
Rv0494 transcriptional regulator (GntR family)
Rv0552 putative transcriptional regulator
Rv0576 putative transcriptional regulator
Rv0586 transcriptional regulator (GntR family)
Rv0650 transcriptional regulator (ROK family)
Rv0653c putative transcriptional regulator
Rv0681 transcriptional regulator (TetR/AcrR family)
Rv0691c transcriptional regulator (TetR/AcrR family)
Rv0737 putative transcriptional regulator
Rv0744c putative transcriptional regulator
Rv0792c transcriptional regulator (GntR family)
[Figure: gene map of the chromosome, with coordinate ticks every 150,000 bp from 1 to 4,411,529. Gene labels are not recoverable from this transcription.]
Rv0823c transcriptional regulator (NifR3/Smm1 family)
Rv0827c transcriptional regulator (ArsR family)
Rv0890c transcriptional regulator (LuxR/UhpA family)
Rv0891c putative transcriptional regulator
Rv0894 putative transcriptional regulator
Rv1019 transcriptional regulator (TetR/AcrR family)
Rv1049 transcriptional regulator (MarR family)
Rv1129c transcriptional regulator (PbsX/Xre family)
Rv1151c putative transcriptional regulator
Rv1152 transcriptional regulator (GntR family)
Rv1167c putative transcriptional regulator
Rv1219c putative transcriptional regulator
Rv1255c transcriptional regulator (TetR/AcrR family)
Rv1332 putative transcriptional regulator
Rv1353c transcriptional regulator (TetR/AcrR family)
Rv1358 transcriptional regulator (LuxR/UhpA family)
Rv1359 putative transcriptional regulator
Rv1395 transcriptional regulator (AraC/XylS family)
Rv1404 transcriptional regulator (MarR family)
Rv1423 putative transcriptional regulator
Rv1460 putative transcriptional regulator
Rv1474c transcriptional regulator (TetR/AcrR family)
Rv1534 transcriptional regulator (TetR/AcrR family)
Rv1556 putative transcriptional regulator
Rv1674c putative transcriptional regulator
Rv1675c putative transcriptional regulator
Rv1719 transcriptional regulator (IclR family)
Rv1773c transcriptional regulator (IclR family)
Rv1776c putative transcriptional regulator
Rv1816 putative transcriptional regulator
Rv1846c putative transcriptional regulator
Rv1931c transcriptional regulator (AraC/XylS family)
Rv1956 putative transcriptional regulator
Rv1963c putative transcriptional regulator
Rv1985c transcriptional regulator (LysR family)
Rv1990c putative transcriptional regulator
Rv1994c transcriptional regulator (MerR family)
Rv2017 putative transcriptional regulator (PbsX/Xre family)
Rv2021c putative transcriptional regulator
Rv2034 transcriptional regulator (ArsR family)
Rv2175c putative transcriptional regulator
Rv2250c putative transcriptional regulator
Rv2258c putative transcriptional regulator
Rv2282c transcriptional regulator (LysR family)
Rv2308 putative transcriptional regulator
Rv2324 transcriptional regulator (Lrp/AsnC family)
Rv2358 transcriptional regulator (ArsR family)
Rv2488c transcriptional regulator (LuxR/UhpA family)
Rv2506 transcriptional regulator (TetR/AcrR family)
Rv2621c putative transcriptional regulator
Rv2640c transcriptional regulator (ArsR family)
Rv2642 transcriptional regulator (ArsR family)
Rv2669 putative transcriptional regulator
Rv2745c putative transcriptional regulator
Rv2779c transcriptional regulator (Lrp/AsnC family)
Rv2887 transcriptional regulator (MarR family)
Rv2912c transcriptional regulator (TetR/AcrR family)
Rv2989 transcriptional regulator (IclR family)
Rv3050c putative transcriptional regulator
Rv3055 putative transcriptional regulator
Rv3058c putative transcriptional regulator
Rv3060c transcriptional regulator (GntR family)
Rv3066 putative transcriptional regulator
Rv3095 putative transcriptional regulator
Rv3124 transcriptional regulator (AfsR/DndI/RedD family)
Rv3160c putative transcriptional regulator
Rv3167c putative transcriptional regulator
Rv3173c transcriptional regulator (TetR/AcrR family)
Rv3183 putative transcriptional regulator
Rv3208 transcriptional regulator (TetR/AcrR family)
Rv3249c transcriptional regulator (TetR/AcrR family)
Rv3291c transcriptional regulator (Lrp/AsnC family)
Rv3295 transcriptional regulator (TetR/AcrR family)
Rv3334 transcriptional regulator (MerR family)
Rv3405c putative transcriptional regulator
Rv3522 putative transcriptional regulator
Rv3557c transcriptional regulator (TetR/AcrR family)
Rv3574 transcriptional regulator (TetR/AcrR family)
Rv3575c transcriptional regulator (LacI family)
Rv3583c putative transcriptional regulator
Rv3676 transcriptional regulator (Crp/Fnr family)
Rv3678c transcriptional regulator (LysR family)
Rv3736 transcriptional regulator (AraC/XylS family)
Rv3744 transcriptional regulator (ArsR family)
Rv3830c transcriptional regulator (TetR/AcrR family)
Rv3833 transcriptional regulator (AraC/XylS family)
Rv3840 putative transcriptional regulator
Rv3855 putative transcriptional regulator
2. Two-component systems
Rv1028c kdpD sensor histidine kinase
Rv1027c kdpE two-component response regulator
Rv3246c mtrA two-component response regulator
Rv3245c mtrB sensor histidine kinase
Rv0844c narL two-component response regulator
Rv0757 phoP two-component response regulator
Rv0758 phoR sensor histidine kinase
Rv0491 regX3 two-component response regulator
Rv0490 senX3 sensor histidine kinase
Rv0602c tcrA two-component response regulator
Rv0260c two-component response regulator
Rv0600c sensor histidine kinase
Rv0601c sensor histidine kinase
Rv0818 two-component response regulator
Rv0845 sensor histidine kinase
Rv0902c sensor histidine kinase
Rv0903c two-component response regulator
Rv0981 two-component response regulator
Rv0982 sensor histidine kinase
Rv1032c sensor histidine kinase
Rv1033c two-component response regulator
Rv1626 two-component response regulator
Rv2027c sensor histidine kinase
Rv2884 two-component response regulator
Rv3132c sensor histidine kinase
Rv3133c two-component response regulator
Rv3143 putative
sensory transduction protein Rv3220c sensor histidine kinase Rv3764c sensor histidine kinase Rv3765c two-component response regulator 3. Serine-threonine protein kinases and phosphoprotein phosphatases Rv0015c pknA serine-threonine protein kinase Rv0014c pknB serine-threonine protein kinase Rv0931c pknD serine-threonine protein kinase Rv1743 pknE serine-threonine protein kinase Rv1746 pknF serine-threonine protein kinase Rv0410c pknG serine-threonine protein kinase Rv1266c pknH serine-threonine protein kinase Rv2914c pknI serine-threonine protein kinase Rv2088 pknJ serine-threonine protein kinase Rv3080c pknK serine-threonine protein kinase Rv2176 pknL serine-threonine protein kinase, Nature © Macmillan Publishers Ltd 1998 Rv0018c ppp Rv2234 ptpA Rv0153c - truncated putative phosphoprotein phosphatase low molecular weight protein-tyrosine-phosphatase putative protein-tyrosine-phosphatase II. Macromolecule metabolism A. Synthesis and modification of macromolecules 1. Ribosomal protein synthesis and modification Rv3420c rimI ribosomal protein S18 acetyl transferase Rv0995 rimJ acetylation of 30S S5 subunit Rv0641 rplA 50S ribosomal protein L1 Rv0704 rplB 50S ribosomal protein L2 Rv0701 rplC 50S ribosomal protein L3 Rv0702 rplD 50S ribosomal protein L4 Rv0716 rplE 50S ribosomal protein L5 Rv0719 rplF 50S ribosomal protein L6 Rv0056 rplI 50S ribosomal protein L9 Rv0651 rplJ 50S ribosomal protein L10 Rv0640 rplK 50S ribosomal protein L11 Rv0652 rplL 50S ribosomal protein L7/L12 Rv3443c rplM 50S ribosomal protein L13 Rv0714 rplN 50S ribosomal protein L14 Rv0723 rplO 50S ribosomal protein L15 Rv0708 rplP 50S ribosomal protein L16 Rv3456c rplQ 50S ribosomal protein L17 Rv0720 rplR 50S ribosomal protein L18 Rv2904c rplS 50S ribosomal protein L19 Rv1643 rplT 50S ribosomal protein L20 Rv2442c rplU 50S ribosomal protein L21 Rv0706 rplV 50S ribosomal protein L22 Rv0703 rplW 50S ribosomal protein L23 Rv0715 rplX 50S ribosomal protein L24 Rv1015c rplY 50S ribosomal protein L25 
Rv2441c rpmA 50S ribosomal protein L27
Rv0105c rpmB 50S ribosomal protein L28
Rv2058c rpmB2 50S ribosomal protein L28
Rv0709 rpmC 50S ribosomal protein L29
Rv0722 rpmD 50S ribosomal protein L30
Rv1298 rpmE 50S ribosomal protein L31
Rv2057c rpmG 50S ribosomal protein L33
Rv3924c rpmH 50S ribosomal protein L34
Rv1642 rpmI 50S ribosomal protein L35
Rv3461c rpmJ 50S ribosomal protein L36
Rv1630 rpsA 30S ribosomal protein S1
Rv2890c rpsB 30S ribosomal protein S2
Rv0707 rpsC 30S ribosomal protein S3
Rv3458c rpsD 30S ribosomal protein S4
Rv0721 rpsE 30S ribosomal protein S5
Rv0053 rpsF 30S ribosomal protein S6
Rv0683 rpsG 30S ribosomal protein S7
Rv0718 rpsH 30S ribosomal protein S8
Rv3442c rpsI 30S ribosomal protein S9
Rv0700 rpsJ 30S ribosomal protein S10
Rv3459c rpsK 30S ribosomal protein S11
Rv0682 rpsL 30S ribosomal protein S12
Rv3460c rpsM 30S ribosomal protein S13
Rv0717 rpsN 30S ribosomal protein S14
Rv2056c rpsN2 30S ribosomal protein S14
Rv2785c rpsO 30S ribosomal protein S15
Rv2909c rpsP 30S ribosomal protein S16
Rv0710 rpsQ 30S ribosomal protein S17
Rv0055 rpsR 30S ribosomal protein S18
Rv2055c rpsR2 30S ribosomal protein S18
Rv0705 rpsS 30S ribosomal protein S19
Rv2412 rpsT 30S ribosomal protein S20
Rv3241c - member of S30AE ribosomal protein family
2. Ribosome modification and maturation
Rv1010 ksgA 16S rRNA dimethyltransferase
Rv2838c rbfA ribosome-binding factor A
Rv2907c rimM 16S rRNA processing protein
3. Aminoacyl tRNA synthases and their modification
Rv2555c alaS alanyl-tRNA synthase
Rv1292 argS arginyl-tRNA synthase
Rv2572c aspS aspartyl-tRNA synthase
Rv3580c cysS cysteinyl-tRNA synthase
Rv2130c cysS2 cysteinyl-tRNA synthase
Rv1406 fmt methionyl-tRNA formyltransferase
Rv3011c gatA glu-tRNA-gln amidotransferase, subunit B
Rv3009c gatB glu-tRNA-gln amidotransferase, subunit A
Rv3012c gatC glu-tRNA-gln amidotransferase, subunit C
Rv2992c gltS glutamyl-tRNA synthase
Rv2357c glyS glycyl-tRNA synthase
Rv2580c hisS histidyl-tRNA synthase
Rv1536 ileS isoleucyl-tRNA synthase
Rv0041 leuS leucyl-tRNA synthase
Rv3598c lysS lysyl-tRNA synthase
Rv1640c lysX C-term lysyl-tRNA synthase
Rv1007c metS methionyl-tRNA synthase
Rv1649 pheS phenylalanyl-tRNA synthase α subunit
Rv1650 pheT phenylalanyl-tRNA synthase β subunit
Rv2845c proS prolyl-tRNA synthase
Rv3834c serS seryl-tRNA synthase
Rv2614c thrS threonyl-tRNA synthase
Rv2906c trmD tRNA (guanine-N1)-methyltransferase
Rv3336c trpS tryptophanyl tRNA synthase
Rv1689 tyrS tyrosyl-tRNA synthase
Rv2448c valS valyl-tRNA synthase
4. Nucleoproteins
Rv1407 fmu similar to Fmu protein
Rv3852 hns HU-histone protein
Rv2986c hupB DNA-binding protein II
Rv1388 mIHF integration host factor
5.
DNA replication, repair, recombination and restriction/modification
Rv1317c alkA DNA-3-methyladenine glycosidase II
Rv2836c dinF DNA-damage-inducible protein F
Rv1329c dinG probable ATP-dependent helicase
Rv3056 dinP DNA-damage-inducible protein
Rv1537 dinX probable DNA-damage-inducible protein
Rv0001 dnaA chromosomal replication initiator protein
Rv0058 dnaB DNA helicase (contains intein)
Rv1547 dnaE1 DNA polymerase III, α subunit
Rv3370c dnaE2 DNA polymerase III α chain
Rv2343c dnaG DNA primase
Rv0002 dnaN DNA polymerase III, β subunit
Rv3711c dnaQ DNA polymerase III ε chain
Rv3721c dnaZX DNA polymerase III, γ (dnaZ) and τ (dnaX)
Rv2924c fpg formamidopyrimidine-DNA glycosylase
Rv0006 gyrA DNA gyrase subunit A
Rv0005 gyrB DNA gyrase subunit B
Rv2092c helY probable helicase, Ski2 subfamily
Rv2101 helZ probable helicase, Snf2/Rad54 family
Rv2756c hsdM type I restriction/modification system DNA methylase
Rv2755c hsdS' type I restriction/modification system specificity determinant
Rv3296 lhr ATP-dependent helicase
Rv3014c ligA DNA ligase
Rv3062 ligB DNA ligase
Rv3731 ligC probable DNA ligase
Rv1020 mfd transcription-repair coupling factor
Rv2528c mrr restriction system protein
Rv2985 mutT1 MutT homologue
Rv1160 mutT2 MutT homologue
Rv0413 mutT3 MutT homologue
Rv3589 mutY probable DNA glycosylase
Rv3297 nei probable endonuclease VIII
Rv3674c nth probable endonuclease III
Rv1316c ogt methylated-DNA-protein-cysteine methyltransferase
Rv1629 polA DNA polymerase I
Rv1402 priA putative primosomal protein n' (replication factor Y)
Rv3585 radA probable DNA repair RadA homologue
Rv2737c recA recombinase (contains intein)
Rv0630c recB exodeoxyribonuclease V
Rv0631c recC exodeoxyribonuclease V
Rv0629c recD exodeoxyribonuclease V
Rv0003 recF DNA replication and SOS induction
Rv2973c recG ATP-dependent DNA helicase
Rv1696 recN recombination and DNA repair
Rv3715c recR RecBC-Independent process of DNA repair
Rv2736c recX regulatory protein for RecA
Rv2593c ruvA Holliday junction binding protein, DNA helicase
Rv2592c ruvB Holliday junction binding protein
Rv2594c ruvC Holliday junction resolvase, endodeoxyribonuclease
Rv0054 ssb single strand binding protein
Rv1210 tagA DNA-3-methyladenine glycosidase I
Rv3646c topA DNA topoisomerase
Rv2976c ung uracil-DNA glycosylase
Rv1638 uvrA excinuclease ABC subunit A
Rv1633 uvrB excinuclease ABC subunit B
Rv1420 uvrC excinuclease ABC subunit C
Rv0949 uvrD DNA-dependent ATPase I and helicase II
Rv3198c uvrD2 putative UvrD
Rv0427c xthA exodeoxyribonuclease III
Rv0071 - group II intron maturase
Rv0861c - probable DNA helicase
Rv0944 - possible formamidopyrimidine-DNA glycosylase
Rv1688 - probable 3-methylpurine DNA glycosylase
Rv2090 - partially similar to DNA polymerase I
Rv2191 - similar to both PolC and UvrC proteins
Rv2464c - probable DNA glycosylase, endonuclease VIII
Rv3201c - probable ATP-dependent DNA helicase
Rv3202c - similar to UvrD proteins
Rv3263 - probable DNA methylase
Rv3644c - similar in N-term to DNA polymerase III
6. Protein translation and modification
Rv0429c def polypeptide deformylase
Rv2534c efp elongation factor P
Rv2882c frr ribosome recycling factor
Rv0684 fusA elongation factor G
Rv0120c fusA2 elongation factor G
Rv1080c greA transcription elongation factor G
Rv3462c infA initiation factor IF-1
Rv2839c infB initiation factor IF-2
Rv1641 infC initiation factor IF-3
Rv0009 ppiA peptidyl-prolyl cis-trans isomerase
Rv2582 ppiB peptidyl-prolyl cis-trans isomerase
Rv1299 prfA peptide chain release factor 1
Rv3105c prfB peptide chain release factor 2
Rv2889c tsf elongation factor EF-Ts
Rv0685 tuf elongation factor EF-Tu
7. RNA synthesis, RNA modification and DNA transcription
Rv1253 deaD ATP-dependent DNA/RNA helicase
Rv2783c gpsI pppGpp synthase and polyribonucleotide phosphorylase
Rv2841c nusA transcription termination factor
Rv2533c nusB N-utilization substance protein B
Rv0639 nusG transcription antitermination protein
Rv3907c pcnA polynucleotide polymerase
Rv3232c pvdS alternative sigma factor for siderophore production
Rv3211 rhlE probable ATP-dependent RNA helicase
Rv1297 rho transcription termination factor rho
Rv3457c rpoA α subunit of RNA polymerase
Rv0667 rpoB β subunit of RNA polymerase
Rv0668 rpoC β' subunit of RNA polymerase
Rv1364c rsbU SigB regulation protein
Rv3287c rsbW anti-sigma B factor
Rv2703 sigA RNA polymerase sigma factor (aka MysA, RpoV)
Rv2710 sigB RNA polymerase sigma factor (aka MysB)
Rv2069 sigC ECF subfamily sigma subunit
Rv3414c sigD ECF subfamily sigma subunit
Rv1221 sigE ECF subfamily sigma subunit
Rv3286c sigF ECF subfamily sigma subunit
Rv0182c sigG sigma-70 factors ECF subfamily
Rv3223c sigH ECF subfamily sigma subunit
Rv1189 sigI ECF family sigma factor
Rv3328c sigJ similar to SigI, ECF family
Rv0445c sigK ECF-type sigma factor
Rv0735 sigL sigma-70 factors ECF subfamily
Rv3911 sigM probable sigma factor, similar to SigE
Rv3366 spoU probable rRNA methylase
Rv3455c truA probable pseudouridylate synthase
Rv2793c truB tRNA pseudouridine 55 synthase
Rv1644 tsnR putative 23S rRNA methyltransferase
Rv3649 - ATP-dependent DNA/RNA helicase
8. Polysaccharides (cytoplasmic)
Rv1326c glgB 1,4-α-glucan branching enzyme
Rv1328 glgP probable glycogen phosphorylase
Rv1564c glgX probable glycogen debranching enzyme
Rv1563c glgY putative α-amylase
Rv1562c glgZ maltooligosyltrehalose trehalohydrolase
Rv0126 - probable glycosyl hydrolase
Rv1781c - probable 4-α-glucanotransferase
Rv2471 - probable maltase α-glucosidase
B. Degradation of macromolecules
1. RNA
Rv1014c pth peptidyl-tRNA hydrolase
Rv2925c rnc RNAse III
Rv2444c rne similar at C-term to ribonuclease E
Rv2902c rnhB ribonuclease HII
Rv3923c rnpA ribonuclease P protein component
Rv1340 rphA ribonuclease PH
2. DNA
Rv0670 end endonuclease IV (apurinase)
Rv1108c xseA exonuclease VII large subunit
Rv1107c xseB exonuclease VII small subunit
3. Proteins, peptides and glycopeptides
Rv3305c amiA probable aminohydrolase
Rv3306c amiB probable aminohydrolase
Rv3596c clpC ATP-dependent Clp protease
Rv2461c clpP ATP-dependent Clp protease proteolytic subunit
Rv2460c clpP2 ATP-dependent Clp protease proteolytic subunit
Rv2457c clpX ATP-dependent Clp protease ATP-binding subunit ClpX
Rv2667 clpX' similar to ClpC from M. leprae but shorter
Rv3419c gcp glycoprotease
Rv2725c hflX GTP-binding protein
Rv1223 htrA serine protease
Rv2861c map methionine aminopeptidase
Rv0734 map' probable methionine aminopeptidase
Rv0319 pcp pyrrolidone-carboxylate peptidase
Rv0125 pepA probable serine protease
Rv2213 pepB aminopeptidase A/I
Rv0800 pepC aminopeptidase I
Rv2467 pepD probable aminopeptidase
Rv2089c pepE cytoplasmic peptidase
Rv2535c pepQ cytoplasmic peptidase
Rv2782c pepR protease/peptidase, M16 family (insulinase)
Rv2109c prcA proteasome α-type subunit 1
Rv2110c prcB proteasome β-type subunit 2
Rv0782 ptrBa protease II, α subunit
Rv0781 ptrBb protease II, β subunit
Rv0724 sppA protease IV, signal peptide peptidase
Rv0198c - probable zinc metalloprotease
Rv0457c - probable peptidase
Rv0840c - probable proline iminopeptidase
Rv0983 - probable serine protease
Rv1977 - probable zinc metallopeptidase
Rv3668c - probable alkaline serine protease
Rv3671c - probable serine protease
Rv3883c - probable secreted protease
Rv3886c - protease
4. Polysaccharides, lipopolysaccharides and phospholipids
Rv0062 celA cellulase/endoglucanase
Rv3915 cwlM hydrolase
Rv0315 - probable β-1,3-glucanase
Rv1090 - probable inactivated cellulase/endoglucanase
Rv1327c - probable glycosyl hydrolase, α-amylase family
Rv1333 - probable hydrolase
Rv3463 - probable neuraminidase
Rv3717 - possible N-acetylmuramoyl-L-alanine amidase
5. Esterases and lipases
Rv0220 lipC probable esterase
Rv1923 lipD probable esterase
Rv3775 lipE probable hydrolase
Rv3487c lipF probable esterase
Rv0646c lipG probable hydrolase
Rv1399c lipH probable lipase
Rv1400c lipI probable lipase
Rv1900c lipJ probable esterase
Rv2385 lipK probable acetyl-hydrolase
Rv1497 lipL esterase
Rv2284 lipM probable esterase
Rv2970c lipN probable lipase/esterase
Rv1426c lipO probable esterase
Rv2463 lipP probable esterase
Rv2485c lipQ probable carboxylesterase
Rv3084 lipR probable acetyl-hydrolase
Rv3176c lipS probable esterase/lipase
Rv2045c lipT probable carboxylesterase
Rv1076 lipU probable esterase
Rv3203 lipV probable lipase
Rv0217c lipW probable esterase
Rv2351c plcA phospholipase C precursor
Rv2350c plcB phospholipase C precursor
Rv2349c plcC phospholipase C precursor
Rv1755c plcD partial CDS for phospholipase C
Rv1104 - probable esterase pseudogene
Rv1105 - probable esterase pseudogene
6.
Aromatic hydrocarbons
Rv3469c mhpE probable 4-hydroxy-2-oxovalerate aldolase
Rv0316 - probable muconolactone isomerase
Rv0771 - probable 4-carboxymuconolactone decarboxylase
Rv0939 - probable dehydrase
Rv1723 - 6-aminohexanoate-dimer hydrolase
Rv2715 - 2-hydroxymuconic semialdehyde hydrolase
Rv3530c - probable cis-diol dehydrogenase
Rv3534c - 4-hydroxy-2-oxovalerate aldolase
Rv3536c - aromatic hydrocarbon degradation
C. Cell envelope
1. Lipoproteins (lppA-lpr0) 65
2. Surface polysaccharides, lipopolysaccharides, proteins and antigens
Rv0806c cpsY probable UDP-glucose-4-epimerase
Rv3811 csp secreted protein
Rv1677 dsbF highly similar to C-term Mpt53
Rv3794 embA involved in arabinogalactan synthesis
Rv3795 embB involved in arabinogalactan synthesis
Rv3793 embC involved in arabinogalactan synthesis
Rv3875 esat6 early secretory antigen target
Rv0112 gca probable GDP-mannose dehydratase
Rv0113 gmhA phosphoheptose isomerase
Rv2965c kdtB lipopolysaccharide core biosynthesis protein
Rv2878c mpt53 secreted protein Mpt53
Rv1980c mpt64 secreted immunogenic protein Mpb64/Mpt64
Rv2875 mpt70 major secreted immunogenic protein Mpt70 precursor
Rv2873 mpt83 surface lipoprotein Mpt83
Rv0899 ompA member of OmpA family
Rv3810 pirG cell surface protein precursor (Erp protein)
Rv3782 rfbE similar to rhamnosyl transferase
Rv1302 rfe undecaprenyl-phosphate α-N-acetylglucosaminyltransferase
Rv2145c wag31 antigen 84 (aka wag31)
Rv0431 - tuberculin related peptide (AT103)
Rv0954 - cell envelope antigen
Rv1514c - involved in polysaccharide synthesis
Rv1518 - involved in exopolysaccharide synthesis
Rv1758 - partial cutinase
Rv1910c - probable secreted protein
Rv1919c - weak similarity to pollen antigens
Rv1984c - probable secreted protein
Rv1987 - probable secreted protein
Rv2223c - probable exported protease
Rv2224c - probable exported protease
Rv2301 - probable cutinase
Rv2345 - precursor of probable membrane protein
Rv2672 - putative exported protease
Rv3019c - similar to Esat6
Rv3036c - probable secreted protein
Rv3449 - probable precursor of serine protease
Rv3451 - probable cutinase
Rv3452 - probable cutinase precursor
Rv3724 - probable cutinase precursor
3. Murein sacculus and peptidoglycan
Rv2911 dacB penicillin binding protein
Rv2981c ddlA D-alanine-D-alanine ligase A
Rv3809c glf UDP-galactopyranose mutase
Rv1018c glmU UDP-N-acetylglucosamine pyrophosphorylase
Rv3382c lytB LytB protein homologue
Rv1110 lytB' very similar to LytB
Rv1315 murA UDP-N-acetylglucosamine-1-carboxyvinyltransferase
Rv0482 murB UDP-N-acetylenolpyruvoylglucosamine reductase
Rv2152c murC UDP-N-acetyl-muramate-alanine ligase
Rv2155c murD UDP-N-acetylmuramoylalanine-D-glutamate ligase
Rv2158c murE meso-diaminopimelate-adding enzyme
Rv2157c murF D-alanine:D-alanine-adding enzyme
Rv2153c murG transferase in peptidoglycan synthesis
Rv1338 murI glutamate racemase
Rv2156c murX phospho-N-acetylmuramoyl-pentapeptide transferase
Rv3332 nagA N-acetylglucosamine-6-P-deacetylase
Rv0016c pbpA penicillin-binding protein
Rv2163c pbpB penicillin-binding protein 2
Rv0050 ponA penicillin-binding protein
Rv3682 ponA' class A penicillin binding protein
Rv0017c rodA FtsW/RodA/SpoVE family
Rv0907 - probable penicillin binding protein
Rv1367c - probable penicillin binding protein
Rv1730c - probable penicillin binding protein
Rv1922 - probable penicillin binding protein
Rv2864c - probable penicillin binding protein
Rv3330 - probable penicillin binding protein
Rv3627c - probable penicillin binding protein
4.
Conserved membrane proteins
Rv0402c mmpL1 conserved large membrane protein
Rv0507 mmpL2 conserved large membrane protein
Rv0206c mmpL3 conserved large membrane protein
Rv0450c mmpL4 conserved large membrane protein
Rv0676c mmpL5 conserved large membrane protein
Rv1557 mmpL6 conserved large membrane protein
Rv2942 mmpL7 conserved large membrane protein
Rv3823c mmpL8 conserved large membrane protein
Rv2339 mmpL9 conserved large membrane protein
Rv1183 mmpL10 conserved large membrane protein
Rv0202c mmpL11 conserved large membrane protein
Rv1522c mmpL12 conserved large membrane protein
Rv0403c mmpS1 conserved small membrane protein
Rv0506 mmpS2 conserved small membrane protein
Rv2198c mmpS3 conserved small membrane protein
Rv0451c mmpS4 conserved small membrane protein
Rv0677c mmpS5 conserved small membrane protein
5. Other membrane proteins 211
III. Cell processes
A. Transport/binding proteins
1. Amino acids
Rv2127 ansP L-asparagine permease
Rv0346c aroP2 probable aromatic amino acid permease
Rv0917 betP glycine betaine transport
Rv1704c cycA transport of D-alanine, D-serine and glycine
Rv3666c dppA probable peptide transport system permease
Rv3665c dppB probable peptide transport system permease
Rv3664c dppC probable peptide transport system permease
Rv3663c dppD probable ABC-transporter
Rv0522 gabP probable 4-amino butyrate transporter
Rv0411c glnH putative glutamine binding protein
Rv2564 glnQ probable ATP-binding transport protein
Rv1280c oppA probable oligopeptide transport protein
Rv1283c oppB oligopeptide transport protein
Rv1282c oppC oligopeptide transport system permease
Rv1281c oppD probable peptide transport protein
Rv2320c rocE arginine/ornithine transporter
Rv3253c - probable cationic amino acid transport
Rv3454 - possible proline permease
2. Cations
Rv2920c amt putative ammonium transporter
Rv1607 chaA putative calcium/proton antiporter
Rv1239c corA probable magnesium and cobalt transport protein
Rv0092 ctpA cation-transporting ATPase
Rv0103c ctpB cation transport ATPase
Rv3270 ctpC cation transport ATPase
Rv1469 ctpD probable cadmium-transporting ATPase
Rv0908 ctpE probable cation transport ATPase
Rv1997 ctpF probable cation transport ATPase
Rv1992c ctpG probable cation transport ATPase
Rv0425c ctpH C-terminal region putative cation-transporting ATPase
Rv0107c ctpI probable magnesium transport ATPase
Rv0969 ctpV cation transport ATPase
Rv3044 fecB putative FeIII-dicitrate transporter
Rv0265c fecB2 iron transport protein FeIII dicitrate transporter
Rv1029 kdpA potassium-transporting ATPase A chain
Rv1030 kdpB potassium-transporting ATPase B chain
Rv1031 kdpC potassium-transporting ATPase C chain
Rv3236c kefB probable glutathione-regulated potassium-efflux protein
Rv2877c merT possible mercury resistance transport system
Rv1811 mgtC probable magnesium transport ATPase protein C
Rv0362 mgtE putative magnesium ion transporter
Rv2856 nicT probable nickel transport protein
Rv0924c nramp transmembrane protein belonging to Nramp family
Rv2691 trkA probable potassium uptake protein
Rv2692 trkB probable potassium uptake protein
Rv2287 yjcE probable Na+/H+ exchanger
Rv2723 - probable membrane protein, tellurium resistance
Rv3162c - probable membrane protein
Rv3237c - possible potassium channel protein
Rv3743c - probable cation-transporting ATPase
3.
Carbohydrates, organic acids and alcohols
Rv2443 dctA C4-dicarboxylate transport protein
Rv3476c kgtP sugar transport protein
Rv1902c nanT probable sialic acid transporter
Rv1236 sugA membrane protein probably involved in sugar transport
Rv1237 sugB sugar transport protein
Rv1238 sugC ABC transporter component of sugar uptake system
Rv3331 sugI probable sugar transport protein
Rv2835c ugpA sn-glycerol-3-phosphate permease
Rv2833c ugpB sn-glycerol-3-phosphate-binding periplasmic lipoprotein
Rv2832c ugpC sn-glycerol-3-phosphate transport ATP-binding protein
Rv2834c ugpE sn-glycerol-3-phosphate transport system protein
Rv2316 uspA sugar transport protein
Rv2318 uspC sugar transport protein
Rv2317 uspE sugar transport protein
Rv1200 - probable sugar transporter
Rv2038c - probable ABC sugar transporter
Rv2039c - probable sugar transporter
Rv2040c - probable sugar transporter
Rv2041c - probable sugar transporter
4. Anions
Rv2684 arsA probable arsenical pump
Rv2685 arsB probable arsenical pump
Rv3578 arsB2 probable arsenical pump
Rv2643 arsC probable arsenical pump
Rv2397c cysA sulphate transport ATP-binding protein
Rv2399c cysT sulphate transport system permease protein
Rv2398c cysW sulphate transport system permease protein
Rv1857 modA molybdate binding protein
Rv1858 modB transport system permease, molybdate uptake
Rv1859 modC molybdate uptake ABC-transporter
Rv1860 modD precursor of Apa (45/47 kD secreted protein)
Rv2329c narK1 probable nitrite extrusion protein
Rv1737c narK2 nitrite extrusion protein
Rv0261c narK3 nitrite extrusion protein1
Rv0267 narU similar to nitrite extrusion protein 2
Rv0934 phoS1 PstS component of phosphate uptake
Rv0928 phoS2 PstS component of phosphate uptake
Rv0820 phoT phosphate transport system ABC transporter
Rv3301c phoY1 phosphate transport system regulator
Rv0821c phoY2 phosphate transport system regulator
Rv0545c pitA low-affinity inorganic phosphate transporter
Rv2281 pitB phosphate permease
Rv0930 pstA1 PstA component of phosphate uptake
Rv0936 pstA2 PstA component of phosphate uptake
Rv0933 pstB ABC transport component of phosphate uptake
Rv0935 pstC PstC component of phosphate uptake
Rv0929 pstC2 membrane-bound component of phosphate transport system
Rv0932c pstS PstS component of phosphate uptake
Rv2400c subI sulphate binding precursor
Rv0143c - probable chloride channel
Rv1707 - probable sulphate permease
Rv1739c - possible sulphate transporter
Rv3679 - possible anion transporter
Rv3680 - probable anion transporter
5. Fatty acid transport
Rv2790c ltp1 non-specific lipid transport protein
Rv3540c ltp2 non-specific lipid transport protein
6. Efflux proteins
Rv2936 drrA similar daunorubicin resistance ABC-transporter
Rv2937 drrB similar daunorubicin resistance transmembrane protein
Rv2938 drrC similar daunorubicin resistance transmembrane protein
Rv2846c efpA putative efflux protein
Rv3065 emrE resistance to ethidium bromide
Rv0783c - multidrug resistance protein
Rv0849 - possible quinolone efflux pump
Rv1145 - probable drug transporter
Rv1146 - probable drug transporter
Rv1250 - probable drug efflux protein
Rv1258c - probable multidrug resistance pump
Rv1410c - probable drug efflux protein
Rv1634 - probable drug efflux protein
Rv1819c - probable multidrug resistance pump
Rv2136c - putative bacitracin resistance protein
Rv2209 - probable drug efflux protein
Rv2333c - probable tetracenomycin C resistance protein
Rv2994 - probable fluoroquinolone efflux protein
Rv1877 - probable drug efflux protein
Rv2459 - probable drug efflux protein
B.
Chaperones/Heat shock
Rv0384c clpB heat shock protein
Rv0352 dnaJ acts with GrpE to stimulate DnaK ATPase
Rv2373c dnaJ2 DnaJ homologue
Rv0350 dnaK 70 kD heat shock protein, chromosome replication
Rv3417c groEL1 60 kD chaperonin 1
Rv0440 groEL2 60 kD chaperonin 2
Rv3418c groES 10 kD chaperone
Rv0351 grpE stimulates DnaK ATPase activity
Rv2374c hrcA heat-inducible transcription repressor
Rv0251c hsp possible heat shock protein
Rv0353 hspR heat shock regulator
Rv2031c hspX 14 kD antigen, heat shock protein Hsp20 family
Rv2299c htpG heat shock protein Hsp90 family
Rv0563 htpX probable (transmembrane) heat shock protein
Rv2701c suhB putative extragenic suppressor protein
Rv3269 - probable heat shock protein
C. Cell division
Rv3641c fic possible cell division protein
Rv3102c ftsE membrane protein
Rv3610c ftsH inner membrane protein, chaperone
Rv2748c ftsK chromosome partitioning
Rv2151c ftsQ ingrowth of wall at septum
Rv2154c ftsW membrane protein (shape determination)
Rv3101c ftsX membrane protein
Rv2921c ftsY cell division protein FtsY
Rv2150c ftsZ circumferential ring, GTPase
Rv3919c gid glucose inhibited division protein B
Rv3625c mesJ probable cell cycle protein
Rv3917c parA chromosome partitioning; DNA binding
Rv3918c parB possibly involved in chromosome partitioning
Rv2922c smc member of Smc1/Cut3/Cut14 family
Rv0012 - possible cell division protein
Rv0435c - ATPase of AAA-family
Rv2115c - ATPase of AAA-family
Rv3213c - possible role in chromosome segregation
Rv1708 - possible role in chromosome partitioning
D.
Protein and peptide secretion
Rv2916c ffh signal recognition particle protein
Rv2903c lepB signal peptidase I
Rv1614 lgt prolipoprotein diacylglyceryl transferase
Rv1539 lspA lipoprotein signal peptidase
Rv0379 sec probable transport protein SecE/Sec61-γ family
Rv3240c secA SecA, preprotein translocase subunit
Rv1821 secA2 SecA, preprotein translocase subunit
Rv2587c secD protein-export membrane protein
Rv0638 secE SecE preprotein translocase
Rv2586c secF protein-export membrane protein
Rv1440 secG protein-export membrane protein SecG
Rv0732 secY SecY subunit of preprotein translocase
Rv2462c tig chaperone protein, similar to trigger factor
Rv2813 - probable general secretion pathway protein
E. Adaptations and atypical conditions
Rv1901 cinA competence damage protein
Rv3648c cspA cold shock protein, transcriptional regulator
Rv0871 cspB probable cold shock protein
Rv3063 cstA starvation-induced stress response protein
Rv3490 otsA probable α,α-trehalose-phosphate synthase
Rv2006 otsB trehalose-6-phosphate phosphatase
Rv3372 otsB2 trehalose-6-phosphate phosphatase
Rv3758c proV osmoprotection ABC transporter
Rv3757c proW transport system permease
Rv3759c proX similar to osmoprotection proteins
Rv3756c proZ transport system permease
Rv1026 - probable pppGpp-5'-phosphohydrolase
F. Detoxification
Rv2428 ahpC alkyl hydroperoxide reductase
Rv2429 ahpD member of AhpC/TSA family
Rv2238c ahpE member of AhpC/TSA family
Rv2521 bcp bacterioferritin comigratory protein
Rv1608c bcpB probable bacterioferritin comigratory protein
Rv3473c bpoA probable non-heme bromoperoxidase
Rv1123c bpoB probable non-heme bromoperoxidase
Rv0554 bpoC probable non-heme bromoperoxidase
Rv3617 ephA probable epoxide hydrolase
Rv1938 ephB probable epoxide hydrolase
Rv1124 ephC probable epoxide hydrolase
Rv2214c ephD probable epoxide hydrolase
Rv3670 ephE probable epoxide hydrolase
Rv0134 ephF probable epoxide hydrolase
Rv3171c hpx probable non-heme haloperoxidase
Rv1908c katG catalase-peroxidase
Rv3846 sodA superoxide dismutase
Rv0432 sodC superoxide dismutase precursor (Cu-Zn)
Rv1932 tpx thiol peroxidase
Rv0634c - putative glyoxylase II
Rv2581c - putative glyoxylase II
Rv3177 - probable non-heme haloperoxidase
IV. Other
A. Virulence
Rv0169 mce1 cell invasion protein
Rv0589 mce2 cell invasion protein
Rv1966 mce3 cell invasion protein
Rv3499c mce4 cell invasion protein
Rv3100c smpB probable small protein b
Rv1694 tlyA cytotoxin/hemolysin homologue
Rv0024 - putative p60 homologue
Rv0167 - part of mce1 operon
Rv0168 - part of mce1 operon
Rv0170 - part of mce1 operon
Rv0171 - part of mce1 operon
Rv0172 - part of mce1 operon
Rv0174 - part of mce1 operon
Rv0587 - part of mce2 operon
Rv0588 - part of mce2 operon
Rv0590 - part of mce2 operon
Rv0591 - part of mce2 operon
Rv0592 - part of mce2 operon
Rv0594 - part of mce2 operon
Rv1085c - possible hemolysin
Rv1477 - putative exported p60 protein homologue
Rv1478 - putative exported p60 protein homologue
Rv1566c - putative exported p60 protein homologue
Rv1964 - part of mce3 operon
Rv1965 - part of mce3 operon
Rv1967 - part of mce3 operon
Rv1968 - part of mce3 operon
Rv1969 - part of mce3 operon
Rv1971 - part of mce3 operon
Rv2190c - putative p60 homologue
Rv3494c - part of mce4 operon
Rv3496c - part of mce4 operon
Rv3497c - part of mce4 operon
Rv3498c - part of mce4 operon
Rv3500c - part of mce4 operon
Rv3501c - part of mce4 operon
Rv3896c - putative p60 homologue
Rv3922c - possible hemolysin
B. IS elements, Repeated sequences, and Phage
1. IS elements
IS6110 16 copies
IS1081 6 copies
Others 37 copies
2. REP13E12 family 7 copies
3.
Phage-related functions
Rv2894c xerC integrase/recombinase
Rv1701 xerD integrase/recombinase
Rv1054 - integrase-a
Rv1055 - integrase-b
Rv1573 phiRV1 phage related protein
Rv1574 phiRV1 phage related protein
Rv1575 phiRV1 phage related protein
Rv1576c phiRV1 phage related protein
Rv1577c phiRV1 possible prohead protease
Rv1578c phiRV1 phage related protein
Rv1579c phiRV1 phage related protein
Rv1580c phiRV1 phage related protein
Rv1581c phiRV1 phage related protein
Rv1582c phiRV1 phage related protein
Rv1583c phiRV1 phage related protein
Rv1584c phiRV1 phage related protein
Rv1585c phiRV1 phage related protein
Rv1586c phiRV1 integrase
Rv2309c - integrase
Rv2310 - excisionase
Rv2646 phiRV2 integrase
Rv2647 phiRV2 phage related protein
Rv2650c phiRV2 phage related protein
Rv2651c phiRV2 prohead protease
Rv2652c phiRV2 phage related protein
Rv2653c phiRV2 phage related protein
Rv2654c phiRV2 phage related protein
Rv2655c phiRV2 phage related protein
Rv2656c phiRV2 phage related protein
Rv2657c - similar to gp36 of mycobacteriophage L5
Rv2658c phiRV2 phage related protein
Rv2659c phiRV2 integrase
Rv2830c - similar to phage P1 phd gene
Rv3750c - excisionase
Rv3751 - putative integrase
C. PE and PPE families
1. PE family
PE subfamily 38 members
PE_PGRS subfamily 61 members
2. PPE family 68 members
D. Antibiotic production and resistance
Rv2068c blaC class A β-lactamase
Rv3290c lat lysine-ε aminotransferase
Rv2043c pncA pyrazinamide resistance/sensitivity
Rv0133 - possible puromycin N-acetyltransferase
Rv0262c - aminoglycoside 2'-N-acetyltransferase
Rv0802c - acetyltransferase
Rv1082 - similar to S. lincolnensis lmbE
Rv1170 - similar to S. lincolnensis lmbE
Rv1347c - possible aminoglycoside 6'-N-acetyltransferase
Rv2036 - similar to lincomycin production genes
Rv2303c - similar to S. griseus macrotetrolide resistance protein
Rv3225c - probable aminoglycoside 3'-phosphotransferases
Rv3700c - probable acetyltransferase
Rv3817 - probable aminoglycoside 3'-phosphotransferase
E. Bacteriocin-like proteins 3
F. Cytochrome P450 enzymes 22
G. Coenzyme F420-dependent enzymes 3
H. Miscellaneous transferases 61
I. Miscellaneous phosphatases, lyases, and hydrolases 18
J. Cyclases 6
K. Chelatases 2
V. Conserved hypotheticals 912
VI. Unknowns 606
TOTAL 3924