Goran Neshich - Embrapa/CNPTIA
Transcription
Structural BioInformatics Laboratory: SBI
Proteins and their 3D Structure
Paula Kuser, Goran Neshich
Embrapa Informática Agropecuária, Cidade Universitária - UNICAMP, Campinas, SP
http://www.cbi.cnptia.embrapa.br

Overview diagram (labels):
• Structure, Sequence, Function, Role
• Blast, STING, SMS, Microarray Image Analysing
• Semantic, Lexical, Pragmatic, Syntactic
• Gene Annotation
• Gene Comparison
• Function Descriptors
• Structure Descriptors
• Gene Expression Networks
• Proteomics

Bringing Genome Into Three Dimensions
Parallels that help us to see the problem better:
• Cabral's map of Brazil
• Old protein map
• Satellite map

Structure/function descriptors in JPD

Data/information deluge and flavors of Bioinformatics

Datalibrary - 2003 (February)
• 23,950,735 nucleotide sequences, 37,486,732,136 bp
• 112 published complete genomes
• 590 genomes in progress
• 830,525 protein sequences
• 20,417 protein structures
• 5,300 Plasmodium falciparum genes, 23,000,000 bp
• 35,000 genes in Homo sapiens, 3,164,000,000 bp
• 27,936 genes in Xylella fastidiosa, 2,519,802 bases, 2,775 proteins
• 10,000,000 publications in PubMed/MEDLINE

Datalibrary - 2003 (October)
• 29,189,427 nucleotide sequences (~40 x 10^9 bp)
• Published complete genomes: viruses: 1,421; Archaea: 16; Bacteria: 135; eukaryotes: 9 + 4 vertebrates + 7 plants
• 590 genomes in progress
• 1,139,154 protein sequences
• 22,700 protein structures (PDB)
• 480 genes in Mycoplasma genitalium: 580,000 bp
• 35,000 genes in Homo sapiens (3.164 x 10^9 bp)
• 27,936 genes in Xylella fastidiosa, 2,519,802 bases
• >10,000,000 publications in PubMed/MEDLINE

SMS and Protein Dossier - Drug Target DB
Where do we work?
• Genome sequencing
• Structural genomics
• Book of life
• Structural DB
• Structure descriptors
• Structure-function annotation
• Search for new effectors
• Mutational and dynamic studies
• Protein-ligand interaction (matching DB)
• Drug discovery
• Docking

Structural Bioinformatics
1. Sequence similarity search
2. Sequence alignments
3. Structure alignment
4. Secondary structure prediction
5. Structure modeling (homology modeling)
6. Structure prediction (threading)
7. Characterization of structure
8. Relationship: sequence-structure-function
9. Function modifiers
10. Compiling the list of pairs: structure and its function modifier

Structural Bioinformatics
Sequence similarity search
Sequence alignment

Structural Bioinformatics
AKWHGGAFWPPH
WAAGAHWPHAQD

Bringing Genome Into Three Dimensions
Functional Genomics Milestone: how well can function be inherited from similar sequences?
From sequence to function: desires and problems

Structural Bioinformatics
From gene to functional protein

Structural Bioinformatics
What did we learn from genomics projects?

Bioinformatics: insight into life on earth
Interesting information from genome data:
Two sequences can be instantly compared. Human and tomato histone H4 are nearly identical, as are many "housekeeping" proteins across taxa.
Such comparisons have been very useful in the annotation of genomes.
97 complete microbial genomes (October 28, 2002):
[A] Archaea - 16 species
[B] Bacteria - 81 species
http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/micr.html

Bioinformatics: insight into life on earth
Interesting information from genome data:
The smallest genome of any free-living organism is that of the bacterium Mycoplasma genitalium. It is only 500,000 bp long (0.017% of the human genome) and has 450 genes (1.5% of the human genome). Only 12 genes are of unknown function.
The core biochemical and genetic complement of life

Structural Bioinformatics
Sequence alignment

Scoring Matrices
Instead of marking points at match/mismatch, we may use a "scoring matrix".
For DNA/RNA: match = 1, mismatch = 0.
The "dotplot" is now converted into a diagram of numbers, and the best alignment corresponds to the diagonal with the greatest numerical value.

      T  G  A  C
  T   1  0  0  0
  G   0  1  0  0
  A   0  0  1  0
  C   0  0  0  1

Dotplot with scores
Two aligned proteins produce a "score dotplot" from which one can calculate the optimal alignment.

       P   A   W   H   E   A   E
  H   -2  -2  -3  10   0  -2   0
  E   -1  -1  -3   0   6  -1   6
  A   -1   5  -3  -2  -1   5  -1
  G   -2   0  -3  -2  -3   0  -3
  A   -1   5  -3  -2  -1   5  -1
  W   -4  -3  15  -3  -3  -3  -3
  G   -2   0  -3  -2  -3   0  -3
  H   -2  -2  -3  10   0  -2   0
  E   -1  -1  -3   0   6  -1   6
  E   -1  -1  -3   0   6  -1   6

Structural Bioinformatics
Structure elements

STING Millennium Suite: Analysing structure of proteins and their complexes
• What do we know about structure and its relationship with function?
• What are the building blocks of microfactories, better known as PROTEINS?
• What is the structural hierarchy in proteins?
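The score dotplot described in the Scoring Matrices slides can be sketched in a few lines. This is a minimal illustration, not the method used in STING: it fills the match = 1 / mismatch = 0 matrix for two DNA strings and picks the ungapped diagonal with the greatest sum. The function names and the example sequences are assumptions made for this sketch, and both sequences are assumed to be non-empty.

```python
def score_dotplot(seq_a, seq_b, match=1, mismatch=0):
    """Return a matrix with one row per residue of seq_a and one column per residue of seq_b."""
    return [[match if a == b else mismatch for b in seq_b] for a in seq_a]


def best_diagonal(dotplot):
    """Return (offset, score) of the diagonal with the greatest summed score.

    offset = column index - row index, so a positive offset means the first
    sequence starts that many positions into the second one.
    """
    n_rows = len(dotplot)
    n_cols = len(dotplot[0])
    best_offset, best_score = 0, float("-inf")
    for offset in range(-(n_rows - 1), n_cols):
        total = sum(
            dotplot[i][i + offset]
            for i in range(n_rows)
            if 0 <= i + offset < n_cols
        )
        if total > best_score:
            best_offset, best_score = offset, total
    return best_offset, best_score


# Hypothetical example sequences (not from the slides).
plot = score_dotplot("GATTACA", "TTGATTACA")
offset, score = best_diagonal(plot)
```

Replacing the match/mismatch rule with a lookup in a substitution table, such as the protein scores shown in the "Dotplot with scores" slide, gives the protein version. Allowing gaps requires dynamic programming rather than a single diagonal.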
STING Millennium Suite: Analysing structure of proteins and their complexes
Secondary structure elements:
• Helix
• Turn
• Sheet
• Coil

STING Millennium Suite: Analysing structure of proteins and their complexes
Peptide bond and other types of "intimate" amino acid contacts

Ramachandran Plot

Structural Bioinformatics
Secondary structure prediction

$$P_{ij} = \frac{n_{ij} / n_i}{N_j / N_T}$$

        α Helix                  β Sheet                 β Turn
  aa    C&F    L     pr    aa    C&F    L     pr    aa    C&F    L
  Glu   1.51   1.44  ++    Val   1.70   1.49  ++    Asn   1.56   1.28
  Met   1.45   1.47  ++    Ile   1.60   1.45  ++    Gly   1.56   1.64
  Ala   1.42   1.29  ++    Tyr   1.47   1.25  ++    Pro   1.52   1.91
  Leu   1.21   1.30  ++    Phe   1.38   1.32  +     Asp   1.46   1.41
  Lys   1.16   1.23  +     Trp   1.37   1.14  +     Ser   1.43   1.32
  Phe   1.13   1.07  +     Leu   1.30   1.02  +     Cys   1.19   0.81
  Gln   1.11   1.27  +     Cys   1.19   0.74  +     Tyr   1.14   1.05
  Trp   1.08   0.99  +     Thr   1.19   1.21  +     Lys   1.01   0.96
  Ile   1.08   0.97  +     Gln   1.10   0.80  +     Gln   0.98   0.98
  Val   1.06   0.91  +     Met   1.05   0.97  +     Thr   0.96   1.04
  Asp   1.01   1.04  =     Arg   0.93   0.99  =     Trp   0.96   0.76
  His   1.00   1.22  =     Asn   0.89   0.76  =     Arg   0.95   0.88
  Arg   0.98   0.96  =     His   0.87   1.08  =     His   0.95   0.68
  Thr   0.83   0.82  =     Ala   0.83   0.90  -     Glu   0.74   0.99
  Ser   0.77   0.82  =     Ser   0.75   0.95  -     Ala   0.66   0.77
  Cys   0.70   1.11  =     Gly   0.75   0.92  -     Met   0.60   0.41
  Tyr   0.69   0.72  -     Lys   0.74   0.77  -     Phe   0.60   0.59
  Asn   0.67   0.90  -     Pro   0.55   0.64  --    Leu   0.59   0.58
  Pro   0.57   0.52  --    Asp   0.54   0.72  --    Val   0.50   0.47
  Gly   0.57   0.56  --    Glu   0.37   0.75  --    Ile   0.47   0.51

Table 1. Amino acid propensities for secondary structure element formation, according to Chou-Fasman (1978b) (C&F) and Levitt (1978) (L).
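The propensity formula and Table 1 can be tied together with a short sketch. The slide does not define the symbols, so the reading used here is an assumption: n_ij is the number of residues of type i found in conformation j, n_i the total number of residues of type i, N_j the total number of residues in conformation j, and N_T the total number of residues. The helper `mean_helix_propensity` is a hypothetical illustration of how the C&F helix column might be averaged over a peptide segment; it is not the full Chou-Fasman prediction procedure.

```python
# C&F alpha-helix propensities copied from Table 1 (three-letter codes).
CF_HELIX = {
    "Glu": 1.51, "Met": 1.45, "Ala": 1.42, "Leu": 1.21, "Lys": 1.16,
    "Phe": 1.13, "Gln": 1.11, "Trp": 1.08, "Ile": 1.08, "Val": 1.06,
    "Asp": 1.01, "His": 1.00, "Arg": 0.98, "Thr": 0.83, "Ser": 0.77,
    "Cys": 0.70, "Tyr": 0.69, "Asn": 0.67, "Pro": 0.57, "Gly": 0.57,
}


def propensity(n_ij, n_i, n_j_total, n_total):
    """P_ij = (n_ij / n_i) / (N_j / N_T), under the symbol reading assumed above."""
    return (n_ij / n_i) / (n_j_total / n_total)


def mean_helix_propensity(residues, table=CF_HELIX):
    """Average the helix propensity over a list of three-letter residue codes."""
    return sum(table[r] for r in residues) / len(residues)


# Made-up counts: 30 of 100 residues of one type are helical,
# while 200 of 1000 residues overall are helical.
p = propensity(30, 100, 200, 1000)          # about 1.5: enriched in helices
segment_score = mean_helix_propensity(["Glu", "Met", "Ala", "Leu"])
```

A value above 1 means the residue type occurs in that conformation more often than expected from the overall composition, which is how the ++, +, =, - and -- classes in Table 1 can be read.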
Structural Bioinformatics
Structure modelling

Pie chart segments:
• Sequence-based fold recognition: 50%
• Full threading methods: 15%
• As yet unobserved folds: 20%
• Probably non-globular protein: 15%

Figure 5. Hypothetical applicability of different categories of fold-recognition methods to the open reading frames of small bacterial genomes. At present, sequence-based fold recognition (e.g. GenTHREADER) is successful for around 50% of the ORFs. Structures of a further 15% of ORFs can probably be assigned by full threading methods such as THREADER, and the remaining 35% cannot currently be recognized, either because the fold has not yet been observed or because the ORF encodes a non-globular protein (e.g. a transmembrane protein).

Pie chart segments:
• PDB match region
• Transmembrane or low complexity region
• Unannotated regions

Pie chart of structural assignments to the proteome of the bacterium Mycoplasma genitalium. Almost half of the amino acids (49%) in the Mycoplasma genitalium proteins have a structural annotation. In this case, the structural annotation was taken from the SUPERFAMILY database (version 1.59, September 2002), described in Section 11.3.2. Roughly one fifth of the proteome is predicted to be a transmembrane helix or low complexity region by the relevant computer programs. The remaining 30% of the proteome is unassigned.
Structural Bioinformatics
Structure alignment

Structural Bioinformatics
Function modifiers: drugs

Final goal: complement Genome Track
SMS and Protein Dossier - Drug Target DB
• Complete Genome Sequence
• Homology Modeling
• Small molecules Database
• Local PDB files
• Fingerprint (ligand) and Fingerprint (protein)
• 2D contour map surface matching
• Ligand-binding site 2-D information (for search)
• Protein-binding site 2-D information (for search)
• Mutational and dynamic studies
• Protein/Ligand interaction (matching DB)
• Docking

Optimizing the use of the computers available in the Embrapa system
• GRID technology (computational grid)
• Web services
• GRID services

The challenge of deploying a computational grid at Embrapa
• Online access to decentralized computing resources
• Wide dissemination
• In silico search for new drugs
• File warehouse
• Desktop clients

PDB coordinate file

HEADER    COMPLEX(SERINE PROTEINASE-INHIBITOR)    04-MAR-88   1CHO
COMPND    ALPHA-CHYMOTRYPSIN (E.C.3.4.21.1) COMPLEX WITH TURKEY
SOURCE    BOVINE (BOS $TAURUS) PANCREAS AND TURKEY (MELEAGRIS
AUTHOR    M.FUJINAGA,A.R.SIELECKI,R.J.READ,W.ARDELT,M.LASKOWSKI
REVDAT   1   16-JUL-88 1CHO    0
JRNL        REF    J.MOL.BIOL.                   V. 195   397 1987
REMARK   2 RESOLUTION. 1.8  ANGSTROMS.
REMARK   3 REFINEMENT. BY THE RESTRAINED LEAST SQUARES PROCEDURE OF
REMARK   3  THE R VALUE IS 0.168 FOR 19178 REFLECTIONS
SEQRES   1 E  245  CYS GLY VAL PRO ALA ILE GLN PRO VAL LEU SER GLY LEU
SEQRES   2 E  245  EXC EXC ILE VAL ASN GLY GLU GLU ALA VAL PRO GLY SER
SEQRES   5 I   56  PHE GLY LYS CYS
FTNOTE   1 LYS E   36 - POOR DENSITY FOR ALL ATOMS BEYOND CG.
FTNOTE   2 GLY E   74 THROUGH SER E   77 - POOR DENSITY FOR BOTH SIDE
SCALE1      0.022262  0.000000  0.005509        0.00000
SCALE2      0.000000  0.018342  0.000000        0.00000
SCALE3      0.000000  0.000000  0.018016        0.00000
ATOM      1  N   CYS E   1     -15.451  30.900  -6.779  1.00 19.02      1CHO 156
ATOM      2  CA  CYS E   1     -15.185  29.544  -7.307  1.00 17.58      1CHO 157
ATOM      3  C   CYS E   1     -14.777  29.619  -8.768  1.00 17.23      1CHO 158
ATOM      4  O   CYS E   1     -15.105  30.601  -9.455  1.00 16.79      1CHO 159
ATOM      5  CB  CYS E   1     -16.428  28.675  -7.320  1.00 16.57      1CHO 160
ATOM      6  SG  CYS E   1     -17.883  29.299  -8.170  1.00 19.13      1CHO 161
ATOM      7  N   GLY E   2     -14.110  28.558  -9.177  1.00 16.79      1CHO 162
ATOM      8  CA  GLY E   2     -13.678  28.296 -10.503  1.00 15.13      1CHO 163
ATOM      9  C   GLY E   2     -12.640  29.105 -11.184  1.00 17.48      1CHO 164
ATOM     10  O   GLY E   2     -12.427  28.886 -12.414  1.00 16.69      1CHO 165
ATOM     11  N   VAL E   3     -11.950  29.979 -10.481  1.00 18.30      1CHO 166

Trailing columns on the slide: entry ID 1CHO plus line serial numbers (22, 61, 62, 84, 86, 88 and 153-155 for the records before the coordinates; 156-166 for the ATOM records).

PDB file format

  Field No.  Column range  FORTRAN format  Description
  1.          1 -  6       A6              Record ID (e.g. ATOM, HETATM)
  2.          7 - 11       I5              Atom serial number
  -          12 - 12       1X              Blank
  3.         13 - 16       A4              Atom name (e.g. " CA ", " ND1")
  4.         17 - 17       A1              Alternative location code (if any)
  5.         18 - 20       A3              Standard 3-letter amino acid code for residue
  -          21 - 21       1X              Blank
  6.         22 - 22       A1              Chain identifier code
  7.         23 - 26       I4              Residue sequence number
  8.         27 - 27       A1              Insertion code (if any)
  -          28 - 30       3X              Blank
  9.         31 - 38       F8.3            Atom's x-coordinate
  10.        39 - 46       F8.3            Atom's y-coordinate
  11.        47 - 54       F8.3            Atom's z-coordinate
  12.        55 - 60       F6.2            Occupancy value for atom
  13.        61 - 66       F6.2            B-value (thermal factor)
  -          67 - 67       1X              Blank
  14.        68 - 70       I3              Footnote number

PDB format
Annotated columns: atom, amino acid, chain, x, y, z
26,811 PDB files (17 August 2004)

NMR PDB files
For NMR ensembles, the coordinates of each model should be preceded by a MODEL record and terminated by an ENDMDL record. The format of the former is ('MODEL',5X,I4), where the I4 holds the model number. For example:
MODEL        1
MODEL        2
...
PDB file 1abt

Cα atoms only: 1bax

ATOM      1  CA  MET     1     -10.673  -2.781 -24.877  1.00  9.53           C
ATOM      2  CA  GLY     2      -8.177   0.101 -25.131  1.00  6.91           C
ATOM      3  CA  GLN     3      -4.858  -1.792 -25.204  1.00  5.34           C
ATOM      4  CA  GLU     4      -4.446  -4.403 -22.446  1.00  3.07           C
ATOM      5  CA  LEU     5      -1.388  -4.290 -20.154  1.00  1.88           C
ATOM      6  CA  SER     6      -1.477  -0.519 -20.745  1.00  1.77           C
ATOM      7  CA  GLN     7      -3.336  -0.468 -17.418  1.00  1.68           C
ATOM      8  CA  HIS     8      -0.185  -1.291 -15.497  1.00  1.30           C

All atoms: 1cho (the same eleven ATOM records, serials 1-11, shown in the PDB coordinate file above)

PDB Metrics

Literature
• Westbrook, J. and Fitzgerald, P. M. (2003). The PDB format, mmCIF formats and other data formats. In: Structural Bioinformatics, P. E. Bourne and H. Weissig (eds.). Hoboken, NJ: John Wiley & Sons, Inc., pp. 161-179. Describes the different data formats and protocols used to represent PDB structures.
• Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. and Bourne, P. E. (2000). The Protein Data Bank. Nucleic Acids Research, 28, pp. 235-242.

Future - The Structure Should be the User Interface
• Ligand - What other entries contain this?
• Chain - What other entries have chains with >90% sequence identity?
• Residue - What is the environment of this residue?
STING!

Our mission
1. Understand fully the relationship between structure and function by collecting and mapping all respective descriptors in a DB
2. Make this DB accessible for analysis through an interactive interface
3. Elaborate specific data and structure classifications
4. Analyze: folding, stability, binding, families, etc.

But: Graphics
The natural language of the human brain is VISUAL

Structure/function descriptors in JPD

STING Millennium Suite (SMS)
General Computational Biology/Bioinformatics procedures
1. sequence homology search and position-specific sequence conservation;
2. sequence alignments;
3. structural alignments;
4. search for sequence-to-structure relationships;
5. definition of structural parameters such as intra- and inter-chain contacts;
6. structure modeling with evaluation of the quality of the obtained models;
7. protein interfaces and identification of active and ligand-binding sites;
8. cumulative statistics on protein family characteristics.
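The fixed-column layout given in the PDB file format table earlier in the deck lends itself to slicing by position rather than splitting on whitespace. The sketch below is one possible way to read an ATOM or HETATM record under that layout; the `Atom` type and the `parse_atom_line` function are names invented for this illustration, and the sample line is the second ATOM record of the 1CHO excerpt without its trailing identifier columns.

```python
from typing import NamedTuple, Optional


class Atom(NamedTuple):
    serial: int
    name: str
    alt_loc: str
    res_name: str
    chain: str
    res_seq: int
    i_code: str
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom_line(line: str) -> Optional[Atom]:
    """Parse one ATOM/HETATM record; return None for any other record type.

    Slices are 0-based and end-exclusive, so table columns 7-11 become line[6:11].
    """
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        return None
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        alt_loc=line[16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21].strip(),
        res_seq=int(line[22:26]),
        i_code=line[26].strip(),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )


SAMPLE = "ATOM      2  CA  CYS E   1     -15.185  29.544  -7.307  1.00 17.58"
atom = parse_atom_line(SAMPLE)
```

Slicing by column matters because fields can run together, as in `-10.503` following `28.296` with no guaranteed separator, and because the chain identifier may be blank, as in the 1bax Cα-only example.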
Gold STING entry page

SBI entry page

Gold STING web page

Gold STING mirror sites

STING evolution: 2004, 2005, 2006

Publications

Java Protein Dossier
Menu options:
• Configure view
• Reload
• Sequence
• Legend
• Select residues
• Unselect selected residues
• Show selected residues
• Stop flashing residues
• Lock scroll bar
• Change chains
• Show shortest contacts only
• Hide contacts with water

Structurally aligned: chymotrypsin & trypsin

JPD IFR alignment for two enzymes

X-Y plots in Gold STING

Navigating STING

Final Objective: automatic function assignment
Attributes = Parameters
A set of parameters can be used to describe the protein function.
Therefore => predict function from the structure.

Final Objective: automatic function assignment
Enzyme Classification (EC)
Six classes of enzymes at top level:
1. Oxidoreductase
2. Transferase
3. Hydrolase
4. Lyase
5. Isomerase
6. Ligase

Protein Dossier: applications
Function / activity

Gold STING
Why 8000 clones?
Protein Dossier: applications
Searching and Learning by Selecting

Protein Dossier: applications
Folding and Stability

Three folding-critical residues

  Parameter name                                Parameter value (range) or description
  Total Unused Contact Energy                   > 72 kcal/mol
  Density                                       > 1.08 for a probing sphere of 3 Å centered at Cα
  Sponge                                        > 0.75 for a probing sphere of 3 Å centered at Cα
  Cross Presence Order                          >= 1, 1, 2 (at Cα, Cβ and LHA)
  Secondary Structure Element                   Beta Sheet
  Conservation SH2Qs / Evolutionary Pressure    < 28
  Electrostatic Potential @ Cα                  >= 7
  Electrostatic Potential @ Surface             <= 0

Parameter set defining active site
Is there a set of parameters which can UNIQUELY define an amino acid ensemble that coincides with the active site of a given protein?

Protein Dossier: applications
Active site identification
HIV-1 integrase is an essential enzyme in the life cycle of the virus, responsible for catalyzing the insertion of the viral genome into the host cell chromosome; it provides an attractive target for antiviral drug design. The evidence was obtained from site-directed mutagenesis experiments, which demonstrated that even the most conservative substitutions of any of the three absolutely conserved carboxylate residues, D64, D116, and E152 (the so-called D,D-35-E motif), abolished catalytic activity.
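The idea of selecting residues by fixing ranges on a set of parameters can be illustrated with a small filter. This is a sketch under stated assumptions, not the STING implementation: the descriptor keys, the `select_residues` function, and the two example residues with their values are all invented for illustration. Only the thresholds are taken from the folding-critical residue table above, and the Cross Presence Order criterion is left out because its encoding is not specified on the slide.

```python
import operator

# Comparison symbols as they appear in the parameter table.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}

# (hypothetical descriptor key, comparison, threshold from the slide)
FOLDING_CRITERIA = [
    ("unused_contact_energy", ">", 72.0),      # kcal/mol
    ("density_ca", ">", 1.08),
    ("sponge_ca", ">", 0.75),
    ("secondary_structure", "==", "beta_sheet"),
    ("evolutionary_pressure", "<", 28),
    ("electrostatic_potential_ca", ">=", 7),
    ("electrostatic_potential_surface", "<=", 0),
]


def select_residues(residues, criteria):
    """Return the ids of residues that satisfy every (key, op, value) criterion."""
    return [
        r["id"]
        for r in residues
        if all(OPS[op](r[key], value) for key, op, value in criteria)
    ]


# Made-up descriptor values for two hypothetical residues.
EXAMPLE_RESIDUES = [
    {"id": "R1", "unused_contact_energy": 80.0, "density_ca": 1.10,
     "sponge_ca": 0.80, "secondary_structure": "beta_sheet",
     "evolutionary_pressure": 20, "electrostatic_potential_ca": 9,
     "electrostatic_potential_surface": -3},
    {"id": "R2", "unused_contact_energy": 60.0, "density_ca": 1.10,
     "sponge_ca": 0.80, "secondary_structure": "helix",
     "evolutionary_pressure": 20, "electrostatic_potential_ca": 9,
     "electrostatic_potential_surface": -3},
]

selected = select_residues(EXAMPLE_RESIDUES, FOLDING_CRITERIA)
```

Keeping the criteria as data rather than code makes it easy to swap in a different parameter set, such as one aimed at an active site instead of folding-critical residues, and to check whether the selection is unique for a given structure.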
Protein Dossier: applications
Active site identification

STING Report

STING Report
1ot5.pdb, 1gci.pdb, 1ic6.pdb
Pro-hormone convertases

Structure descriptors pointing to the enzyme: serine protease and alpha-amylase family case

Structural alignment of SERPRO using the PrISM algorithm
SERPRO super PDB file ready for docking
www.cbi.cnptia.embrapa.br/SMS/SPIDER

IFR data and analysis for SDM

Protein Dossier: applications
Active site identification
Another example we have examined is cytosolic ascorbate peroxidase, a protein that plays a key role in hydrogen peroxide removal in the chloroplasts and cytosol of higher plants. Site-directed mutagenesis experiments showed that three residues, R38, H42 and N71, constitute a complete catalytic site and that any substitution abolished catalytic activity. By first selecting a number of parameters and then fixing the range of their numerical values, we were able to select only the amino acids of cytosolic ascorbate peroxidase that constitute the catalytic site. The parameters used and their ranges are listed under Example #4 below.

Structure descriptors and fingerprint matching

#1. Example: DNA polymerase
The non-mutable active site residue! Why is this so?
3ktq.pdb, chain A and chain B, residues 605-618, Asp-610

#2. Example: acylphosphatase
The folding-essential residues! How to identify them?
2acy.pdb, chain: _, residues: Y11, P54, F94
• Total Unused Contact Energy > 72 kcal/mol
• Density > 1.08 for a probing sphere of 3 Å centered at Cα
• Sponge > 0.75 for a probing sphere of 3 Å centered at Cα
• Cross Presence Order >= 1, 1, 2 (at Cα, Cβ and LHA)
• Secondary Structure Element: Beta Sheet
• Conservation SH2Qs / Evolutionary Pressure < 28
• Electrostatic Potential @ Cα >= 7
• Electrostatic Potential @ Surface <= 0

#3. Example: active site identification, HIV-1 integrase
Even the most conservative substitutions of any of the three absolutely conserved carboxylate residues, D64, D116, and E152 (the so-called D,D-35-E motif), abolished catalytic activity.
1biu.pdb, chain: A, residues: D64, D116, E152
• Conservation: SH2Qs: Relative Entropy < 30
• Physical-Chemical: Electrostatic Potential: Average < -20 kT/J/mol
• Geometric: Pocket/Cavity in Complex: Volume > 0

#4. Example: active site identification, cytosolic ascorbate peroxidase
Cytosolic ascorbate peroxidase plays a key role in hydrogen peroxide removal in the chloroplasts and cytosol of higher plants. Site-directed mutagenesis experiments showed that three residues, R38, H42 and N71, constitute a complete catalytic site and that any substitution abolished catalytic activity.
1apx.pdb, chain: A
1. Relevant site: Residue Location: Surface
2. Conservation: HSSP: Relative Entropy < 25
3. Conservation: HSSP: Relative Entropy 100 < 25
4. Conservation: HSSP: Evolutionary Pressure < 25
5. Conservation: HSSP: Reliability > 25%
6. Difference Conservation: HSSP and SH2Qs: Reliability > 25%
7. Geometric: Pocket/Cavity in Complex: Volume > 0

Staff
• Roberto H. Higa, MSc in Computer Science
• Adauto L. Mancini, BSc in Computer Science
• Michel B. Yamaghishi, PhD in Mathematics
• Paula Kuser Falcão, PhD in Protein Crystallography
• Renato Fileto, PhD in Database Management
• Goran Neshich, PhD in Biophysics, Group Leader

Tesla's land: Yugoslavia/Serbia
"Science is but a perversion of itself unless it has as its ultimate goal the betterment of humanity."