Goran Neshich - Embrapa/CNPTIA

Structural BioInformatics Laboratory: SBI
Proteins and their 3D Structure
Paula Kuser
Goran Neshich
Embrapa Informática Agropecuária
Cidade Universitária - UNICAMP
Campinas, SP
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Structure
Sequence
Blast
STING
Function
Role
SMS
Microarray
Image
Analysing
Semantic
Lexical
Pragmatic
Syntactic
•Gene Annotation
•Gene Comparison
•Function
Descriptors
•Structure
Descriptors
http://www.cbi.cnptia.embrapa.br
•Gene Expression
Networks
•Proteomics
Goran Neshich
Bringing Genome Into Three Dimensions
Cabral’s map of Brazil
Parallels that help us to see the problem better
Bringing Genome Into Three Dimensions
Old protein map
Parallels that help us to see the problem better
Bringing Genome Into Three Dimensions
Satellite map
Structure/function descriptors in JPD
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Data/information deluge
and
flavors of Bioinformatics
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Datalibrary – 2003 (February)
23,950,735 nucleotide sequences, 37,486,732,136 bp
112 published complete genomes
590 genomes in progress
830,525 protein sequences
20,417 protein structures
5,300 Plasmodium falciparum genes, 23,000,000 bp
35,000 genes in Homo sapiens, 3,164,000,000 bp
27,936 genes in Xylella fastidiosa, 2,519,802 bases, 2,775 proteins
10,000,000 publications in PubMedline
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Datalibrary – 2003 (October)
29,189,427 nucleotide sequences (~40 × 10^9 bp)
Published complete genomes: Viruses: 1421; Archaea: 16; Bacteria: 135; Eukaryotes: 9 + 4 vertebrates + 7 plants
590 genomes in progress
1,139,154 protein sequences
22,700 protein structures (PDB)
480 genes in Mycoplasma genitalium: 580,000 bp
35,000 genes in Homo sapiens (3.164 × 10^9 bp)
27,936 genes in Xylella fastidiosa, 2,519,802 bases
>10,000,000 publications in PubMedline
http://www.cbi.cnptia.embrapa.br
Goran Neshich
SMS and Protein Dossier – Drug Target DB
Where do we work?
Genome sequencing
Structural genomics
Book of life
Structural DB
Structure descriptors
annotation
Structure-function
Search for new effectors
Mutational and
dynamic studies
Protein-ligand interaction
(matching DB)
http://www.cbi.cnptia.embrapa.br
Drug Discovery
Docking
Goran Neshich
Structural Bioinformatics
1. Sequence similarity search
2. Sequence alignments
3. Structure alignment
4. Secondary structure prediction
5. Structure modeling (homology modeling)
6. Structure prediction (threading)
7. Characterization of structure
8. Relationship: sequence-structure-function
9. Function modifiers
10. Compiling the list of pairs: structure and its function modifier
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Structural Bioinformatics
Sequence similarity search
Sequence alignment
Parallels that help us to see the problem better
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Structural Bioinformatics
AKWHGGAFWPPH
WAAGAHWPHAQD
Parallels that help us to see the problem better
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Bringing Genome Into
Three Dimensions
Functional Genomics Milestone:
How well function can be
inherited from similar
sequences?
From sequence to function: desires and problems
Structural Bioinformatics
From gene to functional protein
Parallels that help us to see the problem better
Structural Bioinformatics
What did we learn from genomics
projects?
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Bioinformatics: insight in life on earth.....
Interesting information from genome data:
Two sequences can be instantly compared
Human and tomato histone H4 are nearly
identical, as are many "housekeeping"
proteins across taxa.
Such comparisons have been very useful
in the annotation of genomes
97 Complete Microbial Genomes: [A] Archaea - 16 species [B] - Bacteria - 81
species (October 28, 2002)
http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/micr.html
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Bioinformatics: insight in life on earth.....
Interesting information from genome data:
The smallest genome of any free-living
organism is that of the bacterium
Mycoplasma genitalium – it is only 500,000
bp long (0.017% that of the human genome)
and has 450 genes (1.5% of the human
genome);
Only 12 genes are of unknown function
The core biochemical and genetic
complement of life
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Structural Bioinformatics
Sequence alignment
Parallels that help us to see the problem better
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Scoring Matrices
Instead of awarding simple points for a match or mismatch, we may use a
“scoring matrix”.
For DNA/RNA: match = 1, mismatch = 0.
The “dotplot” is then converted into a grid of numbers, and the
best alignment corresponds to the diagonal with the greatest
numerical value.
      T    G    A    C
T     1    0    0    0
G     0    1    0    0
A     0    0    1    0
C     0    0    0    1
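The diagonal search described on this slide can be sketched in a few lines. This is a minimal illustration, assuming ungapped alignment only; the function names `dotplot` and `best_diagonal` and the two DNA fragments are hypothetical and do not come from the slides.

```python
def dotplot(seq_a, seq_b, match=1, mismatch=0):
    """Return a len(seq_a) x len(seq_b) grid of match/mismatch scores."""
    return [[match if a == b else mismatch for b in seq_b] for a in seq_a]


def best_diagonal(seq_a, seq_b):
    """Return (score, offset) of the highest-scoring diagonal.

    offset = column index - row index; on ties the smallest offset is kept.
    """
    plot = dotplot(seq_a, seq_b)
    best_score, best_offset = 0, 0
    for offset in range(-(len(seq_a) - 1), len(seq_b)):
        # Sum the cells of one diagonal, skipping positions outside the grid.
        total = sum(
            plot[i][i + offset]
            for i in range(len(seq_a))
            if 0 <= i + offset < len(seq_b)
        )
        if total > best_score:
            best_score, best_offset = total, offset
    return best_score, best_offset


# Hypothetical DNA fragments: "TTACA" is shared, shifted by two positions.
print(best_diagonal("GATTACA", "TTACAGG"))  # (5, -2)
```

A diagonal sum only finds alignments without gaps; handling insertions and deletions needs dynamic programming over the same grid.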
Goran Neshich
Dotplot with scores
Two aligned proteins produce a “score dotplot”, from which
one can calculate the optimal alignment
       P    A    W    H    E    A    E
H     -2   -2   -3   10    0   -2    0
E     -1   -1   -3    0    6   -1    6
A     -1    5   -3   -2   -1    5   -1
G     -2    0   -3   -2   -3    0   -3
A     -1    5   -3   -2   -1    5   -1
W     -4   -3   15   -3   -3   -3   -3
G     -2    0   -3   -2   -3    0   -3
H     -2   -2   -3   10    0   -2    0
E     -1   -1   -3    0    6   -1    6
E     -1   -1   -3    0    6   -1    6
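One way to calculate an optimal alignment score from such a score dotplot is global dynamic programming (Needleman-Wunsch). The slide does not name a method or a gap penalty, so both are assumptions here: the sketch uses a linear gap penalty of -8, and the names `score` and `global_alignment_score` are hypothetical. The substitution scores are copied from the slide (they appear consistent with BLOSUM50).

```python
# Scores copied from the slide: one row per residue type of HEAGAWGHEE,
# one column per residue of PAWHEAE, in that order.
COLUMNS = "PAWHEAE"
SCORES = {
    "H": [-2, -2, -3, 10, 0, -2, 0],
    "E": [-1, -1, -3, 0, 6, -1, 6],
    "A": [-1, 5, -3, -2, -1, 5, -1],
    "G": [-2, 0, -3, -2, -3, 0, -3],
    "W": [-4, -3, 15, -3, -3, -3, -3],
}
GAP = -8  # assumed linear gap penalty; not given on the slide


def score(a, b):
    """Substitution score for residue a (row) against residue b (column)."""
    return SCORES[a][COLUMNS.index(b)]


def global_alignment_score(x, y, gap=GAP):
    """Fill the dynamic-programming table and return the optimal global score."""
    rows, cols = len(x) + 1, len(y) + 1
    f = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        f[i][0] = i * gap
    for j in range(1, cols):
        f[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            f[i][j] = max(
                f[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # align the pair
                f[i - 1][j] + gap,                            # gap in y
                f[i][j - 1] + gap,                            # gap in x
            )
    return f[-1][-1]


print(global_alignment_score("HEAGAWGHEE", "PAWHEAE"))  # 1
```

With a different gap penalty the optimal score and alignment change, which is why the penalty has to be stated alongside the scoring matrix.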
Goran Neshich
Structural Bioinformatics
Structure elements
Parallels that help us to see the problem better
http://www.cbi.cnptia.embrapa.br
Goran Neshich
STING Millennium Suite:
Analysing structure of proteins and their complexes -
What do we know about structure
and its relationship with function?
What are the building blocks of
microfactories, better known as
PROTEINS?
What is the structural hierarchy in
proteins?
http://www.cbi.cnptia.embrapa.br
Goran Neshich
STING Millennium Suite:
Analysing structure of proteins and their complexes -
Secondary structure elements:
Helix
Turn
Sheet
Coil
http://www.cbi.cnptia.embrapa.br
Goran Neshich
STING Millennium Suite:
Analysing structure of proteins and their complexes -
Peptide bond and
other types of
“intimate” amino acid
contacts
http://www.cbi.cnptia.embrapa.br
Goran Neshich
STING Millennium
Suite:
Analysing structure of
proteins and their
complexes -
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Ramachandran Plot
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Structural Bioinformatics
Secondary structure prediction
Parallels that help us to see the problem better
http://www.cbi.cnptia.embrapa.br
Goran Neshich
$$P_{ij} = \frac{n_{ij} / n_i}{N_j / N_T}$$
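The propensity formula on this slide can be evaluated directly from counts. The slide does not define its symbols, so the reading used here is an assumption based on the usual Chou-Fasman style definition: n_ij is the number of residues of type i found in conformation j, n_i the total number of residues of type i, N_j the total number of residues in conformation j, and N_T the total number of residues. The function name and the toy data are hypothetical.

```python
from collections import Counter


def propensities(residues, states):
    """Compute P_ij = (n_ij / n_i) / (N_j / N_T) for every residue/state pair.

    residues: sequence of residue labels, e.g. "AAGG"
    states:   sequence of conformation labels of the same length, e.g. "HHHC"
    """
    if len(residues) != len(states):
        raise ValueError("residues and states must have the same length")
    n_i = Counter(residues)               # occurrences of each residue type
    big_n_j = Counter(states)             # residues in each conformation
    n_ij = Counter(zip(residues, states)) # residue type i seen in conformation j
    n_total = len(residues)
    return {
        (aa, st): (n_ij[(aa, st)] / n_i[aa]) / (big_n_j[st] / n_total)
        for aa in n_i
        for st in big_n_j
    }


# Toy data (hypothetical): two Ala in helix, one Gly in helix, one Gly in coil.
table = propensities("AAGG", "HHHC")
print(round(table[("A", "H")], 2))  # 1.33
print(round(table[("G", "C")], 2))  # 2.0
```

A value above 1 means the residue occurs in that conformation more often than the average residue does, which is how the "++" to "--" classes in a propensity table are usually read.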
Goran Neshich
α Helix                    β Sheet                    β Turn
aa    C&F    L     pr      aa    C&F    L     pr      aa    C&F    L
Glu   1.51   1.44  ++      Val   1.70   1.49  ++      Asn   1.56   1.28
Met   1.45   1.47  ++      Ile   1.60   1.45  ++      Gly   1.56   1.64
Ala   1.42   1.29  ++      Tyr   1.47   1.25  ++      Pro   1.52   1.91
Leu   1.21   1.30  ++      Phe   1.38   1.32  +       Asp   1.46   1.41
Lys   1.16   1.23  +       Trp   1.37   1.14  +       Ser   1.43   1.32
Phe   1.13   1.07  +       Leu   1.30   1.02  +       Cys   1.19   0.81
Gln   1.11   1.27  +       Cys   1.19   0.74  +       Tyr   1.14   1.05
Trp   1.08   0.99  +       Thr   1.19   1.21  +       Lys   1.01   0.96
Ile   1.08   0.97  +       Gln   1.10   0.80  +       Gln   0.98   0.98
Val   1.06   0.91  +       Met   1.05   0.97  +       Thr   0.96   1.04
Asp   1.01   1.04  =       Arg   0.93   0.99  =       Trp   0.96   0.76
His   1.00   1.22  =       Asn   0.89   0.76  =       Arg   0.95   0.88
Arg   0.98   0.96  =       His   0.87   1.08  =       His   0.95   0.68
Thr   0.83   0.82  =       Ala   0.83   0.90  -       Glu   0.74   0.99
Ser   0.77   0.82  =       Ser   0.75   0.95  -       Ala   0.66   0.77
Cys   0.70   1.11  =       Gly   0.75   0.92  -       Met   0.60   0.41
Tyr   0.69   0.72  -       Lys   0.74   0.77  -       Phe   0.60   0.59
Asn   0.67   0.90  -       Pro   0.55   0.64  --      Leu   0.59   0.58
Pro   0.57   0.52  --      Asp   0.54   0.72  --      Val   0.50   0.47
Gly   0.57   0.56  --      Glu   0.37   0.75  --      Ile   0.47   0.51
Table 1. Amino acid propensities for secondary structure element formation, according to Chou and Fasman (1978b) (C&F) and Levitt (1978) (L).
Goran Neshich
Structural Bioinformatics
Structure modelling
Parallels that help us to see the problem better
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Probably non-globular
Protein
15%
As yet unobserved folds
20%
Sequence-based fold
Recognition
50%
Full threading methods
15%
Figure 5. Hypothetical applicability of different categories of fold-recognition methods to the open
reading frames (ORFs) of small bacterial genomes. At present, sequence-based fold recognition (e.g.
GenTHREADER) is successful for around 50% of the ORFs. Structures of a further 15% of ORFs can
probably be assigned by full threading methods such as THREADER, and the remaining 35%
cannot currently be recognized, either because the fold has not yet been observed or because the ORF
encodes a non-globular protein (e.g. a transmembrane protein).
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Unannotated regions
PDB match region
Transmembrane or
Low complexity
region
Pie chart of structural assignments to the proteome of the bacterium Mycoplasma genitalium. Almost
half of the amino acids (49%) in the Mycoplasma genitalium proteins have a structural annotation. In
this case, the structural annotation was taken from the SUPERFAMILY database (version 1.59,
September 2002), described in Section 11.3.2. Roughly one fifth of the proteome is predicted to be a
transmembrane helix or low complexity region by the relevant computer programs. The remaining 30%
of the proteome is unassigned.
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Structural Bioinformatics
Structure alignment
Parallels that help us to see the problem better
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Structural Bioinformatics
Function modifiers: drugs
Parallels that help us to see the problem better
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Final goal: complement Genome Track
SMS and Protein Dossier – Drug Target DB
Complete Genome
Sequence
Homology Modeling
Small molecules
Database
Local PDB files
Fingerprint
Fingerprint
2D Contour map surface
matching
Ligand-binding site 2-D
information (for search )
Mutational and
dynamic studies
http://www.cbi.cnptia.embrapa.br
Protein-binding site 2-D
information (for search)
Protein/Ligand interaction
(matching DB)
Docking
Optimizing the use of the computers available in the Embrapa system
GRID technology (computational grid)
Web services
GRID technology
GRID services
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Online access to decentralized
computational resources
Broad dissemination
In silico search for new drugs
File repository
Desktop clients
The challenge of installing a computational grid
at Embrapa
http://www.cbi.cnptia.embrapa.br
Goran Neshich
PDB coordinate file
HEADER    COMPLEX(SERINE PROTEINASE-INHIBITOR)    04-MAR-88   1CHO
COMPND    ALPHA-CHYMOTRYPSIN (E.C.3.4.21.1) COMPLEX WITH TURKEY
SOURCE    BOVINE (BOS $TAURUS) PANCREAS AND TURKEY (MELEAGRIS
AUTHOR    M.FUJINAGA,A.R.SIELECKI,R.J.READ,W.ARDELT,M.LASKOWSKI
REVDAT   1   16-JUL-88 1CHO    0
JRNL        REF    J.MOL.BIOL.                   V. 195   397 1987
REMARK   2 RESOLUTION. 1.8 ANGSTROMS.
REMARK   3 REFINEMENT. BY THE RESTRAINED LEAST SQUARES PROCEDURE OF
REMARK   3 THE R VALUE IS 0.168 FOR 19178 REFLECTIONS
SEQRES   1 E  245  CYS GLY VAL PRO ALA ILE GLN PRO VAL LEU SER GLY LEU
SEQRES   2 E  245  EXC EXC ILE VAL ASN GLY GLU GLU ALA VAL PRO GLY SER
SEQRES   5 I   56  PHE GLY LYS CYS
FTNOTE   1 LYS E 36 - POOR DENSITY FOR ALL ATOMS BEYOND CG.
FTNOTE   2 GLY E 74 THROUGH SER E 77 - POOR DENSITY FOR BOTH SIDE
SCALE1      0.022262  0.000000  0.005509        0.00000
SCALE2      0.000000  0.018342  0.000000        0.00000
SCALE3      0.000000  0.000000  0.018016        0.00000
ATOM      1  N   CYS E   1     -15.451  30.900  -6.779  1.00 19.02      1CHO 156
ATOM      2  CA  CYS E   1     -15.185  29.544  -7.307  1.00 17.58      1CHO 157
ATOM      3  C   CYS E   1     -14.777  29.619  -8.768  1.00 17.23      1CHO 158
ATOM      4  O   CYS E   1     -15.105  30.601  -9.455  1.00 16.79      1CHO 159
ATOM      5  CB  CYS E   1     -16.428  28.675  -7.320  1.00 16.57      1CHO 160
ATOM      6  SG  CYS E   1     -17.883  29.299  -8.170  1.00 19.13      1CHO 161
ATOM      7  N   GLY E   2     -14.110  28.558  -9.177  1.00 16.79      1CHO 162
ATOM      8  CA  GLY E   2     -13.678  28.296 -10.503  1.00 15.13      1CHO 163
ATOM      9  C   GLY E   2     -12.640  29.105 -11.184  1.00 17.48      1CHO 164
ATOM     10  O   GLY E   2     -12.427  28.886 -12.414  1.00 16.69      1CHO 165
ATOM     11  N   VAL E   3     -11.950  29.979 -10.481  1.00 18.30      1CHO 166
Each record ends with the entry ID (1CHO) and a record serial number. The serial numbers shown on the slide for the non-ATOM records are 22, 61, 62, 84, 86, 88, 153, 154 and 155.
PDB file format
Field No.   Column range   FORTRAN format   Description
1.          1-6            A6               Record ID (e.g. ATOM, HETATM)
2.          7-11           I5               Atom serial number
-           12-12          1X               Blank
3.          13-16          A4               Atom name (e.g. " CA ", " ND1")
4.          17-17          A1               Alternative location code (if any)
5.          18-20          A3               Standard 3-letter amino acid code for residue
-           21-21          1X               Blank
6.          22-22          A1               Chain identifier code
7.          23-26          I4               Residue sequence number
8.          27-27          A1               Insertion code (if any)
-           28-30          3X               Blank
9.          31-38          F8.3             Atom's x-coordinate
10.         39-46          F8.3             Atom's y-coordinate
11.         47-54          F8.3             Atom's z-coordinate
12.         55-60          F6.2             Occupancy value for atom
13.         61-66          F6.2             B-value (thermal factor)
-           67-67          1X               Blank
14.         68-70          I3               Footnote number
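The column ranges in this table translate directly into string slices. The following is a minimal sketch, assuming a well-formed fixed-width ATOM line; the function name `parse_atom_record` and the dictionary keys are hypothetical, and the example line is rebuilt from the first 1CHO atom shown in the slides using the FORTRAN field widths.

```python
def parse_atom_record(line):
    """Slice one ATOM/HETATM line using the 1-based column ranges of the table.

    Python slices are 0-based and end-exclusive, so columns 7-11 become [6:11].
    """
    return {
        "record": line[0:6].strip(),      # columns 1-6
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),      # columns 13-16
        "alt_loc": line[16].strip(),      # column 17
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21].strip(),        # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "i_code": line[26].strip(),       # column 27
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
        "occupancy": float(line[54:60]),  # columns 55-60
        "b_value": float(line[60:66]),    # columns 61-66
    }


# First ATOM record of 1CHO, laid out with the field widths from the table.
EXAMPLE = "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f" % (
    "ATOM", 1, " N", "", "CYS", "E", 1, "", -15.451, 30.900, -6.779, 1.00, 19.02
)

atom = parse_atom_record(EXAMPLE)
print(atom["name"], atom["res_name"], atom["chain"], atom["x"], atom["b_value"])
```

Slicing by column rather than splitting on whitespace matters because adjacent fields can touch, for example a coordinate of -100.000 directly following another number.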
Goran Neshich
PDB format
Amino acid
Chain
Atom
x, y, z
26,811 PDB files
(17/08/2004)
NMR PDB files
For NMR ensembles, the coordinates of each model
should be preceded by a MODEL record and
terminated by an ENDMDL record. The format of
the former is ('MODEL',5X,I4), where the I4 holds
the model number. For example:
MODEL 1
MODEL 2
...
PDB file 1abt
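The MODEL/ENDMDL convention makes it straightforward to separate the members of an NMR ensemble. This is a minimal sketch, assuming the model number is the first token after the word MODEL; the function name `split_models` and the placeholder coordinate lines are hypothetical.

```python
def split_models(pdb_text):
    """Group ATOM/HETATM lines by model number; returns {model_number: [lines]}."""
    models = {}
    current = None
    for line in pdb_text.splitlines():
        if line.startswith("MODEL"):
            # ('MODEL',5X,I4): the number follows the record name.
            current = int(line[5:].split()[0])
            models[current] = []
        elif line.startswith("ENDMDL"):
            current = None
        elif current is not None and line.startswith(("ATOM", "HETATM")):
            models[current].append(line)
    return models


# Placeholder ensemble with two models; the coordinate lines are elided.
text = "\n".join([
    "MODEL" + " " * 5 + "%4d" % 1,
    "ATOM  (coordinates of model 1, atom 1)",
    "ATOM  (coordinates of model 1, atom 2)",
    "ENDMDL",
    "MODEL" + " " * 5 + "%4d" % 2,
    "ATOM  (coordinates of model 2, atom 1)",
    "ENDMDL",
])
ensemble = split_models(text)
print(sorted(ensemble), len(ensemble[1]), len(ensemble[2]))  # [1, 2] 2 1
```

Coordinate lines that appear outside any MODEL block are ignored by this sketch, so it is not suitable as-is for single-model X-ray entries.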
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Cα atoms only
1bax
ATOM      1  CA  MET     1     -10.673  -2.781 -24.877  1.00  9.53           C
ATOM      2  CA  GLY     2      -8.177   0.101 -25.131  1.00  6.91           C
ATOM      3  CA  GLN     3      -4.858  -1.792 -25.204  1.00  5.34           C
ATOM      4  CA  GLU     4      -4.446  -4.403 -22.446  1.00  3.07           C
ATOM      5  CA  LEU     5      -1.388  -4.290 -20.154  1.00  1.88           C
ATOM      6  CA  SER     6      -1.477  -0.519 -20.745  1.00  1.77           C
ATOM      7  CA  GLN     7      -3.336  -0.468 -17.418  1.00  1.68           C
ATOM      8  CA  HIS     8      -0.185  -1.291 -15.497  1.00  1.30           C
1cho
ATOM      1  N   CYS E   1     -15.451  30.900  -6.779  1.00 19.02      1CHO 156
ATOM      2  CA  CYS E   1     -15.185  29.544  -7.307  1.00 17.58      1CHO 157
ATOM      3  C   CYS E   1     -14.777  29.619  -8.768  1.00 17.23      1CHO 158
ATOM      4  O   CYS E   1     -15.105  30.601  -9.455  1.00 16.79      1CHO 159
ATOM      5  CB  CYS E   1     -16.428  28.675  -7.320  1.00 16.57      1CHO 160
ATOM      6  SG  CYS E   1     -17.883  29.299  -8.170  1.00 19.13      1CHO 161
ATOM      7  N   GLY E   2     -14.110  28.558  -9.177  1.00 16.79      1CHO 162
ATOM      8  CA  GLY E   2     -13.678  28.296 -10.503  1.00 15.13      1CHO 163
ATOM      9  C   GLY E   2     -12.640  29.105 -11.184  1.00 17.48      1CHO 164
ATOM     10  O   GLY E   2     -12.427  28.886 -12.414  1.00 16.69      1CHO 165
ATOM     11  N   VAL E   3     -11.950  29.979 -10.481  1.00 18.30      1CHO 166
PDB Metrics
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Literature
Westbrook, J. and Fitzgerald, P.M. (2003): The PDB format, mmCIF formats and other data
formats. In: Structural Bioinformatics, P.E. Bourne and H. Weissig (eds.). Hoboken, NJ,
John Wiley & Sons, Inc., pp. 161-179.
Describes the different data formats and protocols used to represent PDB structures.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H.,
Shindyalov, I.N., Bourne, P.E. (2000): The Protein Data Bank.
Nucleic Acids Research, 28, pp. 235-242.
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Future - The Structure Should
be the User Interface
Ligand - What other
entries contain this?
Chain - What other
entries have chains with
>90% sequence identity?
Residue - What is the
environment of this residue?
STING!!!!!
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Our mission
1. Fully understand the relationship between structure
and function by collecting and mapping all
respective descriptors in a DB
2. Make this DB accessible for analysis
through interactive interface
3. Elaborate specific data and structure
classifications
4. Analyze: folding, stability, binding,
families, etc.
http://www.cbi.cnptia.embrapa.br
Goran Neshich
But: Graphics
The natural human brain language is
VISUAL
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Structure/function descriptors in JPD
http://www.cbi.cnptia.embrapa.br
Goran Neshich
STING Millennium Suite (SMS)
General Computational Biology/Bioinformatics procedures
1. sequence homology search and position specific sequence
conservation;
2. sequence alignments;
3. structural alignments;
4. search for sequence to structure relationship;
5. definition of structural parameters like intra and inter
chain contacts;
6. structure modeling with evaluation of the quality of
obtained models;
7. protein interfaces and identification of active and ligand
binding sites;
8. cumulative statistics on protein family characteristics.
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Gold STING entry page
http://www.cbi.cnptia.embrapa.br
Goran Neshich
SBI entry page
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Gold STING web page
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Gold STING mirror sites
http://www.cbi.cnptia.embrapa.br
Goran Neshich
STING evolution
2004
2005
2006
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Publications:
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Java Protein Dossier
Configure view
Reload Sequence
Legend
Select residues
Unselect selected residues
Show selected residues
Stop flashing residues
Lock scroll bar
Change chains
Show shortest contacts only
Hide contacts with water
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Structurally aligned: chymotrypsin &
trypsin
http://www.cbi.cnptia.embrapa.br
Goran Neshich
JPD
IFR alignment for two enzymes
http://www.cbi.cnptia.embrapa.br
Goran Neshich
X-Y plots in Gold STING
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Navigating STING
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Final Objective: automatic function assignment
Attributes = Parameters
A set of parameters can be used to describe the
protein function
Therefore => predict function from the
structure.
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Final Objective: automatic function assignment
Enzyme Classification (EC)
Six classes of enzymes at top level:
1. Oxidoreductase
2. Transferase
3. Hydrolase
4. Lyase
5. Isomerase
6. Ligase
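The six top-level classes listed above correspond to the first digit of an EC number, so a lookup is enough to assign the class. This is a small sketch with a hypothetical function name, `ec_top_level_class`; the example EC number 3.4.21.1 is the one given for alpha-chymotrypsin in the 1CHO header shown earlier in the slides.

```python
# Top-level classes exactly as listed on the slide.
EC_TOP_LEVEL = {
    1: "Oxidoreductase",
    2: "Transferase",
    3: "Hydrolase",
    4: "Lyase",
    5: "Isomerase",
    6: "Ligase",
}


def ec_top_level_class(ec_number):
    """Return the top-level class name for an EC number such as '3.4.21.1'."""
    first_digit = int(ec_number.split(".")[0])
    if first_digit not in EC_TOP_LEVEL:
        raise ValueError("unknown top-level EC class: %d" % first_digit)
    return EC_TOP_LEVEL[first_digit]


print(ec_top_level_class("3.4.21.1"))  # Hydrolase
```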
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Protein Dossier: applications
Function / activity
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Gold STING
Why 8000 clones???
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Protein Dossier: applications
Searching and Learning by Selecting
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Protein Dossier: applications
Folding and Stability
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Three folding-critical residues
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Parameter name: parameter value (range) or description
Total Unused Contact Energy: > 72 kcal/mol
Density: > 1.08 for a probing sphere of 3 Å centered at Ca
Sponge: > 0.75 for a probing sphere of 3 Å centered at Ca
Cross Presence Order: >= 1,1,2 (at Ca, Cb and LHA)
Secondary Structure Element: Beta Sheet
Conservation Sh2Qs / Evolutionary Pressure: < 28
Electrostatic Potential @ Ca: >= 7
Electrostatic Potential @ Surface: <= 0
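Selecting residues by a parameter set amounts to keeping only those residues whose descriptors fall inside every range at once. The sketch below illustrates that idea and is not the STING implementation: the dictionary keys, the function name `select_residues` and all residue values are hypothetical, the thresholds are transcribed from the slide, and the Cross Presence Order criterion is left out because the slide does not say how it is evaluated.

```python
import operator

# (descriptor key, comparison, threshold) transcribed from the slide.
CRITERIA = [
    ("unused_contact_energy", operator.gt, 72.0),  # kcal/mol
    ("density", operator.gt, 1.08),                # 3 A probe at Ca
    ("sponge", operator.gt, 0.75),                 # 3 A probe at Ca
    ("evolutionary_pressure", operator.lt, 28.0),
    ("esp_at_ca", operator.ge, 7.0),
    ("esp_at_surface", operator.le, 0.0),
]


def select_residues(residues, criteria=CRITERIA, required_sse="Beta Sheet"):
    """Return the names of residues that satisfy every criterion."""
    selected = []
    for name, descriptors in residues.items():
        if descriptors["sse"] != required_sse:
            continue
        if all(compare(descriptors[key], limit) for key, compare, limit in criteria):
            selected.append(name)
    return selected


# Hypothetical descriptor values, invented purely to exercise the filter.
passing = {
    "sse": "Beta Sheet", "unused_contact_energy": 80.0, "density": 1.10,
    "sponge": 0.80, "evolutionary_pressure": 20.0,
    "esp_at_ca": 9.0, "esp_at_surface": -1.0,
}
residues = {
    "res_a": passing,
    "res_b": dict(passing, density=1.00),  # fails the density range
    "res_c": dict(passing, sse="Helix"),   # wrong secondary structure
}
print(select_residues(residues))  # ['res_a']
```

Because every criterion must hold simultaneously, tightening or loosening a single range is enough to change how many residues are selected.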
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Parameter set defining active site
Is there a set of parameters which can define
UNIQUELY an amino acid ensemble which
coincides with the active site of a given
protein
?
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Protein Dossier: applications
Active site identification
HIV-1 integrase is an essential enzyme in the life cycle of the
virus, responsible for catalyzing the insertion of the viral
genome into the host cell chromosome; it provides an
attractive target for antiviral drug design!!
The evidence was obtained from site-directed mutagenesis experiments in which
it was demonstrated that even the most conservative substitutions of any of
the three absolutely conserved carboxylate residues, D64, D116, and E152
(the so-called D,D-35-E motif), abolished catalytic activity!!!
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Protein Dossier: applications
Active site identification
http://www.cbi.cnptia.embrapa.br
Goran Neshich
STING Report
http://www.cbi.cnptia.embrapa.br
Goran Neshich
STING Report
1ot5.pdb
http://www.cbi.cnptia.embrapa.br
1gci.pdb
1ic6.pdb
Pro-hormone convertases.
Goran Neshich
Structure descriptors pointing to
the enzyme: serine protease and
alpha-amylase family case
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Structural alignment of SERPRO using PrISM algorithm
SERPRO super pdb file ready for docking
www.cbi.cnptia.embrapa.br/SMS/SPIDER
Goran Neshich
IFR data and analysis for SDM
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Protein Dossier: applications
Active site identification
Another example we have examined is the cytosolic
ascorbate peroxidase, a protein that plays a key role in
hydrogen peroxide removal in the chloroplasts and cytosol
of higher plants. Site-directed mutagenesis experiments showed
that three residues, R38, H42 and N71, constitute the complete
catalytic site, and that any substitution abolishes catalytic activity.
By first selecting a number of parameters and then fixing the range of
their numerical values, we were able to select only the amino
acids of the cytosolic ascorbate peroxidase that constitute the
catalytic site. The parameters used and their ranges were:
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Protein Dossier: applications
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Structure descriptors and
fingerprint matching
http://www.cbi.cnptia.embrapa.br
Goran Neshich
#1. Example
DNA Polymerase: The non-mutable active site residue! Why is this so?
3ktq.pdb
chain A and Chain B
residues from 605-618
Asp-610
#2. Example
Acylphosphatase: the folding essential residues! How to identify them?
2acy.pdb
chain: _
residues: Y11, P54, F94
Total Unused Contact Energy: > 72 kcal/mol
Density: > 1.08 for a probing sphere of 3 Å centered at Ca
Sponge: > 0.75 for a probing sphere of 3 Å centered at Ca
Cross Presence Order: >= 1,1,2 (at Ca, Cb and LHA)
Secondary Structure Element: Beta Sheet
Conservation Sh2Qs / Evolutionary Pressure: < 28
Electrostatic Potential @ Ca: >= 7
Electrostatic Potential @ Surface: <= 0
#3. Example
Active site identification
HIV-1 integrase: even the most conservative substitutions of any of
the three absolutely conserved carboxylate residues, D64, D116, and E152
(the so-called D,D-35-E motif), abolished catalytic activity!!!
1biu.pdb
Chain:A
Residues: D64, D116, E152
Conservation: SH2Qs: Relative Entropy < 30
Physical-Chemical: Electrostatic Potential: Average < -20 kT/J/mol
Geometric: Pocket/Cavity in Complex: Volume > 0
#4. Example
Active site identification
Another example we have examined is the cytosolic ascorbate peroxidase, a protein that plays a key role in hydrogen peroxide removal in the chloroplasts and
cytosol of higher plants. Site-directed mutagenesis experiments showed that three residues, R38, H42 and N71, constitute the
complete catalytic site, and that any substitution abolishes catalytic activity.
1apx.pdb
chain:A
1. Relevant site: Residue Location: Surface
2. Conservation: HSSP: Relative Entropy < 25
3. Conservation: HSSP: Relative Entropy 100 < 25
4. Conservation: HSSP: Evolutionary Pressure < 25
5. Conservation: HSSP: Reliability > 25%
6. Difference Conservation: HSSP and SH2Qs: Reliability > 25%
7. Geometric: Pocket/Cavity in Complex: Volume > 0
Staff
• Roberto H. Higa: MSc in Computer Science
• Adauto L. Mancini: BSc in Computer Science
• Michel B. Yamaghishi: PhD in Mathematics
• Paula Kuser Falcão: PhD in Protein Crystallography
• Renato Fileto: PhD in Database Management
• Goran Neshich: PhD in Biophysics, Group Leader
http://www.cbi.cnptia.embrapa.br
Goran Neshich
Tesla’s land
Yugoslavia/Serbia
“Science is but a perversion of itself unless it has as its ultimate
goal the betterment of humanity.”
http://www.cbi.cnptia.embrapa.br
Goran Neshich
