- ChemAxon

Development of High-Content Molecular
Libraries for Filling the Gap between Target
and Ligand Chemical Spaces
Dr. Mireille KRIER
Dr. Didier ROGNAN
Bioinformatics of the Drug
CNRS UMR 7081
F-67400 Illkirch, France
mireille.krier@merck.de
didier.rognan@pharma.u-strasbg.fr
Databases for the Drug Discovery Process
Overview of our inhouse databases
- BioinfoDB
- SBI
- Sc-PDB
- hGPCR-Lig
BioinfoDB
The supplier compounds database
http://bioinfo-pharma.u-strasbg.fr/bioinfo
Commercially-available screening collections
- Important sources for identifying hits by virtual screening (VS)
- They cannot be used directly for VS because of several issues:
  - redundancy (duplicates within and between collections)
  - diversity
  - unknown drug- or lead-likeness
  - unsuitable format (non-ionized, counter-ions, racemates)
- ‘Unified’ screening collections are available:
  - i-Research Chemical Library (http://www.chemnavigator.com): 21 million samples; not ready to screen; not ‘clean’; not free
  - MDL Screening Compounds Directory (http://www.mdli.com): 3.5 million structures; not ready to screen; ± clean; not free
  - ZINC (http://blaster.docking.org/zinc/): 3.3 million structures; ready to screen; relatively clean; free
Preprocessing workflow for BioinfoDB
Workflow: Raw libraries -> File and data handling -> Filters -> 3D structure generation -> Protomeric state -> Descriptor calculations -> BioinfoDB
File and data handling:
- Error checking
- Molecule separation
- Duplicate removal
Filters: definition of 162 filtering rules (property, functional group)
- 8 topological descriptors (e.g. MW, PSA)
- 11 atom-based matchcounts (inorganic, carbon/heteroatom ratio, etc.)
- 78 chemical moieties with matchcounts (aldehyde, aziridine, etc.)
- 32 dyes
- 34 promiscuous binding motifs
3D structure generation:
- Stereoisomer(s)
Rognan (2005) La Gazette du CINES, 20, 1-4.
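The filtering step of the workflow above combines property ranges and functional-group matchcounts. The sketch below illustrates how such rules could be evaluated once descriptors have been computed. It is not the JChem implementation: the `RangeRule` class, the descriptor keys and all thresholds are hypothetical placeholders, not the 162 BioinfoDB rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RangeRule:
    """One filtering rule: a descriptor value must lie inside [low, high]."""
    name: str
    descriptor: str
    low: Optional[float] = None   # None means no lower bound
    high: Optional[float] = None  # None means no upper bound

    def passes(self, descriptors: Dict[str, float]) -> bool:
        value = descriptors[self.descriptor]
        if self.low is not None and value < self.low:
            return False
        if self.high is not None and value > self.high:
            return False
        return True


# Hypothetical profile: the thresholds are placeholders for illustration only.
DRUG_LIKE_EXAMPLE: List[RangeRule] = [
    RangeRule("molecular weight", "mw", low=150.0, high=500.0),
    RangeRule("polar surface area", "psa", high=140.0),
    RangeRule("aldehyde matchcount", "n_aldehyde", high=0),
]


def failed_rules(descriptors: Dict[str, float], rules: List[RangeRule]) -> List[str]:
    """Return the names of the rules that a compound violates."""
    return [rule.name for rule in rules if not rule.passes(descriptors)]


# A compound with acceptable properties but one aldehyde group is rejected.
compound = {"mw": 320.4, "psa": 75.0, "n_aldehyde": 1}
print(failed_rules(compound, DRUG_LIKE_EXAMPLE))
```

Expressing a matchcount as a range with an upper bound of 0 lets property rules and functional-group rules share one representation, so stricter or looser profiles (drug-like, lead-like, fragment) would differ only in their rule lists.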
The ‘BioinfoDB’ Library
Necessity to customise a high-content collection of commercially available ‘drug-like’ compounds:
- coverage of all stock compounds, deliverable in vials
- removal of redundancy (within and between diverse collections)
- selection of user-defined profiles (drug-like, lead-like, scaffolds, fragments)
- accurate chemoinformatics (ionization, stereochemistry, tautomerism, descriptors)
- avoid format conversions
- storage in a SQL database (1-D: smiles, 2-D: sd, 3-D: mol2)
- easy to browse (web interface)
- easy to update with a fully automated protocol
=> Choice of ChemAxon to customise a database of high-quality ‘drug-like’
compounds
JChem
Marvin Beans
Filter the structures
Evaluator: JChem module to filter molecules by chemical expressions, according to a user-defined
stringency (pharmacological tool, drug-like, lead-like, fragment)
SMARTS definitions
Filtering rules
Drug-like structures
Browsing the BioinfoDB
- Import the annotated SD file into a SQL table under JChem Base
- Browsing by JSP queries
http://bioinfo-pharma.u-strasbg.fr/bioinfo
Drug-likeness of commercial Libraries ?
[Figure: bar chart of drug-likeness (%, 0-100) for each library: ACD, Asinex, Bionet, Biospecs, Chembridge, ChemDiv, ChemStar, CNRS, InterBioScreen, LeadQuest, Maybridge, Timtec, VitasM]
Krier et al. (2006), J. Chem. Info. Model., 46, 512-524
SBI
The scaffolds database
http://bioinfo-pharma.u-strasbg.fr/scaffolds
Diversity analysis workflow of compound libraries
Workflow: Compound libraries -> Remove redundancy (duplicates and tautomers detected by InChI) -> Cluster by MCS -> Classes and singletons -> Calculate R-groups -> Scaffold library (21 393 scaffolds) -> Quantify diversity by PC50C, NC50C
Decision node: number of compounds in class > 25? (Yes / No; rare scaffolds)
[Figure: R-group distribution. x-axis: number of R-groups on scaffold (1-17); y-axis: percentage of scaffolds (0-25)]
Krier et al. (2006) J. Chem. Info. Model., 46, 512-524
Diversity of commercial collections
MedChem scaffolds (>25 cpds in class)
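Legend (scaffold diversity, collection size):
- Low diversity, large size
- Low diversity, medium size
- Medium diversity, medium size
- High diversity, small size
[Figure: plot of classified compounds (x 10^3, 0-150) against PC50C (0-35) for the collections CBG, IBSs, CDIc, ASIg, ASIp, SPE, VITs, TRI, CDIi, MDDR*, TIMs, CST, MAY, NET, IBSn, VITt, CNR and TIMn]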
PC50C = % of classes containing 50% of the classified compounds
MDDR = MDL Drug Data Report
ASIg = ASINEX Gold
ASIp = ASINEX Platinum
CBG = CHEMBRIDGE
CDIc = CHEMDIV Clab
CDIi = CHEMDIV Idc
CNR = CNRS Patrimoine
CST = CHEMSTAR
IBSn = INTERBIOSCREEN Natural
IBSs = INTERBIOSCREEN Synthetic
MAY = MAYBRIDGE
NET = BIONET
TRI = TRIPOS
VITs = VITAS-M Synthetic
VITt = VITAS-M Natural
SPE = SPECS
TIMn = TIMTEC Natural
TIMs = TIMTEC Synthetic
Krier et al. (2006)
J. Chem. Info. Model., 46, 512-524
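PC50C is defined above as the percentage of classes containing 50% of the classified compounds. A minimal sketch of that calculation follows, assuming that classes are counted from the most populated downwards and that the input is simply a list of class sizes; the function names are illustrative and are not taken from the cited paper.

```python
from typing import Sequence, Tuple


def classes_covering_half(class_sizes: Sequence[int]) -> Tuple[int, int]:
    """Return (classes needed to hold 50% of the compounds, total classes).

    Assumption: classes are taken from the largest to the smallest.
    """
    sizes = sorted((s for s in class_sizes if s > 0), reverse=True)
    if not sizes:
        raise ValueError("need at least one non-empty class")
    half = sum(sizes) / 2.0
    covered = 0
    for n_classes, size in enumerate(sizes, start=1):
        covered += size
        if covered >= half:
            return n_classes, len(sizes)
    raise AssertionError("unreachable: all compounds cover at least half")


def pc50c(class_sizes: Sequence[int]) -> float:
    """Percentage of classes that contain 50% of the classified compounds."""
    needed, total = classes_covering_half(class_sizes)
    return 100.0 * needed / total


# One dominant class: a single class out of five already holds half the compounds.
print(pc50c([50, 30, 10, 5, 5]))   # 20.0
# Evenly populated classes: half of the classes are needed.
print(pc50c([10] * 10))            # 50.0
```

Under this reading, a low PC50C means that a few heavily populated classes dominate the collection, while a value close to 50 means that compounds are spread evenly over the classes.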
Browsing the SBI
Sc-PDB
The database annotating
Proteins, Active Sites & Ligands
of the Protein Data Bank
http://bioinfo-pharma.u-strasbg.fr/scPDB
sc-PDB Development
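[Diagram: sc-PDB filtering pipeline. Starting from 30 000 entries, undesirable entries are discarded. The remaining entries are split into target and potential ligands (organic ligands, cofactors/ions, peptide ligands), leaving out solvent, detergent, etc. Undesirable cofactors/ions and undesirable ligands are removed, and a topological screen of ligands and active sites keeps 1 ligand/site pair, giving 6 415 entries, each described by target, ligand and site.]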
Paul et al. (2004) Proteins, 54, 671-680.
Kellenberger et al. (2006) J. Chem.
Info. Model., 46, 717-727.
Sc-PDB: Distribution
- 1 706 non-redundant proteins
- 2 721 non-redundant ligands
[Figure: histograms of the number of proteins and the number of ligands against the number of occurrences]
[Figure: pie chart by enzyme class (oxidoreductase, transferase, hydrolase, lyase, isomerase, ligase); slice labels: 1850 (35%), 1469 (28%), 1129 (22%), 369 (7%), 253 (4.8%), 169 (3.2%)]
[Figure: pie chart of ligand types: organics (64%), peptides and pseudopeptides (13%), nucleic acids (12%), sugars (10%), lipids (0.51%)]
Browsing the Sc-PDB
The sc-PDB can be browsed to prioritize protein-ligand
complexes using simple user-defined queries based on:
- Ligand/cofactor properties
AND/OR
- Target properties
hGPCR-Lig
The database matching the GPCR protein space with
the GPCR ligand space
http://bioinfo-pharma.u.strasbg.fr/hGPCRLig
GPCR Topology
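- 7 transmembrane helical domains
[Figure: GPCR topology showing the N-terminus (N), transmembrane helices 1-7, extracellular loops E1-E3, intracellular loops I1-I3 and the C-terminus (C)]
- Broad ligand diversity: photon, monoamines, peptides, chemokines, hormones, Ca++, glutamate, thrombin, anaphylatoxins (C3a, C5a), EGF-TM7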
GPCR Chemoproteomics
Similar binding sites should recognize similar ligands
- Predict the ligands of a given target
- Predict the target(s) of a given ligand
- Compare targets (ligand binding sites)
- Predict selectivity profiles (ligand, target)
Matching Target with Ligand space
Target space:
- ca. 800 human GPCRs
- All druggable?
- How to organize it?
Ligand space:
- ca. 17 000 known GPCR ligands (MDDR database)
- Drug-like, lead-like, fragment-like?
- How to organize it?
Match both spaces?
Assist hit discovery for new GPCRs?
GPCR Target space: hGPCR database
- contains most human non-olfactory GPCRs
- obtain reliable sequence alignments (7-TMs)
- generate reliable high-throughput 3D models
- do not bias the TM cavity by the X-ray structure of bovine rhodopsin (PDB entry 1f88)
http://bioinfo-pharma.u.strasbg.fr/hGPCRLig
GPCR-Mod: High-throughput modelling of GPCRs
- GPCR-Align: multiple alignment (TMs) of UniProt sequences
- GPCR-Gen: automated generation of 3-D coordinates (TMs)
- GPCR-find: TM cavity comparison
Output: 369 3-D models (ground state) and 369 TM cavities (30 residues)
Bissantz et al. (2004) JCICS, 44, 1162-1176.
Reducing the complexity of information
Set of 369 human GPCRs
Highly variable amino acid sequences (290 to 6,200 residues)
How to reduce complexity without losing information?
[Diagram: Full sequences (290-6,230) -> 7-TMs (189) -> Cavity (30), with axes labelled Simplicity, Focus and Information]
Surgand et al. (2006) Proteins, 62, 509-532
Chemoproteomic analysis of human GPCRs
1. Determine a consensus TM cavity
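2. Concatenate TM cavity-lining residues into ungapped
sequences
(30 residues pointing towards the inside of the cavity
and frequently used by most neutral
antagonists/inverse agonists)
[Figure: the 30 cavity-lining positions, in Ballesteros-Weinstein numbering]
- TM1: 1.35, 1.39, 1.42, 1.46
- TM2: 2.57, 2.58, 2.61, 2.65
- TM3: 3.28, 3.29, 3.32, 3.33, 3.36, 3.40
- TM4: 4.56, 4.60
- TM5: 5.38, 5.39, 5.42, 5.43, 5.46
- TM6: 6.44, 6.48, 6.51, 6.52, 6.55
- TM7: 7.35, 7.39, 7.43, 7.45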
Chemoproteomic analysis of human GPCRs
3. Derive a TM cavity-biased phylogenetic tree
Pairwise distance: identity
Hierarchical clustering: UPGMA
Bootstrapping: 1,000 replicates
Consensus tree
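The first two ingredients of step 3, an identity-based pairwise distance and UPGMA clustering, can be sketched in a few lines. This is an illustration only, not the authors' pipeline: the sequences are short toy strings rather than real 30-residue cavities, the distance is assumed to be 1 minus the fraction of identical positions, and bootstrapping and consensus-tree building are left out.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Cluster = Tuple[str, ...]


def identity_distance(a: str, b: str) -> float:
    """1 - fraction of identical positions between two ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    same = sum(x == y for x, y in zip(a, b))
    return 1.0 - same / len(a)


def upgma(seqs: Dict[str, str]) -> List[Tuple[Cluster, Cluster, float]]:
    """Return the merges (cluster, cluster, distance) in the order they occur."""
    # Distances between individual sequences (leaves).
    leaf = {
        frozenset((a, b)): identity_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }

    def average(c1: Cluster, c2: Cluster) -> float:
        # UPGMA: unweighted mean over all leaf pairs between the two clusters.
        total = sum(leaf[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters: List[Cluster] = [(name,) for name in seqs]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        merges.append((c1, c2, average(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Toy sequences (hypothetical, 6 residues each).
toy = {"A": "ACDEFG", "B": "ACDEFH", "C": "WCDKFH", "D": "WYDKFH"}
for left, right, dist in upgma(toy):
    print(left, right, round(dist, 3))
```

In the toy example, A joins B and C joins D first, and the two pairs are merged last at an average distance of 0.5. Recomputing averages from the leaf distances keeps the code short; for several hundred receptors an incremental distance update would be preferable.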
[Figure: consensus tree of the TM cavities. Clusters (number of receptors): Prostanoids (8), Adhesion (33), Glycoproteins (8), SREBs (6), MAS (11), Opsins (10), Secretin (15), Glutamate (23), Amines (45), Melanocortin (5), Brain-gut peptides (10), Adenosine (6), Frizzled (11), Lipids (14), Vasopeptides (7), Melatonin (7), Peptides (26), Opiates (13), Purines (35), Chemokines (23), Chemoattractants (17), Acids (5). Bootstrap support values shown at the nodes (out of 1,000): 906, 799, 894, 648, 1000, 238, 806, 775, 883, 780, 1000, 273, 449, 620, 485, 211, 726, 676, 431, 409, 747, 909.]
Surgand et al. (2006) Proteins, 62, 509-532
Organising GPCR Ligand space
Workflow: MDDR database (150 K cpds, keyword-based search) + hand-curated GPCR ligands (2.5 K cpds) -> 17 K GPCR ligands -> MCS clustering -> 958 scaffolds
Creation of an annotated compound library directed to the GPCR family
Matching Target and Ligand spaces
[Figure: matrix of clusters (C1-C5) against scaffolds (S1-S10), with an enrichment scale of 0-100%]
Matching Target and Ligand spaces
[Table columns: Cluster; Class; # Cpds; Scaffold Enrichment, %; Cluster significance]
Matching Ligand to Target space
1. Database search: a privileged structure [chemical structure drawing] retrieves 6 known GPCR targets: AG2R, AG2S, AG22, GHSR, L4R1, L4R2
2. Cavity alignment
3. Extracting hotspots (TM hotspots)
4. Cavity search: 17 putative new GPCR targets (chemoproteomic link): APJ, C5L2, FMLR, GALS, GPR1, Q9GZQ4, C3AR, CML1, G2A, GP15, MTLR, SPR1, C5AR, FML1, GALR, GP44, NTR1
Experimental validation: AT1, AT2 ligands -> GPR44 (CRTh2) ligands
Frimurer et al. (2005) Bioorg. Med. Chem. Lett., 15, 3707-3712
Measuring Distances between 2 GPCRs
1-D approach: measure the identity of the 30 cavity-lining residues
Entry   TM cavity
5HT1A   TLLAVLAQFIDVCIIPYTSTAFWFFAGNYN
5HT7R   ILITVMVDFIDVCIIPYTSTAFWFFSELYN
Similarity = Common/Total = 0.7
3-D approach: project descriptors onto a cavity-centered sphere
- Discretized sphere (80 triangles)
- 3 geometrical descriptors
- 5 physchem descriptors
- Projection to Cα atoms
- Normalized score
Similarity (5HT1A vs. 5HT7R) = 0.94
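The 1-D similarity quoted above can be reproduced directly from the two cavity strings on the slide. The sketch below counts identical positions and divides by the cavity length; the function names are illustrative, and the 3-D sphere-projection score is not covered.

```python
from typing import List, Tuple


def cavity_identity(a: str, b: str) -> float:
    """Fraction of identical residues between two ungapped cavity sequences."""
    if len(a) != len(b):
        raise ValueError("cavity sequences must have the same length")
    common = sum(x == y for x, y in zip(a, b))
    return common / len(a)


def differing_positions(a: str, b: str) -> List[Tuple[int, str, str]]:
    """List (1-based index in the 30-mer, residue in a, residue in b) for mismatches."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


# Cavity sequences as given on the slide.
HT1A = "TLLAVLAQFIDVCIIPYTSTAFWFFAGNYN"
HT7R = "ILITVMVDFIDVCIIPYTSTAFWFFSELYN"

print(cavity_identity(HT1A, HT7R))           # 0.7 (21 of 30 positions identical)
print(len(differing_positions(HT1A, HT7R)))  # 9
```

The indices returned by `differing_positions` refer to positions in the concatenated 30-residue string, not to receptor residue numbers.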
Browsing the hGPCR-Lig
Conclusions
- Creation of annotated compound libraries
- ChemAxon tools help us to build up the basis for chemoproteomics analysis
- Easy interfacing with other applications
Acknowledgements
Claire SCHALON
Dr. Esther KELLENBERGER
Guillaume BRET
Dr. Didier ROGNAN
Nicolas FOATA
Pascal MULLER
Dr. Jean-Sébastien SURGAND