QSTAR
An integrated analysis of biological, chemical and
gene expression data in drug discovery
9 October 2014
IWT grant QSTAR (2011-2013)
Joint analysis of three high dimensional data types
[Image: Andreas Bender]
The team
Janssen scientists from Oncology,
Neurology, Infectious Diseases
Academic collaborations:
chemoinformatics, statistics,
machine learning, platform-specific
data preprocessing
Further Janssen team members:
Medicinal chemistry,
chemoinformatics, biology,
molecular profiling, statistics, IT,
HTS, exploratory toxicology
The team
[Images: Jörg Kurt Wegner]
Motivation
Life science becomes ever more complex
Craig Federighi, Apple's Senior VP of Software Engineering
[Image: http://lifehacker.com/new-is-easy-right-is-hard-1371919541]
Motivation
Drug Discovery: Innovation <> Approval of new drugs
Scannell et al., Diagnosing the decline in pharmaceutical R&D efficiency
[Image: Nature Reviews Drug Discovery 11, 191-200 (March 2012)]
Motivation
Consequences of increasing complexity and technological innovations
1
Specific expertise not
present within a company
2
Difficulties in reproducing
research
Need to establish collaborations
with people
who have the missing
expertise
who generate reproducible
(= reliable) data
[Graphic:
http://www.nature.com/scitable/ebooks/english-communication-for-scientists-14053993/writing-scientific-papers-14239285]
Increasing complexity in life sciences
Lack of expertise
Pharmaceutical companies are trying to expand the number of
collaborations to be successful.
Since 2011, six of Janssen's nine newly
approved drugs have come from
outside.
About half of Janssen's
drug-development pipeline is from
outside, versus about 20% in 2002.
[WSJ, 9 March 2014]
[Graphic: Jonathan D. Rockoff,
The Wall Street Journal]
Increasing complexity in life sciences
Difficulties in reproducing research
A pharmaceutical company must rely upon
high quality scientific findings
However, difficulties in reproducing results
statistical hypothesis inference testing
significance is no indicator of
practical relevance
[R. Nuzzo, Scientific method: Statistical errors, Nature, 12 February 2014]
[Graphic: Regina Nuzzo, Nature, 12 February 2014]
Possible cause: researchers tackling questions beyond their
level of expertise
Possible solution: broader and more interdisciplinary teams
Increasing complexity in life sciences
Difficulties in reproducing research
[Graphic: Janssen DDI Sensibilisation Campaign]
IWT grant QSTAR (2011-2013)
Joint analysis of three high dimensional data types
[Image: Andreas Bender]
Drug discovery
Target-based assays
Phenotypic assays
[Image:
http://www.idigitalmotion.com/drugdisc.htm]
Phenotypic assays
Unknown target
[Image: Yeom et al. Journal of Translational Medicine 2009 7:70]
[Image: Science Vol. 307, no.
5707, 14 January 2005]
Scientific challenges
Hit classification
Bioactivity and promiscuity
Grouping by functional similarity
Number, magnitude of changes
Annotation or screening
Target deconvolution
Similarity to external perturbagens
Concentration, analogs, cells
Target / Scaffold hopping
Early detection of liabilities
Based on functional similarity
Functional similarity to known liabilities
Context
IWT R&D project QSTAR (2011-2013)
Can we detect and utilize
Quantitative Relations between three
high dimensional data sets?
chemical data on compound
Structures and substructures
biological data from
Transcriptional profiles as a global measure of
bioactivity / polypharmacology
biological
Assays (IC50s)
Data
Chemical structures
Need to be represented in a
computer-understandable
way but also need to be
interpretable by a medicinal
chemist!
Binary data for presence /
absence of specific
compound substructures
Numerical data for global
physicochemical properties
Fingerprinting
Algorithm
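The binary presence / absence encoding described on this slide can be illustrated with a minimal sketch. The compound names and feature IDs below are hypothetical placeholders; in practice the IDs would come from an ECFP6 fingerprinting algorithm in a chemoinformatics toolkit, which this sketch does not call.

```python
# Hypothetical substructure feature IDs per compound (illustration only;
# real ECFP6 IDs would be produced by a fingerprinting algorithm).
compound_features = {
    "cmpd_A": {101, 205, 333},
    "cmpd_B": {101, 333, 478},
    "cmpd_C": {512},
}


def binary_fingerprint_matrix(features_by_compound):
    """Build a compounds x features matrix of 0/1 presence flags."""
    compounds = sorted(features_by_compound)
    # Union of all feature IDs defines the (stable, sorted) column order.
    features = sorted(set().union(*features_by_compound.values()))
    rows = [
        [1 if f in features_by_compound[c] else 0 for f in features]
        for c in compounds
    ]
    return compounds, features, rows


compounds, features, rows = binary_fingerprint_matrix(compound_features)
for name, row in zip(compounds, rows):
    print(name, row)
```

Because every column maps back to one feature ID, a column that matters in an analysis can be traced to a concrete substructure, which is what keeps the encoding interpretable for a medicinal chemist.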
Data
Bioassays (ABCD)
Level of activity in specific biological
assay (e.g., IC50)
Lots of missing values for a
compound
Can we fill the matrix via target
prediction?
Compounds x Bioassays
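A small sketch of why the Compounds x Bioassays matrix is sparse: each compound is measured in only a few assays, so most cells stay empty. The records, assay names, and IC50 values below are invented for illustration, and `None` is an assumed marker for a missing measurement.

```python
# Hypothetical (compound, assay, IC50) records; values are made up.
ic50_records = [
    ("cmpd_A", "assay_1", 0.12),
    ("cmpd_A", "assay_3", 4.5),
    ("cmpd_B", "assay_2", 0.8),
]


def build_matrix(records):
    """Pivot long-format records into a compounds x assays matrix."""
    compounds = sorted({c for c, _, _ in records})
    assays = sorted({a for _, a, _ in records})
    values = {(c, a): v for c, a, v in records}
    # Cells without a measurement are left as None.
    matrix = [[values.get((c, a)) for a in assays] for c in compounds]
    return compounds, assays, matrix


def missing_fraction(matrix):
    """Fraction of cells that have no measured value."""
    cells = [v for row in matrix for v in row]
    return sum(v is None for v in cells) / len(cells)


compounds, assays, matrix = build_matrix(ic50_records)
print(matrix)
print(missing_fraction(matrix))
```

The empty cells are the ones a target-prediction model would be asked to fill.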
Data
Gene activity profiles
Need for holistic bioactivity measures
of a compound (vs. specific)
Complex molecular characterization
of a cellular phenotype
Transcriptional proles as a global
measure of bioactivity /
polypharmacology
Compounds x Genes
Toolbox
Data analysis:
Standardized data objects
(chemical structures / bioassays / gene activity profiles)
Same ECFP6 fingerprint ID across all projects
Same compound-specific IUPAC InChIKey ID across all
Same Entrez Gene ID across bioassay and gene expression data
Medicinal chemistry:
Translatable structure encoding
ECFP6 fingerprints
Global molecular characteristics
Toolbox
Complete data sets
Internally (business relevant)
Project specific
All of J&J
Public dataset (academic partners)
ChEMBL
Connectivity map
Toolbox
Version control for data objects and data preprocessing
R packages for sharing documented analysis functions
Sweave documents
explaining data analysis processes
reproducible results
OpenAtrium communication platform
Accessible data server
White paper
QSTAR Pipeline
Stable, reproducible data generation
Stable ECFP6 fingerprint IDs
across data sets
Visualization of ECFP6-defined
substructures on molecule
(Arcadia)
Scientific challenges
Focus of the project
1. Chemotype / compound prioritization (hit to lead)
2. Compound optimization (lead optimization)
3. Bioactivity and promiscuity
Link transcriptional profile <> bioassay
ROS1 example
Number of differentially expressed genes as an
indicator for compound promiscuity
Level of promiscuity largely stable during optimization
for potency in primary assay
Transcriptional profile for chemotype selection
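The idea of using the number of differentially expressed genes as a promiscuity indicator can be sketched as a simple count per compound. This is only an illustration: the gene names and log2 fold changes are invented, and the fixed fold-change cutoff is an assumption standing in for whatever statistical test the project actually used.

```python
def count_differential_genes(log2_fold_changes, threshold=1.0):
    """Count genes whose absolute log2 fold change reaches the threshold.

    The threshold is an assumed cutoff for illustration, not a value
    taken from the slides.
    """
    return sum(1 for v in log2_fold_changes.values() if abs(v) >= threshold)


# Hypothetical compound profiles: gene -> log2 fold change vs. control.
profiles = {
    "cmpd_A": {"GENE1": 0.1, "GENE2": -1.6, "GENE3": 2.3},
    "cmpd_B": {"GENE1": 0.2, "GENE2": -0.3, "GENE3": 0.4},
}

promiscuity_proxy = {
    compound: count_differential_genes(profile)
    for compound, profile in profiles.items()
}
print(promiscuity_proxy)
```

Under this proxy, a compound that perturbs many genes would rank as more promiscuous than one that leaves the profile largely unchanged.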
Ongoing: Link transcriptional profile <> bioassay
L1000 platform
Increase number of compound profiles (>10x)
Connect data to LINCS data (http://www.lincsproject.org/)
Started pre-competitive L1000 user/interest group
(AZ, Boehringer, BMS, Eli Lilly, EMD Serono, GSK, Janssen,
Novartis, Pfizer)
Ongoing: Link compound data <> bioassay
Target Prediction
Using Machine Learning approaches we can currently
predict the activity on these targets correctly for 4 out of 5 for 300 protein targets ("Virtual assay")
get an enrichment with active compounds for 600 protein targets ("Enrichment")
have no predictive power for 200 targets
Bioassays linked to a specific protein target
ChEMBL: 734
Janssen: >1000
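To make "enrichment with active compounds" concrete, here is a minimal sketch of an enrichment factor: the hit rate among the top-ranked predictions divided by the hit rate in the whole set. This is a commonly used definition and an assumption on our part; the slides do not state which metric the project applied. Scores and labels are invented.

```python
def enrichment_factor(scores, labels, top_fraction=0.1):
    """Hit rate in the top-scored fraction relative to the overall hit rate.

    scores: predicted activity scores (higher = more likely active).
    labels: 1 for measured active, 0 for inactive.
    """
    n = len(scores)
    k = max(1, round(n * top_fraction))
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("enrichment is undefined without any actives")
    # Rank compounds by descending predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(label for _, label in ranked[:k])
    return (actives_in_top / k) / (total_actives / n)


# Hypothetical predictions for ten compounds on one target.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
ef = enrichment_factor(scores, labels, top_fraction=0.2)
print(ef)
```

A value above 1 means the model ranks actives ahead of a random selection, which is useful for prioritizing compounds even when per-compound predictions are not accurate enough to act as a "virtual assay".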
Acknowledgements
Janssen
University of Hasselt
Durham University
Ilse Van den Wyngaert
Ziv Shkedy
Adetayo Kasim
An De Bondt
Nolen Joy P.
Pieter Peeters
Martin Otavain
Miroslav Cik
Philippe Haldermans
Dirk Wuyts
Marc Mercken
Tarig Bashir
Karine Smans
Berthold Wroblowski
Joerg Wegner
University of Michigan
Andreas Mayr
University of Rochester
Matthew McCall
Lieven Clement
Willem Talloen
Pushpike Thilakarathne
Dhammika Amaratunga
Nandini Raghavan
Harrie Gijsen
Ulrich Bodenhofer
Günter Klambauer
KU Leuven
Herman Van Vlijmen
Djork-Arne Clevert
Fan Meng
Geert Verheyen
Patrick Marichal
Sepp Hochreiter
Manhong Dai
Karin Verstraeten
Luc Bijnens
University of Linz
Cambridge University
Andreas Bender
Aakash Ravindranath
University of Ghent
Olivier Thas
Bie Verbist
Andreas Mitterecker
Martin Heusel
OpenAnalytics
Tobias Verbeke
Aditya Bhagwat
Sogeti
Steven Osselaer
Randstad
Liesbet Vervoort