LRI-AMBIT: a predictive tool dressed in IUCLID, quality and power built-in
FACTS ABOUT QSAR MODELS AND READ ACROSS
Mario Negri Institute, Milano, April 29, 2015
Dr Bruno Hubesch
Cefic LRI Programme Mgr
bhu@cefic.be
Talk outline
1. What is LRI? (in brief)
2. AMBIT-IUCLID:
A. the tool
B. the power
C. policy impact
What is LRI? – A Global Programme
• Demonstrates commitment to Responsible Care (RC) and the Global Product Strategy (GPS)
• ICCA-LRI: a global effort under ICCA
• ACC
• Cefic
• JCIA
Executing the long-term refocus of the LRI: key questions (< 10 y)
• Omics: 21st-century toxicology; linking information at the molecular level to health impact; interpretation of results until we know all; 3Rs for animal testing
• Predictive tools for health impact: pragmatic approaches to reduce complexity, but still robust in predicting health effects
• Combination effects: identify combination-effect scenarios of concern
• Ecosystems approach: new concept for an ecosystems approach with population relevance
• « Real-life exposure »: predictive, validated exposure scenarios for environmental stressors
• Comparative assessment: impact of health and environmental stressors
• Benefit-risk approaches: societal drivers for public acceptance of innovation
CEFIC LRI PROJECT EEM9.3-IC
A. THE TOOL: AMBIT-IUCLID
Linking the LRI AMBIT Chemoinformatic System with the IUCLID Substance Database to Support Read-Across of Substance Endpoint Data & Category Formation
Sep 2013 – Sep 2015
NINA JELIAZKOVA
IdeaConsult Ltd.
Sofia, Bulgaria
www.ideaconsult.net
http://ambit.sourceforge.net
AMBIT CEFIC LRI-EEM9.3
• “Building blocks for a future (Q)SAR decision support system: databases, applicability domain, similarity assessment and structure conversions” (2004, 2008)
• Part of the LRI Toolbox (http://www.cefic-lri.org/lritoolbox/ambit)
AMBIT REST web services
• OpenTox Application Programming Interface (API) since 2009 (Journal of Cheminformatics 2011, 3:18, doi:10.1186/1758-2946-3-18)
• Dataset web services (import / export)
• Chemical search [exact, similar, substructure] and metadata, data pooling, structure QA, structure optimisation, tautomer generation
• Prediction tools, e.g. Cramer rules, protein binding, pKa, etc.
• Data analysis tools, e.g. regression, classification, clustering, descriptor calculation
• Web applications using AMBIT REST web services
• New in 2014: embeddable JS widgets
• Open source
• Multiplatform (Windows, Unix, Mac)
AMBIT CEFIC LRI-EEM9.3-IC (2013)
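Illustration of the "similar" search mode: similarity searches over chemical structures are commonly built on fingerprint comparison. The sketch below is not AMBIT's implementation; it assumes each structure has already been reduced to a set of feature identifiers, and the compound names, fingerprints and threshold are hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_search(query, library, threshold=0.5):
    """Return (name, score) pairs with score >= threshold, best first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in hits if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints: each integer stands for one structural feature.
LIBRARY = {
    "compound_A": frozenset({1, 2, 3, 4}),
    "compound_B": frozenset({1, 2, 3, 9}),
    "compound_C": frozenset({7, 8}),
}

if __name__ == "__main__":
    # compound_C shares no feature with the query and falls below the threshold.
    print(similarity_search(frozenset({1, 2, 3, 4}), LIBRARY))
```

An exact search corresponds to a score of 1.0 under this simplification; substructure search needs graph matching and is not covered by the sketch.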
MAJOR GOALS OF THE CEFIC LRI PROJECT
EEM9.3-IC
Enhance the predictive power of in silico tools like LRI AMBIT
• Use substance data that are already available in a company IUCLID database.
• In addition, CEFIC LRI is checking how to obtain the data from the ECHA Dissemination site for ca. 7000 substances.
• Other reliable (quality) data will be integrated as well.
Support read-across and category formation workflows
• Avoid animal tests and strengthen the justification for read-across and category formation as requested by authorities.
Minimize overall testing and resource costs
• Use available studies for other substances as well, plus weight of evidence (WoE).
PROJECT OBJECTIVES
1. Implement web services that allow IUCLID and AMBIT to
exchange all data elements needed to support
read-across and category formation within AMBIT.
2. Establish filters to select the most relevant and
most reliable data to be accessed by AMBIT
e.g. only data from key studies, only data with
Klimisch code 1 or 2, and excluding data from
UVCB substances that have no information on
composition or structural identity, etc.
3. Enable AMBIT to handle substance IDs,
including structures, composition of mono- and
multi-constituent substances, and, in
special cases of UVCBs, addressing
constituents, impurities and additives in the
way IUCLID is structured. This would help
avoid the disadvantages of other
chemoinformatic systems where in some
cases multi-constituent substances are
erroneously linked to a single structure.
4. Add specific search functions to AMBIT
a) to allow tracing of identified toxicants back to
one or more substances in IUCLID
b) to allow tracing of identified toxicants back to
one or more company product(s) using a
company product number integrated in AMBIT &
IUCLID.
5. Improve methodologies and the user interface in
AMBIT to support the user in establishing a
read-across or category formation strategy using the
(filtered) data from IUCLID and to help assess
the results.
6. Improve the linking of other (high) quality
databases & other chemoinformatic systems
with AMBIT e.g. DEREK, METEOR, etc.
7. Analyse how the data in the ECHA REACH
Dissemination Web site can be accessed from
AMBIT to check whether data could be made
available for read-across and category formation.
8. Ensure that AMBIT is compatible with the OpenTox
Framework to allow flexible development using
publicly available methodology instead of
following proprietary approaches (upgrades)
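Illustration for objective 2: the example filters (key studies only, Klimisch code 1 or 2, no UVCB substances without composition information) can be sketched as a simple predicate. This is a minimal sketch, not the AMBIT filter itself; the record type, its field names and the choice to combine the three criteria with AND are assumptions, and the records are fictitious.

```python
from dataclasses import dataclass


@dataclass
class StudyRecord:
    """One endpoint study record; all field names are hypothetical."""
    substance: str
    endpoint: str
    klimisch: int                 # Klimisch reliability code, 1 (most reliable) to 4
    key_study: bool
    is_uvcb: bool = False
    has_composition: bool = True  # composition or structural identity is known


def is_relevant(rec: StudyRecord, max_klimisch: int = 2) -> bool:
    """Apply the three example filters from objective 2, combined with AND."""
    if not rec.key_study:
        return False
    if rec.klimisch > max_klimisch:
        return False
    if rec.is_uvcb and not rec.has_composition:
        return False
    return True


# Fictitious records, for illustration only.
RECORDS = [
    StudyRecord("Substance A", "Biodegradation", klimisch=1, key_study=True),
    StudyRecord("Substance A", "Biodegradation", klimisch=3, key_study=True),
    StudyRecord("Substance B", "Acute toxicity", klimisch=2, key_study=False),
    StudyRecord("Substance C", "Acute toxicity", klimisch=2, key_study=True,
                is_uvcb=True, has_composition=False),
]

selected = [r for r in RECORDS if is_relevant(r)]
print([(r.substance, r.endpoint, r.klimisch) for r in selected])
```

Keeping the threshold as a parameter (`max_klimisch`) lets a user relax the reliability criterion without touching the other two rules.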
PROJECT OVERVIEW
Major data sources: Company IUCLID DB and ECHA IUCLID DB (search for data possible, but not for structures)
Data transfer into the CEFIC LRI AMBIT Chemoinformatics System, supporting read across & category formation
AMBIT FUNCTIONS
• Assigning structures
  - to constituents, impurities …
• Search structures & data
  - exact, similar, substructure
  - combined with data search
• Read across / category formation
  - workflows supporting the user
• Prediction tools
  - Cramer rules, protein binding, etc.
• Data analysis tools
  - regressions, clustering, etc.
• Data management
  - flexible import/export of data
• Data exchange
  - manual or automated via web services
SUBSTANCE IDENTITY
STRUCTURES OF CONSTITUENTS, IMPURITIES, ADDITIVES
“Only a limited number of tools are capable to provide easily accessible data on substance identity, composition together with chemical structures and high quality and detailed endpoint data”
All examples are for illustration only.
The data is fictitious.
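Illustration of the substance identity idea: a substance is kept separate from the structures of its constituents, impurities and additives, so a multi-constituent substance is never collapsed into a single structure. The sketch below is not the AMBIT or IUCLID data model; the class names, fields and SMILES strings are hypothetical, and matching structures by exact SMILES string is a deliberate simplification.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    CONSTITUENT = "constituent"
    IMPURITY = "impurity"
    ADDITIVE = "additive"


@dataclass(frozen=True)
class Component:
    name: str
    smiles: str   # structure of this single component
    role: Role


@dataclass
class Substance:
    name: str
    components: list = field(default_factory=list)

    def structures(self, role=None):
        """SMILES of all components, optionally restricted to one role."""
        return [c.smiles for c in self.components
                if role is None or c.role is role]


def substances_containing(smiles, substances):
    """Trace a structure back to every substance that lists it."""
    return [s.name for s in substances if smiles in s.structures()]


# Fictitious substances, for illustration only.
SUBSTANCES = [
    Substance("Fictitious mono-constituent", [
        Component("main constituent", "CCO", Role.CONSTITUENT),
        Component("water", "O", Role.IMPURITY),
    ]),
    Substance("Fictitious multi-constituent", [
        Component("constituent 1", "CCO", Role.CONSTITUENT),
        Component("constituent 2", "CCCO", Role.CONSTITUENT),
        Component("stabiliser", "CC(C)O", Role.ADDITIVE),
    ]),
]

print(substances_containing("CCO", SUBSTANCES))
```

The same lookup, keyed on a company product number instead of a substance name, would cover tracing a structure back to products.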
ENDPOINTS DATA IN IUCLID 5
Data retrieval:
• “Manually”: Query Subst. ID → Select Substance → Select Endpoint → Select EP record → Scroll to data point of interest
• Using the IUCLID Query Tool
Example: searching for a substance and a specific endpoint, e.g. Biodegradation
Filling in of query blocks required
SUBSTANCE ENDPOINT RECORDS
IUCLID 5.5 XML schema
• large and complex data model: thousands of data fields, different documents linked by UUIDs, different data types and deeply nested data structures;
• includes 126 OECD Harmonized Templates (OHT) for Endpoint study records (physicochemical, environmental, health hazard and toxicity endpoints);
• 79 templates defining Endpoint summaries.
The specific templates and fields imported are based on recommendations by the Clariant CompTox team.
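Illustration of documents linked by UUIDs: an importer has to resolve references between separately stored documents and dig values out of nested elements. The sketch below uses a made-up XML layout, not the IUCLID 5.5 schema; every element and attribute name (export, substance, studyRecord, substanceRef, and so on) is hypothetical, and the data is fictitious.

```python
import xml.etree.ElementTree as ET

# Made-up layout: study records point to their substance by UUID.
SAMPLE = """
<export>
  <substance uuid="sub-001"><name>Example substance</name></substance>
  <studyRecord uuid="rec-100" substanceRef="sub-001">
    <endpoint>Biodegradation</endpoint>
    <result><value unit="%">72</value></result>
  </studyRecord>
  <studyRecord uuid="rec-101" substanceRef="sub-001">
    <endpoint>Partition coefficient</endpoint>
    <result><value unit="log Pow">1.9</value></result>
  </studyRecord>
</export>
"""


def records_by_substance(xml_text):
    """Group (endpoint, value, unit) tuples under the referenced substance name."""
    root = ET.fromstring(xml_text.strip())
    # First pass: index substances by UUID so references can be resolved.
    names = {s.get("uuid"): s.findtext("name") for s in root.iter("substance")}
    grouped = {}
    # Second pass: follow each record's UUID reference and read the nested value.
    for rec in root.iter("studyRecord"):
        owner = names.get(rec.get("substanceRef"), "<unresolved>")
        value = rec.find("result/value")
        grouped.setdefault(owner, []).append(
            (rec.findtext("endpoint"), float(value.text), value.get("unit")))
    return grouped


print(records_by_substance(SAMPLE))
```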
AMBIT: 43 SUPPORTED IUCLID5 ENDPOINTS
AMBIT: ENDPOINT STUDY RECORDS
PHYSICOCHEMICAL PROPERTIES
The endpoint study records are displayed in tabs,
grouping physicochemical properties, environmental fate,
ecotoxicity and toxicity endpoints
AMBIT: ENDPOINT STUDY RECORDS
ENVIRONMENTAL FATE
AMBIT: ENDPOINT STUDY RECORDS
ECOTOXICITY
AMBIT: ENDPOINT STUDY RECORDS - TOXICITY
TRANSFERRING DATA FROM IUCLID5 TO AMBIT
IUCLID has a so-called Bulk Export Assistant (manual interaction required).
IUCLID can also exchange data via web service ((semi-)automatically), including automatic updates of data.
AMBIT COMMUNICATION
Other databases & tools
AMBIT Workflows on Read across & Category formation
Proposed by Clariant CompTox team (maybe an iterative process)
• Creating assessment and defining initial target
• Searching for other targets and related structures
• Selecting relevant structures
• Selecting data sources
• Selecting endpoints
• Data gathering
• Gap filling
• Evaluation
• Doc link
• Report generation using predefined templates
• Assessment storage
• Finalizing assessment dataset
READ ACROSS AND CATEGORY WORKFLOW
[Figure: assessment workflow tabs]
5 MAIN STEPS: Assessment identifier, Collect structures, Endpoint data used, Assessment details, Report
7 SUB STEPS, including: Search substance(s), List of collected structures, Selection of endpoints, Initial matrix, Working matrix, Final matrix
• The assessment workflow is organized in five main tabs. Some of the main tabs contain sub-tabs.
• The workflow goes from left to right in the main tabs as well as in the sub-tabs.
• In principle, skipping any of the steps should not be allowed.
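Illustration of the left-to-right rule: a workflow that must not skip steps can be modelled as an ordered list with a pointer to the current step. This is a minimal sketch, not the AMBIT user interface logic; the tab names and their order are read from the slide, while the class and method names are hypothetical.

```python
MAIN_TABS = (
    "Assessment identifier",
    "Collect structures",
    "Endpoint data used",
    "Assessment details",
    "Report",
)


class AssessmentWorkflow:
    """Tracks progress through ordered steps and refuses to skip any."""

    def __init__(self, steps=MAIN_TABS):
        self.steps = tuple(steps)
        self.completed = 0  # number of steps finished so far

    @property
    def current(self):
        """Name of the next step to complete, or None when all are done."""
        if self.completed < len(self.steps):
            return self.steps[self.completed]
        return None

    def can_open(self, step):
        """A tab may be opened if it is finished already or is the current one."""
        return self.steps.index(step) <= self.completed

    def complete(self, step):
        """Mark the current step as done; any other step is rejected."""
        if step != self.current:
            raise ValueError(
                f"cannot complete {step!r}; expected {self.current!r}")
        self.completed += 1


wf = AssessmentWorkflow()
wf.complete("Assessment identifier")
print(wf.current, wf.can_open("Report"))
```

Sub-tabs could reuse the same class with their own step list, nested under the main tab they belong to.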
READ ACROSS / CATEGORY FORMATION WORKFLOW
• Currently under testing by the Clariant CompTox team
• Next steps:
  - Report format to be defined
  - Automated data filling from estimation tools or other databases
  - Selection of reliable external databases and loading into AMBIT
  - Adding new prediction tools
  - Others?
AMBIT SUMMARY
REQUIREMENTS
Server:
• Java 1.6/1.7
• Tomcat 5.5, 6.x, 7.x
• MySQL 5.5/5.6
Clients:
• Modern JavaScript-enabled browser
• Desktop applications
• Workflow engines
The clients communicate via the REST web service API (Application Programming Interface).
UPDATES:
• Substances and compositions
• Endpoint records
• Read across / category formation workflow
• User management system to grant access rights via roles
CLARIANT HQ, DE
C. LRI tools impact? Science for policy & innovation / R&D-acad. + regulatory
Hands-on session: Sept. 23 at Cefic (Brussels)
ACKNOWLEDGEMENTS
• CEFIC LRI EEM9.3-IC
• Project idea for LRI EEM9.3-IC
  - Volker Koch, Clariant
• Project input: Clariant CompTox Team
  - Udo Jensch (Toxicologist)
  - Volker Koch (Ecotoxicologist)
  - Qiang Li (Toxicologist)
  - Joachim Schneider-Reigl (Ecotoxicologist)
• Project implementation: IdeaConsult Ltd.
Bruno Hubesch bhu@cefic.be
+32 2 676 74 92