RSC Project Prospect

Transcription

RSC Project Prospect
RSC Project Prospect
Introducing semantics into chemical
science publishing
RSC Publishing
a la Web 2.0
User Generated Content
ƒ 1841
Discussion Forums
ƒ 1947
ƒ …and flame wars
Social Networking
Interactivity
Aims of Project Prospect
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ
Preserve authors’ science
Capture the science as XML
Apply ontologies to find meaning
Make it part of the publishing process
Provide new views to readers
Computer-readable chemistry
Meaningful searches
Why is it hard for chemistry?
ƒ Searching
ƒ Free text or proprietary
ƒ Related articles
ƒ No standards
Animal ontology
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ
those that belong to the Emperor,
embalmed ones,
those that are trained,
suckling pigs,
mermaids,
fabulous ones,
stray dogs,
those included in the present classification,
those that tremble as if they were mad,
innumerable ones,
those drawn with a very fine camelhair brush,
others,
those that have just broken a flower vase,
those that from a long way off look like flies.
Allegedly from “Celestial Emporium of Benevolent Knowledge”
The Analytical Language of John Wilkins, Jorge Luis Borges
Classification problems
2,4,6-trinitrotoluene
biosynthesis
“This term is obsolete as
2,4,6-trinitrotoluene
is not synthesized by
living organisms“
From the Gene Ontology
The RSC Prospect model
ƒ Enhance existing publications
ƒ Attract authors
ƒ Add value for subscribers
ƒ Promote dissemination of chemical science
ƒ Build straight into existing workflow
ƒ Marked small molecule libraries…
ƒ …lysyl oxidase in zebrafish notochord
morphogenesis
How?
ƒ Collaborations
ƒ Unilever Centre/Computer Lab
ƒ Others
ƒ Technology
ƒ
ƒ
ƒ
ƒ
OSCAR text mining
Standards
Workflow
RSS feeds
OSCAR text mining
Open Source Chemical Analysis Routines
ƒ University of Cambridge:
Chemistry (Unilever Centre)
–Computer Lab collaboration
ƒ SciBorg
ƒ Analyses single words and
short chemical-looking
phrases to determine structure
ƒ Attaches structures to text
ƒ Extended to subjects
InChI
ƒ IUPAC International
Chemical Identifier
ƒ InChI=1/C8H10N4O2/c110-4-9-6-5(10)7(13)12
(3)8(14)11(6)2/h4H,13H3
ƒ Machine readable
ƒ InChIkey
RYYVLZVUVIJVGHUHFFFAOYAW
Subject terms
ƒ Open Biomedical Ontologies
ƒ Gene Ontology (GO)
ƒ Sequence Ontology (SO)
ƒ Cell Ontology (CL)
ƒ IUPAC Gold Book
ƒ Dictionary of chemical terminology
RSC Publishing process
ƒ XML up front
ƒ Technical editors edit and correct the XML,
then publish throughout the day
ƒ Understanding of XML and context, and the
subject areas
RSC Publishing process
ƒ Editing
ƒ Mining
ƒ Add & Tidy
Enhanced RSS feeds
ƒ Make available chemical structures and
subject terms
ƒ Subject terms and structures in plain text
ƒ Subject term IDs and InChI in RDF
metadata
ƒ OWL for the subject terms
ƒ InChI for compounds
Enhanced
RSS
feeds
New data standards
ƒ Experimental data
checker
ƒ Validation
ƒ Visualisation
ƒ Role for
publishers
Feeding back into Open projects
ƒ Improving chemistry recognition
within Project SciBorg
ƒ Improving Gene Ontology
(10,000 enzyme terms added)
ƒ Tidying the Sequence Ontology
It was like a researchers Aladdin's cave…
…it was fantastic to dig around the subject
area and discover things I never knew
existed. My only criticism would be the
need for a time warning! I spent 4 hours
digging about which generated at least six
new research ideas, printed half a ream of
paper and I missed my bus home...
…At least it was a new excuse my wife had
not heard, so another first. Well done to you
all but I don't think my wife is that keen on
it.
Steve Haswell, Hull
What’s unique?
ƒ Technology
ƒ Semantic enrichment
ƒ Compounds
ƒ Ontology terms
ƒ Enhanced RSS feeds
ƒ Implementation across existing portfolio
ƒ Incorporation into workflow
ƒ Shows the possibilities for structured science
For other publishers?
ƒ Building up new open ontologies
ƒ Publishers will compete on display and
application, but can still share links
ƒ …via CrossRef?
Future for RSC Project Prospect
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ
Experimental data
Linguistics
Extensive linking
Other subject areas
E-books, databases and backfile
Peer review
Prospect services
RSC Project Prospect
Richard Kidd
Manager, Informatics
kiddr@rsc.org
"My first forays show [Project Prospect is] brilliant. [It's] great to
see the compounds and have machine readable SMILES and
InChIs"
"Your new system is very impressive, I am sure it will become
very useful to a large community”
•It is great and exciting!! Just from the very few minutes I
looked into it I realized its great potential and immediately ran to
show it to my students!"
“This is a fantastic resource for the community, and a great
use of the GO and SO. Nice work"
”I have … found it very intuitive/
straightforward to use.[I] believe
that it will make the manuscript even
more appealing to readers."