Wellness Chip - Bio

Cambridge Healthtech Media Group
Indispensable Technologies Driving Discovery, Development, and Clinical Trials
www.bio-itworld.com
JULY | AUGUST • VOL. 10, NO. 4
Larry Gold’s
‘Wellness
Chip’
SomaLogic CEO
targets protein
biomarkers for
common diseases
Page 25
NEW iDEAs IN DATA
VISUALIZATION 10
TALKING BIO-IT AT BGI 8
THE CLINICAL BUZZ AT DIA 23
FUTURE OF CLINICAL TRIALS
SURVEY 39
SPECIAL REPORT:
2011 BEST PRACTICES Awards 31
REGISTER BY 9 SEPTEMBER AND SAVE UP TO €150!
Cambridge Healthtech Institute’s Third Annual
11-13 October 2011
Exhibition Grounds
Hannover, Germany
11–12 October
12–13 October
IT Infrastructure &
the Cloud
Bioinformatics
NGS Data Management
Drug Discovery
Informatics
Distinguished Faculty
• Dimitris K. Agrafiotis, Vice President, Informatics, Johnson & Johnson Pharmaceutical Research & Development
• Björn Andersson, Director of HPC, BlueArc
• Stefan Baumann, Head of Imaging Infrastructure, Biomarker Development / Clinical Imaging, Novartis Pharma AG
• Andrea Braeutigam, Ph.D., Postdoctoral Associate, Plant Biochemistry, Heinrich-Heine-University of Duesseldorf
• Ting-Chao Chou, Ph.D., Director, Preclinical Pharmacology Core, Molecular Pharmacology & Chemistry Program, Memorial Sloan-Kettering Cancer Center
• Thomas Eickermann, Ph.D., Head of Communication Systems Division, Juelich Supercomputing Centre, Research Ctr. Jülich
• Samuel Flores, Ph.D., Assistant Professor, Cell and Molecular Biology, Uppsala University
• Yuriy Gankin, Ph.D., CSO, GGA Software Services LLC
• Laurent Gautier, Head of Core Facility and Senior Researcher, Systems Biology, Technical University of Denmark
• Carole Goble, Professor, Computer Science, University of Manchester
• Yike Guo, Ph.D., Professor, Computing Science, Computing, Imperial College London
• Jonas Hagberg, M.Sc., Project Leader, UPPNEX; System Expert, UPPMAX, Uppsala University
• Robert Haines, Research Computing Services, University of Manchester
• Barry Hardy, Ph.D., Project Coordinator, Scientists Against Malaria and SYNERGY
• Ian Harrow, Pistoia Alliance
• Steve Hoffmann, Ph.D., Junior Research Group, Transcriptome Bioinformatics, Leipzig University
• Lars Jorgensen, Ph.D., Sr. Scientific Manager, Production Software & Sequencing Informatics, Wellcome Trust Sanger Institute
• Misha Kapushesky, Functional Genomics Team Leader, Microarray Informatics, European Bioinformatics Institute
• Jean-Pierre Kocher, Ph.D., Bioinformatics Core Director, Health Sciences Research, Mayo Clinic
• Karol Kozak, Ph.D., Data Handling Coordinator of Light Microscopy Center - RNAi Screening Center, High Content Screening Center, Swiss Federal Institute of Technology Zurich
• Andreas Kremer, Ph.D., Department of Bioinformatics, Erasmus University Medical Center, Rotterdam
• Detlef Labrenz, Sales Representative, DataDirect Networks, Inc.
• Hermann Lederer, Ph.D., Deputy Director, Garching Computing Centre of the Max Planck Society
• Hans Lehrach, Ph.D., Director & Head, Vertebrate Genomics, Max Planck Institute for Molecular Genetics
• Urban Liebel, Ph.D., Group Leader, Head of Screening Centre, Inst. of Toxicology and Genetics, Karlsruhe Institute of Technology
• Paul Lukowicz, Ph.D., Professor & Chair, Embedded Systems & Pervasive Computing, University of Passau
• Andrew Lyall, Ph.D., ELIXIR Project Manager, European Bioinformatics Institute
• Daniel MacLean, Head of Bioinformatics and Training, The Sainsbury Laboratory, Norwich
• Alberto Magi, Ph.D., Research Fellow, Center for the Study of Complex Dynamics (CSDC), Department of Medical and Surgical Critical Care, Careggi School of Medicine, University of Florence
• Brian Marsden, Ph.D., Principal Investigator, Research Informatics, Structural Genomics Consortium, University of Oxford
• M. Scott Marshall, Ph.D., Co-Chair W3C Health Care and Life Sciences Interest Group, University of Amsterdam / Leiden University Medical Center
• Ilya Mazo, Ph.D., President, Ariadne
• Folker Meyer, Ph.D., Computational Biologist, Institute for Genomics and Systems Biology, Argonne National Lab
• Geoffrey Noer, Director, Product Marketing, Panasas, Inc.
• Jan van Oeveren, Ph.D., Biostatistician, Bioinformatics, Keygene N.V.
• John Overington, Team Leader, Chemogenomics, EMBL-EBI Hinxton
• Rolf Porsche, Ph.D., IBM Partner, Head of Pharma, Life Sciences and Healthcare, IBM
• Corrado Priami, Ph.D., Professor, President & CEO, Microsoft Research, University of Trento, Centre for Computational and Systems Biology
• Alban Ramette, Ph.D., Research Scientist, Microbial Habitat Group, Max Planck Institute for Marine Microbiology
• Keith Robison, Ph.D., Lead Senior Scientist, Informatics, Infinity Pharmaceuticals, Inc.
• Reinhard Schneider, Ph.D., Head, Bioinformatics Core Facility, Luxembourg Center for Systems Biomedicine, University of Luxembourg
• Thomas Schulthess, Director, Swiss National Supercomputing Center
• Ola Spjuth, Ph.D., Researcher, Pharmaceutical Biosciences, Uppsala University, Sweden; Project Leader, Bioclipse
• Etzard Stolte, Ph.D., Global Head Strategy & Architecture, R&D Informatics, F. Hoffmann-La Roche AG
• Chris Taylor, Ph.D., Senior Technical Officer, European Bioinformatics Institute
• Burkhard Tümmler, Ph.D., Professor, Pediatric Pneumology, Allergology and Neonatology, Hannover Medical School
Pre-Conference Short Courses*
(SC3) Cloud Computing: Using Cloud Computing Infrastructure as a Service to Aid Research Scientists
(SC4) Microscopy Imaging Analysis: Quantitative Analysis of Large-Scale Biological Image Data
(SC9) Visualization of Large-Scale Biological Data
(SC10) NGS: Data Analysis
* Separate registration required
Organized by Cambridge Healthtech Institute, 250 First Ave., Ste. 300, Needham, MA 02494
Bio-ITWorldExpoEurope.com
Europe’s No.1 Event in
Biotechnology and Life Sciences
Contents
Special Report
Bio-IT World’s 2011 Best Practices Awards
31 The Select Six Best Practices
32 Enrollment Modeling Results in Productivity Gains for Merck
33 Novartis’ Open Source Clinical Imaging Platform
34 GSK’s Helium Rises to the Top
35 Accelrys Pipeline Pilot Guides ONT’s Nascent NGS Data Handling
36 CliniWorks Provides Patients as a Service
37 CDD’s Tuberculosis Collaboration Tool
38 2011 Best Practices Entries
Up Front
8 Journal, Cloud, and Tool News from BGI
9 Eric Schadt Leads ‘Multiscale’ Institute at Mount Sinai
10 Illumina Showcases New Visions in Genomic Interpretation
11 Briefs
12 Ignite Institute Finds a Match at Fox Chase
The Skeptical Outsider
14 Big-Bucks Biology’s Broken Business Model
The Bush Doctrine
15 Limits to Drug Discovery Collaboration?
Insights Outlook
16 The Cloud and Next Generation Sequencing
Clinical Trials
22 Euphoria over EHR/EDC Interoperability May be Misplaced
23 DIA 2011—Compliance, Collaboration and the Cloud
Computational Biology
25 Larry Gold’s Wellness Chip Detects Disease Biomarkers in Blood
28 Open Source Solutions for Image Data Analysis
Next-Gen Data
40 Open Source Genome Analytics for All
41 Ion Torrent Offers Sequencing at ‘Biblical Proportions’
42 Charges Fly over Ion Torrent Licenses
IT / Workflow
45 Gordon Puts Flash into Supercomputing
47 NVIDIA Unveils New Flagship GPU Processor
48 Panasas ActiveStor Storage Goes to 11
The Russell Transcript BY JOHN RUSSELL
49 DREAM6 Breaks New Ground
In Every Issue
First Base BY KEVIN DAVIES
5 Best of Best Practices; an Asian Engagement
6 Company and Advertiser Index
7 On Deck
50 Educational Opportunities
SPECIAL ADVERTISING SECTION
Best New Products & Services, begins on page 18
Cover photograph by Matt Staver
First Base
Summer Heat
KEVIN DAVIES
An Asian Engagement
Following our successful foray into Europe in 2009, we are
thrilled to announce that we will be holding our first full
Bio-IT World conference in Asia next summer (June 5-8,
2012). We’ve selected Singapore as the destination, and the
spectacular 57-floor Marina Bay Sands convention center
(and casino) as the venue. (I’m still trying to persuade my
colleagues to convene one of the pre-conference workshops
in the rooftop Infinity pool. We’ll see...)
The move to Asia isn’t just a reflection of the gratifying
growth in attendees, exhibitors and sponsors at our flagship conference in Boston and the European event. From
talented software start-ups in India to the emerging power
of BGI in China (see page 8), the Asian region is having an
unprecedented impact on life sciences and biopharma, and
is ripe with opportunities for partnerships and collaboration. We’ve been hearing from many regular attendees at
Bio-IT World Expo how much they would like to reach the
Asian scientific community under the right conditions. We
intend to provide that forum for genuine technological and
scientific exchange.
We have put together a very impressive advisory board
and we are now accepting speaker proposals at our website:
www.bio-itworldasia.com. We welcome your input and contributions. In the meantime, we hope you’ll make plans to
join us at Bio-IT World Europe this October
(www.bio-itworldeurope.com).
www.bio-itworld.com
JULY | AUGUST 2011
BIO•IT WORLD
[5]
Each year around this time, we like to showcase the
winning entries in our annual Best Practices Awards
competition, which has been held nearly every year
since 2003. We invite a diverse group of judges to evaluate and rank dozens of entries from academia and
industry highlighting best practices impacting data
management in life sciences, however that might be defined.
Our winners for 2011 were announced at the Bio-IT World
Expo back in April. Our six winners are discussed elsewhere in this issue (see pages 31-38): CliniWorks’ AccelFind; Collaborative Drug Discovery’s TB Database; GlaxoSmithKline’s delightfully named “Helium in Excel” (nominated by Ceiba Solutions); Merck’s Clinical Enrollment Optimization (nominated by DecisionView); Novartis’ ImagEDC solution; and Oxford Nanopore’s work with Accelrys on the Pipeline Pilot NGS Collection.
It’s not feasible to pay tribute to every entry, but a few should be noted for making the judges’ task particularly difficult. The strongest of the four main categories this year was Knowledge Management. Andrew Su (Scripps Institute, formerly at the Genomics Institute of the Novartis Research Foundation, San Diego) submitted the Gene Wiki, a truly collaborative project attracting 4 million views per month to help annotate and disseminate genome data. Another highly praised entry was Pfizer’s Oyster Imaging Collaborative Portal, designed in partnership with Radiant Sage Ventures, which has significantly improved image sharing and data access.
In the IT Infrastructure category, judges also liked the
UCLA Neuroimaging Lab’s unified storage infrastructure project, nominated by data storage vendor Isilon. Partnering with
Accelrys, the London School of Hygiene and Tropical Medicine
presented a high-throughput parasite imaging tool, while the
Smithsonian Institution offered a LIMS tool for the Moorea
Biocode project, providing barcode sequencing for 40,000
tropical species.
Oxford Nanopore’s win in the Research and Discovery
category is somewhat ironic, given how stealthy the British
sequencing company has been. The Brits edged out some tough
competition, including the Food and Drug Administration’s
drug toxicity tool for animal testing, which one judge deemed
“a dramatic step forward.” Many other entries would have competed strongly for top honors if they had featured more real-world deployments and collaborations.
We’ll be announcing further details on the scope and timing
of the 2012 Best Practices Awards shortly.
Company Index
23andMe . . . . . . . . . . . . . . . . . . . . 40
454 Life Sciences . . . . . . . . . . . . . . 42
Abbott Laboratories. . . . . . . . . . . . . 38
Accelrys . . . . . . . . . . . . .5, 11, 35, 38
Agilent . . . . . . . . . . . . . . . . . . . . . . 26
Beijing Institutes of Life Science . . . 10
Biomatters . . . . . . . . . . . . . . . . . . . 38
BrainCells . . . . . . . . . . . . . . . . . . . . 38
BrainLab . . . . . . . . . . . . . . . . . . . . . 29
Brigham and Women’s Hospital . . . . 28
Bristol-Myers Squibb . . . . . . . . . . . . 26
British Columbia Cancer
Research Centre . . . . . . . . . . . . . 10
Broad Institute . . . . . . . . . . . . .10, 28
Ceiba Solutions. . . . . . . . . . . . .34, 38
ChemAxon . . . . . . . . . . . . . . . . . . . 34
Children’s Hospital Boston . . . . . . . 13
ClearCanvas . . . . . . . . . . . . . . . . . . 29
ClearTrial . . . . . . . . . . . . . . . . . . . . . 38
Clinical Data Interchange
Standards Consortium . . . . . . . . . 22
Clinical Ink . . . . . . . . . . . . . . . . . . . 22
CliniWorks . . . . . . . . . . . . . . . . .36, 38
Collaborative Drug Discovery . . .37, 38
Dana-Farber Cancer Institute . . . . . . 10
DecisionView. . . . . . . . . . . .23, 32, 38
DIA . . . . . . . . . . . . . . . . . . . . . . . . . 38
Drug Safety Alliance . . . . . . . . . . . . 23
Enlis Genomics . . . . . . . . . . . . .10, 11
ePharmaSolutions . . . . . . . . . . . . . . 38
ERT . . . . . . . . . . . . . . . . . . . . . . . . . 38
FastTrack . . . . . . . . . . . . . . . . . . . . . 22
FDA Division of Animal Research . . . 38
Food and Drug Administration . . . . . 23
Fox Chase Cancer Center . . . . . . . . 12
Fred Hutchinson Cancer Center . . . . 40
Fudan University . . . . . . . . . . . . . . . 10
Genomatix . . . . . . . . . . . . . . . . . . . 11
Genome Institute of Singapore . . . . 40
GlaxoSmithKline . . . . . . . . .23, 34, 38
Harvard University . . . . . . . . . . . . . . 11
Helicos Biosciences . . . . . . . . . . . . 13
IBM. . . . . . . . . . . . . . . . . . . . . . . . . 49
Ignite Institute for Individualized
Health . . . . . . . . . . . . . . . . . . . . . 12
Illumina . . . . . . . . . . . . . . . . . . , 13, 9
ImmunoProfiles . . . . . . . . . . . . . . . . 11
Insights . . . . . . . . . . . . . . . . . . . . . . 14
Institute for Molecular
Biosciences . . . . . . . . . . . . . . . . . 41
IO Informatics . . . . . . . . . . . . . . . . . 38
Ion Torrent . . . . . . . . . . . . . . . . .13, 42
Isilon. . . . . . . . . . . . . . . . . . . . . . . . . 5
Janssen Pharmaceutica. . . . . . . . . . 38
J. Craig Venter Institute . . . . . . . . . . 40
Life Technologies . . . . . . . . . . . .12, 42
London School of Hygiene and
Tropical Medicine. . . . . . . . . . . . . 38
Max Planck Institute . . . . . . . . . . . . 47
Medidata . . . . . . . . . . . . . . . . . . . . 22
Merck . . . . . . . . . . . . . . .23, 9, 32, 36
National Cancer Institute . . . . . .22, 38
Navigenics . . . . . . . . . . . . . . . . . . . 40
New England Biolabs . . . . . . . . . . . 26
Nextrials . . . . . . . . . . . . . . . . . . . . . 22
Novartis . . . . . . . . . . . . . . .33, 36, 38
Novartis Institute for Biomedical
Research . . . . . . . . . . . . . . . . . . . 38
NVIDIA . . . . . . . . . . . . . . . . . . . . . . 47
Ochsner Health System . . . . . . . . . . 38
OpenEye . . . . . . . . . . . . . . . . . . . . . 47
Oracle . . . . . . . . . . . . . . . . . . . . . . . 38
Orion Health . . . . . . . . . . . . . . . . . . 38
Otsuka . . . . . . . . . . . . . . . . . . . . . . 26
Oxford Nanopore . . . . . . . . . . . . . . . 16
Oxford Nanopore Technologies. .35, 38
Pacific Biosciences . . . . . . . . . . . . . , 9
Panasas . . . . . . . . . . . . . . . . . . . . . 48
Parexel . . . . . . . . . . . . . . . . . . . . . . 36
Partek . . . . . . . . . . . . . . . . . . . . . . . 11
Pennsylvania State University . .10, 11
Pfizer. . . . . . . . . . . . . . . . . . .5, 26, 38
Phlexglobal . . . . . . . . . . . . . . . . . . . 38
PHT Corporation . . . . . . . . . . . . . . . 11
PPD . . . . . . . . . . . . . . . . . . . . . . . . 38
ProtonMedia . . . . . . . . . . . . . . . . . . 23
Queensland Centre for Medical
Genomics . . . . . . . . . . . . . . . . . . 41
Quest Diagnostics . . . . . . . . . . . . . . 25
Radiant Sage Ventures . . . . . . . . . . . 5
Recombinant Data Corp . . . . . . . . . 38
Roche . . . . . . . . . . . . . . . . . . . .23, 38
Rota Consortium, South Africa . . . . . 38
SAFE-BioPharma Association . . . . . 38
ScaleMP . . . . . . . . . . . . . . . . . . . . . 38
Scripps Institute . . . . . . . . . . . . . . . . 5
Selventa . . . . . . . . . . . . . . . . . . . . . 38
Smithsonian Institution . . . . . . . . 5, 38
SomaLogic . . . . . . . . . . . . . . . . . . . 25
Stanford University . . . . . . . . . .10, 42
Strand Life Sciences . . . . . . . . .10, 38
Strand Scientific Intelligence . . . . . . 11
Synexus Clinical Research . . . . . . . . 38
The Centre for Proteomic and
Genomic Research . . . . . . . . . . . 11
TIBCO . . . . . . . . . . . . . . . . . . . . . . . 34
UCLA . . . . . . . . . . . . . . . . . . . . . . . . 5
UCSF . . . . . . . . . . . . . . . . . . . . . . . . 9
University Hospitals of Geneva. . . . . 29
University of California,
San Diego . . . . . . . . . . . . . . . . . . 11
University of California,
Santa Cruz . . . . . . . . . . . . . . . . . 42
University of Colorado . . . . . . . . . . . 26
University of Delaware . . . . . . . . . . . 11
University of Florida. . . . . . . . . . . . . 38
University of Georgia . . . . . . . . . . . . 11
University of Maryland. . . . . . . . . . . 10
University of Texas in Austin . . . . . . . 42
University of Texas Southwestern
Medical Center at Dallas . . . . . . . 38
University of Tübingen . . . . . . . . . . . 11
VIB . . . . . . . . . . . . . . . . . . . . . . . . . 11
Indispensable Technologies Driving
Discovery, Development, and Clinical Trials
EDITOR-IN-CHIEF
Kevin Davies (781) 972-1341
kevin_davies@bio-itworld.com
MANAGING EDITOR
Allison Proffitt (617) 233-8280
aproffitt@healthtech.com
ART DIRECTOR
Mark Gabrenya (781) 972-1349
mark_gabrenya@bio-itworld.com
VP BUSINESS DEVELOPMENT
Angela Parsons (781) 972-5467
aparsons@healthtech.com
VP SALES —
LEAD GENERATION PROGRAMS
Alan El Faye (213) 300-3886
alan_elfaye@bio-itworld.com
ACCOUNT MANAGER —
ACCOUNTS A–K
John J. Kistner (781) 972-1354
jkistner@healthtech.com
ACCOUNT MANAGER —
ACCOUNTS L–Z
Tim McLucas (781) 972-1342
tmclucas@healthtech.com
CORPORATE MARKETING
COMMUNICATIONS DIRECTOR
Lisa Scimemi (781) 972-5446
lscimemi@healthtech.com
PROJECT/MARKETING MANAGER
Lynn Cloonan (781) 972-1352
lcloonan@healthtech.com
ADVERTISING OPERATIONS COORDINATOR
Stephanie Cline (781) 972-5465
scline@healthtech.com
DESIGN DIRECTOR
Tom Norton (781) 972-5440
tnorton@healthtech.com
Contributing Editors
Michael Goldman, Karen Hopkin, Deborah Janssen, John Russell, Salvatore Salamone, Deborah Borfitz, Ann Neuer, Tracy Smith Schmidt
Advertiser Index
Advertiser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Page #
Bio-IT World & Bio-IT World Europe Conference & Expo. . . . . . 2-3
Bio-ITWorldExpo.com, bio-itworldexpoeurope.com
Bio-IT World Asia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Bio-ITWorldAsia.com
Bio-IT World’s Cloud Computing Summit . . . . . . . . . . . . . . 52
Bio-ITCloudSummit.com
BioBase. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
biobase-international.com
Clinical Ink . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
www.clinicalink.com
DiscoveRx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-19
www.discoverx.com
Educational Opportunities. . . . . . . . . . . . . . . . . . . . . . . 50-51
Bio-ITWorld.com
Insight Pharma Reports . . . . . . . . . . . . . . . . . . . . . . . . . . 30
InsightPharmaReports.com
CHI Professional Marketing Services . . . . . . . . . . . . . . . . 21
www.bio-itworld.com/BioIT/WhitePapers.aspx
This index is provided as an additional service. The publisher does not assume any liability for errors or omissions.
VOLUME 10, NO. 4
Editorial, Advertising, and Business Offices: 250 First Avenue, Suite 300, Needham, MA 02494; (781) 972-5400
Bio-IT World (ISSN 1538-5728) is published bi-monthly by Cambridge Bio Collaborative, 250 First Avenue, Suite 300, Needham, MA 02494.
Bio-IT World is free to qualified life science professionals. Periodicals postage paid at Boston, MA, and at additional post offices. The one-year
subscription rate is $199 in the U.S., $240 in Canada, and $320 in all other countries (payable in U.S. funds on a U.S. bank only).
POSTMASTER: Send change of address to Bio-IT World, 250 First Avenue, Suite 300, Needham, MA 02494. Canadian Publications Agreement
Number 41318023. CANADIAN POSTMASTER: Please return undeliverables to PBIMS, Station A, PO Box 54, Windsor, ON N9A 6J5 or email
custserviceil@IMEX.PB.com.
Subscriptions: Address inquiries to Bio-IT World, 250 First Avenue, Suite 300, Needham, MA 02494; 888-999-6288; or e-mail
kfinnell@healthtech.com.
Reprints: Copyright © 2011 by Bio-IT World. All rights reserved. Reproduction of material printed in Bio-IT World is forbidden without written
permission. For reprints and/or copyright permission, please contact John J. Kistner, (781) 972-1354, jkistner@healthtech.com or
Tim McLucas, (781) 972-1342, tmclucas@healthtech.com.
Advisory Board
Jeffrey Augen, Mark Boguski,
Steve Dickman, Kenneth Getz,
Jim Golden, Andrew Hopkins,
Caroline Kovac, Mark Murcko,
John Reynders, Bernard P. Wess Jr.
Cambridge Healthtech Institute
PRESIDENT
Phillips Kuhl
Contact Information
editor@healthtech.com
250 First Avenue, Suite 300
Needham, MA 02494
Follow us on Twitter, LinkedIn, and Facebook
http://twitter.com/bioitworld
www.linkedin.com/groupRegistration?gid=3141702
www.facebook.com/bioitworld
On Deck
Coming in Bio-IT World’s September/October Issue
Special Report:
Laying the Genome Informatics Pipeline
Bio-IT World editors present a series of compelling stories, features, and interviews on the latest advances in genome informatics and interpretation:
• How are bioinformaticians preparing for the onslaught of clinical genomics?
• What new commercial and open source tools and platforms are driving next-gen sequencing analysis?
• How are some of the largest tool companies and the nimblest software providers adapting to the NGS market?
Also in the September/October issue:
• IT / Workflow: Putting an IT Infrastructure in the Cloud
• What inspires the architect of the world’s largest sequencing factory?
• Clinical Trials: The evolution of a clinical CRO
For advertising and sponsorship opportunities in this exciting Special Report, contact:
Accounts A–K: John J. Kistner, (781) 972-1354, jkistner@healthtech.com
Accounts L–Z: Tim McLucas, (781) 972-1342, tmclucas@healthtech.com
And the Winner Is...
We had an overwhelming response to our reader survey in the
last issue of Bio-IT World. Your responses will be very helpful as
we craft our digital strategy in the coming months.
The winner of the Apple® iPad® was
Michael Chin from Sanofi-Aventis.
Thank you for your time and feedback,
and we invite you to take part in this
issue’s survey on page 39.
Kevin Davies
Editor-In-Chief
*Apple is not a participant or supporter of this promotion.
Up Front News
Big News from BGI
BGI debuts new journal, cloud services, and tools
BY ALLISON PROFFITT
SHENZHEN, CHINA—At the BioIT APAC conference hosted by BGI in Shenzhen in July*, researchers announced two new cloud-based software-as-a-service offerings for next-gen data analysis and several new open source assembly tools, and launched a new journal for next-gen data.
Structural variation requires a “totally
different scale of technology,” said Yingrui
Li, explaining that structural variations
are more unique to individuals than
SNPs. He called for more de novo sequencing, suggesting that whole-genome de novo assembly would offer a more comprehensive structural variant map.
Li’s message aligned well with updates to the SOAP algorithms. The Short Oligonucleotide Analysis Package gained a de novo short-read assembler (SOAPdenovo 2), an alignment tool (SOAP3-GPU/CPU), a graph-based indel finder (SOAPindel), and an assembly-based structural variation finder (SOAPsv). These updates join the existing alignment tool (SOAPaligner/soap2) and re-sequencing consensus sequence builder (SOAPsnp).
SOAP3 reflects improvements in two branches of the algorithm. GPU-accelerated alignment with the Burrows-Wheeler transform (BWT) could perform exact matching of 1 million 100-bp reads in 2.6 seconds. SOAP3-CPU shows improved accuracy over SOAP2 at similar speed. SOAPdenovo 2 is designed to assemble human-sized genomes and reflects algorithm changes to contig construction, scaffolding, and gap closure.
The SOAP toolkit is available at http://
soap.genomics.org.cn/.
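SOAP3’s GPU branch is described as using the BWT for exact matching. To illustrate the underlying idea only, here is a minimal, single-threaded Python sketch of FM-index backward search, the textbook BWT exact-matching technique. It is not SOAP3 code, the function names are hypothetical, and the naive suffix-array construction is only practical for toy inputs.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index: BWT, C table, occurrence table, suffix array.

    A '$' sentinel (assumed absent from the input) is appended so that every
    suffix is unique. The naive suffix-array sort is for illustration only.
    """
    text += "$"
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character = the character preceding each sorted suffix (wraps to '$').
    bwt = "".join(text[i - 1] for i in suffix_array)

    # C[ch] = number of characters in the text that sort before ch.
    counts = Counter(text)
    c_table, running = {}, 0
    for ch in sorted(counts):
        c_table[ch] = running
        running += counts[ch]

    # occ[ch][i] = number of times ch appears in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (ch == b)
    return bwt, c_table, occ, suffix_array


def backward_search(pattern, c_table, occ, n):
    """Return the half-open suffix-array interval [lo, hi) matching pattern."""
    lo, hi = 0, n
    for ch in reversed(pattern):  # extend the match one character leftward
        if ch not in c_table:
            return 0, 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def find_exact(text, pattern):
    """Return sorted start positions of exact occurrences of pattern in text."""
    bwt, c_table, occ, suffix_array = build_fm_index(text)
    lo, hi = backward_search(pattern, c_table, occ, len(bwt))
    return sorted(suffix_array[lo:hi])


print(find_exact("ACGTACGTTACG", "ACG"))  # prints [0, 4, 9]
```

Each pattern character narrows the suffix-array interval with two table lookups, so a search costs time proportional to the read length rather than the genome length, which is why BWT indexes suit aligning millions of short reads.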
Flex Time
Hecate and Gaea (named for Greek goddesses) are two “flexible computing” solutions for de novo assembly and genome resequencing that make the most of the new SOAP algorithms. These are “cloud-based services for genetic researchers,” so users don’t need to “purchase your own cloud clusters,” said Evan Xiang, part of the flexible computing group at BGI Shenzhen (see “BGI Cloud on the Horizon,” Bio-IT World, Jan 2011). Hecate will do de novo assembly, and Gaea will run the SOAP2, BWA, Samtools, DIndel, and BGI’s realSFS algorithms. Xiang expects an updated version of Gaea to be released later this year with more algorithms available.
*BioIT APAC Conference & Expo, Shenzhen, China, July
Flexible computing, explained Xiang, is a more efficient cluster architecture than a traditional cloud. Jobs of different types are grouped on the cluster to make the most of computing power and to address scalability issues: CPU-intensive jobs are grouped together, as are memory-intensive jobs and input/output-intensive jobs.
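The article does not describe how BGI’s scheduler classifies jobs. A minimal Python sketch of the grouping idea follows, assuming each job comes with estimated CPU, memory, and I/O demands and is assigned to whichever resource it uses most relative to a node’s capacity. The job names, demand figures, and node capacities are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpu_cores: float  # hypothetical estimated CPU demand
    memory_gb: float  # hypothetical estimated memory demand
    io_mb_s: float    # hypothetical estimated I/O bandwidth demand


# Hypothetical per-node capacities used to normalize each demand.
NODE_CAPACITY = {"cpu": 16.0, "memory": 64.0, "io": 400.0}


def dominant_resource(job):
    """Classify a job by the resource it uses most, relative to node capacity."""
    shares = {
        "cpu": job.cpu_cores / NODE_CAPACITY["cpu"],
        "memory": job.memory_gb / NODE_CAPACITY["memory"],
        "io": job.io_mb_s / NODE_CAPACITY["io"],
    }
    return max(shares, key=shares.get)


def group_jobs(jobs):
    """Bucket job names into CPU-, memory-, and I/O-intensive groups."""
    groups = defaultdict(list)
    for job in jobs:
        groups[dominant_resource(job)].append(job.name)
    return dict(groups)


jobs = [
    Job("align_batch_1", cpu_cores=16, memory_gb=8, io_mb_s=50),
    Job("graph_build", cpu_cores=4, memory_gb=60, io_mb_s=20),
    Job("fastq_split", cpu_cores=2, memory_gb=4, io_mb_s=350),
]
print(group_jobs(jobs))
```

Normalizing by capacity lets one rule compare cores, gigabytes, and bandwidth on the same scale; a production scheduler would also weigh queue state and data locality, which this sketch ignores.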
Both the Hecate and Gaea services will
run on the BGI compute cluster because
“Amazon is slow,” Xiang said. Running the
services on an in-house cluster also alleviates any internet access issues.
Hecate uses a series of distributed algorithms that recognize and simplify non-branching, repeat-free regions of the genome, correct errors, and resolve ambiguous bubbles and short repeats; distributed graph-shrinkage algorithms then construct a linear DNA sequence. Although it is based on BGI’s SOAPdenovo and SOAP2 algorithms, Hecate is more scalable than those algorithms alone.
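Hecate’s distributed algorithms are not detailed here, but the first step described, recognizing and collapsing non-branching regions, corresponds to the standard “unitig” compaction step in de Bruijn graph assembly. The single-machine Python sketch below shows only that textbook step, assuming error-free reads. It is not BGI’s code and ignores reverse complements, error correction, bubbles, and isolated cycles.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Return the set of (k-1)-mer -> (k-1)-mer edges, one per distinct k-mer."""
    edges = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges.add((kmer[:-1], kmer[1:]))
    return edges


def compact_unitigs(reads, k):
    """Collapse maximal non-branching paths of the de Bruijn graph into unitigs."""
    out_edges = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for src, dst in de_bruijn_edges(reads, k):
        out_edges[src].append(dst)
        in_degree[dst] += 1
        nodes.update((src, dst))

    def is_internal(node):
        # A node strictly inside a non-branching path has one edge in, one out.
        return in_degree[node] == 1 and len(out_edges[node]) == 1

    unitigs = []
    for node in sorted(nodes):
        if is_internal(node):
            continue  # paths are started only at branch points or path ends
        for nxt in out_edges[node]:
            contig, current = node + nxt[-1], nxt
            while is_internal(current):
                current = out_edges[current][0]
                contig += current[-1]
            unitigs.append(contig)
    return sorted(unitigs)


# Two reads that share "ACGT" and then diverge create a branch at node "GT".
print(compact_unitigs(["ACGTC", "ACGTA"], k=3))  # prints ['ACGT', 'GTA', 'GTC']
```

Collapsing each non-branching run into a single unitig shrinks the graph dramatically before the harder steps (bubble and repeat resolution) run, which is also what makes the step easy to distribute.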
Xiang presented results from speed comparisons showing significant cost and time savings using Hecate for de novo assembly. Running SOAPdenovo on a single server for 70 hours yielded 80% genome coverage at a hardware price of $150,000. Using 96 Hecate cores, genome coverage increased to 84% in 42 hours at a price of $60,000.
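The comparison can be expressed as relative savings using only the figures quoted above (SOAPdenovo on one server: 70 hours, 80% coverage, $150,000; Hecate on 96 cores: 42 hours, 84% coverage, $60,000). This is plain arithmetic on the reported numbers, not an independent benchmark.

```python
def percent_reduction(before, after):
    """Percentage decrease when going from `before` to `after`."""
    return 100.0 * (before - after) / before


# Figures as reported for the two de novo assembly runs.
soapdenovo_single_server = {"hours": 70, "coverage_pct": 80, "price_usd": 150_000}
hecate_96_cores = {"hours": 42, "coverage_pct": 84, "price_usd": 60_000}

time_saving = percent_reduction(soapdenovo_single_server["hours"],
                                hecate_96_cores["hours"])
cost_saving = percent_reduction(soapdenovo_single_server["price_usd"],
                                hecate_96_cores["price_usd"])
coverage_gain = (hecate_96_cores["coverage_pct"]
                 - soapdenovo_single_server["coverage_pct"])

print(f"time -{time_saving:.0f}%, price -{cost_saving:.0f}%, "
      f"coverage +{coverage_gain} points")
# prints: time -40%, price -60%, coverage +4 points
```

On these numbers, Hecate finished in 40% less time at a 60% lower quoted price, while adding four percentage points of genome coverage.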
Gaea is designed to distribute resequencing computation across a cluster of nodes using the Hadoop Streaming framework, with customized algorithm interfaces for SOAP and BWA. For the current version of Gaea (v1.2), Xiang reported speedups of 75x for SOAP2 and 90x for BWA using 100 cores. At 400 cores, those speedups rose to 300x and 346x, respectively, compared with running each algorithm on a single core. Xiang expects Gaea v2.0 to bring further improvements.
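Parallel efficiency, the speedup divided by the number of cores (1.0 meaning perfect linear scaling), is one standard way to read the Gaea figures. The Python sketch below applies that definition to the four reported speedups; it says nothing about where Gaea’s overheads arise.

```python
# Speedups over a single core, as reported for Gaea v1.2.
reported_speedups = {
    ("SOAP2", 100): 75,
    ("BWA", 100): 90,
    ("SOAP2", 400): 300,
    ("BWA", 400): 346,
}


def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup achieved on `cores` cores."""
    return speedup / cores


for (tool, cores), speedup in reported_speedups.items():
    print(f"{tool:5s} on {cores:3d} cores: {speedup:3d}x -> "
          f"{parallel_efficiency(speedup, cores):.1%} efficiency")
```

By this measure, SOAP2 held at 75% efficiency from 100 to 400 cores, while BWA dipped from 90% to 86.5%.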
Gaea is also optimized for a biomarker
analysis toolkit that includes SOAPsnp,
DIndel and realSFS for SNP calling, indel
calling, and gap alignment.
Data Citing
Also at the event, BGI formally announced
its new journal, GigaScience, which will
launch in November 2011. Co-published
by BGI and BioMed Central, GigaScience
is an integrated journal and database, said
Scott Edmunds, editor of the journal.
GigaScience plans to stress usability
and reproducibility in its review process.
The journal will solicit “big data” studies and hopes to provide a forum for
dealing with the difficulties of handling
large-scale data from all areas of the life
sciences. In addition to traditionally peer-reviewed papers, GigaScience will publish
citable datasets, each with permanent
digital object identifiers (DOIs). Datasets
will be hosted on the BGI cloud along
with the SOAP toolkit and other BGI
products. This will facilitate tool testing,
Edmunds said, as the tools and data are in
the same place.
Having DOIs for datasets will enable
researchers to cite datasets used in their
work and, Edmunds hopes, speed data
release and dissemination. “Dealing with
data is not just about storage,” he said,
“but dissemination too.” To prime the
pump, BGI released eight animal genomes, each with a DOI that enables the
dataset to be freely used by researchers
and then cited in publications.
Partnering on Multiscale Biology
1BD#JP$40&SJD4DIBEUUPMFBEBA.VMUJTDBMF*OTUJUVUFBU.PVOU4JOBJ
“It’s not an exclusive relationship with
PacBio,” says Charney. “We have Illumina
machines and so forth. You can’t stick
with one technology platform. So we’re
going to be active in all of those platforms.
However, I can see that Illumina might
say, ‘It’s fine if you buy our commercial
machines, but we’re not going to share
our latest next-gen machine.’ That may be
a consequence of this, but we’re certainly
going to be buying other commercial
machines.”
BY KEVIN DAVIES
Dual Role
The new institute is an expansion of
MSSM’s Institute of Genomics, which
was formed a couple of years ago. Charney
had been recruiting a new leader for the
institute to succeed medical geneticist
Robert Desnick, who is stepping down
as director and as the Chair of the MSSM
Eric Schadt
genetics department. Schadt is taking on
both positions.
Charney began recruiting Schadt as
the new head of the department. “I went
after him big time,” he says. It quickly
became clear that “a partnership with
PacBio would be good for Mount Sinai,
apparently good for PacBio, and was
something that Eric really endorsed. To
me, it seemed like a win-win,” he continued. “We get access to PacBio technology,
which we’re all very excited about. And
they get, in a sense, access to the great
research we’re doing here, the patient
populations that could be like a testing
site.”
Schadt waves off concerns that other
next-generation sequencing and technology vendors might not be inclined to collaborate with the new MSSM multiscale
institute team. “Other companies will be
very hungry to work with the institute at
MSSM,” Schadt told %LRv,7 :RUOG, “because our vision is to become a dominant
force in integrating data from many technologies and developing predictive models
that impact physician and patient decision
making. This will help grow the market
and all will benefit, so there will be strong
incentive for many companies to be part of
that—or watch it from the outside!”
Patient Partner
For its part, PacBio had been seeking
a potential academic partner for more
than a year to help develop new applications for its technology and gain access to
patients to move the technology into the
clinic. MSSM stepped into the picture
after talks with UCSF stalled.
Sounding themes around the integration of genomic, expression, and clinical
data that have typified his earlier career
at Rosetta Inpharmatics and Merck (see,
“Eric Schadt’s Integrative Approach to
Predictive Biology,” %LRv,7 :RUOG, Oct
2008), Schadt said: “Multiscale data
integration, including genomic, expression, metabolite, protein, and clinical
information, will ultimately define the
future of patient care. With our intent
to collaborate in areas such as newborn
screening for rare genetic disorders,
infectious diseases and cancer, we hope
to accelerate this revolution, starting by
integrating clinical data with previously
untapped biological information to build
new computational models for predicting
human disease.”
“Multiscale Biology” is a term that
Schadt coined. The way Charney understands the term, “we’re talking about
systems genetics, integrative genetics,
systems biology, which we’re very strong
in at Mount Sinai. The idea of one gene/
one disease or looking at genes in isolation from pathways doesn’t make sense.
That’s totally in line with the way we’re
doing things.”
Pacific Biosciences has announced a
partnership with Mount Sinai School of
Medicine (MSSM) in New York to create
the Institute for Genomics and Multiscale
Biology.
The director of the new institute will
be PacBio’s chief science officer, Eric
Schadt, who is retaining his non-operational position at PacBio while moving to
New York this summer to run the center.
The institute will be the hub of genomics
research at MSSM, collaborating with 13
other translational and core facilities at
Mount Sinai, and incorporating a user
facility featuring PacBio’s technology.
The MSSM SMRT Biology facility will
be equipped with R&D versions or “Astros” of the PacBio instruments. They will
be available for use by institute researchers as well as other collaborators in the
eastern half of the country.
MSSM Dean Dennis Charney says he
expects to invest more than $100 million in the new institute over the next
5-6 years, as part of a $1-billion capital
campaign for MSSM. “We’ve raised $750
million and it’s ahead of schedule,” said
Charney, with the initial funding coming
from philanthropy.
Charney believes that the large-scale
generation and integration of multiple
sources of biological data, integrated with
clinical information, will expand MSSM’s
ability to characterize disease and ultimately benefit patients. “The Institute
for Genomics and Multiscale Biology
will be at the forefront of the revolution
in genetics and genomic sciences, which
will fundamentally change the practice of
medicine,” he added.
Up Front News
Illumina Showcases New Visions
in Genomic Interpretation
iDEA conference names data viz winners
BY ALLISON PROFFITT
SAN DIEGO—Illumina CEO Jay Flatley kicked off the iDEA (Illumina Data
Excellence Awards) conference* with a
striking prediction: the $1,000 genome, “all in,” would arrive within three
to five years, and no new technology would be needed to get there. Then he passed
the microphone to the dozen finalists
competing in the inaugural iDEA awards
challenge.
Scott Kahn, chief informatics officer at
Illumina, assured me over lunch the next
day that Flatley’s promises didn’t scare
him. “Being the one responsible for the
informatics part, I’m not shuddering,” he
said. “Jay said $1,000 all in. ‘All in’ means
everything to do with the sample, everything to do with the analysis. Go back
two years, what everyone did was take the
images off the machine and they had this
huge informatics pipeline. They spent
probably more than $1,000 to $2,000
just in raw CPU cycles to do the analysis
and image processing.”
Flatley insisted that the next-generation sequencing field needs improved
technologies, not new ones; faster cameras, for instance, not different ones. “Jay’s
alluding to the fact that… you’re going to
see more of the downstream alignment
and variant calling move onto the instrument because it can, because it reduces
costs and the time to result,” said Kahn.
He was adamant that Flatley’s “all in”
doesn’t include the cost of interpretation.
“It’s just the genome; that’s why I’m not
shuddering.”
Even as the technical costs continue
to fall and sequencing gets faster (Illumina’s mini MiSeq, which launches later
this year, will generate more than one
gigabase per run in about a day), Kahn
acknowledges that there are still huge
problems that need to be addressed in the
interpretation and use of genomic data.
*iDEA Challenge, San Diego, June
New Ideas
To help bridge the gap between data and
interpretation, Illumina hosted its first
iDEA challenge and conference. Announced in May 2010, the competition
was designed to challenge both commercial and academic entrants to develop
new and creative visualization and data
analysis techniques.
From 30 entries, judges selected 12
finalists based on technical merit—7
academic entries, 5 commercial (see,
“Top Twelve”). Each finalist gave a presentation and answered questions at the conference.
Just before the winners were announced, Kahn said he was pleased with the quality of talks and the entries, and said he had learned a couple of new things. “Some of the methods are very cool, good ideas… I like the idea of a nonlinear representation of the genome, like the [Strand Life Sciences Avadis entry] elastic browser stuff. I liked the talk that Stephan Schuster [from Penn State] gave on inGAP… The challenge the judges are going to have is how to weight entries that explore very different variables.”
“There are clearly some entries that
are scratching an itch that people didn’t
know was there, and there are others
that are very novel approaches to solving
problems that other people have solved,”
Kahn said. “You can solve an unmet problem incompletely, but at least it’s a partial
solution. How do you weigh that against,
‘Here’s something that takes a capability
and enhances it significantly?’”
Kahn had his own ideas, but the iDEA
entries were evaluated by an independent
group of judges, including Steven Jones
(Genome Science Centre at the British
Columbia Cancer Research Centre, Canada), Jared Maguire (Broad Institute),
John Quackenbush (Dana-Farber Cancer
Institute), Steven Salzberg
(University of Maryland),
Gavin Sherlock (Stanford
University), and Bang
Wong (Broad Institute).
The Envelope Please
The judges awarded sculptures from glass artist Barry Entner to the six winners. One of Kahn’s favorites, inGAP from Pennsylvania State University, in conjunction with Fudan University and the Beijing Institutes of Life Science (BioLS), Chinese Academy of Sciences, won the overall academic award and a $50,000 grant from Illumina to further develop the software. inGAP, an Integrated Next-gen Genome Analysis Pipeline, started in 2007 as a SNP calling tool for Sanger sequence data, and now includes aligners, detects SNPs, indels and structural variation, and does comparative genome assembly, all with a graphic user interface. The award grant will be used to extend inGAP to metagenomics and transcriptomics studies, said Fangqing Zhao at BioLS.
GenomeRing helps visualize indels and SNPs compared to a master genome, with each color representing a genome’s progression on a single chromosome or across several chromosomes.
Top Twelve iDEAs
Enlis Genomics
Genomatix Software
Harvard University, Seqeyes
ImmunoProfiles
Partek
Pennsylvania State University, inGAP
Strand Scientific Intelligence, Avadis NGS
University of California, San Diego, STAR Genome Browser
University of Delaware
University of Georgia, DawgPack
University of Tübingen, GenomeRing
VIB, GenomeView
Briefs
PROTEIN-PROTEIN AGGREGATION PREDICTION
Accelrys is hoping to boost scientific collaboration and efficiency as well as address a major challenge in the development of biotherapeutics with the latest release of Discovery Studio. The new release incorporates what Accelrys says is the first commercially available software for predicting protein-protein aggregation to advance biotherapeutics research. It enables protein engineers to identify the location of regions on antibodies prone to aggregation and then predict substitutions to improve molecular stability.
SOUTH AFRICAN PHARMACOGENOMICS
The Centre for Proteomic and Genomic Research (CPGR) in South Africa has joined forces with the Division of Human Genetics at the University of Cape Town (UCT) and the Pharmacogenomics for Every Nation Initiative (PGENI) to map genetic traits underlying the efficacy of drug treatments in Southern African populations. The collaboration will form a regional PGENI Centre of Excellence and, following a pilot phase, hopes to conduct large-scale pharmacogenetic studies to correlate the prevalence of DNA polymorphic traits in local populations with the efficacy of commonly prescribed drugs.
PATIENT REPORTING PREDICTIONS
PHT Corporation is freely disseminating its ePRO Modality Tool, which enables sponsors and CROs to determine which of five methods (smartphone, tablet, Internet, digital pen, and handheld device) is most effective for collecting ePRO data based on their specific study protocol and questionnaires. By publishing the tool and lifting all previous restrictions on its use, PHT hopes to promote further ePRO adoption.
Enlis Genomics received the overall award in the commercial category and a one-year co-marketing agreement with
Illumina. Another entry that impressed
Kahn, Enlis hopes to enable “point-and-click genomics” for biologists rather than
bioinformaticians. “Existing software
packages have been focused on the bioinformatic tasks of assembling a genome,
but our software is the first commercial
package to recognize that for many biologists, the work of connecting genomic
data to biology starts after variants have
been called,” said Devon Jensen, Enlis’
founder. Fast algorithms enable variation filtering and genome comparisons,
and Enlis’ .genome file format wraps all
genomic data into a single compact and
efficient file.
GenomeRing from the University
of Tübingen (Germany) and Partek won
awards for the most creative algorithms.
GenomeRing is an interactive tool to
visualize indels, SNPs, and other changes
in dozens of genomes in a circular, rather
than linear, view by constructing a “SuperGenome” and using the structure to
compare different genomes. “We feel
really honored,” said Kay Nieselt, head
of the integrative transcriptomics group.
“We now hope to be able to apply for new
funding in the area of Visual Analytics in
Bioinformatics. We are very motivated
to continue our work to create new innovative algorithms and visualizations in
the area of next-generation sequencing
technologies.”
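The article does not describe how GenomeRing builds its SuperGenome. As a rough, hypothetical illustration of what a shared coordinate system for many genomes can look like, the sketch below assumes the genomes are already in a gapped multiple alignment, treats each alignment column as one SuperGenome position, maps every genome's own bases onto those columns, and flags columns that look like indels or SNPs. The function names and input format are assumptions for illustration, not GenomeRing's API.

```python
from typing import Dict, List

# Hypothetical sketch: alignment columns serve as "SuperGenome" coordinates.
# This is an assumed simplification, not GenomeRing's actual construction.


def supergenome_map(alignment: Dict[str, str]) -> Dict[str, List[int]]:
    """Map each genome's i-th base to the alignment column that holds it.

    alignment: genome name -> gapped aligned sequence ('-' marks a gap);
    all rows must have the same length.
    """
    if len({len(row) for row in alignment.values()}) != 1:
        raise ValueError("aligned rows must all have the same length")
    return {
        name: [col for col, base in enumerate(row) if base != "-"]
        for name, row in alignment.items()
    }


def column_is_indel(alignment: Dict[str, str], col: int) -> bool:
    """True if some, but not all, genomes have a gap in this column."""
    gaps = [row[col] == "-" for row in alignment.values()]
    return any(gaps) and not all(gaps)


def column_is_snp(alignment: Dict[str, str], col: int) -> bool:
    """True if no genome has a gap here and the bases disagree."""
    bases = {row[col] for row in alignment.values()}
    return "-" not in bases and len(bases) > 1


aln = {"genome_A": "AC-GT", "genome_B": "ACTA-"}
coords = supergenome_map(aln)
print(coords)  # {'genome_A': [0, 1, 3, 4], 'genome_B': [0, 1, 2, 3]}
print([c for c in range(5) if column_is_indel(aln, c)])  # [2, 4]
print([c for c in range(5) if column_is_snp(aln, c)])  # [3]
```

Because every genome is projected onto the same columns, a viewer can draw all genomes against one axis and color each genome's path through it, which is the kind of comparison the paragraph above describes.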
Partek debuted Gene-Specific Modeling for the iDEA Challenge along with
the company’s Flow, Genomics Suite, and
Pathway products. Gene-Specific Modeling takes the position that one model does
not fit all genes; for example, age affects
some genes but not others. By using
the algorithm to select the best model for
each gene, users can identify which and
how many genes are affected by which
factors and perform more accurate statistical
analysis thanks to better model fit. “For years
Partek has been recognized as a leader
in making powerful statistical methods
easily accessible to medical researchers,”
said Tom Downey, president. “So, to be
recognized for doing that again by a panel
of renowned scientists is very satisfying.
I’m proud of our team and glad that their
hard work paid off.”
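The article does not say how Gene-Specific Modeling chooses a model. As a hedged illustration of the general idea of per-gene model selection, the sketch below fits every combination of candidate factors to each gene by ordinary least squares and keeps the model with the lowest Akaike information criterion (AIC). AIC-based selection, the factor names `age` and `treatment`, and the simulated data are all assumptions, not Partek's implementation.

```python
import numpy as np
from itertools import combinations

# Illustrative sketch only: AIC-based per-gene model selection is an assumed
# stand-in, not Partek's Gene-Specific Modeling algorithm.


def aic_ols(y, X):
    """AIC of an ordinary least-squares fit with Gaussian errors (constants dropped)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k


def candidate_designs(factors, n):
    """Yield (factor_names, design_matrix) for every subset of factors, plus an intercept."""
    names = list(factors)
    for r in range(len(names) + 1):
        for chosen in combinations(names, r):
            cols = [np.ones(n)] + [np.asarray(factors[f], dtype=float) for f in chosen]
            yield chosen, np.column_stack(cols)


def best_model_per_gene(expr, factors):
    """For each gene (row of expr), return the factor subset with the lowest AIC."""
    expr = np.asarray(expr, dtype=float)
    designs = list(candidate_designs(factors, expr.shape[1]))
    best = []
    for y in expr:
        scores = [(aic_ols(y, X), chosen) for chosen, X in designs]
        best.append(min(scores, key=lambda s: s[0])[1])
    return best


# Simulated example: one gene driven by age, one by a treatment indicator.
rng = np.random.default_rng(0)
n = 30
age = rng.uniform(20, 80, n)
treatment = rng.integers(0, 2, n).astype(float)
expr = np.vstack([
    0.1 * age + rng.normal(0, 0.1, n),
    3.0 * treatment + rng.normal(0, 0.1, n),
])
models = best_model_per_gene(expr, {"age": age, "treatment": treatment})
print(models)
```

Tallying how many entries of `models` contain each factor gives the kind of "which and how many genes are affected by which factors" summary described in the paragraph above. Exhaustive subset search grows as 2^k in the number of factors, so this sketch only suits a handful of covariates.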
GenomeView from VIB (Flanders,
Belgium) and Genomatix received the
most creative visualization awards. GenomeView enables users to dynamically
browse high volumes of aligned short
read data, with dynamic navigation and
semantic zooming, viewing whole genome alignments of dozens of genomes
relative to a reference sequence. “There
is still a lot of work to be done. Everybody
agrees there is a clear need for visualization tools for genomics data and GenomeView has at least part of the solution.
The iDEA award tells me that we’re
doing something right. Now the trick is
to convert that information into papers
and grant money and we’re good to go,”
said Thomas Abeel, a postdoctoral/Broad
fellow at VIB.
Finally, Genomatix presented several
workflow tools that one judge called “very
intuitive” to cover the complete analysis
of the iDEA datasets from mapping to
the generation of biological networks.
These included Transcriptome Viewer to
interactively inspect transcript expression, splicing graphs and paired-end coverages in one view; a one-step mapping
approach; and ElDorado, Genomatix’
genomic annotation database. “The iDEA
challenge really sparked our interest from
the very first moment we heard about it,”
said Jochen Supper, project manager.
“We felt that getting our hands on a high
quality, diverse dataset like the one Illumina provided would be ideally suited
to try and test our approach of combining multiple lines of evidence to get from
sequencing data to biological results.”
Up Front News
Ignite Institute Finds a Match at Fox Chase
Dietrich Stephan’s personalized medicine center finds a home in Philadelphia
BY KEVIN DAVIES
Following the widely publicized demise
of plans to locate the Ignite Institute for
Individualized Health in Northern Virginia, the institute has found a new home
as part of a three-way partnership at the
Fox Chase Cancer Center in Philadelphia.
Only it won’t be called Ignite anymore.
Ignite has been rolled into pre-existing
plans at Fox Chase to build a center for
personalized medicine, says Jeff Boyd, senior vice president of Molecular Medicine
at Fox Chase. “The Ignite Institute and
Fox Chase are working together with Life
Technologies to launch what is now the
Cancer Genome Institute at Fox Chase,”
says Boyd. Ignite’s founder, Dietrich
Stephan, serves as consulting chief scientific officer of the new institute.
While searching for a home for Ignite,
Stephan had forged a provisional deal
with Life Technologies for 100 next-generation sequencing (NGS) instruments.
“After the big Ignite deal in Northern Virginia went away, the relationship between
Life and Ignite went with it,” says Boyd. A
new partnership between Fox Chase and
Life Technologies was announced in June,
although the Ignite name was notable for
its absence in the news release.
“We didn’t mention Ignite [when that was announced]. That was intentional,” Boyd explains. “We got tired of negative reporters who want to dig into what happened to Ignite [in Northern Virginia] and dredge up that experience.”
Jeff Boyd
Fox Chase Cancer Center in Philadelphia.
“Dietrich’s grand plan was to do personalized medicine in any number of
manifestations—pediatrics, neurological,
cancer, all in one big institute. But he saw
that it made a lot of sense to step back and
silo things out. He’s landed here at Fox
Chase with respect to the oncology piece
of his vision. We had a similar vision.”
For his part, Stephan says he “crisscrossed the country multiple times, looking for a situation where we could land
the whole shebang. It’s hard to build a
$150-million research building.” Stephan
says there were many organizations eager
to get into the personalized medicine
space, even if they couldn’t support a
full-bore TGen or Broad Institute model.
“So my idea was to break Ignite into five
disease models and decentralize” (see,
“Gene Partnership”).
‘Ome Coming
In 2009, Fox Chase had established its
own self-funded nascent Institute for
Personalized Medicine. “Most of the leaders in this space believe this is the future,
especially in the cancer arena,” says Boyd.
“We’re not going to get any further with
combinations of cytotoxic drugs. Combination therapies are clearly what we
need to be thinking about, hence analysis
of the tumor, and at some point exomes,
transcriptomes and whole genomes.
Something with ‘ome!”
Fox Chase management hired PricewaterhouseCoopers as consultants to
decide how to evolve the institute into
the clinical arena. “They were helping us
build a business plan, which required a
lot of philanthropy to develop a larger institute of personalized medicine. We were
introduced to Dietrich, and he introduced
us to Life Technologies.” Stephan says he
felt “a lot of allegiance to Life Technologies. They wanted to stick with me. When
Fox Chase started looking real, I brought
Life Technologies in to bring closure to
that deal.”
Boyd says Fox Chase had a lot to offer
as an intellectual, medical, and technology partner. “We’re a freestanding, NCI-funded comprehensive cancer center, a
northeast location; we have an incredible
biosample repository, a top Phase 1 clinical
trial center, and brand-new, state-of-the-art
lab space available.”
Details of the Life Technologies deal are confidential, says Boyd, but he does say it is a multidisciplinary partnership involving state-of-the-art technology, “both from deep sequencing as well as from an IT/bioinformatics standpoint. They’re an enormous company with a lot of depth. This project is quite complex, more than just grinding up tumors and looking for mutations in pathways.”
For genome analysis to become a routine part of clinical care, Boyd stresses that many issues still have to be worked out. “Patient flow, charging, informed consent, CLIA, etc. I think Life Tech is looking to us to represent how one would do that, how we’d create such an institute. We could establish a model for other institutions to follow that would benefit the field.”
Fox Chase currently uses some Illumina instrumentation, but Boyd says he is “satisfied that the SOLiD 4 has the sensitivity and specificity that is comparable to anything Illumina has to offer.” But he is also making a bet on the scalability of the Ion Torrent semiconductor sequencing technology.
The Fox Chase Cancer Institute is currently deploying six SOLiD instruments in an R&D setting, plus a couple of Ion Torrent machines. “Once we get plugged in, sign informed consent documents, and so on, we’ll have a Fall start for enrolling patients. We’ve sequenced dozens of exomes, transcriptomes, from all manner of samples: fresh tissue, frozen tissue, microdissection, paraffin-embedded, but haven’t embarked on patient care yet.”
“I think we’re going to leapfrog the 5500 XL and once the Ion Torrent has reached the stage where we can think about whole exomes and genomes, we’ll shift from SOLiD 4s to ultimately [Life Technologies’] third-generation instrument, based on the Ion Torrent technology. We’re quite optimistic that will become the industry standard.”
N=1
Boyd says the institute will create its own model of patient care, focusing on “a rigorous analysis” of druggable targets and genes in cancer-related signaling pathways. The institute will see patients with most kinds of cancer, although it does not care for patients with brain cancer or pediatric cases.
“We won’t fiddle with the standard of care for new cancer patients,” says Boyd. “Pancreatic cancer, stage IV ovarian cancer, breast cancer, lung cancer, those might be examples where we utilize genome sequencing out of the gate.”
Initially, Boyd will offer transcriptome analysis in tandem with exome sequencing to provide insight into druggable pathways. “That comes as a package. We’re not offering full genome yet. But we don’t think it will be many years until we offer full genome. It’s a clinical decision: each patient will have to be considered uniquely in terms of life expectancy, cost, etc. [Cost] is expected to decrease substantially. There’ll be individual decisions made for each patient in consultation with their medical oncologist at the center.”
While Boyd hesitates to say when full genome sequencing will become routine for cancer care, he does believe there is promise in looking at exome sequencing clinically, rather than focusing on just a group of “hotspot” genes frequently mutated in cancer. He says his group will remain at the front edge of the technology bell curve.
Fox Chase admits 8,000 new patients a year, a number that will increase as the genome center unfolds. “It’s both extraordinarily exciting and a little terrifying at the same time. But we’ve chosen to devote a lot of energy and resources to it, and we’ve cast our lot,” says Boyd.
Stephan expects the center to sequence a couple of hundred patients this year, ramping up to 2,500 patients annually.
Dietrich Stephan
Gene Partnership
The other four areas under Dietrich Stephan’s original Ignite umbrella were pediatrics, metabolic disease, cardiac disease and neurology. Stephan has found another northeast home for his interests in pediatrics, or “germline disease,” in Boston. “Children’s Hospital had been thinking about something similar [to me]. I spent time with that team. They’ll be putting … million into a …-year effort called The Gene Partnership.”
The Gene Partnership (TGP) is billed as “a cutting-edge research initiative that combines the innovation of genomic research and IT to create the richest longitudinal knowledge base of genetic and clinical pediatric data in the world.” Stephan has high hopes for TGP, of which he is the executive director.
“I’m an emissary of sorts for this, focusing on the provider side. They’ll blow the doors off this at Children’s,” says Stephan. “If this ever takes root as an integral part of medicine, it has to be monetarily sustainable.” Stephan says a number of Boston biotech veterans, including venture capitalists Noubar Afeyan and Stanley Lapidus (co-founders of Helicos Biosciences), were among “a true rock-star team” to discuss the concept. “Ultimately, the hospital had enough faith to make the investment.”
With engagements at Fox Chase and Children’s Hospital to manage, Stephan has no immediate plans to expand into the other three areas once targeted by Ignite. “I think I’ve got enough going on right now,” he says.
K.D.
Up Front The Skeptical Outsider
Big-Bucks
Biology’s Broken
Business Model
BILL FREZZA
“Tell me how someone is compensated and I’ll tell
you how they’ll behave,” goes the old adage. If
non-monetary rewards are considered alongside financial remuneration, this pretty much
describes why federally funded research in the
life sciences is producing less and less bang for
more and more bucks. And why the scientific literature is at
risk of becoming polluted with overreaching claims, obfuscated
shortcomings, and non-reproducible results.
Scientists labor to discover nature’s truths, not design products. This makes it unfair to demand that they “cure cancer” in
return for living on the public dole. Rather, we expect academic
scientists to report on the fundamental rules that govern health
and disease, passing their knowledge to commercial players to
come up with products and services that improve our lives.
We also realize it can take years, sometimes even decades,
for scientific advances to find their way into the pharmacopeia and
physician practice. This makes the benefits taxpayers receive
from supporting scientists both indirect and difficult to measure. Which makes it fair to ask: Is the $31 billion of taxpayer
money funneled to scientists each year through the National
Institutes of Health being spent wisely?
For an endeavor that consumes billions, academic research
remains a cottage industry of individual practitioners called
Principal Investigators (PIs). Their search for truth begins with
the quest for grant money, the mother’s milk of modern science.
PIs unlock the door to the treasury by having their grant applications reviewed by... fellow PIs.
PIs are business units unto themselves, employing laboratory slaves otherwise known as graduate students, who perform the bulk of the hands-on scientific work. As long as grant
money keeps flowing, PIs answer to no one. They pay overhead
to the universities that give them lab space and in return these
universities confer upon PIs the singular ability to manumit
their lab slaves by awarding them Ph.D.s. One cannot build a
life as a tenured, taxpayer-supported scientist without one.
Lab slaves convert grant money into scientific papers that
bear their PI’s names. Because lab slaves are comparatively
cheap, the level of automation and efficiency in most academic
labs is appallingly low. Hand work and manual data collection
are the rule, both prone to error and vulnerable to selection bias.
Only data approved by the PI gets submitted for publication.
Chasing Impact Factor
Scientific journals exist within a status hierarchy. Publishing
in a high-impact journal gives PIs the one thing they crave
as much as grant money—academic fame. Useful collaboration between grad students working for different PIs is often
discouraged, as too many PI names on the papers dilute fame.
The most prestige goes to PIs who plant their flag first in a new
area, whether they develop it or not. Like the race to the South
Pole, no one cares who got there second.
Before papers can be published they must be reviewed by...
fellow PIs. PIs who review each other’s papers are not tasked
with reproducing results, though politics can certainly play a
role in critiquing conclusions. For some this means turning
peer review into pal review. For others, it might mean delaying
a rival’s pending publication.
If the whole system sounds like a medieval guild, that’s because it is.
We can be thankful that the vast majority of PIs operate
with the highest degree of intellectual integrity. Such a small
fraction of scientists engage in outright fabrication that when
fraud is uncovered it makes national news. But the grey region
short of fabrication covers a lot of ground, especially when pumping $31 billion a year through a medieval guild system.
How many times does an experiment have to be repeated before it is judged “successful”? What if that one-time “success” can’t be reproduced? How much inconvenient data gets discarded on the road to publication? Lab
slaves that give PIs the results that they want, especially results
confirming pet theories, move one step closer to freedom. Lab
slaves that displease their PIs can wash glassware for years, or
wash out with a master’s degree. Or worse, as in the infamous
case of the Harvard chemistry professor who had three grad
students commit suicide before the administration stepped in
to make changes.
Reforming our graduate education system by introducing
more transparency, accountability, and efficiency would help
ensure taxpayers get their money’s worth. Is that too much to
ask of a “War on Cancer” that has gone on for 40 years?
Bill Frezza is a consultant and venture capitalist living in
Boston. He is a regular contributor to RealClearMarkets and
Forbes.com. Bill can be reached at bill@vereverus.com.
The Bush Doctrine
Collaboration
Limits?
ERNIE BUSH
Ernie Bush is VP and scientific director of Cambridge Health Associates. He can be reached at ebush@chacorporate.com.
In a recent brief for Nature Reviews Drug Discovery, John Arrowsmith of Thomson Reuters reported the following information regarding Phase II clinical failures in the period 2008-10. Of the 108 reported failures, 87 listed reasons for failure, and they were distributed in four categories as follows: Efficacy (51%); Strategic (29%); Safety (19%, both preclinical and clinical); and PK/BA (1%).
When he combined these data with the biological target areas for the failed compounds, Arrowsmith made the following observation:
“Although it is difficult to draw conclusions from these data, the finding that a substantial proportion of Phase II failures were due to strategic reasons suggests that one important underlying factor could be overlapping R&D activity between companies with drugs in Phase II trials. This raises the question of whether an increase in collaborative efforts between companies up to the point of proof-of-concept for novel targets or mechanisms might be more cost- and time-effective.”
While I know many companies conduct joint development efforts on a single compound, I find the idea of multiple companies joining to conduct clinical target validation studies, potentially across a range of compounds, very intriguing. It has come up before, however, in a more limited context.
The Target Validation Consortium
Since the core of our business has always been about the “collaborative advantage,” we were approached a couple of years ago by someone who wanted to build a collaborative organization of pharma companies for the purpose of establishing a “pharmacological targets” validation consortium.
The central thesis of his proposal was that all large pharmaceutical companies have a constant need to discover and validate novel biological targets as a basis for developing new medical therapies. In particular, he felt that the target discovery field is dominated by the academic research community because it is focused on the basic biological sciences needed to uncover the functional activities of proteins and pathways. This basic biology is then supplemented by activities needing larger investments, which include large-scale protein synthesis, building and executing high-throughput functional screens,
and animal testing capabilities. Of course, all of these supplemental activities are usually provided or funded by pharma
companies.
Unfortunately, these larger investments are often replicated
across multiple pharma companies, as is the cost of conducting
early safety studies to determine if the target has unwanted
pharmacology. As it is very common for multiple companies to
be chasing the same targets, the overall duplication of efforts
and expense to achieve a validated target is wasteful and arguably not as thorough or comprehensive as could be achieved in
a collaborative effort. Or so he proposed... unfortunately, when
we tried to formalize and build such a consortium, his employer
decided they could not support the idea of taking collaboration
to that level.
Of course, what is interesting to me is that Arrowsmith’s
suggestion takes this proposal one step further. He is basically
asking, “Why don’t companies collaborate not only on the discovery target validation, but on the validation of the target all
the way through to human clinical studies?” This could greatly reduce the number of failures due both to lack of efficacy and to “strategic” reasons.
In a logical extension of Arrowsmith’s observation, one
could also ask: Why would collaboration have to stop at target
validation? What about collaborative Phase II and/or Phase III
studies? At the very least, having a shared safety database on
compounds hitting the same targets would be of value to the
companies, to the regulators, and to the general public health.
Or am I clear off the reservation?
As such, the larger question becomes one of “what are the
limits of collaboration within the pharmaceutical R&D space?”
Recent years have seen many types of ‘collaborations’ introduced such as the Enlight Biosciences (see, “Big Pharma’s Road
to Enlight(enment),” Bio-IT World, Sept 2008) co-investment
collaboration, the Pistoia Alliance (see, “The Italian (Informatics) Job,” Bio-IT World, Jan 2010) open software standards
collaboration and the Preclinical Safety Testing Consortia
biomarker discovery/development collaboration. All of these
share the common thread of multiple pharmas joining together
to achieve an objective that would be difficult or expensive for
any one company to achieve independently; but they have very
different operational characteristics and an even wider diversity
of missions. But I must say that the concept of companies actually collaborating on the clinical development space truly looks
like a significant step beyond anything actively ongoing to date.
If there is real synergy and leverage to be obtained through
collaboration, then why not collaborate across the whole R&D
space? Or is the fact that contract organizations are becoming
increasingly dominant across the entire R&D space really just
another expression of this concept?
Up Front Insights | Outlook
Cloud and the
Next Generation
Todd Smith is the senior leader of research and applications at Geospiza, now part of PerkinElmer (acquired
May 2011). As senior leader, Smith helps develop the
company’s research roadmap around high-performance computing and ensures Geospiza’s GeneSifter
software scales to meet the future demands of high-throughput sequencing systems. Smith was interviewed by
Insight Pharma Reports for its latest report on next-generation
sequencing. Here are some extracts from that interview.
On Geospiza’s cloud strategy: I guess we’ve always felt the
cloud would be very important, so we’ve always had that as part
of our strategy. Going forward it becomes more of a technical
implementation of our strategy. So we’re not saying we have
to do more of this or less of it. In our marketing, we probably
stress the word “cloud” more than “application service provider” or other terms we used to employ. So in that sense, we’re
going with the flow, but the cloud’s always been very important
in our strategy, because in general IT costs certainly can be prohibitive in getting started with next-generation sequencing. So
when people try out cloud services and do some experiments,
I think they definitely find some scale issues. When they have
a data center-size operation, they need to consider accessing
someone’s hosted service center versus building their own. I
think those are the kinds of things people consider, and we
need to consider those things as we mature and increase our
business. But I’m going to call them technical implementation
issues. How do you offer more services at a lower cost? That’s
something we focus a lot of energy on.
There’s an appeal to being able to use cloud services for data
storage, most importantly for backup and for the infrastructure
that goes with maintaining the data. In our cost structure, the
way the fees work is transaction-based, so it’s focused on the
analysis.
On third-generation sequencing systems and the way informatics deals with the data: There will be new problems to
solve. I think at one level they will be incremental problems.
One of the very interesting features of Pacific Biosciences’ system is the ability to produce very long sequences, and a lot of
alignment algorithms that are now doing very high-throughput
work are dealing with short sequences. So people have to adapt
those tools to handle longer sequences, and they will. There
will be strategies to deal with that. I’m a little less familiar with
Oxford Nanopore in terms of the kinds of data that are coming
out. But largely, they are producing bases like any other system.
There are 20 or 30 years of alignment experience now in the collective community, and if you considered that by individuals
it would be many hundreds of years of cumulative experience.
People will solve those kinds of problems.
Some interesting work is going on using MapReduce kinds
of technologies to make these things super-scalable. Each
new instrument is going to produce new varieties of the data
that people will need to deal with. I don’t think any of these
are going to limit adoption of the technology or be intractable
given the vast amount of experience that now exists. What is
a challenge is what to do with those alignments. How do you
then go that next step and summarize and visualize the information contained in the large bodies of data? I did a FinchTalk
post in which I talked about Illumina’s new HiSeq instrument
and recent articles about cloud computing. Often these conversations focus on the alignment challenges, and yet there’s a
far greater challenge, once you’ve done those alignments, with
using that body of information to understand what your data
means. That’s where I think we’ve done a particularly good job,
and people like what we’ve done.
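Smith mentions MapReduce-style technologies for scaling alignment work and stresses that the harder step is summarizing the resulting alignments. The following is a minimal, single-process sketch of the MapReduce pattern applied to one such summary, counting aligned reads per genomic bin. The record layout, the `BIN_SIZE` value, and all function names are hypothetical and are not drawn from GeneSifter or any particular framework.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical alignment record: (read_id, chromosome, 0-based start position)
Alignment = Tuple[str, str, int]
Key = Tuple[str, int]  # (chromosome, bin index)

BIN_SIZE = 1_000  # assumed bin width in bases


def map_phase(alignments: Iterable[Alignment]) -> Iterator[Tuple[Key, int]]:
    """Map step: emit ((chromosome, bin), 1) for every aligned read."""
    for _read_id, chrom, start in alignments:
        yield (chrom, start // BIN_SIZE), 1


def shuffle_phase(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Shuffle step: group emitted values by key, as a framework would between nodes."""
    groups: Dict[Key, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[Key, List[int]]) -> Dict[Key, int]:
    """Reduce step: sum the per-read counts for each key."""
    return {key: sum(values) for key, values in groups.items()}


alignments = [
    ("r1", "chr1", 120),
    ("r2", "chr1", 880),
    ("r3", "chr1", 1_500),
    ("r4", "chr2", 42),
]
coverage = reduce_phase(shuffle_phase(map_phase(alignments)))
print(coverage)  # {('chr1', 0): 2, ('chr1', 1): 1, ('chr2', 0): 1}
```

Because the map step sees one record at a time and the reduce step sees one key at a time, a distributed framework can run both in parallel across machines; here they simply run in sequence.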
On library preparation for sequencing: The benefits of
next-generation sequencing override the library preparation
difficulties, and this has been demonstrated in literature. We
certainly see it in plenty of examples. Compared to microarrays, you’re going to get a higher dynamic range in terms of
the sensitivities, so with next-generation sequencing you get at
genes that are less expressed. Also you don’t blow out your signal, if you will, so you can measure high levels of expression to
a finer degree. But more importantly with microarrays you can
only measure with the probes you have on that chip. With the
next-generation sequencing we’re finding that there are many
regions of the genome that aren’t annotated and are showing
expression. These sequences are not on today’s microarrays so
I have a [chance] to discover new genes, gene boundaries, and
exons through next-generation sequencing. This is information
which until now you couldn’t get in a microarray experiment.
Having said that, there are artifacts that you can see in an
RNA-Seq that you’d never measure in a microarray, and those
can get in the way. Ribosomal RNA is an example. It’s very
important to have good preparation methods to remove those
contaminating molecules. So there are some trade-offs. One of
the nice things is that since we have a LIMS product, we can
start to capture laboratory information about the experimental
process. Our strategy integrates that laboratory information
with the analytical information so that people know more
quickly whether their experiments are on track.
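Smith notes that ribosomal RNA left over from library preparation can get in the way of RNA-Seq results, and that linking laboratory and analysis data helps people see quickly whether an experiment is on track. One way to picture such a check is a per-sample rRNA fraction computed from read counts. The sketch below assumes those counts already exist; the 10% cutoff (`RRNA_THRESHOLD`) and the field names are hypothetical, not a Geospiza or GeneSifter setting.

```python
from dataclasses import dataclass
from typing import List, Sequence

RRNA_THRESHOLD = 0.10  # hypothetical QC cutoff: flag samples with more than 10% rRNA reads


@dataclass
class SampleCounts:
    sample_id: str
    total_reads: int
    rrna_reads: int  # reads assigned to ribosomal RNA loci

    @property
    def rrna_fraction(self) -> float:
        # Guard against empty libraries: zero total reads yields 0.0
        return self.rrna_reads / self.total_reads if self.total_reads else 0.0


def flag_rrna_contamination(samples: Sequence[SampleCounts],
                            threshold: float = RRNA_THRESHOLD) -> List[str]:
    """Return IDs of samples whose rRNA fraction exceeds the threshold."""
    return [s.sample_id for s in samples if s.rrna_fraction > threshold]


samples = [
    SampleCounts("S1", total_reads=20_000_000, rrna_reads=600_000),    # 3% rRNA
    SampleCounts("S2", total_reads=18_000_000, rrna_reads=4_500_000),  # 25% rRNA
]
print(flag_rrna_contamination(samples))  # ['S2']
```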
Further reading:
Next-Generation Sequencing Gains Momentum: Markets Respond to Technology and Innovation Advances (June), www.insightpharmareports.com
Cambridge Healthtech Institute’s Inaugural
FOCUSED SESSION TRACKS
x IT Infrastructure and the Cloud
x Next-Generation Sequencing Data
Management and Interpretation
x Bioinformatics
x Drug Discovery Informatics
CONFERENCE
June 5-8, 2012 | Marina Bay Sands, Singapore
MARK YOUR CALENDARS
Recent advances in scientific technology have left us
confronted with the task of discovering scientific knowledge
from enormous amounts of data generated in genomics,
pharmaceutics, medicine and other life science areas.
Cambridge Healthtech Institute’s Inaugural Bio-IT World
Asia Conference, building upon the exciting momentum of
the flagship Bio-IT World conference in Boston, will provide
an ideal occasion for both the life science community
and the information technology industry to meet and
discuss the challenges and solutions on data management
infrastructure, interoperability, and the complexity of data
analysis in biomedical research and drug development
process.
Official Publication
Bio-ITWorldAsia.com
ADVISORY BOARD
x M. K. Bhan, M.B.B.S., M.D., D.Sc.,
Secretary, Government of India,
Department of Biotechnology,
Ministry of Science & Technology
x Linh Hoang, Director of Genomic
Medicine, Life Technologies
x Chris Blessington, Senior Director,
Marketing & Communications, Isilon
x Krishan Kalra, Chairman & CEO,
BioGenex Labs Inc.
x Stephen Rudd, Ph.D., CSO,
Malaysian Genomics Resource
Centre Berhad
x Peter Little, Ph.D., Research Director,
Life Science Institute, National
University of Singapore
x Parthiban Srinivasan, Ph.D.,
President & CEO, Parthys Reverse
Informatics, India
x Yusuke Nakamura, Ph.D., Professor,
Laboratory of Molecular Medicine,
Institute of Medical Science, The
University of Tokyo; Secretary
General, Office of Medical
Innovation, Cabinet Secretariat,
Government of Japan
x Tin-Wee Tan, Ph.D., Deputy Head
of Department of Biochemistry,
National University of Singapore
x Han Cao, Ph.D., Founder, CSO,
BioNanomatrix, Inc.
x Laurie Goodman, Ph.D., Editor-in-Chief, GigaScience, BGI-Shenzhen
x Sean Grimmond, Ph.D., Professor,
Molecular Bioscience, University of
Queensland
x Yike Guo, Ph.D., Professor in
Computing Science, Imperial College
London
x Pauline Ng, Ph.D., Group Leader,
Computational and Mathematical
Biology, Genome Institute of
Singapore
x Alain Van Gool, Ph.D., Head
Molecular Profiling, Translational
Medicine Research Center, Merck
Sharpe & Dohme
Cambridge Healthtech Institute
250 First Avenue, Suite 300 | Needham, MA 02494 | T: 781-972-5400 or Toll-free in the U.S. 888-999-6288 | F: 781-972-5425
SPECIAL ADVERTISING SECTION
BEST
New Products
& Services
scanMODE™: Novel Tools to Explore & Exploit
Activation State-Dependent Kinase Conformations
for Structure-Guided Drug Design & Optimization
Daniel Jones, Daniel Treiber, Ph.D., Sailaja Kuchibhatla
scanMODE™ represents the next generation of activation state-specific kinase assays available on the KINOMEscan™ kinase
assay platform that may be used to gain structural insights in
the absence of cocrystal data and to collect in vitro data most
predictive of inhibitor potency in cellular assays.
scanMODE employs a panel of phosphorylated/nonphosphorylated ABL assay pairs & autoinhibited/non-autoinhibited PDGFR
family RTK assay pairs that facilitate understanding of how activation state-dependent conformational changes may affect inhibitor
affinity and provide a novel approach to guide the strategic optimization
of inhibitors best suited for specific disease indications.
Feature Benefits
■ Classify inhibitors as having Type I or Type II binding modes
without a requirement for cocrystal structures
■ Report on the compatibility of an inhibitor’s binding mode
with the autoinhibited conformation
■ Provide activation state-specific biochemical PDGFR family
RTK inhibition data necessary to predict & interpret potency
in cellular assays
■ Further differentiate inhibitors based on activation state-specific binding
■ Explore & exploit inhibitor binding to diverse activation state-dependent kinase conformations
Inhibitor Binding Mode Classification
The majority of ATP-competitive kinase inhibitors are classified as having either Type I or Type II binding modes. Although both Type I and II inhibitors generally contact the ATP binding site, only Type II inhibitors access an “allosteric” site unmasked in the inactive DFG-out conformation. Consequently, Type II inhibitor binding can be significantly more sensitive to the phosphorylation state of the A-loop than Type I inhibitor binding.
An inhibitor’s binding mode can impact several key parameters in drug discovery, including enzyme inhibition kinetics, offsets between in vitro and cellular potency, nearest-neighbor & kinome-wide selectivity, on-target residence time & pharmacodynamics, interactions with upstream and downstream signaling molecules, and intellectual property position. Since the optimal binding mode is likely to be target-specific, it is an essential parameter to characterize for multiple leads at program outset and during optimization. When the optimal binding mode is unknown a priori, a strategy to pursue two lead series with distinct binding modes can de-risk early lead selection decision making. However, binding mode determination can be difficult, time-consuming, and expensive, often requiring the use of x-ray crystallography or in silico modeling.
Classify inhibitors as having Type I or Type II binding modes without a requirement for cocrystal structures
scanMODE capitalizes on several key observations enabling the use of these assay pairs to serve as surrogates to classify an inhibitor’s binding mode.
■ Type II inhibitors preferentially bind to the nonphosphorylated
state of ABL, whereas Type I inhibitor binding is phosphorylation
state-independent.
■ Binding mode is generally maintained across kinases (e.g., imatinib
is a Type II ABL inhibitor and a Type II LCK inhibitor).
■ Inhibitors that primarily target kinases other than ABL are correctly
classified as Type I or Type II when tested against the differentially
phosphorylated ABL assay pairs.
■ A significant fraction of known kinase inhibitors have sufficient
off-target affinity for ABL and/or ABL mutants to qualify for scanMODE analysis.
Figure. scanMODE classifies inhibitor binding mode by measuring phosphorylation state-dependent affinity changes
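The classification described above rests on a simple comparison: a Type II inhibitor binds nonphosphorylated ABL much more tightly than phosphorylated ABL, while a Type I inhibitor shows little difference. A minimal sketch of that comparison follows, assuming Kd values in nM for each member of an ABL assay pair. The 10-fold cutoff and the compound names are hypothetical and are not published scanMODE parameters.

```python
from typing import Dict, Tuple

# Hypothetical cutoff; the actual scanMODE decision rule is not stated in the text
TYPE_II_FOLD_CUTOFF = 10.0


def phospho_preference(kd_phos_nm: float, kd_nonphos_nm: float) -> float:
    """Fold preference for nonphosphorylated ABL (above 1 means tighter nonphospho binding)."""
    return kd_phos_nm / kd_nonphos_nm


def classify_binding_mode(kd_phos_nm: float, kd_nonphos_nm: float,
                          cutoff: float = TYPE_II_FOLD_CUTOFF) -> str:
    """Label an inhibitor Type II if it strongly prefers the nonphosphorylated state."""
    if phospho_preference(kd_phos_nm, kd_nonphos_nm) >= cutoff:
        return "Type II"
    return "Type I"


# Hypothetical Kd pairs (phosphorylated ABL, nonphosphorylated ABL), in nM
abl_pairs: Dict[str, Tuple[float, float]] = {
    "cmpd-A": (500.0, 5.0),  # 100-fold preference
    "cmpd-B": (20.0, 15.0),  # about 1.3-fold, essentially state-independent
}
for name, (kd_p, kd_np) in abl_pairs.items():
    print(name, classify_binding_mode(kd_p, kd_np))
```

A single fixed cutoff is the simplest possible decision rule; a real assay would presumably weigh assay variability before assigning a class.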
Further differentiate the detailed binding modes of inhibitors within the Type I and Type II classes
scanMODE
also includes a panel of PDGFR family RTK assay pairs (CSF1R, FLT3,
KIT) in the autoinhibited (JM domain docked) and non-autoinhibited
(JM domain not docked) states. Unlike the case for ABL A-loop
phosphorylation, both Type I and Type II inhibitor affinities are
dependent on the PDGFR family RTK activation state, with large and
often dramatic preferences for the non-autoinhibited state observed
for all inhibitors tested (Table 2). These binding affinity preferences
are inhibitor-specific and report on the compatibility of an inhibitor’s
binding mode with the autoinhibited conformation. In the autoinhibited state, the docked JM domain can interfere with inhibitor binding in two ways: first, by sterically clashing with the inhibitor directly, and, second, by stabilizing an enzyme conformation incompatible with inhibitor binding. Whereas inhibitors such as sunitinib (Table 2) and dasatinib show relatively small affinity preferences and have binding modes compatible with the autoinhibited conformation, imatinib and nilotinib binding are sterically incompatible with JM domain docking and the affinity preferences are much larger (Table 2). Thus, structural insights are gained by measuring an inhibitor’s affinity preference for the non-autoinhibited state, the magnitude of which reports on the compatibility of an inhibitor’s binding mode with the autoinhibited conformation. Since a significant fraction of known kinase inhibitors have off-target affinity for PDGFR family RTKs, these data can provide structural insights for inhibitors targeting kinases outside of the PDGFR family as well.
Table 2. Activation state-dependent KIT inhibitor binding provides structural insights
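The PDGFR family readout works the same way but is read as a magnitude: the larger the affinity preference for the non-autoinhibited state, the less compatible the binding mode is with the docked JM domain. The sketch below computes and ranks that fold preference, assuming Kd values in nM for each assay pair. The compound names and numbers are invented for illustration and are not the Table 2 data.

```python
from typing import Dict, List, Tuple


def autoinhibition_preference(kd_auto_nm: float, kd_nonauto_nm: float) -> float:
    """Fold preference for the non-autoinhibited (JM domain not docked) state."""
    return kd_auto_nm / kd_nonauto_nm


def rank_by_preference(panel: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Sort inhibitors from largest to smallest preference for the non-autoinhibited state."""
    scored = [(name, autoinhibition_preference(*kds)) for name, kds in panel.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical (autoinhibited Kd, non-autoinhibited Kd) pairs in nM
kit_panel = {
    "cmpd-X": (40.0, 20.0),    # small preference: plausibly compatible with the autoinhibited state
    "cmpd-Y": (3000.0, 3.0),   # large preference: plausibly clashes with the docked JM domain
}
print(rank_by_preference(kit_panel))  # [('cmpd-Y', 1000.0), ('cmpd-X', 2.0)]
```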
Collect activation state-specific biochemical PDGFR family RTK inhibition data necessary to predict & interpret potency in cellular assays
Because Type I and Type II inhibitor affinities may depend on the PDGFR family RTK activation state, it is critical to know the activation state being queried in biochemical assays when predicting cellular potency and interpreting cellular data. Figure 3 presents biochemical and cellular potency data for a panel of KIT inhibitors, showing that both in vitro enzyme activity IC50s and autoinhibited-state Kds can greatly under-predict cellular potency, whereas the non-autoinhibited Kd data are most predictive, give the expected potency offsets (in vitro Kd < cellular IC50), and illustrate how highly potent PDGFR family RTK inhibitors can be missed in biochemical assays using enzyme preparations for which the activation state is undefined.
A next-generation biochemical tool for kinase inhibitor drug discovery
Continued investment in the discovery, development and optimization of kinase inhibitor therapeutics exhibiting improved potency, selectivity and safety profiles requires a new generation of screening tools and solutions. These tools should provide insight into how inhibitors interact with kinases, including their binding mode, in order to facilitate structure-guided drug design and provide a strategic approach to the optimization of a next generation of potent, selective and efficacious therapeutics. scanMODE is a novel biochemical tool consisting of an expanding set of activation state-specific kinase assay pairs that provide a facile, functional solution to inhibitor binding mode classification, enhancing the predictive value and interpretation of downstream cellular data, in order to explore and exploit inhibitor binding to defined activation state-dependent kinase conformations.
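The Figure 3 comparison amounts to checking each biochemical measurement against the expected offset (in vitro Kd below cellular IC50): a biochemical value above the cellular IC50 under-predicts how potent the inhibitor is in cells. Below is a small sketch of that check; the nM values for a single KIT inhibitor are hypothetical and are not taken from the figure.

```python
from typing import Dict


def underpredicts_cellular_potency(biochemical_nm: float, cellular_ic50_nm: float) -> bool:
    """True when the biochemical value implies weaker potency than is observed in cells."""
    return biochemical_nm > cellular_ic50_nm


def check_assays(biochemical: Dict[str, float], cellular_ic50_nm: float) -> Dict[str, bool]:
    """Flag, per biochemical assay format, whether it under-predicts cellular potency."""
    return {assay: underpredicts_cellular_potency(value, cellular_ic50_nm)
            for assay, value in biochemical.items()}


# Hypothetical values (nM) for one inhibitor
biochemical = {
    "enzyme activity IC50": 250.0,
    "autoinhibited Kd": 400.0,
    "non-autoinhibited Kd": 2.0,
}
print(check_assays(biochemical, cellular_ic50_nm=15.0))
# {'enzyme activity IC50': True, 'autoinhibited Kd': True, 'non-autoinhibited Kd': False}
```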
To learn more about scanMODE, visit
www.kinomescan.com/scanmode
Tel | 1.800.644.5687 www.kinomescan.com
Figure 3. Comparison of biochemical and cellular KIT inhibitor potency data
Tel | 1.866.448.4464 +44.121.260.6142 www.discoverx.com
Eliminate Paper Source and eCRFs
The single biggest cost driver of late-stage clinical
development is the on-site monitoring of paper
source documents. Capturing source data on
paper is costly, error-prone, and time-consuming for
both sites and sponsors.
SureSource® Tablet — maintains the natural workflow,
ease of use, and mobility of a paper chart while simultaneously capturing and validating data in real-time —
no need for a separate eCRF to be filled out later and
validated against the original paper source document.
SureSource Portal — review source documents remotely
immediately after a subject visit. Source Data Verification
(SDV) is eliminated; monitors can focus review efforts
on context, trends, and AEs.
Key Benefits
■ Reduce study costs 25-30%; eliminate SDV; fewer
queries and on-site monitoring visits
■ Intuitive, paper-like interface; eliminate duplicate site
work to re-enter subject data
■ Offline functionality and real-time edit checks
■ Leverage existing infrastructure investments
■ Import/Export to EMRs via HL7 standards
Can your EDC
do this?
www.clinicalink.com
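The “real-time edit checks” listed above are validation rules applied as each value is entered, rather than during later data cleaning. A minimal sketch of the idea follows; the field names, ranges, and messages are hypothetical examples and do not describe SureSource’s actual rules or interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class EditCheck:
    field: str
    is_valid: Callable[[Any], bool]
    message: str


def _in_range(low: float, high: float) -> Callable[[Any], bool]:
    """Build a rule that accepts numbers within [low, high]."""
    return lambda v: isinstance(v, (int, float)) and low <= v <= high


# Hypothetical rules for a vital-signs form
CHECKS: List[EditCheck] = [
    EditCheck("systolic_bp", _in_range(60, 250), "Systolic BP must be 60-250 mmHg"),
    EditCheck("heart_rate", _in_range(30, 200), "Heart rate must be 30-200 bpm"),
    EditCheck("visit_date", lambda v: bool(v), "Visit date is required"),
]


def run_edit_checks(record: Dict[str, Any]) -> List[str]:
    """Return a message for every rule the entered record fails (empty list means clean)."""
    return [c.message for c in CHECKS if not c.is_valid(record.get(c.field))]


entry = {"systolic_bp": 300, "heart_rate": 72, "visit_date": "2011-06-20"}
print(run_edit_checks(entry))  # ['Systolic BP must be 60-250 mmHg']
```

Running the checks on every entry lets a site fix a value while the subject is still present, which is the workflow difference the ad contrasts with a separate eCRF filled out later.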
Disease Mutations for Your Next-Generation Sequencing (NGS) Analysis
Genome Trax™
Genome Trax is a collection of manually curated genome feature data that
enables you to identify human genome
variations of functional significance by mapping your NGS data to known elements such
as disease mutations, transcription factor
binding sites, drug target genes and more.
Key advantages of Genome Trax for NGS
analysis:
■ Quickly and easily identify functionally
relevant variations in genome data
■ Find and display functional non-coding
regions in human genome sequence
■ Filter large numbers of variants by multiple types of mapped
sequence features
Our database will help
you understand the impact
of human variation on disease risk. You can evaluate risk based on disease-linked mutations mapped
to your human genome
variations, as well as by
mapping novel mutations
to functional features such as regulatory sites,
disease genes and more.
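Mapping variants to known elements, as described above, is at heart an interval-overlap lookup: each variant position is tested against the start and end coordinates of curated features, and the hits can then be filtered by feature type. The sketch below shows that lookup with a plain linear scan. The `Feature` layout, the coordinate convention, and the example records are hypothetical and are not the Genome Trax data format; a production tool would more likely use an indexed structure such as an interval tree.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

Variant = Tuple[str, int]  # (chromosome, 0-based position)


@dataclass(frozen=True)
class Feature:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    kind: str   # e.g. "TF binding site", "disease mutation"
    name: str


def annotate(variants: Iterable[Variant], features: List[Feature]) -> Dict[Variant, List[Feature]]:
    """Return, for each variant, the features whose interval contains it."""
    return {
        (chrom, pos): [f for f in features if f.chrom == chrom and f.start <= pos < f.end]
        for chrom, pos in variants
    }


def filter_by_kind(annotated: Dict[Variant, List[Feature]], kinds: Set[str]) -> List[Variant]:
    """Keep variants that overlap at least one feature of the requested kinds."""
    return [v for v, hits in annotated.items() if any(f.kind in kinds for f in hits)]


features = [
    Feature("chr7", 100, 120, "TF binding site", "site-1"),
    Feature("chr7", 500, 501, "disease mutation", "mut-1"),
]
variants = [("chr7", 110), ("chr7", 500), ("chr9", 110)]
annotated = annotate(variants, features)
print(filter_by_kind(annotated, {"disease mutation"}))  # [('chr7', 500)]
```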
Genome Trax contains unique content:
■ 4,400+ regulatory sites
■ 90,000+ disease-linked inherited mutations
■ 152,000+ COSMIC (Catalogue of Somatic
Mutations in Cancer) mutations
■ 877,000+ ChIP-Seq fragments with best
binding site predictions
For more information contact us:
sales@biobase-international.com
New Market Study Now Available
The Future of Next-Gen Sequencing (NGS)
This comprehensive CHI Research Group market study covers developments
and predictions related to the Next-Gen Sequencing (NGS) market. It is
compiled from over 1350 surveys submitted by current and future NGS
users. It is being offered at no charge through underwriting by key NGS technology providers including DataDirectNetworks, Illumina, Ingenuity and PerkinElmer.
Areas covered include:
■ The evolving role of NGS in R&D
■ NGS production, analysis, visualization and storage
■ NGS cloud developments
■ NGS in the clinic
■ NGS outsourcing
■ Much more
Read this valuable study to derive a highly informed sense of where the
NGS market is heading. We invite your continuing post-study comments at our
www.ngsleaders.org website. Download now at www.bio-itworld.com/BioIT/WhitePapers.aspx!
CHI PROFESSIONAL
MARKETING SERVICES
Market Research Group
Clinical Trials
Euphoria over EHR/EDC
Interoperability May Be Misplaced
Budget woes and lack of clinical investigators still bigger problems
BY DEBORAH BORFITZ
Despite noble efforts by the Clinical
Data Interchange Standards Consortium (CDISC) and others to
write a rulebook for the exchange
of patient-level clinical information between electronic health records (EHRs)
and electronic data capture (EDC), interoperability between the two systems
is largely a pipe dream. All this obsessing
over data will not in any case remedy
the budgetary woes of trial sponsors or
the exodus of investigators from clinical
research. Answers to these most pressing
concerns will remain elusive until the
focus shifts to the realities of investigators
and study coordinators at sites initially
capturing the data.
So says Edward Seguine, formerly
an advisory board member of CDISC
and CEO of clinical trial planning software company FastTrack (acquired three years
ago by Medidata) and now president of
electronic source record creator Clinical
Ink. Relying solely upon EHR data is
impractical for clinical research focused
on investigational new drugs, except for
oncology studies where all but the new
treatment is customary medical care.
The proposed clinical research interoperability standard (HITSP IS158), which is
intended to serve as the basis for how data
within EHR systems can be used to support clinical research, instead “makes the
process even more complicated.”
Critically, of the 37 “use case actions”
mapped out for an interconnected environment, 22 have dependencies on systems that don’t exist, says Seguine. Data
monitoring activities, the biggest cost
driver of clinical research, are addressed
merely by referencing a non-existent
Reviewer System or EHRs that magically learn of protocol requirements via
a standardized message from another
non-existent Protocol Development
System. “The [nonsensical] HITSP IS158 standard completely misses the big picture.”
It’s no coincidence that the United Kingdom, with its nearly 20-year-old national database of patient information, is still unable to tap the data for clinical studies or use its EHR capabilities for direct entry of clinical trial data, says Seguine. Interoperability proponents this side of the pond will point to results of recent Connectathons sponsored by CDISC and IHE (Integrating the Healthcare Enterprise) as “proof that EHR/EDC integration is viable now.” Tellingly, Nextrials is the only participating EDC company and “that’s because this type of staged integration doesn’t fully test all the pieces together and simply ‘assumes’ data exists from non-existent systems,” says Seguine.
Part of the holdup is CDISC’s close ties to the National Cancer Institute (two current CDISC board members represent NCI), which have resulted in “overly simplistic views about the best approach,” says Seguine. “In contrast to oncology, most other therapeutic areas don’t manifest the same treatment dynamic.” About two-thirds of procedures called for in clinical trial protocols over the last decade have no corresponding medical billing standard, including well-known research instruments such as the Hamilton Depression Rating Scale, says Seguine, who recently co-authored research on increasing protocol complexity.
Irrespective of therapeutic area, it’s “extremely valuable and easy with today’s technology and business practices” to export an HL7 standard document containing study visit data from EDC or Clinical Ink’s SureSource solution into an EHR so other physicians are aware the patient was involved in a trial, says Seguine. Conversely, “already standardized concepts” (e.g., patient demographics, prior medications, lab results, and medical history) could under certain conditions be imported into research data from an EHR.
But data-related activities (data entry, database handling, and data clean-up) account for less than 12% of clinical trial budgets, according to Medidata CRO Contractor calculations averaged across phases. Meanwhile, site monitoring and site management consume a whopping 43% of the total. “Other research points out that monitoring [by itself] can be nearly 40% of a large phase III study budget,” says Seguine. “As a result of the convoluted process, project management is over 26% of the total study budget. In any other industry that would be laughable.”
The Paperless Path
All of this brings us to the ideals of Clinical Ink, which include getting rid of all the paper that has made clinical research burdensome for sites and unnecessarily expensive for study sponsors. The use of online portals to disseminate newer versions of paper documents is tacit acknowledgment that “paper rules the roost,” says Seguine. His goal with the newly minted
(CONTINUED ON PAGE 24)
DIA 2011: Compliance,
Collaboration and the Cloud
Trends around avatars, social media, enrollment
BY ANN NEUER
CHICAGO: This year’s annual DIA meeting* featured hundreds of exhibitors and session topics ranging from patient recruitment to cloud computing to improving regulatory reporting. There were several overriding themes, including the continuing march toward eClinical suites, the changing requirements for safety reporting, and an increased emphasis on strategic partnering.
Three companies in particular stood out to this reporter: ProtonMedia, DecisionView, and Drug Safety Alliance.
*DIA Annual Meeting, Chicago, June
Avatars in the ProtoSphere
ProtonMedia is the Pennsylvania-based developer of ProtoSphere, a 3-D virtual collaboration environment for the high-performance workplace that has had success in the financial services and oil and gas industries. In ProtoSphere, a 3-D avatar is created for each member of a clinical team, enabling them to interact in a virtual conference room and hold face-to-face discussions. Text chat, voice over Internet protocol (VOIP), and application-sharing are all enabled, allowing people to connect in a socially relevant manner.
CEO Ron Burns says the power of ProtoSphere is about humanizing interactions. “Collaboration is a human-to-human interaction, not a document-to-document interaction. It’s about a higher level of engagement, a two-way discussion. ProtoSphere puts context around those documents, and makes it easier and more interesting to transfer knowledge.”
ProtonMedia’s research indicates that doctors are willing to do an entire education session in ProtoSphere, instead of relying on conventional Web tools and slide presentations. “This results in efficiency gains and lower cost, as travel can be cut,” Burns notes. As reported last year, Merck has successfully used ProtoSphere to conduct a virtual poster session (see, “Drug Discovery in a Virtual Environment,” Bio-IT World, August 2010). All attendees surveyed afterward said they would participate in another virtual event, while junior scientists said they felt more comfortable conversing with senior colleagues in the ProtoSphere than they would in person.
“This is about collaboration and learning coming together in the clinical space, and allowing individuals to have their voices heard,” says Burns.
Patient Enrollment Benchmarks
Linda Drumright, president and CEO of California-based DecisionView, is committed to solving delays in clinical trial enrollment. “Cycle time is taking longer and costing more, and the biggest chunk of that problem is patient enrollment,” Drumright explains.
Enrollment delays have long been an intractable problem, yet much enrollment planning and analysis proceeds with little access to historical data. What data do exist are often found in homegrown solutions such as Excel spreadsheets. “This is our main competition,” says Drumright. “Those spreadsheets are bursting at the seams and there are no consistencies from study to study or from department to department. There’s little visibility of data and therefore, an inability to set expectations.”
StudyOptimizer, DecisionView’s Web-based solution (see p. 32), helps life sciences companies deliver clinical trials on time and on budget by automating four processes: planning, tracking patient enrollment, diagnosing problems, and optimizing enrollment. The solution leverages predictive analytics and data visualizations to help study teams monitor actual and projected enrollment in near real-time. “As actual data come in, our forecasting engine shows where you thought you were going to be, and where you actually are,” Drumright explains, adding that customers need industry benchmarking data to make realistic assumptions for rescuing trials and evaluating new therapeutic areas.
To amass the needed information, DecisionView is creating an aggregated, anonymized dataset based on customer-provided information. The input will then be fed back to customers so they can apply it to different therapeutic areas. Once the dataset becomes robust, users of StudyOptimizer will have access to more granular information for planning and creating rescue strategies. DecisionView expects to have the first group of benchmarks available to Roche, Merck, and GlaxoSmithKline this summer, folding that information into a newly released version of StudyOptimizer later in the year. Several desired benchmarks have been identified, such as screening and randomization rates, recruitment cycle time, and dropout percentages.
The new version of StudyOptimizer will incorporate the benchmarks into the application, enabling comparisons of these industry benchmarks with a company’s own historical data as users make planning and rescue decisions for their trials. Only those who contribute data will have access to the benchmark information. “We expect to refresh the dataset quarterly as studies complete and as new StudyOptimizer customers contribute their historical trial data,” she says.
Safety and Social Media
Elizabeth Garrard, chief safety officer of Drug Safety Alliance, a North Carolina-based provider of pharmacovigilance and risk management services, sees growing concern among sponsors regarding use of social media. In the age of Facebook and Twitter, there is little guidance from the Food and Drug Administration (FDA) as to what sponsors should do if they become aware of online postings of possible adverse events. “Sponsors want to know how they can harness the power of social media at a time when FDA has given no direction,” Garrard says.
So far, the only agency guidance is a
lengthy draft guidance on post-marketing
safety reporting dating back to 2001—the
pre-Facebook era—which carried a single
paragraph on a sponsor’s responsibility for
reporting adverse events (AEs) from information gathered online. It does, however,
list four criteria that determine whether
something is reportable. There must be
an identifiable patient; an identifiable
reporter of the AE; a suspect drug or biological product; and a suspected adverse
experience or fatal outcome.
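The four criteria in the 2001 draft guidance work as an all-or-nothing checklist: if any element is missing, the report does not meet the threshold. A minimal sketch of that logic follows, assuming a human reviewer has already judged each criterion as true or false. The class and field names are hypothetical; this illustrates the checklist only and is neither regulatory advice nor any vendor’s intake system.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class OnlineAEReport:
    identifiable_patient: bool
    identifiable_reporter: bool
    suspect_product: bool                      # a suspect drug or biological product
    adverse_experience_or_fatal_outcome: bool


def meets_four_criteria(report: OnlineAEReport) -> bool:
    """All four elements must be present for the report to meet the criteria."""
    return all(getattr(report, f.name) for f in fields(report))


def missing_criteria(report: OnlineAEReport) -> List[str]:
    """Name the elements a reviewer would still need to establish."""
    return [f.name for f in fields(report) if not getattr(report, f.name)]


# Hypothetical anonymous post describing a possible AE
post = OnlineAEReport(identifiable_patient=False, identifiable_reporter=False,
                      suspect_product=True, adverse_experience_or_fatal_outcome=True)
print(meets_four_criteria(post))  # False
print(missing_criteria(post))     # ['identifiable_patient', 'identifiable_reporter']
```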
Ten years ago, companies could put
up carefully crafted informational Web
pages, with little or no ability for readers to post personal responses. But with
today’s social interactive media, how does
a sponsor respond to a tweet or blog post
about a possible AE? If a patient posts a
note on a company blog such as, “I lost
consciousness after taking the drug,” or “I
had to be hospitalized,” the sponsor might
be unable to investigate without knowing
many relevant facts. “This is fraught with Safety Alliance provides intake and case
all kinds of unknowns. You don’t know if processing on AEs and the submission of
this person is on concomitant meds, what appropriate cases to regulatory agencies.
The company also offers
dose he or she took or for
global aggregate safety
how long, or anything
reporting services and
else about that person’s
handles risk managemedical history. The only
ith today’s
ment projects, such as
thing you do know is that
risk maps and risk evalthe patient took the drug,
social media,
uation and mitigation
but you don’t know how
how does a
strategies (REMS) to furto follow up,” Garrard
ther refine the risk/bensays.
sponsor
efit profile of products.
Pharma wants to enA new guidance adgage its customers but
respond to a
dressing social media
it must also consider if
tweet or blog
and post-marketing AE
they have a regulatory
reporting is expected
obligation they are not
post about a
from FDA, but the date
meeting. (Some research
is not yet known.
suggests that most of
possible AE?
In the interim, Garthe time, the AE is not
rard says she advises cusreportable because it
does not meet the four criteria.) To help tomers to work within existing regulatory
customers navigate this process, Drug confines. x
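Drumright’s description of StudyOptimizer’s forecasting, showing where a team planned to be versus where it actually is, can be illustrated with a simple straight-line projection. This sketch is not DecisionView’s forecasting engine: it assumes a constant planned weekly enrollment rate and projects completion from the observed average rate, and all names and numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class EnrollmentStatus:
    target: int               # patients the study needs to enroll
    planned_per_week: float   # enrollment rate assumed at planning time
    weeks_elapsed: int
    enrolled: int             # patients actually enrolled so far

    def planned_to_date(self) -> float:
        """Where the plan said enrollment would be by now."""
        return min(self.target, self.planned_per_week * self.weeks_elapsed)

    def observed_rate(self) -> float:
        """Average patients per week actually achieved."""
        return self.enrolled / self.weeks_elapsed if self.weeks_elapsed else 0.0

    def planned_total_weeks(self) -> float:
        return self.target / self.planned_per_week

    def projected_total_weeks(self) -> float:
        """Weeks to reach the target if the observed rate holds; infinite if no one is enrolling."""
        rate = self.observed_rate()
        return self.target / rate if rate > 0 else math.inf


status = EnrollmentStatus(target=300, planned_per_week=10.0, weeks_elapsed=12, enrolled=72)
print(status.planned_to_date(), status.enrolled)                     # 120.0 72
print(status.planned_total_weeks(), status.projected_total_weeks())  # 30.0 50.0
```

Benchmarks of the kind DecisionView describes, such as screening and randomization rates or dropout percentages, could in principle replace the single planned rate used here with better-grounded assumptions.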
&)3&%$
(CONTINUED FROM PAGE 22)
SureSource is to create a model for how sites collect information that reduces the need for paper-based source documents and thus source data verification (SDV).
The paper-free world Seguine envisions won’t happen at the hand of Clinical Ink alone. SureSource provides a Web portal where study sponsors, monitors, and site users can review electronic source documents remotely as visits happen in real time.
But sponsors and sites would also need access to other types of information electronically, including regulatory and informed consent documents as well as clinical trial results for individual participants. Seguine feels this broader information-sharing environment could be built now using Microsoft’s SharePoint and Amalga platforms, but would require interfaces that are geared toward sites—not data managers—to facilitate the gathering of data and documents.
Clinical Ink is now working with small biotechnology companies and clinical research organizations to prove the SureSource concept which, if successful, could cut total study costs by 25% or more and produce hundreds of millions of dollars in annual savings for large trial sponsors, says Seguine. The return on investment is calculable based on the number of monitoring visits ($2,500-$5,000 each) and queries related to SDV ($65-$100 per cleaning) that can be eliminated.
With the untimely passing of Clinical Ink co-founder Tommy Littlejohn in March, the company lost some commercial acceleration as well as a respected peer who befriended everyone he met, says Seguine. During the development of SureSource, Littlejohn provided unfettered access to the 11 sites of Winston-Salem, NC-based PMG Research, where he served as president and executive medical director. Field testing at the sites ensured that data entry into a tablet computer happens in the most expeditious way possible—be it a drop-down list, yes/no checkbox, number scale, image, or handwriting—with the familiar feel of pen and paper.
The e-source record complements existing data warehousing infrastructures, and ultimately could supplant existing EDC systems that output data statisticians want to analyze, says Seguine. SureSource collects the same information in addition to the source data investigators must document to demonstrate compliance with Good Clinical Practice and patient case history requirements. Importantly, within the source documents study monitors can immediately spot an adverse event that investigators may initially interpret as clinically insignificant. Indeed, the frequency and relevance of interactions between monitors and sites increase even as the number of face-to-face visits declines, he adds.
Despite new e-source guidance from the U.S. Food and Drug Administration, sites as a rule are not doing direct data entry into EDC because those systems were created to address the needs of data managers and are thus sequentially out-of-sync with how patients get evaluated, says Seguine. “If EDC could be used to capture source data directly in front of a patient, enterprising researchers would have been doing so long ago. They haven’t because EDC doesn’t meet their needs.” And so long as doctors have to endlessly toggle between forms, setting off round after round of edit checks, they will continue documenting patient visits on paper.
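Seguine’s ROI arithmetic can be made concrete. The minimal sketch below uses the per-unit cost ranges quoted above. The numbers of avoided visits and queries are hypothetical inputs, not figures from Clinical Ink:

```python
from typing import Tuple

# Per-unit cost ranges quoted by Seguine, in US dollars (low, high).
VISIT_COST = (2_500, 5_000)  # per monitoring visit
QUERY_COST = (65, 100)       # per SDV-related query ("per cleaning")


def savings_range(visits_eliminated: int, queries_eliminated: int) -> Tuple[int, int]:
    """Low and high estimates of savings from avoided visits and SDV queries."""
    if visits_eliminated < 0 or queries_eliminated < 0:
        raise ValueError("counts must be non-negative")
    low = visits_eliminated * VISIT_COST[0] + queries_eliminated * QUERY_COST[0]
    high = visits_eliminated * VISIT_COST[1] + queries_eliminated * QUERY_COST[1]
    return low, high


# Hypothetical study: 40 monitoring visits and 3,000 SDV queries avoided.
print(savings_range(40, 3_000))  # (295000, 500000)
```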
Computational Biology
Turning Blood into Gold:
The Wellness Chip
Larry Gold’s SomaLogic detects thousands of protein biomarkers with unprecedented sensitivity and specificity.
BY KEVIN DAVIES
Larry Gold has 1,100 proteins on a chip that he believes could be used to indicate disease.
BOULDER, CO—Fourteen years after conceiving a tool to discover and measure protein biomarkers, Larry Gold and his colleagues at SomaLogic are poised to see their first diagnostic—a lung cancer blood test licensed to Quest Diagnostics—reach the marketplace, perhaps before the end of the year. This would be the first of a potentially extensive list of diagnostic assays under development for various cancers, cardiovascular disease, neurological disorders and neglected diseases. Eventually, they could be brought together into a single, simple blood test: the Wellness Chip.
“We understand that longitudinal ‘omics is the ball game,” says Gold, the company’s chairman and CEO. “Whether it’s proteomics or lipidomics or transcriptomics, snapshots at time T are interesting but a series of snapshots at many times T are better for managing health.”
Gold says his friends tease him, wondering how such a terrific hypothesis-driven scientist—he sold his former company NeXstar to Gilead in 1999 for about $550 million—is content to be a data-gathering ‘omics guy.
Gold smiles and tells them: “I have a hypothesis: if we can measure more things than you, better than you, we will learn more than you know. That’s it! That’s what all ‘omics is about. That’s why I don’t dump on genomics. But I don’t think DNA sequencing or biopsies of numerous tissues are the best measurements to detect diseases in a way that is immediate and actionable.”
Gold says SomaLogic aims to become “a longitudinal proteomic biomarker monitoring company.”
Blood Simple
What could be easier than monitoring an individual’s health over time via a blood test? In a few situations, screening blood biomarkers can be as simple as measuring a single protein, such as in pregnancy (HCG) or prostate cancer (PSA). But what if the early—and treatable—presence of cancer or heart disease could be gleaned in a similar blood test, measuring a critical subset or “signature” of circulating proteins unequivocally associated with the disease? The task begins by whittling down the total number of secreted proteins in blood—the number is around 3,400, or one seventh of the human proteome—to the subset that represents a validated diagnostic.
Using proprietary reagents called SOMAmers, custom nucleic acids that target a specific protein, SomaLogic’s current technology can simultaneously detect and quantify 1,100 human proteins (see “The Strength of SOMAmers”). “We’re a quarter of the way there,” says Gold, noting that the total number of blood proteins (including intracellular proteins released after cell death) is probably closer to 4,000. “1,100 is already an awful lot. Nobody else can do more than 20-30 at a time. For the moment, we have an opportunity to learn a lot of medicine and biology quickly. Every time we’ve added proteins to the chip, the performance gets better.”
The “Wellness Chip” (the term is trademarked) refers to measuring all 1,100 proteins in one assay, providing information on all diseases on the same chip. Of those 1,100 proteins, Gold says one third have already turned out to be markers in various diseases or indications.
He shows me a wall chart in which all of the current biomarkers are laid out horizontally like a bar code, with diseases grouped vertically. The key markers are color-coded for each indication. Interestingly, part of the blue oncology group overlaps with the red cardiovascular disease markers, which Gold says might be indicative of inflammatory pathways. The green markers at the bottom are what Gold calls “the horse****”—pre-analytic variation largely due to sample acquisition and handling differences.
Big Business
Gold has assembled an experienced executive team to explore the full range of
diagnostic and research applications for
the SOMAmer platform. Among his key
colleagues are two ex-Pfizer executives—
Steve Williams (chief medical officer) and
Nicholas Saccomano (chief technology
officer). (Ed. Note: Saccomano was on
the cover of Bio-IT World in April 2005, when he was Pfizer’s senior VP of global research technology.)
Mark Messenbaugh, SomaLogic’s
director of corporate strategy, joined the
firm three years ago, having previously
worked as a lawyer and on Al Gore’s 2000
presidential election campaign. “My guy lost. I wrote a lot of those losing briefs,” he admits. While working in the non-profit world, Messenbaugh went to hear Gold speak at a local business meeting, and was instantly hooked by Gold’s vision for the future of health care. “I was smitten!” he admits. Messenbaugh followed Gold to the elevators and asked for a job, which he eventually took just as the SOMAmer technology was maturing.
The Strength of SOMAmers
Larry Gold and Craig Tuerk invented aptamers, short oligonucleotides that can bind proteins, at the University of Colorado in 1989. The first aptamer drug, produced by Gilead after acquiring Gold’s company NeXstar in 1999, was called Macugen, for the treatment of age-related macular degeneration. (The drug was successful, although Gold concedes that Genentech’s Lucentis, which targets the same receptor, is also a very good drug.)
Shortly after Gilead bought NeXstar for about $550 million in July 1999, Gold was furloughed. But he had already started researching new uses for aptamer reagents. Gilead allowed Gold to buy back the diagnostic rights to the technology, which formed the basis for SomaLogic.
SomaLogic developed a new class of aptamer reagents, which they call SOMAmers (the term stands for “Slow Off-rate Modified Aptamers”). SOMAmers are made of DNA-containing modified nucleotides with unique chemical and kinetic properties. Each SOMAmer contains a unique stretch of modified nucleotides, drawn from a total library of about 10¹⁵ different species. With so much variation to choose from, and a development process designed to select against non-specific binding, a single SOMAmer can combine the specificity of two antibodies, Gold explains:
“Why do people do ELISA assays with two antibodies instead of one? The reason is one antibody can grab a protein but the binding affinity (Kd) is such that the antibody will also bind to other [more abundant] proteins with lower affinity. There are … logs difference in protein concentrations in blood and … logs difference in affinity. A monoclonal antibody’s (mAb) specificity is usually based on Kd: you might end up measuring albumin or ferritin [two prevalent proteins], which you don’t want. If you use two mAbs, you get to multiply the specificities.”
SomaLogic has finally been able to reproduce the specificity of two antibodies in a single SOMAmer reagent, in a way that allows for multiplexing literally thousands of SOMAmers on a single array. Gold admits solving that problem was hard, but “all the biomarkers are likely to be down in the weeds at very low concentrations,” he says. “When things weren’t working, we had to change. The previous aptamer technology wasn’t good enough. We didn’t lose heart. We kept funding coming, and it worked.”
SOMAmers can be generated in … weeks to virtually any given target. After selecting the SOMAmer to detect a specific protein, protein levels can be measured by combining samples with all the specific SOMAmers. After the free SOMAmers are discarded, the bound SOMAmers are released, producing fluorescently tagged SOMAmers ready for high-throughput detection using microarray technologies (Agilent is used the most at SomaLogic), which in turn gives a readout of the identities and concentrations of the proteins in the original sample. K.D.
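Gold’s point about multiplying specificities can be illustrated with a back-of-the-envelope model. The sketch makes two assumptions: each reagent independently captures a fixed fraction of an off-target protein, and the target itself is fully captured. The abundance and cross-reactivity values are illustrative, not SomaLogic data:

```python
from typing import List


def off_target_fraction(cross_reactivities: List[float]) -> float:
    """Fraction of an off-target protein captured when every reagent must bind it.

    Assumes the binding events are independent, so the fractions multiply.
    """
    fraction = 1.0
    for c in cross_reactivities:
        if not 0.0 <= c <= 1.0:
            raise ValueError("cross-reactivity must be between 0 and 1")
        fraction *= c
    return fraction


def target_to_noise(target_conc: float, off_target_conc: float,
                    cross_reactivities: List[float]) -> float:
    """Target signal over off-target signal, assuming the target is fully captured."""
    return target_conc / (off_target_conc * off_target_fraction(cross_reactivities))


# Illustrative: an off-target protein one million times more abundant than the
# target, and reagents that each capture 1 in 10,000 of it.
print(target_to_noise(1.0, 1e6, [1e-4]))        # roughly 0.01: noise swamps signal
print(target_to_noise(1.0, 1e6, [1e-4, 1e-4]))  # roughly 100: signal dominates
```

Under these toy numbers, a single reagent gives a target-to-noise ratio of about 0.01. Requiring two independent binding events raises it to about 100, which is the effect Gold describes.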
The deal with Quest Diagnostics,
worth $15 million at the time, was signed
in 2005. “We’ve raised a lot of money here
without any revenues,” says Gold. (The
first SOMAmer on the market is actually
part of a “hot-start” PCR kit sold by New
England Biolabs.) Now Messenbaugh
and colleagues are laying out the longer-term vision. “How do we move toward the
Wellness Chip?”
Pharma customers clearly like the
technology, using SOMAmers to study
basic disease biology, drug effects, target
discovery and selection. “We recognized
that the tool is more powerful than we
as a small company can ever make use of
completely,” says Messenbaugh. “Pharmas
can create value out of this, but how do
we enable that without standing in our
own way?”
In one study with Bristol-Myers
Squibb, Gold says analysis of blood
samples before and after administration
of an anti-angiogenesis drug candidate
revealed some delayed responses. “We
saw a pattern of what was coming within
a month to help understand the mechanism of action,” he says.
SomaLogic has struck deals with Japan’s Otsuka to use SOMAmers for target
validation in animal models, and NEC to
deliver data analysis tools and, ultimately,
health information via cloud-computing
services, among others. “Discovery is
quite easy here,” says Messenbaugh. “We
can do broad-based discovery on virtually
any clinical question. So far, the successes
are outnumbering failures—by a lot.”
Messenbaugh and Gold recognize the
need to stay disciplined. “This tool will be
great for basic science and understanding
biology,” says Messenbaugh, “but our core
function is driving diagnostic tests into
the market. We have to ask: Is the clinical indication of value? If not, we have to
think about our resources.”
They see a broader impact of SOMAmers for the advancement of medicine. “Could some gene be overexpressed
and thus be a good target for drug development? I think there’s enormous hope
for neglected diseases,” says Gold.
“Wouldn’t it be nice if we could do proteomics on people with single-gene mutations and find something that helps understand the biology or helps with ideas
about therapeutics?” SomaLogic has
programs looking at ALS and Duchenne
muscular dystrophy, but as Gold says, “We
have to be careful not to be drawn away
from our end-goal of powerful, simple,
and fast diagnostics.”
Rule of Four
Larry Gold views SomaLogic as a longitudinal proteomic biomarker monitoring company. “You have to partner with someone interested in IT,” he says. “You’re not going to sit around with protein measurements and expect the person to compare this year’s data to last year’s. It’s about handling the informatics around vectors as an aid to health and disease management.”
“Our bioinformatics guys have developed their own tool set necessary to do blood-based proteomics. We’re committed to developing a bio-IT tool set for our end users as well. It’s all going to be about decision support,” explains Mark Messenbaugh, SomaLogic’s head of corporate strategy. “It’s not just the array anymore. It’s the data set and the filter for the data set. We’ve got to take that framework into the health care world.”
“Google thinks you can get there with a set of non-physical measurements. We think the key physical measurement is proteomics,” says Gold. “The algorithm part is actually not the thing that limits the enterprise. That’s figuring out how to make the measurements. We were ready to do algorithm development years ago. We just didn’t have the data.”
The SomaLogic informatics team is a small unit of four staff led by Dom Zichi. “A key strength is that we’re familiar with the measurement devices. We know the pitfalls. It’s an evolving technology,” says Zichi.
Ultimately it comes down to understanding the protein differences between cases and controls: what is real and what is an artifact resulting from how the samples were collected, handled and stored, as well as what might be attributable to a comorbidity and not the disease in question. Zichi’s team builds the classifiers to distinguish the two groups using various machine-learning algorithms: Bayesian classifier, random forests, clustering and multi-dimensional scaling, or PCA (principal component analysis).
“There’s no one right way,” he says. In most cases, a subset of markers “should be sufficient for most things we’re looking at.”
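A Bayesian classifier of the kind Zichi mentions can be sketched with Gaussian naive Bayes. This model treats each protein level as normally distributed within each group and treats proteins as independent of one another. The sketch is a dependency-free toy, not SomaLogic’s pipeline. The marker values and the `case`/`control` labels are hypothetical, and production work would typically rely on a tested library such as scikit-learn:

```python
import math
from statistics import mean, pvariance
from typing import Dict, List


class GaussianNaiveBayes:
    """Toy Gaussian naive Bayes over vectors of protein levels."""

    def fit(self, samples: List[List[float]], labels: List[str]) -> "GaussianNaiveBayes":
        self.classes = sorted(set(labels))
        self.priors: Dict[str, float] = {}
        self.means: Dict[str, List[float]] = {}
        self.vars: Dict[str, List[float]] = {}
        for cls in self.classes:
            rows = [s for s, y in zip(samples, labels) if y == cls]
            columns = list(zip(*rows))  # one tuple of values per protein
            self.priors[cls] = len(rows) / len(samples)
            self.means[cls] = [mean(col) for col in columns]
            # A small floor keeps the variance positive for constant markers.
            self.vars[cls] = [pvariance(col) + 1e-9 for col in columns]
        return self

    def log_posterior(self, x: List[float], cls: str) -> float:
        """Log prior plus summed per-protein Gaussian log-likelihoods (up to a constant)."""
        total = math.log(self.priors[cls])
        for xi, mu, var in zip(x, self.means[cls], self.vars[cls]):
            total += -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
        return total

    def predict(self, x: List[float]) -> str:
        return max(self.classes, key=lambda cls: self.log_posterior(x, cls))


# Hypothetical levels of two markers in three controls and three cases.
X = [[1.0, 5.0], [1.2, 5.1], [0.9, 4.9], [2.0, 5.0], [2.2, 5.2], [1.9, 4.8]]
y = ["control"] * 3 + ["case"] * 3
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([2.1, 5.0]))  # case
print(model.predict([1.0, 5.0]))  # control
```

In this toy data the second marker barely differs between groups. The classifier leans on the first marker, which illustrates Zichi’s remark that a subset of markers is often sufficient.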
An ongoing challenge is understanding the relationship between protein levels and the manner in which the blood samples are collected. Major variations can hinge on the type of tube, needle gauge and speed of sample collection. The rate of blood flow impacts the shear on the platelets, which in turn can result in a tenfold difference in some proteins.
“We hope this work will define a protocol for collection, a foolproof way to collect a sample. We’re still developing a best practice for sample collection,” Zichi says. “For example, PCA is helping to get a handle on which analytes move in tandem with abuses of the blood sample (cell lysis). We’re starting to understand certain signatures.”
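The PCA idea Zichi describes can be sketched by finding the leading principal component of the analytes’ correlation matrix. Analytes with large, same-sign loadings rise and fall together, much as analytes released by a handling artifact such as lysis might. The sketch uses power iteration to stay dependency-free, and the three toy analytes are invented for illustration:

```python
import math
from statistics import mean, pstdev
from typing import List


def standardize(columns: List[List[float]]) -> List[List[float]]:
    """Z-score each analyte (one list per analyte) so units don't dominate."""
    out = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)
        out.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return out


def first_component(columns: List[List[float]], iterations: int = 200) -> List[float]:
    """Loadings of the first principal component of the correlation matrix.

    Uses power iteration, which converges to the eigenvector with the largest
    eigenvalue as long as the start vector is not orthogonal to it.
    """
    z = standardize(columns)
    k, n = len(z), len(z[0])
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)]
            for i in range(k)]
    vec = [1.0] * k
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec


# Six hypothetical samples: analytes A and B both track a lysis-like artifact,
# while analyte C is unrelated to it.
A = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
B = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
C = [5.0, 6.0, 4.0, 4.0, 6.0, 5.0]
print([round(v, 3) for v in first_component([A, B, C])])
```

Here A and B load almost equally, at about 0.707 each, on the first component, while C loads near zero. That pattern is the kind of “signature” that marks analytes moving in tandem.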
“The Holy Grail is to recover what the analyte levels were prior to the abuse,” says Gold. “You don’t want to [discard] the valid markers [hidden in the sample handling variability]. You’ve got to figure it out.” K.D.
Information Model
The release of the lung cancer test is in Quest’s hands. Says Messenbaugh: “It’s an LDT [lab-developed test]. They’ll do it at the pace they consider right.” The test will enable early detection of lung cancer, providing an indication whether nodules are cancerous. Another test for pancreatic cancer has also been licensed to Quest. An ongoing challenge is to standardize the methods for blood sample collection and analysis to eliminate variability as much as possible (see “Rule of Four”).
“One would hope that as you add more markers, you get more perfect, but there’s an asymptotic plateau,” says Gold. Having studied some 12,000 blood samples to date, SomaLogic scientists have concluded there’s a lot of redundancy in biology—many markers simply shift up and down in tandem with others. To better understand the underlying biology, Gold is learning about KEGG, GO and other pathway tools.
Anatomy of a SOMAmer: S, SOMAmer (Slow Off-Rate Modified ssDNA Aptamer, ~40 nt); F, fluorophore; PC, photocleavable group (o-nitrobenzyl ether); B, biotin.
Ultimately, Gold sees his business model as “an information model,” especially if one uses the term “longitudinal ‘omics.” He says: “You do your annual test, get your 1,100, 2,000, or 3,000 data points. The computer sends you a note, ‘Nice job, see you next year. We didn’t see anything.’ Or ‘Go see Dr. Finklestein, because you need your [whatever] examined.’
“Medicine will change over the next decade: people who get sick will be able to enter the medical system more effectively than they do today, because they’ll have early, even pre-symptomatic access to real information. Nobody’s got time to think the way you do about your own health.”
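The annual “Nice job, see you next year” note Gold describes amounts to comparing each new measurement with the person’s own history. The minimal sketch below assumes a simple per-protein z-score against that person’s past values. The protein names, the threshold, and the follow-up wording are hypothetical, and this is not SomaLogic’s algorithm:

```python
from statistics import mean, stdev
from typing import Dict, List


def flag_changes(history: Dict[str, List[float]], current: Dict[str, float],
                 threshold: float = 3.0, min_sd: float = 1e-9) -> List[str]:
    """Proteins whose current level departs from this person's own baseline.

    history maps a protein name to that person's previous annual values; a
    protein is flagged when |z| exceeds threshold. At least two past values
    are required; min_sd guards against a perfectly flat history.
    """
    flagged = []
    for protein, level in current.items():
        past = history.get(protein, [])
        if len(past) < 2:
            continue  # no personal baseline yet
        z = (level - mean(past)) / max(stdev(past), min_sd)
        if abs(z) > threshold:
            flagged.append(protein)
    return sorted(flagged)


def annual_note(flagged: List[str]) -> str:
    """Turn the flags into the kind of note Gold describes (wording hypothetical)."""
    if not flagged:
        return "Nice job, see you next year."
    return "Please see your physician about: " + ", ".join(flagged)


# P1 and P2 are placeholder protein names.
history = {"P1": [10.0, 10.5, 9.5], "P2": [3.0, 3.1, 2.9]}
this_year = {"P1": 10.2, "P2": 6.0}
print(annual_note(flag_changes(history, this_year)))  # Please see your physician about: P2
```

Comparing against a personal baseline, rather than a population range, is what makes the measurements “longitudinal”: the same level can be normal for one person and a change for another.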
Computational Biology
Open Source Solutions
for Image Data Analysis
From neurons to nematodes, the challenges
in data analysis remain pervasive.
BY OLIVIER MORTEAU
Ron Kikinis, who runs the Surgical
Planning Laboratory (SPL) at Boston’s
Brigham and Women’s Hospital, admits
that his research field is privileged when
it comes to the tools that have been
developed in the past two decades for
neuroimage analysis. “It’s probably the
most advanced area in terms of image
data analysis,” he says, which he attributes
to a long-term effort by the NIH to fund
projects in neuroimaging.
Imaging analysis exists for clinical
applications in many other organs, but
there has not been as much funding to
develop post-processing for those applications, which consequently tend to lag
behind neuroimaging applications. But
that doesn’t mean that neuronal image
data analysis technologies cannot be improved. For example, most technologies
have focused on group comparisons of healthy brains. “A lot of tools can do an incredible job in finalizing this type of data.
However, as soon as you go into brain
pathologies, the technology available is
significantly less robust,” says Kikinis.
Advances in bioimaging devices,
which are producing larger volumes of
data of ever greater complexity, mean
“we’re drowning in data,” he says. Images generated by magnetic resonance
imaging (MRI), computed tomography (CT) and positron
emission tomography (PET) are typically
3-D or 4-D, where the fourth dimension
is time, contrast uptake, or some chemical
parameter.
“How do you process and analyze data
to the point where you see the information that you are interested in?” he asks.
“That usually means some form of processing that consists of throwing away a
lot of data, until the only data left are what
you are interested in.” The key is a combination of acquiring high-quality data
by expert scientists and post-processing
using relevant algorithms. “The point
of post-processing is not to decrease the
storage requirements—although it typically reduces data files of several gigabytes
to just a few kilobytes—but to expose the
relevant information in the context of a
particular task.”
High-Throughput Imaging
Anne Carpenter, who directs the imaging platform at the Broad Institute, says
that extracting key information is a task
inherent to bioimage analysis. “That is
just what image analysis is—converting a
large amount of digital information into
a more manageable amount of the most
critical information,” says Carpenter.
Visualization of a brain tumor using 3D Slicer.
Because her focus is high-throughput screening (HTS), she uses microscopes that generate static 2-D high-throughput images. The data are usually less complex than those generated by medical imaging devices like MRI or CT scanners. “In HTS, the goal is to take millions or hundreds of thousands of images and identify the small percentage of them that has the characteristics of interest. Conceptually, that’s very simple, but the challenge is actually in doing it,” says Carpenter.
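Finding “the small percentage” of images with the characteristics of interest often comes down to outlier detection on a per-image measurement. The minimal sketch below uses robust z-scores based on the median and the median absolute deviation (MAD). This is a common HTS choice because the hits themselves do not distort the baseline. The image IDs, scores and threshold are hypothetical, and this is not CellProfiler code:

```python
from statistics import median
from typing import Dict, List


def robust_z_scores(values: List[float]) -> List[float]:
    """(x - median) / (1.4826 * MAD); the median ignores the few true hits."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # scales MAD to match a standard deviation for normal data
    if scale == 0:
        raise ValueError("MAD is zero; scores are too uniform to rank")
    return [(v - med) / scale for v in values]


def select_hits(scores: Dict[str, float], threshold: float = 3.0) -> List[str]:
    """IDs of images whose score lies more than threshold robust SDs above the median."""
    ids = list(scores)
    z = robust_z_scores([scores[i] for i in ids])
    return [i for i, zi in zip(ids, z) if zi > threshold]


# Hypothetical per-image phenotype scores, e.g. percent of cells showing a feature.
scores = {"img1": 10.0, "img2": 11.0, "img3": 9.0, "img4": 10.5, "img5": 9.5, "img6": 30.0}
print(select_hits(scores))  # ['img6']
```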
Bioimaging and medical imaging pose different challenges. The structure
of the human brain doesn’t vary much
from patient to patient. But studies of the
nematode (Caenorhabditis elegans), for
example, might involve organisms that
can curve upside down or backwards. The
cardinal features in one image analysis
project can vary from one experiment to
another, says Carpenter. The same is true
with cultured cells. “You can’t align them
to each other in the same way that you
can align a brain to another brain,” says
Carpenter.
From her viewpoint as a cell and
computational biologist, the challenge
of bioimaging merely reflects the level of
physiological complexity of the biological
system studied. Biologists are gravitating
toward much more physiological systems
than before, she says, preferring to work
with whole organisms rather than cultured cells. “However, many organisms
do not yet have their own image analysis
algorithms. C. elegans and zebrafish are
two organisms we’ve been working on.”
And cell biologists, who are often culturing two different types of cells together
(because it keeps the cells in a more
physiological environment), pose their
own challenges. “Whenever you mix two
cell types together, not only is it challenging to get the cells to grow happily, but it
also presents image analysis challenges,
because you are not tuning the algorithm
just to fit one cell type,” she says.
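One way around tuning a separate algorithm for each cell type is to measure every segmented object and then let a classifier assign the cell type. The classifier is trained on a few hand-labeled examples. The minimal nearest-centroid sketch below is not a description of CellProfiler’s implementation. The two features (assumed already scaled to comparable ranges), their values, and the labels are all hypothetical:

```python
import math
from typing import Dict, List, Tuple

Features = Tuple[float, ...]


def centroids(training: Dict[str, List[Features]]) -> Dict[str, Features]:
    """Mean feature vector per cell type, from objects a biologist has labeled."""
    return {
        cell_type: tuple(sum(col) / len(objs) for col in zip(*objs))
        for cell_type, objs in training.items()
    }


def classify(obj: Features, cents: Dict[str, Features]) -> str:
    """Assign a segmented object (or fragment) to the nearest centroid's cell type."""
    return min(cents, key=lambda t: math.dist(obj, cents[t]))


# Hypothetical, pre-scaled features: (relative area, texture score).
training = {
    "type_A": [(0.90, 0.20), (1.00, 0.25), (0.95, 0.15)],
    "type_B": [(0.30, 0.70), (0.35, 0.80), (0.25, 0.75)],
}
cents = centroids(training)
objects = [(0.92, 0.30), (0.28, 0.66)]
print([classify(o, cents) for o in objects])  # ['type_A', 'type_B']
```

Nearest-centroid was chosen only because it fits in a few lines. Any trainable classifier could take its place, and a real pipeline would use many more features per object.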
Seeing Solutions
For medical image analysis and post-processing, Kikinis and his colleagues at the SPL have been developing the 3D Slicer software package. “I’m a medical doctor, so I don’t write code myself, but I’ve been working in interdisciplinary research with computer scientists for a quarter of a century,” says Kikinis.
3D Slicer has been developed over the past several years with NIH funding that carries no restrictions. “NIH wanted us to make this software available in a meaningful way, and from our point of view the most meaningful way was to go completely open source,” says Kikinis.
“Think of 3D Slicer as a big chest of tools,” says Kikinis. For example, Kikinis and his colleagues rely on a proven imaging method called diffusion-weighted imaging (based on the local microstructural characteristics of water diffusion) that is used to study the organization of the brain’s white matter. 3D Slicer offers a suite of tools to do rapid post-processing of these images.
“You would first filter for noise reduction,” he says, “then do an estimate of the diffusion tensor of the diffusion-weighted images, and finally do some form of phase streamline analysis inside the diffusion tensor file.”
3D Slicer offers a versatile solution for biomedical imaging analysis. Many software packages overlap various aspects of 3D Slicer, but none cover all of its applications, says Kikinis, and none are compatible with both Mac and PC. One offering, developed at the Digital Imaging Unit of the University Hospitals of Geneva, Switzerland, is OsiriX, the successor of Osiris on the Mac platform (Osiris for PC, still available for free, is no longer supported). Another software product, ClearCanvas PACS, was recently released by ClearCanvas.
The 3D Slicer software package comes with a set of tutorials to make it as user-friendly as possible. But 3D Slicer also targets developers, using a plugin architecture. “We want to encourage people to develop their own things,” says Kikinis.
Although 3D Slicer is designed for basic research applications, another interesting feature of the software is its potential to communicate with clinical devices via the Open Image Guided Therapy (IGT) Link. The connection enables 3D Slicer to receive information from and send information to a medical device, allowing it to control a scanner or a robot, for example. Specific clinical devices produced by companies such as BrainLab come with the Open IGT Link.
Carpenter’s team built CellProfiler, a successful open-source software package that won a Bio-IT World Best Practices Award in 2009 (see “Carpenter Builds Open Source Imaging Software,” Bio-IT World, Jul 2009). The goal was to find an alternative to custom programs, such as MetaMorph (Molecular Devices) and Image-Pro Plus (Media Cybernetics), which can be challenging to adapt to a specific experiment, and to commercial software that is useful for screens in certain cells but otherwise limiting.
“CellProfiler is the only high-throughput cell image analysis software in existence that is open source,” Carpenter says. Not only is it modular and therefore quite flexible for complicated assays, but it is also user-friendly; a beginner can mix and match modules and different image analysis functions. “We have users who do low-throughput experiments where they just count cells in a dozen or so images, and users who look for a very complicated phenotype and need to process images in a cluster and measure hundreds of thousands of images in a round-the-clock manner,” says Carpenter.
Working with a number of nematode research groups, Carpenter is about to release a toolbox of robust algorithms for C. elegans analysis, and aims to do the same for the zebrafish. Her group has also completed a couple of screens in co-cultured cells, using machine learning to accomplish those projects.
With two different cell types of different textures or size, it is easy to tune one algorithm to one cell type and a different algorithm to the other cell type. “But when you mix them together, both algorithms would have to work on the entire image, and an algorithm that’s very well fitted to one cell type might chop the other cell type into bits, and think that a portion of the large cell type might be a clump of a number of the other very small cell type,” Carpenter says.
The group has developed an algorithm that “intentionally chops the cells into bits and then uses machine-learning algorithms to allow the biologist to train the computer to learn which pieces belong to which cell type. Then, optionally, you can piece the cells back together again using machine-learning.”
Olivier Morteau is a communication scientist at a Boston-based biopharma company.
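Kikinis’s diffusion pipeline has three steps: denoise, estimate the tensor, then run a streamline analysis. The middle step, estimating the diffusion tensor, is conventionally done with the log-linearized Stejskal–Tanner model, ln(S_i/S_0) = -b g_iᵀ D g_i. This model is linear in the six unique elements of the symmetric tensor D. The minimal sketch below solves it exactly from six gradient directions on noise-free simulated data, then computes fractional anisotropy (FA). The b-value and diffusivities are illustrative assumptions. Real pipelines typically fit many more directions by least squares after denoising:

```python
import math
from typing import List, Sequence, Tuple

Direction = Tuple[float, float, float]


def solve(a: List[List[float]], y: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def design_row(g: Direction) -> List[float]:
    """Coefficients of (Dxx, Dyy, Dzz, Dxy, Dxz, Dyz) in g^T D g."""
    gx, gy, gz = g
    return [gx * gx, gy * gy, gz * gz, 2 * gx * gy, 2 * gx * gz, 2 * gy * gz]


def fit_tensor(signals: Sequence[float], s0: float, b: float,
               directions: Sequence[Direction]) -> List[float]:
    """Exact tensor fit from six diffusion-weighted signals (no noise handling)."""
    rows = [design_row(g) for g in directions]
    rhs = [-math.log(s / s0) / b for s in signals]
    return solve(rows, rhs)


def fractional_anisotropy(d: Sequence[float]) -> float:
    """FA = sqrt(3/2) * ||D - MD*I||_F / ||D||_F, which avoids an eigen-decomposition."""
    dxx, dyy, dzz, dxy, dxz, dyz = d
    md = (dxx + dyy + dzz) / 3  # mean diffusivity
    off = 2 * (dxy ** 2 + dxz ** 2 + dyz ** 2)  # off-diagonal terms appear twice
    dev = (dxx - md) ** 2 + (dyy - md) ** 2 + (dzz - md) ** 2 + off
    total = dxx ** 2 + dyy ** 2 + dzz ** 2 + off
    return math.sqrt(1.5 * dev / total)


# Simulate one white-matter-like voxel (illustrative values, mm^2/s and s/mm^2).
r = 1 / math.sqrt(2)
DIRS = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (r, r, 0), (r, 0, r), (0, r, r)]
TRUE_D = [1.7e-3, 0.3e-3, 0.3e-3, 0.0, 0.0, 0.0]
S0, B = 1000.0, 1000.0
signals = [S0 * math.exp(-B * sum(c * d for c, d in zip(design_row(g), TRUE_D))) for g in DIRS]

estimate = fit_tensor(signals, S0, B, DIRS)
print(round(fractional_anisotropy(estimate), 3))  # 0.799
```

The recovered tensor matches the simulated one, and its high FA reflects diffusion concentrated along one axis, as in a white-matter fiber bundle. Streamline tractography would then follow each voxel’s principal diffusion direction.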
Expert Intelligence for Better Decisions
Next-Generation
Sequencing
Generates
Momentum:
Markets Respond to Technology
and Innovation Advances
This report focuses on current and innovative NGS technologies, services and markets to answer such questions as:
• Which early NGS market entrants have been continually improving and updating their original systems?
• Who has introduced new scaled-down instruments to broaden the market?
• What Generation 2.5 systems featuring new detection technologies and single-molecule sequencing are now on the market?
• When will the third generation of instruments, led by nanopore technologies, enter the commercial feasibility stage?
• Why is upstream sample handling undergoing continual technological innovation?
• Which informatics providers are moving rapidly toward fully integrated systems to provide the rapid generation of actionable biological information?
For more info & to order:
InsightPharmaReports.com
Insight Pharma Reports
a division of Cambridge Healthtech Institute
250 First Ave., Suite 300, Needham, MA 02494
T: 781-972-5400 Toll-free in the U.S. 888-999-6288 F: 781-972-5425
SPECIAL REPORT:
Bio-IT World’s 2011 BEST PRACTICES Awards
The Select Six Best Practices
As always, we are dedicating our summer issue
to a showcase of the winners of our annual Best
Practices Awards competition. The six winning
entries—from CliniWorks, Collaborative Drug
Discovery, GlaxoSmithKline, Merck, Novartis, and Oxford Nanopore Technologies—were introduced at the
Bio-IT World Expo in April. Their stories are presented
in the following pages.
This year’s competition attracted 34 entries and
prompted much frank deliberation among our judging
panel, as they sought to identify the most important,
novel, and potentially impactful collaborations and ideas
from basic research and IT infrastructure to translational
medicine. We believe that the winners of the 2011 Best
Practices Awards offer some exciting stories that highlight the value of ingenuity and collaboration, impacting
areas including drug discovery, diagnostics, and clinical
research. We hope that some of these advances will have
resonance across portions of the industry.
As always, thanks to our panel of 13 guest judges for
volunteering their time and insights. We congratulate
not only our winners (and their nominating organizations) but everyone else who took time to enter this year’s
competition.
We will have news about the make-up and timing of
the 2012 awards in our next issue.
— The Editors
Best Practices Awards 2011
Enrollment Modeling Results in
Productivity Gains for Merck
i l o t t e s t i n g D e c i s i o n V i e w ’s
StudyOptimizer provided Merck
& Co. with ample evidence that the
predictive analytics platform significantly improves the odds of clinical trials
getting done on time and within budget.
The clinical enrollment optimization and
decision support tool has since become
the standard for large phase II and III
studies across legacy Merck, and will
$MJOJDBMBOE)FBMUI*53FTFBSDI
CONTENTS
Winner:.FSDL
Nominator: %FDJTJPO7JFX
Project: $MJOJDBM&OSPMMNFOU0QUJNJ[BUJPO
soon be the standard for big-investment
studies across legacy Schering-Plough as
well, says Christopher Heider, director
of information technology at Merck. The
two former rivals merged in late 2009.
Merck expects to receive the same
return on investment across the entire
portfolio of trials that it observed in the
pilot studies, says Heider. These include
a reduction in overall cycle time variance
by approximately two to eight weeks, reduction in the time trial managers spend
aggregating study data by roughly 50%,
improvement in timelines and accuracy
of reporting study data to management
by 50%, and reduction in the time trial
managers spend identifying recruitment
and data cleanup issues by 20%.
The insights and productivity gains
Merck has achieved using the enrollment
modeling capabilities of StudyOptimizer
were recognized in April with a %LRv,7
World Best Practices Award in the category of clinical and health-IT research.
StudyOptimizer is a “leading-edge,”
cloud-based application for planning,
tracking, and optimizing clinical trial enrollment performance, says Heider. Eight
of the ten top global pharmaceutical companies are now customers. Traditionally,
patient enrollment projections and course
correction strategies were based on the
[32 ]#*0t*5 803-%+6-:|"6(6452011
experience and intuition
of study managers with inconsistent and often costly
consequences. Merck previously used a custom solution that looked only at
first patient enrolled/last
patient enrolled and the
number of sites. It had no
way to model additional
variables such as the impact of additional sites,
vendor tactics, and screen
failure ratio differences
between geographies.
The tool automates the business process of enrollment through a collaborative platform, allowing headquarters and regional trial management teams to work together to create realistic enrollment plans, validate plan assumptions, test multiple scenarios, and approve a baseline against which performance will be monitored, says Linda Drumright, president and CEO of DecisionView. Underperforming sites can be quickly pinpointed and closed, and rescue sites identified to keep studies on track.
Trial Testing
Importantly, StudyOptimizer captures current enrollment plans and historical enrollment metrics in a single database, updated nightly from the organization’s clinical trial management system. Study managers no longer need to aggregate data into Excel files to see overall study enrollment progress. As part of the project with DecisionView, Merck centralized data loads from disparate sources (interactive voice recognition, electronic data capture, and central lab systems), enabling creation of this single “source of truth,” Heider says.
The “real value” of StudyOptimizer comes from underlying algorithms that produce usable charts and graphs, allowing the impact of various strategies to be visualized, Heider says. When several targeted countries dropped out of one diabetes study during enrollment with only seven months remaining, for example, Merck’s Global Trial Optimization (GTO) group was able to use StudyOptimizer to develop three recovery strategies using validated assumptions about new and existing countries: the number of additional sites that could be brought on board, site-ready ramp-up time, and fluctuations in screening rates during winter holidays. Seven months later, the diabetes trial finished enrollment within three weeks of projections.
Feedback from Merck’s GTO group, the primary users of StudyOptimizer, has been extremely positive, says Heider. StudyOptimizer gets a daily data feed from Merck, which is loaded into the application to update projections, estimate completion dates, display alerts, recalibrate the forecasting model, and execute many of the tasks once performed manually. Trial managers can thus concentrate their efforts on analyzing trends and spotting potential problems.
Chris Heider, director, clinical trial operations IT, Merck; David Hilmer, director of sales, DecisionView
BY DEBORAH BORFITZ
Clinical Imaging: Focus on Service
BY ALISSA POH
IT & Informatics
Winner: Novartis Institute for Biomedical Research (NIBR)
Project: Novartis Image Analysis Interface and ImagEDC
Novartis’ open-source platform, ImagEDC, first rolled out in 2010 (see, “Novel IT Platform Helps Novartis Gain Control of Clinical Imaging Data,” Bio-IT World, Nov 2010), continues to make waves in the IT and informatics world. It garnered a Best Practices Award (IT & Informatics category) at this year’s Bio-IT World Expo, in conjunction with Novartis Image Analysis Interface (NIAI), the company’s fully automated image analysis workflow system.
ImagEDC, a nifty combination of service-oriented architecture (SOA) and grid computing, gives researchers greater control and ownership of imaging data across multiple clinical trials. Basically, it enables smooth data transfer between trial partners, using caGrid-enabled Web services for high performance and security.
According to Stefan Baumann, head of Novartis’ clinical imaging team, and Josh Snyder, an imaging infrastructure expert at the company, incorporating innovative image processing techniques (even into small, exploratory trials) is currently no easy task. “If you look at other industries, such as travel or banking, interoperability between data sources and consumers has grown tremendously in recent years, and the resulting ease of data exchange has created huge advances in data-driven applications,” Snyder says. “Not so for imaging; we believe there is significant opportunity in this space.”
Meanwhile, there are increasingly complex needs that come with this opportunity: data quality requirements, interface standards, and workflow modularization, to name several. ImagEDC offers the flexibility to adapt as these requirements change. “We’ve delivered not just a solution for interoperability, but a working, open source reference software package,” says Snyder. “Vendors and other sponsors can use this to accelerate their own efforts at implementing infrastructure.”
Until recently, Novartis researchers had to deal with a “closed box” workflow, where several different parties were involved in any one trial. Each party, from the imaging CRO to core labs, had its own systems and processes, with no standards enabling decentralized data storage or removal of patient-identifying information. This was an inefficient process, and potentially compromised research quality.
Transparent, Trackable Data
The imaging team at Novartis developed a plan to manage clinical trial data that would be transparent, trackable, and easily configurable, with real-time quality control enabled through faster image transport between study partners. They turned to caGrid, an open-source middleware product capable of supporting partners with different IT proficiency levels and budgets, and compliant with the relevant health care standards, including DICOM and CDISC.
When deployed at clinical trial sites, ImagEDC enables clients to produce images compatible with trial requirements, without additional processing from the responsible CRO. These clean, [patient] de-identified images are then stored in a local repository that includes a tracking service to record their receipt and other workflow events.
One might figure that physical bandwidth for image transfer between Novartis and a study partner could be a bottleneck, but Baumann and Snyder disagree. “Several generic and specialized solutions [to maximize bandwidth] work very well, and we don’t seek to supplant these with ImagEDC. Data format and quality issues are far more pertinent.” The approach is also cost-effective: with NIAI and ImagEDC replacing manual image reads, Novartis estimates that it has reduced the cost of each applicable clinical trial by about $80,000.
“We were interested to share our success story with NIAI and encourage adoption of ImagEDC, to promote interoperability between sponsor and vendor infrastructures,” says Snyder of Novartis’ decision to participate in this year’s Best Practices competition. “Given the excellent work submitted by our competitors, it was definitely gratifying when the judges announced our win; it’s validation that we are working to a valuable purpose.”
Snyder, Baumann, and their colleagues at Novartis are also organizing a Pharma Image Exchange interest group to help govern the rapidly evolving landscape of image processing. “We hope that SOA-based tools like ImagEDC will be increasingly adopted, and that we’ll see an encapsulation of more image processing and workflow functions as services,” Snyder says. “Service-based exchange of data quality requirements, for example, is an important next step.”
Thierry Cladé, solution architect, Novartis; Stefan Baumann, head of clinical imaging, NIBR
Best Practices Awards 2011
GSK’s Helium Rises to the Top
BY ALISSA POH
Knowledge Management
Winner: GlaxoSmithKline
Nominator: Ceiba Solutions
Project: Helium in Excel: A New Paradigm for Data Insight
Familiarity breeds contempt, it’s said, but not in the case of GlaxoSmithKline’s most recent tool for SAR analysis: Helium, which has the ubiquitous Microsoft Excel at its core, and received Bio-IT World’s Best Practices Award in the Knowledge Management category this year.
Helium was first conceived several years ago as GSK sought to streamline data access for its scientists. At the time, GSK managed data based on how it was stored, not how it was used within workflows. This resulted in “siloed” datasets requiring a slew of laborious steps before scientists could use the information. Researchers were left with a mishmash of tools and resources that, as a GSK representative puts it, “did some things really well but probably overlapped in a lot of cases.”
Unsurprisingly, the idea of creating a “Swiss army knife” approach to all of the company’s SAR needs proved popular. The first version of Helium was based on TIBCO Spotfire, but the average bench researcher found Spotfire difficult to navigate. Then in 2009, GSK purchased ChemAxon’s suite of tools (JChem for Excel, Instant JChem, and JChem Cartridge) and modified Helium to utilize Excel’s spreadsheet format. The idea was that researchers would find Helium’s “wrapper” comfortingly familiar, while the tool retained Spotfire’s functionality.
Helium mines and reveals relationships between integrated data stores. Based on the data’s “type” (say, a compound, gene, target, or project code), Helium suggests complementary data from disparate sources. For instance, if a project code is entered, Helium prompts scientists to retrieve associated compound numbers and places these in a second column. If the scientist clicks on this column, Helium offers data relationships such as “Compound Number to Structure,” or “Compound Number to Biological Result.”
Helium can thus generate a vast lexicon of data mashups, all via commands in plain English. Researchers need not know where data is stored, its format, or even the specifics of running a query.
Helium’s advent enabled the retirement of two key systems within discovery: a toolset for retrieving biological data from GSK’s in-house systems, and a bespoke chemistry spreadsheet. Both of these were “complex to maintain and required significant training,” says a company representative; scientists find Helium much more flexible and intuitive.
Developing Helium involved plenty of end user ownership and interaction, according to GSK. A senior researcher from discovery headed a group of 10 users covering all disciplines and sites within the company’s R&D; the group met weekly to review Helium’s progress and put the product through its paces on “live” data.
Richard Bolton, strategic IT portfolio manager, and Ashley George, director of strategic IT portfolio for discovery, GlaxoSmithKline; Tom Arneman, president, Ceiba Solutions
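Helium’s type-driven suggestions can be pictured as a registry of relationships keyed by data type: given the type of a column, the tool offers the relationships that start from that type, and following one produces a new column of a new type. The sketch below is a hypothetical illustration of that idea only. The relationship names echo the article’s examples, but the lookup tables, identifiers, and function names are invented, and Helium’s actual implementation (built on Excel and ChemAxon tools) is not described in the article.

```python
from __future__ import annotations

# Hypothetical registry: relationship name -> (source type, target type, lookup table).
RELATIONSHIPS: dict[str, tuple[str, str, dict[str, list[str]]]] = {
    "Project Code to Compound Number": (
        "project code", "compound number",
        {"PRJ-1": ["CMPD-10", "CMPD-11"]},
    ),
    "Compound Number to Structure": (
        "compound number", "structure",
        {"CMPD-10": ["CCO"], "CMPD-11": ["c1ccccc1"]},
    ),
    "Compound Number to Biological Result": (
        "compound number", "biological result",
        {"CMPD-10": ["IC50=1.2uM"]},
    ),
}


def suggest(data_type: str) -> list[str]:
    """Relationships that can be applied to a column of the given data type."""
    return [name for name, (source, _, _) in RELATIONSHIPS.items() if source == data_type]


def follow(relationship: str, values: list[str]) -> tuple[str, list[str]]:
    """Apply a relationship to a column, returning the new column's type and values."""
    _, target_type, table = RELATIONSHIPS[relationship]
    result: list[str] = []
    for value in values:
        result.extend(table.get(value, []))  # values with no match contribute nothing
    return target_type, result


# A project code yields a column of compound numbers, which in turn unlocks new relationships.
col_type, compounds = follow("Project Code to Compound Number", ["PRJ-1"])
print(col_type, compounds)  # compound number ['CMPD-10', 'CMPD-11']
print(suggest(col_type))    # ['Compound Number to Structure', 'Compound Number to Biological Result']
```

Because each relationship returns a typed column, relationships can be chained without the user knowing where the underlying data lives, which is the “mashup” behavior the article describes.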
GSK opted for a gradual, “viral” release
of Helium in 2010. Users passed Helium
along to peers if they felt the product
would be useful, and this word-of-mouth
approach “actually worked very well.”
Currently, Helium is employed by over
1,400 users in GSK’s discovery domain,
a number expected to rise to 3,500. This
tool has “dramatically increased” productivity and scientific knowledge interchange, the company says, besides eliminating over 30% of IT infrastructure.
Branching Out
Plans are already afoot to commercialize
Helium’s functionality for a broader market, starting with GSK’s biopharma and
preclinical spaces. The company is working with IT company Ceiba Solutions on
this active expansion of Helium into new
domains, which involves updating core
functionalities such as security, and will
also necessitate preliminary feedback
from new user groups.
“Ultimately, R&D IT leadership strives
to provide their researchers with real-time, comprehensive insights across disparate data sources,” says Ceiba Solutions’
president Tom Arneman. “In developing
Helium, GSK realized this goal, drastically reducing license fees for point search
portals, and saving scientists valuable
time and focus.” Ceiba nominated the
product for Best Practices consideration
this year, as “Helium is a visionary solution to a very real and growing need for
the industry.” A GSK representative admits that they had few expectations with
regard to Helium’s entry, as competition
in their category was “very strong indeed;
we were surprised and very pleased at the
announcement [of Helium’s win].”
Like many of its pharmaceutical peers,
GSK is moving away from huge data
stores, and adopting Web 2.0/3.0 for
sleeker data integration, while requiring
minimal specific training for its scientists.
Helium, designed to be sophisticated yet
user-friendly, will be “a crucial element in
effecting this cultural change.”
Accelrys Pipeline Pilot Guides ONT’s Nascent NGS Data Handling
BY KEVIN DAVIES
Research and Discovery
Winner: Oxford Nanopore Technologies
Nominator: Accelrys
Project: Data Pipelines for Next Generation Sequencing Applications
Although still in stealth mode, Oxford Nanopore Technologies (ONT) recently revealed details of the GridION hardware that will form the basis of its next-generation sequencing technology as well as protein analysis and other applications. And as its Best Practices Award shows, it has been laying the groundwork for an effective and flexible informatics solution as well.
“In the face of staggering estimates for the all-inclusive cost and complexity of NGS analysis, simply providing a new instrument is only half of the story,” says ONT senior scientist Richard Carter. The British company believes in offering simple ways for scientists to analyze NGS data while retaining the flexibility to adapt to “a rapidly shifting landscape of analysis methods and algorithms.”
After assessing several commercial and public options, ONT elected to partner with Accelrys, agreeing to offer a version of the Pipeline Pilot NGS Collection as its recommended platform for NGS data analysis. Already deployed in some 1,300 institutions, the Pipeline Pilot workflow software appears to be a good choice. After all, “Pipeline Pilot is the computational underpinning for all Accelrys products,” says Clifford Baron, product marketing director.
A bioinformatician himself, Carter collaborated with Accelrys to develop the NGS collection and created a series of workflows that reflect analyses performed in a broad range of publications. “It’s relatively simple even for a novice user of Pipeline Pilot to create useful and powerful applications using the NGS Collection,” says Carter. “In little time and without requiring scientists to learn sophisticated analysis software, Pipeline Pilot helps scientists ask relevant, scientific questions about [NGS] data.”
No Best Answer
With the growing number of NGS software algorithms available, selecting and configuring the best tool is a tricky, even risky business. “There is no universal ‘best answer’ when it comes to NGS analysis algorithms,” says Carter. “Analysis of NGS data is far from a settled science.” What bioinformatics teams need, he says, are systems to compare analysis algorithms and organize data processing workflows for their various user groups quickly and efficiently, minimizing repetition.
Launched in early 2011, the NGS Collection for Pipeline Pilot consists of some 150 components for analyzing NGS data, including quality assessment and processing, assembly and mapping, variant detection and profiling, and transcript and ChIP-Seq analysis. From ONT’s standpoint, Pipeline Pilot’s use of graphical application development and application integration provides the data management and algorithmic building blocks needed to develop customized NGS analyses in a relatively accessible environment.
“It’s all about empowering your bench scientists,” says Carter. For example, one user of the system ran an analysis of the publicly available data from the German Escherichia coli food-poisoning outbreak. In a handful of mouse clicks and a couple of hours, a de novo assembly had been performed and the sequence compared with other strains in GenBank.
Carter has created several NGS workflows using out-of-the-box Pipeline Pilot components. One calculates GC content in a genome and compares it to depth of coverage, helping scientists to spot outliers. Carter also integrated the popular Circos plot (now a standard component in the NGS collection) for visualizing genomic variation such as SNP prevalence or gene density.
The software appears well suited to the properties of ONT’s technology, once it is launched. ONT’s GridION is designed to acquire and analyze data in real time so that experiments can be monitored and adjusted as they are being performed. The range of analyses in the NGS collection facilitates the “Run until…” function, where users will choose to sequence until a pre-determined experimental outcome has been achieved.
“Our customer surveys indicate that Pipeline Pilot saves 30-70% of development time,” says Baron. Trevor Heritage, Accelrys’ senior VP, adds that he is “really excited” about the new NGS Collection. “We’re not prescribing an out-of-the-box packaged solution... We’re offering a workflow-oriented platform with the scientific brains to read the data in an intelligent way and do the analysis on top.”
ONT believes that Pipeline Pilot can help address the rising challenges of data analysis and the level of expertise required to perform it. “That’s what makes it such a powerful and important tool,” says Carter.
Richard Carter, Oxford Nanopore
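Carter’s workflow that compares GC content with depth of coverage to spot outliers can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the Pipeline Pilot components the article mentions: per-window coverage values are assumed to be already available (for example, mean read depth over fixed, non-overlapping windows), windows are grouped into GC bins, and a window is flagged when its coverage falls outside a hypothetical 0.5x to 2x band around the median coverage of windows with similar GC content. The bin count, thresholds, and function names are illustrative choices.

```python
from __future__ import annotations
from collections import defaultdict
from statistics import median


def gc_fraction(seq: str) -> float:
    """Fraction of G or C among unambiguous A/C/G/T bases (0.0 if there are none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def window_gc(genome: str, window: int) -> list[float]:
    """GC fraction for consecutive, non-overlapping windows (a final partial window is kept)."""
    return [gc_fraction(genome[i:i + window]) for i in range(0, len(genome), window)]


def coverage_outliers(gc: list[float], coverage: list[float],
                      bins: int = 10, low: float = 0.5, high: float = 2.0) -> list[int]:
    """Indices of windows whose coverage is far from the median of windows with similar GC."""
    if len(gc) != len(coverage):
        raise ValueError("gc and coverage must have one value per window")
    by_bin: dict[int, list[int]] = defaultdict(list)
    for i, g in enumerate(gc):
        by_bin[min(int(g * bins), bins - 1)].append(i)  # GC of 1.0 goes in the top bin
    outliers: list[int] = []
    for indices in by_bin.values():
        m = median(coverage[i] for i in indices)
        outliers.extend(i for i in indices
                        if m > 0 and not (low * m <= coverage[i] <= high * m))
    return sorted(outliers)


genome = "ATATATATAT" + "GCGCGCGCGC" + "ATATATATAT" + "ATATATATAT"
gc = window_gc(genome, 10)
coverage = [30.0, 12.0, 31.0, 5.0]  # hypothetical mean depth per window
print(gc)                               # [0.0, 1.0, 0.0, 0.0]
print(coverage_outliers(gc, coverage))  # [3]
```

Stratifying by GC before judging coverage matters because many sequencing chemistries cover GC-rich or AT-rich regions unevenly; comparing each window only against windows of similar composition keeps that systematic bias from being reported as outliers.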
Providing Patients as a Service
BY ALLISON PROFFITT
Judges’ Prize
Winner: CliniWorks
Project: AccelFind
Cambridge, Mass.-based CliniWorks’ new software-as-a-service platform, AccelFind, allows real-time clinical data mining and patient screening from medical records to streamline planning and recruitment of clinical trials. It is fully HIPAA compliant, protects patient privacy, and is capable of incorporating data of any source, format, structure and content. The platform’s promise caught the judges’ attention and earned it the 2011 Bio-IT World Best Practices Judges’ Prize.
AccelFind is a specialized natural-language processing platform with a vast conceptual terminology database, as well as syntax and context analytics. “Our search engine could be compared to Google, but Google is looking for keywords… while in our case, what we’re looking for is the relationship between words,” explains Nitzan Sneh, CEO.
Sponsors access the system to plan a recruiting strategy for a clinical trial. The platform converts existing medical records from any number of institutions (from databases, transcriptions, or scanned copies) and other notations (from doctors’ notes, nurses’ notes, or lab reports) into a unified and universally usable form, using language rather than the structure of databases to decipher the meaning of medical data and place it accordingly. AccelFind then searches and analyzes the unified data against any set of inclusion/exclusion criteria, intelligently sifting through free text entries and accounting for context.
The system is “very sensitive to the meaning of vocabulary,” says Sneh. For example, AccelFind can distinguish effectively between the statements “patient has heart disease”, “patient expressed concern about heart disease”, or “patient has family history of heart disease”. Researchers can screen millions of patients against a complex set of inclusion and exclusion criteria with instantaneous feedback.
For example, Sneh says, “the system can screen the medical records of the entire population looking for a hemoglobin level between 7 and 9. Only 30% of the time is the result found in the lab results section, the rest of the time it’s everywhere: comments made by the physician, lab summaries, etc.”
AccelFind has put great emphasis on patient privacy, removing all data from HIPAA identification fields, not just patient name, date of birth and address from the structured fields of a medical record but also from any mention or reference that might be buried in any other part of a document. A site or sponsor can use AccelFind to scan the patient population before getting IRB approval, Sneh says, because the data is completely anonymous. “Users can freely search for anything they need because they won’t be exposed to any patient identifiers, so they can go and search before they go to the IRB. They only need IRB approval for those 30 [or so appropriate patients] that the system identified.”
Faster Feasibility Studies
This saves time and money, Sneh says. High-quality, iterative feasibility studies can be done in a few days or even a few hours, rather than weeks or months. Recruiting can be compressed by 3-6 months, not only by shortening the front-end exploratory part but also by being able to target only the most promising sites with known and quantified availability of suitable candidates. The acceleration at each phase can accrue to a significant reduction in time to market, or a faster decision to eliminate a drug candidate from the pipeline, leading to reduced clinical development costs (approximately $50,000 per day). For successful drug candidates the ROI is even higher: earlier revenue and longer patent protection (value can be in the vicinity of $1,000,000 per day).
The last year has been marked by rapid growth for CliniWorks. In December 2010, AccelFind successfully concluded a 140-study pilot of sponsored phase II or phase III studies. Sneh lists current customers including pharma companies like Novartis (in rare diseases), Merck (in oncology), CROs like Parexel, and hospitals including a health information exchange of 11 hospitals in Texas using the program for internal quality and safety studies.
With 10 employees in Cambridge and eight at a wholly owned subsidiary in Tel Aviv, Israel, CliniWorks is small, but Sneh expects to hire five to six new staff by year end. He says that winning the Best Practices award was a very personal triumph for the team. “Many [employees] consider this prize as recognition for their own contribution. They are very proud.”
Nitzan Sneh, founder and CEO, and Udi Meirav, executive chairman and co-founder, CliniWorks
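AccelFind’s NLP engine is proprietary and is not described in the article. Purely to illustrate why context matters in free-text screening, the toy sketch below uses a few hand-written regular expressions (an assumption, and far cruder than a real clinical NLP system) to tell apart the three heart-disease statements quoted above, and to find a hemoglobin value between 7 and 9 wherever it appears in a note rather than only in a lab-results field. All rule names, patterns, and function names are hypothetical.

```python
from __future__ import annotations
import re

# Toy context rules, checked in order; real clinical NLP handles far more phrasing.
CONTEXT_RULES = [
    ("family_history", re.compile(r"family history of\s+heart disease", re.I)),
    ("concern_only", re.compile(r"concern(?:ed)?\s+about\s+heart disease", re.I)),
    ("affirmed", re.compile(r"\bhas\s+heart disease", re.I)),
]

# A hemoglobin keyword, up to 20 non-digit characters, then a number such as 8 or 8.1.
# The word boundaries keep "HbA1c" from being read as hemoglobin.
HGB = re.compile(r"\b(?:hemoglobin|hgb|hb)\b\D{0,20}?(\d+(?:\.\d+)?)", re.I)


def heart_disease_context(note: str) -> str | None:
    """Classify how 'heart disease' is mentioned in a note, or None if no rule matches."""
    for label, pattern in CONTEXT_RULES:
        if pattern.search(note):
            return label
    return None


def hemoglobin_values(text: str) -> list[float]:
    """All hemoglobin values mentioned anywhere in free text, in order of appearance."""
    return [float(m.group(1)) for m in HGB.finditer(text)]


def meets_hgb_criterion(text: str, low: float = 7.0, high: float = 9.0) -> bool:
    """True if any mentioned hemoglobin value lies within [low, high]."""
    return any(low <= value <= high for value in hemoglobin_values(text))


notes = [
    "Patient has heart disease.",
    "Patient expressed concern about heart disease.",
    "Patient has family history of heart disease.",
]
print([heart_disease_context(n) for n in notes])
# ['affirmed', 'concern_only', 'family_history']
print(meets_hgb_criterion("Pt tired; hemoglobin was 8.1 on recheck"))  # True
```

A keyword search for “heart disease” would match all three notes equally; separating them requires looking at the words around the term, which is the distinction Sneh draws between keyword search and searching for relationships between words.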
Tuberculosis Cloud Collaborations
Editors’ Choice Award
Winner: Collaborative Drug Discovery
Nominators: Tuberculosis Research Section, NIAID, NIH and The Global Alliance for TB Drug Development
Project: Collaborative Drug Discovery TB database
Collaborative Drug Discovery has developed a molecular library database to serve a network of over 100 tuberculosis researchers in the U.S. and Europe, helping their users mine and collaborate on tuberculosis data. Their efforts, nominated by the Tuberculosis Research Section, NIAID, NIH and the Global Alliance for TB Drug Development, earned them the 2011 Bio-IT World Best Practices Editors’ Choice Award.
With funding from the Bill and Melinda Gates Foundation and other investors,
CDD collated at least 15 public datasets
on Mycobacterium tuberculosis, representing well over 300,000 compounds
derived from patents, literature and high-throughput
screening data.
“Any new chemoinformatic system
should provide, at minimum, capabilities
for fundamental data storage, retrieval,
and analysis of diverse data originating
from chemistry, biology, pharmacology,
and toxicology activities,” explained the
TB Alliance in its nomination. “Ideally
such a system would be Web-based so
that any participating laboratory could
use it without further investment in
hardware. The system should be intuitive
so that new participants can learn the system with minimum training. In addition
to fundamental chemoinformatic tools,
such a system should be able to enhance
collaboration among researchers in the
same field, the community.”
Enter Collaborative Drug Discovery.
“The Gates Foundation made us a grant
to fund specific groups that needed to
use the software for collaborations either
within the institute or between institutes
or between institutes and companies,” explains Sean Ekins, CDD’s collaborations
director. CDD took their Cloud-based
application and developed
software specifically for the
TB community. The initial two-year grant,
awarded in 2008, has since been
extended to five years.
CDD’s database allows
collaborators to share research data securely within
and across organizations
without the need to install and maintain complex
software. CDD runs on a
fault-tolerant infrastructure providing redundant
storage, compute nodes,
power, HVAC, and backbone connections. The
infrastructure is also redundantly secure, protected by multiple layers of
host-based, network and
physical security measures.
CDD software runs on a
MySQL database and was
developed using the Ruby
and Java programming
languages. The tool was
developed using an agile development process which uses an integrated design-build-test process.
CDD’s TB database fosters data archiving and selective sharing within the research community and enhances creation of computational models, said the Tuberculosis Research Section in its nomination.
TB Curation
Having public and private screening data available against M. tuberculosis enables researchers to analyze the biological activity vs. physicochemical properties of compounds in the database, said the TB Alliance. “Consequently, this database has also been used to build novel computational machine learning and pharmacophore models that could be used to filter other libraries of molecules to rapidly identify potential M. tuberculosis-active compounds.”
The software allows users to segment their data into vaults that enable sharing with specific collaborators, says Ekins. “But there’s another component of the database. There’s a public side where we have some datasets, and we’ve done annotations around TB, sort of curation of data from the literature around compounds,” he says.
“This award acknowledges two years of software development and support for TB research groups funded by the Gates Foundation, and is a credit to all CDD users and community members who have helped guide our technology over the past seven years in the cloud,” said Barry Bunin, CDD’s founder. “This gives me a rare opportunity to publicly recognize the exceptional accomplishments of our software development and product team. We would like to thank our nominators and collaborators, as well as the editors of Bio-IT World for this prestigious award!”
Sean Ekins, Collaborative Drug Discovery
BY ALLISON PROFFITT
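The vault model Ekins describes (private vaults shared with specific collaborators, plus a public side holding curated datasets) amounts to a simple access-control rule. The sketch below is a hypothetical model of that rule only; the class name, fields, user names, and dataset names are invented for illustration and say nothing about how CDD’s MySQL, Ruby, and Java implementation actually works.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Vault:
    """Hypothetical vault: readable by its owner, invited members, or everyone if public."""
    name: str
    owner: str
    public: bool = False
    members: set[str] = field(default_factory=set)
    datasets: list[str] = field(default_factory=list)

    def share_with(self, user: str) -> None:
        """Grant one specific collaborator read access."""
        self.members.add(user)

    def can_read(self, user: str) -> bool:
        return self.public or user == self.owner or user in self.members


def readable_datasets(vaults: list[Vault], user: str) -> list[str]:
    """Every dataset the user may see, across all vaults, in vault order."""
    return [d for v in vaults if v.can_read(user) for d in v.datasets]


public = Vault("Public TB data", owner="curator", public=True,
               datasets=["literature_compounds"])
lab_a = Vault("Lab A screens", owner="alice", datasets=["screen_2010"])
lab_a.share_with("bob")

vaults = [public, lab_a]
print(readable_datasets(vaults, "bob"))    # ['literature_compounds', 'screen_2010']
print(readable_datasets(vaults, "carol"))  # ['literature_compounds']
```

Sharing is granted per vault rather than per record in this sketch, which keeps the rule easy to audit: a collaborator either sees everything a lab has placed in a vault or nothing in it, while the public curated data stays visible to all.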
Best Practices Entries
CLINICAL AND HEALTH-IT RESEARCH
KNOWLEDGE MANAGEMENT
Company | Nominator | Project
Company | Nominator | Project
WINNER: Merck | DecisionView | Clinical Enrollment
Optimization
WINNER: GlaxoSmithKline | Ceiba Solutions | Helium in Excel: A New
Paradigm for Data Insight
JUDGES’ PRIZE: CliniWorks | AccelFind
EDITORS’ CHOICE AWARD: Collaborative Drug Discovery | Tuberculosis
Research Section, NIAID, NIH and TB Alliance (Global Alliance for TB Drug
Development) | Collaborative Drug Discovery TB database
Abbott Laboratories | eLearning
Abbott Vascular | ClearTrial | Optimizing Budgets and Resource
Demand Across the Clinical Trial Portfolio
Accelrys | Synthesis and Process Route Planning
Roche | ePharmaSolutions | Safety Letter Distribution (SLD)
Application
Pfizer Business Information Systems, Business Operations,
Pharmaceutical Sciences, Global R&D | Composite Software | Rapid
Deployment Technology Program
Ochsner Health System | Orion Health | Implementing a System
Wide HIE
Merck | Oracle | Enhanced ELN Performance via Oracle Exadata
DIA | Phlexglobal | TMF Reference Model
Harvard Catalyst, the Harvard Clinical and Translational Science
Center | Recombinant Data Corp | Profiles Research Networking
Software
National Cancer Institute’s Cancer Therapy Evaluation Program |
SAFE-BioPharma Association | Research collaboration in the cloud:
How NCI and Research Partners are using Digital Identities to
Accelerate Drug
Rota Consortium, South Africa | Synexus Clinical Research |
Effect of Human Rotavirus Vaccine on Severe Diarrhea in African
Infants
PPD | PatientView
Pfizer Global Research and Development | Oyster Imaging
Collaborative Portal
Genomics Institute of the Novartis Research Foundation | The Gene
Wiki – community annotation of gene function
RESEARCH AND DISCOVERY
Company | Nominator | Project
WINNER: Oxford Nanopore Technologies | Accelrys | Data Pipelines
for Next Generation Sequencing Applications
IT & INFORMATICS
Company | Nominator | Project
WINNER: Novartis Institute for Biomedical Research | Novartis
Image Analysis Interface and ImagEDC
FDA Division of Animal Research, Center for Veterinary Medicine
(CVM) | IO Informatics | Species-independent drug toxicity and
disease markers
BrainCells | Accelrys | CIVET: Cohort In/Ex Vivo Experiment
Tracker
Strand Life Sciences | A global RNAi screen analysis leads to the
identification of key regulators of heart function
London School of Hygiene and Tropical Medicine | Accelrys |
Whole organism high-throughput drug screening of Schistosoma
mansoni
Smithsonian Institution | Biomatters | Biocode LIMS
University of Texas Southwestern Medical Center at Dallas |
Elucidation of evolutionarily stable, immunologically reactive regions of
human H1N1 influenza viruses through integrative data analysis using the
Influenza Research Database
UCLA Laboratory of Neuro Imaging | Isilon | Unified storage
infrastructure
Janssen Pharmaceutica | I/NI-calls: a statistical search engine for the
relevant genes in ’omics studies
University of Florida, Interdisciplinary Center for Biotechnology
Research | ScaleMP | ICBR needed a local system that would
allow scientists and researchers to submit large interactive jobs.
However, the price point of large SMP systems was prohibitive to
ICBR.
PPD | REMS Technology Solution (Risk Evaluation and Mitigation
Strategy)
Janssen Pharmaceutica | Computer-based mechanistic disease model
of schizophrenia to predict therapeutic effect of new investigative drugs
ERT | EXPERT
Selventa | Novel mechanism-based classifiers to predict patient
response before availability of clinical treatment outcomes data
YOUR OPINION
IS NEEDED!
The Cambridge Healthtech Market Research Group in conjunction
with Bio-IT World, eCliniqua and other partners is conducting an
industry market research study on:
“The Future of Clinical Trials”
Findings will be released beginning in October 2011. If you are
currently involved in, or will soon be engaged in clinical trials work
then please join us in helping to design the survey by answering
just 5 short questions. Click here to access this survey.
For your time and input you will be provided results from the first
study module released, and entered into a bonus prize drawing.
Thank you.
CHI PROFESSIONAL
MARKETING SERVICES
Market Research Group
For vendor sponsorship
information regarding
this study contact:
Alan El Faye
Cambridge Healthtech Media Group, Bio-IT World
aelfaye@healthtech.com
Next-Gen Data
Genome Analytics for All
Pauline Ng is planning open-source, open-access analytics for the genomes to come
BY ALLISON PROFFITT
SINGAPORE: Pauline Ng’s office is in
the Genome building of the Biopolis
science park in Singapore, a fitting
home for one of the authors of the
first published personal genome, that of
J. Craig Venter, published in 2007 while
Ng was a senior scientist at the J. Craig
Venter Institute.
Now Ng leads an expanding group of
three bioinformaticists (she’s hiring!) at
the Genome Institute of Singapore (GIS).
Before her stint at the Venter Institute, Ng
worked for Illumina as well as the Fred
Hutchinson Cancer Research Center in Seattle,
where she wrote the powerful SIFT algorithm (http://sift-dna.org), a widely used
tool to predict the effect of a given amino
acid substitution on protein function.
“We put the algorithm on a Web
server,” she said. “Ten years ago people
would publish their algorithms, but they
wouldn’t necessarily put them on a Web
server. But my Ph.D. advisors were very
emphatic, ‘You need to do this.’ That actually was very informative, because people
used it. That opened it up for clinicians
and geneticists to use the algorithm, instead of just pure bioinformaticists.”
Ng believes that access is very important. “What’s happening is [sequencing
is] accessible to academic institutions like
GIS. We can sequence; we can analyze
that data. The Broad Institute, University
of Washington, Baylor—these are very
highly regarded institutions with collaborations with a medical center. But if you’re
anyone else, you may not have access to
those types of resources.”
In 2009, Ng co-authored a much-discussed Nature commentary outlining
an agenda for personalized medicine in
which they compared the results of two
commercial consumer genomics tests.
They found that the accuracy of raw data
in both 23andMe and Navigenics tests
was high, but one third of risk predictions
(for five anonymous individuals) did not
agree between the tests, a disappointing
result for Ng. “At that
time I thought, wow, there’s
something not quite right,”
she said.
“When you get a health
diagnosis, you don’t consider it a prediction, you
expect it to be correct. Just
like you go to the doctor
and he says, ‘Take this drug
because you’re at risk for
heart disease,’ or something. But if you went to another doctor
and they said something else, it would
reduce the credibility overall.”
GIS a Job
Ng moved to Singapore in 2010, but
hasn’t quite shaken her discomfort. “All
of this together: working on individual
genomes, making tools that are accessible
to everybody, and just getting exposure to
direct-to-consumer [tests]” has shaped
what she now hopes to do at GIS: make
bioinformatics accessible to everyone.
Like SIFT, Ng’s next tools will be open
source. “The plan is not to let just doctors
access the software, but really anybody.”
She acknowledges that bioinformatics
is “a bit specialized,” but also believes that
the patient is his own best advocate. She
cites Hugh Rienhoff’s work on his daughter’s DNA (see “Hugh Rienhoff’s Voyage
Round his Daughter’s DNA,” Bio-IT
World, Sept 2010). “There’s someone with
a huge self-interest in finding out what
is wrong with his daughter. That’s one
example, but you can probably imagine
all across the world there are families like
this where doctors probably don’t have
time or resources to do it. But if there
truly is a $1,000 genome, that means that
for $5,000 they can get the full family
sequenced.”
Affordable sequencing is still a limiting factor, but Ng is confident that it will keep getting cheaper. And the types of diseases
that Ng hopes to address need full genome sequencing. “The 23andMe data,
they’ve squeezed as much
as they can from it. But
the applications—cancer,
Mendelian disorders—
they’re tailored toward the
rare variants or somatic
variants which you need
[to get] from sequencing.”
She expects that to be easy
enough to outsource in
about two years.
But sequencing and
analysis—today at least—cost the same.
“The problem is that right now, companies like Knome are actually charging the
same amount for bioinformatics as they
are for sequencing. If you sequence more
individuals, I’d expect the bioinformatics
to go down, but it’s the same price. That
means the price is double! If we can make
these tools online, accessible for free or at
least at cost, I think I can get it to a tenth
of the cost.”
Ng plans to do the computation on
the Amazon Cloud and, at today’s rates,
expects a genome analysis to cost $500.
She hopes that these price points will
enable doctors and individuals to use
genomics. “If we could say, OK, outsource
[the sequencing] to these companies.
You’re going to get a hard disk. Mail it to
Amazon and get your results in a week.”
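Ng’s pricing argument comes down to simple arithmetic. The Python sketch below is one reading of the figures quoted in this story, not Ng’s own cost model: it assumes a family of five at $1,000 per genome (the $5,000 family bill she mentions), a commercial analysis priced the same as the sequencing, and a cloud-based analysis at a tenth of that, which lands on the $500 figure she quotes. The constant and function names are illustrative.

```python
# A sketch of the pricing argument; all figures are illustrative readings
# of the article, not a published cost model.
GENOME_PRICE = 1_000  # assumed sequencing price per genome, USD
FAMILY_SIZE = 5       # assumed family size (gives the $5,000 family bill)


def family_cost(analysis_fraction: float) -> tuple[int, int]:
    """Return (sequencing, analysis) costs in USD for the whole family,
    with analysis priced as a fraction of the sequencing bill."""
    sequencing = GENOME_PRICE * FAMILY_SIZE
    analysis = round(sequencing * analysis_fraction)
    return sequencing, analysis


# Analysis priced the same as sequencing doubles the total bill.
commercial = family_cost(1.0)
# Analysis at roughly a tenth of the cost, as Ng hopes to offer.
cloud = family_cost(0.1)

print("commercial:", commercial, "total:", sum(commercial))
print("cloud:     ", cloud, "total:", sum(cloud))
```

Under these assumptions the commercial route totals $10,000 and the cloud route $5,500.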
Ng is not promising a magic cure,
and doesn’t even think that this model
should be the only one. She just hopes to
drive prices down and open the market.
“There’s never a guarantee of an answer,” she says. “Even with the software
we write, there may not be a guarantee
of an answer, but at least…” she pauses
and begins again, emphatically. “We can
definitely give you the basic annotation
and provide the tools that everyone uses.
And if it doesn’t work, then you go to an
expensive company that really uses the
same tools as the academics but with a
couple of more bells and whistles. If you
try our stuff first, at least you’ve invested
only $500 instead of $5,000.”
Sequencing at ‘Biblical Proportions’
The University of Queensland’s Sean Grimmond gets the first peek at Ion Torrent’s new technology
Sean Grimmond heads the University of Queensland’s sequencing center
BRISBANE, AUSTRALIA—Sean Grimmond, director of the Queensland Centre
for Medical Genomics at the Institute for
Molecular Biosciences in Brisbane, heads
the first lab in Australia to obtain (premarket) the Personal Genome Machine (PGM)
from Ion Torrent.
This story originally ran in Australian Life Sciences.
Unlike second-generation sequencing platforms, Ion Torrent’s technology
forgoes optics, lasers, and cameras and instead
quantitatively measures changes in pH
generated by hydrogen ions released
during nucleotide incorporation. “Relative to how much of a particular base is
added, you get a quantitative difference
in the amount of hydrogen ions released,”
says Grimmond. Those pH spikes are
translated into base calls and nucleotide
sequence within a matter of seconds. The
PGM is essentially a sophisticated pH
meter: the chip inside comprises millions of tiny wells for the samples, sitting
on millions of tiny electrodes. The PGM
offers several advantages, says Grimmond.
“The data file sizes are small, and the way
it actually analyzes and measures the
nucleotide incorporation is quick. Generating reads of about 120 bases takes less
than two hours.”
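The idea that the signal scales with the number of bases incorporated can be illustrated with a toy base caller. This Python sketch assumes nucleotides are flowed one at a time in a fixed repeating order (the `TACG` order here is an illustrative assumption) and that each flow’s pH signal has already been normalized so that 1.0 corresponds to one incorporated base; it rounds each value to a whole number of bases. The signal values are hypothetical, and a real base caller must also correct for noise and other chemistry effects that this sketch ignores.

```python
from itertools import cycle

FLOW_ORDER = "TACG"  # assumed repeating nucleotide flow order (illustrative)


def call_bases(flow_signals, flow_order=FLOW_ORDER):
    """Convert normalized per-flow signals into a base sequence.

    Each flow exposes the template to one nucleotide. A signal near 0
    means no incorporation; near 2.0 means a two-base homopolymer, etc.
    Note: Python's round() rounds exact halves to the nearest even number.
    """
    bases = []
    for nucleotide, signal in zip(cycle(flow_order), flow_signals):
        count = max(0, round(signal))  # whole number of incorporations
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical normalized signals for eight flows: T, A, C, G, T, A, C, G
signals = [1.05, 0.02, 2.10, 0.95, 0.08, 2.97, 0.03, 1.02]
print(call_bases(signals))
```

Because each flow contributes a single number rather than an image, the raw output stays compact, which fits Grimmond’s point about small data files.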
Moreover, he says, “Converting changes in pH directly into a base call means
much smaller file sizes” than other second-generation platforms. “You really could run those
machines pretty well all year without
emptying the hard drives, whereas we run
the SOLiD machine twice and then we
have to move data to make more room.”
Scaling Up
The early PGM machines come with a
“314” chip, containing about 1.2 million
wells and matching electrodes. A newer
“316” chip (6-8 million wells) is about to
be released, and within a year Ion Torrent
is planning to release the “318” chip which
will comprise some 25 million wells (Ion
Torrent says it is aiming for read lengths
of 400 bases). Each new chip offers a
theoretical tenfold increase in sequence
throughput.
“Using the exact same machine and
sequencing platform, you can go from
generating 1 million reads to 25
million reads, and with that many wells,
we are getting into the 1-gigabasepair
range of data in around two hours,” says
Grimmond.
BY FIONA WYLIE
The SOLiD instruments generate
about 100 Gb over two weeks and are
still Grimmond’s preferred choice for
sequencing human genomes. “But for
smaller and more tractable sequencing
applications such as the transcriptome or
microRNAs, or candidate DNA mutation
analysis, or microbial genomes, the PGM
is ideal,” he says.
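The run figures quoted above also allow a rough throughput comparison. This Python sketch uses only the article’s approximate numbers (about 100 Gb over two weeks on SOLiD, about 1 Gb in around two hours on the PGM with larger chips, and reads of roughly 120 bases on the 1.2-million-well “314” chip). It also assumes, unrealistically, that every well yields a full-length read, so the per-chip figure is only an upper bound.

```python
# Approximate figures quoted in the article.
SOLID_GB, SOLID_HOURS = 100, 14 * 24  # ~100 Gb over two weeks
PGM_GB, PGM_HOURS = 1, 2              # ~1 Gb in around two hours


def rate(gigabases: float, hours: float) -> float:
    """Throughput in gigabases per hour."""
    return gigabases / hours


def max_bases(wells: int, read_length: int) -> int:
    """Upper bound on bases per run, assuming every well gives a full read."""
    return wells * read_length


solid_rate = rate(SOLID_GB, SOLID_HOURS)
pgm_rate = rate(PGM_GB, PGM_HOURS)
print(f"SOLiD ~{solid_rate:.2f} Gb/h over a {SOLID_HOURS} h run")
print(f"PGM   ~{pgm_rate:.2f} Gb/h over a {PGM_HOURS} h run")
print(f"'314' chip upper bound: {max_bases(1_200_000, 120):,} bases")
```

On these numbers the hourly rates are of the same order, but a single SOLiD run yields about a hundred times more data. That is consistent with Grimmond keeping SOLiD for whole human genomes and using the PGM for smaller, faster jobs.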
Grimmond leads Australia’s effort as
part of the International Cancer Genome
Consortium (ICGC). In the ICGC program, the PGMs are being used to validate patient
mutations initially detected using SOLiD
instruments and to address questions of
clinical significance. “For example, we can
now do very deep sequencing on samples
from the tumor margins using primers
that will detect every mutation found in
the parent cancer, and in this way more
closely define the risk of metastases—this
is particularly critical in the case of pancreatic cancer,” says Grimmond.
The PGMs are also helping to validate
every DNA variant found in the cancer
genomes. “We can cut some corners to
pick up some of those variants, but for
the novel ones we really need to validate
around 200 mutations per individual.
Ion Torrent allows us to automate our
primers, hone in on the regions that we
think will have mutations, PCR them up
and sequence them all on the chip and
then move on to the next one quickly and
easily,” says Grimmond.
“If they can make a silicon chip that
determines DNA sequences the size they
are now and sell it for ~$200, and you can
generate enough long reads, it would be
very easy to make a bigger chip that could
generate a human genome in two hours,”
says Grimmond. “The detection system
needed is virtually already built—they just
have to work out how to get the molecular
biology down to fit in with the more and
more sophisticated chips.”
Grimmond predicts, “we will be reaching data sizes of ‘Biblical’ proportions in
the near future—then you really might
start seeing one on every bench.”
Next-Gen Data
Charges Continue to Fly over
Ion Torrent Sequencing Licenses
Researchers are unhappy with how the technology was licensed and who got credit
BY KEVIN DAVIES
Six years after publishing details of the first commercially available next-generation sequencing (NGS) system, by 454 Life Sciences, Jonathan Rothberg and his colleagues at Ion Torrent have published the first results from a new desktop NGS technology today, also in Nature.
All 44 co-authors are (or were) employees of Ion Torrent or its parent company, Life Technologies, including Kevin McKernan, one of the architects of Life Technologies’ second-generation SOLiD platform, who recently left the company.
The new paper includes an overview of the sequence (at tenfold coverage) of Gordon Moore, the co-founder of Intel and the author of the famous Moore’s Law concerning the growth of compute processing capacity.
Meanwhile, charges continue to fly over the origins of some of the key technology that Ion Torrent licensed from Stanford University’s Office of Technology Licensing (OTL). Recently, two scientists, Stanford’s Ron Davis and his former colleague, Nader Pourmand (University of California, Santa Cruz), complained publicly that Stanford OTL undervalued their technology during negotiations of an exclusive license to Ion Torrent. Now those complaints have sparked a strong response by another former Stanford colleague, Arjang Hassibi.
In an exclusive interview with Bio-IT World, Hassibi, now an assistant professor at the University of Texas in Austin, says he and others played a key role in the development of “charge sequencing” technology. In 2001, Hassibi co-founded a biotech company called Xagros Genomics with Pourmand. But the two men fell out over the demise of the company and the attribution of credit—or lack thereof—for results published in an important paper co-authored by Pourmand and Davis in 2006 (and cited in the new Ion Torrent Nature paper).
Those feelings were resurrected in the past month after Pourmand turned to Bio-IT World and other media to express his frustration with the Stanford-Ion Torrent licensing deal.
“It is very surprising to me that Nader is claiming right now that ‘Hydrogen generation [during DNA sequencing] is my patent, my invention,’” said Hassibi, repeating a quote from Pourmand in a Bio-IT World story. “They want to erase any memory of what happened before that. This I have a problem with.”
They Did Great
For Hassibi, it is a matter of principle and seeking fair credit for past intellectual contributions. Like Davis and Pourmand, he is receiving licensing fees from Ion Torrent, albeit less than the paltry $2,300 that irked Pourmand. “Let’s be clear: Ion Torrent hasn’t done anything wrong. Stanford hasn’t done anything wrong. Some scientists came up with a good idea. We [Xagros] failed. Another company—Ion Torrent—picked it up and they did great.”
The issue of credit for what Ion Torrent calls semiconductor sequencing—
and potentially financial compensation
down the road—has become more acute
following Ion Torrent’s acquisition by Life
Technologies in 2010 for $375 million
(potentially rising to $725 million).
Among dozens of patents it has in-licensed, Ion Torrent acquired exclusive
licenses to two related Stanford patents.
The first—Stanford docket S00-157
“Charge Sequencing: A New
Technique for DNA Sequencing and SNP Detection” (priority date October 2001)—lists
Hassibi and Pourmand as the
inventors. The second—docket
S04-291 “Charge Alternation
DNA Detection System” (priority date November 2004)—has
three inventors: Pourmand,
Davis, and Miloslav Karhanek.
“Based on the rule of ‘Success has many fathers, failure is
an orphan,’ there will be many
‘fathers’ for this technology,
including Nader,” said Hassibi.
“However, he is completely
distorting the story right now
by claiming all the credit for himself.
I believe this to be neither ethical nor
constructive if [as he claimed] he wants
to improve Stanford OTL’s licensing
processes.”
Signal Detection
The sequencing squabble dates back to 2000, when Hassibi, then a Ph.D. student in electrical engineering at Stanford University, first met Pourmand, who was a postdoc at the Stanford Genome Technology Center (SGTC). Pourmand and Mostafa Ronaghi (now Illumina’s chief technology officer) were working with Davis on pyrosequencing, the sequencing technology that underlies the Roche/454 next-gen sequencing platform.
“I remember clearly, one afternoon in the spring of 2000, talking with Nader on the second floor of Stanford’s Center for Integrated Systems (CIS). He asked me whether we can detect any electrical signal in the DNA structure and if yes, whether I could build the electronics for it,” Hassibi recalled.
Pourmand argued that DNA must have an associated charge, because it migrates during electrophoresis (the basis of Sanger sequencing). The next time the two met, Hassibi proposed placing some DNA near an electrode and connecting it to a high-impedance voltage amplifier to see if a length difference resulted in a different charge (or voltage) signature.
Hassibi designed a setup in the IC design lab of his then advisor, Thomas Lee, using iron needles as electrodes and micro-titer plates, while Pourmand prepared magnetic beads with primed DNA attached to them. “We placed a small refrigerator magnet on top of one of the needles to immobilize the beads (and DNA) on one electrode. We placed the electrodes in the polymerization buffer and added dNTP. What we saw wasn’t conclusive, but there were distinct fluctuations when polymerization was supposed to happen,” said Hassibi.
Hassibi and Pourmand, who both hail from Iran, began a close collaboration. Based on those early data, they filed a provisional patent in October 2001, which became a full patent the following year.
“We initially called the technology ‘charge-sequencing,’ inspired by pyrosequencing,” said Hassibi. [The Stanford docket S00-157 and provisional patent both had this title.] But Hassibi argued that “what we are seeing is not the DNA charge but essentially some perturbation in the charge equilibrium near the DNA (and near the electrode).” This, he says, is why the title of Hassibi and Pourmand’s US patent 7,223,540 is “Transient electrical signal based methods and devices for characterizing molecular interaction and/or motion in a sample.” Hassibi later called the technology Charge Perturbation Signature (CPS).
The core of Hassibi and Pourmand’s original patent, said Hassibi, is this: “If polymerization happens near an electrode, you see an explosion of ions. ‘Ion Torrent’ is a perfect name for a company commercializing this specific technology, although I am not sure if they named it because of this. Now, to simplify it, one can say it is pH. Initially, when we were marketing Xagros, we said, ‘It is negative charge.’ But the explanation is more complicated than that. These ions move, you have diffusive processes, then you detect it if they get to an electrode.”
Fall Out
In 2001, Hassibi and Pourmand decided to jump on the biotech start-up bandwagon and form a company called Xagros Genomics (named after the Zagros mountains in Iran, but with the obligatory ‘X’ instead). Xagros exclusively licensed docket S00-157 from Stanford and got funding from Tempo Ventures.
Hassibi’s task was to create the sequencing hardware, sensor, and the semiconductor chip, while Pourmand was in charge of assay development. A strong advisory board included Davis, Lee, and Berkeley’s Richard Mathies.
Hassibi said they obtained funding because “CPS was semiconductor-compatible... the story was music to the investors’ ears.” Pourmand, a family man, remained affiliated with Stanford, but Hassibi, still a graduate student, opted to join Xagros full-time. Sia Ghazvini joined the company from Combimetrix as CEO, with Dean Hafeman, a founding scientist at Molecular Devices, becoming chief technologist.
According to Hassibi, as assay development lagged other aspects of the technology, management decided to put Hafeman in charge of that project. “Nader first agreed and the project got on track partially and we all were very hopeful that we would pass this speed bump,” says Hassibi.
One day, however, Hassibi came to work to find that Pourmand had cleared out his desk. “He said that due to health reasons, he could not work in a start-up anymore,” Hassibi said. Hassibi and Pourmand later met in person at Stanford, but Pourmand was unhappy with the way Xagros was operating, and had decided to rejoin Davis’ group at Stanford. Pourmand encouraged Hassibi to do the same, which infuriated Hassibi.
“I told him I had quit my Ph.D. and put [in] 2.5 years of my life without back-up plans or getting any academic credit—and now he wants me to come back? I also mentioned that the investors relied on us two and had put serious money into Xagros and we were responsible for the other employees.” The two agreed to try to co-exist. “Our last handshake was that if we ever decide to publish this work, we would do it together,” said Hassibi. But that did not happen.
Hassibi said Pourmand refused to hand over government-funded projects on which he was listed as the PI, or to negotiate the status of his outstanding shares. A second round of company financing fell apart, and Xagros finally went out of business in 2004. Ultimately, the Stanford patents went back to Stanford, while other patents were abandoned (some related to CMOS chips), as there was no money to support them.
PNAS Envy
Hassibi eventually returned to Stanford to get his Ph.D. (designing CMOS chips for biosensing and sequencing), but had no contact with Pourmand and Davis. It was during his last semester in 2006 that a colleague showed him a paper by Pourmand and colleagues in the Proceedings of the National Academy of Sciences, contributed by Davis, entitled “Direct electrical detection of DNA synthesis.”
“I was appalled and could not believe
what I was reading in the paper,” said
Hassibi. “There was no reference to me or
Xagros—that pissed me off. Much of the
data and the methods were developed at
Xagros by me and others and [Pourmand]
simply published it without acknowledging any of it. It seems that they came up
with technology by themselves and they
are trying to erase [our contribution].”
Hassibi has not spoken to Pourmand
since the publication of the PNAS paper.
Hassibi subsequently learned that Pourmand and Davis had filed an incremental
patent “to overshadow” their 540 patent.
“They should have involved us as coinventors,” said Hassibi.
Pourmand had also resubmitted Xagros’s SBIR grants to receive NIH funding, which further upset Hassibi, but as
he was still an international graduate
student on an F1 visa, he decided not to
pursue any further action.
Pourmand Response
Contacted by %LRv,7 :RUOG, Pourmand
says he “highly respects” Hassibi’s work
and downplays any talk of a disagreement. He points out that the original
2001 patent “didn’t mention anything
about hydrogen [ions]. We saw the signal
generation based on incorporation of
nucleotides. We solved electrical detection. [The 2001 patent] showed that we
could detect electrically dNTP incorporation—but still at that time we couldn’t
understand where that comes from.”
Pourmand says he left Xagros because
the company was increasingly interested
in bioluminescence, prompting him to
return to Stanford to work on electrical
detection. “After Xagros went belly up,
I came back to Stanford, and continued
working on that without Arjang.”
“In 2004, we realized the signal we’re
detecting is hydrogen [ions]. In the original work with Arjang, we thought it was
pyrophosphate actually… Even the sensors, they say they originally designed, we
didn’t use that in the 2006 [PNAS] paper.
We used commercially available polarized
electrodes, amplifiers and so forth. Basically, we started from fresh.”
Jonathan Rothberg on the cover of Forbes magazine.
Pourmand insists he was “not ignoring his [Hassibi’s] work, absolutely not,”
in the PNAS paper. “It’s completely different.” He adds he would “absolutely”
have cited a paper that referred to his
earlier collaboration with Hassibi if one
existed—but it didn’t.
“I understand he feels I’m saying ‘it’s
my patent,’ but I’m particularly referring
to hydrogen detection and hydrogen
release, not the electrical signals,” says
Pourmand. “In the 2006 patent and
paper, we’re clearly claiming it is hydrogen [ions]. I don’t really care which system you’re using to detect it, the release of
hydrogen is important.”
Texas Shuffle
Hassibi is now based at the University of
Texas at Austin, where his research focuses on building new semiconductor chips
for life sciences applications.
Hassibi was late learning about Ion
Torrent’s interest in his intellectual property. In September 2009, Stanford OTL
sent a “conflict of interest” memo to the
Stanford Dean’s office on the proposed
licensing deal of the two aforementioned
dockets to Ion Torrent. (Davis was already
a member of Ion Torrent’s SAB at that
time.)
According to the OTL, the earlier S00-157 technology provided “a faster and
cheaper alternative to current methods of
DNA sequencing” by detecting variations
in the charge of immobilized DNA. The
S04-291 invention more specifically focused on sequencing “by detection of electric charge perturbations of polymerase-catalyzed reaction by the electrochemical
detection sensor with immobilized DNA.”
Stanford’s OTL concluded that Ion
was in “a strong position to successfully
commercialize” both technologies. Despite widespread marketing to dozens of
companies, only the 157 docket had been
previously licensed (to Xagros). Several
companies subsequently expressed interest in both technologies, but none took
a license until the deal with Ion Torrent.
“I expected Stanford to take some
equity, but I think they were convinced by
Rothberg et al. that this is the maximum
that they can get for it,” said Hassibi.
“Stanford OTL didn’t have an obligation
to involve me in the negotiations. I was
not the Stanford PI involved in this project at the time. Who would they talk to?
It would be Ron, but he had a conflict of
interest [as a member of Ion’s SAB]. My
anger is not at OTL, although they should
have got equity.”
“I want to give a lot of credit to Rothberg,” Hassibi added. “We never perceived, including Ron or Nader, putting
things in microwells. This is very important. Everything that was done in Stanford, as far as I know, has been based on
immobilizing DNA on a gold electrode.
Ion’s embodiment in terms of sample
loading and interfacing is quite different,
because they came out of 454, which is a
perfect match for this technology.”
Indeed, Hassibi may have reason to
be especially grateful to Rothberg. Last
year, he started raising money again.
“This was music to my ears: Rothberg
on cover of Forbes magazine was the best
marketing we needed! Companies like
Life Technologies are not going to have
a division of integrated circuit designers
creating these CMOS chips. They’re going
to outsource it.”
Hassibi’s new company, Insilixa, is
a fabless semiconductor company that
might count the next-generation of sequencing manufacturers among its future
clients.
IT / Workflow
Gordon Puts Flash into Data-Intensive Supercomputing
Calit2 director Larry Smarr offers solutions for high-throughput data management
BY KEVIN DAVIES
SAN DIEGO—A new supercomputer named Gordon at the San Diego Supercomputer Center (SDSC) of the University of California San Diego (UCSD), featuring a quarter of a petabyte of flash memory (hence the name) and dubbed “the world’s largest thumb drive,” earned raves from Larry Smarr in a wide-ranging talk about the future management of life sciences data.
Smarr has spent ten years building the California Institute for Telecommunications and Information Technology (Calit2), a joint program between UCSD and UC Irvine. Speaking at CHI’s XGen Congress*, Smarr urged organizations and universities to radically retool their approaches to computing infrastructure to facilitate collaboration, data sharing, telecommunication, and the handling of next-generation sequencing (NGS) data.
*XGen Congress, Cambridge Healthtech Institute, San Diego, March
Although he has long advocated the transformative nature of optical networking, he says, “the ability to have your own personal 10,000 megabit/second (Mbps) optical link is what we really need to deal with NGS machines. We’re trying to do data-intensive science on an infrastructure—the shared Internet—that was never meant for that.” But moving from the shared Internet to dedicated high-performance optical networks requires much else to change as well. “The last 100 feet aren’t there,” he said.
Many of the innovations Smarr and colleagues have deployed across the UCSD campus, featuring optical fiber, provide a shining example of the university campus of the future. High-definition video streams can be sent as live feeds from microscopes, and tiled LCD walls (driven by PCs with NVIDIA graphics cards) allow microscopy collages featuring 600 million pixels to be viewed.
Smarr says each optical fiber has independent infra-red channels, each providing 100-1,000 times greater data throughput compared to the existing Internet. And instead of 200 university campuses going through one channel, “I’m saying you should have one yourself.” For example, the National LambdaRail (with many 10GbE paths on its fibers) connects large data research centers in California and around the world.
Making the Switch
With Calit2 and the SDSC on its campus, it is no surprise that UCSD has jumped ahead in improving the campus cyberinfrastructure. UCSD now boasts 60 10GbE paths across campus, in parallel with the shared Internet, eliminating data bottlenecks. Users can choose which layer of the Internet to send data over using a simple three-level switch (0.5 terabits/sec). “Think about the clusters on campus and the space and energy they use,” says Smarr incredulously. “They’re completely isolated into islands, connected to the Internet at 10 megabits/sec. They’re 1,000:1 isolated from the rest of the world. You’re putting all your money into those instead of a fairly inexpensive optical switch? Whatever.”
UCSD has also brought optical fiber to the NGS facilities on the medical campus. “There’s nothing wrong with the shared Internet for email; it’s what it’s built for. But it’s not useful for where we’re going.” Trey Ideker, who heads the systems biology group, is starting to generate more NGS data (see “Groundbreaking Work”).
The UCSD campus has centralized data storage at SDSC that Smarr equated to the old library in the center of the campus. “Imagine a digital aquifer under the grass. All researchers get to use that data oasis. Then you plug in these 10 Gbps optical fibers.” Or imagine taking the output from an Illumina NGS instrument and putting it in RAM.
Smarr is no stranger to working with genomics data, having collaborated for years with Craig Venter on the CAMERA project, a global community research effort in microbial metagenomics (see “CAMERA Database Snaps into Action,” Bio-IT World, Apr 2007). CAMERA’s IT infrastructure boasts 512 processors, 5 TeraFlops, and 200 terabytes of storage.
“You can take your genome and BLAST it against the entire dataset. We now have more than 4,000 users in 90 countries, all connected to Calit2’s CAMERA cluster.”
If a researcher has a dedicated 10Gbps connection to Calit2, they can use uncompressed, high-def feeds at 1,500 megabits/second. “This avoids latency—the enemy of real-time collaboration. This is the kind of thing you can do once you have this infrastructure in place.”
Groundbreaking Work
The UCSD campus has a variety of NGS instruments connected to a Gigabit research backbone, where data are piped directly to servers administered by SDSC in conjunction with Calit2.
According to UCSD systems biologist Trey Ideker, the commercial NGS out-of-the-box IT solution “simply wasn’t allowing a quick enough turnaround time.” Depending on the width of the pipe, Ideker says it could take a day to transmit data out of the core facility. “That’s a problem,” he says. “We’d like to get rid of this information so you can reboot the sequencer and start the next run. The whole goal is to keep these machines working 24/7. To do that, you have to get the data off the temporary location in the core facility and quickly onto something else.”
The new model removes the need for the temporary location. “Even with a Gbit campus research network, it takes less than an hour to get NGS run data transferred. We can save that hour by writing data in real time to a remote location. The concerns are that if there’s a glitch in the network… you could lose a week’s worth of data. But that hasn’t really been a huge problem.”
Despite the use of optical fiber around the UCSD campus, “it’s often the last mile that is the problem,” says Ideker. “We may have this wide bandwidth in the ground for most of the necessary distance, but it’s getting it into the building… that is often the bottleneck.” In some cases, researchers have worked with campus networking staff to ensure that the “last mile” of cable and switches is in place.
Ideker doesn’t claim that UCSD has found a unique solution to handling NGS data, but says, “I think what Larry is doing with getting all this digital genomic information piped directly into the SDSC machine room is groundbreaking.” (No pun intended.) K.D.
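The gap between a 10 megabit/second campus connection and a dedicated 10 Gbps link is easy to quantify. This Python sketch computes ideal transfer times at full line rate; the 300 GB run size is a hypothetical figure chosen for illustration, and real transfers will be slower because of protocol overhead and local bottlenecks.

```python
def transfer_hours(gigabytes: float, link_mbps: float) -> float:
    """Ideal hours to move `gigabytes` (decimal GB) over a link of
    `link_mbps` megabits/second, ignoring protocol overhead."""
    bits = gigabytes * 8e9
    seconds = bits / (link_mbps * 1e6)
    return seconds / 3600


RUN_GB = 300  # hypothetical output of one NGS run, in gigabytes

links = [
    ("10 Mbps campus connection", 10),
    ("1 Gbps network", 1_000),
    ("10 Gbps dedicated optical link", 10_000),
]
for label, mbps in links:
    print(f"{label}: {transfer_hours(RUN_GB, mbps):.2f} h")
```

The 10 Mbps case takes about 67 hours, close to three days, against roughly four minutes on a 10 Gbps link: the same 1,000:1 ratio Smarr cites.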
“The cost of electricity is becoming unbearable,” said Smarr. UCSD is already a
40MW campus and additional computers
are becoming the most important driver
of higher electricity demands. Smarr is
part of an NSF grant, the GreenLight
Project, that is adapting Sun modular
data centers to measure a series of metrics
including temperature, airflow, etc. on
various applications running on various
architectures—from multicores to GPUs,
FPGAs, routers, and storage. “At the end
of the day, we have to know it costs this much
for electricity or CO2 production. You’ll
see this more and more. Universities have
got to get on top of electricity costs.”
Flash Gordon
Smarr calls the SDSC’s new 245-TeraFlop
supercomputer, Gordon, “the first
high-performance data computer in the
academic world. It has 256,000 GB [a
quarter of a petabyte] of flash memory,
that’s more flash in one place than anywhere in the world. We thank Steve Jobs
for making flash memory cheap enough!”
Smarr’s colleague Michael Norman,
SDSC director, says Gordon “will do for
scientific data analysis what Google does
for Web search.”
In a normal computer, with tens of
gigabytes of RAM, most data sits on
the disk. “But disk is 100X slower than
memory. You’re disk I/O limited, waiting
for the disk to get data to the RAM. Now
imagine you have terabytes of RAM. You
can put all your data in there at once.
Then algorithms completely change.”
Gordon has 32 nodes, each with 2 TB of RAM and 8 TB of flash SSD (solid-state drive), plus a 4-PB parallel disk farm (file system). “There’s nothing like it in
the world,” says Smarr. “When I think
about next-gen sequencing, Gordon is
the machine almost built for this. De
novo assembly will benefit from large
shared memory. This is not your father’s
supercomputer. It’s a high-performance
computer designed for data intensive science, just like supercomputers were optimized for solving differential equations.
Federations of databases and interaction
networks will benefit from low latency
I/O from Flash.”
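The headline flash capacity follows directly from the per-node figures. This Python sketch simply multiplies them out, treating “TB” as decimal terabytes (1 TB = 1,000 GB), which is an assumption about the article’s units.

```python
NODES = 32
RAM_TB_PER_NODE = 2
FLASH_TB_PER_NODE = 8

total_ram_tb = NODES * RAM_TB_PER_NODE      # aggregate RAM across nodes
total_flash_tb = NODES * FLASH_TB_PER_NODE  # aggregate flash across nodes
total_flash_gb = total_flash_tb * 1_000     # decimal units assumed

print(f"RAM: {total_ram_tb} TB, flash: {total_flash_tb} TB ({total_flash_gb:,} GB)")
print(f"flash as a fraction of a petabyte: {total_flash_tb / 1_000:.3f}")
```

The result, 256 TB of flash, matches the 256,000 GB (“a quarter of a petabyte”) figure quoted for Gordon.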
The construction of Gordon, funded
by a $20-million NSF grant, has also
benefited from the plummeting price of
10GbE switches. In 2005, Smarr said
the cost of a 10GbE port was around
$80,000. In 2011, a single port is less
than $1,000. Gordon will have 128 parallel channels, each 10GbE. “We now use
10GbE paths in the back-end like they’re
popcorn! 10G is the new 1G. Apple is
shipping MacBooks with two 10-Gbps
ports! People still act like 10GbE is a
lot—I don’t get it.”
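Smarr’s port prices can be put in the context of Gordon’s network. This Python sketch assumes, for illustration only, one switch port per 10GbE channel and uses the $1,000 ceiling as the 2011 price.

```python
PORT_COST_2005 = 80_000  # USD per 10GbE port in 2005, per Smarr
PORT_COST_2011 = 1_000   # USD ceiling per port in 2011, per Smarr
CHANNELS = 128           # Gordon's parallel 10GbE channels
GBPS_PER_CHANNEL = 10

price_drop = PORT_COST_2005 / PORT_COST_2011
aggregate_gbps = CHANNELS * GBPS_PER_CHANNEL

print(f"price drop: at least {price_drop:.0f}x")
print(f"aggregate bandwidth: {aggregate_gbps:,} Gb/s")
print(f"{CHANNELS} ports at 2005 prices: ${CHANNELS * PORT_COST_2005:,}")
print(f"{CHANNELS} ports at 2011 prices: under ${CHANNELS * PORT_COST_2011:,}")
```

At 2005 prices the ports alone would have consumed about half of Gordon’s $20-million NSF grant; at 2011 prices they come to well under 1% of it.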
Smarr had some less positive views on
Cloud computing, however. “The Cloud is
not set up for terabyte or gigabyte files,” he
said. “You can get there, but once you’re
inside, there isn’t the SDSC 10GbE farm
to move your data around. How much
to get it back out? What do you pay for
egress and exit? There are lots of developments necessary for commercial clouds to
be useful for science.”
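Smarr’s question “How much to get it back out?” has a time component as well as a price. The minimal sketch below estimates transfer time for a data set over a given link. The `efficiency` parameter is an assumption standing in for protocol overhead and contention. Provider egress fees vary and are not modelled here.

```python
def transfer_hours(dataset_tb, link_gbps, efficiency=1.0):
    """Hours to move dataset_tb terabytes (decimal) over a link_gbps link.

    `efficiency` is an assumed fraction of line rate actually achieved
    (protocol overhead, contention, disk speed); 1.0 is the ideal case.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = dataset_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600

for gbps in (1, 10):
    print(f"1 TB over {gbps} Gb/s: {transfer_hours(1, gbps):.2f} h at line rate")
```

Even at ideal line rate, a terabyte takes over two hours on 1 Gb/s and over 13 minutes on 10 Gb/s. Real transfers at lower efficiency take proportionally longer.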
“You need to understand you have a
problem,” Smarr continued. “I have data!
It’s exponentially growing. It just boggles
my mind how otherwise intelligent places
aren’t dealing with it. It’s hard—you have
to bring together experts who normally
don’t talk to each other—biologists, computer scientists, engineers. You have to
bring together... the School of Medicine,
the campus, networking/storage, departments. Now it’s a collective problem. No one has enough money themselves.”
“People don’t think about exponentials, but they make the impossible routine as we go through the threshold you
care about. It’s impossible to plan for. It
cost a couple of billion dollars for the first
human genome. Now it’s $1,000?! That’s
a factor of 1 million in ten years. Over that
time, Moore’s Law is 1,000. It’s the square
of Moore’s Law.”
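Smarr’s “square of Moore’s Law” claim is easiest to see in terms of doublings. Squaring an improvement factor doubles the number of doublings behind it. The sketch below uses his round numbers as given: roughly $2 billion down to $1,000 over ten years, and 1,000x for Moore’s Law. It does not verify those figures. For comparison, the commonly cited 18- to 24-month doubling period gives only about 32x to 100x per decade, so his Moore’s Law figure corresponds to doubling roughly once a year.

```python
import math

def doublings(factor):
    """Number of successive doublings needed to achieve an improvement factor."""
    return math.log2(factor)

def months_per_doubling(factor, years):
    """Implied doubling time, in months, for `factor` achieved over `years`."""
    return years * 12 / doublings(factor)

# Smarr's round numbers: "a couple of billion dollars" down to $1,000 in ten years.
genome_factor = 2e9 / 1e3   # ~2 million, the same order as his "factor of 1 million"
print(f"Sequencing cost: {doublings(genome_factor):.1f} doublings, "
      f"about one every {months_per_doubling(genome_factor, 10):.1f} months")

# His "Moore's Law is 1,000" over ten years implies ~10 doublings (2**10 = 1,024).
print(f"Moore's Law at 1,000x: {doublings(1_000):.1f} doublings in 10 years")

# "The square of Moore's Law": squaring a factor doubles the number of doublings.
print(math.isclose(doublings(1_000 ** 2), 2 * doublings(1_000)))
```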
“The 10GbE data superhighway is
coming into being. NGS is its most important application for science, because
of the democratization of sequencing.”
NVIDIA Unveils New
Flagship GPU Processor
New Tesla is fastest processor for HPC market
BY KEVIN DAVIES
NVIDIA has released the latest version of its flagship GPU (graphics processing unit), the Tesla M2090. Company executives claim it is the fastest processor for high-performance computing (HPC) on the market, accelerating applications and offering a 20-30% increase in speed and performance compared to its predecessor, the M2070.
According to Sumit Gupta, NVIDIA’s Tesla product line manager, “life sciences is our #1 vertical” in terms of widespread adoption and the number of users. Applications range from molecular dynamics to genome sequence analysis, with at least one next-generation sequencing company using GPUs in its instruments.
The Tesla M2090 GPU is equipped with 512 CUDA parallel processing cores, delivering 665 gigaflops of peak double-precision performance and providing application acceleration up to 10x compared to a CPU alone.
At the same time, HP is announcing the release of a new server featuring 8 NVIDIA GPUs, the HP ProLiant SL390 G7 4U server. The SL390 family is built for hybrid computing environments that combine GPUs and CPUs. The SL390 G7 4U server incorporates up to eight Tesla M2090 GPUs in a 4U chassis. With a configuration of 8 GPUs to two CPUs, this server has the highest GPU-to-CPU ratio currently available, says Gupta. (Just a few years ago, no server could take even 1 GPU.) The ideal configuration would be 1 CPU core to 1 GPU. “We’re not there yet with this server, but getting closer,” says Gupta. (For the record, Gupta notes that Dell has an extension box that can take up to 16 GPUs, but this is not a single server; it has to connect to another machine.)
Gupta says most customers work with OEMs; NVIDIA doesn’t sell direct, but helps move applications to a GPU. “We’re trying to learn from users about the science they’re trying to do with GPUs,” says Gupta. Besides HP and Dell, NVIDIA also works with SGI, Supermicro, IBM, Tyan and others.
“HP is a very high-volume OEM. They only build systems like this when they believe there’s a very wide market for them,” says Gupta. While OEMs typically determine pricing, Gupta says it is possible to buy a GPU server with 4 GPUs for less than $10,000. “It’s essentially in the $5,000-$10,000 range to buy a server fully equipped,” says Gupta.
The benefits of GPUs can be found both in enhanced performance and accessibility. Mark Berger, NVIDIA’s specialist in life and material sciences, recently joined the company after working in drug discovery with Cytokinetics. “I see huge momentum in GPUs. There’s a real wind in our back, with a lot of people in academia and software development in national labs and software companies working on GPU versions,” says Berger.
To showcase the performance of the M2090, Gupta cites work using the popular AMBER 11 molecular dynamics software. “Using 4 GPUs, you can now simulate 69 nanoseconds [of molecular dynamics] per day,” says Gupta. Previously, this kind of simulation would require access to a supercomputer in a national laboratory, such as KRAKEN, the 192-quad-core CPU supercomputer at the Oak Ridge National Laboratory, which held the previous simulation record at 46 ns/day.
“This is the fastest result ever reported,” says Ross Walker, a researcher at the San Diego Supercomputer Center who did the AMBER benchmarking. “AMBER users from a university department can now accelerate their scientific work as if they had a supercomputer in their own lab.”
(Photo: NVIDIA Tesla M2090)
Other life sciences customers include Boston Scientific (magnetic resonance imaging), Max Planck Institute (3-D electron cryo-microscopy), Massachusetts General Hospital (imaging), and OpenEye.
Gupta adds: “It democratizes access to this software to every researcher around the world. You don’t have to write a grant proposal to get access to a supercomputer.” Similar analyses and results are being obtained by David E. Shaw and colleagues, but Gupta points out that their work is performed on a custom supercomputer, Anton.
There are several bioinformatics applications already running on GPUs, including BLAST, Hidden Markov Models, and MATLAB. “Users can get real performance and quite easily port their applications to the GPU,” says Gupta. “The toughest task is that most applications are written with a sequential mind frame; CPUs are inherently sequential. Users have to rethink some of the applications to take advantage of the GPU acceleration and parallel processor.”
A key question facing potential users is whether they have to modify the entire application. “The answer is no,” says Gupta. “When I open a photograph on my hard disk, this is a fairly sequential task, suitable for a CPU. Once a photo is open, you might want to do red-eye reduction, autofocus, etc. Those tasks modify each pixel mathematically. That’s extremely amenable to GPU. That’s the only part of Picasa you’d have to port to a GPU. Now take sequence search software. Reading the database and opening the sequences can continue to run on the CPU. But the search gets accelerated by GPUs.”
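Gupta’s point is that only the independent, per-item part of an application needs porting. The sketch below shows that split on a toy sequence search. It is not CUDA and not BLAST. FASTA parsing stays sequential, while a hypothetical per-record scoring “kernel” (best ungapped match count) is fanned out to a thread pool standing in for the GPU. In CPython, threads will not give real speedup for this pure-Python work because of the GIL. The structure, not the speed, is what the example illustrates.

```python
from concurrent.futures import ThreadPoolExecutor

def parse_fasta(text):
    """Sequential step (the part that can stay on the CPU):
    split FASTA text into (name, sequence) records."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

def best_ungapped_score(query, target):
    """Hypothetical per-record kernel: most matching positions over all
    ungapped offsets. Each call is independent, hence data-parallel."""
    best = 0
    for offset in range(len(target) - len(query) + 1):
        window = target[offset:offset + len(query)]
        best = max(best, sum(a == b for a, b in zip(query, window)))
    return best

def search(query, fasta_text, workers=4):
    records = parse_fasta(fasta_text)  # sequential I/O and parsing
    # Stand-in for GPU offload: fan the independent per-record work out to a pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda rec: best_ungapped_score(query, rec[1]), records))
    hits = [(name, score) for (name, _), score in zip(records, scores)]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

fasta = ">seq1\nACGTACGT\n>seq2\nTTTTACGA\n>seq3\nGGGG\n"
print(search("ACGA", fasta))
```

In a real GPU port, only `best_ungapped_score` (or its equivalent) would be rewritten as a device kernel, mirroring the Picasa example where only the per-pixel filters move to the GPU.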
IT / Workflow
Panasas ActiveStor Storage Goes to 11
NGS storage product seeks to balance performance, capacity, and cost
BY KEVIN DAVIES
Panasas unveiled the latest version of its ActiveStor storage product line at the International Supercomputing Conference in Germany in June. The ActiveStor 11 product features 3-terabyte (TB) enterprise drives without a price premium. The California company hopes it will prove an attractive offering for life sciences customers in general, and next-generation sequencing (NGS) applications in particular.
Panasas’ background is in technical computing. It boasts five consecutive years of revenue growth (42% in 2010) as it pushes into new markets from its traditional strengths in energy, finance/risk analysis, universities, and government/defense.
“Our customers tend to start by buying 1-2 shelves of our storage, and then become loyal customers over time and expand that footprint. That’s one of the attributes that comes with having such a scalable system,” says Geoffrey Noer, Panasas’ senior director of product marketing.
(Photo: Panasas ActiveStor 11)
The introduction of ActiveStor 11, which nestles between the top-of-the-line ActiveStor 12 (launched last year) and the more affordable ActiveStor 8, should appeal to life sciences organizations. Existing clients include NIH, Yale University, BGI in China, and Uppsala University in Sweden. “We left a gap so we could introduce 11 after 12,” Noer explains, adding that he expects ActiveStor 11 to represent the bulk of sales going forward.
Performance Issues
The ActiveStor 12 delivers 80 megabytes/sec per SATA drive. “Where performance is the top factor, this is the solution,” says Noer. The ActiveStor 11 is some 20% less expensive. “Some markets need more of a balance between performance and capacity; that’s where ActiveStor 11 is available at a more attractive cost. We’re confident that will help us in life sciences,” says Noer. It represents “the lowest dollar/TB option for all three models.”
Until now, Panasas products carried up to 40 TB/chassis. But that has now expanded to 60 TB/chassis with the use of 3-TB drives, which has a substantial impact on scaling. “We now scale to 6 petabytes in a single file system.”
“Performance has been in Panasas’ DNA from the very beginning. That hasn’t been as much of a core need for a competitor like Isilon in their prior markets,” says Noer. “Institutions are deploying more and more NGS machines, with faster run times. Workloads are becoming about large-file throughput rather than millions of small files. So NGS is a perfect application for ActiveStor storage. You have a blade design that allows you to grow as needed: a single shelf can stand on its own, but it takes less than ten minutes to grow capacity as you need it to. And you can maintain a single global namespace.”
Noer says several genome institutes are using Panasas for high-performance needs, but also using storage from Isilon (which he admits offers a much lower dollar/TB footprint) for their bulk capacity requirement. “It can make sense to have both installed,” he says.
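The capacity figures Noer quotes fit together with simple arithmetic. The sketch below assumes decimal units and assumes shelf capacity is exactly drive count times drive size. The throughput line is explicitly hypothetical: it applies the ActiveStor 12 per-drive rate to a 20-drive shelf and assumes rates add linearly, which real systems rarely achieve.

```python
# Capacity arithmetic for the Panasas figures quoted above (decimal units).
OLD_TB_PER_SHELF = 40   # "up to 40 TB/chassis" before the 3-TB drives
NEW_TB_PER_SHELF = 60   # "60 TB/chassis" with 3-TB drives
DRIVE_TB = 3
MAX_FILE_SYSTEM_PB = 6  # "scale to 6 petabytes in a single file system"

drives_per_shelf = NEW_TB_PER_SHELF // DRIVE_TB
capacity_bump = NEW_TB_PER_SHELF / OLD_TB_PER_SHELF - 1
shelves_at_max = MAX_FILE_SYSTEM_PB * 1_000 // NEW_TB_PER_SHELF

# Hypothetical: if the ActiveStor 12 per-drive rate (80 MB/s) applied to a
# 20-drive shelf and simply added up, this would be the streaming ceiling.
naive_shelf_mb_per_s = 80 * drives_per_shelf

print(f"{drives_per_shelf} drives per shelf, {capacity_bump:.0%} more capacity per shelf")
print(f"{shelves_at_max} shelves to reach {MAX_FILE_SYSTEM_PB} PB")
print(f"Naive per-shelf ceiling: {naive_shelf_mb_per_s} MB/s (assumes linear scaling)")
```

The 50% result matches the “50% bump in capacity” Noer cites for the move to 3-TB drives.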
Private Clouds
Panasas is actively looking at the private
cloud to further its momentum, a move
applauded by IDC analyst Earl Joseph,
who says that the “ActiveStor 11 appliance
is well positioned to capitalize on this
important [HPC] trend.”
But at least one top bio-IT industry
consultant, BioTeam’s Chris Dagdigian, blasted private clouds as
“empty hype” at the Bio-IT World
Conference & Expo a couple of
months ago.
“With all due respect to [Dagdigian], I see the public cloud
as being more of an overhyped
approach than private clouds,”
responds Noer. “Our products
are the best suited for big data
workloads, typically hundreds of
terabytes or petabytes of storage.
That data is valuable and highly
proprietary. Trying to leverage the
public cloud fails on several reasons—the cost of the bandwidth
is out of sight, and you have a lot
of security and performance concerns
having the data remote... We don’t see a
lot of traction for big data workloads in
public clouds.”
Noer acknowledges that “private
clouds” is a new marketing name for what
was previously labeled utility computing
or grid computing. “But the trend has
been taking place for many years, before
the term ‘private cloud’ was invented.
The desire to centralize is a very real direction and gaining momentum,” he says.
He also admits that Panasas has been
criticized in the past for an unapproachable
price/TB. “Now, with the 50%
bump in capacity with the 3-TB drives
in addition to cost reductions we’re announcing and ActiveStor 11, all those
things make Panasas a very attractive
option.”
The Russell Transcript
DREAM6
Breaks New
Ground
JOHN RUSSELL
Change is in the wind for DREAM.
Roughly five years ago the organizers of DREAM (Dialogue for Reverse Engineering Assessment and Methods) set out to find the best algorithms for inferring biological networks from blinded data sets. Emulating the CASP* program, they created an annual competition in which researchers downloaded data for a set of challenges and used their favorite algorithms to solve the problem. Winners were announced at the annual DREAM conference, and the results were published in an effort to create a valuable resource.
A funny thing happened along the way. Actually, two important things. First, it turns out there is no such critter as the perfect algorithm. “Data itself is so high dimensional, the biology itself is so complex, that probably the notion of finding the best algorithm to analyze a data set was a little too simplistic,” says Gustavo Stolovitzky, DREAM chair and manager of functional genomics and systems biology at IBM’s Computational Biology Center.
Potentially much more important, it also turned out the aggregate prediction of competing groups was nearly always the best prediction or in the top three. This unexpected result, highlighting the wisdom of the crowd, is prompting DREAM to rethink its mission and begin seeking to put “collaboration by competition” to work solving real-world problems rather than chase down better algorithms.
It’s a fascinating finding, which might have great utility in unraveling thorny basic biology questions as well as in early drug discovery.
“The big encompassing lesson is that without a doubt there is wisdom of crowds. Consistently the aggregate of the predictions is really robust with respect to any of the other individual predictions; very often it is better than the best, and when it’s not, it is among the best,” explains Stolovitzky.
Stolovitzky says, “There are many ways of aggregating predictions. The one we have been using is very robust.” It’s a little complicated, so with apologies to Gustavo, here’s an attempt to simplify and summarize it: Each competing team produces an overall solution to a challenge (e.g., a fairly granular description of a particular signaling pathway or gene regulatory network). They rank each of their solution’s components (e.g., an interaction between two genes) in terms of their confidence it is correct. One interaction may be ranked high while another may be ranked low. A team’s overall solution may or may not perform well.
Stolovitzky averages the confidence rankings for each interaction from all the teams and produces a new aggregate solution to the challenge by re-ranking the interactions according to their average rank. Obviously there’s a bit more to it, but you get the broad picture.
Advanced Aggregate
One interesting aspect of this aggregation approach is that even algorithms that perform poorly overall may get a particular interaction right and be captured in the aggregate prediction. In one DREAM4 challenge, 11 of 12 groups identified a new interaction inferred from the data that was included in the aggregate prediction, although many of the teams’ overall predictions were poor. “Even suboptimal algorithms have a place in the zoo of algorithms,” he says.
It’s collaboration by competition, says Stolovitzky. “So people try to do something against each other, but unbeknownst to them, when you aggregate their predictions on exactly the same data you are really making them collaborate. In that aggregate prediction is where the wisdom of the crowd can emerge.” All of a sudden, “It is not worth it to try to develop the best algorithm if the aggregate is always the best.”
Instead, perhaps, problem selection becomes more important. This doesn’t mean work on developing great algorithms is worthless; it’s not, emphasizes Stolovitzky, but it becomes secondary. Tackling real-world problems becomes more enticing and impactful. One issue is incentivizing the activity. “People like to be the best at something,” notes Stolovitzky.
Recently, the challenges for DREAM6 were posted (http://the-dream-project.org; the submission deadline is August 22, and winners will be announced October 14). This year, Stolovitzky says there is no specific network inference challenge while the DREAM team mulls over its future direction. However, there is one on diagnosing acute myeloid leukemia from patient samples using flow cytometry data.
*Critical Assessment of Techniques for Protein Structure Prediction
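The rank-averaging scheme summarized in this column can be sketched in a few lines. This is a plain Borda-style average. The “bit more to it” that DREAM actually does is not reproduced. The interaction labels (“A-B” and so on) are hypothetical, and the sketch assumes every team ranks the same set of interactions.

```python
from collections import defaultdict

def aggregate_rankings(team_rankings):
    """Combine several teams' ranked predictions by average rank.

    Each element of `team_rankings` is one team's list of interactions ordered
    from most to least confident. Every team is assumed to rank the same set of
    interactions. Returns the interactions re-ranked by their mean rank
    (lower mean rank = stronger consensus), ties broken alphabetically.
    """
    if not team_rankings:
        raise ValueError("need at least one team's ranking")
    expected = set(team_rankings[0])
    rank_sums = defaultdict(int)
    for ranking in team_rankings:
        if set(ranking) != expected or len(ranking) != len(expected):
            raise ValueError("all teams must rank the same interactions exactly once")
        for rank, interaction in enumerate(ranking, start=1):
            rank_sums[interaction] += rank
    n_teams = len(team_rankings)
    mean_rank = {edge: total / n_teams for edge, total in rank_sums.items()}
    return sorted(mean_rank, key=lambda edge: (mean_rank[edge], edge))

teams = [
    ["A-B", "B-C", "C-D", "A-D"],  # team 1
    ["B-C", "A-B", "A-D", "C-D"],  # team 2
    ["A-D", "A-B", "B-C", "C-D"],  # team 3
]
print(aggregate_rankings(teams))
```

No single team produced the aggregate order here. It emerges from combining all three, which is the “collaboration by competition” effect described above.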
Educational Opportunities
Keep abreast of the variety of educational events in the life science industry that will help you with your business and professional needs. To preview a more in-depth listing of educational offerings, visit the “Events” section of bio-itworld.com.
To list an educational event, email marketing_chmg@chimediagroup.com.
Featured Events
CHI Events
Bio-IT World Conference & Expo Europe
October | Hannover, Germany
NGS Data Management
October | Hannover, Germany
Next-Generation Sequencing Data Management
September | Providence, RI
IT Infrastructure and the Cloud
October | Hannover, Germany
Drug Discovery Informatics
October | Hannover, Germany
Bioinformatics
October | Hannover, Germany
Molecular Diagnostics Summit Europe
October | Hannover, Germany
Emerging Molecular Diagnostics Partnering Forum
August | Washington, DC
Molecular Diagnostics for Cancer
October | Hannover, Germany
Next Generation Diagnostics Summit
August | Washington, DC
Convergence of Technologies for Point-of-Care Diagnostics
October | Hannover, Germany
ADAPT 2011
September | Philadelphia, PA
Molecular Diagnostics for Infectious Disease
October | Hannover, Germany
Cloud Computing: Looking Beyond the Cloud
September | La Jolla, CA
NGS: Molecular Diagnostics Magnified
October | Hannover, Germany
For more information on these conferences and other CHI events, visit healthtech.com.
Barnett Educational Services
Visit BarnettInternational.com for detailed information on Barnett’s live seminars, interactive web seminars, onsite training programs, customized eLearning development services, and publications.
Web Seminars
Regulatory Intelligence
August
Source Documentation: What is Adequate & Accurate?
August
Writing and Maintaining the Canadian CTA (Clinical Trial Application)
August
Sponsor Management of Investigator Non-Compliance
August
Introduction to Data Management
August
Introduction to Signal Detection and Data Mining
August
Comparing FDA and Health Canada Regulations: Using an ICH GCP Framework
August
Gap Analysis: How to Bridge the Non-Approvable to the Approved Marketing Application
August
10 Week CRA & CRC: Beginner Program
September
Monitoring Phase I Clinical Trials
August
Live Seminars
How to Prepare and Submit a Bullet Proof 510(k)
August | San Francisco, CA
Monitoring Clinical Drug Studies: Intermediate
August | San Francisco, CA
Adverse Events: Managing and Reporting for Pharmaceuticals
September | Chicago, IL
Conducting Clinical Trials in Resource-Limited Settings
August | San Francisco, CA
Clinical Drug Development
September | Chicago, IL
Introduction to Clinical Data Management
September | Boston, MA
Introduction to Clinical Project Management
September | Boston, MA
Negotiation Skills for Clinical Research Professionals
September | Philadelphia, PA
Patient Recruitment and Retention
September | Philadelphia, PA
Pharmacovigilance Audit
September | Philadelphia, PA
eLearning Solutions
• Highly interactive features with adult learning in mind
• Content tailored to your unique needs
• Core competency assessments and exams
Webcasts, White Papers, and Podcasts
Visit bio-itworld.com to browse our extensive list of complimentary Life Science white papers, podcasts, and webcasts.
Webcast
The Power of HP Converged Infrastructure for Genomic Research
Sponsored by HP
HP aids the genomics research process by providing scalable storage solutions that simplify data analysis and quick access to that data so that the goal of personalized medicine can be realized. Learn how HP has worked with life sciences professionals to meet the most demanding computational and storage needs.
Visit: www.bio-itworld.com to download
Enabling Better Data Relationships Utilizing Oracle 11g New Oracle Data Miner GUI, and Applications “Powered by ODM”
Sponsored by: Oracle
IT departments of all sizes increasingly rely on self-managing systems to help overcome challenges with lower cost and risk. By using embedded Oracle technology as transparent building blocks in applications or devices, ISV and OEM solution developers can offer life sciences robust data management capabilities. Software developers viewing this live webinar can learn how Oracle embeddable products make it easier to develop, manage, and deploy secure, reliable, and scalable customer solutions. On the End User side, Oracle Database technologies and applications are running in all the top … life sciences companies and top … medical device companies. Users viewing this live webinar can learn how ODM automatically discovers relationships hidden in data and how predictive models and insights discovered with Oracle Data Mining address life sciences, healthcare, and business problems.
Visit: www.bio-itworld.com to download
Whitepapers
A BPM-Approach to Adverse Event Management
Sponsored by Pegasystems
Safety management is one of the most difficult requirements imposed on the life sciences industry. Companies confront a tangle of safety monitoring requirements that span both pre- and post-market approval activities and vary by IRB/IEC governance policies, product type, and different global regulatory agencies. Learn how Pegasystems BPM and its Adverse Event Case Processing Solution (AECP) can help companies:
• Transform adverse event management systems
• Lower cost with increased productivity
One client successfully utilized Pega BPM to establish paperless adverse event reporting across … countries, resulting in upwards of a … increase in productivity with near … cost reduction.
www.pega.com
Visit: www.bio-itworld.com to download
Surfing the Rich Data Deluge: Developing an IT Strategy
Sponsored by Tessella
Today’s biopharma companies are drowning under a deluge of digital images and other rich data. From Next-Gen DNA sequencing to high-content screening (HCS), these rapidly evolving technologies are opening important avenues of scientific exploration across drug discovery and development. However, they also confront IT organizations with the challenge of managing large, diverse data sets. Learn the critical issues and steps required to develop effective IT strategies for managing and maximizing the value of image and rich data in this white paper.
Steps Toward Developing an Effective IT Strategy, by John Russell, Contributing Editor, Bio-IT World. Produced by Cambridge Healthtech Media Group Custom Publishing.
www.tessella.com
Visit: www.bio-itworld.com to download
Podcast
Metrics that Matter: How Actionable Data Can Drive Better Decisions
With the widespread adoption of eClinical technology, clinical operations departments have access to unprecedented amounts of data. This podcast will discuss how clinical business analytics can help sponsors effectively mine that data to make more informed decisions.
Industry executives will address the following questions:
• What are the key technical and organizational challenges to efficiency in clinical operations?
• How is the definition of actionable data evolving?
• How can clinical business analytics be an agent of change for an organization?
• What technical and organizational issues should sponsors consider when seeking a business analytics solution?
Speakers: Stephen Young, Senior Product Director, Medidata Solutions; and Laurie Halloran, CEO/President, Halloran Consulting Group
Listen Now: Visit www.bio-itworld.com to download
Web Symposia Series covers a broad array of topics within the life sciences and drug development enterprise.
• Register for upcoming web symposia
• Listen to recorded web events
• Purchase a DVD or Electronic Version
• Sponsor a symposium on a topic of your choice
For details on the Web Symposia Series visit www.bio-itworldsymposia.com or email marketing_CHMG@chimediagroup.com
To learn more about developing a multimedia lead generating solution, contact marketing_chmg@chimediagroup.com.
Register by August 12 & Save up to $200!
CAMBRIDGE HEALTHTECH
INSTITUTE’S INAUGURAL
September 19-20, 2011
LOOKING BEYOND THE CLOUD
Boosting Life Science Research and Drug Discovery
with Ubiquitous High Performance Computing
The Hilton La Jolla Torrey Pines
La Jolla, CA
Focused Sessions on:
• High Performance Computing in the Cloud
• Science-as-a-Service
• Genomics in the Cloud
• Pharma Adopting the Cloud
• Ubiquitous Personal Health Service
Keynote Presentation:
How We Got Here, Where We Are, and Where We Are Heading
Jeff Barr, Web Services Evangelist, Amazon.com
Don’t Miss - Pre-Conference Events:
• Cloud Computing Training: Amazon Web Services
• Cloud Computing and Genome Content Management Driving Translational Bioinformatics through the Next Decade
• Orchestrating Cloud Systems and Workflows with Opscode Chef
• Ensuring Information Security and Compliance When Moving into the Cloud
Organized by:
Cambridge Healthtech Institute
250 First Avenue, Suite 300, Needham, MA 02494
T: 781-972-5400 or toll-free in the U.S. 888-999-6288
www.healthtech.com
Bio-ITCloudSummit.com