Research proposal
HIGH PERFORMANCE COMPUTING FOR MOLECULAR DYNAMICS ANALYSIS: PART OF INDONESIAN HERBAL PHARMACOLOGICAL SCREENING ACTIVITIES (IN SILICO STUDY)
Heru Suhartanto1, Arry Yanuar2, Ari Wibisono1, Muhammad Hilman1, Surya Darma3
1: Faculty of Computer Science, University of Indonesia (UI)
2: Department of Pharmacy, UI
3: Department of Physics, UI
Presented at SEAIP 2010. It will be available at http://hsuhartanto.wordpress.com

OUTLINE
• Introduction
• Molecular docking and dynamics
• Indonesian Higher Education Network
• InGrid (Indonesian/Inherent Grid)
• Tentative HPC performance in molecular dynamics
• Challenges and prospects

MOLECULAR DYNAMICS
[Figures: molecule (Source: 4); atom (Source: 6); protein (Source: 5)]

MOLECULAR DYNAMICS
[Diagram: molecular dynamics simulation; drug discovery; understanding molecule structure; trajectory (position, momentum, time); in vitro (Source: 10)]

PROTEIN SIMULATION
[Diagram: curcumin from Curcuma longa; inhibitor; anti-cancer compound; inflammation]

MOLECULAR DOCKING AND VIRTUAL SCREENING
• Molecular docking is a computational procedure that attempts to predict the non-covalent binding of macromolecules.
• The goal is to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure.
• The prediction is based on the information embedded in the chemical bonds of the substances.
• AutoDock Vina is used in the simulation.

THEORETICAL ASPECTS OF DOCKING: CONFORMATION SEARCH & SCORING FUNCTION
The conformation search between ligand and receptor uses a specific search algorithm, and each candidate conformation is evaluated with a scoring function.

MOLECULAR DYNAMICS SIMULATION
Molecular dynamics simulation can be:
• used to study protein solvation, the interactions of DNA-protein complexes and lipid systems, ligand binding, and protein folding;
• used to produce the trajectory of molecules over a finite time period, in which each molecule in the simulation has position and momentum parameters;
• used to assist drug discovery.
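A trajectory of positions and momenta is produced by integrating Newton's equation of motion, m·d²x/dt² = F(x), one small time step at a time. The Python sketch below shows the idea for a single particle in one dimension using the leap-frog scheme (GROMACS's default `md` integrator is a leap-frog integrator). Assumptions: the harmonic force `harmonic_force`, the unit mass and spring constant, and the time step are illustrative choices, not GROMACS parameters; a real simulation integrates 3N coordinates with forces from a force field.

```python
import math


def harmonic_force(x: float, k: float = 1.0) -> float:
    """Force F = -dV/dx for an illustrative potential V(x) = k * x**2 / 2."""
    return -k * x


def leapfrog(x0: float, v0: float, mass: float, dt: float, n_steps: int,
             force=harmonic_force) -> list[tuple[float, float, float]]:
    """Integrate m * d2x/dt2 = F(x) with the leap-frog scheme.

    Returns the trajectory as (time, position, momentum) tuples.
    """
    x = x0
    f = force(x0)
    # Leap-frog stores velocities at half steps, so start from v(-dt/2).
    v_half = v0 - 0.5 * dt * f / mass
    trajectory = [(0.0, x0, mass * v0)]
    for step in range(1, n_steps + 1):
        v_half += dt * f / mass      # v(t + dt/2) = v(t - dt/2) + F(t)/m * dt
        x += dt * v_half             # x(t + dt)   = x(t) + v(t + dt/2) * dt
        f = force(x)                 # force at the new position
        v_full = v_half + 0.5 * dt * f / mass  # estimate of v(t + dt) for reporting
        trajectory.append((step * dt, x, mass * v_full))
    return trajectory


if __name__ == "__main__":
    # One oscillation period (t ~ 2*pi) with x(0) = 1, v(0) = 0.
    traj = leapfrog(1.0, 0.0, 1.0, 0.01, 628)
    t, x, p = traj[-1]
    print(f"t = {t:.2f}: x = {x:.5f} (exact {math.cos(t):.5f})")
```

For this test system the exact solution is x = cos t, and the integrated positions track it closely. Halving `dt` roughly quarters the error, which is why MD time steps are kept small (on the order of femtoseconds) and why long simulated times require so many steps.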
The use of computers offers an in silico method that complements the in vitro and in vivo methods commonly used in drug discovery. The term in silico, by analogy with in vitro and in vivo, refers to the use of computers in drug discovery studies.
• GROMACS is used in the simulation.

MOLECULAR DYNAMICS
[Diagram: drug discovery; in vitro; molecular dynamics: protein information, conformations, enzyme activity]

GROMACS
GROMACS (Groningen Machine for Chemical Simulations), University of Groningen, the Netherlands.
Molecular dynamics is one way to assess the movement of a molecular system according to the laws of physics.

STAGES IN MOLECULAR DYNAMICS SIMULATION WITH GROMACS

NEWTON
The movement of molecules follows Newton's equation of motion, written for each atom of the molecular system:
$$m_i \frac{d^2 \mathbf{r}_i}{dt^2} = \mathbf{F}_i, \qquad i = 1, 2, \ldots, N$$
where $\mathbf{r}_i$ is the coordinate of atom $i$, $\mathbf{v}_i = d\mathbf{r}_i/dt$ its velocity, $m_i$ its mass, and $\mathbf{F}_i$ the force acting on it.

PRELIMINARY TEST
• GROMACS on InGrid, Rad Gem system: a 10 ns experiment/simulation needed 14 days on 5 processors.

AN ILLUSTRATION
[Diagram: molecular dynamics (10 ps) → molecular docking (binding constant = pharmacological activity); Arry Yanuar]

LONGER SIMULATION TIME NEEDS MORE COMPUTATION RESOURCES
CPUs | Days, for increasingly long simulations
1 | 0 | 41.7 | 206.3 | 1250 | 3750
4 | 0 | 10.4 | 52.1 | 512.5 | 937.5
8 | 0 | 5.2 | 26.0 | 156.3 | 468.8
16 | 0 | 2.6 | 13.0 | 78.1 | 234.4
32 | 0 | 1.3 | 6.5 | 39.1 | 117.2

Visualization of GROMACS results at 90 ns, showing the development of a DPPC (dipalmitoylphosphatidylcholine) vesicle [De Vries 2008]

INHERENT: INDONESIA HIGHER EDUCATION NETWORK

INGRID: INHERENT/INDONESIA GRID
• Idea
– RI-GRID: National Grid Computing infrastructure development proposal, May 2006, by the Faculty of Computer Science, UI
• Part of UI competitive grants (PHK INHERENT K1 UI): "Menuju Kampus Dijital: Implementasi Virtual Library, Grid Computing, Remote-Laboratory, Computer Mediated Learning, dan Sistem Manajemen Akademik dalam INHERENT" ("Towards a Digital Campus: Implementing a Virtual Library, Grid Computing, a Remote Laboratory, Computer-Mediated Learning, and an Academic Management System in INHERENT"), Sep '06 – May '07
• Objectives:
– Developing a grid computing infrastructure with an initial computation capacity of 32 processors (~Intel Pentium IV) and 1 TB of storage.
– Hopes: the capacity will grow as other organizations join InGrid.
– Developing an e-Science community in Indonesia

INGRID: PORTAL HTTP://GRID.UI.AC.ID/PORTAL

THE INGRID ARCHITECTURE
[Diagram: users; inGRID portal and a custom portal; Globus head nodes at INHERENT sites (U*) and at UI (I*); Windows/x86, Linux/x86, Solaris/x86 and Linux/Sparc clusters]

H/W SPECS
• inGRID Portal
– Sun Fire X2100, AMD Opteron processor (2.4 GHz, dual core), 2 GB memory, 80 GB disk, 2 × 10/100/1000 Mbps NICs, DVD-ROM drive
• Globus Head Node
– Sun Fire X2100, AMD Opteron processor (2.2 GHz, dual core), 1 GB memory, 80 GB disk, 2 × 10/100/1000 Mbps NICs, DVD-ROM drive
• Linux Cluster (16 nodes)
– Sun Fire X2100, AMD Opteron processor (2.2 GHz, dual core), 1 GB memory, 80 GB disk, 2 × 10/100/1000 Mbps NICs
• Storage Server
– Dual Xeon processor (3.0 GHz), 2 GB memory, 1 TB disk

HW/SW SPECIFICATION (CLUSTER HASTINAPURA) (Source: 13)
Head node (1)
• Sun Fire X2100
• AMD Opteron 2.2 GHz (dual core)
• 2 GB RAM
• Debian GNU/Linux 3.1 "Sarge"
Worker nodes (16)
• Sun Fire X2100
• AMD Opteron 2.2 GHz (dual core)
• 1 GB RAM
• Debian GNU/Linux 3.1 "Sarge"
Storage node (1)
• Dual Intel Xeon 2.8 GHz (HT)
• 2 GB RAM
• Debian GNU/Linux 4.0-testing "Etch"
• Hard disk: 3 × 320 GB

HW/SW SPECIFICATION (CLUSTER FARMASI)
Worker node hardware (6 units / 24 logical processors)
• Processor: Intel Quad Core (2.66 GHz)
• RAM: 4 GB
• Hard disk drive: Western Digital 320 GB
• Graphics card: NVIDIA GeForce 8800
• Ethernet speed: 1 Gb/s
Worker node software (6 units / 24 logical processors)
• NFS (Network File System)
• MPI (Message Passing Interface): MPICH2
• GROMACS 4.0.5

HW/SW SPECIFICATION (CLUSTER FARMASI)
[Network diagram: database server; web server; Farmasi router; Gigabit Ethernet switch; nodes grid01, grid03, grid04, grid05, grid06; JUITA (Jaringan Universitas Indonesia Terpadu, the Integrated University of Indonesia Network)]

INGRID S/W SPECS
• User interface:
– UCLA Grid Portal
• Middleware:
– Globus Toolkit
• Job scheduler:
– Sun Grid Engine (SGE)
• Programming:
– C, Java
– Parallel: MPICH
• Applications:
– Chemistry: GROMACS
– Biology: BLAST
– Computer graphics: POV-Ray
– Utilities: matrix multiplication, sort, Octave (MATLAB-like)

AUTODOCK VINA 1.1
• Developed by The Scripps Research Institute, a nonprofit biomedical research institute in San Diego, California, USA.
• AutoDock Vina is the next-generation molecular docking engine, released by The Scripps Research Institute after its original AutoDock.
• Uses the Boost C++ libraries for multithreading.
• Uses a modified parallel Monte Carlo method.
• Uses BFGS, an efficient quasi-Newton method.
• AutoDock 4.2 and AutoDock Vina 1.1 can take advantage of cluster technology as embarrassingly parallel applications.

MESSAGE PASSING: PARALLEL PARADIGMS
• Embarrassingly parallel (EP)
– No communication required
– Easy load balancing
– Perfect speedup
• Regular and synchronous
– Easy (static) load balancing
– Expect good speedup for local communication
– Expect reasonable speedup for non-local communication
• Irregular and/or asynchronous
– Difficult load balancing
– Communication overhead usually high
– Usually cannot be done efficiently with data-parallel programming

EP PROBLEM
• Each element of an array (sub-problem) can be processed independently of the others.
• No communication is required, except to combine the final result.
• Static load balancing is usually trivial: any kind of distribution can be used, since communication is not a factor.
• Dynamic load balancing can be done using a task farm approach.
• Expect perfect speedup.
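The task farm (dynamic master-slave) approach for an EP workload, such as docking many ligands against one receptor, can be sketched with Python's `concurrent.futures`: the master queues one task per ligand, and each idle worker takes the next pending task, so uneven task lengths balance out on their own. Assumptions: `dock_ligand` is a hypothetical stand-in that sleeps and returns a made-up score rather than running AutoDock Vina, and a thread pool is used because each real task would launch an external docking program; CPU-bound Python work would use a process pool, and work spread across cluster nodes would go through MPI or the grid's job scheduler.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def dock_ligand(ligand_id: int) -> tuple[int, float]:
    """Hypothetical stand-in for one independent docking job.

    A real job would run a docking program on one ligand file; here it
    sleeps for a ligand-dependent time (uneven task sizes) and returns a
    made-up score where lower means better binding.
    """
    time.sleep(0.001 * (ligand_id % 5))
    return ligand_id, -float(ligand_id % 7)


def task_farm(ligand_ids, n_workers: int = 4) -> dict[int, float]:
    """Master: hand out independent tasks and collect their results."""
    results: dict[int, float] = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Every task is queued; a worker takes the next one as soon as it is free.
        futures = [pool.submit(dock_ligand, lid) for lid in ligand_ids]
        # The only communication is gathering results as they finish.
        for future in as_completed(futures):
            lid, score = future.result()
            results[lid] = score
    return results


if __name__ == "__main__":
    scores = task_farm(range(20))
    best = sorted(scores.items(), key=lambda item: item[1])[:3]
    print("best-scoring ligands:", best)
```

Results arrive in completion order, not submission order, which is why they are stored in a dictionary keyed by ligand and ranked only at the end.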
EP PROBLEM
[Figure: disconnected computational graph]

EP PROBLEM
[Figure: dynamic master-slave approach]

EXPERIMENT RESULT
AutoDock Vina 1.1: the speedup on the cluster (22 CPUs) is 29.16, with an efficiency of 1.325.
[Chart: AutoDock Vina 1.1, serial vs. parallel, bioinformatics case]
Bioinformatic case | Serial | Parallel
1000 | 2277.42 | 77.43
2000 | 4629.72 | 159.5
3000 | 8117.6 | 292.27
4000 | 12370.5 | 406.8
5000 | 15294.2 | 509.8

ANOTHER RESULT
AutoDock Vina 1.1: the speedup on 8 CPUs is 7.25 (The Scripps Research Institute).
Running time (minutes):
• AutoDock 4.2: 521.85
• Vina 1.1 (1 CPU): 8.41
• Vina 1.1 (8 CPUs): 1.16

HASTINAPURA CLUSTER PERFORMANCE ANALYSIS USING GROMACS
Execution time by number of processors (1 processor in the first column; the processor count increases to the right)
No | Time step | Execution time
1 | 200 ps | 1d:00h:28m:16s | 12h:29m:01s | 9h:37m:00s | 5h:33m:27s
2 | 400 ps | 2d:02h:15m:59s | 1d:00h:35m:07s | 19h:12m:38s | 12h:00m:06s
3 | 600 ps | 3d:05h:36m:52s | 1d:11h:52m:40s | 1d:05h:24m:26s | 19h:59m:36s
4 | 800 ps | 4d:10h:05m:20s | 2d:01h:39m:51s | 1d:13h:04m:45s | 1d:01h:01m:45s
5 | 1000 ps | 5d:13h:37m:29s | 2d:12h:04m:00s | 1d:19h:39m:35s | 1d:05h:28m:02s

PHARMACY CLUSTER PERFORMANCE ANALYSIS USING GROMACS
Execution time by number of processors
No | Time step | 1 | 2 | 3 | 4 | 5
1 | 200 ps | 13h:37m:38s | 7h:23m:47s | 5h:32m:34s | 4h:26m:20s | 3h:38m:48s
2 | 400 ps | 1d:03h:10m:06s | 14h:44m:02s | 11h:01m:38s | 8h:41m:15s | 7h:16m:42s
3 | 600 ps | 1d:16h:22m:34s | 22h:04m:25s | 16h:40m:14s | 13h:17m:38s | 10h:55m:54s
4 | 800 ps | 2d:06h:52m:48s | 1d:03h:02m:46s | 22h:11m:54s | 17h:46m:35s | 14h:35m:29s
5 | 1000 ps | 2d:21h:22m:57s | 1d:13h:00m:25s | 1d:03h:41m:49s | 22h:06m:03s | 18h:09m:47s

CLUSTER HASTINAPURA PERFORMANCE ANALYSIS
[Chart: Hastinapura cluster performance, time (s) vs. time step (200 to 1000 ps), 1 to 5 processors]
[Chart: Pharmacy cluster performance, time (s) vs. time step (200 to 1000 ps), 1 to 5 processors]

CLUSTER HASTINAPURA SPEED-UP
[Chart: speed-up (x) of the Hastinapura cluster vs. time step (200 to 1000 ps), 1 to 5 processors]
[Chart: speed-up (x) of the Farmasi cluster vs. time step (200 to 1000 ps), 1 to 5 processors]
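Speed-up and efficiency follow from measured times as S(p) = T(1)/T(p) and E(p) = S(p)/p; by the same definition, the AutoDock Vina result gives 29.16/22 ≈ 1.325, an efficiency above 1 (superlinear speedup). The Python sketch below parses the day/hour/minute/second strings used in the pharmacy cluster table and computes both quantities for its 200 ps row. The helper names `to_seconds` and `speedup_and_efficiency` are illustrative and are not part of GROMACS; the sketch assumes the i-th time was measured with i processors.

```python
import re

# Seconds per unit in strings such as "1d:03h:10m:06s".
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def to_seconds(duration: str) -> int:
    """Convert a duration like '13h:37m:38s' or '1d:03h:10m:06s' to seconds."""
    return sum(int(value) * UNIT_SECONDS[unit]
               for value, unit in re.findall(r"(\d+)([dhms])", duration))


def speedup_and_efficiency(times: list[str]) -> list[tuple[int, float, float]]:
    """Return (processors, speed-up, efficiency), assuming times[p-1] used p processors."""
    t1 = to_seconds(times[0])
    rows = []
    for p, t in enumerate(times, start=1):
        s = t1 / to_seconds(t)          # S(p) = T(1) / T(p)
        rows.append((p, s, s / p))      # E(p) = S(p) / p
    return rows


# Pharmacy cluster, 200 ps time step, 1 to 5 processors (values from the table).
pharmacy_200ps = ["13h:37m:38s", "7h:23m:47s", "5h:32m:34s", "4h:26m:20s", "3h:38m:48s"]

if __name__ == "__main__":
    for p, s, e in speedup_and_efficiency(pharmacy_200ps):
        print(f"{p} processor(s): speed-up {s:.2f}, efficiency {e:.2f}")
```

For the 200 ps row this gives a speed-up of about 3.74 on 5 processors (efficiency about 0.75), compared with an ideal speed-up of 5.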
CHALLENGES
• Unreliable electricity supply
• Reliance on grant funding, which has other negative effects, such as:
– most Indonesian funding sources do not allow investment in hardware (computers); only spare parts are allowed;
– difficulty securing permanent human resources to manage the grid;
– difficulty maintaining the grid to keep pace with current technology.
• Many organizations are "very protective" of their computing resources; only a few are willing to share them.
• Only a few (perhaps one or two) faculties teach cluster, cloud, and grid computing, so only a few people master and understand them.
• A limited number of cluster computing nodes/workers (at most 22 were available and used); reliable results would need more than 100 nodes.

PROSPECTS
• More people are becoming interested in shared computing facilities.
• Many free grid development tools are available.
• Considering GPGPU for the next computing environment.
• Developing a strong unit capable of building the grid infrastructure, which needs commitment and dedication from at least the university and the government; or
• Perhaps cloud computing is an alternative solution in some ways; however...
• The internet connection is still not reliable, and the cloud itself has some challenges.

CLOUD COMPUTING CHALLENGES: DEALING WITH TOO MANY ISSUES [REF BUYYA]
[Figure: scalability; reliability; billing; utility & risk management; programming environment & application development; software engineering complexity. Speech bubble: "Uhm, I am not quite clear... Yet another complex IT paradigm?"]

REFERENCES
1. Luebke, D., The Democratization of Parallel Computing: High Performance Computing with CUDA, International Conference for High Performance Computing, Networking, Storage and Analysis, 2007, http://sc07.supercomputing.org/
2. de Vries, A. H., A. E. Mark, and S. J. Marrink, Molecular Dynamics Simulation of the Spontaneous Formation of a Small DPPC Vesicle in Water in Atomistic Detail, J. Am. Chem. Soc., 2004, 126: 4488-448
3. Buck, I., CUDA Programming, International Conference for High Performance Computing, Networking, Storage and Analysis, 2007, http://sc07.supercomputing.org/
4. Fatica, M., CUDA Libraries, International Conference for High Performance Computing, Networking, Storage and Analysis, 2007, http://sc07.supercomputing.org/
5. CUDA Medicine, medical applications, http://www.nvidia.co.uk/object/cuda_medical_uk.html [accessed 13 Feb 2010]
6. de Vries, A. H., A. E. Mark, and S. J. Marrink, Molecular Dynamics Simulation of the Spontaneous Formation of a Small DPPC Vesicle in Water in Atomistic Detail, J. Am. Chem. Soc., 2004, 126: 4488-448
7. Karplus, M., and J. Kuriyan, Molecular Dynamics and Protein Function, PNAS, 2005, 102(19): 6679-6685
8. Van Der Spoel, D., E. Lindahl, B. Hess, G. Groenhof, A. E. Mark, and H. J. C. Berendsen, GROMACS: Fast, Flexible, and Free, J. Comput. Chem., 2005, 26(16): 1701-1707
9. Adcock, S. A., and J. A. McCammon, Molecular Dynamics: Survey of Methods for Simulating the Activity of Proteins, Chem. Rev., 2006, 105(5): 1589-1615
10. Correll, R. N., C. Pang, D. M. Niedowicz, B. S. Finlin, and D. A. Andres, The RGK Family of GTP-binding Proteins: Regulators of Voltage-dependent Calcium Channels and Cytoskeleton Remodeling
11. Kutzner, C., D. Van Der Spoel, M. Fechner, E. Lindahl, U. W. Schmitt, B. L. De Groot, and H. Grubmüller, Speeding up Parallel GROMACS on High-latency Networks, J. Comput. Chem., 2007, 28(12): 2075-2084

CLOSING STATEMENTS
• Thank you for inviting us to this meeting.
• Thanks to the Indonesian Ministry of Research and Technology for the 2009-2010 research grant.
• Thank you for listening to our talk and for your suggestions.