RESEARCH ACHIEVEMENTS - RVN Scholarship Fund (Quỹ Học Bổng RVN)
Student: Đỗ Trọng Nhất. Date of birth: 22/12/1990. Student ID: D08-140
Class: full-time Pharmacy program, 2009-2014 cohort
Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City

Research published internationally and regionally
- “Design, Synthesis and Biological Evaluation of some Chalcone Derivatives as Potential Pancreatic Lipase Inhibitors”, published at “The 17th International Electronic Conference on Synthetic Organic Chemistry 2013” and available at http://www.sciforum.net/conference/ecsoc-17/paper/2299 (abstract and confirmation letter from the supervising lecturer attached)
- “In silico modeling for antimalarial compounds.”, poster presentation at the PharmaIndochina international conference in 2013 and published in “Proceedings of The Eighth Indochina Conference on Pharmaceutical Sciences, 2013”, pages 503-509 (full text and confirmation letter from the supervising lecturer attached)

City-level research
- “Study of the binding between the enzyme histone deacetylase 2 and hydroxamate and mercaptoacetamide derivatives.”, published in the journal Y học Thành phố Hồ Chí Minh, 2014, volume 18, supplement of No. 2, pages 317-323 (abstract and confirmation letter from the supervising lecturer attached)

Research entered in the 2014 “Student Scientific Research” competition of the University of Medicine and Pharmacy at Ho Chi Minh City (final round, results pending)
- “Development of docking and 2D-QSAR models for telomerase-inhibiting derivatives with application to cancer treatment” (abstract and confirmation letter from the supervising lecturer attached)

Design, Synthesis and Biological Evaluation of some Chalcone Derivatives as Potential Pancreatic Lipase Inhibitors

Hoai-Anh Nguyen, Trong-Nhat Do, Van-Dat Truong, Khac-Minh Thai, Ngoc-Chau Tran, Thanh-Dao Tran
Faculty of Pharmacy, Ho Chi Minh City University of Medicine and Pharmacy
41 Dinh Tien Hoang Street, District 1, Ho Chi Minh City, Vietnam

ABSTRACT
Obesity is a growing global health problem, but few drugs are available for its treatment.
Several classes of compounds have been studied and shown to inhibit human lipase, a target in obesity prevention. This study covers the design, synthesis and biological evaluation of some synthetic chalcones as pancreatic lipase inhibitors. The FlexX software integrated in LeadIT was used for molecular docking studies of 66 chalcone derivatives. Six derivatives with low docking scores (good binding affinity) were selected for synthesis using both classical and microwave-assisted Claisen-Schmidt condensation reactions. Biological evaluation on pancreatic lipase indicated that some chalcones showed good lipase inhibition and that the in silico model correlated with the biological activities. These results suggest that virtual screening tools can be applied to find potential agents with high obesity-prevention capacity.

Keywords: lipase inhibitory activity, chalcone

DEVELOPMENT OF DOCKING AND 2D-QSAR MODELS FOR TELOMERASE-INHIBITING DERIVATIVES WITH APPLICATION TO CANCER TREATMENT

Đỗ Trọng Nhất, Đồng Quốc Hiệp, Phan Cường Huy, Nguyễn Thị Thanh Lan, Nguyễn Đức Khánh Thơ
Supervisor: Assoc. Prof. Dr. Thái Khắc Minh

Keywords: Telomerase, oxadiazole, pyrazole, flavonoid, docking, 2D-QSAR, cancer

Introduction
According to World Health Organization (WHO) statistics, about 8.2 million people worldwide died of cancer in 2012 alone. Telomerase is currently a highly promising target for new-generation anticancer drugs and for oxadiazole, pyrazole and flavonoid derivatives. This study uses a docking model to examine how these structural classes bind to telomerase, and builds QSAR models from the collected data. The models are then used to screen drugs already on the market and to guide the design of new compounds able to inhibit telomerase, opening new hope for cancer patients.
Subjects and methods
The FlexX software integrated in LeadIT was used to study the molecular docking of 110 derivatives belonging to 3 different structural classes, collected from 9 scientific papers together with their IC50 values for inhibition of telomerase activity. The 2D-QSAR models were built with the partial least squares (PLS) algorithm in MOE on a dataset of 41 oxadiazole derivatives and 48 pyrazole derivatives.

Results and discussion
The amino acids that play an important role in binding are Lys189, Phe193 and Asp254. The docking model shows that Fla-3d and Pyr-p16a have the best docking scores, consistent with their experimental activity values. Following a ligand-based drug design approach, two QSAR models were built: one for the oxadiazole class with the descriptors SMR_VSA3, GCUT_SLOGP_1 and PEOE_VSA-5, and one for the pyrazole class with the descriptors PEOE_VSA+1, b_1rotN and PEOE_VSA-1. Both models predict correctly, with errors of less than 0.5 relative to the experimental values. Combining QSAR with docking predicted several structures in these two classes with high telomerase inhibitory ability. QSAR predictions for marketed drugs whose mechanism is not telomerase inhibition suggest that many of these drugs may act on this target enzyme.

Conclusion
The docking model built on the telomerase protein (ID: 3DU6) is of good quality and high reliability. Telomerase inhibitory activity may result from binding to the hydrophobic pocket above the ATP binding site of the enzyme. The predictions of the docking model and of the 2D-QSAR models in this study correlate with each other.

[...] ENZYME HISTONE [...] HYDROXAMIC [...] MERCAPTOACETAMIDE

Thái Khắc Minh*, [...]*, [...]*, Hứa Ngọc Minh Tuyền*, [...] Sơn**

SUMMARY
The enzyme histone deacetylase (HDAC) is regarded as [...] for the development of anticancer drugs [...] mercaptoacetamide [...] HDAC [...] studied [...] HDAC2.
Objective: [...] hydroxamic derivatives [...] selective binding sites for the enzyme.

Subjects and methods: Hydroxamic and [...] derivatives with determined IC50 values for HDAC inhibitory activity were studied [...] HDAC [...] LeadIT [...] mercaptoacetamide [...].

Results and discussion: [...] the amino acids important for the binding of the studied derivatives to HDAC [...] are [residue list illegible]. [...] HDAC [...] better than HDAC [...]. On HDAC2, the benzo[d]thiazole heterocycle binds effectively through π-π interactions with the amino acids Phe155 and Phe210.

Conclusion: In this study, molecular docking was performed on the X-ray crystal structure of the HDAC2 enzyme (3MAX) with hydroxamic and mercaptoacetamide derivatives. This docking model can be applied to design compounds that inhibit HDAC2 strongly and specifically, with the aim of finding active compounds that can be used in cancer treatment.

Keywords: HDAC, docking, hydroxamic, mercaptoacetamide, cancer, drug design

ABSTRACT
MOLECULAR INTERACTION BETWEEN HISTONE DEACETYLASE AND HYDROXAMIC, MERCAPTOACETAMIDE DERIVATIVES
Khac-Minh Thai, Nhan-Tam Nguyen-Huu, Trong-Nhat Do, Minh-Tuyen Hua-Ngoc, Cao-Son Doan
Y Hoc TP. Ho Chi Minh, Vol. 18, Supplement of No 2, 2014: 317-323

Introduction: The histone deacetylase (HDAC) enzyme has recently been considered one of the targets for anticancer drug development. In this study, the molecular docking model of hydroxamic derivatives on HDAC2 was analysed to identify the different binding regions of the enzyme. The results could give insight into the interactions between HDACs and hydroxamic derivatives at the molecular level and be helpful for designing new selective HDAC inhibitors.

Material and methods: The hydroxamic and mercaptoacetamide derivatives (with HDAC2 IC50 values) were docked into the X-ray crystal structure of HDAC2 (pdb 3MAX) with the LeadIT software.
Results and discussion: The docking results indicated the important amino acids in the binding site of HDAC2, including Phe210, Phe155, Gly154, His146, His183, Asp269, Asp181, His145, Arg39, Gly305, and Trp140. Our results also indicated that these compounds could interact with HDAC2 better than with HDAC8. On HDAC2, the heterocyclic benzo[d]thiazole binds the target effectively via π-π interactions with Phe155 and Phe210.

Conclusion: This study described the molecular docking of hydroxamic and mercaptoacetamide derivatives on the X-ray crystal structure of HDAC2 (3MAX). Our results could be applied to design active, effective and selective HDAC2 inhibitors, which are useful in cancer treatment.

Keywords: HDAC, docking, hydroxamic, mercaptoacetamide, cancer, drug design

*Department of Medicinal Chemistry, Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City; **National Institute of Drug Quality Control (Viện Kiểm nghiệm Thuốc TW)
Corresponding author: Dr. Thái Khắc Minh; Tel: 0909.680.385; Email: thaikhacminh@gmail.com

IN SILICO MODELING FOR ANTIMALARIAL COMPOUNDS

Thanh-Tan Mai, Trong-Nhat Do, Quoc-Hiep Dong, Duc-Khanh-Tho Nguyen, Thanh-Man Le, and Khac-Minh Thai
Department of Medicinal Chemistry, Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City; 41 Dinh Tien Hoang Street, District 1, Ho Chi Minh City, Vietnam; Email: thaikhacminh@gmail.com

Abstract
Malaria and drug resistance of the parasite are serious current problems. The drug discovery process requires a lot of time and money. Pharmacy informatics enables fast, low-cost virtual screening of large numbers of chemical compounds, which opens up prospects for antimalarial drug development. Counter-propagation neural networks were used in this study to build classification and regression models for predicting antimalarial activity in silico. A total of 8 classification models and 2 regression models were built; they show good predictive ability for large chemical databases with diverse structural frames.
Counter-propagation neural networks have shown a good ability to build multilayer classification models and nonlinear regression models.

Key words: Malaria; Counter-propagation neural networks; Classification; Regression.

1. Introduction
Malaria is one of the most dangerous epidemic diseases in developing countries. According to the World Malaria Report 2012 of the World Health Organization (WHO), there were 219 million cases of malaria with 660,000 deaths.[1] Malaria is caused by five species of parasites of the genus Plasmodium that affect humans (P. falciparum, P. vivax, P. ovale, P. malariae and P. knowlesi). Malaria due to P. falciparum (Pf) is the most deadly form and it predominates in Africa and Southeast Asia. International disbursements to malaria-endemic countries increased every year from less than US$ 100 million in 2000 to US$ 1.84 billion in 2012. Global resource requirements for malaria control were estimated in the 2008 Global Malaria Action Plan to exceed US$ 5.1 billion per year between 2011 and 2020. In addition, while current tools remain remarkably effective in most settings, resistance to artemisinins (the key compounds in artemisinin-based combination therapies) has been detected in four countries of South-East Asia.[1] Therefore, a new drug that is effective, safe and active against resistant parasite strains is urgently needed.

The drug discovery process requires 10-15 years and costs a lot of money and effort. With the support of pharmacoinformatics, the time and expense of this process can be reduced. In addition, the availability of measured in vitro activities of potential antimalarial compounds makes it possible to build QSAR models for predicting antimalarial activity in silico. Thus, this study was conducted with the objective of building classification and regression models for predicting the antimalarial activity of chemical compounds by counter-propagation neural networks (CPG-NN).

2. Materials and Methods

Dataset
The in vitro antimalarial activities (IC50) of 1,126 structurally diverse compounds were collected from the literature. In particular, 585 compounds have activities against chloroquine (CQ) sensitive Pf strains and 705 compounds have activities against CQ resistant Pf strains. For classification of antimalarial compounds with CPG-NN models, compounds with activity higher than CQ (class 1) are assigned the value 1 and compounds with activity lower than CQ (class 2) are assigned the value 0. After training, the output is a real number between 0.0 and 1.0. For the final classification, the response weight values were transformed to discrete values (0 or 1) by applying a threshold value of 0.5 for each class. For regression models, the activity value of a compound is its pIC50 (equal to -log(IC50)).

Molecular descriptors and feature selection
A set of 184 different 2D descriptors was calculated for all compounds using the descriptor tool in MOE [4]. The 2D descriptors are defined as numerical properties which can be calculated from the connection table representation of a molecule. They include physical properties, subdivided surface areas, atom counts and bond counts, Kier & Hall connectivity and Kappa shape indices, adjacency and distance matrix descriptors, pharmacophore feature descriptors, and partial charge descriptors. In addition, 2489 2D descriptors were calculated with DRAGON [6]. They include topological descriptors, walk and path counts, connectivity indices, information indices, 2D autocorrelations, edge adjacency indices, Burden eigenvalues, topological charge indices, eigenvalue-based indices, 2D binary fingerprints and 2D frequency fingerprints. To select an optimum set of molecular descriptors, the QuaSAR-Contingency tool in MOE and the Select attribute tool in WEKA [7] were applied to prune the large set of molecular descriptors.
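The labelling rules in the Dataset paragraph can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, "log" is taken to be base 10, "activity higher than CQ" is read as an IC50 lower than that of chloroquine, and the handling of exact ties (IC50 equal to CQ, output equal to 0.5) is a choice made here because the text does not specify it.

```python
import math


def pic50(ic50):
    """Regression target: pIC50 = -log10(IC50) (base 10 is assumed)."""
    return -math.log10(ic50)


def class_label(ic50, ic50_cq):
    """Training label: 1 if more active than chloroquine (lower IC50), else 0."""
    return 1 if ic50 < ic50_cq else 0


def discretize(response, threshold=0.5):
    """Turn a network output in [0.0, 1.0] into a final class value (0 or 1)."""
    return 1 if response >= threshold else 0


# Hypothetical IC50 values, all in the same (unspecified) concentration unit.
ic50_cq = 0.02
for ic50 in (0.005, 0.3):
    print(ic50, round(pic50(ic50), 2), class_label(ic50, ic50_cq))
```

Because pIC50 depends on the concentration unit of IC50, the same unit has to be used for every compound; the unit itself is not stated in the text.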
Counter-propagation neural networks
CPG-NN is a supervised learning method with a two-layer architecture that can be used for prediction of pIC50 values. The CPG-NN comprises a Kohonen neural network as the input layer and an output layer related to the properties of the object. During training of a CPG-NN, the winning neuron is determined exclusively on the basis of the input values, as in a regular Kohonen network. Additionally, in a CPG-NN each neuron in the Kohonen layer has one or several corresponding neurons in the output layer. Normally, a CPG-NN trained for predicting antimalarial activity would have an output layer containing one dimension with the antimalarial pIC50 value. [3]

All CPG-NN studies described were carried out with the software package SONNIA [5]. The network topology used in this study was toroidal, with the width equal to the height, resulting in square maps. The CPG-NN network size depended on the number of compounds in the training set and comprised N neurons, with N equal to the number of compounds in the training set. Other parameters were kept at the SONNIA defaults: Epochs = 100; Interval = 1; Span(x) = Span(y) = N / 2; Step(x) = Step(y) = Span/Epochs; Rate = 0.5; Rate Factor = 0.995.

Evaluation criteria for classification models

Accuracy
Accuracy is the fraction of observations correctly predicted. The performance of the classification models was measured as the total accuracy on all compounds and the accuracy for each class. Let N be the total number of compounds. Let TP and TN be the numbers of compounds with activity higher and lower than CQ, respectively, that are correctly labeled by the CPG-NN model. The number of false positives in the higher-than-CQ class is named FP, whereas FN accounts for false assignments to the lower-than-CQ class.
Accuracy values were calculated as follows:
(i) overall accuracy on all compounds: total accuracy = (TP + TN)/(TP + FP + TN + FN);
(ii) accuracy on the higher-than-CQ activity class: TP/(TP + FN);
(iii) accuracy on the lower-than-CQ activity class: TN/(TN + FP). [2]

Precision
Precision is the probability that a compound predicted to be in a given class actually belongs to it, i.e., the fraction of true positives among all cases predicted as positive. Precision values were calculated as follows:
(iv) precision on the higher-than-CQ activity class: TP/(TP + FP);
(v) precision on the lower-than-CQ activity class: TN/(TN + FN). [2]

GH score
The 'Goodness of Hit lists' or GH score was applied to measure the overall quality of the classification results. The GH score of each class takes into account both the precision (the fraction of correct predictions within a class) and the percentage of this class that is retrieved from the dataset. Models whose GH scores for both classes are close to 1 (the maximum possible value) are considered the best. The GH scores for each class of antimalarial compounds are defined as follows:
(vi) GH score for the higher-than-CQ activity class: TP(2TP + FN + FP)/[2(TP + FN)(TP + FP)];
(vii) GH score for the lower-than-CQ activity class: TN(2TN + FN + FP)/[2(TN + FP)(TN + FN)]. [2] [3]
Written this way, the GH score of a class is the mean of its accuracy and its precision.

3. Results and Discussion

Classification models for predicting antimalarial activity
A classification model for predicting activity against Pf in general was built with CPG-NN. In addition, 3 classification models were built for predicting activity against CQ sensitive Pf strains and 3 classification models for CQ resistant Pf strains.

Classification model for predicting activities against Pf
This model was named CPG-C PF and was built with 7 molecular descriptors on a dataset of 487 compounds (341 in class 1 and 146 in class 2). The CPG network size was 19 × 19 neurons and the number of training cycles was set to 100.
Using the training and test sets obtained by diverse splitting and five-times random splitting (80:20), the results show equally high values for accuracy, precision and GH score for both the training and the test set in the two splitting methods (Table 1). The CPG-NN average map obtained for the diverse training set is presented in Figure 1.

Figure 1: Output CPG-NN average maps for the classification model for activities against Pf (19 × 19), with the diverse subset as the training set. Blue: 0; Orange: 1; White: empty; Black: conflict.

With diverse splitting, the total accuracy values obtained for the training and test sets are 0.99 and 0.98, respectively. In each class, the accuracies achieved range from 0.96 to 0.98 for the training set and from 0.80 to 0.94 for the test set. In addition, the GH scores obtained for the training set were 0.99 for class 1 and 0.98 for class 2. For the test set, the GH scores were 0.99 and 0.96 for class 1 and class 2, respectively.

For the training and test sets obtained by five-times random splitting (80/20), the total accuracies showed an average value of 0.99 for the training set of 390 compounds and of 0.96 for the test set of 98 compounds (Table 1). The compounds in class 1 reach a value of 0.99 for both accuracy and GH score on the training set, and an accuracy of 0.98 and a GH score of 0.97 on the test set. The compounds in class 2 reach a value of 0.98 for both accuracy and GH score on the training set, and an accuracy of 0.91 and a GH score of 0.94 on the test set. Thus, CPG-C PF appears to classify compounds in both classes well.
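The accuracy, precision and GH-score criteria from the Methods section can be written out as a small Python sketch. The function names and the confusion-matrix counts are hypothetical and are not taken from the paper. The GH score is computed here as the mean of per-class accuracy and precision, i.e. with a factor of 2 in the denominator, which is the reading under which its maximum possible value is 1.

```python
def class_scores(correct, missed, wrongly_added):
    """Accuracy, precision and GH score for one class.

    correct:       compounds of the class predicted as the class
    missed:        compounds of the class predicted as the other class
    wrongly_added: compounds of the other class predicted as this class
    """
    accuracy = correct / (correct + missed)
    precision = correct / (correct + wrongly_added)
    gh = correct * (2 * correct + missed + wrongly_added) / (
        2 * (correct + missed) * (correct + wrongly_added))
    return accuracy, precision, gh


def evaluate(tp, tn, fp, fn):
    """Total accuracy plus per-class scores for the two activity classes."""
    total_accuracy = (tp + tn) / (tp + fp + tn + fn)
    higher = class_scores(tp, fn, fp)  # higher-than-CQ class
    lower = class_scores(tn, fp, fn)   # lower-than-CQ class
    return total_accuracy, higher, lower


# Hypothetical confusion-matrix counts, not taken from the paper.
total, higher, lower = evaluate(tp=90, tn=45, fp=5, fn=10)
print(round(total, 2), [round(v, 2) for v in higher], [round(v, 2) for v in lower])
```

The two classes share one helper because the roles of FP and FN simply swap when the lower-than-CQ class is treated as the class of interest.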
Table 1: Summary of antimalarial activity against Pf classification powers by CPG-NN

Model: CPG-C PF
Descriptors: b_single, PEOE_VSA+0, PEOE_VSA+2, PEOE_VSA+5, opr_nring, SMR_VSA4, logP(o/w)

                             Diverse(a)                        Random(b)
                             Train   Test   Y-scrambling(c)    Train   Test
N                            390     98     390                390     98
Total accuracy               0.99    0.98   0.56               0.99    0.96
Higher than CQ   Accuracy    0.99    0.97   0.68               0.99    0.98
                 Precision   0.99    1.00   0.67               0.99    0.96
                 GH          0.99    0.99   0.68               0.99    0.97
Lower than CQ    Accuracy    0.98    1.00   0.30               0.98    0.91
                 Precision   0.98    0.92   0.30               0.98    0.97
                 GH          0.98    0.96   0.30               0.98    0.94

(a) Training set is the diverse subset; (b) five-fold leave-20%-out; (c) Y-scrambling (50 times) of the diverse training set. (+) Compounds with antimalarial activity higher than CQ; (-) compounds with activity lower than CQ.

Classification models for activities against CQ sensitive and resistant Pf strains
For classifying activity against CQ sensitive Pf strains, 3 models named CPG-C S1, S2 and S3 were built with 5 MOE descriptors, 10 DRAGON descriptors and the 15 descriptors combined from the CPG-C S1 and S2 models, respectively. The dataset includes 585 compounds. The CPG network size was 21 × 21 neurons and the number of epochs was set to 100. Using the diverse subset as the training set and the remaining compounds as the test set, the results show equally high values for accuracy, precision and GH score for both the training and the test set (Table 2). The CPG-NN average maps obtained for the training sets of CPG-C S1, S2 and S3 are presented in Figure 2. Among them, CPG-C S3 is the best model, with a total accuracy of 0.94 for the training set and 0.85 for the test set.

Figure 2: Output CPG-NN average maps for the CPG-C S1, S2 and S3 models, respectively, for activities against CQ sensitive Pf strains (21 × 21), with the diverse subset as the training set. Blue: 0; Orange: 1; White: empty; Black: conflict.
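To make the maps in these figures more concrete, the counter-propagation network described in the Methods section can be sketched as a toy Python class. This is not SONNIA's algorithm: the class name `TinyCPG`, the triangular neighbourhood function, the linear shrinking of the neighbourhood radius, the grid distance on the torus and the once-per-epoch decay of the learning rate are all assumptions made for illustration. Only the general scheme (the winner is chosen from the inputs alone, a toroidal square map, and the listed values Epochs = 100, Rate = 0.5, Rate Factor = 0.995) follows the text.

```python
import random


class TinyCPG:
    """Toy counter-propagation network on a square toroidal map (illustrative)."""

    def __init__(self, side, n_inputs, seed=0):
        rng = random.Random(seed)
        self.side = side
        n = side * side
        # Kohonen (input) layer: one weight vector per neuron.
        self.w_in = [[rng.random() for _ in range(n_inputs)] for _ in range(n)]
        # Output layer: one response weight per neuron, started at the threshold.
        self.w_out = [0.5] * n

    def winner(self, x):
        """Index of the neuron whose input weights are closest to x."""
        def sq_dist(i):
            return sum((w - a) ** 2 for w, a in zip(self.w_in[i], x))
        return min(range(len(self.w_in)), key=sq_dist)

    def _torus_distance(self, i, j):
        """Grid distance between neurons i and j with wrap-around edges."""
        dx = abs(i % self.side - j % self.side)
        dy = abs(i // self.side - j // self.side)
        return max(min(dx, self.side - dx), min(dy, self.side - dy))

    def fit(self, X, y, epochs=100, rate=0.5, rate_factor=0.995):
        span = self.side / 2  # assumed initial neighbourhood radius
        for epoch in range(epochs):
            radius = span * (1 - epoch / epochs)  # shrinks linearly towards 0
            for x, target in zip(X, y):
                win = self.winner(x)  # chosen from the input values only
                for j in range(len(self.w_in)):
                    d = self._torus_distance(win, j)
                    if d > radius:
                        continue
                    h = rate * (1 - d / (radius + 1))  # triangular neighbourhood
                    self.w_in[j] = [w + h * (a - w) for w, a in zip(self.w_in[j], x)]
                    self.w_out[j] += h * (target - self.w_out[j])
            rate *= rate_factor  # learning rate decays once per epoch

    def predict(self, x):
        """Response weight of the winning neuron (a value between 0 and 1)."""
        return self.w_out[self.winner(x)]


# Hypothetical two-descriptor training set: class 0 near (0, 0), class 1 near (1, 1).
X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
y = [0, 0, 1, 1]
net = TinyCPG(side=3, n_inputs=2)
net.fit(X, y)
low, high = net.predict([0.05, 0.0]), net.predict([0.95, 1.0])
print(round(low, 2), round(high, 2))
```

In this picture, a map cell coloured 0 or 1 corresponds to a neuron whose output weight falls below or above the 0.5 threshold, and an empty cell is a neuron that no training compound selects as its winner.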
Table 2: Summary of antimalarial activity against CQ sensitive Pf strains classification powers by CPG-NN

Descriptors, CPG-C S1: GCUT_SLOGP_3, BCUT_PEOE_0, b_1rotN, a_nN, SlogP_VSA0
Descriptors, CPG-C S2: nR07, nArNHO, T(Br..Br), T(F..Cl), nROR, B04[N-F], B07[N-N], B08[N-F], H-048, EEig04r
Descriptors, CPG-C S3: combined from the CPG-C S1 and S2 models

                    CPG-C S1                      CPG-C S2                      CPG-C S3
                    Train(a)  Test  Y-scrambling  Train(a)  Test  Y-scrambling  Train(a)  Test  Y-scrambling
N                   468       117   468           468       117   468           468       117   468
Total accuracy      0.95      0.84  0.70          0.92      0.84  0.71          0.94      0.85  0.69
(+) Accuracy        0.86      0.76  0.20          0.71      0.61  0.17          0.80      0.70  0.25
(+) Precision       0.84      0.74  0.22          0.87      0.68  0.21          0.85      0.74  0.18
(+) GH              0.85      0.75  0.21          0.79      0.64  0.19          0.83      0.72  0.22
(-) Accuracy        0.97      0.87  0.83          0.97      0.91  0.84          0.97      0.90  0.80
(-) Precision       0.97      0.88  0.81          0.93      0.88  0.81          0.96      0.88  0.82
(-) GH              0.97      0.88  0.82          0.95      0.90  0.83          0.96      0.89  0.81

(a) Training set is the diverse subset. (+) Compounds with antimalarial activity higher than CQ; (-) compounds with activity lower than CQ.

Moreover, three models, CPG-C R1, R2 and R3, for classifying activity against CQ resistant Pf strains were built using 7 MOE descriptors, 11 DRAGON descriptors and the 18 descriptors combined from the CPG-C R1 and R2 models, respectively. The dataset included 701 compounds. The diverse subset was used as the training set and the remaining compounds as the test set; the CPG network size was 23 × 23 and the number of epochs was set to 100 (Table 3). The CPG-NN average maps obtained for the training sets of CPG-C R1, R2 and R3 are presented in Figure 3. As with the models for activity against CQ sensitive Pf strains, CPG-C R3 is the best model, with a total accuracy of 0.94 for the training set and 0.91 for the test set.

Figure 3: Output CPG-NN average maps for the CPG-C R1, R2 and R3 models, respectively, for activities against CQ resistant Pf strains (21 × 21), with the diverse subset as the training set. Blue: 0; Orange: 1; White: empty; Black: conflict.
Table 3: Summary of antimalarial activity against CQ resistant Pf strains classification powers by CPG-NN

Descriptors, CPG-C R1: GCUT_PEOE_3, BCUT_PEOE_0, PEOE_VSA-0, PEOE_VSA+2, SMR_VSA2, vsa_acc, chiral_u
Descriptors, CPG-C R2: C-012, nR10, nRNHO, nArC=N, MSD, GATS1m, AAC, TI2, B05[N-Cl], RBN, EEig02x
Descriptors, CPG-C R3: combined from the CPG-C R1 and R2 models

                    CPG-C R1                      CPG-C R2                      CPG-C R3
                    Train(a)  Test  Y-scrambling  Train(a)  Test  Y-scrambling  Train(a)  Test  Y-scrambling
N                   561       140   561           561       140   561           561       140   561
Total accuracy      0.93      0.85  0.54          0.95      0.89  0.54          0.94      0.91  0.54
(+) Accuracy        0.89      0.69  0.38          0.92      0.81  0.37          0.91      0.83  0.35
(+) Precision       0.91      0.93  0.39          0.95      0.93  0.38          0.93      0.98  0.38
(+) GH              0.90      0.81  0.38          0.93      0.87  0.38          0.92      0.91  0.36
(-) Accuracy        0.95      0.96  0.64          0.97      0.95  0.64          0.96      0.99  0.66
(-) Precision       0.94      0.81  0.64          0.95      0.86  0.63          0.95      0.85  0.63
(-) GH              0.94      0.89  0.64          0.96      0.90  0.64          0.96      0.92  0.64

(a) Training set is the diverse subset. (+) Compounds with antimalarial activity higher than CQ; (-) compounds with activity lower than CQ.

All six classification models showed high values for accuracy, precision and GH score. However, the CPG-C S1, S2 and S3 models showed unequal predictive ability for class 1 and class 2: the GH scores achieved for the training set ranged from 0.79 to 0.85 in class 1 and from 0.95 to 0.96 in class 2. In contrast, the CPG-C R1, R2 and R3 models showed more balanced predictive ability for the two classes, with GH scores for the training set ranging from 0.90 to 0.93 in class 1 and from 0.94 to 0.96 in class 2. This can be explained by the more even numbers of compounds in the two classes in the dataset of compounds active against CQ resistant Pf strains.

Regression model for activities against CQ sensitive and resistant Pf strains
CPG-NN can also be used for regression analyses and prediction of pIC50 values.
Two regression models, CPG-R1 and CPG-R2, were built for predicting IC50 values for activity against CQ sensitive and CQ resistant Pf strains, respectively. CPG-R1 was built with a dataset of 572 compounds (CPG network size set to 21 × 21) and 7 MOE descriptors (logP(o/w), rings, vsa_acc, GCUT_SLOGP_0, BCUT_PEOE_1, SMR_VSA3, lip_druglike). The training set based on diversity showed a good performance (Table 4 and Figure 4) with R² = 0.88 (RMSE = 0.44), and the test set also performed quite well (R² = 0.70, RMSE = 0.61). With five-times random division, R² = 0.88 and RMSE = 0.42 were obtained for the training set. However, the validation run on the test sets revealed a rather low performance (R² = 0.65, RMSE = 0.76).

Table 4: Summary of antimalarial activity regression powers by CPG-NN approach.

                   Diverse(a)                     Random(b)
Model    Metric    Train   Test   Y-scrambling    Train   Test
CPG-R1   N         458     114    458             458     114
         RMSE      0.44    0.61   1.74            0.42    0.76
         R²        0.88    0.70   0.0032          0.88    0.65
CPG-R2   N         553     138    553             553     138
         RMSE      0.36    0.46   1.80            0.37    0.69
         R²        0.92    0.84   0.0005          0.93    0.76

(a) Training set is the diverse subset; (b) five-fold leave-20%-out.

CPG-R2 was built with a dataset of 691 compounds (network size set to 23 × 23) and 8 MOE descriptors (PEOE_VSA+0, PEOE_VSA-0, SlogP_VSA0, SMR_VSA4, SlogP_VSA2, vsa_other, rings, b_triple). This model performed better than CPG-R1, with R² = 0.92 (RMSE = 0.36) for the training set in the diverse division and R² = 0.93 (RMSE = 0.37) for the training set in the random division (Table 4 and Figure 5).

Figure 4: Calculated versus observed antimalarial pIC50 plots for the CPG-R1 regression model for the diverse training set. [Plot of predicted versus observed pIC50; fitted lines: y = 0.8776x + 0.0567 (R² = 0.8807) and y = 0.8731x + 0.0734 (R² = 0.7032).]

Figure 5: Calculated versus observed antimalarial pIC50 plots for the CPG-R2 regression model for the diverse training set. [Plot of predicted versus observed pIC50; fitted lines: y = 0.9252x + 0.0383 (R² = 0.9232) and y = 0.8748x + 0.0492 (R² = 0.8359).]

Conclusion
CPG-NN was used to build classification and regression models that have good predictive ability for large chemical databases with diverse structural frames. The CPG-C PF model can be applied to classify activity on Pf. CPG-C S3 and CPG-C R3 are the best models for classifying activity on CQ sensitive and CQ resistant Pf strains, respectively. In addition, the two regression models CPG-R1 and CPG-R2 are good models for predicting IC50 values on Pf. These models are useful for virtual screening of antimalarial activity in large libraries of chemical compounds.

Acknowledgement
This research is funded by the Department of Science and Technology, Ho Chi Minh City under grant number VLM-139-15-ds2011 (to Khac-Minh Thai).

References
1. K. Andrews, M. Aregawi, R. Cibulskis, M. Lynch, R. Newman, R. Williams. World Malaria Report 2012. 2012; v-xiii.
2. O. Carugo. Detailed estimation of bioinformatics prediction reliability through the Fragmented Prediction Performance Plots. BMC Bioinformatics, 2007; 8: 380.
3. K.M. Thai, G.F. Ecker. Classification models for HERG inhibitors by counterpropagation neural networks. Chem Biol Drug Des, 2008; 72(4): 279-289.
4. Chemical Computing Group, Molecular Operating Environment (MOE) 2008.10, http://www.chemcomp.com, Access date: 30/06/2013.
5. Molecular Networks, SONNIA 4.2, http://www.molecular-networks.com, Access date: 27/6/2013.
6. Telete srl, DRAGON 5.5 (2007), http://www.telete.mi.it, Access date: 27/6/2013.
7. Waikato Environment for Knowledge Analysis, Weka 3.6, http://www.cs.waikato.ac.nz/~ml/weka/, Access date: 26/6/2013.