Co-reporter:Zijian Qin, Maolin Wang, Aixia Yan
Bioorganic & Medicinal Chemistry Letters 2017 Volume 27(Issue 13) pp:
Publication Date(Web):1 July 2017
DOI:10.1016/j.bmcl.2017.05.001
In this study, quantitative structure-activity relationship (QSAR) models using various descriptor sets and training/test set selection methods were explored to predict the bioactivity of hepatitis C virus (HCV) NS3/4A protease inhibitors by multiple linear regression (MLR) and support vector machine (SVM) methods. A dataset of 512 HCV NS3/4A protease inhibitors, with IC50 values all determined by the same FRET assay, was collected from the literature. All inhibitors were represented by nine selected global descriptors and 12 selected 2D property-weighted autocorrelation descriptors calculated with the program CORINA Symphony. The dataset was divided into a training set and a test set either randomly or by a Kohonen’s self-organizing map (SOM). The correlation coefficients (r2) of the training and test sets were 0.75 and 0.72 for the best MLR model and 0.87 and 0.85 for the best SVM model, respectively. In addition, a series of sub-dataset models were developed. All of the best sub-dataset models performed better than the whole-dataset models. We believe that the combination of the best sub-dataset and whole-dataset SVM models can be used as a reliable lead design tool for new NS3/4A protease inhibitor scaffolds in a drug discovery pipeline.
The prediction model (Model D4), built by a support vector machine (SVM) method on a series of 512 hepatitis C virus (HCV) NS3/4A protease inhibitors, could be used as a reliable tool for designing or virtually screening new NS3/4A PI scaffolds.
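The 2D property-weighted autocorrelation descriptors mentioned in this abstract can be illustrated with a minimal sketch. It assumes the textbook topological autocorrelation form, A(d) = sum of p_i * p_j over atom pairs separated by d bonds; the exact definition, scaling, and atomic properties used by CORINA Symphony may differ. The molecule graph and property values below are hypothetical placeholders.

```python
from collections import deque

def topological_distances(adjacency, start):
    """Breadth-first search: bond-count distance from `start` to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist

def autocorrelation_2d(adjacency, prop, max_distance):
    """Return [A(0), ..., A(max_distance)] with A(d) = sum of prop[i] * prop[j]."""
    vector = [0.0] * (max_distance + 1)
    # d = 0 term: every atom paired with itself.
    vector[0] = sum(p * p for p in prop.values())
    for i in adjacency:
        for j, d in topological_distances(adjacency, i).items():
            # i < j counts each unordered atom pair exactly once.
            if i < j and d <= max_distance:
                vector[d] += prop[i] * prop[j]
    return vector

# Hypothetical example: heavy-atom chain C0-C1-C2-O3 (propan-1-ol skeleton).
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
prop = {0: 1.0, 1: 1.0, 2: 1.0, 3: 2.0}  # placeholder atomic property values
descriptor = autocorrelation_2d(adjacency, prop, max_distance=3)
```

Each entry of `descriptor` is one descriptor component, so a fixed `max_distance` gives every molecule a vector of the same length regardless of its size.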
Co-reporter:Yue Kong, Aixia Yan
Chemometrics and Intelligent Laboratory Systems 2017 Volume 167 pp:
Publication Date(Web):15 August 2017
DOI:10.1016/j.chemolab.2017.06.011
•Developed in silico models to predict the bioactivity of Polo-like kinase 1 (PLK1) inhibitors.
•Constructed 16 single classifier models and one consensus Kohonen’s Self-organizing Map (SOM) model.
•MCCs range from 0.609 to 0.864 and accuracies range from 78.7% to 93.1% on the test set for the 16 single classifier models.
•The consensus SOM model based on single classifiers was more reliable, with an MCC of 0.872 on the test set.
•Feature selection by SVMAttributeEval combined with SOM splitting achieved the best model performance.
As a member of the serine/threonine kinase family, Polo-like kinase 1 (PLK1) plays a key role in regulating cell cycle progression, particularly mitosis, and has emerged as an important target for cancer therapy. Highly predictive in silico models of the bioactivity of PLK1 inhibitors are therefore urgently needed. In our work, 16 single classifier models and one consensus Kohonen's Self-organizing Map (SOM) model were constructed to discriminate highly active PLK1 inhibitors from poorly active ones on a dataset of 601 noncongeneric PLK1 inhibitors. For the 16 single classifier models, we used four machine learning methods: Support Vector Machine (SVM), Naive Bayes (NB), C4.5 Decision Tree (C4.5 DT) and Random Forest (RF). Their MCCs ranged from 0.609 to 0.864 and their accuracies from 78.7% to 93.1% on the test set. A consensus SOM model was then built from four single classifier models to obtain a more reliable and robust model. The consensus model outperformed all the single classifier models, with an MCC of 0.872 and an accuracy of 93.6% on the test set. In addition, we combined two dataset splitting methods (random and SOM) with two feature selection methods to find the best combination. SVMAttributeEval combined with the SOM splitting method achieved the best model performance. Additionally, 20 good ECFP_4 features and 20 bad ECFP_4 features were found, which will help chemists to discriminate highly active PLK1 inhibitors from poorly active ones.
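For readers unfamiliar with the two metrics reported here, the sketch below computes accuracy and the Matthews correlation coefficient (MCC) from a binary confusion matrix. The counts are hypothetical and are not taken from the paper.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when a marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical test-set counts: highly active = positive class.
tp, tn, fp, fn = 45, 40, 5, 10
test_accuracy = accuracy(tp, tn, fp, fn)
test_mcc = mcc(tp, tn, fp, fn)
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which makes it less sensitive to class imbalance between highly and poorly active compounds.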
Co-reporter:Zhonghua Xia
Molecular Diversity 2017 Volume 21(Issue 3) pp:661-675
Publication Date(Web):08 May 2017
DOI:10.1007/s11030-017-9743-x
Human microsomal prostaglandin E2 synthase (mPGES)-1 is a promising drug target for inflammation and other diseases with inflammatory symptoms. In this work, we built classification models that classify mPGES-1 inhibitors into two groups: highly active inhibitors and weakly active inhibitors. A dataset of 1910 mPGES-1 inhibitors was separated into a training set and a test set by two methods: a Kohonen’s self-organizing map and random selection. The molecules were represented by different types of fingerprint descriptors: MACCS keys (MACCS), CDK fingerprints, Estate fingerprints, PubChem fingerprints, substructure fingerprints and 2D atom pairs fingerprints. First, we used a support vector machine (SVM) to build twelve models with the six types of fingerprints and found that MACCS had some advantage over the other fingerprints in modeling. Next, we used naïve Bayes (NB), random forest (RF) and multilayer perceptron (MLP) methods to build six models with MACCS only and found that the RF and MLP models were better than the NB models. Finally, all the models with MACCS keys were used to make predictions on an external test set of 41 compounds. In summary, the models built with MACCS keys using the SVM, RF and MLP methods show good prediction performance on the test sets and the external test set. Furthermore, we carried out a structure–activity relationship analysis between mPGES-1 and its inhibitors based on the information gain of fingerprints and could pinpoint some key functional groups for mPGES-1 activity. Highly active inhibitors usually contained an amide group, an aromatic ring or a nitrogen heterocyclic ring, and several heteroatom substituents such as fluorine and chlorine. Carboxyl groups and sulfur-containing groups appeared mainly in weakly active inhibitors.
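The information-gain ranking of fingerprint bits mentioned above can be sketched as follows. This is a generic illustration of information gain for a single binary fingerprint bit against a two-class activity label, not the authors' implementation; the bit vectors and labels are hypothetical.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        result -= p * math.log2(p)
    return result

def information_gain(bit_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the bit."""
    n = len(labels)
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for b, y in zip(bit_values, labels) if b == value]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Hypothetical data: 1 = highly active, 0 = weakly active.
labels = [1, 1, 0, 0]
informative_bit = [1, 1, 0, 0]    # present only in highly active compounds
uninformative_bit = [1, 0, 1, 0]  # evenly spread over both classes
gain_informative = information_gain(informative_bit, labels)
gain_uninformative = information_gain(uninformative_bit, labels)
```

Bits with a high gain mark substructures whose presence or absence separates the two activity classes, which is how key functional groups can be shortlisted for inspection.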
Co-reporter:Ling Wang;Maolin Wang;Bin Dai
Molecular Diversity 2013 Volume 17(Issue 1) pp:85-96
Publication Date(Web):2013 February
DOI:10.1007/s11030-012-9404-z
Using a self-organizing map (SOM) and a support vector machine (SVM), two classification models were built to predict whether a compound is a selective inhibitor of one of the two acyl-coenzyme A:cholesterol acyltransferase (ACAT) isozymes, ACAT-1 and ACAT-2. A dataset of 97 ACAT inhibitors was collected. For each molecule, global descriptors, 2D and 3D property autocorrelation descriptors and autocorrelation of surface properties were calculated with the program ADRIANA.Code. With the training/test set split by the SOM method, the prediction accuracies on the test sets are 88.9% for the SOM1 model and 92.6% for the SVM1 model. In addition, the extended connectivity fingerprints (ECFP_4) of all the molecules were calculated and the structure–activity relationship of selective ACAT inhibitors was summarized, which may help identify important structural features of inhibitors related to selectivity between the ACAT isozymes.
Co-reporter:Aixia Yan, Kai Wang
Bioorganic & Medicinal Chemistry Letters 2012 Volume 22(Issue 9) pp:3336-3342
Publication Date(Web):1 May 2012
DOI:10.1016/j.bmcl.2012.02.108
Several QSAR (quantitative structure–activity relationship) models for predicting the inhibitory activity of 404 acetylcholinesterase inhibitors were developed. The whole dataset was split into a training set and a test set either randomly or using a Kohonen’s self-organizing map. The inhibitory activity of the 404 acetylcholinesterase inhibitors was then predicted using multilinear regression (MLR) analysis and support vector machine (SVM) methods. All of our models achieved correlation coefficients above 0.90 on the test sets. A Y-randomization test was employed to check the robustness of our models, and a docking simulation was used to confirm the descriptors we used.
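The Y-randomization test named in this abstract can be sketched as below: the model is refitted after shuffling the activity values, and a robust model should lose most of its fit. For brevity the sketch uses a one-descriptor least-squares fit, which is an assumption; the paper's models use several descriptors with MLR and SVM. The data are hypothetical.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation, equal to r2 of a one-descriptor least-squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def y_randomization(x, y, n_trials=100, seed=0):
    """Return r2 of the real data and the r2 values obtained after shuffling y."""
    rng = random.Random(seed)
    true_r2 = r_squared(x, y)
    shuffled_r2 = []
    for _ in range(n_trials):
        y_perm = y[:]          # copy so the original activities stay untouched
        rng.shuffle(y_perm)    # break the descriptor-activity pairing
        shuffled_r2.append(r_squared(x, y_perm))
    return true_r2, shuffled_r2

# Hypothetical toy data: one descriptor with a nearly linear activity response.
x = [float(i) for i in range(20)]
y = [2.0 * v + 1.0 + ((i * 7) % 5 - 2) * 0.3 for i, v in enumerate(x)]
true_r2, shuffled = y_randomization(x, y)
mean_shuffled = sum(shuffled) / len(shuffled)
```

If the shuffled models scored almost as well as the real one, the original correlation would more likely be a chance result than a genuine structure-activity relationship.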
Co-reporter:Zhi Wang, Yuanying Chen, Hu Liang, Andreas Bender, Robert C. Glen, and Aixia Yan
Journal of Chemical Information and Modeling 2011 Volume 51(Issue 6) pp:1447-1456
Publication Date(Web):May 23, 2011
DOI:10.1021/ci2001583
P-glycoprotein (P-gp) is one of the major ABC transporters and is involved in many essential processes such as lipid and steroid transport across cell membranes, but also in the uptake of drugs such as HIV protease and reverse transcriptase inhibitors. Despite its importance, reliable models predicting substrates of P-gp are scarce. In this study, we have built several computational models to predict whether or not a compound is a P-gp substrate, based on the largest data set yet published, employing 332 distinct structures. Each molecule is represented by ADRIANA.Code, MOE, and ECFP_4 fingerprint descriptors. The models are computed using a support vector machine on a training set of 131 substrates and 81 nonsubstrates and evaluated by 5-fold, 10-fold, and leave-one-out (LOO) cross-validation. The best model gives a Matthews Correlation Coefficient of 0.73 and a prediction accuracy of 0.88 on the test set. Examination of the model based on ECFP_4 fingerprints revealed several substructures that could be significant in separating substrates and nonsubstrates of P-gp, such as the nitrile and sulfoxide functional groups, which have a higher frequency in nonsubstrates than in substrates. In addition, structural isomerism in sugars was found to result in remarkable differences in the likelihood of a compound being a substrate for P-gp.
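The k-fold and leave-one-out cross-validation schemes named above differ only in how the training set is partitioned. The sketch below generates the index partitions; it is a generic illustration rather than the authors' procedure, and in practice the indices would normally be shuffled before folding.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def leave_one_out_indices(n_samples):
    """Leave-one-out is k-fold with one sample per fold."""
    return k_fold_indices(n_samples, n_samples)

n_train = 131 + 81  # training-set size stated in the abstract
folds5 = list(k_fold_indices(n_train, 5))
folds10 = list(k_fold_indices(n_train, 10))
```

Every compound appears in exactly one held-out fold, so each scheme yields one out-of-sample prediction per training compound.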
Co-reporter:Aixia Yan, Yang Chong, Liyu Wang, Xiaoying Hu, Kai Wang
Bioorganic & Medicinal Chemistry Letters 2011 Volume 21(Issue 8) pp:2238-2243
Publication Date(Web):15 April 2011
DOI:10.1016/j.bmcl.2011.02.110
Several QSAR (quantitative structure–activity relationship) models for predicting the inhibitory activity of 117 Aurora-A kinase inhibitors were developed. The whole dataset was split into a training set and a test set by two different methods: (1) random selection; and (2) a Kohonen’s self-organizing map (SOM). The inhibitory activity of the 117 Aurora-A kinase inhibitors was then predicted using multilinear regression (MLR) analysis and support vector machine (SVM) methods. The two MLR models and the two SVM models all achieved correlation coefficients above 0.92 on the test sets.
Four QSAR models were built by multilinear regression (MLR) analysis and the support vector machine (SVM) method on a series of 117 Aurora-A kinase inhibitors; they could be used for predicting the activities of Aurora-A inhibitors.
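A SOM-based training/test split of the kind referred to above might look like the following sketch. The abstract does not describe the exact procedure, so everything here is an assumption: a small rectangular map is trained on the descriptor vectors, and a fixed share of the compounds in each occupied neuron is sent to the test set so that both sets cover the same regions of descriptor space. Grid size, decay schedules, and the 2D descriptor data are hypothetical.

```python
import random

def best_matching_unit(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda node: sum((a - b) ** 2 for a, b in zip(weights[node], x)))

def train_som(data, rows, cols, epochs=30, seed=0):
    """Train a small rectangular SOM with linearly decaying rate and radius."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        rate = 0.5 * frac
        radius = max(radius0 * frac, 0.5)
        for x in rng.sample(data, len(data)):  # random presentation order
            win = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_dist = max(abs(node[0] - win[0]), abs(node[1] - win[1]))
                if grid_dist <= radius:  # update the winner and its neighbours
                    for k in range(dim):
                        w[k] += rate * (x[k] - w[k])
    return weights

def som_split(data, weights, test_every=4):
    """Send every `test_every`-th compound of each neuron to the test set."""
    by_node = {}
    for idx, x in enumerate(data):
        by_node.setdefault(best_matching_unit(weights, x), []).append(idx)
    train_idx, test_idx = [], []
    for members in by_node.values():
        for pos, idx in enumerate(members):
            (test_idx if pos % test_every == test_every - 1 else train_idx).append(idx)
    return train_idx, test_idx

# Hypothetical descriptor matrix: 40 compounds, 2 descriptors scaled to [0, 1].
data_rng = random.Random(1)
data = [[data_rng.random(), data_rng.random()] for _ in range(40)]
weights = train_som(data, rows=3, cols=3)
train_idx, test_idx = som_split(data, weights)
```

Compared with a purely random split, sampling per neuron keeps structurally similar compounds represented in both the training and the test set.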
Co-reporter:Xiaoying Hu, Aixia Yan, Tianwei Tan, Oliver Sacher and Johann Gasteiger
Journal of Chemical Information and Modeling 2010 Volume 50(Issue 6) pp:1089-1100
Publication Date(Web):June 1, 2010
DOI:10.1021/ci9004833
In this work, the perception of similarity of reactions catalyzed by hydrolases and oxidoreductases is investigated on the basis of the overall breaking and making of bonds in the reactions. Six physicochemical properties of the reacting bond in the substrate of each enzymatic reaction were calculated to describe the characteristics of each reaction. The 311 reactions catalyzed by hydrolases (EC 3.b.c.d) and the 651 reactions catalyzed by oxidoreductases (EC 1.b.c.d) were classified by Kohonen’s self-organizing neural network (KohNN), by a support vector machine (SVM), and by hierarchical clustering analysis (HCA). For the 311 reactions catalyzed by hydrolases, a classification accuracy of 95.8% by a KohNN and 97.7% by an SVM was achieved. For the 651 reactions catalyzed by oxidoreductases, the classification accuracy was 93.4% and 96.3% by a KohNN and an SVM, respectively. The similarities of reactions reflected by the physicochemical effects of reacting bonds were compared with the traditional Enzyme Commission (EC) classification system. The results of a KohNN and an SVM are similar to those of the EC classification system. However, the perception of similarity of reactions by a KohNN and an SVM shows finer details of the enzymatic reactions and thus could provide a good basis for the comparison of enzymes.
Co-reporter:Aixia Yan, Liyu Wang, Shuyu Xu, Jun Xu
Drug Discovery Today (March 2011) Volume 16(Issues 5–6) pp:260-269
Publication Date(Web):1 March 2011
DOI:10.1016/j.drudis.2010.12.003
Aurora kinases (A–C) belong to the serine/threonine protein kinase family. In recent years, the constitutive or elevated expression of Aurora kinases has been found in cancer cells and oncogene transfected cells. In this review, we summarize the common binding modes of Aurora-A kinase inhibitors, the hot spot residues in the binding sites and the privileged inhibitor structures. Our review of the reported chemical scaffolds of Aurora-A kinase inhibitors and their binding modes could provide a useful framework from which new design strategies for inhibitors might be assessed or developed.
Co-reporter:Yang Li, Shouyi Xuan, Yue Feng, Aixia Yan
Drug Discovery Today (April 2015) Volume 20(Issue 4) pp:435-449
Publication Date(Web):1 April 2015
DOI:10.1016/j.drudis.2014.12.001
•Overview of structural and functional properties of HIV-1 integrase (IN).
•Classifying the HIV-1 integrase strand transfer inhibitors (INSTIs) into ten classes.
•Summarizing the essential features and the development of HIV-1 INSTIs.
HIV-1 integrase (IN) is a retroviral enzyme essential for integration of genetic material into the DNA of the host cell and hence for viral replication. The absence of an equivalent enzyme in humans makes IN an interesting target for anti-HIV drug design. This review briefly overviews the structural and functional properties of HIV-1 IN. We analyze the binding modes of the established drugs, clinical candidates and a comprehensive library of leads based on innovative chemical scaffolds of HIV-1 IN strand transfer inhibitors (INSTIs). Computational clustering techniques are applied for identifying structural features relating to bioactivity. From bio- and chemo-informatics analyses, we provide novel insights into structure–activity relationships of INSTIs and elaborate new strategies for design of innovative inhibitors.