Co-reporter:Ming Hao, Yan Li, Yonghua Wang, Shuwei Zhang
Analytica Chimica Acta 2011 Volume 690(Issue 1) pp:53-63
Publication Date(Web):25 March 2011
DOI:10.1016/j.aca.2011.02.004
In this work, a coupled genetic algorithm (GA)-support vector machine (SVM) approach is proposed for optimizing the subset of 2D molecular descriptors generated for a series of antagonists of P2Y12, a member of the G-protein-coupled receptor family. The statistical performance and efficiency of the model are simultaneously enhanced by the SVM kernel-based nonlinear projection. To our knowledge, this is the first QSAR study to predict P2Y12 inhibition activity from an unusually large dataset of 364 structurally diverse P2Y12 antagonists. In addition, three other widely used approaches, i.e., partial least squares (PLS), random forest (RF), and Gaussian process (GP) routines combined with GA (namely GA–PLS, GA–RF, and GA–GP, respectively), are employed and compared with the GA–SVM method in terms of several rigorous evaluation criteria. The results indicate that the GA–SVM model is a powerful tool for predicting the activity of P2Y12 antagonists, producing a conventional correlation coefficient R² of 0.976 and a cross-validated R²cv of 0.829 for the training set, as well as an R²pred of 0.811 for the test set. It significantly outperforms the other three methods, whose average values are R² = 0.894, R²cv = 0.741, and R²pred = 0.693. With its excellent predictive capacity in both internal and external validation, the proposed model should be helpful for screening and optimizing potential P2Y12 antagonists prior to chemical synthesis in drug development.
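The GA-driven descriptor selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function `ga_select`, its parameters, and the GA operators used (truncation selection, one-point crossover, bit-flip mutation, elitism) are assumptions chosen for brevity. The fitness function `toy_fitness` and the set of "informative" descriptor indices are purely hypothetical stand-ins. In a GA–SVM setting the fitness would instead be a cross-validated score of an SVM regression trained on the descriptors selected by the mask.

```python
import random

def ga_select(n_features, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Search for a binary descriptor mask that maximizes `fitness`.

    Returns the best mask found and the best fitness after each generation.
    """
    rng = random.Random(seed)
    # Each individual is a 0/1 mask: 1 means the descriptor is kept.
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        # Truncation selection: the fitter half of the population become parents.
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        # Elitism: carry the current best mask over unchanged.
        children = [best[:]]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per descriptor.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history

# Hypothetical stand-in for a cross-validated model score: reward masks that
# keep three "informative" descriptors and penalize every descriptor kept.
INFORMATIVE = (0, 3, 5)

def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)

best, history = ga_select(10, toy_fitness, seed=1)
print("selected descriptor indices:", [i for i, bit in enumerate(best) if bit])
print("best fitness per generation:", history)
```

Because the best mask is copied into every new generation, the recorded best fitness never decreases. To turn the sketch into a GA–SVM wrapper, `toy_fitness` could be replaced by a function that fits a support vector regressor on the masked descriptor columns and returns its cross-validated R²; the size penalty shown here is one simple way to discourage large descriptor subsets.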