Tingjun Hou

Find an error

Name:
Organization: Zhejiang University
Department: College of Pharmaceutical Sciences
Title:
Co-reporter:Tailong Lei, Huiyong Sun, Yu Kang, Feng Zhu, Hui Liu, Wenfang Zhou, Zhe Wang, Dan Li, Youyong Li, and Tingjun Hou
Molecular Pharmaceutics November 6, 2017 Volume 14(Issue 11) pp:3935-3935
Publication Date(Web):October 16, 2017
DOI:10.1021/acs.molpharmaceut.7b00631
Xenobiotic chemicals and their metabolites are mainly excreted out of our bodies by the urinary tract through the urine. Chemical-induced urinary tract toxicity is one of the main reasons that cause failure during drug development, and it is a common adverse event for medications, natural supplements, and environmental chemicals. Despite its importance, there are only a few in silico models for assessing urinary tract toxicity for a large number of compounds with diverse chemical structures. Here, we developed a series of qualitative and quantitative structure–activity relationship (QSAR) models for predicting urinary tract toxicity. In our study, the recursive feature elimination method incorporated with random forests (RFE-RF) was used for dimension reduction, and then eight machine learning approaches were used for QSAR modeling, i.e., relevance vector machine (RVM), support vector machine (SVM), regularized random forest (RRF), C5.0 trees, eXtreme gradient boosting (XGBoost), AdaBoost.M1, SVM boosting (SVMBoost), and RVM boosting (RVMBoost). For building classification models, the synthetic minority oversampling technique was used to handle the imbalance data set problem. Among all the machine learning approaches, SVMBoost based on the RBF kernel achieves both the best quantitative (qext2 = 0.845) and qualitative predictions for the test set (MCC of 0.787, AUC of 0.893, sensitivity of 89.6%, specificity of 94.1%, and global accuracy of 90.8%). The application domains were then analyzed, and all of the tested chemicals fall within the application domain coverage. We also examined the structure features of the chemicals with large prediction errors. In brief, both the regression and classification models developed by the SVMBoost approach have reliable prediction capability for assessing chemical-induced urinary tract toxicity.Keywords: boosting; ensembles; imbalanced classification; machine learning; nephrotoxicity; quantitative structure−activity relationship; support vector machine; urinary tract toxicity;
Co-reporter:Tailong Lei, Fu Chen, Hui Liu, Huiyong Sun, Yu Kang, Dan Li, Youyong Li, and Tingjun Hou
Molecular Pharmaceutics July 3, 2017 Volume 14(Issue 7) pp:2407-2407
Publication Date(Web):June 9, 2017
DOI:10.1021/acs.molpharmaceut.7b00317
As a dangerous end point, respiratory toxicity can cause serious adverse health effects and even death. Meanwhile, it is a common and traditional issue in occupational and environmental protection. Pharmaceutical and chemical industries have a strong urge to develop precise and convenient computational tools to evaluate the respiratory toxicity of compounds as early as possible. Most of the reported theoretical models were developed based on the respiratory toxicity data sets with one single symptom, such as respiratory sensitization, and therefore these models may not afford reliable predictions for toxic compounds with other respiratory symptoms, such as pneumonia or rhinitis. Here, based on a diverse data set of mouse intraperitoneal respiratory toxicity characterized by multiple symptoms, a number of quantitative and qualitative predictions models with high reliability were developed by machine learning approaches. First, a four-tier dimension reduction strategy was employed to find an optimal set of 20 molecular descriptors for model building. Then, six machine learning approaches were used to develop the prediction models, including relevance vector machine (RVM), support vector machine (SVM), regularized random forest (RRF), extreme gradient boosting (XGBoost), naïve Bayes (NB), and linear discriminant analysis (LDA). Among all of the models, the SVM regression model shows the most accurate quantitative predictions for the test set (q2ext = 0.707), and the XGBoost classification model achieves the most accurate qualitative predictions for the test set (MCC of 0.644, AUC of 0.893, and global accuracy of 82.62%). The application domains were analyzed, and all of the tested compounds fall within the application domain coverage. We also examined the structural features of the compounds and important fragments with large prediction errors. In conclusion, the SVM regression model and the XGBoost classification model can be employed as accurate prediction tools for respiratory toxicity.Keywords: dimension reduction; extreme gradient boosting; machine learning; quantitative structure−activity relationship; respiratory system toxicity;
Co-reporter:Huiyong Sun, Youyong Li, Mingyun Shen, Dan Li, Yu Kang, and Tingjun Hou
Journal of Chemical Information and Modeling August 28, 2017 Volume 57(Issue 8) pp:1895-1895
Publication Date(Web):July 27, 2017
DOI:10.1021/acs.jcim.7b00075
Drug–target residence time plays a vital role in drug efficacy. However, there is still no effective strategy to predict drug residence time. Here, we propose to use the optimized (or minimized) structures derived from holo-state proteins to calculate drug residence time, which could give a comparable or even better prediction accuracy compared with those calculated utilizing a large number of molecular dynamics (MD) structures based on the Poisson process. Besides, in addition to the Poisson process, one may use fewer samples for predicting residence time due to the reason that, in a large extent, the calculated drug residence time is stable and independent of the number of samples used for the prediction. With remarkably reduced computational load, the proposed strategy may be promising for large-scale drug residence time prediction, such as post-processing in virtual screening (VS) and lead compound optimization.
Co-reporter:Hui Liu, Fu Chen, Huiyong Sun, Dan Li, and Tingjun Hou
Journal of Chemical Theory and Computation April 11, 2017 Volume 13(Issue 4) pp:1827-1827
Publication Date(Web):March 15, 2017
DOI:10.1021/acs.jctc.6b01139
By means of estimators based on non-equilibrium work, equilibrium free energy differences or potentials of mean force (PMFs) of a system of interest can be computed from biased molecular dynamics (MD) simulations. The approach, however, is often plagued by slow conformational sampling and poor convergence, especially when the solvent effects are taken into account. Here, as a possible way to alleviate the problem, several widely used implicit-solvent models, which are derived from the analytic generalized Born (GB) equation and implemented in the AMBER suite of programs, were employed in free energy calculations based on non-equilibrium work and evaluated for their abilities to emulate explicit water. As a test case, pulling MD simulations were carried out on an alanine polypeptide with different solvent models and protocols, followed by comparisons of the reconstructed PMF profiles along the unfolding coordinate. The results show that when employing the non-equilibrium work method, sampling with an implicit-solvent model is several times faster and, more importantly, converges more rapidly than that with explicit water due to reduction of dissipation. Among the assessed GB models, the Neck variants outperform the OBC and HCT variants in terms of accuracy, whereas their computational costs are comparable. In addition, for the best-performing models, the impact of the solvent-accessible surface area (SASA) dependent nonpolar solvation term was also examined. The present study highlights the advantages of implicit-solvent models for non-equilibrium sampling.
Co-reporter:Jun Shang;Huiyong Sun;Hui Liu;Fu Chen;Sheng Tian
Journal of Cheminformatics 2017 Volume 9( Issue 1) pp:25
Publication Date(Web):21 April 2017
DOI:10.1186/s13321-017-0212-4
BackgroundVirtual screening (VS) based on a variety of ligand-based or structure-based drug design approaches, such as property-based or drug-likeness rules, quantitative structure–activity relationship (QSAR) models, pharmacophore hypotheses, molecular docking, has become a powerful way to find hits in drug discovery. Certainly, screening libraries of small molecules with 2-D or 3-D structures are indispensable sources for VS campaigns. For example, the number of purchasable molecules collected in the ZINC database increases from ~0.73 million in 2005 to over 100 million in 2015 [1]. For the 176 vendors deposited in ZINC15, 37 offer more than 100,000 compounds and 9 offer more than 1 million compounds. This highlights the progress in synthesis of organic chemistry and tremendous demand of this market. In most VS applications, it is more practical and time effective to screen a compound library provided by a specific vendor rather than screen all compound libraries collected by ZINC. Certainly, the distributions of physiochemical properties, structural features and scaffold diversity of purchasable compound libraries afforded by different vendors should be different [2]. Therefore, an important question may be raised: which library should be used for VS? In order to answer this question, we need to have a deep understanding of the intrinsic features of each purchasable compound library and the difference among them.As we know, the properties of a molecule are determined by its structure, and therefore similar structures tend to bear similar properties according to the similarity principle [3]. Thus, the chemical space of a compound library should be examined by molecular structures, especially chemical scaffolds, which has a huge impact on the success rate in biological screenings [4]. The scaffold of a molecule can be described by different ways. The most traditional way to define a scaffold is the Markush structure proposed by Markush [5]. Markush structures are usually used in patent applications to define chemical series [6], but they may be too generic to highlight the important structural features essential for pharmaceutical activity. Another scaffold representation is the Murcko framework proposed by Bemis and Murcko [7]. This method employs a more systematical way to dissect a molecule into four parts: ring systems (Fig. 1a), linkers (Fig. 1b), side chains (Fig. 1c), and the Murcko framework (Fig. 1d) that is the union of ring systems and linkers in a molecule. Lewell et al. [8] described a more chemically meaningful presentation of molecular structures, namely “RECAP” (retrosynthetic combinatorial analysis procedure), which cleaves molecules at bonds based on 11 predefined bond cleavage rules derived from common chemical reactions. As an example shown in Fig. 1h, the molecule is dissected into two parts at the bond linked by nitrogen and carbon. Therefore, analysis of compound libraries by using the RECAP representation may be a good way to explore the synthetic feasibility of a molecule.Open image in new windowFig. 1Definitions of different types of fragments in a molecule: a ring systems, b linkers, c side chains and d Murcko framework, e ring assemblies, f bridge assemblies, g rings, h RECAP fragments and i scaffold treeBased on the Murcko framework, Schuffenhauer et al. [9] proposed a more complicated and systematical methodology, called Scaffold Tree (ST), to describe the ring systems arranged in a hierarchical tree, which iteratively prunes rings one by one based on a set of prioritization rules until only one ring remains. The structural hierarchies of each molecule in a Scaffold Tree are numbered numerically from Level 0 (the single remaining ring usually) to Level n (the original molecule) (Fig. 1i), and Level n − 1 is the Murcko framework. Owing to the systematic partition of molecular structures, the Scaffold Tree methodology has been employed in many scaffold diversity studies of compound libraries [10, 11, 12].A number of studies have been reported to analyze and compare the chemical space and diversity of commercially available compound libraries in the last decade [13]. Krier et al. [14] evaluated the scaffold diversity of 17 commercially available screening collections with 2.4 million compounds by analyzing the maximum common substructures (MCS), and they grouped the commercial collections into different categories with low, medium and high scaffold diversity. However, the definition of MCS is arbitrary and data set dependent, and MCS may be not a general way to represent a large number of scaffolds. Langdon et al. [12] analyzed the structural diversity of 7 commercial compound libraries by using the Murcko frameworks and Scaffold Trees, and then visualized the scaffold space by using the Tree Maps software [15]. They found that there were some emblematical scaffolds in each library. Nevertheless, the libraries analyzed by Langdon et al. are rarely used in practical VS and the numbers of molecules in three libraries are even <10,000, and therefore the results may not be informative for drug design/discovery. With the rapid increase of the number of commercially available small molecules, analysis of the structural features and scaffold diversity for representative screening libraries is quite demanding.In this study, the structural features and scaffold diversity of eleven commercially available screening libraries and Traditional Chinese Medicine compound database (TCMCD) were explored by analyzing seven fragment representations. All the selected commercial libraries have more than 50,000 compounds and have been widely used in VS. We aimed to find the difference of the structural features and scaffold diversity among these libraries. Tree Maps and SAR Maps [16] were used to visualize the distribution of the scaffolds based on the similarity of molecular fingerprints. Moreover, the underlying pharmacological characteristics, that is the potential targets of the molecules with the representative scaffolds, were also examined. We believe that our study will help the decision making process when selecting commercially available compound libraries for VS.MethodsPreparation and standardization of librariesThe 11 large compound libraries deposited in ZINC15 were chosen in the analysis, and they are Mcule, Enamine, ChemDiv, VitasM, UORSY, ChemBridge, LifeChemicals, ZelinskyInstitute, Specs, ChemicalBlock and Maybridge. Mcule is the largest library in ZINC15, and it contains 4,922,295 molecules. The SDF files of the studied libraries were downloaded from the vendors’ websites (Additional file 1: File S1). TCMCD developed in our group was also included in this study, and it contains 57,809 molecules with molecular weight (MW) lower than 800, which are found in more than 5000 herbs used in traditional Chinese medicines (TCM) [17, 18, 19]. The basic information of the studied libraries is summarized in Table 1. Then, the molecules in all libraries were preprocessed by the following Pipeline Pilot protocol: fixing bad valence, filtering out inorganic molecules, adding hydrogens and removing duplicated molecules [20].Table 1Basic information of the 12 studied librariesDatabasesaNumberaFilteredbDescriptioncMcule4,922,2954,876,889Large, individual serviceEnamine1,959,0261,958,807Lead-like, diverseChemDiv1,741,8071,741,603SelectedVitasM1,460,2481,460,009Novel compoundsUORSY1,301,0921,293,353Original and uniqueChemBridge1,064,5581,064,425Selected, derivativesLifeChemicals413,286412,788SelectedZelinskyInstitute381,214379,048No descriptionsSpecs212,404212,332SelectedChemicalBlock125,791125,473Selected, diverseMaybridge57,80957,490Highly diverseTCMCD54,20654,138Natural productaNumber of all molecules in each librarybNumber of the molecules in each library after processed by different filterscSimple description of the studied librariesThe MW distributions of the studied libraries are shown in Fig. 2. It can be observed that ranges of MW for these libraries vary greatly. Then, we analyzed the MW distributions at an interval of 100 and found that the numbers of molecules in some intervals for different libraries are quite different. Molecules in the studied libraries with MW from 100 to 700 are highly overlapped. Thus the distributions of MW should be standardized in order to eliminate the influence of MW on scaffold analysis [21]. Eventually, based on the least number of molecules at each interval of 100 MW within the studied libraries, the same numbers of molecules were randomly selected at each interval for all libraries and then 12 new standardized subsets were generated. The standardized subsets have the equal numbers of molecules (41,071) and almost identical MW distributions ranging from 100 to 700. The following analyses were conducted based on the 12 standardized subsets.Open image in new windowFig. 2Box plots of the distributions of molecular weight for the 12 studied databasesGeneration of fragment presentationsA total of 7 fragment representations were used to characterize the structural features and scaffolds of molecules, and they are ring assemblies, bridge assemblies, rings, chain assemblies, Murcko frameworks [7], RECAP fragments [8], and Scaffold Tree [9].The first five types of fragment representations were generated by using the Generate Fragments component in Pipeline Pilot 8.5 (PP 8.5) [20]. The RECAP fragments and Scaffold Tree for each molecule were generated by using the sdfrag command in MOE [22]. Owing to the lack of the original molecules in the Scaffold Tree provided by the sdfrag command, the missing original molecules were added to the SDF files of the Scaffold Tree using PP 8.5 (Additional file 1: File S1). The generation of the Scaffold Tree (from Level 1 to Level n) was accomplished in PP 8.5 by defining the fragments at different levels for each molecule. Eventually, the SDF files of these fragment representations were obtained (Additional file 1: File S1).Analyses of scaffold diversityThe scaffold diversity of each standardized dataset was characterized by the fragment counts and the cumulative scaffold frequency plots (CSFPs) or so called cyclic system retrieval (CSR) curves [23, 24]. The duplicated fragments were removed first, and the numbers of unique fragments for each dataset were counted for ring assemblies, bridge assemblies, rings, chain assemblies, Murcko frameworks, RECAP fragments and Levels 0–11 of Scaffold Tree, along with the numbers of molecules they represent (referred to as the scaffold frequency).Then, the scaffolds were sorted by their scaffold frequency from the most to the least, and the cumulative percentage of scaffolds was computed as the cumulative scaffold frequency divided by the total number of molecules [12]. Similarly, percentages of unique fragments can also be calculated. Then, CSFPs with the number or the percentage of Murcko frameworks and Level 1 scaffolds, which may better represent the whole molecules than the other types of fragments, were generated. In each CSFP, PC50C was determined for each scaffold representation to quantify the distribution of molecules over scaffolds. PC50C was defined as the percentage of scaffolds that represent 50% of molecules in a library [14].Generation of Tree MapsThe Tree Maps methodology was employed to analyze the structural similarity of the Level 1 scaffolds by using the TreeMap software, which can highlight both the structural diversity of scaffolds and the distribution of compounds over scaffolds. Tree Maps has been used as a powerful tool to depict structure–activity relationships (SARs) and analyze scaffold diversity [25]. Different from traditional tree structure represented by a graph with the root node and children nodes from the top to the bottom, Tree Maps proposed by Shneiderman uses circles or rectangles in a 2D space-filling way to delegate a kind of property for a clustered dataset with clearly intuitive visualization [15]. Thus, one can visualize a hierarchical clustering map by organizing those clustered properties along with other features for a dataset, such as MW.First, the unique Level 1 scaffolds were clustered by using the cluster molecules component in PP 8.5 based on the ECFP_4 (extensive-connectivity fingerprint 4) fingerprints [26, 27, 28]. According to Tian’s study [29] and our testing, although the clustering method is order dependent, the order dependency of the cluster molecules component did not have obvious effect on the clustering results. So, recentering the cluster center twice in a clustering protocol is enough. Then, the SDF file of the clustered scaffolds for each standardized dataset was converted into a text formatted file, which was used as the input of the TreeMap software [30] (Additional file 1: File S1). In each Tree Maps, scaffolds are represented by circles with gray perimeters. The area of each circle is proportional to the scaffold frequency, and the color of each small circle is related to the DTC (DistanceToClosest, i.e., the distance between the fragment and the cluster center) of fragments in each cluster. The lowest value of DTC for the Level 1 scaffolds of ChemBridge (DTC = 0) was colored in red, the highest value (DTC = 0.778) in deep green and the middle value in white. The highest values of DTC for the other databases were also around 0.8. The yellow labels in each Tree Maps were the order numbers of clusters.Generation of SAR MapsSAR Maps generated by the DataMiner 1.6 software is usually used to organize high throughput screening (HTS) data into clusters of chemically similar molecules, which provides a good way for interactive analysis. This structural clustering allows identification of possible false negatives and false positives in the data when the colors in the map represent experimental activity values. The map can not only display the results effectively, but also provide a convenient way to access the chemical series presented by the maximum common structure (MCS) scaffolds. Along with SAR (structure–activity relationship) rules, and substructure- and property-based tools provided in DataMiner, the SAR Map is a powerful method assisting to make the best possible decision on which molecules should be studied further.First, the cluster centers of the top 10 most frequently occurring clusters of the Level 1 Scaffolds observed in the Tree Maps for each standardized subset were defined as the queries to search the dataset by using the Substructure Filter from File component in PP 8.5. The 4816 identified records (i.e., original molecules) were saved into a SDF file (Additional file 1: File S1).Then, the Generate SAR Map function in DataMiner 1.6 was used to generate the structure similarity maps, i.e. SAR Maps [16]. The K-dissimilarity Selection or OptiSim method [31, 32, 33] was used to select a diverse and representative samples from the original dataset based on the Tanimoto similarity distances calculated from the 2D UNITY structural fingerprints [34]. Because the SAR Map is not a simple plot of two variables, it does not have axes. For N compounds, the SAR Map is an optimal projection of the N-squared similarities within the points onto a two dimensional plot using the nonlinear mapping (NLM) projection method [35]. Singleton Radius and SAR Map Horizon are two critical parameters to control the map. The Singleton Radius represents a dissimilarity radius, which was set to 0.3. A singleton is a compound that does not have any nearest neighbor within a predefined radius, and it is regarded as a point in the hedge of the map. The SAR Map Horizon was also set to 0.3, which means that two points will be placed far apart if the dissimilarity between them is higher than the parameter value, but their distance is not in scale relative to the others’ on the map. Accordingly, molecules gathered on the map definitely characterizing much more similar compounds are more meaningful than those separated ones. Therefore, 40 denser areas or so called representative molecules were selected and shown with black dotted circles on the SAR Map. The similarity between molecules in each area and its central molecules were higher than 0.8 (including 0.8), and these representative molecules in an area were saved as a SDF file (Additional file 1: File S1). Then selected molecules from each circle were used as the queries to identify the similar molecules in the BindingDB database [36]. In similarity search, the structural similarity threshold for each query was adjusted to make sure that at least one similar compound could be found for each query, and the least similarity threshold was set to 0.6. Finally, the potential targets of 39 queries were assigned to those of the similar molecules found in BindingDB.Results and discussionCounts of fragmentsFor the 12 standardized subsets, the fragments based on seven types of fragment representations, including ring assemblies, bridge assemblies, rings, chain assemblies, Murcko frameworks, RECAP fragments and Scaffold Tree scaffolds, were generated. The total numbers of all and unique fragments are listed in Tables 2 and 3. Because the standardized subsets have the identical numbers of molecules (41,071) and approximately the same MW distributions, the impact of MW on the analysis of fragments can be eliminated and the counts of the dissected molecules (i.e. fragments) can be compared and analyzed directly.Table 2Numbers of the duplicated and non-duplicated ring assemblies (ra), bridge assemblies (b), rings (r), chain (c), Murcko framwork (m) and RECAP fragment (RECAP) for the 12 standardized datasetsDatabasesTotal numberNon-duplicated numberrabrcmRECAPrabrcmRECAPChemBridge105,467964125,082514,42241,024493,990125585543345025,788107,898ChemDiv103,562440129,997512,14240,933369,011202169784349321,87593,439ChemicalBlock96,2361204125,442492,51540,870250,7652355106888336917,04563,061Enamine99,387496117,219474,17040,832496,594113039523600226,87094,869LifeChemicals103,421431128,421493,05640,973370,651106334531260320,27668,912Maybridge94,063577110,054461,41540,841264,327140868729354315,24253,852Mcule101,088538122,696492,81340,874419,190214475812536827,247108,294Specs96,202872119,323494,75241,038336,076188982832315415,25972,454TCMCD58,1115793127,355466,84239,192702,520850913511176596212,941104,631UORSY96,675454110,588471,90240,678521,18282928449612021,49191,776VitasM98,063650122,978493,39140,871321,898213264839393920,10881,702ZelinskyInstitute96,4301128117,460481,94840,927310,800153372669314516,66668,365Table 3Numbers of the duplicated and non-duplicated scaffolds at different levels of Scaffold Tree for the 12 standardized datasetsLevelChemBridgeChemDivChemicalBlockEnamineLifeChemicalsMaybridgeMculeSpecsTCMCDUORSYVitasMZelinskyInstituteDuplicated scaffolds041,02240,93040,86140,83240,97340,84140,87341,03839,13840,67740,86840,923141,01940,91840,85640,82040,96440,83640,86941,02939,12240,66540,86040,919238,26738,49937,84636,96338,56636,13837,55937,37734,70335,90237,42837,036327,39729,62526,44524,09528,56521,63426,33323,74625,63221,76725,86223,242411,99214,68612,640931512,968704511,87110,77014,752792212,2345998452652330734712039280213012726293366501594326326836363342552317272212419415171519660642174724543210196048306269061824552421124346692 1  2219 2110  1  1  6 1 11        3   Non-duplicated scaffolds05947168715714677138018221110482804684189977112778911,2836524940610,9928142808310,63287368504226,56020,38520,14529,03420,08022,64528,04121,15514,15826,94122,23622,384324,81024,12919,64222,85022,25818,43424,17818,08614,82220,43320,84818,597411,60413,94811,114904211,889662311,542972710,414780811,30190675260632653293196527311253267328595723158631032593634733954531227120940541315871945854167312354329196048298269061824552421124346692 1  2219 2110  1  1  6 1 11        3   Obviously, two kinds of fragments contain side chains, including chain assemblies (chains) and RECAP fragments. The percentages of molecules that do not have any ring in the standardized subsets were also calculated, and they are 0.12, 0.34, 0.51, 0.58, 0.24, 0.56, 0.48, 0.08, 4.71, 0.96, 0.49 and 0.36% for ChemBridge, ChemDiv, ChemicalBlock, Enamine, LifeChemicals, Maybridge, Mcule, Specs, TCMCD, UORSY, VitasM and ZelinskyInstitute, respectively. Among the studied libraries, TCMCD has the highest percentage of acyclic molecules (close to 2000), which is consistent with the results reported by Tian et al. [29]. However, the total number of chains in TCMCD is the least but one (466,842). More interestingly, TCMCD has 5962 unique chains, which are almost twice to those in ChemBridge (3450). Considering that the standardized subset of TCMCD has more acylic compounds, less chains while more unique chains, it appears that the chains in TCMCD are bigger or more complicated and diverse. Despite Maybridge has the fewest number of chains (461,415), which is similar to TCMCD, its number of unique chains (3543) is at the average level, which is still higher than those of ChemBridge (3450) and ChemDiv (3493). However, Chembridge and ChemDiv bear the top two numbers of chains (>510,000). Thus, the structures in Maybridge may be more diverse, which needs to be explored by other types of fragment representations. Among the studied libraries, UORSY and Enamine have more non-duplicated chain assemblies (6120 and 6002) than the others, suggesting that they have more diverse chains, which are two times higher than that of LifeChemicals (2603). Moreover, Mcule owns relatively high number of unique chains (5368).Another fragment representation containing side chains is RECAP fragments, which are the building blocks for synthesizing molecules. As shown in Table 2, TCMCD has extremely high number of RECAP fragments (702,520), indicating that, on the average, synthesizing a compound in TCMCD needs more RECAP fragments than synthesizing a molecule in any other standardized subset. That is to say, synthesizing these compounds in TCMCD may be quite difficult. ChemBridge, Enamine and UORSY have relatively high numbers of RECAP fragments (~500,000), which are almost twice comparing with those of ChemicalBlock (250,765) and Maybridge (264,327). Therefore, it may be easier to synthesize the molecules in ChemicalBlock and Maybridge.In the other five types of fragment presentations, three of them belong to ring systems, including rings, ring assemblies and bridge assemblies. The total numbers of rings for all libraries are quite close, and the biggest difference is found between Maybridge (110,054) and ChemDiv (129,997). Similarly, the total numbers of ring assemblies of these libraries are not quite different, but TCMCD is the only exception. The number of all ring assemblies in TCMCD (58,111) is significantly fewer than those in the other libraries, but quite interestingly, the number of the unique ring assemblies in TCMCD (1351) is quite higher than those in the other libraries. Different from rings and ring assemblies, bridge assemblies characterize contiguous ring systems sharing two or more bonds, and therefore they are also ring assemblies but more complicated. As shown in Table 2, the total number of the bridge assemblies in TCMCD (5793) is significantly higher than those in the other libraries. Although the total number of the simple ring systems in TCMCD is not quite high, its unique numbers of the rings and ring assemblies are much higher than those of the other libraries. In a word, TCMCD has more complicated and diverse ring systems. However, commercial libraries generally contain more simple rings instead of multiple ring systems, such as bridge assemblies. Herein, as a whole, ChemicalBlock and Specs have more unique ring systems, as shown in Table 2. Mcule and VitasM have relatively diverse ring systems, and Mcule also has relatively diverse chains. Enamine and UORSY have relatively high numbers of unique chains, but the numbers of their distinctive ring systems are so low. For LifeChemicals, both of the numbers of the unique chains and ring systems are quite low, suggesting that it has relatively low structural diversity.The other two types of fragment presentations, Murcko frameworks and Scaffold Tree, characterize molecular scaffolds, and they can represent the whole structural features for compounds in a library. Murcko frameworks, which are the union of ring systems (Fig. 1a) and linkers (Fig. 1b), are usually used as the structural signatures of molecules. As shown in Table 2, the total numbers of the Murcko frameworks for all the standardized subsets except TCMCD do not have large difference, which may result from much more acylic molecules found in TCMCD, but those of the unique ones are quite different. The number of the unique Murcko frameworks for Mcule (27,247) is the highest, while that for TCMCD (12,941) is the lowest, highlighting the fact that the structures of natural compounds may be more conservative than those of the synthesized molecules in commercially available libraries. Other databases, such as ChemBridge and Enamine, also possess relatively high numbers of Murcko frameworks (25,788 and 26,870, respectively). However, as mentioned above, the diversity of the ring systems for Enamine is pretty low.Scaffold Tree is a series of rings by chopping molecules into smaller pieces. The numbers of the fragments found at 12 levels of the Scaffold Tree [9] are listed in Table 3. The number of the levels for each subset is determined by the maximum complexity of molecules. Obviously, the most complicated structures can be found in TCMCD. To better compare the differences at each level among the 12 libraries, rose maps were plotted and shown in Fig. 3. Twelve petals stand for the studied libraries, and the twelve layers on each petal depict Level 0 to Level 11 of the Scaffold Tree from inside to outside in turn. Frequencies of molecules can be easily identified and compared by colors. As shown in Fig. 3a, as the levels increase higher than Level 1, the numbers of the scaffolds decrease sharply. At the levels higher than Level 2, the numbers of the fragments for Maybridge, UORSY and ZelinskyInstitute are lower than those for the other libraries. For TCMCD, the numbers of the fragments at Levels 0–2 are relatively low, but those at Level 4 or higher are quite high. That is to say, TCMCD is rich in more complicated structures. In Fig. 3b, the numbers of the unique fragments at 12 levels show different trend comparing with those of all fragments at 12 levels. The numbers of the unique scaffolds at Level 0 are even much lower than those at Level 1, and the numbers of the unique scaffolds at Level 2 or 3 are the highest. It appears that ChemBridge, Enamine and Mcule have higher diversity at Levels 2 and 3 than the other libraries.Open image in new windowFig. 3Rose maps for a the total numbers of the Scaffold Tree for the 12 datasets and b the non-duplicated numbers of the Scaffold Tree for the 12 datasetsIn summary, TCMCD contains much more complicated structures and its whole molecular scaffolds are more conservative than the commercial libraries. Generally speaking, at Levels 2 and 3, ChemBridge and Mcule show high structural diversity. At Level 5 or higher, ChemicalBlock, Specs and VitasM possess relatively high structural diversity, suggesting that these libraries contain more complicated structures. LifeChemicals has relatively high diversity for the Scaffolds at Levels 3 and 4, but has relatively low diversity for rings, ring assemblies and bridge assemblies (Table 2). Certainly, in order to characterize the structural diversity of the 12 studied libraries more clearly, further quantitative analyses are necessary.Cumulative scaffold frequency plots (CSFPs)Among the seven types of fragment representations, which kind of representation is the best choice to characterize the diversity of molecules is a critical problem for us to solve. According to the result from Langdon et al. and Tian et al. [12, 29], considering the balance between structural complexity and diversity, Level 1 scaffolds and Murcko frameworks may be the best choice to represent the scaffolds for most molecules. Besides, the scaled distributions of MW of the fragments for the 12 libraries are shown in Fig. 4. Noticeably, the distributions of the Level 2 scaffolds and Murcko frameworks are quite similar. As for the RECAP fragments, many fragments are too small. Therefore, the Level 1 scaffolds and Murcko frameworks are better to represent the whole molecules, and they are used in the following analyses.Open image in new windowFig. 4The scaled distributions of molecular weight for nine types of fragments found in the 12 datasets. Here, b represents bridge assemblies, c represents chain assemblies, Level_0, Level_1 and Level_2 represent Level 0, Level 1 and Level 2 of the Scaffold Tree, respectively, m represents Murcko frameworks, r represents rings, ra represents ring assemblies, and RECAP represents RECAP fragmentsThe CSFP is a good way to analyze the diversity for large compound libraries. Scaffold frequencies are the number of molecules containing particular scaffolds, which can also be represented as the percentage of the compounds in a library. Similarly, the number of fragments can also be presented as the percentage of the total numbers as shown in Fig. 5. In Fig. 5a, b, curves were truncated at the point where the frequency of the fragment turns from 2 to 1 to compare them clearly considering the following lines are paralleled. By analyzing the CSFPs in these two figures roughly, we found that the slopes of the curves were different and the steeper curves suggested that the most frequently occurring scaffolds can be found in more molecules. For instance, the percentages of the molecules of the top ten frequently occurring Murcko frameworks are 7.625, 5.174, 7.042, 7.756, 4.540, 11.792, 6.938, 13.332, 11.015, 12.601, 8.710 and 11.005% for ChemBridge, ChemDiv, ChemicalBlock, Enamine, LifeChemicals, Maybridge, Mcule, Specs, TCMCD, UORSY, VitasM and ZelinskyInstitute, respectively.Open image in new windowFig. 5a Cumulative scaffold frequency curves of the Murcko frameworks, which is truncated at the point where the frequency of the fragment turns from 2 to 1, for the 12 dataset; b cumulative scaffold frequency curves of the Level 1 Scaffold Tree fragments, which is truncated at the point where the frequency of the fragment turns from 2 to 1, for the 12 datasets; c cumulative scaffold frequency plots (CSFPs) of the Murcko frameworks for the 12 datasets; d CSFPs of the Scaffold Tree fragments for the 12 datasetsHowever, different libraries do not have identical numbers of fragments, which may influence the direct comparison of the 12 standardized datasets. The information derived from the CSFPs in Fig. 5c, d can be roughly quantified by using the PC50C values, which is the percentage of scaffolds that represent 50% of molecules, as shown in Table 4. Accordingly, the higher the value of PC50C is, the more diverse the scaffolds of a database will be. As shown in Fig. 5c and Table 4, TCMCD reaches 50% at the lowest number of the Murcko frameworks, then Specs, Maybridge, Zelinsky Institute and ChemicalBlock. On the contrary, Mcule, Enamine and Chembridge do not reach 50% even the percentage of the most frequently occurring scaffolds become about 25% (Fig. 5a). According to the PC50C values of the Murcko frameworks for the 12 libraries (Table 4), the scaffold diversity of Mcule, Enamine, ChemBridge, ChemDiv, LifeChemicals, VitasM, UORSY, ChemicalBlock, Maybridge, ZelinskyInstitute, Specs and TCMCD can be ranked in a descending order. In Fig. 5d and Table 4, the rank of the Level 1 scaffolds, however, is a little bit different. The scaffold diversity of ChemDiv, Mcule, Maybridge, LifeChemicals, ChemBridge, VitasM, ChemicalBlock, Enamine, ZelinskyInstitute, UORSY, Specs and TCMCD are ranked in a descending order.Table 4PC50C values of the Murcko frameworks (Murcko) and Level 1 scaffolds for the 12 standardized datasetsDatabasesPC50CMurckoScaffold TreeChemBridge21.381.92ChemDiv16.032.82ChemicalBlock9.421.68Enamine26.411.68LifeChemicals12.961.99Maybridge8.522.09Mcule27.362.49Specs6.151.28TCMCD5.921.11UORSY11.101.30VitasM11.851.86ZelinskyInstitute7.851.47The scaffold diversity evaluated based on the Level 1 scaffolds and Murcko frameworks deliver similar overall trends. Three libraries, including ChemDiv, Mcule and LifeChemicals, are more structurally diverse for whether the Level 1 scaffolds or Murcko frameworks, and two libraries, including TCMCD and Specs, are less structurally diverse. But the quantity statistics cannot reveal similarities among these scaffolds, and the scaffolds of TCMCD may present more diverse in similarity. Besides, the exact trends of CSFPs for the Murcko frameworks and Level 1 scaffolds are also different. The CSFPs for the Murcko frameworks are more discriminatory. It is possible that more granular Murcko frameworks enhance the apparent scaffold diversity. Moreover, PC50C is also just a simple index at a certain point in CSFPs. Therefore, a more comprehensive comparison within the distributions of the Level 1 scaffolds is necessary to evaluate the structural features of these libraries.Tree MapsIn the previous section, we analyzed the scaffold diversity of the 12 libraries using the distributions of molecules over scaffolds. Our analyses show that the studied libraries are not evenly distributed over scaffolds, but we know little about the structural similarity and distribution of representative scaffolds. Thus, Tree Maps was used to visualize the structural similarity and distribution of the Level 1 scaffolds.In Fig. 6 and Additional file 2: Fig. S1, colors in these circles are related to DistanceToClosest (DTC). That is to say, the deeper the red color is, the more similar the scaffold will be to the cluster center, and on the contrary, the deeper the green color is, the more dissimilar the fragment will be to the cluster center. As observed in those 12 Tree Maps, green, especially deep green, accounts for large areas in most of the datasets. To describe it easier, the deep green coverage ratio is defined as “Forest Coverage” (FC). As shown in Fig. 6, the FC values of TCMCD and LifeChemicals are larger than those of Enamine and Mcule, indicating that the Level 1 scaffolds in every gray circle of Enamine and Mcule are more similar to each other than those of the other two datasets. This is consistent with the results reported by Yongye et al. that natural products showed low molecule overlap [37]. Nevertheless, in a whole view, the separate gray circles for TCMCD and LifeChemicals are sparser than those for Enamine and Mcule, suggesting that the Level 1 scaffolds of Enamine and Mcule own higher structural diversity than the others. This is also demonstrated by the cluster numbers of Enamine, Mcule, TCMCD and LifeChemicals, which are 226, 220, 162 and 131, respectively. According to the analysis of CSFPs, it is believed that Enamine and Mcule may be more structurally diverse, which may result from more clusters not more diversity in similarities among molecular structures. By contrast, in LifeChemicals, however, despite some high dissimilarity appears in some clusters, these dissimilarities centralize in several kinds of scaffolds, resulting in much less unique fragments.Open image in new windowFig. 6Tree Maps of the Level 1 Scaffolds for a LifeChemicals, b Enamine, c Mcule and d TCMCDIn order to compare the difference of the representative structures identified in the studied libraries, the 10 most frequently occurring scaffolds and the 10 scaffolds of the cluster centers in the top 10 clusters of each library were extracted (Additional file 2: Figs. S2, S3) and these two kinds of extracted scaffolds were merged respectively. Then, the frequencies of the merged scaffolds were counted and the scaffolds with frequencies ≥2 are shown in Fig. 7. Frequencies of these scaffolds for No. 1, 2, 4, 6 and 7 fragments found in different datasets are over 5. Interestingly, 8 out of the 10 most frequently occurring scaffolds of TCMCD cannot be found in any of the other 11 libraries. They have many non-aromatic rings with less nitrogen and more oxygen, and are quite different from the scaffolds found in the other libraries. By contrast, commercial libraries (except Maybridge) possess many common frequently occurring scaffolds with frequencies higher than 5.Open image in new windowFig. 7The Level 1 scaffolds with frequencies ≥2 found in the 10 most frequently occurring scaffolds (1–25) and the 10 scaffolds of the cluster centers in the top 10 clusters (26–27)In Additional file 2: Fig. S3, these scaffolds acting as the cluster centers in Tree Maps are obviously more dissimilar between each other. As shown in Fig. 7, there are only 2 scaffolds (26 and 27) with frequencies ≥2, which can be found in ChemBridge and LifeChemicals, and ChemDive and Maybridge, respectively. It seems that the scaffolds of these cluster centers serving as the representatives for clusters are more unique than the most frequent ones.SAR MapsIn the previous two sections, the structural features, distributions and scaffold diversity of 12 libraries have been analyzed, but the relationships among the scaffolds present in clusters for different libraries have not been explored. Then, the chemical space of the molecules identified by the substructure search of the representative scaffolds, which are the cluster centers from the Tree Maps for the 12 subsets, was characterized by the SAR Maps methodology. Besides, high interests in diverse scaffolds that preferentially interact with important target families are also taken into consideration [38]. The underlying pharmacological characteristics of some representative scaffolds which are important components of drug candidates against different drug targets are also predicted.As shown in Fig. 8a, each point represents a molecule, and therefore, there are 4816 molecules in the SAR map. As mentioned above, points around the edge of the map are the molecules whose dissimilarities measured by the Tanimoto distance with all the others are higher than 0.3. Two molecules will be placed far apart if their similarity is < 0.3. It should be noted that it is meaningless to compare the molecules at the hedge with any other points in the map in terms of the actual similarity. More compounds on the hedge, such as Enamine (green circles), LifeChemicals (gray circles) and Mcule (the smallest blue circles), may imply higher structural diversity of the representative molecules. It is more meaningful to examine the points that are put together as the typical molecules in these libraries.Open image in new windowFig. 8a The panoramic SAR Map of the Level 1 scaffolds for the 12 datasets. The numbers of molecules for 1 ChemBridge (260), 2 ChemDiv (47), 3 ChemicalBlock (562), 4 Enamine (328), 5 LifeChemicals (900), 6 Maybridge (513), 7 Mcule (518), 8 Specs (106), 9 TCMCD (1268), 10 UORSY (62), 11 VitasM (140) and 12 ZelinskyInstitute (112); b the center part of the SAR Map, and some selected groups of the representative molecules (39 in total) are highlighted by the black dotted linesTherefore, to focus on the gathered molecules, the original SAR Map is magnified and shown in Fig. 8b. Compounds in the same library are represented by the points with the same color, size and shape. As shown in Fig. 8b, most of the biggest blue circles in TCMCD lie on the left of the map, and vast of the pink circles of ChemicalBlock on the upper right. Similarly, most light blue circles of Maybridge are at the bottom. As for the other libraries, such as Mcule represented by the smallest blue circles, it distributes more sparsely with few dense parts. But Mcule has 518 representative molecules, roughly equal to that of Maybridge (513) on the map. More dispersive distribution of Mcule suggests that Mcule also owns a large number of diverse molecules. The gray ones of LifeChemicals also spread in a wide range, but some accumulate in certain separated areas. Thus, there must be some distinct molecules in each library as shown by the denser areas on the map. Then, 40 selected areas of representative molecules highlighted by the black dotted circles on the SAR Map were identified.To grasp the potential functions and structural properties of these selected representative molecules, similarity searching and the MCS searching were carried out. By searching BindingDB based on similarity, similar inhibitors of the representative molecules and the corresponding targets were obtained. Similar molecules in BindingDB could be found for 39 out of the 40 representative molecules, and the 39 corresponding MCSs are shown in Additional file 2: Fig. S4 and the potential targets are listed in Additional file 2: Table S1. We found that many identified potential targets were kinases and GPCRs with high similarity thresholds, such as Pyruvate kinase for ChemDiv, streptokinase A precursor for ChemicalBlock, Cyclin-Dependent kinase for LifeChemicals, Serine/threonine-protein kinase for Maybridge, hexokinase and Serine-protein kinase for TCMCD and Glycogen synthase kinase for LifeChemicals, Maybridge, Mcule, TCMCD, VitasM and ZelinskyInstitute. Moreover, GPCRs were also identified as the potential targets for the representative molecules found in ChemBridge, ChemicalBlock, Maybridge, TCMCD and VitasM. In particular, three groups of molecules in TCMCD have high similarity (up to 1) to the inhibitors of GPCRs but MCSs of the representative structures from these groups are not that similar. Besides, some ion channels, transporters, etc. can also be found as the potential targets. Our results suggest that these typical structures found by the SAR Maps can reveal some important structural and potential functional features for each dataset. Specifically, TCMCD, ChemicalBlock and Maybridge occupying unique area in chemical space, are of great potential to find drug candidates of those vital druggable targets, such as kinases and GPCRs.ConclusionsIn this study, based on seven different fragment representations, the structural features, scaffold diversity and chemical distributions of 12 libraries, including 11 commercially available compound libraries and TCMCD, were explored and compared. The analyses indicate that although Chembridge, ChemicalBlock, Mcule, TCMCD and VitasM are more structurally diverse than the other databases. TCMCD is actually not quite structurally diverse for simple molecules, but the most occurring Level 1 scaffolds of it has tremendous difference to those of the other libraries. Despite Chembridge, Mcule and VitasM are rich in different kinds of fragments, their representative molecules largely overlap with those of the other databases, suggesting that the unique compounds in these libraries may be not so high in fact. Structures in ChemicalBlock are really diverse and complicated enough for VS. As for LifeChemicals, it does not have a variety of fragments but has much dissimilar molecular structures. Some libraries such as Enamine and UORSY are not good choice for actual VS considering the structural complexity and diversity of the molecules. Besides, 40 groups of representative scaffolds were identified in these 12 databases through Tree Maps and SAR Maps, and some molecules with these representative scaffolds found in certain libraries may be potential inhibitors of kinases and GPCRs. We believe that our study may provide valuable information to select proper commercial libraries in practical VS.
Co-reporter:Fu Chen;Huiyong Sun;Hui Liu;Dan Li;Youyong Li
Physical Chemistry Chemical Physics 2017 vol. 19(Issue 15) pp:10163-10176
Publication Date(Web):2017/04/12
DOI:10.1039/C6CP08232G
High-throughput screening (HTS) is widely applied in many fields ranging from drug discovery to clinical diagnostics and toxicity assessment. Firefly luciferase is commonly used as a reporter to monitor the effect of chemical compounds on the activity of a specific target or pathway in HTS. However, the false positive rate of luciferase-based HTS is relatively high because many artifacts or promiscuous compounds that have direct interaction with the luciferase reporter enzyme are usually identified as active compounds (hits). Therefore, it is necessary to develop a rapid screening method to identify these compounds that can inhibit the luciferase activity directly. In this study, a virtual screening (VS) classification model called MIEC-GBDT (MIEC: Molecular Interaction Energy Components; GBDT: Gradient Boosting Decision Tree) was developed to distinguish luciferase inhibitors from non-inhibitors. The MIECs calculated by Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) free energy decomposition were used to energetically characterize the binding pattern of each small molecule at the active site of luciferase, and then the GBDT algorithm was employed to construct the classifiers based on MIECs. The predictions to the test set show that the optimized MIEC-GBDT model outperformed molecular docking and MM/GBSA rescoring. The best MIEC-GBDT model based on the MIECs with the energy terms of ΔGele, ΔGvdW, ΔGGB, and ΔGSA achieves the prediction accuracies of 87.2% and 90.3% for the inhibitors and non-inhibitors in the test sets, respectively. Moreover, the energetic analysis of the vital residues suggests that the energetic contributions of the vital residues to the binding of inhibitors are quite different from those to the binding of non-inhibitors. These results suggest that the MIEC-GBDT model is reliable and can be used as a powerful tool to identify potential interference compounds in luciferase-based HTS experiments.
Co-reporter:Mojie Duan, Na Liu, Wenfang Zhou, Dan Li, Minghui Yang, and Tingjun Hou
Journal of Chemical Theory and Computation 2016 Volume 12(Issue 9) pp:4611-4619
Publication Date(Web):August 25, 2016
DOI:10.1021/acs.jctc.6b00424
Androgen receptor (AR) plays important roles in the development of prostate cancer (PCa). The antagonistic drugs, which suppress the activity of AR, are widely used in the treatment of PCa. However, the molecular mechanism of antagonism about how ligands affect the structures of AR remains elusive. To better understand the conformational variability of ARs bound with agonists or antagonists, we performed long time unbiased molecular dynamics (MD) simulations and enhanced sampling simulations for the ligand binding domain of AR (AR-LBD) in complex with various ligands. Based on the simulation results, we proposed an allosteric pathway linking ligands and helix 12 (H12) of AR-LBD, which involves the interactions among the ligands and the residues W741, H874, and I899. The interaction pathway provides an atomistic explanation of how ligands affect the structure of AR-LBD. A repositioning of H12 was observed, but it is facilitated by the C-terminal of H12, instead of by the loop between helix 11 (H11) and H12. The bias-exchange metadynamics simulations further demonstrated the above observations. More importantly, the free energy profiles constructed by the enhanced sampling simulations revealed the transition process between the antagonistic form and agonistic form of AR-LBD. Our results would be helpful for the design of more efficient antagonists of AR to combat PCa.
Co-reporter:Huiyong Sun, Pengcheng Chen, Dan Li, Youyong Li, and Tingjun Hou
Journal of Chemical Theory and Computation 2016 Volume 12(Issue 2) pp:851-860
Publication Date(Web):January 14, 2016
DOI:10.1021/acs.jctc.5b00973
As one of the most successful anticancer drugs, crizotinib is found to be efficient in the suppression of MTH1, a new therapeutic target for RAS-dependent cancers. Deep analysis shows that stereospecificity is prevalent in the binding of crizotinib to MTH1, where the target is more preferred to bind with the (S)-enantiomer of crizotinib. Surprisingly, very similar binding modes were found for the two enantiomers (Huber et al. Nature 2014, 508, 222−227), which puzzled us to ask a question as to why such a subtle structural variation could lead to so large of a binding affinity difference. Thereafter, by using advanced all-atom molecular dynamics simulations, we characterized the free energy surfaces of the binding/unbinding processes of the (S) and (R)-crizotinib enantiomers to/from MTH1. Interestingly, we found that rather than the induced-fit process, which is prevalent in drug selectivity and specificity (Wilson et al. Science 2015, 347, 882−886), the directly binding process has dominated impact on the binding affinity difference of the enantiomers, implying a common mechanism of stereoselectivity of enantiomers.
Co-reporter:Xiaotian Kong, Huiyong Sun, Peichen Pan, Sheng Tian, Dan Li, Youyong Li and Tingjun Hou  
Physical Chemistry Chemical Physics 2016 vol. 18(Issue 3) pp:2034-2046
Publication Date(Web):02 Dec 2015
DOI:10.1039/C5CP05622E
Due to the high sequence identity of the binding pockets of cyclin-dependent kinases (CDKs), designing highly selective inhibitors towards a specific CDK member remains a big challenge. 4-(thiazol-5-yl)-2-(phenylamino) pyrimidine derivatives are effective inhibitors of CDKs, among which the most promising inhibitor 12u demonstrates high binding affinity to CDK9 and attenuated binding affinity to other homologous kinases, such as CDK2. In this study, in order to rationalize the principle of the binding preference towards CDK9 over CDK2 and to explore crucial information that may aid the design of selective CDK9 inhibitors, MM/GBSA calculations based on conventional molecular dynamics (MD) simulations and enhanced sampling simulations (umbrella sampling and steered MD simulations) were carried out on two representative derivatives (12u and 4). The calculation results show that the binding specificity of 12u to CDK9 is primarily controlled by conformational change of the G-loop and variation of the van der Waals interactions. Furthermore, the enhanced sampling simulations revealed the different reaction coordinates and transient interactions of inhibitors 12u and 4 as they dissociate from the binding pockets of CDK9 and CDK2. The physical principles obtained from this study may facilitate the discovery and rational design of novel and specific inhibitors of CDK9.
Co-reporter:Shuangquan Wang; Huiyong Sun; Hui Liu; Dan Li; Youyong Li
Molecular Pharmaceutics 2016 Volume 13(Issue 8) pp:2855-2866
Publication Date(Web):July 5, 2016
DOI:10.1021/acs.molpharmaceut.6b00471
Blockade of human ether-à-go-go related gene (hERG) channel by compounds may lead to drug-induced QT prolongation, arrhythmia, and Torsades de Pointes (TdP), and therefore reliable prediction of hERG liability in the early stages of drug design is quite important to reduce the risk of cardiotoxicity-related attritions in the later development stages. In this study, pharmacophore modeling and machine learning approaches were combined to construct classification models to distinguish hERG active from inactive compounds based on a diverse data set. First, an optimal ensemble of pharmacophore hypotheses that had good capability to differentiate hERG active from inactive compounds was identified by the recursive partitioning (RP) approach. Then, the naive Bayesian classification (NBC) and support vector machine (SVM) approaches were employed to construct classification models by integrating multiple important pharmacophore hypotheses. The integrated classification models showed improved predictive capability over any single pharmacophore hypothesis, suggesting that the broad binding polyspecificity of hERG can only be well characterized by multiple pharmacophores. The best SVM model achieved the prediction accuracies of 84.7% for the training set and 82.1% for the external test set. Notably, the accuracies for the hERG blockers and nonblockers in the test set reached 83.6% and 78.2%, respectively. Analysis of significant pharmacophores helps to understand the multimechanisms of action of hERG blockers. We believe that the combination of pharmacophore modeling and SVM is a powerful strategy to develop reliable theoretical models for the prediction of potential hERG liability.
Co-reporter:Fu Chen, Hui Liu, Huiyong Sun, Peichen Pan, Youyong Li, Dan Li and Tingjun Hou  
Physical Chemistry Chemical Physics 2016 vol. 18(Issue 32) pp:22129-22139
Publication Date(Web):14 Jul 2016
DOI:10.1039/C6CP03670H
Understanding protein–protein interactions (PPIs) is quite important to elucidate crucial biological processes and even design compounds that interfere with PPIs with pharmaceutical significance. Protein–protein docking can afford the atomic structural details of protein–protein complexes, but the accurate prediction of the three-dimensional structures for protein–protein systems is still notoriously difficult due in part to the lack of an ideal scoring function for protein–protein docking. Compared with most scoring functions used in protein–protein docking, the Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) and Molecular Mechanics/Poisson Boltzmann Surface Area (MM/PBSA) methodologies are more theoretically rigorous, but their overall performance for the predictions of binding affinities and binding poses for protein–protein systems has not been systematically evaluated. In this study, we first evaluated the performance of MM/PBSA and MM/GBSA to predict the binding affinities for 46 protein–protein complexes. On the whole, different force fields, solvation models, and interior dielectric constants have obvious impacts on the prediction accuracy of MM/GBSA and MM/PBSA. The MM/GBSA calculations based on the ff02 force field, the GB model developed by Onufriev et al. and a low interior dielectric constant (εin = 1) yield the best correlation between the predicted binding affinities and the experimental data (rp = −0.647), which is better than MM/PBSA (rp = −0.523) and a number of empirical scoring functions used in protein–protein docking (rp = −0.141 to −0.529). Then, we examined the capability of MM/GBSA to identify the possible near-native binding structures from the decoys generated by ZDOCK for 43 protein–protein systems. The results illustrate that the MM/GBSA rescoring has better capability to distinguish the correct binding structures from the decoys than the ZDOCK scoring. Besides, the optimal interior dielectric constant of MM/GBSA for re-ranking docking poses may be determined by analyzing the characteristics of protein–protein binding interfaces. Considering the relatively high prediction accuracy and low computational cost, MM/GBSA may be a good choice for predicting the binding affinities and identifying correct binding structures for protein–protein systems.
Co-reporter:Zhe Wang, Huiyong Sun, Xiaojun Yao, Dan Li, Lei Xu, Youyong Li, Sheng Tian and Tingjun Hou  
Physical Chemistry Chemical Physics 2016 vol. 18(Issue 18) pp:12964-12975
Publication Date(Web):07 Apr 2016
DOI:10.1039/C6CP01555G
As one of the most popular computational approaches in modern structure-based drug design, molecular docking can be used not only to identify the correct conformation of a ligand within the target binding pocket but also to estimate the strength of the interaction between a target and a ligand. Nowadays, as a variety of docking programs are available for the scientific community, a comprehensive understanding of the advantages and limitations of each docking program is fundamentally important to conduct more reasonable docking studies and docking-based virtual screening. In the present study, based on an extensive dataset of 2002 protein–ligand complexes from the PDBbind database (version 2014), the performance of ten docking programs, including five commercial programs (LigandFit, Glide, GOLD, MOE Dock, and Surflex-Dock) and five academic programs (AutoDock, AutoDock Vina, LeDock, rDock, and UCSF DOCK), was systematically evaluated by examining the accuracies of binding pose prediction (sampling power) and binding affinity estimation (scoring power). Our results showed that GOLD and LeDock had the best sampling power (GOLD: 59.8% accuracy for the top scored poses; LeDock: 80.8% accuracy for the best poses) and AutoDock Vina had the best scoring power (rp/rs of 0.564/0.580 and 0.569/0.584 for the top scored poses and best poses), suggesting that the commercial programs did not show the expected better performance than the academic ones. Overall, the ligand binding poses could be identified in most cases by the evaluated docking programs but the ranks of the binding affinities for the entire dataset could not be well predicted by most docking programs. However, for some types of protein families, relatively high linear correlations between docking scores and experimental binding affinities could be achieved. To our knowledge, this study has been the most extensive evaluation of popular molecular docking programs in the last five years. It is expected that our work can offer useful information for the successful application of these docking tools to different requirements and targets.
Co-reporter:Lei Xu, Dan Li, Li Tao, Yanling Yang, Youyong Li and Tingjun Hou  
Molecular BioSystems 2016 vol. 12(Issue 2) pp:379-390
Publication Date(Web):08 Dec 2015
DOI:10.1039/C5MB00781J
L-type Ca2+ channels (LTCCs), the heteromultimeric proteins, are associated with electrical signaling and provide the key link between electrical signals and non-electrical processes. 1,4-Dihydropyridine (DHP) derivatives are a major class of blockers for LTCCs, and have experienced widespread use in the treatment of cardiovascular diseases. However, the precise knowledge of the binding mechanism of these ligands to LTCCs at the atomic level has remained unknown because of the unavailability of the crystal structures of LTCCs. In this study, homology modeling, molecular docking, molecular dynamics (MD) simulations, free energy calculations and decomposition were employed to explore the structural requirement of the binding of DHP derivatives to human Cav1.2, a member of LTCCs. The binding conformations of the DHPs in the active site of Cav1.2 were predicted, and the rank of the binding free energies of Cav1.2/DHPs is generally consistent with the experimental data. The structural analysis shows that most studied ligands fit into a hydrophobic pocket formed by Phe1129, Ile1173, Phe1176, Met1177 and Met1509, and form aryl–aryl interaction with Phe1129 or Tyr1508. The consistency between the predictions and experimental data suggest that the developed model is reliable and can be used as a valuable platform for the structure-based design of new potent ligands of Cav1.2.
Co-reporter:Yu Zhang; Lei Xu; Zhiqiang Zhang; Zhiyu Zhang; Longtai Zheng; Dan Li; Youyong Li; Feng Liu; Kunqian Yu; Tingjun Hou;Xuechu Zhen
Journal of Chemical Information and Modeling 2015 Volume 55(Issue 9) pp:1994-2004
Publication Date(Web):August 19, 2015
DOI:10.1021/acs.jcim.5b00445
Macrophage migration inhibitory factor (MIF), a proinflammatory cytokine, is an attractive therapeutic target for the treatment of inflammatory diseases. In our previous study, 3-[(biphenyl-4-ylcarbonyl)carbamothioyl]amino benzoic acid (compound 1) was discovered as a potent inhibitor of MIF by docking-based virtual screening and bioassays. Here, a series of analogues of compound 1 derived from similarity search and chemical synthesis were evaluated for their MIF tautomerase activities, and their structure–activity relationships were then analyzed. The most potent inhibitor (compound 5) with an IC50 of 370 nM strongly suppressed lipopolysaccharide (LPS)-induced production of TNF-α and IL-6 in a dose-dependent manner and significantly enhanced the survival rate of mice with LPS-induced endotoxic shock from 0 to 35% at 0.5 mg/kg and to 45% at 1 mg/kg, highlighting the therapeutic potential of the MIF tautomerase inhibition in inflammatory diseases.
Co-reporter:Nan Li; Richard I. Ainsworth; Bo Ding; Tingjun Hou;Wei Wang
Journal of Chemical Information and Modeling 2015 Volume 55(Issue 7) pp:1400-1412
Publication Date(Web):May 20, 2015
DOI:10.1021/acs.jcim.5b00056
Human immunodeficiency virus (HIV) protease inhibitors (PIs) are important components of highly active anti-retroviral therapy (HAART) that block the catalytic site of HIV protease, thus preventing maturation of the HIV virion. However, with two decades of PI prescriptions in clinical practice, drug-resistant HIV mutants have now been found for all of the PI drugs. Therefore, the continuous development of new PI drugs is crucial both to combat the existing drug-resistant HIV strains and to provide treatments for future patients. Here we purpose an HIV PI drug design strategy to select candidate PIs with binding energy distributions dominated by interactions with conserved protease residues in both wild-type and various drug-resistant mutants. On the basis of this strategy, we have constructed a virtual screening pipeline including combinatorial library construction, combinatorial docking, MM/GBSA-based rescoring, and reranking on the basis of the binding energy distribution. We have tested our strategy on lopinavir by modifying its two functional groups. From an initial 751 689 candidate molecules, 18 candidate inhibitors were selected using the pipeline for experimental validation. IC50 measurements and drug resistance predictions successfully identified two ligands with both HIV protease inhibitor activity and an improved drug resistance profile on 2382 HIV mutants. This study provides a proof of concept for the integration of MM/GBSA energy analysis and drug resistance information at the stage of virtual screening and sheds light on future HIV drug design and the use of virtual screening to combat drug resistance.
Co-reporter:Lei Xu, Youyong Li, Dan Li, Peng Xu, Sheng Tian, Huiyong Sun, Hui Liu and Tingjun Hou  
Physical Chemistry Chemical Physics 2015 vol. 17(Issue 5) pp:3370-3382
Publication Date(Web):11 Dec 2014
DOI:10.1039/C4CP05095A
Macrophage migration inhibitory factor (MIF) is a multi-functional protein that acts as a cytokine and as an enzyme. Recently, MIF was identified as a non-canonical ligand of G protein-coupled chemokine receptor CXCR2 with low nanomolar affinity in leukocyte arrest and chemotaxis, but the precise knowledge of the molecular determinants of the MIF–CXCR2 interface has remained unknown. Therefore, we employed homology modeling, protein–protein docking, molecular dynamics (MD) simulations, Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) binding free energy calculations and MM/GBSA binding free energy decomposition to obtain insights into the molecular recognition of MIF with CXCR2. The predicted binding pattern of MIF–CXCR2 is in good agreement with the experimental data and sheds light on the functional role of important MIF–CXCR2 interface residues in association with binding and signaling. According to our predictions, the R11A/D44A double mutations of MIF exhibit a pronounced defect in the binding affinity of MIF to CXCR2, resulting in large conformational changes. The potential two-site binding model for the MIF–CXCR2 recognition was proposed: initialized primarily by the non-polar interactions including the van der Waals and hydrophobic interactions, the N-terminal region of CXCR2 contacts the N-like loop and β-sheet of MIF (site 1), and then the ECL2 and ECL3 regions of CXCR2 form strong interactions with the pseudo-(E)LR motif and C-terminus of MIF, which induces the molecular thermodynamic motion of TMs for signal transduction (site 2). This study will extend our understanding to the binding mechanisms of MIF to CXCR2 and provide useful information for the rational design of potent inhibitors selectively targeting the MIF–CXCR2 interactions.
Co-reporter:Peichen Pan, Sheng Tian, Huiyong Sun, Xiaotian Kong, Wenfang Zhou, Dan Li, Youyong Li, and Tingjun Hou
Journal of Chemical Information and Modeling 2015 Volume 55(Issue 12) pp:2693-2704
Publication Date(Web):November 30, 2015
DOI:10.1021/acs.jcim.5b00576
Angiopoietin (ANG) ligands and their downstream TIE receptors have been validated as the second vascular signaling system involving vessel remodeling and maturation. Among them, the ANG/TIE-2 signaling pathway is involved in numerous life-threatening diseases and has become an attractive potential therapeutic target. Several large-molecule inhibitors targeting the ANG/TIE-2 axis have recently entered clinical phase for the therapy of various solid tumors, but selective small-molecule inhibitors of TIE-2 are still quite limited. In the present work, structure-based virtual screening was performed to search for type-I inhibitors of TIE-2. Of the only 41 compounds selected by our strategy, 8 molecules with the concentration of 25 μg/mL exhibit over 50% inhibitory rate against TIE-2 in in vitro enzymatic activity assay, and the IC50 values of 2 hits are lower than 1 μM. Further optimization and SAR analysis based on compound TP-S1-30 and 31 were carried out by using substructure searching strategy, leading to the discovery of several sub-100 nM inhibitors. Among them, the most potent compound, TP-S1-68, showed an inhibitory IC50 of 0.149 μM. These novel inhibitors of TIE-2 discovered in this study and the analogs of the active core scaffolds can serve as the starting points for further drug development.
Co-reporter:Xiaotian Kong, Peichen Pan, Dan Li, Sheng Tian, Youyong Li and Tingjun Hou  
Physical Chemistry Chemical Physics 2015 vol. 17(Issue 8) pp:6098-6113
Publication Date(Web):27 Jan 2015
DOI:10.1039/C4CP05440G
Anaplastic lymphoma kinase (ALK) has gained increased attention as an attractive therapeutic target for the treatment of various cancers, especially non-small-cell lung cancer (NSCLC). Recently, piperidine carboxamides were reported as Type I1/2 inhibitors of ALK, which occupy both the ATP binding site and the back ATP hydrophobic cavity in DFG-in conformation. Due to the dynamic behavior of ALK in the binding of Type I1/2 inhibitors, the accurate predictions of the binding structures and relative binding potencies of these inhibitors are quite challenging. In this study, different modeling techniques, including molecular docking, ensemble docking based on multiple receptor conformations, molecular dynamics simulations and free energy calculations, were utilized to explore the binding mechanisms of piperidine carboxamides. Our predictions show that the conventional docking protocols are not sufficient to predict the relative binding potencies of the studied inhibitors with high accuracy, but incorporating protein flexibility before or after docking is quite effective to improve the prediction accuracy. Notably, the binding free energies predicted by MM/GBSA or MM/PBSA based on the MD simulations for the docked poses give the highest correlation with the experimental data, highlighting the importance of the inclusion of receptor flexibility for the accurate predictions of the binding potencies for Type I1/2 inhibitors of ALK. Furthermore, the comprehensive analysis of several pairs of representative inhibitors demonstrates the importance of hydrophobic interactions in improving the binding affinities of the inhibitors with the hot-spot residues surrounding the binding pocket. This work is expected to provide valuable clues for further rational design of novel and potent Type I1/2 ALK inhibitors.
Co-reporter:Yan Guan, Huiyong Sun, Peichen Pan, Youyong Li, Dan Li and Tingjun Hou  
Molecular BioSystems 2015 vol. 11(Issue 9) pp:2568-2578
Publication Date(Web):21 Jul 2015
DOI:10.1039/C5MB00394F
Mutations at a number of key positions (Ala156, Asp168 and Arg155) of the HCV NS3/4A protease can induce medium to high resistance to MK5172. The emergence of the resistant mutations seriously compromises the antiviral therapy efficacy to hepatitis C. In this study, molecular dynamics (MD) simulations, free energy calculations and free energy decomposition were used to explore the interaction profiles of MK5172 with the wild-type (WT) and four mutated (R155K, D168A, D168V and A156T) HCV NS3/4A proteases. The binding free energies predicted by Molecular Mechanics/Generalized Born Solvent Area (MM/GBSA) are consistent with the trend of the experimental drug resistance data. The free energy decomposition analysis shows that the resistant mutants may change the protein–MK5172 interaction profiles, resulting in the unbalanced energetic distribution amongst the catalytic triad, the strand 135–139 and the strand 154–160. Moreover, umbrella sampling (US) simulations were employed to elucidate the unbinding processes of MK5172 from the active pockets of the WT HCV NS3/4A protease and the four mutants. The US simulations demonstrate that the dissociation pathways of MK5172 escaping from the binding pockets of the WT and mutants are quite different, and it is quite possible that MK5172 will be harder to get access to the correct binding sites of the mutated proteases, thereafter leading to drug resistance.
Co-reporter:Huali Shi, Sheng Tian, Youyong Li, Dan Li, Huidong Yu, Xuechu Zhen, and Tingjun Hou
Chemical Research in Toxicology 2015 Volume 28(Issue 1) pp:116
Publication Date(Web):December 12, 2014
DOI:10.1021/tx500389q
The activation of pregnane X receptor (PXR), a member of the nuclear receptor (NR) superfamily, can mediate potential drug–drug interactions, and therefore, prediction of PXR activation is of great importance for evaluating drug metabolism and toxicity. In this study, based on 532 structurally diverse compounds, we present a comprehensive analysis with the aim to build accurate classification models for distinguishing PXR activators from nonactivators by using a naive Bayesian classification technique. First, the distributions of eight important molecular physicochemical properties of PXR activators versus nonactivators were compared, illustrating that the hydrophobicity-related molecular descriptors (AlogP and log D) show slightly better capability to discriminate PXR activators from nonactivators than the others. Then, based on molecular physicochemical properties, VolSurf descriptors, and molecular fingerprints, naive Bayesian classifiers were developed to separate PXR activators from nonactivators. The results demonstrate that the introduction of molecular fingerprints is quite essential to enhance the prediction accuracy of the classifiers. The best Bayesian classifier based on the 21 physicochemical properties, VolSurf descriptors, and LCFC_10 fingerprints descriptors yields a prediction accuracy of 92.7% for the training set based on leave-one-out (LOO) cross-validation and of 85.2% for the test set. Moreover, by exploring the important structural fragments derived from the best Bayesian classifier, we observed that flexibility is an important structural pattern for PXR activation. In addition, chemical compounds containing more halogen atoms, unsaturated alkanes chains relevant to π–π stacking, and fewer nitrogen atoms tend to be PXR activators. We believe that the naive Bayesian classifier can be used as a reliable virtual screening tool to predict PXR activation in the drug design and discovery pipeline.
Co-reporter:Lei Xu ; Yu Zhang ; Longtai Zheng ; Chunhua Qiao ; Youyong Li ; Dan Li ; Xuechu Zhen
Journal of Medicinal Chemistry 2014 Volume 57(Issue 9) pp:3737-3745
Publication Date(Web):April 9, 2014
DOI:10.1021/jm401908w
Macrophage migration inhibitory factor (MIF) is involved in regulation of both the innate and the adaptive immune responses and is regarded as an attractive anti-inflammatory pharmacological target. In this study, molecular docking-based virtual screening and in vitro bioassays were utilized to identify novel small-molecule inhibitors of MIF. The in vitro enzyme-based assay identified that ten chemically diverse compounds exhibited potent inhibitory activity against MIF in the micromolar regime, including three compounds with IC50 values below 10 μM and one with an IC50 value below 1 μM (0.55 μM); the latter is 26-fold more potent than the reference compound ISO-1. The structural analysis demonstrates that most of these active compounds possess novel structural scaffolds. Further in vitro cell-based glucocorticoid overriding, chemotaxis, and Western blotting assays revealed that the three compounds can effectively inhibit the biological functions of MIF in vitro, suggesting that these compounds could be potential agents for treating inflammatory diseases.
Co-reporter:Sheng Tian, Huiyong Sun, Peichen Pan, Dan Li, Xuechu Zhen, Youyong Li, and Tingjun Hou
Journal of Chemical Information and Modeling 2014 Volume 54(Issue 10) pp:2664-2679
Publication Date(Web):September 18, 2014
DOI:10.1021/ci500414b
In this study, to accommodate receptor flexibility, based on multiple receptor conformations, a novel ensemble docking protocol was developed by using the naïve Bayesian classification technique, and it was evaluated in terms of the prediction accuracy of docking-based virtual screening (VS) of three important targets in the kinase family: ALK, CDK2, and VEGFR2. First, for each target, the representative crystal structures were selected by structural clustering, and the capability of molecular docking based on each representative structure to discriminate inhibitors from non-inhibitors was examined. Then, for each target, 50 ns molecular dynamics (MD) simulations were carried out to generate an ensemble of the conformations, and multiple representative structures/snapshots were extracted from each MD trajectory by structural clustering. On average, the representative crystal structures outperform the representative structures extracted from MD simulations in terms of the capabilities to separate inhibitors from non-inhibitors. Finally, by using the naïve Bayesian classification technique, an integrated VS strategy was developed to combine the prediction results of molecular docking based on different representative conformations chosen from crystal structures and MD trajectories. It was encouraging to observe that the integrated VS strategy yields better performance than the docking-based VS based on any single rigid conformation. This novel protocol may provide an improvement over existing strategies to search for more diverse and promising active compounds for a target of interest.
Co-reporter:Huiyong Sun, Youyong Li, Mingyun Shen, Sheng Tian, Lei Xu, Peichen Pan, Yan Guan and Tingjun Hou  
Physical Chemistry Chemical Physics 2014 vol. 16(Issue 40) pp:22035-22045
Publication Date(Web):18 Aug 2014
DOI:10.1039/C4CP03179B
With the rapid development of computational techniques and hardware, more rigorous and precise theoretical models have been used to predict the binding affinities of a large number of small molecules to biomolecules. By employing continuum solvation models, the MM/GBSA and MM/PBSA methodologies achieve a good balance between low computational cost and reasonable prediction accuracy. In this study, we have thoroughly investigated the effects of interior dielectric constant, molecular dynamics (MD) simulations, and the number of top-scored docking poses on the performance of the MM/GBSA and MM/PBSA rescoring of docking poses for three tyrosine kinases, including ABL, ALK, and BRAF. Overall, the MM/PBSA and MM/GBSA rescoring achieved comparative accuracies based on a relatively higher solute (or interior) dielectric constant (i.e. ε = 2, or 4), and could markedly improve the ‘screening power’ and ‘ranking power’ given by Autodock. Moreover, with a relatively higher solute dielectric constant, the MM/PBSA or MM/GBSA rescoring based on the best scored docking poses and the multiple top-scored docking poses gave similar predictions, implying that much computational cost can be saved by considering the best scored docking poses only. Besides, compared with the rescoring based on the minimized structures, the rescoring based on the MD simulations might not be completely necessary due to its negligible impact on the docking performance. Considering the much higher computational demand of MM/PBSA, MM/GBSA with a high solute dielectric constant (ε = 2 or 4) is recommended for the virtual screening of tyrosine kinases.
Co-reporter:Huiyong Sun, Youyong Li, Sheng Tian, Lei Xu and Tingjun Hou  
Physical Chemistry Chemical Physics 2014 vol. 16(Issue 31) pp:16719-16729
Publication Date(Web):28 May 2014
DOI:10.1039/C4CP01388C
By using different evaluation strategies, we systemically evaluated the performance of Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) and Molecular Mechanics/Poisson–Boltzmann Surface Area (MM/PBSA) methodologies based on more than 1800 protein–ligand crystal structures in the PDBbind database. The results can be summarized as follows: (1) for the one-protein-family/one-binding-ligand case which represents the unbiased protein–ligand complex sampling, both MM/GBSA and MM/PBSA methodologies achieve approximately equal accuracies at the interior dielectric constant of 4 (with rp = 0.408 ± 0.006 of MM/GBSA and rp = 0.388 ± 0.006 of MM/PBSA based on the minimized structures); while for the total dataset (1864 crystal structures), the overall best Pearson correlation coefficient (rp = 0.579 ± 0.002) based on MM/GBSA is better than that of MM/PBSA (rp = 0.491 ± 0.003), indicating that biased sampling may significantly affect the accuracy of the predicted result (some protein families contain too many instances and can bias the overall predicted accuracy). Therefore, family based classification is needed to evaluate the two methodologies; (2) the prediction accuracies of MM/GBSA and MM/PBSA for different protein families are quite different with rp ranging from 0 to 0.9, whereas the correlation and ranking scores (an averaged rp/rs over a list of protein folds and also representing the unbiased sampling) given by MM/PBSA (rp-score = 0.506 ± 0.050 and rs-score = 0.481 ± 0.052) are comparable to those given by MM/GBSA (rp-score = 0.516 ± 0.047 and rs-score = 0.463 ± 0.047) at the fold family level; (3) for the overall prediction accuracies, molecular dynamics (MD) simulation may not be quite necessary for MM/GBSA (rp-minimized = 0.579 ± 0.002 and rp-1ns = 0.564 ± 0.002), but is needed for MM/PBSA (rp-minimized = 0.412 ± 0.003 and rp-1ns = 0.491 ± 0.003). However, for the individual systems, whether to use MD simulation is depended. (4) both MM/GBSA and MM/PBSA may be unable to give successful predictions for the ligands with high formal charges, with the Pearson correlation coefficient ranging from 0.621 ± 0.003 (neutral ligands) to 0.125 ± 0.142 (ligands with a formal charge of 5). Therefore, it can be summarized that, although MM/GBSA and MM/PBSA perform similarly in the unbiased dataset, for the currently available crystal structures in the PDBbind database, compared with MM/GBSA, which may be used in multi-target comparisons, MM/PBSA is more sensitive to the investigated systems, and may be more suitable for individual-target-level binding free energy ranking. This study may provide useful guidance for the post-processing of docking based studies.
Co-reporter:Jingyu Zhu, Peichen Pan, Youyong Li, Man Wang, Dan Li, Biyin Cao, Xinliang Mao and Tingjun Hou  
Molecular BioSystems 2014 vol. 10(Issue 3) pp:454-466
Publication Date(Web):25 Nov 2013
DOI:10.1039/C3MB70314B
Phosphoinositide 3-kinase (PI3K) is known to be closely related to tumorigenesis and cell proliferation, and controls a variety of cellular processes, including proliferation, growth, apoptosis, migration, metabolism, etc. The PI3K family comprises eight catalytic isoforms, which are subdivided into three classes. Recently, the discovery of inhibitors that block a single isoform of PI3K has continued to attract special attention because they may have higher selectivity for certain tumors and less toxicity for healthy cells. The PI3Kβ and PI3Kδ share fewer studies than α/γ, and therefore, in this work, the combination of molecular dynamics simulations and free energy calculations was employed to explore the binding of three isoform-specific PI3K inhibitors (COM8, IC87114, and GDC-0941) to PI3Kβ or PI3Kδ. The isoform specificities of the studied inhibitors derived from the predicted binding free energies are in good agreement with the experimental data. In addition, the key residues critical for PI3Kβ or PI3Kδ selectivity were highlighted by decomposing the binding free energies into the contributions from individual residues. It was observed that although PI3Kβ and PI3Kδ share the conserved ATP-binding pockets, individual residues do behave differently, particularly the residues critical for PI3Kβ or PI3Kδ selectivity. It can be concluded that the inhibitor specificity between PI3Kβ and PI3Kδ is determined by the additive contributions from multiple residues, not just a single one. This study provides valuable information for understanding the isoform-specific binding mechanisms of PI3K inhibitors, and should be useful for the rational design of novel and selective PI3K inhibitors.
Co-reporter:Xuwen Wang, Peichen Pan, Youyong Li, Dan Li and Tingjun Hou  
Molecular BioSystems 2014 vol. 10(Issue 5) pp:1196-1210
Publication Date(Web):07 Feb 2014
DOI:10.1039/C4MB00013G
Protein kinase CK2, also known as casein kinase II, is related to various cellular events and is a potential target for numerous cancers. In this study, we attempted to gain more insight into the inhibition process of CK2 by a series of CX-4945 derivatives through an integrated computational study that combines molecular docking, molecular dynamics (MD) simulations, and binding free energy calculations. Based on the binding poses predicted by molecular docking, the MD simulations were performed to explore the dynamic binding processes for ten selected inhibitors. Then, both Molecular Mechanics/Poisson Boltzmann Surface Area (MM/PBSA) and Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) techniques were employed to predict the binding affinities of the studied systems. The predicted binding energies of the selected inhibitors correlate well with their experimental activities (r2 = 0.78). The van der Waals term is the most favorable component for the total energies. The free energy decomposition on a per residue basis reveals that the residue K68 is essential for the electrostatic interactions between CK2 and the studied inhibitors and numerous residues, including L45, V53, V66, F113, M163 and I174, play critical roles in forming van der Waals interactions with the inhibitors. Finally, a number of new derivatives were designed and the binding affinity and the predicted binding free energies of each designed molecule were obtained on the basis of molecular docking and MM/PBSA. It is expected that our research will benefit the future rational design of novel and potent inhibitors of CK2.
Co-reporter:Yan Zhang, Mingyun Shen, Sunliang Cui, Tingjun Hou
Bioorganic & Medicinal Chemistry Letters 2014 Volume 24(Issue 23) pp:5470-5472
Publication Date(Web):1 December 2014
DOI:10.1016/j.bmcl.2014.10.009
A simple synthesis of 2-hydroxylated (E)-stilbenes was accomplished in good yields via oxidative coupling of 2-hydroxystyrenes and arylboronic acids, with Rh(III)-catalyst and Cu(OAc)2 as oxidant. The antiproliferative evaluation of all the synthesized compounds were assessed on four different human cancer cell lines (Colo-205, MDA-468, HT29, and MGC80-3), and the results showed that several compounds exhibit strong antiproliferative activities (up to IC50 = 35 nM for MGC80-3).A simple synthesis of 2-hydroxylated (E)-stilbenes was accomplished in good yields via oxidative coupling of 2-hydroxystyrenes and arylboronic acids, with Rh(III)-catalyst and Cu(OAc)2 as oxidant. The antiproliferative evaluation of all the synthesized compounds were assessed on four different human cancer cell lines (Colo-205, MDA-468, HT29, and MGC80-3), and the results showed that several compounds exhibit strong antiproliferative activities (up to IC50 = 35 nM for MGC80-3.
Co-reporter:Lei Xu, Youyong Li, Huiyong Sun, Dan Li and Tingjun Hou  
Molecular BioSystems 2013 vol. 9(Issue 8) pp:2107-2117
Publication Date(Web):24 May 2013
DOI:10.1039/C3MB70120D
The G protein-coupled chemokine receptor CXCR4 is implicated in a variety of physiological responses that also share several downstream effectors involved in multiple pathological processes. The interaction between CXCR4 and its natural ligand CXCL12/stromal-derived factor-1 (SDF-1) plays important roles in cancer metastasis, HIV-1 infection, and inflammatory diseases. Therefore, investigating the CXCR4–CXCL12 interaction is critical for understanding the molecular mechanisms of the modulation of chemokine-receptor functions and designing new pharmaceutical agents to target the CXCR4–CXCL12 pathway. Based on known experimental data, the interaction between CXCR4 and CXCL12 was predicted by an integrated protocol, which combines protein–protein docking, molecular dynamics (MD) simulations, Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) binding free energy calculations, and MM/GBSA binding free energy decomposition analysis. The predicted CXCR4–CXCL12 binding pattern is in good agreement with the experimental data. Analysis of the binding structure reveals an obvious electrostatic complementarity between CXCR4 and CXCL12. Moreover, significant conformational rearrangements were observed during the 50 ns MD simulations. In particular, the basic Lys1 at the CXCL12 N-terminus, an essential residue in receptor activation, forms a strong polar interaction with the Glu32 in the CXCR4 extracellular region. It facilitates the significant movement of TM5 and TM6 in the conformational transition, which is coupled to the association with the intracellular signal transduction pathways via heterotrimer G-protein. Based on the dynamic and energetic analyses, a two-site binding model was proposed. We believe that our study provides useful information for understanding the mechanisms of CXCR4 ligand binding and structure-based drug design of CXCR4.
Co-reporter:Mingyun Shen, Shunye Zhou, Youyong Li, Dan Li and Tingjun Hou  
Molecular BioSystems 2013 vol. 9(Issue 10) pp:2435-2446
Publication Date(Web):27 Jun 2013
DOI:10.1039/C3MB70168A
LIM kinases (LIMKs), downstream of Rho-associated protein kinases (ROCKs) and p21-activated protein kinases (PAKs), are shown to be promising targets for the treatment of cancers. In this study, the inhibition mechanism of 41 pyrrolopyrimidine derivatives as LIMK2 inhibitors was explored through a series of theoretical approaches. First, a model of LIMK2 was generated through molecular homology modeling, and the studied inhibitors were docked into the binding active site of LIMK2 by the docking protocol, taking into consideration the flexibility of the protein. The binding poses predicted by molecular docking for 17 selected inhibitors with different bioactivities complexed with LIMK2 underwent molecular dynamics (MD) simulations, and the binding free energies for the complexes were predicted by using the molecular mechanics/generalized born surface area (MM/GBSA) method. The predicted binding free energies correlated well with the experimental bioactivities (r2 = 0.63 or 0.62). Next, the free energy decomposition analysis was utilized to highlight the following key structural features related to biological activity: (1) the important H-bond between Ile408 and pyrrolopyrimidine, (2) the H-bonds between the inhibitors and Asp469 and Gly471 which maintain the stability of the DFG-out conformation, and (3) the hydrophobic interactions between the inhibitors and several key residues (Leu337, Phe342, Ala345, Val358, Lys360, Leu389, Ile408, Leu458 and Leu472). Finally, a variety of LIMK2 inhibitors with a pyrrolopyrimidine scaffold were designed, some of which showed improved potency according to the predictions. Our studies suggest that the use of molecular docking with MD simulations and free energy calculations could be a powerful tool for understanding the binding mechanism of LIMK2 inhibitors and for the design of more potent LIMK2 inhibitors.
Co-reporter:Peichen Pan, Lin Li, Youyong Li, Dan Li, Tingjun Hou
Antiviral Research (November 2013) Volume 100(Issue 2) pp:
Publication Date(Web):1 November 2013
DOI:10.1016/j.antiviral.2013.09.006
•The ordering of the level of drug resistance was successfully predicted by MM/GBSA for the studied drugs.•The unfavorable shifts of the polar interactions between NA and the drugs were found to be primarily responsible for drug resistance.•The resistance mechanisms caused by the E119G mutation for the studied drugs were elucidated.Neuraminidase inhibitors (NAIs) play vital roles in controlling human influenza epidemics and pandemics. However, the emergence of new human influenza virus mutant strains resistant to existing antiviral drugs has been becoming a major challenge. Therefore, it is critical to uncover the mechanisms of drug resistance and seek alternative treatments to combat drug resistance. In this study, molecular dynamics (MD) simulations and Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) were applied to investigate the different sensitivities of oseltamivir (OTV), zanamivir (ZNV), and peramivir (PRV) against the E119G mutant of 2009 A/H1N1 neuraminidase. The predicted binding free energies indicate that the E119G mutation in NA confers resistance to all of the three studied inhibitors. The ordering of the level of drug resistance predicted by the binding free energies for the three inhibitors is ZNV > PRV > OTV, which agrees well with the experimental data. Drug resistance arises primarily from the unfavorable shifts of the polar interactions between NA and the inhibitors. It comes as a surprise that the mutation of Glu119 that can form strong H-bonds with the inhibitors in the wild-type protein does not have direct impact on the binding affinities of both OTV and PRV due to the regulation of the strong unfavorable polar desolvation energies. The indirectly conformational variations of the inhibitors, which caused by the E119G mutation, are responsible for the loss of the binding free energies. However, for ZNV, the E119G mutation has both direct and indirect influences on the drug binding. The structural and quantitative viewpoint obtained from this study provides valuable information for the rational design of novel and effective drugs to combat drug resistance.
Co-reporter:Hui-Yong Sun, Ting-Jun Hou, Hong-Yu Zhang
Drug Discovery Today (December 2014) Volume 19(Issue 12) pp:1836-1840
Publication Date(Web):1 December 2014
DOI:10.1016/j.drudis.2014.09.013
•It is still a great challenge to treat genetic diseases with gene therapy.•Chemical drugs provide alternative treatments for genetic diseases.•Disease-causing mutations in drug targets were examined.•It is found that a large part of mutations have slight influence on drug binding.•This observation supports the chemical therapeutic strategy for genetic diseases.Chemical drugs provide alternative treatments for genetic diseases in addition to gene therapy. Inherited diseases arising from gain-of-function (GOF) or loss-of-function (LOF) mutations of certain genes can be ameliorated by the antagonists and/or agonists of the target genes. However, a premise for this chemical therapeutic strategy is that the GOF/LOF mutations in drug targets have a negligible influence on drug–target binding. Here, we analyze the disease-causing mutations [derived from Online Mendelian Inheritance in Man (OMIM)] in successful drug targets. We found that >70% of the mutations are located far from the drug-binding sites (>12 Å), and most of the other mutations are unlikely to have adverse effects on drug binding, supporting the chemical strategy for combating genetic diseases.
Co-reporter:Sheng Tian, Junmei Wang, Youyong Li, Dan Li, ... Tingjun Hou
Advanced Drug Delivery Reviews (23 June 2015) Volume 86() pp:2-10
Publication Date(Web):23 June 2015
DOI:10.1016/j.addr.2015.01.009
The concept of drug-likeness, established from the analyses of the physiochemical properties or/and structural features of existing small organic drugs or/and drug candidates, has been widely used to filter out compounds with undesirable properties, especially poor ADMET (absorption, distribution, metabolism, excretion, and toxicity) profiles. Here, we summarize various approaches for drug-likeness evaluations, including simple rules/filters based on molecular properties/structures and quantitative prediction models based on sophisticated machine learning methods, and provide a comprehensive review of recent advances in this field. Moreover, the strengths and weaknesses of these approaches are briefly outlined. Finally, the drug-likeness analyses of natural products and traditional Chinese medicines (TCM) are discussed.Download high-res image (228KB)Download full-size image
Co-reporter:Xiaotian Kong, Peichen Pan, Dan Li, Sheng Tian, Youyong Li and Tingjun Hou
Physical Chemistry Chemical Physics 2015 - vol. 17(Issue 8) pp:NaN6113-6113
Publication Date(Web):2015/01/27
DOI:10.1039/C4CP05440G
Anaplastic lymphoma kinase (ALK) has gained increased attention as an attractive therapeutic target for the treatment of various cancers, especially non-small-cell lung cancer (NSCLC). Recently, piperidine carboxamides were reported as Type I1/2 inhibitors of ALK, which occupy both the ATP binding site and the back ATP hydrophobic cavity in DFG-in conformation. Due to the dynamic behavior of ALK in the binding of Type I1/2 inhibitors, the accurate predictions of the binding structures and relative binding potencies of these inhibitors are quite challenging. In this study, different modeling techniques, including molecular docking, ensemble docking based on multiple receptor conformations, molecular dynamics simulations and free energy calculations, were utilized to explore the binding mechanisms of piperidine carboxamides. Our predictions show that the conventional docking protocols are not sufficient to predict the relative binding potencies of the studied inhibitors with high accuracy, but incorporating protein flexibility before or after docking is quite effective to improve the prediction accuracy. Notably, the binding free energies predicted by MM/GBSA or MM/PBSA based on the MD simulations for the docked poses give the highest correlation with the experimental data, highlighting the importance of the inclusion of receptor flexibility for the accurate predictions of the binding potencies for Type I1/2 inhibitors of ALK. Furthermore, the comprehensive analysis of several pairs of representative inhibitors demonstrates the importance of hydrophobic interactions in improving the binding affinities of the inhibitors with the hot-spot residues surrounding the binding pocket. This work is expected to provide valuable clues for further rational design of novel and potent Type I1/2 ALK inhibitors.
Co-reporter:Zhe Wang, Huiyong Sun, Xiaojun Yao, Dan Li, Lei Xu, Youyong Li, Sheng Tian and Tingjun Hou
Physical Chemistry Chemical Physics 2016 - vol. 18(Issue 18) pp:NaN12975-12975
Publication Date(Web):2016/04/07
DOI:10.1039/C6CP01555G
As one of the most popular computational approaches in modern structure-based drug design, molecular docking can be used not only to identify the correct conformation of a ligand within the target binding pocket but also to estimate the strength of the interaction between a target and a ligand. Nowadays, as a variety of docking programs are available for the scientific community, a comprehensive understanding of the advantages and limitations of each docking program is fundamentally important to conduct more reasonable docking studies and docking-based virtual screening. In the present study, based on an extensive dataset of 2002 protein–ligand complexes from the PDBbind database (version 2014), the performance of ten docking programs, including five commercial programs (LigandFit, Glide, GOLD, MOE Dock, and Surflex-Dock) and five academic programs (AutoDock, AutoDock Vina, LeDock, rDock, and UCSF DOCK), was systematically evaluated by examining the accuracies of binding pose prediction (sampling power) and binding affinity estimation (scoring power). Our results showed that GOLD and LeDock had the best sampling power (GOLD: 59.8% accuracy for the top scored poses; LeDock: 80.8% accuracy for the best poses) and AutoDock Vina had the best scoring power (rp/rs of 0.564/0.580 and 0.569/0.584 for the top scored poses and best poses), suggesting that the commercial programs did not show the expected better performance than the academic ones. Overall, the ligand binding poses could be identified in most cases by the evaluated docking programs but the ranks of the binding affinities for the entire dataset could not be well predicted by most docking programs. However, for some types of protein families, relatively high linear correlations between docking scores and experimental binding affinities could be achieved. To our knowledge, this study has been the most extensive evaluation of popular molecular docking programs in the last five years. It is expected that our work can offer useful information for the successful application of these docking tools to different requirements and targets.
Co-reporter:Lei Xu, Youyong Li, Dan Li, Peng Xu, Sheng Tian, Huiyong Sun, Hui Liu and Tingjun Hou
Physical Chemistry Chemical Physics 2015 - vol. 17(Issue 5) pp:NaN3382-3382
Publication Date(Web):2014/12/11
DOI:10.1039/C4CP05095A
Macrophage migration inhibitory factor (MIF) is a multi-functional protein that acts as a cytokine and as an enzyme. Recently, MIF was identified as a non-canonical ligand of G protein-coupled chemokine receptor CXCR2 with low nanomolar affinity in leukocyte arrest and chemotaxis, but the precise knowledge of the molecular determinants of the MIF–CXCR2 interface has remained unknown. Therefore, we employed homology modeling, protein–protein docking, molecular dynamics (MD) simulations, Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) binding free energy calculations and MM/GBSA binding free energy decomposition to obtain insights into the molecular recognition of MIF with CXCR2. The predicted binding pattern of MIF–CXCR2 is in good agreement with the experimental data and sheds light on the functional role of important MIF–CXCR2 interface residues in association with binding and signaling. According to our predictions, the R11A/D44A double mutations of MIF exhibit a pronounced defect in the binding affinity of MIF to CXCR2, resulting in large conformational changes. The potential two-site binding model for the MIF–CXCR2 recognition was proposed: initialized primarily by the non-polar interactions including the van der Waals and hydrophobic interactions, the N-terminal region of CXCR2 contacts the N-like loop and β-sheet of MIF (site 1), and then the ECL2 and ECL3 regions of CXCR2 form strong interactions with the pseudo-(E)LR motif and C-terminus of MIF, which induces the molecular thermodynamic motion of TMs for signal transduction (site 2). This study will extend our understanding to the binding mechanisms of MIF to CXCR2 and provide useful information for the rational design of potent inhibitors selectively targeting the MIF–CXCR2 interactions.
Co-reporter:Xiaotian Kong, Huiyong Sun, Peichen Pan, Sheng Tian, Dan Li, Youyong Li and Tingjun Hou
Physical Chemistry Chemical Physics 2016 - vol. 18(Issue 3) pp:NaN2046-2046
Publication Date(Web):2015/12/02
DOI:10.1039/C5CP05622E
Due to the high sequence identity of the binding pockets of cyclin-dependent kinases (CDKs), designing highly selective inhibitors towards a specific CDK member remains a big challenge. 4-(thiazol-5-yl)-2-(phenylamino) pyrimidine derivatives are effective inhibitors of CDKs, among which the most promising inhibitor 12u demonstrates high binding affinity to CDK9 and attenuated binding affinity to other homologous kinases, such as CDK2. In this study, in order to rationalize the principle of the binding preference towards CDK9 over CDK2 and to explore crucial information that may aid the design of selective CDK9 inhibitors, MM/GBSA calculations based on conventional molecular dynamics (MD) simulations and enhanced sampling simulations (umbrella sampling and steered MD simulations) were carried out on two representative derivatives (12u and 4). The calculation results show that the binding specificity of 12u to CDK9 is primarily controlled by conformational change of the G-loop and variation of the van der Waals interactions. Furthermore, the enhanced sampling simulations revealed the different reaction coordinates and transient interactions of inhibitors 12u and 4 as they dissociate from the binding pockets of CDK9 and CDK2. The physical principles obtained from this study may facilitate the discovery and rational design of novel and specific inhibitors of CDK9.
Co-reporter:Huiyong Sun, Youyong Li, Sheng Tian, Lei Xu and Tingjun Hou
Physical Chemistry Chemical Physics 2014 - vol. 16(Issue 31) pp:NaN16729-16729
Publication Date(Web):2014/05/28
DOI:10.1039/C4CP01388C
By using different evaluation strategies, we systemically evaluated the performance of Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) and Molecular Mechanics/Poisson–Boltzmann Surface Area (MM/PBSA) methodologies based on more than 1800 protein–ligand crystal structures in the PDBbind database. The results can be summarized as follows: (1) for the one-protein-family/one-binding-ligand case which represents the unbiased protein–ligand complex sampling, both MM/GBSA and MM/PBSA methodologies achieve approximately equal accuracies at the interior dielectric constant of 4 (with rp = 0.408 ± 0.006 of MM/GBSA and rp = 0.388 ± 0.006 of MM/PBSA based on the minimized structures); while for the total dataset (1864 crystal structures), the overall best Pearson correlation coefficient (rp = 0.579 ± 0.002) based on MM/GBSA is better than that of MM/PBSA (rp = 0.491 ± 0.003), indicating that biased sampling may significantly affect the accuracy of the predicted result (some protein families contain too many instances and can bias the overall predicted accuracy). Therefore, family based classification is needed to evaluate the two methodologies; (2) the prediction accuracies of MM/GBSA and MM/PBSA for different protein families are quite different with rp ranging from 0 to 0.9, whereas the correlation and ranking scores (an averaged rp/rs over a list of protein folds and also representing the unbiased sampling) given by MM/PBSA (rp-score = 0.506 ± 0.050 and rs-score = 0.481 ± 0.052) are comparable to those given by MM/GBSA (rp-score = 0.516 ± 0.047 and rs-score = 0.463 ± 0.047) at the fold family level; (3) for the overall prediction accuracies, molecular dynamics (MD) simulation may not be quite necessary for MM/GBSA (rp-minimized = 0.579 ± 0.002 and rp-1ns = 0.564 ± 0.002), but is needed for MM/PBSA (rp-minimized = 0.412 ± 0.003 and rp-1ns = 0.491 ± 0.003). However, for the individual systems, whether to use MD simulation is depended. (4) both MM/GBSA and MM/PBSA may be unable to give successful predictions for the ligands with high formal charges, with the Pearson correlation coefficient ranging from 0.621 ± 0.003 (neutral ligands) to 0.125 ± 0.142 (ligands with a formal charge of 5). Therefore, it can be summarized that, although MM/GBSA and MM/PBSA perform similarly in the unbiased dataset, for the currently available crystal structures in the PDBbind database, compared with MM/GBSA, which may be used in multi-target comparisons, MM/PBSA is more sensitive to the investigated systems, and may be more suitable for individual-target-level binding free energy ranking. This study may provide useful guidance for the post-processing of docking based studies.
Co-reporter:Huiyong Sun, Youyong Li, Mingyun Shen, Sheng Tian, Lei Xu, Peichen Pan, Yan Guan and Tingjun Hou
Physical Chemistry Chemical Physics 2014 - vol. 16(Issue 40) pp:
Publication Date(Web):
DOI:10.1039/C4CP03179B
Co-reporter:Fu Chen, Hui Liu, Huiyong Sun, Peichen Pan, Youyong Li, Dan Li and Tingjun Hou
Physical Chemistry Chemical Physics 2016 - vol. 18(Issue 32) pp:NaN22139-22139
Publication Date(Web):2016/07/14
DOI:10.1039/C6CP03670H
Understanding protein–protein interactions (PPIs) is quite important to elucidate crucial biological processes and even design compounds that interfere with PPIs with pharmaceutical significance. Protein–protein docking can afford the atomic structural details of protein–protein complexes, but the accurate prediction of the three-dimensional structures for protein–protein systems is still notoriously difficult due in part to the lack of an ideal scoring function for protein–protein docking. Compared with most scoring functions used in protein–protein docking, the Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) and Molecular Mechanics/Poisson Boltzmann Surface Area (MM/PBSA) methodologies are more theoretically rigorous, but their overall performance for the predictions of binding affinities and binding poses for protein–protein systems has not been systematically evaluated. In this study, we first evaluated the performance of MM/PBSA and MM/GBSA to predict the binding affinities for 46 protein–protein complexes. On the whole, different force fields, solvation models, and interior dielectric constants have obvious impacts on the prediction accuracy of MM/GBSA and MM/PBSA. The MM/GBSA calculations based on the ff02 force field, the GB model developed by Onufriev et al. and a low interior dielectric constant (εin = 1) yield the best correlation between the predicted binding affinities and the experimental data (rp = −0.647), which is better than MM/PBSA (rp = −0.523) and a number of empirical scoring functions used in protein–protein docking (rp = −0.141 to −0.529). Then, we examined the capability of MM/GBSA to identify the possible near-native binding structures from the decoys generated by ZDOCK for 43 protein–protein systems. The results illustrate that the MM/GBSA rescoring has better capability to distinguish the correct binding structures from the decoys than the ZDOCK scoring. Besides, the optimal interior dielectric constant of MM/GBSA for re-ranking docking poses may be determined by analyzing the characteristics of protein–protein binding interfaces. Considering the relatively high prediction accuracy and low computational cost, MM/GBSA may be a good choice for predicting the binding affinities and identifying correct binding structures for protein–protein systems.
Co-reporter:Fu Chen, Huiyong Sun, Hui Liu, Dan Li, Youyong Li and Tingjun Hou
Physical Chemistry Chemical Physics 2017 - vol. 19(Issue 15) pp:NaN10176-10176
Publication Date(Web):2017/03/17
DOI:10.1039/C6CP08232G
High-throughput screening (HTS) is widely applied in many fields ranging from drug discovery to clinical diagnostics and toxicity assessment. Firefly luciferase is commonly used as a reporter to monitor the effect of chemical compounds on the activity of a specific target or pathway in HTS. However, the false positive rate of luciferase-based HTS is relatively high because many artifacts or promiscuous compounds that have direct interaction with the luciferase reporter enzyme are usually identified as active compounds (hits). Therefore, it is necessary to develop a rapid screening method to identify these compounds that can inhibit the luciferase activity directly. In this study, a virtual screening (VS) classification model called MIEC-GBDT (MIEC: Molecular Interaction Energy Components; GBDT: Gradient Boosting Decision Tree) was developed to distinguish luciferase inhibitors from non-inhibitors. The MIECs calculated by Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) free energy decomposition were used to energetically characterize the binding pattern of each small molecule at the active site of luciferase, and then the GBDT algorithm was employed to construct the classifiers based on MIECs. The predictions to the test set show that the optimized MIEC-GBDT model outperformed molecular docking and MM/GBSA rescoring. The best MIEC-GBDT model based on the MIECs with the energy terms of ΔGele, ΔGvdW, ΔGGB, and ΔGSA achieves the prediction accuracies of 87.2% and 90.3% for the inhibitors and non-inhibitors in the test sets, respectively. Moreover, the energetic analysis of the vital residues suggests that the energetic contributions of the vital residues to the binding of inhibitors are quite different from those to the binding of non-inhibitors. These results suggest that the MIEC-GBDT model is reliable and can be used as a powerful tool to identify potential interference compounds in luciferase-based HTS experiments.
4-(1-aminoethyl)-N-(pyridin-4-yl)cyclohexanecarboxamide dihydrochloride
Acriflavine
Pyridinium,1-methyl-4-phenyl-
Melphalan
2-?Pyridinamine, 3-?[(1S)?-?1-?(2,?6-?dichloro-?3-?fluorophenyl)?ethoxy]?-?5-?[1-?(4-?piperidinyl)?-?1H-?pyrazol-?4-?yl]?-
4-(6-Methoxy-2-naphthyl)-2-(4-methylsulfinylphenyl)-5-(4-pyridyl)-1H-imidazole
Phenol, 2-[(1E)-2-(3,5-dimethoxyphenyl)ethenyl]-6-methoxy-
Phenol, 2-[(1E)-2-(3,5-dimethoxyphenyl)ethenyl]-
AK-968/41925015