Co-reporter:Stephanie K. Ashenden, Thierry Kogej, Ola Engkvist, and Andreas Bender
Journal of Chemical Information and Modeling November 27, 2017 Volume 57(Issue 11) pp:2741-2741
Publication Date(Web):October 25, 2017
DOI:10.1021/acs.jcim.7b00295
It is well established that the number of publications of novel small-molecule modulators, and of their associated targets, has increased over the years. This work focuses on publishing trends over time, with a particular focus on comparing patents and scientific literature as accessible via the ChEMBL and GOSTAR databases. More precisely, the patents and scientific literature associated with bioactive molecules and their target annotations have been compared to identify where novelty (in the sense of the first modulator of a protein target) originated. Comparing the publication dates of the first small-molecule modulator of a particular target in literature and in patents (with either identical or different structures) shows that modulators are usually published both in scientific literature and in patents (45%) or in scientific literature alone (51%), but rarely in patents only. When first modulators are published in both sources, they are disseminated in the literature first 65% of the time. Finally, when analyzing only novel small-molecule modulators, regardless of the protein targets they have been published with, structures representing novel chemistry tend to be published in patents first (61% of the time).
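The first-publication comparison can be illustrated with a small classification routine. This is a sketch, assuming the earliest literature date and earliest patent date per target have already been extracted (for example from ChEMBL and GOSTAR); the function name `first_source` and the toy targets and dates are hypothetical, not data from the study.

```python
from collections import Counter
from datetime import date
from typing import Optional


def first_source(lit: Optional[date], pat: Optional[date]) -> str:
    """Classify where the first modulator of a target appeared.

    `lit` and `pat` are the earliest publication dates of a modulator in the
    scientific literature and in patents (None if that source has none).
    """
    if lit is None and pat is None:
        return "none"
    if pat is None:
        return "literature only"
    if lit is None:
        return "patents only"
    if lit < pat:
        return "both, literature first"
    if pat < lit:
        return "both, patent first"
    return "both, same date"


# Hypothetical targets with made-up earliest dates (illustration only).
targets = {
    "T1": (date(1998, 5, 1), date(2001, 3, 2)),
    "T2": (date(2005, 1, 1), None),
    "T3": (date(2010, 6, 1), date(2009, 2, 1)),
}
counts = Counter(first_source(lit, pat) for lit, pat in targets.values())
for category, n in sorted(counts.items()):
    print(f"{category}: {100 * n / len(targets):.0f}%")
```

Shares of these categories over all targets would correspond to the kind of percentages reported above.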
Co-reporter:Daniel J. Mason, Ian Stott, Stephanie Ashenden, Zohar B. Weinstein, Idil Karakoc, Selin Meral, Nurdan Kuru, Andreas Bender, and Murat Cokol
Journal of Medicinal Chemistry May 11, 2017 Volume 60(Issue 9) pp:3902-3902
Publication Date(Web):April 6, 2017
DOI:10.1021/acs.jmedchem.7b00204
Combination antibiotic therapies are clinically important in the fight against bacterial infections. However, the search space of drug combinations is large, making the identification of effective combinations a challenging task. Here, we present a computational framework that uses substructure profiles derived from the molecular structures of drugs and predicts antibiotic interactions. Using a previously published data set of 153 drug pairs, we showed that substructure profiles are useful in predicting synergy. We experimentally measured the interaction of 123 new drug pairs, as a prospective validation set for our approach, and identified 37 new synergistic pairs. Of the 12 pairs predicted to be synergistic, 10 were experimentally validated, corresponding to a 2.8-fold enrichment. Having thus validated our methodology, we produced a compendium of interaction predictions for all pairwise combinations among 100 antibiotics. Our methodology can make reliable antibiotic interaction predictions for any antibiotic pair within the applicability domain of the model since it solely requires chemical structures as an input.
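The 2.8-fold figure can be reproduced by comparing the hit rate among predicted pairs with the overall synergy rate in the prospective set. This assumes the enrichment is defined relative to the 37/123 base rate, which the abstract does not state explicitly; `fold_enrichment` is a hypothetical helper name.

```python
from math import comb


def fold_enrichment(hits_selected: int, n_selected: int,
                    hits_total: int, n_total: int) -> float:
    """Hit rate among selected items divided by the hit rate in the whole set."""
    return (hits_selected / n_selected) / (hits_total / n_total)


# Prospective set from the abstract: 37 of 123 new pairs were synergistic,
# and 10 of the 12 pairs predicted to be synergistic were confirmed.
fold = fold_enrichment(10, 12, 37, 123)
print(f"enrichment: {fold:.2f}")  # 2.77, reported as 2.8-fold

# Size of the compendium: all unordered pairs among 100 antibiotics.
n_pairs = comb(100, 2)
print(f"pairs: {n_pairs}")
```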
Co-reporter:Georgios Drakakis, Keith A. Wafford, Suzanne C. Brewerton, Michael J. Bodkin, David A. Evans, and Andreas Bender
ACS Chemical Biology June 16, 2017 Volume 12(Issue 6) pp:1593-1593
Publication Date(Web):April 17, 2017
DOI:10.1021/acschembio.7b00209
In this work, we describe a computational (“in silico”) mode-of-action analysis of CNS-active drugs that takes into account both multiple simultaneous hypotheses and sets of protein targets for each mode of action, and which was followed by successful prospective in vitro and in vivo validation. Using sleep-related phenotypic readouts describing both efficacy and side effects for 491 compounds tested in rat, we defined an “optimal” (desirable) sleeping pattern. Compounds were subjected to in silico target prediction (which was experimentally confirmed for 21 out of 28 cases), followed by the use of decision trees to derive polypharmacological bioactivity profiles. We demonstrated that predicted bioactivities improved classification performance compared to using only structural information. Moreover, DrugBank molecules were processed via the same pipeline, and compounds that in many cases were not annotated as sedative-hypnotics (alcaftadine, benzatropine, palonosetron, ecopipam, cyproheptadine, sertindole, and clopenthixol) were prospectively validated in vivo. Alcaftadine, ecopipam, cyproheptadine, and clopenthixol were found to promote sleep as predicted, benzatropine showed only a small increase in NREM sleep, whereas sertindole promoted wakefulness. To our knowledge, the sedative-hypnotic effects of alcaftadine and ecopipam have not previously been discussed in the literature. The method described extends previous single-target, single-mode-of-action models and is applicable across disease areas.
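Decision trees over predicted targets choose, at each node, the target whose predicted activity best separates the phenotype classes. A minimal sketch of one such split criterion (information gain, as in ID3/C4.5-style trees) follows; the abstract does not state which criterion or tree implementation was used, and the target names, profiles, and labels are toy values.

```python
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels))


def information_gain(profiles, labels, target):
    """Entropy reduction from splitting compounds on 'target predicted active'."""
    left = [y for p, y in zip(profiles, labels) if target in p]
    right = [y for p, y in zip(profiles, labels) if target not in p]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder


# Toy predicted-target profiles (sets of predicted targets) and phenotype labels.
profiles = [{"H1", "5HT2A"}, {"H1"}, {"D2"}, {"5HT2A"}]
labels = ["sleep-promoting", "sleep-promoting", "other", "other"]
candidates = ["5HT2A", "D2", "H1"]
gains = {t: information_gain(profiles, labels, t) for t in candidates}
best = max(candidates, key=gains.get)
print(best, gains[best])
```

Applying the same choice recursively to each branch yields a tree whose root-to-leaf paths are combinations of targets, i.e. polypharmacological profiles.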
Co-reporter:Rucha K. Chiddarwar, Sebastian G. Rohrer, Antje Wolf, Stefan Tresch, Sabrina Wollenhaupt, Andreas Bender
Journal of Molecular Graphics and Modelling 2017 Volume 71() pp:70-79
Publication Date(Web):January 2017
DOI:10.1016/j.jmgm.2016.10.021
The rapid emergence of pesticide resistance has given rise to a demand for herbicides with new modes of action (MoA). In the agrochemical sector, with the availability of experimental high-throughput screening (HTS) data, it is now possible to use in silico target prediction methods in the early discovery phase to suggest the MoA of a compound via data mining of bioactivity data. While this approach is established in the pharmaceutical context, in the agrochemical area it poses rather different challenges, as we found in this work, partially due to different chemistry, but even more so due to different (usually smaller) amounts of data and different ways of conducting HTS. With the aim of applying computational methods to facilitate herbicide target identification, 48,000 bioactivity data points against 16 herbicide targets were processed to train Laplacian-modified Naïve Bayesian (NB) classification models. The herbicide target prediction model (“HerbiMod”) is an ensemble of 16 binary classification models, which were evaluated on internal, external, and prospective validation sets. In addition to the experimental inactives, 10,000 random agrochemical inactives were included in the training process, which improved the overall balanced accuracy of our models by up to 40%. For all models, a balanced accuracy of ≥ 80% was achieved in five-fold cross-validation. Ranking of target predictions was addressed by means of z-scores, which improved predictivity over using raw scores alone. An external test set of 247 compounds from ChEMBL and a prospective test set of 394 compounds from BASF SE, tested against five well-studied herbicide targets (ACC, ALS, HPPD, PDS, and PROTOX), were used for further validation.
Only 4% of the compounds in the external test set lay within the applicability domain, so extrapolation (and hence correct prediction) was not possible; this was surprising on the one hand, and on the other illustrated the value of using applicability domains in the first place. However, a balanced accuracy above 60% was achieved on the prospective test set, where all compounds fell within the applicability domain, which underlines the potential of target prediction in the area of agrochemicals as well.
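The Laplacian-modified Naïve Bayes score and the z-score ranking can be sketched as below. This follows the commonly used Laplacian-corrected estimator, where each substructure feature F contributes log((A_F + 1) / (T_F · P_base + 1)), with A_F the number of actives containing F, T_F the number of training compounds containing F, and P_base the overall active rate. Computing z-scores against a background distribution of a model's own scores is our assumption; the abstract does not specify how they were derived, and all data are toy values.

```python
from math import log
from statistics import mean, stdev


def laplacian_nb_weights(fingerprints, labels):
    """Per-feature Laplacian-corrected log weights for one binary target model.

    fingerprints: list of sets of feature IDs; labels: 1 (active) or 0 (inactive).
    """
    p_base = sum(labels) / len(labels)
    total, active = {}, {}
    for fp, y in zip(fingerprints, labels):
        for f in fp:
            total[f] = total.get(f, 0) + 1
            active[f] = active.get(f, 0) + y
    return {f: log((active[f] + 1) / (total[f] * p_base + 1)) for f in total}


def nb_score(fp, weights):
    """Sum of weights of the features present; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in fp)


def z_score(raw, background):
    """Standardise a raw score against a background of scores from the same model."""
    return (raw - mean(background)) / stdev(background)


# Toy training data for a single hypothetical herbicide-target model.
fps = [{1, 2, 3}, {1, 2}, {4, 5}, {5, 6}, {4, 6}]
labels = [1, 1, 0, 0, 0]
w = laplacian_nb_weights(fps, labels)
query_scores = {"q1": nb_score({1, 2}, w), "q2": nb_score({4, 5}, w)}
background = [nb_score(fp, w) for fp in fps]
print({q: round(z_score(s, background), 2) for q, s in query_scores.items()})
```

In an ensemble of per-target models, ranking by z-score rather than raw score puts models trained on different amounts of data on a comparable scale.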
Co-reporter:Fredrik Svensson, Ulf Norinder
Toxicology Research (2012-Present) 2017 vol. 6(Issue 1) pp:73-80
Publication Date(Web):2017/01/03
DOI:10.1039/C6TX00252H
The assessment of compound cytotoxicity is an important part of the drug discovery process. Accurate predictions of cytotoxicity have the potential to expedite decision making and save considerable time and effort. In this work we apply class-conditional conformal prediction to model the cytotoxicity of compounds based on 16 high-throughput cytotoxicity assays from PubChem. The data span 16 cell lines and comprise more than 440 000 unique compounds. The data sets are heavily imbalanced, with only 0.8% of the tested compounds being cytotoxic. We trained one classification model for each cell line and validated its performance with respect to validity and accuracy. The generated models deliver high-quality predictions for both toxic and non-toxic compounds despite the imbalance between the two classes. On external data collected from the same assay provider as one of the investigated cell lines, the model had a sensitivity of 74% and a specificity of 65% at the 80% confidence level among the compounds assigned to a single class. Compared to previous approaches for large-scale cytotoxicity modelling, this represents a balanced performance in the prediction of the toxic and non-toxic classes. The conformal prediction framework also allows the modeller to control the error rate of the predictions, so that cytotoxicity outcomes can be predicted with a known level of confidence.
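Class-conditional (Mondrian) conformal prediction calibrates each class separately, which is what keeps the error rate controlled for the rare toxic class. A minimal sketch follows, assuming nonconformity scores (for example, one minus the underlying model's predicted probability for a label) have already been computed for a calibration set; the function names and numbers are illustrative. The 80% confidence level in the abstract corresponds to a significance level of 0.2.

```python
def conformal_p_value(calibration_scores, test_score):
    """Fraction of calibration scores at least as nonconforming as the test score."""
    n_ge = sum(s >= test_score for s in calibration_scores)
    return (n_ge + 1) / (len(calibration_scores) + 1)


def mondrian_predict(calibration_by_class, test_score_by_class, significance=0.2):
    """Prediction set: every label whose class-conditional p-value exceeds the significance level.

    calibration_by_class: nonconformity scores of calibration compounds, grouped by true class.
    test_score_by_class: nonconformity of the test compound under each tentative label.
    """
    return {
        label
        for label, calibration in calibration_by_class.items()
        if conformal_p_value(calibration, test_score_by_class[label]) > significance
    }


calibration = {
    "toxic": [0.1, 0.2, 0.3, 0.4, 0.9],
    "non-toxic": [0.05, 0.1, 0.15, 0.2, 0.25],
}
prediction = mondrian_predict(calibration, {"toxic": 0.2, "non-toxic": 0.8})
print(prediction)
```

A compound can also receive both labels or none; the sensitivity and specificity quoted above refer only to compounds assigned to a single class.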
Co-reporter:Shardul Paricharak, Adriaan P. IJzerman, Andreas Bender, and Florian Nigsch
ACS Chemical Biology 2016 Volume 11(Issue 5) pp:1255
Publication Date(Web):February 16, 2016
DOI:10.1021/acschembio.6b00029
With increased automation and larger compound collections, high-throughput screening (HTS) began replacing previous approaches in drug discovery from around the 1980s onward. However, even today it is not always appropriate, or even feasible, to screen large collections of compounds in a particular assay. Here, we present an efficient method for iterative screening of small subsets of compound libraries. With this method, the retrieval of active compounds is optimized using their structural information and biological activity fingerprints. We validated this approach retrospectively on 34 Novartis in-house HTS assays covering a wide range of assay biology, including cell proliferation, antibacterial activity, gene expression, and phosphorylation. The method was used to retrieve subsets of compounds for screening, where selected hits from any given round of screening served as starting points to select chemically and biologically similar compounds for the next iteration. By screening only ∼1% of the full screening collection (∼15 000 compounds), the method consistently retrieves diverse compounds belonging to the top 0.5% of the most active compounds in the HTS campaign. For most of the assays, over half of the compounds selected by the method were among the 5% most active compounds of the corresponding full-deck HTS. In addition, the stringency of the iterative method can be adjusted depending on the number of compounds one can afford to screen, making it a flexible tool for discovering active compounds efficiently.
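One round of the iterative selection can be sketched as picking the unscreened compounds most similar to the current hits. This is an assumed simplification: it uses a single set-based fingerprint per compound, which could hold structural features, binarised HTS activity features, or both, whereas the study combined structural and biological activity fingerprints in a way the abstract does not detail. The library and compound IDs are toy values.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two sets of fingerprint features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def next_batch(library, screened, hits, batch_size):
    """Select unscreened compounds with the highest similarity to any current hit."""
    def best_similarity(cid):
        return max((tanimoto(library[cid], library[h]) for h in hits), default=0.0)

    candidates = [cid for cid in library if cid not in screened]
    return sorted(candidates, key=best_similarity, reverse=True)[:batch_size]


# Toy library: compound ID -> set of fingerprint features.
library = {"A": {1, 2, 3}, "B": {1, 2, 4}, "C": {7, 8}, "D": {1, 2, 3, 5}}
batch = next_batch(library, screened={"A"}, hits=["A"], batch_size=2)
print(batch)
```

In a full loop, the selected batch would be assayed, its hits added to `hits`, and the batch added to `screened` before the next round; `batch_size` plays the role of the adjustable stringency.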
Co-reporter:Lewis H. Mervin, Qing Cao, Ian P. Barrett, Mike A. Firth, David Murray, Lisa McWilliams, Malcolm Haddrick, Mark Wigglesworth, Ola Engkvist, and Andreas Bender
ACS Chemical Biology 2016 Volume 11(Issue 11) pp:3007
Publication Date(Web):August 29, 2016
DOI:10.1021/acschembio.6b00538
While mechanisms of cytotoxicity and cytostaticity have been studied extensively from the biological side, relatively little is currently understood about the areas of chemical space leading to cytotoxicity and cytostasis in large compound collections. Predicting and rationalizing the potential adverse mechanisms of action (MoAs) of small molecules is, however, crucial for screening library design, given the link between even low-level cytotoxicity and adverse events observed in man. In this study, we analyzed results from a cell-based cytotoxicity screening cascade, comprising 296 970 nontoxic, 5784 cytotoxic and cytostatic, and 2327 cytostatic-only compounds evaluated on the THP-1 cell line. We employed an in silico MoA analysis protocol, using 9.5 million active and 602 million inactive bioactivity data points to generate target predictions, annotate predicted targets with pathways, and calculate enrichment metrics to highlight targets and pathways. After review, the top-ranking predicted targets and pathways correspond to known mechanisms for both phenotypes, and they indicate that while the processes involved in cytotoxicity and cytostaticity seem to overlap, some differences between the two phenotypes do exist. Cytotoxic predictions highlight many kinases, including the potentially novel cytotoxicity-related target STK32C, while cytostatic predictions outline targets linked with response to DNA damage, metabolism, and cytoskeletal machinery. Fragment analysis was also employed to generate a library of toxicophores to improve general understanding of the chemical features driving toxicity. We highlight substructures with potential kinase-dependent and kinase-independent mechanisms of toxicity. We also trained cytotoxicity classification models on proprietary and on public compound readouts, and prospectively validated them on 988 novel compounds comprising difficult and trivial testing instances, to establish the models' applicability domains.
The proprietary model achieved precision and recall of 77.9% and 83.8%, respectively. The MoA results and top-ranking substructures with accompanying MoA predictions are available as a platform to assess screening collections.
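The abstract mentions enrichment metrics without naming them. One common choice, shown here as an assumption, is an enrichment ratio together with a one-sided hypergeometric (Fisher's exact) p-value for how often a target is predicted among phenotype-positive compounds relative to the whole screen. Exact integer binomials become slow at the scale of the THP-1 screen (about 305 000 compounds), where a log-space or library implementation such as SciPy's `scipy.stats.hypergeom` would be the practical choice; the numbers below are toy values.

```python
from math import comb


def enrichment_ratio(k, n_pheno, n_target, n_total):
    """Share of phenotype compounds predicted for a target over the share among all compounds."""
    return (k / n_pheno) / (n_target / n_total)


def hypergeometric_p(k, n_pheno, n_target, n_total):
    """One-sided P(X >= k), X ~ Hypergeometric(n_total, n_target, n_pheno).

    k: phenotype compounds predicted to hit the target
    n_pheno: compounds showing the phenotype (e.g. cytotoxic)
    n_target: compounds in the whole screen predicted to hit the target
    n_total: all screened compounds
    """
    upper = min(n_target, n_pheno)
    numerator = sum(
        comb(n_target, i) * comb(n_total - n_target, n_pheno - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(n_total, n_pheno)


# Toy numbers: 10 of 100 compounds are predicted for a target, and 5 of them
# fall among the 10 compounds that show the phenotype.
print(enrichment_ratio(5, 10, 10, 100), hypergeometric_p(5, 10, 10, 100))
```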
Co-reporter:Qurrat U. Ain, Robert M. Owen, Kiyoyuki Omoto, Rubben Torella, Krishna C. Bulusu, David C. Pryde, Robert C. Glen, Julian E. Fuchs, and Andreas Bender
Molecular Pharmaceutics 2016 Volume 13(Issue 11) pp:4001-4012
Publication Date(Web):October 5, 2016
DOI:10.1021/acs.molpharmaceut.6b00813
Selective modulators of the γ-aminobutyric acid type A (GABAA) receptor family have the potential to treat a range of disease states related to cognition, pain, and anxiety. While the development of various α subunit-selective modulators is currently underway for the treatment of anxiety disorders, a mechanistic understanding of the correlation between their bioactivity and efficacy, based on ligand–target interactions, is still lacking. To address this, in the current study we analyzed a data set of 5440 GABAA modulators using ligand- and structure-based methods. The Spearman correlation (ρ) between the binding activity and efficacy of compounds was 0.008 and 0.31 against the α1 and α2 subunits of the GABAA receptor, respectively; in other words, binding affinity was a poor predictor of efficacy: the compounds had little diversity in structure and bioactivity, but differed significantly in efficacy. Two compounds were selected as a case study for detailed interaction analysis because of the small difference in their structures and affinities (ΔpKi(comp1_α1 – comp2_α1) = 0.45 log units, ΔpKi(comp1_α2 – comp2_α2) = 0 log units) compared to their larger differences in relative efficacy (ΔRE(comp1_α1 – comp2_α1) = 1.03, ΔRE(comp1_α2 – comp2_α2) = 0.21). Docking analysis suggested that His-101 is involved in a characteristic interaction of the α1 receptor with both compounds 1 and 2. Residues such as Phe-77, Thr-142, Asn-60, and Arg-144 of the γ chain of the α1γ2 complex also showed interactions with the heterocyclic rings of both compounds 1 and 2, but these interactions were disturbed in the α2γ2 complex docking results. Binding pocket stability analysis based on molecular dynamics identified three substitutions in the loop C region of the α2 subunit, namely G200E, I201T, and V202I, causing a reduction in the flexibility of α2 compared to α1.
These amino acids in α2, as compared to α1, were also observed to decrease the vibrational and dihedral entropy and to increase the hydrogen bond content of α2 in the apo state. However, freezing of both α1 and α2 was observed in the ligand-bound state, with an increased number of internal hydrogen bonds and increased entropy. Therefore, we hypothesize that the amino acid differences in the loop C region of α2 are responsible for conformational changes in the protein structure compared to α1, as well as for the binding modes of compounds and hence their functional signaling.
Keywords: affinity modeling; benzodiazepines; GABAA α1; GABAA α2; proteochemometric modeling; relative efficacy; selective modulators
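Spearman's ρ is the Pearson correlation of rank-transformed values, with tied values given their average rank; a ρ near zero, as found for α1, means that ordering compounds by affinity says little about their ordering by efficacy. The sketch below uses made-up pKi and relative efficacy values; `scipy.stats.spearmanr` provides the same statistic in a library setting.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank vectors of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Toy affinity (pKi) and relative efficacy values for five hypothetical compounds.
pki = [7.1, 7.5, 8.0, 8.2, 8.9]
rel_eff = [0.9, 0.2, 1.1, 0.4, 0.6]
print(round(spearman_rho(pki, rel_eff), 3))
```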
Co-reporter:Shardul Paricharak, Adriaan P. IJzerman, Jeremy L. Jenkins, Andreas Bender, and Florian Nigsch
Journal of Chemical Information and Modeling 2016 Volume 56(Issue 9) pp:1622-1630
Publication Date(Web):August 3, 2016
DOI:10.1021/acs.jcim.6b00244
Despite the usefulness of high-throughput screening (HTS) in drug discovery, for some systems, low assay throughput or high screening cost can prohibit the screening of large numbers of compounds. In such cases, iterative cycles of screening involving active learning (AL) are employed, creating the need for smaller “informer sets” that can be routinely screened to build predictive models for selecting compounds from the screening collection for follow-up screens. Here, we present a data-driven derivation of an informer compound set with improved predictivity of active compounds in HTS, and we validate its benefit over randomly selected training sets on 46 PubChem assays comprising at least 300,000 compounds and covering a wide range of assay biology. The informer compound set showed improvement in BEDROC(α = 100), PRAUC, and ROCAUC values averaged over all assays of 0.024, 0.014, and 0.016, respectively, compared to randomly selected training sets, all with paired t-test p-values < 10⁻¹⁵. A per-assay assessment showed that the BEDROC(α = 100), which is of particular relevance for early retrieval of actives, improved for 38 out of 46 assays, increasing the success rate of smaller follow-up screens. Overall, we showed that an informer set derived from historical HTS activity data can be employed for routine small-scale exploratory screening in an assay-agnostic fashion. This approach led to a consistent improvement in hit rates in follow-up screens without compromising scaffold retrieval. The informer set is adjustable in size depending on the number of compounds one intends to screen, as performance gains are realized for sets with more than 3,000 compounds, and this set is therefore applicable to a variety of situations. Finally, our results indicate that random sampling may not adequately cover descriptor space, drawing attention to the importance of the composition of the training set for predicting actives.
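BEDROC rewards actives ranked near the top of a list, with α controlling how strongly early ranks are weighted: about 80% of the score comes from the top fraction x of the ranking where 1 − e^(−αx) = 0.8, i.e. x = ln 5 / α, which is roughly the top 1.6% for α = 100. The sketch below implements the formula of Truchon and Bayly (2007) for activity flags sorted by descending model score; RDKit offers a comparable `CalcBEDROC` in `rdkit.ML.Scoring.Scoring`. The ranked lists are toy examples.

```python
from math import cosh, exp, sinh


def bedroc(ranked_is_active, alpha=100.0):
    """BEDROC for activity flags ordered by descending model score (best first)."""
    n = len(ranked_is_active)
    active_ranks = [i + 1 for i, active in enumerate(ranked_is_active) if active]
    n_act = len(active_ranks)
    if n_act == 0 or n_act == n:
        raise ValueError("need both actives and inactives")
    ra = n_act / n
    # Robust initial enhancement (RIE): observed exponential sum over its random expectation.
    observed = sum(exp(-alpha * r / n) for r in active_ranks)
    expected = ra * (1 - exp(-alpha)) / (exp(alpha / n) - 1)
    rie = observed / expected
    # Rescale RIE so scores run from ~0 (actives last) to 1 (actives first).
    scale = ra * sinh(alpha / 2) / (cosh(alpha / 2) - cosh(alpha / 2 - alpha * ra))
    return rie * scale + 1 / (1 - exp(alpha * (1 - ra)))


perfect = [True] * 10 + [False] * 90
worst = [False] * 90 + [True] * 10
print(round(bedroc(perfect), 4), round(bedroc(worst), 4))
```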
Co-reporter:Chad H. G. Allen, Alexios Koutsoukas, Isidro Cortés-Ciriano, Daniel S. Murrell, Thérèse E. Malliavin, Robert C. Glen and Andreas Bender
Toxicology Research 2016 vol. 5(Issue 3) pp:883-894
Publication Date(Web):03 Mar 2016
DOI:10.1039/C5TX00406C
Prediction of compound toxicity is essential because covering the vast chemical space requiring safety assessment with traditional, experimentally based, resource-intensive techniques is impossible. However, such prediction is nontrivial due to the complex causal relationship between compound structure and in vivo harm. Protein target annotations and in vitro experimental outcomes encode relevant bioactivity information complementary to chemicals’ structures. This work tests the hypothesis that using three complementary types of data will afford predictive models that outperform traditional models built using fewer data types. A tripartite, heterogeneous descriptor set for 367 compounds comprised (a) chemical descriptors, (b) protein target descriptors generated using an algorithm trained on 190000 ligand–protein interactions from ChEMBL, and (c) descriptors derived from in vitro cell cytotoxicity dose–response data from a panel of human cell lines. One hundred random forest classification models for predicting rat LD50 were built using every combination of descriptors. Successive integration of data types improved predictive performance; models built using the full dataset had an average external correct classification rate of 0.82, compared to 0.73–0.80 for models built using two data types and 0.67–0.78 for models built using one. Pairwise comparisons of models trained on the same data showed that including a third data domain on top of chemistry improved the average correct classification rate by 1.4–2.4 points, with p-values <0.01. Additionally, the approach enhanced the models’ applicability domains and proved useful for generating novel mechanism hypotheses. The use of tripartite heterogeneous bioactivity datasets is a useful technique for improving toxicity prediction. Both protein target descriptors – which have the practical value of being derived in silico – and cytotoxicity descriptors derived from experiment are suitable contributors to such datasets.
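With three data types there are seven non-empty combinations (three single types, three pairs, and the full set), each concatenated into one feature matrix before model training. The sketch below enumerates them; block names and values are placeholders, and the random forest itself (for example scikit-learn's `RandomForestClassifier`) is left out.

```python
from itertools import combinations


def descriptor_combinations(blocks):
    """Yield (block names, per-compound feature vectors) for every non-empty subset of blocks.

    blocks: mapping from block name to a list of per-compound descriptor lists,
    all in the same compound order.
    """
    names = sorted(blocks)
    n_compounds = len(blocks[names[0]])
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            features = [
                [value for name in subset for value in blocks[name][i]]
                for i in range(n_compounds)
            ]
            yield subset, features


# Toy descriptors for two compounds (values are made up).
blocks = {
    "chemical": [[0.1, 0.2], [0.3, 0.4]],
    "target": [[1, 0], [0, 1]],
    "cytotoxicity": [[5.2], [6.1]],
}
combos = list(descriptor_combinations(blocks))
for subset, features in combos:
    print("+".join(subset), features[0])
```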
Co-reporter:Uli Fechner, Chris de Graaf, Andrew E. Torda
Journal of Cheminformatics 2016 Volume 8( Issue 1 Supplement) pp:
Publication Date(Web):2016 April
DOI:10.1186/s13321-016-0119-5
Co-reporter:Daniela De Lucia, Oscar Méndez Lucio, Biagia Musio, Andreas Bender, Monika Listing, Sophie Dennhardt, Andreas Koeberle, Ulrike Garscha, Roberta Rizzo, Stefano Manfredini, Oliver Werz, Steven V. Ley
European Journal of Medicinal Chemistry 2015 Volume 101() pp:573-583
Publication Date(Web):28 August 2015
DOI:10.1016/j.ejmech.2015.07.011
• A new library of triazole-containing 5-lipoxygenase inhibitors was prepared.
• Binding optimization of compounds was assisted with molecular docking calculations.
• A clear structure–activity relationship of this family of compounds was observed.
• Some of the new molecules potently suppressed 5-lipoxygenase product formation.
• Cytotoxicity evaluation suggested that inhibitory activity was not due to toxicity.
In this work the synthesis, structure–activity relationship (SAR) and biological evaluation of a novel series of triazole-containing 5-lipoxygenase (5-LO) inhibitors are described. The use of structure-guided drug design techniques provided compounds that demonstrated excellent 5-LO inhibition, with IC50 values of 0.2 and 3.2 μM in cell-based and cell-free assays, respectively. Optimization of binding and functional potencies resulted in the identification of compound 13d, which showed enhanced activity compared to the parent bioactive compound caffeic acid 5 and the clinically approved zileuton 3. Compounds 15 and 16 were identified as lead compounds in inhibiting 5-LO product formation in neutrophils. Their interference with other targets in the arachidonic acid pathway was also assessed. Cytotoxicity tests were performed to exclude a relationship between cytotoxicity and the increased activity observed after structure optimization.
Co-reporter:Aakash Chavan Ravindranath, Nolen Perualila-Tan, Adetayo Kasim, Georgios Drakakis, Sonia Liggi, Suzanne C. Brewerton, Daniel Mason, Michael J. Bodkin, David A. Evans, Aditya Bhagwat, Willem Talloen, Hinrich W. H. Göhlmann, QSTAR Consortium, Ziv Shkedy and Andreas Bender
Molecular BioSystems 2015 vol. 11(Issue 1) pp:86-96
Publication Date(Web):16 Sep 2014
DOI:10.1039/C4MB00328D
Integrating gene expression profiles with certain proteins can improve our understanding of the fundamental mechanisms in protein–ligand binding. This paper spotlights the integration of gene expression data and target prediction scores, providing insight into mechanism of action (MoA). Compounds are clustered based on the similarity of their predicted protein targets, and each cluster is linked to gene sets using Linear Models for Microarray Data. MLP analysis is used to generate gene sets based on their biological processes, and a qualitative search is performed on the homogeneous target-based compound clusters to identify pathways. Genes and proteins were linked through pathways for 6 of the 8 MCF7 and 6 of the 11 PC3 clusters. Three compound clusters are studied. (i) The target-driven cluster of the HSP90 inhibitors geldanamycin and tanespimycin induces differential expression of HSP90-related genes, which overlap with the pathway response to unfolded protein. Gene expression results are in agreement with target prediction, and pathway annotations add information that enables understanding of the MoA. (ii) The antipsychotic cluster shows differential expression of the genes LDLR and INSIG-1 and is predicted to target CYP2D6. The pathway steroid metabolic process links the protein and the respective genes, suggesting a hypothesis for the MoA of antipsychotics. A sub-cluster (verapamil and dexverapamil), although sharing similar protein targets with the antipsychotic drug cluster, has a lower intensity of expression profile on related genes, indicating that this method distinguishes close sub-clusters and suggesting differences in their MoA. Lastly, (iii) the thiazolidinedione drug cluster was predicted to target the peroxisome proliferator-activated receptors (PPAR) PPAR-alpha and PPAR-gamma and acyl-CoA desaturase, and showed significant differential expression of the genes ANGPTL4, FABP4, and PRKCD.
The targets and genes are linked via the PPAR signalling pathway and induction of apoptosis, generating a hypothesis for the MoA of thiazolidinediones. Our analysis shows one or more underlying MoAs for the compounds, which were well substantiated by the literature.
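The abstract clusters compounds by the similarity of their predicted protein targets but does not name the clustering method. As an assumed stand-in, the sketch below links any two compounds whose predicted-target sets have a Jaccard similarity at or above a threshold and reports the connected groups (single-linkage clustering at a fixed cutoff). The predicted target sets and the 0.5 threshold are toy values, not predictions from the study.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of predicted targets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_targets(predicted, threshold=0.5):
    """Single-linkage clusters: compounds are linked when target-set Jaccard >= threshold."""
    names = list(predicted)
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(predicted[a], predicted[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy predicted-target sets (illustrative, not model output).
predicted = {
    "geldanamycin": {"HSP90AA1", "HSP90AB1"},
    "tanespimycin": {"HSP90AA1", "HSP90AB1", "TRAP1"},
    "rosiglitazone": {"PPARG", "PPARA"},
    "pioglitazone": {"PPARG", "PPARA", "SCD"},
}
print(cluster_by_targets(predicted))
```

Each resulting cluster would then be tested against gene sets, as described above.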
Co-reporter:Alexios Koutsoukas, Shardul Paricharak, Warren R. J. D. Galloway, David R. Spring, Adriaan P. IJzerman, Robert C. Glen, David Marcus, and Andreas Bender
Journal of Chemical Information and Modeling 2014 Volume 54(Issue 1) pp:230-242
Publication Date(Web):December 2, 2013
DOI:10.1021/ci400469u
Chemical diversity selection is a widely applied approach for choosing structurally diverse subsets of molecules, often with the objective of maximizing the number of hits in biological screening. While many methods exist in the area, few systematic comparisons using current descriptors, in particular with the objective of assessing diversity in bioactivity space, have been published, and the current study aims to address this shortage. In this work, 13 widely used molecular descriptors were compared, including fingerprint-based descriptors (ECFP4, FCFP4, MACCS keys), pharmacophore-based descriptors (TAT, TAD, TGT, TGD, GpiDAPH3), shape-based descriptors (rapid overlay of chemical structures (ROCS) and principal moments of inertia (PMI)), a connectivity-matrix-based descriptor (BCUT), physicochemical-property-based descriptors (prop2D), and a more recently introduced molecular descriptor type (namely, “Bayes Affinity Fingerprints”). We assessed both how similarly the descriptors behave when assessing the diversity of chemical libraries and their ability to select compounds from libraries that are diverse in bioactivity space, a property of much practical relevance in screening library design. This is particularly relevant given that many future targets to be screened are not known in advance, but the library should still maximize the likelihood of containing bioactive matter for future screening campaigns. Overall, our results showed that descriptors based on atom topology (i.e., fingerprint-based descriptors and pharmacophore-based descriptors) correlate well in rank-ordering compounds, both within and between descriptor types. On the other hand, shape-based descriptors such as ROCS and PMI showed weak correlation with the other descriptors utilized in this study, demonstrating significantly different behavior.
We then applied eight of the molecular descriptors compared in this study to sample a diverse subset (4%) of compounds from an initial population of 2587 compounds covering the 25 largest human activity classes from ChEMBL, and measured the coverage of activity classes by the subsets. Here, “Bayes Affinity Fingerprints” achieved an average coverage of 92% of activity classes. Using ECFP4, GpiDAPH3, TGT, and random sampling, 91%, 84%, 84%, and 84% of the activity classes, respectively, were represented in the selected compounds, followed by BCUT, prop2D, MACCS, and PMI (in order of decreasing performance). In addition, we were able to show that there is no visible correlation between compound diversity in PMI space and in bioactivity space, despite the frequent use of PMI plots to this end. To summarize, in this work we assessed which descriptors select compounds with high coverage of bioactivity space and can hence be used for diverse compound selection for biological screening. In cases where multiple descriptors are to be used for diversity selection, this work describes which descriptors behave complementarily and can hence be used jointly to focus on different aspects of diversity in chemical space.
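The abstract does not say which selection algorithm was paired with each descriptor. A common choice for picking a diverse subset, used here as an assumption, is greedy MaxMin selection, which repeatedly adds the compound farthest (in 1 − Tanimoto distance) from everything picked so far. The sketch then measures activity-class coverage as the fraction of classes represented in the pick; fingerprints and class labels are toy values.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two set-based fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def maxmin_select(fps, n_pick, seed_index=0):
    """Greedy MaxMin diversity selection on set-based fingerprints."""
    picked = [seed_index]
    # Distance from each compound to its nearest already-picked compound.
    nearest = [1.0 - tanimoto(fp, fps[seed_index]) for fp in fps]
    while len(picked) < min(n_pick, len(fps)):
        remaining = [i for i in range(len(fps)) if i not in picked]
        nxt = max(remaining, key=lambda i: nearest[i])
        picked.append(nxt)
        for i, fp in enumerate(fps):
            nearest[i] = min(nearest[i], 1.0 - tanimoto(fp, fps[nxt]))
    return picked


def class_coverage(picked, classes):
    """Fraction of all activity classes represented among the picked compounds."""
    return len({classes[i] for i in picked}) / len(set(classes))


fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 5}, {7, 8}]
classes = ["kinase", "kinase", "GPCR", "kinase", "GPCR"]
subset = maxmin_select(fps, 2)
print(subset, class_coverage(subset, classes))
```

Swapping the fingerprint representation (for example, for affinity-based fingerprints) while keeping the selection step fixed is one way to compare descriptors by the coverage they achieve.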
Co-reporter:Qurrat U. Ain, Oscar Méndez-Lucio, Isidro Cortés Ciriano, Thérèse Malliavin, Gerard J. P. van Westen and Andreas Bender
Integrative Biology 2014 vol. 6(Issue 11) pp:1023-1033
Publication Date(Web):16 Sep 2014
DOI:10.1039/C4IB00175C
Serine proteases, implicated in important physiological functions, have high intra-family similarity, which leads to unwanted off-target effects of inhibitors with insufficient selectivity. However, the availability of sequence and structure data has now made it possible to develop approaches for designing pharmacological agents that can discriminate successfully between related binding sites. In this study, we used proteochemometric modelling (PCM) to quantify, in an integrative manner, the relationship between 12625 distinct protease inhibitors and their bioactivity against 67 targets of the serine protease family (20213 data points). Benchmarking 21 different target descriptors motivated the use of specific binding pocket amino acid descriptors, which helped to identify active site residues and selective compound chemotypes affecting compound affinity and selectivity. PCM models outperformed the alternative approaches employed for comparison (models trained exclusively on compound descriptors using all available data, i.e., QSAR), with R²/RMSE values of 0.64 ± 0.23/0.66 ± 0.20 vs. 0.35 ± 0.27/1.05 ± 0.27 log units, respectively. Moreover, interpretation of the PCM model singled out various chemical substructures responsible for bioactivity and selectivity towards particular proteases (thrombin, trypsin and coagulation factor 10), in agreement with the literature. For instance, absence of a tertiary sulphonamide was identified as responsible for decreased selective activity (by on average 0.27 ± 0.65 pChEMBL units) on FA10. Among the binding pocket residues, the amino acids (arginine, leucine and tyrosine) at positions 35, 39, 60, 93, 140 and 207 were observed as key contributing residues for selective affinity on these three targets.
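In proteochemometric modelling each data point is a compound–protein pair, so the model input combines compound descriptors with target descriptors (here, binding-pocket amino acid descriptors). Cross terms between the two are one common option, although the abstract does not say whether they were used. The sketch below builds such a feature vector and computes RMSE and the coefficient of determination, assuming R² in the abstract is of that form; all values are toy numbers.

```python
def pcm_features(compound, protein, cross_terms=True):
    """Concatenate compound and protein descriptors, optionally with pairwise products."""
    features = list(compound) + list(protein)
    if cross_terms:
        features += [c * p for c in compound for p in protein]
    return features


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the activity values (e.g. log units)."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


x = pcm_features([1.0, 0.0], [0.5, 2.0])  # toy compound and binding-pocket descriptors
observed = [6.0, 7.0, 8.0]
predicted = [6.5, 7.0, 7.5]
print(x, rmse(observed, predicted), r_squared(observed, predicted))
```

A compound-only (QSAR) baseline corresponds to training on the compound descriptors alone, which cannot distinguish between targets for the same compound.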
Co-reporter:Georgios Drakakis, Adam E. Hendry, Kimberley Hanson, Suzanne C. Brewerton, Michael J. Bodkin, David A. Evans, Grant N. Wheeler and Andreas Bender
MedChemComm 2014 vol. 5(Issue 3) pp:386-396
Publication Date(Web):29 Jan 2014
DOI:10.1039/C3MD00313B
Given the increasing use of phenotypic screens in drug discovery, the subsequent mechanism-of-action analysis is also gaining attention. Such analyses frequently use in silico methods, which have become significantly more popular in recent years. However, identifying phenotype-specific mechanisms of action depends heavily on suitable phenotype identification in the first place, and many phenotype annotations rely on human input and are therefore inconsistent. In this work, we aimed to analyse the impact that human phenotype classification has on subsequent in silico mechanism-of-action analysis. To this end, an image analysis application was implemented for the rapid identification of seven high-level phenotypes in Xenopus laevis tadpoles treated with compounds from the National Cancer Institute Diversity Set II. Manual and automated phenotype classifications agreed for some phenotypes (e.g., 73.9% agreement for general morphology abnormality), but not for others (e.g., 37.6% agreement for melanophore migration). Based on both annotations, protein targets of active compounds were predicted in silico, and decision trees were generated to understand the mechanisms of action behind every phenotype while also taking polypharmacology (combinations of targets) into account. Automated phenotype categorisation greatly increased the accuracy of the mechanism-of-action model: it improved classification accuracy by 9.4%, reduced the tree size by eight nodes, and reduced the number of leaves and the tree depth by three. Overall, we conclude that consistent phenotype annotations are generally crucial for successful subsequent mechanism-of-action analysis, as shown here for Xenopus laevis screens in combination with in silico mechanism-of-action analysis.
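Agreement between manual and automated annotations can be expressed as the percentage of compounds for which both give the same call for a phenotype. The abstract does not define its agreement measure, so this simple per-phenotype percentage is an assumption, and the annotations below are toy values.

```python
def percent_agreement(manual, automated):
    """Percentage of compounds for which two binary phenotype annotations coincide."""
    if len(manual) != len(automated):
        raise ValueError("annotations must cover the same compounds")
    matches = sum(m == a for m, a in zip(manual, automated))
    return 100 * matches / len(manual)


# Toy calls per phenotype: True means the phenotype was annotated for that compound.
manual = {"melanophore migration": [True, True, False, False]}
automated = {"melanophore migration": [True, False, False, False]}
for phenotype in manual:
    print(phenotype, percent_agreement(manual[phenotype], automated[phenotype]))
```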
Co-reporter:Alexios Koutsoukas, Robert Lowe, Yasaman KalantarMotamedi, Hamse Y. Mussa, Werner Klaffke, John B. O. Mitchell, Robert C. Glen, and Andreas Bender
Journal of Chemical Information and Modeling 2013 Volume 53(Issue 8) pp:1957-1966
Publication Date(Web):July 8, 2013
DOI:10.1021/ci300435j
In this study, two probabilistic machine-learning algorithms were compared for in silico target prediction of bioactive molecules: the well-established Laplacian-modified Naïve Bayes classifier (NB) and the Parzen-Rosenblatt Window, more recently introduced to cheminformatics. Both classifiers were trained with circular fingerprints on a large data set of bioactive compounds extracted from ChEMBL, covering 894 human protein targets with more than 155,000 ligand-protein pairs. Owing to its size and the number of bioactivity classes it contains, this data set is also provided as a benchmark for future target prediction methods. In addition to evaluating the methods, different performance measures were explored. This is less straightforward than in binary classification settings because of the number of classes, the possibility of multiple class memberships, and the need to translate model scores into “yes/no” predictions for assessing model performance. Both algorithms achieved a recall of correct targets exceeding 80% in the top 1% of predictions. Performance depends significantly on the diversity and size of a given class of bioactive compounds, with small classes and low structural similarity affecting the two algorithms to different degrees. On an external test set extracted from WOMBAT, covering more than 500 targets and excluding all compounds with Tanimoto similarity above 0.8 to compounds from the ChEMBL data set, the two methods achieved a recall of 63.3% (Naïve Bayes) and 66.6% (Parzen-Rosenblatt Window) among the top 1% of predictions. While these numbers indicate lower performance, they are also more realistic for settings where protein targets need to be established for novel chemical substances.
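The abstract gives no implementation details. As a minimal sketch, assuming fingerprints are represented as Python sets of "on" bit indices (e.g. from a circular fingerprint) and using the commonly cited Laplacian-corrected feature weight log((A_F + 1) / (T_F * P + 1)), the code below trains one-vs-rest Naïve Bayes target models, ranks targets for a query compound, and applies a Tanimoto > 0.8 exclusion of the kind used to build the dissimilar external test set. All function names are hypothetical, and the Parzen-Rosenblatt Window classifier is not shown.

```python
import math
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def train_laplacian_nb(compounds, labels):
    """compounds: list of fingerprint sets; labels: list of sets of target names.

    Returns {target: {feature: weight}}, one one-vs-rest model per target, where
    weight = log((A_F + 1) / (T_F * P + 1)):
      A_F = actives for the target containing feature F,
      T_F = all training compounds containing F,
      P   = fraction of training compounds active on the target.
    """
    n = len(compounds)
    feature_total = defaultdict(int)  # T_F
    for fp in compounds:
        for f in fp:
            feature_total[f] += 1

    models = {}
    for target in set().union(*labels):
        actives = [fp for fp, ls in zip(compounds, labels) if target in ls]
        p = len(actives) / n
        active_count = defaultdict(int)  # A_F
        for fp in actives:
            for f in fp:
                active_count[f] += 1
        models[target] = {
            f: math.log((active_count[f] + 1) / (feature_total[f] * p + 1))
            for f in feature_total
        }
    return models


def rank_targets(models, fp):
    """Sum feature weights per target model; return targets, best-scoring first."""
    scores = {t: sum(w.get(f, 0.0) for f in fp) for t, w in models.items()}
    return sorted(scores, key=scores.get, reverse=True)


def filter_dissimilar(test_fps, train_fps, max_sim=0.8):
    """Keep test compounds whose Tanimoto similarity to every training compound is <= max_sim."""
    return [fp for fp in test_fps
            if all(tanimoto(fp, tr) <= max_sim for tr in train_fps)]
```

With many targets, top-1% recall could then be estimated by checking whether a compound's known targets appear in the first 1% of the list returned by `rank_targets`.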
Co-reporter:Fazlin Mohd Fauzi, Alexios Koutsoukas, Robert Lowe, Kalpana Joshi, Tai-Ping Fan, Robert C. Glen, and Andreas Bender
Journal of Chemical Information and Modeling 2013 Volume 53(Issue 3) pp:661-673
Publication Date(Web):January 27, 2013
DOI:10.1021/ci3005513
Traditional Chinese medicine (TCM) and Ayurveda have been used in humans for thousands of years. While the link to a particular indication has been established in humans, the mode of action (MOA) of the formulations often remains unknown. In this study, we aim to understand the MOA of formulations used in traditional medicine using an in silico target prediction algorithm, which predicts protein targets (and hence MOAs) from the chemical structure of a compound. Following this approach, we were able to establish several links between suggested MOAs and experimental evidence. In particular, compounds from the ‘tonifying and replenishing medicinal’ class from TCM exhibit a hypoglycemic effect, which can be related to activity of the ingredients against Sodium-Glucose Transporters (SGLT) 1 and 2 as well as Protein Tyrosine Phosphatase (PTP). Similar results were obtained for Ayurvedic anticancer drugs. Here, the predicted targets included both primary anticancer targets (those directly involved in cancer pathogenesis), such as steroid-5-alpha-reductase 1 and 2, and targets that act synergistically with the primary target, such as the efflux pump P-glycoprotein (P-gp). In addition, we were able to elucidate some targets that may point to novel MOAs as well as explain side effects. Most notably, GPBAR1, which was predicted as a target for both the ‘tonifying and replenishing medicinal’ and the anticancer classes, suggests an influence of the compounds on metabolism. Understanding the MOA of these compounds is beneficial because it provides a resource of new molecular entities (NMEs) with possibly higher clinical efficacy than those identified by single-target biochemical assays.
Co-reporter:Ha P. Nguyen, Alexios Koutsoukas, Fazlin Mohd Fauzi, Georgios Drakakis, Mateusz Maciejewski, Robert C. Glen
Chemical Biology & Drug Design 2013 Volume 82( Issue 3) pp:252-266
Publication Date(Web):
DOI:10.1111/cbdd.12155
Diversity selection is a frequently applied strategy for assembling high-throughput screening libraries, based on the assumption that a diverse compound set increases the chances of finding bioactive molecules. Building on previous work on experimental ‘affinity fingerprints’, this study benchmarks a novel diversity selection method that uses predicted bioactivity profiles as descriptors. Compounds were selected based on their predicted activity against half of the targets (training set), and diversity was assessed based on coverage of the remaining (test set) targets. In parallel, fingerprint-based diversity selection was performed. The original version of the method showed an average increase in target space coverage of 5%, and an improved version an average increase of 10%, compared with the fingerprint-based methods. In a typical case, bioactivity-based selection of 231 compounds (2%) from a particular data set (‘Cutoff-40’) resulted in 47.0% and 50.1% target coverage for the two versions of the method, while fingerprint-based selection achieved only 38.4% target coverage for the same subset size. In conclusion, the novel bioactivity-based selection method outperformed the fingerprint-based method in sampling bioactive chemical space on the data sets considered. The retrieved structures were more acceptable to medicinal chemists but also more lipophilic; hence, in practice, bioactivity-based diversity selection would best be combined with physicochemical property filters.
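The abstract does not state which selection algorithm was applied to the predicted bioactivity profiles. As an illustration only, the sketch below uses greedy MaxMin selection with Euclidean distance (both assumptions) on hypothetical per-compound vectors of predicted activity against the training-set targets, and then scores diversity as the fraction of held-out test-set targets hit by at least one selected compound, mirroring the evaluation described above. Function names are hypothetical.

```python
import math


def maxmin_select(profiles, k, start=0):
    """Greedy MaxMin diversity selection: repeatedly add the compound whose
    nearest already-selected neighbour is farthest away.

    profiles: list of equal-length vectors (here, predicted bioactivity profiles).
    Returns the indices of up to k selected compounds.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [start]
    # distance from each compound to its nearest selected compound
    nearest = [dist(p, profiles[start]) for p in profiles]
    while len(selected) < min(k, len(profiles)):
        nxt = max((i for i in range(len(profiles)) if i not in selected),
                  key=lambda i: nearest[i])
        selected.append(nxt)
        nearest = [min(d, dist(p, profiles[nxt])) for d, p in zip(nearest, profiles)]
    return selected


def target_coverage(selected, actives_by_compound, test_targets):
    """Fraction of held-out test targets on which at least one selected compound is active."""
    hit = set()
    for i in selected:
        hit |= actives_by_compound[i] & test_targets
    return len(hit) / len(test_targets)
```

Running the same selection on structural fingerprints instead of predicted profiles would give a fingerprint-based baseline for the coverage comparison.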
Co-reporter:Gerard J. P. van Westen, Olaf O. van den Hoven, Rianne van der Pijl, Thea Mulder-Krieger, Henk de Vries, Jörg K. Wegner, Adriaan P. IJzerman, Herman W. T. van Vlijmen
Journal of Medicinal Chemistry 2012 Volume 55(Issue 16) pp:7010-7020
Publication Date(Web):July 24, 2012
DOI:10.1021/jm3003069
The four adenosine receptor subtypes are relevant drug targets for the treatment of, e.g., diabetes and Parkinson’s disease. In the present study, we aimed to find novel small-molecule ligands for these receptors using virtual screening approaches based on proteochemometric (PCM) modeling. We combined bioactivity data from all human and rat receptors in order to widen the available chemical space. After training and validating a proteochemometric model on this combined data set (Q² of 0.73, RMSE of 0.61), we virtually screened a vendor database of 100,910 compounds. Of 54 compounds purchased, six were confirmed experimentally as novel high-affinity adenosine receptor ligands, one of which displayed an affinity of 7 nM on the human adenosine A1 receptor. We conclude that the combination of rat and human data performs better than human data alone. Furthermore, we conclude that proteochemometric modeling is an efficient method to quickly screen for novel bioactive compounds.
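The model quality above is reported as Q² and RMSE. A minimal sketch of the usual definitions follows, assuming Q² is computed from cross-validated predictions (each compound predicted while held out of training); the exact validation scheme used in the study is not given in the abstract, and the function names are hypothetical.

```python
import math


def rmse(y_obs, y_pred):
    """Root-mean-square error between observed and predicted activities."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs))


def q2(y_obs, y_pred_cv):
    """Cross-validated Q^2 = 1 - PRESS / total sum of squares.

    y_pred_cv holds, for each compound, the prediction made while that compound
    was excluded from training.
    """
    mean = sum(y_obs) / len(y_obs)
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred_cv))
    ss_tot = sum((o - mean) ** 2 for o in y_obs)
    return 1.0 - press / ss_tot
```

A Q² of 1 means perfect held-out predictions, while 0 means the model does no better than always predicting the mean activity.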
Co-reporter:Alexios Koutsoukas, Benjamin Simms, Johannes Kirchmair, Peter J. Bond, Alan V. Whitmore, Steven Zimmer, Malcolm P. Young, Jeremy L. Jenkins, Meir Glick, Robert C. Glen, Andreas Bender
Journal of Proteomics 2011 Volume 74(Issue 12) pp:2554-2574
Publication Date(Web):18 November 2011
DOI:10.1016/j.jprot.2011.05.011
Given the tremendous growth of bioactivity databases, computational tools to predict the protein targets of small molecules have been gaining importance in recent years. Applications span a wide range, from the ‘designed polypharmacology’ of compounds to mode-of-action analysis. In this review, we first survey databases that can be used for ligand-based target prediction, which have grown tremendously in size in recent years. We then outline existing target prediction methods, both those based on knowledge of bioactivities from the ligand side and those that can be applied when a protein structure is known. Successful target identification efforts that were based partly or wholly on initial computational target predictions are discussed in detail. This includes the authors' own experience with target prediction tools, in this case in phenotypic antibacterial screens and the analysis of high-throughput screening data. Finally, we conclude by discussing how databases can be used not only to predict, retrospectively, the protein targets of a small molecule, but also to design ligands with desired polypharmacology prospectively.
Highlights
► Public bioactivity databases are increasing in size at a tremendous rate.
► Integration of chemical and biological data is the next key step; first examples exist.
► Target prediction and mode-of-action analysis help us rationalize compound action.
► Incorporating pathway information potentially enables multi-target drug design.
Co-reporter:Gerard J. P. van Westen, Jörg K. Wegner, Adriaan P. IJzerman, Herman W. T. van Vlijmen and A. Bender
MedChemComm 2011 vol. 2(Issue 1) pp:16-30
Publication Date(Web):01 Nov 2010
DOI:10.1039/C0MD00165A
‘Proteochemometric modeling’ is a bioactivity modeling technique founded on the description of both small molecules (the ligands) and proteins (the targets). By combining these two elements of a ligand-target interaction, proteochemometric techniques model the interaction complex, or the full ligand-target interaction space, and are able to quantify the similarity between both ligands and targets simultaneously. Consequently, proteochemometric (or complex-based) models can be considered an extension of QSAR models, which are ligand-based. Because proteochemometric models incorporate target information, they outperform conventional QSAR models when extrapolating from the activities of known ligands on known targets to novel targets. Conversely, proteochemometrics can be used to virtually screen for selective compounds that are active on only a single member of a target subfamily, as well as to select compounds with a desired bioactivity profile, a topic particularly relevant in light of concepts such as ‘ligand polypharmacology’. Here we illustrate the concept of proteochemometrics and review relevant methodological publications in the field. We give an overview of the target families to which proteochemometric modeling has previously been applied, and introduce some novel application areas of the technique. We conclude that proteochemometrics is a promising technique in preclinical drug research that allows merging data sets that were previously considered separately, with the potential to extrapolate more reliably in both ligand and target space.
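A minimal sketch of how a proteochemometric feature vector can be assembled: ligand descriptors and target descriptors are concatenated, optionally with cross terms (pairwise products) that let a linear model represent ligand-target interaction effects rather than purely additive ones. The descriptor values and names below are hypothetical placeholders; any ligand (QSAR-style) or protein descriptor set could be substituted, and the resulting vectors would then be passed to a regression or classification learner of choice.

```python
from itertools import product


def pcm_descriptor(ligand_desc, target_desc, cross_terms=True):
    """Build one PCM feature vector for a ligand-target pair.

    Concatenates ligand and target descriptors; if cross_terms is True, appends
    every ligand_i * target_j product so the model can capture interactions.
    """
    features = list(ligand_desc) + list(target_desc)
    if cross_terms:
        features += [l * t for l, t in product(ligand_desc, target_desc)]
    return features


# Hypothetical descriptors, for illustration only.
ligands = {"lig1": [1.0, 0.0], "lig2": [0.0, 1.0]}
targets = {"A1": [0.2, 0.8], "A2A": [0.9, 0.1]}

# One row per ligand-target pair: the model sees both sides of the interaction.
rows = {(l, t): pcm_descriptor(ligands[l], targets[t])
        for l in ligands for t in targets}
```

Because the target side is part of the input, a trained model can in principle score a known ligand against a new target, which is the extrapolation advantage over ligand-only QSAR described above.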
Co-reporter:Krishna C. Bulusu, Rajarshi Guha, Daniel J. Mason, Richard P.I. Lewis, ... Andreas Bender
Drug Discovery Today (February 2016) Volume 21(Issue 2) pp:225-238
Publication Date(Web):1 February 2016
DOI:10.1016/j.drudis.2015.09.003
• Review of the state-of-the-art in the field of compound combination modelling.
• Significance of quality control of large-scale combination screening data.
• Strategies for modelling combination effects using publicly available resources.
• Importance of chemical and biological data integration for predictions.
• Technical and scientific challenges of data integration in this context are discussed.
The development of treatments involving combinations of drugs is a promising approach to combating complex or multifactorial disorders. However, the large number of compound combinations that can be generated, even from small compound collections, means that exhaustive experimental testing is infeasible. The ability to predict the behaviour of compound combinations in biological systems, and thereby reduce the number of combinations to be tested, is therefore crucial. Here, we review the current state-of-the-art in compound combination modelling, with the aim of supporting the development of approaches that, we hope, will ultimately integrate chemical with systems-level biological information for predicting the effects of chemical mixtures.
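To illustrate both the scale argument and one widely used reference model for combination effects, the sketch below counts possible combinations from a compound library and computes the Bliss independence expectation, E_AB = E_A + E_B - E_A * E_B, for fractional effects between 0 and 1. Bliss independence is named here only as an example; the review covers a range of modelling strategies, and this is not presented as its recommended method. Function names are hypothetical.

```python
import math


def n_combinations(n_compounds, k=2):
    """Number of distinct k-compound combinations from a library of n compounds."""
    return math.comb(n_compounds, k)


def bliss_expected(effect_a, effect_b):
    """Expected fractional effect (0..1) of a pair if the two compounds act independently."""
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(effect_a, effect_b, effect_ab):
    """Observed minus expected combined effect: > 0 suggests synergy, < 0 antagonism."""
    return effect_ab - bliss_expected(effect_a, effect_b)


print(n_combinations(100))                    # 4950 pairs from only 100 compounds
print(n_combinations(100, 3))                 # 161700 triples
print(round(bliss_excess(0.5, 0.5, 0.9), 2))  # 0.15
```

The rapid growth from pairs to triples shows why exhaustive testing quickly becomes infeasible and why predictive models are used to prioritise combinations.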