Co-reporter:Menghuan Zhang;Hong Li;Ying He;Han Sun;Li Xia;Lishun Wang;Bo Sun;Liangxiao Ma;Guoqing Zhang;Yixue Li;Lu Xie
Journal of Proteome Research July 2, 2015 Volume 14(Issue 7) pp:2745-2757
Publication Date(Web):2017-2-22
DOI:10.1021/acs.jproteome.5b00249
Protein phosphorylation is the most abundant reversible covalent modification. Human protein kinases participate in almost all biological pathways, and approximately half of the kinases are associated with disease. PhoSigNet was designed to store and display human phosphorylation-mediated signal transduction networks, with additional information related to cancer. It contains 11 976 experimentally validated directed edges and 216 871 phosphorylation sites. Moreover, 3491 differentially expressed proteins in human cancer from dbDEPC, 18 907 human cancer variation sites from CanProVar, and 388 hyperphosphorylation sites from PhosphoSitePlus were collected as annotation information. Compared with other phosphorylation-related databases, PhoSigNet not only takes the kinase–substrate regulatory relationship pairs into account, but also extends regulatory relationships up- and downstream (e.g., from ligand to receptor, from G protein to kinase, and from transcription factor to targets). Furthermore, PhoSigNet allows the user to investigate the impact of phosphorylation modifications on cancer. By using one set of in-house time series phosphoproteomics data, the reconstruction of a conditional and dynamic phosphorylation-mediated signaling network was exemplified. We expect PhoSigNet to be a useful database and analysis platform benefiting both proteomics and cancer studies.
Keywords: cancer; database; network; phosphorylation; signal transduction
Co-reporter:Jia Xu;Lily Wang
Journal of Proteome Research December 5, 2014 Volume 13(Issue 12) pp:5743-5750
Publication Date(Web):2017-2-22
DOI:10.1021/pr5007203
Protein differential expression analysis plays an important role in understanding molecular mechanisms as well as the pathogenesis of complex diseases. With the rapid development of mass spectrometry, shotgun proteomics using spectral counts has become a prevailing method for the quantitative analysis of complex protein mixtures. Existing methods for differential proteomics expression typically carry out analysis at the single-protein level. However, it is well-known that proteins interact with each other when they function in biological processes. In this study, focusing on biological network modules, we proposed a negative binomial generalized linear model for differential expression analysis of spectral count data in shotgun proteomics. To show the efficacy of the model in protein expression analysis at the level of protein modules, we conducted two simulation studies, one using synthetic data sets generated from a theoretical distribution of count data and one using a real data set with shuffled counts. Then, we applied our method to a colorectal cancer data set and a non-small cell lung cancer data set. Compared with single-protein analysis methods, the module-based statistical model, which takes the interactions among proteins into account, led to more effective identification of subtle but coordinated changes at the systems level.
Keywords: biological network module; differential expression analysis; negative binomial model; shotgun proteomics; spectral count
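The negative binomial distribution named in the abstract above is commonly chosen for count data because it allows the variance to exceed the mean. The following is a minimal sketch of that distribution only, not the authors' module-level model. The function names, the mean/dispersion parameterization (variance = mean + dispersion × mean²), and the toy values are assumptions made for illustration.

```python
import math


def nb_log_pmf(count: int, mean: float, dispersion: float) -> float:
    """Log-probability of a count under a negative binomial distribution.

    Assumed parameterization: variance = mean + dispersion * mean**2.
    """
    if count < 0 or mean <= 0 or dispersion <= 0:
        raise ValueError("count must be >= 0; mean and dispersion must be > 0")
    size = 1.0 / dispersion          # "number of successes" parameter
    prob = size / (size + mean)      # success probability
    return (
        math.lgamma(count + size)
        - math.lgamma(size)
        - math.lgamma(count + 1)
        + size * math.log(prob)
        + count * math.log1p(-prob)
    )


def nb_variance(mean: float, dispersion: float) -> float:
    """Variance implied by the assumed mean/dispersion parameterization."""
    return mean + dispersion * mean ** 2


# Hypothetical spectral-count setting: mean 5 counts, dispersion 0.5.
toy_mean, toy_dispersion = 5.0, 0.5
toy_probs = [math.exp(nb_log_pmf(k, toy_mean, toy_dispersion)) for k in range(2000)]
```

With a dispersion close to zero the variance approaches the mean, which is the Poisson case; larger dispersion values model the extra variability often seen in count data.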
Co-reporter:Menghuan Zhang, Bo Wang, Jia Xu, Xiaojing Wang, Lu Xie, Bing Zhang, Yixue Li, and Jing Li
Journal of Proteome Research 2017 Volume 16(Issue 2) pp:
Publication Date(Web):November 28, 2016
DOI:10.1021/acs.jproteome.6b00505
Identification and annotation of the mutations involved in oncogenesis and tumor progression are crucial for both cancer biology and clinical applications. Previously, we developed a public resource, CanProVar, a human cancer proteome variation database for storing and querying single amino acid alterations in human cancers. Since the publication of CanProVar, extensive cancer genomics efforts have revealed the enormous genomic complexity of various types of human cancers. Thus, there is an overwhelming need for comprehensive annotation of the genomic alterations at the protein level and for making such knowledge easily accessible. Here, we describe CanProVar 2.0, a significantly expanded version of CanProVar, in which the number of cancer-related variations and noncancer-specific variations was increased by about 10-fold as compared to the previous version. To facilitate the interpretation of the variations, we added to the database functional data on the potential impact of the cancer-related variations on 3D protein interaction and on the differential expression of the variant-bearing proteins between cancer and normal samples. The web interface allows for flexible queries based on gene or protein IDs, cancer types, chromosome locations, or pathways. An integrated protein sequence database containing variations, which can be used directly for proteomics database searching, can be downloaded.
Keywords: cancer; database; proteome; variation
Co-reporter:Jing Li;Jing Wang
Interdisciplinary Sciences: Computational Life Sciences 2017 Volume 9(Issue 4) pp:545-549
Publication Date(Web):12 October 2016
DOI:10.1007/s12539-016-0192-5
As a public health problem, food allergy is frequently caused by food allergy proteins, which trigger a type-I hypersensitivity reaction in the immune system of atopic individuals. The food allergens in our daily lives come mainly from crops including rice, wheat, soybean and maize. However, the allergens in these main crops are far from fully uncovered. Although some bioinformatics tools and methods for predicting the potential allergenicity of proteins have been proposed, each method has its limitations. In this paper, we built a novel algorithm, PREALW, which integrates PREAL, the FAO/WHO criteria and a motif-based method through a weighted average score, to combine the advantages of the different methods. Our results showed that PREALW performs significantly better in allergen prediction for these crops. This integrative allergen prediction algorithm could be useful for critical food safety matters. PREALW can be accessed at http://lilab.life.sjtu.edu.cn:8080/prealw.
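The idea of combining several predictors through a weighted average score can be sketched as follows. This is an illustration under stated assumptions, not the PREALW implementation: the abstract does not give the weights, the score scales, or any decision threshold, so the function name, the method keys, and all numbers below are hypothetical.

```python
def weighted_average_score(scores: dict, weights: dict) -> float:
    """Combine per-method scores into a single weighted average.

    `scores` and `weights` are keyed by method name; both are hypothetical
    inputs, with each score assumed to lie in [0, 1].
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same methods")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Toy values only: one score per integrated method for a single protein.
example_scores = {"preal": 0.9, "fao_who": 1.0, "motif": 0.0}
example_weights = {"preal": 2.0, "fao_who": 1.0, "motif": 1.0}
combined = weighted_average_score(example_scores, example_weights)
```

Normalizing by the sum of the weights keeps the combined score on the same scale as the inputs, so a threshold chosen for one weighting remains interpretable under another.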
Co-reporter:Jing Wang, Litao Yang, Xiaoxiang Zhao, Jing Li, and Dabing Zhang
Journal of Agricultural and Food Chemistry 2014 Volume 62(Issue 1) pp:270-278
Publication Date(Web):December 12, 2013
DOI:10.1021/jf402463w
Most known allergenic proteins in rice (Oryza sativa) seed belong to the Tryp_alpha_amyl family (PF00234), but the sequence characterization and the evolution of the allergenic Tryp_alpha_amyl family members in plants have not been fully investigated. In this study, two specific motifs were found besides the common alpha-amylase inhibitors (AAI) domain in the allergenic Tryp_alpha_amyl family members in rice seeds (trRSAs). To understand the evolution and functional importance of the Tryp_alpha_amyl family and of the motifs specific to its allergenic members, a BLAST search against all available sequences in the public databases identified 75 homologous proteins of trRSAs (trHAs) from 22 plant species, including main crops such as rice, maize (Zea mays), wheat (Triticum aestivum), and sorghum (Sorghum bicolor). Statistical analysis showed that the allergenicity of trHAs is closely associated with these two motifs, which have a high number of cysteine residues (p value = 0.00026), and the trHAs with and without the two motifs were clustered into separate clades. Furthermore, significant differences were observed in the secondary and tertiary structures of allergenic and nonallergenic trHAs. In addition, expression analysis showed that the trHA-encoding genes of purple false brome (Brachypodium distachyon), barrel medic (Medicago truncatula), rice, and sorghum are dominantly expressed in seeds. This work provides insight into the properties of allergens in the Tryp_alpha_amyl family and is helpful for allergy therapy.
Co-reporter:Jing Wang;Dabing Zhang
BMC Systems Biology 2013 Volume 7(Issue 5 Supplement) pp:
Publication Date(Web):2013 December
DOI:10.1186/1752-0509-7-S5-S9
Assessment of the potential allergenicity of proteins is necessary whenever transgenic proteins are introduced into the food chain. Bioinformatics approaches to allergen prediction have evolved appreciably in recent years, increasing in sophistication and performance. However, the features critical to a protein's allergenicity have not yet been fully investigated. We presented a more comprehensive model in a 128-dimensional feature space for allergenic protein prediction by integrating various properties of proteins, such as biochemical and physicochemical properties, sequential features and subcellular locations. The overall accuracy in cross-validation ranged from 93.42% to 100% with our new method. The Maximum Relevance Minimum Redundancy (mRMR) method and the Incremental Feature Selection (IFS) procedure were applied to identify which features are essential for allergenicity. Performance comparisons showed the superiority of our method over widely used existing methods. More importantly, it was observed that the features of subcellular locations and amino acid composition played major roles in determining the allergenicity of proteins, particularly the extracellular/cell surface and vacuole subcellular locations for wheat and soybean. To facilitate allergen prediction, we implemented our computational method in a web application, which is available at http://gmobl.sjtu.edu.cn/PREAL/index.php. Our new approach could improve the accuracy of allergen prediction, and the findings may provide novel insights into the mechanism of allergies.
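Incremental feature selection, as generally described, evaluates growing prefixes of a ranked feature list and keeps the prefix that scores best. The sketch below illustrates only that general loop; it assumes the ranking (for example from mRMR) already exists and that a scoring callback is supplied. The function names, the feature names, and the toy scoring rule are hypothetical and stand in for the cross-validated classifier accuracy that a real study would use.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Score the top-1, top-2, ... top-N ranked features; return the best prefix."""
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        if score > best_score:  # strict comparison keeps the smallest best prefix
            best_subset, best_score = subset, score
    return best_subset, best_score


def toy_evaluate(subset: List[str]) -> float:
    """Stand-in scorer: rewards two 'useful' features, penalizes subset size."""
    useful = {"subcellular_location", "aa_composition"}
    return len(useful & set(subset)) - 0.1 * len(subset)


# Hypothetical ranking, most relevant feature first.
ranked = ["subcellular_location", "aa_composition", "length", "charge"]
selected, selected_score = incremental_feature_selection(ranked, toy_evaluate)
```

Because only prefixes of the ranking are evaluated, the loop needs N evaluations for N features rather than one per possible subset.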
Co-reporter:Jing Wang;Yabin Yu;Yunan Zhao;Dabing Zhang
BMC Bioinformatics 2013 Volume 14(Issue 4 Supplement) pp:
Publication Date(Web):2013 March
DOI:10.1186/1471-2105-14-S4-S1
Allergy involves a series of complex reactions and factors that contribute to the development of the disease and the triggering of its symptoms, including rhinitis, asthma, atopic eczema, skin sensitivity, and even acute and fatal anaphylactic shock. Prediction and evaluation of potential allergenicity is important for the safety evaluation of foods and other environmental factors. Although several computational approaches for assessing the potential allergenicity of proteins have been developed, their performance and relative merits and shortcomings have not been compared systematically. To evaluate and improve the existing methods for allergen prediction, we collected an up-to-date definitive dataset consisting of 989 known allergens and a large set of putative non-allergens. The three most widely used computational approaches to allergen prediction, namely the sequence-, motif- and SVM-based (Support Vector Machine) methods, were systematically compared using the defined parameters, and we found that the SVM-based method outperformed the other two methods, with higher accuracy and specificity. The sequence-based method with the criteria defined by FAO/WHO (FAO: Food and Agriculture Organization of the United Nations; WHO: World Health Organization) has a sensitivity of over 98% but a low specificity. The advantage of the motif-based method is its ability to visualize the key motif within the allergen. Notably, the performance of the sequence-based method defined by FAO/WHO and of the motif-eliciting strategy could be improved by optimizing their parameters. To facilitate allergen prediction, we integrated these three methods in a web-based application, proAP, which provides a global search of the known allergens and a powerful tool for allergen prediction. Flexible parameter setting and batch prediction were also implemented. proAP can be accessed at http://gmobl.sjtu.edu.cn/proAP/main.html. This study comprehensively evaluated sequence-, motif- and SVM-based computational prediction approaches for allergens and optimized their parameters to obtain better performance. These findings may provide helpful guidance for researchers in allergen prediction. Furthermore, we integrated these methods into a web application, proAP, which greatly facilitates customizable allergen search and prediction.
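The comparison above is stated in terms of sensitivity, specificity and accuracy. As a reminder of how these standard metrics follow from a confusion matrix, here is a minimal sketch. The function name is hypothetical and the counts are invented toy numbers, not results from the study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn count known allergens predicted as allergen/non-allergen;
    tn/fp count non-allergens predicted as non-allergen/allergen.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return {
        "sensitivity": tp / (tp + fn),   # fraction of allergens recovered
        "specificity": tn / (tn + fp),   # fraction of non-allergens rejected
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Invented counts showing a high-sensitivity, low-specificity predictor.
toy_metrics = classification_metrics(tp=98, fp=40, tn=60, fn=2)
```

A predictor can reach very high sensitivity simply by flagging most proteins as allergens, which is why specificity and accuracy have to be reported alongside it.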