Co-reporter:Hao Li, Yuping Cai, Yuan Guo, Fangfang Chen, and Zheng-Jiang Zhu
Analytical Chemistry 2016 Volume 88(Issue 17) pp:8757
Publication Date(Web):July 27, 2016
DOI:10.1021/acs.analchem.6b02122
With recent advances in mass spectrometry, there is increased interest in data-independent acquisition (DIA) techniques for metabolomics. In DIA, all metabolite ions are sequentially selected and isolated using a wide window to generate multiplexed MS/MS spectra. The DIA strategy therefore enables continuous and unbiased acquisition of all metabolites and increases data dimensionality, but it complicates data analysis because the direct link between the precursor ion and its fragment ions is lost. However, very few DIA data processing methods have been developed for metabolomics. Here, we developed a new DIA data analysis approach, MetDIA, for targeted extraction of metabolites from multiplexed MS/MS spectra generated using DIA. MetDIA treats each metabolite in the spectral library as an analysis target. Ion chromatograms for each metabolite (both precursor ion and fragment ions) and MS2 spectra are detected, extracted, and scored for metabolite identification, an approach referred to as metabolite-centric identification. The minimum metabolite-centric identification score corresponding to a 1% false positive rate of identification was determined to be 0.8 using fully 13C-labeled biological extracts. Finally, comparison of MetDIA with the data-dependent acquisition (DDA) method demonstrated that MetDIA detects significantly more metabolites in biological samples and is more accurate and sensitive for metabolite identification. The MetDIA program and the metabolite spectral library are freely available on the Internet.
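The metabolite-centric idea can be illustrated with a minimal Python sketch: for one library metabolite, look up its fragment m/z values in a multiplexed spectrum and score the match against the 0.8 cutoff quoted above. The abstract does not state how the MetDIA score is computed, so the cosine similarity, the 0.01 Da tolerance, the function names, and all peak values below are assumptions made only for illustration.

```python
import math

SCORE_THRESHOLD = 0.8  # minimum identification score quoted in the abstract


def match_fragments(library, experimental, tol=0.01):
    """Pair each library fragment with the closest experimental peak.

    Both spectra are dicts of {m/z: intensity}. A library fragment with no
    experimental peak within `tol` (assumed, in Da) is paired with 0.0.
    Peaks that are not in the library are ignored, which is what makes the
    extraction targeted.
    """
    pairs = []
    for mz_lib, int_lib in library.items():
        candidates = [
            (abs(mz - mz_lib), inten)
            for mz, inten in experimental.items()
            if abs(mz - mz_lib) <= tol
        ]
        int_exp = min(candidates)[1] if candidates else 0.0
        pairs.append((int_lib, int_exp))
    return pairs


def cosine_score(pairs):
    """Cosine similarity of the paired intensities (an assumed score)."""
    dot = sum(a * b for a, b in pairs)
    norm_lib = math.sqrt(sum(a * a for a, _ in pairs))
    norm_exp = math.sqrt(sum(b * b for _, b in pairs))
    if norm_lib == 0.0 or norm_exp == 0.0:
        return 0.0
    return dot / (norm_lib * norm_exp)


# Hypothetical library spectrum and multiplexed DIA spectrum.
library_spectrum = {85.03: 100.0, 127.04: 50.0}
dia_spectrum = {85.031: 200.0, 127.041: 100.0, 300.10: 500.0}  # 300.10 is from another precursor

score = cosine_score(match_fragments(library_spectrum, dia_spectrum))
identified = score >= SCORE_THRESHOLD

# A spectrum containing none of the library fragments scores zero.
empty_score = cosine_score(match_fragments(library_spectrum, {300.10: 500.0}))
```

Because only library fragments are looked up, the unrelated peak at m/z 300.10 does not lower the score, which mirrors how a targeted approach copes with multiplexed spectra.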
Co-reporter:Zhiwei Zhou, Xiaotao Shen, Jia Tu, and Zheng-Jiang Zhu
Analytical Chemistry 2016 Volume 88(Issue 22) pp:11084
Publication Date(Web):October 21, 2016
DOI:10.1021/acs.analchem.6b03091
The rapid development of metabolomics has significantly advanced health- and disease-related research. However, metabolite identification remains a major analytical challenge for untargeted metabolomics. While the use of collision cross-section (CCS) values obtained in ion mobility-mass spectrometry (IM-MS) effectively increases the confidence of metabolite identification, it is restricted by the limited number of CCS values available for metabolites. Here, we demonstrated the use of a machine-learning algorithm, support vector regression (SVR), to develop a method that uses 14 common molecular descriptors to predict CCS values for metabolites. In this work, we first experimentally measured CCS values (ΩN2) of ∼400 metabolites in nitrogen buffer gas and used these values as training data to optimize the prediction method. The high prediction precision of this method was externally validated using an independent set of metabolites, giving a median relative error (MRE) of ∼3%, better than conventional theoretical calculation. Using the SVR-based prediction method, a large-scale predicted CCS database was generated for 35 203 metabolites in the Human Metabolome Database (HMDB). For each metabolite, five different ion adducts in positive and negative modes were predicted, accounting for 176 015 CCS values in total. Finally, improved metabolite identification accuracy was demonstrated using real biological samples. In conclusion, our results show that the SVR-based prediction method can accurately predict nitrogen CCS values (ΩN2) of metabolites from molecular descriptors and effectively improve identification accuracy and efficiency in untargeted metabolomics. The predicted CCS database, MetCCS, is freely available on the Internet.
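The two figures in this abstract that lend themselves to a quick check are the median relative error and the database size. The sketch below, in Python, shows one common way to compute a median relative error and confirms that 35 203 metabolites with five adducts each give 176 015 values. The CCS numbers are invented for illustration and are not from the paper, and the exact MRE definition used by the authors is assumed.

```python
from statistics import median


def median_relative_error(measured, predicted):
    """Median of |predicted - measured| / measured, in percent."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    errors = [abs(p - m) / m * 100.0 for m, p in zip(measured, predicted)]
    return median(errors)


# Hypothetical measured and predicted CCS values (square angstroms).
measured_ccs = [150.0, 200.0, 250.0]
predicted_ccs = [153.0, 196.0, 255.0]
mre = median_relative_error(measured_ccs, predicted_ccs)

# Database size stated in the abstract: metabolites x adducts per metabolite.
n_metabolites = 35203
n_adducts = 5
total_ccs_values = n_metabolites * n_adducts
```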
Co-reporter:Xiaotao Shen;Xiaoyun Gong;Yuping Cai;Yuan Guo;Jia Tu;Hao Li;Tao Zhang
Metabolomics 2016 Volume 12(Issue 5) pp:
Publication Date(Web):2016 May
DOI:10.1007/s11306-016-1026-5
Untargeted metabolomics studies for biomarker discovery often involve hundreds to thousands of human samples. Data acquisition for such large sample sets has to be divided into several batches and may span months or even several years. Signal drift of metabolites during data acquisition (intra- and inter-batch) is unavoidable and is a major confounding factor for large-scale metabolomics studies. We aim to develop a data normalization method to reduce unwanted variations and integrate multiple batches in large-scale metabolomics studies prior to statistical analyses. We developed a method based on a machine-learning algorithm, support vector regression (SVR), for large-scale metabolomics data normalization and integration. An R package named MetNormalizer was developed and provided for data processing using SVR normalization. After SVR normalization, the proportion of metabolite ion peaks with relative standard deviations (RSDs) of less than 30% increased to more than 90% of the total peaks, which is much better than other common normalization methods. The reduction of unwanted analytical variations helps to improve the performance of multivariate statistical analyses, both unsupervised and supervised, in terms of classification and prediction accuracy, so that subtle metabolic changes in epidemiological studies can be detected. SVR normalization can effectively remove the unwanted intra- and inter-batch variations, and is much better than other common normalization methods.
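The evaluation metric quoted here, the share of peaks whose RSD falls below 30%, is simple to compute. The Python sketch below shows the calculation on invented intensities for two peaks; it does not reproduce the SVR normalization itself or the MetNormalizer package, and the choice of the sample standard deviation, the peak names, and the data are assumptions.

```python
from statistics import mean, stdev


def rsd_percent(intensities):
    """Relative standard deviation (sample SD / mean) in percent."""
    return stdev(intensities) / mean(intensities) * 100.0


def fraction_below(peaks, threshold=30.0):
    """Fraction of peaks whose RSD is below `threshold` percent.

    `peaks` maps a peak name to its intensities across repeated
    injections of the same sample (for example, quality-control runs).
    """
    passing = sum(1 for values in peaks.values() if rsd_percent(values) < threshold)
    return passing / len(peaks)


# Hypothetical intensities: one stable peak and one drifting peak.
qc_peaks = {
    "peak_a": [100.0, 102.0, 98.0, 101.0],
    "peak_b": [100.0, 200.0, 50.0, 150.0],
}
stable_fraction = fraction_below(qc_peaks)
```

Running the same calculation before and after normalization gives the kind of comparison the abstract reports.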
Co-reporter:Yuping Cai;Kai Weng;Yuan Guo;Jie Peng
Metabolomics 2015 Volume 11(Issue 6) pp:1575-1586
Publication Date(Web):2015 December
DOI:10.1007/s11306-015-0809-4
Multiple reaction monitoring (MRM)-based targeted metabolomics can simultaneously analyze up to hundreds of metabolites with high throughput, good reproducibility, and wide dynamic range. However, when hundreds or thousands of MRM transitions are measured across tens to hundreds of biological samples, the complexity of the acquired MRM dataset is no longer amenable to manual evaluation and presents a challenge for targeted metabolomics. Here, we developed an R package, MRMAnalyzer, to process large sets of MRM-based targeted metabolomics data automatically without any manual intervention. To demonstrate MRMAnalyzer, we first developed a targeted metabolomic method that simultaneously analyzes 182 metabolites in one 15-min LC run, and demonstrated the data processing procedures using MRMAnalyzer. The data processing steps include “pseudo” accurate m/z transformation, peak detection and alignment, metabolite identification, quality control check, and statistical analysis. Finally, a targeted metabolomic assay was designed and integrated with MRMAnalyzer to profile the metabolic changes in Escherichia coli subjected to protein expression. The generated MRM dataset, consisting of more than 8000 MRM transitions, was readily processed using MRMAnalyzer within 20 min without any manual intervention. Forty-seven out of 140 detected metabolites, enriched in six metabolic pathways, were found to be significantly affected in the E. coli metabolome. In summary, a targeted metabolomic platform was developed for high-throughput metabolite profiling and automated data processing, and the MRMAnalyzer program is a highly efficient informatics tool for large-scale targeted metabolomics.