Co-reporter:Mengli Fan;Xiuwei Liu;Xiaoming Yu;Xiaoyu Cui;Wensheng Cai
Science China Chemistry 2017 Volume 60( Issue 2) pp:299-304
Publication Date(Web):2017 February
DOI:10.1007/s11426-016-0092-6
Rapid diagnosis is important for efficient treatment in clinical medicine. This study aimed to develop a method for rapid and reliable diagnosis using near-infrared (NIR) spectra of human serum samples with the help of chemometric modelling. The NIR spectra of sera from 48 healthy individuals and 16 patients with suspected kidney disease were analyzed. Discrete wavelet transform (DWT) and variable selection were adopted to extract the useful information from the spectra. Principal component analysis (PCA), linear discriminant analysis (LDA) and partial least squares discriminant analysis (PLSDA) were used for discrimination of the samples. Classification of the two classes of sera was achieved using LDA and PLSDA with the help of DWT and variable selection. DWT-LDA produced recognition rates of 93.8% and 83.3% for the validation samples of the two classes, and 100% recognition rates were obtained using DWT-PLSDA. The results demonstrated that the tiny differences between the spectra of the sera were effectively explored using DWT and variable selection, and that these differences can be used to discriminate the sera of healthy individuals from those of possible patients. NIR spectroscopy combined with chemometrics may be a potential technique for fast diagnosis of kidney disease.
Co-reporter:Xiu-Wei Liu, Xiao-Yu Cui, Xiao-Ming Yu, Wen-Sheng Cai, Xue-Guang Shao
Chinese Chemical Letters 2017 Volume 28(Issue 7) pp:
Publication Date(Web):1 July 2017
DOI:10.1016/j.cclet.2017.03.021
Understanding the thermal stability of the proteins in human serum is essential because human serum is an important source of pharmaceutical proteins. Near-infrared (NIR) spectroscopy was applied to the investigation of thermal changes in the secondary structure and hydration of human serum proteins. However, because serum is a multicomponent system, the overlap of the broad NIR bands makes structural analysis directly from the spectra of serum samples very difficult. Therefore, continuous wavelet transform (CWT) was used to improve the resolution of the NIR spectra, and the Monte Carlo-uninformative variable elimination (MC-UVE) method was applied to select the variables associated with the proteins for the structural analysis. The variables (5956, 5867, 5815, 5747, 4525, 4401, 4359 and 4328 cm−1) related to protein secondary structures and those (7074, 6951, 6827 and 6700 cm−1) connected with water species were selected. Then, the thermal stability was analyzed through the intensity variations of the selected variables with temperature from 30 °C to 80 °C. It was found that the spectral variables related to both α-helix and β-sheet change markedly around 60 °C, indicating the beginning of thermal denaturation and the transition from α-helix to β-sheet. Moreover, an obvious change was found around 60 °C in the content of the water species S3, i.e., the water cluster containing three hydrogen bonds. The result demonstrates that MC-UVE can identify the protein-related NIR spectral variables, and that the water species may be a marker for investigating the structural change of proteins in biochemical systems.
The temperature dependent near-infrared spectra of human serum samples were collected from 30 °C to 80 °C. The variables related to proteins and water were selected by Monte Carlo-uninformative variable elimination (MC-UVE) from the spectra transformed by continuous wavelet transform (CWT).
Co-reporter:Xiaoyu Cui, Jin Zhang, Wensheng Cai, Xueguang Shao
Chemometrics and Intelligent Laboratory Systems 2017 Volume 170 pp:
Publication Date(Web):15 November 2017
DOI:10.1016/j.chemolab.2017.08.010
•High order algorithms were employed for temperature dependent NIR spectra.
•Difference of the spectral features captured by different algorithms was analyzed.
•Spectral variations induced by temperature and concentration were obtained.
•Quantitative spectral information contained in the spectra was extracted.
High dimensional data analysis has gained widespread acceptance with the rapid development of analytical instruments and experimental techniques. Benefiting from the second-order advantage, high order chemometric algorithms have shown a great ability to match the nature of data and extract the latent components from the data. In this study, multiway principal component analysis (NPCA), parallel factor analysis (PARAFAC) and alternating trilinear decomposition (ATLD) were employed, respectively, to extract the information from temperature dependent near infrared (NIR) spectra of alcohol aqueous solutions. The variations of the structure induced by temperature and concentration in the solutions were analyzed by the three algorithms. Spectral features can be observed from the loadings obtained by NPCA, which explain the maximum variances. Spectral profiles computed by PARAFAC and ATLD contain the spectral information of the components. The former prefers to show the information of ethanol, water and ethanol-water cluster, while the latter opts for describing the information of the ethanol and different water clusters in the solution. However, all the three algorithms are able to capture the quantitative information from the spectra. Therefore, high order chemometric algorithms may provide powerful tools for analyzing temperature dependent NIR spectra to obtain the structural and quantitative information of the aqueous solutions.
Co-reporter:Cuicui Wang, Shuyu Wang, Wensheng Cai, Xueguang Shao
Talanta 2017 Volume 162 pp:123-129
Publication Date(Web):1 January 2017
DOI:10.1016/j.talanta.2016.10.005
•High reflectivity of silver was utilized to enhance the ability of NIRDRS.
•Silver layer was used as the adsorption substrate for the spectral measurement.
•High quality spectra were obtained using the substrate.
•Sensitive determination was achieved for microanalysis using NIRDRS.
Near-infrared diffuse reflectance spectroscopy (NIRDRS) has been proved to be a convenient and fast quantitative method for complex samples. The sensitivity or the detection limit, however, has been an obstacle in practical use, although great efforts have been made through experimental and chemometric approaches. Owing to the strong reflectivity of silver in the near-infrared region, a novel method that utilizes a silver layer as the adsorption substrate was developed in this study to enhance the detection ability of NIRDRS. To investigate the enhancement effect of the method, lysozyme samples with different concentrations were spotted on the silver layer and NIR spectra were measured. Quantitative determination was then performed using multivariate calibration. For comparison, the same experiment was performed using a copper sheet as the substrate. The results show that the intensity of diffuse reflection can be enhanced, and the background variation reduced, by taking the mirror layer as the substrate. A linear variation was obtained between the concentrations and the intensities of the spectral response at a wavenumber. Using multivariate calibration for quantitative analysis, the optimal PLS model was obtained. The maximum deviation of the prediction results can be as low as 12.8 µg. Therefore, this study represents progress for the NIRDRS technique in microanalysis.
In view of the strong reflectivity of silver in the near-infrared region, a novel method that utilizes a silver layer as the adsorption substrate for enhancing the detection ability of near-infrared diffuse reflectance spectroscopy is developed.
Co-reporter:Xiao-Xu Ma, Cui-Cui Wang, Wen-Sheng Cai, Xue-Guang Shao
Chinese Chemical Letters 2016 Volume 27(Issue 10) pp:1597-1601
Publication Date(Web):October 2016
DOI:10.1016/j.cclet.2016.03.008
Urinary albumin is an important diagnostic and prognostic marker for cardiorenal disease. Recent studies have shown that elevation of albumin excretion even in the normal concentration range is associated with increased cardiorenal risk. Therefore, accurate measurement of urinary albumin in the normal concentration range is necessary for clinical diagnosis. In this work, thiourea-functionalized silica nanoparticles were prepared and used for preconcentration of albumin in urine. The adsorbent with the analyte was then used directly for near-infrared diffuse reflectance spectroscopy measurement, and a partial least squares (PLS) model was established for quantitative prediction. Forty samples were taken as the calibration set for establishing the PLS model and 17 samples were used for validation of the method. The correlation coefficient and the root mean squared error of cross validation are 0.9986 and 0.43, respectively. The residual predictive deviation value of the model is as high as 18.8. The recoveries of the 17 validation samples in the concentration range of 3.39–24.39 mg/L are between 95.9% and 113.1%. Therefore, the method may be a candidate for quantifying albumin excretion in urine.
Thiourea-functionalized silica nanoparticles were prepared and used for preconcentration of albumin in urine. The adsorbent with the analyte was then used for near-infrared diffuse reflectance spectroscopy measurement and a partial least squares model was established for quantitative prediction.
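The figures of merit quoted in this abstract (root mean squared error, residual predictive deviation and recovery) can be computed as in the minimal sketch below. This is an illustration only, not the authors' code: the function names and the sample values are hypothetical, and the residual predictive deviation is taken under the common definition (standard deviation of the reference values divided by the root mean squared error), which the abstract itself does not spell out.

```python
import math
import statistics


def rmse(reference, predicted):
    """Root mean squared error between reference and predicted values."""
    n = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)


def rpd(reference, predicted):
    """Residual predictive deviation: sample SD of the reference values
    divided by the RMSE (assumed definition, see lead-in)."""
    return statistics.stdev(reference) / rmse(reference, predicted)


def recovery_percent(found, expected):
    """Recovery in percent: amount found relative to the amount expected."""
    return 100.0 * found / expected


# Hypothetical concentrations in mg/L, for illustration only.
reference = [4.0, 8.0, 12.0, 16.0, 20.0]
predicted = [4.5, 7.5, 12.5, 15.5, 20.5]

print(f"RMSE = {rmse(reference, predicted):.3f} mg/L")
print(f"RPD  = {rpd(reference, predicted):.2f}")
```

A larger residual predictive deviation means the prediction error is small relative to the spread of the reference values.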
Co-reporter:Zheng-Feng LI, Guang-Jin XU, Jia-Jun WANG, Guo-Rong DU, Wen-Sheng CAI, Xue-Guang SHAO
Chinese Journal of Analytical Chemistry 2016 Volume 44(Issue 2) pp:305-309
Publication Date(Web):February 2016
DOI:10.1016/S1872-2040(16)60907-6
Outlier detection is an important task in multivariate calibration because the quality of a calibration model is determined by that of the calibration data. An outlier detection method was proposed for near infrared (NIR) spectral analysis. The method was based on the definition of outlier and the principle of partial least squares (PLS) regression, i.e., an outlier in a dataset behaved differently from the rest, and the prediction result of a PLS model was an accumulation of several independent latent variables. Therefore, the proposed method built a PLS model with a calibration dataset, and then the contribution of each latent variable was investigated. Outliers were detected by comparing these contributions. An NIR spectral dataset of orange juice samples was adopted for testing the method. Six outliers were detected in the calibration set. The root mean squared error of cross validation (RMSECV) was reduced from 16.870 to 4.809 and the root mean squared error of prediction (RMSEP) was reduced from 3.688 to 3.332 after the removal of the outliers. Compared with a robust regression method, the result of the proposed method seemed more reasonable.
An outlier detection method was proposed for near infrared (NIR) spectral analysis based on diagnosing the contribution of the samples in each latent variable to the partial least squares (PLS) model. Local outlier factor (LOF) was used for discriminating the outliers.
Co-reporter:Yan Liu, Wensheng Cai, Xueguang Shao
Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 2016 Volume 169 pp:197-201
Publication Date(Web):5 December 2016
DOI:10.1016/j.saa.2016.06.041
•The method for calibration transfer without standard samples is proposed.
•The spectra measured on different instruments are linearly correlated.
•The models constructed by spectra from different instruments are similar.
•Constrained optimization method is used to transfer the primary model.
•The transferred model is similar to the primary model in profile.
Calibration transfer is essential for practical applications of near infrared (NIR) spectroscopy because the spectra may be measured on different instruments and the difference between the instruments must be corrected. For most calibration transfer methods, standard samples are necessary to construct the transfer model using the spectra of the samples measured on two instruments, named the master and the slave instrument, respectively. In this work, a method named linear model correction (LMC) is proposed for calibration transfer without standard samples. The method is based on the fact that, for samples with similar physical and chemical properties, the spectra measured on different instruments are linearly correlated. As a result, the coefficients of the linear models constructed from the spectra measured on different instruments are similar in profile. Therefore, by using a constrained optimization method, the coefficients of the master model can be transferred into those of the slave model with a few spectra measured on the slave instrument. Two NIR datasets of corn and plant leaf samples measured with different instruments are used to test the performance of the method. The results show that, for both datasets, the spectra can be correctly predicted using the transferred partial least squares (PLS) models. Because standard samples are not necessary in the method, it may be more useful in practice.
Co-reporter:Cuicui Wang;Wensheng Cai
Chemical Research in Chinese Universities 2016 Volume 32( Issue 6) pp:912-916
Publication Date(Web):2016 December
DOI:10.1007/s40242-016-6279-z
Enrichment has been proved to be an efficient way to make near-infrared diffuse reflectance spectroscopy (NIRDRS) suitable for microanalysis. However, the low selectivity of conventional enrichment methods makes the quantitative analysis easily affected by coexisting components. In this study, a specific enrichment method with chemical bonding via the thiol-maleimide click reaction was used to reduce the interferences. Taking cysteine as the target analyte, maleimide-functionalized SiO2 nanoparticles were prepared for the enrichment of cysteine. Then determination of cysteine in aqueous solution and human serum was studied using the partial least squares model built from the NIRDRS spectra of the adsorbate. The results show that the concentration that can be quantitatively detected is as low as 2.0 μg/mL, and the correlation coefficient (R) between the reference and predicted concentration is 0.9871 for the validation samples. The recoveries are in the range of 89.5%–113.8% for human serum samples in the concentration range of 0–16.2 μg/mL.
Co-reporter:Zhenzhen Xia;Wensheng Cai
Journal of Separation Science 2015 Volume 38( Issue 4) pp:621-625
Publication Date(Web):
DOI:10.1002/jssc.201400941
The discrimination of counterfeit and/or illegally manufactured medicines is an important task in the pharmaceutical industry for pharmaceutical safety. In this study, 22 slimming capsule samples with illegally added sibutramine and phenolphthalein were analyzed by electronic nose and flash gas chromatography. To reveal the difference among the different classes of samples, principal component analysis and linear discriminant analysis were employed to analyze the data acquired from electronic nose and flash gas chromatography, respectively. The samples without illegal additives can be discriminated from the ones with illegal additives by using electronic nose or flash gas chromatography data individually. To improve the performance of classification, a data fusion strategy was applied to integrate the electronic nose and flash gas chromatography data into a single model. The results show that the samples with phenolphthalein, sibutramine and both can be classified well by using the fused data.
Co-reporter:Yihang Zeng;Wensheng Cai
Journal of Separation Science 2015 Volume 38( Issue 12) pp:2053-2058
Publication Date(Web):
DOI:10.1002/jssc.201500090
A method was developed for quantifying 17 amino acids in tobacco leaves by using an A300 amino acid analyzer and chemometric resolution. In the method, amino acids were eluted by the buffer solution on an ion-exchange column. After reacting with ninhydrin, the derivatives of amino acids were detected by ultraviolet detection. Most amino acids are separated by the elution program. However, five peaks of the derivatives are still overlapping. A non-negative immune algorithm was employed to extract the profiles of the derivatives from the overlapping signals, and then peak areas were adopted for quantitative analysis of the amino acids. The method was validated by the determination of amino acids in tobacco leaves. The relative standard deviations (n = 5) are all less than 2.54% and the recoveries of the spiked samples are in a range of 94.62–108.21%. The feasibility of the method was proved by analyzing the 17 amino acids in 30 tobacco leaf samples.
Co-reporter:Pao Li, Wensheng Cai and Xueguang Shao
Journal of Analytical Atomic Spectrometry 2015 vol. 30(Issue 4) pp:936-940
Publication Date(Web):12 Feb 2015
DOI:10.1039/C5JA00031A
Inductively coupled plasma optical emission spectrometry is a well-known technique for elemental analysis. However, in the analysis of trace elements in complex matrices, the signals can easily suffer interference from those of other elements and from baseline drift. In this work, a chemometric approach was employed for the identification and quantitative analysis of trace elements in complex matrices. With the help of standard signals, the signals of trace elements can be obtained by a non-negative immune algorithm and used for the quantitative analysis. The method was validated by the analysis of trace calcium in rare earth matrices, trace bismuth, lead, and antimony in a tungsten matrix, trace phosphorus in iron and copper matrices and urban recycled water samples. The recoveries of the spiked samples were in the range of 96.1% to 108.3%.
Co-reporter:Ganghui Chu;Wensheng Cai
Journal of Separation Science 2015 Volume 38( Issue 7) pp:1149-1155
Publication Date(Web):
DOI:10.1002/jssc.201400922
To extract flavone glycosides efficiently, a new extraction material based on 4-butylaniline-bonded silica gel was prepared using a two-step grafting method including a ring-opening reaction and synchronous hydrolysis. Preparation of the silica-based material was easily achieved under mild conditions, and the material was characterized by Fourier transform infrared spectroscopy, elemental analysis, and scanning electron microscopy. The material was used in solid-phase extraction, and the extraction can be performed in neutral conditions without regard to ionic strength. Selectivity tests of 14 compounds on the extraction cartridge showed that the material has a high affinity to flavone glycosides in contrast to octadecyl silica, and the extraction yields for four flavone glycosides were found to be >93%. Selectivity tests further reveal that the adsorption on its surface is likely attributed to multiple interactions, including hydrophobic interactions, π–π interactions, and hydrogen bonding. To explore the applicability of 4-butylaniline-bonded silica gel, naringin and hesperidin from Simotang oral liquid were extracted, and the extraction yields were >90%, which is distinguished from <28% on octadecyl silica cartridge.
Co-reporter:Pao Li;Wensheng Cai
Journal of Chemometrics 2015 Volume 29( Issue 5) pp:300-308
Publication Date(Web):
DOI:10.1002/cem.2697
A second-order method, standard signal extraction, was proposed for quantitative analysis of target analytes in samples with complex matrices by gas chromatography–mass spectrometry. In the method, standard addition was adopted. The data of the pure standard, sample, and spiked sample were used in the calculation. By performing principal component analysis on the data aligned with the measured signals of the sample and standard, the standardized signal that is orthogonal to the signals of interferences in the sample can be extracted. Then, the signal of the spiked sample was projected onto the standardized signal. The concentration ratio of the target analyte in the sample and spiked sample can be obtained. Finally, the concentration of the target analyte in the sample can be calculated with the added concentration in the spiked sample. Both simulated and experimental data were investigated with the proposed approach and compared with rank annihilation factor analysis. The recoveries of standard addition were found to be in a range of 99–105%, and the relative standard deviations obtained in three repeated measurements were less than 3%. Results show that the method can accurately estimate the quantitative information of analytes of interest in the presence of interferents and can be applied more widely than rank annihilation factor analysis.
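The last step described in this abstract, going from the concentration ratio to the sample concentration, can be sketched as follows. This is a hedged illustration rather than the published procedure: it assumes the ratio is defined as sample concentration over spiked-sample concentration and that spiking does not change the volume, so the spiked concentration equals the sample concentration plus the added concentration. The function name and the numbers are hypothetical.

```python
def sample_concentration(ratio, added):
    """Concentration of the analyte in the unspiked sample.

    ratio : c_sample / c_spiked, with c_spiked = c_sample + added
            (assumes negligible volume change on spiking).
    added : concentration added by the spike, in any consistent unit.

    Solving ratio = c / (c + added) for c gives c = ratio * added / (1 - ratio).
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must lie in [0, 1) when a positive spike is added")
    if added <= 0.0:
        raise ValueError("added concentration must be positive")
    return ratio * added / (1.0 - ratio)


# Hypothetical example: the sample holds 40% of the spiked-sample concentration
# and the spike added 3.0 units, so the sample itself contains about 2.0 units.
print(sample_concentration(0.4, 3.0))
```

The ratio itself would come from the projection step described in the abstract, which is not reproduced here.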
Co-reporter:Ganghui Chu, Wensheng Cai, Xueguang Shao
Talanta 2015 Volume 136 pp:29-34
Publication Date(Web):1 May 2015
DOI:10.1016/j.talanta.2014.12.051
●4-Butylaniline-bonded attapulgite was prepared and characterized.
●The material has high affinity to bisphenol A with Langmuir adsorption model.
●SPE of bisphenol A was achieved using the material as an adsorbent.
●An HPLC method for detection of bisphenol A in trace quantity was developed.
Ring-opening reaction with synchronous hydrolysis was used to prepare 4-butylaniline-modified attapulgite (abbreviated as BA-ATP) for pre-concentration of bisphenol A (BPA) in trace quantity. The preparation was achieved under mild conditions, and the material was characterized by Fourier transform infrared spectroscopy (FT-IR), elemental analysis, nuclear magnetic resonance (NMR) and scanning electron microscopy. BA-ATP was used in solid phase extraction (SPE), and SPE was performed under neutral conditions without regard to ionic strength. The results indicate that BA-ATP has high affinity to BPA with a maximum adsorption amount of 44 mg g−1, and the adsorption can be described by the Langmuir isotherm model. The content of bisphenol A in water samples was analyzed by an HPLC method with pre-concentration using BA-ATP. The limit of detection (LOD) can be as low as 3.9 ng mL−1, and the average recoveries are in a range of 93–97% with a relative standard deviation (RSD) of less than 2%.
Preparation of 4-butylaniline-modified attapulgite (BA-ATP) by ring-opening reaction and synchronous hydrolysis is achieved and the material is used for pre-concentration of trace BPA in water samples.
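The Langmuir isotherm mentioned in this abstract has the standard form q = q_max·K·C / (1 + K·C). The sketch below evaluates it with the maximum adsorption amount reported above (44 mg g−1). The Langmuir constant K is not given in the abstract, so the value used here is purely hypothetical and chosen only to make the example concrete.

```python
Q_MAX_MG_PER_G = 44.0  # maximum adsorption amount reported in the abstract


def langmuir_uptake(conc, k, q_max=Q_MAX_MG_PER_G):
    """Equilibrium uptake q (mg/g) from the Langmuir isotherm.

    conc : equilibrium concentration of the adsorbate (e.g. mg/L)
    k    : Langmuir constant in reciprocal concentration units (e.g. L/mg)
    """
    if conc < 0.0 or k <= 0.0:
        raise ValueError("conc must be non-negative and k must be positive")
    return q_max * k * conc / (1.0 + k * conc)


K_HYPOTHETICAL = 0.5  # L/mg, an assumed value, NOT taken from the paper

for c in (0.5, 2.0, 20.0, 200.0):
    print(f"C = {c:6.1f} mg/L -> q = {langmuir_uptake(c, K_HYPOTHETICAL):5.2f} mg/g")
```

At C = 1/K the uptake is exactly half of q_max, and at high concentration it levels off toward q_max, which is the saturation behaviour the model describes.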
Co-reporter:Ruifeng Shan, Yue Zhao, Mengli Fan, Xiuwei Liu, Wensheng Cai, Xueguang Shao
Talanta 2015 Volume 131 pp:170-174
Publication Date(Web):January 2015
DOI:10.1016/j.talanta.2014.07.081
•Temperature dependent near-infrared spectra were analyzed using MSCA.
•QSTR model was established by coefficients in the between-temperature model.
•Quantitative analysis was achieved by coefficients in the within-temperature model.
•Difference between the QSTR models was used to study composition of solvent.
Quantitative spectra-temperature relationship (QSTR) between near-infrared (NIR) spectra and temperature has been used for quantitative determination of the compositions in mixtures. In this work, QSTR is studied using multilevel simultaneous component analysis (MSCA) and the spectral data of the samples with different concentrations measured at different temperatures. MSCA model contains a between-individual model describing the differences between the individuals and a within-individual model capturing the differences within the data of all the individuals. NIR spectra of five different compositions (water–ethanol–isopropanol) measured at seven temperatures were analyzed. A between-temperature model describing the effect of temperature and a within-temperature model describing the variation of concentration were obtained, from which QSTR model is established and quantitative analysis is achieved. Furthermore, the difference between the between-temperature or within-temperature models of different mixtures is used to study the composition of the solvent.
Co-reporter:Zhenzhen Xia;Wensheng Cai
Chemical Research in Chinese Universities 2015 Volume 31( Issue 2) pp:192-197
Publication Date(Web):2015 April
DOI:10.1007/s40242-015-4366-1
Chlorinated paraffins (CPs) are potential persistent organic pollutants (POPs), which threaten the safety of the environment and organisms. However, the analysis of CPs is a difficult task because of their complex composition, which contains thousands of congeners. In the present work, the quantitative structure retention relationship (QSRR) of CPs was studied. A total of 470 molecular descriptors were generated for describing the structures of 28 CPs, and 12 descriptors relevant to the retention time of the CPs were selected by stepwise regression. Then, QSRR models between the retention time and the selected descriptors were established by multiple linear regression (MLR), partial least squares (PLS) and least square support vector regression (LS-SVR). The result shows that the PLS model is better than MLR and LS-SVR, obtaining a squared correlation coefficient (r²) of 0.9996 and a root mean squared error (RMSE) of 0.015. The PLS model was then used to predict the retention time of 49 C10-CPs. Three of them were investigated by gas chromatography coupled with mass spectrometry (GC-MS). A well-defined correlation was found between the measured retention time and the predicted value.
Co-reporter:Yan Liu, Wensheng Cai, Xueguang Shao
Analytica Chimica Acta 2014 Volume 836 pp:18-23
Publication Date(Web):11 July 2014
DOI:10.1016/j.aca.2014.05.036
•A method for transferring the spectra measured on multi-instrument is proposed.
•Trilinear decomposition is used to calculate the difference between instruments.
•Difference between instruments can be corrected by changing a parameter.
•Standardization of the spectra measured on multi-instrument was achieved.
Calibration model transfer is essential for practical applications of near infrared (NIR) spectroscopy because the measurements of the spectra may be performed on different instruments and the difference between the instruments must be corrected. An approach for calibration transfer based on alternating trilinear decomposition (ATLD) algorithm is proposed in this work. From the three-way spectral matrix measured on different instruments, the relative intensity of concentration, spectrum and instrument is obtained using trilinear decomposition. Because the relative intensity of instrument is a reflection of the spectral difference between instruments, the spectra measured on different instruments can be standardized by a correction of the coefficients in the relative intensity. Two NIR datasets of corn and tobacco leaf samples measured with three instruments are used to test the performance of the method. The results show that, for both the datasets, the spectra measured on one instrument can be correctly predicted using the partial least squares (PLS) models built with the spectra measured on the other instruments.
Co-reporter:Jiarun Tu, Wensheng Cai and Xueguang Shao
Analyst 2014 vol. 139(Issue 5) pp:1016-1022
Publication Date(Web):05 Dec 2013
DOI:10.1039/C3AN01719B
Enhancing the sensitivity of analytical methods by improving the signal quality is a universal goal in analytical chemistry. In analytical electrochemistry systems, the double layer charging current has been an obstacle to the accurate measurement of the faradaic current despite theoretical and experimental efforts. In this study, a method for sensitivity enhancement for potential step voltammetry is developed using chemometric resolution. Trilinear decomposition is used to extract the net faradaic current and double layer charging current directly from the data matrix of the decaying curves measured at different potentials. The feasibility of the method is proven using simulated data and further validated by two experimental datasets of diffusion and adsorption control reactions. Compared with the conventional approach that records the current data at a later pulse time, the voltammogram of the extracted net faradaic current is an ideal sigmoid curve with a horizontal baseline, even for the samples of very low concentration. More importantly, the analytical sensitivity can be greatly improved when the net faradaic current is used for quantitative determination.
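The conventional approach mentioned in this abstract, sampling the current late in the pulse, relies on the two currents decaying at different rates. The sketch below illustrates this with textbook expressions: an exponentially decaying double layer charging current, i_c = (ΔE/R_s)·exp(−t/(R_s·C_dl)), and a diffusion-limited faradaic current following the Cottrell equation, i_f = n·F·A·C·sqrt(D/(π·t)). It does not implement the trilinear decomposition of the paper, it applies only to the diffusion-controlled case, and every parameter value is hypothetical.

```python
import math

FARADAY = 96485.0  # C/mol


def charging_current(t, step_v, r_s, c_dl):
    """Double layer charging current (A) after a potential step.

    step_v : potential step (V), r_s : solution resistance (ohm),
    c_dl   : double layer capacitance (F), t : time after the step (s).
    """
    return (step_v / r_s) * math.exp(-t / (r_s * c_dl))


def cottrell_current(t, n, area_cm2, conc_mol_cm3, diff_cm2_s):
    """Diffusion-limited faradaic current (A) from the Cottrell equation."""
    return n * FARADAY * area_cm2 * conc_mol_cm3 * math.sqrt(diff_cm2_s / (math.pi * t))


def faradaic_fraction(t):
    """Share of the total current that is faradaic at time t.

    All parameter values below are assumed for illustration only.
    """
    i_c = charging_current(t, step_v=0.05, r_s=100.0, c_dl=1e-6)
    i_f = cottrell_current(t, n=1, area_cm2=0.01, conc_mol_cm3=1e-6, diff_cm2_s=1e-5)
    return i_f / (i_f + i_c)


for t in (1e-4, 3e-4, 1e-3):
    print(f"t = {t:.0e} s -> faradaic fraction = {faradaic_fraction(t):.4f}")
```

Waiting longer raises the faradaic share of the measured current, but the faradaic current itself has also decayed by then, which is the loss of sensitivity that extracting the net faradaic current is meant to avoid.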
Co-reporter:Xi Wu;Wensheng Cai
Journal of Separation Science 2014 Volume 37( Issue 7) pp:828-834
Publication Date(Web):
DOI:10.1002/jssc.201301268
For the rapid analysis of multicomponent mixtures using GC–MS, a chemometric multistep screening approach was proposed to extract the signals of the components from the overlapping signals measured with a very fast temperature program. At first, independent component analysis was used to find all the possible mass spectra from the overlapping signal in the moving windows along the retention time, and iterative target transformation factor analysis was employed to validate the existence of the extracted spectra from each window. Then, identical signals in the validated spectra were excluded using match ratio as a criterion. Finally, the chromatographic profiles for each spectrum were calculated using non-negative immune algorithm, and the spectra with a reasonable profile were taken as the identified components. A mixture of 53 pesticides was analyzed with a very fast temperature program of 7 min. A total of 48 pesticides and 16 interferences were identified from the overlapping GC–MS signal.
Co-reporter:Pao Li;Zhen Mei;Wensheng Cai
Journal of Separation Science 2014 Volume 37( Issue 13) pp:1585-1590
Publication Date(Web):
DOI:10.1002/jssc.201400190
A method for the fast determination of the components in a complex sample by using gas chromatography with mass spectrometry was developed and used for the quantitative analysis of phthalic acid esters in environmental water. In the method, the adaptively corrected mass spectra were used to compensate for the differences between the library spectra and the measured ones in the experiment. The correction was obtained by the iterative transformation of the library spectra using iterative target transformation factor analysis, and the resolution was performed by non-negative immune algorithm using the corrected spectra. Rapid analysis of 16 phthalic acid esters in water samples was achieved using fast elution gas chromatography with mass spectrometry measurements. The results show that the mass spectra and chromatographic profiles of the phthalic acid esters can be obtained from the overlapping signal of 13 min elution, and accurate quantitative analysis can be obtained. The recoveries of the phthalic acid esters obtained by standard addition are between 90.3 and 107.4%, and the relative standard deviations obtained in repeated measurements are less than 9%.
Co-reporter:Jing Han;Pao Li;Wensheng Cai
Journal of Separation Science 2014 Volume 37( Issue 16) pp:2126-2130
Publication Date(Web):
DOI:10.1002/jssc.201400403
Ginseng is a well-known traditional Chinese medicinal herb, and ginsenosides are its major active components. A method for the fast determination of ginsenosides in ginseng samples by high-performance liquid chromatography was developed and used for the quantitative analysis of four ginsenosides in three different ginseng samples. In this method, instead of time-consuming gradient elution, isocratic elution was used to speed up the analysis. Under strong isocratic elution, all the ginsenosides are eluted in 2.3 min. Although the measured signal is composed of overlapped peaks with the interferences and background, the signal of ginsenosides can be extracted by chemometric resolution. A non-negative immune algorithm was employed to obtain the chromatographic information of the target components from the data. Compared with conventional chemometric approaches, the method can perform the extraction for one-dimensional overlapping signals. The method was validated by the determination of four ginsenosides in three different ginseng samples. The recoveries of the spiked samples were in the range of 94.08–107.3%.
Co-reporter:Jiarun Tu, Wensheng Cai, Xueguang Shao
Journal of Electroanalytical Chemistry 2014 Volume 725() pp:25-31
Publication Date(Web):24 June 2014
DOI:10.1016/j.jelechem.2014.05.001
•A three-dimensional data array with physical significance is constructed.
•A trilinear decomposition algorithm is employed to process the data array.
•Net faradaic and charging currents can be simultaneously extracted.
•Sensitivity is enhanced because the extracted net faradaic current is used.
•Results of quantitative analysis can be directly achieved with the trilinear model.
Pulse technologies have been widely employed in analytical electrochemistry systems. Double layer charging current has been an obstacle to the accurate measurement of faradaic current despite theoretical and experimental efforts. Conventional pulse approaches record the current at a later pulse time to minimize the capacitive effects and reduce the background currents. In this study, a method for extracting the quantitative information directly from the current curve data of pulse voltammetry is developed using a trilinear decomposition algorithm. The feasibility of the proposed method is demonstrated and an application study is performed for quantitative analysis of bovine serum albumin (BSA). The method can simultaneously obtain the net faradaic current and the double layer charging current directly from the measured signals, and the quantitative information can be obtained as well. Thus the performance of voltammetric techniques can be improved. Because the net signals are used, the sensitivities are increased by about 160%, 90% and 25% for normal pulse voltammetry, differential pulse voltammetry and staircase voltammetry, respectively. Quantitative determination of BSA in solutions was carried out with a calibration model of six samples. The recoveries range from 98.1% to 102.9%, which supports the reliability of the proposed method.
Graphical abstract: A trilinear decomposition algorithm was employed to extract highly sensitive quantitative information from the data array constructed from the decaying current curves measured with normal pulse voltammetry (NPV), differential pulse voltammetry (DPV) and staircase voltammetry (SV), and quantitative analysis of bovine serum albumin (BSA) in solutions was achieved.
Co-reporter:Ruifeng Shan, Zhiyi Mao, Lihui Yin, Wensheng Cai and Xueguang Shao
Analytical Methods 2014 vol. 6(Issue 13) pp:4692-4697
Publication Date(Web):04 Apr 2014
DOI:10.1039/C4AY00243A
The discrimination of pharmaceutical products is an important task in the pharmaceutical industry and for pharmaceutical safety. In this study, the principal component accumulation (PCAcc) method was investigated for the discrimination of Chinese patent medicines. In the PCAcc method, an accumulation strategy is utilized to combine the classification information contained in multiple PC subspaces by using a rotation, a projection and a summation operation. To improve the performance of classification, continuous wavelet transform is applied as the pretreatment method to eliminate the background. The results show that, among the 12 classes of Chinese patent medicines, 8 classes are correctly classified, and a total of ten samples are misclassified for the other four classes. Compared with the results obtained by principal component analysis, radial basis function artificial neural network and partial least squares discriminant analysis, PCAcc produces the best classification.
Co-reporter:Peng Liu;Christophe Chipot;Wensheng Cai
The Journal of Physical Chemistry C 2014 Volume 118(Issue 23) pp:12562-12567
Publication Date(Web):May 16, 2014
DOI:10.1021/jp503241p
Manufacturing at the molecular level engines to power nanocars represents a challenge in the development of nanomachines. A molecular engine formed of β-cyclodextrin (β-CD), aryl, and amide moiety has been studied by means of molecular dynamics simulations combined with free-energy calculations. The compression and decompression strokes involving the binding processes of the (Z)- and (E)-isomers of this engine with 1-adamantanol (AD) have been elucidated by determining the underlying potentials of mean force (PMFs). The difference in the binding-free energies, considered as the work generated by and stored within this engine, is calculated to be +1.5 kcal/mol, in remarkable agreement with the experimentally measured quantity. Partitioning the PMFs into physically meaningful free-energy components suggests that the two binding processes are primarily controlled by the favorable inclusion of AD by the β-CD. The work generated by the engine is harnessed to push the alkyl moiety from the hydrophobic cavity of the CD to water, to modify a dihedral angle by a twisting motion about the C–Cα bond, and to increase the tilt angle between the mean plane of the sugar unit, which connects the amide moiety, and the mean plane of the CD. By deciphering the intricate mechanism whereby the present molecular engine operates, our understanding of how similar nanomachines work is expected to be improved significantly, helping in turn the design of novel, more effective ones.
Co-reporter:Jing Han;Xi Wu;Wensheng Cai
Chemical Research in Chinese Universities 2014 Volume 30( Issue 4) pp:578-581
Publication Date(Web):2014 August
DOI:10.1007/s40242-014-3543-y
Ginseng is one of the most important traditional Chinese medicines and functional foods. A method for the fast determination of amino acids in ginseng samples using high performance liquid chromatography (HPLC) was developed, in which strong isocratic elution was employed to simplify the separation and speed up the analysis. All amino acids were eluted within 3 min, with the chromatogram composed of peaks overlapped with those of the interferences. Then, a non-negative immune algorithm (NNIA) was adopted to resolve the chromatographic signals of the components from the measured chromatogram. The results show that the signals of the amino acids can be correctly extracted by NNIA and that the extracted signals can be used for quantitative analysis. The method was validated by determining six amino acids in four different samples of ginseng. The recoveries of the spiked samples are in a range of 96.6%–106.3%.
Co-reporter:Zhiyi Mao, Ruifeng Shan, Jiajun Wang, Wensheng Cai, Xueguang Shao
Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 2014 Volume 128() pp:711-715
Publication Date(Web):15 July 2014
DOI:10.1016/j.saa.2014.03.015
•Fast analysis of polyphenols in plant samples using NIRDRS was studied.
•Individual spectral preprocessing and variable selection has no or slight effect.
•Simultaneous preprocessing and variable selection can improve the models.
•Combination of CWT–MSC and RT can be used to build the optimal model.
Polyphenols in plant samples have been extensively studied because phenolic compounds are ubiquitous in plants and can be used as antioxidants in promoting human health. A method for rapid determination of three phenolic compounds (chlorogenic acid, scopoletin and rutin) in plant samples using near-infrared diffuse reflectance spectroscopy (NIRDRS) is studied in this work. Partial least squares (PLS) regression was used for building the calibration models, and the effects of spectral preprocessing and variable selection on the models are investigated for optimization of the models. The results show that individual spectral preprocessing and variable selection has no or slight influence on the models, but the combination of the techniques can significantly improve the models. The combination of continuous wavelet transform (CWT) for removing the variant background, multiplicative scatter correction (MSC) for correcting the scattering effect and randomization test (RT) for selecting the informative variables was found to be the best way for building the optimal models. For validation of the models, the polyphenol contents in an independent sample set were predicted. The correlation coefficients between the predicted values and the contents determined by high performance liquid chromatography (HPLC) analysis are as high as 0.964, 0.948 and 0.934 for chlorogenic acid, scopoletin and rutin, respectively.
Graphical abstract: Optimal models for rapid analysis of three polyphenols in plant samples by near-infrared diffuse reflectance spectroscopy (NIRDRS) were obtained using continuous wavelet transform–multiplicative scatter correction (CWT–MSC) for spectral preprocessing and randomization test (RT) for variable selection.
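Of the preprocessing steps named in this abstract, MSC is the simplest to illustrate. The sketch below follows the textbook form of multiplicative scatter correction (regress each spectrum on the mean spectrum, then remove the fitted offset and slope). It is not the authors' code, and the toy spectra are invented for illustration.

```python
def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    spectra -- list of equal-length lists (one spectrum per row).
    Each spectrum x is modelled as x = offset + slope * reference,
    and the corrected spectrum is (x - offset) / slope.
    """
    n_vars = len(spectra[0])
    # Reference spectrum: mean over all samples at each variable.
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_vars)]
    ref_mean = sum(ref) / n_vars
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)

    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_vars
        # Ordinary least-squares slope and offset of s regressed on ref.
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_ss
        offset = s_mean - slope * ref_mean
        corrected.append([(x - offset) / slope for x in s])
    return corrected


# Toy example: one underlying band shape with different offsets and scalings.
base = [1.0, 2.0, 4.0, 3.0]
raw = [[a + b * x for x in base] for a, b in [(0.1, 0.8), (0.3, 1.2), (-0.2, 1.0)]]
for row in msc(raw):
    print([round(v, 4) for v in row])
```

In this toy case the three raw spectra differ only by scatter-like offset and scaling, so the corrected rows coincide.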
Co-reporter:Yan Liu, Yu Ning, Wensheng Cai and Xueguang Shao
Analyst 2013 vol. 138(Issue 21) pp:6617-6622
Publication Date(Web):21 Aug 2013
DOI:10.1039/C3AN01232H
Great attention has been paid to near-infrared diffuse reflectance spectroscopy (NIRDRS) due to its practicability in analyzing real complex samples. However, the application of the technique in micro-analysis is badly restricted by its low sensitivity or high detection limit. In this study, the possibility of achieving the sensitive detection of micro-components using NIRDRS with the help of chemometric methods is studied with two experimental datasets. The results show that a very high sensitivity can be obtained when the noise and the variant background are minimized. Quantitative determination of low concentrations of pesticides and trace Cr3+ in solutions is achieved by using preconcentration and chemometric approaches to minimize the noise and background. The absolute prediction error of the method can be as low as 7.6 μg for the pesticide and 28.6 μg for Cr3+. These quantities are equivalent to 76 ng mL−1 and 286 ng mL−1 if 100 mL of solution are used for the analysis.
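The conversion from absolute error in mass to an equivalent concentration, as stated at the end of this abstract, is a single division. A small sketch of that arithmetic (the function name is hypothetical):

```python
def mass_to_conc_ng_per_ml(mass_ug, volume_ml):
    """Convert an absolute mass (micrograms) to ng/mL for a given sample volume."""
    if volume_ml <= 0:
        raise ValueError("volume_ml must be positive")
    return mass_ug * 1000.0 / volume_ml  # 1 microgram = 1000 ng


# Values from the abstract: 7.6 ug (pesticide) and 28.6 ug (Cr3+) in 100 mL.
for label, mass in [("pesticide", 7.6), ("Cr3+", 28.6)]:
    print(label, round(mass_to_conc_ng_per_ml(mass, 100.0), 1), "ng/mL")
```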
Co-reporter:Weiwei Yu;Wensheng Cai
Journal of Separation Science 2013 Volume 36( Issue 14) pp:2277-2282
Publication Date(Web):
DOI:10.1002/jssc.201300122
A method for the fast analysis of a specific component in complex samples by GC–MS was developed and used for the quantitative determination of prometryn in hair samples. In this method, the tedious and time-consuming sample pretreatment for purification was avoided, and a short capillary column and fast temperature program were employed to speed up the analysis. Although the measured total ion chromatogram is composed of overlapping peaks with interference and background noise, the signal of prometryn can be extracted by chemometric methods. Window-independent component analysis was used to extract the mass spectrum and a non-negative immune algorithm was employed to obtain the chromatographic profile of the interesting component from the measured data. Due to the complexity of the matrix, a standard addition method was adopted for the quantification. The applicability of the method was validated with spiked samples, and the recoveries were in the range of 99–105%.
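The standard addition quantification mentioned above can be sketched as a straight-line fit of signal against added amount, with the unknown read from the magnitude of the x-intercept. This is the generic textbook procedure, assuming a linear response. The data are synthetic and the function name is hypothetical.

```python
def standard_addition(added, signal):
    """Estimate the analyte amount in the unspiked sample.

    added  -- amounts of standard added to each aliquot (first is usually 0)
    signal -- measured response for each aliquot
    Fits signal = intercept + slope * added by least squares and returns
    intercept / slope, i.e. the magnitude of the x-intercept.
    """
    n = len(added)
    x_mean = sum(added) / n
    y_mean = sum(signal) / n
    sxx = sum((x - x_mean) ** 2 for x in added)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(added, signal))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept / slope


# Synthetic example: true content 2.0 units, sensitivity 3.0 signal units per unit.
added = [0.0, 1.0, 2.0, 4.0]
signal = [3.0 * (2.0 + x) for x in added]
print(standard_addition(added, signal))
```

Because the calibration is built inside the sample itself, matrix effects that scale the response act on the slope and the intercept alike, which is why the approach suits complex matrices such as hair or soil extracts.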
Co-reporter:Weiwei Yu;Wensheng Cai
Chinese Journal of Chemistry 2013 Volume 31( Issue 4) pp:545-550
Publication Date(Web):
DOI:10.1002/cjoc.201300004
A method for fast determination of a component in complex samples by using gas chromatography-mass spectrometry (GC-MS) was developed and used for quantitative analysis of phenanthrene in soils. In the method, window independent component analysis (WICA) was used for resolving the mass spectrum and a non-negative immune algorithm (NNIA) was employed for obtaining the chromatographic profile. Therefore, spectral and chromatographic information of a specific component can be obtained from measured GC-MS data with overlapping peaks and a high background. Six soil samples collected from different places were analyzed. The tedious pretreatments in preparing the samples and the elution in the separation were simplified to speed up the analysis. Due to the complexity of the matrix, the standard addition method was adopted for the final quantification. The applicability of the method was validated with a spiked sample and the results of the six samples are reasonable.
Co-reporter:Fengxia Liu, Wensheng Cai, Xueguang Shao
Vibrational Spectroscopy 2013 Volume 68() pp:104-108
Publication Date(Web):September 2013
DOI:10.1016/j.vibspec.2013.05.014
•An adsorbent was prepared for fast adsorption of Hg2+ from dilute solutions.
•Multivariate calibration is helpful to improve the selectivity.
•An acceptable model was built for quantitative analysis.
A method for the selective determination of mercury (II) ion in water was developed by using preconcentration and near-infrared diffuse reflection spectroscopy (NIRDRS). For the method, thiol-functionalized silica porous particles (TFSPP) were prepared for adsorption of mercury (II) ion from dilute solutions. The adsorbent proved very efficient for the adsorption of the analyte, and NIRDRS combined with partial least squares (PLS) for calibration modeling was found suitable for a fast analysis. Under optimized conditions, the adsorption rate of the analyte can be above 98% within 5 min. Synthetic samples prepared with river water and interferences were used for method development. The results show a correlation coefficient of R = 0.9667 between reference values and prediction results for the analyte, and the recovery rates were in the range of 81.5–122% for the validation samples in the concentration range of 3.3–26.9 mg L−1.
Co-reporter:Cai-Jing Cui, Wen-Sheng Cai, Xue-Guang Shao
Chinese Chemical Letters 2013 Volume 24(Issue 1) pp:67-69
Publication Date(Web):January 2013
DOI:10.1016/j.cclet.2012.12.012
Near-infrared diffuse reflectance spectroscopy (NIRDRS) has attracted more and more attention in analyzing the components in samples with complex matrices. However, to apply this technique to micro-analysis, there are still some obstacles to overcome such as the low sensitivity and spectral overlapping associated with this approach. A method for fast determination of bovine serum albumin (BSA) in micro-volume samples was studied using NIRDRS with sample spots and chemometric techniques. 10 μL of sample spotted on a filter paper substrate was used for the spectral measurements. Quantitative analysis was obtained by partial least squares (PLS) regression with signal processing and variable selection. The results show that the correlation coefficient (R) between the predicted and the reference concentration is 0.9897 and the recoveries are in the range of 87.4%–114.4% for the validation samples in the concentration range of 0.61–8.10 mg/mL. These results suggest that the method has the potential to quickly measure proteins in micro-volume solutions.
Graphical abstract: A method for fast determination of bovine serum albumin (BSA) in micro-volume samples was studied using near-infrared diffuse reflectance spectroscopy (NIRDRS) with sample spots and chemometric techniques.
Co-reporter:Xi Wu;Weiwei Yu;Xiangyu Luo;Wensheng Cai
Chromatographia 2013 Volume 76( Issue 13-14) pp:849-855
Publication Date(Web):2013 July
DOI:10.1007/s10337-013-2487-6
Much effort has been made to analyze the pesticides in foods or vegetables due to the interferences in the sample matrix. In this work, taking the determination of prometryn in leek samples as an example, a simple method is proposed for fast analysis of the components in real samples using GC–MS and chemometric resolution. The purification step in preparing the samples was simplified, and a short capillary column and fast temperature program were employed in GC–MS measurement. Although the measured signal is composed of overlapped peaks with the interferences and background, the signal of prometryn can be extracted by chemometric resolution. Six leek samples from different markets were analyzed within an elution time of 6 min. Compared with the results by the standard method, the results by the proposed method were found to be reliable.
Co-reporter:Jiarun Tu, Wensheng Cai, Xueguang Shao
Talanta 2013 Volume 116() pp:575-580
Publication Date(Web):15 November 2013
DOI:10.1016/j.talanta.2013.07.021
•A chemometric method for extracting faradaic and charging current is proposed.
•Ideal voltammogram is obtained without blank experiment and assumptions.
•Intensity of the signal can be greatly enhanced.
•Quantitative analysis is achieved using both electric quantity and current.
Double layer charging current in electrochemical systems has been a challenging problem in the last several decades because it interferes with the accurate measurement of faradaic current. A method for extracting faradaic current and double layer charging current directly from the measured total current in potential step voltammetry is developed by using iterative target transformation factor analysis (ITTFA). The method constructs initial target vectors based on the theoretical formulae of faradaic and charging current, and then calculates the weights of faradaic and charging current in the measured signal via the iterative transformation of the initial vectors. Therefore, the two currents in one experiment can be obtained simultaneously without any assumption. The potential step voltammetric signals of potassium ferricyanide, copper sulfate and paracetamol were analyzed with the proposed method. The results show that the shape of the obtained voltammogram is an ideal sigmoid curve with a horizontal straight baseline and plateau, and the intensity of the signal is greatly enhanced. Therefore, the method provides a new way to measure the pure faradaic current in the potential step voltammetric experiment, and may provide an alternative for improving the sensitivity of quantitative analysis.
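The idea of weighting two theoretical current shapes in a measured transient can be illustrated with a much simpler stand-in than ITTFA: an ordinary least-squares fit with fixed target vectors. The sketch below makes two assumptions that the abstract does not state: that the faradaic target decays as t^(-1/2) (Cottrell-type) and that the charging target decays as exp(-t/tau) with a known time constant. It performs no iterative target refinement, so it should be read as an illustration of the decomposition idea only.

```python
import math


def fit_two_components(t, total, tau):
    """Least-squares weights of two fixed target shapes in a current transient.

    t     -- sampling times after the potential step (all > 0)
    total -- measured total current at those times
    tau   -- assumed time constant of the charging current (hypothetical input)
    Returns (w_faradaic, w_charging) such that
    total ~= w_faradaic * t**-0.5 + w_charging * exp(-t / tau).
    """
    f = [ti ** -0.5 for ti in t]            # assumed faradaic shape
    c = [math.exp(-ti / tau) for ti in t]   # assumed charging shape
    sff = sum(a * a for a in f)
    scc = sum(b * b for b in c)
    sfc = sum(a * b for a, b in zip(f, c))
    sfy = sum(a * y for a, y in zip(f, total))
    scy = sum(b * y for b, y in zip(c, total))
    det = sff * scc - sfc * sfc             # 2x2 normal-equation determinant
    w_f = (sfy * scc - sfc * scy) / det
    w_c = (sff * scy - sfc * sfy) / det
    return w_f, w_c


# Synthetic transient with known weights 2.0 (faradaic) and 5.0 (charging).
times = [0.001 * k for k in range(1, 51)]
tau = 0.005
measured = [2.0 * ti ** -0.5 + 5.0 * math.exp(-ti / tau) for ti in times]
print(fit_two_components(times, measured, tau))
```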
Co-reporter:Zhen Mei;GuoRong Du;WenSheng Cai
Science China Chemistry 2013 Volume 56( Issue 5) pp:656-663
Publication Date(Web):2013 May
DOI:10.1007/s11426-012-4773-9
A chemometric method to determine the selective ion by using a non-negative immune algorithm (NNIA) was proposed. In the method, the mutual projections of the chromatographic profiles at different m/z channels are calculated using NNIA. For GC-MS data with m retention time points and n mass channels, the projections of the GC-MS data onto the chromatographic profile at one mass channel form a mass spectrum in the form of a 1×n vector. If the chromatographic profile at a selective mass channel is used, the extracted mass spectrum will be a correct one. Therefore, by comparing the extracted mass spectrum with a reference spectrum, the selective ion can be identified, and the corresponding chromatographic profile can be obtained at the same time. GC-MS data of a 40-pesticide mixture were investigated by the method. The results show that both the mass spectral and the chromatographic information of the components of interest can be extracted from the overlapping signals, except for the special cases of isomeric components with very similar mass spectra.
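The projection step described above (an m×n data matrix projected onto one chromatographic profile to give a 1×n spectrum) can be sketched with an unconstrained least-squares projection. NNIA itself is not reproduced here. The plain projection is a swapped-in simplification, and the data are synthetic.

```python
def project_onto_profile(data, profile):
    """Project GC-MS data onto a chromatographic profile.

    data    -- m x n nested list (rows: retention time points, columns: m/z channels)
    profile -- length-m chromatographic profile taken at one m/z channel
    Returns a length-n vector: the least-squares coefficient of the profile
    in each m/z channel, i.e. an estimate of the mass spectrum.
    """
    m = len(profile)
    n = len(data[0])
    norm = sum(p * p for p in profile)
    return [sum(profile[i] * data[i][j] for i in range(m)) / norm for j in range(n)]


# Synthetic single-component data: outer product of a peak and a spectrum.
peak = [0.0, 1.0, 3.0, 1.0, 0.0]
spectrum = [10.0, 0.0, 4.0]
data = [[p * s for s in spectrum] for p in peak]
print(project_onto_profile(data, peak))
```

When the chosen channel is selective for one component, its profile is proportional to that component's elution profile, so the projection returns a vector proportional to that component's mass spectrum. With co-eluting components, an unconstrained projection will generally pick up contributions from the overlapping component as well.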
Co-reporter:Yi Wang, Xiang Ma, Yadong Wen, Jingjing Liu, Wensheng Cai and Xueguang Shao
Analytical Methods 2012 vol. 4(Issue 9) pp:2893-2899
Publication Date(Web):13 Jun 2012
DOI:10.1039/C2AY25508A
Discrimination is a universal problem in various fields. Near-infrared (NIR) spectroscopy combined with pattern recognition technique has shown great power and gained wide acceptance for analyzing complex samples. A principal component accumulation (PCAcc) method was proposed in our previous work for the two-class problem of cancer classification based on microarray gene expression data. In this work, PCAcc is applied to the multiclass problem of plant sample classification based on NIR spectroscopy. A hierarchical way is adopted as in the decision tree methods, each node dividing the samples into two classes. A parameter of resolution (Rs) is used to quantitatively measure the difference between the two classes. To validate the performance of the proposed method, it is applied to the NIR spectral datasets of different parts of tobacco leaves and different brands of cigarettes. The results show that the method can discriminate these samples with a very good performance in terms of classification accuracy.
Co-reporter:Yanjun Ma, Ruoshi Bai, Guorong Du, Li Ma, Aijun He, Na Li, Xiaoli Yi, Wensheng Cai, Jun Zhou and Xueguang Shao
Analytical Methods 2012 vol. 4(Issue 5) pp:1371-1376
Publication Date(Web):21 Mar 2012
DOI:10.1039/C2AY25038A
Indirect modeling of trace components in real samples by use of near-infrared spectroscopy (NIRS) has gained much interest, because it may provide a rapid way of analyzing industrial or agricultural products. Coupling near-infrared diffuse reflectance spectroscopy and chemometric techniques, a method for rapid analysis of four tobacco specific N-nitrosamines (TSNAs) and their total content is studied in this work. For optimization of the models, techniques for spectral preprocessing and variable selection are adopted and compared. It is found that removing the varying background and correcting the multiplicative scattering effect in the spectra are important in modeling, and that variable selection can significantly improve the models. For validation of the models, the TSNA contents of independent test samples and tobacco leaves harvested in different years were predicted. Consistent results were obtained between the reference contents by GC/TEA analysis and those predicted. Although the relative errors for some low content samples are not satisfactory, the method is a practical alternative for industrial analysis due to its non-destructive and rapid nature.
Co-reporter:Xueguang Shao, Min Zhang and Wensheng Cai
Analytical Methods 2012 vol. 4(Issue 2) pp:467-473
Publication Date(Web):03 Jan 2012
DOI:10.1039/C2AY05609G
Near-infrared (NIR) spectral analysis usually needs to take advantage of multivariate calibration. However, not all the variables in the spectra contribute equally to a calibration model. Identification of informative variables is a key step in building a high performance model. According to the influence of a variable on the calibration model, the influential variable (IV) is defined and a method for identification of IVs is proposed in this work. In the method, a set of partial least squares (PLS) models is built, each using a subset of variables selected randomly by Monte Carlo re-sampling, and then the clustering of these models is investigated by means of principal component analysis. The variables that cause the models to group can be identified as the IVs. Finally, the PLS model built with the selected IVs is adopted as the calibration model. Five NIR spectral datasets are used to test the performance and applicability of the method. The results show that the identified IVs are reasonable and the calibration model is efficient enough to produce accurate and reliable predictions.
Co-reporter:Pao Li, Guorong Du, Wensheng Cai, Xueguang Shao
Journal of Pharmaceutical and Biomedical Analysis 2012 Volume 70() pp:288-294
Publication Date(Web):
DOI:10.1016/j.jpba.2012.07.013
Co-reporter:Yu Ning, Jihui Li, Wensheng Cai, Xueguang Shao
Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 2012 Volume 96() pp:289-294
Publication Date(Web):October 2012
DOI:10.1016/j.saa.2012.05.034
A method for simultaneous determination of metal ions in river water was developed by using preconcentration and near-infrared diffuse reflectance spectroscopy (NIRDRS). An inorganic biomaterial, nano-hydroxyapatite (HAP) was used as a high-efficient adsorbent for gathering the ions from water samples. After adsorbing the analytes onto the adsorbent, NIRDRS was measured and partial least squares (PLS) models were established for fast and simultaneous quantitative prediction. With the samples prepared by river water, determination of Pb2+, Zn2+, Cu2+, Cd2+ and Cr3+ was investigated. The calibration models of Cu2+, Cr3+ and total content were proven to be efficient enough for precise prediction. The determination coefficients (R2) of the independent validation were found as high as 0.9924, 0.9869 and 0.9273 for Cu2+, Cr3+ and total content, respectively. Therefore, the feasibility of NIRDRS for microanalysis of heavy metal ions in waste water was demonstrated.
Graphical abstract: Heavy metal ions in waste water were quantitatively determined by preconcentration with nano-HAP, NIRDRS and multivariate calibration.
Highlights
► Feasibility of NIRS for microanalysis of heavy metal ions was studied.
► HAP was used as an adsorbent for gathering the ions from water samples.
► Three precise models were built for quantitative analysis.
► Interaction mechanisms are fundamental of the feasible models.
Co-reporter:Min Zhang, Wensheng Cai and Xueguang Shao
Analyst 2011 vol. 136(Issue 20) pp:4217-4221
Publication Date(Web):26 Aug 2011
DOI:10.1039/C1AN15222J
Continuous wavelet transform (CWT) has been shown to be a high-performance signal processing technique in multivariate calibration. However, the signal processed by CWT with a specific wavelet may account for only a part of the information. To effectively utilize more abundant information contained in analytical signals, a method, named as wavelet unfolded partial least squares (WUPLS), was proposed. In the approach, the measured dataset is firstly extended by CWT with different wavelets, and then partial least squares (PLS) is employed to develop the quantitative model between the extended dataset and the target values. In order to select the representative wavelets, principal component analysis (PCA) is used to investigate the distribution of the signals obtained by CWT with different wavelets. The performance of the method was tested with blood and tobacco powder samples. Compared with the results obtained by PLS methods, the WUPLS method combined with signal processing techniques is proven to be a promising tool for improving the near-infrared (NIR) spectral analysis of complex samples.
Co-reporter:Jingjing Liu, Xiang Ma, Yadong Wen, Yi Wang, Wensheng Cai, and Xueguang Shao
Industrial & Engineering Chemistry Research 2011 Volume 50(Issue 12) pp:7677-7681
Publication Date(Web):May 10, 2011
DOI:10.1021/ie200543v
Process analysis is a great challenge in industry due to the complexity of industrial production. In the present work, an approach was developed based on online near-infrared spectroscopy and alternating trilinear decomposition (ATLD) method for industrial process analysis. The basic idea of the approach is to extract the common information that represents the common property of the batches using ATLD. Using common information, the production process can be monitored by investigating the variation of the common property, and quality assurance can be achieved by discrimination analysis. Taking the tobacco production as an example, the results show that the method is able to capture the intrinsic information of the products and performs well in process analysis and quality assurance.
Co-reporter:Yan Zhang, Yong Hao, Wensheng Cai and Xueguang Shao
Analytical Methods 2011 vol. 3(Issue 3) pp:703-708
Publication Date(Web):10 Feb 2011
DOI:10.1039/C0AY00775G
Near-infrared diffuse reflectance spectroscopy (NIRDRS) has been proven to be a convenient and fast quantitative method for complex samples. A method for simultaneous quantitative determination of phenol and p-nitrophenol in wastewater was developed by using NIRDRS with a preconcentration procedure of resin adsorption. In the method, the analytes were firstly adsorbed onto the adsorbent, and then the NIRDRS spectrum of the adsorbent was measured for quantitative analysis by partial least squares (PLS) regression. The results show that the two phenolic compounds can be immobilized onto the adsorbent and directly measured by NIRDRS. The correlation coefficients (R) between the reference values and prediction results for phenol and p-nitrophenol were 0.958 and 0.957, respectively. Furthermore, the interferences of the coexistence components can be eliminated with the aid of multivariate calibration. The method may provide a sensitive, effective and fast way for simultaneous quantitative analysis of phenol and p-nitrophenol in solution samples.
Co-reporter:Jing Jing Liu, Heng Xu, Wen Sheng Cai, Xue Guang Shao
Chinese Chemical Letters 2011 Volume 22(Issue 10) pp:1241-1244
Publication Date(Web):October 2011
DOI:10.1016/j.cclet.2011.04.019
Near infrared (NIR) spectroscopy technique has shown great power and gained wide acceptance for analyzing complicated samples. The present work is to distinguish different brands of tobacco products by using on-line NIR spectroscopy and pattern recognition techniques. Moreover, since each brand contains a large number of samples, an improved dendrogram was proposed to show the classification of different brands. The results suggest that NIR spectroscopy combined with principal component analysis (PCA) and hierarchical cluster analysis (HCA) performs well in discrimination of the different brands, and the improved dendrogram could provide more information about the difference of the brands.
Co-reporter:Xihui Bian;Da Chen;Wensheng Cai;Edward Grant
Chinese Journal of Chemistry 2011 Volume 29( Issue 11) pp:2525-2532
Publication Date(Web):
DOI:10.1002/cjoc.201180425
The application of Raman spectroscopic techniques combined with multivariate chemometric signal processing promises new means for the rapid, non-destructive multidimensional analysis of metabolites, with little or no sample preparation and little sensitivity to water. However, Rayleigh scattering, fluorescence and uncontrolled variance present substantial challenges for the accurate quantitative analysis of metabolites at physiological levels in biologically varying samples. Effective strategies include the application of chemometric pretreatments for reducing Raman spectral interference. However, the arbitrary application of individual or combined pretreatment procedures can significantly alter the outcome of a measurement, thereby complicating spectral analysis. This paper evaluates and compares six signal pretreatment methods for correcting the baseline variances, together with three variable selection methods for eliminating uninformative variables, all within the context of multivariate calibration models based on partial least squares (PLS) regression. Raman spectra of 90 artificial bio-fluid samples with eight urine metabolites at near-physiological concentrations were used to test these models. The combination of multiplicative scatter correction (MSC), continuous wavelet transform (CWT), randomization test (RT) and PLS modeling presented the best performance for all the metabolites. The correlation coefficient (R) between predicted and prepared concentration reached as high as 0.96.
Co-reporter:Jihui Li, Yan Zhang, Wensheng Cai, Xueguang Shao
Talanta 2011 Volume 84(Issue 3) pp:679-683
Publication Date(Web):15 May 2011
DOI:10.1016/j.talanta.2011.01.072
Analysis of metal ions in the environment is of great importance for evaluating the risk of heavy metals to public health and ecological safety. A method for simultaneous determination of metal ions in water samples was developed by using adsorption preconcentration and near-infrared diffuse reflectance spectroscopy (NIRDRS). A high capacity adsorbent of thiol-functionalized magnesium phyllosilicate, named Mg-MTMS, was prepared by co-condensation for preconcentration of Hg2+, Pb2+ and Cd2+ in aqueous solutions. After adsorbing the analytes onto the adsorbent, NIRDRS was measured and PLS models were established for fast and simultaneous quantitative prediction. Because the interaction of the ions with the functional group of the adsorbent can be reflected in the spectra, the models built with the samples prepared with river water were proven to be efficient enough for precise prediction. The determination coefficients (R2) of the validation samples for the three ions were found to be as high as 0.9197, 0.9599 and 0.9861, respectively. Furthermore, because of the high adsorption efficiency of Mg-MTMS, the detected concentrations are as low as milligrams per liter for the three ions, and the concentration can be further reduced. Therefore, the feasibility of quantitative analysis of metal ions in river water by NIRDRS is proven and this may provide a new way for fast simultaneous determination of trace metals in environmental waters.
Co-reporter:Jun Kang, Wensheng Cai, Xueguang Shao
Talanta 2011 Volume 85(Issue 1) pp:420-424
Publication Date(Web):15 July 2011
DOI:10.1016/j.talanta.2011.03.089
Quantitative spectra-temperature relationship (QSTR) between near-infrared (NIR) spectra and temperature has been studied in our previous work (Talanta, 2010, 82, 1017-1021). In this study, applicability of the QSTR model for quantitative determination is further studied using the spectra of aqueous ethanol samples in the temperature range of 31–40 °C and the concentration range of 1–99%. The results show that QSTR model can be built by using the spectra in a small temperature range and the quantitative analysis can be achieved by only two spectra at different temperatures. Moreover, calibration curves for different concentration ranges (1–5%, 20–70%, 95–99%, v/v) are investigated by using linear and nonlinear curve fitting, respectively. Both of the linear and nonlinear curves are found to be applicable within these concentration ranges. Therefore, the temperature dependent NIR spectra may provide a new way for quantitative determination and may have high potential in bio-fluids analysis or industrial practices.
Co-reporter:Lingni Miao, Wensheng Cai, Xueguang Shao
Talanta 2011 Volume 83(Issue 4) pp:1247-1253
Publication Date(Web):30 January 2011
DOI:10.1016/j.talanta.2010.07.001
Applications of hyphenated chromatographic techniques, especially the GC–MS technique, have been reported in chemical, biological, environmental, agricultural and medical analysis. The complexity of the samples in these fields is still an obstacle to practical use of the technique, and the overlapping of the multicomponent signals has led to the wide use of chemometric methods. In this work, taking the rapid analysis of a pesticide mixture as an example, a chemometric approach was proposed for resolution of multicomponent overlapping GC–MS signals. In the method, a mass spectral library of pesticides was organized at first, then target factor analysis (TFA) was employed for testing the existence of a specific pesticide in the multicomponent overlapping GC–MS signal, and finally the chromatographic information of the pesticide was extracted by a non-negative immune algorithm (IA). A GC–MS signal of a 40-component pesticide mixture eluted within 9 min was analyzed by the method. It was found that the mass spectra and chromatographic profiles of almost all the pesticides can be obtained.
Co-reporter:JingJing Liu;WenSheng Cai
Science China Chemistry 2011 Volume 54( Issue 5) pp:802-811
Publication Date(Web):2011 May
DOI:10.1007/s11426-011-4263-5
The classification of cancer is a major research topic in bioinformatics. The nature of high dimensionality and small size associated with gene expression data, however, makes the classification quite challenging. Although principal component analysis (PCA) is of particular interest for the high-dimensional data, it may overemphasize some aspects and ignore some other important information contained in the richly complex data, because it displays only the difference in the first two- or three-dimensional PC subspaces. Based on PCA, a principal component accumulation (PCAcc) method was proposed. It employs the information contained in multiple PC subspaces and improves the class separability of cancers. The effectiveness of the present method was evaluated by four commonly used gene expression datasets, and the results show that the method performs well for cancer classification.
Co-reporter:Xueguang Shao, Xihui Bian, Wensheng Cai
Analytica Chimica Acta 2010 Volume 666(1–2) pp:32-37
Publication Date(Web):7 May 2010
DOI:10.1016/j.aca.2010.03.036
Boosting partial least squares (PLS) has been used for regression to improve the predictive accuracy of PLS models. However, there are still problems when outliers exist in the calibration dataset. To make the method robust and enhance its prediction ability, an improved boosting PLS is proposed and applied in quantitative analysis of near-infrared (NIR) spectral datasets. In the method, a robust step is added to weaken the effect of the outliers on the model. In addition, a loss function defined with relative errors is suggested for updating the sampling weight during the boosting procedure. Furthermore, the ensemble prediction by the weighted mean of the models in the boosting series is found to be more effective than the commonly used weighted median. The performance of the improved method is tested with two large NIR datasets of industrial production. The method was found to have a marked superiority in robustness and prediction ability, particularly when outliers exist.
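The two ways of combining a boosting series that this abstract compares can be illustrated with a minimal Python sketch. It assumes each member model has already produced a prediction and a weight; the function names and numbers are hypothetical and are not taken from the paper.

```python
def weighted_mean(preds, weights):
    """Weighted mean of the member-model predictions."""
    total = sum(weights)
    return sum(p * w for p, w in zip(preds, weights)) / total


def weighted_median(preds, weights):
    """Smallest prediction whose cumulative weight reaches half the total."""
    pairs = sorted(zip(preds, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for pred, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return pred
    return pairs[-1][0]


# Illustrative values only (not data from the paper).
predictions = [1.0, 2.0, 10.0]
weights = [0.2, 0.5, 0.3]
mean_estimate = weighted_mean(predictions, weights)
median_estimate = weighted_median(predictions, weights)
```

The weighted median ignores how far the extreme prediction lies from the rest, whereas the weighted mean uses every member in proportion to its weight. Which behaves better depends on how the weights are set during boosting.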
Co-reporter:Xihui Bian, Wensheng Cai, Xueguang Shao, Da Chen and Edward R. Grant
Analyst 2010 vol. 135(Issue 11) pp:2841-2847
Publication Date(Web):10 Sep 2010
DOI:10.1039/C0AN00345J
The detection of influential observations is an essential step for building high performance models and has been recognized as an important and challenging task in many industrial and laboratory applications. A new approach for detecting influential observations is developed based on their effect on partial least squares (PLS) modeling. In this method, we build a large number of PLS models by using Monte Carlo cross-validation (MCCV), and then perform principal component analysis (PCA) on the regression coefficients of these models. Because a model with influential observations is different from one without influential observations, the series of PLS models cluster into different groups in principal component (PC) spaces, based on the different number of influential observations they contain. The influential observations can therefore be recognized according to the frequency of each sample in each group. Three examples of quantitative modeling of near-infrared (NIR) and Raman spectra showed that the method can detect the influential observations intuitively and accurately.
Co-reporter:Xueguang Shao, Xihui Bian, Jingjing Liu, Min Zhang and Wensheng Cai
Analytical Methods 2010 vol. 2(Issue 11) pp:1662-1666
Publication Date(Web):04 Oct 2010
DOI:10.1039/C0AY00421A
Near infrared (NIR) spectroscopy has been demonstrated as a powerful technique for both qualitative and quantitative analysis of complex systems in various fields. Calibration, however, is one of the important techniques needed to ensure the quality and practicability of the analyses. In this mini-review, recent developments in multivariate calibration methods for NIR spectroscopic analysis, including non-linear approaches and ensemble techniques, are briefly summarized. The advantages and disadvantages of these methods are compared and discussed critically.
Co-reporter:Heng Xu, Wensheng Cai and Xueguang Shao
Analytical Methods 2010 vol. 2(Issue 3) pp:289-294
Publication Date(Web):15 Jan 2010
DOI:10.1039/B9AY00257J
A weighted partial least squares (PLS) regression method for multivariate calibration of near infrared (NIR) spectra is proposed. In the method, the spectra are split into groups of variables according to the statistic values of variables, i.e., the stability, which has been used to evaluate the importance of variables in a calibration model. Because the stability reflects the relative importance of the variables for modeling, these groups present different spectral information for construction of PLS models. Therefore, if a weight which is proportional to the stability is assigned to each sub-model built with different group variables, a combined model can be built by a weighted combination of the sub-models. This method is different from the commonly used variable selection strategies, making full use of the variables according to their importance, instead of only the important ones. To validate the performance of the proposed method, it was applied to two different NIR spectral data sets. Results show that the proposed method can effectively utilize all variables in the spectra and enhance the prediction ability of the PLS model.
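The weighting idea can be sketched in a few lines of Python. The sketch assumes the stability of a variable is the absolute mean of its regression coefficients over a set of resampled models divided by their standard deviation. This is a common definition, but the abstract does not state the exact formula, so treat it as an assumption. All names and numbers are hypothetical.

```python
from statistics import mean, stdev


def stability(coefficients):
    """Assumed definition: |mean| / standard deviation of one variable's
    regression coefficients across resampled models."""
    return abs(mean(coefficients)) / stdev(coefficients)


def combine(sub_predictions, group_stabilities):
    """Weighted sum of sub-model predictions, weights proportional to stability."""
    total = sum(group_stabilities)
    weights = [s / total for s in group_stabilities]
    return sum(w * p for w, p in zip(weights, sub_predictions))


# Hypothetical coefficients of two variables over four resampled models.
coef_a = [1.0, 1.2, 0.8, 1.0]    # consistent sign and size: high stability
coef_b = [0.5, -0.5, 0.5, -0.5]  # sign flips: stability of zero
s_a = stability(coef_a)
s_b = stability(coef_b)

# Two sub-models predicting 2.0 and 4.0, with group stabilities 3.0 and 1.0.
combined = combine([2.0, 4.0], [3.0, 1.0])
```

Unlike variable selection, no group is discarded here. A low-stability group still contributes, only with a smaller weight.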
Co-reporter:Xueguang Shao;Nan Chen;Wensheng Cai
Chinese Journal of Chemistry 2010 Volume 28( Issue 10) pp:2009-2014
Publication Date(Web):
DOI:10.1002/cjoc.201090335
A method for quantitative determination of a metal element in aqueous solution was developed by using adsorption and diffuse reflectance near-infrared spectroscopy (DRNIRS). In this method, the analyte is first adsorbed onto the resin from the dilute solution, and then the adsorbed analyte is directly determined in the sorbent by using DRNIRS. Enrichment of the analyte is achieved by the adsorption from the dilute solution, and quantitative determination is accomplished by using a multivariate calibration technique. Taking chromium(VI) in river water as the analytical target, the adsorption conditions and the partial least squares (PLS) model were optimized. The results show that chromium(VI) can be immobilized onto the adsorbent and quantitatively measured by DRNIRS and multivariate calibration. With cross validation and external validation, the correlation coefficient between the reference and predicted concentration was found to be above 0.98 in the range of 0.75–29.90 mg·L−1 for the PLS model, and the interference of the coexisting matrix was eliminated with the aid of multivariate calibration.
Co-reporter:Xueguang Shao, Xia Wu and Wensheng Cai
The Journal of Physical Chemistry A 2010 Volume 114(Issue 1) pp:29-36
Publication Date(Web):December 16, 2009
DOI:10.1021/jp906922v
The growth sequence of aluminum clusters containing up to 310 atoms was studied. The interaction of aluminum atoms is modeled by the NP-B potential fitted to highly accurate electronic structure data for aluminum clusters and nanoparticles. The putative global minimum structures of Al2−310 clusters are obtained by the dynamic lattice searching (DLS) and DLS with constructed cores (DLSc) methods. Lower energy structures of Al63 and Al64 were found in comparison with the previously reported Al2−65 clusters. In the optimized structures of Al63−310, all clusters are identified as truncated octahedra (TO) except for five decahedral structures at Al64, Al72, Al74, Al76, and Al101, four stacking fault face-centered cubic structures at Al91, Al99, Al129, and Al135, and one icosahedron at Al147. Therefore, the structural transition from small clusters to bulk metal may occur around Al65. At the same time, the results show that aluminum clusters adopt the TO growth pattern, and the growth is found to be based on six complete TO at Al38, Al79, Al116, Al140, Al201, and Al260.
Co-reporter:Xueguang Shao, Xia Wu, and Wensheng Cai
The Journal of Physical Chemistry A 2010 Volume 114(Issue 49) pp:12813-12818
Publication Date(Web):November 19, 2010
DOI:10.1021/jp106339f
Configuration of the surface atoms in aluminum clusters was investigated based on the structures with global minimum potential energy of some Al clusters in the size range of 270−500. The structures were optimized by the dynamic lattice searching with constructed cores (DLSc) method with the NP-B potential. In the optimized structures, all clusters are identified as truncated octahedra (TO) including three complete TO at Al260, Al314, and Al405. With the model of TO260 and TO405, the configurations of the surface atoms in the structures of the clusters from 261 to 314 and from 406 to 459 were investigated. The sites on (100) faces are found to be preferable to those on (111) faces for locating the new atoms with the increase of the cluster size, but for the clusters larger than 405 atoms, the sites on the (111) face are favored when the number of atoms exceeds the site number of a (100) face. Furthermore, the sites on the edge adjoining a (100) face and a (111) face are found to be very important to make a cluster more stable.
Co-reporter:Xueguang Shao, Jun Kang, Wensheng Cai
Talanta 2010 Volume 82(Issue 3) pp:1017-1021
Publication Date(Web):15 August 2010
DOI:10.1016/j.talanta.2010.06.009
Near-infrared (NIR) spectra are sensitive to the variation of experimental conditions, such as temperature. In this work, the relationship between NIR absorption spectra and temperature was quantitatively analyzed and applied to the quantitative determination of the compositions in mixtures. It was found that, for the solvents such as water and ethanol, a quantitative spectra–temperature relationship (QSTR) model between NIR spectra and temperature can be established by using partial least squares (PLS) regression. Therefore, the temperature of a solution can be predicted by using the model and NIR spectrum. Furthermore, it was also found that the difference between the predicted results of different solutions is a quantitative reflection of concentration. The variation of intercept in the relationship of the predicted and measured temperature can be used to determine the concentration of the compositions. The mixtures of water, methanol, ethanol and ethylenediamine in a concentration range of 5–80% (v/v) were studied. The calibration curves are found to be reliable with the correlation coefficients (R) higher than 0.99. Both the QSTR and calibration model may extend the application of NIR spectroscopy and provide novel techniques for analytical chemistry.
Co-reporter:Xueguang Shao, Zhichao Liu and Wensheng Cai
Analyst 2009 vol. 134(Issue 10) pp:2095-2099
Publication Date(Web):31 Jul 2009
DOI:10.1039/B902664A
Extraction of chemical information of the components from a complex analytical signal has been a great challenge in chemometric studies for complex sample analysis. Independent component analysis (ICA) has been widely applied in complex signal separation, including the multicomponent overlapping signals in analytical chemistry. Difficulties, however, still exist in the application of ICA in chemical signal processing because chemical signals of different components are generally correlated and non-negative, rather than independent as assumed in ICA. In this study, a non-negative ICA method is proposed by means of a post rotation of the independent components (ICs) and applied to the extraction of the chemical information of the components from the signals of complex samples. Raman spectra of pharmaceutical tablets and gas chromatography-mass spectrometry (GC-MS) data of cigarette smoke are qualitatively analyzed. The results show that the Raman spectrum of the active substance in the pharmaceutical tablets and the mass spectra of the components in the overlapping GC-MS signal can be effectively and accurately extracted by using the proposed method.
Co-reporter:Zhichao Liu, Wensheng Cai and Xueguang Shao
Analyst 2009 vol. 134(Issue 2) pp:261-266
Publication Date(Web):23 Oct 2008
DOI:10.1039/B810623A
A weighted multiscale regression for building a combined model in multivariate calibration of near infrared spectra is proposed. In the approach, the spectra are decomposed into different scale blocks (or frequency components) by wavelet transform (WT) at first, then partial least squares (PLS) models are built with the decomposed components, and at last a combined model is built by a weighted averaging. The weight of each model is determined by the prediction residual error sum of squares (PRESS) value obtained with Monte Carlo cross validation (MCCV). The underlying philosophy of the strategy is that useful information may be embedded in all the components obtained by WT, although the higher and lower frequency components mainly represent noise and background, respectively. To validate the effectiveness and universality of the proposed method, it was applied to two different sets of near-infrared (NIR) spectra of tobacco lamina. Compared with the results obtained with commonly used PLS methods, the proposed method is proved to be a high-performance tool for multivariate calibration of complex NIR spectra.
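The final combination step can be sketched as follows. The abstract says the weights come from PRESS values but does not give the formula, so this sketch assumes normalized inverse-PRESS weights. The wavelet decomposition and the PLS fitting are omitted, and the numbers are illustrative only.

```python
def press(y_true, y_pred):
    """Prediction residual error sum of squares for one sub-model."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def inverse_press_weights(press_values):
    """Normalized weights, larger for sub-models with smaller PRESS (assumed rule)."""
    inverse = [1.0 / v for v in press_values]
    total = sum(inverse)
    return [v / total for v in inverse]


def combined_prediction(sub_predictions, weights):
    """Weighted average of the sub-model predictions for one sample."""
    return sum(w * p for w, p in zip(weights, sub_predictions))


# Illustrative numbers only: two scale blocks, three validation samples.
reference = [1.0, 2.0, 3.0]
block_predictions = [[1.0, 2.0, 4.0], [3.0, 2.0, 3.0]]
press_values = [press(reference, p) for p in block_predictions]
weights = inverse_press_weights(press_values)

# Combine the two blocks' predictions (4.0 and 3.0) for a new sample.
new_sample = combined_prediction([4.0, 3.0], weights)
```

A sub-model with a PRESS of exactly zero would need special handling under this rule, since its inverse is undefined.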
Co-reporter:Zhichao Liu, Wensheng Cai, Xueguang Shao
Journal of Chromatography A 2009 Volume 1216(Issue 9) pp:1469-1475
Publication Date(Web):27 February 2009
DOI:10.1016/j.chroma.2008.12.098
Hyphenated techniques such as gas chromatography–mass spectrometry (GC–MS) or high-performance liquid chromatography–mass spectrometry (LC–MS) produce a large amount of data in the form of a two-way data matrix. It has been a great challenge to extract as much useful information as possible from the data. In this work, a chemometric approach based on a modification of the adaptive immune algorithm (AIA) was proposed for high-throughput analysis of multicomponent overlapping GC–MS signals. With the proposed method, the chromatographic profile of each component in an overlapping signal can be extracted independently and sequentially along the retention time. In order to show the efficiency of the method, simulated GC–MS data of six components with background and experimental GC–MS data of 40 pesticides were investigated. It was found that the multicomponent overlapping GC–MS signals could be resolved quickly and accurately. Furthermore, the quantitative property of the extracted information was also investigated. The correlation coefficients (r) between the peak area and the added volumes of the sample are in the range 0.9658–0.9953.
Co-reporter:Xueguang SHAO;Da CHEN;Heng XU;Zhichao LIU;Wensheng CAI
Chinese Journal of Chemistry 2009 Volume 27( Issue 7) pp:1328-1332
Publication Date(Web):
DOI:10.1002/cjoc.200990222
Partial least-squares (PLS) regression has been presented as a powerful tool for spectral quantitative measurement. However, the improvement of the robustness and stability of PLS models is still needed, because it is difficult to build a stable model when complex samples are analyzed or outliers are contained in the calibration data set. To achieve the purpose, a robust ensemble PLS technique based on probability resampling was proposed, which is named RE-PLS. In the proposed method, a probability is firstly obtained for each calibration sample from its residual in a robust regression. Then, multiple PLS models are constructed based on probability resampling. At last, the multiple PLS models are used to predict unknown samples by taking the average of the predictions from the multiple models as final prediction result. To validate the effectiveness and universality of the proposed method, it was applied to two different sets of NIR spectra. The results show that RE-PLS can not only effectively avoid the interference of outliers but also enhance the precision of prediction and the stability of PLS regression. Thus, it may provide a useful tool for multivariate calibration with multiple outliers.
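The three steps (probabilities from robust residuals, probability resampling, averaged predictions) can be sketched in plain Python. Two substitutions are made to keep the sketch self-contained and should be read as assumptions: a bisquare-style rule maps residuals to probabilities, because the abstract does not give the mapping, and ordinary least squares on one variable stands in for PLS. The data are illustrative only.

```python
import random
from statistics import mean, median


def sampling_probabilities(residuals):
    """Turn residuals into sampling probabilities (assumed bisquare-style rule)."""
    abs_res = [abs(r) for r in residuals]
    scale = (median(abs_res) / 0.6745) or 1.0  # MAD-based scale; fall back to 1
    raw = []
    for r in abs_res:
        u = r / (4.685 * scale)
        raw.append((1.0 - u * u) ** 2 if u < 1.0 else 0.0)
    total = sum(raw)
    return [w / total for w in raw]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for a PLS model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:  # degenerate resample: predict the mean
        return my, 0.0
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def ensemble_predict(xs, ys, probs, x_new, n_models=50, n_draws=20, seed=0):
    """Average the predictions of models built on probability-resampled data."""
    rng = random.Random(seed)
    indices = range(len(xs))
    predictions = []
    for _ in range(n_models):
        chosen = rng.choices(indices, weights=probs, k=n_draws)
        a, b = fit_line([xs[i] for i in chosen], [ys[i] for i in chosen])
        predictions.append(a + b * x_new)
    return mean(predictions)


# Toy data: y = 2x with one gross outlier at index 3 (illustrative only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 30.0, 10.0]
residuals = [0.1, -0.2, 0.1, 5.0, -0.1]  # pretend output of a robust fit
probs = sampling_probabilities(residuals)
prediction = ensemble_predict(x, y, probs, x_new=6.0)
```

Because the outlier receives a sampling probability of zero under this rule, it never enters a member model, so the averaged prediction follows the trend of the remaining samples.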
Co-reporter:Xia Wu, Wensheng Cai, Xueguang Shao
Chemical Physics 2009 Volume 363(1–3) pp:72-77
Publication Date(Web):18 September 2009
DOI:10.1016/j.chemphys.2009.08.001
Global optimization of large clusters has been a difficult task, though much effort has been made and many efficient methods have been proposed. In our work, a rotation operation (RO) is designed to realize the structural transformation from decahedra to icosahedra for the optimization of large clusters, by rotating the atoms below the center atom by a definite angle around the fivefold axis. Based on the RO, a development of the previous dynamic lattice searching with constructed core (DLSc), named DLSc–RO, is presented. With an investigation of the method for the optimization of Lennard-Jones (LJ) clusters, i.e., LJ500, LJ561, LJ600, LJ665–667, LJ670, LJ685, and LJ923, Morse clusters, silver clusters by Gupta potential, and aluminum clusters by NP-B potential, it was found that the global minima with both icosahedral and decahedral motifs can be obtained, and the method is proved to be efficient and universal.
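The geometric core of the rotation operation can be sketched as follows. The sketch assumes the fivefold axis is the z axis and the center atom sits at the origin. The default angle of 36 degrees (half of the 72 degree fivefold step) is an assumption, since the abstract does not state the value. The function name is hypothetical.

```python
import math


def rotate_lower_half(atoms, angle_deg=36.0):
    """Rotate atoms with z < 0 about the z axis; other atoms are left unchanged."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    rotated = []
    for x, y, z in atoms:
        if z < 0.0:
            rotated.append((c * x - s * y, s * x + c * y, z))
        else:
            rotated.append((x, y, z))
    return rotated


# Three illustrative atoms: the center, one above it and one below it.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (1.0, 0.0, -1.0)]
quarter_turn = rotate_lower_half(atoms, 90.0)
default_turn = rotate_lower_half(atoms)
```

In a full optimization the rotated structure would then be relaxed by local minimization. That step is outside this sketch.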
Co-reporter:Yong Hao, Wensheng Cai, Xueguang Shao
Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 2009 Volume 72(Issue 1) pp:115-119
Publication Date(Web):February 2009
DOI:10.1016/j.saa.2008.08.011
Near-infrared diffuse reflectance spectroscopy (NIRDRS) has been proved to be a convenient and fast quantitative method for complex samples. The high detection limit or the low sensitivity of the method, however, is a big problem obstructing its application in the analysis of low concentration samples. A strategy for quantitative determination of low concentration samples was developed by using NIRDRS. The method takes an adsorbent as a substrate for gathering the analytes from a solution, and uses the multivariate calibration technique for quantitative calculation. Thus, the detection limit can be improved and the interferences can be eliminated when complex samples are analyzed. Taking benzoic and sorbic acids as the analytical targets and alumina as the adsorbent, a partial least squares (PLS) model is built from the NIRDRS of the adsorbates. The results show that the concentrations that can be quantitatively detected are as low as 0.011 and 0.013 mg mL−1 for benzoic and sorbic acids, respectively, and the co-adsorbates do not interfere with each other.
Co-reporter:Nan Sheng, Wensheng Cai, Xueguang Shao
Talanta 2009 Volume 79(Issue 2) pp:339-343
Publication Date(Web):15 July 2009
DOI:10.1016/j.talanta.2009.03.059
Near-infrared spectroscopy (NIRS) has been proved to be a powerful analytical tool and is used in various fields. It is seldom used, however, in the analysis of metal ions in solutions. A method for quantitative determination of metal ions in solution is developed by using resin adsorption and near-infrared diffuse reflectance spectroscopy (NIRDRS). The method makes use of the resin adsorption for gathering the analytes from a dilute solution, and then the NIRDRS of the adsorbate is measured. Because both the information of the metal ions and their interaction with the functional group of the resin can be reflected in the spectrum, quantitative determination is achieved by using a multivariate calibration technique. Taking copper (Cu2+), cobalt (Co2+) and nickel (Ni2+) as the analytical targets and D401 resin as the adsorbent, a partial least squares (PLS) model is built from the NIRDRS of the adsorbates. The results show that the concentrations that can be quantitatively detected are as low as 1.00, 1.98 and 1.00 mg L−1 for Cu2+, Co2+ and Ni2+, respectively, and the coexistent ions do not influence the determination.
Co-reporter:Xueguang Shao;Xia Wu;Wensheng Cai
Frontiers of Chemistry in China 2009 Volume 4( Issue 4) pp:
Publication Date(Web):2009 December
DOI:10.1007/s11458-009-0104-x
Global optimization of clusters is a subject of intense interest in computational chemistry. Especially for large clusters, locating the global minima is a challenging problem. Two strategies are generally used for the problem, i.e., the stochastic optimization and the static modeling strategy. The former is known as unbiased global optimization method, while the latter is more efficient but biased. This review describes the development of a dynamic lattice searching (DLS) approach. In DLS, the lattices are constructed dynamically and optimization is achieved by searching these lattices. Therefore, DLS possesses the characteristics of both the stochastic and static methods. With the aim of improving the efficiency of DLS for optimization of large clusters, several variants of the method have been developed. The results show that DLS methods may be promising tools for fast modeling of large clusters. With this review, greater interests are expected for global optimization of atomic or molecular clusters.
Co-reporter:Da Chen, Wensheng Cai, Xueguang Shao
Vibrational Spectroscopy 2008 Volume 47(Issue 2) pp:113-118
Publication Date(Web):17 July 2008
DOI:10.1016/j.vibspec.2008.03.002
Near-infrared (NIR) spectroscopy will present a more promising tool for quantitative measurement if the reliability of the calibration model is further improved. To achieve this purpose, a new partial least squares (PLS) technique based on Monte Carlo (MC) resampling is proposed, which is named MCPLS. In this method, the outliers are first removed based on probability statistics. Then, the models without outliers are averaged and combined into a single prediction model as done in consensus modeling, which can greatly enhance the reliability of PLS calibration. To validate the effectiveness and universality of the proposed method, it was applied to two different sets of NIR spectra. It was found that MCPLS could effectively avoid the swamping and masking effects caused by multiple outliers. The results show that the method is of value for enhancing the reliability of PLS models involving complex NIR matrices with a small number of outliers.
Co-reporter:Qian Liu, Wensheng Cai, Xueguang Shao
Talanta 2008 Volume 77(Issue 2) pp:679-683
Publication Date(Web):15 December 2008
DOI:10.1016/j.talanta.2008.07.011
A method for determination of seven polyphenols (chlorogenic acid, esculetin, caffeic acid, scopoletin, rutin, quercetin hydrate, kaempferol) by reversed-phase high performance liquid chromatography (RP-HPLC) combined with preconcentration was developed. The preconcentration was accomplished by an adsorption–desorption method with a styrene-divinylbenzene resin (XAD-4), and the analytes were desorbed by methanol. The parameters of adsorption and desorption, such as the amount of resin, adsorption time, pH of the adsorption solution, and the volume of methanol for desorption, were optimized. RP-HPLC with a photodiode array detector (PAD) was employed for the qualitative and quantitative analysis. Methanol and acetic acid–water (1:99, v/v) solution were used as the mobile phase, and a gradient program was established for separation. Calibration curves of the seven analytes were obtained in the range of 0.8–3 mg L−1, with correlation coefficients (R) higher than 0.9990. With standard samples, the recoveries for the preconcentration step under optimal conditions were 93–99%, and the relative standard deviations were 0.2–2.0% (n = 5). Polyphenols in simulated tobacco-polluted water were analyzed with the optimized conditions. Chlorogenic acid and rutin were found and determined, with concentrations of 32.8 and 19.2 μg L−1, respectively. The spiked recoveries of the polyphenols were 83–95% except for quercetin hydrate (63%), and the relative standard deviations were less than 3.5% (n = 5).
Co-reporter:ZhiChao Liu;WenSheng Cai
Science China Chemistry 2008 Volume 51( Issue 8) pp:
Publication Date(Web):2008 August
DOI:10.1007/s11426-008-0080-x
An outlier detection method is proposed for near-infrared spectral analysis. The underlying philosophy of the method is that, in random test (Monte Carlo) cross-validation, the probability of outliers appearing in good models with smaller prediction residual error sum of squares (PRESS) or in bad models with larger PRESS should be clearly different from that of normal samples. The method first builds a large number of PLS models by using random test cross-validation, then the models are sorted by PRESS, and finally the outliers are recognized according to the accumulative probability of each sample in the sorted models. For validation of the proposed method, four data sets, including three published data sets and a large data set of tobacco lamina, were investigated. The proposed method was proved to be highly efficient and accurate compared with the conventional leave-one-out (LOO) cross validation method.
Co-reporter:Da Chen, Wensheng Cai, Xueguang Shao
Analytica Chimica Acta 2007 Volume 598(Issue 1) pp:19-26
Publication Date(Web):15 August 2007
DOI:10.1016/j.aca.2007.07.023
A strategy named removing uncertain variables based on ensemble partial least squares (RUV-EPLS) was proposed. In this strategy, the uncertainty in PLS regression coefficients is evaluated by the criterion of stability, and the variables whose regression coefficients carry a relatively large uncertainty are eliminated. Then, a new EPLS model with the remaining variables is constructed. To reasonably control the quality of the PLS member models in the RUV-EPLS, an objective criterion based on the F-test is used, which makes the RUV-EPLS convenient to perform in practice. To validate the effectiveness and universality of the strategy, it was applied to two different sets of near-infrared (NIR) spectra. Notably, RUV-EPLS was found to be less sensitive to outliers than many other calibration methods, and the selected variables are indeed known to be informative for the corresponding compounds, which results in a reliable and high-quality calibration model. The study reveals that the RUV-EPLS method is of value for improving the stability and predictive ability of multivariate calibration involving complex matrices that may contain a small number of outliers.
Co-reporter:Wei Wang;WenSheng Cai
Science China Chemistry 2007 Volume 50( Issue 4) pp:530-537
Publication Date(Web):2007 August
DOI:10.1007/s11426-007-0065-1
Independent component analysis (ICA) has demonstrated its power to extract mass spectra from overlapping GC/MS signals. However, there is still a problem that mass spectra with negative peaks at some m/z will be obtained in the resolved results when there are overlapping peaks in the mass spectra of a mixture. Based on a detailed theoretical analysis of the preconditions for ICA and the non-negative property of GC/MS signals, a post-modification based on chemical knowledge (PMBK) strategy is proposed to solve this problem. With both simulated and experimental GC/MS signals, it was shown that the PMBK strategy can improve the resolution effectively.
Co-reporter:Longjiu Cheng;Wensheng Cai
ChemPhysChem 2007 Volume 8(Issue 4) pp:569-577
Publication Date(Web):7 FEB 2007
DOI:10.1002/cphc.200600604
A newly developed unbiased structural optimization method, named dynamic lattice searching (DLS), is proposed as an approach for conformational analysis of atomic/molecular clusters and used in understanding the energy landscape of large clusters. The structures of clusters are described in terms of the number of basic tetrahedron (BT) units they contain. We found that the hit number of each structural motif in DLS runs is proportional to the number of BTs. A parameter Tmax is defined to limit the maximal number of atoms moved in a structural transition. Results show that Tmax is a key parameter for modulating the efficiency of the DLS method and has a great influence on the hit number of different motifs in DLS runs. Finally, the effect of potential range on the conformational distribution of the (Morse)98 cluster is also discussed with different potential-range parameters.
Co-reporter:Jinchao Shen, Xueguang Shao
Analytica Chimica Acta 2006 Volume 561(1–2) pp:83-87
Publication Date(Web):2 March 2006
DOI:10.1016/j.aca.2006.01.002
The feasibility of employing cloud point extraction (CPE) as a simple and effective alternative for recovery of alkaloids from tobacco samples followed by GC–MS analysis is demonstrated. An aqueous surfactant solution containing 5% Triton X114 was used for extraction of tobacco alkaloids. Then, the analytes were back-extracted with ultrasonic assistance from the surfactant-rich phase into dichloromethane. No other cleanup step preceded GC–MS analysis. Seven alkaloids in tobacco sample with different concentrations were analyzed simultaneously. Results show that the recovery of nicotine is 80.4% and the limit of detection (LOD) is 7.1 μg g−1. The relative standard deviations for the seven alkaloids are in the range of 2.77–9.97%. The results yielded by the proposed method were almost identical with those achieved by the more laborious industrial standard method, i.e., the continuous flow method.
Co-reporter:Huan Zhan, Longjiu Cheng, Wensheng Cai, Xueguang Shao
Chemical Physics Letters 2006 Volume 422(4–6) pp:358-362
Publication Date(Web):10 May 2006
DOI:10.1016/j.cplett.2006.02.084
Structures of silver clusters from Ag61 to Ag120 were optimized with an unbiased global optimization algorithm named dynamic lattice searching (DLS). The interaction among silver atoms is modeled by the Gupta potential. New global minima of Ag79 and Ag80 were found. The results show that there are two magic number clusters, Ag75 and Ag101, from Ag61 to Ag120. Most of the clusters in the studied sizes have decahedral motifs; however, there are 9 clusters with a non-decahedral pattern between the two neighboring magic numbers. These results might help us understand the growth rules of medium sized silver clusters.
Co-reporter:Xueguang Shao, Wei Wang, Zhenyu Hou, Wensheng Cai
Talanta 2006 Volume 69(Issue 3) pp:676-680
Publication Date(Web):15 May 2006
DOI:10.1016/j.talanta.2005.10.039
Based on independent component analysis (ICA), a new regression method, independent component regression (ICR), was developed to build the model of NIR spectra and the routine components of plant samples. It is found that ICR and principal component regression (PCR) are completely equivalent when they are applied in quantitative prediction. However, independent components (ICs) lend themselves to more chemical interpretation than principal components (PCs) because independence is a high-order statistic that is a much stronger condition than orthogonality. Three ICs are obtained by ICA from the NIR spectra of plant samples; it is found that they are strongly correlated to the NIR spectra of water, hydrocarbons and organonitrogen compounds, respectively. Therefore, ICA may be a promising tool to retrieve both quantitative and qualitative information from complex chemical data sets.
Co-reporter:Zhiyi Mao, Wensheng Cai, Xueguang Shao
Journal of Biomedical Informatics (August 2013) Volume 46(Issue 4) pp:594-601
Publication Date(Web):1 August 2013
DOI:10.1016/j.jbi.2013.03.009
•A new approach was developed to identify genes from gene expression data.
•A statistic is defined to evaluate the significance of the genes in the method.
•Informative genes are selected by the statistic for cancer classification.
•The method may provide an alternative for the gene selection problem.
Gene selection is an important task in bioinformatics studies, because the accuracy of cancer classification generally depends upon the genes that have biological relevance to the classifying problems. In this work, a randomization test (RT) is used as a gene selection method for dealing with gene expression data. In the method, a statistic derived from the statistics of the regression coefficients in a series of partial least squares discriminant analysis (PLSDA) models is used to evaluate the significance of the genes. Informative genes are selected for classifying the four gene expression datasets of prostate cancer, lung cancer, leukemia and non-small cell lung cancer (NSCLC), and the rationality of the results is validated by multiple linear regression (MLR) modeling and principal component analysis (PCA). With the selected genes, satisfactory results can be obtained.
Co-reporter:Xiaoyu Cui, Xiuwei Liu, Xiaoming Yu, Wensheng Cai, Xueguang Shao
Analytica Chimica Acta (8 March 2017) Volume 957() pp:
Publication Date(Web):8 March 2017
DOI:10.1016/j.aca.2017.01.004
•Water as a probe for glucose sensing in aqueous systems is studied.
•Water structural variations are captured by temperature dependent NIR spectra.
•Multilevel simultaneous component analysis is used to extract spectral change.
•Quantification of blood glucose in human serum samples was achieved.
Near infrared (NIR) spectra are sensitive to variations in water structure caused by perturbations such as temperature and additives. In this work, water was applied as a probe to detect glucose in aqueous glucose solutions and human serum samples. Spectral changes of water were captured from the temperature dependent NIR spectra using multilevel simultaneous component analysis (MSCA). The first- and second-level models were established to describe the quantitative spectra–temperature relationship (QSTR) and the quantitative spectra–concentration relationship (QSCR), i.e., the calibration curve, respectively. The scores of the first-level model show that the content of free OH in water molecules increases with temperature elevation. The correlation coefficients (R2) of the QSTR model between the score and temperature are higher than 0.99, and those of the calibration model (QSCR) between the spectral features of water clusters and the concentration of glucose are 0.99 and 0.84 for glucose solutions and serum samples, respectively. External validation of the calibration model was further performed with human serum samples. The standard error of the prediction is 0.45. In addition, the linearity of the QSCR models may reveal that glucose interacts with small water clusters and enhances the formation of the hydration shell. Therefore, using water as a probe may provide a new way for quantitative determination of the analytes in aqueous solutions by NIR spectroscopy.
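The two kinds of figures quoted above, a coefficient of determination for a linear relationship and an error of prediction for external validation, can be computed as in the following sketch. It assumes an ordinary least-squares straight line and uses the root-mean-square error as the prediction error; the abstract does not define its standard error of prediction, so that choice is an assumption. All data values and function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y):
    """Coefficient of determination of the fitted straight line."""
    slope, intercept = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def rmsep(reference, predicted):
    """Root-mean-square error of prediction on external samples."""
    n = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)


# Hypothetical first-level scores at four temperatures (degrees Celsius).
temperatures = [30.0, 40.0, 50.0, 60.0]
scores = [2.1, 3.9, 6.2, 7.8]

slope, intercept = fit_line(temperatures, scores)
r2 = r_squared(temperatures, scores)
```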
Co-reporter:Wensheng Cai, Yankun Li, Xueguang Shao
Chemometrics and Intelligent Laboratory Systems (15 February 2008) Volume 90(Issue 2) pp:
Publication Date(Web):15 February 2008
DOI:10.1016/j.chemolab.2007.10.001
Variable (or wavelength) selection plays an important role in the quantitative analysis of near-infrared (NIR) spectra. A modified method of uninformative variable elimination (UVE) was proposed for variable selection in NIR spectral modeling based on the principle of Monte Carlo (MC) and UVE. The method first builds a large number of models with randomly selected calibration samples, and then each variable is evaluated by the stability of its corresponding coefficients in these models. Variables with poor stability are regarded as uninformative and eliminated. The performance of the proposed method is compared with UVE-PLS and conventional PLS for modeling the NIR data sets of tobacco samples. Results show that the proposed method is able to select important wavelengths from the NIR spectra, and makes the prediction more robust and accurate in quantitative analysis. Furthermore, if wavelet compression is combined with the method, a more parsimonious and efficient model can be obtained.
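The procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each model is a PLS1 model with a single latent variable (a simplification chosen to keep the code short), stability is taken as the mean of a coefficient over the Monte Carlo runs divided by its standard deviation, and the data are synthetic. The function names, the number of runs, the sampling fraction and the choice to keep the two most stable variables are all hypothetical.

```python
import math
import random
from statistics import mean, stdev


def pls1_one_component(X, y):
    """Coefficients of a one-latent-variable PLS1 model on mean-centered data."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: normalized covariance between each variable and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores and the regression of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return [q * wj for wj in w]


def mc_uve_stability(X, y, n_runs=200, fraction=0.7, seed=0):
    """Stability (mean / standard deviation) of each coefficient over MC runs."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    k = max(2, int(fraction * n))
    runs = []
    for _ in range(n_runs):
        idx = rng.sample(range(n), k)  # random calibration subset
        runs.append(pls1_one_component([X[i] for i in idx], [y[i] for i in idx]))
    stability = []
    for j in range(m):
        coefs = [b[j] for b in runs]
        stability.append(mean(coefs) / stdev(coefs))
    return stability


# Synthetic data: variables 0 and 1 track the concentration, 2 to 4 are noise.
data_rng = random.Random(1)
X, y = [], []
for _ in range(40):
    c = data_rng.uniform(0.0, 1.0)
    informative = [c + data_rng.gauss(0, 0.02), 2 * c + data_rng.gauss(0, 0.02)]
    noise = [data_rng.gauss(0, 0.3) for _ in range(3)]
    X.append(informative + noise)
    y.append(c)

stability = mc_uve_stability(X, y)
# Rank variables by absolute stability and keep the two most stable ones.
ranked = sorted(range(len(stability)), key=lambda j: abs(stability[j]), reverse=True)
selected = sorted(ranked[:2])
```

In practice the number of retained variables would be tuned, for example by cross-validation, rather than fixed in advance as it is here.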
Co-reporter:Pao Li, Wensheng Cai and Xueguang Shao
Journal of Analytical Atomic Spectrometry 2015 - vol. 30(Issue 4) pp:NaN940-940
Publication Date(Web):2015/02/12
DOI:10.1039/C5JA00031A
Inductively coupled plasma optical emission spectrometry is a well-known technique for elemental analysis. However, in the analysis of trace elements in complex matrices, the signals are easily interfered with by those of other elements and by baseline drift. In this work, a chemometric approach was employed for the identification and quantitative analysis of trace elements in complex matrices. With the help of standard signals, the signals of trace elements can be obtained by a non-negative immune algorithm and used for the quantitative analysis. The method was validated by the analysis of trace calcium in rare earth matrices, trace bismuth, lead, and antimony in a tungsten matrix, and trace phosphorus in iron and copper matrices and in urban recycled water samples. The recoveries of the spiked samples were in the range of 96.1% to 108.3%.
Co-reporter:
Analytical Methods (2009-Present) 2012 - vol. 4(Issue 5) pp:
Publication Date(Web):
DOI:10.1039/C2AY25038A
Indirect modeling of trace components in real samples by use of near-infrared spectroscopy (NIRS) has gained much interest, because it may provide a rapid way of analyzing industrial or agricultural products. By coupling near-infrared diffuse reflectance spectroscopy with chemometric techniques, a method for rapid analysis of four tobacco specific N-nitrosamines (TSNAs) and their total content is studied in this work. For optimization of the models, techniques for spectral preprocessing and variable selection are adopted and compared. It is found that removing the varying background and correcting the multiplicative scattering effect in the spectra are important in modeling, and that variable selection can significantly improve the models. For validation of the models, the TSNA contents of independent test samples and tobacco leaves harvested in different years were predicted. Consistent results were obtained between the reference contents by GC/TEA analysis and those predicted. Although the relative errors for some low content samples are not satisfactory, the method is a practical alternative for industrial analysis because it is non-destructive and rapid.
Co-reporter:
Analytical Methods (2009-Present) 2014 - vol. 6(Issue 13) pp:
Publication Date(Web):
DOI:10.1039/C4AY00243A
The discrimination of pharmaceutical products has been an important task in the pharmaceutical industry and for pharmaceutical safety. In this study, the principal component accumulation (PCAcc) method was investigated for the discrimination of Chinese patent medicines. In the PCAcc method, an accumulation strategy is utilized to combine the classification information contained in multiple PC subspaces by using a rotation, a projection and a summation operation. To improve the performance of classification, continuous wavelet transform is applied as the pretreatment method to eliminate the background. The results show that, among the 12 classes of Chinese patent medicines, 8 classes are correctly classified, and a total of ten samples are misclassified for the other four classes. Compared with the results obtained by principal component analysis, radial basis function artificial neural network and partial least squares discriminant analysis, PCAcc produces the best classification.
Co-reporter:Yi Wang, Xiang Ma, Yadong Wen, Jingjing Liu, Wensheng Cai and Xueguang Shao
Analytical Methods (2009-Present) 2012 - vol. 4(Issue 9) pp:NaN2899-2899
Publication Date(Web):2012/06/13
DOI:10.1039/C2AY25508A
Discrimination is a universal problem in various fields. Near-infrared (NIR) spectroscopy combined with pattern recognition techniques has shown great power and gained wide acceptance for analyzing complex samples. A principal component accumulation (PCAcc) method was proposed in our previous work for the two-class problem of cancer classification based on microarray gene expression data. In this work, PCAcc is applied to the multiclass problem of plant sample classification based on NIR spectroscopy. A hierarchical scheme is adopted, as in decision tree methods, with each node dividing the samples into two classes. A resolution parameter (Rs) is used to quantitatively measure the difference between the two classes. To validate the performance of the proposed method, it is applied to the NIR spectral datasets of different parts of tobacco leaves and different brands of cigarettes. The results show that the method can discriminate these samples with very good classification accuracy.
Co-reporter:Yan Zhang, Yong Hao, Wensheng Cai and Xueguang Shao
Analytical Methods (2009-Present) 2011 - vol. 3(Issue 3) pp:
Publication Date(Web):
DOI:10.1039/C0AY00775G
Co-reporter:Xueguang Shao, Xihui Bian, Jingjing Liu, Min Zhang and Wensheng Cai
Analytical Methods (2009-Present) 2010 - vol. 2(Issue 11) pp:NaN1666-1666
Publication Date(Web):2010/10/04
DOI:10.1039/C0AY00421A
Near infrared (NIR) spectroscopy has been demonstrated as a powerful technique for both qualitative and quantitative analysis of complex systems in various fields. Calibration, however, is one of the important techniques needed to ensure the quality and practicability of the analyses. In this mini-review, recent developments in multivariate calibration methods for NIR spectroscopic analysis, including non-linear approaches and ensemble techniques, are briefly summarized. The advantages and disadvantages of these methods are compared and discussed critically.
Co-reporter:Heng Xu, Wensheng Cai and Xueguang Shao
Analytical Methods (2009-Present) 2010 - vol. 2(Issue 3) pp:NaN294-294
Publication Date(Web):2010/01/15
DOI:10.1039/B9AY00257J
A weighted partial least squares (PLS) regression method for multivariate calibration of near infrared (NIR) spectra is proposed. In the method, the spectra are split into groups of variables according to the statistic values of variables, i.e., the stability, which has been used to evaluate the importance of variables in a calibration model. Because the stability reflects the relative importance of the variables for modeling, these groups present different spectral information for construction of PLS models. Therefore, if a weight proportional to the stability is assigned to each sub-model built with a different group of variables, a combined model can be built by a weighted combination of the sub-models. This method differs from the commonly used variable selection strategies in that it makes full use of the variables according to their importance, instead of using only the important ones. To validate the performance of the proposed method, it was applied to two different NIR spectral data sets. Results show that the proposed method can effectively utilize all variables in the spectra and enhance the prediction ability of the PLS model.
Co-reporter:
Analytical Methods (2009-Present) 2012 - vol. 4(Issue 2) pp:
Publication Date(Web):
DOI:10.1039/C2AY05609G
Near-infrared (NIR) spectral analysis usually needs to take advantage of multivariate calibration. However, not all the variables in the spectra contribute equally to a calibration model. Identification of informative variables is a key step in building a high performance model. According to the influence of a variable on the calibration model, an influential variable (IV) is defined and a method for identification of IVs is proposed in this work. In the method, a set of partial least squares (PLS) models are built, each using a subset of variables selected randomly by Monte Carlo re-sampling, and then the clustering of these models is investigated by means of principal component analysis. The variables that cause the models to group can be identified as the IVs. Finally, the PLS model built with the selected IVs is adopted as the calibration model. Five NIR spectral datasets are used to test the performance and applicability of the method. The results show that the identified IVs are reasonable and the calibration model is efficient enough to produce accurate and reliable predictions.