Kimito Funatsu


Organization: The University of Tokyo
Department: Department of Chemical System Engineering
Co-reporter:H. Kaneko;K. Funatsu
Industrial & Engineering Chemistry Research October 9, 2013 Volume 52(Issue 40) pp:14505-14505
Publication Date(Web): September 26, 2013
DOI:10.1021/ie4030984
Co-reporter:Hiromasa Kaneko;Takeshi Okada
Industrial & Engineering Chemistry Research October 15, 2014 Volume 53(Issue 41) pp:15962-15968
Publication Date(Web):2017-2-22
DOI:10.1021/ie502058t
Soft sensors are widely used to realize efficient operations in chemical processes because some governing variables, such as product quality, cannot be measured directly through hardware in real time. One of the design problems of soft sensors is the degradation of their prediction accuracy. To reduce degradation, a range of adaptive models has been developed, such as moving window, just-in-time, and time difference models. However, none of these adaptive models performs well in all process states. To address this problem, we developed an online monitoring system using multivariate statistical process control to select the appropriate adaptive model for each process state. The proposed method was applied to dynamic simulation data and empirical industrial data, and it achieved higher predictive accuracy than traditional adaptive models. This novel approach may be used to reduce the maintenance cost of soft sensors.
Co-reporter:Yasuyuki Masuda;Hiromasa Kaneko
Industrial & Engineering Chemistry Research May 21, 2014 Volume 53(Issue 20) pp:8553-8564
Publication Date(Web):2017-2-22
DOI:10.1021/ie501024w
The development of process monitoring and control methods is important for operating chemical plants safely and effectively while maintaining product quality. Multivariate statistical process control (MSPC) methods have therefore been developed, but traditional MSPC methods cannot detect faults relating to process variables that are difficult to measure online. In this work, a new MSPC method that includes soft sensor prediction is proposed to solve this problem. Soft sensors predict values of difficult-to-measure variables, which are then used as input variables of fault detection models. The proposed method enables real-time control of processes using difficult-to-measure variables. The fault detection performance of the proposed method is demonstrated and compared with that of traditional MSPC methods using the Tennessee Eastman process and real industrial process data sets. The results show that the proposed method can achieve more accurate and earlier fault detection than traditional MSPC methods.
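For reference, the traditional PCA-based MSPC statistics that such a method builds on can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a PCA model with Hotelling's T² and Q (squared prediction error) statistics, simply appends a hypothetical soft-sensor prediction `y_hat` as an extra monitored column, and sets control limits from empirical 99th percentiles of the training data rather than the F- and chi-square-based limits often used in practice. All function names are illustrative.

```python
import numpy as np

def fit_pca_monitor(X_train, n_components):
    """Fit a PCA monitoring model on normal operating data (rows = samples)."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std                      # autoscale each variable
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                         # loadings (variables x components)
    score_var = s[:n_components] ** 2 / (Z.shape[0] - 1)  # variance of each score
    return {"mean": mean, "std": std, "P": P, "score_var": score_var}

def t2_and_q(model, X):
    """Hotelling's T^2 and Q (squared prediction error) for each row of X."""
    Z = (X - model["mean"]) / model["std"]
    T = Z @ model["P"]                              # PCA scores
    t2 = np.sum(T ** 2 / model["score_var"], axis=1)
    residual = Z - T @ model["P"].T                 # part not explained by the retained PCs
    q = np.sum(residual ** 2, axis=1)
    return t2, q

# Synthetic normal operating data: three correlated, easy-to-measure variables
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
X_normal = np.hstack([latent, 2 * latent, -latent]) + 0.1 * rng.normal(size=(500, 3))
# Hypothetical soft-sensor output appended as an extra monitored variable
y_hat = 0.5 * X_normal[:, 0] + 0.1 * rng.normal(size=500)
X_aug = np.column_stack([X_normal, y_hat])

model = fit_pca_monitor(X_aug, n_components=1)
t2_train, q_train = t2_and_q(model, X_aug)
t2_lim, q_lim = np.percentile(t2_train, 99), np.percentile(q_train, 99)

# A sample that breaks the usual correlation between the variables
x_fault = np.array([[1.0, -2.0, 1.0, 0.5]])
t2_f, q_f = t2_and_q(model, x_fault)
print(bool(t2_f[0] > t2_lim or q_f[0] > q_lim))  # True
```

Here every variable of the faulty sample lies within its individual normal range, but their correlation is broken, so the fault shows up mainly in Q.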
Co-reporter:Atsuyuki Nakao;Hiromasa Kaneko
Industrial & Engineering Chemistry Research May 18, 2016 Volume 55(Issue 19) pp:5726-5735
Publication Date(Web): April 27, 2016
DOI:10.1021/acs.iecr.6b00852
In materials design and development, experimenters must conduct experiments under various conditions until they achieve a required physical property, yield, cost, or other objective. Using a regression model based on existing experimental results is one suitable way to reduce the number of experiments required and the development costs. Although adaptive experimental design methods using regression models for sequential and parallel experiments have previously been developed, those methods are either sequential sampling methods or maximization and minimization methods, which are not always suitable for materials design. Therefore, we have developed an adaptive experimental design method for parallel experiments in materials design, which uses the probability that more than one experimental result in the next set of parallel experiments will fall within a target range of a property. We used Gaussian process regression to account for the correlation among the predicted property values of the multiple experiments, and the probability of achieving results within the required property range in the next set of parallel experiments is calculated on the basis of this correlation. Using five case studies, we demonstrated that the proposed method could select experimental conditions more efficiently than traditional methods without requiring any parameters to be set in advance.
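The key quantity, the probability that experiments in a parallel set land inside the target range, can be estimated from a joint predictive distribution. The sketch below is an illustration under stated assumptions, not the published algorithm: it takes a Gaussian-process predictive mean vector and covariance matrix as given, estimates the probability by Monte Carlo sampling, and exposes the number of required hits as a hypothetical `min_hits` parameter.

```python
import numpy as np

def prob_hits_in_range(mean, cov, low, high, min_hits=1, n_samples=100_000, seed=0):
    """Monte Carlo estimate of P(at least `min_hits` parallel experiments fall in [low, high]).

    mean, cov: joint predictive mean vector and covariance matrix of the property
    for the candidate experimental conditions (e.g. from a fitted Gaussian process).
    """
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mean, cov, size=n_samples)  # joint samples
    hits = ((draws >= low) & (draws <= high)).sum(axis=1)       # hits per parallel set
    return float(np.mean(hits >= min_hits))

mean = [0.0, 0.0]
independent = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.99], [0.99, 1.0]]
p_ind = prob_hits_in_range(mean, independent, -1.0, 1.0)
p_cor = prob_hits_in_range(mean, correlated, -1.0, 1.0)
print(round(p_ind, 2), round(p_cor, 2))  # roughly 0.9 and 0.7
```

The comparison shows why the correlation matters: two highly correlated candidates add little beyond a single candidate, so the probability for the set barely exceeds that of one experiment.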
Co-reporter:Shojiro Shibayama;Hiromasa Kaneko
AAPS PharmSciTech 2017 Volume 18( Issue 3) pp:595-604
Publication Date(Web):2017 April
DOI:10.1208/s12249-016-0547-6
This article proposes a novel concentration prediction model that requires little training data and is useful for rapid process understanding. Process analytical technology is currently popular, especially in the pharmaceutical industry, for enhancing process understanding and process control. A calibration-free method, iterative optimization technology (IOT), was proposed to predict pure component concentrations, because calibration methods, such as partial least squares, require a large number of training samples, leading to high costs. However, IOT cannot be applied to concentration prediction in non-ideal mixtures because its basic equation is derived from the Beer–Lambert law, which does not hold for non-ideal mixtures. We propose a novel method that predicts pure component concentrations in mixtures from a small number of training samples, assuming that spectral changes arising from molecular interactions can be expressed as a function of concentration. The proposed method is named IOT with virtual molecular interaction spectra (IOT-VIS) because it takes the spectral change into account as a virtual spectrum x_nonlin,i. Two case studies confirmed that IOT-VIS achieved higher predictive accuracy than the existing IOT methods.
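Under the ideal-mixture (Beer–Lambert) assumption behind IOT's basic equation, a mixture spectrum is a linear combination of pure-component spectra whose weights sum to one. The sketch below solves that closure-constrained least-squares problem directly through its KKT system; it illustrates the basic equation only, not the iterative IOT procedure or the IOT-VIS extension with virtual spectra. The function name and the Gaussian-shaped spectra are hypothetical, and nonnegativity of the fractions is not enforced.

```python
import numpy as np

def ideal_mixture_fractions(pure_spectra, mixture_spectrum):
    """Estimate fractions w minimising ||x - S^T w||^2 subject to sum(w) = 1.

    pure_spectra: (n_components, n_wavelengths), one pure spectrum per row (S)
    mixture_spectrum: (n_wavelengths,), the measured mixture spectrum (x)
    """
    S = np.asarray(pure_spectra, dtype=float)
    x = np.asarray(mixture_spectrum, dtype=float)
    k = S.shape[0]
    # KKT system: [[S S^T, 1], [1^T, 0]] [w; lambda] = [S x; 1]
    kkt = np.zeros((k + 1, k + 1))
    kkt[:k, :k] = S @ S.T
    kkt[:k, k] = 1.0
    kkt[k, :k] = 1.0
    rhs = np.append(S @ x, 1.0)
    solution = np.linalg.solve(kkt, rhs)
    return solution[:k]  # drop the Lagrange multiplier

wavelengths = np.linspace(0.0, 1.0, 200)
pure = np.vstack([
    np.exp(-((wavelengths - 0.3) / 0.05) ** 2),  # synthetic band of component 1
    np.exp(-((wavelengths - 0.6) / 0.08) ** 2),  # synthetic band of component 2
])
mixture = 0.3 * pure[0] + 0.7 * pure[1]          # ideal (Beer–Lambert) mixture
print(np.round(ideal_mixture_fractions(pure, mixture), 3))  # [0.3 0.7]
```

For a non-ideal mixture, molecular interactions add a residual that no combination of pure spectra can fit, which is the limitation the virtual interaction spectra are meant to address.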
Co-reporter:Tomoyuki Miyao; Hiromasa Kaneko
Journal of Chemical Information and Modeling 2016 Volume 56(Issue 2) pp:286-299
Publication Date(Web):January 28, 2016
DOI:10.1021/acs.jcim.5b00628
Retrieving descriptor information (x information) from a value of an objective variable (y) is a fundamental problem in inverse quantitative structure–property relationship (inverse-QSPR) analysis, but it is challenging because of the complexity of the preimage function. Here, we propose using a cluster-wise multiple linear regression (cMLR) model as the QSPR model for inverse-QSPR analysis. The x information is acquired as a probability density function by combining cMLR with a prior distribution modeled by a Gaussian mixture model (GMM). Three case studies were conducted to demonstrate various aspects of the potential of cMLR. The predictive power of cMLR was superior to that of MLR, especially for data with nonlinearity. Moreover, the applicability domain could be taken into account, since the posterior distribution inherits the features of the prior distribution (i.e., the features of the training data) and represents the possibility of having the desired property. Finally, a series of inverse analyses with GMMs/cMLR was demonstrated with the aim of generating de novo structures with a specific aqueous solubility.
Co-reporter:Shunichi Takeda, Hiromasa Kaneko, and Kimito Funatsu
Journal of Chemical Information and Modeling 2016 Volume 56(Issue 10) pp:1885-1893
Publication Date(Web):September 15, 2016
DOI:10.1021/acs.jcim.6b00038
To discover drug compounds in a chemical space containing an enormous number of compounds, a structure generator is required to produce virtual drug-like chemical structures. The de novo design algorithm for exploring chemical space (DAECS) visualizes the activity distribution on a two-dimensional plane corresponding to chemical space and generates structures in a target area on the plane selected by the user. In this study, we modify DAECS in two ways. First, the drug-likeness distribution is visualized together with the activity distribution, so that the user can select a target area that also considers properties other than activity. Second, to improve the diversity of the generated structures, structures are generated by substructure-based structural changes, including addition, deletion, and substitution of substructures, in addition to the slight structural changes used in the original DAECS. Case studies using ligand data for the human adrenergic alpha2A receptor and the human histamine H1 receptor show that the modified DAECS can generate highly diverse drug-like structures, verifying the usefulness of the modification.
Co-reporter:Hiromasa Kaneko, Kimito Funatsu
Chemometrics and Intelligent Laboratory Systems 2016 Volume 153() pp:75-81
Publication Date(Web):15 April 2016
DOI:10.1016/j.chemolab.2016.02.011
• Predictive ability of adaptive soft sensors depends on databases.
• Initial databases include huge data sets and are data rich, but information poor.
• We propose to select small but comprehensive data sets from huge data sets.
• The Kennard–Stone algorithm is modified for data selection.
• The performance is confirmed using a simulated data set and two industrial data sets.
Soft sensors predict values of process variables that are difficult to measure in real time. Predictive ability of adaptive soft sensors depends on databases. However, there is no way to construct initial databases from huge data sets that are measured in plants and that are data rich, but information poor. Therefore, we propose a method to select comprehensive data from huge data sets to build soft sensors with high predictive ability. A genetic algorithm and the Kennard–Stone algorithm are modified for data selection considering predictive ability of regression models and data distribution. Through the analyses of numerical simulation data and real industrial data, we confirm that initial databases could be appropriately constructed from huge data sets and predictive accuracy of soft sensors subsequently increased.
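The classic Kennard–Stone algorithm, which this work modifies, picks samples that cover the descriptor space uniformly: it starts from the two most distant samples and repeatedly adds the sample farthest from its nearest already-selected sample. The sketch below implements only this classic version with Euclidean distances; the modifications that account for the predictive ability of regression models, and the genetic algorithm, are not shown.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select rows of X chosen by the classic Kennard–Stone algorithm."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all samples
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    # Start with the two samples that are farthest apart
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    # Distance from every sample to its nearest already-selected sample
    min_dist = np.minimum(dist[i], dist[j])
    while len(selected) < n_select:
        min_dist[selected] = -1.0          # never reselect a sample
        nxt = int(np.argmax(min_dist))     # sample farthest from the selected set
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])
    return selected[:n_select]

# Four corners of a square, a near-duplicate of one corner, and the center point
X = np.array([[0.0, 0.0], [0.0, 0.1], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
print(kennard_stone(X, 4))  # [0, 2, 3, 4]
```

The near-duplicate sample at (0, 0.1) and the center point are skipped in favor of the four corners, i.e. a small subset that still spans the whole data distribution.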
Co-reporter:Shojiro Shibayama, Hiromasa Kaneko, Kimito Funatsu
Chemometrics and Intelligent Laboratory Systems 2016 Volume 156() pp:137-147
Publication Date(Web):15 August 2016
DOI:10.1016/j.chemolab.2016.06.001
• The study proposes a novel prediction method using infrared spectroscopy.
• The method enables concentration prediction with as few as one calibration sample.
• The method will predict concentrations well in both powder and liquid mixtures.
• The method provided accurate predictions for binary liquid mixtures.
Process analytical technology (PAT) plays an important role in the pharmaceutical industry. Calibration-free and calibration-minimum methods in PAT are expected to aid in a deeper understanding of processes in the early development stage of new drugs. Iterative optimization technology (IOT), an existing calibration-free method, cannot predict the compositions of nonideal mixtures because the Beer–Lambert law does not hold in some wavelength regions. In this paper, we propose IOT with wavelength selection based on excess absorption (WLSEA), which can be used with as few as one calibration sample. Excess absorption (EA) is the residual between the measured and ideal spectra of a mixture, and it includes noise and spectral changes related to molecular interactions. WLSEA determines a threshold of EA that separates noise from spectral change by minimizing the prediction errors of IOT. Consequently, WLSEA selects a set of regions where the predictive accuracy of IOT is high. WLSEA-IOT can be applied to predict the compositions of both ideal mixtures and nonideal mixtures that have ideal regions. The performance of the proposed IOT is verified by analyses of three types of mixture spectra. The proposed wavelength selection method will enhance both the development of quantitative methods and the analysis of molecular interactions with infrared spectroscopy.
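The excess-absorption idea can be illustrated in a few lines. This sketch assumes that the ideal spectrum is the fraction-weighted sum of the pure-component spectra (with the fractions known for the calibration sample) and uses a fixed, hypothetical `threshold`; in WLSEA the threshold is instead chosen by minimizing the IOT prediction error, which is not reproduced here.

```python
import numpy as np

def excess_absorption(measured, pure_spectra, fractions):
    """EA: measured mixture spectrum minus the ideal (linear) mixture spectrum."""
    ideal = np.asarray(fractions, dtype=float) @ np.asarray(pure_spectra, dtype=float)
    return np.asarray(measured, dtype=float) - ideal

def select_ideal_regions(ea, threshold):
    """Boolean mask of wavelengths whose |EA| does not exceed the threshold."""
    return np.abs(ea) <= threshold

pure = np.array([[1.0, 0.2, 0.0, 0.5],
                 [0.0, 0.8, 1.0, 0.5]])
fractions = [0.4, 0.6]   # known composition of the single calibration sample
# Hypothetical measurement: an interaction raises absorbance at wavelength index 2
measured = np.array([0.40, 0.56, 0.75, 0.50])
ea = excess_absorption(measured, pure, fractions)
mask = select_ideal_regions(ea, threshold=0.05)
print(mask.tolist())  # [True, True, False, True]
```

Only the wavelengths where the mixture behaves ideally (mask True) would then be passed to the calibration-free prediction step.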
Co-reporter:Tomoyuki Miyao;Hiromasa Kaneko
Journal of Computer-Aided Molecular Design 2016 Volume 30( Issue 5) pp:425-446
Publication Date(Web):2016 May
DOI:10.1007/s10822-016-9916-1
Generating chemical graphs in silico by combining building blocks is important and fundamental in virtual combinatorial chemistry. A premise in this area is that generated structures should be irredundant as well as exhaustive. In this study, we develop structure generation algorithms for combining ring systems as well as atom fragments. The proposed algorithms consist of three parts. First, chemical structures are generated through a canonical construction path. During structure generation, ring systems can be treated as reduced graphs having fewer vertices than the original ones. Second, diversified structures are generated by a simple rule-based generation algorithm. Third, the number of structures to be generated can be estimated with adequate accuracy without actual exhaustive generation. The proposed algorithms were implemented in the structure generator Molgilla. As a practical application, Molgilla generated chemical structures mimicking rosiglitazone in terms of a two-dimensional pharmacophore pattern. The strength of the algorithms lies in their simplicity and flexibility. Therefore, they may be applied to various computer programs for structure generation by combining building blocks.
Co-reporter:Hiromasa Kaneko and Kimito Funatsu
Industrial & Engineering Chemistry Research 2015 Volume 54(Issue 50) pp:12630-12638
Publication Date(Web):December 3, 2015
DOI:10.1021/acs.iecr.5b03054
Soft sensors estimate values of difficult-to-measure process variables (y) from values of easy-to-measure process variables (X). Although adaptive soft sensors have been developed to reduce the degradation of soft sensor models, noise in data has harmful effects on the predictive ability of soft sensors. Many chemometric methods, such as partial least-squares regression and support vector regression, can handle noise. However, these methods do not consider the characteristics of operating data or time-series data: data measured closely in time have strong relationships and correlations. We propose combining soft sensors with smoothing methods such as simple moving average, linearly weighted moving average, exponentially weighted moving average, and Savitzky–Golay filtering. Before model construction and prediction, a smoothing method is applied to each X-variable. Case studies using simulated and industrial data sets confirm that the proposed methods enable soft sensors to predict y-values smoothly and accurately.
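The three moving-average smoothers listed above are simple to state in code. This is a minimal sketch that assumes causal (past-only) windows, since online prediction only has past X-values available; the window length and the smoothing factor `alpha` are hypothetical tuning parameters, and Savitzky–Golay filtering is omitted.

```python
import numpy as np

def simple_moving_average(x, window):
    """Mean of the current value and the previous window-1 values."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    for t in range(len(x)):
        out[t] = x[max(0, t - window + 1): t + 1].mean()
    return out

def linearly_weighted_moving_average(x, window):
    """Like SMA, but weights rise linearly from 1 (oldest) to the window length (newest)."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    for t in range(len(x)):
        seg = x[max(0, t - window + 1): t + 1]
        w = np.arange(1, len(seg) + 1)        # oldest -> newest
        out[t] = np.dot(w, seg) / w.sum()
    return out

def exponentially_weighted_moving_average(x, alpha):
    """s_t = alpha * x_t + (1 - alpha) * s_{t-1}, started from the first value."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    out[0] = x[0]
    for t in range(1, len(x)):
        out[t] = alpha * x[t] + (1 - alpha) * out[t - 1]
    return out

x = [0.0, 3.0, 6.0, 9.0]
print(simple_moving_average(x, 2).tolist())                   # [0.0, 1.5, 4.5, 7.5]
print(linearly_weighted_moving_average(x, 2).tolist())        # [0.0, 2.0, 5.0, 8.0]
print(exponentially_weighted_moving_average(x, 0.5).tolist())  # [0.0, 1.5, 3.75, 6.375]
```

On this ramp, LWMA lags less than SMA because it weights the newest value more, while EWMA keeps a memory of all past values.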
Co-reporter:Hiromasa Kaneko and Kimito Funatsu
Industrial & Engineering Chemistry Research 2015 Volume 54(Issue 2) pp:700-704
Publication Date(Web):January 4, 2015
DOI:10.1021/ie503962e
Soft sensors can predict values of a process variable y that is difficult to measure in real time. Adaptive mechanisms are applied to soft sensors to maintain their predictive ability. However, traditional adaptive soft sensors need a significant number of new y measurements, and it is difficult to maintain their accuracy if the measurement interval is long. We propose two soft sensor models that produce accurate results with a small number of y measurements. We combined either a moving window technique or a just-in-time technique with a time difference model to handle both changes in the slope between the input variables X and y and shifts in the X and y values. We analyzed a numerical simulation data set and a real industrial data set, demonstrating the superiority of the time difference model combined with a moving window technique.
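A time difference model predicts the change in y from the change in X and adds it to the last measured y, which cancels slow shifts in the y-level. The sketch below combines this with a moving window under stated assumptions: a linear model of the differences, a time-difference interval of one sample, and a hypothetical window of 20 difference samples; the function names are illustrative.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, coef_1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def td_moving_window_predict(X, y, t, interval=1, window=20):
    """Predict y[t] as y[t - interval] + f(X[t] - X[t - interval]).

    f is a linear model of time differences fitted only on the most recent
    `window` difference samples before time t (moving window).
    """
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    y = np.asarray(y, dtype=float)
    idx = np.arange(max(interval, t - window), t)   # past times whose y is known
    dX = X[idx] - X[idx - interval]
    dy = y[idx] - y[idx - interval]
    coef = fit_linear(dX, dy)
    dx_new = X[t] - X[t - interval]
    return y[t - interval] + coef[0] + dx_new @ coef[1:]

steps = np.arange(100)
x = np.sin(steps / 5.0)
y = 2.0 * x + 0.05 * steps          # y drifts upward over time
pred = td_moving_window_predict(x, y, t=99)
print(abs(pred - y[99]) < 1e-8)     # True
```

Because the drift enters the differences only as a constant, the linear TD model absorbs it in the intercept and recovers y[99] on this noise-free example, whereas a static model of y on x would be biased by the accumulated drift.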
Co-reporter:Hiromasa Kaneko and Kimito Funatsu
Journal of Chemical Information and Modeling 2014 Volume 54(Issue 9) pp:2469-2482
Publication Date(Web):August 13, 2014
DOI:10.1021/ci500364e
We discuss applicability domains (ADs) based on ensemble learning in classification and regression analyses. In regression analysis, the AD can be appropriately set, although attention needs to be paid to the bias of the predicted values. However, because the AD set in classification analysis is too wide, we propose an AD based on ensemble learning and data density. First, we set a threshold for data density below which the prediction result of new data is not reliable. Then, only for new data with a data density higher than the threshold, we consider the reliability of the prediction result based on ensemble learning. By analyzing data from numerical simulations and quantitative structural relationships, we validate our discussion of ADs in classification and regression analyses and confirm that appropriate ADs can be set using the proposed method.
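The two-step AD for classification can be sketched as follows: first a data-density check, then, only for sufficiently dense new data, an ensemble-agreement check. The density measure (inverse mean distance to the k nearest training samples), the 5th-percentile density threshold, and the 0.8 agreement cutoff are assumptions chosen for illustration, not the settings of the paper.

```python
import numpy as np

def knn_density(X_train, X_query, k=5, exclude_self=False):
    """Inverse mean distance to the k nearest training samples (higher = denser region)."""
    d = np.sqrt(((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2))
    d.sort(axis=1)
    start = 1 if exclude_self else 0   # skip the zero distance of a sample to itself
    return 1.0 / (d[:, start:start + k].mean(axis=1) + 1e-12)

def ensemble_agreement(votes):
    """Fraction of ensemble members voting for the majority class (votes: members x samples)."""
    agreement = []
    for column in np.asarray(votes).T:
        _, counts = np.unique(column, return_counts=True)
        agreement.append(counts.max() / len(column))
    return np.array(agreement)

def in_applicability_domain(X_train, X_query, votes, k=5, density_quantile=0.05, min_agreement=0.8):
    """Step 1: reject low-density queries. Step 2: require ensemble agreement for the rest."""
    threshold = np.quantile(knn_density(X_train, X_train, k, exclude_self=True), density_quantile)
    dense_enough = knn_density(X_train, X_query, k) >= threshold
    return dense_enough & (ensemble_agreement(votes) >= min_agreement)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
X_query = np.array([[0.0, 0.0], [0.0, 0.0], [8.0, 8.0]])
# Hypothetical class votes from 5 ensemble models (rows) for the 3 queries (columns)
votes = np.array([[1, 1, 1],
                  [1, 0, 1],
                  [1, 1, 1],
                  [1, 0, 1],
                  [1, 1, 1]])
result = in_applicability_domain(X_train, X_query, votes).tolist()
print(result)  # [True, False, False]
```

The third query gets a unanimous vote but is still rejected, because it lies far outside the training data; this is the case an ensemble-only AD would wrongly accept.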
Co-reporter:Kiyoshi Hasegawa, Kimito Funatsu
Chemometrics and Intelligent Laboratory Systems 2014 Volume 135() pp:166-171
Publication Date(Web):15 July 2014
DOI:10.1016/j.chemolab.2014.04.015
• Our group has previously reported applications of the L-shaped PLS (LPLS) method.
• We used orthogonal LPLS (OLPLS) in a validation study with chemogenomic data.
• We selected adenosine receptor inhibitory activity data as the chemogenomic data.
• We successfully elucidated the predictive and orthogonal fragments on a ligand structure.
We carried out a validation study of the orthogonal L-shaped PLS (OLPLS) method using chemogenomic data based on adenosine receptor inhibitory activity measurements. Using OLPLS, the ligand and protein descriptors could be connected to eight adenosine receptor inhibitory activities. Fingerprints representing specific chemical substructures on the ligands were used as the ligand descriptors, while z-scales were used as the protein descriptors. Three clusters were observed in the chemical and protein spaces from the predictive scores and loadings. From these, the predictive and orthogonal ligand structure fragments toward the three adenosine receptors could be successfully elucidated. The predictive fragment for the human adenosine A2A receptor was confirmed by comparison with the X-ray crystal structure. As expected, the orthogonal fragments contained no physicochemical features required for specific interaction with the adenosine receptors.
Co-reporter:Kiyoshi Hasegawa, Kimito Funatsu
Chemometrics and Intelligent Laboratory Systems 2014 Volume 139() pp:64-69
Publication Date(Web):15 December 2014
DOI:10.1016/j.chemolab.2014.09.010
• Orthogonal LPLS (OLPLS) was applied to a data set of human ABC transporters.
• OLPLS is suitable for detecting the predictive and orthogonal chemical parts.
• The predictive and orthogonal chemical parts were expressed by atom coloring.
• The orthogonal chemical parts are roughly the reverse of the predictive ones.
• Structural and chemical informatics could be further integrated for effective molecular design.
In the orthogonal LPLS (OLPLS) method, the scores and loadings of the latent variables can be transformed into the regression coefficient of each descriptor. Furthermore, the regression coefficients are expressed by the atom-coloring method; that is, fragments and atoms are colored by the signs and values of the regression coefficients. Computational and medicinal chemists can then cooperate in molecular design using the common language of chemical structures and atom colors. In this paper, we examined whether a clear chemical interpretation is possible using an inhibitory data set for human ATP-binding cassette (ABC) transporters. The full data set was generated using the pair-wise kernel regression method. The generated score matrix was analyzed using ECFP_6 fingerprints and z-scales derived from the chemical structures and from the amino acid residues in the active sites of the human ABC transporters. The atom coloring of the predictive chemical parts of an inhibitor was examined with a human ABCB1 transporter homology model based on the mouse X-ray crystal structure, and it matched the strong hydrophobic sites within the active site well. The originality of this paper is twofold: the regression coefficients of the OLPLS model are expressed by the atom-coloring method, and the chemical interpretation of the inhibitors is rigorously validated with the human ABC transporter homology model.
Co-reporter:Kiyoshi Hasegawa
Journal of Chemometrics 2014 Volume 28( Issue 9) pp:696-703
Publication Date(Web):
DOI:10.1002/cem.2632

The visualization and characterization of protein pockets is the starting point for many structure-based drug design projects. The size and shape of a protein pocket dictate the 3D geometry of ligands that can strongly inhibit the subsequent biological events. Thus, a minimal requirement for inhibition is that a molecule sterically fits the active site, with some allowance for induced fit. Methods for directly displaying the active sites of proteins have become prevalent in recent years.

In this study, a new mapping method, generative topographic mapping, is investigated to describe the 3D surface of a protein pocket. The β2 receptor protein is used as a benchmark. By mapping the molecular surface points and assigning the associated molecular electrostatic potential (MEP) values, the original 3D structure of the active site is well reproduced by the 2D latent map of generative topographic mapping. The MEP distributions of the two 2D latent maps derived from the inhibitor and from the β2 receptor protein complement each other well. Using three-way partial least squares modeling, a predictive model linking the inhibitory activities and their MEP values can be constructed, which was not feasible in previous spherical self-organizing map studies. The resulting regression coefficient matrix of the three-way partial least squares model provides many insights into the structural requirements for β2 inhibitory activity. Copyright © 2014 John Wiley & Sons, Ltd.

Co-reporter:Hiromasa Kaneko and Kimito Funatsu
Journal of Chemical Information and Modeling 2013 Volume 53(Issue 9) pp:2341-2348
Publication Date(Web):August 23, 2013
DOI:10.1021/ci4003766
We propose predictive performance criteria for nonlinear regression models without cross-validation. The proposed criteria are the determination coefficient and the root-mean-square error for the midpoints between k-nearest-neighbor data points. These criteria can be used to evaluate predictive ability after the regression models are updated, whereas cross-validation cannot be performed in such a situation. The proposed method is effective and helpful in handling big data when cross-validation cannot be applied. By analyzing data from numerical simulations and quantitative structural relationships, we confirm that the proposed criteria enable the predictive ability of the nonlinear regression models to be appropriately quantified.
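The midpoint-based criteria can be computed without any re-fitting, which is what makes them usable after a model update. The sketch assumes that the reference value at each midpoint is the mean of the two neighbors' y-values and that `predict` is any fitted model's prediction function; the function and model names are hypothetical.

```python
import numpy as np

def midknn_criteria(X, y, predict, k=1):
    """r^2 and RMSE on midpoints between each sample and its k nearest neighbors.

    Assumption of this sketch: the reference y at a midpoint is the mean of the
    two samples' y-values. `predict` is the prediction function of a fitted model.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(d, np.inf)                     # a sample is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]               # k nearest neighbors of each sample
    i = np.repeat(np.arange(len(X)), k)
    j = nn.ravel()
    X_mid = (X[i] + X[j]) / 2.0
    y_mid = (y[i] + y[j]) / 2.0
    residual = y_mid - predict(X_mid)
    rmse = np.sqrt(np.mean(residual ** 2))
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((y_mid - y_mid.mean()) ** 2)
    return r2, rmse

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 1))
y = 3.0 * X[:, 0]

def exact_model(Z):
    return 3.0 * Z[:, 0]

def nearest_neighbor_model(Z):
    # Memorizes the training data: returns the y of the closest training sample
    dist = np.abs(Z[:, 0][:, None] - X[:, 0][None, :])
    return y[np.argmin(dist, axis=1)]

r2_exact, rmse_exact = midknn_criteria(X, y, exact_model)
r2_nn, rmse_nn = midknn_criteria(X, y, nearest_neighbor_model)
print(r2_exact, rmse_exact)   # approximately 1.0 and 0.0
print(r2_nn < r2_exact)       # True
```

A 1-nearest-neighbor model reproduces its training data perfectly, yet it scores worse on the midpoints than the exact linear model, which is the kind of overfitting these criteria are meant to expose.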
Co-reporter:Hiromasa Kaneko and Kimito Funatsu
Industrial & Engineering Chemistry Research 2013 Volume 52(Issue 3) pp:1322
Publication Date(Web):January 3, 2013
DOI:10.1021/ie302582v
In chemical plants, soft sensors are widely used to estimate process variables that are difficult to measure online. The predictive accuracy of soft sensors decreases over time because of changes in the state of chemical plants, and soft sensor models based on time difference (TD) have been constructed to address this. However, many details of TD models remain to be clarified. In this study, TD models are discussed in terms of noise in data, autocorrelation in process variables, predictive accuracy, and other factors. We theoretically clarify and formulate the differences in predictive accuracy between normal models and TD models, as well as the effects of noise, autocorrelation, TD intervals, and other factors on predictive accuracy. The relationships and formulas were verified by analyzing simulation data. Furthermore, we analyzed dynamic simulation data and real industrial data and confirmed that the predictive accuracy of TD models increased when the TD intervals were optimized.
Co-reporter:Kiyoshi Hasegawa, Kimito Funatsu
Bioorganic & Medicinal Chemistry 2012 Volume 20(Issue 18) pp:5410-5415
Publication Date(Web):15 September 2012
DOI:10.1016/j.bmc.2012.03.041
In a previous report, we studied the mapping ability of the spherical self-organizing map (SSOM). The original 3D structure of the active site of the β2 protein structure was well reproduced by the SSOM. To validate the geometrical transformation and the resulting molecular electrostatic potential (MEP) distribution, the molecular surfaces of 20 β2 ligands were mapped onto the protein SSOM sphere. The MEP values of the two spheres derived from the ligand and the β2 receptor protein were compared. In most cases involving potent ligands, the two spheres had a moderate negative correlation. This indicates that the SSOM approach has excellent potential to represent a complex protein surface as a simple spherical structure.
In this study, we perform a quantitative structure–activity relationship (QSAR) study of caspase-3 inhibitors based on the SSOM technique. Initially, the active site of the protein structure ‘caspase-3’ was characterized by the SSOM using the MEP values. Each inhibitor was then projected onto the protein SSOM sphere and the chemical descriptors were derived from the ligand SSOM sphere. The correlation of the chemical descriptors and the inhibitory activities was investigated using the support vector regression (SVR) method. Finally, the important MEP descriptors from the final SVR model were examined. The structural requirements of caspase-3 inhibitors are discussed from the perspectives of both the ligand and protein structures.
In this study, we perform a QSAR study of caspase-3 inhibitors based on the SSOM technique. The MEP values on the ligand SSOM sphere were used as chemical descriptors. The correlation of the chemical descriptors and the inhibitory activities was investigated by the SVR method. The important MEP descriptors were derived from the final SVR model. Based on the X-ray crystal structure of the protein, the descriptors matched the structural requirements of caspase-3 inhibitors.
Co-reporter:Hiromasa Kaneko, Susumu Inasawa, Nagisa Morimoto, Mitsutaka Nakamura, Hirofumi Inokuchi, Yukio Yamaguchi, and Kimito Funatsu
Industrial & Engineering Chemistry Research 2012 Volume 51(Issue 29) pp:9906
Publication Date(Web):June 25, 2012
DOI:10.1021/ie300315t
We have constructed statistical models that predict thermal resistance after fouling layer formation in a heat exchanger, in which a slurry of stearic acid in toluene was cooled. Chemoinformatics was used, and the initial rate of increase in thermal resistance (d(U⁻¹)/dt) was calculated from experimental conditions such as coolant flow rate and the degree of supersaturation. We then constructed models for thermal resistance at a steady state using calculated values of d(U⁻¹)/dt and experimental conditions. Our model gives a good correlation with the experimental results. The contribution of operating conditions to fouling layer formation was discussed semiquantitatively on the basis of linear regression coefficients that were obtained from our model. Because only operating conditions and set values were used as input, our approach is very practical for prediction of thermal resistance given certain operating conditions.
Co-reporter:Hiromasa Kaneko and Kimito Funatsu
Industrial & Engineering Chemistry Research 2011 Volume 50(Issue 18) pp:10643-10651
Publication Date(Web):August 11, 2011
DOI:10.1021/ie200692m
Soft sensors are widely used to estimate process variables that are difficult to measure online. Although regression models are reconstructed with new data to adapt to changes in the plants, some problems remain in practice. Hence, soft sensor models have been constructed based on the time differences of an objective variable and of explanatory variables to reduce the effects of deterioration with age, such as drift and gradual changes in the state of plants. In this paper, we propose constructing time difference models after modeling the nonlinear relationships between and among process variables. Variables obtained from physical models or calculated by statistical nonlinear regression methods are used to account for the nonlinearity, and a time difference model that includes these variables is then constructed. We applied these methods to actual industrial data obtained from an industrial polymer process and confirmed the usefulness of the proposed methods.
Co-reporter:Masamoto Arakawa;Yosuke Yamashita
Journal of Chemometrics 2011 Volume 25( Issue 1) pp:10-19
Publication Date(Web):
DOI:10.1002/cem.1339

Abstract

In this paper, we propose a genetic algorithm-based wavelength selection (GAWLS) method for visible and near-infrared (Vis/NIR) spectral calibration. The objective of GAWLS is to construct robust and predictive regression models by selecting informative wavelength regions. To demonstrate the ability of the proposed method, regression models for soil properties and sugar content of apples are constructed by using GAWLS and other variable selection methods. Copyright © 2010 John Wiley & Sons, Ltd.

Co-reporter:Kiyoshi Hasegawa, Michio Koyama, Masamoto Arakawa, Kimito Funatsu
Chemometrics and Intelligent Laboratory Systems 2009 Volume 99(Issue 1) pp:66-70
Publication Date(Web):15 November 2009
DOI:10.1016/j.chemolab.2009.07.011
Rough set theory (RST) is a data mining method originally proposed in computer science. RST selects minimal descriptor sets for discriminating one sample from the others; these descriptor sets are called reducts. RST constructs all possible rules for high activity using a specific reduct. We used dihydrofolate reductase (DHFR) inhibitors as a validation set for RST. This data set has been thoroughly investigated in several studies, and the structural requirements for high activity are well known. The RST-based rules matched these structural requirements well, demonstrating the utility of RST. Given the success of this study, further applications to data sets with more diverse compounds and noisier activity data are expected.
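For a small, consistent decision table, reducts can be found by brute force: a descriptor subset is kept if samples that agree on it never have different activity classes and if no smaller such subset is contained in it. The sketch below is an exhaustive illustration on a hypothetical table; practical RST software uses more efficient algorithms, and inconsistent tables would need the positive-region definition instead.

```python
from itertools import combinations

def is_consistent(rows, labels, attrs):
    """True if samples that agree on the descriptors `attrs` never differ in class."""
    seen = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, label) != label:
            return False
    return True

def find_reducts(rows, labels):
    """Exhaustively find all minimal descriptor subsets that keep the classes discernible."""
    n_attrs = len(rows[0])
    reducts = []
    for size in range(1, n_attrs + 1):
        for attrs in combinations(range(n_attrs), size):
            if any(set(r) <= set(attrs) for r in reducts):
                continue  # contains a smaller reduct, so it is not minimal
            if is_consistent(rows, labels, attrs):
                reducts.append(attrs)
    return reducts

def rules_for(rows, labels, attrs, target):
    """Value combinations on a reduct `attrs` that imply class `target`."""
    return sorted({tuple(row[a] for a in attrs)
                   for row, label in zip(rows, labels) if label == target})

# Hypothetical decision table: 3 descriptors per compound and an activity class
rows = [(1, 0, 2), (1, 1, 1), (0, 0, 1), (0, 1, 2)]
labels = ["high", "high", "low", "low"]
print(find_reducts(rows, labels))             # [(0,), (1, 2)]
print(rules_for(rows, labels, (0,), "high"))  # [(1,)]
```

Descriptor 0 alone separates the classes, as does the pair (1, 2); the rule from the first reduct reads "descriptor 0 = 1 implies high activity".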
Co-reporter:Masamoto Arakawa, Kiyoshi Hasegawa, Kimito Funatsu
Chemometrics and Intelligent Laboratory Systems 2008 Volume 92(Issue 2) pp:145-151
Publication Date(Web):15 July 2008
DOI:10.1016/j.chemolab.2008.02.004
Structure-based drug design (SBDD) is a computational technique for designing new drug candidates based on the physicochemical interactions between a protein and a ligand molecule. The most important requirement for SBDD is accurate estimation of the binding affinity of the ligand molecule for the target protein. A scoring function, which is basically a mathematical equation that approximates the thermodynamics of binding, has to be defined in advance. In this paper, we propose a novel method for building a tailored scoring function using comparative binding energy (COMBINE) descriptors and support vector regression (SVR). COMBINE descriptors are energy terms between the ligand molecule and each amino acid residue of the target protein. SVR is a promising nonlinear regression method based on the theory of the support vector machine (SVM). In such regression methods, variable selection is one of the most important issues in constructing a robust and predictive quantitative structure–activity relationship (QSAR) model. We adopted a variable selection method based on sensitivity analysis of each variable. The usefulness of the proposed method was validated by applying it to a real QSAR data set of benzamidine derivatives as trypsin inhibitors. The final SVR model successfully identified important amino acid residues for explaining inhibitory activities.
Co-reporter:Masamoto Arakawa, Kiyoshi Hasegawa, Kimito Funatsu
Chemometrics and Intelligent Laboratory Systems 2006 Volume 83(Issue 2) pp:91-98
Publication Date(Web):15 September 2006
DOI:10.1016/j.chemolab.2006.01.009
A quantitative structure–activity relationship (QSAR) model has been developed for a set of inhibitors of human immunodeficiency virus type 1 (HIV-1) reverse transcriptase, derivatives of 1-[(2-hydroxyethoxy)methyl]-6-(phenylthio)thymine (HEPT). The structural descriptors used in this study are Hansch constants for each substituent and topological descriptors. We applied a variable selection method based on multi-objective genetic programming (GP) to the HEPT data and constructed a nonlinear QSAR model using a counter-propagation (CP) neural network with the selected variables. The obtained network is accurate and interpretable. Moreover, a validation test was performed to confirm the predictive ability of the model.
Co-reporter:Hiromasa Kaneko, Takuya Matsumoto, Shigeki Ootakara, Kimito Funatsu
IFAC-PapersOnLine (2016) Volume 49(Issue 7) pp:371-376
Publication Date(Web):1 January 2016
DOI:10.1016/j.ifacol.2016.07.364
As a result of collaboration between Mitsui Chemicals, Inc. and the University of Tokyo, a soft sensor tool was developed and implemented in several plants of Mitsui Chemicals, Inc. A soft sensor is an inferential model constructed between process variables that are easy to measure (X) and process variables that are difficult to measure (y). y-values can be estimated in real time by inputting X-values into a soft sensor. To keep the predictive ability of a soft sensor high, we employ an ensemble online support vector regression (EOSVR) model as an adaptive soft sensor model, which can adapt to both nonlinear changes and time-varying changes. Additionally, to reduce noise in the estimated y-values, Savitzky-Golay (SG) filtering is applied to them. The proposed method, called EOSVR-SG, was implemented as a soft sensor tool. In this paper, we present the soft sensor tool as used in real chemical plants, together with execution results in which the EOSVR-SG model estimated y-values accurately and smoothly.
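The abstract does not give the SG filter settings. The illustration below assumes a 5-point window and a quadratic polynomial. Under those settings the Savitzky-Golay smoother reduces to a fixed convolution with weights (-3, 12, 17, 12, -3)/35, which reproduces any polynomial of degree up to three exactly while damping point-to-point noise. The function name is hypothetical.

```python
from typing import List, Sequence

# Savitzky-Golay weights for a 5-point window, quadratic (equivalently cubic) fit.
SG5_WEIGHTS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG5_NORM = 35.0


def savgol_smooth(y: Sequence[float]) -> List[float]:
    """Centered 5-point SG smoothing; the first and last two points are left unchanged."""
    half = len(SG5_WEIGHTS) // 2
    out = list(y)
    for i in range(half, len(y) - half):
        window = y[i - half:i + half + 1]
        out[i] = sum(c * v for c, v in zip(SG5_WEIGHTS, window)) / SG5_NORM
    return out
```

A centered window needs two future points. A real-time tool would therefore need a causal variant, for example fitting the polynomial to past points only and evaluating it at the newest point. Which variant the EOSVR-SG tool uses is not stated here.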
Co-reporter:Hiromasa Kaneko, Kimito Funatsu
Procedia Computer Science (2013) Volume 22 pp:580-589
Publication Date(Web):1 January 2013
DOI:10.1016/j.procs.2013.09.138
Soft sensors are used in chemical plants to estimate process variables that are difficult to measure online. However, the predictive accuracy of adaptive soft sensor models decreases when sudden process changes occur. An online support vector regression (OSVR) model with a time variable can adapt to rapid changes among process variables. One problem with the proposed model is finding appropriate hyperparameters for the OSVR model; we discuss three methods for selecting them based on predictive accuracy and computation time. The proposed method was applied to simulation data and industrial data and achieved high predictive accuracy when time-varying changes occurred.
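The sketch below shows why a time variable helps when y shifts while the measured X does not. It is written under stated assumptions: time is prepended to the inputs with a hypothetical scaling `time_weight`, and OSVR is replaced by a simple kernel-weighted average (Nadaraya-Watson) sharing the same Gaussian kernel. Samples far apart in time then look dissimilar, so recent samples dominate the prediction. The function names and the parameters `time_weight` and `gamma` are illustrative, not the paper's.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (time-augmented features, measured y)


def augment(x: Sequence[float], t: float, time_weight: float) -> List[float]:
    """Prepend a scaled time stamp to the process variables."""
    return [time_weight * t, *x]


def rbf(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """Gaussian kernel on the augmented feature vectors."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def kernel_predict(history: List[Sample], query: Sequence[float], gamma: float) -> float:
    """Kernel-weighted average of past y-values (a stand-in for OSVR, not OSVR itself)."""
    weights = [rbf(f, query, gamma) for f, _ in history]
    return sum(w * y for w, (_, y) in zip(weights, history)) / sum(weights)
```

With `time_weight = 0`, identical X-values get equal weight, and the prediction averages the regimes before and after a sudden change. With a positive weight, it tracks the latest regime. Choosing such hyperparameters is exactly the problem the abstract raises.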
Co-reporter:Hiromasa Kaneko, Masamoto Arakawa, Kimito Funatsu
IFAC Proceedings Volumes (2009) Volume 42(Issue 19) pp:551-558
Publication Date(Web):1 January 2009
DOI:10.3182/20090921-3-TR-3005.00095
Soft sensors are widely used to estimate values of process variables that are difficult to measure online, for example, polymer quality variables. Industrial polymer processes generally produce many grades of products. To reduce the quantity of off-grade material and produce a consistent product, values of polymer quality variables should be estimated with high accuracy by soft sensor models. However, predictive accuracy during grade transitions can be low because the state of a polymer reactor is unsteady during a transition. Values of process variables in the unsteady state can differ from those used to construct a regression model. It is therefore desirable to know when the polymer quality meets product specifications. Thus, we propose to construct a model that detects completion of a transition, in order to assure the predicted values of the polymer quality variables after the transition. By using this model and constructing a regression model for each product grade, values of the objective variables can be predicted with high accuracy by selecting the appropriate regression model. The proposed method was applied to real industrial data and achieved higher predictive accuracy than traditional methods.
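The abstract does not describe how the completion-detection model is built. The sketch below swaps in a simple threshold rule: a transition counts as complete when every recent sample lies within k standard deviations of the target grade's steady-state mean. It then shows how the grade-specific regression model is selected once that holds. The rule, the value of k and all names are hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Stats = Tuple[Sequence[float], Sequence[float]]  # (steady-state mean, std) per variable
Regressor = Callable[[Sequence[float]], float]


def transition_complete(recent: Sequence[Sequence[float]], stats: Stats, k: float = 3.0) -> bool:
    """Hypothetical rule: every recent sample lies within k std of the grade's steady state."""
    mean, std = stats
    return all(abs(v - m) <= k * s
               for x in recent for v, m, s in zip(x, mean, std))


def predict_quality(recent: Sequence[Sequence[float]], grade: str,
                    grade_stats: Dict[str, Stats],
                    grade_models: Dict[str, Regressor]) -> Optional[float]:
    """Use the target grade's own regression model once the transition is judged complete."""
    if not transition_complete(recent, grade_stats[grade]):
        return None  # still in transition: the prediction cannot be assured
    return grade_models[grade](recent[-1])
```

Returning `None` during the transition mirrors the stated goal of only assuring predictions after the transition ends. A trained classifier could replace the threshold rule without changing `predict_quality`.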
Co-reporter:Kiyoshi Hasegawa, Kimito Funatsu
Chemometrics and Intelligent Laboratory Systems (15 January 2014) Volume 130 pp:
Publication Date(Web):15 January 2014
DOI:10.1016/j.chemolab.2013.11.003
• We devised both the ligand and protein matrices within the LPLS analysis framework.
• The 3D protein pocket was mapped onto the sphere of a spherical self-organizing map.
• We could easily identify four selective inhibitors and elucidate the structural requirements for selectivity.
Recently, chemogenomics has received considerable attention in the pharmaceutical industry. By definition, chemogenomics data form a two-dimensional matrix in which proteins are usually columns, ligands are rows, and the reported values are usually inhibitory activities. Ideally, the chemogenomics matrix would be explained by two matrices consisting of ligand descriptors and protein descriptors, respectively. Bi-modal PLS follows this concept, and several variants have been proposed. Among them, we focused on the L-shaped PLS (LPLS) method and applied it to aminergic G protein-coupled receptor inhibitory activity data in a previous study.
In this study, we devised both the ligand and protein matrices within the LPLS analysis framework for four adrenergic alpha receptors. For the ligand matrix, a similarity matrix derived from Tanimoto similarity values between pairs of inhibitors was employed. For the protein matrix, the 3D protein pocket was mapped onto the sphere of a spherical self-organizing map, and the lipophilic potential value at each node was used as a protein descriptor. Using the four LPLS plots, we could easily identify four selective inhibitors and elucidate the structural requirements for selectivity.
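In the ligand matrix, each inhibitor is described by its Tanimoto similarity to every inhibitor. The sketch below assumes binary fingerprints represented as sets of on-bit indices; the fingerprint type used in the study is not stated here. The function names and the empty-fingerprint convention are assumptions.

```python
from typing import AbstractSet, List, Sequence


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # empty-vs-empty returns 0.0 by assumption


def similarity_matrix(fps: Sequence[AbstractSet[int]]) -> List[List[float]]:
    """Row i describes ligand i by its similarity to every ligand in the set."""
    return [[tanimoto(a, b) for b in fps] for a in fps]
```

The resulting matrix is square and symmetric, with ones on the diagonal for non-empty fingerprints. Its rows can then serve as the ligand descriptor block in a bi-modal method such as LPLS.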
(R)-(1-Methyl-1H-indol-3-yl)(4,5,6,7-tetrahydro-1H-benzo[d]imidazol-6-yl)methanone
Eplivanserin
(3aR,4R,6aR,9R,9aS,9bS)-4,9-dihydroxy-9-methyl-3,6-dimethylidenedecahydroazuleno[4,5-b]furan-2(3H)-one
Octacosanal, 11-oxo-
Gemeprost
Prosta-5,9,13-trien-1-oic acid, 15-hydroxy-11-oxo-, (5Z,13E,15S)-
8-[(2R)-2,3-dihydroxy-3-methylbutyl]-5,7-dimethoxy-2H-chromen-2-one