Cecilia Clementi

Name: Clementi, Cecilia
Organization: Rice University, USA
Department: and Department of Chemistry
Title: Professor (PhD)
Co-reporter:Frank Noé, Cecilia Clementi
Current Opinion in Structural Biology 2017 Volume 43
Publication Date(Web):1 April 2017
DOI:10.1016/j.sbi.2017.02.006
Highlights
• Extensive simulation data can now be easily generated and complement experiment.
• Collective variables are key to analyze, interpret, and enhance molecular simulation.
• A variational principle defines optimal collective variables for slow dynamics.
• Methods and algorithms to find collective variables from data are reviewed.
• Kinetic distance maps can measure the transition time between configurations.

Collective variables are an important concept to study high-dimensional dynamical systems, such as molecular dynamics of macromolecules, liquids, or polymers, in particular to define relevant metastable states and state-transition or phase-transition. Over the past decade, a rigorous mathematical theory has been formulated to define optimal collective variables to characterize slow dynamical processes. Here we review recent developments, including a variational principle to find optimal approximations to slow collective variables from simulation data, and algorithms such as the time-lagged independent component analysis. Using these concepts, a distance metric can be defined that quantifies how slowly molecular conformations interconvert. Extensions and open questions are discussed.
Co-reporter:Frank Noé, Ralf Banisch, and Cecilia Clementi
Journal of Chemical Theory and Computation 2016 Volume 12(Issue 11) pp:5620-5630
Publication Date(Web):October 3, 2016
DOI:10.1021/acs.jctc.6b00762
Identification of the main reaction coordinates and building of kinetic models of macromolecular systems require a way to measure distances between molecular configurations that can distinguish slowly interconverting states. Here we define the commute distance that can be shown to be closely related to the expected commute time needed to go from one configuration to the other, and back. A practical merit of this quantity is that it can be easily approximated from molecular dynamics data sets when an approximation of the Markov operator eigenfunctions is available, which can be achieved by the variational approach to approximate eigenfunctions of Markov operators, also called variational approach of conformation dynamics (VAC) or the time-lagged independent component analysis (TICA). The VAC or TICA components can be scaled such that a so-called commute map is obtained in which Euclidean distance corresponds to the commute distance, and thus kinetic models such as Markov state models can be computed based on Euclidean operations, such as standard clustering. In addition, the distance metric gives rise to a quantity we call total kinetic content, which is an excellent score to rank input feature sets and kinetic model quality.
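The scaling step described in this abstract can be illustrated with a minimal sketch. It assumes the VAC/TICA components and their eigenvalues have already been estimated elsewhere, and it assumes the convention that component i is scaled by sqrt(t_i / 2), where t_i = -lag / ln(lambda_i) is the implied relaxation timescale. The function name `commute_map` and its arguments are hypothetical, chosen only for illustration.

```python
import numpy as np

def commute_map(components, eigenvalues, lag):
    """Rescale VAC/TICA components so Euclidean distance approximates commute distance.

    components  : array (n_frames, n_components), assumed precomputed
    eigenvalues : array (n_components,), eigenvalues at lag time `lag`
    lag         : lag time, in the same units as the returned timescales
    """
    components = np.asarray(components, dtype=float)
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    # Only eigenvalues strictly between 0 and 1 give a finite, positive timescale.
    keep = (eigenvalues > 0.0) & (eigenvalues < 1.0)
    timescales = -lag / np.log(eigenvalues[keep])
    # Assumed convention: scale component i by sqrt(t_i / 2).
    return components[:, keep] * np.sqrt(timescales / 2.0)
```

Standard Euclidean clustering can then be applied to the returned coordinates. Components with non-positive eigenvalues are dropped here as a design choice, since they have no well-defined timescale.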
Co-reporter:Lorenzo Boninsegna, Gianpaolo Gobbo, Frank Noé, and Cecilia Clementi
Journal of Chemical Theory and Computation 2015 Volume 11(Issue 12) pp:5947-5960
Publication Date(Web):November 2, 2015
DOI:10.1021/acs.jctc.5b00749
Identification of the collective coordinates that describe rare events in complex molecular transitions such as protein folding has been a key challenge in the theoretical molecular sciences. In the Diffusion Map approach, one assumes that the molecular configurations sampled have been generated by a diffusion process, and one uses the eigenfunctions of the corresponding diffusion operator as reaction coordinates. While diffusion coordinates (DCs) appear to provide a good approximation to the true dynamical reaction coordinates, they are not parametrized using dynamical information. Thus, their approximation quality could not, as yet, be validated, nor could the diffusion map eigenvalues be used to compute relaxation rate constants of the system. Here we combine the Diffusion Map approach with the recently proposed Variational Approach for Conformation Dynamics (VAC). Diffusion Map coordinates are used as a basis set, and their optimal linear combination is sought using the VAC, which employs time-correlation information on the molecular dynamics (MD) trajectories. We have applied this approach to ultra-long MD simulations of the Fip35 WW domain and found that the first DCs are indeed a good approximation to the true reaction coordinates of the system, but they could be further improved using the VAC. Using the Diffusion Map basis, excellent approximations to the relaxation rates of the system are obtained. Finally, we evaluate the quality of different metric spaces and find that pairwise minimal root-mean-square deviation performs poorly, while operating in the recently introduced kinetic maps based on the time-lagged independent component analysis gives the best performance.
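For orientation, a basic diffusion map can be sketched as follows. This is a generic textbook construction, not the authors' implementation: it assumes a Gaussian kernel with a single fixed bandwidth `epsilon`, plain Euclidean distances between configurations, and a density normalization step, and it does not include the VAC optimization over the resulting basis. The function name `diffusion_map` and its parameters are hypothetical.

```python
import numpy as np

def diffusion_map(X, epsilon, n_coords=2):
    """Return the leading non-trivial eigenvalues and diffusion coordinates.

    X        : array (n_points, n_features) of configurations
    epsilon  : kernel bandwidth (assumed fixed for all points)
    n_coords : number of non-trivial diffusion coordinates to return
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * epsilon ** 2))
    # Density normalization, to reduce the influence of non-uniform sampling.
    q = K.sum(axis=1)
    K = K / np.outer(q, q)
    # Symmetric matrix similar to the row-stochastic Markov matrix D^-1 K.
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))
    w, U = np.linalg.eigh(S)
    order = np.argsort(-w)
    w, U = w[order], U[:, order]
    # Convert to right eigenvectors of D^-1 K; skip the trivial constant one.
    psi = U / np.sqrt(d)[:, None]
    return w[1:n_coords + 1], psi[:, 1:n_coords + 1]
```

In the approach described above, coordinates such as `psi` would serve as a basis set whose optimal linear combination is then sought using time-correlation information from the trajectories.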
Co-reporter:Frank Noé and Cecilia Clementi
Journal of Chemical Theory and Computation 2015 Volume 11(Issue 10) pp:5002-5011
Publication Date(Web):September 2, 2015
DOI:10.1021/acs.jctc.5b00553
Characterizing macromolecular kinetics from molecular dynamics (MD) simulations requires a distance metric that can distinguish slowly interconverting states. Here, we build upon diffusion map theory and define a kinetic distance metric for irreducible Markov processes that quantifies how slowly molecular conformations interconvert. The kinetic distance can be computed given a model that approximates the eigenvalues and eigenvectors (reaction coordinates) of the MD Markov operator. Here, we employ the time-lagged independent component analysis (TICA). The TICA components can be scaled to provide a kinetic map in which the Euclidean distance corresponds to the kinetic distance. As a result, the question of how many TICA dimensions should be kept in a dimensionality reduction approach becomes obsolete, and one parameter less needs to be specified in the kinetic model construction. We demonstrate the approach using TICA and Markov state model (MSM) analyses for illustrative models, protein conformation dynamics in bovine pancreatic trypsin inhibitor and protein-inhibitor association in trypsin and benzamidine. We find that the total kinetic variance (TKV) is an excellent indicator of model quality and can be used to rank different input feature sets.
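A minimal sketch of the kinetic map idea, under stated assumptions: TICA is estimated from a single trajectory with a symmetrized covariance estimator, the instantaneous covariance matrix is assumed to be full rank, and each TICA component is scaled by its eigenvalue so that all components can be kept. The function name `tica_kinetic_map` is hypothetical, and this is not the authors' code.

```python
import numpy as np

def tica_kinetic_map(X, lag):
    """Return (eigenvalues, kinetic-map coordinates) for a trajectory X of shape (T, d)."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                      # remove the mean
    X0, Xt = X[:-lag], X[lag:]                  # time-lagged pairs
    n = len(X0)
    # Symmetrized instantaneous and time-lagged covariance matrices.
    C0 = (X0.T @ X0 + Xt.T @ Xt) / (2.0 * n)
    Ct = (X0.T @ Xt + Xt.T @ X0) / (2.0 * n)
    # Whiten with C0 (assumed full rank), then solve a symmetric eigenproblem.
    s, U = np.linalg.eigh(C0)
    W = U / np.sqrt(s)                          # W.T @ C0 @ W = identity
    lam, R = np.linalg.eigh(W.T @ Ct @ W)
    order = np.argsort(-lam)
    lam, R = lam[order], R[:, order]
    Y = X @ (W @ R)                             # TICA components
    # Kinetic map: scale each component by its eigenvalue.
    return lam, Y * lam
```

Because fast components have eigenvalues near zero, they contribute little to Euclidean distances in the scaled space, which is why no explicit choice of the number of retained dimensions is needed.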
Co-reporter:Jordane Preto and Cecilia Clementi  
Physical Chemistry Chemical Physics 2014 Volume 16(Issue 36) pp:19181-19191
Publication Date(Web):30 May 2014
DOI:10.1039/C3CP54520B
The reaction pathways characterizing macromolecular systems of biological interest are associated with high free energy barriers. Resorting to the standard all-atom molecular dynamics (MD) to explore such critical regions may be inappropriate as the time needed to observe the relevant transitions can be remarkably long. In this paper, we present a new method called Extended Diffusion-Map-directed Molecular Dynamics (extended DM-d-MD) used to enhance the sampling of MD trajectories in such a way as to rapidly cover all important regions of the free energy landscape including deep metastable states and critical transition paths. Moreover, extended DM-d-MD was combined with a reweighting scheme that makes it possible to retain information about the Boltzmann distribution on the fly. Our algorithm was successfully applied to two systems, alanine dipeptide and alanine-12. Due to the enhanced sampling, the Boltzmann distribution is recovered much faster than in plain MD simulations. For alanine dipeptide, we report a speedup of one order of magnitude with respect to plain MD simulations. For alanine-12, our algorithm allows us to highlight all important unfolded basins in several days of computation when one single misfolded event is barely observable within the same amount of computational time by plain MD simulations. Our method is reaction coordinate free, shows little dependence on the a priori knowledge of the system, and can be implemented in such a way that the biased steps are not computationally expensive with respect to MD simulations, thus making our approach well adapted for larger complex systems about which little information is known.
Co-reporter:Wenwei Zheng, Mary A. Rohrdanz, and Cecilia Clementi
The Journal of Physical Chemistry B 2013 Volume 117(Issue 42) pp:12769-12776
Publication Date(Web):July 18, 2013
DOI:10.1021/jp401911h
The gap between the time scale of interesting behavior in macromolecular systems and that which our computational resources can afford often limits molecular dynamics (MD) from understanding experimental results and predicting what is inaccessible in experiments. In this paper, we introduce a new sampling scheme, named diffusion-map-directed MD (DM-d-MD), to rapidly explore molecular configuration space. The method uses a diffusion map to guide MD on the fly. DM-d-MD can be combined with other methods to reconstruct the equilibrium free energy, and here, we used umbrella sampling as an example. We present results from two systems: alanine dipeptide and alanine-12. In both systems, we gain tremendous speedup with respect to standard MD both in exploring the configuration space and reconstructing the equilibrium distribution. In particular, we obtain 3 orders of magnitude of speedup over standard MD in the exploration of the configurational space of alanine-12 at 300 K with DM-d-MD. The method is reaction coordinate free and minimally dependent on a priori knowledge of the system. We expect wide applications of DM-d-MD to other macromolecular systems in which equilibrium sampling is not affordable by standard MD.
Co-reporter:Wenwei Zheng, Bo Qi, Mary A. Rohrdanz, Amedeo Caflisch, Aaron R. Dinner, and Cecilia Clementi
The Journal of Physical Chemistry B 2011 Volume 115(Issue 44) pp:13065-13074
Publication Date(Web):September 23, 2011
DOI:10.1021/jp2076935
Several methods have been developed in the past few years for the analysis of molecular dynamics simulations of biological (macro)molecules whose complexity is difficult to capture by simple projections of the free-energy surface onto one or two geometric variables. The locally scaled diffusion map (LSDMap) method is a nonlinear dimensionality reduction technique for describing the dynamics of complex systems in terms of a few collective coordinates. Here, we compare LSDMap to two previously developed approaches for the characterization of the configurational landscape associated with the folding dynamics of a three-stranded antiparallel β-sheet peptide, termed Beta3s. The analysis is aided by an improved procedure for extracting pathways from the equilibrium transition network, which enables calculation of pathway-specific cut-based free energy profiles. We find that the results from LSDMap are consistent with analysis based on transition networks and allow a coherent interpretation of metastable states and folding pathways in terms of different time scales of transitions between minima on the free energy projections.
Co-reporter:Payel Das, Mark Moll, Lydia E. Kavraki, Hernán Stamati
PNAS 2006 Volume 103(Issue 26) pp:9885-9890
Publication Date(Web):June 27, 2006
DOI:10.1073/pnas.0603553103
The definition of reaction coordinates for the characterization of a protein-folding reaction has long been a controversial issue, even for the “simple” case in which one single free-energy barrier separates the folded and unfolded ensemble. We propose a general approach to this problem to obtain a few collective coordinates by using nonlinear dimensionality reduction. We validate the usefulness of this method by characterizing the folding landscape associated with a coarse-grained protein model of src homology 3 as sampled by molecular dynamics simulations. The folding free-energy landscape projected on the few relevant coordinates emerging from the dimensionality reduction can correctly identify the transition-state ensemble of the reaction. The first embedding dimension efficiently captures the evolution of the folding process along the main folding route. These results clearly show that the proposed method can efficiently find a low-dimensional representation of a complex process such as protein folding.
Co-reporter:Payel Das, Silvina Matysiak
PNAS 2005 Volume 102(Issue 29) pp:10141-10146
Publication Date(Web):July 19, 2005
DOI:10.1073/pnas.0409471102
Coarse-grained models have been extremely valuable in promoting our understanding of protein folding. However, the quantitative accuracy of existing simplified models is strongly hindered either from the complete removal of frustration (as in the widely used Gō-like models) or from the compromise with the minimal frustration principle and/or realistic protein geometry (as in the simple on-lattice models). We present a coarse-grained model that “naturally” incorporates sequence details and energetic frustration into an overall minimally frustrated folding landscape. The model is coupled with an optimization procedure to design the parameters of the protein Hamiltonian to fold into a desired native structure. The application to the study of src-Src homology 3 domain shows that this coarse-grained model contains the main physical-chemical ingredients that are responsible for shaping the folding landscape of this protein. The results illustrate the importance of nonnative interactions and energetic heterogeneity for a quantitative characterization of folding mechanisms.
Co-reporter:Payel Das, Corey J. Wilson, Giovanni Fossati, Pernilla Wittung-Stafshede, Kathleen S. Matthews
PNAS 2005 Volume 102(Issue 41) pp:14569-14574
Publication Date(Web):October 11, 2005
DOI:10.1073/pnas.0505844102
Recent theoretical/computational studies based on simplified protein models and experimental investigation have suggested that the native structure of a protein plays a primary role in determining the folding rate and mechanism of relatively small single-domain proteins. Here, we extend the study of the relationship between protein topology and folding mechanism to a larger protein with complex topology, by analyzing the folding process of monomeric lactose repressor (MLAc) computationally by using a Gō-like Cα model. Next, we combine simulation and experimental results (see companion article in this issue) to achieve a comprehensive assessment of the folding landscape of this protein. Remarkably, simulated kinetic and equilibrium analyses show an excellent quantitative agreement with the experimental folding data of this study. The results of this comparison show that a simplified, completely unfrustrated Cα model correctly reproduces the complex folding features of a large multidomain protein with complex topology. The success of this effort underlines the importance of synergistic experimental/theoretical approaches to achieve a broader understanding of the folding landscape.
Co-reporter:Amarda Shehu, Lydia E. Kavraki, Cecilia Clementi
Biophysical Journal 2007 Volume 92(Issue 5)
Publication Date(Web):March 1, 2007
DOI:10.1529/biophysj.106.094409
Describing and understanding the biological function of a protein requires a detailed structural and thermodynamic description of the protein's native state ensemble. Obtaining such a description often involves characterizing equilibrium fluctuations that occur beyond the nanosecond timescale. Capturing such fluctuations remains nontrivial even for very long molecular dynamics and Monte Carlo simulations. We propose a novel multiscale computational method to exhaustively characterize, in atomistic detail, the protein conformations constituting the native state with no inherent timescale limitations. Applications of this method to proteins of various folds and sizes show that thermodynamic observables measured as averages over the native state ensembles obtained by the method agree remarkably well with nuclear magnetic resonance data that span multiple timescales. By characterizing equilibrium fluctuations at atomistic detail over a broad range of timescales, from picoseconds to milliseconds, our method offers to complement current simulation techniques and wet-lab experiments and can impact our understanding and description of the relationship between protein flexibility and function.
Co-reporter:Silvina Matysiak, Cecilia Clementi
Archives of Biochemistry and Biophysics 2008 Volume 469(Issue 1) pp:29-33
Publication Date(Web):January 1, 2008
DOI:10.1016/j.abb.2007.08.019
L-Threonine, L-threonyl-L-tryptophyl-L-isoleucyl-L-glutaminyl-L-asparaginylglycyl-L-seryl-L-threonyl-L-lysyl-L-tryptophyl-L-tyrosyl-L-glutaminyl-L-asparaginylglycyl-L-seryl-L-threonyl-L-lysyl-L-isoleucyl-L-tyrosyl-
Benzeneacetic acid, α-diazo-, 1,1-dimethylethyl ester
L-Alanine,L-alanyl-L-alanyl-L-alanyl-L-alanyl-L-alanyl-L-alanyl-L-alanyl-L-alanyl-L-alanyl-L-alanyl-L-alanyl-
Cyclopropanecarboxylic acid, 1,2-diphenyl-, methyl ester, (1R,2S)-
Propanamide,2-(acetylamino)-N-methyl-, (2S)-
Anthramycin