Source author record

Vijay S. Pande

Vijay S. Pande appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Biomolecules, Biological Physics, physics.chem-ph, cond-mat.stat-mech, Machine Learning, physics.comp-ph, Applications, math.ST, physics.data-an, Statistics Theory

Catalog footprint

What is connected

14 works

10 topics

4 close collaborators


preprint, 2020, arXiv

Physical machine learning outperforms "human learning" in Quantum Chemistry

Two types of approaches to modeling molecular systems have demonstrated high practical efficiency. Density functional theory (DFT), the most widely used quantum chemical method, is a physical approach predicting energies and electron densities of molecules. Recently, numerous papers on machine learning (ML) of molecular properties have also been published. ML models greatly outperform DFT in terms of computational costs, and may even reach comparable accuracy, but they are missing physicality - a direct link to Quantum Physics - which limits their applicability. Here, we propose an approach that combines the strong sides of DFT and ML, namely, physicality and low computational cost. By generalizing the famous Hohenberg-Kohn theorems, we derive general equations for exact electron densities and energies that can naturally guide applications of ML in Quantum Chemistry. Based on these equations, we build a deep neural network that can compute electron densities and energies of a wide range of organic molecules not only much faster, but also closer to exact physical values than current versions of DFT. In particular, we reached a mean absolute error in energies of molecules with up to eight non-hydrogen atoms as low as 0.9 kcal/mol relative to CCSD(T) values, noticeably lower than those of DFT (down to ~3 kcal/mol on the same set of molecules) and ML (down to ~1.5 kcal/mol) methods. A simultaneous improvement in the accuracy of predictions of electron densities and energies suggests that the proposed approach describes the physics of molecules better than DFT functionals developed by "human learning" earlier. Thus, physics-based ML offers exciting opportunities for modeling, with high-theory-level quantum chemical accuracy, of much larger molecular systems than currently possible.

preprint, 2016, arXiv

Computationally Discovered Potentiating Role of Glycans on NMDA Receptors

N-methyl-D-aspartate receptors (NMDARs) are glycoproteins in the brain central to learning and memory. The effects of glycosylation on the structure and dynamics of NMDARs are largely unknown. In this work, we use extensive molecular dynamics simulations of GluN1 and GluN2B ligand binding domains (LBDs) of NMDARs to investigate these effects. Our simulations predict that intra-domain interactions involving the glycan attached to residue GluN1-N440 stabilize closed-clamshell conformations of the GluN1 LBD. The glycan on GluN2B-N688 shows a similar, though weaker, effect. Based on these results, and assuming the transferability of the results of LBD simulations to the full receptor, we predict that glycans at GluN1-N440 might play a potentiator role in NMDARs. To validate this prediction, we perform electrophysiological analysis of full-length NMDARs with a glycosylation-preventing GluN1-N440Q mutation, and demonstrate an increase in the glycine EC50 value. Overall, our results suggest an intramolecular potentiating role of glycans on NMDA receptors.

preprint, 2016, arXiv

Learning Protein Dynamics with Metastable Switching Systems

We introduce a machine learning approach for extracting fine-grained representations of protein evolution from molecular dynamics datasets. Metastable switching linear dynamical systems extend standard switching models with a physically-inspired stability constraint. This constraint enables the learning of nuanced representations of protein dynamics that closely match physical reality. We derive an EM algorithm for learning, where the E-step extends the forward-backward algorithm for HMMs and the M-step requires the solution of large biconvex optimization problems. We construct an approximate semidefinite program solver based on the Frank-Wolfe algorithm and use it to solve the M-step. We apply our EM algorithm to learn accurate dynamics from large simulation datasets for the opioid peptide met-enkephalin and the proto-oncogene Src-kinase. Our learned models demonstrate significant improvements in temporal coherence over HMMs and standard switching models for met-enkephalin, and sample transition paths (possibly useful in rational drug design) for Src-kinase.

preprint, 2015, arXiv

Efficient maximum likelihood parameterization of continuous-time Markov processes

Continuous-time Markov processes over finite state-spaces are widely used to model dynamical processes in many fields of natural and social science. Here, we introduce a maximum likelihood estimator for constructing such models from data observed at a finite time interval. This estimator is dramatically more efficient than prior approaches, enables the calculation of deterministic confidence intervals in all model parameters, and can easily enforce important physical constraints on the models such as detailed balance. We demonstrate and discuss the advantages of these models over existing discrete-time Markov models for the analysis of molecular dynamics simulations.
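As an illustration of how a rate matrix relates to observations made at a finite lag, the two-state case can be worked in closed form. The sketch below is not the estimator introduced in the paper; it assumes a two-state process and transition counts collected at a single lag `tau`, and the helper names `transition_probs` and `two_state_rates` are hypothetical.

```python
import math

def transition_probs(k01, k10, tau):
    """Lag-tau jump probabilities (p01, p10) of a two-state process with rates k01, k10."""
    k = k01 + k10
    relaxed = 1.0 - math.exp(-k * tau)  # fraction of relaxation completed after tau
    return k01 / k * relaxed, k10 / k * relaxed

def two_state_rates(counts, tau):
    """Invert the relation above, starting from a 2x2 matrix of transition counts.

    counts[i][j] is the number of observed i -> j transitions at lag tau.
    """
    p01 = counts[0][1] / (counts[0][0] + counts[0][1])
    p10 = counts[1][0] / (counts[1][0] + counts[1][1])
    lam = 1.0 - p01 - p10  # second eigenvalue of the lag-tau transition matrix
    if lam <= 0.0:
        raise ValueError("no real rate matrix reproduces these counts at this lag")
    k_sum = -math.log(lam) / tau
    return k_sum * p01 / (p01 + p10), k_sum * p10 / (p01 + p10)

# Round trip on model-generated (not experimental) expected counts.
p01, p10 = transition_probs(0.3, 0.1, tau=2.0)
counts = [[1000 * (1 - p01), 1000 * p01],
          [1000 * p10, 1000 * (1 - p10)]]
k01, k10 = two_state_rates(counts, tau=2.0)
```

With only two states, detailed balance holds automatically. With more states, the matrix logarithm of an estimated transition matrix need not be a valid rate matrix, which is one reason a direct estimator of the rates is attractive.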

preprint, 2015, arXiv

Variational cross-validation of slow dynamical modes in molecular kinetics

Markov state models (MSMs) are a widely used method for approximating the eigenspectrum of the molecular dynamics propagator, yielding insight into the long-timescale statistical kinetics and slow dynamical modes of biomolecular systems. However, the lack of a unified theoretical framework for choosing between alternative models has hampered progress, especially for non-experts applying these methods to novel biological systems. Here, we consider cross-validation with a new objective function for estimators of these slow dynamical modes, a generalized matrix Rayleigh quotient (GMRQ), which measures the ability of a rank-$m$ projection operator to capture the slow subspace of the system. It is shown that a variational theorem bounds the GMRQ from above by the sum of the first $m$ eigenvalues of the system's propagator, but that this bound can be violated when the requisite matrix elements are estimated subject to statistical uncertainty. This overfitting can be detected and avoided through cross-validation. These results make it possible to construct Markov state models for protein dynamics in a way that appropriately captures the tradeoff between systematic and statistical errors.
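A toy calculation can make the variational bound concrete. The sketch below assumes the GMRQ takes the trace form $\mathrm{Tr}[(A^T C A)(A^T S A)^{-1}]$, with $C$ the time-lagged correlation matrix and $S$ the overlap matrix, and evaluates it exactly (no sampling noise) for a hypothetical three-state reversible chain with rank $m = 2$. The matrix `T` and the helper names are illustrative assumptions, not taken from the paper.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def gmrq(A, C, S):
    """Tr[(A^T C A)(A^T S A)^-1] for an n x 2 matrix A of trial vectors (rank m = 2)."""
    At = transpose(A)
    num = matmul(matmul(At, C), A)
    den = matmul(matmul(At, S), A)
    det = den[0][0] * den[1][1] - den[0][1] * den[1][0]
    inv = [[den[1][1] / det, -den[0][1] / det],
           [-den[1][0] / det, den[0][0] / det]]
    prod = matmul(num, inv)
    return prod[0][0] + prod[1][1]

# Hypothetical symmetric transition matrix; its stationary distribution is uniform
# and its eigenvalues are 1.0, 0.9 and 0.7.
T = [[0.9, 0.1, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.1, 0.9]]
pi = [1.0 / 3.0] * 3
C = [[pi[i] * T[i][j] for j in range(3)] for i in range(3)]        # correlation matrix
S = [[pi[i] if i == j else 0.0 for j in range(3)] for i in range(3)]  # overlap matrix

bound = 1.0 + 0.9                           # sum of the two largest eigenvalues
exact_modes = [[1, 1], [1, 0], [1, -1]]     # columns: the two slowest eigenvectors
coarse_states = [[1, 0], [1, 0], [0, 1]]    # columns: indicators of a 2-state lumping
score_exact = gmrq(exact_modes, C, S)
score_coarse = gmrq(coarse_states, C, S)
```

Here the exact eigenvectors attain the bound, while the coarse two-state lumping scores below it. When `C` and `S` are replaced by noisy estimates from finite data, a score computed on the same data can exceed the bound, which is the overfitting that cross-validation is meant to expose.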

preprint, 2014, arXiv

Efficient inference of protein structural ensembles

It is becoming clear that traditional, single-structure models of proteins are insufficient for understanding their biological function. Here, we outline one method for inferring, from experiments, not only the most common structure a protein adopts (native state), but the entire ensemble of conformations the system can adopt. Such ensemble models are necessary to understand intrinsically disordered proteins, enzyme catalysis, and signaling. We suggest that the most difficult aspect of generating such a model will be finding a small set of configurations to accurately model structural heterogeneity and present one way to overcome this challenge.

preprint, 2014, arXiv

Perspective: Markov Models for Long-Timescale Biomolecular Dynamics

Molecular dynamics simulations have the potential to provide atomic-level detail and insight to important questions in chemical physics that cannot be observed in typical experiments. However, simply generating a long trajectory is insufficient, as researchers must be able to transform the data in a simulation trajectory into specific scientific insights. Although this analysis step has often been taken for granted, it deserves further attention as large-scale simulations become increasingly routine. In this perspective, we discuss the application of Markov models to the analysis of large-scale biomolecular simulations. We draw attention to recent improvements in the construction of these models as well as several important open issues. In addition, we highlight recent theoretical advances that pave the way for a new generation of models of molecular kinetics.

preprint, 2014, arXiv

Understanding Protein Dynamics with L1-Regularized Reversible Hidden Markov Models

We present a machine learning framework for modeling protein dynamics. Our approach uses L1-regularized, reversible hidden Markov models to understand large protein datasets generated via molecular dynamics simulations. Our model is motivated by three design principles: (1) the requirement of massive scalability; (2) the need to adhere to relevant physical law; and (3) the necessity of providing accessible interpretations, critical for both cellular biology and rational drug design. We present an EM algorithm for learning and introduce a model selection criterion based on the physical notion of convergence in relaxation timescales. We contrast our model with standard methods in biophysics and demonstrate improved robustness. We implement our algorithm on GPUs and apply the method to two large protein simulation datasets generated respectively on the NCSA Bluewaters supercomputer and the Folding@Home distributed computing network. Our analysis identifies the conformational dynamics of the ubiquitin protein critical to cellular signaling, and elucidates the stepwise activation mechanism of the c-Src kinase protein.

preprint, 2013, arXiv

Inferring the Rate-Length Law of Protein Folding

We investigate the rate-length scaling law of protein folding, a key undetermined scaling law in the analytical theory of protein folding. We demonstrate that chain length is a dominant factor determining folding times, and that the unambiguous determination of the way chain length correlates with folding times could provide key mechanistic insight into the folding process. Four specific proposed laws (power law, exponential, and two stretched exponentials) are tested against one another, and it is found that the power law best explains the data. At the same time, the fit power law results in rates that are very fast, nearly unreasonably so in a biological context. We show that any of the proposed forms are viable, conclude that more data is necessary to unequivocally infer the rate-length law, and that such data could be obtained through a small number of protein folding experiments on large protein domains.

preprint, 2013, arXiv

Probing the Origins of Two-State Folding

Many protein systems fold in a two-state manner. Random models, however, rarely display two-state kinetics and thus such behavior should not be accepted as a default. To date, many theories for the prevalence of two-state kinetics have been presented, but none sufficiently explain the breadth of experimental observations. A model, making a minimum of assumptions, is introduced that suggests two-state behavior is likely for any system with an overwhelmingly populated native state. We show two-state folding is emergent and strengthened by increasing the occupancy population of the native state. Further, the model exhibits a hub-like behavior, with slow interconversions between unfolded states. Despite this, the unfolded state equilibrates quickly relative to the folding time. This apparent paradox is readily understood through this model. Finally, our results compare favorably with experimental measurements of protein folding rates as a function of chain length and Keq, and provide new insight into these results.

preprint, 2012, arXiv

Reducing the effect of Metropolization on mixing times in molecular dynamics simulations

Molecular dynamics algorithms are subject to some amount of error dependent on the size of the time step that is used. This error can be corrected by periodically updating the system with a Metropolis criterion, where the integration step is treated as a selection probability for candidate state generation. Such a method, closely related to generalized hybrid Monte Carlo (GHMC), satisfies the balance condition by imposing a reversal of momenta upon candidate rejection. In the present study, we demonstrate that such momentum reversals can have a significant impact on molecular kinetics and extend the time required for system decorrelation, resulting in an order of magnitude increase in the integrated autocorrelation times of molecular variables for the worst cases. We present a simple method, referred to as reduced-flipping GHMC, that uses the information of the previous, current, and candidate states to reduce the probability of momentum flipping following candidate rejection while rigorously satisfying the balance condition. This method is a simple modification to traditional, automatic-flipping, GHMC methods and significantly mitigates the impact of such algorithms on molecular kinetics and simulation mixing times.

preprint, 2011, arXiv

A robust approach to estimating rates from time-correlation functions

While seemingly straightforward in principle, the reliable estimation of rate constants is seldom easy in practice. Numerous issues, such as the complication of poor reaction coordinates, cause obvious approaches to yield unreliable estimates. When a reliable order parameter is available, the reactive flux theory of Chandler allows the rate constant to be extracted from the plateau region of an appropriate reactive flux function. However, when applied to real data from single-molecule experiments or molecular dynamics simulations, the rate can sometimes be difficult to extract due to the numerical differentiation of a noisy empirical correlation function or difficulty in locating the plateau region at low sampling frequencies. We present a modified version of this theory which does not require numerical derivatives, allowing rate constants to be robustly estimated from the time-correlation function directly. We compare these approaches using single-molecule force spectroscopy measurements of an RNA hairpin.

preprint, 2011, arXiv

Splitting probabilities as a test of reaction coordinate choice in single-molecule experiments

To explain the observed dynamics in equilibrium single-molecule measurements of biomolecules, the experimental observable is often chosen as a putative reaction coordinate along which kinetic behavior is presumed to be governed by diffusive dynamics. Here, we invoke the splitting probability as a test of the suitability of such a proposed reaction coordinate. Comparison of the observed splitting probability with that computed from the kinetic model provides a simple test to reject poor reaction coordinates. We demonstrate this test for a force spectroscopy measurement of a DNA hairpin.
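The comparison described above can be sketched for a discretized observable. The example assumes the trajectory has been binned into integer states, that the boundaries `a` and `b` are two chosen bins, and that the kinetic model is a nearest-neighbour hopping model; the function names and the short trajectory are hypothetical and are not measurements from the paper.

```python
def empirical_splitting(traj, a, b):
    """Fraction of visits to each interior state after which b is reached before a.

    Frames after the final boundary visit are skipped because their fate is unknown.
    """
    hits, visits = {}, {}
    next_boundary = None
    for x in reversed(traj):  # walk backwards so the next boundary hit is always known
        if x <= a:
            next_boundary = a
        elif x >= b:
            next_boundary = b
        elif next_boundary is not None:
            visits[x] = visits.get(x, 0) + 1
            hits[x] = hits.get(x, 0) + (next_boundary == b)
    return {x: hits[x] / visits[x] for x in sorted(visits)}

def model_splitting(up, down):
    """Splitting probabilities for a hopping model on states 0..N.

    up[i] and down[i] are the probabilities of stepping to i+1 and i-1 from
    interior state i; entries for the boundary states 0 and N are ignored.
    """
    n = len(up) - 1
    steps = [1.0]  # steps[k] is proportional to p[k+1] - p[k]
    for j in range(1, n):
        steps.append(steps[-1] * down[j] / up[j])
    total = sum(steps)
    return [sum(steps[:i]) / total for i in range(n + 1)]

# Hypothetical binned trajectory between boundaries a = 0 and b = 3.
observed = empirical_splitting([1, 2, 1, 0, 1, 2, 3], a=0, b=3)
# Unbiased hopping model on the same four states.
predicted = model_splitting([0.5] * 4, [0.5] * 4)
```

A large, systematic gap between `observed[x]` and `predicted[x]` across the interior states would be the signal to reject the observable as a reaction coordinate. A real analysis would also need uncertainty estimates on the empirical fractions, which this sketch omits.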

preprint, 2010, arXiv

A simple theory of protein folding kinetics

We present a simple model of protein folding dynamics that captures key qualitative elements recently seen in all-atom simulations. The goals of this theory are to serve as a simple formalism for gaining deeper insight into the physical properties seen in detailed simulations as well as to serve as a model to easily compare why these simulations suggest a different kinetic mechanism than previous simple models. Specifically, we find that non-native contacts play a key role in determining the mechanism, which can shift dramatically as the energetic strength of non-native interactions is changed. For protein-like non-native interactions, our model finds that the native state is a kinetic hub, connecting the strength of relevant interactions directly to the nature of folding kinetics.
