Source author record

Glen M. Hocky

Glen M. Hocky appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


cond-mat.stat-mech, physics.chem-ph, Biomolecules, Biological Physics, cond-mat.soft, Machine Learning

Catalog footprint

What is connected

9 works

6 topics

4 close collaborators


preprint, 2026, arXiv

Large-Scale Multi-omic Biosequence Transformers for Modeling Protein-Nucleic Acid Interactions

The transformer architecture has revolutionized bioinformatics and driven progress in the understanding and prediction of the properties of biomolecules. To date, most biosequence transformers have been trained on single-omic data (either proteins or nucleic acids) and have seen incredible success in downstream tasks in each domain, with particularly noteworthy breakthroughs in protein structural modeling. However, single-omic pretraining limits the ability of these models to capture cross-modal interactions. Here we present OmniBioTE, the largest open-source multi-omic model trained on over 250 billion tokens of mixed protein and nucleic acid data. We show that despite only being trained on unlabeled sequence data, OmniBioTE learns joint representations mapping genes to their corresponding protein sequences. We further demonstrate that OmniBioTE achieves state-of-the-art results predicting the change in Gibbs free energy (ΔG) of the binding interaction between a given nucleic acid and protein. Remarkably, we show that multi-omic biosequence transformers emergently learn useful structural information without any a priori structural training, allowing us to predict which protein residues are most involved in the protein-nucleic acid binding interaction. Compared to single-omic controls trained with identical compute, OmniBioTE also demonstrates superior performance-per-FLOP across both multi-omic and single-omic benchmarks. Together, these results highlight the power of a unified modeling approach for biological sequences and establish OmniBioTE as a foundation model for multi-omic discovery.

preprint, 2022, arXiv

Natural Language Processing Models That Automate Programming Will Transform Chemistry Research and Teaching

Natural language processing models have emerged that can generate usable software and automate a number of programming tasks with high fidelity. These tools have yet to have an impact on the chemistry community. Yet, our initial testing demonstrates that this form of Artificial Intelligence is poised to transform chemistry and chemical engineering research. Here, we review developments that brought us to this point, examine applications in chemistry, and give our perspective on how this may fundamentally alter research and teaching.

preprint, 2022, arXiv

Size-and-Shape Space Gaussian Mixture Models for Structural Clustering of Molecular Dynamics Trajectories

Determining the optimal number and identity of structural clusters from an ensemble of molecular configurations continues to be a challenge. Recent structural clustering methods have focused on the use of internal coordinates due to the innate rotational and translational invariance of these features. The vast number of possible internal coordinates necessitates a feature space supervision step to make clustering tractable, but yields a protocol that can be system-type specific. Particle positions offer an appealing alternative to internal coordinates, but suffer from a lack of rotational and translational invariance, as well as a perceived insensitivity to regions of structural dissimilarity. Here, we present a method, denoted shape-GMM, that overcomes the shortcomings of particle positions using a weighted maximum likelihood (ML) alignment procedure. This alignment strategy is then built into an expectation maximization Gaussian mixture model (GMM) procedure to capture metastable states in the free energy landscape. The resulting algorithm distinguishes between a variety of different structures, including those indistinguishable by RMSD and pair-wise distances, as demonstrated on several model systems. Shape-GMM results on an extensive simulation of the fast-folding HP35 Nle/Nle mutant protein support a 4-state folding/unfolding mechanism which is consistent with previous experimental results and provides kinetic detail comparable to previous state-of-the-art clustering approaches, as measured by the VAMP-2 score. Currently, training of shape-GMMs is recommended for systems (or subsystems) that can be represented by $\lesssim$ 200 particles and $\lesssim$ 100K configurations to estimate high-dimensional covariance matrices and balance computational expense. Once a shape-GMM is trained, it can be used to predict the cluster identities of millions of configurations.

preprint, 2021, arXiv

Assessing models of force-dependent unbinding rates via infrequent metadynamics

Protein-ligand interactions are crucial for a wide range of physiological processes. Many cellular functions result in these non-covalent 'bonds' being mechanically strained, and this can be integral to proper cellular function. Broadly, two classes of force dependence have been observed: slip bonds, where the unbinding rate increases with applied force, and catch bonds, where the unbinding rate decreases. Despite much theoretical work, we cannot predict for which protein-ligand pairs, pulling coordinates, and forces a particular rate dependence will appear. Here, we assess the ability of MD simulations combined with enhanced sampling techniques to probe the force dependence of unbinding rates. We show that the infrequent metadynamics technique correctly produces both catch and slip bonding kinetics for model potentials. We then apply it to the well-studied case of a buckyball in a hydrophobic cavity, which appears to exhibit an ideal slip bond. Finally, we compute the force-dependent unbinding rate of biotin-streptavidin. Here, the complex nature of the unbinding process causes the infrequent metadynamics method to begin to break down due to the presence of unbinding intermediates, despite use of a previously optimized sampling coordinate. Allowing for this limitation, a combination of kinetic and free energy computations predicts an overall slip bond for larger forces consistent with prior experimental results, although there are substantial deviations at small forces that require further investigation. This work demonstrates the promise of predicting force-dependent unbinding rates using enhanced sampling MD techniques, while also revealing the methodological barriers that must be overcome to tackle more complex targets in the future.
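As a rough illustration of the slip and catch behavior described in this abstract, the sketch below evaluates two common phenomenological rate laws: a Bell-type exponential for a slip bond and a two-pathway sum for a catch-slip bond. Neither model nor any parameter value is taken from the paper; the function names, rate prefactors, and barrier distances are hypothetical and chosen only to show the qualitative trends.

```python
import math

# Thermal energy k_B*T near 298 K in pN*nm (1.380649e-23 J/K * 298 K).
KB_T = 4.114


def bell_slip_rate(force, k0, x_barrier, kbt=KB_T):
    """Bell-type slip bond: the rate grows exponentially with force.

    force is in pN, x_barrier in nm, k0 in 1/s (illustrative units).
    """
    return k0 * math.exp(force * x_barrier / kbt)


def two_pathway_rate(force, k_catch, x_catch, k_slip, x_slip, kbt=KB_T):
    """Two-pathway model: force suppresses one pathway and enhances the other.

    The total rate first falls with force (catch regime) and then rises
    again once the slip pathway dominates.
    """
    catch = k_catch * math.exp(-force * x_catch / kbt)
    slip = k_slip * math.exp(force * x_slip / kbt)
    return catch + slip


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    for f in (0.0, 5.0, 20.0, 40.0):
        slip = bell_slip_rate(f, k0=1.0, x_barrier=0.5)
        mixed = two_pathway_rate(f, k_catch=10.0, x_catch=1.0,
                                 k_slip=0.1, x_slip=0.5)
        print(f"F = {f:5.1f} pN  slip = {slip:10.3f}  two-pathway = {mixed:10.3f}")
```

With these made-up parameters the two-pathway rate drops at small forces and recovers at large ones, which is the signature usually called a catch-slip transition.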

preprint, 2020, arXiv

A Minimal Experimental Bias on the Hydrogen Bond Greatly Improves Ab Initio Molecular Dynamics Simulations of Water

Experiment Directed Simulations (EDS) is a method within a class of techniques seeking to improve molecular simulations by minimally biasing the system Hamiltonian to reproduce certain experimental observables. In a previous application of EDS to ab initio molecular dynamics (AIMD) simulation based on electronic density functional theory (DFT), the AIMD simulations of water were biased to reproduce its experimentally derived solvation structure. In particular, by solely biasing the O-O pair correlation functions, other structural and dynamical properties that were not biased were improved. In this work, the hypothesis is tested that directly biasing the O-H pair correlation will provide an even better improvement of DFT-based water properties in AIMD simulations. The logic behind this hypothesis is that for most electronic DFT descriptions of water the hydrogen bonding is known to be deficient due to anomalous charge transfer and overpolarization in the DFT. Using recent advances to the EDS learning algorithm, we thus train a minimal bias on AIMD water that reproduces the O-H radial distribution function derived from the highly accurate MB-pol model of water. It is then confirmed that biasing the O-H pair correlation alone can lead to improved AIMD water properties, with structural and dynamical properties in even closer agreement with experiment than the previous EDS-AIMD model.

preprint, 2019, arXiv

Infinite Switch Simulated Tempering in Force (FISST)

Many proteins in cells are capable of sensing and responding to piconewton scale forces, a regime in which conformational changes are small but significant for biological processes. In order to efficiently and effectively sample the response of these proteins to small forces, enhanced sampling techniques will be required. In this work, we derive, implement, and evaluate an efficient method to simultaneously sample the result of applying any constant pulling force within a specified range to a molecular system of interest. We start from Simulated Tempering in Force, whereby force is applied as a linear bias on a collective variable to the system's Hamiltonian, and the coefficient is taken as a continuous auxiliary degree of freedom. We derive a formula for an average collective-variable-dependent force, which depends on a set of weights, learned on-the-fly throughout a simulation, that reflect the limit where force varies infinitely quickly. These weights can then be used to retroactively compute averages of any observable at any force within the specified range. This technique is based on recent work deriving similar equations for Infinite Switch Simulated Tempering in Temperature, which showed the infinite switch limit is the most efficient for sampling. Here, we demonstrate that our method accurately and simultaneously samples molecular systems at all forces within a user-defined force range, and show how it can serve as an enhanced sampling tool for cases where the pulling direction destabilizes states of low free energy at zero force. This method is implemented in, and will be freely distributed with, the PLUMED open-source sampling library, and hence can be readily applied to problems using a wide range of molecular dynamics software packages.
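The idea of recovering averages at a chosen force from a single set of samples can be illustrated with plain exponential reweighting. The sketch below is not the FISST algorithm and does not use its learned weights; it only shows the textbook identity for a linear bias, assuming the convention H_F = H_0 - F*Q, so that a zero-force sample with collective variable Q carries weight exp(F*Q / k_B T). The function name and sample values are hypothetical.

```python
import math


def reweight_to_force(q_samples, obs_samples, force, kbt):
    """Estimate <O> at constant force from samples drawn at zero force.

    Assumes the force enters as a linear bias, H_F = H_0 - force * Q,
    so each sample is weighted by exp(force * Q / kbt).
    """
    if not q_samples or len(q_samples) != len(obs_samples):
        raise ValueError("need equally sized, non-empty sample lists")
    log_w = [force * q / kbt for q in q_samples]
    shift = max(log_w)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(lw - shift) for lw in log_w]
    norm = sum(weights)
    return sum(w * o for w, o in zip(weights, obs_samples)) / norm


if __name__ == "__main__":
    # Hypothetical collective-variable samples (e.g. an end-to-end distance).
    q = [1.0, 2.0, 3.0]
    for f in (0.0, 0.5, 1.0):
        print(f, reweight_to_force(q, q, force=f, kbt=1.0))
```

Reweighting of this kind degrades quickly when the zero-force ensemble rarely visits configurations that matter at the target force, which is the sampling problem that methods covering a whole force range in one simulation are meant to address.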

preprint, 2014, arXiv

Correlation of Local Order with Particle Mobility in Supercooled Liquids is Highly System Dependent

We investigate the connection between local structure and dynamical heterogeneity in supercooled liquids. Through the study of four different models we show that the correlation between a particle's mobility and the degree of local order in nearby regions is highly system dependent. Our results suggest that the correlation between local structure and dynamics is weak or absent in systems that conform well to the mean-field picture of glassy dynamics and strong in those that deviate from this paradigm. Finally, we investigate the role of order-agnostic point-to-set correlations and reveal that they provide similar information content to local structure measures, at least in the system where local order is most pronounced.

preprint, 2013, arXiv

A small subset of normal modes mimics the properties of dynamical heterogeneity in a model supercooled liquid

In this work, we study the nature of transitions between inherent structures of a two-dimensional model supercooled liquid. We demonstrate that these transitions occur predominately along a small number of directions on the energy landscape. Moreover, we show that the number of such directions decreases as the temperature of the liquid is decreased in the mildly supercooled regime, in concert with earlier studies on an athermal jamming system. We show that this decrease happens in parallel with a change in character of the transitions as dynamics in the system become more heterogeneous and localized. We investigate the origin of these trends, which suggests interesting connections between jamming and thermal glassy phenomena.

preprint, 2012, arXiv

Growing point-to-set length scale correlates with growing relaxation times in model supercooled liquids

It has been demonstrated recently that supercooled liquids sharing simple structural features (e.g. pair distribution functions) may exhibit strikingly distinct dynamical behavior. Here we show that a more subtle structural feature correlates with relaxation times in three simulated systems that have nearly identical radial distribution functions but starkly different dynamical behavior. In particular, for the first time we determine the thermodynamic "point-to-set" length scale in several canonical model systems and demonstrate the quantitative connection between this length scale and the growth of relaxation times. Our results provide clues necessary for distinguishing competing theories of the glass transition.