Researcher profile

Glen M. Hocky

Glen M. Hocky contributes to research discovery and scholarly infrastructure.

Published work

6 published items

preprint · 2026 · arXiv

Large-Scale Multi-omic Biosequence Transformers for Modeling Protein-Nucleic Acid Interactions

The transformer architecture has revolutionized bioinformatics and driven progress in the understanding and prediction of the properties of biomolecules. To date, most biosequence transformers have been trained on single-omic data (either proteins or nucleic acids) and have seen incredible success in downstream tasks in each domain, with particularly noteworthy breakthroughs in protein structural modeling. However, single-omic pretraining limits the ability of these models to capture cross-modal interactions. Here we present OmniBioTE, the largest open-source multi-omic model, trained on over 250 billion tokens of mixed protein and nucleic acid data. We show that despite being trained only on unlabeled sequence data, OmniBioTE learns joint representations mapping genes to their corresponding protein sequences. We further demonstrate that OmniBioTE achieves state-of-the-art results in predicting the change in Gibbs free energy (ΔG) of the binding interaction between a given nucleic acid and protein. Remarkably, we show that multi-omic biosequence transformers emergently learn useful structural information without any a priori structural training, allowing us to predict which protein residues are most involved in the protein-nucleic acid binding interaction. Compared to single-omic controls trained with identical compute, OmniBioTE also demonstrates superior performance-per-FLOP across both multi-omic and single-omic benchmarks. Together, these results highlight the power of a unified modeling approach for biological sequences and establish OmniBioTE as a foundation model for multi-omic discovery.
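For readers unfamiliar with the quantity being predicted, the sketch below converts a dissociation constant into a standard binding free energy using the textbook relation ΔG° = RT ln(K_d / c°). This is background thermodynamics only and says nothing about how OmniBioTE works; the function name, the 1 M standard concentration, and the example K_d are assumptions chosen for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temperature_k=298.15, c_standard=1.0):
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = R*T*ln(Kd / c_standard); tighter binding (smaller Kd)
    gives a more negative dG. Hypothetical helper, not from the paper.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar / c_standard)

# Illustrative value only: a 1 nM binder at room temperature.
dg_nanomolar = binding_free_energy(1e-9)
dg_micromolar = binding_free_energy(1e-6)
```

Because ΔG depends on the logarithm of K_d, every tenfold change in affinity shifts ΔG by the same fixed amount at a given temperature.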

preprint · 2022 · arXiv

Natural Language Processing Models That Automate Programming Will Transform Chemistry Research and Teaching

Natural language processing models have emerged that can generate usable software and automate a number of programming tasks with high fidelity. These tools have yet to have an impact on the chemistry community. However, our initial testing demonstrates that this form of artificial intelligence is poised to transform chemistry and chemical engineering research. Here, we review developments that brought us to this point, examine applications in chemistry, and give our perspective on how this may fundamentally alter research and teaching.

preprint · 2022 · arXiv

Size-and-Shape Space Gaussian Mixture Models for Structural Clustering of Molecular Dynamics Trajectories

Determining the optimal number and identity of structural clusters from an ensemble of molecular configurations continues to be a challenge. Recent structural clustering methods have focused on the use of internal coordinates due to the innate rotational and translational invariance of these features. The vast number of possible internal coordinates necessitates a feature space supervision step to make clustering tractable, but yields a protocol that can be system-type specific. Particle positions offer an appealing alternative to internal coordinates, but suffer from a lack of rotational and translational invariance, as well as a perceived insensitivity to regions of structural dissimilarity. Here, we present a method, denoted shape-GMM, that overcomes the shortcomings of particle positions using a weighted maximum likelihood (ML) alignment procedure. This alignment strategy is then built into an expectation maximization Gaussian mixture model (GMM) procedure to capture metastable states in the free energy landscape. The resulting algorithm distinguishes between a variety of different structures, including those indistinguishable by RMSD and pair-wise distances, as demonstrated on several model systems. Shape-GMM results on an extensive simulation of the fast-folding HP35 Nle/Nle mutant protein support a 4-state folding/unfolding mechanism, which is consistent with previous experimental results and provides kinetic detail comparable to previous state-of-the-art clustering approaches, as measured by the VAMP-2 score. Currently, training of shape-GMMs is recommended for systems (or subsystems) that can be represented by ≲ 200 particles and ≲ 100K configurations, in order to estimate high-dimensional covariance matrices and balance computational expense. Once a shape-GMM is trained, it can be used to predict the cluster identities of millions of configurations.
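The invariance problem with raw particle positions can be seen in a small example. The sketch below is a simplification and not the shape-GMM procedure: it uses a uniform-weight, least-squares alignment in two dimensions (where the optimal rotation angle has a closed form) rather than the weighted maximum likelihood alignment described in the abstract. All function names and coordinates are hypothetical.

```python
import math

def center(points):
    """Subtract the centroid, removing the translational degree of freedom."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]

def raw_rmsd(a, b):
    """RMSD between two point sets with no alignment at all."""
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b))
    return math.sqrt(sq / len(a))

def aligned_rmsd(mobile, reference):
    """RMSD after centering and optimal 2D rotation of `mobile` onto `reference`."""
    p = center(mobile)
    q = center(reference)
    # The least-squares rotation angle maximizes sum_i q_i . R(theta) p_i.
    dot = sum(px * qx + py * qy for (px, py), (qx, qy) in zip(p, q))
    cross = sum(px * qy - py * qx for (px, py), (qx, qy) in zip(p, q))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(c * px - s * py, s * px + c * py) for px, py in p]
    return raw_rmsd(rotated, q)

# The same four-particle shape, rigidly rotated by 40 degrees and translated.
reference = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
angle = math.radians(40.0)
moved = [
    (math.cos(angle) * x - math.sin(angle) * y + 3.0,
     math.sin(angle) * x + math.cos(angle) * y - 1.0)
    for x, y in reference
]

unaligned = raw_rmsd(moved, reference)    # large: dominated by the rigid motion
aligned = aligned_rmsd(moved, reference)  # near zero: the shapes are identical
```

Without alignment, two copies of an identical structure look far apart, which is why position-based clustering needs an alignment step before distances between configurations are meaningful.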

preprint · 2021 · arXiv

Assessing models of force-dependent unbinding rates via infrequent metadynamics

Protein-ligand interactions are crucial for a wide range of physiological processes. Many cellular functions result in these non-covalent 'bonds' being mechanically strained, and this can be integral to proper cellular function. Broadly, two classes of force dependence have been observed: slip bonds, where the unbinding rate increases with force, and catch bonds, where the unbinding rate decreases. Despite much theoretical work, we cannot predict for which protein-ligand pairs, pulling coordinates, and forces a particular rate dependence will appear. Here, we assess the ability of MD simulations combined with enhanced sampling techniques to probe the force dependence of unbinding rates. We show that the infrequent metadynamics technique correctly produces both catch and slip bonding kinetics for model potentials. We then apply it to the well-studied case of a buckyball in a hydrophobic cavity, which appears to exhibit an ideal slip bond. Finally, we compute the force-dependent unbinding rate of biotin-streptavidin. Here, the complex nature of the unbinding process causes the infrequent metadynamics method to begin to break down due to the presence of unbinding intermediates, despite use of a previously optimized sampling coordinate. Allowing for this limitation, a combination of kinetic and free energy computations predicts an overall slip bond for larger forces, consistent with prior experimental results, although there are substantial deviations at small forces that require further investigation. This work demonstrates the promise of predicting force-dependent unbinding rates using enhanced sampling MD techniques, while also revealing the methodological barriers that must be overcome to tackle more complex targets in the future.
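The slip/catch distinction can be made concrete with a short sketch. It assumes the Bell model, k(F) = k0 · exp(F · x‡ / k_BT), a standard phenomenological form for slip bonds that the abstract does not name; the parameter values, function names, and the simple monotonicity check are all hypothetical and are not the paper's analysis.

```python
import math

KB_T_PN_NM = 4.11  # thermal energy k_B*T near 298 K, in pN*nm

def bell_rate(force_pn, k0, x_dagger_nm, kbt=KB_T_PN_NM):
    """Unbinding rate under constant force in the Bell model (assumed form).

    k0 is the zero-force rate and x_dagger_nm the distance to the
    transition state along the pulling coordinate.
    """
    return k0 * math.exp(force_pn * x_dagger_nm / kbt)

def classify_bond(rates):
    """Label a rate-vs-force series (ordered by increasing force).

    Expects at least two rates. "slip" if the rate rises monotonically,
    "catch" if it falls monotonically, otherwise "mixed".
    """
    pairs = list(zip(rates, rates[1:]))
    if all(b > a for a, b in pairs):
        return "slip"
    if all(b < a for a, b in pairs):
        return "catch"
    return "mixed"

forces = [0.0, 5.0, 10.0, 20.0]  # pN, illustrative values
slip_rates = [bell_rate(f, k0=1.0, x_dagger_nm=0.3) for f in forces]
label = classify_bond(slip_rates)
```

The Bell form can only produce slip behavior for positive x‡; describing catch bonds requires a different model, which is part of why the force dependence is hard to predict in advance.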

preprint · 2020 · arXiv

A Minimal Experimental Bias on the Hydrogen Bond Greatly Improves Ab Initio Molecular Dynamics Simulations of Water

Experiment Directed Simulations (EDS) is a method within a class of techniques seeking to improve molecular simulations by minimally biasing the system Hamiltonian to reproduce certain experimental observables. In a previous application of EDS to ab initio molecular dynamics (AIMD) simulations based on electronic density functional theory (DFT), the AIMD simulations of water were biased to reproduce its experimentally derived solvation structure. In particular, biasing only the O-O pair correlation function also improved other structural and dynamical properties that were not biased. In this work, the hypothesis is tested that directly biasing the O-H pair correlation will provide an even greater improvement of DFT-based water properties in AIMD simulations. The logic behind this hypothesis is that for most electronic DFT descriptions of water the hydrogen bonding is known to be deficient due to anomalous charge transfer and overpolarization in the DFT. Using recent advances to the EDS learning algorithm, we thus train a minimal bias on AIMD water that reproduces the O-H radial distribution function derived from the highly accurate MB-pol model of water. It is then confirmed that biasing the O-H pair correlation alone can lead to improved AIMD water properties, with structural and dynamical properties in even closer agreement with experiment than the previous EDS-AIMD model.

preprint · 2019 · arXiv

Infinite Switch Simulated Tempering in Force (FISST)

Many proteins in cells are capable of sensing and responding to piconewton-scale forces, a regime in which conformational changes are small but significant for biological processes. In order to efficiently and effectively sample the response of these proteins to small forces, enhanced sampling techniques will be required. In this work, we derive, implement, and evaluate an efficient method to simultaneously sample the result of applying any constant pulling force within a specified range to a molecular system of interest. We start from Simulated Tempering in Force, whereby force is applied as a linear bias on a collective variable to the system's Hamiltonian, and the coefficient is taken as a continuous auxiliary degree of freedom. We derive a formula for an average collective-variable-dependent force, which depends on a set of weights, learned on-the-fly throughout a simulation, that reflect the limit where force varies infinitely quickly. These weights can then be used to retroactively compute averages of any observable at any force within the specified range. This technique is based on recent work deriving similar equations for Infinite Switch Simulated Tempering in Temperature, which showed that the infinite switch limit is the most efficient for sampling. Here, we demonstrate that our method accurately and simultaneously samples molecular systems at all forces within a user-defined force range, and show how it can serve as an enhanced sampling tool for cases where the pulling direction destabilizes states of low free energy at zero force. This method is implemented in, and will be freely distributed with, the PLUMED open-source sampling library, and hence can be readily applied to problems using a wide range of molecular dynamics software packages.
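The idea of computing averages at a force that was never simulated directly rests on a standard reweighting identity. For a Hamiltonian H_F = H_0 − F·Q with collective variable Q, the average of an observable O at force F is ⟨O e^{βFQ}⟩_0 / ⟨e^{βFQ}⟩_0, where the subscript 0 denotes zero-force sampling. The sketch below applies that identity to samples from a harmonic toy model. It is not the FISST algorithm (which learns its weights on the fly over a range of forces) and does not use the PLUMED interface; all names and parameters are hypothetical.

```python
import math
import random

def reweight_to_force(q_samples, obs_samples, force, kbt=1.0):
    """Estimate <O> at constant force from samples drawn at zero force.

    Each sample gets weight exp(force * Q / kbt), matching a linear bias
    -force * Q added to the Hamiltonian. The largest log-weight is
    subtracted before exponentiating for numerical stability.
    """
    log_w = [force * q / kbt for q in q_samples]
    shift = max(log_w)
    weights = [math.exp(lw - shift) for lw in log_w]
    norm = sum(weights)
    return sum(w * o for w, o in zip(weights, obs_samples)) / norm

# Toy system: U(Q) = 0.5 * k * Q**2 with k = 1 and kbt = 1, so zero-force
# samples of Q are Gaussian with unit variance.
rng = random.Random(0)
k_spring = 1.0
q_zero_force = [rng.gauss(0.0, math.sqrt(1.0 / k_spring)) for _ in range(50000)]

# Under force F the harmonic minimum shifts to F / k, so the reweighted
# mean of Q should land close to 0.5 here.
mean_q_at_force = reweight_to_force(q_zero_force, q_zero_force, force=0.5)
```

Plain reweighting like this degrades quickly as the target force moves away from the sampled one, because a few samples come to dominate the weights; sampling the whole force range at once, as the abstract describes, is one way around that limitation.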