Source author record

Edward O. Pyzer-Knapp

Edward O. Pyzer-Knapp appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, physics.chem-ph, Quantitative Methods, Biomolecules, Cryptography and Security, physics.comp-ph

Catalog footprint

What is connected

6 works

6 topics

4 close collaborators

Preprint, 2022, arXiv

A Principled Method for the Creation of Synthetic Multi-fidelity Data Sets

Multifidelity and multioutput optimisation algorithms are of active interest in many areas of computational design as they allow cheaper computational proxies to be used intelligently to aid experimental searches for high-performing species. Characterisation of these algorithms involves benchmarks that typically either use analytic functions or existing multifidelity datasets. However, analytic functions are often not representative of relevant problems, while preexisting datasets do not allow systematic investigation of the influence of characteristics of the lower fidelity proxies. To bridge this gap, we present a methodology for systematic generation of synthetic fidelities derived from preexisting datasets. This allows for the construction of benchmarks that are both representative of practical optimisation problems while also allowing systematic investigation of the influence of the lower fidelity proxies.

Preprint, 2022, arXiv

Self-focusing virtual screening with active design space pruning

High-throughput virtual screening is an indispensable technique utilized in the discovery of small molecules. In cases where the library of molecules is exceedingly large, the cost of an exhaustive virtual screen may be prohibitive. Model-guided optimization has been employed to lower these costs through dramatic increases in sample efficiency compared to random selection. However, these techniques introduce new costs to the workflow through the surrogate model training and inference steps. In this study, we propose an extension to the framework of model-guided optimization that mitigates inference costs using a technique we refer to as design space pruning (DSP), which irreversibly removes poor-performing candidates from consideration. We study the application of DSP to a variety of optimization tasks and observe significant reductions in overhead costs while maintaining performance similar to the baseline optimization. DSP represents an attractive extension of model-guided optimization that can limit overhead costs in optimization settings where these costs are non-negligible relative to objective costs, such as docking.
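The pruning idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-neighbour surrogate, the function names, and the `prune_fraction` rule are all assumptions chosen to keep the example self-contained. Each round, a surrogate ranks the remaining candidates, the lowest-ranked fraction is dropped permanently, and the top of the surviving pool is evaluated.

```python
import random


def nearest_neighbor_predict(x, observed):
    """Toy surrogate (an assumption): reuse the score of the closest evaluated candidate."""
    nearest = min(observed, key=lambda seen: abs(seen - x))
    return observed[nearest]


def optimize_with_pruning(candidates, objective, rounds=5, batch=4,
                          prune_fraction=0.3, seed=0):
    """Model-guided search over numeric candidates with irreversible pruning.

    Returns (observed, pool): the evaluated candidates with their scores,
    and the candidates that are still under consideration.
    """
    rng = random.Random(seed)
    pool = list(candidates)
    observed = {}

    # Start from a random batch so the surrogate has data to work with.
    for x in rng.sample(pool, batch):
        observed[x] = objective(x)
        pool.remove(x)

    for _ in range(rounds):
        if not pool:
            break
        # Rank the remaining candidates by predicted score, best first.
        ranked = sorted(pool,
                        key=lambda x: nearest_neighbor_predict(x, observed),
                        reverse=True)
        # Pruning step: the worst-ranked tail is removed and never scored again,
        # so later rounds run inference over a smaller pool.
        keep = max(batch, int(len(ranked) * (1 - prune_fraction)))
        pool = ranked[:keep]
        # Evaluate the most promising survivors with the expensive objective.
        for x in pool[:batch]:
            observed[x] = objective(x)
        pool = pool[batch:]

    return observed, pool


if __name__ == "__main__":
    scores, remaining = optimize_with_pruning(range(100), lambda x: -(x - 70) ** 2)
    print(len(scores), "evaluated;", len(remaining), "still in the pool")
```

The trade-off the sketch exposes is that a pruned candidate can never be recovered, so an aggressive `prune_fraction` saves more inference work but risks discarding good candidates that the surrogate misjudged early on.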

Preprint, 2020, arXiv

Privacy-Preserving Gaussian Process Regression -- A Modular Approach to the Application of Homomorphic Encryption

Much of machine learning relies on the use of large amounts of data to train models to make predictions. When this data comes from multiple sources, for example when evaluation of data against a machine learning model is offered as a service, there can be privacy issues and legal concerns over the sharing of data. Fully homomorphic encryption (FHE) allows data to be computed on whilst encrypted, which can provide a solution to the problem of data privacy. However, FHE is both slow and restrictive, so existing algorithms must be manipulated to make them work efficiently under the FHE paradigm. Some commonly used machine learning algorithms, such as Gaussian process regression, are poorly suited to FHE and cannot be manipulated to work both efficiently and accurately. In this paper, we show that a modular approach, which applies FHE to only the sensitive steps of a workflow that need protection, allows one party to make predictions on their data using a Gaussian process regression model built from another party's data, without either party gaining access to the other's data, in a way which is both accurate and efficient. This construction is, to our knowledge, the first example of an effectively encrypted Gaussian process.

Preprint, 2020, arXiv

Using Bayesian Optimization to Accelerate Virtual Screening for the Discovery of Therapeutics Appropriate for Repurposing for COVID-19

The novel Wuhan coronavirus known as SARS-CoV-2 has brought almost unprecedented effects for a non-wartime setting, hitting social, economic and health systems hard. Being able to bring to bear pharmaceutical interventions to counteract its effects will represent a major turning point in the fight to turn the tides in this ongoing battle. Recently, the world's most powerful supercomputer, SUMMIT, was used to identify existing small molecule pharmaceuticals which may have the desired activity against SARS-CoV-2 through a high throughput virtual screening approach. In this communication, we demonstrate how the use of Bayesian optimization can provide a valuable service for the prioritisation of these calculations, leading to the accelerated identification of high-performing candidates, and thus expanding the scope of the utility of HPC systems for time critical screening.

Preprint, 2016, arXiv

Space-Filling Curves as a Novel Crystal Structure Representation for Machine Learning Models

A fundamental problem in applying machine learning techniques for chemical problems is to find suitable representations for molecular and crystal structures. While the structure representations based on atom connectivities are prevalent for molecules, two-dimensional descriptors are not suitable for describing molecular crystals. In this work, we introduce the SFC-M family of feature representations, which are based on Morton space-filling curves, as an alternative means of representing crystal structures. Latent Semantic Indexing (LSI) was employed in a novel setting to reduce sparsity of feature representations. The quality of the SFC-M representations was assessed by using them in combination with artificial neural networks to predict Density Functional Theory (DFT) single point, Ewald summed, lattice, and many-body dispersion energies of 839 organic molecular crystal unit cells from the Cambridge Structural Database that consist of the elements C, H, N, and O. Promising initial results suggest that the SFC-M representations merit further exploration to improve their ability to predict solid-state properties of organic crystal structures.
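For readers unfamiliar with Morton space-filling curves, the following sketch shows only the underlying ordering: the bits of three integer grid coordinates are interleaved into a single index, so points that are close in 3D tend to be close along the curve. It is not the SFC-M representation itself, whose construction the abstract does not detail. The function names, the 10-bit grid resolution, and the use of fractional coordinates in [0, 1) are assumptions for illustration.

```python
def spread_bits(n: int, bits: int = 10) -> int:
    """Place bit i of n at position 3*i, leaving two zero bits between neighbours."""
    out = 0
    for i in range(bits):
        out |= ((n >> i) & 1) << (3 * i)
    return out


def morton3d(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the bits of x, y and z into one Morton (Z-order) index."""
    return (spread_bits(x, bits)
            | (spread_bits(y, bits) << 1)
            | (spread_bits(z, bits) << 2))


def quantize(frac: float, bits: int = 10) -> int:
    """Map a fractional coordinate in [0, 1) to an integer grid index."""
    cells = 1 << bits
    return min(int(frac * cells), cells - 1)


def morton_order(points, bits: int = 10):
    """Sort 3D points (fractional coordinates) along the Morton curve."""
    return sorted(
        points,
        key=lambda p: morton3d(*(quantize(c, bits) for c in p), bits=bits),
    )


if __name__ == "__main__":
    # Hypothetical fractional atomic positions inside a unit cell.
    atoms = [(0.9, 0.9, 0.9), (0.1, 0.1, 0.1), (0.6, 0.1, 0.1)]
    for atom in morton_order(atoms):
        print(atom)
```

Sorting positions this way turns a 3D arrangement into a one-dimensional sequence, which is the property that makes a space-filling curve usable as the basis for a fixed-order feature vector.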

Preprint, 2015, arXiv

A Bayesian Approach to Calibrating High-Throughput Virtual Screening Results and Application to Organic Photovoltaic Materials

A novel approach for calibrating quantum-chemical properties determined as part of a high-throughput virtual screen to experimental analogs is presented. Information on the molecular graph is extracted through the use of extended connectivity fingerprints, and exploited using a Gaussian process to calibrate both electronic properties, such as frontier orbital energies and optical gaps, and device properties, such as short circuit current density, open circuit voltage and power conversion efficiency. The Bayesian nature of this process affords a value for uncertainty in addition to each calibrated value. This allows the researcher to gain intuition about the model as well as the ability to respect its bounds.