Source author record

Samuel W. K. Wong

Samuel W. K. Wong appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Applications, Biomolecules, math.DS, math.ST, Methodology, Populations and Evolution, Statistics Theory

Catalog footprint

What is connected

7 works

7 topics

4 close collaborators


Preprint, 2022, arXiv

Monte Carlo sampling of flexible protein structures: an application to the SARS-CoV-2 omicron variant

Proteins can exhibit dynamic structural flexibility as they carry out their functions, especially in binding regions that interact with other molecules. For the key SARS-CoV-2 spike protein that facilitates COVID-19 infection, studies have previously identified several such highly flexible regions with therapeutic importance. However, protein structures available from the Protein Data Bank are presented as static snapshots that may not adequately depict this flexibility, and furthermore these cannot keep pace with new mutations and variants. In this paper we present a sequential Monte Carlo method for broadly sampling the 3-D conformational space of protein structure, according to the Boltzmann distribution of a given energy function. Our approach is distinct from previous sampling methods that focus on finding the lowest-energy conformation for predicting a single stable structure. We exemplify our method on the SARS-CoV-2 omicron variant as an application of timely interest. Our results identify sequence positions 495-508 as a key region where omicron mutations have the most impact on the space of possible conformations, which coincides with the findings of other preliminary studies on the binding properties of the omicron variant.
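The contrast between sampling from a Boltzmann distribution and searching for the single lowest-energy conformation can be illustrated on a toy problem. The sketch below is not the paper's sequential Monte Carlo method: it assumes a hypothetical two-well energy over one torsion angle (`toy_energy`), a unit temperature (`kT=1.0`), and exact sampling on a discretized grid, all chosen only for illustration.

```python
import math
import random

def toy_energy(phi):
    """Hypothetical two-well energy over one torsion angle phi (radians).

    Deep well at phi = +/-pi, shallower well at phi = 0. Not a real force field.
    """
    return 1.0 - math.cos(2.0 * phi) + 0.3 * math.cos(phi)

def boltzmann_weights(energies, kT=1.0):
    """Normalized probabilities p_i proportional to exp(-E_i / kT)."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    unnorm = [math.exp(-(e - e_min) / kT) for e in energies]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Discretize the angle into 1-degree steps over [-pi, pi).
n_grid = 360
angles = [-math.pi + 2.0 * math.pi * i / n_grid for i in range(n_grid)]
weights = boltzmann_weights([toy_energy(a) for a in angles])

# Draw conformations in proportion to their Boltzmann weight.
rng = random.Random(0)
samples = rng.choices(angles, weights=weights, k=2000)

# An energy minimizer would report only this single angle ...
lowest = min(angles, key=toy_energy)
# ... while the Boltzmann distribution also puts mass on the shallower well.
shallow_well_mass = sum(w for a, w in zip(angles, weights) if abs(a) < math.pi / 2)
```

In this toy setting the minimizer returns only the deep well, whereas a substantial share of the sampled angles falls in the shallower well, which is the kind of flexibility a single static structure would not show.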

Preprint, 2021, arXiv

Comparing regional and provincial-wide COVID-19 models with physical distancing in British Columbia

We study the effects of physical distancing measures on the spread of COVID-19 in regional areas within British Columbia, using the reported cases of the five provincial Health Authorities. Building on the Bayesian epidemiological model of Anderson et al. (2020), we propose a hierarchical regional Bayesian model with time-varying regional parameters for the period from March to December 2020. In the absence of COVID-19 variants and vaccinations during this period, we examine the regionalized basic reproduction number, modelled prevalence, relative reduction in contact due to physical distancing, and proportion of anticipated cases that have been tested and reported. We observe significant differences between the regional and provincial-wide models and demonstrate that the hierarchical regional model can better estimate regional prevalence, especially in rural regions. These results indicate that it can be useful to apply similar regional models to other parts of Canada or other countries.

Preprint, 2021, arXiv

Inference of dynamic systems from noisy and sparse data via manifold-constrained Gaussian processes

Parameter estimation for nonlinear dynamic system models, represented by ordinary differential equations (ODEs), using noisy and sparse data is a vital task in many fields. We propose a fast and accurate method, MAGI (MAnifold-constrained Gaussian process Inference), for this task. MAGI uses a Gaussian process model over time-series data, explicitly conditioned on the manifold constraint that derivatives of the Gaussian process must satisfy the ODE system. By doing so, we completely bypass the need for numerical integration and achieve substantial savings in computational time. MAGI is also suitable for inference with unobserved system components, which often occur in real experiments. MAGI is distinct from existing approaches as we provide a principled statistical construction under a Bayesian framework, which incorporates the ODE system through the manifold constraint. We demonstrate the accuracy and speed of MAGI using realistic examples based on physical experiments.

Preprint, 2021, arXiv

Statistical challenges in the analysis of sequence and structure data for the COVID-19 spike protein

As the major target of many vaccines and neutralizing antibodies against SARS-CoV-2, the spike (S) protein is observed to mutate over time. In this paper, we present statistical approaches to tackle some challenges associated with the analysis of S-protein data. We build a Bayesian hierarchical model to study the temporal and spatial evolution of S-protein sequences, after grouping the sequences into representative clusters. We then apply sampling methods to investigate possible changes to the S-protein's 3-D structure as a result of commonly observed mutations. While the increasing spread of D614G variants has been noted in other research, our results also show that the co-occurring mutations of D614G together with S477N or A222V may spread even more rapidly, as quantified by our model estimates.

Preprint, 2020, arXiv

Calibrating wood products for load duration and rate: A statistical look at three damage models

Lumber and wood-based products are versatile construction materials that are susceptible to weakening as a result of applied stresses. To assess the effects of load duration and rate, experiments have been carried out by applying preset load profiles to sample specimens. This paper studies these effects via a damage modeling approach, by considering three models in the literature: the Gerhards and Foschi accumulated damage models, and a degradation model based on the gamma process. We present a statistical framework for fitting these models to failure time data generated by a combination of ramp and constant load settings, and show how estimation uncertainty can be quantified. The models and methods are illustrated and compared via a novel analysis of a Hemlock lumber dataset. Practical usage of the fitted damage models is demonstrated with an application to long-term reliability prediction under stochastic future loadings.

Preprint, 2020, arXiv

On the circular correlation coefficients for bivariate von Mises distributions on a torus

This paper studies circular correlations for the bivariate von Mises sine and cosine distributions. These are two simple and appealing models for bivariate angular data with five parameters each that have interpretations comparable to those in the ordinary bivariate normal model. However, unlike in the bivariate normal model, the variability and association of the angle pairs cannot be easily deduced from the model parameters. Thus, to compute such summary measures, tools from circular statistics are needed. We derive analytic expressions and study the properties of the Jammalamadaka-Sarma and Fisher-Lee circular correlation coefficients for the von Mises sine and cosine models. Likelihood-based inference of these coefficients from sample data is then presented. The correlation coefficients are illustrated with numerical and visual examples, and the maximum likelihood estimators are assessed on simulated and real data, with comparisons to their non-parametric counterparts. Implementations of these computations for practical use are provided in our R package BAMBI.
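As a point of reference for the non-parametric counterparts mentioned above, the standard sample version of the Jammalamadaka-Sarma coefficient can be sketched in a few lines. This is not the BAMBI API and not the paper's model-based estimator; the function names `circular_mean` and `js_circular_correlation` are illustrative, and angles are assumed to be in radians.

```python
import math

def circular_mean(angles):
    """Mean direction of a sample of angles (radians)."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)

def js_circular_correlation(alpha, beta):
    """Sample Jammalamadaka-Sarma circular correlation of paired angles.

    r = sum(sin(a_i - a_bar) * sin(b_i - b_bar))
        / sqrt(sum(sin(a_i - a_bar)^2) * sum(sin(b_i - b_bar)^2))
    """
    if len(alpha) != len(beta) or len(alpha) < 2:
        raise ValueError("alpha and beta must be paired samples of length >= 2")
    a_bar = circular_mean(alpha)
    b_bar = circular_mean(beta)
    sa = [math.sin(a - a_bar) for a in alpha]
    sb = [math.sin(b - b_bar) for b in beta]
    denom = math.sqrt(sum(x * x for x in sa) * sum(y * y for y in sb))
    if denom == 0.0:
        raise ValueError("correlation is undefined when one sample has no spread")
    return sum(x * y for x, y in zip(sa, sb)) / denom

# Usage: a constant rotation of the same angles gives perfect positive association.
alpha = [0.1, 0.5, 1.0, 1.5]
beta_shifted = [a + 0.7 for a in alpha]
beta_reflected = [-a for a in alpha]
r_pos = js_circular_correlation(alpha, beta_shifted)
r_neg = js_circular_correlation(alpha, beta_reflected)
```

Like the ordinary Pearson coefficient, this quantity lies in [-1, 1]; deviations are measured through the sine of the angular difference from the mean direction, so the result does not depend on where zero is placed on the circle.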

Preprint, 2020, arXiv

Where Does Haydn End and Mozart Begin? Composer Classification of String Quartets

For centuries, the history and music of Franz Joseph Haydn and Wolfgang Amadeus Mozart have been compared by scholars. Recently, the growing field of music information retrieval (MIR) has offered quantitative analyses to complement traditional qualitative analyses of these composers. In this MIR study, we classify the composer of Haydn and Mozart string quartets based on the content of their scores. Our contribution is an interpretable statistical and machine learning approach that provides high classification accuracies and musical relevance. We develop novel global features that are automatically computed from symbolic data and informed by musicological Haydn-Mozart comparative studies, particularly relating to the sonata form. Several of these proposed features are found to be important for distinguishing between Haydn and Mozart string quartets. Our Bayesian logistic regression model attains leave-one-out classification accuracies over 84%, higher than prior works, and provides interpretations that could aid in assessing musicological claims. Overall, our work can help expand the longstanding dialogue surrounding Haydn and Mozart and exemplify the benefit of interpretable machine learning in MIR, with potential applications to music generation and classification of other classical composers.
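Leave-one-out classification accuracy, the evaluation measure quoted above, can be sketched as follows. This is only an illustration of the evaluation loop: the classifier here is a plain L2-penalized logistic regression fitted by gradient descent, standing in for the paper's Bayesian model, and the one-feature data set, the function names, and the tuning values (`l2`, `lr`, `steps`) are all hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, l2=0.01, lr=0.1, steps=500):
    """Fit weights and intercept by gradient descent on the penalized log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w = [l2 * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def loo_accuracy(X, y):
    """Hold out each item once, train on the rest, and score the held-out item."""
    hits = 0
    for i in range(len(X)):
        w, b = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
        hits += int((p >= 0.5) == bool(y[i]))
    return hits / len(X)

# Hypothetical one-feature data: label 0 and label 1 are well separated.
X = [[-3.0], [-2.5], [-2.0], [-1.5], [1.5], [2.0], [2.5], [3.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
acc = loo_accuracy(X, y)
```

Because each item is scored by a model that never saw it, the resulting accuracy is less optimistic than training accuracy, which matters for small collections such as a set of string quartet movements.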
