Source author record

Wing Hung Wong

Wing Hung Wong appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Catalog footprint

What is connected

7 works
7 topics
4 close collaborators

Published work

7 published items

Preprint, 2016, arXiv

co-BPM: a Bayesian Model for Divergence Estimation

Divergence is not only an important mathematical concept in information theory but is also applied to machine learning problems such as low-dimensional embedding, manifold learning, clustering, classification, and anomaly detection. We propose a Bayesian model, co-BPM, to characterize the discrepancy between two sample sets, i.e., to estimate the divergence of their underlying distributions. To avoid the pitfalls of plug-in methods that estimate each density independently, our Bayesian model attempts to learn a coupled binary partition of the sample space that best captures the landscapes of both distributions, and then makes direct inference on their divergences. The prior is constructed by leveraging the sequential buildup of the coupled binary partitions, and the posterior is sampled via our specialized MCMC. Our model provides a unified way to estimate various types of divergences and enjoys convincing accuracy. We demonstrate its effectiveness through simulations, comparisons with the state of the art, and a real data example.
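As a rough illustration of why a shared partition helps: when two piecewise constant densities live on the same cells, the cell volumes cancel and the Kullback-Leibler divergence reduces to a sum over cell probabilities. The sketch below is not the co-BPM model; it assumes a fixed, hand-chosen one-dimensional partition (`edges`) and a hypothetical smoothing parameter (`pseudo_count`), neither of which comes from the abstract.

```python
import math
from bisect import bisect_right

def cell_masses(samples, edges, pseudo_count=0.5):
    """Smoothed probability mass of each cell [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        # Samples outside the outer edges are clamped into the end cells.
        idx = min(max(bisect_right(edges, x) - 1, 0), len(counts) - 1)
        counts[idx] += 1
    total = len(samples) + pseudo_count * len(counts)
    # The pseudo-count (an illustrative assumption) keeps every mass positive.
    return [(c + pseudo_count) / total for c in counts]

def kl_on_shared_partition(xs, ys, edges, pseudo_count=0.5):
    """KL divergence between two histogram densities on the same cells."""
    p = cell_masses(xs, edges, pseudo_count)
    q = cell_masses(ys, edges, pseudo_count)
    # Cell volumes cancel in the density ratio, so only the masses remain.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

Calling `kl_on_shared_partition(xs, ys, edges)` with two lists of floats and a sorted list of cell boundaries returns a non-negative number that is zero when both samples put the same mass in every cell.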

Preprint, 2016, arXiv

Discovering and Visualizing Hierarchy in Multivariate Data

Extracting useful insights from data is always a challenge, especially if the data is multidimensional. Often, the data can be organized according to a hierarchical structure that stems either from the data collection process or from the information and phenomena carried by the data itself. The current study attempts to discover and visualize these underlying hierarchies. By regarding each observation in the data as a draw from a (hypothetical) multidimensional joint density, our first goal is to approximate this unknown density with a piecewise constant function via binary partition; this non-parametric approach makes no assumptions on the form of the density. Given the piecewise constant density function and its corresponding binary partition, our second goal is to construct a connected graph and build up a tree representation of the data by level sets. To demonstrate that our method is a general data mining and visualization tool that can provide "multi-resolution" summaries and reveal different levels of information in the data, we apply it to two real data sets, from flow cytometry and from a social network.

Preprint, 2016, arXiv

Learning a nonlinear dynamical system model of gene regulation: A perturbed steady-state approach

Biological structure and function depend on complex regulatory interactions between many genes. A wealth of gene expression data is available from high-throughput genome-wide measurement technologies, but effective gene regulatory network inference methods are still needed. Model-based methods founded on quantitative descriptions of gene regulation are among the most promising, but many such methods rely on simple, local models or on ad hoc inference approaches lacking experimental interpretability. We propose an experimental design and develop an associated statistical method for inferring a gene network by learning a standard quantitative, interpretable, predictive, biophysics-based ordinary differential equation model of gene regulation. We fit the model parameters using gene expression measurements from perturbed steady states of the system, such as those following overexpression or knockdown experiments. Although the original model is nonlinear, our design allows us to transform it into a convex optimization problem by restricting attention to steady states and using the lasso for parameter selection. Here, we describe the model and inference algorithm and apply them to a synthetic six-gene system, demonstrating that the model is detailed and flexible enough to account for activation and repression as well as synergistic and self-regulation, and that the algorithm can efficiently and accurately recover the parameters used to generate the data.
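The lasso step mentioned above can be sketched generically. The abstract does not give the steady-state transformation, so the example below assumes the problem has already been reduced to a linear system with a hypothetical design matrix `X` and response `y`, and solves the lasso by plain coordinate descent with soft-thresholding. It is an illustration of the technique, not the paper's inference algorithm.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n) * ||y - Xw||^2 + lam * ||w||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that ignores w[j].
            rho = sum(v * (r + w[j] * v) for v, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, lam) / z
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - delta * v for r, v in zip(residual, col)]
                w[j] = new_w
    return w
```

The penalty `lam` controls sparsity: coefficients whose correlation with the residual stays below `lam` are set exactly to zero, which is what makes the lasso usable for selecting which regulatory terms to keep.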

Preprint, 2015, arXiv

Multivariate Density Estimation via Adaptive Partitioning (I): Sieve MLE

We study a non-parametric approach to multivariate density estimation. The estimators are piecewise constant density functions supported by binary partitions. The partition of the sample space is learned by maximizing the likelihood of the corresponding histogram on that partition. We analyze the convergence rate of the sieve maximum likelihood estimator and conclude that, for a relatively rich class of density functions, the rate does not directly depend on the dimension. This suggests that, under certain conditions, this method is immune to the curse of dimensionality, in the sense that it is possible to get close to the parametric rate even in high dimensions. We also apply this method to several special cases and calculate the explicit convergence rate for each.
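To make the idea of a likelihood-driven binary partition concrete, here is a minimal sketch. It assumes greedy midpoint splits, which is a simplification chosen for illustration; the abstract does not say how the maximizing partition is searched, and the function names and the `max_cells` parameter are hypothetical.

```python
import math
import random

def cell_loglik(count, volume, n_total):
    """Log-likelihood contribution of one cell of a histogram density."""
    if count == 0:
        return 0.0
    return count * math.log(count / (n_total * volume))

def greedy_binary_partition(points, lower, upper, max_cells=8):
    """Split cells at midpoints, each time taking the split with the largest
    gain in histogram log-likelihood. Returns (lower, upper, density) cells."""
    n_total = len(points)
    cells = [(list(lower), list(upper), list(points))]
    while len(cells) < max_cells:
        best = None  # (gain, cell index, left child, right child)
        for idx, (lo, hi, pts) in enumerate(cells):
            vol = math.prod(h - l for l, h in zip(lo, hi))
            base = cell_loglik(len(pts), vol, n_total)
            for dim in range(len(lo)):
                mid = 0.5 * (lo[dim] + hi[dim])
                left = [p for p in pts if p[dim] < mid]
                right = [p for p in pts if p[dim] >= mid]
                gain = (cell_loglik(len(left), vol / 2, n_total)
                        + cell_loglik(len(right), vol / 2, n_total) - base)
                if best is None or gain > best[0]:
                    hi_left = hi[:dim] + [mid] + hi[dim + 1:]
                    lo_right = lo[:dim] + [mid] + lo[dim + 1:]
                    best = (gain, idx, (lo, hi_left, left), (lo_right, hi, right))
        if best is None or best[0] <= 1e-12:
            break  # no split improves the likelihood
        _, idx, left_cell, right_cell = best
        cells[idx:idx + 1] = [left_cell, right_cell]
    return [(lo, hi, len(pts) / (n_total * math.prod(h - l for l, h in zip(lo, hi))))
            for lo, hi, pts in cells]
```

For example, `greedy_binary_partition(points, [0.0, 0.0], [1.0, 1.0], max_cells=6)` on points in the unit square returns at most six boxes, each with a constant density value, and the densities integrate to one over the square.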

Preprint, 2015, arXiv

Multivariate Density Estimation via Adaptive Partitioning (II): Posterior Concentration

In this paper, we study a class of non-parametric density estimators in a Bayesian setting. The estimators are piecewise constant functions on binary partitions. We analyze the concentration rate of the posterior distribution under a suitable prior and demonstrate that the rate does not directly depend on the dimension of the problem. This paper can be viewed as an extension of a parallel work in which the convergence rate of a related sieve MLE was established. Compared to the sieve MLE, the main advantage of the Bayesian method is that it can adapt to the unknown complexity of the true density function, thus achieving the optimal convergence rate without artificial conditions on the density.

Preprint, 2013, arXiv

Computational Aspects of Optional Pólya Tree

Optional Pólya Tree (OPT) is a flexible non-parametric Bayesian model for density estimation. Despite its merits, the computation for OPT inference is challenging. In this paper we present a time complexity analysis for OPT inference and propose two algorithmic improvements. The first improvement, named Limited-Lookahead Optional Pólya Tree (LL-OPT), aims to greatly accelerate the computation for OPT inference. The second improvement modifies the output of OPT or LL-OPT and produces a continuous piecewise linear density estimate. We demonstrate the performance of these two improvements using simulations.

Preprint, 2011, arXiv

Statistical Modeling of RNA-Seq Data

Recently, ultra high-throughput sequencing of RNA (RNA-Seq) has been developed as an approach for analysis of gene expression. By obtaining tens or even hundreds of millions of reads of transcribed sequences, an RNA-Seq experiment can offer a comprehensive survey of the population of genes (transcripts) in any sample of interest. This paper introduces a statistical model for estimating isoform abundance from RNA-Seq data that is flexible enough to accommodate both single-end and paired-end RNA-Seq data as well as sampling bias along the length of the transcript. Based on the derivation of minimal sufficient statistics for the model, a computationally feasible implementation of the maximum likelihood estimator of the model is provided. Further, it is shown that paired-end RNA-Seq provides more accurate isoform abundance estimates than single-end sequencing at fixed sequencing depth. Simulation studies are also given.