Source author record

Jianhua Z. Huang

Jianhua Z. Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Applications math.ST Statistics Theory astro-ph.IM astro-ph.SR Biological Physics Computer Vision Methodology Quantitative Methods

Catalog footprint

What is connected

12works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2016arXiv

A Study of Functional Depths

Functional depth is used for ranking functional observations from most outlying to most typical. The ranks produced by functional depth have been proposed as the basis for functional classifiers, rank tests, and data visualization procedures. Many of the proposed functional depths are invariant to domain permutation, an unusual property for a functional data analysis procedure. Essentially these depths treat functional data as if it were multivariate data. In this work, we compare the performance of several existing functional depths to a simple adaptation of an existing multivariate depth notion, $L^\infty$ depth ($L^{\infty}D$). On simulated and real data, we show $L^{\infty}D$ has performance comparable or superior to several existing notions of functional depth. In addition, we review how depth functions are evaluated and propose some improvements. In particular, we show that empirical depth function asymptotics can be mis--leading and instead propose a new method, the rank--rank plot, for evaluating empirical depth rank stability.

preprint2016arXiv

Period estimation for sparsely-sampled quasi-periodic light curves applied to Miras

We develop a non-linear semi-parametric Gaussian process model to estimate periods of Miras with sparsely-sampled light curves. The model uses a sinusoidal basis for the periodic variation and a Gaussian process for the stochastic changes. We use maximum likelihood to estimate the period and the parameters of the Gaussian process, while integrating out the effects of other nuisance parameters in the model with respect to a suitable prior distribution obtained from earlier studies. Since the likelihood is highly multimodal for period, we implement a hybrid method that applies the quasi-Newton algorithm for Gaussian process parameters and search the period/frequency parameter over a dense grid. A large-scale, high-fidelity simulation is conducted to mimic the sampling quality of Mira light curves obtained by the M33 Synoptic Stellar Survey. The simulated data set is publicly available and can serve as a testbed for future evaluation of different period estimation methods. The semi-parametric model outperforms an existing algorithm on this simulated test data set as measured by period recovery rate and quality of the resulting Period-Luminosity relations.

preprint2014arXiv

Efficient semiparametric estimation in generalized partially linear additive models for longitudinal/clustered data

We consider efficient estimation of the Euclidean parameters in a generalized partially linear additive models for longitudinal/clustered data when multiple covariates need to be modeled nonparametrically, and propose an estimation procedure based on a spline approximation of the nonparametric part of the model and the generalized estimating equations (GEE). Although the model in consideration is natural and useful in many practical applications, the literature on this model is very limited because of challenges in dealing with dependent data for nonparametric additive models. We show that the proposed estimators are consistent and asymptotically normal even if the covariance structure is misspecified. An explicit consistent estimate of the asymptotic variance is also provided. Moreover, we derive the semiparametric efficiency score and information bound under general moment conditions. By showing that our estimators achieve the semiparametric information bound, we effectively establish their efficiency in a stronger sense than what is typically considered for GEE. The derivation of our asymptotic results relies heavily on the empirical processes tools that we develop for the longitudinal/clustered data. Numerical results are used to illustrate the finite sample performance of the proposed estimators.

preprint2013arXiv

Asymptotic optimality and efficient computation of the leave-subject-out cross-validation

Although the leave-subject-out cross-validation (CV) has been widely used in practice for tuning parameter selection for various nonparametric and semiparametric models of longitudinal data, its theoretical property is unknown and solving the associated optimization problem is computationally expensive, especially when there are multiple tuning parameters. In this paper, by focusing on the penalized spline method, we show that the leave-subject-out CV is optimal in the sense that it is asymptotically equivalent to the empirical squared error loss function minimization. An efficient Newton-type algorithm is developed to compute the penalty parameters that optimize the CV criterion. Simulated and real data are used to demonstrate the effectiveness of the leave-subject-out CV in selecting both the penalty parameters and the working correlation matrix.

preprint2013arXiv

Bayesian object classification of gold nanoparticles

The properties of materials synthesized with nanoparticles (NPs) are highly correlated to the sizes and shapes of the nanoparticles. The transmission electron microscopy (TEM) imaging technique can be used to measure the morphological characteristics of NPs, which can be simple circles or more complex irregular polygons with varying degrees of scales and sizes. A major difficulty in analyzing the TEM images is the overlapping of objects, having different morphological properties with no specific information about the number of objects present. Furthermore, the objects lying along the boundary render automated image analysis much more difficult. To overcome these challenges, we propose a Bayesian method based on the marked-point process representation of the objects. We derive models, both for the marks which parameterize the morphological aspects and the points which determine the location of the objects. The proposed model is an automatic image segmentation and classification procedure, which simultaneously detects the boundaries and classifies the NPs into one of the predetermined shape families. We execute the inference by sampling the posterior distribution using Markov chain Monte Carlo (MCMC) since the posterior is doubly intractable. We apply our novel method to several TEM imaging samples of gold NPs, producing the needed statistical characterization of their morphology.

preprint2013arXiv

Robust regularized singular value decomposition with application to mortality data

We develop a robust regularized singular value decomposition (RobRSVD) method for analyzing two-way functional data. The research is motivated by the application of modeling human mortality as a smooth two-way function of age group and year. The RobRSVD is formulated as a penalized loss minimization problem where a robust loss function is used to measure the reconstruction error of a low-rank matrix approximation of the data, and an appropriately defined two-way roughness penalty function is used to ensure smoothness along each of the two functional domains. By viewing the minimization problem as two conditional regularized robust regressions, we develop a fast iterative reweighted least squares algorithm to implement the method. Our implementation naturally incorporates missing values. Furthermore, our formulation allows rigorous derivation of leave-one-row/column-out cross-validation and generalized cross-validation criteria, which enable computationally efficient data-driven penalty parameter selection. The advantages of the new robust method over nonrobust ones are shown via extensive simulation studies and the mortality rate application.

preprint2012arXiv

A two-way regularization method for MEG source reconstruction

The MEG inverse problem refers to the reconstruction of the neural activity of the brain from magnetoencephalography (MEG) measurements. We propose a two-way regularization (TWR) method to solve the MEG inverse problem under the assumptions that only a small number of locations in space are responsible for the measured signals (focality), and each source time course is smooth in time (smoothness). The focality and smoothness of the reconstructed signals are ensured respectively by imposing a sparsity-inducing penalty and a roughness penalty in the data fitting criterion. A two-stage algorithm is developed for fast computation, where a raw estimate of the source time course is obtained in the first stage and then refined in the second stage by the two-way regularization. The proposed method is shown to be effective on both synthetic and real-world examples.

preprint2012arXiv

Covariance approximation for large multivariate spatial data sets with an application to multiple climate model errors

This paper investigates the cross-correlations across multiple climate model errors. We build a Bayesian hierarchical model that accounts for the spatial dependence of individual models as well as cross-covariances across different climate models. Our method allows for a nonseparable and nonstationary cross-covariance structure. We also present a covariance approximation approach to facilitate the computation in the modeling and analysis of very large multivariate spatial data sets. The covariance approximation consists of two parts: a reduced-rank part to capture the large-scale spatial dependence, and a sparse covariance matrix to correct the small-scale dependence error induced by the reduced rank approximation. We pay special attention to the case that the second part of the approximation has a block-diagonal structure. Simulation results of model fitting and prediction show substantial improvement of the proposed approximation over the predictive process approximation and the independent blocks analysis. We then apply our computational approach to the joint statistical modeling of multiple climate model errors.

preprint2012arXiv

Functional dynamic factor models with application to yield curve forecasting

Accurate forecasting of zero coupon bond yields for a continuum of maturities is paramount to bond portfolio management and derivative security pricing. Yet a universal model for yield curve forecasting has been elusive, and prior attempts often resulted in a trade-off between goodness of fit and consistency with economic theory. To address this, herein we propose a novel formulation which connects the dynamic factor model (DFM) framework with concepts from functional data analysis: a DFM with functional factor loading curves. This results in a model capable of forecasting functional time series. Further, in the yield curve context we show that the model retains economic interpretation. Model estimation is achieved through an expectation-maximization algorithm, where the time series parameters and factor loading curves are simultaneously estimated in a single step. Efficient computing is implemented and a data-driven smoothing parameter is nicely incorporated. We show that our model performs very well on forecasting actual yield data compared with existing approaches, especially in regard to profit-based assessment for an innovative trading exercise. We further illustrate the viability of our model to applications outside of yield forecasting.

preprint2011arXiv

Bootstrap consistency for general semiparametric $M$-estimation

Consider $M$-estimation in a semiparametric model that is characterized by a Euclidean parameter of interest and an infinite-dimensional nuisance parameter. As a general purpose approach to statistical inferences, the bootstrap has found wide applications in semiparametric $M$-estimation and, because of its simplicity, provides an attractive alternative to the inference approach based on the asymptotic distribution theory. The purpose of this paper is to provide theoretical justifications for the use of bootstrap as a semiparametric inferential tool. We show that, under general conditions, the bootstrap is asymptotically consistent in estimating the distribution of the $M$-estimate of Euclidean parameter; that is, the bootstrap distribution asymptotically imitates the distribution of the $M$-estimate. We also show that the bootstrap confidence set has the asymptotically correct coverage probability. These general conclusions hold, in particular, when the nuisance parameter is not estimable at root-$n$ rate, and apply to a broad class of bootstrap methods with exchangeable bootstrap weights. This paper provides a first general theoretical study of the bootstrap in semiparametric models.

preprint2010arXiv

Sparse logistic principal components analysis for binary data

We develop a new principal components analysis (PCA) type dimension reduction method for binary data. Different from the standard PCA which is defined on the observed data, the proposed PCA is defined on the logit transform of the success probabilities of the binary observations. Sparsity is introduced to the principal component (PC) loading vectors for enhanced interpretability and more stable extraction of the principal components. Our sparse PCA is formulated as solving an optimization problem with a criterion function motivated from a penalized Bernoulli likelihood. A Majorization--Minimization algorithm is developed to efficiently solve the optimization problem. The effectiveness of the proposed sparse logistic PCA method is illustrated by application to a single nucleotide polymorphism data set and a simulation study.

preprint2010arXiv

Use of multiple singular value decompositions to analyze complex intracellular calcium ion signals

We compare calcium ion signaling ($\mathrm {Ca}^{2+}$) between two exposures; the data are present as movies, or, more prosaically, time series of images. This paper describes novel uses of singular value decompositions (SVD) and weighted versions of them (WSVD) to extract the signals from such movies, in a way that is semi-automatic and tuned closely to the actual data and their many complexities. These complexities include the following. First, the images themselves are of no interest: all interest focuses on the behavior of individual cells across time, and thus, the cells need to be segmented in an automated manner. Second, the cells themselves have 100$+$ pixels, so that they form 100$+$ curves measured over time, so that data compression is required to extract the features of these curves. Third, some of the pixels in some of the cells are subject to image saturation due to bit depth limits, and this saturation needs to be accounted for if one is to normalize the images in a reasonably unbiased manner. Finally, the $\mathrm {Ca}^{2+}$ signals have oscillations or waves that vary with time and these signals need to be extracted. Thus, our aim is to show how to use multiple weighted and standard singular value decompositions to detect, extract and clarify the $\mathrm {Ca}^{2+}$ signals. Our signal extraction methods then lead to simple although finely focused statistical methods to compare $\mathrm {Ca}^{2+}$ signals across experimental conditions.

Jianhua Z. Huang

What is connected

Connect this record

See the researcher in context

Building this map preview

12 published item(s)

A Study of Functional Depths

Period estimation for sparsely-sampled quasi-periodic light curves applied to Miras

Efficient semiparametric estimation in generalized partially linear additive models for longitudinal/clustered data

Asymptotic optimality and efficient computation of the leave-subject-out cross-validation

Bayesian object classification of gold nanoparticles

Robust regularized singular value decomposition with application to mortality data

A two-way regularization method for MEG source reconstruction

Covariance approximation for large multivariate spatial data sets with an application to multiple climate model errors

Functional dynamic factor models with application to yield curve forecasting

Bootstrap consistency for general semiparametric $M$-estimation

Sparse logistic principal components analysis for binary data

Use of multiple singular value decompositions to analyze complex intracellular calcium ion signals