Source author record

Grace Wahba

Grace Wahba appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Methodology, math.ST, Statistics Theory, Applications, Machine Learning

Catalog footprint

What is connected

6 works

5 topics

4 close collaborators


Preprint, 2014, arXiv

Distance Shrinkage and Euclidean Embedding via Regularized Kernel Estimation

Although recovering a Euclidean distance matrix from noisy observations is a common problem in practice, how well this can be done remains largely unknown. To fill this void, we study a simple distance matrix estimate based upon the so-called regularized kernel estimate. We show that such an estimate can be characterized as simply applying a constant amount of shrinkage to all observed pairwise distances. This fact allows us to establish risk bounds for the estimate, implying that the true distances can be estimated consistently in an average sense as the number of objects increases. In addition, such a characterization suggests an efficient algorithm to compute the distance matrix estimator, as an alternative to the usual second-order cone programming, which is known not to scale well for large problems. Numerical experiments and an application in visualizing the diversity of Vpu protein sequences from a recent HIV-1 study further demonstrate the practical merits of the proposed method.

Preprint, 2014, arXiv

Using distance covariance for improved variable selection with applications to genetic risk models

Variable selection is of increasing importance to address the difficulties of high dimensionality in many scientific areas. In this paper, we demonstrate a property for distance covariance, which is incorporated in a novel feature screening procedure together with the use of distance correlation. The approach makes no distributional assumptions for the variables and does not require the specification of a regression model, and hence is especially attractive in variable selection given an enormous number of candidate attributes without much information about the true model with the response. The method is applied to two genetic risk problems, where issues including uncertainty of variable selection via cross validation, subgroup of hard-to-classify cases and the application of a reject option are discussed.
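
As a rough illustration of the kind of model-free screening the abstract describes, the sketch below computes the standard sample distance correlation (the biased, V-statistic form) and ranks features by it. It is not the paper's procedure: the function names `distance_correlation` and `screen_features` are hypothetical, and the ranking rule is a plain top-k cut assumed here for illustration.

```python
import numpy as np


def _centered_distances(x):
    """Pairwise Euclidean distances of the sample, double-centered."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as n observations of one variable
    diff = x[:, None, :] - x[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Subtract row and column means, then add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_covariance_sq(x, y):
    """Squared sample distance covariance (biased V-statistic form)."""
    return float((_centered_distances(x) * _centered_distances(y)).mean())


def distance_correlation(x, y):
    """Sample distance correlation, a value in [0, 1]."""
    dcov2 = distance_covariance_sq(x, y)
    denom = np.sqrt(distance_covariance_sq(x, x) * distance_covariance_sq(y, y))
    if denom == 0.0:
        return 0.0  # at least one sample is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def screen_features(X, y, keep):
    """Rank the columns of X by distance correlation with y (hypothetical helper).

    Returns the indices of the `keep` highest-scoring columns and all scores.
    """
    X = np.asarray(X, dtype=float)
    scores = np.array([distance_correlation(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(scores)[::-1]
    return order[:keep], scores
```

Calling `screen_features(X, y, keep=10)` on an `n x p` array `X` and a length-`n` response `y` returns the ten columns most dependent on `y` under this measure. Because distance correlation responds to nonlinear dependence, a feature that enters only through, say, its square can still rank highly, which is the property that makes it attractive when no regression model is specified.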

Preprint, 2013, arXiv

Group variable selection via convex Log-Exp-Sum penalty with application to a breast cancer survivor study

In many scientific and engineering applications, covariates are naturally grouped. When the group structures are available among covariates, people are usually interested in identifying both important groups and important variables within the selected groups. Among existing successful group variable selection methods, some fail to conduct within-group selection. Others are able to conduct both group and within-group selection, but the corresponding objective functions are non-convex. Such non-convexity may require extra numerical effort. In this paper, we propose a novel Log-Exp-Sum (LES) penalty for group variable selection. The LES penalty is strictly convex. It can identify important groups as well as select important variables within the group. We develop an efficient group-level coordinate descent algorithm to fit the model. We also derive non-asymptotic error bounds and asymptotic group selection consistency for our method in the high-dimensional setting where the number of covariates can be much larger than the sample size. Numerical results demonstrate the good performance of our method in both variable selection and prediction. We applied the proposed method to an American Cancer Society breast cancer survivor dataset. The findings are clinically meaningful and lead immediately to testable clinical hypotheses.

Preprint, 2013, arXiv

Multivariate Bernoulli distribution

In this paper, we consider the multivariate Bernoulli distribution as a model to estimate the structure of graphs with binary nodes. This distribution is discussed in the framework of the exponential family, and its statistical properties regarding independence of the nodes are demonstrated. Importantly, the model can not only estimate the main effects and pairwise interactions among the nodes but also model higher-order interactions, allowing for the existence of complex clique effects. We compare the multivariate Bernoulli model with existing graphical inference models, the Ising model and the multivariate Gaussian model, where only the pairwise interactions are considered. In addition, the multivariate Bernoulli distribution has an interesting property in that independence and uncorrelatedness of the component random variables are equivalent. Both the marginal and conditional distributions of a subset of variables in the multivariate Bernoulli distribution still follow the multivariate Bernoulli distribution. Furthermore, the multivariate Bernoulli logistic model is developed under generalized linear model theory by utilizing the canonical link function in order to include covariate information on the nodes, edges and cliques. We also consider variable selection techniques such as LASSO in the logistic model to impose sparsity structure on the graph. Finally, we discuss extending the smoothing spline ANOVA approach to the multivariate Bernoulli logistic model to enable estimation of non-linear effects of the predictor variables.
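
The equivalence of independence and uncorrelatedness mentioned in the abstract can be checked by hand in the two-node case. The worked derivation below is a sketch under assumed notation (the cell probabilities $p_{ab}$ and natural parameters $f_1, f_2, f_{12}$ are labels chosen here, not necessarily the paper's), with all four cell probabilities taken to be strictly positive.

```latex
% Bivariate Bernoulli: p_{ab} = P(Y_1 = a, Y_2 = b), with p_{00}+p_{10}+p_{01}+p_{11} = 1.
\begin{align*}
P(Y_1 = y_1, Y_2 = y_2)
  &= p_{00}^{(1-y_1)(1-y_2)}\, p_{10}^{\,y_1(1-y_2)}\, p_{01}^{\,(1-y_1)y_2}\, p_{11}^{\,y_1 y_2}, \\
\log P(Y_1 = y_1, Y_2 = y_2)
  &= \log p_{00} + y_1 f_1 + y_2 f_2 + y_1 y_2 f_{12}, \\
f_1 = \log\frac{p_{10}}{p_{00}}, \quad
f_2 &= \log\frac{p_{01}}{p_{00}}, \quad
f_{12} = \log\frac{p_{11}\,p_{00}}{p_{10}\,p_{01}}. \\[4pt]
\operatorname{Cov}(Y_1, Y_2)
  &= p_{11} - (p_{11} + p_{10})(p_{11} + p_{01}) \\
  &= p_{11}\,(1 - p_{11} - p_{10} - p_{01}) - p_{10}\,p_{01} \\
  &= p_{11}\,p_{00} - p_{10}\,p_{01}.
\end{align*}
% Hence Cov(Y_1, Y_2) = 0  <=>  p_{11} p_{00} = p_{10} p_{01}  <=>  f_{12} = 0,
% and f_{12} = 0 makes the log-density additive in y_1 and y_2, i.e. Y_1 and Y_2 are independent.
```

In this small case the interaction parameter $f_{12}$ is the log odds ratio, so a zero covariance, a zero interaction term and independence all describe the same condition.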

Preprint, 2013, arXiv

Statistical Model Building, Machine Learning, and the Ah-Ha Moment

The Committee of Presidents of Statistical Societies (COPSS) will celebrate its 50th Anniversary in 2013. As part of its celebration, COPSS intends to publish a book with contributions from the past recipients of its four awards, namely the Fisher Lecture Award, the President's Award, the Elizabeth Scott Award, and the FN David Award. The theme of the book is Past, Present and Future of Statistical Science. As a winner of the Elizabeth Scott Award, I have been invited to contribute. We were given several topics to choose from and I have chosen to focus on "Statistical Career: Your reflection on your own career, lessons and experience you have learned, and advice you would like to provide to young statisticians if sought." This article is my contribution.

Preprint, 2010, arXiv

Penalized Likelihood Regression in Reproducing Kernel Hilbert Spaces with Randomized Covariate Data

Classical penalized likelihood regression problems deal with the case in which the independent-variable data are known exactly. In practice, however, it is common to observe data with incomplete covariate information. We are concerned with a fundamentally important case where some of the observations do not represent the exact covariate information, but only a probability distribution. In this case, the maximum penalized likelihood method can still be applied to estimate the regression function. We first show that the maximum penalized likelihood estimate exists under a mild condition. For the computation, we propose a dimension reduction technique to minimize the penalized likelihood and derive a GACV (Generalized Approximate Cross Validation) to choose the smoothing parameter. Our methods are extended to handle more complicated incomplete data problems, such as covariate measurement error and partially missing covariates.
