Source author record

Hugh Chipman

Hugh Chipman appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Methodology Machine Learning Computation Applications Artificial Intelligence Computer Vision Social and Information Networks

Catalog footprint

What is connected

9works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

MLCBART: Multilabel Classification with Bayesian Additive Regression Trees

Multilabel Classification (MLC) deals with the simultaneous classification of multiple binary labels. The task is challenging because, not only may there be arbitrarily different and complex relationships between predictor variables and each label, but associations among labels may exist even after accounting for effects of predictor variables. In this paper, we present a Bayesian additive regression tree (BART) framework to model the problem. BART is a nonparametric and flexible model structure capable of uncovering complex relationships within the data. Our adaptation, MLCBART, assumes that labels arise from thresholding an underlying numeric scale, where a multivariate normal model allows explicit estimation of the correlation structure among labels. This enables the discovery of complicated relationships in various forms and improves MLC predictive performance. Our Bayesian framework not only enables uncertainty quantification for each predicted label, but our MCMC draws produce an estimated conditional probability distribution of label combinations for any predictor values. Simulation experiments demonstrate the effectiveness of the proposed model by comparing its performance with a set of models, including the oracle model with the correct functional form. Results show that our model predicts vectors of labels more accurately than other contenders and its performance is close to the oracle model. An example highlights how the method's ability to produce measures of uncertainty on predictions provides nuanced understanding of classification results.

preprint2016arXiv

A Continuous-time Stochastic Block Model for Basketball Networks

For professional basketball, finding valuable and suitable players is the key to building a winning team. To deal with such challenges, basketball managers, scouts and coaches are increasingly turning to analytics. Objective evaluation of players and teams has always been the top goal of basketball analytics. Typical statistical analytics mainly focuses on the box score and has developed various metrics. In spite of the more and more advanced methods, metrics built upon box score statistics provide limited information about how players interact with each other. Two players with similar box scores may deliver distinct team plays. Thus professional basketball scouts have to watch real games to evaluate players. Live scouting is effective, but suffers from inefficiency and subjectivity. In this paper, we go beyond the static box score and model basketball games as dynamic networks. The proposed Continuous-time Stochastic Block Model clusters the players according to their playing style and performance. The model provides cluster-specific estimates of the effectiveness of players at scoring, rebounding, stealing, etc, and also captures player interaction patterns within and between clusters. By clustering similar players together, the model can help basketball scouts to narrow down the search space. Moreover, the model is able to reveal the subtle differences in the offensive strategies of different teams. An application to NBA basketball games illustrates the performance of the model.

preprint2014arXiv

Monotone Function Estimation for Computer Experiments

In statistical modeling of computer experiments sometimes prior information is available about the underlying function. For example, the physical system simulated by the computer code may be known to be monotone with respect to some or all inputs. We develop a Bayesian approach to Gaussian process modelling capable of incorporating monotonicity information for computer model emulation. Markov chain Monte Carlo methods are used to sample from the posterior distribution of the process given the simulator output and monotonicity information. The performance of the proposed approach in terms of predictive accuracy and uncertainty quantification is demonstrated in a number of simulated examples as well as a real queueing system application.

preprint2013arXiv

A new Bayesian ensemble of trees classifier for identifying multi-class labels in satellite images

Classification of satellite images is a key component of many remote sensing applications. One of the most important products of a raw satellite image is the classified map which labels the image pixels into meaningful classes. Though several parametric and non-parametric classifiers have been developed thus far, accurate labeling of the pixels still remains a challenge. In this paper, we propose a new reliable multiclass-classifier for identifying class labels of a satellite image in remote sensing applications. The proposed multiclass-classifier is a generalization of a binary classifier based on the flexible ensemble of regression trees model called Bayesian Additive Regression Trees (BART). We used three small areas from the LANDSAT 5 TM image, acquired on August 15, 2009 (path/row: 08/29, L1T product, UTM map projection) over Kings County, Nova Scotia, Canada to classify the land-use. Several prediction accuracy and uncertainty measures have been used to compare the reliability of the proposed classifier with the state-of-the-art classifiers in remote sensing.

preprint2013arXiv

GPfit: An R package for Gaussian Process Model Fitting using a New Optimization Algorithm

Gaussian process (GP) models are commonly used statistical metamodels for emulating expensive computer simulators. Fitting a GP model can be numerically unstable if any pair of design points in the input space are close together. Ranjan, Haynes, and Karsten (2011) proposed a computationally stable approach for fitting GP models to deterministic computer simulators. They used a genetic algorithm based approach that is robust but computationally intensive for maximizing the likelihood. This paper implements a slightly modified version of the model proposed by Ranjan et al. (2011), as the new R package GPfit. A novel parameterization of the spatial correlation function and a new multi-start gradient based optimization algorithm yield optimization that is robust and typically faster than the genetic algorithm based approach. We present two examples with R codes to illustrate the usage of the main functions in GPfit. Several test functions are used for performance comparison with a popular R package mlegp. GPfit is a free software and distributed under the general public license, as part of the R software project (R Development Core Team 2012).

preprint2012arXiv

A Short Note on Gaussian Process Modeling for Large Datasets using Graphics Processing Units

The graphics processing unit (GPU) has emerged as a powerful and cost effective processor for general performance computing. GPUs are capable of an order of magnitude more floating-point operations per second as compared to modern central processing units (CPUs), and thus provide a great deal of promise for computationally intensive statistical applications. Fitting complex statistical models with a large number of parameters and/or for large datasets is often very computationally expensive. In this study, we focus on Gaussian process (GP) models -- statistical models commonly used for emulating expensive computer simulators. We demonstrate that the computational cost of implementing GP models can be significantly reduced by using a CPU+GPU heterogeneous computing system over an analogous implementation on a traditional computing system with no GPU acceleration. Our small study suggests that GP models are fertile ground for further implementation on CPU+GPU systems.

preprint2012arXiv

Sequential Design for Computer Experiments with a Flexible Bayesian Additive Model

In computer experiments, a mathematical model implemented on a computer is used to represent complex physical phenomena. These models, known as computer simulators, enable experimental study of a virtual representation of the complex phenomena. Simulators can be thought of as complex functions that take many inputs and provide an output. Often these simulators are themselves expensive to compute, and may be approximated by "surrogate models" such as statistical regression models. In this paper we consider a new kind of surrogate model, a Bayesian ensemble of trees (Chipman et al. 2010), with the specific goal of learning enough about the simulator that a particular feature of the simulator can be estimated. We focus on identifying the simulator's global minimum. Utilizing the Bayesian version of the Expected Improvement criterion (Jones et al. 1998), we show that this ensemble is particularly effective when the simulator is ill-behaved, exhibiting nonstationarity or abrupt changes in the response. A number of illustrations of the approach are given, including a tidal power application.

preprint2010arXiv

Branch and Bound Algorithms for Maximizing Expected Improvement Functions

Deterministic computer simulations are often used as a replacement for complex physical experiments. Although less expensive than physical experimentation, computer codes can still be time-consuming to run. An effective strategy for exploring the response surface of the deterministic simulator is the use of an approximation to the computer code, such as a Gaussian process (GP) model, coupled with a sequential sampling strategy for choosing design points that can be used to build the GP model. The ultimate goal of such studies is often the estimation of specific features of interest of the simulator output, such as the maximum, minimum, or a level set (contour). Before approximating such features with the GP model, sufficient runs of the computer simulator must be completed. Sequential designs with an expected improvement (EI) function can yield good estimates of the features with a minimal number of runs. The challenge is that the expected improvement function itself is often multimodal and difficult to maximize. We develop branch and bound algorithms for efficiently maximizing the EI function in specific problems, including the simultaneous estimation of a minimum and a maximum, and in the estimation of a contour. These branch and bound algorithms outperform other optimization strategies such as genetic algorithms, and over a number of sequential design steps can lead to dramatically superior accuracy in estimation of features of interest.

preprint2010arXiv

Mixed-Membership Stochastic Block-Models for Transactional Networks

Transactional network data can be thought of as a list of one-to-many communications(e.g., email) between nodes in a social network. Most social network models convert this type of data into binary relations between pairs of nodes. We develop a latent mixed membership model capable of modeling richer forms of transactional network data, including relations between more than two nodes. The model can cluster nodes and predict transactions. The block-model nature of the model implies that groups can be characterized in very general ways. This flexible notion of group structure enables discovery of rich structure in transactional networks. Estimation and inference are accomplished via a variational EM algorithm. Simulations indicate that the learning algorithm can recover the correct generative model. Interesting structure is discovered in the Enron email dataset and another dataset extracted from the Reddit website. Analysis of the Reddit data is facilitated by a novel performance measure for comparing two soft clusterings. The new model is superior at discovering mixed membership in groups and in predicting transactions.

Hugh Chipman

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

MLCBART: Multilabel Classification with Bayesian Additive Regression Trees

A Continuous-time Stochastic Block Model for Basketball Networks

Monotone Function Estimation for Computer Experiments

A new Bayesian ensemble of trees classifier for identifying multi-class labels in satellite images

GPfit: An R package for Gaussian Process Model Fitting using a New Optimization Algorithm

A Short Note on Gaussian Process Modeling for Large Datasets using Graphics Processing Units

Sequential Design for Computer Experiments with a Flexible Bayesian Additive Model

Branch and Bound Algorithms for Maximizing Expected Improvement Functions

Mixed-Membership Stochastic Block-Models for Transactional Networks