Source author record

V. Roshan Joseph

V. Roshan Joseph appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation Methodology Applications Machine Learning

Catalog footprint

What is connected

10works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2023arXiv

Enhancing Sample Quality through Minimum Energy Importance Weights

Importance sampling is a powerful tool for correcting the distributional mismatch in many statistical and machine learning problems, but in practice its performance is limited by the usage of simple proposals whose importance weights can be computed analytically. To address this limitation, Liu and Lee (2017) proposed a Black-Box Importance Sampling (BBIS) algorithm that computes the importance weights for arbitrary simulated samples by minimizing the kernelized Stein discrepancy. However, this requires knowing the score function of the target distribution, which is not easy to compute for many Bayesian problems. Hence, in this paper we propose another novel BBIS algorithm using minimum energy design, BBIS-MED, that requires only the unnormalized density function, which can be utilized as a post-processing step to improve the quality of Markov Chain Monte Carlo samples. We demonstrate the effectiveness and wide applicability of our proposed BBIS-MED algorithm on extensive simulations and a real-world Bayesian model calibration problem where the score function cannot be derived analytically.

preprint2022arXiv

Optimal Ratio for Data Splitting

It is common to split a dataset into training and testing sets before fitting a statistical or machine learning model. However, there is no clear guidance on how much data should be used for training and testing. In this article we show that the optimal splitting ratio is $\sqrt{p}:1$, where $p$ is the number of parameters in a linear regression model that explains the data well.

preprint2021arXiv

Constrained Minimum Energy Designs

Space-filling designs are important in computer experiments, which are critical for building a cheap surrogate model that adequately approximates an expensive computer code. Many design construction techniques in the existing literature are only applicable for rectangular bounded space, but in real world applications, the input space can often be non-rectangular because of constraints on the input variables. One solution to generate designs in a constrained space is to first generate uniformly distributed samples in the feasible region, and then use them as the candidate set to construct the designs. Sequentially Constrained Monte Carlo (SCMC) is the state-of-the-art technique for candidate generation, but it still requires large number of constraint evaluations, which is problematic especially when the constraints are expensive to evaluate. Thus, to reduce constraint evaluations and improve efficiency, we propose the Constrained Minimum Energy Design (CoMinED) that utilizes recent advances in deterministic sampling methods. Extensive simulation results on 15 benchmark problems with dimensions ranging from 2 to 13 are provided for demonstrating the improved performance of CoMinED over the existing methods.

preprint2021arXiv

Data Twinning

In this work, we develop a method named Twinning, for partitioning a dataset into statistically similar twin sets. Twinning is based on SPlit, a recently proposed model-independent method for optimally splitting a dataset into training and testing sets. Twinning is orders of magnitude faster than the SPlit algorithm, which makes it applicable to Big Data problems such as data compression. Twinning can also be used for generating multiple splits of a given dataset to aid divide-and-conquer procedures and $k$-fold cross validation.

preprint2020arXiv

Population Quasi-Monte Carlo

Monte Carlo methods are widely used for approximating complicated, multidimensional integrals for Bayesian inference. Population Monte Carlo (PMC) is an important class of Monte Carlo methods, which utilizes a population of proposals to generate weighted samples that approximate the target distribution. The generic PMC framework iterates over three steps: samples are simulated from a set of proposals, weights are assigned to such samples to correct for mismatch between the proposal and target distributions, and the proposals are then adapted via resampling from the weighted samples. When the target distribution is expensive to evaluate, the PMC has its computational limitation since the convergence rate is $\mathcal{O}(N^{-1/2})$. To address this, we propose in this paper a new Population Quasi-Monte Carlo (PQMC) framework, which integrates Quasi-Monte Carlo ideas within the sampling and adaptation steps of PMC. A key novelty in PQMC is the idea of importance support points resampling, a deterministic method for finding an "optimal" subsample from the weighted proposal samples. Moreover, within the PQMC framework, we develop an efficient covariance adaptation strategy for multivariate normal proposals. Lastly, a new set of correction weights is introduced for the weighted PMC estimator to improve the efficiency from the standard PMC estimator. We demonstrate the improved empirical convergence of PQMC over PMC in extensive numerical simulations and a friction drilling application.

preprint2016arXiv

Minimax and minimax projection designs using clustering

Minimax designs provide a uniform coverage of a design space $\mathcal{X} \subseteq \mathbb{R}^p$ by minimizing the maximum distance from any point in this space to its nearest design point. Although minimax designs have many useful applications, e.g., for optimal sensor allocation or as space-filling designs for computer experiments, there has been little work in developing algorithms for generating these designs, due to its computational complexity. In this paper, a new hybrid algorithm combining particle swarm optimization and clustering is proposed for generating minimax designs on any convex and bounded design space. The computation time of this algorithm scales linearly in dimension $p$, meaning our method can generate minimax designs efficiently for high-dimensional regions. Simulation studies and a real-world example show that the proposed algorithm provides improved minimax performance over existing methods on a variety of design spaces. Finally, we introduce a new type of experimental design called a minimax projection design, and show that this proposed design provides better minimax performance on projected subspaces of $\mathcal{X}$ compared to existing designs. An efficient implementation of these algorithms can be found in the R package minimaxdesign.

preprint2016arXiv

Orthogonal Gaussian process models

Gaussian processes models are widely adopted for nonparameteric/semi-parametric modeling. Identifiability issues occur when the mean model contains polynomials with unknown coefficients. Though resulting prediction is unaffected, this leads to poor estimation of the coefficients in the mean model, and thus the estimated mean model loses interpretability. This paper introduces a new Gaussian process model whose stochastic part is orthogonal to the mean part to address this issue. This paper also discusses applications to multi-fidelity simulations using data examples.

preprint2013arXiv

Composite Gaussian process models for emulating expensive functions

A new type of nonstationary Gaussian process model is developed for approximating computationally expensive functions. The new model is a composite of two Gaussian processes, where the first one captures the smooth global trend and the second one models local details. The new predictor also incorporates a flexible variance model, which makes it more capable of approximating surfaces with varying volatility. Compared to the commonly used stationary Gaussian process model, the new predictor is numerically more stable and can more accurately approximate complex surfaces when the experimental design is sparse. In addition, the new model can also improve the prediction intervals by quantifying the change of local variability associated with the response. Advantages of the new predictor are demonstrated using several examples.

preprint2012arXiv

Analysis of Computer Experiments with Functional Response

This paper is motivated by a computer experiment conducted for optimizing residual stresses in the machining of metals. Although kriging is widely used in the analysis of computer experiments, it cannot be easily applied to model the residual stresses because they are obtained as a profile. The high dimensionality caused by this functional response introduces severe computational challenges in kriging. It is well known that if the functional data are observed on a regular grid, the computations can be simplified using an application of Kronecker products. However, the case of irregular grid is quite complex. In this paper, we develop a Gibbs sampling-based expectation maximization algorithm, which converts the irregularly spaced data into a regular grid so that the Kronecker product-based approach can be employed for efficiently fitting a kriging model to the functional data.

preprint2010arXiv

Structured variable selection and estimation

In linear regression problems with related predictors, it is desirable to do variable selection and estimation by maintaining the hierarchical or structural relationships among predictors. In this paper we propose non-negative garrote methods that can naturally incorporate such relationships defined through effect heredity principles or marginality principles. We show that the methods are very easy to compute and enjoy nice theoretical properties. We also show that the methods can be easily extended to deal with more general regression problems such as generalized linear models. Simulations and real examples are used to illustrate the merits of the proposed methods.