Source author record

Edsel A. Peña

Edsel A. Peña appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning math.ST Statistics Theory

Catalog footprint

What is connected

4works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2016arXiv

Bayesian quantile additive regression trees

Ensemble of regression trees have become popular statistical tools for the estimation of conditional mean given a set of predictors. However, quantile regression trees and their ensembles have not yet garnered much attention despite the increasing popularity of the linear quantile regression model. This work proposes a Bayesian quantile additive regression trees model that shows very good predictive performance illustrated using simulation studies and real data applications. Further extension to tackle binary classification problems is also considered.

preprint2016arXiv

MPBART - Multinomial Probit Bayesian Additive Regression Trees

This article proposes Multinomial Probit Bayesian Additive Regression Trees (MPBART) as a multinomial probit extension of BART - Bayesian Additive Regression Trees (Chipman et al (2010)). MPBART is flexible to allow inclusion of predictors that describe the observed units as well as the available choice alternatives. Through two simulation studies and four real data examples, we show that MPBART exhibits very good predictive performance in comparison to other discrete choice and multiclass classification methods. To implement MPBART, we have developed an R package mpbart available freely from CRAN repositories.

preprint2011arXiv

Bayes Multiple Decision Functions

This paper deals with the problem of simultaneously making many (M) binary decisions based on one realization of a random data matrix X. M is typically large and X will usually have M rows associated with each of the M decisions to make, but for each row the data may be low dimensional. A Bayesian decision-theoretic approach for this problem is implemented with the overall loss function being a cost-weighted linear combination of Type I and Type II loss functions. The class of loss functions considered allows for the use of the false discovery rate (FDR), false nondiscovery rate (FNR), and missed discovery rate (MDR) in assessing the decision. Through this Bayesian paradigm, the Bayes multiple decision function (BMDF) is derived and an efficient algorithm to obtain the optimal Bayes action is described. In contrast to many works in the literature where the rows of the matrix X are assumed to be stochastically independent, we allow in this paper a dependent data structure with the associations obtained through a class of frailty-induced Archimedean copulas. In particular, non-Gaussian dependent data structure, which is the norm rather than the exception when dealing with failure-time data, can be entertained. The numerical implementation of the determination of the Bayes optimal action is facilitated through sequential Monte Carlo techniques. The main theory developed could also be extended to the problem of multiple hypotheses testing, multiple classification and prediction, and high-dimensional variable selection. The proposed procedure is illustrated for the simple versus simple and for the composite hypotheses setting via simulation studies. The procedure is also applied to a subset of a real microarray data set from a colon cancer study.

preprint2011arXiv

Power-enhanced multiple decision functions controlling family-wise error and false discovery rates

Improved procedures, in terms of smaller missed discovery rates (MDR), for performing multiple hypotheses testing with weak and strong control of the family-wise error rate (FWER) or the false discovery rate (FDR) are developed and studied. The improvement over existing procedures such as the Šidák procedure for FWER control and the Benjamini--Hochberg (BH) procedure for FDR control is achieved by exploiting possible differences in the powers of the individual tests. Results signal the need to take into account the powers of the individual tests and to have multiple hypotheses decision functions which are not limited to simply using the individual $p$-values, as is the case, for example, with the Šidák, Bonferroni, or BH procedures. They also enhance understanding of the role of the powers of individual tests, or more precisely the receiver operating characteristic (ROC) functions of decision processes, in the search for better multiple hypotheses testing procedures. A decision-theoretic framework is utilized, and through auxiliary randomizers the procedures could be used with discrete or mixed-type data or with rank-based nonparametric tests. This is in contrast to existing $p$-value based procedures whose theoretical validity is contingent on each of these $p$-value statistics being stochastically equal to or greater than a standard uniform variable under the null hypothesis. Proposed procedures are relevant in the analysis of high-dimensional "large $M$, small $n$" data sets arising in the natural, physical, medical, economic and social sciences, whose generation and creation is accelerated by advances in high-throughput technology, notably, but not limited to, microarray technology.

Edsel A. Peña

What is connected

Connect this record

See the researcher in context

Building this map preview

4 published item(s)

Bayesian quantile additive regression trees

MPBART - Multinomial Probit Bayesian Additive Regression Trees

Bayes Multiple Decision Functions

Power-enhanced multiple decision functions controlling family-wise error and false discovery rates