Source author record

Oscar Hernan Madrid Padilla

Oscar Hernan Madrid Padilla appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Methodology Machine Learning math.ST Statistics Theory Computation econ.EM math.OC

Catalog footprint

What is connected

16works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Change point localization in dependent dynamic nonparametric random dot product graphs

In this paper, we study the offline change point localization problem in a sequence of dependent nonparametric random dot product graphs. To be specific, assume that at every time point, a network is generated from a nonparametric random dot product graph model \citep[see e.g.][]{athreya2017statistical}, where the latent positions are generated from unknown underlying distributions. The underlying distributions are piecewise constant in time and change at unknown locations, called change points. Most importantly, we allow for dependence among networks generated between two consecutive change points. This setting incorporates edge-dependence within networks and temporal dependence between networks, which is the most flexible setting in the published literature. To accomplish the task of consistently localizing change points, we propose a novel change point detection algorithm, consisting of two steps. First, we estimate the latent positions of the random dot product model, our theoretical result being a refined version of the state-of-the-art results, allowing the dimension of the latent positions to diverge. Subsequently, we construct a nonparametric version of the CUSUM statistic \citep[e.g.][]{Page1954, padilla2019optimal} that allows for temporal dependence. Consistent localization is proved theoretically and supported by extensive numerical experiments, which illustrate state-of-the-art performance. We also provide in depth discussion of possible extensions to give more understanding and insights.

preprint2022arXiv

Dynamic and heterogeneous treatment effects with abrupt changes

From personalised medicine to targeted advertising, it is an inherent task to provide a sequence of decisions with historical covariates and outcome data. This requires understanding of both the dynamics and heterogeneity of treatment effects. In this paper, we are concerned with detecting abrupt changes in the treatment effects in terms of the conditional average treatment effect (CATE) in a sequential fashion. To be more specific, at each time point, we consider a nonparametric model to allow for maximal flexibility and robustness. Along the time, we allow for temporal dependence on historical covariates and noise functions. We provide a kernel-based change point estimator, which is shown to be consistent in terms of its detection delay, under an average run length control. Numerical results are provided to support our theoretical findings.

preprint2022arXiv

Extensions to the Proximal Distance Method of Constrained Optimization

The current paper studies the problem of minimizing a loss $f(\boldsymbol{x})$ subject to constraints of the form $\boldsymbol{D}\boldsymbol{x} \in S$, where $S$ is a closed set, convex or not, and $\boldsymbol{D}$ is a matrix that fuses parameters. Fusion constraints can capture smoothness, sparsity, or more general constraint patterns. To tackle this generic class of problems, we combine the Beltrami-Courant penalty method with the proximal distance principle. The latter is driven by minimization of penalized objectives $f(\boldsymbol{x})+\fracρ{2}\text{dist}(\boldsymbol{D}\boldsymbol{x},S)^2$ involving large tuning constants $ρ$ and the squared Euclidean distance of $\boldsymbol{D}\boldsymbol{x}$ from $S$. The next iterate $\boldsymbol{x}_{n+1}$ of the corresponding proximal distance algorithm is constructed from the current iterate $\boldsymbol{x}_n$ by minimizing the majorizing surrogate function $f(\boldsymbol{x})+\fracρ{2}\|\boldsymbol{D}\boldsymbol{x}-\mathcal{P}_{S}(\boldsymbol{D}\boldsymbol{x}_n)\|^2$. For fixed $ρ$ and a subanalytic loss $f(\boldsymbol{x})$ and a subanalytic constraint set $S$, we prove convergence to a stationary point. Under stronger assumptions, we provide convergence rates and demonstrate linear local convergence. We also construct a steepest descent (SD) variant to avoid costly linear system solves. To benchmark our algorithms, we compare against the alternating direction method of multipliers (ADMM). Our extensive numerical tests include problems on metric projection, convex regression, convex clustering, total variation image denoising, and projection of a matrix to a good condition number. These experiments demonstrate the superior speed and acceptable accuracy of our steepest variant on high-dimensional problems.

preprint2022arXiv

Feature Grouping and Sparse Principal Component Analysis with Truncated Regularization

In this paper, we consider a new variant for principal component analysis (PCA), aiming to capture the grouping and/or sparse structures of factor loadings simultaneously. To achieve these goals, we employ a non-convex truncated regularization with naturally adjustable sparsity and grouping effects, and propose the Feature Grouping and Sparse Principal Component Analysis (FGSPCA). The proposed FGSPCA method encourages the factor loadings with similar values to collapse into disjoint homogeneous groups for feature grouping or into a special zero-valued group for feature selection, which in turn helps reducing model complexity and increasing model interpretation. Usually, existing structured PCA methods require prior knowledge to construct the regularization term. However, the proposed FGSPCA can simultaneously capture the grouping and/or sparse structures of factor loadings without any prior information. To solve the resulting non-convex optimization problem, we propose an alternating algorithm that incorporates the difference-of-convex programming, augmented Lagrange method and coordinate descent method. Experimental results demonstrate the promising performance and efficiency of the new method on both synthetic and real-world datasets. An R implementation of FGSPCA can be found on github {https://github.com/higeeks/FGSPCA}.

preprint2022arXiv

High Dimensional Latent Panel Quantile Regression with an Application to Asset Pricing

We propose a generalization of the linear panel quantile regression model to accommodate both \textit{sparse} and \textit{dense} parts: sparse means while the number of covariates available is large, potentially only a much smaller number of them have a nonzero impact on each conditional quantile of the response variable; while the dense part is represent by a low-rank matrix that can be approximated by latent factors and their loadings. Such a structure poses problems for traditional sparse estimators, such as the $\ell_1$-penalised Quantile Regression, and for traditional latent factor estimator, such as PCA. We propose a new estimation procedure, based on the ADMM algorithm, consists of combining the quantile loss function with $\ell_1$ \textit{and} nuclear norm regularization. We show, under general conditions, that our estimator can consistently estimate both the nonzero coefficients of the covariates and the latent low-rank matrix. Our proposed model has a "Characteristics + Latent Factors" Asset Pricing Model interpretation: we apply our model and estimator with a large-dimensional panel of financial data and find that (i) characteristics have sparser predictive power once latent factors were controlled (ii) the factors and coefficients at upper and lower quantiles are different from the median.

preprint2022arXiv

Kernel Biclustering algorithm in Hilbert Spaces

Biclustering algorithms partition data and covariates simultaneously, providing new insights in several domains, such as analyzing gene expression to discover new biological functions. This paper develops a new model-free biclustering algorithm in abstract spaces using the notions of energy distance (ED) and the maximum mean discrepancy (MMD) -- two distances between probability distributions capable of handling complex data such as curves or graphs. The proposed method can learn more general and complex cluster shapes than most existing literature approaches, which usually focus on detecting mean and variance differences. Although the biclustering configurations of our approach are constrained to create disjoint structures at the datum and covariate levels, the results are competitive. Our results are similar to state-of-the-art methods in their optimal scenarios, assuming a proper kernel choice, outperforming them when cluster differences are concentrated in higher-order moments. The model's performance has been tested in several situations that involve simulated and real-world datasets. Finally, new theoretical consistency results are established using some tools of the theory of optimal transport.

preprint2022arXiv

Optimal partition recovery in general graphs

We consider a graph-structured change point problem in which we observe a random vector with piecewise constant but unknown mean and whose independent, sub-Gaussian coordinates correspond to the $n$ nodes of a fixed graph. We are interested in the localisation task of recovering the partition of the nodes associated to the constancy regions of the mean vector. When the partition $\mathcal{S}$ consists of only two elements, we characterise the difficulty of the localisation problem in terms of four key parameters: the maximal noise variance $σ^2$, the size $Δ$ of the smaller element of the partition, the magnitude $κ$ of the difference in the signal values across contiguous elements of the partition and the sum of the effective resistance edge weights $|\partial_r(\mathcal{S})|$ of the corresponding cut -- a graph theoretic quantity quantifying the size of the partition boundary. In particular, we demonstrate an information theoretical lower bound implying that, in the low signal-to-noise ratio regime $κ^2 Δσ^{-2} |\partial_r(\mathcal{S})|^{-1} \lesssim 1$, no consistent estimator of the true partition exists. On the other hand, when $κ^2 Δσ^{-2} |\partial_r(\mathcal{S})|^{-1} \gtrsim ζ_n \log\{r(|E|)\}$, with $r(|E|)$ being the sum of effective resistance weighted edges and $ζ_n$ being any diverging sequence in $n$, we show that a polynomial-time, approximate $\ell_0$-penalised least squared estimator delivers a localisation error -- measured by the symmetric difference between the true and estimated partition -- of order $ κ^{-2} σ^2 |\partial_r(\mathcal{S})| \log\{r(|E|)\}$. Aside from the $\log\{r(|E|)\}$ term, this rate is minimax optimal. Finally, we provide discussions on the localisation error for more general partitions of unknown sizes.

preprint2022arXiv

Sequentially learning the topological ordering of causal directed acyclic graphs with likelihood ratio scores

Causal discovery, the learning of causality in a data mining scenario, has been of strong scientific and theoretical interest as a starting point to identify "what causes what?" Contingent on assumptions and a proper learning algorithm, it is sometimes possible to identify and accurately estimate a causal directed acyclic graph (DAG), as opposed to a Markov equivalence class of graphs that gives ambiguity of causal directions. The focus of this paper is in highlighting the identifiability and estimation of DAGs with general error distributions through a general sequential sorting procedure that orders variables one at a time, starting at root nodes, followed by children of the root nodes, and so on until completion. We demonstrate a novel application of this general approach to estimate the topological ordering of a DAG. At each step of the procedure, only simple likelihood ratio scores are calculated on regression residuals to decide the next node to append to the current partial ordering. The computational complexity of our algorithm on a p-node problem is O(pd), where d is the maximum neighborhood size. Under mild assumptions, the population version of our procedure provably identifies a true ordering of the underlying DAG. We provide extensive numerical evidence to demonstrate that this sequential procedure scales to possibly thousands of nodes and works well for high-dimensional data. We accompany these numerical experiments with an application to a single-cell gene expression dataset.

preprint2021arXiv

Optimal network online change point localisation

We study the problem of online network change point detection. In this setting, a collection of independent Bernoulli networks is collected sequentially, and the underlying distributions change when a change point occurs. The goal is to detect the change point as quickly as possible, if it exists, subject to a constraint on the number or probability of false alarms. In this paper, on the detection delay, we establish a minimax lower bound and two upper bounds based on NP-hard algorithms and polynomial-time algorithms, i.e., \[ \mbox{detection delay} \begin{cases} \gtrsim \log(1/α) \frac{\max\{r^2/n, \, 1\}}{κ_0^2 n ρ},\\ \lesssim \log(Δ/α) \frac{\max\{r^2/n, \, \log(r)\}}{κ_0^2 n ρ}, & \mbox{with NP-hard algorithms},\\ \lesssim \log(Δ/α) \frac{r}{κ_0^2 n ρ}, & \mbox{with polynomial-time algorithms}, \end{cases} \] where $κ_0, n, ρ, r$ and $α$ are the normalised jump size, network size, entrywise sparsity, rank sparsity and the overall Type-I error upper bound. All the model parameters are allowed to vary as $Δ$, the location of the change point, diverges. The polynomial-time algorithms are novel procedures that we propose in this paper, designed for quick detection under two different forms of Type-I error control. The first is based on controlling the overall probability of a false alarm when there are no change points, and the second is based on specifying a lower bound on the expected time of the first false alarm. Extensive experiments show that, under different scenarios and the aforementioned forms of Type-I error control, our proposed approaches outperform state-of-the-art methods.

preprint2020arXiv

Optimal nonparametric multivariate change point detection and localization

We study the multivariate nonparametric change point detection problem, where the data are a sequence of independent $p$-dimensional random vectors whose distributions are piecewise-constant with Lipschitz densities changing at unknown times, called change points. We quantify the size of the distributional change at any change point with the supremum norm of the difference between the corresponding densities. We are concerned with the localization task of estimating the positions of the change points. In our analysis, we allow for the model parameters to vary with the total number of time points, including the minimal spacing between consecutive change points and the magnitude of the smallest distributional change. We provide information-theoretic lower bounds on both the localization rate and the minimal signal-to-noise ratio required to guarantee consistent localization. We formulate a novel algorithm based on kernel density estimation that nearly achieves the minimax lower bound, save possibly for logarithm factors. We have provided extensive numerical evidence to support our theoretical findings.

preprint2020arXiv

Optimal post-selection inference for sparse signals: a nonparametric empirical-Bayes approach

Many recently developed Bayesian methods have focused on sparse signal detection. However, much less work has been done addressing the natural follow-up question: how to make valid inferences for the magnitude of those signals after selection. Ordinary Bayesian credible intervals suffer from selection bias, owing to the fact that the target of inference is chosen adaptively. Existing Bayesian approaches for correcting this bias produce credible intervals with poor frequentist properties, while existing frequentist approaches require sacrificing the benefits of shrinkage typical in Bayesian methods, resulting in confidence intervals that are needlessly wide. We address this gap by proposing a nonparametric empirical-Bayes approach for constructing optimal selection-adjusted confidence sets. Our method produces confidence sets that are as short as possible on average, while both adjusting for selection and maintaining exact frequentist coverage uniformly over the parameter space. Our main theoretical result establishes an important consistency property of our procedure: that under mild conditions, it asymptotically converges to the results of an oracle-Bayes analysis in which the prior distribution of signal sizes is known exactly. Across a series of examples, the method outperforms existing frequentist techniques for post-selection inference, producing confidence sets that are notably shorter but with the same coverage guarantee.

preprint2016arXiv

Nonparametric density estimation by histogram trend filtering

We propose a novel approach for density estimation called histogram trend filtering. Our estimator arises from looking at surrogate Poisson model for counts of observations in a partition of the support of the data. We begin by showing consistency for a variational estimator for this density estimation problem. We then study a discrete estimator that can be efficiently found via convex optimization. We show that the estimator enjoys strong statistical guarantees, yet is much more practical and computationally efficient than other estimators that enjoy similar guarantees. Finally, in our simulation study the proposed method showed smaller averaged mean square error than competing methods. This favorable blend of properties makes histogram trend filtering an ideal candidate for use in routine data-analysis applications that call for a quick, efficient, accurate density estimate.

preprint2016arXiv

Sequential nonparametric tests for a change in distribution: an application to detecting radiological anomalies

We propose a sequential nonparametric test for detecting a change in distribution, based on windowed Kolmogorov--Smirnov statistics. The approach is simple, robust, highly computationally efficient, easy to calibrate, and requires no parametric assumptions about the underlying null and alternative distributions. We show that both the false-alarm rate and the power of our procedure are amenable to rigorous analysis, and that the method outperforms existing sequential testing procedures in practice. We then apply the method to the problem of detecting radiological anomalies, using data collected from measurements of the background gamma-radiation spectrum on a large university campus. In this context, the proposed method leads to substantial improvements in time-to-detection for the kind of radiological anomalies of interest in law-enforcement and border-security applications.

preprint2016arXiv

Tensor decomposition with generalized lasso penalties

We present an approach for penalized tensor decomposition (PTD) that estimates smoothly varying latent factors in multi-way data. This generalizes existing work on sparse tensor decomposition and penalized matrix decompositions, in a manner parallel to the generalized lasso for regression and smoothing problems. Our approach presents many nontrivial challenges at the intersection of modeling and computation, which are studied in detail. An efficient coordinate-wise optimization algorithm for (PTD) is presented, and its convergence properties are characterized. The method is applied both to simulated data and real data on flu hospitalizations in Texas. These results show that our penalized tensor decomposition can offer major improvements on existing methods for analyzing multi-way data that exhibit smooth spatial or temporal features.

preprint2015arXiv

Priors for Random Count Matrices Derived from a Family of Negative Binomial Processes

We define a family of probability distributions for random count matrices with a potentially unbounded number of rows and columns. The three distributions we consider are derived from the gamma-Poisson, gamma-negative binomial, and beta-negative binomial processes. Because the models lead to closed-form Gibbs sampling update equations, they are natural candidates for nonparametric Bayesian priors over count matrices. A key aspect of our analysis is the recognition that, although the random count matrices within the family are defined by a row-wise construction, their columns can be shown to be i.i.d. This fact is used to derive explicit formulas for drawing all the columns at once. Moreover, by analyzing these matrices' combinatorial structure, we describe how to sequentially construct a column-i.i.d. random count matrix one row at a time, and derive the predictive distribution of a new row count vector with previously unseen features. We describe the similarities and differences between the three priors, and argue that the greater flexibility of the gamma- and beta- negative binomial processes, especially their ability to model over-dispersed, heavy-tailed count data, makes these well suited to a wide variety of real-world applications. As an example of our framework, we construct a naive-Bayes text classifier to categorize a count vector to one of several existing random count matrices of different categories. The classifier supports an unbounded number of features, and unlike most existing methods, it does not require a predefined finite vocabulary to be shared by all the categories, and needs neither feature selection nor parameter tuning. Both the gamma- and beta- negative binomial processes are shown to significantly outperform the gamma-Poisson process for document categorization, with comparable performance to other state-of-the-art supervised text classification algorithms.

preprint2015arXiv

Vector-Space Markov Random Fields via Exponential Families

We present Vector-Space Markov Random Fields (VS-MRFs), a novel class of undirected graphical models where each variable can belong to an arbitrary vector space. VS-MRFs generalize a recent line of work on scalar-valued, uni-parameter exponential family and mixed graphical models, thereby greatly broadening the class of exponential families available (e.g., allowing multinomial and Dirichlet distributions). Specifically, VS-MRFs are the joint graphical model distributions where the node-conditional distributions belong to generic exponential families with general vector space domains. We also present a sparsistent $M$-estimator for learning our class of MRFs that recovers the correct set of edges with high probability. We validate our approach via a set of synthetic data experiments as well as a real-world case study of over four million foods from the popular diet tracking app MyFitnessPal. Our results demonstrate that our algorithm performs well empirically and that VS-MRFs are capable of capturing and highlighting interesting structure in complex, real-world data. All code for our algorithm is open source and publicly available.

Oscar Hernan Madrid Padilla

What is connected

Connect this record

See the researcher in context

Building this map preview

16 published item(s)

Change point localization in dependent dynamic nonparametric random dot product graphs

Dynamic and heterogeneous treatment effects with abrupt changes

Extensions to the Proximal Distance Method of Constrained Optimization

Feature Grouping and Sparse Principal Component Analysis with Truncated Regularization

High Dimensional Latent Panel Quantile Regression with an Application to Asset Pricing

Kernel Biclustering algorithm in Hilbert Spaces

Optimal partition recovery in general graphs

Sequentially learning the topological ordering of causal directed acyclic graphs with likelihood ratio scores

Optimal network online change point localisation

Optimal nonparametric multivariate change point detection and localization

Optimal post-selection inference for sparse signals: a nonparametric empirical-Bayes approach

Nonparametric density estimation by histogram trend filtering

Sequential nonparametric tests for a change in distribution: an application to detecting radiological anomalies

Tensor decomposition with generalized lasso penalties

Priors for Random Count Matrices Derived from a Family of Negative Binomial Processes

Vector-Space Markov Random Fields via Exponential Families