Source author record

Dogyoon Song

Dogyoon Song appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, math.ST, Statistics Theory, math.OC

Catalog footprint

What is connected

5 works

4 topics

4 close collaborators

preprint, 2024, arXiv

Local Minima Structures in Gaussian Mixture Models

We investigate the landscape of the negative log-likelihood function of Gaussian Mixture Models (GMMs) with a general number of components in the population limit. As the objective function is non-convex, there can be multiple local minima that are not globally optimal, even for well-separated mixture models. Our study reveals that all local minima share a common structure that partially identifies the cluster centers (i.e., means of the Gaussian components) of the true location mixture. Specifically, each local minimum can be represented as a non-overlapping combination of two types of sub-configurations: fitting a single mean estimate to multiple Gaussian components or fitting multiple estimates to a single true component. These results apply to settings where the true mixture components satisfy a certain separation condition, and are valid even when the number of components is over- or under-specified. We also present a more fine-grained analysis for the setting of one-dimensional GMMs with three components, which provides sharper approximation error bounds with improved dependence on the separation.

preprint, 2021, arXiv

On Approximations of the PSD Cone by a Polynomial Number of Smaller-sized PSD Cones

We study the problem of approximating the cone of positive semidefinite (PSD) matrices with a cone that can be described by smaller-sized PSD constraints. Specifically, we ask the question: "how closely can we approximate the set of unit-trace $n \times n$ PSD matrices, denoted by $D$, using at most $N$ PSD constraints of size $k \times k$?" In this paper, we prove lower bounds on $N$ to achieve a good approximation of $D$ by considering two constructions of an approximating set. First, we consider the unit-trace $n \times n$ symmetric matrices that are PSD when restricted to a fixed set of $k$-dimensional subspaces in $\mathbb{R}^n$. We prove that if this set is a good approximation of $D$, then the number of subspaces must be at least exponentially large in $n$ for any $k = o(n)$. Second, we show that any set $S$ that approximates $D$ within a constant approximation ratio must have superpolynomial $\mathbf{S}_+^k$-extension complexity. To be more precise, if $S$ is a constant factor approximation of $D$, then $S$ must have $\mathbf{S}_+^k$-extension complexity at least $\exp( C \cdot \min \{ \sqrt{n}, n/k \})$ where $C$ is some absolute constant. In addition, we show that any set $S$ such that $D \subseteq S$ and the Gaussian width of $S$ is at most a constant times larger than the Gaussian width of $D$ must have $\mathbf{S}_+^k$-extension complexity at least $\exp( C \cdot \min \{ n^{1/3}, \sqrt{n/k} \})$. These results imply that the cone of $n \times n$ PSD matrices cannot be approximated by a polynomial number of $k \times k$ PSD constraints for any $k = o(n / \log^2 n)$. These results generalize the recent work of Fawzi on the hardness of polyhedral approximations of $\mathbf{S}_+^n$, which corresponds to the special case with $k=1$.

preprint, 2020, arXiv

Deconvolution with Unknown Error Distribution Interpreted as Blind Isotonic Regression

Deconvolution is a statistical inverse problem of estimating the distribution of a random variable from its noisy observations. Despite extensive study of the topic, deconvolution with an unknown noise distribution remains a notoriously hard problem. We propose a matrix-based viewpoint for collective deconvolution that subsumes the setup with repeated measurements as a special case. As the main result, we describe a simple algorithm that partially utilizes matrix structure to solve the deconvolution problem and provide a non-asymptotic error analysis for the algorithm. We show that the proposed algorithm achieves the minimax optimal rate for deconvolution in a restricted sense. We also remark on the connection between collective deconvolution and the so-called statistical seriation as a byproduct of our matrix viewpoint. We conjecture that the link suggests that collective deconvolution, as well as deconvolution with repeated measurements, is intrinsically much easier than the usual deconvolution of a single distribution.

preprint, 2020, arXiv

Learning RUMs: Reducing Mixture to Single Component via PCA

We consider the problem of learning a mixture of Random Utility Models (RUMs). Despite the success of RUMs in various domains and the versatility of mixture RUMs to capture the heterogeneity in preferences, there has been only limited progress in learning a mixture of RUMs from partial data such as pairwise comparisons. In contrast, there have been significant advances in learning a single-component RUM using pairwise comparisons. In this paper, we aim to bridge this gap between mixture learning and single-component learning of RUMs by developing a "reduction" procedure. We propose to utilize PCA-based spectral clustering that simultaneously "de-noises" pairwise comparison data. We prove that our algorithm manages to cluster the partial data correctly (i.e., comparisons from the same RUM component are grouped in the same cluster) with high probability even when data is generated from a possibly heterogeneous mixture of well-separated generic RUMs. Both the time and the sample complexities scale polynomially in model parameters including the number of items. Two key steps in the analysis are establishing (1) a meaningful upper bound on the sub-Gaussian norm for RUM components embedded into the vector space of pairwise marginals and (2) the robustness of PCA with missing values in the $L_{2, \infty}$ sense, which might be of interest in their own right.

preprint, 2020, arXiv

Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

We consider the question of learning the $Q$-function in a sample efficient manner for reinforcement learning with continuous state and action spaces under a generative model. If the $Q$-function is Lipschitz continuous, then the minimal sample complexity for estimating an $\epsilon$-optimal $Q$-function is known to scale as $\Omega(\frac{1}{\epsilon^{d_1+d_2+2}})$ per classical non-parametric learning theory, where $d_1$ and $d_2$ denote the dimensions of the state and action spaces respectively. The $Q$-function, when viewed as a kernel, induces a Hilbert-Schmidt operator and hence possesses a square-summable spectrum. This motivates us to consider a parametric class of $Q$-functions parameterized by its "rank" $r$, which contains all Lipschitz $Q$-functions as $r \to \infty$. As our key contribution, we develop a simple, iterative learning algorithm that finds an $\epsilon$-optimal $Q$-function with sample complexity of $\widetilde{O}(\frac{1}{\epsilon^{\max(d_1, d_2)+2}})$ when the optimal $Q$-function has low rank $r$ and the discount factor $\gamma$ is below a certain threshold. Thus, this provides an exponential improvement in sample complexity. To enable our result, we develop a novel Matrix Estimation algorithm that faithfully estimates an unknown low-rank matrix in the $\ell_\infty$ sense even in the presence of arbitrary bounded noise, which might be of interest in its own right. Empirical results on several stochastic control tasks confirm the efficacy of our "low-rank" algorithms.
