Source author record

Simon Coste

Simon Coste appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Machine Learning, math.PR, Computer Vision, math.ST, Statistics Theory

Catalog footprint

What is connected

3 works

5 topics

4 close collaborators

Preprint, 2022, arXiv

Wavelet Score-Based Generative Modeling

Score-based generative models (SGMs) synthesize new data samples from Gaussian white noise by running a time-reversed Stochastic Differential Equation (SDE) whose drift coefficient depends on some probabilistic score. The discretization of such SDEs typically requires a large number of time steps and hence a high computational cost. This is because of ill-conditioning properties of the score that we analyze mathematically. We show that SGMs can be considerably accelerated, by factorizing the data distribution into a product of conditional probabilities of wavelet coefficients across scales. The resulting Wavelet Score-based Generative Model (WSGM) synthesizes wavelet coefficients with the same number of time steps at all scales, and its time complexity therefore grows linearly with the image size. This is proved mathematically over Gaussian distributions, and shown numerically over physical processes at phase transition and natural image datasets.
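The factorization across scales described above can be illustrated with a minimal sketch. It assumes an orthonormal Haar wavelet and a square image whose side is divisible by 2 to the number of levels; the abstract does not fix a wavelet, and the function names (`haar_step`, `haar_inverse`, `haar_pyramid`) are hypothetical. The sketch only shows the decomposition into a coarse image plus per-scale detail coefficients, which is what the conditional factorization is defined over; it does not implement any score model or SDE.

```python
import numpy as np

def haar_step(x):
    """One level of an orthonormal 2D Haar transform (assumed wavelet).

    Returns the half-resolution low-frequency image and a tuple of three
    detail (wavelet coefficient) images.
    """
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    low = (a + b + c + d) / 2
    details = ((a - b + c - d) / 2, (a + b - c - d) / 2, (a - b - c + d) / 2)
    return low, details

def haar_inverse(low, details):
    """Invert haar_step: rebuild the finer image from low + details."""
    dh, dv, dd = details
    h, w = low.shape
    x = np.empty((2 * h, 2 * w))
    x[0::2, 0::2] = (low + dh + dv + dd) / 2
    x[0::2, 1::2] = (low - dh + dv - dd) / 2
    x[1::2, 0::2] = (low + dh - dv - dd) / 2
    x[1::2, 1::2] = (low - dh - dv + dd) / 2
    return x

def haar_pyramid(x, levels):
    """Return the coarsest image and the details, ordered coarse to fine."""
    all_details = []
    for _ in range(levels):
        x, det = haar_step(x)
        all_details.append(det)
    return x, all_details[::-1]

# Illustrative data only: a random 16x16 "image".
rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))
coarse, details = haar_pyramid(image, levels=3)

# A generative model factorized across scales would sample `coarse` first,
# then each detail triple conditionally on the current low-frequency image.
recon = coarse
for det in details:
    recon = haar_inverse(recon, det)
```

In this picture, each conditional factor corresponds to one detail triple given the low-frequency image at the same scale, and the reconstruction loop mirrors the coarse-to-fine order in which a sampler would proceed.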

Preprint, 2021, arXiv

A simpler spectral approach for clustering in directed networks

We study the task of clustering in directed networks. We show that using the eigenvalue/eigenvector decomposition of the adjacency matrix is simpler than all common methods which are based on a combination of data regularization and SVD truncation, and works well down to the very sparse regime where the edge density has constant order. Our analysis is based on a Master Theorem describing sharp asymptotics for isolated eigenvalues/eigenvectors of sparse, non-symmetric matrices with independent entries. We also describe the limiting distribution of the entries of these eigenvectors; in the task of digraph clustering with spectral embeddings, we provide numerical evidence for the superiority of Gaussian Mixture clustering over the widely used k-means algorithm.
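As a rough illustration of the eigenvalue/eigenvector approach described above, the sketch below embeds the nodes of a directed graph using the leading right eigenvectors of its raw, non-symmetric adjacency matrix. Several choices are assumptions and not taken from the abstract: the two-block directed model with connection matrix `B`, the use of right eigenvectors only, and the sign-based split in `two_block_labels`, which stands in for the Gaussian Mixture step the abstract recommends. The function names are hypothetical.

```python
import numpy as np

def spectral_embedding(adj, k):
    """Embed nodes with the k right eigenvectors of largest |eigenvalue|.

    No symmetrization or regularization is applied. The real part is kept,
    which assumes the leading eigenvalues are real (an assumption of this
    sketch).
    """
    vals, vecs = np.linalg.eig(adj)
    order = np.argsort(-np.abs(vals))[:k]
    return vals[order], vecs[:, order].real

def two_block_labels(adj):
    """Simplified two-cluster rule: sign of the second eigenvector.

    This replaces the Gaussian Mixture clustering mentioned in the text
    with a sign split, purely to keep the example dependency-free.
    """
    _, emb = spectral_embedding(adj, 2)
    return (emb[:, 1] > 0).astype(int)

# Illustrative directed two-block model (assumed, not from the source):
# B[i, j] is the probability of an edge from block i to block j.
n = 200
z = np.repeat([0, 1], n // 2)
B = np.array([[0.6, 0.1],
              [0.3, 0.6]])
P = B[z][:, z]                      # expected adjacency matrix

rng = np.random.default_rng(0)
adj = (rng.random((n, n)) < P).astype(float)   # sampled directed graph
labels = two_block_labels(adj)
```

For more than two clusters, or in the very sparse regime the abstract targets, one would fit a mixture model on the rows of the embedding instead of using a sign split.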

Preprint, 2020, arXiv

Detection thresholds in very sparse matrix completion

Let $A$ be a rectangular matrix of size $m\times n$ and $A_1$ be the random matrix where each entry of $A$ is multiplied by an independent $\{0,1\}$-Bernoulli random variable with parameter $1/2$. This paper is about when, how and why the non-Hermitian eigen-spectra of the randomly induced asymmetric matrices $A_1 (A - A_1)^*$ and $(A-A_1)^*A_1$ capture more of the relevant information about the principal component structure of $A$ than its SVD or the eigen-spectra of $A A^*$ and $A^* A$, respectively. Hint: the asymmetry inducing randomness breaks the echo-chamber effect that cripples the SVD. We illustrate the application of this striking phenomenon on the low-rank matrix completion problem for the setting where each entry is observed with probability $d/n$, including the very sparse regime where $d$ is of order $1$, where matrix completion via the SVD of $A$ fails or produces unreliable recovery. We determine an asymptotically exact, matrix-dependent, non-universal detection threshold above which reliable, statistically optimal matrix recovery using a new, universal data-driven matrix-completion algorithm is possible. Averaging the left and right eigenvectors provably improves the recovered matrix but not the detection threshold. We define another variant of this asymmetric procedure that bypasses the randomization step and has a detection threshold that is smaller by a constant factor but with a computational cost that is larger by a polynomial factor of the number of observed entries. Both detection thresholds shatter the seeming barrier due to the well-known information theoretical limit $d \asymp \log n$ for matrix completion found in the literature.
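The construction of $A_1$ and of the asymmetric matrix $A_1 (A - A_1)^*$ can be sketched as follows. This is a minimal illustration under stated assumptions: the rank-one test matrix, the helper names (`bernoulli_split`, `asymmetric_spectrum`) and the choice to return the eigenvalues of largest modulus are hypothetical, and the sketch does not implement the matrix-completion algorithm or the detection threshold from the abstract.

```python
import numpy as np

def bernoulli_split(A, rng):
    """Split A into A1 and A - A1 with an independent Bernoulli(1/2) mask.

    Each entry of A is kept in A1 with probability 1/2; the remaining
    entries end up in A - A1, so the two parts have disjoint supports.
    """
    mask = rng.integers(0, 2, size=A.shape)   # entries in {0, 1}
    A1 = A * mask
    return A1, A - A1

def asymmetric_spectrum(A, rng, k=1):
    """Leading eigenpairs of the m x m matrix A1 (A - A1)^*.

    The product is generally non-Hermitian, so a general eigensolver is
    used and the eigenvalues may be complex.
    """
    A1, A2 = bernoulli_split(A, rng)
    M = A1 @ A2.conj().T
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-np.abs(vals))[:k]
    return vals[order], vecs[:, order]

# Illustrative data only: a rank-one 30 x 40 matrix.
rng = np.random.default_rng(0)
u = rng.standard_normal(30)
v = rng.standard_normal(40)
A = np.outer(u, v)

vals, vecs = asymmetric_spectrum(A, rng, k=2)
```

In the matrix-completion setting of the abstract, `A` would be the sparsely observed matrix (unobserved entries set to zero) and the same split would be applied to the observed entries.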