Source author record

Sitan Chen



Machine Learning, Data Structures and Algorithms, Computational Complexity, Cryptography and Security, math.CO, math.NT, math.ST, nlin.CG, quant-ph, Statistics Theory

Catalog footprint

What is connected

12 works

10 topics

4 close collaborators


preprint, 2026, arXiv

High-accuracy and dimension-free sampling with diffusions

Diffusion models have shown remarkable empirical success in sampling from rich multi-modal distributions. Their inference relies on numerically solving a certain differential equation. This differential equation cannot be solved in closed form, and its resolution via discretization typically requires many small iterations to produce \emph{high-quality} samples. More precisely, prior works have shown that the iteration complexity of discretization methods for diffusion models scales polynomially in the ambient dimension and the inverse accuracy $1/\varepsilon$. In this work, we propose a new solver for diffusion models relying on a subtle interplay between low-degree approximation and the collocation method (Lee, Song, Vempala 2018), and we prove that its iteration complexity scales \emph{polylogarithmically} in $1/\varepsilon$, yielding the first ``high-accuracy'' guarantee for a diffusion-based sampler that only uses (approximate) access to the scores of the data distribution. In addition, our bound does not depend explicitly on the ambient dimension; more precisely, the dimension affects the complexity of our solver through the \emph{effective radius} of the support of the target distribution only.

preprint, 2022, arXiv

Learning (Very) Simple Generative Models Is Hard

Motivated by the recent empirical successes of deep generative models, we study the computational complexity of the following unsupervised learning problem. For an unknown neural network $F:\mathbb{R}^d\to\mathbb{R}^{d'}$, let $D$ be the distribution over $\mathbb{R}^{d'}$ given by pushing the standard Gaussian $\mathcal{N}(0,\textrm{Id}_d)$ through $F$. Given i.i.d. samples from $D$, the goal is to output any distribution close to $D$ in statistical distance. We show under the statistical query (SQ) model that no polynomial-time algorithm can solve this problem even when the output coordinates of $F$ are one-hidden-layer ReLU networks with $\log(d)$ neurons. Previously, the best lower bounds for this problem simply followed from lower bounds for supervised learning and required at least two hidden layers and $\mathrm{poly}(d)$ neurons [Daniely-Vardi '21, Chen-Gollakota-Klivans-Meka '22]. The key ingredient in our proof is an ODE-based construction of a compactly supported, piecewise-linear function $f$ with polynomially-bounded slopes such that the pushforward of $\mathcal{N}(0,1)$ under $f$ matches all low-degree moments of $\mathcal{N}(0,1)$.

preprint, 2022, arXiv

Learning Polynomial Transformations

We consider the problem of learning high-dimensional polynomial transformations of Gaussians. Given samples of the form $p(x)$, where $x\sim N(0, \mathrm{Id}_r)$ is hidden and $p: \mathbb{R}^r \to \mathbb{R}^d$ is a function where every output coordinate is a low-degree polynomial, the goal is to learn the distribution over $p(x)$. This problem is natural in its own right, but is also an important special case of learning deep generative models, namely pushforwards of Gaussians under two-layer neural networks with polynomial activations. Understanding the learnability of such generative models is crucial to understanding why they perform so well in practice. Our first main result is a polynomial-time algorithm for learning quadratic transformations of Gaussians in a smoothed setting. Our second main result is a polynomial-time algorithm for learning constant-degree polynomial transformations of Gaussians in a smoothed setting, when the rank of the associated tensors is small. In fact, our results extend to any rotation-invariant input distribution, not just Gaussian. These are the first end-to-end guarantees for learning a pushforward under a neural network with more than one layer. Along the way, we also give the first polynomial-time algorithms with provable guarantees for tensor ring decomposition, a popular generalization of tensor decomposition that is used in practice to implicitly store large tensors.

preprint, 2022, arXiv

Minimax Optimality (Probably) Doesn't Imply Distribution Learning for GANs

Arguably the most fundamental question in the theory of generative adversarial networks (GANs) is to understand to what extent GANs can actually learn the underlying distribution. Theoretical and empirical evidence suggests local optimality of the empirical training objective is insufficient. Yet, it does not rule out the possibility that achieving a true population minimax optimal solution might imply distribution learning. In this paper, we show that standard cryptographic assumptions imply that this stronger condition is still insufficient. Namely, we show that if local pseudorandom generators (PRGs) exist, then for a large family of natural continuous target distributions, there are ReLU network generators of constant depth and polynomial size which take Gaussian random seeds so that (i) the output is far in Wasserstein distance from the target distribution, but (ii) no polynomially large Lipschitz discriminator ReLU network can detect this. This implies that even achieving a population minimax optimal solution to the Wasserstein GAN objective is likely insufficient for distribution learning in the usual statistical sense. Our techniques reveal a deep connection between GANs and PRGs, which we believe will lead to further insights into the computational landscape of GANs.

preprint, 2022, arXiv

Symmetric Sparse Boolean Matrix Factorization and Applications

In this work, we study a variant of nonnegative matrix factorization where we wish to find a symmetric factorization of a given input matrix into a sparse, Boolean matrix. Formally speaking, given $\mathbf{M}\in\mathbb{Z}^{m\times m}$, we want to find $\mathbf{W}\in\{0,1\}^{m\times r}$ such that $\| \mathbf{M} - \mathbf{W}\mathbf{W}^\top \|_0$ is minimized among all $\mathbf{W}$ for which each row is $k$-sparse. This question turns out to be closely related to a number of questions like recovering a hypergraph from its line graph, as well as reconstruction attacks for private neural network training. As this problem is hard in the worst-case, we study a natural average-case variant that arises in the context of these reconstruction attacks: $\mathbf{M} = \mathbf{W}\mathbf{W}^{\top}$ for $\mathbf{W}$ a random Boolean matrix with $k$-sparse rows, and the goal is to recover $\mathbf{W}$ up to column permutation. Equivalently, this can be thought of as recovering a uniformly random $k$-uniform hypergraph from its line graph. Our main result is a polynomial-time algorithm for this problem based on bootstrapping higher-order information about $\mathbf{W}$ and then decomposing an appropriate tensor. The key ingredient in our analysis, which may be of independent interest, is to show that such a matrix $\mathbf{W}$ has full column rank with high probability as soon as $m = \widetilde{\Omega}(r)$, which we do using tools from Littlewood-Offord theory and estimates for binary Krawtchouk polynomials.
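To make the average-case setup concrete, here is a minimal Python sketch of how an instance $\mathbf{M} = \mathbf{W}\mathbf{W}^\top$ can be formed and how the $\ell_0$ objective can be evaluated. It is only an illustration of the problem statement, not the paper's recovery algorithm. The function names (`random_sparse_rows`, `gram`, `l0_error`) are hypothetical, and the sketch assumes each row of $\mathbf{W}$ has exactly $k$ ones and is stored as the set of its nonzero column indices.

```python
import random

def random_sparse_rows(m, r, k, seed=0):
    """Sample m rows of a Boolean m x r matrix, each with exactly k ones.

    Each row is stored as the frozenset of its nonzero column indices
    (an assumed representation chosen for this sketch).
    """
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(r), k)) for _ in range(m)]

def gram(supports):
    """Return M = W W^T, where entry (i, j) is the overlap of rows i and j."""
    return [[len(a & b) for b in supports] for a in supports]

def l0_error(M, supports):
    """Count the entries where M differs from W W^T (the L0 objective)."""
    G = gram(supports)
    size = len(M)
    return sum(M[i][j] != G[i][j] for i in range(size) for j in range(size))
```

Because entry $(i, j)$ of $\mathbf{W}\mathbf{W}^\top$ only counts shared columns, relabeling the columns of $\mathbf{W}$ leaves $\mathbf{M}$ unchanged, which is why recovery is only possible up to column permutation.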

preprint, 2020, arXiv

Entanglement is Necessary for Optimal Quantum Property Testing

There has been a surge of progress in recent years in developing algorithms for testing and learning quantum states that achieve optimal copy complexity. Unfortunately, they require the use of entangled measurements across many copies of the underlying state and thus remain outside the realm of what is currently experimentally feasible. A natural question is whether one can match the copy complexity of such algorithms using only independent---but possibly adaptively chosen---measurements on individual copies. We answer this in the negative for arguably the most basic quantum testing problem: deciding whether a given $d$-dimensional quantum state is equal to or $ε$-far in trace distance from the maximally mixed state. While it is known how to achieve optimal $O(d/ε^2)$ copy complexity using entangled measurements, we show that with independent measurements, $Ω(d^{4/3}/ε^2)$ is necessary, even if the measurements are chosen adaptively. This resolves a question of Wright. To obtain this lower bound, we develop several new techniques, including a chain-rule style proof of Paninski's lower bound for classical uniformity testing, which may be of independent interest.

preprint, 2020, arXiv

Learning Polynomials of Few Relevant Dimensions

Polynomial regression is a basic primitive in learning and statistics. In its most basic form the goal is to fit a degree $d$ polynomial to a response variable $y$ in terms of an $n$-dimensional input vector $x$. This is extremely well-studied with many applications and has sample and runtime complexity $Θ(n^d)$. Can one achieve better runtime if the intrinsic dimension of the data is much smaller than the ambient dimension $n$? Concretely, we are given samples $(x,y)$ where $y$ is a degree at most $d$ polynomial in an unknown $r$-dimensional projection (the relevant dimensions) of $x$. This can be seen both as a generalization of phase retrieval and as a special case of learning multi-index models where the link function is an unknown low-degree polynomial. Note that without distributional assumptions, this is at least as hard as junta learning. In this work we consider the important case where the covariates are Gaussian. We give an algorithm that learns the polynomial within accuracy $ε$ with sample complexity that is roughly $N = O_{r,d}(n \log^2(1/ε) (\log n)^d)$ and runtime $O_{r,d}(N n^2)$. Prior to our work, no such results were known even for the case of $r=1$. We introduce a new filtered PCA approach to get a warm start for the true subspace and use geodesic SGD to boost to arbitrary accuracy; our techniques may be of independent interest, especially for problems dealing with subspace recovery or analyzing SGD on manifolds.

preprint, 2020, arXiv

Learning Structured Distributions From Untrusted Batches: Faster and Simpler

We revisit the problem of learning from untrusted batches introduced by Qiao and Valiant [QV17]. Recently, Jain and Orlitsky [JO19] gave a simple semidefinite programming approach based on the cut-norm that achieves essentially information-theoretically optimal error in polynomial time. Concurrently, Chen et al. [CLM19] considered a variant of the problem where $μ$ is assumed to be structured, e.g. log-concave, monotone hazard rate, $t$-modal, etc. In this case, it is possible to achieve the same error with sample complexity sublinear in $n$, and they exhibited a quasi-polynomial time algorithm for doing so using Haar wavelets. In this paper, we find an appealing way to synthesize the techniques of [JO19] and [CLM19] to give the best of both worlds: an algorithm which runs in polynomial time and can exploit structure in the underlying distribution to achieve sublinear sample complexity. Along the way, we simplify the approach of [JO19] by avoiding the need for SDP rounding and giving a more direct interpretation of it through the lens of soft filtering, a powerful recent technique in high-dimensional robust estimation. We validate the usefulness of our algorithms in preliminary experimental evaluations.

preprint, 2015, arXiv

Basis Collapse for Holographic Algorithms Over All Domain Sizes

The theory of holographic algorithms introduced by Valiant represents a novel approach to achieving polynomial-time algorithms for seemingly intractable counting problems via a reduction to counting planar perfect matchings and a linear change of basis. Two fundamental parameters in holographic algorithms are the \emph{domain size} and the \emph{basis size}. Roughly, the domain size is the range of colors involved in the counting problem at hand (e.g. counting graph $k$-colorings is a problem over domain size $k$), while the basis size $\ell$ captures the dimensionality of the representation of those colors. A major open problem has been: for a given $k$, what is the smallest $\ell$ for which any holographic algorithm for a problem over domain size $k$ "collapses to" (can be simulated by) a holographic algorithm with basis size $\ell$? Cai and Lu showed in 2008 that over domain size 2, basis size 1 suffices, opening the door to an extensive line of work on the structural theory of holographic algorithms over the Boolean domain. Cai and Fu later showed for signatures of full rank that over domain sizes 3 and 4, basis sizes 1 and 2, respectively, suffice, and they conjectured that over domain size $k$ there is a collapse to basis size $\lfloor\log_2 k\rfloor$. In this work, we resolve this conjecture in the affirmative for signatures of full rank for all $k$.

preprint, 2015, arXiv

Pseudorandomness for Read-Once, Constant-Depth Circuits

For Boolean functions computed by read-once, depth-$D$ circuits with unbounded fan-in over the de Morgan basis, we present an explicit pseudorandom generator with seed length $\tilde{O}(\log^{D+1} n)$. The previous best seed length known for this model was $\tilde{O}(\log^{D+4} n)$, obtained by Trevisan and Xue (CCC '13) for all of $AC^0$ (not just read-once). Our work makes use of Fourier analytic techniques for pseudorandomness introduced by Reingold, Steinke, and Vadhan (RANDOM '13) to show that the generator of Gopalan et al. (FOCS '12) fools read-once $AC^0$. To this end, we prove a new Fourier growth bound for read-once circuits, namely that for every $F: \{0,1\}^n\to\{0,1\}$ computed by a read-once, depth-$D$ circuit, \begin{equation*}\sum_{s\subseteq[n], |s|=k}|\hat{F}[s]|\le O(\log^{D-1}n)^k,\end{equation*} where $\hat{F}$ denotes the Fourier transform of $F$ over $\mathbb{Z}^n_2$.
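The quantity bounded above, the sum of absolute Fourier coefficients at level $k$, can be computed by brute force for small $n$. The following Python sketch does so under an assumed normalization, $\hat{F}[s] = \mathbb{E}_x\big[F(x)(-1)^{\sum_{i\in s} x_i}\big]$ with $x$ uniform over $\{0,1\}^n$; the abstract does not fix a convention, and the function names are hypothetical. It enumerates all $2^n$ inputs, so it is only meant for toy examples.

```python
from fractions import Fraction
from itertools import combinations, product

def fourier_coefficient(f, n, s):
    """Return hat{F}[s] = E_x[F(x) * (-1)^(sum of x_i for i in s)].

    The normalization is an assumption of this sketch; x ranges
    uniformly over {0,1}^n and s is a tuple of coordinate indices.
    """
    total = 0
    for x in product((0, 1), repeat=n):
        sign = -1 if sum(x[i] for i in s) % 2 else 1
        total += f(x) * sign
    return Fraction(total, 2 ** n)

def level_k_mass(f, n, k):
    """Sum of |hat{F}[s]| over all subsets s of size k."""
    return sum(abs(fourier_coefficient(f, n, s))
               for s in combinations(range(n), k))

def and2(x):
    """A depth-1 read-once example: the AND of two input bits."""
    return x[0] & x[1]
```

For the two-bit AND, every coefficient has absolute value 1/4 under this convention, so the level-1 mass is 1/2 and the level-2 mass is 1/4.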

preprint, 2013, arXiv

Cellular Automata to More Efficiently Compute the Collatz Map

The Collatz, or 3x+1, Conjecture claims that for every positive integer n, there exists some k such that T^k(n)=1, where T is the Collatz map. We present three cellular automata (CA) that transform the global problem of mimicking the Collatz map in bases 2, 3, and 4 into a local one of transforming the digits of iterates. The CAs streamline computation first by bypassing calculation of certain parts of trajectories: the binary CA bypasses division by two altogether. In addition, they allow for multiple trajectories to be calculated simultaneously, representing both a significant improvement upon existing sequential methods of computing the Collatz map and a demonstration of the efficacy of using a massively parallel approach with cellular automata to tackle iterative problems like the Collatz Conjecture.
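For reference, the sequential baseline that the cellular automata are compared against can be sketched in a few lines of Python. The abstract does not define $T$, so this sketch assumes the common form $T(n) = n/2$ for even $n$ and $T(n) = 3n+1$ for odd $n$; the function names and the `max_steps` guard are illustrative choices, and none of this is the paper's cellular automaton construction.

```python
def collatz_step(n: int) -> int:
    """One application of the (assumed) Collatz map T."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return n // 2 if n % 2 == 0 else 3 * n + 1

def steps_to_one(n: int, max_steps: int = 10_000) -> int:
    """Return the smallest k with T^k(n) = 1.

    Raises RuntimeError if 1 is not reached within max_steps iterations,
    since the conjecture itself is unproven.
    """
    steps = 0
    while n != 1:
        if steps >= max_steps:
            raise RuntimeError("did not reach 1 within max_steps")
        n = collatz_step(n)
        steps += 1
    return steps
```

For example, the trajectory of 6 is 6, 3, 10, 5, 16, 8, 4, 2, 1, so `steps_to_one(6)` returns 8. In binary, each halving step simply drops a trailing zero, which is the digit-level view the abstract refers to.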

preprint, 2013, arXiv

On the Rank Number of Grid Graphs

A vertex k-ranking is a labeling of the vertices of a graph with integers from 1 to k such that any path connecting two vertices with the same label passes through a vertex with a greater label. The rank number of a graph is defined to be the minimum possible k for which a k-ranking exists for that graph. For m×n grid graphs, the rank number has been found only for m<4. In this paper, we determine it for m=4 and improve its upper bound for general grids. Furthermore, we improve lower bounds on the rank numbers for square and triangle grid graphs from logarithmic to linear. These new lower bounds are key to characterizing the rank number for general grids, and our results have applications in optimizing VLSI circuit design and parallel processing, search, and scheduling.
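The definition of a ranking can be checked directly on small grids. The Python sketch below uses an equivalent restatement: for each label value c, after deleting every vertex with a label greater than c, no connected component may contain two vertices labeled c. The function name `is_ranking` and the list-of-rows grid representation are assumptions made for illustration; this is a validity checker only and does not compute the rank number.

```python
from collections import deque

def is_ranking(labels):
    """Check whether a labeling of an m x n grid graph is a vertex ranking.

    labels is a list of m rows, each a list of n positive integers.
    Cells are adjacent when they share a side.
    """
    m, n = len(labels), len(labels[0])
    for c in {x for row in labels for x in row}:
        seen = set()
        for i in range(m):
            for j in range(n):
                if labels[i][j] > c or (i, j) in seen:
                    continue
                # Breadth-first search over the component of cells
                # whose labels are at most c.
                count = 0
                queue = deque([(i, j)])
                seen.add((i, j))
                while queue:
                    a, b = queue.popleft()
                    if labels[a][b] == c:
                        count += 1
                    for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        x, y = a + da, b + db
                        if (0 <= x < m and 0 <= y < n
                                and (x, y) not in seen
                                and labels[x][y] <= c):
                            seen.add((x, y))
                            queue.append((x, y))
                # Two cells labeled c in one component are joined by a
                # path that avoids every larger label.
                if count > 1:
                    return False
    return True
```

On the 2×2 grid, `[[1, 2], [3, 1]]` is a valid ranking, while `[[1, 2], [2, 1]]` is not, because the two cells labeled 2 are joined by a path through a cell labeled 1.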
