Source author record

Zhao Song

Zhao Song appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning, Data Structures and Algorithms, Computational Complexity, Artificial Intelligence, Computation and Language, Cryptography and Security, Information Theory, math.IT, Discrete Mathematics, eess.AS, Sound, math.CO, math.FA, math.OC, math.PR, physics.soc-ph, Populations and Evolution, quant-ph, Social and Information Networks

Catalog footprint

What is connected

41 works

19 topics

4 close collaborators

preprint, 2026, arXiv

Strategic commitments shape collective cybersecurity under AI inequality

The growing integration of AI into cybersecurity is reshaping the balance between attackers and defenders. When access to advanced AI-enabled defence tools is uneven, resource-limited defenders may be unable to adopt effective protection, creating persistent system vulnerabilities. We study the impact of differential AI access using an evolutionary game-theoretic model in a finite population. We first show that when high-capability defence is costly, the population is driven toward low-cost, weak-defence behaviour, sustaining attacks and weakening long-run security. To address this problem, we introduce differential access to AI defence tools by allowing defenders to choose between low- and high-capability protection based on their resources. We then examine the role of a small group of committed defenders who always adopt strong defence and influence others through social learning. Although commitment increases the prevalence of strong defence, it alone cannot stabilise secure outcomes due to high defence costs. We therefore incorporate a targeted subsidy to remove the cost disadvantage from committed defenders. Our analysis shows that subsidised commitment significantly increases strong defence adoption, suppresses successful attacks, and improves overall system resilience. Simulations across a broad parameter space confirm that subsidies consistently outperform commitment alone. In addition, social-welfare analysis shows improved defender outcomes while keeping attacker gains low. These findings suggest that targeted support for key defenders can be an effective mechanism for stabilising cybersecurity in AI-driven environments and provide a theoretical bridge between cybersecurity policy, AI governance, and strategic allocation of defensive AI capabilities.

preprint, 2023, arXiv

Dynamic Tensor Product Regression

In this work, we initiate the study of Dynamic Tensor Product Regression. One has matrices $A_1\in \mathbb{R}^{n_1\times d_1},\ldots,A_q\in \mathbb{R}^{n_q\times d_q}$ and a label vector $b\in \mathbb{R}^{n_1\ldots n_q}$, and the goal is to solve the regression problem with the design matrix $A$ being the tensor product of the matrices $A_1, A_2, \dots, A_q$, i.e. $\min_{x\in \mathbb{R}^{d_1\ldots d_q}}~\|(A_1\otimes \ldots\otimes A_q)x-b\|_2$. At each time step, one matrix $A_i$ receives a sparse change, and the goal is to maintain a sketch of the tensor product $A_1\otimes\ldots \otimes A_q$ so that the regression solution can be updated quickly. Recomputing the solution from scratch in each round is very slow, so it is important to develop algorithms that can quickly update the solution with the new design matrix. Our main result is a dynamic tree data structure where any update to a single matrix can be propagated quickly throughout the tree. We show that our data structure can be used to solve dynamic versions of not only Tensor Product Regression, but also Tensor Product Spline regression (which is a generalization of ridge regression), and to maintain Low Rank Approximations for the tensor product.
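To make the tensor-product design matrix concrete, the following minimal sketch checks, for q = 2, the standard identity that applying the Kronecker product of A and B to a vector equals flattening A X B^T (row-major), so the product never has to be formed explicitly. This only illustrates the problem setup and is not the paper's dynamic tree data structure; the helper names `kron`, `matmul`, `transpose` and `apply_kron` are hypothetical.

```python
def kron(A, B):
    """Kronecker product of two dense matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matmul(A, B):
    """Plain dense matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def apply_kron(A, B, x):
    """Compute kron(A, B) @ x without building kron(A, B).

    x is reshaped row-major into a d1 x d2 matrix X, and the result is the
    row-major flattening of A X B^T.
    """
    d1, d2 = len(A[0]), len(B[0])
    X = [x[j * d2:(j + 1) * d2] for j in range(d1)]
    Y = matmul(matmul(A, X), transpose(B))
    return [y for row in Y for y in row]


A = [[1, 2], [3, 4], [5, 6]]   # n1 x d1 = 3 x 2
B = [[0, 1], [1, 1]]           # n2 x d2 = 2 x 2
x = [1, 2, 3, 4]               # length d1 * d2
explicit = [sum(r * v for r, v in zip(row, x)) for row in kron(A, B)]
print(apply_kron(A, B, x) == explicit)
```

The explicit product has n1·n2 rows and d1·d2 columns, while the factored form only ever touches the small factors, which is one reason structured methods for this problem are attractive.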

preprint, 2023, arXiv

Faster Sinkhorn's Algorithm with Small Treewidth

Computing optimal transport (OT) distances such as the earth mover's distance is a fundamental problem in machine learning, statistics, and computer vision. In this paper, we study the problem of approximating the general OT distance between two discrete distributions of size $n$. Given the cost matrix $C=AA^\top$ where $A \in \mathbb{R}^{n \times d}$, we propose a faster Sinkhorn's Algorithm to approximate the OT distance when matrix $A$ has treewidth $τ$. To approximate the OT distance, our algorithm improves the state-of-the-art results [Dvurechensky, Gasnikov, and Kroshnin ICML 2018] from $\widetilde{O}(ε^{-2} n^2)$ time to $\widetilde{O}(ε^{-2} n τ)$ time.
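For orientation, here is a minimal sketch of the textbook Sinkhorn iteration for entropically regularised optimal transport, in pure Python. It is the plain dense baseline, not the treewidth-based speed-up described in the abstract; the function name `sinkhorn`, the regularisation parameter `reg` and the toy cost matrix are assumptions chosen for illustration.

```python
import math


def sinkhorn(cost, r, c, reg=0.1, iters=500):
    """Textbook Sinkhorn scaling.

    cost: n x m cost matrix, r: row marginals, c: column marginals.
    Returns a transport plan whose marginals approximately match r and c.
    """
    n, m = len(r), len(c)
    # Gibbs kernel K_ij = exp(-C_ij / reg).
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [r[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [c[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(cost, [0.5, 0.5], [0.25, 0.75], reg=0.5)
ot_estimate = sum(plan[i][j] * cost[i][j] for i in range(2) for j in range(2))
print(ot_estimate)
```

Each iteration of this dense version costs time proportional to the number of entries of the kernel matrix, which is the per-iteration cost that structured variants aim to reduce.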

preprint, 2023, arXiv

Online Adaptive Mahalanobis Distance Estimation

Mahalanobis metrics are widely used in machine learning in conjunction with methods like $k$-nearest neighbors, $k$-means clustering, and $k$-medians clustering. Despite their importance, there has not been any prior work on applying sketching techniques to speed up algorithms for Mahalanobis metrics. In this paper, we initiate the study of dimension reduction for Mahalanobis metrics. In particular, we provide efficient data structures for solving the Approximate Distance Estimation (ADE) problem for Mahalanobis distances. We first provide a randomized Monte Carlo data structure. Then, we show how to adapt it to obtain our main data structure, which can handle sequences of adaptive queries as well as online updates to both the Mahalanobis metric matrix and the data points, making it suitable for use in conjunction with prior algorithms for online learning of Mahalanobis metrics.
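As background, a Mahalanobis distance with a positive semidefinite matrix M = A^T A is the Euclidean distance after applying the linear map A. The minimal sketch below checks that equivalence in pure Python. It shows only the exact distance, not the paper's sketching data structure; the function names and the toy matrices are assumptions for illustration.

```python
import math


def mahalanobis(x, y, M):
    """Exact distance sqrt((x - y)^T M (x - y)) for a PSD matrix M."""
    diff = [a - b for a, b in zip(x, y)]
    Md = [sum(m * d for m, d in zip(row, diff)) for row in M]
    return math.sqrt(sum(d * v for d, v in zip(diff, Md)))


def mahalanobis_factored(x, y, A):
    """Same distance computed as the Euclidean norm of A (x - y), where M = A^T A."""
    diff = [a - b for a, b in zip(x, y)]
    Ad = [sum(a * d for a, d in zip(row, diff)) for row in A]
    return math.sqrt(sum(v * v for v in Ad))


A = [[2.0, 0.0], [1.0, 1.0]]
M = [[5.0, 1.0], [1.0, 1.0]]   # equals A^T A
x, y = [1.0, 2.0], [0.0, 0.0]
print(mahalanobis(x, y, M), mahalanobis_factored(x, y, A))
```

Because the factored form is an ordinary Euclidean norm, it suggests why dimension-reduction ideas developed for Euclidean distances are a natural starting point for this setting.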

preprint, 2023, arXiv

Smoothed Online Combinatorial Optimization Using Imperfect Predictions

Smoothed online combinatorial optimization considers a learner who repeatedly chooses a combinatorial decision to minimize an unknown changing cost function with a penalty on switching decisions in consecutive rounds. We study smoothed online combinatorial optimization problems when an imperfect predictive model is available, where the model can forecast the future cost functions with uncertainty. We show that using predictions to plan for a finite time horizon leads to regret dependent on the total predictive uncertainty and an additional switching cost. This observation suggests choosing a suitable planning window to balance between uncertainty and switching cost, which leads to an online algorithm with guarantees on the upper and lower bounds of the cumulative regret. Empirically, our algorithm shows a significant improvement in cumulative regret compared to other baselines in synthetic online distributed streaming problems.

preprint, 2022, arXiv

A Sublinear Adversarial Training Algorithm

Adversarial training is a widely used strategy for making neural networks resistant to adversarial perturbations. For a neural network of width $m$ and $n$ training inputs in $d$ dimensions, the forward and backward computation takes $Ω(mnd)$ time per training iteration. In this paper, we analyze the convergence guarantee of the adversarial training procedure on a two-layer neural network with shifted ReLU activation, and show that only $o(m)$ neurons will be activated for each input per iteration. Furthermore, we develop an algorithm for adversarial training with time cost $o(m n d)$ per iteration by applying a half-space reporting data structure.

preprint, 2022, arXiv

Accelerating Frank-Wolfe Algorithm using Low-Dimensional and Adaptive Data Structures

In this paper, we study the problem of speeding up a type of optimization algorithm called Frank-Wolfe, a conditional gradient method. We develop and employ two novel inner product search data structures, improving the prior fastest algorithm in [Shrivastava, Song and Xu, NeurIPS 2021].

* The first data structure uses low-dimensional random projection to reduce the problem to a lower dimension, then uses an efficient inner product data structure. It has preprocessing time $\tilde O(nd^{ω-1}+dn^{1+o(1)})$ and per iteration cost $\tilde O(d+n^ρ)$ for small constant $ρ$.
* The second data structure leverages the recent development in adaptive inner product search data structures that can output estimations to all inner products. It has preprocessing time $\tilde O(nd)$ and per iteration cost $\tilde O(d+n)$.

The first algorithm improves the state-of-the-art (with preprocessing time $\tilde O(d^2n^{1+o(1)})$ and per iteration cost $\tilde O(dn^ρ)$) in all cases, while the second one provides an even faster preprocessing time and is suitable when the number of iterations is small.
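To show where inner product search enters, here is a minimal sketch of the classical Frank-Wolfe iteration over the convex hull of a finite set of atoms, with the linear minimisation step done by a brute-force scan over all atoms. That scan is the step a search data structure would replace; the paper's data structures are not reproduced here. The objective (squared distance to a target), the step size 2/(k+2) and all names are assumptions for illustration.

```python
def frank_wolfe(atoms, target, iters=2000):
    """Minimise ||x - target||^2 over the convex hull of `atoms`."""
    x = list(atoms[0])
    for k in range(iters):
        grad = [2.0 * (xi - ti) for xi, ti in zip(x, target)]
        # Linear minimisation oracle: the atom with the smallest inner
        # product with the gradient (brute force, O(n d) per iteration).
        s = min(atoms, key=lambda a: sum(g * ai for g, ai in zip(grad, a)))
        gamma = 2.0 / (k + 2)  # standard diminishing step size
        x = [(1 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x


atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
target = [0.2, 0.3, 0.5]
x = frank_wolfe(atoms, target)
print(sum((xi - ti) ** 2 for xi, ti in zip(x, target)))
```

Every iterate stays a convex combination of atoms, so no projection step is needed; the per-iteration cost is dominated by the search over atoms.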

preprint, 2022, arXiv

An $O(k\log n)$ Time Fourier Set Query Algorithm

The Fourier transform is an extensively studied problem in many research fields. It has many applications in machine learning, signal processing, compressed sensing, and so on. In many real-world applications, an approximate Fourier transform is sufficient and we only need to compute the Fourier transform on a subset of coordinates. Given a vector $x \in \mathbb{C}^{n}$, an approximation parameter $ε$ and a query set $S \subset [n]$ of size $k$, we propose an algorithm to compute an approximate Fourier transform result $x'$ which uses $O(ε^{-1} k \log(n/δ))$ Fourier measurements, runs in $O(ε^{-1} k \log(n/δ))$ time and outputs a vector $x'$ such that $\| ( x' - \widehat{x} )_S \|_2^2 \leq ε\| \widehat{x}_{\bar{S}} \|_2^2 + δ\| \widehat{x} \|_1^2 $ holds with probability at least $9/10$.

preprint, 2022, arXiv

An improved quantum-inspired algorithm for linear regression

We give a classical algorithm for linear regression analogous to the quantum matrix inversion algorithm [Harrow, Hassidim, and Lloyd, Physical Review Letters'09, arXiv:0811.3171] for low-rank matrices [Wossnig, Zhao, and Prakash, Physical Review Letters'18, arXiv:1704.06174], when the input matrix $A$ is stored in a data structure applicable for QRAM-based state preparation. Namely, suppose we are given an $A \in \mathbb{C}^{m\times n}$ with minimum non-zero singular value $σ$ which supports certain efficient $\ell_2$-norm importance sampling queries, along with a $b \in \mathbb{C}^m$. Then, for some $x \in \mathbb{C}^n$ satisfying $\|x - A^+b\| \leq \varepsilon\|A^+b\|$, we can output a measurement of $|x\rangle$ in the computational basis and output an entry of $x$ with classical algorithms that run in $\tilde{\mathcal{O}}\big(\frac{\|A\|_{\mathrm{F}}^6\|A\|^6}{σ^{12}\varepsilon^4}\big)$ and $\tilde{\mathcal{O}}\big(\frac{\|A\|_{\mathrm{F}}^6\|A\|^2}{σ^8\varepsilon^4}\big)$ time, respectively. This improves on previous "quantum-inspired" algorithms in this line of research by at least a factor of $\frac{\|A\|^{16}}{σ^{16}\varepsilon^2}$ [Chia, Gilyén, Li, Lin, Tang, and Wang, STOC'20, arXiv:1910.06151]. As a consequence, we show that quantum computers can achieve at most a factor-of-12 speedup for linear regression in this QRAM data structure setting and related settings. Our work applies techniques from sketching algorithms and optimization to the quantum-inspired literature. Unlike earlier works, this is a promising avenue that could lead to feasible implementations of classical regression in quantum-inspired settings, for comparison against future quantum computers.

preprint, 2022, arXiv

Bounding the Width of Neural Networks via Coupled Initialization -- A Worst Case Analysis

A common method in training neural networks is to initialize all the weights to be independent Gaussian vectors. We observe that by instead initializing the weights into independent pairs, where each pair consists of two identical Gaussian vectors, we can significantly improve the convergence analysis. While a similar technique has been studied for random inputs [Daniely, NeurIPS 2020], it has not been analyzed with arbitrary inputs. Using this technique, we show how to significantly reduce the number of neurons required for two-layer ReLU networks, both in the under-parameterized setting with logistic loss, from roughly $γ^{-8}$ [Ji and Telgarsky, ICLR 2020] to $γ^{-2}$, where $γ$ denotes the separation margin with a Neural Tangent Kernel, as well as in the over-parameterized setting with squared loss, from roughly $n^4$ [Song and Yang, 2019] to $n^2$, implicitly also improving the recent running time bound of [Brand, Peng, Song and Weinstein, ITCS 2021]. For the under-parameterized setting we also prove new lower bounds that improve upon prior work, and that under certain assumptions, are best possible.

preprint, 2022, arXiv

Fast Distance Oracles for Any Symmetric Norm

In the Distance Oracle problem, the goal is to preprocess $n$ vectors $x_1, x_2, \cdots, x_n$ in a $d$-dimensional metric space $(\mathbb{X}^d, \| \cdot \|_l)$ into a cheap data structure, so that given a query vector $q \in \mathbb{X}^d$ and a subset $S\subseteq [n]$ of the input data points, all distances $\| q - x_i \|_l$ for $x_i\in S$ can be quickly approximated (faster than the trivial $\sim d|S|$ query time). This primitive is a basic subroutine in machine learning, data mining and similarity search applications. In the case of $\ell_p$ norms, the problem is well understood, and optimal data structures are known for most values of $p$. Our main contribution is a fast $(1+\varepsilon)$ distance oracle for any symmetric norm $\|\cdot\|_l$. This class includes $\ell_p$ norms and Orlicz norms as special cases, as well as other norms used in practice, e.g. top-$k$ norms, max-mixture and sum-mixture of $\ell_p$ norms, small-support norms and the box-norm. We propose a novel data structure with $\tilde{O}(n (d + \mathrm{mmc}(l)^2 ) )$ preprocessing time and space, and $t_q = \tilde{O}(d + |S| \cdot \mathrm{mmc}(l)^2)$ query time, for computing distances to a subset $S$ of data points, where $\mathrm{mmc}(l)$ is a complexity-measure (concentration modulus) of the symmetric norm. When $l = \ell_{p}$, this runtime matches the aforementioned state-of-the-art oracles.

preprint, 2022, arXiv

Federated Adversarial Learning: A Framework with Convergence Analysis

Federated learning (FL) is a trending training paradigm to utilize decentralized training data. FL allows clients to update model parameters locally for several epochs, then share them with a global model for aggregation. This training paradigm with multi-local step updating before aggregation exposes unique vulnerabilities to adversarial attacks. Adversarial training is a popular and effective method to improve the robustness of networks against adversaries. In this work, we formulate a general form of federated adversarial learning (FAL) that is adapted from adversarial learning in the centralized setting. On the client side of FL training, FAL has an inner loop to generate adversarial samples for adversarial training and an outer loop to update local model parameters. On the server side, FAL aggregates local model updates and broadcasts the aggregated model. We design a global robust training loss and formulate FAL training as a min-max optimization problem. Unlike the convergence analysis in classical centralized training that relies on the gradient direction, it is significantly harder to analyze the convergence in FAL for three reasons: 1) the complexity of min-max optimization, 2) the model not updating in the gradient direction due to the multi-local updates on the client side before aggregation and 3) inter-client heterogeneity. We address these challenges by using appropriate gradient approximation and coupling techniques and present the convergence analysis in the over-parameterized regime. Our main result theoretically shows that the minimum loss under our algorithm can converge to $ε$ small with chosen learning rate and communication rounds. It is noteworthy that our analysis is feasible for non-IID clients.

preprint, 2022, arXiv

Hyperbolic Concentration, Anti-concentration, and Discrepancy

The Chernoff bound is a fundamental tool in theoretical computer science. It has been extensively used in randomized algorithm design and stochastic type analysis. Discrepancy theory, which deals with finding a bi-coloring of a set system such that the coloring of each set is balanced, has a huge number of applications in approximation algorithm design. The Chernoff bound [Che52] implies that a random bi-coloring of any set system with $n$ sets and $n$ elements will have discrepancy $O(\sqrt{n \log n})$ with high probability, while the famous result by Spencer [Spe85] shows that there exists an $O(\sqrt{n})$ discrepancy solution. The study of hyperbolic polynomials dates back to the early 20th century, when they were used to solve PDEs by Gårding [Går59]. In recent years, more applications have been found in control theory, optimization, real algebraic geometry, and so on. In particular, the breakthrough result by Marcus, Spielman, and Srivastava [MSS15] uses the theory of hyperbolic polynomials to prove the Kadison-Singer conjecture [KS59], which is closely related to discrepancy theory. In this paper, we present a list of new results for hyperbolic polynomials:

* We show two nearly optimal hyperbolic Chernoff bounds: one for Rademacher sum of arbitrary vectors and another for random vectors in the hyperbolic cone.
* We show a hyperbolic anti-concentration bound.
* We generalize the hyperbolic Kadison-Singer theorem [Brä18] for vectors in sub-isotropic position, and prove a hyperbolic Spencer theorem for any constant hyperbolic rank vectors.

The classical matrix Chernoff and discrepancy results are based on the determinant polynomial. To the best of our knowledge, this paper is the first work that shows either concentration or anti-concentration results for hyperbolic polynomials. We hope our findings provide more insights into hyperbolic and discrepancy theories.

preprint, 2022, arXiv

Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning

An ideal learned representation should display transferability and robustness. Supervised contrastive learning (SupCon) is a promising method for training accurate models, but produces representations that do not capture these properties due to class collapse -- when all points in a class map to the same representation. Recent work suggests that "spreading out" these representations improves them, but the precise mechanism is poorly understood. We argue that creating spread alone is insufficient for better representations, since spread is invariant to permutations within classes. Instead, both the correct degree of spread and a mechanism for breaking this invariance are necessary. We first prove that adding a weighted class-conditional InfoNCE loss to SupCon controls the degree of spread. Next, we study three mechanisms to break permutation invariance: using a constrained encoder, adding a class-conditional autoencoder, and using data augmentation. We show that the latter two encourage clustering of latent subclasses under more realistic conditions than the former. Using these insights, we show that adding a properly-weighted class-conditional InfoNCE loss and a class-conditional autoencoder to SupCon achieves 11.1 points of lift on coarse-to-fine transfer across 5 standard datasets and 4.7 points on worst-group robustness on 3 datasets, setting state-of-the-art on CelebA by 11.5 points.

preprint, 2022, arXiv

Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models

Overparameterized neural networks generalize well but are expensive to train. Ideally, one would like to reduce their computational cost while retaining their generalization benefits. Sparse model training is a simple and promising approach to achieve this, but there remain challenges as existing methods struggle with accuracy loss, slow training runtime, or difficulty in sparsifying all model components. The core problem is that searching for a sparsity mask over a discrete set of sparse matrices is difficult and expensive. To address this, our main insight is to optimize over a continuous superset of sparse matrices with a fixed structure known as products of butterfly matrices. As butterfly matrices are not hardware efficient, we propose simple variants of butterfly (block and flat) to take advantage of modern hardware. Our method (Pixelated Butterfly) uses a simple fixed sparsity pattern based on flat block butterfly and low-rank matrices to sparsify most network layers (e.g., attention, MLP). We empirically validate that Pixelated Butterfly is 3x faster than butterfly and speeds up training to achieve favorable accuracy--efficiency tradeoffs. On the ImageNet classification and WikiText-103 language modeling tasks, our sparse models train up to 2.5x faster than the dense MLP-Mixer, Vision Transformer, and GPT-2 medium with no drop in accuracy.

preprint, 2022, arXiv

Speeding Up Sparsification using Inner Product Search Data Structures

We present a general framework that utilizes different efficient data structures to improve various sparsification problems involving an iterative process. We also provide insights into and a characterization of different iterative processes, and answer the question of which data structures to use for which type of problem. We obtain improved running times for the following problems.

* For constructing linear-sized spectral sparsifiers (Batson, Spielman and Srivastava, 2012), all the existing deterministic algorithms require $Ω(d^4)$ time. In this work, we provide the first deterministic algorithm that breaks that barrier, running in $O(d^{ω+1})$ time, where $ω$ is the exponent of matrix multiplication.
* For the one-sided Kadison-Singer-type discrepancy problem, we give fast algorithms for both small and large numbers of iterations.
* For the experimental design problem, we speed up a key swapping process.

At the heart of our work is the design of a variety of inner product search data structures that have efficient initialization, query and update times, are compatible with dimensionality reduction, and are robust against an adaptive adversary.

preprint, 2022, arXiv

Symmetric Sparse Boolean Matrix Factorization and Applications

In this work, we study a variant of nonnegative matrix factorization where we wish to find a symmetric factorization of a given input matrix into a sparse, Boolean matrix. Formally speaking, given $\mathbf{M}\in\mathbb{Z}^{m\times m}$, we want to find $\mathbf{W}\in\{0,1\}^{m\times r}$ such that $\| \mathbf{M} - \mathbf{W}\mathbf{W}^\top \|_0$ is minimized among all $\mathbf{W}$ for which each row is $k$-sparse. This question turns out to be closely related to a number of questions like recovering a hypergraph from its line graph, as well as reconstruction attacks for private neural network training. As this problem is hard in the worst-case, we study a natural average-case variant that arises in the context of these reconstruction attacks: $\mathbf{M} = \mathbf{W}\mathbf{W}^{\top}$ for $\mathbf{W}$ a random Boolean matrix with $k$-sparse rows, and the goal is to recover $\mathbf{W}$ up to column permutation. Equivalently, this can be thought of as recovering a uniformly random $k$-uniform hypergraph from its line graph. Our main result is a polynomial-time algorithm for this problem based on bootstrapping higher-order information about $\mathbf{W}$ and then decomposing an appropriate tensor. The key ingredient in our analysis, which may be of independent interest, is to show that such a matrix $\mathbf{W}$ has full column rank with high probability as soon as $m = \widetilde{\Omega}(r)$, which we do using tools from Littlewood-Offord theory and estimates for binary Krawtchouk polynomials.

preprint, 2021, arXiv

InstaHide: Instance-hiding Schemes for Private Distributed Learning

How can multiple distributed entities collaboratively train a shared deep net on their private data while preserving privacy? This paper introduces InstaHide, a simple encryption of training images, which can be plugged into existing distributed deep learning pipelines. The encryption is efficient and applying it during training has a minor effect on test accuracy. InstaHide encrypts each training image with a "one-time secret key" which consists of mixing a number of randomly chosen images and applying a random pixel-wise mask. Other contributions of this paper include: (a) Using a large public dataset (e.g. ImageNet) for mixing during its encryption, which improves security. (b) Experimental results to show effectiveness in preserving privacy against known attacks with only minor effects on accuracy. (c) Theoretical analysis showing that successfully attacking privacy requires attackers to solve a difficult computational problem. (d) Demonstrating that use of the pixel-wise mask is important for security, since Mixup alone is shown to be insecure against some efficient attacks. (e) Release of a challenge dataset https://github.com/Hazelsuko07/InstaHide_Challenge. Our code is available at https://github.com/Hazelsuko07/InstaHide
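The encoding step described above (mix several images, then apply a random pixel-wise mask) can be sketched roughly as follows. This is a toy illustration on flat lists of pixel values and is not the released InstaHide code. The function name `instahide_encode`, the way mixing coefficients are drawn, and the use of a random sign flip as the mask are assumptions made for the example.

```python
import random


def instahide_encode(private_img, other_imgs, rng):
    """Toy encoding: convex mix of images followed by a random sign mask.

    private_img: list of pixel values; other_imgs: list of same-length images
    (for example drawn from a public dataset); rng: a random.Random instance.
    Returns the encoded image and the mixing coefficients.
    """
    imgs = [private_img] + list(other_imgs)
    # Assumed scheme: random positive weights normalised to sum to 1.
    raw = [rng.random() for _ in imgs]
    total = sum(raw)
    coeffs = [w / total for w in raw]
    mixed = [sum(c * img[p] for c, img in zip(coeffs, imgs))
             for p in range(len(private_img))]
    # Assumed mask: an independent random sign per pixel.
    mask = [rng.choice((-1, 1)) for _ in mixed]
    return [s * x for s, x in zip(mask, mixed)], coeffs


rng = random.Random(0)
private = [0.1, 0.5, 0.9, 0.3]
others = [[0.2, 0.2, 0.2, 0.2], [0.7, 0.1, 0.4, 0.6]]
encoded, coeffs = instahide_encode(private, others, rng)
print(encoded)
```

The mask and coefficients are drawn fresh for every call, which matches the "one-time secret key" wording in the abstract; a real pipeline would operate on image tensors rather than Python lists.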

preprint, 2021, arXiv

Near-Optimal Two-Pass Streaming Algorithm for Sampling Random Walks over Directed Graphs

For a directed graph $G$ with $n$ vertices and a start vertex $u_{\sf start}$, we wish to (approximately) sample an $L$-step random walk over $G$ starting from $u_{\sf start}$ with minimum space using an algorithm that only makes few passes over the edges of the graph. This problem has found many applications, for instance, in approximating the PageRank of a webpage. If only a single pass is allowed, the space complexity of this problem was shown to be $\tilde{\Theta}(n \cdot L)$. Prior to our work, a better space complexity was only known with $\tilde{O}(\sqrt{L})$ passes. We settle the space complexity of this random walk simulation problem for two-pass streaming algorithms, showing that it is $\tilde{\Theta}(n \cdot \sqrt{L})$, by giving almost matching upper and lower bounds. Our lower bound argument extends to every constant number of passes $p$, and shows that any $p$-pass algorithm for this problem uses $\tilde{\Omega}(n \cdot L^{1/p})$ space. In addition, we show a similar $\tilde{\Theta}(n \cdot \sqrt{L})$ bound on the space complexity of any algorithm (with any number of passes) for the related problem of sampling an $L$-step random walk from every vertex in the graph.
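To illustrate the pass/space trade-off, the sketch below is a naive baseline that samples an L-step walk with one pass over the edge stream per step, using reservoir sampling to pick a uniform out-neighbour of the current vertex. It keeps only the walk plus a constant number of extra words, at the price of L passes, and it is not the two-pass algorithm from the abstract. The function names and the edge-list representation are assumptions for illustration.

```python
import random


def sample_next(edge_stream, u, rng):
    """One pass over the stream: reservoir-sample a uniform out-neighbour of u."""
    choice, seen = None, 0
    for src, dst in edge_stream:
        if src == u:
            seen += 1
            # Keep the new edge with probability 1/seen.
            if rng.randrange(seen) == 0:
                choice = dst
    return choice


def multi_pass_walk(edges, start, L, rng):
    """Sample an L-step walk using one pass over `edges` per step."""
    walk = [start]
    for _ in range(L):
        nxt = sample_next(iter(edges), walk[-1], rng)
        if nxt is None:  # dead end: current vertex has no out-edges
            break
        walk.append(nxt)
    return walk


edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
print(multi_pass_walk(edges, 0, 5, random.Random(1)))
```

Reducing the number of passes forces the algorithm to remember sampled edges for many vertices at once, which is the space cost the bounds in the abstract quantify.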

41 published items

Strategic commitments shape collective cybersecurity under AI inequality

Dynamic Tensor Product Regression

Faster Sinkhorn's Algorithm with Small Treewidth

Online Adaptive Mahalanobis Distance Estimation

Smoothed Online Combinatorial Optimization Using Imperfect Predictions

A Sublinear Adversarial Training Algorithm

Accelerating Frank-Wolfe Algorithm using Low-Dimensional and Adaptive Data Structures

An $O(k\log n)$ Time Fourier Set Query Algorithm

An improved quantum-inspired algorithm for linear regression

Bounding the Width of Neural Networks via Coupled Initialization -- A Worst Case Analysis

Fast Distance Oracles for Any Symmetric Norm

Federated Adversarial Learning: A Framework with Convergence Analysis

Hyperbolic Concentration, Anti-concentration, and Discrepancy

Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning

Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models

Speeding Up Sparsification using Inner Product Search Data Structures

Symmetric Sparse Boolean Matrix Factorization and Applications

InstaHide: Instance-hiding Schemes for Private Distributed Learning

Near-Optimal Two-Pass Streaming Algorithm for Sampling Random Walks over Directed Graphs

A Faster Interior Point Method for Semidefinite Programming

A novel route to cyclic dominance in voluntary social dilemmas

An Improved Cutting Plane Method for Convex Optimization, Convex-Concave Games and its Applications

Average Case Column Subset Selection for Entrywise $\ell_1$-Norm Loss

Faster Dynamic Matrix Inverse for Faster LPs

Four Deviations Suffice for Rank 1 Matrices

Generalized Leverage Score Sampling for Neural Networks

Low Rank Approximation with Entrywise $\ell_1$-Norm Error

Meta-learning for mixed linear regression

Non-Autoregressive Neural Text-to-Speech

Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality

Privacy-preserving Learning via Deep Net Pruning

Quadratic Suffices for Over-parametrization via Matrix Chernoff Bound

Sketching Transformed Matrices with Applications to Natural Language Processing

Towards a Zero-One Law for Column Subset Selection

WaveFlow: A Compact Flow-based Model for Raw Audio

A Robust Sparse Fourier Transform in the Continuous Setting

Fourier-sparse interpolation without a frequency gap

The $p$-Center Problem in Tree Networks Revisited

Batch Codes through Dense Graphs without Short Cycles

A Max-Product EM Algorithm for Reconstructing Markov-tree Sparse Signals from Compressive Samples

PREMIER - PRobabilistic Error-correction using Markov Inference in Errored Reads