Source author record

Christos Boutsidis

Christos Boutsidis appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Data Structures and Algorithms · Machine Learning · math.NA · Numerical Analysis · Artificial Intelligence · Information Theory · math.IT · Computation · math.ST · Statistics Theory

Catalog footprint

What is connected

23 works

10 topics

4 close collaborators

preprint · 2016 · arXiv

A Randomized Algorithm for Approximating the Log Determinant of a Symmetric Positive Definite Matrix

We introduce a novel algorithm for approximating the logarithm of the determinant of a symmetric positive definite (SPD) matrix. The algorithm is randomized and approximates the traces of a small number of matrix powers of a specially constructed matrix, using the method of Avron and Toledo \cite{AT11}. From a theoretical perspective, we present additive and relative error bounds for our algorithm. Our additive error bound works for any SPD matrix, whereas our relative error bound works for SPD matrices whose eigenvalues lie in the interval $(θ_1,1)$, with $0<θ_1<1$; the latter setting was proposed in \cite{icml2015_hana15}. From an empirical perspective, we demonstrate that a C++ implementation of our algorithm can approximate the logarithm of the determinant of large matrices very accurately in a matter of seconds.
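The idea of estimating traces of matrix powers can be illustrated with a minimal NumPy sketch. It is not the authors' C++ implementation. It assumes the standard expansion $\log\det(A) = n \log α - \sum_{k \ge 1} \mathrm{tr}(C^k)/k$ with $C = I - A/α$ and $α$ at least the largest eigenvalue of $A$, and it estimates each trace with Gaussian probe vectors. The function name `approx_logdet`, the parameters `num_terms` and `num_probes`, and the exact spectral-norm computation of $α$ are assumptions made for illustration.

```python
import numpy as np

def approx_logdet(A, num_terms=30, num_probes=200, seed=0):
    """Estimate log det(A) for an SPD matrix A (illustrative sketch)."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    # Assumption: alpha is a slightly inflated exact spectral norm, so that
    # all eigenvalues of C = I - A/alpha lie in [0, 1).
    alpha = 1.01 * np.linalg.norm(A, 2)
    C = np.eye(n) - A / alpha
    G = rng.standard_normal((n, num_probes))  # Gaussian probe vectors
    V = G.copy()
    series = 0.0
    for k in range(1, num_terms + 1):
        V = C @ V                             # V now holds C^k G
        trace_k = np.sum(G * V) / num_probes  # randomized estimate of tr(C^k)
        series += trace_k / k
    return n * np.log(alpha) - series
```

On a small, well-conditioned SPD matrix the result can be compared against `np.linalg.slogdet(A)[1]`; the truncation length and the number of probes control the accuracy.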

preprint · 2016 · arXiv

Optimal Principal Component Analysis in Distributed and Streaming Models

We study the Principal Component Analysis (PCA) problem in the distributed and streaming models of computation. Given a matrix $A \in R^{m \times n},$ a rank parameter $k < rank(A)$, and an accuracy parameter $0 < ε< 1$, we want to output an $m \times k$ orthonormal matrix $U$ for which $$ || A - U U^T A ||_F^2 \le \left(1 + ε\right) \cdot || A - A_k||_F^2, $$ where $A_k \in R^{m \times n}$ is the best rank-$k$ approximation to $A$. This paper provides improved algorithms for distributed PCA and streaming PCA.

preprint · 2015 · arXiv

Greedy Minimization of Weakly Supermodular Set Functions

This paper defines weak-$α$-supermodularity for set functions. Many optimization objectives in machine learning and data mining seek to minimize such functions under cardinality constraints. We prove that such problems benefit from a greedy extension phase. Explicitly, let $S^*$ be the optimal set of cardinality $k$ that minimizes $f$ and let $S_0$ be an initial solution such that $f(S_0)/f(S^*) \le ρ$. Then, a greedy extension $S \supset S_0$ of size $|S| \le |S_0| + \lceil αk \ln(ρ/\varepsilon) \rceil$ yields $f(S)/f(S^*) \le 1+\varepsilon$. As example usages of this framework we give new bicriteria results for $k$-means, sparse regression, and column subset selection.
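A greedy extension phase of this kind might look like the following sketch. The names `greedy_extension` and `extension_size` are hypothetical, the set function `f` is passed in as an arbitrary Python callable, and no claim is made that this matches the paper's implementation; the step count simply evaluates the bound $\lceil αk \ln(ρ/\varepsilon) \rceil$ quoted above.

```python
import math

def extension_size(alpha, k, rho, eps):
    """Number of greedy steps given by the bound ceil(alpha * k * ln(rho / eps))."""
    return math.ceil(alpha * k * math.log(rho / eps))

def greedy_extension(f, ground_set, initial, num_steps):
    """Grow `initial` by up to `num_steps` elements, each time adding the
    element of `ground_set` whose inclusion gives the smallest value of f."""
    S = set(initial)
    for _ in range(num_steps):
        candidates = [e for e in ground_set if e not in S]
        if not candidates:
            break
        best = min(candidates, key=lambda e: f(S | {e}))
        S.add(best)
    return S
```

For instance, with `f` defined as a $k$-means-style cost whose centers are restricted to the data points, `greedy_extension(f, range(n), S0, extension_size(alpha, k, rho, eps))` extends an initial center set `S0`.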

preprint · 2015 · arXiv

Optimal Sparse Linear Auto-Encoders and Sparse PCA

Principal components analysis (PCA) is the optimal linear auto-encoder of data, and it is often used to construct features. Enforcing sparsity on the principal components can promote better generalization, while improving the interpretability of the features. We study the problem of constructing optimal sparse linear auto-encoders. Two natural questions in such a setting are: i) Given a level of sparsity, what is the best approximation to PCA that can be achieved? ii) Are there low-order polynomial-time algorithms which can asymptotically achieve this optimal tradeoff between the sparsity and the approximation quality? In this work, we answer both questions by giving efficient low-order polynomial-time algorithms for constructing asymptotically \emph{optimal} linear auto-encoders (in particular, sparse features with near-PCA reconstruction error) and demonstrate the performance of our algorithms on real data.

preprint · 2015 · arXiv

Spectral Clustering via the Power Method -- Provably

Spectral clustering is one of the most important algorithms in data mining and machine intelligence; however, its computational complexity limits its application to truly large scale data analysis. The computational bottleneck in spectral clustering is computing a few of the top eigenvectors of the (normalized) Laplacian matrix corresponding to the graph representing the data to be clustered. One way to speed up the computation of these eigenvectors is to use the "power method" from the numerical linear algebra literature. Although the power method has been empirically used to speed up spectral clustering, the theory behind this approach, to the best of our knowledge, remains unexplored. This paper provides the \emph{first} such rigorous theoretical justification, arguing that a small number of power iterations suffices to obtain near-optimal partitionings using the approximate eigenvectors. Specifically, we prove that solving the $k$-means clustering problem on the approximate eigenvectors obtained via the power method gives an additive-error approximation to solving the $k$-means problem on the optimal eigenvectors.
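The eigenvector-approximation step can be sketched as follows. This is an illustration only: it assumes the power iterations are applied to the normalized adjacency matrix $D^{-1/2} W D^{-1/2}$ (whose top eigenvectors correspond to the bottom eigenvectors of the normalized Laplacian), that its dominant eigenvalues in magnitude are the positive ones, and that every vertex has positive degree. The helper names `power_method_embedding` and `two_cliques` are hypothetical, and the final $k$-means step is omitted.

```python
import numpy as np

def power_method_embedding(W, k, num_iters=20, seed=0):
    """Approximate the top-k eigenvectors of D^{-1/2} W D^{-1/2} by applying
    `num_iters` power iterations to a random Gaussian n-by-k block."""
    rng = np.random.default_rng(seed)
    d = W.sum(axis=1)                    # vertex degrees (assumed positive)
    B = W / np.sqrt(np.outer(d, d))      # normalized adjacency matrix
    Y = rng.standard_normal((W.shape[0], k))
    for _ in range(num_iters):
        Y = B @ Y
    Q, _ = np.linalg.qr(Y)               # orthonormal basis for the range of Y
    return Q

def two_cliques(m):
    """Toy graph: two m-cliques joined by a single edge."""
    W = np.zeros((2 * m, 2 * m))
    W[:m, :m] = 1.0
    W[m:, m:] = 1.0
    np.fill_diagonal(W, 0.0)
    W[m - 1, m] = W[m, m - 1] = 1.0
    return W
```

The rows of the returned matrix would then be clustered with any $k$-means routine, as in standard spectral clustering.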

preprint · 2014 · arXiv

Fast Matrix Multiplication with Sketching

We present an approximate algorithm for matrix multiplication based on matrix sketching techniques. First, one of the matrices is chosen and sparsified using the online matrix sketching algorithm, and then the matrix product is calculated using the sparsified matrix. We prove that, when the sample number grows large compared to the sample dimensions, the proposed algorithm achieves a similar accuracy bound at a smaller computational cost than the state-of-the-art algorithms.

preprint · 2014 · arXiv

Faster SVD-Truncated Least-Squares Regression

We develop a fast algorithm for computing the "SVD-truncated" regularized solution to the least-squares problem: $\min_{x} \|A x - b\|_2$. Let $A_k$ of rank $k$ be the best rank-$k$ matrix computed via the SVD of $A$. Then, the SVD-truncated regularized solution is: $x_k = A_k^{+} b$. If $A$ is $m \times n$, then it takes $O(m n \min\{m,n\})$ time to compute $x_k$ using the SVD of $A$. We give an approximation algorithm for $x_k$ which constructs a rank-$k$ approximation $\tilde{A}_k$ and computes $\tilde{x}_k = \tilde{A}_k^{+} b$ in roughly $O(\mathrm{nnz}(A)\, k \log n)$ time. Our algorithm uses a randomized variant of the subspace iteration. We show that, with high probability: $\|A \tilde{x}_k - b\|_2 \approx \|A x_k - b\|_2$ and $\|x_k - \tilde{x}_k\|_2 \approx 0$.
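To make the two quantities concrete, the sketch below computes the exact SVD-truncated solution and an approximation built from a randomized subspace iteration. It is a dense NumPy illustration under stated assumptions, not the paper's algorithm: the rank-$k$ approximation is taken to be $Q Q^T A$ for an orthonormal basis $Q$, no oversampling is used, and the names `tsvd_solution`, `approx_tsvd_solution` and `num_iters` are hypothetical.

```python
import numpy as np

def tsvd_solution(A, b, k):
    """Exact SVD-truncated solution: pseudoinverse of the best rank-k A_k, applied to b."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k].T @ ((U[:, :k].T @ b) / s[:k])

def approx_tsvd_solution(A, b, k, num_iters=5, seed=0):
    """Approximate solution from a rank-k approximation Q Q^T A, where Q comes
    from a randomized subspace iteration (illustrative variant)."""
    rng = np.random.default_rng(seed)
    Y = A @ rng.standard_normal((A.shape[1], k))
    for _ in range(num_iters):
        Q, _ = np.linalg.qr(Y)      # re-orthonormalize for numerical stability
        Y = A @ (A.T @ Q)
    Q, _ = np.linalg.qr(Y)
    B = Q.T @ A                     # k-by-n factor of the approximation Q B
    # Since Q has orthonormal columns, pinv(Q B) = pinv(B) Q^T.
    return np.linalg.pinv(B) @ (Q.T @ b)
```

When the $k$-th and $(k+1)$-th singular values are well separated, the two solutions should agree closely after a few iterations.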

preprint · 2014 · arXiv

On Truncated-SVD-like Sparse Solutions to Least-Squares Problems of Arbitrary Dimensions

We describe two algorithms for computing a sparse solution to a least-squares problem where the coefficient matrix can have arbitrary dimensions. We show that the solution vector obtained by our algorithms is close to the solution vector obtained via the truncated SVD approach.

preprint · 2014 · arXiv

Optimal CUR Matrix Decompositions

The CUR decomposition of an $m \times n$ matrix $A$ finds an $m \times c$ matrix $C$ with a subset of $c < n$ columns of $A,$ together with an $r \times n$ matrix $R$ with a subset of $r < m$ rows of $A,$ as well as a $c \times r$ low-rank matrix $U$ such that the matrix $C U R$ approximates the matrix $A,$ that is, $ || A - CUR ||_F^2 \le (1+ε) || A - A_k||_F^2$, where $||.||_F$ denotes the Frobenius norm and $A_k$ is the best $m \times n$ matrix of rank $k$ constructed via the SVD. We present input-sparsity-time and deterministic algorithms for constructing such a CUR decomposition where $c=O(k/ε)$ and $r=O(k/ε)$ and rank$(U) = k$. Up to constant factors, our algorithms are simultaneously optimal in $c, r,$ and rank$(U)$.
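The shape of a CUR factorization can be illustrated with a small sketch. It assumes the column and row indices are already given (the selection procedure, which is the substance of the paper, is not reproduced) and uses the common choice $U = C^{+} A R^{+}$; the function name `cur_from_indices` and the `rcond` cutoff are assumptions.

```python
import numpy as np

def cur_from_indices(A, col_idx, row_idx, rcond=1e-10):
    """Build C (selected columns), R (selected rows) and U = pinv(C) A pinv(R)."""
    C = A[:, col_idx]                       # m-by-c
    R = A[row_idx, :]                       # r-by-n
    U = np.linalg.pinv(C, rcond) @ A @ np.linalg.pinv(R, rcond)  # c-by-r
    return C, U, R
```

If $A$ has rank exactly $k$ and the selected columns and rows span its column and row spaces, the product `C @ U @ R` reproduces $A$ up to rounding.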

preprint · 2014 · arXiv

Provable Deterministic Leverage Score Sampling

We explain theoretically a curious empirical phenomenon: "Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good low-rank matrix surrogate". To obtain provable guarantees, previous work requires randomized sampling of the columns with probabilities proportional to their leverage scores. In this work, we provide a novel theoretical analysis of deterministic leverage score sampling. We show that such deterministic sampling can be provably as accurate as its randomized counterparts, if the leverage scores follow a moderately steep power-law decay. We support this power-law assumption by providing empirical evidence that such decay laws are abundant in real-world data sets. We then demonstrate empirically the performance of deterministic leverage score sampling, which often matches or outperforms the state-of-the-art techniques.
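Deterministic selection by leverage scores can be sketched in a few lines. The sketch assumes the usual definition of rank-$k$ column leverage scores as the squared column norms of the top-$k$ right singular vectors; the function names `leverage_scores` and `top_leverage_columns` and the fixed column budget `c` are illustrative choices, not taken from the paper.

```python
import numpy as np

def leverage_scores(A, k):
    """Rank-k leverage score of each column of A (the scores sum to k)."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return np.sum(Vt[:k] ** 2, axis=0)

def top_leverage_columns(A, k, c):
    """Indices (sorted ascending) of the c columns with the largest scores."""
    scores = leverage_scores(A, k)
    return np.sort(np.argsort(scores)[::-1][:c])
```

The selected columns `A[:, top_leverage_columns(A, k, c)]` then serve as the low-rank surrogate described in the abstract.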

preprint · 2014 · arXiv

Random Projections for Linear Support Vector Machines

Let X be a data matrix of rank ρ, whose rows represent n points in d-dimensional space. The linear support vector machine constructs a hyperplane separator that maximizes the 1-norm soft margin. We develop a new oblivious dimension reduction technique which is precomputed and can be applied to any input matrix X. We prove that, with high probability, the margin and minimum enclosing ball in the feature space are preserved to within ε-relative error, ensuring comparable generalization as in the original space in the case of classification. For regression, we show that the margin is preserved to ε-relative error with high probability. We present extensive experiments with real and synthetic data to support our theory.

preprint · 2014 · arXiv

Randomized Dimensionality Reduction for k-means Clustering

We study the topic of dimensionality reduction for $k$-means clustering. Dimensionality reduction encompasses the union of two approaches: \emph{feature selection} and \emph{feature extraction}. A feature selection based algorithm for $k$-means clustering selects a small subset of the input features and then applies $k$-means clustering on the selected features. A feature extraction based algorithm for $k$-means clustering constructs a small set of new artificial features and then applies $k$-means clustering on the constructed features. Despite the significance of $k$-means clustering as well as the wealth of heuristic methods addressing it, provably accurate feature selection methods for $k$-means clustering are not known. On the other hand, two provably accurate feature extraction methods for $k$-means clustering are known in the literature; one is based on random projections and the other is based on the singular value decomposition (SVD). This paper makes further progress towards a better understanding of dimensionality reduction for $k$-means clustering. Namely, we present the first provably accurate feature selection method for $k$-means clustering and, in addition, we present two feature extraction methods. The first feature extraction method is based on random projections and it improves upon the existing results in terms of time complexity and number of features needed to be extracted. The second feature extraction method is based on fast approximate SVD factorizations and it also improves upon the existing results in terms of time complexity. The proposed algorithms are randomized and provide constant-factor approximation guarantees with respect to the optimal $k$-means objective value.

preprint · 2013 · arXiv

A note on sparse least-squares regression

We compute a \emph{sparse} solution to the classical least-squares problem $\min_x||A x -b||,$ where $A$ is an arbitrary matrix. We describe a novel algorithm for this sparse least-squares problem. The algorithm operates as follows: first, it selects columns from $A$, and then solves a least-squares problem only with the selected columns. The column selection algorithm that we use is known to perform well for the well-studied column subset selection problem. The contribution of this article is to show that it gives favorable results for sparse least-squares as well. Specifically, we prove that the solution vector obtained by our algorithm is close to the solution vector obtained via what is known as the "SVD-truncated regularization approach".
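The second step of this two-step scheme (solving least squares on the selected columns only) can be sketched as below. The column indices are taken as an input, since the paper's column selection algorithm is not reproduced here; the function name `sparse_least_squares` is hypothetical.

```python
import numpy as np

def sparse_least_squares(A, b, col_idx):
    """Solve min ||A[:, col_idx] z - b|| and embed z into a length-n vector
    that is zero outside the selected columns."""
    x = np.zeros(A.shape[1])
    coef, *_ = np.linalg.lstsq(A[:, col_idx], b, rcond=None)
    x[col_idx] = coef
    return x
```

The returned vector has at most `len(col_idx)` nonzero entries, which is the sense in which the solution is sparse.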

preprint · 2013 · arXiv

Deterministic Feature Selection for $k$-means Clustering

We study feature selection for $k$-means clustering. Although the literature contains many methods with good empirical performance, algorithms with provable theoretical behavior have only recently been developed. Unfortunately, these algorithms are randomized and fail with, say, a constant probability. We address this issue by presenting a deterministic feature selection algorithm for k-means with theoretical guarantees. At the heart of our algorithm lies a deterministic method for decompositions of the identity.

preprint · 2013 · arXiv

Efficient Dimensionality Reduction for Canonical Correlation Analysis

We present a fast algorithm for approximate Canonical Correlation Analysis (CCA). Given a pair of tall-and-thin matrices, the proposed algorithm first employs a randomized dimensionality reduction transform to reduce the size of the input matrices, and then applies any CCA algorithm to the new pair of matrices. The algorithm computes an approximate CCA to the original pair of matrices with provable guarantees, while requiring asymptotically fewer operations than the state-of-the-art exact algorithms.

preprint · 2013 · arXiv

Faster Subset Selection for Matrices and Applications

We study subset selection for matrices defined as follows: given a matrix $X \in \mathbb{R}^{n \times m}$ ($m > n$) and an oversampling parameter $k$ ($n \le k \le m$), select a subset of $k$ columns from $X$ such that the pseudo-inverse of the subsampled matrix has as small a norm as possible. In this work, we focus on the Frobenius and the spectral matrix norms. We describe several novel (deterministic and randomized) approximation algorithms for this problem with approximation bounds that are optimal up to constant factors. Additionally, we show that the combinatorial problem of finding a low-stretch spanning tree in an undirected graph corresponds to subset selection, and discuss various implications of this reduction.

preprint · 2013 · arXiv

Improved matrix algorithms via the Subsampled Randomized Hadamard Transform

Several recent randomized linear algebra algorithms rely upon fast dimension reduction methods. A popular choice is the Subsampled Randomized Hadamard Transform (SRHT). In this article, we address the efficacy, in the Frobenius and spectral norms, of an SRHT-based low-rank matrix approximation technique introduced by Woolfe, Liberty, Rokhlin, and Tygert. We establish a slightly better Frobenius norm error bound than currently available, and a much sharper spectral norm error bound (in the presence of reasonable decay of the singular values). Along the way, we produce several results on matrix operations with SRHTs (such as approximate matrix multiplication) that may be of independent interest. Our approach builds upon Tropp's in "Improved analysis of the Subsampled Randomized Hadamard Transform".
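For readers unfamiliar with the transform, here is a minimal dense sketch of an SRHT applied to the rows of a matrix. It assumes the common definition $\sqrt{n/r}\, R H D$, where $D$ is a diagonal matrix of random signs, $H$ is the normalized Walsh-Hadamard matrix and $R$ keeps $r$ rows chosen uniformly without replacement. It builds $H$ explicitly, so it costs $O(n^2)$ per column instead of the $O(n \log n)$ of a fast transform; the names `hadamard` and `srht` are hypothetical.

```python
import numpy as np

def hadamard(n):
    """Normalized n-by-n Walsh-Hadamard matrix (n must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def srht(A, r, seed=0):
    """Reduce the n rows of A to r rows with a subsampled randomized Hadamard transform."""
    n = A.shape[0]
    if n < 1 or n & (n - 1):
        raise ValueError("the number of rows must be a power of two")
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=n)        # diagonal of D
    rows = rng.choice(n, size=r, replace=False)    # rows kept by R
    HDA = hadamard(n) @ (signs[:, None] * A)
    return np.sqrt(n / r) * HDA[rows]
```

Because $H D$ is orthogonal, keeping all rows (`r == n`) preserves the Frobenius norm of `A` exactly, while smaller `r` preserves it only approximately.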

preprint · 2013 · arXiv

Near-Optimal Column-Based Matrix Reconstruction

We consider low-rank reconstruction of a matrix using its columns and we present asymptotically optimal algorithms for both spectral norm and Frobenius norm reconstruction. The main tools we introduce to obtain our results are: (i) the use of fast approximate SVD-like decompositions for column reconstruction, and (ii) two deterministic algorithms for selecting rows from matrices with orthonormal columns, building upon the sparse representation theorem for decompositions of the identity that appeared in \cite{BSS09}.

preprint · 2013 · arXiv

Near-optimal Coresets For Least-Squares Regression

We study (constrained) least-squares regression as well as multiple response least-squares regression and ask the question of whether a subset of the data, a coreset, suffices to compute a good approximate solution to the regression. We give deterministic, low order polynomial-time algorithms to construct such coresets with approximation guarantees, together with lower bounds indicating that there is not much room for improvement upon our results.

preprint · 2012 · arXiv

Improved Low-rank Matrix Decompositions via the Subsampled Randomized Hadamard Transform

We comment on two randomized algorithms for constructing low-rank matrix decompositions. Both algorithms employ the Subsampled Randomized Hadamard Transform [14]. The first algorithm appeared recently in [9]; here, we provide a novel analysis that significantly improves the approximation bound obtained in [9]. A preliminary version of the second algorithm appeared in [7]; here, we present a mild modification of this algorithm that achieves the same approximation bound but significantly improves the corresponding running time.

preprint · 2011 · arXiv

Topics in Matrix Sampling Algorithms

We study three fundamental problems of Linear Algebra, lying at the heart of various Machine Learning applications, namely: 1) "Low-rank Column-based Matrix Approximation". We are given a matrix A and a target rank k. The goal is to select a subset of columns of A and, by using only these columns, compute a rank k approximation to A that is as good as the rank k approximation that would have been obtained by using all the columns; 2) "Coreset Construction in Least-Squares Regression". We are given a matrix A and a vector b. Consider the (over-constrained) least-squares problem of minimizing ||Ax-b||, over all vectors x in D. The domain D represents the constraints on the solution and can be arbitrary. The goal is to select a subset of the rows of A and b and, by using only these rows, find a solution vector that is as good as the solution vector that would have been obtained by using all the rows; 3) "Feature Selection in K-means Clustering". We are given a set of points described with respect to a large number of features. The goal is to select a subset of the features and, by using only this subset, obtain a k-partition of the points that is as good as the partition that would have been obtained by using all the features. We present novel algorithms for all three problems mentioned above. Our results can be viewed as follow-up research to a line of work known as "Matrix Sampling Algorithms". [Frieze, Kannan, Vempala, 1998] presented the first such algorithm for the Low-rank Matrix Approximation problem. Since then, such algorithms have been developed for several other problems, e.g. Graph Sparsification and Linear Equation Solving. Our contributions to this line of research are: (i) improved algorithms for Low-rank Matrix Approximation and Regression; (ii) algorithms for a new problem domain (K-means Clustering).

preprint · 2010 · arXiv

An Improved Approximation Algorithm for the Column Subset Selection Problem

We consider the problem of selecting the best subset of exactly $k$ columns from an $m \times n$ matrix $A$. We present and analyze a novel two-stage algorithm that runs in $O(\min\{mn^2,m^2n\})$ time and returns as output an $m \times k$ matrix $C$ consisting of exactly $k$ columns of $A$. In the first (randomized) stage, the algorithm randomly selects $Θ(k \log k)$ columns according to a judiciously-chosen probability distribution that depends on information in the top-$k$ right singular subspace of $A$. In the second (deterministic) stage, the algorithm applies a deterministic column-selection procedure to select and return exactly $k$ columns from the set of columns selected in the first stage. Let $C$ be the $m \times k$ matrix containing those $k$ columns, let $P_C$ denote the projection matrix onto the span of those columns, and let $A_k$ denote the best rank-$k$ approximation to the matrix $A$. Then, we prove that, with probability at least 0.8, $$ \|A - P_C A\|_F \leq Θ(k \log^{1/2} k) \|A - A_k\|_F. $$ This Frobenius norm bound is only a factor of $\sqrt{k \log k}$ worse than the best previously existing existential result and is roughly $O(\sqrt{k!})$ better than the best previous algorithmic result for the Frobenius norm version of this Column Subset Selection Problem (CSSP). We also prove that, with probability at least 0.8, $$ \|A - P_C A\|_2 \leq Θ(k \log^{1/2} k) \|A - A_k\|_2 + Θ(k^{3/4}\log^{1/4}k) \|A - A_k\|_F. $$ This spectral norm bound is not directly comparable to the best previously existing bounds for the spectral norm version of this CSSP. Our bound depends on $\|A - A_k\|_F$, whereas previous results depend on $\sqrt{n-k}\,\|A - A_k\|_2$; if these two quantities are comparable, then our bound is asymptotically worse by a $(k \log k)^{1/4}$ factor.

preprint · 2010 · arXiv

Random Projections for $k$-means Clustering

This paper discusses the topic of dimensionality reduction for $k$-means clustering. We prove that any set of $n$ points in $d$ dimensions (rows in a matrix $A \in \mathbb{R}^{n \times d}$) can be projected into $t = Ω(k / \varepsilon^2)$ dimensions, for any $\varepsilon \in (0,1/3)$, in $O(n d \lceil \varepsilon^{-2} k/ \log(d) \rceil )$ time, such that with constant probability the optimal $k$-partition of the point set is preserved within a factor of $2+\varepsilon$. The projection is done by post-multiplying $A$ with a $d \times t$ random matrix $R$ having entries $+1/\sqrt{t}$ or $-1/\sqrt{t}$ with equal probability. A numerical implementation of our technique and experiments on a large face images dataset verify the speed and the accuracy of our theoretical results.
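The projection described here is simple enough to sketch directly. The code below uses a plain dense matrix product rather than the faster multiplication behind the stated running time, and the function name `random_sign_projection` and the `seed` parameter are assumptions made for illustration.

```python
import numpy as np

def random_sign_projection(A, t, seed=0):
    """Post-multiply the n-by-d matrix A by a d-by-t matrix R whose entries
    are +1/sqrt(t) or -1/sqrt(t) with equal probability."""
    rng = np.random.default_rng(seed)
    d = A.shape[1]
    R = rng.choice([-1.0, 1.0], size=(d, t)) / np.sqrt(t)
    return A @ R, R
```

Any $k$-means routine can then be run on the rows of the projected matrix instead of the rows of $A$.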
