Source author record

Megasthenis Asteris

Megasthenis Asteris appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher. Unclaimed source record

Information Theory; math.IT; Machine Learning; math.OC; Data Structures and Algorithms; Distributed, Parallel, and Cluster Computing; Methodology; Networking and Internet Architecture; Systems and Control

Catalog footprint

What is connected

10 works

9 topics

4 close collaborators


preprint, 2016, arXiv

A simple and provable algorithm for sparse diagonal CCA

Given two sets of variables, derived from a common set of samples, sparse Canonical Correlation Analysis (CCA) seeks linear combinations of a small number of variables in each set, such that the induced canonical variables are maximally correlated. Sparse CCA is NP-hard. We propose a novel combinatorial algorithm for sparse diagonal CCA, i.e., sparse CCA under the additional assumption that variables within each set are standardized and uncorrelated. Our algorithm operates on a low rank approximation of the input data and its computational complexity scales linearly with the number of input variables. It is simple to implement, and parallelizable. In contrast to most existing approaches, our algorithm administers precise control on the sparsity of the extracted canonical vectors, and comes with theoretical data-dependent global approximation guarantees, that hinge on the spectrum of the input data. Finally, it can be straightforwardly adapted to other constrained variants of CCA enforcing structure beyond sparsity. We empirically evaluate the proposed scheme and apply it on a real neuroimaging dataset to investigate associations between brain activity and behavior measurements.

preprint, 2016, arXiv

Bipartite Correlation Clustering -- Maximizing Agreements

In Bipartite Correlation Clustering (BCC) we are given a complete bipartite graph $G$ with '+' and '-' edges, and we seek a vertex clustering that maximizes the number of agreements: the number of all '+' edges within clusters plus all '-' edges cut across clusters. BCC is known to be NP-hard. We present a novel approximation algorithm for $k$-BCC, a variant of BCC with an upper bound $k$ on the number of clusters. Our algorithm outputs a $k$-clustering that provably achieves a number of agreements within a multiplicative $(1-δ)$-factor from the optimal, for any desired accuracy $δ$. It relies on solving a combinatorially constrained bilinear maximization on the bi-adjacency matrix of $G$. It runs in time exponential in $k$ and $δ^{-1}$, but linear in the size of the input. Further, we show that, in the (unconstrained) BCC setting, a $(1-δ)$-approximation can be achieved by $O(δ^{-1})$ clusters regardless of the size of the graph. In turn, our $k$-BCC algorithm implies an Efficient PTAS for the BCC objective of maximizing agreements.
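The agreement objective described in this abstract can be made concrete with a small sketch. The Python below only evaluates the objective for a given clustering; it is not the paper's approximation algorithm. The function name `count_agreements`, the dictionary-based graph encoding, and the toy instance are assumptions made for illustration.

```python
from typing import Dict, Hashable, Tuple


def count_agreements(
    signs: Dict[Tuple[Hashable, Hashable], int],
    left_cluster: Dict[Hashable, int],
    right_cluster: Dict[Hashable, int],
) -> int:
    """Count agreements of a clustering on a signed complete bipartite graph.

    `signs` maps each (left vertex, right vertex) pair to +1 or -1.
    An edge agrees if it is '+' and both endpoints share a cluster,
    or it is '-' and the endpoints lie in different clusters.
    """
    agreements = 0
    for (u, v), sign in signs.items():
        same = left_cluster[u] == right_cluster[v]
        if (sign > 0 and same) or (sign < 0 and not same):
            agreements += 1
    return agreements


# Hypothetical toy instance: 2 left vertices, 3 right vertices.
signs = {
    ("a", "x"): +1, ("a", "y"): +1, ("a", "z"): -1,
    ("b", "x"): -1, ("b", "y"): -1, ("b", "z"): +1,
}
left = {"a": 0, "b": 1}
right = {"x": 0, "y": 0, "z": 1}
total = count_agreements(signs, left, right)
```

On this toy instance the two-cluster assignment agrees with every edge, while placing all vertices in a single cluster agrees only with the '+' edges.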

preprint, 2016, arXiv

Trading-off variance and complexity in stochastic gradient descent

Stochastic gradient descent is the method of choice for large-scale machine learning problems, by virtue of its light complexity per iteration. However, it lags behind its non-stochastic counterparts with respect to the convergence rate, due to high variance introduced by the stochastic updates. The popular Stochastic Variance-Reduced Gradient (SVRG) method mitigates this shortcoming, introducing a new update rule which requires infrequent passes over the entire input dataset to compute the full gradient. In this work, we propose CheapSVRG, a stochastic variance-reduction optimization scheme. Our algorithm is similar to SVRG but instead of the full gradient, it uses a surrogate which can be efficiently computed on a small subset of the input data. It achieves a linear convergence rate (up to some error level, depending on the nature of the optimization problem) and features a trade-off between the computational complexity and the convergence rate. Empirical evaluation shows that CheapSVRG performs at least competitively compared to the state of the art.
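A minimal sketch of the idea in this abstract, assuming a scalar least-squares problem: the SVRG-style update is kept, but the full gradient at the snapshot is replaced by an average over a small random subset. The function name `cheap_svrg_sketch`, the step size, the subset size, and the synthetic data are all assumptions chosen for illustration and are not taken from the paper.

```python
import random


def cheap_svrg_sketch(a, y, subset_size, epochs, inner_steps, step, seed=0):
    """SVRG-style loop with a subset-based surrogate for the full gradient.

    Minimizes (1/n) * sum_i 0.5 * (a[i] * w - y[i]) ** 2 over a scalar w.
    """
    rng = random.Random(seed)
    n = len(a)

    def grad_i(w, i):
        # Gradient of f_i(w) = 0.5 * (a[i] * w - y[i]) ** 2.
        return a[i] * (a[i] * w - y[i])

    w_tilde = 0.0  # snapshot point
    for _ in range(epochs):
        # Surrogate: average gradient over a small random subset,
        # used in place of the full-gradient pass of plain SVRG.
        subset = rng.sample(range(n), subset_size)
        surrogate = sum(grad_i(w_tilde, j) for j in subset) / subset_size

        w = w_tilde
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Variance-reduced stochastic step around the snapshot.
            w -= step * (grad_i(w, i) - grad_i(w_tilde, i) + surrogate)
        w_tilde = w  # the last inner iterate becomes the next snapshot
    return w_tilde


# Hypothetical noise-free data generated from w_true = 3.
data_rng = random.Random(1)
w_true = 3.0
a = [data_rng.uniform(0.5, 1.5) for _ in range(100)]
y = [a_i * w_true for a_i in a]
w_hat = cheap_svrg_sketch(a, y, subset_size=10, epochs=30, inner_steps=50, step=0.1)
```

Because the synthetic data are noise-free, every per-sample gradient vanishes at `w_true`, so the surrogate error shrinks along with the distance to the optimum. With noisy data one would expect the iterates to stall at a nonzero error level, which is the trade-off the abstract refers to.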

preprint, 2015, arXiv

On Approximating the Sum-Rate for Multiple-Unicasts

We study upper bounds on the sum-rate of multiple-unicasts. We approximate the Generalized Network Sharing Bound (GNS cut) of the multiple-unicasts network coding problem with $k$ independent sources. Our approximation algorithm runs in polynomial time and yields an upper bound on the joint source entropy rate, which is within an $O(\log^2 k)$ factor from the GNS cut. It further yields a vector-linear network code that achieves joint source entropy rate within an $O(\log^2 k)$ factor from the GNS cut, but not with independent sources: the code induces a correlation pattern among the sources. Our second contribution is establishing a separation result for vector-linear network codes: for any given field $\mathbb{F}$ there exist networks for which the optimum sum-rate supported by vector-linear codes over $\mathbb{F}$ for independent sources can be multiplicatively separated by a factor of $k^{1-δ}$, for any constant $δ>0$, from the optimum joint entropy rate supported by a code that allows correlation between sources. Finally, we establish a similar separation result for the asymmetric optimum vector-linear sum-rates achieved over two distinct fields $\mathbb{F}_{p}$ and $\mathbb{F}_{q}$ for independent sources, revealing that the choice of field can heavily impact the performance of a linear network code.

preprint, 2015, arXiv

Sparse PCA via Bipartite Matchings

We consider the following multi-component sparse PCA problem: given a set of data points, we seek to extract a small number of sparse components with disjoint supports that jointly capture the maximum possible variance. These components can be computed one by one, repeatedly solving the single-component problem and deflating the input data matrix, but as we show this greedy procedure is suboptimal. We present a novel algorithm for sparse PCA that jointly optimizes multiple disjoint components. The extracted features capture variance that lies within a multiplicative factor arbitrarily close to 1 from the optimal. Our algorithm is combinatorial and computes the desired components by solving multiple instances of the bipartite maximum weight matching problem. Its complexity grows as a low order polynomial in the ambient dimension of the input data matrix, but exponentially in its rank. However, it can be effectively applied on a low-dimensional sketch of the data; this allows us to obtain polynomial-time approximation guarantees via spectral bounds. We evaluate our algorithm on real data-sets and empirically demonstrate that in many cases it outperforms existing, deflation-based approaches.

preprint, 2015, arXiv

Stay on path: PCA along graph paths

We introduce a variant of (sparse) PCA in which the set of feasible support sets is determined by a graph. In particular, we consider the following setting: given a directed acyclic graph $G$ on $p$ vertices corresponding to variables, the non-zero entries of the extracted principal component must coincide with vertices lying along a path in $G$. From a statistical perspective, information on the underlying network may potentially reduce the number of observations required to recover the population principal component. We consider the canonical estimator which optimally exploits the prior knowledge by solving a non-convex quadratic maximization on the empirical covariance. We introduce a simple network and analyze the estimator under the spiked covariance model. We show that side information potentially improves the statistical complexity. We propose two algorithms to approximate the solution of the constrained quadratic maximization, and recover a component with the desired properties. We empirically evaluate our schemes on synthetic and real datasets.

preprint, 2014, arXiv

Repairable Fountain Codes

We introduce a new family of Fountain codes that are systematic and also have sparse parities. Given an input of $k$ symbols, our codes produce an unbounded number of output symbols, generating each parity independently by linearly combining a logarithmic number of randomly selected input symbols. The construction guarantees that, for any $ε>0$, accessing a random subset of $(1+ε)k$ encoded symbols asymptotically suffices to recover the $k$ input symbols with high probability. Our codes have the additional benefit of logarithmic locality: a single lost symbol can be repaired by accessing a subset of $O(\log k)$ of the remaining encoded symbols. This is a desired property for distributed storage systems where symbols are spread over a network of storage nodes. Beyond recovery upon loss, local reconstruction provides an efficient alternative for reading symbols that cannot be accessed directly. In our code, a logarithmic number of disjoint local groups is associated with each systematic symbol, allowing multiple parallel reads. Our main mathematical contribution involves analyzing the rank of sparse random matrices with specific structure over finite fields. We rely on establishing that a new family of sparse random bipartite graphs have perfect matchings with high probability.
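The parity generation and local repair described in this abstract can be illustrated with a simplified sketch. It assumes symbols are bytes combined by XOR (that is, arithmetic over GF(2) applied bitwise) and a degree of roughly $c \log k$ with a hypothetical constant `c`; the construction in the paper works over finite fields and its exact degree and coefficient choices are not reproduced here.

```python
import math
import random


def make_parity(data, rng, c=2.0):
    """Build one sparse parity: the XOR of about c*log(k) random input symbols.

    Returns (support, value), where support lists the combined input indices.
    The constant c is a hypothetical choice for this sketch.
    """
    k = len(data)
    degree = min(k, max(1, math.ceil(c * math.log(k))))
    support = rng.sample(range(k), degree)
    value = 0
    for i in support:
        value ^= data[i]
    return support, value


def repair_from_parity(lost, support, parity_value, data):
    """Recover data[lost] from one parity that covers it.

    Only the parity and the other symbols in its support are read,
    which is a logarithmic number of symbols in this sketch.
    """
    if lost not in support:
        raise ValueError("parity does not cover the lost symbol")
    value = parity_value
    for i in support:
        if i != lost:
            value ^= data[i]
    return value


rng = random.Random(0)
data = [rng.randrange(256) for _ in range(64)]  # k = 64 systematic symbols
support, parity = make_parity(data, rng)
lost = support[0]
recovered = repair_from_parity(lost, support, parity, data)
```

With $k = 64$ and `c = 2.0`, each parity combines 9 input symbols, so repairing one lost symbol reads 9 symbols (8 data symbols plus the parity) instead of all 64.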

preprint, 2013, arXiv

The Sparse Principal Component of a Constant-rank Matrix

The computation of the sparse principal component of a matrix is equivalent to the identification of its principal submatrix with the largest maximum eigenvalue. Finding this optimal submatrix is what renders the problem ${\mathcal{NP}}$-hard. In this work, we prove that, if the matrix is positive semidefinite and its rank is constant, then its sparse principal component is polynomially computable. Our proof utilizes the auxiliary unit vector technique that has been recently developed to identify problems that are polynomially solvable. Moreover, we use this technique to design an algorithm which, for any sparsity value, computes the sparse principal component with complexity ${\mathcal O}\left(N^{D+1}\right)$, where $N$ and $D$ are the matrix size and rank, respectively. Our algorithm is fully parallelizable and memory efficient.
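The equivalence stated in the first sentence of this abstract can be checked on a tiny example. The sketch below enumerates all supports of a given size and estimates the largest eigenvalue of each principal submatrix by power iteration. This brute-force search is exponential in general and is not the $\mathcal{O}(N^{D+1})$ algorithm of the paper; the function names and the test matrix are assumptions for illustration.

```python
from itertools import combinations


def largest_eigenvalue(matrix, iterations=500):
    """Estimate the largest eigenvalue of a PSD matrix by power iteration.

    Starts from the all-ones vector, which is assumed not to be orthogonal
    to the leading eigenvector (true for the example below).
    """
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient of the normalized iterate.
    return sum(v[i] * matrix[i][j] * v[j] for i in range(n) for j in range(n))


def best_sparse_support(matrix, sparsity):
    """Return (support, eigenvalue) of the best principal submatrix."""
    best = None
    for support in combinations(range(len(matrix)), sparsity):
        sub = [[matrix[i][j] for j in support] for i in support]
        value = largest_eigenvalue(sub)
        if best is None or value > best[1]:
            best = (support, value)
    return best


# Hypothetical 3x3 PSD matrix with eigenvalues 3.5, 2 and 0.5.
A = [
    [2.0, 0.0, 0.0],
    [0.0, 2.0, 1.5],
    [0.0, 1.5, 2.0],
]
support, value = best_sparse_support(A, sparsity=2)
```

Here the best 2-sparse support is the pair of correlated variables (indices 1 and 2), whose principal submatrix has largest eigenvalue 3.5, compared with 2 for either of the other two supports.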

preprint, 2013, arXiv

XORing Elephants: Novel Erasure Codes for Big Data

Distributed storage systems for large clusters typically use replication to provide reliability. Recently, erasure codes have been used to reduce the large storage overhead of three-replicated systems. Reed-Solomon codes are the standard design choice and their high repair cost is often considered an unavoidable price to pay for high storage efficiency and high reliability. This paper shows how to overcome this limitation. We present a novel family of erasure codes that are efficiently repairable and offer higher reliability compared to Reed-Solomon codes. We show analytically that our codes are optimal on a recently identified tradeoff between locality and minimum distance. We implement our new codes in Hadoop HDFS and compare to a currently deployed HDFS module that uses Reed-Solomon codes. Our modified HDFS implementation shows a reduction of approximately 2x on the repair disk I/O and repair network traffic. The disadvantage of the new coding scheme is that it requires 14% more storage compared to Reed-Solomon codes, an overhead shown to be information theoretically optimal to obtain locality. Because the new codes repair failures faster, this provides higher reliability, which is orders of magnitude higher compared to replication.
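The locality idea and the 14% storage figure in this abstract can be illustrated with a short sketch. It assumes a stripe of 10 data blocks protected by 4 Reed-Solomon parities, with 2 additional stored local XOR parities, each covering a group of 5 data blocks. These parameters are not stated in the abstract and are assumptions of this example, as is the helper name `xor_blocks`.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# One hypothetical local group: 5 data blocks of 4 bytes each.
group = [bytes([i] * 4) for i in range(1, 6)]
local_parity = xor_blocks(group)

# Repair a single lost block from the 4 surviving blocks plus the local parity.
lost_index = 2
survivors = [block for i, block in enumerate(group) if i != lost_index]
repaired = xor_blocks(survivors + [local_parity])

# Storage comparison under the assumed parameters.
rs_blocks = 10 + 4        # data + Reed-Solomon parities
lrc_blocks = 10 + 4 + 2   # same, plus two stored local parities
extra_storage = lrc_blocks / rs_blocks - 1
```

Under these assumptions a single-block repair reads 5 blocks rather than the 10 a Reed-Solomon decode would need, in line with the roughly 2x reduction reported, and the extra storage is 2/14, or about 14%.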

preprint, 2011, arXiv

Sparse Principal Component of a Rank-deficient Matrix

We consider the problem of identifying the sparse principal component of a rank-deficient matrix. We introduce auxiliary spherical variables and prove that there exists a set of candidate index-sets (that is, sets of indices to the nonzero elements of the vector argument) whose size is polynomially bounded, in terms of rank, and contains the optimal index-set, i.e. the index-set of the nonzero elements of the optimal solution. Finally, we develop an algorithm that computes the optimal sparse principal component in polynomial time for any sparsity degree.
