Source author record

Anastasios Zouzias

Anastasios Zouzias appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Topics: Data Structures and Algorithms; Discrete Mathematics; Machine Learning; Computational Complexity; Information Theory; math.IT; math.NA; Applications; Artificial Intelligence; Computational Geometry; Distributed, Parallel, and Cluster Computing; Hardware Architecture; math.PR; math.ST; Statistics Theory

Catalog footprint: 16 works, 15 topics, 4 close collaborators.


Preprint, arXiv, 2026

Parallel Scan on Ascend AI Accelerators

We design and implement parallel prefix sum (scan) algorithms using Ascend AI accelerators. Ascend accelerators feature specialized computing units: the cube units for efficient matrix multiplication and the vector units for optimized vector operations. A key feature of the proposed scan algorithms is their extensive use of matrix multiplications and accumulations enabled by the cube unit. To showcase the effectiveness of these algorithms, we also implement and evaluate several scan-based operators commonly used in AI workloads, including sorting, tensor masking, and top-$k$ / top-$p$ sampling. Our single-core results demonstrate substantial performance improvements, with speedups ranging from $5\times$ to $9.6\times$ compared to vector-only implementations for sufficiently large input lengths. Additionally, we present a multi-core scan algorithm that fully utilizes both the cube and vector units of Ascend, reaching up to 74.9% of the memory bandwidth achieved by memory copy. Furthermore, our radix sort implementation, which utilizes matrix multiplications for its parallel splits, showcases the potential of matrix engines to enhance complex operations, offering up to $3.3\times$ speedup over the vector-only baseline.
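The core idea of running a scan on a matrix unit can be illustrated in a few lines. The sketch below is a minimal, hypothetical illustration, not the paper's Ascend kernel: it assumes the input is split into rows of an illustrative block size `s`, computes every row's local prefix sums with one multiplication by an upper-triangular matrix of ones, and then adds a running offset taken from the row totals. Pure Python stands in for the cube and vector units, and the helper names `matmul` and `scan_via_matmul` are invented for this example.

```python
def matmul(X, Y):
    """Dense matrix product of two lists-of-lists (stand-in for a matrix unit)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def scan_via_matmul(values, s=4):
    """Inclusive prefix sum of `values` using block-wise matrix products.

    Hypothetical sketch: `s` is an illustrative block size.
    """
    n = len(values)
    padded = list(values) + [0] * (-n % s)      # pad to a multiple of s
    rows = [padded[i:i + s] for i in range(0, len(padded), s)]
    # Upper-triangular ones matrix U: (X @ U)[r][j] = X[r][0] + ... + X[r][j].
    U = [[1 if i <= j else 0 for j in range(s)] for i in range(s)]
    local = matmul(rows, U)                      # local scan of every row at once
    # Running sum of row totals gives the offset each row must add.
    offset, out = 0, []
    for row in local:
        out.extend(v + offset for v in row)
        offset += row[-1]
    return out[:n]                               # drop the padding
```

For example, `scan_via_matmul([1, 2, 3, 4, 5])` yields `[1, 3, 6, 10, 15]`. The offset pass is a plain loop here; how the paper divides this work between cube and vector units is not described in the abstract.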

Preprint, arXiv, 2022

Identifying and Exploiting Sparse Branch Correlations for Optimizing Branch Prediction

Branch prediction is arguably one of the most important speculative mechanisms within a high-performance processor architecture. A common approach to improving branch prediction accuracy is to employ lengthy history records of previously seen branch directions to capture distant correlations between branches. The larger the history, the richer the information that the predictor can exploit for discovering predictive patterns. However, without appropriate filtering, such an approach may also severely disrupt the predictor's internal mechanisms, leading to diminishing returns. This paper studies a fundamental control-flow property: the sparsity in the correlation between branches and recent history. First, we show that sparse branch correlations exist in standard applications and, more importantly, that such correlations can be computed efficiently using sparse modeling methods. Second, we introduce a sparsity-aware branch prediction mechanism that can compactly encode and store sparse models to unlock essential performance opportunities. We evaluated our approach across various design parameters, demonstrating MPKI (mispredictions per kilo-instruction) improvements of up to 42% (2.3% on average) with 2 KB of additional storage overhead. Our circuit-level evaluation of the design showed that it can operate within accepted branch prediction latencies and under reasonable power and area limits.
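To make "compactly encode and store sparse models" concrete, here is a hypothetical sketch of a sparse linear predictor for a single branch. It assumes a perceptron-style scoring rule (an assumption; the abstract does not specify the model form) and stores only a handful of `(history_position, weight)` pairs instead of one weight per history bit. The class name `SparseBranchModel` and its fields are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class SparseBranchModel:
    """Hypothetical sparse linear model for one branch.

    Only a few (history_position, weight) pairs are stored, instead of one
    weight per history bit as in a dense perceptron-style predictor.
    """
    bias: int = 0
    terms: list = field(default_factory=list)  # [(position, weight), ...]

    def predict(self, history):
        """Return True (taken) or False; history[0] is the most recent outcome."""
        score = self.bias
        for pos, w in self.terms:
            # A taken bit contributes +w, a not-taken bit contributes -w.
            score += w if history[pos] else -w
        return score >= 0
```

A branch that simply repeats the outcome seen 37 branches earlier needs a single term, `SparseBranchModel(terms=[(37, 1)])`. Training, that is, discovering which positions matter, is not shown.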

Preprint, arXiv, 2020

Team voyTECH: User Activity Modeling with Boosting Trees

This paper describes our winning solution for the ECML-PKDD ChAT Discovery Challenge 2020. We show that whether or not a Twitch user has subscribed to a channel can be predicted well by modeling user activity with boosting trees. We introduce the connection between target encodings and boosting trees in the context of high-cardinality categorical features and find that modeling user activity is more powerful than directly modeling content, when encoded properly and combined with a suitable optimization approach.
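Target encoding replaces each level of a high-cardinality categorical feature with a statistic of the target for that level, which a boosting-tree model can then split on. The sketch below shows one common variant, a smoothed mean; the smoothing scheme, the `smoothing` pseudo-count, and the function name are assumptions, since the abstract does not state which encoding the team used.

```python
from collections import defaultdict


def target_encode(categories, targets, smoothing=10.0):
    """Map each category to a smoothed mean of the (binary) target.

    Rare categories are pulled toward the global mean; `smoothing` is a
    hypothetical pseudo-count controlling how strongly.
    """
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    return {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in counts
    }
```

In practice the encoding for training rows is usually computed out-of-fold, so that a row's own label does not leak into its feature.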

Preprint, arXiv, 2016

A Randomized Algorithm for Approximating the Log Determinant of a Symmetric Positive Definite Matrix

We introduce a novel algorithm for approximating the logarithm of the determinant of a symmetric positive definite (SPD) matrix. The algorithm is randomized and approximates the traces of a small number of matrix powers of a specially constructed matrix, using the method of Avron and Toledo [AT11]. From a theoretical perspective, we present additive and relative error bounds for our algorithm. Our additive error bound works for any SPD matrix, whereas our relative error bound works for SPD matrices whose eigenvalues lie in the interval $(\theta_1, 1)$, with $0 < \theta_1 < 1$; the latter setting was proposed in [icml2015_hana15]. From an empirical perspective, we demonstrate that a C++ implementation of our algorithm can approximate the logarithm of the determinant of large matrices very accurately in a matter of seconds.
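The construction can be sketched as follows. Assuming (as the abstract suggests but does not spell out) that the specially constructed matrix is $C = I - A/\alpha$ with $\alpha \geq \lambda_{\max}(A)$, we have $\log\det A = n \log \alpha - \sum_{k \geq 1} \operatorname{tr}(C^k)/k$, and each trace can be estimated with random sign vectors in the Hutchinson style analysed by Avron and Toledo. Choosing $\alpha$ by a Gershgorin row-sum bound, the truncation length `m`, and the probe count `p` are illustrative assumptions of this sketch.

```python
import math
import random


def logdet_spd(A, m=50, p=30, seed=0):
    """Randomized estimate of log det(A) for a symmetric positive definite A.

    Uses log det(A) = n*log(alpha) + log det(I - C) with C = I - A/alpha and
    log det(I - C) = -sum_{k>=1} tr(C^k)/k, truncated at m terms; each trace
    is estimated from p Rademacher probe vectors (Hutchinson estimator).
    """
    n = len(A)
    # Gershgorin bound: alpha >= lambda_max(A), so C's eigenvalues lie in [0, 1).
    alpha = max(sum(abs(a) for a in row) for row in A)
    C = [[(1.0 if i == j else 0.0) - A[i][j] / alpha for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    series = 0.0
    for _ in range(p):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        v = z[:]
        for k in range(1, m + 1):
            v = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]  # v = C^k z
            series += sum(zi * vi for zi, vi in zip(z, v)) / k
    return n * math.log(alpha) - series / p
```

For a diagonal matrix the sign probes recover the traces exactly, so for `diag(1, 2, 3, 4)` the only error is the series truncation and the estimate agrees with $\log 24$ to within $10^{-6}$.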

Preprint, arXiv, 2015

Randomized Block Kaczmarz Method with Projection for Solving Least Squares

The Kaczmarz method is an iterative method for solving overcomplete linear systems of equations $Ax = b$. The randomized version of the Kaczmarz method put forth by Strohmer and Vershynin iteratively projects onto a randomly chosen solution space given by a single row of the matrix $A$ and converges exponentially in expectation to the solution of a consistent system. In this paper we analyze two block versions of the method, each with a randomized projection, that converge in expectation to the least-squares solution of inconsistent systems. Our approach utilizes a paving of the matrix $A$ to guarantee exponential convergence, and suggests that paving yields a significant improvement in performance in certain regimes. The proposed method is an extension of the block Kaczmarz method analyzed by Needell and Tropp and the Randomized Extended Kaczmarz method of Zouzias and Freris. The contribution is thus twofold: unlike the standard Kaczmarz method, our methods converge to the least-squares solution of inconsistent systems, and by using appropriate blocks of the matrix this convergence can be significantly accelerated. Numerical experiments suggest that the proposed algorithm can indeed lead to advantages in practice.
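The Randomized Extended Kaczmarz method of Zouzias and Freris, which the block methods extend, alternates two single-vector projections: a column step that drives an auxiliary vector `z` toward the component of `b` orthogonal to the range of $A$, and a row step that applies an ordinary Kaczmarz projection to the consistent system $Ax = b - z$. The sketch below shows that non-block version, with rows and columns sampled proportionally to their squared norms; the paving-based block variants analysed in the paper are not reproduced, and the iteration count and function name are arbitrary choices.

```python
import random


def rek(A, b, iters=2000, seed=0):
    """Randomized Extended Kaczmarz (single row/column form) for min ||Ax - b||."""
    m, n = len(A), len(A[0])
    rng = random.Random(seed)
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    row_sq = [sum(a * a for a in row) for row in A]
    col_sq = [sum(a * a for a in col) for col in cols]
    x, z = [0.0] * n, list(map(float, b))
    for _ in range(iters):
        # Column step: remove from z its component along a random column of A,
        # so z tends to the part of b orthogonal to range(A).
        j = rng.choices(range(n), weights=col_sq)[0]
        t = sum(c * zi for c, zi in zip(cols[j], z)) / col_sq[j]
        z = [zi - t * c for zi, c in zip(z, cols[j])]
        # Row step: Kaczmarz projection for the consistent system Ax = b - z.
        i = rng.choices(range(m), weights=row_sq)[0]
        r = (b[i] - z[i] - sum(a * xk for a, xk in zip(A[i], x))) / row_sq[i]
        x = [xk + r * a for xk, a in zip(x, A[i])]
    return x
```

On the inconsistent system with rows $(1, 0)$, $(0, 1)$, $(1, 1)$ and $b = (1, 1, 0)$, the iterates approach the least-squares solution $(1/3, 1/3)$, whereas plain randomized Kaczmarz keeps jumping between the three conflicting rows.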

Preprint, arXiv, 2014

Approximate Matrix Multiplication with Application to Linear Embeddings

In this paper, we study the problem of approximately computing the product of two real matrices. In particular, we analyze a dimensionality-reduction-based approximation algorithm due to Sarlos [1], introducing the notion of nuclear rank as the ratio of the nuclear norm over the spectral norm. The presented bound has improved dependence on the approximation error (compared to previous approaches), whereas the subspace onto which we project the input matrices has dimension proportional to the maximum of their nuclear ranks and is independent of the input dimensions. In addition, we provide an application of this result to linear low-dimensional embeddings. Namely, we show that any Euclidean point set with bounded nuclear rank can be projected onto a number of dimensions that is independent of the input dimensionality, while achieving additive error guarantees.
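Writing the product as $A^\top B$ with $A$ and $B$ sharing $n$ rows, the random-projection approach replaces it by $(SA)^\top (SB)$ for a short random matrix $S$ with $t$ rows, exploiting $\mathbb{E}[S^\top S] = I$. The sketch below uses a dense random sign matrix scaled by $1/\sqrt{t}$; this choice of $S$, the dense pure-Python arithmetic, and the function name are assumptions for illustration, and neither Sarlos' analysis nor the paper's nuclear-rank bound is reproduced here.

```python
import math
import random


def sketched_product(A, B, t, seed=0):
    """Approximate A^T B by (S A)^T (S B), with S a t x n random sign matrix."""
    n = len(A)
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(t)                   # makes E[S^T S] = I
    S = [[scale * rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(t)]

    def left_mul(M):
        # Computes S @ M, a t x (columns of M) sketch.
        return [[sum(S[r][i] * M[i][c] for i in range(n)) for c in range(len(M[0]))]
                for r in range(t)]

    SA, SB = left_mul(A), left_mul(B)
    # (SA)^T (SB): entry (c, d) = sum_r SA[r][c] * SB[r][d]
    return [[sum(SA[r][c] * SB[r][d] for r in range(t)) for d in range(len(B[0]))]
            for c in range(len(A[0]))]
```

The expected squared Frobenius error is at most $2\|A\|_F^2 \|B\|_F^2 / t$ for this sign sketch, so larger sketches trade speed for accuracy.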

Preprint, arXiv, 2014

Non-uniform Feature Sampling for Decision Tree Ensembles

We study the effectiveness of non-uniform randomized feature selection in decision tree classification. We experimentally evaluate two feature selection methodologies based on information extracted from the provided dataset: (i) leverage-score-based and (ii) norm-based feature selection. Experimental evaluation of the proposed feature selection techniques indicates that such approaches might be more effective than naive uniform feature selection, while achieving performance comparable to the random forest algorithm [3].
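Norm-based selection can be illustrated by drawing features with probability proportional to their squared column norms. The sketch below is a hypothetical version: the squared-norm weighting, the without-replacement draws, and the function name are assumptions, and where the draws happen inside the tree ensemble (per tree or per split) is not stated in the abstract. Leverage-score sampling would replace these weights with scores computed from a truncated SVD, which is omitted to keep the example dependency-free.

```python
import random


def norm_based_feature_sample(X, k, seed=0):
    """Sample up to k distinct feature indices with probability ∝ squared column norm.

    X is a list of rows (samples x features). Draws are made without
    replacement by zeroing each chosen weight; zero-norm features are never picked.
    """
    rng = random.Random(seed)
    d = len(X[0])
    weights = [sum(row[j] ** 2 for row in X) for j in range(d)]
    chosen = []
    # Never draw more features than have positive weight.
    for _ in range(min(k, sum(w > 0 for w in weights))):
        j = rng.choices(range(d), weights=weights)[0]
        chosen.append(j)
        weights[j] = 0.0  # remove it so it cannot be drawn again
    return chosen
```

Features whose column is identically zero receive weight zero and are never drawn, while heavier columns are favoured over a uniform draw.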

Preprint, arXiv, 2014

Randomized Dimensionality Reduction for k-means Clustering

We study the topic of dimensionality reduction for $k$-means clustering. Dimensionality reduction encompasses the union of two approaches: feature selection and feature extraction. A feature selection based algorithm for $k$-means clustering selects a small subset of the input features and then applies $k$-means clustering on the selected features. A feature extraction based algorithm for $k$-means clustering constructs a small set of new artificial features and then applies $k$-means clustering on the constructed features. Despite the significance of $k$-means clustering as well as the wealth of heuristic methods addressing it, provably accurate feature selection methods for $k$-means clustering are not known. On the other hand, two provably accurate feature extraction methods for $k$-means clustering are known in the literature; one is based on random projections and the other is based on the singular value decomposition (SVD). This paper makes further progress towards a better understanding of dimensionality reduction for $k$-means clustering. Namely, we present the first provably accurate feature selection method for $k$-means clustering and, in addition, we present two feature extraction methods. The first feature extraction method is based on random projections and it improves upon the existing results in terms of time complexity and number of features needed to be extracted. The second feature extraction method is based on fast approximate SVD factorizations and it also improves upon the existing results in terms of time complexity. The proposed algorithms are randomized and provide constant-factor approximation guarantees with respect to the optimal $k$-means objective value.

Preprint, arXiv, 2013

Efficient Dimensionality Reduction for Canonical Correlation Analysis

We present a fast algorithm for approximate Canonical Correlation Analysis (CCA). Given a pair of tall-and-thin matrices, the proposed algorithm first employs a randomized dimensionality reduction transform to reduce the size of the input matrices, and then applies any CCA algorithm to the new pair of matrices. The algorithm computes an approximate CCA of the original pair of matrices with provable guarantees, while requiring asymptotically fewer operations than state-of-the-art exact algorithms.

Preprint, arXiv, 2012

A Matrix Hyperbolic Cosine Algorithm and Applications

In this paper, we generalize Spencer's hyperbolic cosine algorithm to the matrix-valued setting. We apply the proposed algorithm to several problems by analyzing its computational efficiency under two special cases of matrices: one in which the matrices have a group structure and another in which they have rank one. As an application of the former case, we present a deterministic algorithm that, given the multiplication table of a finite group of size $n$, constructs an expanding Cayley graph of logarithmic degree in near-optimal $O(n^2 \log^3 n)$ time. For the latter case, we present a fast deterministic algorithm for spectral sparsification of positive semi-definite matrices, which implies an improved deterministic algorithm for spectral graph sparsification of dense graphs. In addition, we give an elementary connection between spectral sparsification of positive semi-definite matrices and element-wise matrix sparsification. As a consequence, we obtain improved element-wise sparsification algorithms for diagonally dominant-like matrices.
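For background, the scalar algorithm being generalized can be sketched as follows: given $m$ vectors in $n$ dimensions with entries in $[-1, 1]$, it picks a sign for each vector greedily so that the potential $\sum_i \cosh(\lambda s_i)$ of the running sums $s_i$ stays small, which guarantees that every coordinate of the signed sum is at most $\sqrt{2m \ln(2n)}$ in absolute value. This is an illustrative sketch of Spencer's scalar version only; in the matrix-valued setting one would presumably replace the potential with the trace of a matrix hyperbolic cosine, which is not implemented here. The function name and the choice of $\lambda$ are assumptions.

```python
import math


def hyperbolic_cosine_signs(vectors):
    """Greedy Spencer-style signing of vectors with entries in [-1, 1].

    Returns the chosen signs and the largest |coordinate| of sum_j eps_j * v_j.
    """
    m, n = len(vectors), len(vectors[0])
    lam = math.sqrt(2 * math.log(2 * n) / m)
    s = [0.0] * n
    signs = []

    def potential(v, eps):
        # Potential sum_i cosh(lam * s_i) after tentatively adding eps * v.
        return sum(math.cosh(lam * (si + eps * vi)) for si, vi in zip(s, v))

    for v in vectors:
        eps = 1 if potential(v, 1) <= potential(v, -1) else -1
        signs.append(eps)
        s = [si + eps * vi for si, vi in zip(s, v)]
    return signs, max(abs(si) for si in s)
```

The bound follows because each greedy step multiplies the potential by at most $\cosh\lambda \le e^{\lambda^2/2}$, starting from $n$, and $\cosh(\lambda |s_i|) \ge e^{\lambda |s_i|}/2$ for every coordinate.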

Preprint, arXiv, 2012

Hidden cliques and the certification of the restricted isometry property

Compressed sensing is a technique for finding sparse solutions to underdetermined linear systems. This technique relies on properties of the sensing matrix such as the restricted isometry property. Sensing matrices that satisfy this property with optimal parameters are mainly obtained via probabilistic arguments. Deciding whether a given matrix satisfies the restricted isometry property is a non-trivial computational problem. Indeed, we show in this paper that restricted isometry parameters cannot be approximated in polynomial time within any constant factor under the assumption that the hidden clique problem is hard. Moreover, on the positive side we propose an improvement on the brute-force enumeration algorithm for checking the restricted isometry property.
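The brute-force baseline mentioned here enumerates every support set: the restricted isometry constant of order $k$ is the largest deviation from 1 of the eigenvalues of $A_S^\top A_S$ over all column subsets $S$ of size $k$, so the cost grows like $\binom{n}{k}$. The sketch below illustrates that baseline only, not the paper's improved algorithm, and fixes $k = 2$ so the $2 \times 2$ eigenvalues have a closed form; the function name is hypothetical.

```python
import itertools
import math


def rip_constant_order2(A):
    """Brute-force restricted isometry constant delta_2 of A (needs >= 2 columns).

    Enumerates every pair of columns S and takes the worst deviation from 1 of
    the eigenvalues of the 2 x 2 Gram matrix A_S^T A_S. By eigenvalue
    interlacing this also covers the order-1 (single-column) case.
    """
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    delta = 0.0
    for p, q in itertools.combinations(range(n), 2):
        a, c, b = dot(cols[p], cols[p]), dot(cols[q], cols[q]), dot(cols[p], cols[q])
        # Eigenvalues of [[a, b], [b, c]] are mid +/- rad.
        mid, rad = (a + c) / 2, math.hypot((a - c) / 2, b)
        delta = max(delta, abs(mid + rad - 1), abs(mid - rad - 1))
    return delta
```

For two unit-norm columns at an angle of 60 degrees the Gram eigenvalues are 0.5 and 1.5, so $\delta_2 = 0.5$.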

Preprint, arXiv, 2011

A Note on Element-wise Matrix Sparsification via a Matrix-valued Bernstein Inequality

Given an $n \times n$ matrix $A$, we present a simple element-wise sparsification algorithm that zeroes out all sufficiently small elements of $A$ and then retains some of the remaining elements with probabilities proportional to the square of their magnitudes. We analyze the approximation accuracy of the proposed algorithm using a recent, elegant non-commutative Bernstein inequality, and compare our bounds with all existing (to the best of our knowledge) element-wise matrix sparsification algorithms.
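A minimal version of this two-stage scheme is sketched below. The threshold `eps`, the budget `s` that scales the keep-probabilities $p = \min(1, s\,a^2/\|A\|_F^2)$, and the rescaling of kept entries by $1/p$ (which makes the result unbiased for the thresholded matrix) are assumptions for illustration, not the paper's exact parameters, and the function name is hypothetical. With these probabilities the expected number of kept entries is at most `s`.

```python
import random


def sparsify(A, s, eps, seed=0):
    """Hypothetical element-wise sparsifier (threshold, then sample by a^2).

    Entries with |a| <= eps are zeroed. Each remaining entry is kept with
    probability p = min(1, s * a^2 / ||A||_F^2) and rescaled to a / p.
    """
    rng = random.Random(seed)
    fro2 = sum(a * a for row in A for a in row)
    out = []
    for row in A:
        new_row = []
        for a in row:
            if abs(a) <= eps:
                new_row.append(0.0)               # small entry: zero it out
                continue
            p = min(1.0, s * a * a / fro2)
            new_row.append(a / p if rng.random() < p else 0.0)
        out.append(new_row)
    return out
```

Entries large enough to get $p = 1$ are always kept unchanged, so only mid-sized entries are actually randomized.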

Preprint, arXiv, 2011

On the Certification of the Restricted Isometry Property

Compressed sensing is a technique for finding sparse solutions to underdetermined linear systems. This technique relies on properties of the sensing matrix such as the restricted isometry property. Sensing matrices that satisfy the restricted isometry property with optimal parameters are mainly obtained via probabilistic arguments. Given any matrix, deciding whether it satisfies the restricted isometry property is a non-trivial computational problem. In this paper, we give reductions from dense subgraph problems to the certification of the restricted isometry property. This gives evidence that certifying the restricted isometry property is unlikely to be feasible in polynomial time. Moreover, on the positive side we propose an improvement on the brute-force enumeration algorithm for checking the restricted isometry property. Another contribution of independent interest is a spectral algorithm for certifying that a random graph does not contain any dense $k$-subgraph. This "skewed spectral algorithm" performs better than the basic spectral algorithm in a certain range of parameters.

Preprint, arXiv, 2010

Low Dimensional Euclidean Volume Preserving Embeddings

Let $\mathcal{P}$ be an $n$-point subset of Euclidean space and $d \geq 3$ be an integer. In this paper we study the following question: what is the smallest (normalized) relative change of the volume of subsets of $\mathcal{P}$ when it is projected into $\mathbb{R}^d$? We prove that there exists a linear mapping $f\colon \mathcal{P} \to \mathbb{R}^d$ that relatively preserves the volume of all subsets of size up to $\lfloor d/2 \rfloor$ within at most a factor of $O(n^{2/d}\sqrt{\log n \log\log n})$.

Preprint, arXiv, 2010

Low Rank Matrix-Valued Chernoff Bounds and Approximate Matrix Multiplication

In this paper we develop algorithms for approximating matrix multiplication with respect to the spectral norm. Let $A \in \mathbb{R}^{n \times m}$ and $B \in \mathbb{R}^{n \times p}$ be two matrices and $\varepsilon > 0$. We approximate the product $A^\top B$ using two down-sampled sketches, $\tilde{A} \in \mathbb{R}^{t \times m}$ and $\tilde{B} \in \mathbb{R}^{t \times p}$, where $t \ll n$, such that $\|\tilde{A}^\top \tilde{B} - A^\top B\| \leq \varepsilon \|A\| \|B\|$ with high probability. We use two different sampling procedures for constructing $\tilde{A}$ and $\tilde{B}$: one samples rows of $A$ and $B$ i.i.d. and non-uniformly, and the other takes random linear combinations of their rows. We prove bounds that depend only on the intrinsic dimensionality of $A$ and $B$, that is, their rank and their stable rank, namely the squared ratio between their Frobenius and operator norms. To obtain bounds that depend on rank, we employ standard tools from high-dimensional geometry, such as concentration of measure arguments combined with elaborate $\varepsilon$-net constructions. For bounds that depend on the smaller parameter of stable rank, this technology by itself seems weak. However, we show that, combined with a simple truncation argument, it is amenable to providing such bounds. To handle similar bounds for row sampling, we develop a novel matrix-valued Chernoff bound inequality, which we call the low rank matrix-valued Chernoff bound. Thanks to this inequality, we are able to give bounds that depend only on the stable rank of the input matrices...
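The row-sampling procedure can be sketched as follows: draw $t$ row indices i.i.d. with probabilities $p_i$ and sum the rescaled outer products $A_i^\top B_i / (t p_i)$, which equals $\tilde{A}^\top \tilde{B}$ when the sketches hold the sampled rows scaled by $1/\sqrt{t p_i}$. Taking $p_i \propto \|A_i\| \|B_i\|$ is an assumption of this sketch (a standard choice for such estimators), since the abstract does not state the exact distribution; the function name is hypothetical.

```python
import math
import random


def sampled_product(A, B, t, seed=0):
    """Estimate A^T B from t i.i.d. row samples (A is n x m, B is n x p).

    Row i is drawn with probability p_i ∝ ||A_i|| * ||B_i|| (assumed), and each
    sampled outer product A_i^T B_i is divided by t * p_i, so the estimate is
    unbiased. Assumes at least one row pair has nonzero weight.
    """
    n, m, p = len(A), len(A[0]), len(B[0])

    def row_norm(r):
        return math.sqrt(sum(x * x for x in r))

    weights = [row_norm(A[i]) * row_norm(B[i]) for i in range(n)]
    total = sum(weights)
    rng = random.Random(seed)
    est = [[0.0] * p for _ in range(m)]
    for i in rng.choices(range(n), weights=weights, k=t):
        scale = total / (t * weights[i])  # equals 1 / (t * p_i)
        for c in range(m):
            for d in range(p):
                est[c][d] += scale * A[i][c] * B[i][d]
    return est
```

If only one row pair is nonzero, every draw picks it and the estimate is exact, which makes a quick sanity check.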

Preprint, arXiv, 2010

Random Projections for $k$-means Clustering

This paper discusses the topic of dimensionality reduction for $k$-means clustering. We prove that any set of $n$ points in $d$ dimensions (rows in a matrix $A \in \mathbb{R}^{n \times d}$) can be projected into $t = \Omega(k/\varepsilon^2)$ dimensions, for any $\varepsilon \in (0, 1/3)$, in $O(nd \lceil \varepsilon^{-2} k / \log(d) \rceil)$ time, such that with constant probability the optimal $k$-partition of the point set is preserved within a factor of $2 + \varepsilon$. The projection is done by post-multiplying $A$ with a $d \times t$ random matrix $R$ having entries $+1/\sqrt{t}$ or $-1/\sqrt{t}$ with equal probability. A numerical implementation of our technique and experiments on a large dataset of face images verify the speed and the accuracy of our theoretical results.
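The projection step described above can be written directly. In the sketch below the helper names are hypothetical and the clustering itself is left out: one would run any $k$-means solver on the projected rows and then score the resulting partition with `kmeans_cost` on the original points, which is, as we read the abstract, the quantity the $2 + \varepsilon$ guarantee refers to.

```python
import math
import random


def random_sign_projection(A, t, seed=0):
    """Return A @ R for the rows of A (n x d), where R is d x t with entries
    +1/sqrt(t) or -1/sqrt(t) with equal probability."""
    d = len(A[0])
    rng = random.Random(seed)
    s = 1.0 / math.sqrt(t)
    R = [[s if rng.random() < 0.5 else -s for _ in range(t)] for _ in range(d)]
    return [[sum(row[i] * R[i][c] for i in range(d)) for c in range(t)] for row in A]


def kmeans_cost(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    cost = 0.0
    for members in clusters.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        cost += sum(sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members)
    return cost
```

Because the projection is linear, the centroid of a projected cluster is the projection of the original centroid, so the projected cost of a fixed partition is a sum of projected squared norms of the deviations.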
