Source author record

Bartłomiej Wróblewski

Bartłomiej Wróblewski appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Data Structures and Algorithms · Discrete Mathematics · Distributed, Parallel, and Cluster Computing · math.CO

Catalog footprint

2 works · 4 topics · 4 close collaborators


Preprint · 2026 · arXiv

Parallel Scan on Ascend AI Accelerators

We design and implement parallel prefix sum (scan) algorithms using Ascend AI accelerators. Ascend accelerators feature specialized computing units: the cube units for efficient matrix multiplication and the vector units for optimized vector operations. A key feature of the proposed scan algorithms is their extensive use of matrix multiplications and accumulations enabled by the cube unit. To showcase the effectiveness of these algorithms, we also implement and evaluate several scan-based operators commonly used in AI workloads, including sorting, tensor masking, and top-$k$ / top-$p$ sampling. Our single-core results demonstrate substantial performance improvements, with speedups ranging from $5\times$ to $9.6\times$ compared to vector-only implementations for sufficiently large input lengths. Additionally, we present a multi-core scan algorithm that fully utilizes both the cube and vector units of Ascend, reaching up to 74.9\% of the memory bandwidth achieved by memory copy. Furthermore, our radix sort implementation, which utilizes matrix multiplications for its parallel splits, showcases the potential of matrix engines to enhance complex operations, offering up to $3.3\times$ speedup over the vector-only baseline.
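The central idea in this abstract, computing a scan with matrix multiplications, can be illustrated on a CPU. Multiplying a row vector by an upper-triangular matrix of ones yields its inclusive prefix sums, so a long input can be tiled into rows and scanned locally with one matrix product. The sketch below is a minimal illustration under that assumption and is not the paper's Ascend implementation. The tile size `s`, the naive `matmul` stand-in for the cube unit, the sequential carry of row totals, and the names `matmul_scan`, `split_by_bit` and `radix_sort` are all hypothetical. The radix sort shows how a scan drives a stable "split" by one key bit, which is the role the abstract assigns to matrix multiplication in its radix sort.

```python
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product; a stand-in for a hardware matrix engine."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for row in a]


def upper_ones(s: int) -> Matrix:
    """U[t][j] = 1 if t <= j, so (row @ U)[j] = row[0] + ... + row[j]."""
    return [[1 if t <= j else 0 for j in range(s)] for t in range(s)]


def matmul_scan(x: List[int], s: int = 4) -> List[int]:
    """Inclusive scan of x using tiles of width s (hypothetical parameter).

    1. Zero-pad x and view it as a (rows x s) matrix A.
    2. A @ U gives the local prefix sums of every row in one product.
    3. A sequential pass adds the running total of all earlier rows.
    """
    n = len(x)
    rows = (n + s - 1) // s
    padded = list(x) + [0] * (rows * s - n)
    a = [padded[r * s:(r + 1) * s] for r in range(rows)]
    local = matmul(a, upper_ones(s))
    out: List[int] = []
    offset = 0
    for row in local:
        out.extend(v + offset for v in row)
        offset += row[-1]  # last local prefix sum is the row total
    return out[:n]


def split_by_bit(keys: List[int], bit: int) -> List[int]:
    """Stable partition: keys whose `bit` is 0 first, then keys whose bit is 1."""
    flags = [(k >> bit) & 1 for k in keys]
    zero_scan = matmul_scan([1 - f for f in flags])  # rank among zero keys
    one_scan = matmul_scan(flags)                     # rank among one keys
    total_zeros = zero_scan[-1] if keys else 0
    out = [0] * len(keys)
    for i, k in enumerate(keys):
        if flags[i]:
            out[total_zeros + one_scan[i] - 1] = k
        else:
            out[zero_scan[i] - 1] = k
    return out


def radix_sort(keys: List[int], bits: int = 8) -> List[int]:
    """LSD radix sort for non-negative keys below 2**bits, one split per bit."""
    for b in range(bits):
        keys = split_by_bit(keys, b)
    return keys
```

For example, `matmul_scan([1, 2, 3, 4, 5])` returns `[1, 3, 6, 10, 15]`, and `radix_sort([5, 3, 8, 1, 9, 2], bits=4)` returns `[1, 2, 3, 5, 8, 9]`. The carry in step 3 is sequential here only for readability. The row totals could themselves be scanned by another triangular product.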

Preprint · 2020 · arXiv

Efficient fully dynamic elimination forests with applications to detecting long paths and cycles

We present a data structure that in a dynamic graph of treedepth at most $d$, which is modified over time by edge insertions and deletions, maintains an optimum-height elimination forest. The data structure achieves worst-case update time $2^{{\cal O}(d^2)}$, which matches the best known parameter dependency in the running time of a static fpt algorithm for computing the treedepth of a graph. This improves a result of Dvořák et al. [ESA 2014], who for the same problem achieved update time $f(d)$ for some non-elementary (i.e. tower-exponential) function $f$. As a by-product, we improve known upper bounds on the sizes of minimal obstructions for having treedepth $d$ from doubly-exponential in $d$ to $d^{{\cal O}(d)}$. As applications, we design new fully dynamic parameterized data structures for detecting long paths and cycles in general graphs. More precisely, for a fixed parameter $k$ and a dynamic graph $G$, modified over time by edge insertions and deletions, our data structures maintain answers to the following queries:

- Does $G$ contain a simple path on $k$ vertices?
- Does $G$ contain a simple cycle on at least $k$ vertices?

In the first case, the data structure achieves amortized update time $2^{{\cal O}(k^2)}$. In the second case, the amortized update time is $2^{{\cal O}(k^4)} + {\cal O}(k \log n)$. In both cases we assume access to a dictionary on the edges of $G$.
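The abstract rests on the notion of an elimination forest. This is a rooted forest on the vertices of $G$ in which every edge of $G$ joins a vertex to one of its ancestors. The treedepth of $G$ is the minimum height of such a forest. The sketch below only checks that defining property for a static graph and measures a forest's height. It is not the paper's dynamic data structure and does not compute an optimum forest. The parent-map representation and the names `root_path`, `is_elimination_forest` and `forest_height` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

Vertex = Hashable
Parent = Dict[Vertex, Optional[Vertex]]  # roots map to None


def root_path(v: Vertex, parent: Parent) -> List[Vertex]:
    """Return v followed by its ancestors up to the root (assumes no cycles)."""
    path: List[Vertex] = []
    current: Optional[Vertex] = v
    while current is not None:
        path.append(current)
        current = parent[current]
    return path


def is_elimination_forest(edges: Iterable[Tuple[Vertex, Vertex]],
                          parent: Parent) -> bool:
    """True if every edge joins a vertex to one of its ancestors in the forest."""
    for u, v in edges:
        if u not in root_path(v, parent) and v not in root_path(u, parent):
            return False
    return True


def forest_height(parent: Parent) -> int:
    """Height counted in vertices on the longest root-to-leaf path."""
    return max((len(root_path(v, parent)) for v in parent), default=0)
```

For the path a–b–c, rooting the forest at the middle vertex (`{"b": None, "a": "b", "c": "b"}`) passes the check with height 2. Rooting it at `a` with `b` and `c` as siblings fails, because the edge b–c then joins two vertices where neither is an ancestor of the other.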