Researcher profile

Bartłomiej Wróblewski

Bartłomiej Wróblewski contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
4 topics
4 close collaborators

Published work

2 published items

Preprint, 2026, arXiv

Parallel Scan on Ascend AI Accelerators

We design and implement parallel prefix sum (scan) algorithms using Ascend AI accelerators. Ascend accelerators feature specialized computing units: the cube units for efficient matrix multiplication and the vector units for optimized vector operations. A key feature of the proposed scan algorithms is their extensive use of matrix multiplications and accumulations enabled by the cube unit. To showcase the effectiveness of these algorithms, we also implement and evaluate several scan-based operators commonly used in AI workloads, including sorting, tensor masking, and top-$k$ / top-$p$ sampling. Our single-core results demonstrate substantial performance improvements, with speedups ranging from $5\times$ to $9.6\times$ compared to vector-only implementations for sufficiently large input lengths. Additionally, we present a multi-core scan algorithm that fully utilizes both the cube and vector units of Ascend, reaching up to 74.9% of the memory bandwidth achieved by memory copy. Furthermore, our radix sort implementation, which utilizes matrix multiplications for its parallel splits, showcases the potential of matrix engines to enhance complex operations, offering up to $3.3\times$ speedup over the vector-only baseline.
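As a rough illustration of how a prefix sum can be phrased in terms of matrix multiplication, the sketch below reshapes the input into tiles and multiplies by triangular matrices of ones. This is a generic textbook-style formulation and is not taken from the paper: the function names `matmul` and `matmul_scan`, the `tile` parameter, and the use of plain Python loops in place of a hardware matrix unit are all assumptions made for illustration.

```python
def matmul(a, b):
    """Plain triple-loop matrix product for lists of lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def matmul_scan(values, tile=4):
    """Inclusive prefix sum of `values`, computed with matrix products.

    Hypothetical helper: `tile` stands in for the width a matrix unit
    would process at once; it is not a parameter from the paper.
    """
    n = len(values)
    if n == 0:
        return []
    rows = -(-n // tile)  # ceiling division
    padded = list(values) + [0] * (rows * tile - n)
    a = [padded[r * tile:(r + 1) * tile] for r in range(rows)]

    # upper[k][j] = 1 when k <= j, so (A @ upper)[i][j] = A[i][0] + ... + A[i][j].
    upper = [[1 if k <= j else 0 for j in range(tile)] for k in range(tile)]
    local = matmul(a, upper)  # independent scan of each row

    # Strictly lower-triangular ones: row i sums the totals of rows 0..i-1.
    lower = [[1 if k < i else 0 for k in range(rows)] for i in range(rows)]
    totals = [[row[-1]] for row in local]  # column vector of row sums
    offsets = matmul(lower, totals)

    flat = [local[i][j] + offsets[i][0]
            for i in range(rows) for j in range(tile)]
    return flat[:n]  # drop the zero padding


result = matmul_scan([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```

Both the per-row scans and the row offsets come from products with constant triangular matrices, which is the kind of work a matrix engine can take over from a vector unit.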

Preprint, 2020, arXiv

Efficient fully dynamic elimination forests with applications to detecting long paths and cycles

We present a data structure that in a dynamic graph of treedepth at most $d$, which is modified over time by edge insertions and deletions, maintains an optimum-height elimination forest. The data structure achieves worst-case update time $2^{{\cal O}(d^2)}$, which matches the best known parameter dependency in the running time of a static fpt algorithm for computing the treedepth of a graph. This improves a result of Dvořák et al. [ESA 2014], who for the same problem achieved update time $f(d)$ for some non-elementary (i.e. tower-exponential) function $f$. As a by-product, we improve known upper bounds on the sizes of minimal obstructions for having treedepth $d$ from doubly-exponential in $d$ to $d^{{\cal O}(d)}$. As applications, we design new fully dynamic parameterized data structures for detecting long paths and cycles in general graphs. More precisely, for a fixed parameter $k$ and a dynamic graph $G$, modified over time by edge insertions and deletions, our data structures maintain answers to the following queries:

- Does $G$ contain a simple path on $k$ vertices?
- Does $G$ contain a simple cycle on at least $k$ vertices?

In the first case, the data structure achieves amortized update time $2^{{\cal O}(k^2)}$. In the second case, the amortized update time is $2^{{\cal O}(k^4)} + {\cal O}(k \log n)$. In both cases we assume access to a dictionary on the edges of $G$.
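For readers unfamiliar with the central object, an elimination forest of a graph is a rooted forest on the same vertices in which every graph edge joins an ancestor to a descendant; the treedepth is the minimum height of such a forest. The following is a minimal static sketch of that definition only, not of the dynamic data structure in the abstract. The parent-map representation and the helper names `ancestors`, `is_elimination_forest`, and `forest_height` are assumptions chosen for illustration.

```python
def ancestors(parent, v):
    """Return the set containing v and all of its ancestors.

    `parent` maps each vertex to its parent, or to None for a root
    (hypothetical representation). The `seen` check guards against
    malformed input that contains a cycle.
    """
    seen = set()
    while v is not None and v not in seen:
        seen.add(v)
        v = parent[v]
    return seen


def is_elimination_forest(parent, edges):
    """True when every graph edge joins an ancestor-descendant pair."""
    return all(u in ancestors(parent, v) or v in ancestors(parent, u)
               for u, v in edges)


def forest_height(parent):
    """Number of vertices on the longest root-to-leaf path."""
    return max((len(ancestors(parent, v)) for v in parent), default=0)


# Path graph a - b - c - d.
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]

# Rooted at c, with b and d below c, and a below b.
good = {"c": None, "b": "c", "d": "c", "a": "b"}

# Rooted at a, with b and c as siblings: edge b - c is not ancestor-descendant.
bad = {"a": None, "b": "a", "c": "a", "d": "c"}

good_ok = is_elimination_forest(good, path_edges)
bad_ok = is_elimination_forest(bad, path_edges)
good_height = forest_height(good)
```

Here `good` is a valid elimination forest of height 3 for the four-vertex path, while `bad` is rejected because `b` and `c` are adjacent in the graph but sit in different branches.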