Source author record

Francesco Silvestri

Francesco Silvestri appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: Data Structures and Algorithms; Distributed, Parallel, and Cluster Computing; Machine Learning; Databases; Information Retrieval; Computational Engineering, Finance, and Science; Computational Geometry; Computer Vision; cond-mat.mtrl-sci; Discrete Mathematics; Hardware Architecture; math.OC; Mathematical Software; Quantitative Methods

Catalog footprint

What is connected

20 works

14 topics

4 close collaborators

preprint, 2026, arXiv

How many users have been here for a long time? Efficient solutions for counting long aggregated visits

This paper addresses the Counting Long Aggregated Visits problem, which is defined as follows. We are given $n$ users and $m$ regions, where each user spends some time visiting some regions. For a parameter $k$ and a query consisting of a subset of $r$ regions, the task is to count the number of distinct users whose aggregate time spent visiting the query regions is at least $k$. This problem is motivated by queries arising in the analysis of large-scale mobility datasets. We present several exact and approximate data structures for supporting counting long aggregated visits, as well as conditional and unconditional lower bounds. First, we describe an exact data structure that exhibits a space-time tradeoff, as well as efficient approximate solutions based on sampling and sketching techniques. We then study the problem in geometric settings where regions are points in $\mathbb{R}^d$ and queries are hyperrectangles, and derive exact data structures that achieve improved performance in these structured spaces.
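
The problem statement above can be made concrete with a brute-force baseline. This is a minimal sketch, not one of the paper's data structures: it scans every user per query, and the `visits` layout and the function name are assumptions chosen for illustration.

```python
from typing import Dict, Iterable

# Hypothetical layout: visits[user][region] = time the user spent in that region.
Visits = Dict[str, Dict[str, float]]


def count_long_aggregated_visits(
    visits: Visits, query_regions: Iterable[str], k: float
) -> int:
    """Count distinct users whose total time over the query regions is at least k."""
    regions = set(query_regions)
    count = 0
    for per_region in visits.values():
        # Aggregate this user's time over the queried regions only.
        total = sum(t for region, t in per_region.items() if region in regions)
        if total >= k:
            count += 1
    return count


visits = {
    "u1": {"A": 3.0, "B": 4.0},
    "u2": {"A": 1.0, "C": 9.0},
    "u3": {"B": 2.0},
}
result = count_long_aggregated_visits(visits, ["A", "B"], 5.0)
```

The scan touches every recorded visit on every query, which is the cost that the exact, sampling-based and sketching-based structures described in the abstract aim to avoid.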

preprint, 2022, arXiv

Blocking Techniques for Sparse Matrix Multiplication on Tensor Accelerators

Tensor accelerators have gained popularity because they provide a cheap and efficient solution for speeding up computationally expensive tasks in Deep Learning and, more recently, in other Scientific Computing applications. However, since their features are specifically designed for tensor algebra (typically dense matrix products), it is commonly assumed that they are not suitable for applications with sparse data. To challenge this viewpoint, we discuss methods and present solutions for accelerating sparse matrix multiplication on such architectures. In particular, we present a 1-dimensional blocking algorithm with theoretical guarantees on the density, which builds dense blocks from arbitrary sparse matrices. Experimental results show that, even for unstructured and highly sparse matrices, our block-based solution, which exploits Nvidia Tensor Cores, is faster than its sparse counterpart. We observed significant speed-ups of up to two orders of magnitude on real-world sparse matrices.

preprint, 2022, arXiv

Understanding the Role of Non-Fullerene Acceptors Crystallinity on the Charge Transport Properties and Performance of Organic Solar Cells

The active layer crystallinity has long been associated with favourable organic solar cell (OSC) properties such as high mobility and Fill Factor. In particular, this applies to acceptor materials such as fullerene derivatives and the most recent Non-Fullerene Acceptors (NFAs), which are now surpassing 19% Power Conversion Efficiency. Although these advantages are commonly attributed to their 3-dimensional crystal packing motif in the single crystal, the bridge that links the acceptor crystal packing from single crystals to solar cells has not yet been clearly shown. In this work, we investigate the molecular organisation of seven NFAs (o-IDTBR, IDIC, ITIC, m-ITIC, 4TIC, 4TICO, m-4TICO), following the evolution of their packing motif in single crystals, powders and thin films made with pure NFAs and donor:NFA blends. In general, we observed a good correlation between the NFA single-crystal packing and their molecular arrangement in the bulk heterojunction. However, the NFA packing motif does not directly affect the device parameters; rather, it influences the material's propensity to form highly crystalline domains in the blend. Although NFA crystallinity is required to obtain high mobility, domain purity is more important for limiting bimolecular recombination and obtaining high-efficiency organic solar cells.

preprint, 2021, arXiv

Sampling a Near Neighbor in High Dimensions -- Who is the Fairest of Them All?

Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. Given a set of points $S$ and a radius parameter $r>0$, the $r$-near neighbor ($r$-NN) problem asks for a data structure that, given any query point $q$, returns a point $p$ within distance at most $r$ from $q$. In this paper, we study the $r$-NN problem in the light of individual fairness and providing equal opportunities: all points that are within distance $r$ from the query should have the same probability to be returned. In the low-dimensional case, this problem was first studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the theoretically strongest approach to similarity search in high dimensions, does not provide such a fairness guarantee. In this work, we show that LSH based algorithms can be made fair, without a significant loss in efficiency. We propose several efficient data structures for the exact and approximate variants of the fair NN problem. Our approach works more generally for sampling uniformly from a sub-collection of sets of a given collection and can be used in a few other applications. We also develop a data structure for fair similarity search under inner product that requires nearly-linear space and exploits locality sensitive filters. The paper concludes with an experimental evaluation that highlights the inherent unfairness of NN data structures and shows the performance of our algorithms on real-world datasets.
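
The fairness requirement can be illustrated with a linear-scan reference implementation. This is only a sketch under stated assumptions: it uses Euclidean distance and scans all of $S$, so it is not the LSH-based structure of the paper, and `sample_near_neighbor` is a hypothetical name.

```python
import math
import random
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sample_near_neighbor(
    points: Sequence[Point], q: Point, r: float, rng: random.Random
) -> Optional[Point]:
    """Return a point within distance r of q, chosen uniformly among all such points."""
    # Collect every r-near neighbor; each one must be equally likely to be returned.
    near = [p for p in points if math.dist(p, q) <= r]
    if not near:
        return None
    return rng.choice(near)


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
rng = random.Random(0)
sample = sample_near_neighbor(points, (0.0, 0.0), 1.5, rng)
```

The scan costs time linear in the size of $S$ per query; the point of the abstract is that a comparable uniformity guarantee can be obtained without giving up the efficiency of LSH.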

preprint, 2020, arXiv

A Computational Model for Tensor Core Units

To respond to the need for efficient training and inference of deep neural networks, a plethora of domain-specific hardware architectures have been introduced, such as Google Tensor Processing Units and NVIDIA Tensor Cores. A common feature of these architectures is a hardware circuit for efficiently computing a dense matrix multiplication of a given small size. In order to broaden the class of algorithms that exploit these systems, we propose a computational model, named the TCU model, that captures the ability to natively multiply small matrices. We then use the TCU model for designing fast algorithms for several problems, including matrix operations (dense and sparse multiplication, Gaussian Elimination), graph algorithms (transitive closure, all pairs shortest distances), Discrete Fourier Transform, stencil computations, integer multiplication, and polynomial evaluation. We finally highlight a relation between the TCU model and the external memory model.
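
One way to picture the idea of a native small-matrix product is ordinary blocked matrix multiplication in which every tile product is delegated to a single primitive. The sketch below assumes square inputs and a hypothetical tile side `s`; `tile_multiply` merely stands in for the hardware circuit, and none of this is taken from the paper's algorithms.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def tile_multiply(a: Matrix, b: Matrix) -> Matrix:
    """Stand-in for the hardware primitive: dense product of two s x s tiles."""
    s = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(s)) for j in range(s)]
            for i in range(s)]


def get_tile(m: Matrix, r: int, c: int, s: int) -> Matrix:
    """Copy the s x s tile of m whose top-left corner is (r, c)."""
    return [row[c:c + s] for row in m[r:r + s]]


def blocked_matmul(a: Matrix, b: Matrix, s: int) -> Tuple[Matrix, int]:
    """Multiply two n x n matrices using only s x s tile products.

    Returns the product and the number of calls to the tile primitive.
    """
    n = len(a)
    if n % s != 0:
        raise ValueError("matrix side must be a multiple of the tile side")
    c = [[0.0] * n for _ in range(n)]
    calls = 0
    for i in range(0, n, s):
        for j in range(0, n, s):
            for k in range(0, n, s):
                p = tile_multiply(get_tile(a, i, k, s), get_tile(b, k, j, s))
                calls += 1
                # Accumulate the partial product into the output tile.
                for x in range(s):
                    for y in range(s):
                        c[i + x][j + y] += p[x][y]
    return c, calls


a = [[float(4 * i + j) for j in range(4)] for i in range(4)]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
product, calls = blocked_matmul(a, identity, 2)
```

With this schedule the primitive is invoked $(n/s)^3$ times, so the cost of an algorithm can be expressed in tile products rather than scalar operations.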

preprint, 2020, arXiv

Fair Near Neighbor Search: Independent Range Sampling in High Dimensions

Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. There are several variants of the similarity search problem, and one of the most relevant is the $r$-near neighbor ($r$-NN) problem: given a radius $r>0$ and a set of points $S$, construct a data structure that, for any given query point $q$, returns a point $p$ within distance at most $r$ from $q$. In this paper, we study the $r$-NN problem in the light of fairness. We consider fairness in the sense of equal opportunity: all points that are within distance $r$ from the query should have the same probability to be returned. In the low-dimensional case, this problem was first studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the theoretically strongest approach to similarity search in high dimensions, does not provide such a fairness guarantee. To address this, we propose efficient data structures for $r$-NN where all points in $S$ that are near $q$ have the same probability to be selected and returned by the query. Specifically, we first propose a black-box approach that, given any LSH scheme, constructs a data structure for uniformly sampling points in the neighborhood of a query. Then, we develop a data structure for fair similarity search under inner product that requires nearly-linear space and exploits locality sensitive filters. The paper concludes with an experimental evaluation that highlights (un)fairness in a recommendation setting on real-world datasets and discusses the inherent unfairness introduced by solving other variants of the problem.

preprint, 2020, arXiv

Similarity Search with Tensor Core Units

Tensor Core Units (TCUs) are hardware accelerators developed for deep neural networks, which efficiently support the multiplication of two dense $\sqrt{m}\times \sqrt{m}$ matrices, where $m$ is a given hardware parameter. In this paper, we show that TCUs can speed up similarity search problems as well. We propose algorithms for the Johnson-Lindenstrauss dimensionality reduction and for similarity join that, by leveraging TCUs, achieve a $\sqrt{m}$ speedup with respect to traditional approaches.

preprint, 2016, arXiv

Approximate Furthest Neighbor with Application to Annulus Query

Much recent work has been devoted to approximate nearest neighbor queries. Motivated by applications in recommender systems, we consider approximate furthest neighbor (AFN) queries and present a simple, fast, and highly practical data structure for answering AFN queries in high-dimensional Euclidean space. The method builds on the technique of Indyk (SODA 2003), storing random projections to provide sublinear query time for AFN. However, we introduce a different query algorithm, improving on Indyk's approximation factor and reducing the running time by a logarithmic factor. We also present a variation based on a query-independent ordering of the database points; while this does not have the provable approximation factor of the query-dependent data structure, it offers significant improvement in time and space complexity. We give a theoretical analysis and experimental results. As an application, the query-dependent approach is used for deriving a data structure for the approximate annulus query problem, which is defined as follows: given an input set $S$ and two parameters $r > 0$ and $w \geq 1$, construct a data structure that returns for each query point $q$ a point $p \in S$ such that the distance between $p$ and $q$ is at least $r/w$ and at most $wr$.
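
The annulus query condition at the end of the abstract can be checked directly by a linear scan. The following is a sketch of the definition only, assuming Euclidean distance; it is not the projection-based data structure, and `annulus_query` is a hypothetical name.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


def annulus_query(
    points: Sequence[Point], q: Point, r: float, w: float
) -> Optional[Point]:
    """Return some point whose distance from q lies in [r / w, w * r], if any."""
    if r <= 0 or w < 1:
        raise ValueError("expected r > 0 and w >= 1")
    lo, hi = r / w, w * r
    for p in points:
        if lo <= math.dist(p, q) <= hi:
            return p
    return None


points = [(0.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
hit = annulus_query(points, (0.0, 0.0), 2.0, 2.0)    # accepts distances in [1, 4]
miss = annulus_query(points, (0.0, 0.0), 20.0, 1.5)  # accepts distances in about [13.3, 30]
```

Setting `w = 1.0` degenerates to asking for a point at distance exactly `r`, which shows how the parameter `w` relaxes the query.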

preprint, 2016, arXiv

Distance Sensitive Bloom Filters Without False Negatives

A Bloom filter is a widely used data structure for representing a set $S$ and answering queries of the form "Is $x$ in $S$?". By allowing some false positive answers (saying "yes" when the answer is in fact "no"), Bloom filters use space significantly below what is required for storing $S$. In the distance sensitive setting we work with a set $S$ of (Hamming) vectors and seek a data structure that offers a similar trade-off, but answers queries of the form "Is $x$ close to an element of $S$?" (in Hamming distance). Previous work on distance sensitive Bloom filters has accepted false positive and false negative answers. Absence of false negatives is of critical importance in many applications of Bloom filters, so it is natural to ask if this can also be achieved in the distance sensitive setting. Our main contributions are upper and lower bounds (that are tight in several cases) for space usage in the distance sensitive setting where false negatives are not allowed.
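
For readers unfamiliar with the starting point, here is a minimal sketch of a classic (exact-membership) Bloom filter. It is not the distance sensitive variant studied in the paper; the sizes, the SHA-256-based hashing and the class name are illustrative assumptions.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Classic Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # True means "possibly in the set"; False means "definitely not".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(num_bits=1024, num_hashes=3)
for word in ["alpha", "beta", "gamma"]:
    bf.add(word)
```

Every inserted item sets all of its positions, so a later lookup of that item always succeeds; this one-sided error is the property the paper asks for in the Hamming-distance setting.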

preprint, 2016, arXiv

On the Complexity of Inner Product Similarity Join

A number of tasks in classification, information retrieval, recommendation systems, and record linkage reduce to the core problem of inner product similarity join (IPS join): identifying pairs of vectors in a collection that have a sufficiently large inner product. IPS join is well understood when vectors are normalized and some approximation of inner products is allowed. However, the general case where vectors may have any length appears much more challenging. Recently, new upper bounds based on asymmetric locality-sensitive hashing (ALSH) and asymmetric embeddings have emerged, but little has been known on the lower bound side. In this paper we initiate a systematic study of inner product similarity join, showing new lower and upper bounds. Our main results are:
* Approximation hardness of IPS join in subquadratic time, assuming the strong exponential time hypothesis.
* New upper and lower bounds for (A)LSH-based algorithms. In particular, we show that asymmetry can be avoided by relaxing the LSH definition to only consider the collision probability of distinct elements.
* A new indexing method for IPS based on linear sketches, implying that our hardness results are not far from being tight.
Our technical contributions include new asymmetric embeddings that may be of independent interest. At the conceptual level we strive to provide greater clarity, for example by distinguishing among signed and unsigned variants of IPS join and shedding new light on the effect of asymmetry.
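
The quadratic-time baseline that the hardness result refers to can be sketched as a nested loop over two vector collections. The threshold `s`, the two-collection form and the reading of "unsigned" as comparing the absolute value of the inner product are assumptions made for illustration, not definitions quoted from the paper.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def ips_join(
    left: Sequence[Vector], right: Sequence[Vector], s: float, signed: bool = True
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) whose inner product is at least s.

    With signed=False the absolute value of the inner product is compared instead.
    """
    pairs = []
    for i, p in enumerate(left):
        for j, q in enumerate(right):
            ip = sum(x * y for x, y in zip(p, q))
            score = ip if signed else abs(ip)
            if score >= s:
                pairs.append((i, j))
    return pairs


left = [(1.0, 0.0), (0.0, 2.0)]
right = [(3.0, 0.0), (0.0, -1.0)]
signed_pairs = ips_join(left, right, 2.0)
unsigned_pairs = ips_join(left, right, 2.0, signed=False)
```

In this example the pair `(1, 1)` has inner product $-2$, so it is reported only by the unsigned variant.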

preprint, 2016, arXiv

Symmetry-free SDP Relaxations for Affine Subspace Clustering

We consider clustering problems where the goal is to determine an optimal partition of a given point set in Euclidean space in terms of a collection of affine subspaces. While there is vast literature on heuristics for this kind of problem, such approaches are known to be susceptible to poor initializations and getting trapped in bad local optima. We alleviate these issues by introducing a semidefinite relaxation based on Lasserre's method of moments. While a similar approach is known for classical Euclidean clustering problems, a generalization to our more general subspace scenario is not straightforward, due to the high symmetry of the objective function that weakens any convex relaxation. We therefore introduce a new mechanism for symmetry breaking based on covering the feasible region with polytopes. Additionally, we introduce and analyze a deterministic rounding heuristic.

preprint, 2015, arXiv

Experimental Evaluation of Multi-Round Matrix Multiplication on MapReduce

A common approach in the design of MapReduce algorithms is to minimize the number of rounds. Indeed, there are many examples in the literature of monolithic MapReduce algorithms, which are algorithms requiring just one or two rounds. However, we claim that the design of monolithic algorithms may not be the best approach in cloud systems. Indeed, multi-round algorithms may exploit some features of cloud platforms by suitably setting the round number according to the execution context. In this paper we carry out an experimental study of multi-round MapReduce algorithms aiming at investigating the performance of the multi-round approach. We use matrix multiplication as a case study. We first propose a scalable Hadoop library, named M$_3$, for matrix multiplication in the dense and sparse cases, which allows trading off the number of rounds against the amount of data shuffled in each round and the amount of memory required by reduce functions. Then, we present an extensive study of this library on an in-house cluster and on Amazon Web Services aiming at showing its performance and at comparing monolithic and multi-round approaches. The experiments show that, even without low-level optimization, it is possible to design multi-round algorithms with a small running time overhead.

preprint, 2015, arXiv

Exploiting non-constant safe memory in resilient algorithms and data structures

We extend the Faulty RAM model by Finocchi and Italiano (2008) by adding a safe memory of arbitrary size $S$, and we then derive tradeoffs between the performance of resilient algorithmic techniques and the size of the safe memory. Let $δ$ and $α$ denote, respectively, the maximum number of faults that can occur during the execution of an algorithm and the actual number of faults that occurred, with $α\leq δ$. We propose a resilient algorithm for sorting $n$ entries which requires $O\left(n\log n+α(δ/S + \log S)\right)$ time and uses $Θ(S)$ safe memory words. Our algorithm outperforms previous resilient sorting algorithms which do not exploit the available safe memory and require $O\left(n\log n+ αδ\right)$ time. Finally, we exploit our sorting algorithm for deriving a resilient priority queue. Our implementation uses $Θ(S)$ safe memory words and $Θ(n)$ faulty memory words for storing $n$ keys, and requires $O\left(\log n + δ/S\right)$ amortized time for each insert and deletemin operation. Our resilient priority queue improves the $O\left(\log n + δ\right)$ amortized time required by the state of the art.

preprint, 2015, arXiv

Subgraph Enumeration in Massive Graphs

We consider the problem of enumerating all instances of a given pattern graph in a large data graph. Our focus is on determining the input/output (I/O) complexity of this problem. Let $E$ be the number of edges in the data graph, $k=O(1)$ be the number of vertices in the pattern graph, $B$ be the block length, and $M$ be the main memory size. The main results of the paper are two algorithms that enumerate all instances of the pattern graph. The first one is a deterministic algorithm that exploits a suitable independent set of the pattern graph of size $1\leq s \leq k/2$ and requires $O\left(E^{k-s}/\left(BM^{k-s-1}\right)\right)$ I/Os. The second algorithm is a randomized algorithm that enumerates all instances in $O\left(E^{k/2}/\left(BM^{k/2-1}\right)\right)$ expected I/Os; the same bound also applies with high probability under some assumptions. A lower bound shows that the deterministic algorithm is optimal for some pattern graphs with $s=k/2$ (e.g., paths and cycles of even length, meshes of even side), while the randomized algorithm is optimal for a wide class of pattern graphs, called Alon class (e.g., cliques, cycles and every graph with a perfect matching).

preprint, 2014, arXiv

Network-Oblivious Algorithms

A framework is proposed for the design and analysis of network-oblivious algorithms, namely, algorithms that can run unchanged, yet efficiently, on a variety of machines characterized by different degrees of parallelism and communication capabilities. The framework prescribes that a network-oblivious algorithm be specified on a parallel model of computation where the only parameter is the problem's input size, and then evaluated on a model with two parameters, capturing parallelism granularity and communication latency. It is shown that, for a wide class of network-oblivious algorithms, optimality in the latter model implies optimality in the Decomposable BSP model, which is known to effectively describe a wide and significant class of parallel platforms. The proposed framework can be regarded as an attempt to port the notion of obliviousness, well established in the context of cache hierarchies, to the realm of parallel computation. Its effectiveness is illustrated by providing optimal network-oblivious algorithms for a number of key problems. Some limitations of the oblivious approach are also discussed.

preprint, 2014, arXiv

Space-Efficient Parallel Algorithms for Combinatorial Search Problems

We present space-efficient parallel strategies for two fundamental combinatorial search problems, namely, backtrack search and branch-and-bound, both involving the visit of an $n$-node tree of height $h$ under the assumption that a node can be accessed only through its father or its children. For both problems we propose efficient algorithms that run on a $p$-processor distributed-memory machine. For backtrack search, we give a deterministic algorithm running in $O(n/p+h\log p)$ time, and a Las Vegas algorithm requiring optimal $O(n/p+h)$ time, with high probability. Building on the backtrack search algorithm, we also derive a Las Vegas algorithm for branch-and-bound which runs in $O((n/p+h\log p \log n)h\log^2 n)$ time, with high probability. A remarkable feature of our algorithms is the use of only constant space per processor, which constitutes a significant improvement upon previous algorithms whose space requirements per processor depend on the (possibly huge) tree to be explored.

preprint, 2014, arXiv

The Input/Output Complexity of Triangle Enumeration

We consider the well-known problem of enumerating all triangles of an undirected graph. Our focus is on determining the input/output (I/O) complexity of this problem. Let $E$ be the number of edges, $M<E$ the size of internal memory, and $B$ the block size. The best results obtained previously are sort$(E^{3/2})$ I/Os (Dementiev, PhD thesis 2006) and $O(E^2/(MB))$ I/Os (Hu et al., SIGMOD 2013), where sort$(n)$ denotes the number of I/Os for sorting $n$ items. We improve the I/O complexity to $O(E^{3/2}/(\sqrt{M} B))$ expected I/Os, which improves the previous bounds by a factor $\min(\sqrt{E/M},\sqrt{M})$. Our algorithm is cache-oblivious and also I/O optimal: We show that any algorithm enumerating $t$ distinct triangles must always use $Ω(t/(\sqrt{M} B))$ I/Os, and there are graphs for which $t=Ω(E^{3/2})$. Finally, we give a deterministic cache-aware algorithm using $O(E^{3/2}/(\sqrt{M} B))$ I/Os assuming $M\geq E^\varepsilon$ for a constant $\varepsilon > 0$. Our results are based on a new color coding technique, which may be of independent interest.
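
As a point of reference for the problem itself, triangles can be enumerated in main memory with adjacency sets. The sketch below ignores the I/O model entirely and does not use the paper's color coding technique; it assumes vertices are comparable values such as integers, and `enumerate_triangles` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Set, Tuple


def enumerate_triangles(edges: Iterable[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """List every triangle of an undirected graph once, as (u, v, w) with u < v < w."""
    adj: Dict[int, Set[int]] = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops cannot be part of a triangle
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    triangles = []
    for u in sorted(adj):
        for v in sorted(adj[u]):
            if v <= u:
                continue
            # A common neighbor w > v closes the triangle and avoids duplicates.
            for w in sorted(adj[u] & adj[v]):
                if w > v:
                    triangles.append((u, v, w))
    return triangles


k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
triangles = enumerate_triangles(k4_edges)
```

This version relies on random access to adjacency sets, which is exactly what becomes expensive when the graph does not fit in internal memory and motivates the I/O-efficient algorithms of the abstract.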

preprint, 2013, arXiv

Communication Lower Bounds for Distributed-Memory Computations

We give lower bounds on the communication complexity required to solve several computational problems in a distributed-memory parallel machine, namely standard matrix multiplication, stencil computations, comparison sorting, and the Fast Fourier Transform. We revisit the assumptions under which preceding results were derived and provide new lower bounds which use much weaker and appropriate hypotheses. Our bounds rely on a mild assumption on work distribution, and strengthen previous results which require either the computation to be balanced among the processors, or specific initial distributions of the input data, or an upper bound on the size of processors' local memories.

preprint, 2011, arXiv

Space-Round Tradeoffs for MapReduce Computations

This work explores fundamental modeling and algorithmic issues arising in the well-established MapReduce framework. First, we formally specify a computational model for MapReduce which captures the functional flavor of the paradigm by allowing for a flexible use of parallelism. Indeed, the model diverges from a traditional processor-centric view by featuring parameters which embody only global and local memory constraints, thus favoring a more data-centric view. Second, we apply the model to the fundamental computation task of matrix multiplication presenting upper and lower bounds for both dense and sparse matrix multiplication, which highlight interesting tradeoffs between space and round complexity. Finally, building on the matrix multiplication results, we derive further space-round tradeoffs on matrix inversion and matching.

preprint, 2010, arXiv

An Optimized Data Structure for High Throughput 3D Proteomics Data: mzRTree

As an emerging field, MS-based proteomics still requires software tools for efficiently storing and accessing experimental data. In this work, we focus on the management of LC-MS data, which are typically made available in standard XML-based portable formats. The structures that are currently employed to manage these data can be highly inefficient, especially when dealing with high-throughput profile data. LC-MS datasets are usually accessed through 2D range queries. Optimizing this type of operation could dramatically reduce the complexity of data analysis. We propose a novel data structure for LC-MS datasets, called mzRTree, which embodies a scalable index based on the R-tree data structure. mzRTree can be efficiently created from the XML-based data formats and it is suitable for handling very large datasets. We experimentally show that, on all range queries, mzRTree outperforms other known structures used for LC-MS data, even on those queries these structures are optimized for. Besides, mzRTree is also more space efficient. As a result, mzRTree reduces data analysis computational costs for very large profile datasets.
