Researcher profile

Roberto Grossi

Roberto Grossi contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2020arXiv

On the Complexity of Exact Pattern Matching in Graphs: Binary Strings and Bounded Degree

Exact pattern matching in labeled graphs is the problem of searching paths of a graph $G=(V,E)$ that spell the same string as the pattern $P[1..m]$. This basic problem can be found at the heart of more complex operations on variation graphs in computational biology, of query operations in graph databases, and of analysis operations in heterogeneous networks, where the nodes of some paths must match a sequence of labels or types. We describe a simple conditional lower bound that, for any constant $ε>0$, an $O(|E|^{1 - ε} \, m)$-time or an $O(|E| \, m^{1 - ε})$-time algorithm for exact pattern matching on graphs, with node labels and patterns drawn from a binary alphabet, cannot be achieved unless the Strong Exponential Time Hypothesis (SETH) is false. The result holds even if restricted to undirected graphs of maximum degree three or directed acyclic graphs of maximum sum of indegree and outdegree three. Although a conditional lower bound of this kind can be somehow derived from previous results (Backurs and Indyk, FOCS'16), we give a direct reduction from SETH for dissemination purposes, as the result might interest researchers from several areas, such as computational biology, graph database, and graph mining, as mentioned before. Indeed, as approximate pattern matching on graphs can be solved in $O(|E|\,m)$ time, exact and approximate matching are thus equally hard (quadratic time) on graphs under the SETH assumption. In comparison, the same problems restricted to strings have linear time vs quadratic time solutions, respectively, where the latter ones have a matching SETH lower bound on computing the edit distance of two strings (Backurs and Indyk, STOC'15).

preprint2020arXiv

Zuckerli: A New Compressed Representation for Graphs

Zuckerli is a scalable compression system meant for large real-world graphs. Graphs are notoriously challenging structures to store efficiently due to their linked nature, which makes it hard to separate them into smaller, compact components. Therefore, effective compression is crucial when dealing with large graphs, which can have billions of nodes and edges. Furthermore, a good compression system should give the user fast and reasonably flexible access to parts of the compressed data without requiring full decompression, which may be unfeasible on their system. Zuckerli improves multiple aspects of WebGraph, the current state-of-the-art in compressing real-world graphs, by using advanced compression techniques and novel heuristic graph algorithms. It can produce both a compressed representation for storage and one which allows fast direct access to the adjacency lists of the compressed graph without decompressing the entire graph. We validate the effectiveness of Zuckerli on real-world graphs with up to a billion nodes and 90 billion edges, conducting an extensive experimental evaluation of both compression density and decompression performance. We show that Zuckerli-compressed graphs are 10% to 29% smaller, and more than 20% in most cases, with a resource usage for decompression comparable to that of WebGraph.

preprint2019arXiv

Combinatorial Algorithms for String Sanitization

String data are often disseminated to support applications such as location-based service provision or DNA sequence analysis. This dissemination, however, may expose sensitive patterns that model confidential knowledge. In this paper, we consider the problem of sanitizing a string by concealing the occurrences of sensitive patterns, while maintaining data utility, in two settings that are relevant to many common string processing tasks. In the first setting, we aim to generate the minimal-length string that preserves the order of appearance and frequency of all non-sensitive patterns. Such a string allows accurately performing tasks based on the sequential nature and pattern frequencies of the string. To construct such a string, we propose a time-optimal algorithm, TFS-ALGO. We also propose another time-optimal algorithm, PFS-ALGO, which preserves a partial order of appearance of non-sensitive patterns but produces a much shorter string that can be analyzed more efficiently. The strings produced by either of these algorithms are constructed by concatenating non-sensitive parts of the input string. However, it is possible to detect the sensitive patterns by ``reversing'' the concatenation operations. In response, we propose a heuristic, MCSR-ALGO, which replaces letters in the strings output by the algorithms with carefully selected letters, so that sensitive patterns are not reinstated, implausible patterns are not introduced, and occurrences of spurious patterns are prevented. In the second setting, we aim to generate a string that is at minimal edit distance from the original string, in addition to preserving the order of appearance and frequency of all non-sensitive patterns. To construct such a string, we propose an algorithm, ETFS-ALGO, based on solving specific instances of approximate regular expression matching.