Researcher profile

Massimiliano Rossi

Massimiliano Rossi contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
9works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

9 published item(s)

preprint2022arXiv

Computing Maximal Unique Matches with the r-index

In recent years, pangenomes received increasing attention from the scientific community for their ability to incorporate population variation information and alleviate reference genome bias. Maximal Exact Matches (MEMs) and Maximal Unique Matches (MUMs) have proven themselves to be useful in multiple bioinformatic contexts, for example short-read alignment and multiple-genome alignment. However, standard techniques using suffix trees and FM-indexes do not scale to a pangenomic level. Recently, Gagie et al. [JACM 20] introduced the $r$-index that is a Burrows-Wheeler Transform (BWT)-based index able to handle hundreds of human genomes. Later, Rossi et al. [JCB 22] enabled the computation of MEMs using the $r$-index, and Boucher et al. [DCC 21] showed how to compute them in a streaming fashion. In this paper, we show how to augment Boucher et al.'s approach to enable the computation of MUMs on the $r$-index, while preserving the space and time bounds. We add additional $O(r)$ samples of the longest common prefix (LCP) array, where $r$ is the number of equal-letter runs of the BWT, that permits the computation of the second longest match of the pattern suffix with respect to the input text, which in turn allows the computation of candidate MUMs. We implemented a proof-of-concept of our approach, that we call mum-phinder, and tested on real-world datasets. We compared our approach with competing methods that are able to compute MUMs. We observe that our method is up to 8 times smaller, while up to 19 times slower when the dataset is not highly repetitive, while on highly repetitive data, our method is up to 6.5 times slower and uses up to 25 times less memory.

preprint2022arXiv

Encoding information in the mutual coherence of spatially separated light beams

Coherence has been used as a resource for optical communications since its earliest days. It is widely used for multiplexing of data, but not for encoding of data. Here we introduce a coding scheme, which we call \textit{mutual coherence coding}, to encode information in the mutual coherence of spatially separated light beams. We describe its implementation and analyze its performance by deriving the relevant figures of merit (signal-to-noise ratio, maximum bit-rate, and spectral efficiency) with respect to the number of transmitted beams. Mutual coherence coding yields a quadratic scaling of the number of transmitted signals with the number of employed light beams, which might have benefits for cryptography and data security.

preprint2022arXiv

Ponderomotive squeezing of light by a levitated nanoparticle in free space

A mechanically compliant element can be set into motion by the interaction with light. In turn, this light-driven motion can give rise to ponderomotive correlations in the electromagnetic field. In optomechanical systems, cavities are often employed to enhance these correlations up to the point where they generate quantum squeezing of light. In free-space scenarios, where no cavity is used, observation of squeezing remains possible but challenging due to the weakness of the interaction, and has not been reported so far. Here, we measure the ponderomotively squeezed state of light scattered by a nanoparticle levitated in a free-space optical tweezer. We observe a reduction of the optical fluctuations by up to $25$~\% below the vacuum level, in a bandwidth of about $15$~kHz. Our results are well explained by a linearized dipole interaction between the nanoparticle and the electromagnetic continuum. These ponderomotive correlations open the door to quantum-enhanced sensing and metrology with levitated systems, such as force measurements below the standard quantum limit.

preprint2022arXiv

RLBWT Tricks

Until recently, most experts would probably have agreed we cannot backwards-step in constant time with a run-length compressed Burrows-Wheeler Transform (RLBWT), since doing so relies on rank queries on sparse bitvectors and those inherit lower bounds from predecessor queries. At ICALP '21, however, Nishimoto and Tabei described a new, simple and constant-time implementation. For a permutation $π$, it stores an $O (r)$-space table -- where $r$ is the number of positions $i$ where either $i = 0$ or $π(i + 1) \neq π(i) + 1$ -- that enables the computation of successive values of $π(i)$ by table look-ups and linear scans. Nishimoto and Tabei showed how to increase the number of rows in the table to bound the length of the linear scans such that the query time for computing $π(i)$ is constant while maintaining $O (r)$-space. In this paper we refine Nishimoto and Tabei's approach, including a time-space tradeoff, and experimentally evaluate different implementations demonstrating the practicality of part of their result. We show that even without adding rows to the table, in practice we almost always scan only a few entries during queries. We propose a decomposition scheme of the permutation $π$ corresponding to the LF-mapping that allows an improved compression of the data structure, while limiting the query time. We tested our implementation on real-world genomic datasets and found that without compression of the table, backward-stepping is drastically faster than with sparse bitvector implementations but, unfortunately, also uses drastically more space. After compression, backward-stepping is competitive both in time and space with the best existing implementations.

preprint2021arXiv

PHONI: Streamed Matching Statistics with Multi-Genome References

Computing the matching statistics of patterns with respect to a text is a fundamental task in bioinformatics, but a formidable one when the text is a highly compressed genomic database. Bannai et al. gave an efficient solution for this case, which Rossi et al. recently implemented, but it uses two passes over the patterns and buffers a pointer for each character during the first pass. In this paper, we simplify their solution and make it streaming, at the cost of slowing it down slightly. This means that, first, we can compute the matching statistics of several long patterns (such as whole human chromosomes) in parallel while still using a reasonable amount of RAM; second, we can compute matching statistics online with low latency and thus quickly recognize when a pattern becomes incompressible relative to the database.

preprint2020arXiv

General Defocusing Particle Tracking: fundamentals and uncertainty assessment

General Defocusing Particle Tracking (GDPT) is a single-camera, three-dimensional particle tracking method that determines the particle depth positions from the defocusing patterns of the corresponding particle images. GDPT relies on a reference set of experimental particle images which is used to predict the depth position of measured particle images of similar shape. While several implementations of the method are possible, its accuracy is ultimately limited by some intrinsic properties of the acquired data, such as the signal-to-noise ratio, the particle concentration, as well as the characteristics of the defocusing patterns. GDPT has been applied in different fields by different research groups, however, a deeper description and analysis of the method fundamentals has hitherto not been available. In this work, we first identity the fundamental elements that characterize a GDPT measurement. Afterwards, we present a standardized framework based on synthetic images to assess the performance of GDPT implementations in terms of measurement uncertainty and relative number of measured particles. Finally, we provide guidelines to assess the uncertainty of experimental GDPT measurements, where true values are not accessible and additional image aberrations can lead to bias errors. The data were processed using DefocusTracker, an open-source GDPT software. The datasets were created using the synthetic image generator MicroSIG and have been shared in a freely-accessible repository.

preprint2020arXiv

PFP Data Structures

Prefix-free parsing (PFP) was introduced by Boucher et al. (2019) as a preprocessing step to ease the computation of Burrows-Wheeler Transforms (BWTs) of genomic databases. Given a string $S$, it produces a dictionary $D$ and a parse $P$ of overlapping phrases such that $\mathrm{BWT} (S)$ can be computed from $D$ and $P$ in time and workspace bounded in terms of their combined size $|\mathrm{PFP} (S)|$. In practice $D$ and $P$ are significantly smaller than $S$ and computing $\mathrm{BWT} (S)$ from them is more efficient than computing it from $S$ directly, at least when $S$ consists of genomes from individuals of the same species. In this paper, we consider $\mathrm{PFP} (S)$ as a {\em data structure} and show how it can be augmented to support the following queries quickly, still in $O (|\mathrm{PFP} (S)|)$ space: longest common extension (LCE), suffix array (SA), longest common prefix (LCP) and BWT. Lastly, we provide experimental evidence that the PFP data structure can be efficiently constructed for very large repetitive datasets: it takes one hour and 54 GB peak memory for $1000$ variants of human chromosome 19, initially occupying roughly 56 GB.

preprint2020arXiv

Tailoring r-index for metagenomics

A basic problem in metagenomics is to assign a sequenced read to the correct species in the reference collection. In typical applications in genomic epidemiology and viral metagenomics the reference collection consists of set of species with each species represented by its highly similar strains. It has been recently shown that accurate read assignment can be achieved with $k$-mer hashing-based pseudoalignment: A read is assigned to species A if each of its $k$-mer hits to reference collection is located only on strains of A. We study the underlying primitives required in pseudoalignment and related tasks. We propose three space-efficient solutions building upon the document listing with frequencies problem. All the solutions use an $r$-index (Gagie et al., SODA 2018) as an underlying index structure for the text obtained as concatenation of the set of species, as well as for each species. Given $t$ species whose concatenation length is $n$, and whose Burrows-Wheeler transform contains $r$ runs, our first solution, based on a grammar-compressed document array with precomputed queries at non terminal symbols, reports the frequencies for the ${\tt ndoc}$ distinct documents in which the pattern of length $m$ occurs in ${\cal O}(m + \log(n){\tt ndoc}) $ time. Our second solution is also based on a grammar-compressed document array, but enhanced with bitvectors and reports the frequencies in ${\cal O}(m + ((t/w)\log n + \log(n/r)){\tt ndoc})$ time, over a machine with wordsize $w$. Our third solution, based on the interleaved LCP array, answers the same query in ${\cal O}(m + \log(n/r){\tt ndoc})$. We implemented our solutions and tested them on real-world and synthetic datasets. The results show that all the solutions are fast on highly-repetitive data, and the size overhead introduced by the indexes are comparable with the size of the $r$-index.

preprint2019arXiv

Entanglement of Propagating Optical Modes via a Mechanical Interface

Many applications of quantum information processing (QIP) require distribution of quantum states in networks, both within and between distant nodes. Optical quantum states are uniquely suited for this purpose, as they propagate with ultralow attenuation and are resilient to ubiquitous thermal noise. Mechanical systems are then envisioned as versatile interfaces between photons and a variety of solid-state QIP platforms. Here, we demonstrate a key step towards this vision, and generate entanglement between two propagating optical modes, by coupling them to the same, cryogenic mechanical system. The entanglement persists at room temperature, where we verify the inseparability of the bipartite state and fully characterize its logarithmic negativity by homodyne tomography. We detect, without any corrections, correlations corresponding to a logarithmic negativity of $E_\mathrm{N}=0.35$. Combined with quantum interfaces between mechanical systems and solid-state qubit processors already available or under development, this paves the way for mechanical systems enabling long-distance quantum information networking over optical fiber networks.