Researcher profile

Emanuele Rodolà

Emanuele Rodolà contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
14 works
0 followers
12 topics
4 close collaborators

Published work

14 published items

preprint, 2026, arXiv

Communicating Sound Through Natural Language

Natural language is widely used to describe, prompt, and control audio systems, but rarely serves as the representation carrying audio itself. We introduce lexical acoustic coding (LAC), a framework in which pre-trained LLM sender and receiver agents transmit sound through natural language. Under fixed system prompts, the agents write their own analysis and synthesis code, communicating only through a lexical sentence, shared vocabulary, and optional symbolic music structure. The sender analyzes an input waveform into interpretable, non-learned acoustic descriptors, quantizes each with a feature-specific interval vocabulary, and verbalizes the lexical code as English. The receiver parses the sentence back into lexical-acoustic constraints and renders a waveform through closed-loop refinement. The transmitted text serves as both a rich caption and as the transport representation itself. We frame LAC as a finite-rate lossy quantizer, exposing trade-offs between vocabulary size, rate, and fidelity. Experiments on short sounds and symbolic music transfer show that plain text preserves measurable acoustic structure while remaining interpretable, editable, and native to LLM-mediated communication.
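The quantize-and-verbalize step described in this abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two descriptors (`centroid_hz`, `loudness_db`), their interval boundaries, the words, and the sentence template are all hypothetical. The sketch only shows the general idea of a finite-rate lossy quantizer whose code is an English sentence.

```python
import math
from bisect import bisect_right

# Hypothetical vocabularies (not from the paper): for each descriptor,
# sorted interior interval boundaries plus one word per interval.
VOCAB = {
    "centroid_hz": ([500.0, 2000.0, 6000.0], ["dark", "warm", "bright", "brilliant"]),
    "loudness_db": ([-40.0, -20.0], ["quiet", "moderate", "loud"]),
}

def encode(descriptors):
    """Sender side: quantize each descriptor to an interval word and verbalize."""
    words = [labels[bisect_right(bounds, descriptors[name])]
             for name, (bounds, labels) in VOCAB.items()]
    return "The sound is " + " and ".join(words) + "."

def decode(sentence):
    """Receiver side: parse the sentence back into per-feature interval constraints."""
    tokens = sentence.rstrip(".").split()
    constraints = {}
    for name, (bounds, labels) in VOCAB.items():
        edges = [-math.inf] + bounds + [math.inf]
        for i, word in enumerate(labels):
            if word in tokens:
                constraints[name] = (edges[i], edges[i + 1])
    return constraints

def rate_bits():
    """Bits carried by one sentence if every word were equally likely."""
    return sum(math.log2(len(labels)) for _, labels in VOCAB.values())

sentence = encode({"centroid_hz": 3200.0, "loudness_db": -12.5})
print(sentence)
print(decode(sentence))
print(round(rate_bits(), 3))
```

In this toy setting, adding words to a vocabulary narrows each interval (higher fidelity) but raises the rate, which is the trade-off the abstract refers to. The receiver recovers only intervals, not the original values, so the code is lossy by construction.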

preprint, 2026, arXiv

PHALAR: Phasors for Learned Musical Audio Representations

Stem retrieval, the task of matching missing stems to a given audio submix, is a key challenge currently limited by models that discard temporal information. We introduce PHALAR, a contrastive framework achieving a relative accuracy increase of up to $\approx 70\%$ over the state-of-the-art while requiring $<50\%$ of the parameters and a 7$\times$ training speedup. By utilizing a Learned Spectral Pooling layer and a complex-valued head, PHALAR enforces pitch-equivariant and phase-equivariant biases. PHALAR establishes new retrieval state-of-the-art across MoisesDB, Slakh, and ChocoChorales, correlating significantly higher with human coherence judgment than semantic baselines. Finally, zero-shot beat tracking and linear chord probing confirm that PHALAR captures robust musical structures beyond the retrieval task.
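As a rough illustration of what a phase-equivariant complex feature looks like, the sketch below uses the standard DFT shift theorem: delaying a signal rotates each complex DFT coefficient by a known phasor and leaves its magnitude unchanged. This is not PHALAR's architecture; the helper names `dft_bin` and `circular_shift` and the test signal are hypothetical and chosen only to make the property concrete.

```python
import cmath
import math

def dft_bin(signal, k):
    """Complex DFT coefficient of `signal` at frequency bin k."""
    n = len(signal)
    return sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))

def circular_shift(signal, s):
    """Delay the signal by s samples with wrap-around."""
    s %= len(signal)
    return signal[-s:] + signal[:-s] if s else list(signal)

n, k, s = 16, 3, 5
signal = [math.cos(2 * math.pi * 3 * t / n) + 0.5 * math.sin(2 * math.pi * 5 * t / n)
          for t in range(n)]

original = dft_bin(signal, k)
shifted = dft_bin(circular_shift(signal, s), k)
phasor = cmath.exp(-2j * math.pi * k * s / n)

# Equivariance: shifting the input multiplies the complex feature by a phasor.
print(abs(shifted - phasor * original) < 1e-9)
# Invariance of the magnitude: the phasor has unit modulus.
print(abs(abs(shifted) - abs(original)) < 1e-9)
```

A real-valued magnitude feature would discard the shift entirely, whereas the complex coefficient keeps it as a rotation, which is one way temporal information can survive in a representation.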

preprint, 2022, arXiv

3D Human Pose Estimation Using Möbius Graph Convolutional Networks

3D human pose estimation is fundamental to understanding human behavior. Recently, promising results have been achieved by graph convolutional networks (GCNs), which achieve state-of-the-art performance and provide rather light-weight architectures. However, a major limitation of GCNs is their inability to encode all the transformations between joints explicitly. To address this issue, we propose a novel spectral GCN using the Möbius transformation (MöbiusGCN). In particular, this allows us to directly and explicitly encode the transformation between joints, resulting in a significantly more compact representation. Compared to even the lightest architectures so far, our novel approach requires 90-98% fewer parameters, i.e. our lightest MöbiusGCN uses only 0.042M trainable parameters. Besides the drastic parameter reduction, explicitly encoding the transformation of joints also enables us to achieve state-of-the-art results. We evaluate our approach on the two challenging pose estimation benchmarks, Human3.6M and MPI-INF-3DHP, demonstrating both state-of-the-art results and the generalization capabilities of MöbiusGCN.

preprint, 2022, arXiv

Certification of Gaussian Boson Sampling via graph theory

Gaussian Boson Sampling is a non-universal model for quantum computing inspired by the original formulation of the Boson Sampling problem. Nowadays, it represents a paradigmatic quantum platform to reach the quantum advantage regime in a specific computational model. Indeed, thanks to the implementation in photonics-based processors, the latest Gaussian Boson Sampling experiments have reached a level of complexity where the quantum apparatus has solved the task faster than currently up-to-date classical strategies. In addition, recent studies have identified possible applications beyond the inherent sampling task. In particular, a direct connection between photon counting of a genuine Gaussian Boson Sampling device and the number of perfect matchings in a graph has been established. In this work, we propose to exploit such a connection to benchmark Gaussian Boson Sampling experiments. We interpret the properties of the feature vectors of the graph encoded in the device as a signature of correct sampling from the true input state. Within this framework, two approaches that exploit the distributions of graph feature vectors and graph kernels are presented. Our results provide a novel approach to the actual need for tailored algorithms to benchmark large-scale Gaussian Boson Samplers.

preprint, 2022, arXiv

Efficient Globally Optimal 2D-to-3D Deformable Shape Matching

We propose the first algorithm for non-rigid 2D-to-3D shape matching, where the input is a 2D shape represented as a planar curve and a 3D shape represented as a surface; the output is a continuous curve on the surface. We cast the problem as finding the shortest circular path on the product 3-manifold of the surface and the curve. We prove that the optimal matching can be computed in polynomial time with a (worst-case) complexity of $O(mn^2\log(n))$, where $m$ and $n$ denote the number of vertices on the template curve and the 3D shape, respectively. We also demonstrate that in practice the runtime is essentially linear in $m\!\cdot\! n$, making it an efficient method for shape analysis and shape retrieval. Quantitative evaluation confirms that the method provides excellent results for sketch-based deformable 3D shape retrieval.

preprint, 2022, arXiv

Explanatory Learning: Beyond Empiricism in Neural Networks

We introduce Explanatory Learning (EL), a framework to let machines use existing knowledge buried in symbolic sequences -- e.g. explanations written in hieroglyphic -- by autonomously learning to interpret them. In EL, the burden of interpreting symbols is not left to humans or rigid human-coded compilers, as done in Program Synthesis. Rather, EL calls for a learned interpreter, built upon a limited collection of symbolic sequences paired with observations of several phenomena. This interpreter can be used to make predictions on a novel phenomenon given its explanation, and even to find that explanation using only a handful of observations, as human scientists do. We formulate the EL problem as a simple binary classification task, so that common end-to-end approaches aligned with the dominant empiricist view of machine learning could, in principle, solve it. To these models, we oppose Critical Rationalist Networks (CRNs), which instead embrace a rationalist view on the acquisition of knowledge. CRNs express several desired properties by construction: they are truly explainable, can adjust their processing at test-time for harder inferences, and can offer strong confidence guarantees on their predictions. As a final contribution, we introduce Odeen, a basic EL environment that simulates a small flatland-style universe full of phenomena to explain. Using Odeen as a testbed, we show how CRNs outperform empiricist end-to-end approaches of similar size and architecture (Transformers) in discovering explanations for novel phenomena.

preprint, 2022, arXiv

Fish sounds: towards the evaluation of marine acoustic biodiversity through data-driven audio source separation

The marine ecosystem is changing at an alarming rate, exhibiting biodiversity loss and the migration of tropical species to temperate basins. Monitoring the underwater environments and their inhabitants is of fundamental importance to understand the evolution of these systems and implement safeguard policies. However, assessing and tracking biodiversity is often a complex task, especially in large and uncontrolled environments, such as the oceans. One of the most popular and effective methods for monitoring marine biodiversity is passive acoustic monitoring (PAM), which employs hydrophones to capture underwater sound. Many aquatic animals produce sounds characteristic of their own species; these signals travel efficiently underwater and can be detected even at great distances. Furthermore, modern technologies are becoming more and more convenient and precise, allowing for very accurate and careful data acquisition. To date, audio captured with PAM devices is frequently manually processed by marine biologists and interpreted with traditional signal processing techniques for the detection of animal vocalizations. This is a challenging task, as PAM recordings often span long periods of time. Moreover, one of the causes of biodiversity loss is sound pollution; in data obtained from regions with loud anthropogenic noise, it is hard to separate artificial sounds from fish sounds manually. Nowadays, machine learning and, in particular, deep learning represent the state of the art for processing audio signals. Specifically, sound separation networks are able to identify and separate human voices and musical instruments. In this work, we show that the same techniques can be successfully used to automatically extract fish vocalizations in PAM recordings, opening up the possibility for biodiversity monitoring at a large scale.

preprint, 2022, arXiv

Localized Shape Modelling with Global Coherence: An Inverse Spectral Approach

Many natural shapes have most of their characterizing features concentrated over a few regions in space. For example, humans and animals have distinctive head shapes, while inorganic objects like chairs and airplanes are made of well-localized functional parts with specific geometric features. Often, these features are strongly correlated -- a modification of facial traits in a quadruped should induce changes to the body structure. However, in shape modelling applications, these types of edits are among the hardest ones; they require high precision, but also a global awareness of the entire shape. Even in the deep learning era, obtaining manipulable representations that satisfy such requirements is an open problem posing significant constraints. In this work, we address this problem by defining a data-driven model upon a family of linear operators (variants of the mesh Laplacian), whose spectra capture global and local geometric properties of the shape at hand. Modifications to these spectra are translated to semantically valid deformations of the corresponding surface. By explicitly decoupling the global from the local surface features, our pipeline allows local edits to be performed while simultaneously maintaining a global stylistic coherence. We empirically demonstrate how our learning-based model generalizes to shape representations not seen at training time, and we systematically analyze different choices of local operators over diverse shape categories.

preprint, 2022, arXiv

Smoothness and effective regularizations in learned embeddings for shape matching

Many innovative applications require establishing correspondences among 3D geometric objects. However, the countless possible deformations of smooth surfaces make shape matching a challenging task. Finding an embedding to represent the different shapes in high-dimensional space where the matching is easier to solve is a well-trodden path that has given many outstanding solutions. Recently, a new trend has shown advantages in learning such representations. This novel idea motivated us to investigate which properties differentiate these data-driven embeddings and which ones promote state-of-the-art results. In this study, we analyze, for the first time, properties that arise in data-driven learned embeddings and their relation to the shape-matching task. Our discoveries highlight the close link between matching and smoothness, which naturally emerges from training. Also, we demonstrate the relation between the orthogonality of the embedding and the bijectivity of the correspondence. Our experiments show exciting results, overcoming well-established alternatives and shedding a different light on relevant contexts and properties for learned embeddings.

preprint, 2022, arXiv

Unsupervised Source Separation via Bayesian Inference in the Latent Domain

State-of-the-art audio source separation models rely on supervised data-driven approaches, which can be expensive in terms of labeling resources. On the other hand, approaches for training these models without any direct supervision are typically highly demanding in terms of memory and time requirements, and remain impractical to use at inference time. We aim to tackle these limitations by proposing a simple yet effective unsupervised separation algorithm, which operates directly on a latent representation of time-domain signals. Our algorithm relies on deep Bayesian priors in the form of pre-trained autoregressive networks to model the probability distributions of each source. We leverage the low cardinality of the discrete latent space, trained with a novel loss term imposing a precise arithmetic structure on it, to perform exact Bayesian inference without relying on an approximation strategy. We validate our approach on the Slakh dataset (arXiv:1909.08494), demonstrating results in line with state-of-the-art supervised approaches while requiring fewer resources with respect to other unsupervised methods.
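The idea of exact Bayesian inference over a small discrete latent space can be sketched by brute-force enumeration. This is only a toy under loudly stated assumptions, not the paper's algorithm: the latent cardinality `K`, the fixed prior tables (the abstract describes pre-trained autoregressive networks instead), and the `sum_likelihood` function, which stands in for the unspecified arithmetic structure, are all hypothetical.

```python
from itertools import product

K = 4  # hypothetical number of discrete latent codes per source

def sum_likelihood(mix_code, a, b):
    """Toy likelihood (assumption): the mixture code is most likely a + b."""
    return 1.0 if a + b == mix_code else 0.01

def exact_posterior(prior_a, prior_b, likelihood, mix_code):
    """Enumerate all K*K code pairs and normalize; no sampling or approximation."""
    joint = {(a, b): prior_a[a] * prior_b[b] * likelihood(mix_code, a, b)
             for a, b in product(range(K), repeat=2)}
    total = sum(joint.values())
    return {pair: p / total for pair, p in joint.items()}

# Hypothetical per-source priors over the K codes.
prior_a = [0.1, 0.2, 0.6, 0.1]
prior_b = [0.5, 0.3, 0.1, 0.1]

posterior = exact_posterior(prior_a, prior_b, sum_likelihood, mix_code=3)
best_pair = max(posterior, key=posterior.get)
print(best_pair, round(posterior[best_pair], 3))
```

Enumeration is feasible here only because the latent space is tiny: the cost grows as `K` to the power of the number of sources, which is why a low-cardinality latent space matters for exactness.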

preprint, 2020, arXiv

High-Resolution Augmentation for Automatic Template-Based Matching of Human Models

We propose a new approach for 3D shape matching of deformable human shapes. Our approach is based on the joint adoption of three different tools: an intrinsic spectral matching pipeline, a morphable model, and an extrinsic details refinement. By operating in conjunction, these tools allow us to greatly improve the quality of the matching while at the same time resolving the key issues exhibited by each tool individually. In this paper we present an innovative High-Resolution Augmentation (HRA) strategy that enables highly accurate correspondence even in the presence of significant mesh resolution mismatch between the input shapes. This augmentation provides an effective workaround for the resolution limitations imposed by the adopted morphable model. The HRA in its global and localized versions represents a novel refinement strategy for surface subdivision methods. We demonstrate the accuracy of the proposed pipeline on multiple challenging benchmarks, and showcase its effectiveness in surface registration and texture transfer.

preprint, 2020, arXiv

Nonlinear Spectral Geometry Processing via the TV Transform

We introduce a novel computational framework for digital geometry processing, based upon the derivation of a nonlinear operator associated with the total variation functional. Such an operator admits a generalized notion of spectral decomposition, yielding a sparse multiscale representation akin to Laplacian-based methods, while at the same time avoiding undesirable over-smoothing effects typical of such techniques. Our approach entails accurate, detail-preserving decomposition and manipulation of 3D shape geometry while taking an especially intuitive form: non-local semantic details are well separated into different bands, which can then be filtered and re-synthesized with a straightforward linear step. Our computational framework is flexible, can be applied to a variety of signals, and is easily adapted to different geometry representations, including triangle meshes and point clouds. We showcase our method throughout multiple applications in graphics, ranging from surface and signal denoising to detail transfer and cubic stylization.

preprint, 2020, arXiv

The Whole Is Greater Than the Sum of Its Nonrigid Parts

According to Aristotle, a philosopher in Ancient Greece, "the whole is greater than the sum of its parts". This observation was adopted to explain human perception by the Gestalt psychology school of thought in the twentieth century. Here, we claim that observing part of an object which was previously acquired as a whole, one could deal with both partial matching and shape completion in a holistic manner. More specifically, given the geometry of a full, articulated object in a given pose, as well as a partial scan of the same object in a different pose, we address the problem of matching the part to the whole while simultaneously reconstructing the new pose from its partial observation. Our approach is data-driven, and takes the form of a Siamese autoencoder without the requirement of a consistent vertex labeling at inference time; as such, it can be used on unorganized point clouds as well as on triangle meshes. We demonstrate the practical effectiveness of our model in the applications of single-view deformable shape completion and dense shape correspondence, both on synthetic and real-world geometric data, where we outperform prior work on these tasks by a large margin.

preprint, 2019, arXiv

Isospectralization, or how to hear shape, style, and correspondence

The question whether one can recover the shape of a geometric object from its Laplacian spectrum ('hear the shape of the drum') is a classical problem in spectral geometry with a broad range of implications and applications. While theoretically the answer to this question is negative (there exist examples of iso-spectral but non-isometric manifolds), little is known about the practical possibility of using the spectrum for shape reconstruction and optimization. In this paper, we introduce a numerical procedure called isospectralization, consisting of deforming one shape to make its Laplacian spectrum match that of another. We implement the isospectralization procedure using modern differentiable programming techniques and exemplify its applications in some of the classical and notoriously hard problems in geometry processing, computer vision, and graphics such as shape reconstruction, pose and style transfer, and dense deformable correspondence.