Researcher profile

Alessandro Laio

Alessandro Laio contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
8 works
0 followers
9 topics
4 close collaborators

Published work

8 published items

preprint | 2026 | arXiv

Unsupervised Domain Shift Detection with Interpretable Subspace Attribution

We developed a tool for detecting domain shifts, namely subtle differences in the probability distributions of datasets. We identify these shifts using an algorithm designed to detect localised density anomalies in high-dimensional feature spaces. If an anomaly is present, we then identify the feature subspace in which the anomaly is most pronounced. This allows us to trace the domain shift to a small set of features, making the shift interpretable. Moreover, we provide a protocol for compensating domain shifts by extracting, from two unlabelled datasets, subsets of samples with no detectable residual distributional difference. We validate the framework on controlled 20-dimensional benchmarks with known ground truth, recovering both broad and localized shifts together with their supporting feature subspaces. We then apply it to healthy electrocardiogram (ECG) recordings represented by 782 features. In age- and sex-matched cohort comparisons differing in measurement-device composition, the method detects device-induced shifts, extracts representative subsets enriched in the imbalanced device components, and identifies ECG features associated with the acquisition contrast. These results suggest that density-shift detection and subspace attribution provide a practical framework for uncovering hidden cohort biases before downstream modelling.

preprint | 2022 | arXiv

A Monte Carlo approach to the conformal bootstrap

We introduce an approach to find approximate numerical solutions of truncated bootstrap equations for Conformal Field Theories (CFTs) in arbitrary dimensions. The method is based on a stochastic search via a Metropolis algorithm guided by an action $S$ which is the logarithm of the truncated bootstrap equations for a single scalar field correlator. While numerical conformal bootstrap methods based on semi-definite programming put rigorous exclusion bounds on CFTs, this method looks for approximate solutions, which correspond to local minima of $S$, when present, and can even be far from the extremality region. By this protocol we find that if no constraint on the operator scaling dimensions is imposed, $S$ has a single minimum, corresponding to the Free Theory. If we fix the external operator dimension, however, we encounter minima that can be studied with our approach. Imposing a conserved stress-tensor, a $\mathbf{Z}_2$ symmetry and one relevant scalar, we identify two regions where local minima of $S$ are present. When projected in the $(Δ_σ, Δ_ε)$-plane, $σ$ and $ε$ being the external and the lightest exchanged operators, one of these regions essentially coincides with the extremality line found in previous bootstrap studies. The other region is along the generalized free theories in $d = 2$ and below that in both $d = 3$ and $d = 4$. We show empirically that some of the minima found are associated with known theories, including the $2d$ and $3d$ Ising theories and the $2d$ Yang-Lee model.
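
The stochastic search described in this abstract can be illustrated with a minimal Metropolis sketch. Everything below is an assumption made for illustration: `toy_action` is a hypothetical stand-in for $S$ (the logarithm of a squared residual that vanishes at a known point), and the step size, temperature and function names are not taken from the paper.

```python
import math
import random

def metropolis_minimize(action, x0, steps=20000, step_size=0.1,
                        temperature=0.05, seed=0):
    """Metropolis random walk guided by `action`; returns the best point visited."""
    rng = random.Random(seed)
    x = list(x0)
    s = action(x)
    best_x, best_s = list(x), s
    for _ in range(steps):
        # Propose a Gaussian perturbation of one randomly chosen coordinate.
        i = rng.randrange(len(x))
        proposal = list(x)
        proposal[i] += rng.gauss(0.0, step_size)
        s_new = action(proposal)
        # Always accept downhill moves; accept uphill moves with probability exp(-dS/T).
        if s_new <= s or rng.random() < math.exp(-(s_new - s) / temperature):
            x, s = proposal, s_new
            if s < best_s:
                best_x, best_s = list(x), s
    return best_x, best_s

def toy_action(x):
    # Hypothetical stand-in for S: log of a squared residual that vanishes at (1, 2).
    residual = (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2
    return math.log(residual + 1e-12)

best_x, best_s = metropolis_minimize(toy_action, [0.0, 0.0])
```

In this toy setting the walk settles near the single minimum at (1, 2). In the setting of the abstract, the action would have several local minima and the interest lies in locating and characterising each of them.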

preprint | 2022 | arXiv

Ranking the information content of distance measures

Real-world data typically contain a large number of features that are often heterogeneous in nature, relevance, and also units of measure. When assessing the similarity between data points, one can build various distance measures using subsets of these features. Using the fewest features but still retaining sufficient information about the system is crucial in many statistical learning approaches, particularly when data are sparse. We introduce a statistical test that can assess the relative information retained when using two different distance measures, and determine if they are equivalent, independent, or if one is more informative than the other. This in turn allows finding the most informative distance measure out of a pool of candidates. The approach is applied to find the most relevant policy variables for controlling the Covid-19 epidemic and to find compact yet informative representations of atomic structures, but its potential applications are wide ranging in many branches of science.

preprint | 2021 | arXiv

Automatic topography of high-dimensional data sets by non-parametric Density Peak clustering

Data analysis in high-dimensional spaces aims at obtaining a synthetic description of a data set, revealing its main structure and its salient features. We here introduce an approach providing this description in the form of a topography of the data, namely a human-readable chart of the probability density from which the data are harvested. The approach is based on an unsupervised extension of Density Peak clustering and a non-parametric density estimator that measures the probability density in the manifold containing the data. This allows finding automatically the number and the height of the peaks of the probability density, and the depth of the "valleys" separating them. Importantly, the density estimator provides a measure of the error, which allows distinguishing genuine density peaks from density fluctuations due to finite sampling. The approach thus provides robust and visual information about the density peaks' height, their statistical reliability, and their hierarchical organization, offering a conceptually powerful extension of the standard clustering partitions. We show that this framework is particularly useful in the analysis of complex data sets.

preprint | 2020 | arXiv

Data segmentation based on the local intrinsic dimension

One of the founding paradigms of machine learning is that a small number of variables is often sufficient to describe high-dimensional data. The minimum number of variables required is called the intrinsic dimension (ID) of the data. Contrary to common intuition, there are cases where the ID varies within the same data set. This fact has been highlighted in technical discussions, but seldom exploited to analyze large data sets and obtain insight into their structure. Here we develop a robust approach to discriminate regions with different local IDs and segment the points accordingly. Our approach is computationally efficient and can be proficiently used even on large data sets. We find that many real-world data sets contain regions with widely heterogeneous dimensions. These regions host points differing in core properties: folded vs unfolded configurations in a protein molecular dynamics trajectory, active vs non-active regions in brain imaging data, and firms with different financial risk in company balance sheets. A simple topological feature, the local ID, is thus sufficient to achieve an unsupervised segmentation of high-dimensional data, complementary to the one given by clustering algorithms.
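
To make the notion of intrinsic dimension concrete, the sketch below estimates the ID of two toy data sets embedded in five dimensions. It uses a two-nearest-neighbour ratio estimator as a stand-in; the abstract does not specify which estimator or segmentation procedure the paper uses, so the function name `two_nn_id` and the toy data are assumptions made for illustration only.

```python
import math
import random

def two_nn_id(points):
    """Maximum-likelihood ID estimate from ratios of 2nd to 1st neighbour distances."""
    log_ratios = []
    for i, p in enumerate(points):
        # Brute-force distances from p to every other point (O(N^2) overall).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        log_ratios.append(math.log(r2 / r1))
    # For locally uniform data the ratio r2/r1 is Pareto distributed with exponent d.
    return len(points) / sum(log_ratios)

rng = random.Random(0)
# Hypothetical toy data in 5 dimensions: a 2D sheet and a 1D segment.
sheet = [(rng.random(), rng.random(), 0.0, 0.0, 0.0) for _ in range(300)]
segment = [(0.0, 0.0, 0.0, t, 2.0 * t) for t in (rng.random() for _ in range(300))]

id_sheet = two_nn_id(sheet)      # expected to be close to 2
id_segment = two_nn_id(segment)  # expected to be close to 1
```

Both sets live in the same five-dimensional space, yet the estimates differ. That difference is the kind of signal a local-ID segmentation would use to separate regions of a single data set.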

preprint | 2020 | arXiv

Hierarchical nucleation in deep neural networks

Deep convolutional networks (DCNs) learn meaningful representations where data that share the same abstract characteristics are positioned closer and closer. Understanding these representations and how they are generated is of unquestioned practical and theoretical interest. In this work we study the evolution of the probability density of the ImageNet dataset across the hidden layers in some state-of-the-art DCNs. We find that the initial layers generate a unimodal probability density getting rid of any structure irrelevant for classification. In subsequent layers density peaks arise in a hierarchical fashion that mirrors the semantic hierarchy of the concepts. Density peaks corresponding to single categories appear only close to the output and via a very sharp transition which resembles the nucleation process of a heterogeneous liquid. This process leaves a footprint in the probability density of the output layer where the topography of the peaks allows reconstructing the semantic relationships of the categories.

preprint | 2020 | arXiv

PARCE: Protocol for Amino acid Refinement through Computational Evolution

The in silico design of peptides and proteins as binders is useful for diagnostics and therapeutics because of their low adverse effects and high specificity. To select the most promising candidates, a key matter is to understand their interactions with protein targets. In this work, we present PARCE, an open source Protocol for Amino acid Refinement through Computational Evolution that implements an advanced and promising method for the design of peptides and proteins. The protocol performs a random mutation in the binder sequence, then samples the bound conformations using molecular dynamics simulations, and evaluates the protein-protein interactions using multiple scoring functions. Finally, it accepts or rejects the mutation by applying a consensus criterion based on binding scores. The procedure is iterated with the aim of efficiently exploring novel sequences with potentially better affinities toward their targets. We also provide a tutorial for running and reproducing the methodology.
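
The mutate, score and accept-by-consensus loop described in this abstract can be sketched as follows. This is not PARCE's code or API. The function names, the majority threshold and the three toy scorers are hypothetical, and the molecular dynamics sampling step is skipped entirely.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(sequence, rng):
    """Replace one randomly chosen position with a different amino acid."""
    pos = rng.randrange(len(sequence))
    new_aa = rng.choice([a for a in AMINO_ACIDS if a != sequence[pos]])
    return sequence[:pos] + new_aa + sequence[pos + 1:]

def consensus_accept(old_scores, new_scores, threshold):
    """Accept if at least `threshold` scores strictly improve (lower is better)."""
    improved = sum(new < old for old, new in zip(old_scores, new_scores))
    return improved >= threshold

def evolve(sequence, scorers, iterations=200, threshold=2, seed=0):
    """Iterate single-point mutations, keeping those that pass the consensus test."""
    rng = random.Random(seed)
    scores = [f(sequence) for f in scorers]
    for _ in range(iterations):
        candidate = mutate(sequence, rng)
        # A real protocol would sample bound conformations with MD here (omitted).
        cand_scores = [f(candidate) for f in scorers]
        if consensus_accept(scores, cand_scores, threshold):
            sequence, scores = candidate, cand_scores
    return sequence, scores

# Hypothetical toy scorers standing in for binding scores (lower is better).
toy_scorers = [
    lambda s: -s.count("W"),
    lambda s: -s.count("W") - s.count("Y"),
    lambda s: s.count("G"),
]

start = "GGGGGGGG"
final_sequence, final_scores = evolve(start, toy_scorers)
```

Requiring agreement among several scorers, rather than trusting one, is what makes the acceptance rule a consensus criterion. Here a mutation is kept only when at least two of the three toy scores improve.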

preprint | 2020 | arXiv

Proton strings and rings in atypical nucleation of ferroelectricity in ice

Ordinary ice has a proton-disordered phase which is kinetically metastable, unable to reach, spontaneously, the ferroelectric (FE) ground state at low temperature where a residual Pauling entropy persists. Upon light doping with KOH at low temperature, the transition to FE ice takes place, but its microscopic mechanism still needs clarification. We introduce a lattice model based on dipolar interactions plus a competing, frustrating term that enforces the ice rule (IR). In the absence of IR-breaking defects, standard Monte Carlo (MC) simulation leaves this ice model stuck in a state of disordered proton ring configurations with the correct Pauling entropy. A replica exchange accelerated MC sampling strategy succeeds, without open path moves, interfaces, or off-lattice configurations, in equilibrating this defect-free ice, reaching its low-temperature FE order through a well-defined first-order phase transition. When proton vacancies mimicking the KOH impurities are planted into the IR-conserving lattice, they enable standard MC simulation to work, revealing the kinetics of evolution of ice from proton disorder to partial FE order below the transition temperature. Replacing ordinary nucleation, each impurity opens up a proton ring generating a linear string, an actual FE hydrogen bond wire that expands with time. Reminiscent of those described for spin ice, these impurity-induced strings are proposed to exist in doped water ice too, where IRs are even stronger. The emerging mechanism yields a dependence of the long-time FE order fraction upon dopant concentration, and upon quenching temperature, that compares favorably with that known in real-life KOH doped ice.
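
The replica exchange strategy mentioned in this abstract relies on occasionally swapping configurations between simulations run at different temperatures. The sketch below shows only the standard parallel-tempering swap rule, with replica energies standing in for full lattice configurations. The function names are hypothetical, and the paper's lattice model and move set are not reproduced.

```python
import math
import random

def swap_probability(beta_a, energy_a, beta_b, energy_b):
    """Metropolis probability of exchanging the configurations of two replicas."""
    exponent = (beta_a - beta_b) * (energy_a - energy_b)
    return 1.0 if exponent >= 0 else math.exp(exponent)

def attempt_swaps(betas, energies, rng):
    """Try one swap between each pair of neighbouring replicas.

    Energies stand in for configurations in this toy; a real simulation
    would exchange the lattice states themselves.
    """
    energies = list(energies)
    for k in range(len(betas) - 1):
        p = swap_probability(betas[k], energies[k], betas[k + 1], energies[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return energies

betas = [2.0, 1.0, 0.5]          # inverse temperatures, coldest first (toy values)
energies = [5.0, 3.0, 4.0]       # current energy of each replica (toy values)
swapped = attempt_swaps(betas, energies, random.Random(0))
```

A swap that hands a lower-energy configuration to the colder replica is always accepted, which lets configurations found at high temperature reach the low-temperature replica without it having to cross barriers on its own.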