Source author record

Alessandro Laio

Alessandro Laio appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, cond-mat.stat-mech, cond-mat.mes-hall, Biological Physics, Biomolecules, Computer Vision, cond-mat.mtrl-sci, hep-lat, hep-th, Information Theory, math.IT, physics.comp-ph, q-fin.GN, q-fin.MF, Quantitative Methods

Catalog footprint

What is connected

14 works

15 topics

4 close collaborators


preprint · 2026 · arXiv

Unsupervised Domain Shift Detection with Interpretable Subspace Attribution

We developed a tool for detecting domain shifts, namely subtle differences in the probability distributions of datasets. We identify these shifts using an algorithm designed to detect localised density anomalies in high-dimensional feature spaces. If an anomaly is present, we then identify the feature subspace in which the anomaly is most pronounced. This allows us to trace the domain shift to a small set of features, making the shift interpretable. Moreover, we provide a protocol for compensating domain shifts by extracting, from two unlabelled datasets, subsets of samples with no detectable residual distributional difference. We validate the framework on controlled 20-dimensional benchmarks with known ground truth, recovering both broad and localized shifts together with their supporting feature subspaces. We then apply it to healthy electrocardiogram (ECG) recordings represented by 782 features. In age- and sex-matched cohort comparisons differing in measurement-device composition, the method detects device-induced shifts, extracts representative subsets enriched in the imbalanced device components, and identifies ECG features associated with the acquisition contrast. These results suggest that density-shift detection and subspace attribution provide a practical framework for uncovering hidden cohort biases before downstream modelling.

preprint · 2022 · arXiv

A Monte Carlo approach to the conformal bootstrap

We introduce an approach to find approximate numerical solutions of truncated bootstrap equations for Conformal Field Theories (CFTs) in arbitrary dimensions. The method is based on a stochastic search via a Metropolis algorithm guided by an action $S$ which is the logarithm of the truncated bootstrap equations for a single scalar field correlator. While numerical conformal bootstrap methods based on semi-definite programming put rigorous exclusion bounds on CFTs, this method looks for approximate solutions, which correspond to local minima of $S$, when present, and can even be far from the extremality region. By this protocol we find that if no constraint on the operator scaling dimensions is imposed, $S$ has a single minimum, corresponding to the Free Theory. If we fix the external operator dimension, however, we encounter minima that can be studied with our approach. Imposing a conserved stress-tensor, a $\mathbf{Z}_2$ symmetry and one relevant scalar, we identify two regions where local minima of $S$ are present. When projected in the $(Δ_σ, Δ_ε)$-plane, $σ$ and $ε$ being the external and the lightest exchanged operators, one of these regions essentially coincides with the extremality line found in previous bootstrap studies. The other region is along the generalized free theories in $d = 2$ and below that in both $d = 3$ and $d = 4$. We show empirically that some of the minima found are associated with known theories, including the $2d$ and $3d$ Ising theories and the $2d$ Yang-Lee model.
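The stochastic search described in this abstract can be illustrated with a generic Metropolis minimiser. The sketch below is a toy under stated assumptions: `toy_action` is a hypothetical quadratic stand-in for $S$, not the truncated bootstrap equations, and `step`, `temperature` and `n_steps` are illustrative parameters that do not come from the paper.

```python
import math
import random


def toy_action(x):
    """Hypothetical stand-in for the action S: a quadratic bowl with minimum at (1, 2)."""
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2


def metropolis_search(action, x0, n_steps=20000, step=0.1, temperature=0.01, seed=0):
    """Random-walk Metropolis search.

    Downhill moves are always accepted; uphill moves are accepted with
    probability exp(-dS / temperature). Returns the best point visited.
    """
    rng = random.Random(seed)
    x = list(x0)
    s = action(x)
    best_x, best_s = list(x), s
    for _ in range(n_steps):
        # Propose a small uniform displacement of every coordinate.
        trial = [xi + rng.uniform(-step, step) for xi in x]
        s_trial = action(trial)
        d_s = s_trial - s
        if d_s <= 0 or rng.random() < math.exp(-d_s / temperature):
            x, s = trial, s_trial
            if s < best_s:
                best_x, best_s = list(x), s
    return best_x, best_s


best_x, best_s = metropolis_search(toy_action, [0.0, 0.0])
```

With a multi-minimum action, repeating the search from different starting points and seeds is one simple way to collect several local minima.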

preprint · 2022 · arXiv

Ranking the information content of distance measures

Real-world data typically contain a large number of features that are often heterogeneous in nature, relevance, and also units of measure. When assessing the similarity between data points, one can build various distance measures using subsets of these features. Using the fewest features but still retaining sufficient information about the system is crucial in many statistical learning approaches, particularly when data are sparse. We introduce a statistical test that can assess the relative information retained when using two different distance measures, and determine if they are equivalent, independent, or if one is more informative than the other. This in turn allows finding the most informative distance measure out of a pool of candidates. The approach is applied to find the most relevant policy variables for controlling the Covid-19 epidemic and to find compact yet informative representations of atomic structures, but its potential applications are wide ranging in many branches of science.

preprint · 2021 · arXiv

Automatic topography of high-dimensional data sets by non-parametric Density Peak clustering

Data analysis in high-dimensional spaces aims at obtaining a synthetic description of a data set, revealing its main structure and its salient features. We here introduce an approach providing this description in the form of a topography of the data, namely a human-readable chart of the probability density from which the data are harvested. The approach is based on an unsupervised extension of Density Peak clustering and a non-parametric density estimator that measures the probability density in the manifold containing the data. This allows finding automatically the number and the height of the peaks of the probability density, and the depth of the "valleys" separating them. Importantly, the density estimator provides a measure of the error, which allows distinguishing genuine density peaks from density fluctuations due to finite sampling. The approach thus provides robust and visual information about the density peaks' height, their statistical reliability, and their hierarchical organization, offering a conceptually powerful extension of the standard clustering partitions. We show that this framework is particularly useful in the analysis of complex data sets.
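As a rough illustration of the Density Peak idea this abstract builds on, the sketch below computes the two quantities used by the original, cutoff-based Density Peak method: a local density `rho` and the distance `delta` to the nearest denser point. It is not the non-parametric density estimator or the unsupervised extension described above; the cutoff `d_c` and the toy points are assumptions made for the example.

```python
import math


def density_peak_quantities(points, d_c):
    """Return (rho, delta) for each point.

    rho[i]   counts the neighbours of point i closer than d_c.
    delta[i] is the distance from point i to the nearest point of higher density.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Rank points by decreasing density; ties are broken by index so that
    # every point except the top-ranked one has a well-defined "denser" set.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the global density maximum
        else:
            delta[i] = min(dist[i][j] for j in order[:rank])
    return rho, delta


# Toy data (an assumption): two small, well-separated groups in the plane.
toy_points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (4.9, 5.0),
]
rho, delta = density_peak_quantities(toy_points, d_c=0.12)
# Candidate cluster centres are points with both high rho and large delta.
peaks = sorted(range(len(toy_points)), key=lambda i: delta[i], reverse=True)[:2]
```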

preprint · 2020 · arXiv

Data segmentation based on the local intrinsic dimension

One of the founding paradigms of machine learning is that a small number of variables is often sufficient to describe high-dimensional data. The minimum number of variables required is called the intrinsic dimension (ID) of the data. Contrary to common intuition, there are cases where the ID varies within the same data set. This fact has been highlighted in technical discussions, but seldom exploited to analyze large data sets and obtain insight into their structure. Here we develop a robust approach to discriminate regions with different local IDs and segment the points accordingly. Our approach is computationally efficient and can be proficiently used even on large data sets. We find that many real-world data sets contain regions with widely heterogeneous dimensions. These regions host points differing in core properties: folded vs unfolded configurations in a protein molecular dynamics trajectory, active vs non-active regions in brain imaging data, and firms with different financial risk in company balance sheets. A simple topological feature, the local ID, is thus sufficient to achieve an unsupervised segmentation of high-dimensional data, complementary to the one given by clustering algorithms.
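To make the notion of intrinsic dimension concrete, here is a minimal sketch of a global ID estimate based on the ratio of each point's second to first nearest-neighbour distance, in its maximum-likelihood form. The abstract does not say which estimator the paper uses, so this is a swapped-in illustration and not the segmentation method itself; the function name and the toy data are assumptions.

```python
import math
import random


def two_nn_intrinsic_dimension(points):
    """Global ID estimate from mu = r2 / r1, the ratio of each point's second
    and first nearest-neighbour distances: d = N / sum(log(mu))."""
    log_mu_sum = 0.0
    n_used = 0
    for i, p in enumerate(points):
        r1 = r2 = float("inf")
        for j, q in enumerate(points):
            if i == j:
                continue
            r = math.dist(p, q)
            if r < r1:
                r1, r2 = r, r1
            elif r < r2:
                r2 = r
        if r1 > 0:  # skip exact duplicates, for which the ratio is undefined
            log_mu_sum += math.log(r2 / r1)
            n_used += 1
    return n_used / log_mu_sum


# Toy data (an assumption): a 2D sheet and a 1D line, both embedded in 5D.
rng = random.Random(0)
plane_in_5d = [[rng.random(), rng.random(), 0.0, 0.0, 0.0] for _ in range(400)]
line_in_5d = [[t, 2 * t, -t, 0.5 * t, 0.0] for t in (rng.random() for _ in range(400))]
id_plane = two_nn_intrinsic_dimension(plane_in_5d)
id_line = two_nn_intrinsic_dimension(line_in_5d)
```

The two estimates are expected to be close to 2 and 1 respectively, even though both data sets live in a five-dimensional embedding space. Applying such an estimate to local neighbourhoods instead of the whole set is one way to see the ID vary within a data set.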

preprint · 2020 · arXiv

Hierarchical nucleation in deep neural networks

Deep convolutional networks (DCNs) learn meaningful representations where data that share the same abstract characteristics are positioned closer and closer. Understanding these representations and how they are generated is of unquestioned practical and theoretical interest. In this work we study the evolution of the probability density of the ImageNet dataset across the hidden layers in some state-of-the-art DCNs. We find that the initial layers generate a unimodal probability density getting rid of any structure irrelevant for classification. In subsequent layers density peaks arise in a hierarchical fashion that mirrors the semantic hierarchy of the concepts. Density peaks corresponding to single categories appear only close to the output and via a very sharp transition which resembles the nucleation process of a heterogeneous liquid. This process leaves a footprint in the probability density of the output layer where the topography of the peaks allows reconstructing the semantic relationships of the categories.

preprint · 2020 · arXiv

PARCE: Protocol for Amino acid Refinement through Computational Evolution

The in silico design of peptides and proteins as binders is useful for diagnosis and therapeutics owing to their low adverse effects and high specificity. To select the most promising candidates, a key matter is to understand their interactions with protein targets. In this work, we present PARCE, an open source Protocol for Amino acid Refinement through Computational Evolution that implements an advanced and promising method for the design of peptides and proteins. The protocol performs a random mutation in the binder sequence, then samples the bound conformations using molecular dynamics simulations, and evaluates the protein-protein interactions using multiple scoring functions. Finally, it accepts or rejects the mutation by applying a consensus criterion based on binding scores. The procedure is iterated with the aim of efficiently exploring novel sequences with potentially better affinities toward their targets. We also provide a tutorial for running and reproducing the methodology.
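The mutate, score and accept-or-reject cycle described above can be sketched as a short loop. This is not PARCE's code or interface: the function names, the toy scorers and the specific consensus rule (accept when at least `threshold` scores improve) are assumptions for illustration, and the molecular dynamics sampling step is omitted entirely.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(sequence, rng):
    """Replace one randomly chosen position with a different amino acid."""
    pos = rng.randrange(len(sequence))
    choices = [a for a in AMINO_ACIDS if a != sequence[pos]]
    return sequence[:pos] + rng.choice(choices) + sequence[pos + 1:]


def consensus_accept(old_scores, new_scores, threshold):
    """Accept when at least `threshold` scoring functions improve (lower is better)."""
    improved = sum(new < old for old, new in zip(old_scores, new_scores))
    return improved >= threshold


def evolve(sequence, scorers, n_iter=200, threshold=2, seed=0):
    """Iterate mutate -> score -> consensus accept/reject.

    A real run would sample bound conformations with MD before scoring.
    """
    rng = random.Random(seed)
    scores = [f(sequence) for f in scorers]
    for _ in range(n_iter):
        candidate = mutate(sequence, rng)
        cand_scores = [f(candidate) for f in scorers]
        if consensus_accept(scores, cand_scores, threshold):
            sequence, scores = candidate, cand_scores
    return sequence, scores


# Hypothetical toy scorers (lower is better); real scoring functions would
# evaluate protein-protein interactions on sampled conformations.
toy_scorers = [
    lambda s: -s.count("W"),
    lambda s: -s.count("W") - s.count("Y"),
    lambda s: -sum(a in "WYF" for a in s),
]

final_seq, final_scores = evolve("AAAAAAAA", toy_scorers)
```

Requiring agreement among several scorers, instead of trusting a single one, is the point of a consensus rule: a mutation that only one scoring function favours is rejected.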

preprint · 2020 · arXiv

Proton strings and rings in atypical nucleation of ferroelectricity in ice

Ordinary ice has a proton-disordered phase which is kinetically metastable, unable to reach, spontaneously, the ferroelectric (FE) ground state at low temperature where a residual Pauling entropy persists. Upon light doping with KOH at low temperature, the transition to FE ice takes place, but its microscopic mechanism still needs clarification. We introduce a lattice model based on dipolar interactions plus a competing, frustrating term that enforces the ice rule (IR). In the absence of IR-breaking defects, standard Monte Carlo (MC) simulation leaves this ice model stuck in a state of disordered proton ring configurations with the correct Pauling entropy. A replica exchange accelerated MC sampling strategy succeeds, without open path moves, interfaces, or off-lattice configurations, in equilibrating this defect-free ice, reaching its low-temperature FE order through a well-defined first-order phase transition. When proton vacancies mimicking the KOH impurities are planted into the IR-conserving lattice, they enable standard MC simulation to work, revealing the kinetics of evolution of ice from proton disorder to partial FE order below the transition temperature. Replacing ordinary nucleation, each impurity opens up a proton ring generating a linear string, an actual FE hydrogen bond wire that expands with time. Reminiscent of those described for spin ice, these impurity-induced strings are proposed to exist in doped water ice too, where IRs are even stronger. The emerging mechanism yields a dependence of the long-time FE order fraction upon dopant concentration, and upon quenching temperature, that compares favorably with that known in real-life KOH doped ice.
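For readers unfamiliar with replica exchange, the standard acceptance rule for swapping configurations between two replicas can be sketched as follows. This is the textbook parallel-tempering criterion, not code from the paper; the function and argument names are assumptions.

```python
import math


def swap_probability(beta_i, energy_i, beta_j, energy_j):
    """Metropolis acceptance probability for exchanging the configurations
    of two replicas held at inverse temperatures beta_i and beta_j."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


# A colder replica (larger beta) holding the higher energy always swaps.
p_favourable = swap_probability(2.0, 5.0, 1.0, 3.0)
# The reverse situation is accepted only with probability exp(-2).
p_unfavourable = swap_probability(2.0, 3.0, 1.0, 5.0)
```

Swaps let configurations trapped at low temperature diffuse to high temperature, decorrelate there, and return, which is what allows the frustrated model to equilibrate.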

preprint · 2016 · arXiv

Metadynamics Surfing on Topology Barriers: the $CP^{N-1}$ Case

As one approaches the continuum limit, $QCD$ systems, investigated via numerical simulations, remain trapped in sectors of field space with fixed topological charge. As a consequence the numerical studies of physical quantities may give biased results. The same is true in the case of two-dimensional $CP^{N-1}$ models. In this paper we show that metadynamics, when used to simulate $CP^{N-1}$, allows this problem to be addressed efficiently. By studying $CP^{20}$ we show that we are able to reconstruct the free energy of the topological charge $F(Q)$ and compute the topological susceptibility as a function of the coupling and of the volume. This is a very important physical quantity in studies of the dynamics of the $θ$ vacuum and of the axion. This method can in principle be extended to $QCD$ applications.
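In its standard form, metadynamics adds a history-dependent bias built from Gaussians deposited along the trajectory of a collective variable, and the free energy is read off from the accumulated bias. Taking the topological charge $Q$ as the collective variable, and with Gaussian height $w$ and width $\sigma$ introduced here as illustrative symbols that do not appear in the abstract, this reads:

$$V_G(Q,t) = w \sum_{t' < t} \exp\!\left(-\frac{\bigl(Q - Q(t')\bigr)^2}{2\sigma^2}\right), \qquad F(Q) \approx -\lim_{t\to\infty} V_G(Q,t) + \text{const}.$$

The bias progressively fills the free-energy wells around each topological sector, so the simulation is pushed across the barriers that would otherwise keep it trapped.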

preprint · 2016 · arXiv

Modelling the impact of financialization on agricultural commodity markets

We propose a stylized model of production and exchange in which long-term investors set their production decision over a horizon τ, the "time to produce", and are liquidity constrained, while financial investors trade over a much shorter horizon δ (≪ τ) and are therefore better informed about the exogenous shocks affecting the production output. The equilibrium solution proves that: (i) long-term producers modify their production decisions to anticipate the impact of short-term investors' allocations on prices; (ii) short-term investments return a positive expected profit commensurate to the informational advantage. While the presence of financial investors improves the efficiency of risk allocation in the short term and reduces price volatility, the model shows that the aggregate effect of commodity market financialization is to raise the volatility of both farms' default risk and production output.

preprint · 2014 · arXiv

Shape and area fluctuation effects on nucleation theory

In standard nucleation theory, the nucleation process is characterized by computing $ΔΩ(V)$, the reversible work required to form a cluster of volume $V$ of the stable phase inside the metastable mother phase. However, other quantities besides the volume could play a role in the free energy of cluster formation, and this will in turn affect the nucleation barrier and the shape of the nucleus. Here we exploit our recently introduced mesoscopic theory of nucleation to compute the free energy cost of a nearly-spherical cluster of volume $V$ and a fluctuating surface area $A$, whereby the maximum of $ΔΩ(V)$ is replaced by a saddle point in $ΔΩ(V,A)$. Compared to the simpler theory based on volume only, the barrier height of $ΔΩ(V,A)$ at the transition state is systematically larger by a few $k_BT$. More importantly, we show that, depending on the physical situation, the most probable shape of the nucleus may be highly non spherical, even when the surface tension and stiffness of the model are isotropic. Interestingly, these shape fluctuations do not influence or modify the standard Classical Nucleation Theory manner of extracting the interface tension from the logarithm of the nucleation rate near coexistence.

preprint · 2013 · arXiv

A fingerprint of surface-tension anisotropy in the free-energy cost of nucleation

We focus on the Gibbs free energy $ΔG$ for nucleating a droplet of the stable phase (e.g. solid) inside the metastable parent phase (e.g. liquid), close to the first-order transition temperature. This quantity is central to the theory of homogeneous nucleation, since it governs the nucleation rate. We recently introduced a field theory describing the dependence of $ΔG$ on the droplet volume $V$, taking into account, besides the microscopic fuzziness of the droplet-parent interface, small fluctuations around the spherical shape, whose effect, assuming isotropy, was found to be a characteristic logarithmic term. Here we extend this theory, introducing the effect of anisotropy in the surface tension, and show that in the limit of strong anisotropy $ΔG(V)$ once more develops a term logarithmic in $V$, now with a prefactor of opposite sign with respect to the isotropic case. Based on this result, we argue that the geometrical shape that large solid nuclei mostly prefer could be inferred from the prefactor of the logarithmic term in the droplet free energy, as determined from the optimization of its near-coexistence profile.

preprint · 2012 · arXiv

Systematic Improvement of Classical Nucleation Theory

We reconsider the applicability of classical nucleation theory (CNT) to the calculation of the free energy of solid cluster formation in a liquid and its use in the evaluation of interface free energies from nucleation barriers. Using two different freezing transitions (hard spheres and NaCl) as test cases, we first observe that the interface-free-energy estimates based on CNT are generally in error. As successive refinements of nucleation-barrier theory, we consider corrections due to a non-sharp solid-liquid interface and to a non-spherical cluster shape. Extensive calculations for the Ising model show that corrections due to a non-sharp and thermally fluctuating interface account for the barrier shape with excellent accuracy. The experimental solid nucleation rates that are measured in colloids are better accounted for by these non-CNT terms, whose effect appears to be crucial in the interpretation of data and in the extraction of the interface tension from them.
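For reference, the textbook CNT expression that such corrections refine can be written as follows, assuming a spherical cluster of radius $R$ with a sharp interface of tension $\gamma$, a solid number density $\rho_s$, and a chemical-potential difference $\Delta\mu > 0$ between liquid and solid. These symbols are chosen here for illustration and are not taken from the paper.

$$\Delta G(R) = 4\pi R^{2}\gamma - \frac{4}{3}\pi R^{3}\rho_s\,\Delta\mu, \qquad R^{*} = \frac{2\gamma}{\rho_s\,\Delta\mu}, \qquad \Delta G^{*} = \Delta G(R^{*}) = \frac{16\pi\gamma^{3}}{3\,\rho_s^{2}\,\Delta\mu^{2}}.$$

The critical radius $R^{*}$ follows from setting $d\Delta G/dR = 0$. Because the barrier $\Delta G^{*}$ scales as $\gamma^{3}$, an interface tension extracted by fitting a barrier with this formula is sensitive to any term the formula leaves out.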

preprint · 2011 · arXiv

Finite temperature properties of clusters by replica exchange metadynamics: the water nonamer

We introduce an approach for the accurate calculation of thermal properties of classical nanoclusters. Based on a recently developed enhanced sampling technique, replica exchange metadynamics, the method yields the true free energy of each relevant cluster structure, directly sampling its basin and measuring its occupancy in full equilibrium. All entropy sources, whether vibrational, rotational anharmonic and especially configurational -- the latter often forgotten in many cluster studies -- are automatically included. For the present demonstration we choose the water nonamer (H2O)9, an extremely simple cluster which nonetheless displays sufficient complexity and interesting physics in its relevant structure spectrum. Within a standard TIP4P potential description of water, we find that the nonamer's second relevant structure possesses a higher configurational entropy than the first, so that the two free energies surprisingly cross for increasing temperature.
