Source author record

Marco Gherardi

Marco Gherardi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

cond-mat.stat-mech, cond-mat.dis-nn, Machine Learning, Molecular Networks, physics.soc-ph, Populations and Evolution, cond-mat.mtrl-sci, cond-mat.soft, cond-mat.str-el, hep-lat, math-ph, math.MP, nlin.AO, physics.data-an, physics.ed-ph

Catalog footprint

What is connected

17 works

15 topics

4 close collaborators


preprint · 2021 · arXiv

Remote teaching data-driven physical modeling through a COVID-19 open ended data challenge

Physics can be seen as a conceptual approach to scientific problems, a method for discovery, but teaching this aspect of our discipline can be a challenge. We report on a first-time remote teaching experience for a third-year computational physics laboratory class taught during the first phase of the COVID-19 pandemic (March–May 2020). To convey a "physics of data" approach to data analysis and data-driven physical modeling, we used interdisciplinary data sources, with an open-ended "COVID-19 data challenge" project as the core of the course. COVID-19 epidemiological data provided an ideal setting for motivating the students to deal with complex problems that have no unique or preconceived solution. Our results indicate that such problems yield qualitatively different improvements compared with close-ended projects, and they point to critical aspects of using these problems as a teaching strategy. By breaking the students' expectation of unidirectionality, remote teaching provided unexpected opportunities to promote active work and active learning.

preprint · 2020 · arXiv

Intrinsic dimension estimation for locally undersampled data

High-dimensional data are ubiquitous in contemporary science, and finding methods to compress them is one of the primary goals of machine learning. Given a dataset lying in a high-dimensional space (in principle, hundreds to several thousands of dimensions), it is often useful to project it onto a lower-dimensional manifold without loss of information. Identifying the minimal dimension of such a manifold is a challenging problem known in the literature as intrinsic dimension estimation (IDE). Traditionally, most IDE algorithms are based either on multiscale principal component analysis (PCA) or on the notion of correlation dimension (and, more generally, on k-nearest-neighbor distances). These methods are affected, in different ways, by a severe curse of dimensionality. In particular, none of the existing algorithms can provide accurate ID estimates in the extreme locally undersampled regime, i.e. in the limit where the number of samples in any local patch of the manifold is smaller than (or of the same order as) the ID of the dataset. Here we introduce a new ID estimator that leverages simple properties of the tangent space of a manifold to overcome these shortcomings. The method is based on the full correlation integral, going beyond the small-radius limit used to estimate the correlation dimension. Our estimator alleviates the extreme undersampling problem, which is intractable with other methods. Building on this insight, we explore a multiscale generalization of the algorithm. We show that it is capable of (i) identifying multiple dimensionalities in a dataset and (ii) providing accurate estimates of the ID of extremely curved manifolds. In particular, we test the method on manifolds generated from global transformations of high-contrast images, which are relevant for invariant object recognition and considered a challenge for state-of-the-art ID estimators.
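
The abstract contrasts the new estimator with the classical correlation dimension, which is read off from the small-radius slope of the correlation integral C(r), the fraction of point pairs closer than r. The sketch below shows only that classical baseline (Grassberger–Procaccia style), not the authors' full-correlation-integral estimator; the toy dataset, the radius range and the function names are illustrative assumptions.

```python
import numpy as np

def correlation_integral(X, radii):
    """Fraction of distinct point pairs whose Euclidean distance is below each radius."""
    diff = X[:, None, :] - X[None, :, :]                 # all pairwise difference vectors
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(X), k=1)                  # each unordered pair once
    pair_d = np.sort(dist[i, j])
    counts = np.searchsorted(pair_d, radii, side="left")  # number of pairs with distance < r
    return counts / len(pair_d)

def correlation_dimension(X, radii):
    """Slope of log C(r) versus log r over the supplied (small) radii."""
    C = correlation_integral(X, radii)
    slope, _intercept = np.polyfit(np.log(radii), np.log(C), 1)
    return slope

# Toy data (assumption): a uniform 2-D square embedded isometrically in 10 dimensions.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.standard_normal((10, 2)))    # 10x2 matrix with orthonormal columns
X = rng.random((600, 2)) @ basis.T
radii = np.geomspace(0.02, 0.1, 10)
print(round(correlation_dimension(X, radii), 2))          # close to 2 (boundary effects pull it slightly below)
```

Because C(r) at small r is estimated from the handful of pairs that fall inside a tiny ball, this baseline needs many samples per local patch, which is the undersampling limitation the abstract refers to.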

preprint · 2020 · arXiv

Random geometric graphs in high dimension

Many machine learning algorithms for dimensional reduction and manifold learning rely on computing the nearest neighbours of each point in a dataset. These proximity relations define a so-called geometric graph, in which two nodes are linked if they are sufficiently close to each other. Random geometric graphs, where the positions of the nodes are randomly generated in a subset of $\mathbb{R}^{d}$, offer a null model for studying typical properties of datasets and of machine learning algorithms. Up to now, most of the literature has focused on characterizing low-dimensional random geometric graphs, whereas typical datasets of interest in machine learning live in high-dimensional spaces ($d \gg 10^{2}$). In this work, we consider the infinite-dimensional limit of hard and soft random geometric graphs and show how to compute the average number of subgraphs of a given finite size $k$, e.g. the average number of $k$-cliques. This analysis highlights that local observables behave differently depending on the chosen ensemble: soft random geometric graphs with continuous activation functions converge to the naive infinite-dimensional limit given by Erdős–Rényi graphs, whereas hard random geometric graphs can show systematic deviations from it. We present numerical evidence that our analytical results, exact in infinite dimensions, also provide a good approximation for $d\gtrsim10$.
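
One local observable mentioned above is the number of 3-cliques (triangles). As a rough numerical illustration, and not the paper's analytical infinite-dimensional calculation, the sketch below builds a hard random geometric graph with nodes uniform in the unit hypercube (an assumed domain), chooses the distance cutoff as a quantile so that the edge density is about p (an assumed convention), and divides the triangle count by the Erdős–Rényi expectation $\binom{n}{3}p^3$. Function and parameter names are hypothetical.

```python
import numpy as np

def triangle_excess(n_nodes, dim, density, rng):
    """Triangles in a hard random geometric graph divided by the Erdős–Rényi
    expectation C(n, 3) * p**3 at the same realized edge density p."""
    X = rng.random((n_nodes, dim))                       # nodes uniform in the unit hypercube
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T       # squared pairwise distances
    i, j = np.triu_indices(n_nodes, k=1)                 # each unordered pair once
    pair_d2 = d2[i, j]
    threshold = np.quantile(pair_d2, density)            # hard cutoff giving the target density
    linked = pair_d2 <= threshold
    A = np.zeros((n_nodes, n_nodes))
    A[i[linked], j[linked]] = 1.0
    A += A.T                                             # symmetric adjacency, zero diagonal
    p = linked.mean()                                    # realized edge density
    triangles = np.trace(A @ A @ A) / 6.0
    expected_er = n_nodes * (n_nodes - 1) * (n_nodes - 2) / 6.0 * p ** 3
    return triangles / expected_er

rng = np.random.default_rng(1)
excess = {dim: triangle_excess(300, dim, 0.1, rng) for dim in (2, 10, 100)}
for dim, ratio in excess.items():
    print(dim, round(ratio, 2))
```

At d = 2 the ratio is expected to be several times larger than 1, because neighbours of a node tend to be neighbours of each other; by d = 100 the excess should be much smaller. Whether it disappears entirely depends on the ensemble, which is the distinction between hard and soft graphs drawn in the abstract.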

preprint · 2019 · arXiv

Counting the learnable functions of structured data

Cover's function counting theorem is a milestone in the theory of artificial neural networks. It provides an answer to the fundamental question of determining how many binary assignments (dichotomies) of $p$ points in $n$ dimensions can be linearly realized. Regrettably, it has proved hard to extend the same approach to more advanced problems than the classification of points. In particular, an emerging necessity is to find methods to deal with structured data, and specifically with non-pointlike patterns. A prominent case is that of invariant recognition, whereby identification of a stimulus is insensitive to irrelevant transformations on the inputs (such as rotations or changes in perspective in an image). An object is therefore represented by an extended perceptual manifold, consisting of inputs that are classified similarly. Here, we develop a function counting theory for structured data of this kind, by extending Cover's combinatorial technique, and we derive analytical expressions for the average number of dichotomies of generically correlated sets of patterns. As an application, we obtain a closed formula for the capacity of a binary classifier trained to distinguish general polytopes of any dimension. These results may help extend our theoretical understanding of generalization, feature extraction, and invariant object recognition by neural networks.
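
For points in general position, Cover's theorem has a closed form: a linear classifier through the origin in $n$ dimensions can realize $C(p,n) = 2\sum_{k=0}^{n-1}\binom{p-1}{k}$ of the $2^p$ dichotomies of $p$ points. The short sketch below evaluates this classical point-like case only, not the extension to structured data or polytopes developed in the paper; the function names are illustrative.

```python
from math import comb

def cover_count(p: int, n: int) -> int:
    """Cover's count of the dichotomies of p >= 1 points in general position in R^n
    that a linear classifier through the origin can realize."""
    # math.comb returns 0 when k > p - 1, so p <= n gives 2 * 2**(p-1) = 2**p.
    return 2 * sum(comb(p - 1, k) for k in range(n))

def separable_fraction(p: int, n: int) -> float:
    """Fraction of all 2**p labelings that are linearly realizable."""
    return cover_count(p, n) / 2 ** p

n = 10
for p in (5, 10, 20, 30):
    print(p, separable_fraction(p, n))
```

Every labeling is realizable while $p \le n$, and exactly half are realizable at $p = 2n$, which is the classical capacity $\alpha_c = p/n = 2$ of the perceptron.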

preprint · 2019 · arXiv

Generalization from correlated sets of patterns in the perceptron

Generalization is a central aspect of learning theory. Here, we propose a framework that explores an auxiliary task-dependent notion of generalization, and attempts to quantitatively answer the following question: given two sets of patterns with a given degree of dissimilarity, how easily will a network be able to "unify" their interpretation? This is quantified by the volume of the configurations of synaptic weights that classify the two sets in a similar manner. To show the applicability of our idea in a concrete setting, we compute this quantity for the perceptron, a simple binary classifier, using the classical statistical physics approach in the replica-symmetric ansatz. In this case, we show how an analytical expression measures the "distance-based capacity", the maximum load of patterns sustainable by the network, at fixed dissimilarity between patterns and fixed allowed number of errors. This curve indicates that generalization is possible at any distance, but with decreasing capacity. We propose that a distance-based definition of generalization may be useful in numerical experiments with real-world neural networks, and to explore computationally sub-dominant sets of synaptic solutions.

preprint · 2016 · arXiv

Devil's staircase phase diagram of the fractional quantum Hall effect in the thin-torus limit

After more than three decades, the fractional quantum Hall effect still poses challenges to contemporary physics. Recent experiments point toward a fractal scenario for the Hall resistivity as a function of the magnetic field. Here, we consider the so-called thin-torus limit of the Hamiltonian describing interacting electrons in a strong magnetic field, restricted to the lowest Landau level, and we show that it can be mapped onto a one-dimensional lattice gas with repulsive interactions, with the magnetic field playing the role of a chemical potential. The statistical mechanics of such models leads one to interpret the sequence of Hall plateaux as a fractal phase diagram, whose landscape agrees qualitatively with experiments.

preprint · 2016 · arXiv

Measuring logic complexity can guide pattern discovery in empirical systems

We explore a definition of complexity based on logic functions, which are widely used as compact descriptions of rules in diverse fields of contemporary science. Detailed numerical analysis shows that (i) logic complexity is effective in discriminating between classes of functions commonly employed in modelling contexts; (ii) it extends the notion of canalisation, used in the study of genetic regulation, to a more general and detailed measure; (iii) it is tightly linked to the resilience of a function's output to noise affecting its inputs. We demonstrate its utility by measuring it in empirical data on gene regulation, digital circuitry, and propositional calculus. Logic complexity is exceptionally low in these systems. The asymmetry between "on" and "off" states in the data correlates with the complexity in a non-null way; a model of random Boolean networks clarifies this trend and indicates a common hierarchical architecture in the three systems.
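
The abstract presents logic complexity as a generalization of canalisation. In the classical sense, a Boolean function is canalizing if at least one input has a value that, on its own, fixes the output. The sketch below implements only that classical check by truth-table enumeration (exponential in the number of inputs, so suitable for small functions); it is not the paper's logic-complexity measure, and the helper names are hypothetical.

```python
from itertools import product

def is_canalizing(f, n):
    """True if fixing some single input to some value forces the output of f,
    a Boolean function of n inputs given as a callable on 0/1 tuples."""
    inputs = list(product((0, 1), repeat=n))
    for i in range(n):
        for value in (0, 1):
            outputs = {f(x) for x in inputs if x[i] == value}
            if len(outputs) == 1:      # this input value alone determines f
                return True
    return False

def and2(x):
    return x[0] & x[1]

def xor2(x):
    return x[0] ^ x[1]

def majority3(x):
    return int(sum(x) >= 2)

print(is_canalizing(and2, 2), is_canalizing(xor2, 2), is_canalizing(majority3, 3))
# True False False
```

AND is canalizing because a single 0 input forces the output to 0, whereas for XOR and three-input majority no single input value settles the result.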

preprint · 2014 · arXiv

A Parafermionic Generalization of the Jaynes Cummings Model

We introduce a parafermionic version of the Jaynes–Cummings Hamiltonian by coupling $k$ Fock parafermions (nilpotent of order $F$) to a 1D harmonic oscillator, representing the interaction with a single mode of the electromagnetic field. We argue that for $k=1$ and $F\leq 3$ there is no difference between Fock parafermions and quantum spins $s=\frac{F-1}{2}$. We also derive a semiclassical approximation of the canonical partition function of the model by assuming $\hbar$ to be small, in the regime of a large enough total number of excitations $n$, where the dimension of the Hilbert space of the problem becomes constant as a function of $n$. In this case, the average of the bosonic number operator shows an interesting behavior: a single crossover between regimes with different integer values of this observable. These features persist when we generalize the parafermionic Hamiltonian by deforming the bosonic oscillator with a generic function $\Phi(x)$; the $q$-deformed bosonic oscillator corresponds to a specific choice of the deformation function $\Phi$. In this particular case, we observe at most $k(F-1)$ crossovers in the behavior of the mean bosonic number operator, suggesting a phenomenology of superradiance similar to that of the $k$-atom Jaynes–Cummings model.

preprint · 2014 · arXiv

Soft bounds on diffusion produce skewed distributions and Gompertz growth

Constraints can dramatically affect the behavior of diffusion processes. Recently, we analyzed a natural and a technological system and reported that they perform diffusion-like discrete steps displaying a peculiar constraint, whereby the increments of the diffusing variable are subject to configuration-dependent bounds. This work theoretically explores some revealing landmarks of this phenomenology, termed "soft bound". At long times, the system reaches an irreversible steady state (i.e., one violating detailed balance), characterized by a skewed "shoulder" in the density distribution and by a net local probability flux of entropic origin. The largest point in the support of the distribution follows saturating dynamics, expressed by the Gompertz law, in line with empirical observations. Finally, we propose a generic allometric scaling for the origin of soft bounds. These findings shed light on the impact of such a "scaling" constraint on a system and on its possible generating mechanisms.

preprint · 2013 · arXiv

Hard and soft bounds in the evolution of Ubuntu packages. A lesson for species body masses?

Open-source software is a complex system; its development depends on the self-coordinated action of a large number of agents. This study follows the size of the building blocks, called "packages", of the Ubuntu Linux operating system over its entire history. The analysis reveals a multiplicative diffusion process, constrained by size-dependent bounds, driving the dynamics of the package-size distribution. A formalization of this into a quantitative model is able to match the data without relying on any adjustable parameters, and generates definite predictions. Finally, we formulate the hypothesis that a similar non-stationary mechanism could be shaping the distribution of mammal body sizes.

preprint · 2013 · arXiv

Hybrid deterministic/stochastic algorithm for large sets of rate equations

We propose a hybrid algorithm for the time integration of large sets of rate equations coupled by a relatively small number of degrees of freedom. A subset containing the fast degrees of freedom evolves deterministically, while the remaining variables evolve stochastically. Emphasis is placed on the coupling between the two subsets, in order to achieve both accuracy and efficiency. The algorithm is tested on the problem of nucleation, growth and coarsening of clusters of defects in iron, treated within the formalism of cluster dynamics. We show that it is possible to obtain results indistinguishable from fully deterministic and fully stochastic calculations, while significantly speeding up the computations with respect to both.

preprint · 2013 · arXiv

q-deformed Loewner evolution

The Loewner equation, in its stochastic incarnation introduced by Schramm, is an insightful method for describing critical random curves and interfaces in two-dimensional statistical mechanics. Two features are crucial: conformal invariance and a conformal version of the Markov property. Extensions of the equation have been explored in various directions in order to expand the reach of such a powerful method. We propose a new generalization based on q-calculus, a concept rooted in quantum geometry and non-extensive thermodynamics; the main motivation is to break the Markov property explicitly while retaining scale invariance in the stochastic version. We focus on the deterministic equation and give some exact solutions; the formalism naturally gives rise to multiple mutually intersecting curves. A general simulation method is constructed, which can easily be extended to other q-deformed equations, and is applied to both the deterministic and the stochastic realms. We also explore how the $q\neq 1$ picture converges to the classical one.

preprint · 2013 · arXiv

Theta-point polymers in the plane and Schramm-Loewner evolution

We study the connection between polymers at the theta temperature on the lattice and Schramm-Loewner chains with constant step length in the continuum. The latter realize a useful algorithm for the exact sampling of tricritical polymers, in which finite-chain effects are excluded. The driving function computed from the lattice model via a radial implementation of the zipper method is shown to converge to Brownian motion with diffusivity $\kappa=6$ at large times. The distribution function of an internal portion of the walk is well approximated by that obtained from Schramm-Loewner chains. The correlation-length exponent $\nu$ and the leading correction-to-scaling exponent $\Delta_1$ measured in the continuum are compatible with $\nu=4/7$ (predicted for the theta point) and $\Delta_1=72/91$ (predicted for percolation). Finally, we compute the shape factor and the asphericity of the chains, finding surprising agreement with the theta-point end-to-end values.

preprint · 2012 · arXiv

Influence of homology and node-age on the growth of protein-protein interaction networks

Proteins participating in a protein-protein interaction network can be grouped into homology classes following their common ancestry. Proteins added to the network correspond to genes added to the classes, so that the dynamics of the two objects are intrinsically linked. Here, we first introduce a statistical model describing the joint growth of the network and the partitioning of nodes into classes, which is studied through a combined mean-field and simulation approach. We then employ this unified framework to address the specific issue of the age dependence of protein interactions, through the definition of three different node wiring/divergence schemes. Comparison with empirical data indicates that an age-dependent divergence move is necessary in order to reproduce the basic topological observables together with the age correlation between interacting nodes visible in empirical data. We also discuss the possibility of nontrivial joint partition/topology observables.

preprint · 2010 · arXiv

Exact sampling of self-avoiding paths via discrete Schramm-Loewner evolution

We present an algorithm, based on the iteration of conformal maps, that produces independent samples of self-avoiding paths in the plane. It is a discrete process approximating radial Schramm-Loewner evolution growing to infinity. We focus on the problem of reproducing the parametrization corresponding to that of lattice models, namely self-avoiding walks on the lattice, and we propose a strategy that gives rise to discrete paths where consecutive points lie an approximately constant distance apart from each other. This new method allows us to tackle two non-trivial features of self-avoiding walks that critically depend on the parametrization: the asphericity of a portion of chain and the correction-to-scaling exponent.
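
To illustrate what "iterating conformal maps" means in practice, here is a sketch of the standard vertical-slit discretization of chordal Loewner evolution in the upper half-plane. It is a simpler relative of the method described above: the paper uses a radial process growing to infinity with a step-length-controlled parametrization, which this sketch does not reproduce. The piecewise-constant driving function, the step size and the function names are assumptions for illustration.

```python
import cmath
import numpy as np

def _upper_sqrt(z: complex) -> complex:
    """Branch of the square root with non-negative imaginary part."""
    s = cmath.sqrt(z)
    return -s if s.imag < 0 else s

def loewner_trace(driving, dt):
    """Points of a chordal Loewner trace for a driving function held constant
    at driving[k] on the k-th time step of length dt.

    The inverse map of step k is f_k(w) = u_k + sqrt((w - u_k)**2 - 4*dt);
    the tip at time (k+1)*dt is f_0(f_1(...f_k(u_k)...)), with f_k(u_k) = u_k + 2i*sqrt(dt).
    """
    points = []
    for k in range(len(driving)):
        w = complex(driving[k], 2 * np.sqrt(dt))     # tip of the newest slit
        for j in range(k - 1, -1, -1):                # apply f_{k-1}, ..., f_0
            u = driving[j]
            w = u + _upper_sqrt((w - u) ** 2 - 4 * dt)
        points.append(w)
    return np.array(points)

# SLE-type driving sqrt(kappa) * B_t with kappa = 8/3, from Brownian increments.
rng = np.random.default_rng(0)
kappa, dt, steps = 8 / 3, 1e-3, 300
driving = np.sqrt(kappa) * np.cumsum(rng.normal(0.0, np.sqrt(dt), steps))
trace = loewner_trace(driving, dt)
```

Each new point costs one map per earlier step, so the cost grows quadratically with the number of steps; with zero driving the trace reduces to the vertical segment from 0 to $2i\sqrt{t}$, a convenient sanity check.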

preprint · 2010 · arXiv

Geometrical Properties of Two-Dimensional Interacting Self-Avoiding Walks at the Theta-Point

We perform a Monte Carlo simulation of two-dimensional $N$-step interacting self-avoiding walks at the theta point, with lengths up to $N=3200$. We compute the critical exponents, verifying the Coulomb-gas predictions, the theta-point temperature $T_\theta = 1.4986(11)$, and several invariant size ratios. Then, we focus on the geometrical features of the walks, computing the instantaneous shape ratios, the average asphericity, and the end-to-end distribution function. For the latter quantity, we verify in detail the theoretical predictions for its small- and large-distance behavior.
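
Asphericity, which appears in several of the abstracts here, is commonly defined in two dimensions from the eigenvalues $\lambda_1, \lambda_2$ of the gyration tensor $S$ as $(\lambda_1-\lambda_2)^2/(\lambda_1+\lambda_2)^2 = \left[(S_{xx}-S_{yy})^2 + 4S_{xy}^2\right]/(S_{xx}+S_{yy})^2$. The sketch below computes this single-configuration value; which ensemble average a given paper reports (for example, averaging numerator and denominator separately) is not stated in these abstracts and is left open. Function names are illustrative.

```python
import numpy as np

def gyration_tensor(points):
    """2x2 gyration tensor of a planar chain given as an (N, 2) array of positions."""
    centered = points - points.mean(axis=0)
    return centered.T @ centered / len(points)

def asphericity_2d(points):
    """(l1 - l2)^2 / (l1 + l2)^2 for the gyration-tensor eigenvalues l1, l2,
    written via the trace and off-diagonal term to avoid an eigensolver:
    0 for a rotationally symmetric shape, 1 for a straight rod."""
    S = gyration_tensor(points)
    num = (S[0, 0] - S[1, 1]) ** 2 + 4 * S[0, 1] ** 2
    return num / (S[0, 0] + S[1, 1]) ** 2

rod = np.column_stack([np.arange(10.0), np.zeros(10)])
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print(asphericity_2d(rod), asphericity_2d(square))
# 1.0 0.0
```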

preprint · 2009 · arXiv

Whole-plane self-avoiding walks and radial Schramm-Loewner evolution: a numerical study

We numerically test the correspondence between the scaling limit of self-avoiding walks (SAW) in the plane and Schramm-Loewner evolution (SLE) with $\kappa=8/3$. We introduce a discrete-time process approximating SLE in the exterior of the unit disc and compare the distribution functions for an internal point in the SAW and a point at a fixed fractal variation on the SLE, finding good agreement. This provides numerical evidence in favor of a conjecture by Lawler, Schramm and Werner. The algorithm turns out to be an efficient way of computing the position of an internal point in the SAW.
