Researcher profile

Salvatore Miccichè

Salvatore Miccichè contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
11 works
0 followers
9 topics
4 close collaborators

Published work

11 published items

preprint | 2019 | arXiv

Nested partitions from hierarchical clustering statistical validation

We develop a fast, scalable greedy algorithm for detecting a nested partition extracted from a dendrogram obtained by hierarchical clustering of a multivariate series. Our algorithm provides a $p$-value for each clade observed in the hierarchical tree. The $p$-value is obtained by computing a number of bootstrap replicas of the dissimilarity matrix and by performing a statistical test on each difference between the dissimilarity associated with a given clade and the dissimilarity of the clade of its parent node. We demonstrate the efficacy of our algorithm on a set of benchmarks generated with a hierarchical factor model. We compare the results obtained by our algorithm with those of Pvclust, a widely used algorithm developed with a global approach and originally motivated by phylogenetic studies. In our numerical experiments we focus on the role of multiple hypothesis test correction and on the robustness of the algorithms to inaccuracies and errors in the datasets. We also apply our algorithm to a reference empirical dataset. We verify that our algorithm is much faster than Pvclust and scales better both in the number of elements and in the number of records of the investigated multivariate set. Our algorithm provides a hierarchically nested partition in much shorter time than currently widely used algorithms, making it possible to perform statistically validated cluster detection in very large systems.
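
The abstract does not spell out the test statistic, so the following is only a minimal sketch of one plausible reading: for each clade, count the bootstrap replicas in which the clade's dissimilarity is not smaller than that of its parent, then apply a Bonferroni threshold as one possible multiple hypothesis test correction. The function names, the one-sided counting rule, and the choice of Bonferroni are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def bootstrap_p_value(child: Sequence[float], parent: Sequence[float]) -> float:
    """Fraction of bootstrap replicas in which the child clade is NOT tighter
    than its parent, i.e. child dissimilarity >= parent dissimilarity.

    child[r] and parent[r] are the dissimilarities measured in replica r
    (hypothetical inputs; how they are computed is left to the caller).
    """
    if len(child) != len(parent) or not child:
        raise ValueError("need two non-empty sequences of equal length")
    not_tighter = sum(1 for c, p in zip(child, parent) if c >= p)
    return not_tighter / len(child)


def bonferroni_keep(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag the clades whose p-value passes the Bonferroni-corrected threshold
    alpha / (number of tests). Bonferroni is an assumed choice of correction."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]
```

With few replicas the smallest nonzero p-value is 1/B, so the number of bootstrap replicas B limits how strict a corrected threshold can meaningfully be.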

preprint | 2014 | arXiv

A comparative analysis of the statistical properties of large mobile phone calling networks

Mobile phone calling is one of the most widely used communication methods in modern society. The records of calls among mobile phone users provide a valuable proxy for understanding human communication patterns embedded in social networks. Mobile phone users call each other, forming a directed calling network. If only reciprocal calls are considered, we obtain an undirected mutual calling network. The preferential communication behavior between two connected users can be statistically tested, which results in two Bonferroni networks with statistically validated edges. We perform a comparative analysis of the statistical properties of these four networks, which are constructed from the calling records of more than nine million individuals in Shanghai over a period of 110 days. We find that these networks share many common structural properties and also exhibit idiosyncratic features when compared with previously studied large mobile calling networks. The empirical findings provide an intriguing picture of a representative large social network that might shed new light on the modelling of large social networks.

preprint | 2013 | arXiv

Multi-scale analysis of the European airspace using network community detection

We show that the European airspace can be represented as a multi-scale traffic network whose nodes are airports, sectors, or navigation points and links are defined and weighted according to the traffic of flights between the nodes. By using a unique database of the air traffic in the European airspace, we investigate the architecture of these networks with a special emphasis on their community structure. We propose that unsupervised network community detection algorithms can be used to monitor the current use of the airspaces and improve it by guiding the design of new ones. Specifically, we compare the performance of three community detection algorithms, also by using a null model which takes into account the spatial distance between nodes, and we discuss their ability to find communities that could be used to define new control units of the airspace.

preprint | 2013 | arXiv

Scale-free relaxation of a wave packet in a quantum well with power-law tails

We propose a setup for which a power-law decay is predicted to be observable under generic and realistic conditions. The system we study is very simple: a quantum wave packet initially prepared in a potential well with (i) tails asymptotically decaying like $\sim x^{-2}$ and (ii) an eigenvalue spectrum that shows a continuous part attached to the ground or equilibrium state. We analytically derive the asymptotic decay law from the spectral properties for generic, confined initial states. Our findings are supported by realistic numerical simulations for state-of-the-art expansion experiments with cold atoms.

preprint | 2011 | arXiv

Do firms share the same functional form of their growth rate distribution? A new statistical test

We introduce a new statistical test of the hypothesis that a balanced panel of firms have the same growth rate distribution or, more generally, that they share the same functional form of growth rate distribution. We applied the test to European Union and US publicly quoted manufacturing firms data, considering functional forms belonging to the Subbotin family of distributions. While our hypotheses are rejected for the vast majority of sets at the sector level, we cannot reject them at the subsector level, indicating that homogeneous panels of firms could be described by a common functional form of growth rate distribution.

preprint | 2010 | arXiv

Community characterization of heterogeneous complex systems

We introduce an analytical statistical method to characterize the communities detected in heterogeneous complex systems. By posing a suitable null hypothesis, our method makes use of the hypergeometric distribution to assess the probability that a given property is over-expressed in the elements of a community with respect to all the elements of the investigated set. We apply our method to two specific complex networks, namely a network of world movies and a network of physics preprints. The characterization of the elements and of the communities is done in terms of languages and countries for the movie network and of journals and subject categories for papers. We find that our method is able to characterize clearly the identified communities. Moreover our method works well both for large and for small communities.
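
As a minimal sketch of the hypergeometric test described above: if a set of `n_total` elements contains `n_with_property` elements carrying some attribute, the probability of finding at least `observed` of them in a community of `community_size` elements drawn at random is the upper tail of the hypergeometric distribution. The function and parameter names are hypothetical, and any multiple-testing correction the authors apply is not shown.

```python
from math import comb


def over_expression_p_value(n_total: int, n_with_property: int,
                            community_size: int, observed: int) -> float:
    """P(X >= observed) for X ~ Hypergeometric(n_total, n_with_property,
    community_size): the chance of seeing at least `observed` elements with
    the property in a community drawn at random from the whole set."""
    upper = min(n_with_property, community_size)
    denom = comb(n_total, community_size)
    tail = sum(
        comb(n_with_property, k)
        * comb(n_total - n_with_property, community_size - k)
        for k in range(max(observed, 0), upper + 1)
    )
    return tail / denom
```

A small p-value indicates that the attribute is over-expressed in the community relative to the whole set. Exact integer binomials keep the sketch dependency-free, at the cost of speed for very large sets.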

preprint | 2010 | arXiv

Statistically validated networks in bipartite complex systems

Many complex systems present an intrinsic bipartite nature and are often described and modeled in terms of networks [1-5]. Examples include movies and actors [1, 2, 4], authors and scientific papers [6-9], email accounts and emails [10], plants and animals that pollinate them [11, 12]. Bipartite networks are often very heterogeneous in the number of relationships that the elements of one set establish with the elements of the other set. When one constructs a projected network with nodes from only one set, the system heterogeneity makes it very difficult to identify preferential links between the elements. Here we introduce an unsupervised method to statistically validate each link of the projected network against a null hypothesis taking into account the heterogeneity of the system. We apply our method to three different systems, namely the set of clusters of orthologous genes (COG) in completely sequenced genomes [13, 14], a set of daily returns of 500 US financial stocks, and the set of world movies of the IMDb database [15]. In all these systems, which differ both in size and in level of heterogeneity, we find that our method is able to detect network structures which are informative about the system and are not simply an expression of its heterogeneity. Specifically, our method (i) identifies the preferential relationships between the elements, (ii) naturally highlights the clustered structure of investigated systems, and (iii) allows links to be classified according to the type of statistically validated relationships between the connected nodes.

preprint | 2010 | arXiv

The role of conditional probability in multi-scale stationary Markovian processes

The aim of the paper is to understand how the inclusion of more and more time-scales into a stochastic stationary Markovian process affects its conditional probability. To this end, we consider two Gaussian processes: (i) a short-range correlated process with an infinite set of time-scales bounded from below, and (ii) a power-law correlated process with an infinite and unbounded set of time-scales. For these processes we investigate the equal position conditional probability P(x,t|x,0) and the mean First Passage Time T(L). The function P(x,t|x,0) can be considered as a proxy of the persistence, i.e. the fact that when a process reaches a position x then it spends some time around that position value. The mean First Passage Time can be considered as a proxy of how fast the process reaches a position at distance L starting from position x. In the first investigation we show that the more time-scales the process includes, the larger the persistence. Specifically, we show that the power-law correlated process shows a slow power-law decay of P(x,t|x,0) to the stationary pdf. By contrast, the short-range correlated process shows a decay dominated by an exponential cut-off. Moreover, we also show that the existence of an infinite and unbounded set of time-scales is a necessary but not sufficient condition for observing a slow power-law decay of P(x,t|x,0). In the second investigation, we show that for large values of L the more time-scales the process includes, the larger the mean First Passage Time, i.e. the slower the process. On the other hand, for small values of L, the more time-scales the process includes, the smaller the mean First Passage Time, i.e. when a process statistically spends more time in a given position the likelihood that it reaches nearby positions by chance is also enhanced.

preprint | 2008 | arXiv

Modeling long-range memory with stationary Markovian processes

In this paper we give explicit examples of power-law correlated stationary Markovian processes $y(t)$ where the stationary pdf shows tails which are Gaussian or exponential. These processes are obtained by simply performing a coordinate transformation of a specific power-law correlated additive process $x(t)$, already known in the literature, whose pdf shows power-law tails $1/x^{a}$. We give analytical and numerical evidence that although the new processes (i) are Markovian and (ii) have Gaussian or exponential tails, their autocorrelation function still shows a power-law decay $\langle y(t)\, y(t+T) \rangle = 1/T^{b}$, where $b$ grows with $a$ according to a law which is compatible with $b = a/2 - c$, where $c$ is a numerical constant. When $a < 2(1+c)$ the process $y(t)$, although Markovian, is long-range correlated. Our results help in clarifying that even in the context of Markovian processes long-range dependencies are not necessarily associated with the occurrence of extreme events. Moreover, our results can be relevant in the modeling of complex systems with long memory. In fact, we provide simple processes associated with Langevin equations, thus showing that long-memory effects can be modeled in the context of continuous time stationary Markovian processes.
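
The threshold on a quoted above can be recovered in one step, assuming the usual criterion (not stated explicitly in the abstract) that a process is long-range correlated when the time integral of its autocorrelation function diverges, which for a power-law decay happens when b < 1:

```latex
\[
\langle y(t)\, y(t+T) \rangle = \frac{1}{T^{b}}, \qquad b = \frac{a}{2} - c,
\]
\[
b < 1 \;\Longrightarrow\; \int^{\infty} \frac{dT}{T^{b}} = \infty,
\qquad
b < 1 \iff \frac{a}{2} - c < 1 \iff a < 2(1 + c).
\]
```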

preprint | 2008 | arXiv

Statistical properties of thermodynamically predicted RNA secondary structures in viral genomes

By performing a comprehensive study on 1832 segments of 1212 complete genomes of viruses, we show that in viral genomes the hairpin structures of thermodynamically predicted RNA secondary structures are more abundant than expected under a simple random null hypothesis. The detected hairpin structures of RNA secondary structures are present both in coding and in noncoding regions for the four groups of viruses categorized as dsDNA, dsRNA, ssDNA and ssRNA. For all groups hairpin structures of RNA secondary structures are detected more frequently than expected for a random null hypothesis in noncoding rather than in coding regions. However, potential RNA secondary structures are also present in coding regions of the dsDNA group. In fact, we detect evolutionarily conserved RNA secondary structures in conserved coding and noncoding regions of a large set of complete genomes of dsDNA herpesviruses.

preprint | 2007 | arXiv

Emergence of time-horizon invariant correlation structure in financial returns by subtraction of the market mode

We investigate the emergence of a structure in the correlation matrix of assets' returns as the time-horizon over which returns are computed increases from the minutes to the daily scale. We analyze data from different stock markets (New York, Paris, London, Milano) and with different methods. The result depends crucially on whether or not the data are restricted to the "internal" dynamics of the market, in which the "center of mass" motion (the market mode) is removed. If the market mode is not removed, we find that the structure emerges, as the time-horizon increases, from splitting a single large cluster. For the NYSE we find that when the market mode is removed, the structure of correlation at the daily scale is already well defined at the 5-minute time-horizon, and this structure accounts for 80% of the classification of stocks in economic sectors. Similar results, though less sharp, are found for the other markets. We also find that the structure of correlations in the overnight returns is markedly different from that of intraday activity.
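
To make the "center of mass" idea concrete, here is a minimal sketch that assumes the market mode is the equal-weight average return across stocks at each time step. The abstract does not give the exact definition, so this is an assumption, and the function names are hypothetical. Correlations are then computed on the residual returns.

```python
from statistics import mean
from typing import List


def remove_market_mode(returns: List[List[float]]) -> List[List[float]]:
    """returns[i][t] is the return of stock i at time t.

    Subtract, at each time t, the cross-sectional mean over all stocks
    (an equal-weight market mode, assumed here for illustration)."""
    n_times = len(returns[0])
    market = [mean(series[t] for series in returns) for t in range(n_times)]
    return [[series[t] - market[t] for t in range(n_times)] for series in returns]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5
```

Applying `pearson` to pairs of rows of `remove_market_mode(returns)` gives a correlation matrix of the residual dynamics. By construction the residuals sum to zero across stocks at every time step.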