Source author record

Fabrizio Lillo

Fabrizio Lillo appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

46 works

24 topics

4 close collaborators

preprint 2026 arXiv

GravityGraphSAGE: Link Prediction in Directed Attributed Graphs

Link prediction (inferring missing or future connections between nodes in a graph) is a fundamental problem in network science with widespread applications in, e.g., biological systems, recommender systems, finance and cybersecurity. Accurate link prediction supports tasks such as detecting fraudulent financial transactions or identifying drug-target interactions in biomedicine. Despite a rich literature, link prediction is still challenging, especially for graphs enriched with information on edges (direction) and nodes (attributes). In fact, research on link prediction, especially work based on Graph Deep Learning (GDL), has mostly focused on undirected graphs, without fully leveraging node attributes. Here, we fill this gap by proposing Gravity-GraphSAGE (GG-SAGE), a modified version of GraphSAGE (a GDL model for node embeddings) equipped with a gravity-inspired decoder. This implementation is the first example in the literature of a GraphSAGE backbone adopted for directed link prediction. Using the benchmark datasets Cora, Citeseer, PubMed and 16 real-world graphs from the online Netzschleuder repository, we show that our proposed model outperforms state-of-the-art GDL link prediction techniques. Using further experimental evidence, we relate the quality of the output of our model to various characteristics of the graph, suggesting that our framework scales well when applied to data of increasing complexity.
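
As a rough illustration of what a gravity-inspired decoder for directed links can look like, the sketch below uses the form known from gravity-inspired graph autoencoders: the score of a link i -> j combines a learned "mass" of the target node with the log-distance between the two embeddings. The function name `gravity_score`, the parameter `lam` and the toy values are assumptions for illustration; the abstract does not spell out the exact decoder used in GG-SAGE.

```python
import math

def gravity_score(z_i, z_j, mass_j, lam=1.0, eps=1e-12):
    """Probability-like score for a directed link i -> j.

    Hypothetical sketch: sigmoid(mass_j - lam * log ||z_i - z_j||^2).
    The distance term is symmetric; the asymmetry between i -> j and
    j -> i comes from using only the mass of the target node.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(z_i, z_j))
    logit = mass_j - lam * math.log(sq_dist + eps)  # eps avoids log(0)
    return 1.0 / (1.0 + math.exp(-logit))

# Toy embeddings: node b has a larger "mass" than node a,
# so a -> b scores higher than b -> a.
z_a, z_b = [0.0, 0.0], [1.0, 1.0]
score_ab = gravity_score(z_a, z_b, mass_j=2.0)
score_ba = gravity_score(z_b, z_a, mass_j=0.5)
```

In a full model the embeddings and masses would be produced by the encoder rather than set by hand.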

preprint 2022 arXiv

How Covid mobility restrictions modified the population of investors in Italian stock markets

This paper investigates how Covid mobility restrictions impacted the population of investors in the Italian stock market. The analysis tracks the trading activity of individual investors in Italian stocks in the period January 2019-September 2021, investigating how their composition and trading activity changed around the Covid-19 lockdown period (March 9 - May 19, 2020) and more generally during the pandemic. The results show that the lockdown restrictions were accompanied by a surge of interest in the stock market, as evidenced by the trading volume of households. Given the generally falling prices during the lockdown, households, which are typically contrarian, were net buyers, even if less than expected from their trading activity in 2019. This can be explained by the arrival, during the lockdown, of a group of about 185k new investors (i.e. investors who had not traded since January 2019) who were on average ten years younger and included a larger fraction of males than the pre-lockdown investors. Looking at the gross P&L, there is clear evidence that these new investors were more skilled in trading. There are thus indications that the lockdown, and more generally the Covid pandemic, created a sort of regime change in the population of financial investors.

preprint 2022 arXiv

Score Driven Generalized Fitness Model for Sparse and Weighted Temporal Networks

While the vast majority of the literature on models for temporal networks focuses on binary graphs, often one can associate a weight to each link. In such cases the data are better described by a weighted, or valued, network. An important, well-known fact is that real-world weighted networks are typically sparse. We propose a novel time-varying parameter model for sparse and weighted temporal networks as a combination of the fitness model, appropriately extended, and the score-driven framework. We consider a zero-augmented generalized linear model to handle the weights and an observation-driven approach to describe time-varying parameters. The result is a flexible approach where the probability that a link exists is independent of its expected weight. This represents a crucial difference from alternative specifications proposed in the recent literature, with relevant implications for the flexibility of the model. Our approach also accommodates the dependence of the network dynamics on external variables. We apply a link forecasting analysis to data describing the overnight exposures in the Euro interbank market and investigate whether the influence of EONIA rates on the interbank network dynamics has changed over time.

preprint 2021 arXiv

Estimating the Total Volume of Queries to a Search Engine

We study the problem of estimating the total number of searches (volume) of queries in a specific domain that were submitted to a search engine in a given time period. Our statistical model assumes that the distribution of searches follows Zipf's law, and that the observed sample volumes are biased according to three possible scenarios. These assumptions are consistent with empirical data, with keyword research practices, and with approximate algorithms used to count query frequencies. A few estimators of the parameters of the distribution are devised and tested, based on the nature of the empirical/simulated data. For continuous data, we recommend using nonlinear least squares regression (NLS) on the top-volume queries, where the bound on the volume is obtained from the well-known Clauset, Shalizi and Newman (CSN) estimation of power-law parameters. For binned data, we propose using a Chi-square minimization approach restricted to the top-volume queries, where the bound is obtained by the binned version of the CSN method. Estimates are then derived for the total number of queries and for the total volume of the population, including statistical error bounds. We apply the methods to the domain of recipes and cooking queries searched in Italian in 2017. The observed volumes of sample queries are collected from Google Trends (continuous data) and SearchVolume (binned data). The estimated total number of queries and total volume are computed for the two cases, and the results are compared and discussed.
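
To make the Zipf assumption concrete, the sketch below fits a law of the form v(r) = c * r^(-s) to ranked volumes and sums it over ranks to get a total volume. It uses an ordinary least-squares fit in log-log space as a simple stand-in; it is not the NLS or CSN procedure recommended in the abstract, and the function names and the synthetic data are assumptions for illustration only.

```python
import math

def fit_zipf_loglog(volumes):
    """Fit v(r) = c * r**(-s) by least squares on log v versus log r.

    `volumes` must be positive and sorted in decreasing order (rank 1 first).
    Simple stand-in for the estimators discussed in the abstract.
    """
    xs = [math.log(r) for r in range(1, len(volumes) + 1)]
    ys = [math.log(v) for v in volumes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(intercept), -slope  # (c, s)

def zipf_total_volume(c, s, n_queries):
    """Total volume implied by the fitted law over ranks 1..n_queries."""
    return sum(c * r ** (-s) for r in range(1, n_queries + 1))

# Synthetic, noise-free sample of the 50 top-volume queries.
sample = [1000.0 * r ** (-1.2) for r in range(1, 51)]
c_hat, s_hat = fit_zipf_loglog(sample)
total = zipf_total_volume(c_hat, s_hat, 10_000)
```

With real data the fit would be restricted to the top-volume queries and the total number of queries would itself be estimated, as the abstract describes.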

preprint 2021 arXiv

Information dynamics of price and liquidity around the 2017 Bitcoin markets crash

We study the information dynamics between the largest Bitcoin exchange markets during the bubble in 2017-2018. By analysing high-frequency market-microstructure observables with different information theoretic measures for dynamical systems, we find temporal changes in information sharing across markets. In particular, we study the time-varying components of predictability, memory, and synchronous coupling, measured by transfer entropy, active information storage, and multi-information. By comparing these empirical findings with several models we argue that some results could relate to intra-market and inter-market regime shifts, and changes in direction of information flow between different market observables.

preprint 2020 arXiv

A tale of two sentiment scales: Disentangling short-run and long-run components in multivariate sentiment dynamics

We propose a novel approach to sentiment data filtering for a portfolio of assets. In our framework, a dynamic factor model drives the evolution of the observed sentiment and allows us to identify two distinct components: a long-term component, modeled as a random walk, and a short-term component driven by a stationary VAR(1) process. Our model encompasses alternative approaches available in the literature and can be readily estimated by means of Kalman filtering and expectation maximization. This feature makes it convenient when the cross-sectional dimension of the portfolio increases. By applying the model to a portfolio of Dow Jones stocks, we find that the long-term component co-integrates with the market principal factor, while the short-term one captures transient swings of the market associated with the idiosyncratic components, as well as the correlation structure of returns. Using quantile regressions, we assess the significance of the contemporaneous and lagged explanatory power of sentiment on returns, finding strong statistical evidence when extreme returns, especially negative ones, are considered. Finally, the lagged relation is exploited in a portfolio allocation exercise.

preprint 2020 arXiv

Betweenness centrality for temporal multiplexes

Betweenness centrality quantifies the importance of a vertex for the information flow in a network. We propose a flexible definition of betweenness for temporal multiplexes, where geodesics are determined accounting for the topological and temporal structure and the duration of paths. We propose an algorithm to compute the new metric via a mapping to a static graph. We show the importance of considering the temporal multiplex structure and an appropriate distance metric by comparing the results with those obtained with static or single-layer metrics on a dataset of $\sim 20$k European flights.

preprint 2020 arXiv

Unveiling the relation between herding and liquidity with trader lead-lag networks

We propose a method to infer lead-lag networks of traders from the observation of their trade record as well as to reconstruct their state of supply and demand when they do not trade. The method relies on the Kinetic Ising model to describe how information propagates among traders, assigning a positive or negative "opinion" to all agents about whether the traded asset price will go up or down. This opinion is reflected by their trading behavior, but whenever a trader is not active in a given time window, a missing value arises. Using a recently developed inference algorithm, we are able to reconstruct a lead-lag network and to estimate the unobserved opinions, giving a clearer picture of the state of supply and demand in the market at all times. We apply our method to a dataset of clients of a major dealer in the Foreign Exchange market at the 5-minute time scale. We identify leading players in the market and define a herding measure based on the observed and inferred opinions. We show the causal link between herding and liquidity in the inter-dealer market used by dealers to rebalance their inventories.

preprint 2018 arXiv

Cashtag piggybacking: uncovering spam and bot activity in stock microblogs on Twitter

Microblogs are increasingly exploited for predicting prices and traded volumes of stocks in financial markets. However, it has been demonstrated that much of the content shared in microblogging platforms is created and publicized by bots and spammers. Yet, the presence (or lack thereof) and the impact of fake stock microblogs has never systematically been investigated before. Here, we study 9M tweets related to stocks of the 5 main financial markets in the US. By comparing tweets with financial data from Google Finance, we highlight important characteristics of Twitter stock microblogs. More importantly, we uncover a malicious practice - referred to as cashtag piggybacking - perpetrated by coordinated groups of bots and likely aimed at promoting low-value stocks by exploiting the popularity of high-value ones. Among the findings of our study is that as much as 71% of the authors of suspicious financial tweets are classified as bots by a state-of-the-art spambot detection algorithm. Furthermore, 37% of them were suspended by Twitter a few months after our investigation. Our results call for the adoption of spam and bot detection techniques in all studies and applications that exploit user-generated content for predicting the stock market.

preprint 2016 arXiv

A continuous and efficient fundamental price on the discrete order book grid

This paper develops a model of liquidity provision in financial markets by adapting the Madhavan, Richardson, and Roomans (1997) price formation model to realistic order books with quote discretization and liquidity rebates. We postulate that liquidity providers observe a fundamental price which is continuous, efficient, and can assume values outside the interval spanned by the best quotes. We confirm the predictions of our price formation model with extensive empirical tests on large high-frequency datasets of 100 liquid Nasdaq stocks. Finally we use the model to propose an estimator of the fundamental price based on the rebate adjusted volume imbalance at the best quotes and we empirically show that it outperforms other simpler estimators.

preprint 2016 arXiv

Linear models for the impact of order flow on prices I. Propagators: Transient vs. History Dependent Impact

Market impact is a key concept in the study of financial markets and several models have been proposed in the literature so far. The Transient Impact Model (TIM) posits that the price at high frequency time scales is a linear combination of the signs of the past executed market orders, weighted by a so-called propagator function. An alternative description -- the History Dependent Impact Model (HDIM) -- assumes that the deviation between the realised order sign and its expected level impacts the price linearly and permanently. The two models, however, should be extended since prices are a priori influenced not only by the past order flow, but also by the past realisation of returns themselves. In this paper, we propose a two-event framework, where price-changing and non price-changing events are considered separately. Two-event propagator models provide a remarkable improvement in the description of the market impact, especially for large tick stocks, where price changes are very rare and very informative. Specifically, the extended approach captures the excess anti-correlation between past returns and subsequent order flow which is missing in one-event models. Our results document the superior performance of HDIMs, although only by a small relative margin compared to TIMs. This is somewhat surprising, because HDIMs are well grounded theoretically, while TIMs are, strictly speaking, inconsistent.
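
The one-event Transient Impact Model described above can be sketched as a direct sum, p_t = p_0 + sum over s < t of G(t - s) * eps_s, where eps_s is the sign of the s-th market order and G is the propagator. The code below is a minimal illustration under stated assumptions: the noise term is omitted, and the function name `tim_price_path` and the 1/lag propagator in the usage lines are hypothetical choices, not the calibrated propagators of the paper.

```python
def tim_price_path(signs, propagator, p0=0.0):
    """Price path under a transient impact (propagator) model, without noise.

    signs[s] is +1 (buy) or -1 (sell) for the s-th market order;
    propagator(lag) returns the weight G(lag) for lag >= 1.
    Returns [p_0, p_1, ..., p_T] with
    p_t = p0 + sum_{s < t} propagator(t - s) * signs[s].
    """
    prices = []
    for t in range(len(signs) + 1):
        impact = sum(propagator(t - s) * signs[s] for s in range(t))
        prices.append(p0 + impact)
    return prices

# A constant propagator gives fully permanent impact (a cumulative sum).
permanent = tim_price_path([1, 1, -1], lambda lag: 1.0)

# A decaying propagator makes the impact of each order fade over time.
transient = tim_price_path([1, 1], lambda lag: 1.0 / lag)
```

The direct sum costs O(T^2) operations, which is fine for a toy example but would be replaced by a convolution for long order-sign series.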

preprint 2016 arXiv

Linear models for the impact of order flow on prices II. The Mixture Transition Distribution model

Modeling the impact of the order flow on asset prices is of primary importance to understand the behavior of financial markets. Part I of this paper reported the remarkable improvements in the description of the price dynamics which can be obtained when one incorporates the impact of past returns on the future order flow. However, impact models presented in Part I consider the order flow as an exogenous process, only characterized by its two-point correlations. This assumption seriously limits the forecasting ability of the model. Here we attempt to model directly the stream of discrete events with a so-called Mixture Transition Distribution (MTD) framework, introduced originally by Raftery (1985). We distinguish between price-changing and non price-changing events and combine them with the order sign in order to reduce the order flow dynamics to the dynamics of a four-state discrete random variable. The MTD represents a parsimonious approximation of a full high-order Markov chain. The new approach captures with adequate realism the conditional correlation functions between signed events for both small and large tick stocks and signature plots. From a methodological viewpoint, we discuss a novel and flexible way to calibrate a large class of MTD models with a very large number of parameters. In spite of this large number of parameters, an out-of-sample analysis confirms that the model does not overfit the data.

preprint 2016 arXiv

Optimal information diffusion in stochastic block models

We use the linear threshold model to study the diffusion of information on a network generated by the stochastic block model. We focus our analysis on a two community structure where the initial set of informed nodes lies only in one of the two communities and we look for optimal network structures, i.e. those maximizing the asymptotic extent of the diffusion. We find that, constraining the mean degree and the fraction of initially informed nodes, the optimal structure can be assortative (modular), core-periphery, or even disassortative. We then look for minimal cost structures, i.e. those such that a minimal fraction of initially informed nodes is needed to trigger a global cascade. We find that the optimal networks are assortative but with a structure very close to a core-periphery graph, i.e. a very dense community linked to a much more sparsely connected periphery.
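
For readers unfamiliar with the linear threshold model, the sketch below runs it to its fixed point on a small graph: an uninformed node becomes informed once the fraction of its informed neighbours reaches its threshold. The function name, the adjacency-dict representation and the toy path graph are assumptions for illustration; the paper studies networks drawn from a stochastic block model, which is not reproduced here.

```python
def linear_threshold_cascade(neighbors, thresholds, seeds):
    """Run the linear threshold model until no node changes state.

    neighbors:  dict node -> list of neighbouring nodes
    thresholds: dict node -> fraction of informed neighbours required
    seeds:      iterable of initially informed nodes
    Returns the final set of informed nodes.
    """
    informed = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in neighbors.items():
            if node in informed or not nbrs:
                continue
            frac = sum(n in informed for n in nbrs) / len(nbrs)
            if frac >= thresholds[node]:
                informed.add(node)
                changed = True
    return informed

# Path graph 0 - 1 - 2 - 3 with a single seed at node 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
full = linear_threshold_cascade(path, {n: 0.5 for n in path}, [0])
stuck = linear_threshold_cascade(path, {n: 0.6 for n in path}, [0])
```

Because nodes never revert to the uninformed state, the order in which nodes are updated does not change the final informed set.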

preprint 2016 arXiv

Strategic allocation of flight plans: an evolutionary point of view

We consider the simplified model of strategic allocation of trajectories in the airspace presented in a previous publication. Two types of companies, characterized by different cost functions, compete for allocation of trajectories in the airspace. We study how the equilibrium state of the model depends on the traffic demand and number of airports. We show that in a mixed population environment the equilibrium solution is not optimal at the global level, but rather tends to have a larger fraction of companies that prefer to delay the departure time rather than take a longer route. Finally we study the evolutionary dynamics, investigating the fluctuations of airline types around the equilibrium and the speed of convergence toward it in finite populations. We find that the equilibrium point is shifted by the presence of noise and is reached more slowly.

preprint 2015 arXiv

Centrality metrics and localization in core-periphery networks

Two concepts of centrality have been defined in complex networks. The first considers the centrality of a node, and many different metrics for it have been defined (e.g. eigenvector centrality, PageRank, non-backtracking centrality, etc). The second is related to a large scale organization of the network, the core-periphery structure, composed of a dense core plus an outlying and loosely-connected periphery. In this paper we investigate the relation between these two concepts. We consider networks generated via the Stochastic Block Model, or its degree corrected version, with a strong core-periphery structure and we investigate the centrality properties of the core nodes and the ability of several centrality metrics to identify them. We find that the three measures with the best performance are marginals obtained with belief propagation, PageRank, and degree centrality, while non-backtracking and eigenvector centrality (or MINRES, shown to be equivalent to the latter in the large network limit) perform worse in the investigated networks.

preprint 2015 arXiv

Collective synchronization and high frequency systemic instabilities in financial markets

Recent years have seen an unprecedented rise in the role that technology plays in all aspects of human activity. Unavoidably, technology has heavily entered the Capital Markets trading space, to the extent that all major exchanges are now trading exclusively using electronic platforms. The ultra fast speed of information processing, order placement, and cancelling generates new dynamics which are still not completely understood. Analyzing a large dataset of stocks traded on the US markets, our study shows that since 2001 the level of synchronization of large price movements across assets has significantly increased. Even though the total number of over-threshold events has diminished in recent years, when an event occurs, the average number of assets swinging together has increased. Quite unexpectedly, only a minor fraction of these events -- regularly less than 40% in every year -- can be connected with the release of pre-announced macroeconomic news. We also document that the larger the level of systemicity of an event, the larger the probability -- and degree of systemicity -- that a new event will occur in the near future. This opens the way to the intriguing idea that systemic events emerge as an effect of a purely endogenous mechanism. Consistently, we present a high-dimensional, yet parsimonious, model based on a class of self- and cross-exciting processes, termed Hawkes processes, which reconciles the modeling effort with the empirical evidence.

preprint 2015 arXiv

Coupling news sentiment with web browsing data improves prediction of intra-day price dynamics

The new digital revolution of big data is deeply changing our capability of understanding society and forecasting the outcome of many social and economic systems. Unfortunately, information can be very heterogeneous in the importance, relevance, and surprise it conveys, severely affecting the predictive power of semantic and statistical methods. Here we show that the aggregated behavior of web users can be used to overcome this problem in a hard-to-predict complex system, namely the financial market. Specifically, our in-sample analysis shows that the combined use of sentiment analysis of news and browsing activity of users of Yahoo! Finance greatly helps in forecasting intra-day and daily price changes of a set of 100 highly capitalized US stocks traded in the period 2012-2013. Sentiment analysis or browsing activity, when taken alone, have very small or no predictive power. Conversely, when considering a "news signal" where in a given time interval we compute the average sentiment of the clicked news, weighted by the number of clicks, we show that for nearly 50% of the companies such a signal Granger-causes hourly price returns. Our result indicates a "wisdom-of-the-crowd" effect that allows one to exploit users' activity to identify and properly weigh the relevant and surprising news, considerably enhancing the forecasting power of the news sentiment.
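
The "news signal" described above is a click-weighted average of sentiment within a time interval. A minimal sketch follows; the function name, the (sentiment, clicks) pair layout and the toy numbers are assumptions for illustration, and returning `None` for an interval without clicks is a design choice of this sketch, not something stated in the abstract.

```python
def news_signal(news_items):
    """Click-weighted average sentiment for one time interval.

    news_items: list of (sentiment, clicks) pairs, one per news story.
    Returns None when no story was clicked in the interval.
    """
    total_clicks = sum(clicks for _, clicks in news_items)
    if total_clicks == 0:
        return None
    weighted = sum(sentiment * clicks for sentiment, clicks in news_items)
    return weighted / total_clicks

# One positive story with 3 clicks and one negative story with 1 click.
signal = news_signal([(1.0, 3), (-1.0, 1)])
```

Stories that nobody clicked receive zero weight, which is how user attention filters out news that readers did not find relevant.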

preprint 2015 arXiv

Disentangling bipartite and core-periphery structure in financial networks

A growing number of systems are represented as networks whose architecture conveys significant information and determines many of their properties. Examples of network architecture include modular, bipartite, and core-periphery structures. However, inferring the network structure is a nontrivial task and can sometimes depend on the chosen null model. Here we propose a method for classifying network structures and ranking their nodes in a statistically well-grounded fashion. The method is based on the use of Belief Propagation for learning through Entropy Maximization on both the Stochastic Block Model (SBM) and the degree-corrected Stochastic Block Model (dcSBM). As a specific application we show how the combined use of the two ensembles (SBM and dcSBM) allows us to disentangle the bipartite and the core-periphery structure in the case of the e-MID interbank network. Specifically we find that, taking into account the degree, this interbank network is better described by a bipartite structure, while using the SBM the core-periphery structure emerges only when data are aggregated for more than a week.

preprint 2015 arXiv

Interbank markets and multiplex networks: centrality measures and statistical null models

The interbank market is considered one of the most important channels of contagion. Its network representation, where banks and claims/obligations are represented by nodes and links (respectively), has received a lot of attention in the recent theoretical and empirical literature, for assessing systemic risk and identifying systemically important financial institutions. Different types of links, for example in terms of maturity and collateralization of the claim/obligation, can be established between financial institutions. Therefore a natural representation of the interbank structure, which takes into account more features of the market, is a multiplex, where each layer is associated with a type of link. In this paper we review the empirical structure of the multiplex and the theoretical consequences of this representation. We also investigate the betweenness and eigenvector centrality of a bank in the network, comparing its centrality properties across different layers and with Maximum Entropy null models.

preprint 2014 arXiv

Beyond the square root: Evidence for logarithmic dependence of market impact on size and participation rate

We make an extensive empirical study of the market impact of large orders (metaorders) executed in the U.S. equity market between 2007 and 2009. We show that the square root market impact formula, which is widely used in the industry and supported by previous published research, provides a good fit only across about two orders of magnitude in order size. A logarithmic functional form fits the data better, providing a good fit across almost five orders of magnitude. We introduce the concept of an "impact surface" to model the impact as a function of both the duration and the participation rate of the metaorder, finding again a logarithmic dependence. We show that during the execution the price trajectory deviates from the market impact, a clear indication of non-VWAP executions. Surprisingly, we find that sometimes the price starts reverting well before the end of the execution. Finally we show that, although on average the impact relaxes to approximately 2/3 of the peak impact, the precise asymptotic value of the price depends on the participation rate and on the duration of the metaorder. We present evidence that this might be due to a herding phenomenon among metaorders.

preprint 2014 arXiv

Competitive allocation of resources on a network: an agent-based model of air companies competing for the best routes

We present a stylized model of the allocation of resources on a network. By considering as a concrete example the network of sectors of the airspace, where each node is a sector characterized by a maximal number of simultaneously present aircraft, we consider the problem of air companies competing for the allocation of the airspace. Each company is characterized by a cost function, weighting differently punctuality and length of the flight. We consider the model in the presence of pure and mixed populations of types of airline companies and we study how the equilibria depend on the characteristics of the network.

preprint 2014 arXiv

Modeling FX market activity around macroeconomic news: a Hawkes process approach

We present a Hawkes model approach to the foreign exchange market in which the high frequency price dynamics is affected by a self-exciting mechanism and an exogenous component, generated by the pre-announced arrival of macroeconomic news. By focusing on time windows around the news announcement, we find that the model is able to capture the increase of trading activity after the news, both when the news has a sizeable effect on volatility and when this effect is negligible, either because the news is not important or because the announcement is in line with the forecast by analysts. We extend the model by considering non-causal effects, due to the fact that the existence of the news (but not its content) is known by the market before the announcement.
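
The self-exciting mechanism of a Hawkes process can be illustrated by its conditional intensity, lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i)). The sketch below evaluates this expression for a univariate process. The exponential kernel, the function name and the parameter values are assumptions for illustration; the abstract does not specify the kernel or how the news-driven exogenous component enters.

```python
import math

def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity of a univariate Hawkes process at time t.

    lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)).
    The exponential kernel is an assumption of this sketch. A
    time-dependent baseline could replace the constant mu to mimic
    a pre-announced news release.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )
    return mu + excitation

# Each past event raises the intensity, and the effect decays with time.
quiet = hawkes_intensity(5.0, [], mu=0.2, alpha=0.5, beta=1.0)
excited = hawkes_intensity(1.0, [0.0], mu=0.2, alpha=0.5, beta=1.0)
```

With this kernel the process is stationary only when alpha / beta is below one, so each event triggers fewer than one further event on average.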

preprint 2014 arXiv

Optimal execution with nonlinear transient market impact

We study the problem of the optimal execution of a large trade in the presence of nonlinear transient impact. We propose an approach based on homotopy analysis, whereby a well behaved initial strategy is continuously deformed to lower the expected execution cost. We find that the optimal solution is front loaded for concave impact and that its expected cost is significantly lower than that of conventional strategies. We then consider brute force numerical optimization of the cost functional; we find that the optimal solution for a buy program typically features a few short intense buying periods separated by long periods of weak selling. Indeed, in some cases we find negative expected cost. We show that this undesirable characteristic of the nonlinear transient impact model may be mitigated either by introducing a bid-ask spread cost or by imposing convexity of the instantaneous market impact function for large trading rates.

preprint 2014 arXiv

The adaptive nature of liquidity taking in limit order books

In financial markets, the order flow, defined as the process assuming value one for buy market orders and minus one for sell market orders, displays a very slowly decaying autocorrelation function. Since orders impact prices, reconciling the persistence of the order flow with market efficiency is a subtle issue. A possible solution is provided by asymmetric liquidity, which states that the impact of a buy or sell order is inversely related to the probability of its occurrence. We empirically find that when the order flow predictability increases in one direction, the liquidity on the opposite side decreases, but the probability that a trade moves the price decreases significantly. While the last mechanism is able to counterbalance the persistence of order flow and restore efficiency and diffusivity, the first acts in the opposite direction. We introduce a statistical order book model where the persistence of the order flow is mitigated by adjusting the market order volume to the predictability of the order flow. The model reproduces the diffusive behaviour of prices at all time scales without fine-tuning the values of parameters, as well as the behaviour of most order book quantities as a function of the local predictability of order flow.
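
The persistence of order flow mentioned above is usually measured with the sample autocorrelation of the +1/-1 sign series. A minimal sketch of that estimator follows; the function name and the two toy series are assumptions for illustration and are not data from the paper.

```python
def sign_autocorrelation(signs, lag):
    """Sample autocorrelation of an order-sign series at a given lag.

    signs: list of +1 (buy market order) and -1 (sell market order).
    Uses the standard biased estimator (covariance divided by n).
    """
    n = len(signs)
    if lag < 0 or lag >= n:
        raise ValueError("lag must satisfy 0 <= lag < len(signs)")
    mean = sum(signs) / n
    var = sum((s - mean) ** 2 for s in signs) / n
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    cov = sum(
        (signs[i] - mean) * (signs[i + lag] - mean) for i in range(n - lag)
    ) / n
    return cov / var

# Runs of same-sign orders give positive lag-1 autocorrelation,
# while strictly alternating signs give negative autocorrelation.
persistent = sign_autocorrelation([1, 1, 1, -1, -1, -1], lag=1)
alternating = sign_autocorrelation([1, -1, 1, -1, 1, -1], lag=1)
```

On real order flow this would be evaluated over many lags to see how slowly the function decays.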

preprint 2014 arXiv

Why is order flow so persistent?

Order flow in equity markets is remarkably persistent in the sense that order signs (to buy or sell) are positively autocorrelated out to time lags of tens of thousands of orders, corresponding to many days. Two possible explanations are herding, corresponding to positive correlation in the behavior of different investors, or order splitting, corresponding to positive autocorrelation in the behavior of single investors. We investigate this using order flow data from the London Stock Exchange for which we have membership identifiers. By formulating models for herding and order splitting, as well as models for brokerage choice, we are able to overcome the distortion introduced by brokerage. On timescales of less than a few hours the persistence of order flow is overwhelmingly due to splitting rather than herding. We also study the properties of brokerage order flow and show that it is remarkably consistent both cross-sectionally and longitudinally.

preprint 2013 arXiv

How efficiency shapes market impact

We develop a theory for the market impact of large trading orders, which we call metaorders because they are typically split into small pieces and executed incrementally. Market impact is empirically observed to be a concave function of metaorder size, i.e., the impact per share of large metaorders is smaller than that of small metaorders. We formulate a stylized model of an algorithmic execution service and derive a fair pricing condition, which says that the average transaction price of the metaorder is equal to the price after trading is completed. We show that at equilibrium the distribution of trading volume adjusts to reflect information, and dictates the shape of the impact function. The resulting theory makes empirically testable predictions for the functional form of both the temporary and permanent components of market impact. Based on the commonly observed asymptotic distribution for the volume of large trades, it says that market impact should increase asymptotically roughly as the square root of metaorder size, with average permanent impact relaxing to about two thirds of peak impact.
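
The link between square-root impact and the two-thirds figure can be checked with a short calculation. It is a simplified sketch under stated assumptions, not the derivation of the paper: the metaorder of size $Q$ is executed at a uniform rate, the price after $q$ shares have traded is $p_0 + I(q)$ with a pure power law $I(q) = c\,q^{\delta}$, and the symbols $c$, $\delta$, $\bar{p}$ and $p_{\infty}$ are introduced here for illustration. The average transaction price $\bar{p}$ then satisfies

```latex
\bar{p} - p_0 \;=\; \frac{1}{Q}\int_0^Q c\,q^{\delta}\,\mathrm{d}q \;=\; \frac{c\,Q^{\delta}}{1+\delta},
\qquad
\frac{\bar{p} - p_0}{I(Q)} \;=\; \frac{1}{1+\delta} \;=\; \frac{2}{3}
\quad \text{for } \delta = \tfrac{1}{2}.
```

Fair pricing sets the price after trading is completed, $p_{\infty}$, equal to $\bar{p}$, so the permanent impact $p_{\infty} - p_0$ is two thirds of the peak impact $I(Q)$ when $\delta = 1/2$.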

preprint 2013 arXiv

Modeling the coupled return-spread high frequency dynamics of large tick assets

Large tick assets, i.e. assets where one tick movement is a significant fraction of the price and bid-ask spread is almost always equal to one tick, display a dynamics in which price changes and spread are strongly coupled. We introduce a Markov-switching modeling approach for price change, where the latent Markov process is the transition between spreads. We then use a finite Markov mixture of logit regressions on past squared returns to describe the dependence of the probability of price changes. The model can thus be seen as a Double Chain Markov Model. We show that the model describes the shape of return distribution at different time aggregations, volatility clustering, and the anomalous decrease of kurtosis of returns. We calibrate our models on Nasdaq stocks and we show that this model reproduces remarkably well the statistical properties of real data.

preprint2013arXiv

Modelling systemic price cojumps with Hawkes factor models

Instabilities in the price dynamics of a large number of financial assets are a clear sign of systemic events. By investigating a set of 20 high cap stocks traded at the Italian Stock Exchange, we find that there is a large number of high frequency cojumps. We show that the dynamics of these jumps is described neither by a multivariate Poisson nor by a multivariate Hawkes model. We introduce a Hawkes one factor model which is able to capture simultaneously the time clustering of jumps and the high synchronization of jumps across assets.
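For readers unfamiliar with Hawkes processes, the sketch below evaluates the conditional intensity of a standard univariate Hawkes process with an exponential kernel. It illustrates only the time clustering of jumps. It is not the one factor model introduced in the paper, and the parameter names `mu`, `alpha` and `beta` and all numerical values are assumptions.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Intensity mu + alpha * sum(exp(-beta * (t - t_i))) over past events t_i < t."""
    return mu + alpha * sum(
        math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )


# Hypothetical jump times (arbitrary time units) forming a short burst.
jumps = [1.0, 1.2, 1.3]
baseline = hawkes_intensity(0.5, jumps, mu=0.1, alpha=0.8, beta=2.0)  # before any jump
excited = hawkes_intensity(1.4, jumps, mu=0.1, alpha=0.8, beta=2.0)   # just after the burst
relaxed = hawkes_intensity(10.0, jumps, mu=0.1, alpha=0.8, beta=2.0)  # long after the burst
```

Each past jump raises the probability of further jumps for a time of order 1/beta, after which the intensity decays back towards the baseline `mu`.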

preprint2013arXiv

Modelling the Air Transport with Complex Networks: a short review

Air transport is a key infrastructure of modern societies. In this paper we review some recent approaches to air transport, which make extensive use of the theory of complex networks. We discuss possible networks that can be defined for air transport and we focus our attention on networks of airports connected by flights. We review several papers investigating the topology of these networks and their dynamics for time scales ranging from years to intraday intervals, and consider also the resilience properties of air networks to extreme events. Finally we discuss the results of some recent papers investigating the dynamics on air transport networks, with emphasis on passengers traveling in the network and epidemic spreading mediated by air transport.

preprint2013arXiv

Multi-scale analysis of the European airspace using network community detection

We show that the European airspace can be represented as a multi-scale traffic network whose nodes are airports, sectors, or navigation points and links are defined and weighted according to the traffic of flights between the nodes. By using a unique database of the air traffic in the European airspace, we investigate the architecture of these networks with a special emphasis on their community structure. We propose that unsupervised network community detection algorithms can be used to monitor the current use of the airspaces and improve it by guiding the design of new ones. Specifically, we compare the performance of three community detection algorithms, also by using a null model which takes into account the spatial distance between nodes, and we discuss their ability to find communities that could be used to define new control units of the airspace.

preprint2013arXiv

Scale-free relaxation of a wave packet in a quantum well with power-law tails

We propose a setup for which a power-law decay is predicted to be observable for generic and realistic conditions. The system we study is very simple: a quantum wave packet initially prepared in a potential well with (i) tails asymptotically decaying like ~ x^{-2} and (ii) an eigenvalue spectrum that shows a continuous part attached to the ground or equilibrium state. We analytically derive the asymptotic decay law from the spectral properties for generic, confined initial states. Our findings are supported by realistic numerical simulations for state-of-the-art expansion experiments with cold atoms.

preprint2013arXiv

The effect of round-off error on long memory processes

We study how the round-off (or discretization) error changes the statistical properties of a Gaussian long memory process. We show that the autocovariance and the spectral density of the discretized process are asymptotically rescaled by a factor smaller than one, and we compute this scaling factor exactly. Consequently, we find that the discretized process is also long memory with the same Hurst exponent as the original process. We consider the properties of two estimators of the Hurst exponent, namely the local Whittle (LW) estimator and the Detrended Fluctuation Analysis (DFA). By using analytical considerations and numerical simulations we show that, in the presence of round-off error, both estimators are severely negatively biased in finite samples. Under regularity conditions we prove that the LW estimator applied to discretized processes is consistent and asymptotically normal. Moreover, we compute the asymptotic properties of the DFA for a generic (i.e. non-Gaussian) long memory process and we apply the result to discretized processes.

preprint2013arXiv

The multiplex structure of interbank networks

The interbank market has a natural multiplex network representation. We employ a unique database of supervisory reports of Italian banks to the Banca d'Italia that includes all bilateral exposures broken down by maturity and by the secured and unsecured nature of the contract. We find that layers have different topological properties and persistence over time. The presence of a link in a layer is not a good predictor of the presence of the same link in other layers. Maximum entropy models reveal different unexpected substructures, such as network motifs, in different layers. Using the total interbank network or focusing on a specific layer as representative of the other layers provides a poor representation of interlinkages in the interbank market and could lead to biased estimation of systemic risk.

preprint2012arXiv

Calibration of optimal execution of financial transactions in the presence of transient market impact

Trading large volumes of a financial asset in order-driven markets requires the use of algorithmic execution, dividing the volume into many transactions in order to minimize costs due to market impact. A proper design of an optimal execution strategy strongly depends on a careful modeling of market impact, i.e. how the price reacts to trades. In this paper we consider a recently introduced market impact model (Bouchaud et al., 2004), which has the property of describing both the volume and the temporal dependence of price change due to trading. We show how this model can be used to describe price impact also in aggregated trade time or in real time. We then solve analytically and calibrate with real data the optimal execution problem both for risk neutral and for risk averse investors and we derive an efficient frontier of optimal execution. When we include spread costs the problem must be solved numerically and we show that the introduction of such costs regularizes the solution.

preprint2012arXiv

How does the market react to your order flow?

We present an empirical study of the intertwined behaviour of members in a financial market. Exploiting a database where the broker that initiates an order book event can be identified, we decompose the correlation and response functions into contributions coming from different market participants and study how their behaviour is interconnected. We find evidence that (1) brokers are very heterogeneous in liquidity provision: some are consistently liquidity providers while others are consistently liquidity takers. (2) The behaviour of brokers is strongly conditioned on the actions of other brokers. In contrast brokers are only weakly influenced by the impact of their own previous orders. (3) The total impact of market orders is the result of a subtle compensation between the same broker pushing the price in one direction and the liquidity provision of other brokers pushing it in the opposite direction. These results reinforce the picture of market dynamics being the result of the competition between heterogeneous participants interacting to form a complicated market ecology.

preprint2011arXiv

Do firms share the same functional form of their growth rate distribution? A new statistical test

We introduce a new statistical test of the hypothesis that the firms in a balanced panel have the same growth rate distribution or, more generally, that they share the same functional form of growth rate distribution. We applied the test to data on European Union and US publicly quoted manufacturing firms, considering functional forms belonging to the Subbotin family of distributions. While our hypotheses are rejected for the vast majority of sets at the sector level, we cannot reject them at the subsector level, indicating that homogeneous panels of firms could be described by a common functional form of growth rate distribution.

preprint2011arXiv

Identification of clusters of investors from their real trading activity in a financial market

We use statistically validated networks, a recently introduced method to validate links in a bipartite system, to identify clusters of investors trading in a financial market. Specifically, we investigate a special database that allows us to track the trading activity of individual investors in the stock Nokia. We find that many statistically detected clusters of investors show a very high degree of synchronization in the time when they decide to trade and in the trading action taken. We investigate the composition of these clusters and we find that several of them show an over-expression of specific categories of investors.

preprint2011arXiv

Segmentation algorithm for non-stationary compound Poisson processes

We introduce an algorithm for the segmentation of a class of regime switching processes. The segmentation algorithm is a nonparametric statistical method able to identify the regimes (patches) of the time series. The process is composed of consecutive patches of variable length, each patch being described by a stationary compound Poisson process, i.e. a Poisson process where each count is associated with a fluctuating signal. The parameters of the process are different in each patch and therefore the time series is non-stationary. Our method is a generalization of the algorithm introduced by Bernaola-Galvan et al., Phys. Rev. Lett., 87, 168105 (2001). We show that the new algorithm outperforms the original one for regime switching compound Poisson processes. As an application we use the algorithm to segment the time series of the inventory of market members of the London Stock Exchange and we observe that our method finds almost three times more patches than the original one.

preprint2011arXiv

Trading activity and price impact in parallel markets: SETS vs. off-book market at the London Stock Exchange

We empirically study the trading activity in the electronic on-book segment and in the dealership off-book segment of the London Stock Exchange, investigating separately the trading of active market members and of other market participants which are non-members. We find that (i) the volume distribution of off-book transactions has a significantly fatter tail than the one of on-book transactions, (ii) groups of members and non-members can be classified in categories according to their trading profile, (iii) there is a strong anticorrelation between the daily inventory variation of a market member due to the on-book market transactions and inventory variation due to the off-book market transactions with non-members, and (iv) the autocorrelation of the sign of the orders of non-members in the off-book market is slowly decaying. We also analyze the on-book price impact function over time, both for positive and negative lags, of the electronic trades and of the off-book trades. The unconditional impact curves are very different for the electronic trades and the off-book trades. Moreover there is a small dependence of impact on the volume for the on-book electronic trades, while the shape and magnitude of the impact function of off-book transactions strongly depend on volume.

preprint2010arXiv

Community characterization of heterogeneous complex systems

We introduce an analytical statistical method to characterize the communities detected in heterogeneous complex systems. By posing a suitable null hypothesis, our method makes use of the hypergeometric distribution to assess the probability that a given property is over-expressed in the elements of a community with respect to all the elements of the investigated set. We apply our method to two specific complex networks, namely a network of world movies and a network of physics preprints. The characterization of the elements and of the communities is done in terms of languages and countries for the movie network and of journals and subject categories for papers. We find that our method is able to characterize clearly the identified communities. Moreover our method works well both for large and for small communities.
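The hypergeometric test mentioned above can be sketched in a few lines. This is a minimal illustration under the stated null hypothesis (community members drawn at random from the investigated set), not the authors' code; the function name and argument names are hypothetical, and any correction for multiple testing is left out.

```python
from math import comb


def overexpression_pvalue(total, with_property, community_size, observed):
    """P(X >= observed) for X ~ Hypergeometric(total, with_property, community_size).

    total          : number of elements in the investigated set
    with_property  : elements of the set that carry the property
    community_size : number of elements in the community
    observed       : community elements that carry the property
    """
    upper = min(with_property, community_size)
    tail = sum(
        comb(with_property, k) * comb(total - with_property, community_size - k)
        for k in range(observed, upper + 1)
    )
    return tail / comb(total, community_size)


# Hypothetical example: 20 of 100 movies are in a given language,
# and 8 of the 10 movies in one community are in that language.
p_value = overexpression_pvalue(100, 20, 10, 8)
```

A small p-value suggests that the property is over-expressed in the community relative to the whole set. Because the sum is exact, the same function applies to large and small communities alike.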

preprint2010arXiv

Statistical identification with hidden Markov models of large order splitting strategies in an equity market

Large trades in a financial market are usually split into smaller parts and traded incrementally over extended periods of time. We refer to these large trades as hidden orders. In order to identify and characterize hidden orders we fit hidden Markov models to the time series of the sign of the tick by tick inventory variation of market members of the Spanish Stock Exchange. Our methodology probabilistically detects trading sequences, which are characterized by a net majority of buy or sell transactions. We interpret these patches of sequential buying or selling transactions as proxies of the traded hidden orders. We find that the distributions of the time, volume and number of transactions of these patches are fat tailed. Long patches are characterized by a high fraction of market orders and a low participation rate, while short patches have a large fraction of limit orders and a high participation rate. We observe the existence of a buy-sell asymmetry in the number, average length, average fraction of market orders and average participation rate of the detected patches. The detected asymmetry clearly depends on the local market trend. We also compare the hidden Markov model patches with those obtained with the segmentation method used in Vaglica et al. (2008) and we conclude that the former can be interpreted as a partition of the latter.

preprint2010arXiv

Statistically validated networks in bipartite complex systems

Many complex systems present an intrinsic bipartite nature and are often described and modeled in terms of networks [1-5]. Examples include movies and actors [1, 2, 4], authors and scientific papers [6-9], email accounts and emails [10], plants and animals that pollinate them [11, 12]. Bipartite networks are often very heterogeneous in the number of relationships that the elements of one set establish with the elements of the other set. When one constructs a projected network with nodes from only one set, the system heterogeneity makes it very difficult to identify preferential links between the elements. Here we introduce an unsupervised method to statistically validate each link of the projected network against a null hypothesis taking into account the heterogeneity of the system. We apply our method to three different systems, namely the set of clusters of orthologous genes (COG) in completely sequenced genomes [13, 14], a set of daily returns of 500 US financial stocks, and the set of world movies of the IMDb database [15]. In all these systems, which differ in both size and level of heterogeneity, we find that our method is able to detect network structures which are informative about the system and are not simply an expression of its heterogeneity. Specifically, our method (i) identifies the preferential relationships between the elements, (ii) naturally highlights the clustered structure of the investigated systems, and (iii) allows links to be classified according to the type of statistically validated relationships between the connected nodes.

preprint2010arXiv

Tick size and price diffusion

A tick size is the smallest increment of a security price. It is clear that at the shortest time scale on which individual orders are placed the tick size has a major role which affects where limit orders can be placed, the bid-ask spread, etc. This is the realm of market microstructure and there is a vast literature on the role of tick size on market microstructure. However, tick size can also affect price properties at longer time scales, and relatively less is known about the effect of tick size on the statistical properties of prices. The present paper is divided into two parts. In the first we review the effect of tick size change on the market microstructure and the diffusion properties of prices. The second part presents original results obtained by investigating the tick size changes occurring at the New York Stock Exchange (NYSE). We show that tick size change has three effects on price diffusion. First, as already shown in the literature, tick size affects the price return distribution at an aggregate time scale. Second, reducing the tick size typically leads to an increase of volatility clustering. We give a possible mechanistic explanation for this effect, but clearly more investigation is needed to understand the origin of this relation. Third, we explicitly show that the ability of the subordination hypothesis to explain fat tails of returns and volatility clustering is strongly dependent on tick size. While for large tick sizes the subordination hypothesis has significant explanatory power, for small tick sizes we show that subordination is not the main driver of these two important stylized facts of financial markets.

preprint2010arXiv

When do improved covariance matrix estimators enhance portfolio optimization? An empirical comparative study of nine estimators

The use of improved covariance matrix estimators as an alternative to the sample estimator is considered an important approach for enhancing portfolio optimization. Here we empirically compare the performance of 9 improved covariance estimation procedures by using daily returns of 90 highly capitalized US stocks for the period 1997-2007. We find that the usefulness of covariance matrix estimators strongly depends on the ratio between estimation period T and number of stocks N, on the presence or absence of short selling, and on the performance metric considered. When short selling is allowed, several estimation methods achieve a realized risk that is significantly smaller than the one obtained with the sample covariance method. This is particularly true when T/N is close to one. Moreover many estimators reduce the fraction of negative portfolio weights, while little improvement is achieved in the degree of diversification. On the contrary when short selling is not allowed and T>N, the considered methods are unable to outperform the sample covariance in terms of realized risk but can give much more diversified portfolios than the one obtained with the sample covariance. When T<N the use of the sample covariance matrix and of the pseudoinverse gives portfolios with very poor performance.

preprint2009arXiv

Market impact and trading profile of large trading orders in stock markets

We empirically study the market impact of trading orders. We are specifically interested in large trading orders that are executed incrementally, which we call hidden orders. These are reconstructed based on information about market member codes using data from the Spanish Stock Market and the London Stock Exchange. We find that market impact is strongly concave, approximately increasing as the square root of order size. Furthermore, as a given order is executed, the impact grows in time according to a power-law; after the order is finished, it reverts to a level of about 0.5-0.7 of its value at its peak. We observe that hidden orders are executed at a rate that more or less matches trading in the overall market, except for small deviations at the beginning and end of the order.

preprint2008arXiv

Statistical properties of thermodynamically predicted RNA secondary structures in viral genomes

By performing a comprehensive study on 1832 segments of 1212 complete genomes of viruses, we show that in viral genomes the hairpin structures of thermodynamically predicted RNA secondary structures are more abundant than expected under a simple random null hypothesis. The detected hairpin structures of RNA secondary structures are present both in coding and in noncoding regions for the four groups of viruses categorized as dsDNA, dsRNA, ssDNA and ssRNA. For all groups hairpin structures of RNA secondary structures are detected more frequently than expected for a random null hypothesis in noncoding rather than in coding regions. However, potential RNA secondary structures are also present in coding regions of the dsDNA group. In fact we detect evolutionarily conserved RNA secondary structures in conserved coding and noncoding regions of a large set of complete genomes of dsDNA herpesviruses.
