Researcher profile

Paulo Rocha

Paulo Rocha contributes to research discovery and scholarly infrastructure.

Researcher - Affiliation not imported - Open to collaborate

Trust snapshot

Quick read

Trust 17 - Baseline
4 works
0 followers
4 topics
4 close collaborators

Published work

4 published items

preprint - 2020 - arXiv

Towards Automatic Clustering Analysis using Traces of Information Gain: The InfoGuide Method

Clustering analysis has become a ubiquitous information retrieval tool in a wide range of domains, but a more automatic framework is still lacking. Though internal metrics are the key players towards a successful retrieval of clusters, their effectiveness on real-world datasets is still not fully understood, mainly because of their unrealistic assumptions about the underlying datasets. We hypothesized that capturing traces of information gain between increasingly complex clustering retrievals (InfoGuide) enables an automatic clustering analysis with improved clustering retrievals. We validated the InfoGuide hypothesis by capturing the traces of information gain using the Kolmogorov-Smirnov statistic and comparing the clusters retrieved by InfoGuide against those retrieved by other commonly used internal metrics in artificially generated, benchmark, and real-world datasets. Our results suggested that InfoGuide can enable a more automatic clustering analysis and may be more suitable for retrieving clusters in real-world datasets displaying nontrivial statistical properties.
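The abstract above does not say how InfoGuide applies the Kolmogorov-Smirnov statistic, so the following is only a minimal sketch of the two-sample statistic itself: the largest gap between two empirical cumulative distribution functions. The helper name `ks_statistic` and the toy samples are assumptions made for illustration, not part of the paper.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic (hypothetical helper).

    Returns the maximum absolute difference between the empirical CDFs
    of the two samples.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap is
    # reached at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Toy samples (illustrative only): identical, shifted, and disjoint.
same = ks_statistic([1, 2, 3], [1, 2, 3])
shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
disjoint = ks_statistic([1, 2, 3], [4, 5, 6])
```

A value near 0 means the two samples are distributed alike, and a value of 1 means they do not overlap at all. How such values are turned into "traces of information gain" is left to the paper.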

preprint - 2015 - arXiv

Uncovering the evolution of non-stationary stochastic variables: the example of asset volume-price fluctuations

We present a framework for describing the evolution of stochastic observables having a non-stationary distribution of values. The framework is applied to empirical volume-prices from assets traded at the New York Stock Exchange. Using the Kullback-Leibler divergence, we evaluate the best model out of four biparametric models standardly used in the context of financial data analysis. For the present data sets, we conclude that the inverse $\Gamma$-distribution is a good model, particularly for the distribution tail of the largest volume-price fluctuations. Extracting the time series of the corresponding parameter values, we show that they evolve in time as stochastic variables themselves. For the particular case of the parameter controlling the volume-price distribution tail, we are able to extract an Ornstein-Uhlenbeck equation which describes the fluctuations of the largest volume-prices observed in the data. Finally, we discuss how to bridge from the stochastic evolution of the distribution parameters to the stochastic evolution of the (non-stationary) observable and put our conclusions into perspective for other applications in geophysics and biology.

preprint - 2014 - arXiv

Optimal models of extreme volume-prices are time-dependent

We present evidence that the best model for empirical volume-price distributions is not always the same and that it strongly depends on (i) the region of the volume-price spectrum that one wants to model and (ii) the period in time that is being modelled. To show these two features we analyze stocks of the New York stock market with four different models: Gamma, inverse-gamma, log-normal, and Weibull distributions. To evaluate the accuracy of each model we use standard relative deviations as well as the Kullback-Leibler distance, and we introduce an additional distance particularly suited to evaluating how accurate the models are for the distribution tails (large volume-price). Finally, we put our findings in perspective and discuss how they can be extended to other situations in financial engineering.
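As a rough illustration of ranking the four candidate distributions by Kullback-Leibler distance, the sketch below bins a sample into a histogram and compares it with each model's binned density. Everything specific here is an assumption: the data are synthetic (drawn from an inverse-gamma distribution), the model parameters are picked by hand rather than fitted, and the bin range and the helper name `kl_to_model` are hypothetical. The tail-specific distance mentioned in the abstract is not reproduced.

```python
import math
import random


def gamma_pdf(x, k, theta):
    return x ** (k - 1) * math.exp(-x / theta) / (math.gamma(k) * theta ** k)


def inverse_gamma_pdf(x, a, b):
    return b ** a * x ** (-a - 1) * math.exp(-b / x) / math.gamma(a)


def lognormal_pdf(x, mu, sigma):
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


def weibull_pdf(x, k, lam):
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))


def kl_to_model(data, pdf, lo=0.05, hi=6.0, bins=60):
    """Discrete KL divergence from the histogram of `data` to a model density."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        if lo <= x < hi:
            counts[min(int((x - lo) / width), bins - 1)] += 1
    n = sum(counts)
    # Model mass per bin, approximated at the bin midpoint and renormalised.
    q = [pdf(lo + (i + 0.5) * width) * width for i in range(bins)]
    q_total = sum(q)
    kl = 0.0
    for c, qi in zip(counts, q):
        if c == 0:
            continue  # empty bins contribute nothing
        if qi <= 0.0:
            return math.inf
        p = c / n
        kl += p * math.log(p / (qi / q_total))
    return kl


# Synthetic sample: 1 / Gamma(shape=3, scale=0.5) is inverse-gamma(a=3, b=2).
random.seed(0)
data = [1.0 / random.gammavariate(3.0, 0.5) for _ in range(20000)]

# Hand-picked (not fitted) parameters, for illustration only.
models = {
    "gamma": lambda x: gamma_pdf(x, 2.0, 0.5),
    "inverse-gamma": lambda x: inverse_gamma_pdf(x, 3.0, 2.0),
    "log-normal": lambda x: lognormal_pdf(x, 0.0, 1.0),
    "weibull": lambda x: weibull_pdf(x, 1.5, 1.1),
}
scores = {name: kl_to_model(data, pdf) for name, pdf in models.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

In a real analysis each model's parameters would be fitted to the data first, and the comparison would be repeated per time window, since the abstract's point is that the best-ranked model can change with the period and the region of the distribution.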

preprint - 2014 - arXiv

Stochastic Evolution of Stock Market Volume-Price Distributions

Using available data from the New York stock market (NYSM), we test four different bi-parametric models to fit the corresponding volume-price distributions at each 10-minute lag: the Gamma distribution, the inverse Gamma distribution, the Weibull distribution and the log-normal distribution. The volume-price data, which measure market capitalization, appear to follow a specific statistical pattern that differs from the evolution of prices measured in similar studies. We find that the inverse Gamma model gives a better fit to the volume-price evolution than the other models. We then focus on the inverse Gamma distribution as a model for the NYSM data and analyze the evolution of the pair of distribution parameters as a stochastic process. Assuming that the evolution of these parameters is governed by coupled Langevin equations, we derive the corresponding drift and diffusion coefficients, which then provide insight for understanding the mechanisms underlying the evolution of the stock market.
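To make the idea of deriving drift and diffusion coefficients from a time series more concrete, here is a minimal one-dimensional sketch. It is not the paper's procedure: the abstract describes two coupled parameters, whereas this example assumes a single variable with linear drift and constant diffusion (an Ornstein-Uhlenbeck process), simulated rather than taken from market data. The function names and parameter values are hypothetical, and the convention D2 = <(dx)^2> / (2 dt) is an assumption.

```python
import math
import random


def simulate_ou(n_steps, dt, theta, sigma, seed=0):
    """Euler-Maruyama path of dx = -theta * x * dt + sigma * dW (toy data)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n_steps):
        noise = rng.gauss(0.0, math.sqrt(dt))
        x.append(x[-1] - theta * x[-1] * dt + sigma * noise)
    return x


def estimate_linear_drift_and_diffusion(x, dt):
    """Estimate theta and D2 assuming drift -theta * x and constant diffusion.

    theta: least-squares slope of the increments against x, divided by dt.
    D2:    mean squared increment divided by 2 * dt.
    """
    increments = [b - a for a, b in zip(x, x[1:])]
    sxx = sum(a * a for a in x[:-1])
    sxd = sum(a * d for a, d in zip(x[:-1], increments))
    theta_hat = -sxd / (sxx * dt)
    d2_hat = sum(d * d for d in increments) / (2.0 * dt * len(increments))
    return theta_hat, d2_hat


# Toy parameters (illustrative): theta = 1.0, sigma = 0.5, so D2 = sigma**2 / 2.
path = simulate_ou(n_steps=200_000, dt=0.01, theta=1.0, sigma=0.5)
theta_hat, d2_hat = estimate_linear_drift_and_diffusion(path, dt=0.01)
print(theta_hat, d2_hat)
```

With empirical data, the drift and diffusion would normally be estimated as functions of the state (for example by binning the conditional moments) instead of being forced into a linear form as done here.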