Researcher profile

Diego R. Amancio

Diego R. Amancio contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
9 works
0 followers
7 topics
4 close collaborators

Published work

9 published items

Preprint · 2023 · arXiv

Using citation networks to evaluate the impact of text length on the identification of relevant concepts

The identification of the most significant concepts in unstructured data is of critical importance in various practical applications. Despite the large number of methods that have been put forth to extract the main topics of texts, few studies have probed the impact of text length on the performance of keyword extraction (KE) methods. In this study, we adopted a network-based approach to evaluate whether keywords extracted from paper abstracts are compatible with keywords extracted from full papers. We employed community detection methods to identify groups of related papers in citation networks. These paper clusters were then employed to extract keywords from abstracts. Our results indicate that while the various community detection methods employed in our KE approach yielded similar levels of accuracy, a correlation analysis revealed that these methods produced distinct keyword lists for each abstract. However, we also observed that all considered approaches reach low accuracy. Surprisingly, text clustering approaches outperformed all citation-based methods. The findings suggest that using different sources of information to extract keywords can lead to significant differences in performance. This effect can play an important role in applications relying on the identification of relevant concepts.

Preprint · 2022 · arXiv

Recovering network topology and dynamics via sequence characterization

Sequences arise in many real-world scenarios; thus, identifying the mechanisms behind symbol generation is essential to understanding many complex systems. This paper analyzes sequences generated by agents walking on a networked topology. Given that in many real scenarios the underlying process generating the sequence is hidden, we investigate whether reconstructing the network via the co-occurrence method is useful for recovering both the network topology and the agent dynamics that generated the sequences. We found that the characterization of reconstructed networks provides valuable information regarding the process and topology used to create the sequences. In a machine learning approach considering 16 combinations of network topology and agent dynamics as classes, we obtained an accuracy of 87% with sequences generated by visiting fewer than 40% of the nodes. Longer sequences yielded improved machine learning models. Our findings suggest that the proposed methodology could be extended to classify sequences and understand the mechanisms behind sequence generation.
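
The co-occurrence reconstruction mentioned above can be illustrated with a minimal sketch, assuming an unbiased random walk as the agent dynamic and a window of two consecutive symbols. The helper names, the toy ring topology and the seed are hypothetical; the paper's 16 topology/dynamics classes and its classifier are not reproduced here.

```python
import random


def random_walk(adjacency, start, steps, seed=0):
    """Unbiased random walk: at each step move to a uniformly chosen neighbour."""
    rng = random.Random(seed)
    sequence = [start]
    current = start
    for _ in range(steps):
        # sorted() makes the choice reproducible for a fixed seed
        current = rng.choice(sorted(adjacency[current]))
        sequence.append(current)
    return sequence


def cooccurrence_network(sequence):
    """Link each pair of consecutive symbols (a window of two), ignoring repeats."""
    return {frozenset(pair) for pair in zip(sequence, sequence[1:]) if pair[0] != pair[1]}


# Hypothetical toy topology: a ring of 10 nodes plus one chord between 0 and 5.
adjacency = {n: {(n - 1) % 10, (n + 1) % 10} for n in range(10)}
adjacency[0].add(5)
adjacency[5].add(0)

walk = random_walk(adjacency, start=0, steps=6, seed=42)
reconstructed = cooccurrence_network(walk)
true_edges = {frozenset((u, v)) for u in adjacency for v in adjacency[u]}
visited_fraction = len(set(walk)) / len(adjacency)

print("walk:", walk)
print("recovered edges:", len(reconstructed), "of", len(true_edges))
print("fraction of nodes visited:", visited_fraction)
```

Because the walker can only step along existing edges, every recovered edge is a true edge; parts of the network the walk never reaches simply stay missing, which is why the fraction of visited nodes matters when characterizing reconstructed networks.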

Preprint · 2022 · arXiv

Using virtual edges to extract keywords from texts modeled as complex networks

Detecting keywords in texts is important for many text mining applications. Graph-based methods have been commonly used to automatically find the key concepts in texts; however, relevant information provided by embeddings has not been widely used to enrich the graph structure. Here we modeled texts as co-occurrence networks, where nodes are words and edges are established either by contextual or semantic similarity. We compared two embedding approaches, Word2vec and BERT, to check whether edges created via word embeddings can improve the quality of the keyword extraction method. We found that the use of virtual edges can indeed improve the discriminability of co-occurrence networks. The best performance was obtained when only a low percentage of virtual (embedding) edges was added. A comparative analysis of structural and dynamical network metrics revealed that degree, PageRank, and accessibility are the metrics displaying the best performance in the model enriched with virtual edges.
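
A minimal sketch of the idea, assuming stop words are already removed, co-occurrence means adjacency within a window of two words, and virtual edges link the most cosine-similar non-adjacent word pairs. The tokens, the 2-d vectors (stand-ins for Word2vec or BERT embeddings) and the `fraction` parameter are hypothetical; degree is used for ranking only because it is one of the metrics the abstract names, and this is not necessarily the paper's exact edge-selection rule.

```python
import math
from itertools import combinations


def cooccurrence_edges(tokens):
    """Base network: connect adjacent words (window of two), ignoring repeats."""
    return {frozenset(pair) for pair in zip(tokens, tokens[1:]) if pair[0] != pair[1]}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def add_virtual_edges(edges, vectors, fraction):
    """Add the most similar non-adjacent word pairs as 'virtual' edges.

    `fraction` (hypothetical parameter) sets how many edges are added,
    relative to the number of original co-occurrence edges.
    """
    candidates = [frozenset(pair) for pair in combinations(sorted(vectors), 2)
                  if frozenset(pair) not in edges]
    candidates.sort(key=lambda e: cosine(*(vectors[w] for w in sorted(e))), reverse=True)
    budget = round(fraction * len(edges))
    return edges | set(candidates[:budget])


def degree_ranking(edges):
    """Rank words by degree (highest first), breaking ties alphabetically."""
    degree = {}
    for edge in edges:
        for word in edge:
            degree[word] = degree.get(word, 0) + 1
    return sorted(degree, key=lambda w: (-degree[w], w))


# Toy, pre-processed tokens and made-up 2-d vectors standing in for real embeddings.
tokens = ["network", "model", "text", "network", "word", "embedding", "graph", "text"]
vectors = {
    "network": (1.0, 0.1), "graph": (0.9, 0.2), "text": (0.1, 1.0),
    "word": (0.3, 0.9), "model": (0.5, 0.5), "embedding": (0.4, 0.6),
}

base = cooccurrence_edges(tokens)
enriched = add_virtual_edges(base, vectors, fraction=0.15)
print(degree_ranking(base))
print(degree_ranking(enriched))
```

With a small `fraction`, only the single most similar pair ("graph", "network") is added, which is enough to change the degree ranking; this mirrors, at toy scale, the observation that small amounts of virtual edges matter.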

Preprint · 2021 · arXiv

A pattern recognition approach for distinguishing between prose and poetry

Poetry and prose are written artistic expressions that help us to appreciate the reality we live in. Each of these styles has its own set of subjective properties, such as rhyme and rhythm, which are easily caught by a human reader's eye and ear. With the recent advances in artificial intelligence, the gap between humans and machines may have decreased, and today we observe algorithms mastering tasks that were once exclusively performed by humans. In this paper, we propose an automated method to distinguish between poetry and prose based solely on aural and rhythmic properties. In order to compare prose and poetry rhythms, we represent the rhymes and phones as temporal sequences, and we propose a procedure for extracting rhythmic features from these sequences. Classifying the considered texts with the extracted features yielded a best accuracy of 0.78, obtained with a neural network. Interestingly, by using an approach based on complex networks to visualize the similarities between the different texts considered, we found that the patterns of poetry vary much more than those of prose. Consequently, a much richer and more complex set of rhythmic possibilities tends to be found in poetry.

Preprint · 2021 · arXiv

On predicting research grants productivity

Understanding the reasons associated with successful proposals is of paramount importance to improve evaluation processes. In this context, we analyzed whether bibliometric features are able to predict the success of research grants. We extracted features aiming at characterizing the academic history of Brazilian researchers, including research topics, affiliations, number of publications and visibility. The extracted features were then used to predict grant productivity via machine learning in three major research areas, namely Medicine, Dentistry and Veterinary Medicine. We found that research subject and publication history play a role in predicting productivity. In addition, institution-based features turned out to be relevant when combined with other features. While the best results outperformed text-based attributes, the evaluated features were not highly discriminative. Our findings indicate that predicting grant success, at least with the considered set of bibliometric features, is not a trivial task.

Preprint · 2020 · arXiv

Modeling Supply-Chain Networks with Firm-to-Firm Wire Transfers

We study a novel economic network (supply chain) composed of wire transfers (electronic payment transactions) among the universe of firms in Brazil (6.2 million firms). We construct a directed and weighted network in which vertices represent cities and edges connote pairwise economic dependence between cities. Each city (vertex) represents the collection of all firms in that location, and links denote intercity wire transfers. We find a high degree of economic integration among cities in the trade network, which is consistent with the high degree of specialization found across Brazilian cities. Using network centrality measures, we identify which cities play a dominant role in the entire supply chain. We find that the trade network has a disassortative mixing pattern, which is consistent with the power-law shape of the firm size distribution in Brazil. After the Brazilian recession in 2014, the disassortativity becomes even stronger as a result of the death of many small firms and the consequent concentration of economic flows on large firms. Our results suggest that recessions have a large impact on the trade network, with meaningful and heterogeneous economic consequences across municipalities. We also run econometric exercises and find that court efficiency plays a dual role. From the customer perspective, it plays an important role in reducing contractual frictions, as it increases economic transactions between different cities. From the supplier perspective, cities that are central suppliers to the supply chain seem to use court inefficiency as a barrier against lawsuits from their customers.
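
Disassortative mixing is commonly measured with the degree assortativity coefficient r (the Pearson correlation of the degrees at the two ends of each edge), where negative values mean high-degree nodes tend to connect to low-degree ones. The minimal sketch below computes r for an undirected, unweighted edge list; the paper's city network is directed and weighted, so this is a simplification, and the toy graphs are hypothetical.

```python
import math
from statistics import mean


def degree_assortativity(edges):
    """Degree assortativity: Pearson correlation of the degrees at both ends of
    every edge. Undirected and unweighted; each edge is counted in both
    directions so the measure is symmetric. Undefined if all degrees are equal."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    xs = [degree[u] for u, v in edges] + [degree[v] for u, v in edges]
    ys = [degree[v] for u, v in edges] + [degree[u] for u, v in edges]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy graphs: one large "firm" linked to many small ones, and a short chain.
hub_and_spokes = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]
path = [("a", "b"), ("b", "c"), ("c", "d")]

print(degree_assortativity(hub_and_spokes))
print(degree_assortativity(path))
```

A pure star is perfectly disassortative (r = -1), which is the intuition behind linking disassortativity to flows concentrating on a few large firms. For real data, NetworkX provides `degree_assortativity_coefficient`, which also supports directed and weighted variants.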

Preprint · 2019 · arXiv

A complex network approach to political analysis: application to the Brazilian Chamber of Deputies

In this paper, we introduce a network-based methodology to study how clusters represented by political entities evolve over time. We constructed networks of voting data from the Brazilian Chamber of Deputies, where deputies are nodes and edges represent voting similarity among deputies. The Brazilian Chamber of Deputies is characterized by a multi-party political system, so we would expect a broad spectrum of ideas to be represented. Our results, however, revealed that this plurality of ideas is not present at all: the effective number of communities representing ideas based on agreement/disagreement in propositions is about 3 over the entire studied time span. The obtained results also revealed different patterns of coalitions between distinct parties. Finally, we also found signs of early party isolation before presidential impeachment proceedings effectively started. We believe that the proposed framework could be used to complement the study of political dynamics and could even be applied to similar social networks where individuals are organized in a complex manner.
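
One way to build a deputy network like the one described here is sketched below, assuming an edge weight equals the fraction of shared propositions on which two deputies cast the same vote and that weak edges are pruned with a threshold. The roll-call data, the `threshold` value and the function name are hypothetical; the paper's exact similarity measure and its community detection step are not reproduced.

```python
from itertools import combinations


def voting_similarity_network(votes, threshold):
    """Undirected network: deputies are nodes, and an edge links two deputies whose
    fraction of identical votes (over propositions both voted on) is >= threshold."""
    edges = {}
    for a, b in combinations(sorted(votes), 2):
        shared = [p for p in votes[a] if p in votes[b]]
        if not shared:
            continue  # no common propositions, so similarity is undefined
        agreement = sum(votes[a][p] == votes[b][p] for p in shared) / len(shared)
        if agreement >= threshold:
            edges[(a, b)] = agreement
    return edges


# Hypothetical roll-call data: deputy -> {proposition: vote}.
votes = {
    "D1": {"P1": "yes", "P2": "yes", "P3": "no"},
    "D2": {"P1": "yes", "P2": "yes", "P3": "no"},
    "D3": {"P1": "no", "P2": "no", "P3": "yes"},
    "D4": {"P1": "no", "P2": "yes", "P3": "yes"},
}

network = voting_similarity_network(votes, threshold=0.6)
print(network)
```

In this toy case the network splits into two groups ({D1, D2} and {D3, D4}); on real data, a community detection method would be applied to the weighted network to find such blocs and track them over time.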

Preprint · 2019 · arXiv

Semantic flow in language networks

In this study we propose a framework to characterize documents based on their semantic flow. The proposed framework encompasses a network-based model that connects sentences based on their semantic similarity. Semantic fields are detected using standard community detection methods. As the story unfolds, transitions between semantic fields are represented in Markov networks, which in turn are characterized via network motifs (subgraphs). Here we show that the proposed framework can be used to classify books according to their style and publication dates. Remarkably, even without a systematic optimization of parameters, philosophy and investigative books were discriminated with an accuracy rate of 92.5%. Because this model captures semantic features of texts, it could be used as an additional feature in traditional network-based models of texts that capture only syntactic/stylistic information, as is the case of word adjacency (co-occurrence) networks.
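
The step from semantic fields to a Markov network can be sketched as follows, assuming each sentence has already been assigned a semantic-field label by community detection and that consecutive sentences in the same field do not count as a transition. The labels and the function name are hypothetical, and the motif characterization that the framework applies afterwards is not shown.

```python
from collections import Counter, defaultdict


def transition_network(field_sequence):
    """Turn per-sentence semantic-field labels (in reading order) into a weighted,
    directed network of transitions between distinct fields, with each node's
    outgoing weights normalized to transition probabilities."""
    counts = Counter(
        (a, b) for a, b in zip(field_sequence, field_sequence[1:]) if a != b
    )
    out_totals = defaultdict(int)
    for (a, _), c in counts.items():
        out_totals[a] += c
    return {(a, b): c / out_totals[a] for (a, b), c in counts.items()}


# Hypothetical semantic-field label for each of eight consecutive sentences.
fields = ["A", "A", "B", "A", "C", "C", "B", "A"]
print(transition_network(fields))
```

The resulting weighted directed graph is the kind of object whose small subgraphs (motifs) could then be counted to build a feature vector for each book.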

Preprint · 2016 · arXiv

Complex systems: features, similarity and connectivity

The increasing interest in complex networks research has been a consequence of several intrinsic features of this area, such as the generality of the approach to represent and model virtually any discrete system, and the incorporation of concepts and methods deriving from many areas, from statistical physics to sociology, which are often used in an independent way. Yet, for this same reason, it would be desirable to integrate these various aspects into a more coherent and organic framework, which would bring several benefits normally afforded by systematization in science, including the identification of new types of problems and the cross-fertilization between fields. More specifically, the identification of the main areas to which the concepts frequently used in complex networks can be applied paves the way to adopting and applying a larger set of concepts and methods deriving from those respective areas. Among the several areas that have been used in complex networks research, pattern recognition, optimization, linear algebra, and time series analysis seem to play a more basic and recurrent role. In the present manuscript, we propose a systematic way to integrate the concepts from these diverse areas regarding complex networks research. To do so, we start by grouping the multidisciplinary concepts into three main groups, namely features, similarity, and network connectivity. Then we show that several of the analysis and modeling approaches to complex networks can be thought of as a composition of maps between these three groups, with emphasis on nine main types of mappings, which are presented and illustrated. Such a systematization of principles and approaches also provides an opportunity to review some of the most closely related works in the literature, which is also done in this article.