Source author record

Eric K. Tokuda

Eric K. Tokuda appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning, Computer Vision, physics.soc-ph, Social and Information Networks

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators

Preprint, 2023, arXiv

Unraveling the graph structure of tabular data through Bayesian and spectral analysis

In the big-data age, tabular data are being generated and analyzed everywhere. As a consequence, finding and understanding the relationships between the features in these data are of great relevance. Here, to encompass these relationships, we propose a graph-based method that allows individual, group and multi-scale analyses. The method starts by mapping the tabular data into a weighted directed graph using the Shapley additive explanations technique. With this graph of relationships, we show that the inference of the hierarchical modular structure obtained by the Nested Stochastic Block Model (nSBM) as well as the study of the spectral space of the magnetic Laplacian can help us identify the classes of features and unravel non-trivial relationships. As a case study, we analyzed a socioeconomic survey conducted with students in Brazil: the PeNSE survey. The spectral embedding of the columns suggested that questions related to physical activities form a separate group. The application of the nSBM approach not only corroborated this finding but also allowed complementary findings about the modular structure: some groups of questions showed a high adherence to the divisions qualitatively defined by the designers of the survey. As opposed to the structure obtained by the spectrum, questions from the class Safety were partly grouped by our method in the class Drugs. Surprisingly, by inspecting these questions, we observed that they were related to both these topics, suggesting an alternative interpretation of these questions. These results show how our method can provide guidance for tabular data analysis as well as the design of future surveys.
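To make the spectral step above more concrete, here is a minimal sketch of a magnetic Laplacian for a weighted directed graph. The abstract does not state which definition the authors use, so the formula below (symmetrized weights combined with a phase of 2πq times the weight asymmetry), the charge parameter `q`, the function names and the toy adjacency matrix are all assumptions made for illustration. The SHAP-based graph construction is not reproduced; a three-node directed cycle stands in for it.

```python
import numpy as np

def magnetic_laplacian(W, q=0.25):
    """Magnetic Laplacian under one common definition (an assumption here).

    W[u, v] is the weight of the directed edge u -> v.
    """
    W = np.asarray(W, dtype=float)
    W_sym = 0.5 * (W + W.T)                  # undirected edge strength
    phase = 2.0 * np.pi * q * (W - W.T)      # antisymmetric: encodes direction
    H = W_sym * np.exp(1j * phase)           # Hermitian "magnetic" adjacency
    D = np.diag(W_sym.sum(axis=1))           # degrees of the symmetrized graph
    return D - H

def spectral_embedding(W, q=0.25, k=2):
    """Return the k smallest eigenvalues and their (complex) eigenvectors."""
    L = magnetic_laplacian(W, q)
    eigvals, eigvecs = np.linalg.eigh(L)     # eigh: L is Hermitian, eigenvalues are real
    return eigvals[:k], eigvecs[:, :k]

# Toy stand-in for a feature-relationship graph: directed cycle 0 -> 1 -> 2 -> 0.
W = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)

L = magnetic_laplacian(W)
eigvals, eigvecs = spectral_embedding(W)
```

Because the phase matrix is antisymmetric and the symmetrized weights are non-negative, the resulting matrix is Hermitian and positive semidefinite, which is why a real-valued spectrum can be used for the embedding.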

Preprint, 2022, arXiv

City Motifs as Revealed by Similarity Between Hierarchical Features

Several natural and theoretical networks can be broken down into smaller portions, or subgraphs corresponding to neighborhoods. The most frequent of these neighborhoods can then be understood as motifs of the network, and are therefore important for better characterizing and understanding the overall structure. Several developments in network science have relied on this interesting concept, with ample applications in areas including systems biology, computational neuroscience, economics and ecology. The present work aims at reporting an unsupervised methodology capable of identifying motifs in street networks, the latter corresponding to graphs obtained from city plans by considering street junctions and terminations as nodes while the links are defined by the streets. Remarkable results are described, including the identification of nine stable and informative motifs, which were made possible by three critically important factors: (i) adoption of five hierarchical measurements to locally characterize the neighborhoods of nodes in the street networks; (ii) adoption of an effective coincidence methodology for translating datasets into networks; and (iii) definition of the motifs in statistical terms by using community finding methodology. The nine identified motifs are characterized and discussed from several perspectives, including their mutual similarity, visualization, histograms of measurements, and geographical adjacency in the original cities. Also presented is the analysis of the effect of the adopted features on the obtained networks as well as a simple supervised learning method capable of assigning reference motifs to cities.
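Factors (i) and (ii) above can be illustrated with a small sketch. The abstract does not list the five hierarchical measurements, so a single stand-in feature is used here: the number of nodes at exactly h hops from a reference node. The similarity function follows one reading of the coincidence index, namely the multiset Jaccard index multiplied by the interiority (overlap) index for non-negative vectors. The helper names and the 5 x 5 grid used in place of a real street network are hypothetical.

```python
from collections import deque

def ring_sizes(adj, source, max_h):
    """Count the nodes at exactly h hops from source, for h = 1..max_h."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_h:          # do not expand beyond the last ring
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return [sum(1 for d in dist.values() if d == h) for h in range(1, max_h + 1)]

def coincidence(x, y):
    """Jaccard times interiority for non-negative feature vectors (assumed form)."""
    mins = sum(min(a, b) for a, b in zip(x, y))
    maxs = sum(max(a, b) for a, b in zip(x, y))
    smaller = min(sum(x), sum(y))
    if maxs == 0 or smaller == 0:
        return 0.0
    return (mins / maxs) * (mins / smaller)

def grid_graph(rows, cols):
    """Toy stand-in for a street network: junctions on a regular grid."""
    adj = {(r, c): [] for r in range(rows) for c in range(cols)}
    for (r, c) in adj:
        for dr, dc in ((1, 0), (0, 1)):
            nb = (r + dr, c + dc)
            if nb in adj:
                adj[(r, c)].append(nb)
                adj[nb].append((r, c))
    return adj

adj = grid_graph(5, 5)
center = ring_sizes(adj, (2, 2), 2)   # interior junction
corner = ring_sizes(adj, (0, 0), 2)   # junction at the border
sim = coincidence(center, corner)
```

In a pipeline of the kind the abstract describes, pairwise similarities like `sim` would presumably become edge weights of a network of neighborhoods, on which community detection is then run; that last step is omitted here.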

Preprint, 2020, arXiv

Revisiting Agglomerative Clustering

An important issue in clustering concerns the avoidance of false positives while searching for clusters. This work addressed this problem by considering agglomerative methods, namely the single, average, median, complete, centroid and Ward's approaches, applied to unimodal and bimodal datasets obeying uniform, Gaussian, exponential and power-law distributions. A model of clusters was also adopted, involving a higher density nucleus surrounded by a transition, followed by outliers. This paved the way to defining an objective means for identifying the clusters from dendrograms. The adopted model also allowed the relevance of the clusters to be quantified in terms of the height of their subtrees. The obtained results include the verification that many methods detect two clusters in unimodal data. The single-linkage method was found to be more resilient to false positives. Also, several methods detected clusters not corresponding directly to the nucleus. The possibility of identifying the type of distribution was also investigated.
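The kind of experiment described above can be sketched with SciPy's hierarchical clustering routines. This is not the authors' cluster model or relevance criterion: it only builds a dendrogram for each of the six linkage methods on a single unimodal Gaussian sample and reports how many points fall on each side of the top-level split. The sample size, the random seed and the helper name `top_split_sizes` are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree

METHODS = ["single", "average", "median", "complete", "centroid", "ward"]

def top_split_sizes(points, method):
    """Sizes of the two subtrees joined by the last merge of the dendrogram."""
    Z = linkage(points, method=method)   # Euclidean distances on raw observations
    root = to_tree(Z)
    return root.get_left().get_count(), root.get_right().get_count()

# One unimodal sample: any balanced two-way split here is a false positive.
rng = np.random.default_rng(0)
unimodal = rng.normal(size=(200, 2))

splits = {m: top_split_sizes(unimodal, m) for m in METHODS}
for method, (left, right) in splits.items():
    print(f"{method:9s} top split: {left:3d} vs {right:3d}")
```

A top-level split that separates only a handful of points suggests the method is peeling off outliers, while a roughly balanced split on unimodal data is the sort of spurious two-cluster detection the abstract refers to. Subtree heights, which the abstract uses to quantify relevance, are available in the third column of `Z`.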