Source author record

Akira Imakura

Akira Imakura appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Cryptography and Security cs.CY Distributed, Parallel, and Cluster Computing hep-lat math.NA Neural and Evolutionary Computing Numerical Analysis

Catalog footprint

What is connected

9works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

A new type of federated clustering: A non-model-sharing approach

In recent years, the growing need to leverage sensitive data across institutions has led to increased attention on federated learning (FL), a decentralized machine learning paradigm that enables model training without sharing raw data. However, existing FL-based clustering methods, known as federated clustering, typically assume simple data partitioning scenarios such as horizontal or vertical splits, and cannot handle more complex distributed structures. This study proposes data collaboration clustering (DC-Clustering), a novel federated clustering method that supports clustering over complex data partitioning scenarios where horizontal and vertical splits coexist. In DC-Clustering, each institution shares only intermediate representations instead of raw data, ensuring privacy preservation while enabling collaborative clustering. The method allows flexible selection between k-means and spectral clustering, and achieves final results with a single round of communication with the central server. We conducted extensive experiments using synthetic and open benchmark datasets. The results show that our method achieves clustering performance comparable to centralized clustering where all data are pooled. DC-Clustering addresses an important gap in current FL research by enabling effective knowledge discovery from distributed heterogeneous data. Its practical properties -- privacy preservation, communication efficiency, and flexibility -- make it a promising tool for privacy-sensitive domains such as healthcare and finance.

preprint2022arXiv

Another Use of SMOTE for Interpretable Data Collaboration Analysis

Recently, data collaboration (DC) analysis has been developed for privacy-preserving integrated analysis across multiple institutions. DC analysis centralizes individually constructed dimensionality-reduced intermediate representations and realizes integrated analysis via collaboration representations without sharing the original data. To construct the collaboration representations, each institution generates and shares a shareable anchor dataset and centralizes its intermediate representation. Although, random anchor dataset functions well for DC analysis in general, using an anchor dataset whose distribution is close to that of the raw dataset is expected to improve the recognition performance, particularly for the interpretable DC analysis. Based on an extension of the synthetic minority over-sampling technique (SMOTE), this study proposes an anchor data construction technique to improve the recognition performance without increasing the risk of data leakage. Numerical results demonstrate the efficiency of the proposed SMOTE-based method over the existing anchor data constructions for artificial and real-world datasets. Specifically, the proposed method achieves 9 percentage point and 38 percentage point performance improvements regarding accuracy and essential feature selection, respectively, over existing methods for an income dataset. The proposed method provides another use of SMOTE not for imbalanced data classifications but for a key technology of privacy-preserving integrated analysis.

preprint2022arXiv

Divide-and-conquer based Large-Scale Spectral Clustering

Spectral clustering is one of the most popular clustering methods. However, how to balance the efficiency and effectiveness of the large-scale spectral clustering with limited computing resources has not been properly solved for a long time. In this paper, we propose a divide-and-conquer based large-scale spectral clustering method to strike a good balance between efficiency and effectiveness. In the proposed method, a divide-and-conquer based landmark selection algorithm and a novel approximate similarity matrix approach are designed to construct a sparse similarity matrix within low computational complexities. Then clustering results can be computed quickly through a bipartite graph partition process. The proposed method achieves a lower computational complexity than most existing large-scale spectral clustering methods. Experimental results on ten large-scale datasets have demonstrated the efficiency and effectiveness of the proposed method. The MATLAB code of the proposed method and experimental datasets are available at https://github.com/Li-Hongmin/MyPaperWithCode.

preprint2022arXiv

mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations

The growing amount of data and advances in data science have created a need for a new kind of cloud platform that provides users with flexibility, strong security, and the ability to couple with supercomputers and edge devices through high-performance networks. We have built such a nation-wide cloud platform, called "mdx" to meet this need. The mdx platform's virtualization service, jointly operated by 9 national universities and 2 national research institutes in Japan, launched in 2021, and more features are in development. Currently mdx is used by researchers in a wide variety of domains, including materials informatics, geo-spatial information science, life science, astronomical science, economics, social science, and computer science. This paper provides an the overview of the mdx platform, details the motivation for its development, reports its current status, and outlines its future plans.

preprint2022arXiv

Non-readily identifiable data collaboration analysis for multiple datasets including personal information

Multi-source data fusion, in which multiple data sources are jointly analyzed to obtain improved information, has considerable research attention. For the datasets of multiple medical institutions, data confidentiality and cross-institutional communication are critical. In such cases, data collaboration (DC) analysis by sharing dimensionality-reduced intermediate representations without iterative cross-institutional communications may be appropriate. Identifiability of the shared data is essential when analyzing data including personal information. In this study, the identifiability of the DC analysis is investigated. The results reveals that the shared intermediate representations are readily identifiable to the original data for supervised learning. This study then proposes a non-readily identifiable DC analysis only sharing non-readily identifiable data for multiple medical datasets including personal information. The proposed method solves identifiability concerns based on a random sample permutation, the concept of interpretable DC analysis, and usage of functions that cannot be reconstructed. In numerical experiments on medical datasets, the proposed method exhibits a non-readily identifiability while maintaining a high recognition performance of the conventional DC analysis. For a hospital dataset, the proposed method exhibits a nine percentage point improvement regarding the recognition performance over the local analysis that uses only local dataset.

preprint2021arXiv

Accuracy and Privacy Evaluations of Collaborative Data Analysis

Distributed data analysis without revealing the individual data has recently attracted significant attention in several applications. A collaborative data analysis through sharing dimensionality reduced representations of data has been proposed as a non-model sharing-type federated learning. This paper analyzes the accuracy and privacy evaluations of this novel framework. In the accuracy analysis, we provided sufficient conditions for the equivalence of the collaborative data analysis and the centralized analysis with dimensionality reduction. In the privacy analysis, we proved that collaborative users' private datasets are protected with a double privacy layer against insider and external attacking scenarios.

preprint2019arXiv

Verified partial eigenvalue computations using contour integrals for Hermitian generalized eigenproblems

We propose a verified computation method for partial eigenvalues of a Hermitian generalized eigenproblem. The block Sakurai-Sugiura Hankel method, a contour integral-type eigensolver, can reduce a given eigenproblem into a generalized eigenproblem of block Hankel matrices whose entries consist of complex moments. In this study, we evaluate all errors in computing the complex moments. We derive a truncation error bound of the quadrature. Then, we take numerical errors of the quadrature into account and rigorously enclose the entries of the block Hankel matrices. Each quadrature point gives rise to a linear system, and its structure enables us to develop an efficient technique to verify the approximate solution. Numerical experiments show that the proposed method outperforms a standard method and infer that the proposed method is potentially efficient in parallel.

preprint2016arXiv

Alternating optimization method based on nonnegative matrix factorizations for deep neural networks

The backpropagation algorithm for calculating gradients has been widely used in computation of weights for deep neural networks (DNNs). This method requires derivatives of objective functions and has some difficulties finding appropriate parameters such as learning rate. In this paper, we propose a novel approach for computing weight matrices of fully-connected DNNs by using two types of semi-nonnegative matrix factorizations (semi-NMFs). In this method, optimization processes are performed by calculating weight matrices alternately, and backpropagation (BP) is not used. We also present a method to calculate stacked autoencoder using a NMF. The output results of the autoencoder are used as pre-training data for DNNs. The experimental results show that our method using three types of NMFs attains similar error rates to the conventional DNNs with BP.

preprint2014arXiv

A filtering technique for the temporally reduced matrix of the Wilson fermion determinant

The Wilson fermion determinant can be written in the form of a series expansion in fugacity $ξ=\exp(μ/T)$, provided that the eigenmodes of the temporally reduced operator are obtained. Since the calculation of all eigenmodes rapidly becomes prohibitive for larger volumes, we develop a method to calculate only the low-energy eigenmodes of the reduced matrix using a matrix filetering technique. This provides a basis for an approximation to neglect uninteresting ultraviolet contributions.

Akira Imakura

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

A new type of federated clustering: A non-model-sharing approach

Another Use of SMOTE for Interpretable Data Collaboration Analysis

Divide-and-conquer based Large-Scale Spectral Clustering

mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations

Non-readily identifiable data collaboration analysis for multiple datasets including personal information

Accuracy and Privacy Evaluations of Collaborative Data Analysis

Verified partial eigenvalue computations using contour integrals for Hermitian generalized eigenproblems

Alternating optimization method based on nonnegative matrix factorizations for deep neural networks

A filtering technique for the temporally reduced matrix of the Wilson fermion determinant