Source author record

Tetsuya Sakurai

Tetsuya Sakurai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning, Cryptography and Security, hep-lat, Neural and Evolutionary Computing, nucl-th, Artificial Intelligence, cond-mat.mes-hall, cond-mat.mtrl-sci, cond-mat.supr-con, math.NA

Catalog footprint

What is connected

14 works

10 topics

4 close collaborators

preprint 2026 arXiv

A new type of federated clustering: A non-model-sharing approach

In recent years, the growing need to leverage sensitive data across institutions has led to increased attention on federated learning (FL), a decentralized machine learning paradigm that enables model training without sharing raw data. However, existing FL-based clustering methods, known as federated clustering, typically assume simple data partitioning scenarios such as horizontal or vertical splits, and cannot handle more complex distributed structures. This study proposes data collaboration clustering (DC-Clustering), a novel federated clustering method that supports clustering over complex data partitioning scenarios where horizontal and vertical splits coexist. In DC-Clustering, each institution shares only intermediate representations instead of raw data, ensuring privacy preservation while enabling collaborative clustering. The method allows flexible selection between k-means and spectral clustering, and achieves final results with a single round of communication with the central server. We conducted extensive experiments using synthetic and open benchmark datasets. The results show that our method achieves clustering performance comparable to centralized clustering where all data are pooled. DC-Clustering addresses an important gap in current FL research by enabling effective knowledge discovery from distributed heterogeneous data. Its practical properties -- privacy preservation, communication efficiency, and flexibility -- make it a promising tool for privacy-sensitive domains such as healthcare and finance.
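
The abstract does not spell out how intermediate representations from different institutions are made comparable. The following is a minimal sketch of one way this could work, assuming the anchor-based alignment used in data collaboration analysis in general, a purely horizontal split, and random linear maps standing in for each institution's private dimensionality reduction. The function name `collaboration_representations` and all parameters are hypothetical, and this is not the DC-Clustering procedure itself.

```python
import numpy as np

def collaboration_representations(parts, anchor, dim, seed=0):
    """Align per-institution intermediate representations (illustrative only).

    parts  : list of (n_i, d) arrays, one per institution (horizontal split)
    anchor : (r, d) shareable anchor dataset with r >= dim (assumption)
    dim    : dimension of the intermediate representations
    """
    rng = np.random.default_rng(seed)
    anchor = np.asarray(anchor, dtype=float)
    local, local_anchor = [], []
    for X in parts:
        # Each institution keeps its own map F private and shares only
        # the mapped data and the mapped anchor, never the raw rows.
        F = rng.standard_normal((anchor.shape[1], dim))
        local.append(np.asarray(X, dtype=float) @ F)
        local_anchor.append(anchor @ F)

    # Server side: build a common target from the stacked anchor images.
    U, _, _ = np.linalg.svd(np.hstack(local_anchor), full_matrices=False)
    Z = U[:, :dim]

    aligned = []
    for Xt, At in zip(local, local_anchor):
        # Least-squares map that sends this institution's anchor image to Z.
        G, *_ = np.linalg.lstsq(At, Z, rcond=None)
        aligned.append(Xt @ G)
    return np.vstack(aligned)
```

Under these assumptions, a centralized clustering routine such as k-means or spectral clustering would then run once on the stacked output, which is consistent with the single communication round the abstract describes.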

preprint 2022 arXiv

Another Use of SMOTE for Interpretable Data Collaboration Analysis

Recently, data collaboration (DC) analysis has been developed for privacy-preserving integrated analysis across multiple institutions. DC analysis centralizes individually constructed dimensionality-reduced intermediate representations and realizes integrated analysis via collaboration representations without sharing the original data. To construct the collaboration representations, each institution generates and shares a shareable anchor dataset and centralizes its intermediate representation. Although a random anchor dataset generally functions well for DC analysis, using an anchor dataset whose distribution is close to that of the raw dataset is expected to improve the recognition performance, particularly for interpretable DC analysis. Based on an extension of the synthetic minority over-sampling technique (SMOTE), this study proposes an anchor data construction technique to improve the recognition performance without increasing the risk of data leakage. Numerical results demonstrate the efficiency of the proposed SMOTE-based method over the existing anchor data constructions for artificial and real-world datasets. Specifically, the proposed method achieves 9 percentage point and 38 percentage point performance improvements regarding accuracy and essential feature selection, respectively, over existing methods for an income dataset. The proposed method provides another use of SMOTE, not for imbalanced data classification but as a key technology of privacy-preserving integrated analysis.
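
For readers unfamiliar with SMOTE, the sketch below shows the standard interpolation step it is built on: a synthetic point is drawn on the segment between a sample and one of its nearest neighbours. This is plain SMOTE-style interpolation only. The paper's extension and its data-leakage safeguards are not reproduced here, and the function name `smote_anchor` and its parameters are hypothetical.

```python
import numpy as np

def smote_anchor(X, n_anchor, k=5, seed=0):
    """Draw n_anchor synthetic rows by SMOTE-style interpolation (sketch)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    k = min(k, n - 1)

    # Pairwise squared distances; a sample is never its own neighbour.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    neighbours = np.argsort(d2, axis=1)[:, :k]

    # Pick a base sample, one of its k nearest neighbours, and a random
    # position on the segment that joins them.
    base = rng.integers(0, n, size=n_anchor)
    mate = neighbours[base, rng.integers(0, k, size=n_anchor)]
    lam = rng.random((n_anchor, 1))
    return X[base] + lam * (X[mate] - X[base])
```

Because each synthetic row is a convex combination of two real rows, the anchor data follows the local shape of the raw distribution, which is the property the abstract expects to help recognition performance.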

preprint 2022 arXiv

Divide-and-conquer based Large-Scale Spectral Clustering

Spectral clustering is one of the most popular clustering methods. However, balancing the efficiency and effectiveness of large-scale spectral clustering under limited computing resources has long lacked a satisfactory solution. In this paper, we propose a divide-and-conquer based large-scale spectral clustering method to strike a good balance between efficiency and effectiveness. In the proposed method, a divide-and-conquer based landmark selection algorithm and a novel approximate similarity matrix approach are designed to construct a sparse similarity matrix with low computational complexity. Clustering results can then be computed quickly through a bipartite graph partition process. The proposed method achieves a lower computational complexity than most existing large-scale spectral clustering methods. Experimental results on ten large-scale datasets demonstrate the efficiency and effectiveness of the proposed method. The MATLAB code of the proposed method and the experimental datasets are available at https://github.com/Li-Hongmin/MyPaperWithCode.

preprint 2022 arXiv

Knowledge-Driven Program Synthesis via Adaptive Replacement Mutation and Auto-constructed Subprogram Archives

We introduce Knowledge-Driven Program Synthesis (KDPS) as a variant of the program synthesis task that requires the agent to solve a sequence of program synthesis problems. In KDPS, the agent should use knowledge from the earlier problems to solve the later ones. We propose a novel method based on PushGP to solve the KDPS problem, which takes subprograms as knowledge. The proposed method extracts subprograms from the solution of previously solved problems by the Even Partitioning (EP) method and uses these subprograms to solve the upcoming programming task using Adaptive Replacement Mutation (ARM). We call this method PushGP+EP+ARM. With PushGP+EP+ARM, no human effort is required in the knowledge extraction and utilization processes. We compare the proposed method with PushGP, as well as a method using subprograms manually extracted by a human. Our PushGP+EP+ARM achieves better train error, success count, and faster convergence than PushGP. Additionally, we demonstrate the superiority of PushGP+EP+ARM when consecutively solving a sequence of six program synthesis problems.

preprint 2022 arXiv

Non-readily identifiable data collaboration analysis for multiple datasets including personal information

Multi-source data fusion, in which multiple data sources are jointly analyzed to obtain improved information, has attracted considerable research attention. For the datasets of multiple medical institutions, data confidentiality and cross-institutional communication are critical. In such cases, data collaboration (DC) analysis by sharing dimensionality-reduced intermediate representations without iterative cross-institutional communications may be appropriate. Identifiability of the shared data is essential when analyzing data including personal information. In this study, the identifiability of the DC analysis is investigated. The results reveal that the shared intermediate representations are readily identifiable to the original data for supervised learning. This study then proposes a non-readily identifiable DC analysis that shares only non-readily identifiable data for multiple medical datasets including personal information. The proposed method solves identifiability concerns based on a random sample permutation, the concept of interpretable DC analysis, and the use of functions that cannot be reconstructed. In numerical experiments on medical datasets, the proposed method exhibits non-ready identifiability while maintaining the high recognition performance of the conventional DC analysis. For a hospital dataset, the proposed method exhibits a nine percentage point improvement in recognition performance over the local analysis that uses only the local dataset.

preprint 2021 arXiv

A Parallel Computing Method for the Higher Order Tensor Renormalization Group

In this paper, we propose a parallel computing method for the Higher Order Tensor Renormalization Group (HOTRG) applied to a $d$-dimensional $( d \geq 2 )$ simple lattice model. Sequential computation of the HOTRG requires $O ( χ^{4 d - 1} )$ computational cost, where $χ$ is the bond dimension, in a step to contract indices of tensors. When we simply distribute elements of a local tensor to each process in parallel computing of the HOTRG, frequent communication between processes occurs. The simplest way to avoid such communication is to hold all the tensor elements in each process; however, this requires $O ( χ^{2d} )$ memory space. In the presented method, a local tensor element may be placed on more than one process, and sufficient local tensor elements are distributed to each process to avoid communication between processes during the computation step under consideration. For the bottleneck part of the computational cost, such a distribution is achieved by distributing elements of two local tensors to $χ^2$ processes according to one of the indices of each local tensor that are not contracted during that computation step. In the case of $d \geq 3$, the computational cost in each process is reduced to $O ( χ^{4 d - 3} )$ and the memory space requirement in each process is kept to $O ( χ^{2d - 1} )$.
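
As a worked instance of the scalings stated in the abstract, take $d = 3$. The exponents below follow directly from the formulas above. Constants hidden by the $O(\cdot)$ notation are ignored, so the figures are order-of-magnitude guides only.

$$
\begin{aligned}
\text{sequential cost} &: O(χ^{4d-1}) = O(χ^{11}), \\
\text{cost per process} &: O(χ^{4d-3}) = O(χ^{9}), \\
\text{cost ratio} &: χ^{4d-1} / χ^{4d-3} = χ^{2} \quad (\text{equal to the number of processes}), \\
\text{memory per process, full replication} &: O(χ^{2d}) = O(χ^{6}), \\
\text{memory per process, presented method} &: O(χ^{2d-1}) = O(χ^{5}).
\end{aligned}
$$

For a hypothetical bond dimension $χ = 16$, this means $χ^2 = 256$ processes, a leading-order operation count of roughly $16^{11} \approx 1.8 \times 10^{13}$ sequentially against $16^{9} \approx 6.9 \times 10^{10}$ per process, and a per-process memory footprint of order $16^{5} \approx 1.0 \times 10^{6}$ elements instead of $16^{6} \approx 1.7 \times 10^{7}$.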

preprint 2021 arXiv

Accuracy and Privacy Evaluations of Collaborative Data Analysis

Distributed data analysis without revealing the individual data has recently attracted significant attention in several applications. A collaborative data analysis through sharing dimensionality reduced representations of data has been proposed as a non-model sharing-type federated learning. This paper analyzes the accuracy and privacy evaluations of this novel framework. In the accuracy analysis, we provided sufficient conditions for the equivalence of the collaborative data analysis and the centralized analysis with dimensionality reduction. In the privacy analysis, we proved that collaborative users' private datasets are protected with a double privacy layer against insider and external attacking scenarios.

preprint 2021 arXiv

Unsupervised learning-based structural analysis: Search for a characteristic low-dimensional space by local structures in atomistic simulations

Owing to the advances in computational techniques and the increase in computational power, atomistic simulations of materials can simulate large systems with higher accuracy. Complex phenomena can be observed in such state-of-the-art atomistic simulations. However, it has become increasingly difficult to understand what is actually happening, and the underlying mechanisms, in simulations such as molecular dynamics (MD). We propose an unsupervised machine learning method to analyze the local structure around a target atom. The proposed method, which uses the two-step locality preserving projections (TS-LPP), can find a low-dimensional space wherein the distributions of datapoints for each atom or group of atoms can be properly captured. We demonstrate that the method is effective for analyzing the MD simulations of crystalline, liquid, and amorphous states and the melt-quench process from the perspective of local structures. The proposed method is demonstrated on a silicon single-component system, a silicon-germanium binary system, and a copper single-component system.

preprint 2016 arXiv

Alternating optimization method based on nonnegative matrix factorizations for deep neural networks

The backpropagation algorithm for calculating gradients has been widely used in the computation of weights for deep neural networks (DNNs). This method requires derivatives of objective functions and has difficulty finding appropriate parameters such as the learning rate. In this paper, we propose a novel approach for computing weight matrices of fully-connected DNNs by using two types of semi-nonnegative matrix factorizations (semi-NMFs). In this method, optimization processes are performed by calculating weight matrices alternately, and backpropagation (BP) is not used. We also present a method to compute a stacked autoencoder using an NMF. The output results of the autoencoder are used as pre-training data for DNNs. The experimental results show that our method using three types of NMFs attains similar error rates to the conventional DNNs with BP.

preprint 2016 arXiv

Solving large-scale nonlinear eigenvalue problems by rational interpolation approach and resolvent sampling based Rayleigh-Ritz method

Numerical solution of nonlinear eigenvalue problems (NEPs) is frequently encountered in computational science and engineering. The applicability of most existing methods is limited by matrix structures, properties of eigen-solutions, size of the problem, etc. This paper aims to break those limitations and to develop robust and universal NEP solvers for large-scale engineering applications. The novelty lies in two aspects. First, a rational interpolation approach (RIA) is proposed based on the Keldysh theorem for holomorphic matrix functions. Compared with the existing contour integral approach (CIA), the RIA provides the possibility to select sampling points in more general regions and has advantages in improving accuracy and reducing computational cost. Second, a resolvent sampling scheme using the RIA is proposed for constructing reliable search spaces for the Rayleigh-Ritz procedure, based on which a robust eigen-solver, denoted by RSRR, is developed for solving general NEPs. RSRR can be easily implemented and parallelized. The advantages of the RIA and the performance of RSRR are demonstrated by a variety of benchmark and practical problems.

preprint 2015 arXiv

Stochastic Estimation of Nuclear Level Density in the Nuclear Shell Model: An Application to Parity-Dependent Level Density in $^{58}$Ni

We introduce a novel method to obtain level densities in large-scale shell-model calculations. Our method is a stochastic estimation of eigenvalue count based on a shifted Krylov-subspace method, which enables us to obtain level densities of huge Hamiltonian matrices. This framework leads to a successful description of both low-lying spectroscopy and the experimentally observed equilibration of $J^π=2^+$ and $2^-$ states in $^{58}$Ni in a unified manner.
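
The idea of a stochastic eigenvalue count can be illustrated on a small dense matrix. The number of eigenvalues of a Hermitian matrix inside an interval equals the trace of the spectral projector onto that interval; the projector can be written as a contour integral of the resolvent, and the trace can be estimated with random probe vectors. The sketch below assumes a real symmetric input and uses dense `numpy.linalg.solve` calls as a stand-in for the shifted Krylov-subspace solver named in the abstract. The function name `estimate_eigenvalue_count` and its parameters are hypothetical.

```python
import numpy as np

def estimate_eigenvalue_count(H, a, b, n_quad=32, n_probe=64, seed=0):
    """Estimate how many eigenvalues of symmetric H lie in (a, b) (sketch)."""
    rng = np.random.default_rng(seed)
    n = H.shape[0]
    centre, radius = 0.5 * (a + b), 0.5 * (b - a)

    # Rademacher probe vectors for the stochastic trace estimate.
    V = rng.choice([-1.0, 1.0], size=(n, n_probe))

    # Midpoint rule on a circle enclosing the interval.
    theta = 2.0 * np.pi * (np.arange(n_quad) + 0.5) / n_quad
    total = 0.0
    for t in theta:
        w = radius * np.exp(1j * t)
        # One shifted linear system per quadrature node (dense stand-in).
        Y = np.linalg.solve((centre + w) * np.eye(n) - H, V)
        # Accumulates sum_k v_k^T (zI - H)^{-1} v_k, weighted by the node.
        total += (w * np.sum(V * Y)).real
    return total / (n_quad * n_probe)
```

The estimate is generally not an integer: it carries both quadrature error, which grows for eigenvalues close to the interval ends, and sampling error from the finite number of probe vectors.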

preprint 2014 arXiv

A filtering technique for the temporally reduced matrix of the Wilson fermion determinant

The Wilson fermion determinant can be written in the form of a series expansion in fugacity $ξ=\exp(μ/T)$, provided that the eigenmodes of the temporally reduced operator are obtained. Since the calculation of all eigenmodes rapidly becomes prohibitive for larger volumes, we develop a method to calculate only the low-energy eigenmodes of the reduced matrix using a matrix filtering technique. This provides a basis for an approximation to neglect uninteresting ultraviolet contributions.

preprint 2013 arXiv

Numerical construction of a low-energy effective Hamiltonian in a self-consistent Bogoliubov-de Gennes approach of superconductivity

We propose a fast and efficient approach for solving the Bogoliubov-de Gennes (BdG) equations in superconductivity, with a numerical matrix-size reduction procedure proposed by Sakurai and Sugiura [J. Comput. Appl. Math. 159, 119 (2003)]. The resultant small-size Hamiltonian contains the information of the original BdG Hamiltonian in a given energy domain. In other words, the present approach leads to a numerical construction of a low-energy effective theory in superconductivity. The combination with the polynomial expansion method allows a self-consistent calculation of the BdG equations. Through numerical calculations of quasi-particle excitations in a vortex lattice, thermal conductivity, and nuclear magnetic relaxation rate, we show that our approach is suitable for evaluating physical quantities in a large-size superconductor and a nano-scale superconducting device, with the mean-field superconducting theory.

preprint 2010 arXiv

Filter diagonalization of shell-model calculations

We present a method of filter diagonalization for shell-model calculations. This method is based on the Sakurai and Sugiura (SS) method, but extended with the help of the shifted complex orthogonal conjugate gradient (COCG) method. A salient feature of this method is that it can calculate eigenvalues and eigenstates in a given energy interval. We show that this method can be an alternative to the Lanczos method for calculating ground and excited states, as well as spectral strength functions. With an application to the $M$-scheme shell-model calculations, we demonstrate that several inherent problems in the widely-used Lanczos method can be removed or reduced.
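
To make "eigenvalues and eigenstates in a given energy interval" concrete, here is a minimal sketch of a contour-integral filter followed by a Rayleigh-Ritz step, for a small real symmetric matrix. It is related in spirit to the SS method but is not the formulation used in the paper, and dense solves replace the shifted COCG solver. The function name `contour_filter_eigs`, its parameters, and the tolerances are all hypothetical.

```python
import numpy as np

def contour_filter_eigs(H, a, b, n_quad=32, block=8, resid_tol=1e-6, seed=0):
    """Eigenpairs of real symmetric H with eigenvalues in [a, b] (sketch)."""
    rng = np.random.default_rng(seed)
    n = H.shape[0]
    centre, radius = 0.5 * (a + b), 0.5 * (b - a)

    # Random starting block; block should exceed the expected eigenvalue count.
    V = rng.standard_normal((n, block))

    # Apply an approximate spectral projector: midpoint rule on a circle.
    S = np.zeros((n, block))
    theta = 2.0 * np.pi * (np.arange(n_quad) + 0.5) / n_quad
    for t in theta:
        w = radius * np.exp(1j * t)
        Y = np.linalg.solve((centre + w) * np.eye(n) - H, V)
        S += (w * Y).real / n_quad  # nodes come in conjugate pairs

    # Orthonormal basis of the filtered block, dropping negligible directions.
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    Q = U[:, s > 1e-12 * s[0]]

    # Rayleigh-Ritz on the small projected matrix.
    evals, W = np.linalg.eigh(Q.T @ H @ Q)
    X = Q @ W

    # Keep Ritz pairs inside the interval whose residual is small.
    resid = np.linalg.norm(H @ X - X * evals, axis=0)
    keep = (evals >= a) & (evals <= b) & (resid < resid_tol)
    return evals[keep], X[:, keep]
```

The residual check is there because a filtered subspace can produce spurious Ritz values inside the interval. Unlike a Lanczos run started from the bottom of the spectrum, each interval can be processed independently of the others.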
