Researcher profile

Yuji Nakatsukasa

Yuji Nakatsukasa contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
19works
0followers
11topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

19 published item(s)

preprint2026arXiv

Matrix perturbation analysis of methods for extracting singular values from approximate singular subspaces

Given (orthonormal) approximations $\tilde{U}$ and $\tilde{V}$ to the left and right subspaces spanned by the leading singular vectors of a matrix $A$, we discuss methods to approximate the leading singular values of $A$ and study their accuracy. In particular, we focus our analysis on the generalized Nyström approximation, as surprisingly, it is able to obtain significantly better accuracy than classical methods, namely Rayleigh-Ritz and (one-sided) projected SVD. A key idea of the analysis is to view the methods as finding the exact singular values of a perturbation of $A$. In this context, we derive a matrix perturbation result that exploits the structure of such $2\times2$ block matrix perturbation. Furthermore, we extend it to block tridiagonal matrices. We then obtain bounds on the accuracy of the extracted singular values. This leads to sharp bounds that predict well the approximation error trends and explain the difference in the behavior of these methods. Finally, we present an approach to derive an a-posteriori version of those bounds, which are more amenable to computation in practice.

preprint2024arXiv

Fast randomized numerical rank estimation for numerically low-rank matrices

Matrices with low-rank structure are ubiquitous in scientific computing. Choosing an appropriate rank is a key step in many computational algorithms that exploit low-rank structure. However, estimating the rank has been done largely in an ad-hoc fashion in large-scale settings. In this work we develop a randomized algorithm for estimating the numerical rank of a (numerically low-rank) matrix. The algorithm is based on sketching the matrix with random matrices from both left and right; the key fact is that with high probability, the sketches preserve the orders of magnitude of the leading singular values. We prove a result on the accuracy of the sketched singular values and show that gaps in the spectrum are detected. For an $m\times n$ $(m\geq n)$ matrix of numerical rank $r$, the algorithm runs with complexity $O(mn\log n+r^3)$, or less for structured matrices. The steps in the algorithm are required as a part of many low-rank algorithms, so the additional work required to estimate the rank can be even smaller in practice. Numerical experiments illustrate the speed and robustness of our rank estimator.

preprint2022arXiv

A Structure-Preserving Divide-and-Conquer Method for Pseudosymmetric Matrices

We devise a spectral divide-and-conquer scheme for matrices that are self-adjoint with respect to a given indefinite scalar product (i.e. pseudosymmetic matrices). The pseudosymmetric structure of the matrix is preserved in the spectral division, such that the method can be applied recursively to achieve full diagonalization. The method is well-suited for structured matrices that come up in computational quantum physics and chemistry. In this application context, additional definiteness properties guarantee a convergence of the matrix sign function iteration within two steps when Zolotarev functions are used. The steps are easily parallelizable. Furthermore, it is shown that the matrix decouples into symmetric definite eigenvalue problems after just one step of spectral division.

preprint2022arXiv

Fast & Accurate Randomized Algorithms for Linear Systems and Eigenvalue Problems

This paper develops a new class of algorithms for general linear systems and eigenvalue problems. These algorithms apply fast randomized sketching to accelerate subspace projection methods, such as GMRES and Rayleigh--Ritz. This approach offers great flexibility in designing the basis for the approximation subspace, which can improve scalability in many computational environments. The resulting algorithms outperform the classic methods with minimal loss of accuracy. For model problems, numerical experiments show large advantages over MATLAB's optimized routines, including a $100 \times$ speedup over gmres and a $10 \times$ speedup over eigs.

preprint2022arXiv

Randomized algorithms for Tikhonov regularization in linear least squares

We describe two algorithms to efficiently solve regularized linear least squares systems based on sketching. The algorithms compute preconditioners for $\min \|Ax-b\|^2_2 + λ\|x\|^2_2$, where $A\in\mathbb{R}^{m\times n}$ and $λ>0$ is a regularization parameter, such that LSQR converges in $\mathcal{O}(\log(1/ε))$ iterations for $ε$ accuracy. We focus on the context where the optimal regularization parameter is unknown, and the system must be solved for a number of parameters $λ$. Our algorithms are applicable in both the underdetermined $m\ll n$ and the overdetermined $m\gg n$ setting. Firstly, we propose a Cholesky-based sketch-to-precondition algorithm that uses a `partly exact' sketch, and only requires one sketch for a set of $N$ regularization parameters $λ$. The complexity of solving for $N$ parameters is $\mathcal{O}(mn\log(\max(m,n)) +N(\min(m,n)^3 + mn\log(1/ε)))$. Secondly, we introduce an algorithm that uses a sketch of size $\mathcal{O}(\text{sd}_λ(A))$ for the case where the statistical dimension $\text{sd}_λ(A)\ll\min(m,n)$. The scheme we propose does not require the computation of the Gram matrix, resulting in a more stable scheme than existing algorithms in this context. We can solve for $N$ values of $λ_i$ in $\mathcal{O}(mn\log(\max(m,n)) + \min(m,n)\,\text{sd}_{\minλ_i}(A)^2 + Nmn\log(1/ε))$ operations.

preprint2022arXiv

Stochastic diagonal estimation: probabilistic bounds and an improved algorithm

We study the problem of estimating the diagonal of an implicitly given matrix $A$. For such a matrix we have access to an oracle that allows us to evaluate the matrix vector product $Av$. For random variable $v$ drawn from an appropriate distribution, this may be used to return an estimate of the diagonal of the matrix $A$. Whilst results exist for probabilistic guarantees relating to the error of estimates of the trace of $A$, no such results have yet been derived for the diagonal. We analyse the number of queries $s$ required to guarantee that with probability at least $1-δ$ the estimates of the relative error of the diagonal entries is at most $\varepsilon$. We extend this analysis to the 2-norm of the difference between the estimate and the diagonal of $A$. We prove, discuss and experiment with bounds on the number of queries $s$ required to guarantee a probabilistic bound on the estimates of the diagonal by employing Rademacher and Gaussian random variables. Two sufficient upper bounds on the minimum number of query vectors are proved, extending the work of Avron and Toledo [JACM 58(2)8, 2011], and later work of Roosta-Khorasani and Ascher [FoCM 15, 1187-1212, 2015]. We find that, generally, there is little difference between the two, with convergence going as $O(\log(1/δ)/\varepsilon^2)$ for individual diagonal elements. However for small $s$, we find that the Rademacher estimator is superior. These results allow us to then extend the ideas of Meyer, Musco, Musco and Woodruff [SOSA, 142-155, 2021], suggesting algorithm Diag++, to speed up the convergence of diagonal estimation from $O(1/\varepsilon^2)$ to $O(1/\varepsilon)$ and make it robust to the spectrum of any positive semi-definite matrix $A$.

preprint2020arXiv

Error localization of best L1 polynomial approximants

An important observation in compressed sensing is that the $\ell_0$ minimizer of an underdetermined linear system is equal to the $\ell_1$ minimizer when there exists a sparse solution vector and a certain restricted isometry property holds. Here, we develop a continuous analogue of this observation and show that the best $L_0$ and $L_1$ polynomial approximants of a polynomial that is corrupted on a set of small measure are nearly equal. We go on to demonstrate an error localization property of best $L_1$ polynomial approximants and use our observations to develop an improved algorithm for computing best $L_1$ polynomial approximants to continuous functions.

preprint2020arXiv

Exponential node clustering at singularities for rational approximation, quadrature, and PDEs

Rational approximations of functions with singularities can converge at a root-exponential rate if the poles are exponentially clustered. We begin by reviewing this effect in minimax, least-squares, and AAA approximations on intervals and complex domains, conformal mapping, and the numerical solution of Laplace, Helmholtz, and biharmonic equations by the "lightning" method. Extensive and wide-ranging numerical experiments are involved. We then present further experiments showing that in all of these applications, it is advantageous to use exponential clustering whose density on a logarithmic scale is not uniform but tapers off linearly to zero near the singularity. We give a theoretical explanation of the tapering effect based on the Hermite contour integral and potential theory, showing that tapering doubles the rate of convergence. Finally we show that related mathematics applies to the relationship between exponential (not tapered) and doubly exponential (tapered) quadrature formulas. Here it is the Gauss--Takahasi--Mori contour integral that comes into play.

preprint2020arXiv

Fast Graph Sampling Set Selection Using Gershgorin Disc Alignment

Graph sampling set selection, where a subset of nodes are chosen to collect samples to reconstruct a smooth graph signal, is a fundamental problem in graph signal processing (GSP). Previous works employ an unbiased least-squares (LS) signal reconstruction scheme and select samples via expensive extreme eigenvector computation. Instead, we assume a biased graph Laplacian regularization (GLR) based scheme that solves a system of linear equations for reconstruction. We then choose samples to minimize the condition number of the coefficient matrix---specifically, maximize the smallest eigenvalue $λ_{\min}$. Circumventing explicit eigenvalue computation, we maximize instead the lower bound of $λ_{\min}$, designated by the smallest left-end of all Gershgorin discs of the matrix. To achieve this efficiently, we first convert the optimization to a dual problem, where we minimize the number of samples needed to align all Gershgorin disc left-ends at a chosen lower-bound target $T$. Algebraically, the dual problem amounts to optimizing two disc operations: i) shifting of disc centers due to sampling, and ii) scaling of disc radii due to a similarity transformation of the matrix. We further reinterpret the dual as an intuitive disc coverage problem bearing strong resemblance to the famous NP-hard set cover (SC) problem. The reinterpretation enables us to derive a fast approximation scheme from a known SC error-bounded approximation algorithm. We find an appropriate target $T$ efficiently via binary search. Extensive simulation experiments show that our disc-based sampling algorithm runs substantially faster than existing sampling schemes and outperforms other eigen-decomposition-free sampling schemes in reconstruction error.

preprint2019arXiv

Classification of chaotic time series with deep learning

We use standard deep neural networks to classify univariate time series generated by discrete and continuous dynamical systems based on their chaotic or non-chaotic behaviour. Our approach to circumvent the lack of precise models for some of the most challenging real-life applications is to train different neural networks on a data set from a dynamical system with a basic or low-dimensional phase space and then use these networks to classify univariate time series of a dynamical system with more intricate or high-dimensional phase space. We illustrate this generalisation approach using the logistic map, the sine-circle map, the Lorenz system, and the Kuramoto--Sivashinsky equation. We observe that a convolutional neural network without batch normalization layers outperforms state-of-the-art neural networks for time series classification and is able to generalise and classify time series as chaotic or not with high accuracy.

preprint2019arXiv

Sharp error bounds for Ritz vectors and approximate singular vectors

We derive sharp bounds for the accuracy of approximate eigenvectors (Ritz vectors) obtained by the Rayleigh-Ritz process for symmetric eigenvalue problems. Using information that is available or easy to estimate, our bounds improve the classical Davis-Kahan $\sinθ$ theorem by a factor that can be arbitrarily large, and can give nontrivial information even when the $\sinθ$ theorem suggests that a Ritz vector might have no accuracy at all. We also present extensions in three directions, deriving error bounds for invariant subspaces, singular vectors and subspaces computed by a (Petrov-Galerkin) projection SVD method, and eigenvectors of self-adjoint operators on a Hilbert space.

preprint2016arXiv

Finding a low-rank basis in a matrix subspace

For a given matrix subspace, how can we find a basis that consists of low-rank matrices? This is a generalization of the sparse vector problem. It turns out that when the subspace is spanned by rank-1 matrices, the matrices can be obtained by the tensor CP decomposition. For the higher rank case, the situation is not as straightforward. In this work we present an algorithm based on a greedy process applicable to higher rank problems. Our algorithm first estimates the minimum rank by applying soft singular value thresholding to a nuclear norm relaxation, and then computes a matrix with that rank using the method of alternating projections. We provide local convergence results, and compare our algorithm with several alternative approaches. Applications include data compression beyond the classical truncated SVD, computing accurate eigenvectors of a near-multiple eigenvalue, image separation and graph Laplacian eigenproblems.

preprint2014arXiv

A logarithmic minimization property of the unitary polar factor in the spectral norm and the Frobenius matrix norm

The unitary polar factor $Q=U$ in the polar decomposition of the matrix $Z=UH$ is the minimizer for both $\| \mathrm{Log}(Q^* Z)\|^2$ and its Hermitian part $\| \mathrm{sym Log}(Q^* Z)\|^2$ over both $\mathbb{R}$ and $\mathbb{C}$, for any given invertible matrix $Z$ in $\mathbb{C}^{n\times n}$ and any matrix logarithm $\mathrm{Log}$, not necessarily the principal logarithm $\mathrm{log}$. We prove this for the spectral matrix norm in any dimension and for the Frobenius matrix norm in two and three dimensions. The result shows that the unitary polar factor is the nearest orthogonal matrix to $Z$ not only in the normwise sense, but also in a geodesic distance. The derivation is based on Bhatia's generalization of Bernstein's trace inequality for the matrix exponential and a new sum of squared logarithms inequality.

preprint2013arXiv

The minimization of matrix logarithms - on a fundamental property of the unitary polar factor

We show that the unitary factor U in the polar decomposition of a nonsingular matrix Z = U H is the minimizer for both ||Log(Q^* Z)|| and ||sym (Log(Q^*Z))|| over unitary Q, for any given invertible complex n-times-n matrix Z, for any unitarily invariant norm and any n. We prove that U is the unique matrix with this property. As important tools we use a generalized Bernstein trace inequality and the theory of majorization.

preprint2012arXiv

Mysteries around the graph Laplacian eigenvalue 4

We describe our current understanding on the phase transition phenomenon of the graph Laplacian eigenvectors constructed on a certain type of unweighted trees, which we previously observed through our numerical experiments. The eigenvalue distribution for such a tree is a smooth bell-shaped curve starting from the eigenvalue 0 up to 4. Then, at the eigenvalue 4, there is a sudden jump. Interestingly, the eigenvectors corresponding to the eigenvalues below 4 are semi-global oscillations (like Fourier modes) over the entire tree or one of the branches; on the other hand, those corresponding to the eigenvalues above 4 are much more localized and concentrated (like wavelets) around junctions/branching vertices. For a special class of trees called starlike trees, we obtain a complete understanding of such phase transition phenomenon. For a general graph, we prove the number of the eigenvalues larger than 4 is bounded from above by the number of vertices whose degrees is strictly higher than 2. Moreover, we also prove that if a graph contains a branching path, then the magnitudes of the components of any eigenvector corresponding to the eigenvalue greater than 4 decay exponentially from the branching vertex toward the leaf of that branch.

preprint2011arXiv

On the condition numbers of a multiple generalized eigenvalue

For standard eigenvalue problems, a closed-form expression for the condition numbers of a multiple eigenvalue is known. In particular, they are uniformly 1 in the Hermitian case, and generally take different values in the non-Hermitian case. We consider the generalized eigenvalue problem and identify the condition numbers of a multiple eigenvalue. Our main result is that a multiple eigenvalue generally has multiple condition numbers, even in the Hermitian definite case. The condition numbers are characterized in terms of the singular values of the outer product of the corresponding left and right eigenvectors.

preprint2010arXiv

Gerschgorin's theorem for generalized eigenvalue problems in the Euclidean metric

We present Gerschgorin-type eigenvalue inclusion sets applicable to generalized eigenvalue problems.Our sets are defined by circles in the complex plane in the standard Euclidean metric, and are easier to compute than known similar results.As one application we use our results to provide a forward error analysis for a computed eigenvalue of a diagonalizable pencil.

preprint2010arXiv

Perturbation bounds of eigenvalues of Hermitian matrices with block structures

We derive new perturbation bounds for eigenvalues of Hermitian matrices with block structures. The structures we consider range from a standard 2-by-2 block form to block tridiagonal and tridigaonal forms. The main idea is the observation that an eigenvalue is insensitive to componentwise perturbations if the corresponding eigenvector components are small. We show that the same idea can be used to explain two well-known phenomena, one concerning extremal eigenvalues of Wilkinson's matrices and another concerning the efficiency of aggressive early deflation applied to the symmetric tridiagonal QR algorithm.

preprint2010arXiv

Quadratic perturbation bounds for generalized eigenvalue problems

We prove quadratic eigenvalue perturbation bounds for generalized Hermitian eigenvalue problems. The bounds are proportional to the square of the norm of the perturbation matrices divided by the gap between the spectrums. Using the results we provide a simple derivation of the first-order perturbation expansion of a multiple eigenvalue, whose trailing term is tighter than known results. We also present quadratic bounds for the non-Hermitian case.