Source author record

Makoto Aoshima

Makoto Aoshima appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: math.ST, Statistics Theory, Machine Learning, Methodology

Catalog footprint

What is connected

5 works

4 topics

2 close collaborators


Preprint, 2016, arXiv

Two-sample tests for high-dimension, strongly spiked eigenvalue models

We consider two-sample tests for high-dimensional data under two disjoint models: the strongly spiked eigenvalue (SSE) model and the non-SSE (NSSE) model. We provide a general test statistic as a function of a positive-semidefinite matrix. We give sufficient conditions for the test statistic to be consistent and asymptotically normal. We discuss the optimality of the test statistic under the NSSE model. We also investigate the test statistic under the SSE model by considering strongly spiked eigenstructures, and we construct a new, effective test procedure for the SSE model. Finally, we evaluate the performance of the test procedures numerically.
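
The abstract does not state the statistic. The following is a minimal sketch assuming the common bias-corrected form T(A) = (xbar1 - xbar2)' A (xbar1 - xbar2) - tr(A S1)/n1 - tr(A S2)/n2. Here S1 and S2 are the unbiased sample covariance matrices and A is the positive-semidefinite matrix. The subtracted traces cancel the noise terms tr(A Sigma_i)/n_i in the expectation of the quadratic form, so T(A) is unbiased for (mu1 - mu2)' A (mu1 - mu2). All function names are hypothetical, and this is not necessarily the paper's exact statistic.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def mean_vector(rows: Sequence[Sequence[float]]) -> List[float]:
    """Column means of an n x p sample stored as a list of rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def quad_form(v: Sequence[float], A: Matrix) -> float:
    """Return v' A v for a p-vector v and a p x p matrix A."""
    p = len(v)
    return sum(v[j] * A[j][k] * v[k] for j in range(p) for k in range(p))


def trace_AS(rows: Sequence[Sequence[float]], A: Matrix) -> float:
    """tr(A S) for the unbiased sample covariance S, computed as
    sum_l (x_l - xbar)' A (x_l - xbar) / (n - 1) so S is never formed."""
    n = len(rows)
    xbar = mean_vector(rows)
    total = sum(quad_form([a - b for a, b in zip(x, xbar)], A) for x in rows)
    return total / (n - 1)


def two_sample_stat(X1, X2, A: Matrix) -> float:
    """Assumed bias-corrected statistic T(A).

    E[(xbar1 - xbar2)' A (xbar1 - xbar2)] = (mu1 - mu2)' A (mu1 - mu2)
    + tr(A Sigma1)/n1 + tr(A Sigma2)/n2, so subtracting the trace terms
    (with S_i in place of Sigma_i) makes T(A) unbiased.
    """
    d = [a - b for a, b in zip(mean_vector(X1), mean_vector(X2))]
    return quad_form(d, A) - trace_AS(X1, A) / len(X1) - trace_AS(X2, A) / len(X2)


def identity(p: int) -> Matrix:
    """p x p identity; with A = I the target is ||mu1 - mu2||^2."""
    return [[1.0 if j == k else 0.0 for k in range(p)] for j in range(p)]


if __name__ == "__main__":
    rng = random.Random(0)
    p = 100
    X1 = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(20)]
    X2 = [[rng.gauss(0.5, 1.0) for _ in range(p)] for _ in range(20)]
    # Population value with A = I: p * 0.5**2 = 25.
    print(two_sample_stat(X1, X2, identity(p)))
```

Replacing identity(p) with another positive-semidefinite A changes only the quadratic form being targeted. The sketch does not reproduce how the paper chooses A or its separate procedure for the SSE model.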

Preprint, 2015, arXiv

Asymptotic properties of the first principal component and equality tests of covariance matrices in high-dimension, low-sample-size context

A common feature of high-dimensional data is that the data dimension is high while the sample size is relatively low. We call such data high-dimension, low-sample-size (HDLSS) data. In this paper, we study asymptotic properties of the first principal component (PC) in the HDLSS context and apply them to equality tests of covariance matrices for high-dimensional data sets. We develop HDLSS asymptotic theory as the dimension grows, both when the sample size is fixed and when it goes to infinity. We introduce an eigenvalue estimator based on the noise-reduction methodology and provide asymptotic distributions of the largest eigenvalue in the HDLSS context. We construct a confidence interval for the first contribution ratio. We also give asymptotic properties of the first PC direction and the first PC score. We apply these findings to equality tests of two covariance matrices in the HDLSS context. Finally, we provide numerical results and discuss the performance of both the first-PC estimates and the equality tests of two covariance matrices.
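
The noise-reduction idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's estimator. It assumes the form lambda_NR = lambda_hat - (tr(S_D) - lambda_hat)/(n - 2), where S_D is the n x n dual sample covariance matrix and lambda_hat is its largest eigenvalue. Centring leaves at most n - 1 nonzero eigenvalues, so the remaining n - 2 are averaged to estimate the noise that inflates lambda_hat. The contribution-ratio estimate lambda_NR / tr(S_D) is likewise an assumption, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dual_matrix(rows: Sequence[Sequence[float]]) -> Matrix:
    """n x n dual covariance S_D = Xc Xc' / (n - 1) for the centred data Xc.
    Its nonzero eigenvalues coincide with those of the p x p matrix S."""
    n, p = len(rows), len(rows[0])
    xbar = [sum(r[k] for r in rows) / n for k in range(p)]
    xc = [[r[k] - xbar[k] for k in range(p)] for r in rows]
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in xc] for ci in xc]


def top_eigenvalue(M: Matrix, iters: int = 10_000, tol: float = 1e-13) -> float:
    """Largest eigenvalue of a symmetric PSD matrix by power iteration.
    The start vector is non-constant because the dual matrix of centred
    data maps the all-ones vector to zero."""
    n = len(M)
    v = [1.0 / (i + 1) for i in range(n)]
    lam = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        # Rayleigh quotient of the normalised iterate.
        new_lam = sum(v[i] * sum(M[i][j] * v[j] for j in range(n)) for i in range(n))
        if abs(new_lam - lam) <= tol * max(1.0, abs(new_lam)):
            return new_lam
        lam = new_lam
    return lam


def first_eigenvalue_estimates(rows) -> Tuple[float, float, float]:
    """Return (naive lambda_hat, assumed NR estimate, contribution ratio)."""
    n = len(rows)
    if n < 3:
        raise ValueError("need at least three observations")
    sd = dual_matrix(rows)
    lam_hat = top_eigenvalue(sd)
    trace = sum(sd[i][i] for i in range(n))  # tr(S_D) = tr(S)
    lam_nr = lam_hat - (trace - lam_hat) / (n - 2)
    return lam_hat, lam_nr, lam_nr / trace


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical spiked data: coordinate 0 has variance 100, the other 199 have variance 1.
    rows = [[rng.gauss(0.0, 10.0)] + [rng.gauss(0.0, 1.0) for _ in range(199)] for _ in range(15)]
    print(first_eigenvalue_estimates(rows))
```

Working with the n x n dual matrix instead of the p x p covariance matrix keeps the cost tied to the small sample size, which is the natural choice when p is much larger than n.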

Preprint, 2015, arXiv

High-dimensional inference on covariance structures via the extended cross-data-matrix methodology

In this paper, we consider testing the correlation coefficient matrix between two subsets of high-dimensional variables. We construct a test statistic using the extended cross-data-matrix (ECDM) methodology and show that the ECDM estimator is unbiased. We also show that the ECDM estimator is consistent and asymptotically normal in high-dimensional settings. We propose a test procedure based on the ECDM estimator and evaluate its asymptotic size and power theoretically and numerically. We give several applications of the ECDM estimator. Finally, we demonstrate how the test procedure performs in actual data analysis using a microarray data set.
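
The abstract does not describe how the ECDM estimator is built. The sketch below shows only the kind of quantity such a test targets: the squared Frobenius norm of the cross-covariance block Sigma_12, which is zero when the two subsets are uncorrelated. It estimates this with a plain U-statistic that multiplies only distinct observations, so no squared noise from an observation paired with itself enters. This is a swapped-in illustration, not the ECDM estimator. It assumes a known zero mean and targets covariances rather than correlations, and the function names are hypothetical.

```python
import random
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def cross_block_norm_sq(rows: Sequence[Sequence[float]], p1: int) -> float:
    """Unbiased U-statistic for ||Sigma_12||_F^2, ASSUMING the mean is zero.

    Each row is split into its first p1 coordinates (subset 1) and the rest
    (subset 2). For independent x_i, x_j with i != j,
    E[(x_i1 . x_j1)(x_i2 . x_j2)] = sum_{a,b} sigma_ab^2 = ||Sigma_12||_F^2.
    """
    n = len(rows)
    first = [r[:p1] for r in rows]
    second = [r[p1:] for r in rows]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += dot(first[i], first[j]) * dot(second[i], second[j])
    # Each unordered pair stands for the ordered pairs (i, j) and (j, i).
    return 2.0 * total / (n * (n - 1))


if __name__ == "__main__":
    rng = random.Random(0)
    z = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(30)]
    # Subset 2 copies subset 1, so Sigma_12 = I and ||Sigma_12||_F^2 = 50.
    print(cross_block_norm_sq([r + r for r in z], 50))
```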

Preprint, 2015, arXiv

High-dimensional quadratic classifiers in non-sparse settings

We consider high-dimensional quadratic classifiers in non-sparse settings. In this context, the target of the classification rules is not the Bayes error rate. The classifier based on the Mahalanobis distance does not always perform well, even when the populations are normal with known covariance matrices. The quadratic classifiers proposed in this paper effectively draw information about heterogeneity from the differences in both the expanding mean vectors and the covariance matrices. We show that they are consistent, in the sense that misclassification rates tend to zero as the dimension goes to infinity under non-sparse settings. We verify that they are asymptotically normally distributed under certain conditions. We also propose a quadratic classifier applied after feature selection that uses the differences in both the mean vectors and the covariance matrices. Finally, we discuss the performance of the classifiers in actual data analyses. The proposed classifiers achieve highly accurate classification at very low computational cost.
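
The following is a minimal sketch of how a quadratic rule can use both mean and covariance differences. It is a hypothetical diagonal rule, not the paper's classifier. For each class i it scores a new point x0 by W_i(x0) = sum_k (x0k - xbar_ik)^2 / s_ik^2 - p/n_i + sum_k log s_ik^2 and assigns x0 to the class with the smaller score. For a fixed x0, E[(x0k - xbar_ik)^2] = (x0k - mu_ik)^2 + sigma_ik^2/n_i, so the -p/n_i term removes the extra 1/n_i per coordinate. Because the sample variance is plugged in for sigma_ik^2, this correction is only approximate. The log-variance term lets classes with equal means but different spreads be separated. All names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Fitted = Tuple[int, List[float], List[float]]


def fit_class(rows: Sequence[Sequence[float]]) -> Fitted:
    """Per-coordinate sample means and unbiased sample variances of one class."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    variances = [sum((x - m) ** 2 for x in c) / (n - 1) for c, m in zip(cols, means)]
    return n, means, variances


def score(x0: Sequence[float], fitted: Fitted) -> float:
    """Hypothetical bias-corrected diagonal quadratic score; smaller is closer."""
    n, means, variances = fitted
    p = len(means)
    dist = sum((x - m) ** 2 / v for x, m, v in zip(x0, means, variances))
    return dist - p / n + sum(math.log(v) for v in variances)


def classify(x0: Sequence[float], fitted_classes: Sequence[Fitted]) -> int:
    """Index of the class with the smallest score."""
    scores = [score(x0, f) for f in fitted_classes]
    return min(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    p = 100
    # Equal means, different spreads: only the covariance difference separates the classes.
    fitted = [fit_class([[rng.gauss(0.0, s) for _ in range(p)] for _ in range(20)]) for s in (1.0, 2.0)]
    print(classify([rng.gauss(0.0, 2.0) for _ in range(p)], fitted))
```

Restricting to diagonal variances keeps the cost linear in p per class, in line with the low computational cost emphasised in the abstract. How the paper weights the mean and covariance terms is not reproduced here.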

Preprint, 2015, arXiv

Principal component analysis based clustering for high-dimension, low-sample-size data

In this paper, we consider clustering based on principal component analysis (PCA) for high-dimension, low-sample-size (HDLSS) data. We give theoretical reasons why PCA is effective for clustering HDLSS data. First, we derive a geometric representation of HDLSS data drawn from a two-class mixture model. Using this geometric representation, we establish geometric consistency properties of the sample principal component scores in the HDLSS context. We then extend the geometric representation and the geometric consistency properties to multiclass mixture models. We show that, under certain conditions, PCA can classify HDLSS data in a surprisingly explicit way. Finally, we demonstrate the performance of the clustering on microarray data sets.
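
The following is a minimal sketch of the two-cluster idea, assuming labels are read off the sign of the first sample PC score. It uses the n x n dual matrix S_D = Xc Xc' / (n - 1). If u is its leading unit eigenvector and lambda_hat its eigenvalue, the first PC score of observation i equals sqrt((n - 1) * lambda_hat) * u_i, so the sign pattern of u splits the sample in two. The helper names and simulated settings are hypothetical, and the paper's conditions for when this succeeds are not reproduced.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def dual_matrix(rows: Sequence[Sequence[float]]) -> Matrix:
    """n x n dual covariance Xc Xc' / (n - 1) for the centred data Xc."""
    n, p = len(rows), len(rows[0])
    xbar = [sum(r[k] for r in rows) / n for k in range(p)]
    xc = [[r[k] - xbar[k] for k in range(p)] for r in rows]
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in xc] for ci in xc]


def top_eigenvector(M: Matrix, iters: int = 10_000, tol: float = 1e-12) -> List[float]:
    """Leading eigenvector of a symmetric PSD matrix by power iteration.
    The start vector is non-constant because the dual matrix of centred
    data maps the all-ones vector to zero."""
    n = len(M)
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector lies in the null space")
        w = [x / norm for x in w]
        # M is PSD, so iterates do not flip sign and can be compared directly.
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


def pc1_clusters(rows: Sequence[Sequence[float]]) -> List[int]:
    """Label each observation by the sign of its first PC score."""
    u = top_eigenvector(dual_matrix(rows))
    return [0 if ui >= 0.0 else 1 for ui in u]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical two-class mixture: means 0 and 1 in every coordinate, p = 300.
    rows = [[rng.gauss(float(t), 1.0) for _ in range(300)] for t in [0] * 10 + [1] * 10]
    print(pc1_clusters(rows))
```

The cluster labels are defined only up to a swap, because an eigenvector and its negative are equally valid.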
