Source author record

Katsuyuki Hagiwara

Katsuyuki Hagiwara appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning Methodology

Catalog footprint

What is connected

3 works

2 topics

0 close collaborators


Preprint · 2022 · arXiv

Bridging between soft and hard thresholding by scaling

In this article, we developed and analyzed a thresholding method in which soft thresholding estimators are independently expanded by empirical scaling values. The scaling values have a common hyper-parameter that is the order of expansion of an ideal scaling value that achieves hard thresholding. We simply call this estimator a scaled soft thresholding estimator. Scaled soft thresholding is a general method that includes soft thresholding and the non-negative garrote as special cases and gives another derivation of the adaptive LASSO. We then derived the degrees of freedom of scaled soft thresholding by means of Stein's unbiased risk estimate and found that this quantity decomposes into the degrees of freedom of soft thresholding and a remainder connected to hard thresholding. In this sense, scaled soft thresholding gives a natural bridge between soft and hard thresholding methods. Since the degrees of freedom quantify the degree of over-fitting, this result implies that there are two sources of over-fitting in scaled soft thresholding. The first source, which originates from soft thresholding, is determined by the number of un-removed coefficients and is a natural measure of the degree of over-fitting. We analyzed the second source in a particular case of scaled soft thresholding by referring to a known result for hard thresholding. We then found that, in a sparse, large-sample and non-parametric setting, the second source is largely determined by coefficient estimates whose true values are zero, and that it influences over-fitting when threshold levels are around the noise levels in those coefficient estimates. In a simple numerical example, these theoretical implications explained the behavior of the degrees of freedom well. Moreover, based on the results here and some known facts, we explained the behavior of the risks of soft, hard and scaled soft thresholding methods.
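A minimal sketch of the estimator and its Stein degrees of freedom follows. It rests on an assumption that the abstract does not spell out: the order-K scaling is taken to be the truncation sum_{j=0..K} (lam/|y|)^j of the series expansion of the ideal scaling |y| / (|y| - lam), which maps a soft-thresholded value back to y (hard thresholding). Under that assumption K = 0 gives soft thresholding, K = 1 gives the non-negative garrote y - lam^2/y, and K -> infinity approaches hard thresholding. The function names, an orthonormal design, and i.i.d. Gaussian noise with known sigma are further assumptions of the sketch, not the paper's code. Differentiating y(1 - (lam/|y|)^(K+1)) gives the divergence #{|y_k| > lam} + K * sum over kept k of (lam/|y_k|)^(K+1), i.e. a soft-thresholding count plus a remainder.

```python
import numpy as np


def scaled_soft_threshold(y, lam, K):
    """Scaled soft thresholding with an ASSUMED order-K scaling
    sum_{j=0..K} (lam/|y|)^j, the truncated expansion of |y| / (|y| - lam)."""
    y = np.asarray(y, dtype=float)
    theta = np.zeros_like(y)
    keep = np.abs(y) > lam                 # un-removed coefficients
    r = lam / np.abs(y[keep])              # lam/|y| lies in [0, 1) here
    # sign(y)(|y| - lam) * sum_{j<=K} r^j telescopes to y * (1 - r^(K+1))
    theta[keep] = y[keep] * (1.0 - r ** (K + 1))
    return theta


def stein_df(y, lam, K):
    """Divergence sum_k d theta_k / d y_k of the estimator above: Stein's
    unbiased estimate of its degrees of freedom under i.i.d. Gaussian noise."""
    y = np.asarray(y, dtype=float)
    keep = np.abs(y) > lam
    r = lam / np.abs(y[keep])
    n_kept = np.count_nonzero(keep)        # part shared with soft thresholding
    remainder = K * np.sum(r ** (K + 1))   # extra part linked to hard thresholding
    return n_kept + remainder


def sure(y, lam, K, sigma):
    """Stein's unbiased estimate of E||theta - mu||^2, assuming sigma is known."""
    y = np.asarray(y, dtype=float)
    theta = scaled_soft_threshold(y, lam, K)
    return (np.sum((y - theta) ** 2) - y.size * sigma**2
            + 2 * sigma**2 * stein_df(y, lam, K))


if __name__ == "__main__":
    y = np.array([3.0, -0.5, 1.2, -2.5, 0.9, -1.6])
    for K in (0, 1, 5):
        print(K, scaled_soft_threshold(y, 1.0, K), stein_df(y, 1.0, K))
```

In this form the K = 0 degrees of freedom is just the number of kept coefficients, and the remainder term is largest for kept coefficients whose magnitude is close to the threshold, which is consistent with the abstract's remark about threshold levels near the noise level.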

Preprint · 2022 · arXiv

On gradient descent training under data augmentation with on-line noisy copies

In machine learning, data augmentation (DA) is a technique for improving generalization performance. In this paper, we mainly considered gradient descent for linear regression under DA using noisy copies of datasets, in which noise is injected into the inputs. We analyzed the situation where random noisy copies are newly generated and used at each epoch, i.e., the case of using on-line noisy copies. This can therefore be viewed as an analysis of noise injection into the training process in a DA manner, i.e., an on-line version of DA. We derived the averaged behavior of the training process in three settings: full-batch training under the sum of squared errors, and full-batch and mini-batch training under the mean squared error. We showed that, in all cases, training under DA with on-line copies is approximately equivalent to ridge regularization whose regularization parameter corresponds to the variance of the injected noise. On the other hand, we showed that the learning rate is multiplied by the number of noisy copies plus one for full-batch training under the sum of squared errors and for mini-batch training under the mean squared error; i.e., DA with on-line copies yields an apparent acceleration of training. The apparent acceleration and the regularization effect come from the original part and the noise in a copy, respectively. These results were confirmed in a numerical experiment, in which we also found that our result applies approximately to the usual off-line DA in the under-parameterized scenario but not in the over-parameterized scenario. Moreover, we experimentally investigated the training process of neural networks under DA with off-line noisy copies and found that our analysis of linear regression can be applied to neural networks.
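The full-batch, sum-of-squared-errors case can be illustrated with a small simulation. This is a sketch under assumptions of our own, not the paper's setup: the loss is 0.5 times the squared error summed over the original data plus m fresh copies X + E per epoch, with E having i.i.d. N(0, sigma^2) entries. Averaging over E (using E[E^T E] = n sigma^2 I for n samples) gives the expected gradient (m+1)(X^T X w - X^T y) + m n sigma^2 w, i.e. ridge gradient descent with step eta(m+1) and penalty lam = m n sigma^2 / (m+1). These constants come from this short derivation and may differ from the paper's exact statements. The function `online_da_gd` and all parameter values are hypothetical.

```python
import numpy as np


def online_da_gd(X, y, m, sigma, eta, epochs, burn_in, rng):
    """Full-batch GD on 0.5 * SSE over the original data plus m NEW noisy
    input copies drawn every epoch (on-line DA). Returns the average of the
    iterates after `burn_in` epochs to smooth out the injected noise."""
    n, d = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    for t in range(epochs):
        grad = -X.T @ (y - X @ w)                         # original-data term
        for _ in range(m):
            Xc = X + sigma * rng.standard_normal((n, d))  # fresh noisy copy
            grad += -Xc.T @ (y - Xc @ w)                  # copy term (targets unchanged)
        w = w - eta * grad
        if t >= burn_in:
            w_sum += w
    return w_sum / (epochs - burn_in)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 200, 3
    X = rng.standard_normal((n, d))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)
    m, sigma = 4, 0.5
    lam = m * n * sigma**2 / (m + 1)       # ridge penalty from the derivation above
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    w_da = online_da_gd(X, y, m, sigma, eta=1e-4, epochs=3000, burn_in=1000, rng=rng)
    print("DA-trained:", w_da)
    print("ridge     :", w_ridge)
```

Because the loss is quadratic in w, the mean of the stationary iterate satisfies exactly the averaged (ridge) normal equations, so the averaged iterate should land near `w_ridge` rather than near the ordinary least-squares fit; the (m+1) factor in front of the data term is the "apparent acceleration" mentioned in the abstract.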

Preprint · 2016 · arXiv

Adaptive scaling for soft-thresholding estimator

Soft-thresholding is a sparse modeling method that is typically applied to wavelet denoising in statistical signal processing and analysis. It has a single parameter that controls the threshold level on wavelet coefficients and, simultaneously, the amount of shrinkage for the coefficients of un-removed components. This parametrization can cause excess shrinkage, and thus estimation bias, at a sparse representation; i.e., there is a dilemma between sparsity and prediction accuracy. To relax this problem, we considered introducing a positive scaling on the soft-thresholding estimator, by which the threshold level and the amount of shrinkage are controlled independently. In particular, in this paper, we proposed component-wise and data-dependent scaling in the setting of a non-parametric orthogonal regression problem, including the discrete wavelet transform. We call our scaling method adaptive scaling. We employed a soft-thresholding method based on LARS (least angle regression), by which the model selection problem reduces to determining the number of un-removed components. We derived the risk of LARS-based soft-thresholding with the proposed adaptive scaling and established a model selection criterion as an unbiased estimate of the risk. We also analyzed some properties of the risk curve and found that the model selection criterion can select a model with lower risk and higher sparsity than a naive soft-thresholding method. This theoretical speculation was verified by a simple numerical experiment and an application to wavelet denoising.
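The reduction of model selection to "the number of un-removed components" can be sketched for an orthonormal design, where it is a known property of LARS that after k steps the k largest |y_j| are active and are soft-thresholded at lam = the (k+1)-th largest |y_j|. The paper's adaptive scaling itself is not reproduced here; `scale` is a hypothetical hook for any positive component-wise scaling, and the function name is ours. For illustration only, the test plugs in the ideal scaling |y| / (|y| - lam), which undoes the shrinkage entirely and turns the step-k estimate into hard thresholding of the top k coefficients. This shows how a scaling decouples the threshold level from the amount of shrinkage. Ties among the |y_j| are assumed absent.

```python
import numpy as np


def lars_soft_threshold(y, k, scale=None):
    """LARS-based soft thresholding for orthonormal-design coefficients y.

    After k LARS steps the k largest |y_j| are active and are shrunk by
    lam = (k+1)-th largest |y_j|. `scale` (HYPOTHETICAL hook, not the paper's
    adaptive scaling) maps (y_active, lam) to positive factors applied
    component-wise to the surviving estimates. Assumes no ties in |y|."""
    y = np.asarray(y, dtype=float)
    order = np.argsort(-np.abs(y))             # indices by decreasing magnitude
    mags = np.abs(y)[order]
    lam = mags[k] if k < y.size else 0.0       # threshold at step k
    theta = np.zeros_like(y)
    active = order[:k]                         # the k un-removed components
    theta[active] = np.sign(y[active]) * (np.abs(y[active]) - lam)
    if scale is not None:
        theta[active] *= scale(y[active], lam)  # independent shrinkage control
    return theta, lam


if __name__ == "__main__":
    y = np.array([0.3, -2.0, 1.1, -0.7, 3.5])
    for k in range(y.size + 1):
        print(k, *lars_soft_threshold(y, k))
```

A model selection criterion in this setting would then be a function of k alone, evaluated over k = 0, ..., n; the risk estimate derived in the paper is not reproduced in this sketch.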