Source author record

Kenji Yamanishi

Kenji Yamanishi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Applications, physics.soc-ph, Computation and Language, Computer Vision, eess.IV, Information Theory, math.IT, Methodology, Social and Information Networks

Catalog footprint

What is connected

9 works

10 topics

4 close collaborators

preprint · 2026 · arXiv

Bandit and Delayed Feedback in Online Structured Prediction

Online structured prediction is the task of sequentially predicting outputs with complex structures based on inputs and past observations; it encompasses online classification. Recent studies showed that, in the full-information setting, finite bounds can be achieved on the surrogate regret, i.e., the extra target loss relative to the best possible surrogate loss. In practice, however, full-information feedback is often unrealistic because it requires immediate access to the whole structure of complex outputs. Motivated by this, we propose algorithms that work with less demanding feedback: bandit and delayed feedback. For bandit feedback, by using a standard inverse-weighted gradient estimator, we achieve a surrogate regret bound of $O(\sqrt{KT})$ for the time horizon $T$ and the size of the output set $K$. However, $K$ can be extremely large when outputs are highly complex, resulting in an undesirable bound. To address this issue, we propose another algorithm that achieves a surrogate regret bound of $O(T^{2/3})$, which is independent of $K$. This is achieved with a carefully designed pseudo-inverse matrix estimator. Furthermore, we numerically compare the performance of these algorithms with that of existing ones. Regarding delayed feedback, we provide algorithms and regret analyses that cover various scenarios, including full-information and bandit feedback, as well as fixed and variable delays.
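The abstract mentions a standard inverse-weighted estimator for bandit feedback without spelling it out. The sketch below shows the generic importance-weighting idea only, under the assumption that the learner samples one output from a distribution `probs` and observes the loss of that output alone; the function names and the toy numbers are illustrative and are not taken from the paper.

```python
from typing import List


def inverse_weighted_estimate(observed_loss: float, chosen: int,
                              probs: List[float]) -> List[float]:
    """Estimate the full loss vector from the single observed entry.

    Only the chosen coordinate is non-zero, and it is scaled by the
    inverse of the probability with which it was sampled.
    """
    estimate = [0.0] * len(probs)
    estimate[chosen] = observed_loss / probs[chosen]
    return estimate


def expected_estimate(losses: List[float], probs: List[float]) -> List[float]:
    """Exact expectation of the estimator over the sampling distribution."""
    num_outputs = len(losses)
    expectation = [0.0] * num_outputs
    for a in range(num_outputs):
        est = inverse_weighted_estimate(losses[a], a, probs)
        for i in range(num_outputs):
            expectation[i] += probs[a] * est[i]
    return expectation


# Toy values (hypothetical): four candidate outputs and a sampling distribution.
toy_losses = [0.0, 1.0, 1.0, 0.5]
toy_probs = [0.4, 0.3, 0.2, 0.1]
print(expected_estimate(toy_losses, toy_probs))
```

Averaged over the sampling distribution, the estimate recovers the true loss vector, which is what makes it usable in place of full-information feedback. The scaling by `1 / probs[chosen]` also inflates the variance when some outputs are sampled rarely, which is one intuition for why bounds built on this kind of estimator tend to depend on the number of outputs $K$.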

preprint · 2021 · arXiv

Detecting Change Signs with Differential MDL Change Statistics for COVID-19 Pandemic Analysis

We are concerned with the issue of detecting changes and their signs from a data stream. For example, given a time series of COVID-19 cases in a region, we may raise early warning signals of outbreaks by detecting signs of changes in the cases. We propose a novel methodology to address this issue. The key idea is to employ a new information-theoretic notion, which we call the differential minimum description length change statistics (D-MDL), for scoring change signs. We first give a fundamental theory for D-MDL. We then demonstrate its effectiveness using synthetic datasets. We apply it to detecting early warning signals of the COVID-19 epidemic. We empirically demonstrate that D-MDL is able to raise early warning signals of events such as significant increases or decreases in cases. Remarkably, for about 64% of the events of significant increase of cases in the 37 countries studied, our method detects warning signals nearly six days on average before the events, buying considerable time to respond. We further relate the warning signals to the basic reproduction number $R_0$ and the timing of social distancing. The results showed that our method can effectively monitor the dynamics of $R_0$, and confirmed the effectiveness of social distancing at containing the epidemic in a region. We conclude that our method is a promising approach to pandemic analysis from a data science viewpoint. The software for the experiments is available at https://github.com/IbarakikenYukishi/differential-mdl-change-statistics. An online detection system is available at https://ibarakikenyukishi.github.io/d-mdl-html/index.html

preprint · 2020 · arXiv

A Novel Global Spatial Attention Mechanism in Convolutional Neural Network for Medical Image Classification

Spatial attention has been introduced to convolutional neural networks (CNNs) to improve both their performance and interpretability in visual tasks, including image classification. The essence of spatial attention is to learn a weight map that represents the relative importance of activations within the same layer or channel. All existing attention mechanisms are local in the sense that weight maps are image-specific. However, in the medical field, there are cases in which all images should share the same weight map, because the set of images records the same kind of symptom related to the same object and thereby shares the same structural content. In this paper, we thus propose a novel global spatial attention mechanism in CNNs, mainly for medical image classification. The global weight map is instantiated by a decision boundary between important pixels and unimportant pixels. We propose to realize the decision boundary with a binary classifier in which the intensities of all images at a pixel are the features of that pixel. The binary classification is integrated into an image classification CNN and optimized together with the CNN. Experiments on two medical image datasets and one facial expression dataset showed that, with the proposed attention, not only can the performance of four powerful CNNs (GoogLeNet, VGG, ResNet, and DenseNet) be improved, but meaningful attended regions can also be obtained, which is beneficial for understanding the content of images of a domain.

preprint · 2020 · arXiv

Long-tailed distributions of inter-event times as mixtures of exponential distributions

Inter-event times of various human behaviors are apparently non-Poissonian and obey long-tailed distributions as opposed to exponential distributions, which correspond to Poisson processes. It has been suggested that human individuals may switch between different states, in each of which they are regarded as generating events that obey a Poisson process. If this is the case, inter-event times should approximately obey a mixture of exponential distributions with different parameter values. In the present study, we introduce the minimum description length principle to compare mixtures of exponential distributions with different numbers of components (i.e., constituent exponential distributions). Because these distributions violate the identifiability property, one is mathematically not allowed to apply the Akaike or Bayes information criteria to their maximum likelihood estimator to carry out model selection. We overcome this theoretical barrier by applying a minimum description length principle to joint likelihoods of the data and latent variables. We show that mixtures of exponential distributions with a few components are selected as opposed to more complex mixtures in various data sets and that the fitting accuracy is comparable to that of state-of-the-art algorithms to fit power-law distributions to data. Our results lend support to Poissonian explanations of apparently non-Poissonian human behavior.
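To make the central claim concrete, the sketch below compares the tail of a two-component exponential mixture with that of a single exponential distribution having the same mean. It is only an illustration of why switching between Poisson states can look long-tailed; the weights and rates are arbitrary toy values, not parameters from the paper, and the function names are hypothetical.

```python
import math
from typing import Sequence


def mixture_survival(t: float, weights: Sequence[float],
                     rates: Sequence[float]) -> float:
    """P(inter-event time > t) for a mixture of exponential distributions."""
    return sum(w * math.exp(-r * t) for w, r in zip(weights, rates))


def mixture_mean(weights: Sequence[float], rates: Sequence[float]) -> float:
    """Mean inter-event time: each component has mean 1 / rate."""
    return sum(w / r for w, r in zip(weights, rates))


# Toy values (hypothetical): a fast "active" state and a slow "idle" state.
toy_weights = [0.5, 0.5]
toy_rates = [10.0, 0.1]

# A single exponential matched to the same mean inter-event time.
matched_rate = 1.0 / mixture_mean(toy_weights, toy_rates)

for t in (1.0, 10.0, 30.0):
    single = math.exp(-matched_rate * t)
    mixed = mixture_survival(t, toy_weights, toy_rates)
    print(f"t={t:5.1f}  single={single:.5f}  mixture={mixed:.5f}")
```

At large `t` the slow component dominates the mixture, so long waiting times are far more probable than under the mean-matched single exponential.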

preprint · 2020 · arXiv

Mixture Complexity and Its Application to Gradual Clustering Change Detection

In model-based clustering using finite mixture models, it is a significant challenge to determine the number of clusters (cluster size). It has conventionally been taken to equal the number of mixture components (mixture size); however, this may not be valid in the presence of overlaps or weight biases. In this study, we propose to continuously measure the cluster size in a mixture model by a new concept called mixture complexity (MC). It is formally defined from the viewpoint of information theory and can be seen as a natural extension of the cluster size that accounts for overlap and weight bias. Subsequently, we apply MC to the issue of gradual clustering change detection. Conventionally, clustering changes have been considered to be abrupt, induced by changes in the mixture size or cluster size. In contrast, we consider clustering changes to be gradual in terms of MC; this has the benefits of finding changes earlier and of distinguishing significant changes from insignificant ones. We further demonstrate that MC can be decomposed according to the hierarchical structures of the mixture models, which helps us analyze the details of substructures.

preprint · 2020 · arXiv

Word2vec Skip-gram Dimensionality Selection via Sequential Normalized Maximum Likelihood

In this paper, we propose a novel information criteria-based approach to select the dimensionality of the word2vec Skip-gram (SG). From the perspective of probability theory, SG is considered an implicit probability distribution estimation under the assumption that there exists a true contextual distribution among words. Therefore, we apply information criteria with the aim of selecting the best dimensionality so that the corresponding model is as close as possible to the true distribution. We examine the following information criteria for the dimensionality selection problem: the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Sequential Normalized Maximum Likelihood (SNML) criterion. SNML is the total codelength required for the sequential encoding of a data sequence on the basis of the minimum description length. The proposed approach is applied to both the original SG model and the SG Negative Sampling model to clarify the idea of using information criteria. Additionally, as the original SNML suffers from computational disadvantages, we introduce novel heuristics for its efficient computation. Moreover, we empirically demonstrate that SNML outperforms both BIC and AIC. In comparison with other evaluation methods for word embedding, the dimensionality selected by SNML is significantly closer to the optimal dimensionality obtained by word analogy or word similarity tasks.
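For readers unfamiliar with the two baseline criteria named in the abstract, the sketch below applies the textbook definitions of AIC and BIC to a set of candidate dimensionalities. It does not implement SNML or the paper's heuristics. The helper `select_dimension` is hypothetical, and the log-likelihoods and parameter counts are made-up placeholders chosen only to show how the two penalties can disagree; they are not results from the paper.

```python
import math
from typing import Dict, Tuple


def aic(log_likelihood: float, num_params: int) -> float:
    """Akaike Information Criterion: -2 log L + 2k (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * num_params


def bic(log_likelihood: float, num_params: int, num_samples: int) -> float:
    """Bayesian Information Criterion: -2 log L + k ln n (lower is better)."""
    return -2.0 * log_likelihood + num_params * math.log(num_samples)


def select_dimension(candidates: Dict[int, Tuple[float, int]],
                     num_samples: int, criterion: str = "bic") -> int:
    """Return the candidate dimensionality with the smallest criterion value.

    `candidates` maps a dimensionality to (log-likelihood, parameter count).
    """
    def score(item: Tuple[int, Tuple[float, int]]) -> float:
        _, (log_likelihood, num_params) = item
        if criterion == "aic":
            return aic(log_likelihood, num_params)
        return bic(log_likelihood, num_params, num_samples)

    return min(candidates.items(), key=score)[0]


# Placeholder values (hypothetical): dimension -> (log-likelihood, #parameters).
toy_candidates = {50: (-10500.0, 500), 100: (-9800.0, 1000), 200: (-9600.0, 2000)}
toy_num_samples = 5000

print("AIC picks:", select_dimension(toy_candidates, toy_num_samples, "aic"))
print("BIC picks:", select_dimension(toy_candidates, toy_num_samples, "bic"))
```

Because the BIC penalty grows with the logarithm of the sample size, it favors the smaller model here while AIC accepts the extra parameters of the middle candidate.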

preprint · 2016 · arXiv

Predicting Glaucoma Visual Field Loss by Hierarchically Aggregating Clustering-based Predictors

This study addresses the issue of predicting glaucomatous visual field loss from patient disease datasets. Our goal is to accurately predict the progress of the disease in individual patients. As very few measurements are available for each patient, it is difficult to produce good predictors for individuals. A recently proposed clustering-based method enhances the power of prediction using patient data with similar spatiotemporal patterns. Each patient is categorized into a cluster of patients, and a predictive model is constructed using all of the data in the class. Predictions are highly dependent on the quality of clustering, but it is difficult to identify the best clustering method. Thus, we propose a method for aggregating cluster-based predictors to obtain better prediction accuracy than from a single cluster-based prediction. Further, the method achieves very high performance by hierarchically aggregating experts generated from several cluster-based methods. We use real datasets to demonstrate that our method performs significantly better than conventional clustering-based and patient-wise regression methods, because the hierarchical aggregating strategy has a mechanism whereby good predictors in a small community can thrive.

preprint · 2012 · arXiv

Normalized Maximum Likelihood Coding for Exponential Family with Its Applications to Optimal Clustering

We are concerned with the issue of how to calculate the normalized maximum likelihood (NML) code-length. The problem is that the normalization term of the NML code-length may diverge when the data domain is continuous and unbounded, and that a straightforward computation of it is highly expensive when the data domain is finite. Previous works have investigated how to calculate the NML code-length for specific types of distributions. We first propose a general method for computing the NML code-length for the exponential family. We then focus specifically on the Gaussian mixture model (GMM) and propose a new efficient method for computing the NML code-length for it. We develop it by generalizing Rissanen's re-normalizing technique. We then apply this method to the clustering issue, in which a clustering structure is modeled using a GMM and the main task is to estimate the optimal number of clusters on the basis of the NML code-length. Using artificial data sets, we demonstrate the superiority of NML-based clustering over other criteria such as AIC and BIC in terms of the data size required to achieve a high accuracy rate.
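As a small illustration of what the NML code-length and its normalization term are, the sketch below computes both for the Bernoulli model, where the data domain is finite and the sum over all sequences can be grouped by the number of ones. This is a deliberately simple case chosen for illustration; it is not the exponential-family or GMM method proposed in the paper, and the function names are hypothetical. Code-lengths are in nats.

```python
import math
from typing import Sequence


def bernoulli_max_log_likelihood(num_ones: int, n: int) -> float:
    """log p(x^n; theta_hat) with theta_hat = num_ones / n (0 * log 0 = 0)."""
    total = 0.0
    for count in (num_ones, n - num_ones):
        if count > 0:
            total += count * math.log(count / n)
    return total


def bernoulli_nml_normalizer(n: int) -> float:
    """Sum of maximized likelihoods over all binary sequences of length n.

    Sequences with the same number of ones share the same maximized
    likelihood, so the 2^n terms collapse into n + 1 binomial-weighted terms.
    """
    return sum(
        math.comb(n, k) * math.exp(bernoulli_max_log_likelihood(k, n))
        for k in range(n + 1)
    )


def bernoulli_nml_code_length(sequence: Sequence[int]) -> float:
    """NML code-length: -log p(x^n; theta_hat) + log(normalizer)."""
    n = len(sequence)
    num_ones = sum(sequence)
    return (-bernoulli_max_log_likelihood(num_ones, n)
            + math.log(bernoulli_nml_normalizer(n)))


print(bernoulli_nml_normalizer(2))
print(bernoulli_nml_code_length([1, 0, 1, 1, 0, 1]))
```

Even in this case the direct sum has a number of terms that grows with the sample size, and for richer models the corresponding sum or integral quickly becomes expensive or divergent, which is the difficulty the abstract describes.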

preprint · 2011 · arXiv

Discovering Emerging Topics in Social Streams via Link Anomaly Detection

Detection of emerging topics is now receiving renewed interest, motivated by the rapid growth of social networks. Conventional term-frequency-based approaches may not be appropriate in this context, because the information exchanged includes not only text but also images, URLs, and videos. We focus on the social aspects of these networks, that is, the links between users that are generated dynamically, intentionally or unintentionally, through replies, mentions, and retweets. We propose a probability model of the mentioning behaviour of a social network user, and propose to detect the emergence of a new topic from the anomaly measured through the model. We combine the proposed mention anomaly score with a recently proposed change-point detection technique based on the Sequentially Discounting Normalized Maximum Likelihood (SDNML), or with Kleinberg's burst model. Aggregating anomaly scores from hundreds of users, we show that we can detect emerging topics based only on the reply/mention relationships in social network posts. We demonstrate our technique on a number of real data sets we gathered from Twitter. The experiments show that the proposed mention-anomaly-based approaches can detect new topics at least as early as the conventional term-frequency-based approach, and sometimes much earlier when the keyword is ill-defined.
