Source author record

Joe Suzuki

Joe Suzuki appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Information Theory, math.IT, Artificial Intelligence, math.AC, math.OC, math.PR, math.ST, Statistics Theory

Catalog footprint

What is connected

11 works

9 topics

4 close collaborators

preprint · 2022 · arXiv

Converting ADMM to a Proximal Gradient for Efficient Sparse Estimation

In sparse estimation problems such as the fused lasso and convex clustering, we apply either the proximal gradient method or the alternating direction method of multipliers (ADMM) to solve the problem. ADMM takes time because it involves matrix division, whereas efficient variants such as FISTA (fast iterative shrinkage-thresholding algorithm) have been developed for the proximal gradient method. This paper proposes a general method for converting an ADMM formulation into a proximal gradient one, assuming that the derivative of the objective function is Lipschitz continuous. We then apply it to sparse estimation problems, such as sparse convex clustering and trend filtering, and show by numerical experiments that it yields a significant improvement in efficiency.
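To make the proximal gradient side concrete, here is a minimal sketch of FISTA applied to the plain lasso, $\min_\beta \frac{1}{2}\|y-X\beta\|^2+\lambda\|\beta\|_1$. It is a generic textbook illustration and not the ADMM conversion proposed in the paper; the function names `soft_threshold` and `fista_lasso`, the toy data, and the use of the squared Frobenius norm as a step-size bound are all assumptions made for this example.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fista_lasso(X, y, lam, n_iter=500):
    """Hypothetical helper: FISTA for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    # The squared Frobenius norm upper-bounds the Lipschitz constant of the
    # gradient X^T (X b - y), so 1 / L is a safe (conservative) step size.
    L = sum(x * x for row in X for x in row)
    beta = [0.0] * p
    z = beta[:]          # extrapolated point
    t = 1.0              # momentum parameter
    for _ in range(n_iter):
        resid = [sum(X[i][j] * z[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Gradient step on the smooth part, then the proximal (shrinkage) step.
        beta_new = [soft_threshold(z[j] - grad[j] / L, lam / L) for j in range(p)]
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        z = [beta_new[j] + (t - 1.0) / t_new * (beta_new[j] - beta[j])
             for j in range(p)]
        beta, t = beta_new, t_new
    return beta


# Toy orthonormal design: the lasso solution is soft-thresholding of y by lam.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.5]
beta = fista_lasso(X, y, lam=1.0)
```

Each iteration needs only matrix-vector products and a componentwise shrinkage, with no linear system to solve, which is the property the abstract contrasts with ADMM.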

preprint · 2016 · arXiv

A Theoretical Analysis of the BDeu Scores in Bayesian Network Structure Learning

In Bayesian network structure learning (BNSL), we need a prior probability over structures and parameters. If the former is the uniform distribution, the latter determines the correctness of BNSL. In this paper, we compare BDeu (Bayesian Dirichlet equivalent uniform) and Jeffreys' prior w.r.t. their consistency. When we seek a parent set $U$ of a variable $X$, we require the regularity property that if $H(X|U)\leq H(X|U')$ and $U\subsetneq U'$, then $U$ should be chosen rather than $U'$. We prove that the BDeu scores violate this property and cause fatal situations in BNSL. This is because for the BDeu scores, for any sample size $n$, there exists a probability in the form $P(X,Y,Z)={P(XZ)P(YZ)}/{P(Z)}$ such that the probability of deciding that $X$ and $Y$ are not conditionally independent given $Z$ is more than one half. For Jeffreys' prior, the false-positive probability converges to zero uniformly, without depending on any parameter values, and no such inconvenience occurs.

preprint · 2015 · arXiv

Miura: Divisor Class Group Arithmetic

The package Miura contains functions that compute divisor class group arithmetic for nonsingular curves. The package reduces computation in a divisor class group to computation in the ideal class group via the isomorphism between them. The underlying quotient ring should be over the ideal given by a nonsingular curve in the form of Miura. Although computing the product of two integral ideals is not hard, to obtain the representative of the ideal class we need to find an ideal such that the shortest Groebner basis component is the minimum. Although the basic procedure is due to Arita, the source code has become much shorter using Macaulay2. The package is useful not just for computation itself but also for understanding divisor class group arithmetic from the ideal point of view.

preprint · 2014 · arXiv

Causal Discovery in a Binary Exclusive-or Skew Acyclic Model: BExSAM

Discovering causal relations among observed variables in a given data set is a major objective in studies of statistics and artificial intelligence. Recently, some techniques to discover a unique causal model have been explored based on the non-Gaussianity of the observed data distribution. However, most of these are limited to continuous data. In this paper, we present a novel causal model for binary data and propose an efficient new approach to deriving the unique causal model governing a given binary data set under skew distributions of external binary noises. Experimental evaluation shows excellent performance for both artificial and real-world data sets.

preprint · 2014 · arXiv

Identifiability of an Integer Modular Acyclic Additive Noise Model and its Causal Structure Discovery

The notion of causality is used in many situations dealing with uncertainty. We consider the problem of whether causality can be identified given a data set generated by discrete random variables rather than continuous ones. In particular, for non-binary data, it was so far known only that causality can be identified except in rare cases. In this paper, we present a necessary and sufficient condition for the identifiability of an integer modular acyclic additive noise (IMAN) model of two variables. In addition, we relate bivariate and multivariate causal identifiability in a more explicit manner and develop a practical algorithm to find the order of the variables and their parent sets. We demonstrate its performance on artificial data and real-world body motion data, with comparisons to conventional methods.

preprint · 2014 · arXiv

Universal Bayesian Measures and Universal Histogram Sequences

Consider universal data compression: the length $l(x^n)$ of a sequence $x^n\in A^n$ of length $n$ over a finite alphabet $A$ satisfies Kraft's inequality over $A^n$, and $-\frac{1}{n}\log \frac{P^n(x^n)}{Q^n(x^n)}$ converges to zero almost surely as $n$ grows for $Q^n(x^n)=2^{-l(x^n)}$ and any stationary ergodic source $P$. In this paper, we call such a $Q$ a universal Bayesian measure. We generalize the notion to sources in which the random variables may be discrete, continuous, or neither. The basic idea is due to Boris Ryabko, who utilized model weighting over histograms that approximate $P$, assuming that a density function of $P$ exists. However, the range of $P$ depends on the choice of the histogram sequence. The universal Bayesian measure constructed in this paper overcomes these drawbacks, has many applications to inferring relations among random variables, and extends the application area of the minimum description length principle.
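The link between code lengths and the measure $Q^n(x^n)=2^{-l(x^n)}$ can be checked numerically on a small case. The sketch below assigns Shannon code lengths $\lceil -\log_2 P(x^n)\rceil$ to all binary sequences of length 3 under an i.i.d. Bernoulli source and verifies Kraft's inequality. The source, its parameter, and the names `shannon_lengths` and `kraft_sum` are assumptions chosen for illustration; this is a fixed-source code, not a universal one, and not the construction in the paper.

```python
from itertools import product
from math import ceil, log2


def shannon_lengths(p_one, n):
    """Code lengths ceil(-log2 P(x)) for every binary sequence x of length n
    under a hypothetical i.i.d. Bernoulli(p_one) source."""
    lengths = {}
    for x in product((0, 1), repeat=n):
        ones = sum(x)
        prob = p_one ** ones * (1.0 - p_one) ** (n - ones)
        lengths[x] = ceil(-log2(prob))
    return lengths


def kraft_sum(lengths):
    """Left-hand side of Kraft's inequality: sum over x of 2^(-l(x))."""
    return sum(2.0 ** -l for l in lengths.values())


lengths = shannon_lengths(p_one=0.25, n=3)
# Q(x) = 2^(-l(x)) is a sub-probability measure exactly when Kraft holds.
Q = {x: 2.0 ** -l for x, l in lengths.items()}
total = kraft_sum(lengths)
```

Because the rounded-up lengths give $2^{-l(x)}\le P(x)$ for every sequence, `total` cannot exceed 1, so `Q` sums to at most 1.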

preprint · 2013 · arXiv

A Construction of Bayesian Networks from Databases Based on an MDL Principle

This paper addresses learning stochastic rules, in particular inter-attribute relations, from a finite number of examples based on the Minimum Description Length (MDL) principle, with a view to the design of intelligent relational database systems. A stochastic rule in this paper consists of a model giving the structure, like the dependencies of a Bayesian Belief Network (BBN), and stochastic parameters, each indicating the conditional probability of an attribute value given the state determined by the values of the other attributes in the same record. In particular, we propose an extended version of the Chow-Liu algorithm in which the learning algorithm selects the model from the class where the dependencies among the attributes are represented by a general number of trees (a forest).
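The idea of extending Chow and Liu's tree construction to several trees can be sketched for finite-valued attributes as follows. Pairs of attributes are scored by empirical mutual information minus a description-length penalty, and edges are added greedily (Kruskal style) only while the score stays positive, so the result may be a forest. The penalty $(a_X-1)(a_Y-1)\log n/(2n)$, with $a_X$ the number of observed values of $X$, is an assumed form of the MDL term; the function names and the toy data are likewise hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete columns."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log(c * n / (cx[a] * cy[b])) for (a, b), c in cxy.items())


def mdl_forest(columns):
    """columns: dict mapping attribute name -> list of discrete values.
    Returns the edges of a forest chosen by penalized mutual information."""
    names = list(columns)
    n = len(columns[names[0]])
    scored = []
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            mi = mutual_information(columns[u], columns[v])
            ku, kv = len(set(columns[u])), len(set(columns[v]))
            penalty = (ku - 1) * (kv - 1) * log(n) / (2 * n)  # assumed MDL term
            scored.append((mi - penalty, u, v))
    scored.sort(reverse=True)

    parent = {name: name for name in names}  # union-find to avoid cycles

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    edges = []
    for score, u, v in scored:
        if score <= 0:
            break  # remaining pairs do not pay for their extra parameters
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            edges.append((u, v))
    return edges


# Toy data: b copies a, while c is empirically independent of both.
data = {
    "a": [0, 1] * 20,
    "b": [0, 1] * 20,
    "c": [0, 0, 1, 1] * 10,
}
edges = mdl_forest(data)
```

On this toy data only the pair `a`, `b` earns a positive score, so `c` is left as an isolated node rather than being forced into a single spanning tree.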

preprint · 2012 · arXiv

Discovering causal structures in binary exclusive-or skew acyclic models

Discovering causal relations among observed variables in a given data set is a main topic in studies of statistics and artificial intelligence. Recently, some techniques to discover an identifiable causal structure have been explored based on the non-Gaussianity of the observed data distribution. However, most of these are limited to continuous data. In this paper, we present a novel causal model for binary data and propose a new approach to deriving an identifiable causal structure governing the data based on skew Bernoulli distributions of external noise. Experimental evaluation shows excellent performance for both artificial and real-world data sets.

preprint · 2010 · arXiv

A Generalization of the Chow-Liu Algorithm and its Application to Statistical Learning

We extend the Chow-Liu algorithm to general random variables, whereas previous versions considered only the finite case. In particular, this paper applies the generalization to Suzuki's learning algorithm, which generates forests rather than trees from data based on the minimum description length, balancing the fit of the forest to the data against the simplicity of the forest. As a result, we obtain an algorithm that works when both Gaussian and finite random variables are present.

preprint · 2010 · arXiv

Nonparametric Estimation and On-Line Prediction for General Stationary Ergodic Sources

We propose a learning algorithm for nonparametric estimation and on-line prediction for general stationary ergodic sources. We prepare histograms, each of which estimates the probability as a finite distribution, and mix them with weights to construct an estimator. The whole analysis is based on measure theory. The estimator works whether the source is discrete or continuous. If the source is stationary ergodic, then the measure-theoretically defined Kullback-Leibler information divided by the sequence length $n$ converges to zero as $n$ goes to infinity. In particular, for continuous sources, the method does not require the existence of a probability density function.

preprint · 2010 · arXiv

The Hannan-Quinn Proposition for Linear Regression

We consider the variable selection problem in linear regression. Suppose that we have a set of random variables $X_1,...,X_m,Y,ε$ such that $Y=\sum_{k\in π}α_kX_k+ε$ with $π\subseteq \{1,...,m\}$ and $α_k\in {\mathbb R}$ unknown, and $ε$ is independent of any linear combination of $X_1,...,X_m$. Given $n$ examples $\{(x_{i,1},...,x_{i,m},y_i)\}_{i=1}^n$ actually emitted from $(X_1,...,X_m, Y)$, we wish to estimate the true $π$ using information criteria of the form $H+(k/2)d_n$, where $H$ is the likelihood with respect to $π$ multiplied by $-1$, and $\{d_n\}$ is a positive real sequence. If $d_n$ is too small, we cannot obtain consistency because of overestimation. For autoregression, Hannan and Quinn proved that, in their setting of $H$ and $k$, the rate $d_n=2\log\log n$ is the minimum satisfying strong consistency. This paper proves the statement in the affirmative for linear regression as well, which has a completely different setting.
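A criterion of the form $H+(k/2)d_n$ can be tried out on a toy regression. The sketch below scores every subset of variables and keeps the minimizer, using $d_n=2\log\log n$. Several things are assumptions for this illustration and are not taken from the paper: Gaussian noise, so that $H$ is replaced by $(n/2)\log(\mathrm{RSS}/n)$ up to constants; $k$ taken as the number of selected variables; exhaustive search over subsets; and the helper names `solve`, `rss`, and `select_subset`. The data are constructed by hand so that the columns are exactly orthogonal.

```python
from itertools import combinations
from math import log


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for c in reversed(range(m)):
        x[c] = (M[c][m] - sum(M[c][k] * x[k] for k in range(c + 1, m))) / M[c][c]
    return x


def rss(X, y, subset):
    """Residual sum of squares of least squares on the chosen columns."""
    if not subset:
        return sum(v * v for v in y)
    A = [[sum(row[a] * row[b] for row in X) for b in subset] for a in subset]
    rhs = [sum(row[a] * yi for row, yi in zip(X, y)) for a in subset]
    coef = solve(A, rhs)
    return sum((yi - sum(c * row[a] for c, a in zip(coef, subset))) ** 2
               for row, yi in zip(X, y))


def select_subset(X, y, d_n):
    """Return the subset minimizing (n/2) log(RSS/n) + (k/2) d_n."""
    n, m = len(X), len(X[0])
    best = None
    for k in range(m + 1):
        for subset in combinations(range(m), k):
            score = (n / 2) * log(rss(X, y, subset) / n) + (k / 2) * d_n
            if best is None or score < best[0]:
                best = (score, subset)
    return best[1]


# Toy design with mutually orthogonal +/-1 columns; only column 0 matters.
x0 = [1, 1, 1, 1, -1, -1, -1, -1]
x1 = [1, 1, -1, -1, 1, 1, -1, -1]
x2 = [1, -1, 1, -1, 1, -1, 1, -1]
noise = [0.5 * a * b for a, b in zip(x0, x1)]   # orthogonal to x0, x1, x2
X = [list(row) for row in zip(x0, x1, x2)]
y = [2.0 * a + e for a, e in zip(x0, noise)]

n = len(y)
chosen = select_subset(X, y, d_n=2 * log(log(n)))
```

In this toy case adding `x1` or `x2` leaves the residual sum of squares unchanged, so any positive $d_n$ rejects them; with real noisy data the size of $d_n$ decides how often spurious variables are admitted, which is the overestimation issue the abstract refers to.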
