Source author record

Junya Honda

Junya Honda appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, math.ST, Statistics Theory, Information Theory, math.IT, math.PR, Data Structures and Algorithms, Social and Information Networks

Catalog footprint

What is connected

16 works

8 topics

4 close collaborators


preprint, 2024, arXiv

The Survival Bandit Problem

We introduce and study a new variant of the multi-armed bandit problem (MAB), called the survival bandit problem (S-MAB). While the objective in both problems is to maximize the so-called cumulative reward, in this new variant the procedure is interrupted if the cumulative reward falls below a preset threshold. This simple yet unexplored extension of the MAB follows from many practical applications. For example, when testing two medicines against each other on voluntary patients, people's health is at stake, and it is necessary to be able to interrupt experiments if serious side effects occur or if the disease syndromes are not dissipated by the treatment. From a theoretical perspective, the S-MAB is the first variant of the MAB where the procedure may or may not be interrupted. We start by formalizing the S-MAB and we define its objective as the minimization of the so-called survival regret, which naturally generalizes the regret of the MAB. Then, we show that the objective of the S-MAB is considerably more difficult than that of the MAB, in the sense that, contrary to the MAB, no policy can achieve a reasonably small (i.e., sublinear) survival regret. Instead, we minimize the survival regret in the sense of Pareto, i.e., we seek a policy whose cumulative reward cannot be improved for some problem instance without being sacrificed for another one. For that purpose, we identify two key components in the survival regret: the regret given no ruin (which corresponds to the regret in the MAB), and the probability that the procedure is interrupted, called the probability of ruin. We derive a lower bound on the probability of ruin, as well as policies whose probability of ruin matches the lower bound. Finally, based on a doubling trick on those policies, we derive a policy which minimizes the survival regret in the sense of Pareto, giving an answer to an open problem by Perotto et al. (COLT 2019).
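The interruption rule described in this abstract can be illustrated with a small simulation. This is only a sketch under assumptions that are not taken from the paper: rewards are restricted to -1 and +1, the learner starts with a hypothetical initial `budget`, and ruin is declared when the budget plus the cumulative reward reaches zero or less. The function name `run_survival_bandit` and the `choose_arm` callback are illustrative.

```python
import random


def run_survival_bandit(win_probs, budget, horizon, choose_arm, seed=0):
    """Play a bandit with rewards in {-1, +1} until the horizon or ruin.

    Assumption (not from the paper): ruin occurs when the initial budget
    plus the cumulative reward drops to zero or below.
    Returns (cumulative_reward, ruined, rounds_played).
    """
    rng = random.Random(seed)
    cumulative = 0
    for t in range(horizon):
        arm = choose_arm(t)
        # Arm `arm` pays +1 with probability win_probs[arm], otherwise -1.
        reward = 1 if rng.random() < win_probs[arm] else -1
        cumulative += reward
        if budget + cumulative <= 0:
            # The procedure is interrupted: no further rounds are played.
            return cumulative, True, t + 1
    return cumulative, False, horizon


if __name__ == "__main__":
    # A policy that always pulls arm 0, on a hypothetical two-armed instance.
    print(run_survival_bandit([0.45, 0.55], budget=5, horizon=1000,
                              choose_arm=lambda t: 0))
```

Running many seeds and counting how often `ruined` is true gives an empirical estimate of the probability of ruin for a fixed policy on this toy instance.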

preprint, 2022, arXiv

Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds

This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BOBW) algorithm that works nearly optimally in both stochastic and adversarial settings. In stochastic settings, some existing BOBW algorithms achieve tight gap-dependent regret bounds of $O(\sum_{i: Δ_i>0} \frac{\log T}{Δ_i})$ for suboptimality gap $Δ_i$ of arm $i$ and time horizon $T$. As Audibert et al. [2007] have shown, however, the performance can be improved in stochastic environments with low-variance arms. In fact, they have provided a stochastic MAB algorithm with gap-variance-dependent regret bounds of $O(\sum_{i: Δ_i>0} (\frac{σ_i^2}{Δ_i} + 1) \log T )$ for loss variance $σ_i^2$ of arm $i$. In this paper, we propose the first BOBW algorithm with gap-variance-dependent bounds, showing that the variance information can be used even in the possibly adversarial environment. Further, the leading constant factor in our gap-variance-dependent bound is only (almost) twice the value for the lower bound. Additionally, the proposed algorithm enjoys multiple data-dependent regret bounds in adversarial settings and works well in stochastic settings with adversarial corruptions. The proposed algorithm is based on the follow-the-regularized-leader method and employs adaptive learning rates that depend on the empirical prediction error of the loss, which leads to gap-variance-dependent regret bounds reflecting the variance of the arms.
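To see how the two bound expressions quoted in this abstract differ for low-variance arms, the sums can be evaluated directly. The sketch below drops the constants hidden in the $O(\cdot)$ notation, so the numbers are only comparable with each other and are not actual regret values. The function names and the example gaps and variances are hypothetical.

```python
import math


def gap_bound(gaps, horizon):
    """Sum of log(T) / gap over arms with a positive gap (constants dropped)."""
    return sum(math.log(horizon) / g for g in gaps if g > 0)


def gap_variance_bound(gaps, variances, horizon):
    """Sum of (variance / gap + 1) * log(T) over arms with a positive gap."""
    return sum((v / g + 1) * math.log(horizon)
               for g, v in zip(gaps, variances) if g > 0)


if __name__ == "__main__":
    gaps = [0.0, 0.1, 0.2]          # arm 0 is optimal (gap zero)
    variances = [0.01, 0.01, 0.01]  # low-variance arms
    horizon = 10_000
    print("gap-dependent expression:         ", gap_bound(gaps, horizon))
    print("gap-variance-dependent expression:",
          gap_variance_bound(gaps, variances, horizon))
```

When the variances are small relative to the gaps, the second expression is dominated by the `+ 1` term and is much smaller than the first.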

preprint, 2022, arXiv

Mediated Uncoupled Learning: Learning Functions without Direct Input-output Correspondences

Ordinary supervised learning is useful when we have paired training data of input $X$ and output $Y$. However, such paired data can be difficult to collect in practice. In this paper, we consider the task of predicting $Y$ from $X$ when we have no paired data for them, but we have two separate, independent datasets of $X$ and $Y$, each observed with some mediating variable $U$, that is, we have two datasets $S_X = \{(X_i, U_i)\}$ and $S_Y = \{(U'_j, Y'_j)\}$. A naive approach is to predict $U$ from $X$ using $S_X$ and then $Y$ from $U$ using $S_Y$, but we show that this is not statistically consistent. Moreover, predicting $U$ can be more difficult than predicting $Y$ in practice, e.g., when $U$ has higher dimensionality. To circumvent the difficulty, we propose a new method that avoids predicting $U$ but directly learns $Y = f(X)$ by training $f(X)$ with $S_{X}$ to predict $h(U)$, which is trained with $S_{Y}$ to approximate $Y$. We prove statistical consistency and error bounds of our method and experimentally confirm its practical usefulness.
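The idea of training $f(X)$ on $S_X$ to predict $h(U)$, with $h$ fitted on $S_Y$, can be sketched on a scalar toy problem. This is a simplified illustration, not the paper's method: it assumes one-dimensional variables, linear models fitted by ordinary least squares, and a sequential two-stage fit. The helper names `fit_line` and `mediated_fit` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (scalar x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mediated_fit(s_x, s_y):
    """s_x is a list of (x, u) pairs; s_y is a list of (u', y') pairs.

    Stage 1 fits h(u) to approximate y on S_Y. Stage 2 fits f(x) to
    predict h(u) on S_X, so U itself is never predicted from X.
    """
    a, b = fit_line([u for u, _ in s_y], [y for _, y in s_y])
    xs = [x for x, _ in s_x]
    targets = [a * u + b for _, u in s_x]  # h(U_i) for each pair in S_X
    return fit_line(xs, targets)


if __name__ == "__main__":
    # Toy data with no (x, y) pairs at all: U = 2X and Y = 3U + 1.
    s_x = [(float(x), 2.0 * x) for x in range(5)]
    s_y = [(u, 3.0 * u + 1.0) for u in (0.0, 1.0, 5.0, 7.0)]
    print(mediated_fit(s_x, s_y))  # slope and intercept of the learned f
```

In this noise-free linear toy the naive route through $U$ would give the same answer; the sketch only shows the data flow between the two datasets, not the consistency gap the abstract refers to.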

preprint, 2020, arXiv

Online Dense Subgraph Discovery via Blurred-Graph Feedback

Dense subgraph discovery aims to find a dense component in edge-weighted graphs. This is a fundamental graph-mining task with a variety of applications and thus has received much attention recently. Although most existing methods assume that each individual edge weight is easily obtained, such an assumption is not necessarily valid in practice. In this paper, we introduce a novel learning problem for dense subgraph discovery in which a learner queries edge subsets rather than only single edges and observes a noisy sum of edge weights in a queried subset. For this problem, we first propose a polynomial-time algorithm that obtains a nearly-optimal solution with high probability. Moreover, to deal with large-sized graphs, we design a more scalable algorithm with a theoretical guarantee. Computational experiments using real-world graphs demonstrate the effectiveness of our algorithms.

preprint, 2020, arXiv

Time-varying Gaussian Process Bandit Optimization with Non-constant Evaluation Time

The Gaussian process bandit is a problem in which we want to find a maximizer of a black-box function with the minimum number of function evaluations. If the black-box function varies with time, then time-varying Bayesian optimization is a promising framework. However, a drawback of current methods is the assumption that the evaluation time for every observation is constant, which can be unrealistic for many practical applications, e.g., recommender systems and environmental monitoring. As a result, the performance of current methods can be degraded when this assumption is violated. To cope with this problem, we propose a novel time-varying Bayesian optimization algorithm that can effectively handle the non-constant evaluation time. Furthermore, we theoretically establish a regret bound of our algorithm. Our bound elucidates that the pattern of the evaluation time sequence can hugely affect the difficulty of the problem. We also provide experimental results to validate the practical effectiveness of the proposed method.

preprint, 2016, arXiv

Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm

We study the K-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. The hardness of recommending Copeland winners, the arms that beat the greatest number of other arms, is characterized by deriving an asymptotic regret bound. We propose Copeland Winners Relative Minimum Empirical Divergence (CW-RMED) and derive an asymptotically optimal regret bound for it. However, it is not known whether the algorithm can be efficiently computed or not. To address this issue, we devise an efficient version (ECW-RMED) and derive its asymptotic regret bound. Experimental comparisons of dueling bandit algorithms show that ECW-RMED significantly outperforms existing ones.
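The notion of a Copeland winner used in this abstract (an arm that beats the greatest number of other arms) can be made concrete with a short computation on a known preference matrix. This is only an illustration of the definition, not of CW-RMED or ECW-RMED. It assumes that arm $i$ "beats" arm $j$ when its win probability exceeds 0.5; the function name and the example matrix are hypothetical.

```python
def copeland_winners(pref):
    """Return (winners, scores) for a pairwise preference matrix.

    pref[i][j] is the probability that arm i beats arm j in a duel.
    The Copeland score of arm i counts the arms j with pref[i][j] > 0.5.
    """
    k = len(pref)
    scores = [sum(1 for j in range(k) if j != i and pref[i][j] > 0.5)
              for i in range(k)]
    best = max(scores)
    winners = [i for i, s in enumerate(scores) if s == best]
    return winners, scores


if __name__ == "__main__":
    # A cyclic instance: 0 beats 1, 1 beats 2, 2 beats 0.
    cyclic = [[0.5, 0.6, 0.4],
              [0.4, 0.5, 0.6],
              [0.6, 0.4, 0.5]]
    print(copeland_winners(cyclic))
```

In the cyclic example no arm beats all the others, yet the set of Copeland winners is still well defined because every arm attains the maximum score.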

preprint, 2016, arXiv

Variable-to-Fixed Length Homophonic Coding with a Modified Shannon-Fano-Elias Code

Homophonic coding is a framework to reversibly convert a message into a sequence with some target distribution. This is a promising tool to generate a codeword with a biased code-symbol distribution, which is required for capacity-achieving communication over asymmetric channels. It is known that asymptotically optimal homophonic coding can be realized by a Fixed-to-Variable (FV) length code using an interval algorithm similar to a random number generator. However, FV codes are not preferable as a component of channel codes since a decoding error propagates to all subsequent codewords. As a solution to this problem, an asymptotically optimal Variable-to-Fixed (VF) length homophonic code, the dual Shannon-Fano-Elias-Gray (dual SFEG) code, is proposed in this paper. This code can be interpreted as a dual of a modified Shannon-Fano-Elias (SFE) code based on Gray code. It is also shown as a by-product that the modified SFE code, named SFEG code, achieves a better coding rate than the original SFE code in lossless source coding.
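For readers unfamiliar with the original SFE code that this abstract takes as its baseline, the following sketch builds the standard textbook Shannon-Fano-Elias codewords. It is not the modified SFEG code or the dual SFEG code proposed in the paper. Each symbol's codeword is the first $\lceil \log_2 (1/p) \rceil + 1$ bits of the binary expansion of the midpoint of its interval in the cumulative distribution. The function name `sfe_code` is hypothetical, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction


def sfe_code(probs):
    """Standard Shannon-Fano-Elias codewords for symbols in the given order."""
    codewords = []
    cumulative = Fraction(0)
    for p in probs:
        p = Fraction(p)
        midpoint = cumulative + p / 2
        # length = ceil(log2(1/p)) + 1, computed with exact arithmetic.
        length = 1
        while Fraction(1, 2 ** (length - 1)) > p:
            length += 1
        # Take the first `length` bits of the binary expansion of midpoint.
        bits = []
        frac = midpoint
        for _ in range(length):
            frac *= 2
            bit = int(frac >= 1)
            bits.append(str(bit))
            frac -= bit
        codewords.append("".join(bits))
        cumulative += p
    return codewords


if __name__ == "__main__":
    probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 8), Fraction(1, 8)]
    print(sfe_code(probs))
```

The extra bit in the codeword length is what makes the standard SFE code prefix-free without sorting the symbols, and it is also the source of the rate loss that modified constructions aim to reduce.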

preprint, 2015, arXiv

Almost Instantaneous Fix-to-Variable Length Codes

We propose almost instantaneous fixed-to-variable-length (AIFV) codes such that two (resp. $K-1$) code trees are used if code symbols are binary (resp. $K$-ary for $K \geq 3$), and source symbols are assigned to incomplete internal nodes in addition to leaves. Although the AIFV codes are not instantaneous codes, they are devised such that the decoding delay is at most two bits (resp. one code symbol) in the case of binary (resp. $K$-ary) code alphabet. The AIFV code can attain a better average compression rate than the Huffman code at the expense of a small decoding delay and a slightly larger memory size to store multiple code trees. We also show for the binary and ternary AIFV codes that the optimal AIFV code can be obtained by solving 0-1 integer programming problems.

preprint, 2015, arXiv

Exact Asymptotics for the Random Coding Error Probability

Error probabilities of random codes for memoryless channels are considered in this paper. In the area of communication systems, the admissible error probability is very small, and it is sometimes more important to discuss the relative gap between the achievable error probability and its bound than to discuss the absolute gap. Scarlett et al. derived a good upper bound of a random coding union bound based on the technique of saddlepoint approximation, but it has not been proved that the relative gap of their bound converges to zero. This paper derives a new bound on the achievable error probability from this viewpoint for a class of memoryless channels. The derived bound is strictly smaller than that by Scarlett et al. and its relative gap with the random coding error probability (not a union bound) vanishes as the block length increases for a fixed coding rate.

preprint, 2015, arXiv

Normal Bandits of Unknown Means and Variances: Asymptotic Optimality, Finite Horizon Regret Bounds, and a Solution to an Open Problem

Consider the problem of sampling sequentially from a finite number of $N \geq 2$ populations, specified by random variables $X^i_k$, $i = 1,\ldots , N,$ and $k = 1, 2, \ldots$; where $X^i_k$ denotes the outcome from population $i$ the $k^{th}$ time it is sampled. It is assumed that for each fixed $i$, $\{ X^i_k \}_{k \geq 1}$ is a sequence of i.i.d. normal random variables, with unknown mean $μ_i$ and unknown variance $σ_i^2$. The objective is to have a policy $π$ for deciding from which of the $N$ populations to sample at any time $n=1,2,\ldots$ so as to maximize the expected sum of outcomes of $n$ samples or, equivalently, to minimize the regret due to lack of information on the parameters $μ_i$ and $σ_i^2$. In this paper, we present a simple inflated sample mean (ISM) index policy that is asymptotically optimal in the sense of Theorem 4 below. This resolves a standing open problem from Burnetas and Katehakis (1996). Additionally, finite horizon regret bounds are given.

preprint, 2015, arXiv

Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem

We study the $K$-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. We introduce a tight asymptotic regret lower bound that is based on the information divergence. An algorithm that is inspired by the Deterministic Minimum Empirical Divergence algorithm (Honda and Takemura, 2010) is proposed, and its regret is analyzed. The proposed algorithm is found to be the first one with a regret upper bound that matches the lower bound. Experimental comparisons of dueling bandit algorithms show that the proposed algorithm significantly outperforms existing ones.

preprint, 2015, arXiv

Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring

Partial monitoring is a general model for sequential learning with limited feedback formalized as a game between two players. In this game, the learner chooses an action and at the same time the opponent chooses an outcome, then the learner suffers a loss and receives a feedback signal. The goal of the learner is to minimize the total loss. In this paper, we study partial monitoring with finite actions and stochastic outcomes. We derive a logarithmic distribution-dependent regret lower bound that defines the hardness of the problem. Inspired by the DMED algorithm (Honda and Takemura, 2010) for the multi-armed bandit problem, we propose PM-DMED, an algorithm that minimizes the distribution-dependent regret. PM-DMED significantly outperforms state-of-the-art algorithms in numerical experiments. To show the optimality of PM-DMED with respect to the regret bound, we slightly modify the algorithm by introducing a hinge function (PM-DMED-Hinge). Then, we derive an asymptotically optimal regret upper bound of PM-DMED-Hinge that matches the lower bound.
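The game protocol described at the start of this abstract can be sketched as a simple loop: the learner picks an action, an outcome is drawn, the learner incurs a loss it does not observe and sees only a feedback symbol. This is an illustration of the setting with stochastic outcomes, not of PM-DMED. The function name, the `choose_action` callback, and the example matrices are hypothetical.

```python
import random


def play_partial_monitoring(loss, feedback, outcome_probs, choose_action,
                            rounds, seed=0):
    """Simulate finite stochastic partial monitoring.

    loss[a][o] is the loss of action a under outcome o (hidden from the
    learner); feedback[a][o] is the symbol the learner observes instead.
    Returns (total_loss, history) where history holds (action, symbol) pairs.
    """
    rng = random.Random(seed)
    outcomes = list(range(len(outcome_probs)))
    total_loss = 0.0
    history = []
    for _ in range(rounds):
        # The learner sees only its own past actions and feedback symbols.
        action = choose_action(history)
        outcome = rng.choices(outcomes, weights=outcome_probs)[0]
        total_loss += loss[action][outcome]
        history.append((action, feedback[action][outcome]))
    return total_loss, history


if __name__ == "__main__":
    loss = [[0, 1], [1, 0]]
    # Action 0 is uninformative; action 1 reveals the outcome.
    feedback = [["a", "a"], ["b", "c"]]
    print(play_partial_monitoring(loss, feedback, [0.3, 0.7],
                                  lambda history: 1, rounds=5))
```

The example feedback matrix shows why the problem is harder than a bandit: an action can be cheap yet reveal nothing about which outcome occurred.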

preprint, 2013, arXiv

Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors

In stochastic bandit problems, a Bayesian policy called Thompson sampling (TS) has recently attracted much attention for its excellent empirical performance. However, the theoretical analysis of this policy is difficult and its asymptotic optimality has been proved only for one-parameter models. In this paper we discuss the optimality of TS for the model of normal distributions with unknown means and variances as one of the most fundamental examples of multiparameter models. First we prove that the expected regret of TS with the uniform prior achieves the theoretical bound, which is the first result to show that the asymptotic bound is achievable for the normal distribution model. Next we prove that TS with Jeffreys prior and reference prior cannot achieve the theoretical bound. Therefore the choice of priors is important for TS and non-informative priors are sometimes risky in cases of multiparameter models.

preprint, 2012, arXiv

Finite-time Regret Bound of a Bandit Algorithm for the Semi-bounded Support Model

In this paper we consider stochastic multiarmed bandit problems. Recently a policy, DMED, was proposed and proved to achieve the asymptotic bound for the model in which each reward distribution is supported in a known bounded interval, e.g. [0,1]. However, the derived regret bound is described in an asymptotic form and the performance in finite time has been unknown. We inspect this policy and derive a finite-time regret bound by refining large deviation probabilities to a simple finite form. Further, this observation reveals that the assumption on the lower-boundedness of the support is not essential and can be replaced with a weaker one, the existence of the moment generating function.

preprint, 2011, arXiv

Stochastic Bandit Based on Empirical Moments

In the multiarmed bandit problem a gambler chooses an arm of a slot machine to pull, considering a tradeoff between exploration and exploitation. We study the stochastic bandit problem where each arm has a reward distribution supported in a known bounded interval, e.g. [0,1]. For this model, policies which take into account the empirical variances (i.e. second moments) of the arms are known to perform effectively. In this paper, we generalize this idea and we propose a policy which exploits the first d empirical moments for arbitrary d fixed in advance. The asymptotic upper bound of the regret of the policy approaches the theoretical bound by Burnetas and Katehakis as d increases. By choosing an appropriate d, the proposed policy realizes a tradeoff between the computational complexity and the expected regret.

preprint, 2010, arXiv

An Asymptotically Optimal Policy for Finite Support Models in the Multiarmed Bandit Problem

We propose the minimum empirical divergence (MED) policy for the multiarmed bandit problem. We prove asymptotic optimality of the proposed policy for the case of finite support models. In our setting, Burnetas and Katehakis have already proposed an asymptotically optimal policy. For choosing an arm, our policy uses a criterion which is dual to the quantity used in Burnetas and Katehakis. Our criterion is easily computed by a convex optimization technique and has an advantage in practical implementation. We confirm by simulations that the MED policy demonstrates good performance in finite time in comparison to other currently popular policies.
