Source author record

Michael N. Katehakis

Michael N. Katehakis appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning math.OC math.PR

Catalog footprint

What is connected

12works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2020arXiv

Ameso Optimization: a Relaxation of Discrete Midpoint Convexity

In this paper we introduce the Ameso optimization problem, a special class of discrete optimization problems. We establish its basic properties and investigate the relation between Ameso optimization and the convex optimization. Further, we design an algorithm to solve a multi-dimensional Ameso problem by solving a sequence of one-dimensional Ameso problems. Finally, we demonstrate how the knapsack problem can be solved using the Ameso optimization framework.

preprint2015arXiv

A Comparative Analysis of the Successive Lumping and the Lattice Path Counting Algorithms

This article provides a comparison of the successive lumping (SL) methodology with the popular lattice path counting algorithm in obtaining rate matrices for queueing models, satisfying the quasi birth and death structure. The two methodologies are compared both in terms of applicability requirements and numerical complexity by analyzing their performance for the same classical queueing models. The main findings are: i) When both methods are applicable SL based algorithms outperform the lattice path counting algorithm (LPCA). ii) There are important classes of problems (e.g., models with (level) non-homogenous rates or with finite state spaces) for which the SL methodology is applicable and for which the LPCA cannot be used. iii) Another main advantage of successive lumping algorithms over LPCAs is that the former includes a method to compute the steady state distribution using this rate matrix.

preprint2015arXiv

An Asymptotically Optimal Policy for Uniform Bandits of Unknown Support

Consider the problem of a controller sampling sequentially from a finite number of $N \geq 2$ populations, specified by random variables $X^i_k$, $ i = 1,\ldots , N,$ and $k = 1, 2, \ldots$; where $X^i_k$ denotes the outcome from population $i$ the $k^{th}$ time it is sampled. It is assumed that for each fixed $i$, $\{ X^i_k \}_{k \geq 1}$ is a sequence of i.i.d. uniform random variables over some interval $[a_i, b_i]$, with the support (i.e., $a_i, b_i$) unknown to the controller. The objective is to have a policy $π$ for deciding, based on available data, from which of the $N$ populations to sample from at any time $n=1,2,\ldots$ so as to maximize the expected sum of outcomes of $n$ samples or equivalently to minimize the regret due to lack on information of the parameters $\{ a_i \}$ and $\{ b_i \}$. In this paper, we present a simple inflated sample mean (ISM) type policy that is asymptotically optimal in the sense of its regret achieving the asymptotic lower bound of Burnetas and Katehakis (1996). Additionally, finite horizon regret bounds are given.

preprint2015arXiv

Asymptotic Behavior of Minimal-Exploration Allocation Policies: Almost Sure, Arbitrarily Slow Growing Regret

The purpose of this paper is to provide further understanding into the structure of the sequential allocation ("stochastic multi-armed bandit", or MAB) problem by establishing probability one finite horizon bounds and convergence rates for the sample (or "pseudo") regret associated with two simple classes of allocation policies $π$. For any slowly increasing function $g$, subject to mild regularity constraints, we construct two policies (the $g$-Forcing, and the $g$-Inflated Sample Mean) that achieve a measure of regret of order $ O(g(n))$ almost surely as $n \to \infty$, bound from above and below. Additionally, almost sure upper and lower bounds on the remainder term are established. In the constructions herein, the function $g$ effectively controls the "exploration" of the classical "exploration/exploitation" tradeoff.

preprint2015arXiv

Asymptotically Optimal Multi-Armed Bandit Policies under a Cost Constraint

We develop asymptotically optimal policies for the multi armed bandit (MAB), problem, under a cost constraint. This model is applicable in situations where each sample (or activation) from a population (bandit) incurs a known bandit dependent cost. Successive samples from each population are iid random variables with unknown distribution. The objective is to design a feasible policy for deciding from which population to sample from, so as to maximize the expected sum of outcomes of $n$ total samples or equivalently to minimize the regret due to lack on information on sample distributions, For this problem we consider the class of feasible uniformly fast (f-UF) convergent policies, that satisfy the cost constraint sample-path wise. We first establish a necessary asymptotic lower bound for the rate of increase of the regret function of f-UF policies. Then we construct a class of f-UF policies and provide conditions under which they are asymptotically optimal within the class of f-UF policies, achieving this asymptotic lower bound. At the end we provide the explicit form of such policies for the case in which the unknown distributions are Normal with unknown means and known variances.

preprint2015arXiv

Asymptotically Optimal Sequential Experimentation Under Generalized Ranking

We consider the \mnk{classical} problem of a controller activating (or sampling) sequentially from a finite number of $N \geq 2$ populations, specified by unknown distributions. Over some time horizon, at each time $n = 1, 2, \ldots$, the controller wishes to select a population to sample, with the goal of sampling from a population that optimizes some "score" function of its distribution, e.g., maximizing the expected sum of outcomes or minimizing variability. We define a class of \textit{Uniformly Fast (UF)} sampling policies and show, under mild regularity conditions, that there is an asymptotic lower bound for the expected total number of sub-optimal population activations. Then, we provide sufficient conditions under which a UCB policy is UF and asymptotically optimal, since it attains this lower bound. Explicit solutions are provided for a number of examples of interest, including general score functionals on unconstrained Pareto distributions (of potentially infinite mean), and uniform distributions of unknown support. Additional results on bandits of Normal distributions are also provided.

preprint2015arXiv

Cash-Flow Based Dynamic Inventory Management

Small-to-medium size enterprises (SMEs), including many startup firms, need to manage interrelated flows of cash and inventories of goods. In this paper, we model a firm that can finance its inventory (ordered or manufactured) with loans in order to meet random demand which in general may not be time stationary. The firm earns interest on its cash on hand and pays interest on its debt. The objective is to maximize the expected value of the firm's %working capital at the end of a finite planning horizon. Our study shows that the optimal ordering policy is characterized by a pair of threshold variables for each period as function of the initial state of the period. Further, upper and lower bounds for the threshold values are developed using two simple-to-compute ordering policies. Based on these bounds, we provide an efficient algorithm to compute the two threshold values. Since the underlying state space is two-dimensional which leads to high computational complexity of the optimization algorithm, we also derive upper bounds for the optimal value function by reducing the optimization problem to one dimension. Subsequently, it is shown that policies of similar structure are optimal when the loan and deposit interest rates are piecewise linear functions, when there is a maximal loan limit and when unsatisfied demand is backordered. Finally, further managerial insights are provided with numerical studies.

preprint2015arXiv

Dynamic Pricing in a Dual Market Environment

This paper is concerned with the determination of pricing strategies for a firm that in each period of a finite horizon receives replenishment quantities of a single product which it sells in two markets, e.g., a long-distance market and an on-site market. The key difference between the two markets is that the long-distance market provides for a one period delay in demand fulfillment. In contrast, on-site orders must be filled immediately as the customer is at the physical on-site location. We model the demands in consecutive periods as independent random variables and their distributions depend on the item's price in accordance with two general stochastic demand functions: additive or multiplicative. The firm uses a single pool of inventory to fulfill demands from both markets. We investigate properties of the structure of the dynamic pricing strategy that maximizes the total expected discounted profit over the finite time horizon, under fixed or controlled replenishment conditions. Further, we provide conditions under which one market may be the preferred outlet to sale over the other.

preprint2015arXiv

Inventory Control Involving Unknown Demand of Discrete Nonperishable Items - Analysis of a Newsvendor-based Policy

Inventory control with unknown demand distribution is considered, with emphasis placed on the case involving discrete nonperishable items. We focus on an adaptive policy which in every period uses, as much as possible, the optimal newsvendor ordering quantity for the empirical distribution learned up to that period. The policy is assessed using the regret criterion, which measures the price paid for ambiguity on demand distribution over $T$ periods. When there are guarantees on the latter's separation from the critical newsvendor parameter $β=b/(h+b)$, a constant upper bound on regret can be found. Without any prior information on the demand distribution, we show that the regret does not grow faster than the rate $T^{1/2+ε}$ for any $ε>0$. In view of a known lower bound, this is almost the best one could hope for. Simulation studies involving this along with other policies are also conducted.

preprint2015arXiv

Level product form QSF processes and an analysis of queues with Coxian inter-arrival distribution

In this paper we study a class of Quasi-Skipfree (QSF) processes where the transition rate submatrices in the skipfree direction have a column times row structure. Under homogeneity and irreducibility assumptions we show that the stationary distributions of these processes have a product form as a function of the level. For an application, we will discuss the ${\it Cox(k)}/M^Y/1$-queue, that can be modelled as a QSF process on a two-dimensional state space. In addition we study the properties of the stationary distribution and derive monotonicity of the mean number of the customers in the queue, their mean sojourn time and the variance as a function of $k$ for fixed mean arrival rate.

preprint2015arXiv

Normal Bandits of Unknown Means and Variances: Asymptotic Optimality, Finite Horizon Regret Bounds, and a Solution to an Open Problem

Consider the problem of sampling sequentially from a finite number of $N \geq 2$ populations, specified by random variables $X^i_k$, $ i = 1,\ldots , N,$ and $k = 1, 2, \ldots$; where $X^i_k$ denotes the outcome from population $i$ the $k^{th}$ time it is sampled. It is assumed that for each fixed $i$, $\{ X^i_k \}_{k \geq 1}$ is a sequence of i.i.d. normal random variables, with unknown mean $μ_i$ and unknown variance $σ_i^2$. The objective is to have a policy $π$ for deciding from which of the $N$ populations to sample form at any time $n=1,2,\ldots$ so as to maximize the expected sum of outcomes of $n$ samples or equivalently to minimize the regret due to lack on information of the parameters $μ_i$ and $σ_i^2$. In this paper, we present a simple inflated sample mean (ISM) index policy that is asymptotically optimal in the sense of Theorem 4 below. This resolves a standing open problem from Burnetas and Katehakis (1996). Additionally, finite horizon regret bounds are given.

preprint2015arXiv

On the Solution to a Countable System of Equations Arising in Stochastic Processes

In this paper we develop a method to compute the solution to a countable (finite or infinite) set of equations that occurs in many different fields including Markov processes that model queueing systems, birth-and-death processes and inventory systems. The method provides a fast and exact computation of the inverse of the matrix of the coefficients of the system. In contrast, alternative inverse techniques perform much slower and work only for finite size matrices. Furthermore, we provide a procedure to construct the eigenvalues of the matrix under consideration.

Michael N. Katehakis

What is connected

Connect this record

See the researcher in context

Building this map preview

12 published item(s)

Ameso Optimization: a Relaxation of Discrete Midpoint Convexity

A Comparative Analysis of the Successive Lumping and the Lattice Path Counting Algorithms

An Asymptotically Optimal Policy for Uniform Bandits of Unknown Support

Asymptotic Behavior of Minimal-Exploration Allocation Policies: Almost Sure, Arbitrarily Slow Growing Regret

Asymptotically Optimal Multi-Armed Bandit Policies under a Cost Constraint

Asymptotically Optimal Sequential Experimentation Under Generalized Ranking

Cash-Flow Based Dynamic Inventory Management

Dynamic Pricing in a Dual Market Environment

Inventory Control Involving Unknown Demand of Discrete Nonperishable Items - Analysis of a Newsvendor-based Policy

Level product form QSF processes and an analysis of queues with Coxian inter-arrival distribution

Normal Bandits of Unknown Means and Variances: Asymptotic Optimality, Finite Horizon Regret Bounds, and a Solution to an Open Problem

On the Solution to a Countable System of Equations Arising in Stochastic Processes