Source author record

Mehdi Jafarnia-Jahromi

Mehdi Jafarnia-Jahromi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Artificial Intelligence Cryptography and Security math.OC

Catalog footprint

What is connected

3works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2020arXiv

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

Model-free reinforcement learning is known to be memory and computation efficient and more amendable to large scale problems. In this paper, two model-free algorithms are introduced for learning infinite-horizon average-reward Markov Decision Processes (MDPs). The first algorithm reduces the problem to the discounted-reward version and achieves $\mathcal{O}(T^{2/3})$ regret after $T$ steps, under the minimal assumption of weakly communicating MDPs. To our knowledge, this is the first model-free algorithm for general MDPs in this setting. The second algorithm makes use of recent advances in adaptive algorithms for adversarial multi-armed bandits and improves the regret to $\mathcal{O}(\sqrt{T})$, albeit with a stronger ergodic assumption. This result significantly improves over the $\mathcal{O}(T^{3/4})$ regret achieved by the only existing model-free algorithm by Abbasi-Yadkori et al. (2019a) for ergodic MDPs in the infinite-horizon average-reward setting.

preprint2020arXiv

Non-indexability of the Stochastic Appointment Scheduling Problem

Consider a set of jobs with independent random service times to be scheduled on a single machine. The jobs can be surgeries in an operating room, patients' appointments in outpatient clinics, etc. The challenge is to determine the optimal sequence and appointment times of jobs to minimize some function of the server idle time and service start-time delay. We introduce a generalized objective function of delay and idle time, and consider $l_1$-type and $l_2$-type cost functions as special cases of interest. Determining an index-based policy for the optimal sequence in which to schedule jobs has been an open problem for many years. For example, it was conjectured that `least variance first' (LVF) policy is optimal for the $l_1$-type objective. This is known to be true for the case of two jobs with specific distributions. A key result in this paper is that the optimal sequencing problem is non-indexable, i.e., neither the variance, nor any other such index can be used to determine the optimal sequence in which to schedule jobs for $l_1$ and $l_2$-type objectives. We then show that given a sequence in which to schedule the jobs, sample average approximation yields a solution which is statistically consistent.

preprint2020arXiv

PPD: Permutation Phase Defense Against Adversarial Examples in Deep Learning

Deep neural networks have demonstrated cutting edge performance on various tasks including classification. However, it is well known that adversarially designed imperceptible perturbation of the input can mislead advanced classifiers. In this paper, Permutation Phase Defense (PPD), is proposed as a novel method to resist adversarial attacks. PPD combines random permutation of the image with phase component of its Fourier transform. The basic idea behind this approach is to turn adversarial defense problems analogously into symmetric cryptography, which relies solely on safekeeping of the keys for security. In PPD, safe keeping of the selected permutation ensures effectiveness against adversarial attacks. Testing PPD on MNIST and CIFAR-10 datasets yielded state-of-the-art robustness against the most powerful adversarial attacks currently available.

Mehdi Jafarnia-Jahromi

What is connected

Connect this record

See the researcher in context

Building this map preview

3 published item(s)

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

Non-indexability of the Stochastic Appointment Scheduling Problem

PPD: Permutation Phase Defense Against Adversarial Examples in Deep Learning