Researcher profile

Mehdi Jafarnia-Jahromi

Mehdi Jafarnia-Jahromi contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Trust 15 - Unverified
Verification L1
Unclaimed author
3 works
0 followers
4 topics
4 close collaborators


Published work

3 published items

Preprint, 2020, arXiv

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

Model-free reinforcement learning is known to be memory- and computation-efficient and more amenable to large-scale problems. In this paper, two model-free algorithms are introduced for learning infinite-horizon average-reward Markov Decision Processes (MDPs). The first algorithm reduces the problem to the discounted-reward version and achieves $\mathcal{O}(T^{2/3})$ regret after $T$ steps, under the minimal assumption of weakly communicating MDPs. To our knowledge, this is the first model-free algorithm for general MDPs in this setting. The second algorithm makes use of recent advances in adaptive algorithms for adversarial multi-armed bandits and improves the regret to $\mathcal{O}(\sqrt{T})$, albeit with a stronger ergodic assumption. This result significantly improves over the $\mathcal{O}(T^{3/4})$ regret achieved by the only existing model-free algorithm by Abbasi-Yadkori et al. (2019a) for ergodic MDPs in the infinite-horizon average-reward setting.
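
For context, regret in the average-reward setting is usually measured against the optimal long-run average reward. The following is a sketch of that standard definition; the symbols $J^*$ (optimal average reward), $r$ (reward function), and $(s_t, a_t)$ (state and action at step $t$) are assumed notation and do not appear in the abstract:

$$R_T = \sum_{t=1}^{T} \bigl( J^* - r(s_t, a_t) \bigr)$$

Under this reading, any bound that grows sublinearly in $T$ means the learner's average reward approaches $J^*$, and the three rates quoted above are ordered by their exponents: $1/2 < 2/3 < 3/4$.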

Preprint, 2020, arXiv

Non-indexability of the Stochastic Appointment Scheduling Problem

Consider a set of jobs with independent random service times to be scheduled on a single machine. The jobs can be surgeries in an operating room, patients' appointments in outpatient clinics, etc. The challenge is to determine the optimal sequence and appointment times of jobs to minimize some function of the server idle time and service start-time delay. We introduce a generalized objective function of delay and idle time, and consider $l_1$-type and $l_2$-type cost functions as special cases of interest. Determining an index-based policy for the optimal sequence in which to schedule jobs has been an open problem for many years. For example, it was conjectured that the "least variance first" (LVF) policy is optimal for the $l_1$-type objective. This is known to be true for the case of two jobs with specific distributions. A key result in this paper is that the optimal sequencing problem is non-indexable, i.e., neither the variance nor any other such index can be used to determine the optimal sequence in which to schedule jobs for $l_1$-type and $l_2$-type objectives. We then show that, given a sequence in which to schedule the jobs, sample average approximation yields a solution that is statistically consistent.
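
To make the sample average approximation step concrete, here is a minimal Python sketch for a fixed job sequence. It is an illustration under stated assumptions, not the paper's formulation: the function names (`l1_cost`, `saa_cost`, `best_second_appointment`), the equal default weights, the exponential service times, and the grid search over a single appointment time are all hypothetical choices made for this example.

```python
import random


def l1_cost(appointments, service_times, idle_weight=1.0, delay_weight=1.0):
    """Weighted idle time plus start-time delay for one sampled scenario.

    Jobs are served in the given order; a job starts at its appointment
    time or when the server becomes free, whichever is later.
    """
    free_at = 0.0  # time at which the server finishes the previous job
    cost = 0.0
    for appt, service in zip(appointments, service_times):
        idle = max(0.0, appt - free_at)    # server waits for the job
        delay = max(0.0, free_at - appt)   # job waits for the server
        cost += idle_weight * idle + delay_weight * delay
        free_at = max(appt, free_at) + service
    return cost


def saa_cost(appointments, scenarios):
    """Sample average of the l1-type cost over a list of scenarios."""
    return sum(l1_cost(appointments, s) for s in scenarios) / len(scenarios)


def sample_scenarios(means, n, seed=0):
    """Draw n scenarios of exponential service times (an assumed distribution)."""
    rng = random.Random(seed)
    return [[rng.expovariate(1.0 / m) for m in means] for _ in range(n)]


def best_second_appointment(scenarios, grid):
    """Two-job case: choose the second appointment time from a grid."""
    return min(grid, key=lambda t: saa_cost([0.0, t], scenarios))


if __name__ == "__main__":
    scenarios = sample_scenarios(means=[2.0, 1.0], n=2000)
    grid = [0.25 * k for k in range(21)]  # candidate times 0.0 .. 5.0
    t = best_second_appointment(scenarios, grid)
    print("second appointment:", t, "SAA cost:", saa_cost([0.0, t], scenarios))
```

With two jobs and equal weights, the cost in each scenario reduces to the absolute gap between the second appointment time and the first job's service time, so the grid search lands near the sample median of that service time. Longer sequences would need a proper optimizer in place of the grid search.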

Preprint, 2020, arXiv

PPD: Permutation Phase Defense Against Adversarial Examples in Deep Learning

Deep neural networks have demonstrated cutting-edge performance on various tasks including classification. However, it is well known that adversarially designed imperceptible perturbations of the input can mislead advanced classifiers. In this paper, Permutation Phase Defense (PPD) is proposed as a novel method to resist adversarial attacks. PPD combines random permutation of the image with the phase component of its Fourier transform. The basic idea behind this approach is to turn adversarial defense problems analogously into symmetric cryptography, which relies solely on safekeeping of the keys for security. In PPD, safekeeping of the selected permutation ensures effectiveness against adversarial attacks. Testing PPD on MNIST and CIFAR-10 datasets yielded state-of-the-art robustness against the most powerful adversarial attacks currently available.
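
The two ingredients named in the abstract, a secret pixel permutation followed by the phase of the Fourier transform, can be sketched in a few lines of Python. This is a rough illustration only: the function names (`permute_pixels`, `dft2_phase`, `ppd_features`), the use of an integer seed as the secret key, and the naive pure-Python DFT are assumptions made for this example, and the paper's actual preprocessing pipeline may differ.

```python
import cmath
import random


def permute_pixels(image, seed):
    """Shuffle pixel positions using a permutation derived from a secret seed."""
    rows, cols = len(image), len(image[0])
    flat = [pixel for row in image for pixel in row]
    order = list(range(rows * cols))
    random.Random(seed).shuffle(order)  # the seed plays the role of the key
    shuffled = [flat[i] for i in order]
    return [shuffled[r * cols:(r + 1) * cols] for r in range(rows)]


def dft2_phase(image):
    """Phase of the 2D discrete Fourier transform (naive, for small images)."""
    rows, cols = len(image), len(image[0])
    phase = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            coeff = sum(
                image[r][c]
                * cmath.exp(-2j * cmath.pi * (u * r / rows + v * c / cols))
                for r in range(rows)
                for c in range(cols)
            )
            out_row.append(cmath.phase(coeff))  # angle in [-pi, pi]
        phase.append(out_row)
    return phase


def ppd_features(image, seed):
    """Permute pixels with the secret seed, then keep only the Fourier phase."""
    return dft2_phase(permute_pixels(image, seed))


if __name__ == "__main__":
    toy_image = [[0.0, 0.2, 0.4], [0.6, 0.8, 1.0], [0.1, 0.3, 0.5]]
    for row in ppd_features(toy_image, seed=1234):
        print([round(angle, 3) for angle in row])
```

In this sketch a classifier would be trained on the output of `ppd_features` rather than on raw pixels, and an attacker who does not know the seed cannot reproduce the same input transformation. For realistic image sizes, a fast Fourier transform routine would replace the naive double loop.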