Researcher profile

Mohammad Hajiesmaili

Mohammad Hajiesmaili contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2026arXiv

The Secretary Problem with Predictions and a Chosen Order

We study a learning-augmented variant of the secretary problem, recently introduced by Fujii and Yoshida (2023), in which the decision-maker has access to machine-learned predictions of candidate values. The central challenge is to balance consistency and robustness: when predictions are accurate, the algorithm should select a near-optimal secretary, while under inaccurate predictions it should still guarantee a bounded competitive ratio. We consider both the classical Random Order Secretary Problem (ROSP), where candidates arrive in a uniformly random order, and a more natural learning-augmented model in which the decision-maker may choose the arrival order based on predicted values. We call this model the Chosen Order Secretary Problem (COSP), capturing scenarios such as interview schedules set in advance. We propose a new randomized algorithm applicable to both ROSP and COSP. Our method switches from fully trusting predictions to a threshold-based rule once a large prediction deviation is detected. Let $ε\in [0,1]$ denote the maximum multiplicative prediction error. For ROSP, our algorithm achieves a competitive ratio of $\max\{0.221, (1-ε)/(1+ε)\}$, improving upon the prior bound of $\max\{0.215, (1-ε)/(1+ε)\}$. For COSP, we achieve $\max\{0.262, (1-ε)/(1+ε)\}$, surpassing the $0.25$ worst-case bound for prior approaches and moving closer to the classical secretary benchmark of $1/e \approx 0.368$. These results highlight the benefit of combining predictions with arrival-order control in online decision-making.

preprint2026arXiv

Unlearning Offline Stochastic Multi-Armed Bandits

Machine unlearning aims to unlearn data points from a learned model, offering a principled way to process data-deletion requests and mitigate privacy risks without full retraining. Prior work has mainly studied unsupervised / supervised machine unlearning, leaving unlearning for sequential decision-making systems far less understood. We initiate the first study of a foundational sequential decision-making problem: offline stochastic multi-armed bandits (MAB). We formalize the privacy constraint for offline MAB and measure utility by the post-unlearning decision quality. We conduct a systematic study of both single- and multi-source unlearning scenarios under two data-generation models, the fixed-sample model and the distribution model. For these settings, our algorithmic design is built on two canonical base algorithms: Gaussian mechanism and rollback, and we propose adaptive algorithms that switch between them according to the data regime and privacy constraint. We further introduce a mixing procedure that elucidates the rationale behind these baselines. We provide performance guarantees across the above settings and establish lower bounds under both dataset models. Experiments validate the predicted tradeoffs and demonstrate the effectiveness of the proposed methods.

preprint2023arXiv

Equitable Network-Aware Decarbonization of Residential Heating at City Scale

Residential heating, primarily powered by natural gas, accounts for a significant portion of residential sector energy use and carbon emissions in many parts of the world. Hence, there is a push towards decarbonizing residential heating by transitioning to energy-efficient heat pumps powered by an increasingly greener and less carbon-intensive electric grid. However, such a transition will add additional load to the electric grid triggering infrastructure upgrades, and subsequently erode the customer base using the gas distribution network. Utilities want to guide these transition efforts to ensure a phased decommissioning of the gas network and deferred electric grid infrastructure upgrades while achieving carbon reduction goals. To facilitate such a transition, we present a network-aware optimization framework for decarbonizing residential heating at city scale with an objective to maximize carbon reduction under budgetary constraints. Our approach operates on a graph representation of the gas network topology to compute the cost of transitioning and select neighborhoods for transition. We further extend our approach to explicitly incorporate equity and ensure an equitable distribution of benefits across different socioeconomic groups. We apply our framework to a city in the New England region of the U.S., using real-world gas usage, electric usage, and grid infrastructure data. We show that our network-aware strategy achieves 55% higher carbon reductions than prior network-oblivious work under the same budget. Our equity-aware strategy achieves an equitable outcome while preserving the carbon reduction benefits of the network-aware strategy.

preprint2022arXiv

Distributed Bandits with Heterogeneous Agents

This paper tackles a multi-agent bandit setting where $M$ agents cooperate together to solve the same instance of a $K$-armed stochastic bandit problem. The agents are \textit{heterogeneous}: each agent has limited access to a local subset of arms and the agents are asynchronous with different gaps between decision-making rounds. The goal for each agent is to find its optimal local arm, and agents can cooperate by sharing their observations with others. While cooperation between agents improves the performance of learning, it comes with an additional complexity of communication between agents. For this heterogeneous multi-agent setting, we propose two learning algorithms, \ucbo and \AAE. We prove that both algorithms achieve order-optimal regret, which is $O\left(\sum_{i:\tildeΔ_i>0} \log T/\tildeΔ_i\right)$, where $\tildeΔ_i$ is the minimum suboptimality gap between the reward mean of arm $i$ and any local optimal arm. In addition, a careful selection of the valuable information for cooperation, \AAE achieves a low communication complexity of $O(\log T)$. Last, numerical experiments verify the efficiency of both algorithms.

preprint2020arXiv

Hedge Your Bets: Optimizing Long-term Cloud Costs by Mixing VM Purchasing Options

Cloud platforms offer the same VMs under many purchasing options that specify different costs and time commitments, such as on-demand, reserved, sustained-use, scheduled reserve, transient, and spot block. In general, the stronger the commitment, i.e., longer and less flexible, the lower the price. However, longer and less flexible time commitments can increase cloud costs for users if future workloads cannot utilize the VMs they committed to buying. Large cloud customers often find it challenging to choose the right mix of purchasing options to reduce their long-term costs, while retaining the ability to adjust capacity up and down in response to workload variations. To address the problem, we design policies to optimize long-term cloud costs by selecting a mix of VM purchasing options based on short- and long-term expectations of workload utilization. We consider a batch trace spanning 4 years from a large shared cluster for a major state University system that includes 14k cores and 60 million job submissions, and evaluate how these jobs could be judiciously executed using cloud servers using our approach. Our results show that our policies incur a cost within 41% of an optimistic optimal offline approach, and 50% less than solely using on-demand VMs.