Source author record

Xian Yu

Xian Yu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.OC, Artificial Intelligence, eess.SY, Machine Learning, Systems and Control

Catalog footprint

What is connected

4 works

5 topics

3 close collaborators

preprint · 2026 · arXiv

Learning to Cut: Reinforcement Learning for Benders Decomposition

Benders decomposition (BD) is a widely used solution approach for solving two-stage stochastic programs arising in real-world decision-making under uncertainty. However, it often suffers from slow convergence as the master problem grows with an increasing number of cuts. In this paper, we propose Reinforcement Learning for BD (RLBD), a framework that adaptively selects cuts using a neural network-based stochastic policy. The policy is trained using a policy gradient method via the REINFORCE algorithm. We evaluate the proposed approach on a two-stage stochastic electric vehicle charging station location problem and compare it with vanilla BD and LearnBD, a supervised learning approach that classifies cuts using a support vector machine. Numerical results demonstrate that RLBD achieves substantial improvements in computational efficiency and exhibits strong generalization to problems with similar structures but varying data inputs and decision variable dimensions.
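
The REINFORCE step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract describes a neural-network policy, while this sketch substitutes a linear-logistic policy to stay short. The cut features, the episode return, and all function names (`select_prob`, `sample_actions`, `reinforce_update`) are hypothetical.

```python
import math
import random

def select_prob(theta, x):
    """Probability of keeping a cut with feature vector x (logistic policy, an assumption)."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

def sample_actions(theta, features, rng):
    """Sample a keep (1) or drop (0) decision for each candidate cut."""
    return [1 if rng.random() < select_prob(theta, x) else 0 for x in features]

def reinforce_update(theta, features, actions, episode_return, lr=0.1):
    """One REINFORCE step: theta += lr * return * grad of log-probability.

    For a Bernoulli-logistic policy, the gradient of log pi(a | x) is (a - p) * x.
    """
    grad = [0.0] * len(theta)
    for x, a in zip(features, actions):
        p = select_prob(theta, x)
        for j, xj in enumerate(x):
            grad[j] += (a - p) * xj
    return [t + lr * episode_return * g for t, g in zip(theta, grad)]
```

In a full training loop the return would plausibly reward faster convergence of the master problem (for example, negative solve time); that reward definition is left open here because the abstract does not specify it.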

preprint · 2023 · arXiv

Risk-Averse Reinforcement Learning via Dynamic Time-Consistent Risk Measures

Traditional reinforcement learning (RL) aims to maximize the expected total reward, while the risk of uncertain outcomes needs to be controlled to ensure reliable performance in a risk-averse setting. In this paper, we consider the problem of maximizing dynamic risk of a sequence of rewards in infinite-horizon Markov Decision Processes (MDPs). We adapt the Expected Conditional Risk Measures (ECRMs) to the infinite-horizon risk-averse MDP and prove its time consistency. Using a convex combination of expectation and conditional value-at-risk (CVaR) as a special one-step conditional risk measure, we reformulate the risk-averse MDP as a risk-neutral counterpart with augmented action space and manipulation on the immediate rewards. We further prove that the related Bellman operator is a contraction mapping, which guarantees the convergence of any value-based RL algorithms. Accordingly, we develop a risk-averse deep Q-learning framework, and our numerical studies based on two simple MDPs show that the risk-averse setting can reduce the variance and enhance robustness of the results.
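
The one-step risk measure named in the abstract, a convex combination of expectation and CVaR, can be sketched on an empirical sample. This is an illustrative sketch only: it assumes a cost convention (larger values are worse), equally weighted samples, and the Rockafellar-Uryasev representation of CVaR; the names `cvar`, `mean_cvar`, `alpha`, and `lam` are hypothetical and not taken from the paper.

```python
def cvar(costs, alpha):
    """Empirical CVaR at level alpha via min over eta of eta + E[(X - eta)^+] / (1 - alpha).

    The objective is piecewise linear and convex in eta, so the minimum
    is attained at one of the sample points.
    """
    n = len(costs)
    return min(
        eta + sum(max(c - eta, 0.0) for c in costs) / ((1.0 - alpha) * n)
        for eta in costs
    )

def mean_cvar(costs, alpha, lam):
    """Convex combination (1 - lam) * E[X] + lam * CVaR_alpha(X), with 0 <= lam <= 1."""
    mean = sum(costs) / len(costs)
    return (1.0 - lam) * mean + lam * cvar(costs, alpha)
```

Setting `lam = 0` recovers the risk-neutral expectation, and `lam = 1` gives pure CVaR, so `lam` acts as the risk-aversion weight in this sketch.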

preprint · 2022 · arXiv

On the Value of Multistage Risk-Averse Stochastic Facility Location With or Without Prioritization

We consider a multiperiod stochastic capacitated facility location problem under uncertain demand and budget in each period. Using a scenario tree representation of the uncertainties, we formulate a multistage stochastic integer program to dynamically locate facilities in each period and compare it with a two-stage approach that determines the facility locations up front. In the multistage model, in each stage, a decision maker optimizes facility locations and recourse flows from open facilities to demand sites, to minimize certain risk measures of the cost associated with current facility location and shipment decisions. When the budget is also uncertain, a popular modeling framework is to prioritize the candidate sites. In the two-stage model, the priority list is decided in advance and fixed through all periods, while in the multistage model, the priority list can change adaptively. In each period, the decision maker follows the priority list to open facilities according to the realized budget, and optimizes recourse flows given the realized demand. Using expected conditional risk measures (ECRMs), we derive tight lower bounds for the gaps between the optimal objective values of risk-averse multistage models and their two-stage counterparts in both settings with and without prioritization. Moreover, we propose two approximation algorithms to efficiently solve risk-averse two-stage and multistage models without prioritization, which are asymptotically optimal under an expanding market assumption. We also design a set of super-valid inequalities for risk-averse two-stage and multistage stochastic programs with prioritization to reduce the computational time. We conduct numerical studies using both randomly generated and real-world instances with diverse sizes, to demonstrate the tightness of the analytical bounds and efficacy of the approximation algorithms and prioritization cuts.

preprint · 2022 · arXiv

Resource Distribution Under Spatiotemporal Uncertainty of Disease Spread: Stochastic versus Robust Approaches

We consider the problem of optimizing locations of distribution centers (DCs) and plans for distributing resources such as test kits and vaccines, under spatiotemporal uncertainties of disease spread and demand for the resources. We aim to balance the operational cost (including costs of deploying facilities, shipping, and storage) and quality of service (reflected by demand coverage), while ensuring equity and fairness of resource distribution across multiple populations. We compare a sample-based stochastic programming (SP) approach with a distributionally robust optimization (DRO) approach using a moment-based ambiguity set. Numerical studies are conducted on instances of distributing COVID-19 vaccines in the United States and test kits, to compare SP and DRO models with a deterministic formulation using estimated demand and with the current resource distribution plans implemented in the US. We demonstrate the results over distinct phases of the pandemic to estimate the cost and speed of resource distribution depending on scale and coverage, and show the "demand-driven" properties of the SP and DRO solutions. Our results further indicate that if the worst-case unmet demand is prioritized, then the DRO approach is preferred despite its higher overall cost. Nevertheless, the SP approach can provide an intermediate plan under budgetary restrictions without significant compromises in demand coverage.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint