Source author record

Yingqian Zhang

Yingqian Zhang appears in the imported research catalog. Authorship, coauthor, and topic links are available, although ownership of the profile has not yet been claimed.

Researcher · Unclaimed source record

Artificial Intelligence Machine Learning Computer Science and Game Theory math.OC Multiagent Systems Neural and Evolutionary Computing Symbolic Computation

Catalog footprint

What is connected

10 works

7 topics

4 close collaborators


preprint · 2026 · arXiv

Adversarial Instance Generation and Robust Training for Neural Combinatorial Optimization with Multiple Objectives

Deep reinforcement learning (DRL) has shown great promise in addressing multi-objective combinatorial optimization problems (MOCOPs). Nevertheless, the robustness of these learning-based solvers has remained insufficiently explored, especially across diverse and complex problem distributions. In this paper, we propose a unified robustness-oriented framework for preference-conditioned DRL solvers for MOCOPs. Within this framework, we develop a preference-based adversarial attack to generate hard instances that expose solver weaknesses, and quantify the attack impact by the resulting degradation on Pareto-front quality. We further introduce a defense strategy that integrates hardness-aware preference selection into adversarial training to reduce overfitting to restricted preference regions and improve out-of-distribution performance. The experimental results on multi-objective traveling salesman problem (MOTSP), multi-objective capacitated vehicle routing problem (MOCVRP), and multi-objective knapsack problem (MOKP) verify that our attack method successfully learns hard instances for different solvers. Furthermore, our defense method significantly strengthens the robustness and generalizability of neural solvers, delivering superior performance on hard or out-of-distribution instances.
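Preference-conditioned solvers for MOCOPs typically turn a preference vector into a single-objective subproblem, and Pareto-front quality is often summarized by the hypervolume indicator. The sketch below assumes weighted-sum scalarization and two minimization objectives (for example, two tour-length criteria in a bi-objective TSP). The abstract does not state which scalarization or quality metric the paper uses, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def weighted_sum(objectives: Sequence[float], preference: Sequence[float]) -> float:
    """Scalarize an objective vector with a preference (weight) vector."""
    return sum(w * f for w, f in zip(preference, objectives))

def pareto_filter(points: List[Point]) -> List[Point]:
    """Keep the non-dominated points for two minimization objectives, sorted by f1."""
    front: List[Point] = []
    best_f2 = float("inf")
    for f1, f2 in sorted(points):
        # After sorting by f1, a point is non-dominated iff it improves on every f2 seen so far.
        if f2 < best_f2:
            front.append((f1, f2))
            best_f2 = f2
    return front

def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the front and bounded by the reference point (minimization)."""
    front = [p for p in pareto_filter(points) if p[0] < ref[0] and p[1] < ref[1]]
    area = 0.0
    prev_f2 = ref[1]
    for f1, f2 in front:
        # Each front point adds a horizontal strip from f1 to ref[0], between f2 and the previous f2.
        area += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return area
```

Under these assumptions, the impact of an attack on a solver could be measured as the drop in `hypervolume_2d` between the fronts it produces on clean and on adversarial instances.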

preprint · 2026 · arXiv

DiPRL: Learning Discrete Programmatic Policies via Architecture Entropy Regularization

Programmatic reinforcement learning (PRL) offers an interpretable alternative to deep reinforcement learning by representing policies as human-readable and human-editable programs. While gradient-based methods have been developed to optimize continuous relaxations of programs, they suffer a significant performance drop when the continuous relaxations are converted back into discrete programs. Post-hoc discretization can discard optimized branches and parameters of a program, which collapses policy expressivity and lowers task performance, in turn requiring additional fine-tuning. To overcome these limitations, we propose Differentiable Discrete Programmatic Reinforcement Learning (DiPRL), a method that learns programmatic policies that become nearly discrete during training, avoiding a separate post-hoc fine-tuning stage. We first analyze the inherent risk of performance drops introduced by post-hoc discretization in gradient-based methods. We then introduce programmatic architecture entropy regularization, which enables smooth, differentiable training while encouraging convergence toward a discrete program. DiPRL retains the efficiency of gradient-based optimization while mitigating the risks of post-hoc discretization. Our experiments across multiple discrete and continuous RL tasks demonstrate that DiPRL can achieve strong performance with interpretable programmatic policies.
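To see why an entropy term pushes a relaxed program toward a discrete one, consider a single branch point relaxed as a softmax over candidate branches. This sketch assumes that softmax relaxation and a hypothetical coefficient `beta`; it illustrates entropy regularization in general, not DiPRL's actual architecture or loss. A low-entropy distribution is close to one-hot, so keeping only the argmax branch changes the program's output very little.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over candidate-branch logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; it is zero exactly when one branch holds all the mass."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def regularized_loss(task_loss: float, branch_logits: Sequence[float], beta: float) -> float:
    """Task loss plus an entropy penalty on the relaxed branch choice (beta is hypothetical)."""
    return task_loss + beta * entropy(softmax(branch_logits))

def discretize(branch_logits: Sequence[float]) -> int:
    """Post-hoc discretization: keep only the highest-weight branch."""
    return max(range(len(branch_logits)), key=lambda i: branch_logits[i])

def discretization_gap(branch_logits: Sequence[float], branch_outputs: Sequence[float]) -> float:
    """Change in output when the relaxed mixture is replaced by the argmax branch alone."""
    probs = softmax(branch_logits)
    relaxed = sum(p * y for p, y in zip(probs, branch_outputs))
    return abs(relaxed - branch_outputs[discretize(branch_logits)])
```

With uniform logits `[0.0, 0.0]` and branch outputs `[1.0, 3.0]`, the relaxed program outputs their average, so discretization shifts the output by 1.0; with logits `[10.0, -10.0]` the entropy and the gap are both negligible.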

preprint · 2026 · arXiv

Learning with Foresight: Enhancing Neural Routing Policy via Multi-Node Lookahead Prediction

Neural policies have shown promise in solving vehicle routing problems because they rely less on handcrafted heuristics. However, current training paradigms suffer from a fundamental limitation: they focus primarily on next-node prediction for solution construction, which leads to myopic decision-making and undermines long-horizon planning. To address this, we introduce Multi-node Lookahead Prediction (MnLP), a novel training strategy that extends the supervised learning paradigm to predict multiple future nodes simultaneously. We incorporate causal, discardable MnLP modules that operate only during training, enabling models to anticipate multi-step decisions while preserving inference-time efficiency. By adding multi-depth auxiliary supervision to the loss function, MnLP equips neural policies with long-range contextual understanding. Experimentally, MnLP outperforms existing training methods, improving the generalization of neural policies across various problem sizes, distributions, and real-world benchmarks. Moreover, MnLP can be seamlessly integrated into diverse neural architectures without introducing additional inference overhead.
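The multi-depth auxiliary supervision can be pictured as a weighted sum of cross-entropy terms for the next node, the node after that, and so on, all computed at one decoding step. In this sketch, the per-depth probability vectors and the depth weights are assumptions for illustration; the abstract does not give MnLP's exact loss, and the function names are hypothetical. Depth 1 is ordinary next-node prediction, while the deeper terms would come from training-only heads that are dropped at inference.

```python
import math
from typing import Sequence

def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target node under a predicted distribution."""
    return -math.log(probs[target])

def lookahead_loss(
    depth_probs: Sequence[Sequence[float]],
    tour: Sequence[int],
    t: int,
    depth_weights: Sequence[float],
) -> float:
    """Weighted multi-depth loss at decoding step t (hypothetical formulation).

    depth_probs[k] is the predicted distribution over nodes for position t + 1 + k
    of the reference tour; depths that run past the end of the tour are skipped.
    """
    total = 0.0
    for k, (probs, weight) in enumerate(zip(depth_probs, depth_weights)):
        position = t + 1 + k
        if position >= len(tour):
            break
        total += weight * cross_entropy(probs, tour[position])
    return total
```

With `depth_weights=[1.0]` the loss reduces to standard next-node cross-entropy, which makes the lookahead terms easy to ablate.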

preprint · 2026 · arXiv

Scheduling That Speaks: An Interpretable Programmatic Reinforcement Learning Framework

Deep reinforcement learning (DRL) has recently emerged as a promising approach to solving combinatorial optimization problems such as job shop scheduling. However, the policies learned by DRL are typically represented by deep neural networks (DNNs), whose opaque architectures and non-interpretable decisions can lead to critical trust and usability concerns for human decision makers. In addition, the computational requirements of DNNs can further hinder practical deployment in resource-constrained environments. In this work, we propose ProRL, a novel interpretable programmatic reinforcement learning framework that achieves high-performance scheduling with human-readable and editable programmatic policies (i.e., programs). We first introduce a domain-specific language for scheduling (DSL-S) that represents scheduling strategies as structured programs. ProRL then explores the program space defined by DSL-S using local search to identify incomplete programs, which are then completed by learning their parameters via Bayesian optimization. Because ProRL learns which heuristic scheduling rules to select, it naturally incorporates heuristics already used in industrial settings. Experiments on widely used benchmark instances demonstrate the strong performance of ProRL against existing heuristics and DRL baselines. Furthermore, ProRL performs well under tight computational budgets, such as training with only 100 episodes. Our code is available at https://github.com/HcPlu/ProRL.

preprint · 2022 · arXiv

Policies for the Dynamic Traveling Maintainer Problem with Alerts

Downtime of industrial assets such as wind turbines and medical imaging devices comes at a sharp cost. To avoid such downtime costs, companies seek to initiate maintenance just before failure. This is challenging for two reasons. First, asset failures are notoriously difficult to predict, even in the presence of real-time monitoring devices that signal early degradation. Second, the resources available to serve a network of geographically dispersed assets are typically limited. In this paper, we propose a novel model, the dynamic traveling maintainer problem with alerts, that incorporates both challenges, and we provide three solution approaches for dispatching the limited resources: (i) greedy heuristic approaches that rank assets on urgency, proximity and economic risk; (ii) a novel traveling maintainer heuristic approach that optimizes short-term costs; and (iii) a deep reinforcement learning (DRL) approach that optimizes long-term costs. Each approach has different requirements concerning the available alert information. Experiments with small asset networks show that all methods can approximate the optimal policy when given access to complete condition information. For larger networks, the proposed methods yield competitive policies, with DRL consistently achieving the lowest costs.

preprint · 2022 · arXiv

The First AI4TSP Competition: Learning to Solve Stochastic Routing Problems

This paper reports on the first international competition on AI for the traveling salesman problem (TSP) at the International Joint Conference on Artificial Intelligence 2021 (IJCAI-21). The TSP is one of the classical combinatorial optimization problems, with many variants inspired by real-world applications. This first competition asked the participants to develop algorithms to solve a time-dependent orienteering problem with stochastic weights and time windows (TD-OPSWTW). It focused on two types of learning approaches: surrogate-based optimization and deep reinforcement learning. In this paper, we describe the problem, the setup of the competition, and the winning methods, and give an overview of the results. The winning methods described in this work have advanced the state of the art in using AI for stochastic routing problems. Overall, by organizing this competition we have introduced routing problems as an interesting problem setting for AI researchers. The simulator of the problem has been made open-source and can be used by other researchers as a benchmark for new AI methods.

preprint · 2020 · arXiv

A State Aggregation Approach for Solving Knapsack Problem with Deep Reinforcement Learning

This paper proposes a Deep Reinforcement Learning (DRL) approach for solving the knapsack problem. The proposed method includes a state aggregation step, based on tabular reinforcement learning, that extracts features and constructs states. The state aggregation policy is applied to each instance of the knapsack problem and is used with the Advantage Actor Critic (A2C) algorithm to train a policy that selects items sequentially, one per time step. The method is a constructive solution approach: item selection is repeated until the final solution is obtained. The experiments show that our approach provides close-to-optimal solutions for all tested instances, outperforms the greedy algorithm, and, compared with an existing DRL approach, handles larger instances and is more flexible. In addition, the results demonstrate that the proposed model with the state aggregation strategy not only gives better solutions but also learns in fewer timesteps than the model without state aggregation.
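The constructive setting (pick items one at a time until nothing else fits) can be illustrated with the common value-to-weight-ratio greedy rule, compared against an exact dynamic program. This assumes the greedy baseline is ratio-based, which the abstract does not state; the instance is a standard textbook example, not data from the paper, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def greedy_knapsack(values: Sequence[int], weights: Sequence[int], capacity: int) -> Tuple[int, List[int]]:
    """Constructive greedy: add items in decreasing value/weight order while they fit."""
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    chosen: List[int] = []
    total_weight, total_value = 0, 0
    for i in order:
        if total_weight + weights[i] <= capacity:
            chosen.append(i)
            total_weight += weights[i]
            total_value += values[i]
    return total_value, sorted(chosen)

def optimal_knapsack(values: Sequence[int], weights: Sequence[int], capacity: int) -> int:
    """Exact 0/1 knapsack value via dynamic programming over capacities (positive integer weights)."""
    best = [0] * (capacity + 1)
    for value, weight in zip(values, weights):
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]
```

On values `[60, 100, 120]`, weights `[10, 20, 30]` and capacity 50, the greedy rule picks items 0 and 1 for a value of 160, while the optimum (items 1 and 2) is 220, the kind of gap a learned selection policy aims to close.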

preprint · 2020 · arXiv

Algorithms for slate bandits with non-separable reward functions

In this paper, we study a slate bandit problem where the function that determines the slate-level reward is non-separable: the optimal value of the function cannot be determined by learning the optimal action for each slot. We are mainly concerned with cases where the number of slates is large relative to the time horizon, so that treating each slate as a separate arm in a traditional multi-armed bandit would not be feasible. Our main contribution is the design of algorithms that still achieve sub-linear regret with respect to the time horizon, despite the large number of slates. Experimental results on simulated and real-world data show that our proposed method outperforms popular benchmark bandit algorithms.
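A toy example shows why a non-separable slate reward defeats per-slot learning. The two-slot reward table below is hypothetical: choosing, for each slot, the action with the best average reward over the other slot yields slate (0, 1), whereas the best slate is (1, 1). Exhaustive enumeration works here only because there are 2^2 = 4 slates; the paper targets the regime where this count is too large to treat each slate as an arm. The function names are illustrative, not the paper's algorithms.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Slate = Tuple[int, ...]

# Hypothetical non-separable reward for a slate of two slots, each showing action 0 or 1.
REWARDS = {(0, 0): 0.6, (0, 1): 0.6, (1, 0): 0.0, (1, 1): 1.0}

def slate_reward(slate: Slate) -> float:
    return REWARDS[slate]

def slot_marginal(reward: Callable[[Slate], float], n_slots: int,
                  actions: Sequence[int], slot: int, action: int) -> float:
    """Average reward over all slates that place `action` in `slot`."""
    slates = [s for s in product(actions, repeat=n_slots) if s[slot] == action]
    return sum(reward(s) for s in slates) / len(slates)

def per_slot_choice(reward: Callable[[Slate], float], n_slots: int,
                    actions: Sequence[int]) -> Slate:
    """Pick each slot's action independently by its marginal reward."""
    return tuple(
        max(actions, key=lambda a: slot_marginal(reward, n_slots, actions, slot, a))
        for slot in range(n_slots)
    )

def joint_choice(reward: Callable[[Slate], float], n_slots: int,
                 actions: Sequence[int]) -> Slate:
    """Exhaustively search all slates (feasible only for tiny problems)."""
    return max(product(actions, repeat=n_slots), key=reward)
```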

preprint · 2020 · arXiv

Learning 2-opt Heuristics for the Traveling Salesman Problem via Deep Reinforcement Learning

Recent works using deep learning to solve the Traveling Salesman Problem (TSP) have focused on learning construction heuristics. Such approaches find TSP solutions of good quality but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved until reaching a near-optimal one. In this work, we propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning. We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution. Moreover, we introduce a policy neural network that leverages a pointing attention mechanism which, unlike previous works, can easily be extended to more general k-opt moves. Our results show that the learned policies can improve even over random initial solutions and approach near-optimal solutions at a faster rate than previous state-of-the-art deep learning methods.
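A 2-opt move removes two edges of a tour and reconnects it by reversing the segment between them. In the learned approach a policy network chooses which move to apply; the sketch below substitutes a plain hand-written local search that accepts every improving move it encounters, purely to show what the operator does. The function names and the coordinate format are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def tour_length(tour: Sequence[int], coords: Sequence[Point]) -> float:
    """Length of the closed tour visiting coords in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))

def two_opt_move(tour: List[int], i: int, j: int) -> List[int]:
    """Reverse the segment tour[i..j] (inclusive), which swaps two edges of the tour."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def two_opt_local_search(tour: Sequence[int], coords: Sequence[Point]) -> Tuple[List[int], float]:
    """Apply improving 2-opt moves until none remains (a local optimum)."""
    best = list(tour)
    best_len = tour_length(best, coords)
    improved = True
    while improved:
        improved = False
        # i starts at 1 so the first city stays fixed as the tour's starting point.
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = two_opt_move(best, i, j)
                candidate_len = tour_length(candidate, coords)
                if candidate_len < best_len - 1e-12:
                    best, best_len = candidate, candidate_len
                    improved = True
    return best, best_len
```

On the unit square, the self-crossing tour `[0, 2, 1, 3]` (length 2 + 2√2 ≈ 4.83) is untangled into a tour of length 4. Recomputing the full tour length per candidate keeps the sketch short; a practical implementation would evaluate only the change in the two affected edges.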

preprint · 2013 · arXiv

Solving Weighted Voting Game Design Problems Optimally: Representations, Synthesis, and Enumeration

We study the inverse power index problem for weighted voting games: the problem of finding a weighted voting game in which the power of the players is as close as possible to a certain target distribution. Our goal is to find algorithms that solve this problem exactly. To this end, we study various subclasses of simple games and their associated representation methods. We survey algorithms and impossibility results for the synthesis problem, i.e., converting a representation of a simple game into another representation. We contribute to the synthesis problem by showing that it is impossible to compute in polynomial time the list of ceiling coalitions (also known as shift-maximal losing coalitions) of a game from its list of roof coalitions (also known as shift-minimal winning coalitions), and vice versa. We then study the problem of enumerating the set of weighted voting games. We first present a naive algorithm for this, running in doubly exponential time. Using our knowledge of the synthesis problem, we then improve on this naive algorithm and obtain an enumeration algorithm that runs in quadratic exponential time (that is, O(2^(n^2) p(n)) for a polynomial p). Moreover, we show that this algorithm runs in output-polynomial time, making it the best possible enumeration algorithm up to a polynomial factor. Finally, we propose an exact anytime algorithm for the inverse power index problem that runs in exponential time. This algorithm is straightforward and general: it computes the error for each game enumerated and outputs the game that minimizes this error. Owing to the generality of our approach, our algorithm can be used to find a weighted voting game that optimizes any exponential-time computable function. We implement our algorithm for the case of the normalized Banzhaf index, and we perform experiments to study performance and error convergence.
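The quantity being matched can be made concrete. In a weighted voting game with player weights w_i and quota q, a coalition wins when its total weight reaches q; a player's normalized Banzhaf index is that player's share of all swings (winning coalitions that become losing when the player leaves). The sketch computes the index by brute force and pairs it with a naive inverse search over small integer weight vectors. This search is not the paper's enumeration algorithm: it revisits the same game under many weight vectors, and the squared-error measure and the `max_weight` bound are assumptions for illustration.

```python
from itertools import combinations, product
from typing import List, Optional, Sequence, Tuple

def is_winning(coalition: Sequence[int], weights: Sequence[int], quota: int) -> bool:
    return sum(weights[i] for i in coalition) >= quota

def normalized_banzhaf(weights: Sequence[int], quota: int) -> List[float]:
    """Brute-force normalized Banzhaf index (exponential in the number of players)."""
    n = len(weights)
    swings = [0] * n
    for size in range(1, n + 1):
        for coalition in combinations(range(n), size):
            if not is_winning(coalition, weights, quota):
                continue
            for i in coalition:
                rest = [j for j in coalition if j != i]
                if not is_winning(rest, weights, quota):
                    swings[i] += 1  # player i is critical in this coalition
    total = sum(swings)
    return [s / total for s in swings] if total else [0.0] * n

def squared_error(power: Sequence[float], target: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(power, target))

def naive_inverse_search(target: Sequence[float], max_weight: int) -> Optional[Tuple[float, Tuple[int, ...], int]]:
    """Hypothetical brute force: try every integer weight vector and quota, keep the best."""
    best: Optional[Tuple[float, Tuple[int, ...], int]] = None
    for weights in product(range(max_weight + 1), repeat=len(target)):
        for quota in range(1, sum(weights) + 1):
            err = squared_error(normalized_banzhaf(weights, quota), target)
            if best is None or err < best[0]:
                best = (err, weights, quota)
    return best
```

For weights (2, 1, 1) and quota 3, the heavy player is a swing in three coalitions and each light player in one, giving normalized indices (0.6, 0.2, 0.2).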

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.


Source provenance

Where this author record came from

arXiv · confidence 95%

external id: arxiv:2605.19975:author:4:yingqian-zhang

Imported May 20, 2026 · Synced May 21, 2026

arXiv · confidence 95%

external id: arxiv:2605.18508:author:2:yingqian-zhang

Imported May 20, 2026 · Synced May 21, 2026

arXiv · confidence 95%

external id: arxiv:2605.18454:author:2:yingqian-zhang

Imported May 20, 2026 · Synced May 21, 2026

Close collaborators

Alp Akcay · Researcher · 3 works

Chengpeng Hu · Researcher · 2 works

Hendrik Baier · Researcher · 2 works

Jason Rhuggenaath · Researcher · 2 works
