Source author record

Zhiwei Qin

Zhiwei Qin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning math.OC Artificial Intelligence Methodology Multiagent Systems

Catalog footprint

What is connected

6works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Reinforcement Learning in the Wild: Scalable RL Dispatching Algorithm Deployed in Ridehailing Marketplace

In this study, a real-time dispatching algorithm based on reinforcement learning is proposed and for the first time, is deployed in large scale. Current dispatching methods in ridehailing platforms are dominantly based on myopic or rule-based non-myopic approaches. Reinforcement learning enables dispatching policies that are informed of historical data and able to employ the learned information to optimize returns of expected future trajectories. Previous studies in this field yielded promising results, yet have left room for further improvements in terms of performance gain, self-dependency, transferability, and scalable deployment mechanisms. The present study proposes a standalone RL-based dispatching solution that is equipped with multiple mechanisms to ensure robust and efficient on-policy learning and inference while being adaptable for full-scale deployment. A new form of value updating based on temporal difference is proposed that is more adapted to the inherent uncertainty of the problem. For the driver-order assignment, a customized utility function is proposed that when tuned based on the statistics of the market, results in remarkable performance improvement and interpretability. In addition, for reducing the risk of cancellation after drivers' assignment, an adaptive graph pruning strategy based on the multi-arm bandit problem is introduced. The method is evaluated using offline simulation with real data and yields notable performance improvement. In addition, the algorithm is deployed online in multiple cities under DiDi's operation for A/B testing and is launched in one of the major international markets as the primary mode of dispatch. The deployed algorithm shows over 1.3% improvement in total driver income from A/B testing. In addition, by causal inference analysis, as much as 5.3% improvement in major performance metrics is detected after full-scale deployment.

preprint2020arXiv

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

Contextual multi-armed bandit (MAB) achieves cutting-edge performance on a variety of problems. When it comes to real-world scenarios such as recommendation system and online advertising, however, it is essential to consider the resource consumption of exploration. In practice, there is typically non-zero cost associated with executing a recommendation (arm) in the environment, and hence, the policy should be learned with a fixed exploration cost constraint. It is challenging to learn a global optimal policy directly, since it is a NP-hard problem and significantly complicates the exploration and exploitation trade-off of bandit algorithms. Existing approaches focus on solving the problems by adopting the greedy policy which estimates the expected rewards and costs and uses a greedy selection based on each arm's expected reward/cost ratio using historical observation until the exploration resource is exhausted. However, existing methods are hard to extend to infinite time horizon, since the learning process will be terminated when there is no more resource. In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint. HATCH adopts an adaptive method to allocate the exploration resource based on the remaining resource/time and the estimation of reward distribution among different user contexts. In addition, we utilize full of contextual feature information to find the best personalized recommendation. Finally, in order to prove the theoretical guarantee, we present a regret bound analysis and prove that HATCH achieves a regret bound as low as $O(\sqrt{T})$. The experimental results demonstrate the effectiveness and efficiency of the proposed method on both synthetic data sets and the real-world applications.

preprint2014arXiv

An Alternating Direction Method for Total Variation Denoising

We consider the image denoising problem using total variation (TV) regularization. This problem can be computationally challenging to solve due to the non-differentiability and non-linearity of the regularization term. We propose an alternating direction augmented Lagrangian (ADAL) method, based on a new variable splitting approach that results in subproblems that can be solved efficiently and exactly. The global convergence of the new algorithm is established for the anisotropic TV model. For the isotropic TV model, by doing further variable splitting, we are able to derive an ADAL method that is globally convergent. We compare our methods with the split Bregman method \cite{goldstein2009split},which is closely related to it, and demonstrate their competitiveness in computational performance on a set of standard test images.

preprint2014arXiv

HIPAD - A Hybrid Interior-Point Alternating Direction algorithm for knowledge-based SVM and feature selection

We consider classification tasks in the regime of scarce labeled training data in high dimensional feature space, where specific expert knowledge is also available. We propose a new hybrid optimization algorithm that solves the elastic-net support vector machine (SVM) through an alternating direction method of multipliers in the first phase, followed by an interior-point method for the classical SVM in the second phase. Both SVM formulations are adapted to knowledge incorporation. Our proposed algorithm addresses the challenges of automatic feature selection, high optimization accuracy, and algorithmic flexibility for taking advantage of prior knowledge. We demonstrate the effectiveness and efficiency of our algorithm and compare it with existing methods on a collection of synthetic and real-world data.

preprint2014arXiv

Joint Variable Selection for Data Envelopement Analysis via Group Sparsity

This study develops a data-driven group variable selection method for data envelopment analysis (DEA), a non-parametric linear programming approach to the estimation of production frontiers. The proposed method extends the group Lasso (least absolute shrinkage and selection operator) designed for variable selection on (often predefined) groups of variables in linear regression models to DEA models. In particular, a special constrained version of the group Lasso with the loss function suited for variable selection in DEA models is derived and solved by a new tailored algorithm based on the alternating direction method of multipliers (ADMM). This study further conducts a thorough evaluation of the proposed method against two widely used variable selection methods -- the efficiency contribution measure (ECM) method and the regression-based (RB) test -- in DEA via Monte Carlo simulations. The simulation results show that our method provides more favorable performance compared with its benchmarks.

preprint2011arXiv

Structured Sparsity via Alternating Direction Methods

We consider a class of sparse learning problems in high dimensional feature space regularized by a structured sparsity-inducing norm which incorporates prior knowledge of the group structure of the features. Such problems often pose a considerable challenge to optimization algorithms due to the non-smoothness and non-separability of the regularization term. In this paper, we focus on two commonly adopted sparsity-inducing regularization terms, the overlapping Group Lasso penalty $l_1/l_2$-norm and the $l_1/l_\infty$-norm. We propose a unified framework based on the augmented Lagrangian method, under which problems with both types of regularization and their variants can be efficiently solved. As the core building-block of this framework, we develop new algorithms using an alternating partial-linearization/splitting technique, and we prove that the accelerated versions of these algorithms require $O(\frac{1}{\sqrtε})$ iterations to obtain an $ε$-optimal solution. To demonstrate the efficiency and relevance of our algorithms, we test them on a collection of data sets and apply them to two real-world problems to compare the relative merits of the two norms.

Zhiwei Qin

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

Reinforcement Learning in the Wild: Scalable RL Dispatching Algorithm Deployed in Ridehailing Marketplace

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

An Alternating Direction Method for Total Variation Denoising

HIPAD - A Hybrid Interior-Point Alternating Direction algorithm for knowledge-based SVM and feature selection

Joint Variable Selection for Data Envelopement Analysis via Group Sparsity

Structured Sparsity via Alternating Direction Methods