Source author record

Xiaoyang Tan

Xiaoyang Tan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Computer Vision, Artificial Intelligence, Multiagent Systems, Computer Science and Game Theory, Neural and Evolutionary Computing, Robotics

Catalog footprint

What is connected

10 works

7 topics

4 close collaborators


Preprint · 2023 · arXiv

Contextual Conservative Q-Learning for Offline Reinforcement Learning

Offline reinforcement learning learns an effective policy from offline datasets without online interaction, and it attracts persistent research attention because of its potential for practical application. However, extrapolation error caused by distribution shift still leads to overestimation of actions that transition to out-of-distribution (OOD) states, which degrades the reliability and robustness of the offline policy. In this paper, we propose Contextual Conservative Q-Learning (C-CQL) to learn a robustly reliable policy through contextual information captured via an inverse dynamics model. With the supervision of the inverse dynamics model, C-CQL tends to learn a policy that generates stable transitions at perturbed states, since perturbed states are a common kind of OOD state. In this manner, the learnt policy is more likely to generate transitions that lead to the empirical next-state distribution of the offline dataset, i.e., robustly reliable transitions. Besides, we theoretically show that C-CQL is a generalization of Conservative Q-Learning (CQL) and aggressive State Deviation Correction (SDC). Finally, experimental results demonstrate that the proposed C-CQL achieves state-of-the-art performance in most environments of the offline MuJoCo suite and in a noisy MuJoCo setting.
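The abstract positions C-CQL as a generalization of Conservative Q-Learning. As a reference point only, here is a minimal NumPy sketch of the standard discrete-action CQL(H) penalty (log-sum-exp of Q over all actions minus Q on the logged action) that CQL adds to its TD loss. It does not implement C-CQL's inverse-dynamics supervision; the names `cql_penalty`, `logsumexp` and the default `alpha` are illustrative assumptions.

```python
import numpy as np


def logsumexp(x, axis=-1):
    """Numerically stable log-sum-exp along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def cql_penalty(q_values, dataset_actions, alpha=1.0):
    """Discrete-action CQL(H)-style conservative penalty (sketch).

    q_values:        (batch, num_actions) array of Q(s, .) for batch states
    dataset_actions: (batch,) integer actions recorded in the offline dataset
    alpha:           penalty weight (hypothetical default)
    """
    # Soft maximum over all actions: pushes down Q on actions the data never took
    soft_max_q = logsumexp(q_values, axis=1)
    # Q of the logged action: pushed back up so in-distribution values are kept
    data_q = q_values[np.arange(len(dataset_actions)), dataset_actions]
    return alpha * np.mean(soft_max_q - data_q)


# Usage: add the penalty to an ordinary TD loss computed elsewhere.
q = np.array([[1.0, 5.0, 0.0],   # action 1 is overestimated and absent from the data
              [2.0, 2.0, 2.0]])
logged = np.array([0, 2])
print(cql_penalty(q, logged))
```

The log-sum-exp term acts as a soft maximum, so the penalty is never negative and is largest when an action absent from the data carries a high Q-value, which is the kind of overestimation the abstract describes.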

Preprint · 2022 · arXiv

A Cooperative-Competitive Multi-Agent Framework for Auto-bidding in Online Advertising

In online advertising, auto-bidding has become an essential tool for advertisers to optimize their preferred ad performance metrics by simply expressing high-level campaign objectives and constraints. Previous works designed auto-bidding tools from a single-agent perspective, without modeling the mutual influence between agents. In this paper, we instead consider the problem from a distributed multi-agent perspective and propose a general Multi-Agent reinforcement learning framework for Auto-Bidding, namely MAAB, to learn auto-bidding strategies. First, we investigate the competitive and cooperative relationships among auto-bidding agents and propose a temperature-regularized credit assignment to establish a mixed cooperative-competitive paradigm. By carefully trading off competition and cooperation among agents, we can reach an equilibrium state that guarantees not only each advertiser's individual utility but also the system performance (i.e., social welfare). Second, to avoid the potential collusive behavior of bidding low prices that cooperation may induce, we further propose bar agents that set a personalized bidding bar for each agent, thereby alleviating the revenue degradation caused by cooperation. Third, to deploy MAAB in a large-scale advertising system with millions of advertisers, we propose a mean-field approach. By grouping advertisers with the same objective into a mean auto-bidding agent, the interactions among the large-scale advertisers are greatly simplified, making it practical to train MAAB efficiently. Extensive experiments on an offline industrial dataset and the Alibaba advertising platform demonstrate that our approach outperforms several baseline methods in terms of social welfare and revenue.

Preprint · 2022 · arXiv

Smoothing Advantage Learning

Advantage learning (AL) aims to improve the robustness of value-based reinforcement learning against estimation errors through action-gap-based regularization. Unfortunately, the method tends to be unstable under function approximation. In this paper, we propose a simple variant of AL, named smoothing advantage learning (SAL), to alleviate this problem. The key to our method is to replace the original Bellman optimality operator in AL with a smooth one so as to obtain a more reliable estimate of the temporal-difference target. We give a detailed account of the resulting action gap and of the performance bound for approximate SAL. Further theoretical analysis reveals that the proposed value-smoothing technique not only helps to stabilize the training procedure of AL by controlling the trade-off between the convergence rate and the upper bound of the approximation errors, but also increases the action gap between the optimal and sub-optimal action values.
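For readers unfamiliar with advantage learning, the sketch below computes the AL target: the usual Bellman target minus alpha times the action gap V(s) - Q(s, a), with V(s) = max_b Q(s, b). Because the abstract does not spell out SAL's smooth operator, `backup` is left pluggable; `boltzmann_backup` is a hypothetical stand-in for a smooth operator, not the one defined in the paper. Function names and default hyperparameters are assumptions.

```python
import numpy as np


def hard_max(q_next):
    """Bellman optimality backup: max over next-state action values."""
    return np.max(q_next, axis=-1)


def boltzmann_backup(q_next, tau=1.0):
    """Hypothetical smooth backup: softmax(Q/tau)-weighted average of Q(s', .)."""
    z = q_next / tau
    w = np.exp(z - np.max(z, axis=-1, keepdims=True))
    w /= np.sum(w, axis=-1, keepdims=True)
    return np.sum(w * q_next, axis=-1)


def advantage_learning_target(reward, q_next, q_curr, action,
                              gamma=0.99, alpha=0.9, backup=hard_max):
    """TD target with action-gap regularization (terminal states ignored).

    reward: (batch,) rewards; q_next: (batch, A) Q(s', .);
    q_curr: (batch, A) Q(s, .); action: (batch,) actions taken in s.
    """
    bellman = reward + gamma * backup(q_next)
    idx = np.arange(len(action))
    # Action gap V(s) - Q(s, a); zero for the greedy action, positive otherwise
    gap = np.max(q_curr, axis=-1) - q_curr[idx, action]
    return bellman - alpha * gap


# Usage: same batch, hard vs. (hypothetical) smooth backup
r = np.array([1.0, 1.0])
qn = np.array([[2.0, 0.0], [2.0, 0.0]])
qc = np.array([[3.0, 1.0], [3.0, 1.0]])
a = np.array([1, 0])
print(advantage_learning_target(r, qn, qc, a, gamma=0.5, alpha=0.5))
print(advantage_learning_target(r, qn, qc, a, gamma=0.5, alpha=0.5, backup=boltzmann_backup))
```

Swapping `backup=hard_max` for `backup=boltzmann_backup` changes only the Bellman term, which mirrors the abstract's description of SAL as replacing that one operator while keeping the action-gap regularization.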

Preprint · 2020 · arXiv

SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning

Learning a stable and generalizable centralized value function (CVF) is a crucial but challenging task in multi-agent reinforcement learning (MARL), because the joint action space grows exponentially with the number of agents. This paper proposes an approach, named SMIX(λ), that addresses this issue with an efficient off-policy centralized training method within a flexible learner search space. Because importance sampling for such off-policy training is both computationally costly and numerically unstable, we propose using the λ-return as a proxy to compute the TD error. With this new loss objective, we adopt a modified QMIX network structure as the base to train our model. By further connecting it with the Q(λ) approach from a unified expectation-correction viewpoint, we show that the proposed SMIX(λ) is equivalent to Q(λ) and hence shares its convergence properties, while not suffering from the aforementioned curse of dimensionality inherent in MARL. Experiments on the StarCraft Multi-Agent Challenge (SMAC) benchmark demonstrate that our approach not only outperforms several state-of-the-art MARL methods by a large margin, but can also serve as a general tool to improve the overall performance of other centralized-training, decentralized-execution (CTDE) algorithms by enhancing their CVFs.
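The λ-return that SMIX(λ) uses in place of importance-sampled off-policy targets can be computed with a single backward pass over a trajectory. A minimal NumPy sketch, assuming one trajectory in which `next_values[t]` stands in for the target network's centralized value at step t+1; the QMIX-style mixing network is not shown, and all names are illustrative.

```python
import numpy as np


def lambda_returns(rewards, next_values, dones, gamma=0.99, lam=0.8):
    """Lambda-returns for one trajectory, computed backwards in time.

    G_t = r_t + gamma * (1 - done_t) * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}),
    where the step after the last one bootstraps from V(s_T) = next_values[-1].
    """
    T = len(rewards)
    returns = np.zeros(T)
    next_return = next_values[-1]  # stands in for G_T
    for t in reversed(range(T)):
        mix = (1.0 - lam) * next_values[t] + lam * next_return
        returns[t] = rewards[t] + gamma * (1.0 - dones[t]) * mix
        next_return = returns[t]
    return returns


# Hypothetical 3-step trajectory ending in a terminal state
rewards = np.array([0.0, 0.0, 1.0])
next_values = np.array([0.5, 0.8, 0.0])
dones = np.array([0.0, 0.0, 1.0])
print(lambda_returns(rewards, next_values, dones, gamma=0.9, lam=0.5))
```

With `lam=0.0` this reduces to one-step TD targets, and with `lam=1.0` to bootstrapped Monte Carlo returns; the TD error would then be these targets minus the centralized Q-values of the joint actions actually taken.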

Preprint · 2020 · arXiv

Truly Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most successful deep reinforcement learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from fully understood. In this paper, we show that PPO can neither strictly restrict the likelihood ratio, as it attempts to do, nor enforce a well-defined trust region constraint, which means that it may still suffer from performance instability. To address this issue, we present an enhanced PPO method, named Truly PPO. Our method makes two critical improvements: 1) it adopts a new clipping function that supports a rollback behavior to restrict the difference between the new policy and the old one; 2) the triggering condition for clipping is replaced with a trust-region-based one, so that optimizing the resulting surrogate objective function provides guaranteed monotonic improvement of the ultimate policy performance. It seems that, by adhering more truly to making the algorithm proximal (confining the policy within the trust region), the new algorithm improves on the original PPO in both sample efficiency and performance.
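The rollback idea in improvement 1) can be sketched as a clipping function that, instead of going flat outside [1 - eps, 1 + eps], continues with slope -alpha so that the gradient pushes the likelihood ratio back toward the clip range. This NumPy sketch keeps PPO's ratio-based trigger; Truly PPO's trust-region-based trigger from improvement 2) is not shown. The function names and the values of `eps` and `alpha` are illustrative assumptions.

```python
import numpy as np


def rollback_clip(ratio, eps=0.2, alpha=0.3):
    """Rollback clipping: identity inside [1-eps, 1+eps], slope -alpha outside."""
    upper = -alpha * ratio + (1 + alpha) * (1 + eps)   # used when ratio >= 1+eps
    lower = -alpha * ratio + (1 + alpha) * (1 - eps)   # used when ratio <= 1-eps
    return np.where(ratio >= 1 + eps, upper,
                    np.where(ratio <= 1 - eps, lower, ratio))


def rollback_surrogate(ratio, advantage, eps=0.2, alpha=0.3):
    # Pessimistic min as in PPO, but with the rollback function instead of clip
    return np.minimum(ratio * advantage,
                      rollback_clip(ratio, eps, alpha) * advantage)


ratios = np.array([0.5, 1.0, 1.2, 1.5])
print(rollback_surrogate(ratios, np.ones_like(ratios)))
```

The two outer branches meet the identity at ratio = 1 - eps and ratio = 1 + eps, so the function is continuous; unlike PPO's flat clip, a ratio of 1.5 with a positive advantage yields a smaller surrogate value than a ratio of 1.2.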

Preprint · 2016 · arXiv

A Unified Gender-Aware Age Estimation

Human age estimation has attracted increasing research attention because of its wide applicability in areas such as security monitoring and advertisement recommendation. Although a variety of methods have been proposed, most of them focus only on age-specific facial appearance. However, biological research has shown that not only gender but also the aging difference between males and females inevitably affects age estimation. To our knowledge, only two methods have so far taken the gender factor into account. The first is a sequential method, which first classifies gender and then performs age estimation separately for the classified males and females. Although it improves age estimation performance by accounting for the semantic difference between genders, it cannot avoid the risk of accumulating estimation errors. To overcome the drawbacks of the sequential strategy, the second method regresses age together with gender by concatenating their labels into a two-dimensional output using Partial Least Squares (PLS). Although this also improves age estimation performance, such a concatenation not only risks confusing the semantics of gender and age, but also ignores the aging discrepancy between males and females. To overcome these shortcomings, in this paper we propose a unified framework for gender-aware age estimation. The proposed method considers and exploits not only the semantic relationship between gender and age, but also the aging discrepancy between males and females. Finally, experimental results demonstrate not only the superior performance of our method, but also its good interpretability in revealing the aging discrepancy.

Preprint · 2016 · arXiv

Bayesian Neighbourhood Component Analysis

Learning a good distance metric in feature space can improve the performance of the KNN classifier and is useful in many real-world applications. However, many metric learning algorithms are based on point estimation of a quadratic optimization problem, which is time-consuming, susceptible to overfitting, and lacks a natural mechanism for reasoning about parameter uncertainty, a property that is especially useful when the training set is small and/or noisy. To deal with these issues, we present a novel Bayesian metric learning method, called Bayesian NCA, based on the well-known Neighbourhood Component Analysis (NCA) method, in which the metric posterior is characterized by the local label-consistency constraints of observations, encoded with a similarity graph instead of independent pairwise constraints. For efficient Bayesian optimization, we explore the variational lower bound on the log-likelihood of the original NCA objective. Experiments on several publicly available datasets demonstrate that the proposed method is able to learn robust metric measures from small datasets and/or from challenging training sets with labels contaminated by errors. The proposed method is also shown to outperform a previous pairwise-constrained Bayesian metric learning method.
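Bayesian NCA builds on the original NCA objective. That point-estimate objective, the expected number of points whose stochastically chosen neighbour shares their label, is sketched below in NumPy for a linear map `A`. The Bayesian posterior and the variational lower bound described in the abstract are not implemented, and the function name is an assumption.

```python
import numpy as np


def nca_objective(X, y, A):
    """Expected number of correctly classified points under stochastic neighbours.

    X: (n, d) inputs, y: (n,) labels, A: (k, d) linear map defining the metric.
    """
    Z = X @ A.T
    sq_dists = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq_dists, np.inf)           # a point never picks itself
    logits = -sq_dists
    logits -= np.max(logits, axis=1, keepdims=True)
    P = np.exp(logits)                           # p_ij proportional to exp(-||A x_i - A x_j||^2)
    P /= np.sum(P, axis=1, keepdims=True)
    same_class = (y[:, None] == y[None, :])
    return np.sum(P * same_class)


X = np.array([[0.0, 0.0], [0.0, 0.1], [10.0, 10.0], [10.0, 10.1]])
y = np.array([0, 0, 1, 1])
print(nca_objective(X, y, np.eye(2)))        # well-separated classes: close to n
print(nca_objective(X, y, np.zeros((1, 2)))) # collapsed metric: neighbours chosen uniformly
```

Maximizing this quantity over `A` with a gradient-based optimizer gives classical (non-Bayesian) NCA.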

Preprint · 2016 · arXiv

Face Alignment In-the-Wild: A Survey

Over the last two decades, face alignment, or localizing fiducial facial points, has received increasing attention owing to its wide range of applications in automatic face analysis. However, the task has proven extremely challenging in unconstrained environments due to many confounding factors, such as pose, occlusion, expression and illumination. While numerous techniques have been developed to address these challenges, the problem is still far from solved. In this survey, we present an up-to-date critical review of the existing literature on face alignment, focusing on methods that address the overall difficulties and challenges of this topic under uncontrolled conditions. Specifically, we categorize existing face alignment techniques, present detailed descriptions of the prominent algorithms within each category, and discuss their advantages and disadvantages. Furthermore, we organize special discussions on the practical aspects of face alignment in-the-wild, toward the development of a robust face alignment system. In addition, we show performance statistics of the state of the art and conclude with several promising directions for future research.

Preprint · 2015 · arXiv

Tri-Subject Kinship Verification: Understanding the Core of A Family

One major challenge in computer vision is to go beyond modeling individual objects and to investigate the bi- (one-versus-one) or tri- (one-versus-two) relationships among multiple visual entities, answering such questions as whether a child in a photo belongs to given parents. The child-parents relationship plays a core role in a family, and understanding such kin relationships would have a fundamental impact on the behavior of an artificially intelligent agent working in the human world. In this work, we tackle the problem of one-versus-two (tri-subject) kinship verification, and our contributions are threefold: 1) a novel relative symmetric bilinear model (RSBM) that models the similarity between the child and the parents by incorporating the prior knowledge that a child may resemble one parent more than the other; 2) a spatially voted feature selection method, which jointly selects the most discriminative features for the child-parents pair while taking local spatial information into account; 3) a large-scale tri-subject kinship database of over 1,000 child-parents families. Extensive experiments on KinFaceW, Family101 and our newly released kinship database show that the proposed method outperforms several previous state-of-the-art methods, and that it can also significantly boost the performance of one-versus-one kinship verification when information about both parents is available.

Preprint · 2015 · arXiv

Unsupervised Feature Learning with C-SVDDNet

In this paper, we investigate the problem of learning feature representations from unlabeled data using a single-layer K-means network. A K-means network maps the input data into a feature representation by finding the nearest centroid for each input point; it has recently attracted considerable attention from researchers because of its simplicity, effectiveness, and scalability. However, one drawback of this feature mapping is that it tends to be unreliable when the training data contain noise. To address this issue, we propose an SVDD-based feature learning algorithm that describes the density and distribution of each K-means cluster with an SVDD ball for a more robust feature representation. For this purpose, we present a new SVDD algorithm, called C-SVDD, that centers the SVDD ball towards the mode of the local density of each cluster, and we show that the C-SVDD objective can be solved very efficiently as a linear programming problem. Additionally, traditional unsupervised feature learning methods usually take an average or sum of local representations to obtain a global representation, which ignores the spatial relationships among them. To exploit spatial information, we propose a global representation based on a variant of the SIFT descriptor. The architecture is also extended with multiple receptive-field scales and multiple pooling sizes. Extensive experiments on several popular object recognition benchmarks, such as STL-10, MNIST, Holiday and Copydays, show that the proposed C-SVDDNet method yields comparable or better performance than previous state-of-the-art methods.
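The K-means feature mapping that the abstract starts from, and its sensitivity to noise, can be illustrated in a few lines of NumPy. `kmeans_hard_features` is the nearest-centroid (one-hot) encoding; `ball_features` is a hypothetical ball-based response meant only to illustrate the idea of describing each cluster with a ball. It is not C-SVDD: the density-mode centering and the linear-programming solver are not reproduced, and the centers and radii below are made-up values.

```python
import numpy as np


def kmeans_hard_features(X, centroids):
    """One-hot code of the nearest centroid for each row of X."""
    d2 = np.sum((X[:, None, :] - centroids[None, :, :]) ** 2, axis=-1)
    codes = np.zeros_like(d2)
    codes[np.arange(len(X)), np.argmin(d2, axis=1)] = 1.0
    return codes


def ball_features(X, centers, radii):
    """Hypothetical ball response: depth of each point inside each ball (0 if outside)."""
    dist = np.sqrt(np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1))
    return np.maximum(0.0, radii[None, :] - dist)


C = np.array([[0.0, 0.0], [5.0, 5.0]])
radii = np.array([1.0, 1.0])
outlier = np.array([[100.0, 100.0]])          # a noisy point far from both clusters
print(kmeans_hard_features(outlier, C))       # still receives a full one-hot code
print(ball_features(outlier, C, radii))       # receives no response at all
```

An outlier far from every cluster still gets a full one-hot code from the hard mapping but a zero response from the ball-based encoding, which is the kind of noise robustness the abstract motivates.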
