Source author record

Jia Yuan Yu

Jia Yuan Yu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.OC Artificial Intelligence Machine Learning Computation and Language Computer Vision eess.IV math.DS math.PR math.ST Methodology Multiagent Systems Statistics Theory Systems and Control

Catalog footprint

What is connected

11works

13topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Reward Modeling for Mitigating Toxicity in Transformer-based Language Models

Transformer-based language models are able to generate fluent text and be efficiently adapted across various natural language generation tasks. However, language models that are pretrained on large unlabeled web text corpora have been shown to suffer from degenerating toxic content and social bias behaviors, consequently hindering their safe deployment. Various detoxification methods were proposed to mitigate the language model's toxicity; however, these methods struggled to detoxify language models when conditioned on prompts that contain specific social identities related to gender, race, or religion. In this study, we propose Reinforce-Detoxify; A reinforcement learning-based method for mitigating toxicity in language models. We address the challenge of safety in language models and propose a new reward model that is able to detect toxic content and mitigate unintended bias towards social identities in toxicity prediction. The experiments demonstrate that the Reinforce-Detoxify method for language model detoxification outperforms existing detoxification approaches in automatic evaluation metrics, indicating the ability of our approach in language model detoxification and less prone to unintended bias toward social identities in generated content.

preprint2021arXiv

Bias-Corrected Peaks-Over-Threshold Estimation of the CVaR

The conditional value-at-risk (CVaR) is a useful risk measure in fields such as machine learning, finance, insurance, energy, etc. When measuring very extreme risk, the commonly used CVaR estimation method of sample averaging does not work well due to limited data above the value-at-risk (VaR), the quantile corresponding to the CVaR level. To mitigate this problem, the CVaR can be estimated by extrapolating above a lower threshold than the VaR using a generalized Pareto distribution (GPD), which is often referred to as the peaks-over-threshold (POT) approach. This method often requires a very high threshold to fit well, leading to high variance in estimation, and can induce significant bias if the threshold is chosen too low. In this paper, we derive a new expression for the GPD approximation error of the CVaR, a bias term induced by the choice of threshold, as well as a bias correction method for the estimated GPD parameters. This leads to the derivation of a new estimator for the CVaR that we prove to be asymptotically unbiased. In a practical setting, we show through experiments that our estimator provides a significant performance improvement compared with competing CVaR estimators in finite samples. As a consequence of our bias correction method, it is also shown that a much lower threshold can be selected without introducing significant bias. This allows a larger portion of data to be be used in CVaR estimation compared with the typical POT approach, leading to more stable estimates. As secondary results, a new estimator for a second-order parameter of heavy-tailed distributions is derived, as well as a confidence interval for the CVaR which enables quantifying the level of variability in our estimator.

preprint2020arXiv

Generating Embroidery Patterns Using Image-to-Image Translation

In many scenarios in computer vision, machine learning, and computer graphics, there is a requirement to learn the mapping from an image of one domain to an image of another domain, called Image-to-image translation. For example, style transfer, object transfiguration, visually altering the appearance of weather conditions in an image, changing the appearance of a day image into a night image or vice versa, photo enhancement, to name a few. In this paper, we propose two machine learning techniques to solve the embroidery image-to-image translation. Our goal is to generate a preview image which looks similar to an embroidered image, from a user-uploaded image. Our techniques are modifications of two existing techniques, neural style transfer, and cycle-consistent generative-adversarial network. Neural style transfer renders the semantic content of an image from one domain in the style of a different image in another domain, whereas a cycle-consistent generative adversarial network learns the mapping from an input image to output image without any paired training data, and also learn a loss function to train this mapping. Furthermore, the techniques we propose are independent of any embroidery attributes, such as elevation of the image, light-source, start, and endpoints of a stitch, type of stitch used, fabric type, etc. Given the user image, our techniques can generate a preview image which looks similar to an embroidered image. We train and test our propose techniques on an embroidery dataset which consist of simple 2D images. To do so, we prepare an unpaired embroidery dataset with more than 8000 user-uploaded images along with embroidered images. Empirical results show that these techniques successfully generate an approximate preview of an embroidered version of a user image, which can help users in decision making.

preprint2016arXiv

Nonhomogeneous Place-Dependent Markov Chains, Unsynchronised AIMD, and Network Utility Maximization

We present a solution of a class of network utility maximization (NUM) problems using minimal communication. The constraints of the problem are inspired less by TCP-like congestion control but by problems in the area of internet of things and related areas in which the need arises to bring the behavior of a large group of agents to a social optimum. The approach uses only intermittent feedback, no inter-agent communication, and no common clock. The proposed algorithm is a combination of the classical AIMD algorithm in conjunction with a simple probabilistic rule for the agents to respond to a capacity signal. This leads to a nonhomogeneous Markov chain and we show almost sure convergence of this chain to the social optimum.

preprint2016arXiv

Signalling and obfuscation for congestion control

We aim to reduce the social cost of congestion in many smart city applications. In our model of congestion, agents interact over limited resources after receiving signals from a central agent that observes the state of congestion in real time. Under natural models of agent populations, we develop new signalling schemes and show that by introducing a non-trivial amount of uncertainty in the signals, we reduce the social cost of congestion, i.e., improve social welfare. The signalling schemes are efficient in terms of both communication and computation, and are consistent with past observations of the congestion. Moreover, the resulting population dynamics converge under reasonable assumptions.

preprint2015arXiv

A Fair Assignment of Drivers to Parking Lots

Searching for a parking spot can waste time and gasoline. This waste can be reduced by assigning drivers to parking lots based on their destination and arrival time. In such a system, drivers could request a parking spot in advance and be alerted (e.g., via their phone or vehicle) of their assignment to a specific parking lot or available spot. In this paper, a parking assignment system is described to allocate parking spaces in a fair and equitable manner. Heuristics are developed to solve the underlying large scale optimization problem. The efficacy of the system is demonstrated by applying our algorithms to real data sets.

preprint2015arXiv

Central-limit approach to risk-aware Markov decision processes

Whereas classical Markov decision processes maximize the expected reward, we consider minimizing the risk. We propose to evaluate the risk associated to a given policy over a long-enough time horizon with the help of a central limit theorem. The proposed approach works whether the transition probabilities are known or not. We also provide a gradient-based policy improvement algorithm that converges to a local optimum of the risk objective.

preprint2015arXiv

On the Design of Campus Parking Systems with QoS guarantees

Parking spaces are resources that can be pooled together and shared, especially when there are complementary day-time and night-time users. We answer two design questions. First, given a quality of service requirement, how many spaces should be set aside as contingency during day-time for night-time users? Next, how can we replace the first-come-first-served access method by one that aims at optimal efficiency while keeping user preferences private?

preprint2015arXiv

Two Phase $Q-$learning for Bidding-based Vehicle Sharing

We consider one-way vehicle sharing systems where customers can rent a car at one station and drop it off at another. The problem we address is to optimize the distribution of cars, and quality of service, by pricing rentals appropriately. We propose a bidding approach that is inspired from auctions and takes into account the significant uncertainty inherent in the problem data (e.g., pick-up and drop-off locations, time of requests, and duration of trips). Specifically, in contrast to current vehicle sharing systems, the operator does not set prices. Instead, customers submit bids and the operator decides whether to rent or not. The operator can even accept negative bids to motivate drivers to rebalance available cars to unpopular destinations within a city. We model the operator's sequential decision-making problem as a \emph{constrained Markov decision problem} (CMDP) and propose and rigorously analyze a novel two phase $Q$-learning algorithm for its solution. Numerical experiments are presented and discussed.

preprint2014arXiv

Functional Bandits

We introduce the functional bandit problem, where the objective is to find an arm that optimises a known functional of the unknown arm-reward distributions. These problems arise in many settings such as maximum entropy methods in natural language processing, and risk-averse decision-making, but current best-arm identification techniques fail in these domains. We propose a new approach, that combines functional estimation and arm elimination, to tackle this problem. This method achieves provably efficient performance guarantees. In addition, we illustrate this method on a number of important functionals in risk management and information theory, and refine our generic theoretical results in those cases.

preprint2011arXiv

Lipschitz Bandits without the Lipschitz Constant

We consider the setting of stochastic bandit problems with a continuum of arms. We first point out that the strategies considered so far in the literature only provided theoretical guarantees of the form: given some tuning parameters, the regret is small with respect to a class of environments that depends on these parameters. This is however not the right perspective, as it is the strategy that should adapt to the specific bandit environment at hand, and not the other way round. Put differently, an adaptation issue is raised. We solve it for the special case of environments whose mean-payoff functions are globally Lipschitz. More precisely, we show that the minimax optimal orders of magnitude $L^{d/(d+2)} \, T^{(d+1)/(d+2)}$ of the regret bound against an environment $f$ with Lipschitz constant $L$ over $T$ time instances can be achieved without knowing $L$ or $T$ in advance. This is in contrast to all previously known strategies, which require to some extent the knowledge of $L$ to achieve this performance guarantee.

Jia Yuan Yu

What is connected

Connect this record

See the researcher in context

Building this map preview

11 published item(s)

Reward Modeling for Mitigating Toxicity in Transformer-based Language Models

Bias-Corrected Peaks-Over-Threshold Estimation of the CVaR

Generating Embroidery Patterns Using Image-to-Image Translation

Nonhomogeneous Place-Dependent Markov Chains, Unsynchronised AIMD, and Network Utility Maximization

Signalling and obfuscation for congestion control

A Fair Assignment of Drivers to Parking Lots

Central-limit approach to risk-aware Markov decision processes

On the Design of Campus Parking Systems with QoS guarantees

Two Phase $Q-$learning for Bidding-based Vehicle Sharing

Functional Bandits

Lipschitz Bandits without the Lipschitz Constant