Source author record

Xinyun Chen

Xinyun Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.PR, Machine Learning, Computation and Language, Artificial Intelligence, Computer Science and Game Theory, Computer Vision, math.OC, Programming Languages, q-fin.ST, q-fin.TR

Catalog footprint

What is connected

17 works

10 topics

4 close collaborators

preprint 2023 arXiv

Symbol tuning improves in-context learning in language models

We present symbol tuning - finetuning language models on in-context input-label pairs where natural language labels (e.g., "positive/negative sentiment") are replaced with arbitrary symbols (e.g., "foo/bar"). Symbol tuning leverages the intuition that when a model cannot use instructions or natural language labels to figure out a task, it must instead do so by learning the input-label mappings. We experiment with symbol tuning across Flan-PaLM models up to 540B parameters and observe benefits across various settings. First, symbol tuning boosts performance on unseen in-context learning tasks and is much more robust to underspecified prompts, such as those without instructions or without natural language labels. Second, symbol-tuned models are much stronger at algorithmic reasoning tasks, with up to 18.2% better performance on the List Functions benchmark and up to 15.3% better performance on the Simple Turing Concepts benchmark. Finally, symbol-tuned models show large improvements in following flipped-labels presented in-context, meaning that they are more capable of using in-context information to override prior semantic knowledge.
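The label replacement described in this abstract can be illustrated with a small sketch. It only shows how in-context input-label pairs might be rewritten with arbitrary symbols and formatted without an instruction; it is not the authors' data pipeline, and the function names `symbolize_examples` and `build_prompt`, the prompt layout, and the toy sentences are all assumptions made for illustration.

```python
import random


def symbolize_examples(examples, symbols, seed=0):
    """Replace natural-language labels with arbitrary symbols.

    examples: list of (input_text, label) pairs.
    symbols: pool of arbitrary symbols, at least one per distinct label.
    Returns (remapped_examples, label_to_symbol).
    """
    labels = sorted({label for _, label in examples})
    if len(symbols) < len(labels):
        raise ValueError("need at least one symbol per distinct label")
    rng = random.Random(seed)
    chosen = rng.sample(symbols, len(labels))  # random, but reproducible
    mapping = dict(zip(labels, chosen))
    remapped = [(text, mapping[label]) for text, label in examples]
    return remapped, mapping


def build_prompt(remapped, query):
    """Format the in-context pairs with no instruction, then the query."""
    blocks = [f"Input: {text}\nLabel: {symbol}" for text, symbol in remapped]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


# Toy sentiment examples (hypothetical data).
examples = [
    ("Great movie, loved it.", "positive"),
    ("Dull and far too long.", "negative"),
    ("A wonderful surprise.", "positive"),
]
remapped, mapping = symbolize_examples(examples, ["foo", "bar"])
prompt = build_prompt(remapped, "I would not watch it again.")
print(prompt)
```

Because the prompt carries neither an instruction nor a meaningful label, the only way to answer the final query is to infer the input-label mapping from the preceding pairs, which is the intuition the abstract states.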

preprint 2022 arXiv

An online learning approach to dynamic pricing and capacity sizing in service systems

We study a dynamic pricing and capacity sizing problem in a $GI/GI/1$ queue, where the service provider's objective is to obtain the optimal service fee $p$ and service capacity $μ$ so as to maximize the cumulative expected profit (the service revenue minus the staffing cost and delay penalty). Due to the complex nature of the queueing dynamics, such a problem has no analytic solution so that previous research often resorts to heavy-traffic analysis where both the arrival rate and service rate are sent to infinity. In this work we propose an online learning framework designed for solving this problem which does not require the system's scale to increase. Our framework is dubbed Gradient-based Online Learning in Queue (GOLiQ). GOLiQ organizes the time horizon into successive operational cycles and prescribes an efficient procedure to obtain improved pricing and staffing policies in each cycle using data collected in previous cycles. Data here include the number of customer arrivals, waiting times, and the server's busy times. The ingenuity of this approach lies in its online nature, which allows the service provider to do better by interacting with the environment. Effectiveness of GOLiQ is substantiated by (i) theoretical results including the algorithm convergence and regret analysis (with a logarithmic regret bound), and (ii) engineering confirmation via simulation experiments of a variety of representative $GI/GI/1$ queues.
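As a rough illustration of the kind of data mentioned in this abstract (customer waiting times in a single-server queue), the sketch below uses the standard Lindley recursion for a FIFO queue. It is not GOLiQ and contains no pricing or gradient step; the function name `lindley_waiting_times` and the exponential inputs in the usage lines are assumptions chosen only to make the example runnable.

```python
import random


def lindley_waiting_times(interarrivals, services):
    """Waiting times in queue for a FIFO single-server queue.

    interarrivals[i] is the gap between the arrivals of customers i and i+1;
    services[i] is the service time of customer i. The first customer is
    assumed to find the system empty, so its waiting time is 0.
    """
    waits = [0.0]
    for gap, service in zip(interarrivals, services):
        # Lindley recursion: W_{n+1} = max(0, W_n + S_n - A_{n+1})
        waits.append(max(0.0, waits[-1] + service - gap))
    return waits


# Usage with hypothetical exponential inputs (arrival rate 1.0, service rate 1.25).
rng = random.Random(1)
gaps = [rng.expovariate(1.0) for _ in range(1000)]
service_times = [rng.expovariate(1.25) for _ in range(1000)]
waits = lindley_waiting_times(gaps, service_times)
print(sum(waits) / len(waits))  # sample mean waiting time over one simulated cycle
```

Any interarrival and service distributions can be substituted for the exponential draws, which is what makes the recursion a convenient way to generate waiting-time data for general $GI/GI/1$ queues.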

preprint 2022 arXiv

Competition-Level Code Generation with AlphaCode

Programming is a powerful and ubiquitous problem-solving tool. Developing systems that can assist programmers or even generate programs independently could make programming more productive and accessible, yet so far incorporating innovations in AI has proven challenging. Recent large-scale language models have demonstrated an impressive ability to generate code, and are now able to complete simple programming tasks. However, these models still perform poorly when evaluated on more complex, unseen problems that require problem-solving skills beyond simply translating instructions into code. For example, competitive programming problems which require an understanding of algorithms and complex natural language remain extremely challenging. To address this gap, we introduce AlphaCode, a system for code generation that can create novel solutions to these problems that require deeper reasoning. In simulated evaluations on recent programming competitions on the Codeforces platform, AlphaCode achieved on average a ranking of top 54.3% in competitions with more than 5,000 participants. We found that three key components were critical to achieve good and reliable performance: (1) an extensive and clean competitive programming dataset for training and evaluation, (2) large and efficient-to-sample transformer-based architectures, and (3) large-scale model sampling to explore the search space, followed by filtering based on program behavior to a small set of submissions.
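The third component named in this abstract, sampling many candidates and then filtering them by program behavior down to a small set of submissions, can be sketched in a few lines. This is a toy illustration under stated assumptions, not AlphaCode's implementation: candidates are plain Python callables rather than sampled model outputs, and `filter_candidates`, `example_tests`, and `max_submissions` are hypothetical names.

```python
def filter_candidates(candidates, example_tests, max_submissions=10):
    """Keep candidates whose behavior matches every example test.

    candidates: iterable of callables taking one input.
    example_tests: list of (input, expected_output) pairs.
    Returns at most max_submissions surviving candidates, in sampled order.
    """
    kept = []
    for program in candidates:
        try:
            ok = all(program(inp) == expected for inp, expected in example_tests)
        except Exception:
            ok = False  # a candidate that crashes on an example is rejected
        if ok:
            kept.append(program)
        if len(kept) == max_submissions:
            break
    return kept


# Toy problem: return the sum of a list of integers.
candidates = [
    lambda xs: sum(xs),          # correct
    lambda xs: max(xs),          # wrong answer, crashes on the empty list
    lambda xs: xs[0] + xs[1],    # wrong answer on longer lists
    lambda xs: sum(sorted(xs)),  # correct, different implementation
]
example_tests = [([1, 2, 3], 6), ([5], 5), ([], 0)]
survivors = filter_candidates(candidates, example_tests)
print(len(survivors))
```

Filtering on observable behavior rather than on the program text means that two differently written but equally correct candidates both survive, while the cap on the number of survivors keeps the final set of submissions small.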

preprint 2022 arXiv

Measuring and Improving Compositional Generalization in Text-to-SQL via Component Alignment

In text-to-SQL tasks -- as in much of NLP -- compositional generalization is a major challenge: neural networks struggle with compositional generalization where training and test distributions differ. However, most recent attempts to improve this are based on word-level synthetic data or specific dataset splits to generate compositional biases. In this work, we propose a clause-level compositional example generation method. We first split the sentences in the Spider text-to-SQL dataset into sub-sentences, annotating each sub-sentence with its corresponding SQL clause, resulting in a new dataset Spider-SS. We then construct a further dataset, Spider-CG, by composing Spider-SS sub-sentences in different combinations, to test the ability of models to generalize compositionally. Experiments show that existing models suffer significant performance degradation when evaluated on Spider-CG, even though every sub-sentence is seen during training. To deal with this problem, we modify a number of state-of-the-art models to train on the segmented data of Spider-SS, and we show that this method improves the generalization performance.

preprint 2022 arXiv

RobustART: Benchmarking Robustness on Architecture Design and Training Techniques

Deep neural networks (DNNs) are vulnerable to adversarial noises, which motivates the benchmark of model robustness. Existing benchmarks mainly focus on evaluating defenses, but there are no comprehensive studies of how architecture design and training techniques affect robustness. Comprehensively benchmarking their relationships is beneficial for better understanding and developing robust DNNs. Thus, we propose RobustART, the first comprehensive Robustness investigation benchmark on ImageNet regarding ARchitecture design (49 human-designed off-the-shelf architectures and 1200+ networks from neural architecture search) and Training techniques (10+ techniques, e.g., data augmentation) towards diverse noises (adversarial, natural, and system noises). Extensive experiments substantiated several insights for the first time, e.g., (1) adversarial training is effective for the robustness against all noise types for Transformers and MLP-Mixers; (2) given comparable model sizes and aligned training settings, CNNs > Transformers > MLP-Mixers on robustness against natural and system noises; Transformers > MLP-Mixers > CNNs on adversarial robustness; (3) for some light-weight architectures, increasing model sizes or using extra data cannot improve robustness. Our benchmark presents: (1) an open-source platform for comprehensive robustness evaluation; (2) a variety of pre-trained models to facilitate robustness evaluation; and (3) a new view to better understand the mechanism towards designing robust DNNs. We will continue to develop this ecosystem for the community.

preprint 2022 arXiv

Tail Quantile Estimation for Non-preemptive Priority Queues

Motivated by applications in computing and telecommunication systems, we investigate the problem of estimating the p-quantile of steady-state sojourn times in a single-server multi-class queueing system with non-preemptive priorities for p close to 1. The main challenge in this problem lies in efficient sampling from the tail event. To address this issue, we develop a regenerative simulation algorithm with importance sampling. In addition, we establish a central limit theorem for the estimator to construct the confidence interval. Numerical experiments show that our algorithm outperforms benchmark simulation methods. Our result contributes to the literature on rare event simulation for queueing systems.

preprint 2021 arXiv

Understanding Robustness in Teacher-Student Setting: A New Perspective

Adversarial examples have appeared as a ubiquitous property of machine learning models where bounded adversarial perturbation could mislead the models to make arbitrarily incorrect predictions. Such examples provide a way to assess the robustness of machine learning models as well as a proxy for understanding the model training process. Extensive studies try to explain the existence of adversarial examples and provide ways to improve model robustness (e.g. adversarial training). While they mostly focus on models trained on datasets with predefined labels, we leverage the teacher-student framework and assume a teacher model, or oracle, to provide the labels for given instances. We extend Tian (2019) in the case of low-rank input data and show that student specialization (a trained student neuron is highly correlated with a certain teacher neuron at the same layer) still happens within the input subspace, but the teacher and student nodes could differ wildly out of the data subspace, which we conjecture leads to adversarial examples. Extensive experiments show that student specialization correlates strongly with model robustness in different scenarios, including student trained via standard training, adversarial training, confidence-calibrated adversarial training, and training with robust feature dataset. Our studies could shed light on the future exploration of adversarial examples, and on enhancing model robustness via principled data augmentation.

preprint 2020 arXiv

Efficient Steady-state Simulation of High-dimensional Stochastic Networks

We propose and study an asymptotically optimal Monte Carlo estimator for steady-state expectations of a d-dimensional reflected Brownian motion. Our estimator is asymptotically optimal in the sense that it requires $\tilde{O}(d)$ (up to logarithmic factors in $d$) i.i.d. Gaussian random variables in order to output an estimate with a controlled error. Our construction is based on the analysis of a suitable multi-level Monte Carlo strategy which, we believe, can be applied widely. This is the first algorithm with linear complexity (under suitable regularity conditions) for steady-state estimation of RBM as the dimension increases.

preprint 2020 arXiv

Perfect Sampling of Hawkes Processes and Queues with Hawkes Arrivals

In this paper we develop the first perfect sampling algorithm for queues with Hawkes input, i.e. single-server queues with Hawkes arrivals and i.i.d. service times of general distribution. In addition to the stability condition, we also assume the excitation function of the Hawkes process has a light tail and the service time has finite moment generating function in the neighborhood of the origin. In this procedure, we also propose a new perfect sampling algorithm for Hawkes process with improved computational efficiency compared to the existing algorithm. Theoretical analysis and numerical tests on the algorithms' correctness and efficiency are also included.

preprint 2016 arXiv

$ε$-Strong Simulation for Multidimensional Stochastic Differential Equations via Rough Path Analysis

Consider a multidimensional diffusion process $X=\{X\left(t\right) :t\in\lbrack0,1]\}$. Let $\varepsilon>0$ be a \textit{deterministic}, user defined, tolerance error parameter. Under standard regularity conditions on the drift and diffusion coefficients of $X$, we construct a probability space, supporting both $X$ and an explicit, piecewise constant, fully simulatable process $X_{\varepsilon}$ such that \[ \sup_{0\leq t\leq1}\left\Vert X_{\varepsilon}\left(t\right) -X\left(t\right) \right\Vert_{\infty}<\varepsilon \] with probability one. Moreover, the user can adaptively choose $\varepsilon^{\prime}\in\left(0,\varepsilon\right) $ so that $X_{\varepsilon^{\prime}}$ (also piecewise constant and fully simulatable) can be constructed conditional on $X_{\varepsilon}$ to ensure an error smaller than $\varepsilon^{\prime}>0$ with probability one. Our construction requires a detailed study of continuity estimates of the Itô map using Lyons' theory of rough paths. We approximate the underlying Brownian motion, jointly with the Lévy areas with a deterministic $\varepsilon$ error in the underlying rough path metric.

preprint 2016 arXiv

A General Retraining Framework for Scalable Adversarial Classification

Traditional classification algorithms assume that training and test data come from similar distributions. This assumption is violated in adversarial settings, where malicious actors modify instances to evade detection. A number of custom methods have been developed for both adversarial evasion attacks and robust learning. We propose the first systematic and general-purpose retraining framework which can: a) boost robustness of an \emph{arbitrary} learning algorithm, in the face of b) a broader class of adversarial models than any prior methods. We show that, under natural conditions, the retraining framework minimizes an upper bound on optimal adversarial risk, and show how to extend this result to account for approximations of evasion attacks. Extensive experimental evaluation demonstrates that our retraining methods are nearly indistinguishable from state-of-the-art algorithms for optimizing adversarial risk, but are more general and far more scalable. The experiments also confirm that without retraining, our adversarial framework dramatically reduces the effectiveness of learning. In contrast, retraining significantly boosts robustness to evasion attacks without significantly compromising overall accuracy.

preprint 2016 arXiv

Latent Attention For If-Then Program Synthesis

Automatic translation from natural language descriptions into programs is a longstanding challenging problem. In this work, we consider a simple yet important sub-problem: translation from textual descriptions to If-Then programs. We devise a novel neural network architecture for this task which we train end-to-end. Specifically, we introduce Latent Attention, which computes multiplicative weights for the words in the description in a two-stage process with the goal of better leveraging the natural language structures that indicate the relevant parts for predicting program elements. Our architecture reduces the error rate by 28.57% compared to prior art. We also propose a one-shot learning scenario of If-Then program synthesis and simulate it with our existing dataset. We demonstrate a variation on the training procedure for this scenario that outperforms the original procedure, significantly closing the gap to the model trained with all data.

preprint 2016 arXiv

Perfect Sampling and Gradient Simulation for Fork-Join Networks

Fork-join networks are a class of queueing networks with applications in manufacturing, healthcare and computation systems. In this paper, we develop a simulation algorithm that (1) generates i.i.d. samples of the job sojourn time, jointly with the number of waiting tasks, exactly following the steady-state distribution, and (2) produces unbiased estimators of the derivatives of the job sojourn time with respect to the service rates of the servers in the network. The algorithm is designed based on the Coupling from the Past (CFTP) and Infinitesimal Perturbation Analysis (IPA) techniques. Two numerical examples are reported, including the special 2-station case where analytic results on the steady-state distribution are known and a 10-station network with a bottleneck.

preprint 2016 arXiv

Perfect Sampling of Generalized Jackson Network

We provide the first perfect sampling algorithm for a Generalized Jackson Network of FIFO queues under arbitrary topology and non-Markovian assumptions on the input of the network. We assume (in addition to stability) that the interarrival and service times of customers have finite moment generating function in a neighborhood of the origin, and the interarrival times have unbounded support.

preprint 2016 arXiv

Rates of Convergence to Stationarity for Multidimensional RBM

We provide the first rate of convergence analysis for RBM as the dimension grows under natural uniformity conditions. In particular, if the underlying routing matrix is uniformly contractive, uniform stability of the drift vector holds, and the variances of the underlying Brownian Motion (BM) are bounded, then we show that the RBM converges exponentially fast to stationarity with a relaxation time of order $O(d^4\log(d)^2)$ as $d\to\infty$.

preprint 2015 arXiv

Steady-state simulation of reflected Brownian motion and related stochastic networks

This paper develops the first class of algorithms that enable unbiased estimation of steady-state expectations for multidimensional reflected Brownian motion. In order to explain our ideas, we first consider the case of compound Poisson (possibly Markov modulated) input. In this case, we analyze the complexity of our procedure as the dimension of the network increases and show that, under certain assumptions, the algorithm has polynomial-expected termination time. Our methodology includes procedures that are of interest beyond steady-state simulation and reflected processes. For instance, we use wavelets to construct a piecewise linear function that can be guaranteed to be within $\varepsilon$ distance (deterministic) in the uniform norm to Brownian motion in any compact time interval.

preprint 2013 arXiv

Continuous-time Modeling of Bid-Ask Spread and Price Dynamics in Limit Order Books

We derive a continuous time model for the joint evolution of the mid price and the bid-ask spread from a multiscale analysis of the whole limit order book (LOB) dynamics. We model the LOB as a multiclass queueing system and perform our asymptotic analysis using stylized features observed empirically. We argue that in the asymptotic regime supported by empirical observations the mid price and bid-ask spread can be described using only certain parameters of the book (not the whole book itself). Our limit process is characterized by reflecting behavior and state-dependent jumps. Our analysis allows us to explain certain characteristics observed in practice such as: the connection between power-law decaying tails in the volumes of the order book and the returns, as well as statistical properties of the long-run spread distribution.