Source author record

Wenyu Du

Wenyu Du appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computation, Computation and Language, Methodology, math.ST, Statistics Theory, Databases, Information Theory, math.IT

Catalog footprint

What is connected

8 works

8 topics

4 close collaborators


preprint, 2023, arXiv

Graphix-T5: Mixing Pre-Trained Transformers with Graph-Aware Layers for Text-to-SQL Parsing

The task of text-to-SQL parsing, which aims at converting natural language questions into executable SQL queries, has garnered increasing attention in recent years, as it can assist end users in efficiently extracting vital information from databases without the need for technical background. One of the major challenges in text-to-SQL parsing is domain generalization, i.e., how to generalize well to unseen databases. Recently, the pre-trained text-to-text transformer model, namely T5, though not specialized for text-to-SQL parsing, has achieved state-of-the-art performance on standard benchmarks targeting domain generalization. In this work, we explore ways to further augment the pre-trained T5 model with specialized components for text-to-SQL parsing. Such components are expected to introduce structural inductive bias into text-to-SQL parsers, thus improving the model's capacity for (potentially multi-hop) reasoning, which is critical for generating structure-rich SQL queries. To this end, we propose a new architecture, GRAPHIX-T5, a mixed model in which the standard pre-trained transformer is augmented with specially designed graph-aware layers. Extensive experiments and analysis demonstrate the effectiveness of GRAPHIX-T5 across four text-to-SQL benchmarks: SPIDER, SYN, REALISTIC and DK. GRAPHIX-T5 surpasses all other T5-based parsers by a significant margin, achieving new state-of-the-art performance. Notably, GRAPHIX-T5-large outperforms the original T5-large by 5.7% on exact match (EM) accuracy and 6.6% on execution accuracy (EX). It even outperforms T5-3B by 1.2% on EM and 1.5% on EX.

preprint, 2022, arXiv

Linguistic Dependencies and Statistical Dependence

Are pairs of words that tend to occur together also likely to stand in a linguistic dependency? This empirical question is motivated by a long history of literature in cognitive science, psycholinguistics, and NLP. In this work we contribute an extensive analysis of the relationship between linguistic dependencies and statistical dependence between words. Improving on previous work, we introduce the use of large pretrained language models to compute contextualized estimates of the pointwise mutual information between words (CPMI). For multiple models and languages, we extract dependency trees which maximize CPMI, and compare to gold standard linguistic dependencies. Overall, we find that CPMI dependencies achieve an unlabelled undirected attachment score of at most $\approx 0.5$. While far above chance, and consistently above a non-contextualized PMI baseline, this score is generally comparable to a simple baseline formed by connecting adjacent words. We analyze which kinds of linguistic dependencies are best captured in CPMI dependencies, and also find marked differences between the estimates of the large pretrained language models, illustrating how their different training schemes affect the type of dependencies they capture.
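The evaluation described in this abstract can be illustrated with a small sketch. It assumes a symmetric matrix of pairwise scores (standing in for CPMI estimates), extracts a maximum spanning tree with Prim's algorithm, and scores it against gold edges with the unlabelled undirected attachment score. The toy matrix, the gold edges, and the choice of Prim's algorithm are all assumptions made for illustration; the abstract does not say which tree-extraction algorithm the authors use.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]  # undirected edge stored as (smaller index, larger index)


def max_spanning_tree(score: List[List[float]]) -> Set[Edge]:
    """Prim's algorithm on a symmetric score matrix (illustrative choice)."""
    n = len(score)
    if n == 0:
        return set()
    in_tree = {0}
    edges: Set[Edge] = set()
    while len(in_tree) < n:
        best = None
        # Pick the highest-scoring edge that leaves the current tree.
        for i in in_tree:
            for j in range(n):
                if j not in in_tree and (best is None or score[i][j] > best[0]):
                    best = (score[i][j], i, j)
        _, i, j = best
        edges.add((min(i, j), max(i, j)))
        in_tree.add(j)
    return edges


def uuas(predicted: Set[Edge], gold: Set[Edge]) -> float:
    """Fraction of gold undirected edges recovered by the predicted tree."""
    return len(predicted & gold) / len(gold)


def adjacent_baseline(n: int) -> Set[Edge]:
    """Baseline that simply connects each word to its right neighbour."""
    return {(i, i + 1) for i in range(n - 1)}


# Hypothetical scores for the 4-word sentence "dogs chase cats quickly".
toy_scores = [
    [0.0, 1.2, 1.8, 0.1],
    [1.2, 0.0, 1.5, 0.9],
    [1.8, 1.5, 0.0, 0.3],
    [0.1, 0.9, 0.3, 0.0],
]
# Hypothetical gold dependencies: dogs-chase, chase-cats, chase-quickly.
toy_gold = {(0, 1), (1, 2), (1, 3)}

tree = max_spanning_tree(toy_scores)
print(uuas(tree, toy_gold), uuas(adjacent_baseline(4), toy_gold))
```

In this toy case the score-maximizing tree and the adjacent-word baseline recover the same number of gold edges, which mirrors the kind of comparison the abstract reports without reproducing any of its results.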

preprint, 2020, arXiv

Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach

It is commonly believed that knowledge of syntactic structure should improve language modeling. However, incorporating syntactic structure into neural language models effectively and in a computationally efficient way has been challenging. In this paper, we make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground truth parse trees in a form called "syntactic distances", where the two objectives share the same intermediate representation. Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce higher-quality trees.
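To make the "syntactic distances" representation concrete, here is a minimal sketch of how a sequence of distances between adjacent words can be turned back into a binary tree. It assumes the commonly used convention of splitting recursively at the largest distance; the abstract does not spell out the conversion rule, and the function name and toy values are hypothetical.

```python
from typing import List, Sequence, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def distances_to_tree(words: Sequence[str], dists: Sequence[float]) -> Tree:
    """Build a binary tree by splitting at the largest adjacent-word distance.

    dists[k] is the distance between words[k] and words[k + 1]
    (assumed convention; ties are broken in favour of the leftmost split).
    """
    if len(dists) != len(words) - 1:
        raise ValueError("need exactly one distance per adjacent word pair")
    if len(words) == 1:
        return words[0]
    k = max(range(len(dists)), key=lambda i: dists[i])
    left = distances_to_tree(words[: k + 1], dists[:k])
    right = distances_to_tree(words[k + 1 :], dists[k + 1 :])
    return (left, right)


# Toy example: the largest gap sits between "cat" and "sat".
toy_words: List[str] = ["the", "cat", "sat", "down"]
toy_dists: List[float] = [1.0, 3.0, 2.0]
print(distances_to_tree(toy_words, toy_dists))
```

Under this convention a model that predicts one scalar per adjacent word pair implicitly predicts a full bracketing, which is what lets tree supervision share a representation with word prediction.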

preprint, 2015, arXiv

On Robustness of the Shiryaev-Roberts Procedure for Quickest Change-Point Detection under Parameter Misspecification in the Post-Change Distribution

The gist of the quickest change-point detection problem is to detect the presence of a change in the statistical behavior of a series of sequentially made observations, and do so in an optimal detection-speed-vs.-"false-positive"-risk manner. When optimality is understood either in the generalized Bayesian sense or as defined in Shiryaev's multi-cyclic setup, the so-called Shiryaev-Roberts (SR) detection procedure is known to be the "best one can do", provided, however, that the observations' pre- and post-change distributions are both fully specified. We consider a more realistic setup, viz. one where the post-change distribution is assumed known only up to a parameter, so that the latter may be "misspecified". The question of interest is the sensitivity (or robustness) of the otherwise "best" SR procedure with respect to a possible misspecification of the post-change distribution parameter. To answer this question, we provide a case study where, in a specific Gaussian scenario, we allow the SR procedure to be "out of tune" in the way of the post-change distribution parameter, and numerically assess the effect of the "mistuning" on Shiryaev's (multi-cyclic) Stationary Average Detection Delay delivered by the SR procedure. The comprehensive quantitative robustness characterization of the SR procedure obtained in the study can be used to develop the respective theory as well as to provide a rationale for practical design of the SR procedure. The overall qualitative conclusion of the study is an expected one: the SR procedure is less (more) robust for less (more) contrast changes and for lower (higher) levels of the false alarm risk.
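The notion of a "mistuned" SR procedure can be sketched in a few lines. The example assumes a shift in the mean of unit-variance Gaussian observations from 0 to some value theta, and uses the standard SR recursion R_n = (1 + R_{n-1}) * LR_n with R_0 = 0, stopping once R_n reaches a threshold. The function names, the seed, and the parameter values are hypothetical; the abstract does not specify the exact scenario studied, and this sketch does not compute the Stationary Average Detection Delay.

```python
import math
import random
from typing import Iterable, List, Optional


def sr_stopping_time(
    observations: Iterable[float],
    theta_assumed: float,
    threshold: float,
    headstart: float = 0.0,
) -> Optional[int]:
    """Run the SR statistic and return the first n with R_n >= threshold.

    theta_assumed is the post-change mean the procedure is tuned to; it may
    differ from the true post-change mean (the "misspecified" case).
    A nonzero headstart gives the generalized (GSR) variant.
    Returns None if the threshold is never reached.
    """
    r = headstart
    for n, x in enumerate(observations, start=1):
        # Likelihood ratio of N(theta_assumed, 1) against N(0, 1) at x.
        lr = math.exp(theta_assumed * x - 0.5 * theta_assumed ** 2)
        r = (1.0 + r) * lr
        if r >= threshold:
            return n
    return None


def simulate(change_at: int, theta_true: float, length: int, seed: int) -> List[float]:
    """Toy data: N(0, 1) before the change point, N(theta_true, 1) from it onward."""
    rng = random.Random(seed)
    return [
        rng.gauss(theta_true if n >= change_at else 0.0, 1.0)
        for n in range(1, length + 1)
    ]


if __name__ == "__main__":
    data = simulate(change_at=50, theta_true=1.0, length=500, seed=0)
    for theta in (1.0, 0.5):  # correctly tuned vs. mistuned (hypothetical values)
        print(theta, sr_stopping_time(data, theta_assumed=theta, threshold=200.0))
```

Running the same data through the procedure with different values of theta_assumed gives a rough feel for the sensitivity question the abstract poses; a proper study would average over many runs and over the change-point location.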

preprint, 2014, arXiv

An Exact Formula for the Average Run Length to False Alarm of the Generalized Shiryaev-Roberts Procedure for Change-Point Detection under Exponential Observations

We derive analytically an exact closed-form formula for the standard minimax Average Run Length (ARL) to false alarm delivered by the Generalized Shiryaev-Roberts (GSR) change-point detection procedure devised to detect a shift in the baseline mean of a sequence of independent exponentially distributed observations. Specifically, the formula is found through direct solution of the respective integral (renewal) equation, and is a general result in that the GSR procedure's headstart is not restricted to a bounded range, nor is there a "ceiling" value for the detection threshold. Apart from the theoretical significance (in change-point detection, exact closed-form performance formulae are typically either difficult or impossible to get, especially for the GSR procedure), the obtained formula is also useful to a practitioner: in cases of practical interest, the formula is a function linear in both the detection threshold and the headstart, and, therefore, the ARL to false alarm of the GSR procedure can be easily computed.

preprint, 2013, arXiv

An Accurate Method for Determining the Pre-Change Run-Length Distribution of the Generalized Shiryaev--Roberts Detection Procedure

Change-of-measure is a powerful technique used across statistics, probability and analysis. Particularly known as Wald's likelihood ratio identity, the technique enabled the proof of a number of exact and asymptotic optimality results pertaining to the problem of quickest change-point detection. Within the latter problem's context we apply the technique to develop a numerical method to compute the Generalized Shiryaev--Roberts (GSR) detection procedure's pre-change Run-Length distribution. Specifically, the method is based on the integral-equations approach and uses the collocation framework with the basis functions chosen so as to exploit a certain change-of-measure identity and a specific martingale property of the GSR procedure's detection statistic. As a result, the method's accuracy and robustness improve substantially, even though the method's theoretical rate of convergence is shown to be merely quadratic. A tight upper bound on the method's error is supplied as well. The method is not restricted to a particular data distribution or to a specific value of the GSR detection statistic's "headstart". To conclude, we offer a case study to demonstrate the proposed method at work, drawing particular attention to the method's accuracy and its robustness with respect to three factors: (a) partition size, (b) change magnitude, and (c) Average Run Length (ARL) to false alarm level. Specifically, assuming independent standard Gaussian observations undergoing a surge in the mean, we employ the method to study the GSR procedure's Run-Length's pre-change distribution, its average (i.e., the usual ARL to false alarm) and standard deviation. As expected from the theoretical analysis, the method's high accuracy and robustness with respect to the foregoing three factors are confirmed experimentally. We also comment on extending the method to handle other performance measures and other procedures.

preprint, 2013, arXiv

Efficient Performance Evaluation of the Generalized Shiryaev--Roberts Detection Procedure in a Multi-Cyclic Setup

We propose a numerical method to evaluate the performance of the emerging Generalized Shiryaev--Roberts (GSR) change-point detection procedure in a "minimax-ish" multi-cyclic setup where the procedure of choice is applied repetitively (cyclically) and the change is assumed to take place at an unknown time moment in a distant-future stationary regime. Specifically, the proposed method is based on the integral-equations approach and uses the collocation technique with the basis functions chosen so as to exploit a certain change-of-measure identity and the GSR detection statistic's unique martingale property. As a result, the method's accuracy and robustness improve, as does its efficiency, since, using the change-of-measure ploy, the Average Run Length (ARL) to false alarm and the Stationary Average Detection Delay (STADD) are computed simultaneously. We show that the method's rate of convergence is quadratic and supply a tight upper bound on its error. We conclude with a case study and confirm experimentally that the proposed method's accuracy and rate of convergence are robust with respect to three factors: (a) partition fineness (coarse vs. fine), (b) change magnitude (faint vs. contrast), and (c) the level of the ARL to false alarm (low vs. high). Since the method is not restricted to a particular data distribution or to a specific value of the GSR detection statistic's headstart, this work may help gain greater insight into the characteristics of the GSR procedure and help a practitioner design the GSR procedure as needed while fully utilizing its potential.

preprint, 2013, arXiv

Quickest Change-Point Detection: A Bird's Eye View

We provide a bird's eye view onto the area of sequential change-point detection. We focus on the discrete-time case with known pre- and post-change data distributions and offer a summary of the forefront asymptotic results established in each of the four major formulations of the underlying optimization problem: Bayesian, generalized Bayesian, minimax, and multi-cyclic.
