Source author record

Yu Du

Yu Du appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


math.NA, math.OC, Methodology, Applications, Artificial Intelligence, Computation and Language, Computer Vision, Machine Learning, Computation, cond-mat.mtrl-sci, Discrete Mathematics, Information Retrieval, math.QA, math.RA, Neural and Evolutionary Computing, Numerical Analysis, quant-ph

Catalog footprint

What is connected

18 works

17 topics

4 close collaborators


preprint · 2022 · arXiv

A matching design for augmenting a randomized clinical trial with external control

The use of real-world information to assess the effectiveness of medical products is becoming increasingly popular and more accepted by regulatory agencies. According to a strategic real-world evidence framework published by the U.S. Food and Drug Administration, a hybrid randomized controlled trial that augments the internal control arm with real-world data is a pragmatic approach worth more attention. In this paper, we aim to improve on existing matching designs for such a hybrid randomized controlled trial. In particular, we propose to match the entire concurrent randomized clinical trial (RCT) such that (1) the matched external control subjects used to augment the internal control arm are as comparable as possible to the RCT population, (2) every active treatment arm in an RCT with multiple treatments is compared with the same control group, and (3) matching can be conducted and the matched set locked before treatment unblinding to better maintain data integrity. Besides a weighted estimator, we also introduce a bootstrap method to estimate its variance. The finite-sample performance of the proposed method is evaluated by simulations based on data from a real clinical trial.

preprint · 2022 · arXiv

A Unified and Biologically-Plausible Relational Graph Representation of Vision Transformers

Vision transformer (ViT) and its variants have achieved remarkable successes in various visual tasks. The key characteristic of these ViT models is to adopt different aggregation strategies of spatial patch information within the artificial neural networks (ANNs). However, there is still a lack of a unified representation of different ViT architectures for systematic understanding and assessment of model representation performance. Moreover, how those well-performing ViT ANNs are similar to real biological neural networks (BNNs) is largely unexplored. To answer these fundamental questions, we, for the first time, propose a unified and biologically-plausible relational graph representation of ViT models. Specifically, the proposed relational graph representation consists of two key sub-graphs: aggregation graph and affine graph. The former considers ViT tokens as nodes and describes their spatial interaction, while the latter regards network channels as nodes and reflects the information communication between channels. Using this unified relational graph representation, we found that: a) a sweet spot of the aggregation graph leads to ViTs with significantly improved predictive performance; b) the graph measures of clustering coefficient and average path length are two effective indicators of model prediction performance, especially when applied to datasets with small samples; c) our findings are consistent across various ViT architectures and multiple datasets; d) the proposed relational graph representation of ViT has high similarity with real BNNs derived from brain science data. Overall, our work provides a novel unified and biologically-plausible paradigm for more interpretable and effective representation of ViT ANNs.

preprint · 2022 · arXiv

Charge Carrier Mediation and Ferromagnetism induced in MnBi6Te10 Magnetic Topological Insulators by antimony doping

A new family of intrinsic magnetic topological insulators (MTIs), the MnBi2Te4 family, has shed light on the observation of novel topological quantum effects such as the quantum anomalous Hall effect (QAHE). However, the strong antiferromagnetic (AFM) coupling and high carrier concentration in the bulk hinder practical applications. In the closely related materials MnBi4Te7 and MnBi6Te10, the interlayer magnetic coupling is greatly suppressed by Bi2Te3 layer intercalation. However, AFM is still the ground state in these compounds. Here, by magnetic and transport measurements, we demonstrate that the Sb substitutional dopant plays a dual role in MnBi6Te10: it can not only adjust the charge carrier type and concentration, but also induce a ferromagnetic (FM) ground state. An FM ground-state region that is also close to the charge neutral point can be found in the phase diagram of Mn(SbxBi1-x)6Te10 when x ~ 0.25. An intrinsic FM-MTI candidate is thus demonstrated, and it may be a step toward the realization of high-quality and high-temperature QAHE and related topological quantum effects in the future.

preprint · 2022 · arXiv

HyperPrompt: Prompt-based Task-Conditioning of Transformers

Prompt-Tuning is a new paradigm for finetuning pre-trained language models in a parameter-efficient way. Here, we explore the use of HyperNetworks to generate hyper-prompts: we propose HyperPrompt, a novel architecture for prompt-based task-conditioning of self-attention in Transformers. The hyper-prompts are end-to-end learnable via generation by a HyperNetwork. HyperPrompt allows the network to learn task-specific feature maps where the hyper-prompts serve as task global memories for the queries to attend to, at the same time enabling flexible information sharing among tasks. We show that HyperPrompt is competitive against strong multi-task learning baselines with as few as $0.14\%$ of additional task-conditioning parameters, achieving great parameter and computational efficiency. Through extensive empirical experiments, we demonstrate that HyperPrompt can achieve superior performances over strong T5 multi-task learning baselines and parameter-efficient adapter variants including Prompt-Tuning and HyperFormer++ on Natural Language Understanding benchmarks of GLUE and SuperGLUE across many model sizes.

preprint · 2022 · arXiv

LaMDA: Language Models for Dialog Applications

We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text. While model scaling alone can improve quality, it yields smaller improvements in safety and factual grounding. We demonstrate that fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements towards the two key challenges of safety and factual grounding. The first challenge, safety, involves ensuring that the model's responses are consistent with a set of human values, such as preventing harmful suggestions and unfair bias. We quantify safety using a metric based on an illustrative set of human values, and we find that filtering candidate responses using a LaMDA classifier fine-tuned with a small amount of crowdworker-annotated data offers a promising approach to improving model safety. The second challenge, factual grounding, involves enabling the model to consult external knowledge sources, such as an information retrieval system, a language translator, and a calculator. We quantify factuality using a groundedness metric, and we find that our approach enables the model to generate responses grounded in known sources, rather than responses that merely sound plausible. Finally, we explore the use of LaMDA in the domains of education and content recommendations, and analyze their helpfulness and role consistency.

preprint · 2022 · arXiv

Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model

Recently, vision-language pre-training has shown great potential in open-vocabulary object detection, where detectors trained on base classes are devised for detecting new classes. The class text embedding is first generated by feeding prompts to the text encoder of a pre-trained vision-language model. It is then used as the region classifier to supervise the training of a detector. The key element that leads to the success of this model is the proper prompt, which requires careful word tuning and ingenious design. To avoid laborious prompt engineering, some prompt representation learning methods have been proposed for the image classification task, but they can only be sub-optimal solutions when applied to the detection task. In this paper, we introduce a novel method, detection prompt (DetPro), to learn continuous prompt representations for open-vocabulary object detection based on the pre-trained vision-language model. Unlike the previous classification-oriented methods, DetPro has two highlights: 1) a background interpretation scheme to include the proposals in image background into the prompt training; 2) a context grading scheme to separate proposals in image foreground for tailored prompt training. We assemble DetPro with ViLD, a recent state-of-the-art open-world object detector, and conduct experiments on LVIS as well as transfer learning on the Pascal VOC, COCO, and Objects365 datasets. Experimental results show that our DetPro outperforms the baseline ViLD in all settings, e.g., +3.4 APbox and +3.0 APmask improvements on the novel classes of LVIS. Code and models are available at https://github.com/dyabel/detpro.

preprint · 2022 · arXiv

Testing for Treatment Effect Twice Using Internal and External Controls in Clinical Trials

Leveraging external controls -- relevant individual patient data under control from external trials or real-world data -- has the potential to reduce the cost of randomized controlled trials (RCTs) while increasing the proportion of trial patients given access to novel treatments. However, due to lack of randomization, RCT patients and external controls may differ with respect to covariates that may or may not have been measured. Hence, after controlling for measured covariates, for instance by matching, testing for treatment effect using external controls may still be subject to unmeasured biases. In this paper, we propose a sensitivity analysis approach to quantify the magnitude of unmeasured bias that would be needed to alter the study conclusion that presumed no unmeasured biases are introduced by employing external controls. Whether leveraging external controls increases power or not depends on the interplay between sample sizes and the magnitude of treatment effect and unmeasured biases, which may be difficult to anticipate. This motivates a combined testing procedure that performs two highly correlated analyses, one with and one without external controls, with a small correction for multiple testing using the joint distribution of the two test statistics. The combined test provides a new method of sensitivity analysis designed for data fusion problems, which anchors at the unbiased analysis based on RCT only and spends a small proportion of the type I error to also test using the external controls. In this way, if leveraging external controls increases power, the power gain compared to the analysis based on RCT only can be substantial; if not, the power loss is small. The proposed method is evaluated in theory and power calculations, and applied to a real trial.

preprint · 2022 · arXiv

Using Targeted Maximum Likelihood Estimation to Estimate Treatment Effect with Longitudinal Continuous or Binary Data: A Systematic Evaluation of 28 Diabetes Clinical Trials

The primary analysis of clinical trials in the diabetes therapeutic area often involves a mixed-model repeated measure (MMRM) approach to estimate the average treatment effect for a longitudinal continuous outcome, and a generalized linear mixed model (GLMM) approach for a longitudinal binary outcome. In this paper, we considered another estimator of the average treatment effect, called the targeted maximum likelihood estimator (TMLE). This estimator can be a one-step alternative to model either a continuous or a binary outcome. We compared those estimators by simulation studies and by analyzing real data from 28 diabetes clinical trials. The simulations involved different missing data scenarios, and the real data sets covered a wide range of possible distributions of the outcome and covariates in real-life clinical trials for diabetes drugs with different mechanisms of action. For all the settings, adjusted estimators tended to be more efficient than the unadjusted one. In the setting of a longitudinal continuous outcome, the MMRM approach with visits and baseline variables interaction appeared to dominate the performance of the MMRM considering the main effects only for the baseline variables while showing better or comparable efficiency to the TMLE estimator in both simulations and data applications. For modeling a longitudinal binary outcome, TMLE generally outperformed GLMM in terms of relative efficiency, and its avoidance of the cumbersome covariance fitting procedure from GLMM makes TMLE a more advantageous estimator.

preprint · 2021 · arXiv

Perfectly Matched Layers for nonlocal Helmholtz equations II: multi-dimensional cases

Perfectly matched layers (PMLs) are formulated and applied to numerically solve nonlocal Helmholtz equations in one and two dimensions. In one dimension, we present the PML modifications for the nonlocal Helmholtz equation with general kernels and theoretically show its effectiveness in some sense. In two dimensions, we give the PML modifications in both Cartesian coordinates and polar coordinates. Based on the PML modifications, nonlocal Helmholtz equations are truncated in one- and two-dimensional spaces, and asymptotically compatible schemes are introduced to discretize the resulting truncated problems. Finally, numerical examples are provided to study the "numerical reflections" by PMLs and demonstrate the effectiveness and validity of our nonlocal PML strategy.

preprint · 2020 · arXiv

Quantum Bridge Analytics II: Network Optimization and Combinatorial Chaining for Asset Exchange

Quantum Bridge Analytics relates to methods and systems for hybrid classical-quantum computing, and is devoted to developing tools for bridging classical and quantum computing to gain the benefits of their alliance in the present and enable enhanced practical application of quantum computing in the future. This is the second of a two-part tutorial that surveys key elements of Quantum Bridge Analytics and its applications. Part I focused on the Quadratic Unconstrained Binary Optimization (QUBO) model which is presently the most widely applied optimization model in the quantum computing area, and which unifies a rich variety of combinatorial optimization problems. Part II (the present paper) examines an application that augments the use of QUBO models, by disclosing a context for coordinating QUBO solutions through a model we call the Asset Exchange Problem (AEP). Solutions to the AEP enable individuals or institutions to take fuller advantage of solutions to their QUBO models by exchanges of assets that benefit all participants. Such exchanges are generated by a combination of two optimization technologies, one grounded in network optimization and one based on a new metaheuristic optimization approach called combinatorial chaining. This combination provides a flexibility to solve AEP variants that open the door to additional links to quantum computing applications and additional applications via the Quantum Bridge Analytics perspective. We show how this modeling and solution capability gives rise to an Asset Exchange Technology that embraces a broad range of financial, industrial, scientific and social settings. Examples are presented that show the nature of these processes from a tutorial perspective.

preprint · 2020 · arXiv

Understanding and adjusting the selection bias from a proof-of-concept study to a more confirmatory study

It has long been noticed that the efficacy observed in small early phase studies is generally better than that observed in later larger studies. Historically, the inflation of the efficacy results from early proof-of-concept studies is either ignored, or adjusted empirically using a frequentist or Bayesian approach. In this article, we systematically explained the underlying reason for the inflation of efficacy results in small early phase studies from the perspectives of measurement error models and selection bias. A systematic method was built to adjust the early phase study results from both frequentist and Bayesian perspectives. A hierarchical model was proposed to estimate the distribution of the efficacy for a portfolio of compounds, which can serve as the prior distribution for the Bayesian approach. We showed through theory that the systematic adjustment provides an unbiased estimator for the true mean efficacy for a portfolio of compounds. The adjustment was applied to paired data for the efficacy in early small and later larger studies for a set of compounds in diabetes and immunology. After the adjustment, the bias in the early phase small studies seems to be diminished.

preprint · 2020 · arXiv

Zero-Shot Heterogeneous Transfer Learning from Recommender Systems to Cold-Start Search Retrieval

Many recent advances in neural information retrieval models, which predict top-K items given a query, learn directly from a large training set of (query, item) pairs. However, they are often insufficient when there are many previously unseen (query, item) combinations, often referred to as the cold start problem. Furthermore, the search system can be biased towards items that are frequently shown to a query previously, also known as the 'rich get richer' (a.k.a. feedback loop) problem. In light of these problems, we observed that most online content platforms have both a search and a recommender system that, while having heterogeneous input spaces, can be connected through their common output item space and a shared semantic representation. In this paper, we propose a new Zero-Shot Heterogeneous Transfer Learning framework that transfers learned knowledge from the recommender system component to improve the search component of a content platform. First, it learns representations of items and their natural-language features by predicting (item, item) correlation graphs derived from the recommender system as an auxiliary task. Then, the learned representations are transferred to solve the target search retrieval task, performing query-to-item prediction without having seen any (query, item) pairs in training. We conduct online and offline experiments on one of the world's largest search and recommender systems from Google, and present the results and lessons learned. We demonstrate that the proposed approach can achieve high performance on offline search retrieval tasks, and more importantly, achieved significant improvements on relevance and user interactions over the highly-optimized production system in online experiments.

preprint · 2018 · arXiv

SQUAREM: An R Package for Off-the-Shelf Acceleration of EM, MM and Other EM-like Monotone Algorithms

We discuss the R package SQUAREM for accelerating iterative algorithms which exhibit slow, monotone convergence. These include the well-known expectation-maximization algorithm, majorize-minimize (MM), and other EM-like algorithms such as expectation conditional maximization, and generalized EM algorithms. We demonstrate the simplicity, generality, and power of SQUAREM through a wide array of applications of EM/MM problems, including binary Poisson mixture, factor analysis, interval censoring, genetics admixture, and logistic regression maximum likelihood estimation (an MM problem). We show that SQUAREM is easy to apply, and can accelerate any fixed-point, smooth, contraction mapping with linear convergence rate. The squared iterative scheme (Squarem) algorithm provides a significant speed-up of EM-like algorithms. The margin of the advantage for Squarem is especially large for high-dimensional problems or when the EM step is relatively time-consuming to evaluate. Squarem can be used off-the-shelf since there is no need for the user to tweak any control parameters to optimize performance. Given its remarkable ease of use, Squarem may be considered as a default accelerator for slowly converging EM-like algorithms. All the comparisons of CPU computing time in the paper are made on a quad-core 2.3 GHz Intel Core i7 Mac computer. The R package SQUAREM can be downloaded at https://cran.r-project.org/web/packages/SQUAREM/index.html.
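To illustrate the idea of accelerating a slowly converging fixed-point map, here is a minimal Python sketch of a squared extrapolation step. It is not the SQUAREM package's code: the function name `squarem_sketch`, the tolerance, and the iteration cap are hypothetical, the step length `alpha = -||r|| / ||v||` is assumed as one step-length choice described in the Squarem literature, and the safeguards a production implementation would need (step-length bounds, monotonicity checks) are omitted.

```python
import math

def squarem_sketch(fixed_point_map, x0, tol=1e-10, max_iter=1000):
    """Accelerate the iteration x <- fixed_point_map(x) with a squared
    extrapolation step. Hypothetical sketch, not the package's code."""
    x = list(x0)
    for iteration in range(1, max_iter + 1):
        x1 = fixed_point_map(x)   # first plain update
        x2 = fixed_point_map(x1)  # second plain update
        r = [b - a for a, b in zip(x, x1)]                 # first difference
        v = [(c - b) - d for b, c, d in zip(x1, x2, r)]    # second difference
        norm_r = math.sqrt(sum(t * t for t in r))
        norm_v = math.sqrt(sum(t * t for t in v))
        if norm_r < tol:
            # The plain update no longer moves the iterate: converged.
            return x1, iteration
        if norm_v == 0.0:
            # No curvature information; fall back to the plain updates.
            x = x2
            continue
        alpha = -norm_r / norm_v  # assumed step-length choice
        x_extrap = [a - 2.0 * alpha * d + alpha * alpha * e
                    for a, d, e in zip(x, r, v)]
        # One more map evaluation stabilises the extrapolated point.
        x = fixed_point_map(x_extrap)
    return x, max_iter

# Usage: a slowly converging linear contraction with fixed point 10.
solution, iterations = squarem_sketch(lambda x: [0.9 * x[0] + 1.0], [0.0])
print(solution, iterations)
```

Each pass costs three evaluations of the map, so the approach pays off when the plain iteration would need many more steps than that to make the same progress.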

preprint · 2016 · arXiv

Rate of Convergence of the Bundle Method

We prove that the bundle method for nonsmooth optimization achieves solution accuracy $\varepsilon$ in at most $\mathcal{O}\big(\ln(1/\varepsilon)/\varepsilon\big)$ iterations, if the function is strongly convex. The result is true for the versions of the method with multiple cuts and with cut aggregation.

preprint · 2016 · arXiv

Selective Linearization For Multi-Block Convex Optimization

We consider the problem of minimizing a sum of several convex non-smooth functions. We introduce a new algorithm called the selective linearization method, which iteratively linearizes all but one of the functions and employs simple proximal steps. The algorithm is a form of multiple operator splitting in which the order of processing partial functions is not fixed, but rather determined in the course of calculations. Global convergence is proved and estimates of the convergence rate are derived. Specifically, the number of iterations needed to achieve solution accuracy $\varepsilon$ is of order $\mathcal{O}\big(\ln(1/\varepsilon)/\varepsilon\big)$. We also illustrate the operation of the algorithm on structured regularization problems.

preprint · 2016 · arXiv

Superconvergence analysis of DG-FEM based on the polynomial preserving recovery for Helmholtz equation with high wave number

We study the superconvergence property of the linear discontinuous Galerkin finite element method with the polynomial preserving recovery (PPR) and Richardson extrapolation for the two-dimensional Helmholtz equation. The error estimate with explicit dependence on the wave number $k$, the penalty parameter $\mu$ and the mesh condition parameter $\alpha$ is derived. First, we prove that under the assumption $k(kh)^2\leq C_0$ ($h$ is the mesh size) and a certain mesh condition, the estimate between the finite element solution and the linear interpolation of the exact solution is superconvergent under the $\norme{\cdot}$-seminorm. Second, we prove a superconvergence result for the recovered gradient by PPR. Furthermore, we estimate the error between the finite element gradient and the recovered gradient, which motivates the definition of an a posteriori error estimator. Finally, some numerical examples are provided to confirm the theoretical results of the superconvergence analysis. All theoretical findings are verified by numerical tests.

preprint · 2014 · arXiv

Preasymptotic error analysis of higher order FEM and CIP-FEM for Helmholtz equation with high wave number

A preasymptotic error analysis of the finite element method (FEM) and some continuous interior penalty finite element method (CIP-FEM) for the Helmholtz equation in two and three dimensions is presented. $H^1$- and $L^2$-error estimates with explicit dependence on the wave number $k$ are derived. In particular, it is shown that if $k^{2p+1}h^{2p}$ is sufficiently small, then the pollution errors of both methods in the $H^1$-norm are bounded by $O(k^{2p+1}h^{2p})$, which coincides with the phase error of the FEM obtained by existing dispersion analyses on Cartesian grids, where $h$ is the mesh size and $p$ is the order of the approximation space and is fixed. The CIP-FEM extends the classical one by adding more penalty terms on jumps of higher (up to $p$-th order) normal derivatives in order to efficiently reduce the pollution errors of higher order methods. Numerical tests are provided to verify the theoretical findings and to illustrate the great capability of the CIP-FEM in reducing the pollution effect.
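As a rough illustration of what the quantity $k^{2p+1}h^{2p}$ implies for mesh design, the sketch below solves $k^{2p+1}h^{2p} \le \text{tol}$ for the mesh size, giving $h \le (\text{tol}/k^{2p+1})^{1/(2p)}$. The function names and the tolerance value are hypothetical, and the unspecified constant hidden in the $O(\cdot)$ bound is assumed to be 1, so the numbers only indicate scaling, not actual error levels.

```python
import math

def max_mesh_size(k, p, tol):
    """Largest h satisfying k**(2p+1) * h**(2p) <= tol.
    Assumes the constant in the O(.) bound is 1 (illustration only)."""
    return (tol / k ** (2 * p + 1)) ** (1.0 / (2 * p))

def points_per_wavelength(k, h):
    """Mesh points per wavelength, with wavelength 2*pi/k."""
    return 2.0 * math.pi / (k * h)

# Compare the resolution requirement for several polynomial orders
# at a fixed wave number and a hypothetical tolerance.
for p in (1, 2, 3):
    h = max_mesh_size(k=100.0, p=p, tol=0.1)
    print(p, h, points_per_wavelength(100.0, h))
```

Under this assumption the admissible mesh size grows with the order $p$ at a fixed wave number, which is consistent with the abstract's emphasis on higher order methods for reducing pollution.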

preprint · 2005 · arXiv

On Graded Bialgebra Deformations

We introduce graded bialgebra deformations, which explain Andruskiewitsch-Schneider's lifting method. We also relate these graded bialgebra deformations to the corresponding graded bialgebra cohomology groups, a graded version of those due to Gerstenhaber-Schack.
