Source author record

Jing Gu

Jing Gu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language · Artificial Intelligence · Machine Learning · Computer Vision · Distributed, Parallel, and Cluster Computing · physics.chem-ph

Catalog footprint

What is connected

8 works

6 topics

4 close collaborators


Preprint · 2022 · arXiv

Memformer: A Memory-Augmented Transformer for Sequence Modeling

Transformers have achieved remarkable success in sequence modeling. However, these models have efficiency issues because they must store the token-level representations of the entire history as memory. We present Memformer, an efficient neural network for sequence modeling that uses an external dynamic memory to encode and retrieve past information. Our model achieves linear time complexity and constant memory space complexity when processing long sequences. We also propose a new optimization scheme, memory replay back-propagation (MRBP), which promotes long-range back-propagation through time with a significantly reduced memory requirement. Experimental results show that Memformer achieves performance comparable to the baselines while using 8.1x less memory and running inference 3.2x faster. Analysis of the attention pattern shows that the external memory slots can encode and retain important information across timesteps.

Preprint · 2022 · arXiv

RobustScaler: QoS-Aware Autoscaling for Complex Workloads

Autoscaling is a critical component for efficient resource utilization with satisfactory quality of service (QoS) in cloud computing. This paper investigates proactive autoscaling for widely used scaling-per-query applications, where scaling is required for each query, such as container registries and function-as-a-service (FaaS). In these scenarios, the workload often exhibits high uncertainty with complex temporal patterns such as periodicity, noise and outliers. Conservative strategies that scale out unnecessarily many instances lead to high resource costs, whereas aggressive strategies may result in poor QoS. We present RobustScaler to achieve a superior trade-off between cost and QoS. Specifically, we design a novel autoscaling framework based on non-homogeneous Poisson process (NHPP) modeling and stochastically constrained optimization. Furthermore, we develop a specialized alternating direction method of multipliers (ADMM) to efficiently train the NHPP model, and rigorously prove the QoS guarantees delivered by our optimization-based proactive strategies. Extensive experiments show that RobustScaler outperforms common baseline autoscaling strategies on various real-world traces, with large margins for complex workload patterns.
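RobustScaler's actual NHPP model and ADMM solver are not reproduced here. As a much-simplified illustration of QoS-constrained provisioning under a Poisson arrival model: if the number of queries N in the next window is Poisson with mean Λ = ∫λ(t)dt, pre-provisioning the smallest k with P(N ≤ k) ≥ 1 − ε means every query finds a ready instance with probability at least 1 − ε. The piecewise-constant intensity, the single-window setting, and the helper names `expected_arrivals` and `instances_for_qos` are hypothetical assumptions for this sketch, not the paper's formulation.

```python
import math
from typing import Sequence


def expected_arrivals(rates: Sequence[float], bin_width: float) -> float:
    """Integrate a piecewise-constant intensity lambda(t) over one window.

    Hypothetical input: rates[i] is the assumed arrival rate
    (queries per unit time) in the i-th bin of width bin_width.
    """
    return sum(rates) * bin_width


def instances_for_qos(mean_arrivals: float, target: float) -> int:
    """Smallest k such that P(N <= k) >= target, with N ~ Poisson(mean_arrivals).

    Simplified model (assumption): if k instances are pre-provisioned,
    every query in the window is served with probability >= target.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    if not 0.0 <= mean_arrivals <= 700.0:
        # exp(-mean) underflows for much larger means; use log-space there.
        raise ValueError("mean_arrivals must be in [0, 700] for this sketch")
    k = 0
    pmf = math.exp(-mean_arrivals)  # P(N = 0)
    cdf = pmf
    # Accumulate the Poisson CDF term by term until it reaches the target.
    while cdf < target and pmf > 0.0:
        k += 1
        pmf *= mean_arrivals / k  # P(N = k) = P(N = k - 1) * mean / k
        cdf += pmf
    return k
```

For example, rates of 1.0 and 3.0 queries per unit time over two bins of width 0.5 give Λ = 2.0, and a 95% target yields k = 5, since P(N ≤ 4) ≈ 0.947 and P(N ≤ 5) ≈ 0.983 for Poisson(2). Raising the target buys QoS at the cost of more idle instances, which is the cost/QoS trade-off the abstract describes.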

Preprint · 2022 · arXiv

Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks. Vision-and-Language Navigation (VLN) is a fundamental and interdisciplinary research topic towards this goal, and receives increasing attention from natural language processing, computer vision, robotics, and machine learning communities. In this paper, we review contemporary studies in the emerging field of VLN, covering tasks, evaluation metrics, methods, etc. Through structured analysis of current progress and challenges, we highlight the limitations of current VLN and opportunities for future work. This paper serves as a thorough reference for the VLN research community.

Preprint · 2021 · arXiv

ChainCQG: Flow-Aware Conversational Question Generation

Conversational systems enable numerous valuable applications, and question answering is an important component underlying many of these. However, conversational question answering remains challenging due to the lack of realistic, domain-specific training data. Motivated by this bottleneck, we focus on conversational question generation as a means to generate synthetic conversations for training and evaluation purposes. We present a number of novel strategies to improve conversational flow, accommodate varying question types, and improve overall fluidity. Specifically, we design ChainCQG as a two-stage architecture that learns question-answer representations across multiple dialogue turns using a flow propagation training strategy. ChainCQG significantly outperforms both answer-aware and answer-unaware SOTA baselines (e.g., up to 48% BLEU-1 improvement). Additionally, our model is able to generate different types of questions, with improved fluidity and coreference alignment.

Preprint · 2020 · arXiv

A Tailored Pre-Training Model for Task-Oriented Dialog Generation

The recent success of large pre-trained language models such as BERT and GPT-2 suggests the effectiveness of incorporating language priors in downstream dialog generation tasks. However, the performance of pre-trained models on dialog tasks falls short of expectations. In this paper, we propose a Pre-trained Role Alternating Language model (PRAL), designed specifically for task-oriented conversational systems. Following Wu et al. (2019), we model the two speakers separately. We also design several techniques, such as start position randomization, knowledge distillation, and history discount, to improve pre-training performance. We introduce a task-oriented dialog pre-training dataset built by cleaning 13 existing datasets. We test PRAL on three different downstream tasks. The results show that PRAL performs better than or on par with state-of-the-art methods.

Preprint · 2020 · arXiv

Data Annealing for Informal Language Understanding Tasks

There is a large performance gap between formal and informal language understanding tasks. Recent pre-trained models that improved performance on formal language understanding tasks did not achieve comparable results on informal language. We propose a data annealing transfer learning procedure to bridge the performance gap on informal natural language understanding tasks. It successfully applies a pre-trained model such as BERT to informal language. In our data annealing procedure, the training set initially contains mainly formal text data; the proportion of informal text data is then gradually increased during training. Our data annealing procedure is model-independent and can be applied to various tasks. We validate its effectiveness in exhaustive experiments. When BERT is trained with our procedure, it outperforms all state-of-the-art models on the three common informal language tasks.
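The annealing schedule described above can be pictured as a per-batch sampling rule whose informal share grows over training. The following is a minimal sketch, assuming a linear ramp of the informal share from 10% to 100% and mixing at the batch level; the paper's actual schedule and the helper names `informal_fraction` and `annealed_batch` are illustrative assumptions, not the authors' code.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def informal_fraction(step: int, total_steps: int,
                      start: float = 0.1, end: float = 1.0) -> float:
    """Hypothetical linear schedule for the informal share of each batch.

    Training starts mostly formal (`start`) and ramps to `end` at the last step.
    """
    if total_steps <= 1:
        return end
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + (end - start) * progress


def annealed_batch(formal: Sequence[T], informal: Sequence[T],
                   step: int, total_steps: int, batch_size: int,
                   rng: random.Random) -> List[T]:
    """Sample one training batch whose formal/informal mix follows the schedule."""
    n_informal = round(batch_size * informal_fraction(step, total_steps))
    n_formal = batch_size - n_informal
    # Sample with replacement from each pool, then shuffle the mixed batch.
    batch = rng.choices(informal, k=n_informal) + rng.choices(formal, k=n_formal)
    rng.shuffle(batch)
    return batch
```

With 10 steps and a batch size of 10, the number of informal examples per batch rises from 1 at the first step to 10 at the last. Because the schedule only changes which data is fed in, the sketch is independent of the model being trained, matching the abstract's claim that the procedure is model-independent.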

Preprint · 2020 · arXiv

Perception Score, A Learned Metric for Open-ended Text Generation Evaluation

Automatic evaluation for open-ended natural language generation tasks remains a challenge. Existing metrics such as BLEU show low correlation with human judgment. We propose a novel and powerful learning-based evaluation metric: Perception Score. The method measures the overall quality of the generation and scores holistically instead of focusing on only one evaluation criterion, such as word overlap. Moreover, it also reports the amount of uncertainty in its evaluation result. By taking this uncertainty into account, Perception Score gives a more accurate evaluation of the generation system. Perception Score achieves state-of-the-art results on two conditional generation tasks and two unconditional generation tasks.

Preprint · 2014 · arXiv

Theoretic Insight into CO2 Reduction at Active Sites of Molybdenum and Tungsten Enzymes: a π Interaction between CO2 and Tungsten Bis-Dithiolene Complexes

The active sites of molybdenum and tungsten enzymes, particularly mononuclear tungsten formate dehydrogenase (FDH), have been investigated theoretically with respect to their interaction with CO2. A clear π interaction is found between the two-electron-reduced metallodithiole moiety and molecular CO2. This weak π bonding is predicted both in the gas phase (-6.0 kcal/mol) and at the aqueous solvation level (-3.6 kcal/mol). The interaction is not limited to CO2 but also extends to its reduction product, formate, in the form of an anion-π interaction (-6.8 kcal/mol in the gas phase and -4.1 kcal/mol in the aqueous solvation model). The Bailar twist angle, varied from 60° to 0°, which governs the structural preference of tungsten dithiolene between octahedral and trigonal prismatic geometry in the restricted structures, has been explored to evaluate these π interrelations with CO2 and formate. When such interactions are taken into account, an octahedral structure, 3 kcal/mol lower in energy, is preferred over the trigonal prismatic one.
