Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
52 works
0 followers
31 topics
4 close collaborators

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Published work

52 published items

preprint · 2026 · arXiv

DataArc-SynData-Toolkit: A Unified Closed-Loop Framework for Multi-Path, Multimodal, and Multilingual Data Synthesis

Synthetic data has emerged as a crucial solution to the data scarcity bottleneck in large language models (LLMs), particularly for specialized domains and low-resource languages. However, the broader adoption of existing synthetic data tools is severely hindered by convoluted workflows, fragmented data standards, and limited scalability across modalities. To address these limitations, we develop DataArc-SynData-Toolkit, an open-source framework featuring: (1) a configuration-driven, end-to-end pipeline equipped with an intuitive visual interface and simplified CLI for exceptional usability; (2) a unified, quality-controllable synthesis paradigm that standardizes multi-source data generation to ensure high reusability; and (3) a highly modular architecture designed for seamless multimodal, multilingual, and multi-task adaptation. We apply the toolkit in multiple application scenarios. Experimental results demonstrate that our toolkit achieves an optimal balance between generation efficiency and data quality. By offering an end-to-end and visually interactive pipeline, DataArc-SynData-Toolkit significantly lowers the technical barrier to synthetic data generation and subsequent model training, accelerating its practical deployment in real-world applications.

preprint · 2025 · arXiv

Ultrahigh-Energy Gamma-ray Emission Associated with Black Hole-Jet Systems

Black holes (BHs), among the most intriguing objects in the universe, can manifest themselves through electromagnetic radiation initiated by the accretion flow. Some stellar-mass BHs drive relativistic jets when accreting matter from their companion stars, forming microquasars. Non-thermal emission from the radio to the tera-electronvolt (TeV) gamma-ray band has been observed from microquasars, indicating the acceleration of relativistic particles. Here we report the detection, using the Large High Altitude Air Shower Observatory (LHAASO), of four microquasars (SS 433, V4641 Sgr, GRS 1915+105, MAXI J1820+070) whose spectra extend into the ultrahigh-energy (UHE; photon energy $E>100$ TeV) band, and of one microquasar (Cygnus X-1) whose spectrum approaches 100 TeV. Notably, the total emission associated with SS 433 cannot be interpreted with a single leptonic component. In the UHE band, its emission coincides spatially with a giant atomic cloud, which is consistent with a hadronic origin. An elongated source associated with V4641 Sgr is discovered, with a spectrum continuing up to 800 TeV. The detection of UHE gamma rays demonstrates that accreting BHs and their environments can operate as extremely efficient particle accelerators beyond 1 peta-electronvolt (PeV), suggesting that microquasars are important contributors to Galactic cosmic rays, especially around the 'knee' region.

preprint · 2024 · arXiv

Fast KV-Switching and Dual-Layer Flat-Panel Detector Enabled Cone-Beam CT Joint Spectral Imaging

Purpose: Fast kV-switching (FKS) and dual-layer flat-panel detector (DL-FPD) technologies have been actively studied as promising dual-energy solutions for FPD-based cone-beam computed tomography (CBCT). However, CBCT spectral imaging is known to face challenges in obtaining accurate and robust material discrimination because of limited energy separation. To further improve CBCT spectral imaging capability, this work proposes a source-detector joint spectral imaging solution that takes advantage of both FKS and DL-FPD, and conducts a feasibility study on the first tabletop CBCT system developed with this joint spectral imaging capability. Methods: The first FKS and DL-FPD jointly enabled multi-energy tabletop CBCT system was developed in our laboratory. To evaluate its spectral imaging performance, a set of physics experiments was conducted in which multi-energy and head phantoms were scanned using 80/105/130 kVp switching pairs and projection data were collected with a prototype DL-FPD. To compensate for the slight angular mismatch between the low- and high-energy projections in FKS, a dual-domain projection completion scheme was implemented. Material decomposition was then carried out with the maximum-likelihood method, followed by reconstruction of basis-material and virtual monochromatic images. Results: The physics experiments confirmed the feasibility and superiority of joint spectral imaging: the CNR of the multi-energy phantom improved on average by 21.9% and 20.4% for water and by 32.8% and 62.8% for iodine when compared with that of FKS and DL-FPD in fan-beam and cone-beam experiments, respectively. Conclusions: A feasibility study of joint spectral imaging for CBCT using both FKS and DL-FPD was conducted on the first tabletop CBCT system developed with this capability.

preprint · 2023 · arXiv

Hierarchical Reinforcement Learning Based Traffic Steering in Multi-RAT 5G Deployments

In 5G non-standalone mode, an intelligent traffic steering mechanism can greatly help ensure a smooth user experience by selecting the best radio access technology (RAT) in a multi-RAT environment for a specific traffic flow. In this paper, we propose a novel load-aware traffic steering algorithm based on hierarchical reinforcement learning (HRL) that satisfies the diverse QoS requirements of different traffic types. HRL can significantly increase system performance using a bi-level architecture with a meta-controller and a controller. In our proposed method, the meta-controller provides an appropriate threshold for load balancing, while the controller at the lower level admits traffic to an appropriate RAT. Simulation results show that HRL outperforms a deep Q-learning (DQN) baseline and a threshold-based heuristic baseline, achieving 8.49% and 12.52% higher average system throughput and 27.74% and 39.13% lower network delay, respectively.

preprint · 2023 · arXiv

Hierarchical Reinforcement Learning for RIS-Assisted Energy-Efficient RAN

Reconfigurable intelligent surface (RIS) is emerging as a promising technology to boost the energy efficiency (EE) of beyond-5G and 6G networks. Inspired by this potential, in this paper we investigate RIS-assisted energy-efficient radio access networks (RANs). In particular, we combine RIS with sleep control techniques and develop a hierarchical reinforcement learning (HRL) algorithm for network management. In HRL, the meta-controller decides the on/off status of the small base stations (SBSs) in heterogeneous networks, while the sub-controller changes the transmission power levels of the SBSs to save energy. Simulations show that RIS-assisted sleep control achieves significantly lower energy consumption, higher throughput, and more than double the energy efficiency of the no-RIS case.

preprint · 2023 · arXiv

Seismic Wave Scattering and Dissipation in Fractured Shales

Seismic attenuation in granular porous media is of paramount importance in rock physics and seismology. Unlike sandstones, shales are mixtures of sand grains and clays with extremely low porosity and permeability. Swelling of clays upon wetting induces microcracks at grain-clay interfaces and results in strong elastic wave scattering. Such scattering prevents adequate measurement of absorption from ballistic-wave attenuation. Here we infer this intrinsic attenuation from multiply scattered waves, as in seismology and ultrasonics. We find that increasing confining pressure reduces the scattering attenuation by closing microcracks but, surprisingly, increases the absorption, likely due to the viscous dissipation associated with more liquid adsorbed in clays and at grain surfaces. We also observe that cyclic heating and cooling causes clay shrinkage, microcrack growth, and the nucleation of macro-fractures. This leads to predominantly chaotic reverberation in the fractured shale. Numerical simulations based on X-ray tomography of the fractured sample confirm the multiple scattering behavior and reveal an increase in a characteristic length from the initially intact to the finally fractured shale. This study helps to improve acoustic techniques for multiscale exploration of gas and oil in shales and other fractured rocks.
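Inferring intrinsic attenuation from multiply scattered waves can be illustrated with the diffusion approximation commonly used for coda analysis in ultrasonics. This is a minimal sketch under stated assumptions, not the authors' processing chain: it assumes 3D diffusion and a receiver close enough to the source (or a late enough time window) that the geometric term $r^2/4Dt$ is negligible, so the energy envelope behaves as $E(t) \propto t^{-3/2} e^{-t/\tau_a}$. A least-squares fit of $\ln(E\,t^{3/2})$ against $t$ then gives the absorption time $\tau_a$, and $Q_i^{-1} = 1/(2\pi f \tau_a)$. The function names and the synthetic data are illustrative.

```python
import math


def absorption_time(times, energies):
    """Fit ln(E * t**1.5) = c - t / tau_a by ordinary least squares and return tau_a.

    Assumes a 3D diffusive energy envelope E(t) ~ t**-1.5 * exp(-t / tau_a),
    i.e. the r**2 / (4 D t) term is negligible in the fitted window.
    """
    xs = list(times)
    # Remove the diffusive t**-1.5 decay so only the exponential absorption remains.
    ys = [math.log(e) + 1.5 * math.log(t) for t, e in zip(xs, energies)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # equals -1 / tau_a for a purely diffusive envelope
    return -1.0 / slope


def intrinsic_q_inverse(tau_a, frequency_hz):
    """Intrinsic attenuation Q_i^-1 = 1 / (omega * tau_a) at the given frequency."""
    return 1.0 / (2.0 * math.pi * frequency_hz * tau_a)


if __name__ == "__main__":
    # Synthetic envelope with a known absorption time of 200 microseconds.
    tau_true = 2e-4
    ts = [1e-5 * k for k in range(1, 51)]
    es = [3.0 * t ** -1.5 * math.exp(-t / tau_true) for t in ts]
    tau = absorption_time(ts, es)
    print(tau, intrinsic_q_inverse(tau, 500e3))
```

On real coda data the fit window matters: early times are still dominated by ballistic arrivals and the geometric term, so the fit is usually restricted to the late, diffusive part of the signal.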

preprint · 2023 · arXiv

Traffic Steering for 5G Multi-RAT Deployments using Deep Reinforcement Learning

In 5G non-standalone mode, traffic steering is a critical technique for taking full advantage of 5G New Radio while optimizing dual connectivity of 5G and LTE networks in multi-RAT (multiple radio access technology) deployments. An intelligent traffic steering mechanism can play an important role in maintaining a seamless user experience by dynamically choosing the appropriate RAT (5G or LTE) for a specific user traffic flow with given QoS requirements. In this paper, we propose a novel traffic steering mechanism based on deep Q-learning that can automate traffic steering decisions in a dynamic environment with multiple RATs and maintain diverse QoS requirements for different traffic classes. The proposed method is compared with two baseline algorithms: a heuristic-based algorithm and Q-learning-based traffic steering. Compared with the Q-learning and heuristic baselines, the proposed algorithm achieves 6% and 10% higher average system throughput and 23% and 33% lower network delay, respectively.
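The Q-learning baseline mentioned above rests on the standard update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$. The sketch below applies tabular Q-learning to a toy RAT-selection problem. The state encoding (traffic class, whether NR is loaded), the reward function, and all hyperparameters are hypothetical and chosen only for illustration; this is not the paper's deep Q-learning method, which replaces the table with a neural network.

```python
import random

ACTIONS = ("LTE", "NR")  # candidate RATs for a flow
STATES = [(c, loaded) for c in (0, 1) for loaded in (False, True)]


def toy_reward(state, action):
    """Hypothetical reward: class-0 flows always prefer NR; class-1 flows avoid a loaded NR."""
    traffic_class, nr_loaded = state
    if action == "NR":
        return -0.5 if (traffic_class == 1 and nr_loaded) else 1.0
    return 0.5 if traffic_class == 1 else 0.0


def train(steps=50_000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = rng.choice(STATES)
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        reward = toy_reward(state, action)
        next_state = rng.choice(STATES)  # toy model: the next flow arrives independently
        target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (target - q[(state, action)])
        state = next_state
    return q


def greedy_policy(q):
    """Pick the RAT with the highest learned Q-value in each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

With these toy rewards the learned policy sends every flow to NR except class-1 flows when NR is loaded, which go to LTE.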

preprint · 2022 · arXiv

latent-GLAT: Glancing at Latent Variables for Parallel Text Generation

Recently, parallel text generation has received widespread attention due to its high generation efficiency. Although many advanced techniques have been proposed to improve its generation quality, they still rely on an autoregressive model during training to overcome the one-to-many multi-modal phenomenon in the dataset, which limits their applications. In this paper, we propose latent-GLAT, which employs discrete latent variables to capture word categorical information and invokes an advanced curriculum learning technique, alleviating the multi-modality problem. Experimental results show that our method outperforms strong baselines without the help of an autoregressive model, which further broadens the application scenarios of the parallel decoding paradigm.

preprint · 2022 · arXiv

Backward Euler method for stochastic differential equations with non-Lipschitz coefficients

We study the traditional backward Euler method for $m$-dimensional stochastic differential equations driven by fractional Brownian motion with Hurst parameter $H > 1/2$ whose drift coefficient satisfies the one-sided Lipschitz condition. The backward Euler scheme is proved to be of order $1$ and this rate is optimal by showing the asymptotic error distribution result. Two numerical experiments are performed to validate our claims about the optimality of the rate of convergence.
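The backward (implicit) Euler step evaluates the drift at the new point, so each step solves $X_{n+1} = X_n + b(X_{n+1})h + \sigma(B^H_{t_{n+1}} - B^H_{t_n})$ for $X_{n+1}$. The sketch below is illustrative only and adds assumptions not in the abstract: a scalar state, additive noise with constant $\sigma$, the one-sided Lipschitz (but not globally Lipschitz) drift $b(x) = x - x^3$ as an example, Newton iterations for the implicit equation, and exact fBm sampling via a Cholesky factorization of the covariance $\tfrac12(s^{2H} + t^{2H} - |t-s|^{2H})$, which costs $O(n^3)$ and is only sensible for small grids.

```python
import math
import random


def fbm_increments(n, T, H, seed=0):
    """Sample increments of fractional Brownian motion on a uniform grid of n steps over [0, T]."""
    h = T / n
    t = [h * (i + 1) for i in range(n)]
    cov = [[0.5 * (ti ** (2 * H) + tj ** (2 * H) - abs(ti - tj) ** (2 * H)) for tj in t] for ti in t]
    # Cholesky factorization cov = L L^T (cov is positive definite for 0 < H < 1).
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    path = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    return [path[0]] + [path[i] - path[i - 1] for i in range(1, n)]


def backward_euler(b, db, sigma, x0, increments, h, max_newton=50, tol=1e-13):
    """Implicit Euler with additive noise: solve y - h*b(y) = x_n + sigma*dB_n for y = x_{n+1}."""
    xs = [x0]
    x = x0
    for dB in increments:
        rhs = x + sigma * dB
        y = x  # previous value as the Newton starting point
        for _ in range(max_newton):
            step = (y - h * b(y) - rhs) / (1.0 - h * db(y))
            y -= step
            if abs(step) < tol:
                break
        x = y
        xs.append(x)
    return xs


def drift(x):
    return x - x ** 3  # one-sided Lipschitz with constant 1, not globally Lipschitz


def drift_prime(x):
    return 1.0 - 3.0 * x ** 2


if __name__ == "__main__":
    n, T, H = 64, 1.0, 0.7
    dB = fbm_increments(n, T, H, seed=1)
    path = backward_euler(drift, drift_prime, 0.5, 2.0, dB, T / n)
    print(path[-1])
```

For this drift the Newton denominator $1 - h(1 - 3y^2)$ stays positive whenever $h < 1$, so the implicit equation has a unique solution at every step; that is the practical payoff of the one-sided Lipschitz condition for implicit schemes.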

preprint · 2022 · arXiv

Congested Urban Networks Tend to Be Insensitive to Signal Settings: Implications for Learning-Based Control

This paper highlights several properties of large urban networks that can affect machine learning methods applied to traffic signal control. In particular, we show that the average network flow tends to become independent of the signal control policy as density increases. This property, which has so far gone largely unnoticed, implies that deep reinforcement learning (DRL) methods become ineffective when trained under congested conditions, and might explain DRL's limited success in traffic signal control. Our results apply to all possible grid networks thanks to a parametrization based on two network parameters: the ratio of the expected distance between consecutive traffic lights to the expected green time, and the turning probability at intersections. Networks with different parameters exhibit very different responses to traffic signal control. Notably, we found that no control (i.e., a random policy) can be an effective control strategy for a surprisingly large family of networks. The turning probability turned out to have a very significant impact on both baseline and DRL policies. It also explains the loss of symmetry observed for these policies, which is not captured by existing theories that rely on corridor approximations without turns. Our findings also suggest that supervised learning methods have enormous potential, as they require very few examples to produce excellent policies.

preprint · 2022 · arXiv

Consecutive Decoding for Speech-to-text Translation

Speech-to-text translation (ST), which directly translates speech in the source language into text in the target language, has recently attracted intensive attention. However, combining speech recognition and machine translation in a single model places a heavy burden on the direct cross-modal, cross-lingual mapping. To reduce the learning difficulty, we propose COnSecutive Transcription and Translation (COSTT), an integral approach to speech-to-text translation. The key idea is to generate the source transcript and the target translation with a single decoder. This benefits model training, because additional large parallel text corpora can be fully exploited to enhance speech translation training. Our method is verified on three mainstream datasets: the Augmented LibriSpeech English-French dataset, the IWSLT2018 English-German dataset, and the TED English-Chinese dataset. Experiments show that COSTT outperforms or is on par with previous state-of-the-art methods on all three datasets. We have released our code at https://github.com/dqqcasia/st.
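The consecutive-decoding idea, one decoder emitting the source transcript followed by the target translation, comes down to how the decoder's target sequence is built and how its output is split back apart. The sketch below shows only that bookkeeping; the delimiter `<sep>` and end token `</s>` are hypothetical placeholders, not tokens confirmed by the paper or its released code.

```python
SEP = "<sep>"  # hypothetical delimiter between transcript and translation
EOS = "</s>"   # hypothetical end-of-sequence token


def build_decoder_target(transcript, translation):
    """Concatenate the source-language transcript and the target-language translation."""
    return list(transcript) + [SEP] + list(translation) + [EOS]


def split_decoder_output(tokens):
    """Recover (transcript, translation) from one consecutive decoder output."""
    if EOS in tokens:
        tokens = tokens[: tokens.index(EOS)]
    if SEP not in tokens:
        return list(tokens), []  # decoding stopped before the translation started
    cut = tokens.index(SEP)
    return list(tokens[:cut]), list(tokens[cut + 1 :])


if __name__ == "__main__":
    target = build_decoder_target(["hello", "world"], ["bonjour", "le", "monde"])
    print(target)
    print(split_decoder_output(target))
```

Because the translation half is generated after the transcript, it can condition on the decoded transcript, which is what lets one decoder cover both recognition and translation.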

preprint · 2022 · arXiv

Contextual Representation Learning beyond Masked Language Modeling

How do masked language models (MLMs) such as BERT learn contextual representations? In this work, we analyze the learning dynamics of MLMs. We find that MLMs adopt sampled embeddings as anchors to estimate and inject contextual semantics into representations, which limits the efficiency and effectiveness of MLMs. To address these issues, we propose TACO, a simple yet effective representation learning approach that directly models global semantics. TACO extracts and aligns contextual semantics hidden in contextualized representations to encourage models to attend to global semantics when generating contextualized representations. Experiments on the GLUE benchmark show that TACO achieves up to a 5x speedup and up to a 1.2-point average improvement over existing MLMs. The code is available at https://github.com/FUZHIYI/TACO.

preprint · 2022 · arXiv

Deep Reinforcement Learning-based Radio Resource Allocation and Beam Management under Location Uncertainty in 5G mmWave Networks

Millimeter wave (mmWave) is an important part of 5G New Radio (NR), in which highly directional beams are used to compensate for the substantial propagation loss, based on UE locations. However, the location information may contain errors, such as GPS errors; some uncertainty and localization error are unavoidable in most settings. Using these distorted locations for clustering increases the error of beam management. Meanwhile, traffic demand may change dynamically in the wireless environment. Therefore, a scheme that can handle both localization uncertainty and dynamic radio resource allocation is needed. In this paper, we propose a UK-means-based clustering and deep reinforcement learning-based resource allocation algorithm (UK-DRL) for radio resource allocation and beam management in 5G mmWave networks. We first apply UK-means as the clustering algorithm to mitigate the localization uncertainty; deep reinforcement learning (DRL) is then adopted to dynamically allocate radio resources. Finally, we compare UK-DRL with a K-means-based clustering and DRL-based resource allocation algorithm (K-DRL). Simulations show that the proposed UK-DRL method achieves 150% higher throughput and 61.5% lower delay than K-DRL when the traffic load is 4 Mbps.
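UK-means extends k-means to uncertain objects by assigning each object to the centroid with the smallest expected distance. The sketch below represents each UE's uncertain location by a handful of position samples (an assumption; the paper may model the uncertainty with a probability density instead), uses the expected Euclidean distance for assignment, and updates each centroid as the mean of its members' expected locations. All names and positions are illustrative.

```python
import math
import random


def mean_point(points):
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def expected_distance(samples, centroid):
    """Monte Carlo estimate of E[||X - c||] for an uncertain location X given by samples."""
    return sum(math.dist(s, centroid) for s in samples) / len(samples)


def uk_means(objects, k, iters=20, seed=0):
    """objects: one list of (x, y) location samples per UE."""
    rng = random.Random(seed)
    centroids = [mean_point(o) for o in rng.sample(objects, k)]
    labels = [0] * len(objects)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: expected_distance(o, centroids[j])) for o in objects]
        for j in range(k):
            members = [mean_point(o) for o, lab in zip(objects, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean_point(members)
    return labels, centroids


def jittered(x, y, r=0.5):
    """Hypothetical uncertainty model: four samples around a reported position."""
    return [(x + r, y), (x - r, y), (x, y + r), (x, y - r)]


if __name__ == "__main__":
    ues = [jittered(0, 0), jittered(1, 0), jittered(10, 10), jittered(11, 10)]
    print(uk_means(ues, k=2))
```

Using the plain Euclidean distance matters here: with squared Euclidean distance the expected distance equals the squared distance to the mean plus a variance term that does not depend on the centroid, so the assignment would reduce to ordinary k-means on the expected locations.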

preprint · 2022 · arXiv

Directed Acyclic Transformer for Non-Autoregressive Machine Translation

Non-autoregressive Transformers (NATs) significantly reduce decoding latency by generating all tokens in parallel. However, such independent predictions prevent NATs from capturing the dependencies between tokens for generating multiple possible translations. In this paper, we propose the Directed Acyclic Transformer (DA-Transformer), which represents the hidden states in a directed acyclic graph (DAG), where each path of the DAG corresponds to a specific translation. The whole DAG simultaneously captures multiple translations and facilitates fast predictions in a non-autoregressive fashion. Experiments on the raw training data of WMT benchmarks show that DA-Transformer substantially outperforms previous NATs by about 3 BLEU on average, making it the first NAT model to achieve results competitive with autoregressive Transformers without relying on knowledge distillation.
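When each DAG path corresponds to a translation, the probability of a target sentence is a sum over all paths that could have produced it, and this sum can be computed with a forward dynamic program instead of enumerating paths. The sketch below assumes, for illustration, that paths start at the first vertex and end at the last one, that `trans[u][v]` (for `u < v`) is the transition probability between vertices, and that `emit[v][token]` is the token probability at vertex `v`; the tiny numbers are made up. It is a plain-Python illustration of the idea, not DA-Transformer's implementation.

```python
def dag_likelihood(trans, emit, target):
    """P(target) summed over all paths 0 = a_0 < a_1 < ... < a_{m-1} = L - 1.

    f[i][v] is the probability of emitting target[: i + 1] with the i-th path vertex at v.
    """
    num_vertices, m = len(emit), len(target)
    f = [[0.0] * num_vertices for _ in range(m)]
    f[0][0] = emit[0][target[0]]  # every path starts at vertex 0
    for i in range(1, m):
        for v in range(num_vertices):
            reach = sum(f[i - 1][u] * trans[u][v] for u in range(v))  # only forward edges u < v
            f[i][v] = reach * emit[v][target[i]]
    return f[m - 1][num_vertices - 1]  # every path ends at the last vertex


if __name__ == "__main__":
    trans = [[0.0, 0.6, 0.4], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
    emit = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}, {"a": 0.5, "b": 0.5}]
    print(dag_likelihood(trans, emit, ["a", "b"]))       # only path 0 -> 2
    print(dag_likelihood(trans, emit, ["a", "b", "a"]))  # only path 0 -> 1 -> 2
```

The loop costs O(m * L^2) instead of the exponential number of paths, which is what makes training on a path-marginalized likelihood practical.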

preprint · 2022 · arXiv

Duplex Sequence-to-Sequence Learning for Reversible Machine Translation

Sequence-to-sequence learning naturally has two directions. How can supervision signals from both directions be utilized effectively? Existing approaches require either two separate models or a multitask-learned model with inferior performance. In this paper, we propose REDER (Reversible Duplex Transformer), a parameter-efficient model, and apply it to machine translation. Each end of REDER can simultaneously take input and produce output in a distinct language. REDER thus enables reversible machine translation by simply flipping the input and output ends. Experiments verify that REDER achieves the first success in reversible machine translation, which helps it outperform its multitask-trained baselines by up to 1.3 BLEU.
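Reversible networks are commonly built from additive coupling layers (as in RevNets), whose inverse is exact by construction even when the inner sub-layers are not invertible. The sketch below is a generic two-stream coupling with made-up sub-functions `F` and `G`; it illustrates why running such a layer backwards is lossless and is not REDER's actual layer definition.

```python
import math


def F(x):
    return [math.tanh(0.5 * v + 0.1) for v in x]  # hypothetical sub-layer


def G(x):
    return [math.sin(v) for v in x]  # hypothetical sub-layer


def coupling_forward(x1, x2):
    """y1 = x1 + F(x2); y2 = x2 + G(y1)."""
    y1 = [a + b for a, b in zip(x1, F(x2))]
    y2 = [a + b for a, b in zip(x2, G(y1))]
    return y1, y2


def coupling_inverse(y1, y2):
    """Undo the forward pass in reverse order: x2 = y2 - G(y1); x1 = y1 - F(x2)."""
    x2 = [a - b for a, b in zip(y2, G(y1))]
    x1 = [a - b for a, b in zip(y1, F(x2))]
    return x1, x2


if __name__ == "__main__":
    x1, x2 = [0.3, -1.2, 2.0], [1.5, 0.0, -0.7]
    print(coupling_inverse(*coupling_forward(x1, x2)))
```

Neither `F` nor `G` is ever inverted; the inverse only re-evaluates them and subtracts, so recovery is exact up to floating-point rounding.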

preprint · 2022 · arXiv

Enhancing Cross-lingual Transfer by Manifold Mixup

Based on large-scale pre-trained multilingual representations, recent cross-lingual transfer methods have achieved impressive transfer performance. However, performance on target languages still lags far behind that on the source language. In this paper, our analyses indicate that this performance gap is strongly associated with the cross-lingual representation discrepancy. To achieve better cross-lingual transfer performance, we propose the cross-lingual manifold mixup (X-Mixup) method, which adaptively calibrates the representation discrepancy and produces a compromise representation for target languages. Experiments on the XTREME benchmark show that X-Mixup achieves 1.8% performance gains on multiple text understanding tasks compared with strong baselines, and significantly reduces the cross-lingual representation discrepancy.

preprint · 2022 · arXiv

Federated Deep Reinforcement Learning for Resource Allocation in O-RAN Slicing

Recently, open radio access network (O-RAN) has become a promising technology to provide an open environment for network vendors and operators. Coordinating the x-applications (xAPPs) is critical to increase flexibility and guarantee high overall network performance in O-RAN. Meanwhile, federated reinforcement learning has been proposed as a promising technique to enhance the collaboration among distributed reinforcement learning agents and improve learning efficiency. In this paper, we propose a federated deep reinforcement learning algorithm to coordinate multiple independent xAPPs in O-RAN for network slicing. We design two xAPPs, namely a power control xAPP and a slice-based resource allocation xAPP, and we use a federated learning model to coordinate two xAPP agents to enhance learning efficiency and improve network performance. Compared with conventional deep reinforcement learning, our proposed algorithm can achieve 11% higher throughput for enhanced mobile broadband (eMBB) slices and 33% lower delay for ultra-reliable low-latency communication (URLLC) slices.
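A common way to coordinate agents in federated learning is federated averaging: each agent trains locally, then the parameters are averaged (optionally weighted, for example by how much data each agent saw) and sent back. The sketch below shows only that aggregation step for two hypothetical xAPP agents whose network weights are flattened into plain lists; it assumes the agents share the same parameter shapes, and whether the paper weights the average, or how often it aggregates, is not stated in the abstract.

```python
def federated_average(agent_params, weights=None):
    """Average same-shaped parameter dicts {name: [floats]} across agents."""
    if weights is None:
        weights = [1.0] * len(agent_params)
    total = sum(weights)
    names = agent_params[0].keys()
    return {
        name: [
            sum(w * params[name][i] for w, params in zip(weights, agent_params)) / total
            for i in range(len(agent_params[0][name]))
        ]
        for name in names
    }


if __name__ == "__main__":
    power_xapp = {"layer1": [1.0, 2.0], "bias": [0.0]}    # hypothetical power-control agent
    slicing_xapp = {"layer1": [3.0, 4.0], "bias": [1.0]}  # hypothetical slicing agent
    print(federated_average([power_xapp, slicing_xapp]))
```

Only parameters are exchanged, never the agents' raw experience, which is the usual reason to prefer this kind of aggregation between independently deployed xAPPs.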

preprint · 2022 · arXiv

Joint Sensing and Communications for Deep Reinforcement Learning-based Beam Management in 6G

User location is critical information for network management and control. However, location uncertainty is unavoidable in certain settings, leading to localization errors. In this paper, we consider user location uncertainty in mmWave networks and investigate joint vision-aided sensing and communications using deep reinforcement learning (DRL)-based beam management for future 6G networks. In particular, we first extract pixel characteristic-based features from satellite images to improve localization accuracy. Then we propose a UK-medoids-based method for user clustering under location uncertainty, and the clustering results are subsequently used for beam management. Finally, we apply the DRL algorithm for intra-beam radio resource allocation. The simulations first show that our proposed vision-aided method can substantially reduce the localization error. The proposed UK-medoids and DRL-based scheme (UKM-DRL) is compared with two other schemes: K-means-based clustering with DRL-based resource allocation (K-DRL), and UK-means-based clustering with DRL-based resource allocation (UK-DRL). The proposed method achieves 17.2% higher throughput and 7.7% lower delay than UK-DRL, and more than double the throughput and 55.8% lower delay than K-DRL.

preprint · 2022 · arXiv

Learning from Peers: Deep Transfer Reinforcement Learning for Joint Radio and Cache Resource Allocation in 5G RAN Slicing

Network slicing is a critical technique for 5G communications that covers radio access network (RAN), edge, transport, and core slicing. The evolving network architecture requires the orchestration of multiple network resources, such as radio and cache resources. In recent years, machine learning (ML) techniques have been widely applied to network management. However, most existing works do not take advantage of the knowledge transfer capability of ML. In this paper, we propose a deep transfer reinforcement learning (DTRL) scheme for joint radio and cache resource allocation to serve 5G RAN slicing. We first define a hierarchical architecture for joint resource allocation. Then we propose two DTRL algorithms: Q-value-based deep transfer reinforcement learning (QDTRL) and action selection-based deep transfer reinforcement learning (ADTRL). In the proposed schemes, learner agents utilize expert agents' knowledge to improve their performance on current tasks. The proposed algorithms are compared with both the model-free exploration bonus deep Q-learning (EB-DQN) algorithm and the model-based priority proportional fairness and time-to-live (PPF-TTL) algorithm. Compared with EB-DQN, our proposed DTRL-based method achieves 21.4% lower delay for the Ultra-Reliable Low-Latency Communications (URLLC) slice and 22.4% higher throughput for the enhanced Mobile Broadband (eMBB) slice, while converging significantly faster than EB-DQN. Moreover, 40.8% lower URLLC delay and 59.8% higher eMBB throughput are observed relative to PPF-TTL.

preprint · 2022 · arXiv

Manual-Guided Dialogue for Flexible Conversational Agents

How to build and use dialogue data efficiently, and how to deploy models across different domains at scale, are two critical issues in building a task-oriented dialogue system. In this paper, we propose a novel manual-guided dialogue scheme to alleviate these problems, in which the agent learns tasks from both dialogues and manuals. The manual is an unstructured textual document that guides the agent in interacting with users and the database during the conversation. Our proposed scheme reduces the dependence of dialogue models on a fine-grained domain ontology and makes them more flexible in adapting to various domains. We then contribute a fully annotated multi-domain dataset, MagDial, to support our scheme. It introduces three dialogue modeling subtasks: instruction matching, argument filling, and response generation. Modeling these subtasks is consistent with human agents' behavior patterns. Experiments demonstrate that the manual-guided dialogue scheme improves data efficiency and domain scalability in building dialogue systems. The dataset and benchmark will be made publicly available to promote future research.

preprint · 2022 · arXiv

MTG: A Benchmark Suite for Multilingual Text Generation

We introduce MTG, a new benchmark suite for training and evaluating multilingual text generation. It is the first multilingual multiway text generation dataset, with the largest amount of human-annotated data (400k). It includes four generation tasks (story generation, question generation, title generation, and text summarization) across five languages (English, German, French, Spanish, and Chinese). The multiway setup enables testing a model's knowledge transfer capabilities across languages and tasks. Using MTG, we train and analyze several popular multilingual generation models from different aspects. Our benchmark suite fosters model performance improvement with more human-annotated parallel data and provides comprehensive evaluations across diverse generation scenarios. Code and data are available at https://github.com/zide05/MTG.

preprint · 2022 · arXiv

On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark

Dialogue safety problems severely limit the real-world deployment of neural conversational models and have recently attracted great research interest. However, dialogue safety problems remain under-defined, and corresponding datasets are scarce. We propose a taxonomy for dialogue safety specifically designed to capture unsafe behaviors in human-bot dialogue settings, with a focus on context-sensitive unsafety, which is under-explored in prior work. To spur research in this direction, we compile DiaSafety, a dataset rich in context-sensitive unsafe examples. Experiments show that existing safety guarding tools fail severely on our dataset. As a remedy, we train a dialogue safety classifier to provide a strong baseline for context-sensitive dialogue unsafety detection. With our classifier, we perform safety evaluations on popular conversational models and show that existing dialogue systems still exhibit concerning context-sensitive safety problems.

preprint · 2022 · arXiv

One-Class Model for Fabric Defect Detection

An automated and accurate fabric defect inspection system is in high demand as a replacement for slow, inconsistent, error-prone, and expensive human operators in the textile industry. Previous efforts focused on specific types of fabrics or defects, which is not an ideal solution. In this paper, we propose a novel one-class model capable of detecting various defects on different fabric types. Our model uses a well-designed Gabor filter bank to analyze fabric texture. We then leverage an autoencoder, an advanced deep learning model, to learn general feature representations from the outputs of the Gabor filter bank. Lastly, we develop a nearest-neighbor density estimator to locate potential defects and draw them on the fabric images. We demonstrate the effectiveness and robustness of the proposed model by testing it on various types of fabrics, such as plain, patterned, and rotated fabrics. Our model also achieves a true positive rate (a.k.a. recall) of 0.895 with no false alarms on our dataset, which is based on the Standard Fabric Defect Glossary.
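A Gabor filter bank of the kind described is a set of oriented, band-limited kernels $g(x,y) = \exp\!\big(-(x'^2 + \gamma^2 y'^2)/(2\sigma^2)\big)\cos(2\pi x'/\lambda + \psi)$, where $(x', y')$ are the coordinates rotated by the orientation $\theta$. The sketch below builds such a bank in plain Python; the kernel size, $\sigma$, wavelengths, aspect ratio $\gamma$, and number of orientations are illustrative choices, not the paper's tuned values.

```python
import math


def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real part of a 2D Gabor kernel on a size x size grid (size should be odd) centred at the origin."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier oscillates along orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(size=15, sigma=3.0, wavelengths=(4.0, 8.0), n_orientations=4):
    """One kernel per (wavelength, orientation) pair, orientations evenly spaced in [0, pi)."""
    return [
        gabor_kernel(size, sigma, k * math.pi / n_orientations, lam)
        for lam in wavelengths
        for k in range(n_orientations)
    ]


if __name__ == "__main__":
    bank = gabor_bank()
    print(len(bank), len(bank[0]), len(bank[0][0]))
```

Convolving a fabric image with each kernel yields one response map per orientation and scale; those stacked responses are the kind of texture features a downstream model, such as the autoencoder described above, can consume.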

preprint · 2022 · arXiv

Rethinking Document-level Neural Machine Translation

This paper does not aim to introduce a novel model for document-level neural machine translation. Instead, we return to the original Transformer model and try to answer the following question: Is the capacity of current models strong enough for document-level translation? Interestingly, we observe that the original Transformer with appropriate training techniques can achieve strong results for document translation, even for documents 2000 words long. We evaluate this model and several recent approaches on nine document-level datasets and two sentence-level datasets across six languages. Experiments show that document-level Transformer models outperform sentence-level ones and many previous methods on a comprehensive set of metrics, including BLEU, four lexical indices, three newly proposed assistant linguistic indicators, and human evaluation.

preprint · 2022 · arXiv

Team Learning-Based Resource Allocation for Open Radio Access Network (O-RAN)

Recently, the concept of open radio access network (O-RAN) has been proposed, which aims to bring intelligence and openness to next-generation radio access networks (RANs). It provides standardized interfaces and the ability to host network applications from third-party vendors as x-applications (xAPPs), which enables greater flexibility in network management. However, this may lead to conflicts in network function implementations, especially when these functions are implemented by different vendors. In this paper, we aim to mitigate the conflicts between xAPPs in the near-real-time (near-RT) radio intelligent controller (RIC) of O-RAN. In particular, we propose a team learning algorithm to enhance network performance by increasing cooperation between xAPPs. We compare the team learning approach with independent deep Q-learning, in which network functions optimize resources individually. Our simulations show that team learning achieves better network performance under various user mobility and traffic loads. With a 6 Mbps traffic load and a 20 m/s user speed, team learning achieves 8% higher throughput and 64.8% lower PDR.

preprint · 2022 · arXiv

Towards Making the Most of BERT in Neural Machine Translation

GPT-2 and BERT demonstrate the effectiveness of using pre-trained language models (LMs) on various natural language processing tasks. However, LM fine-tuning often suffers from catastrophic forgetting when applied to resource-rich tasks. In this work, we introduce a concerted training framework (CTNMT) that is key to integrating pre-trained LMs into neural machine translation (NMT). Our proposed CTNMT consists of three techniques: a) asymptotic distillation to ensure that the NMT model can retain the previous pre-trained knowledge; b) a dynamic switching gate to avoid catastrophic forgetting of pre-trained knowledge; and c) a strategy to adjust the learning paces according to a scheduled policy. Our machine translation experiments show that CTNMT gains up to 3 BLEU on the WMT14 English-German language pair, which even surpasses the previous state-of-the-art pre-training-aided NMT by 1.4 BLEU. On the large WMT14 English-French task with 40 million sentence pairs, our base model still significantly improves upon the state-of-the-art Transformer big model by more than 1 BLEU. The code and model can be downloaded from https://github.com/bytedance/neurst/tree/master/examples/ctnmt.
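A dynamic switching gate can be read as a learned, per-dimension interpolation between the pre-trained LM representation and the NMT encoder representation, for example $g = \sigma(W h_{lm} + U h_{nmt} + b)$ and $h = g \odot h_{lm} + (1-g) \odot h_{nmt}$. The exact form of the gate is an assumption here, not taken from the paper, and the sketch below further simplifies $W$ and $U$ to diagonal weights (one scalar per dimension) to stay dependency-free.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def switching_gate(h_lm, h_nmt, w_lm, w_nmt, bias):
    """Per-dimension gate g mixes the two representations: g * h_lm + (1 - g) * h_nmt."""
    out = []
    for a, b, wa, wb, c in zip(h_lm, h_nmt, w_lm, w_nmt, bias):
        g = sigmoid(wa * a + wb * b + c)
        out.append(g * a + (1.0 - g) * b)
    return out


if __name__ == "__main__":
    print(switching_gate([1.0, -2.0], [3.0, 4.0], [0.5, 0.5], [-0.5, 0.2], [0.0, 0.1]))
```

Driving the gate towards 1 keeps the pre-trained features and towards 0 hands control to the NMT encoder, which is one way such a gate can limit catastrophic forgetting of the pre-trained representation.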

preprint · 2022 · arXiv

Variational Autoencoder Generative Adversarial Network for Synthetic Data Generation in Smart Home

Data is the fuel of data science and machine learning techniques for smart grid applications, as in many other fields. However, data availability can be an issue due to privacy concerns, data size, data quality, and so on. To this end, in this paper, we propose a Variational AutoEncoder Generative Adversarial Network (VAE-GAN) as a smart grid data generative model that is capable of learning various types of data distributions and generating plausible samples from the same distribution without any prior analysis of the data before the training phase. To evaluate the performance of our model, we compared the Kullback-Leibler (KL) divergence, maximum mean discrepancy (MMD), and Wasserstein distance between the real data distribution and the synthetic data (electrical load and PV production) distributions generated by the proposed model and by a vanilla GAN network. Furthermore, we used five key statistical parameters to describe the smart grid data distribution and compared them between the synthetic data generated by both models and the real data. Experiments indicate that the proposed synthetic data generative model outperforms the vanilla GAN network. The distribution of VAE-GAN synthetic data is the closest to that of the real data.
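Two of the distribution distances named above are easy to compute directly from samples in one dimension, e.g. for electrical-load values. The sketch below gives the 1-Wasserstein distance between two equal-size empirical samples (the mean absolute difference of the sorted values) and a biased estimate of squared MMD with a Gaussian kernel. The kernel bandwidth is an arbitrary illustrative choice, and KL divergence is omitted because it first requires a density estimate.

```python
import math


def wasserstein_1d(xs, ys):
    """W1 between two empirical distributions with the same number of samples."""
    if len(xs) != len(ys):
        raise ValueError("this shortcut needs equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    def k(a, b):
        return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

    kxx = sum(k(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(k(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(k(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


if __name__ == "__main__":
    real = [0.1, 0.4, 0.9, 1.3]
    synthetic = [0.2, 0.5, 0.8, 1.6]
    print(wasserstein_1d(real, synthetic), mmd_squared(real, synthetic))
```

Both quantities are zero when the two samples coincide and grow as the synthetic distribution drifts away from the real one, which is what makes them usable as generator-quality scores.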

preprint2021arXiv

Correlated Deep Q-learning based Microgrid Energy Management

Microgrid (MG) energy management is an important part of MG operation. Various entities are generally involved in the energy management of an MG, e.g., the energy storage system (ESS), renewable energy resources (RER) and the load of users, and it is crucial to coordinate these entities. Considering the significant potential of machine learning techniques, this paper proposes a correlated deep Q-learning (CDQN) based technique for MG energy management. Each electrical entity is modeled as an agent with a neural network that predicts its own Q-values, after which the correlated Q-equilibrium is used to coordinate the operation among agents. The deep Q-learning algorithm is based on Long Short-Term Memory (LSTM) networks. The simulation results show 40.9% and 9.62% higher profit for the ESS agent and the photovoltaic (PV) agent, respectively.

preprint2021arXiv

Decentralized Microgrid Energy Management: A Multi-agent Correlated Q-learning Approach

Microgrids (MGs) are anticipated to be important players in the future smart grid. For proper operation of MGs, an Energy Management System (EMS) is essential. The EMS of an MG can be rather complicated when renewable energy resources (RER), energy storage systems (ESS) and demand side management (DSM) need to be orchestrated. Furthermore, these systems may belong to different entities, and competition may exist between them. Nash equilibrium is most commonly used to coordinate such entities; however, the existence and convergence of a Nash equilibrium cannot always be guaranteed. To this end, we use the correlated equilibrium, whose convergence can be guaranteed, to coordinate agents. In this paper, we build an energy trading model based on the mid-market rate and propose a correlated Q-learning (CEQ) algorithm to maximize the revenue of each agent. Our results show that CEQ is able to balance the revenue of agents without harming the total benefit. In addition, compared with Q-learning without correlation, CEQ could save 19.3% of the cost for the DSM agent and yield 44.2% more benefit for the ESS agent.
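
To make "correlated equilibrium" concrete: a joint distribution over action profiles is a correlated equilibrium when no agent can gain, in expectation, by deviating from the action it is recommended. The sketch below only checks these incentive constraints for a two-player matrix game; it is not the CEQ algorithm, which additionally learns Q-values and would select an equilibrium (typically via a linear program). The game of Chicken used in the usage note is a textbook example we chose, not part of the paper.

```python
from itertools import product


def is_correlated_equilibrium(u1, u2, p, tol=1e-9):
    """Check the correlated-equilibrium incentive constraints of a 2-player game.

    u1[a][b], u2[a][b]: payoffs when player 1 plays a and player 2 plays b.
    p[a][b]: probability that the mediator recommends the profile (a, b).
    """
    n1, n2 = len(u1), len(u1[0])
    # Player 1: following recommendation a must be at least as good as any a2.
    for a, a2 in product(range(n1), repeat=2):
        gain = sum(p[a][b] * (u1[a2][b] - u1[a][b]) for b in range(n2))
        if gain > tol:
            return False
    # Player 2: following recommendation b must be at least as good as any b2.
    for b, b2 in product(range(n2), repeat=2):
        gain = sum(p[a][b] * (u2[a][b2] - u2[a][b]) for a in range(n1))
        if gain > tol:
            return False
    return True
```

In Chicken with actions (Dare, Chicken), payoffs `u1 = [[0, 7], [2, 6]]` and `u2 = [[0, 2], [7, 6]]`, putting probability 1/3 on each of (Dare, Chicken), (Chicken, Dare) and (Chicken, Chicken) passes the check, whereas always recommending (Chicken, Chicken) fails because each player would rather dare.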

preprint2021arXiv

Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information

We investigate the following question for machine translation (MT): can we develop a single universal MT model to serve as the common seed and obtain derivative and improved models on arbitrary language pairs? We propose mRASP, an approach to pre-train a universal multilingual neural machine translation model. Our key idea in mRASP is its novel technique of random aligned substitution, which brings words and phrases with similar meanings across multiple languages closer in the representation space. We pre-train an mRASP model on 32 language pairs jointly with only public datasets. The model is then fine-tuned on downstream language pairs to obtain specialized MT models. We carry out extensive experiments on 42 translation directions across diverse settings, including low-, medium-, and rich-resource settings, as well as transfer to exotic language pairs. Experimental results demonstrate that mRASP achieves significant performance improvements compared to directly training on those target pairs. This is the first work to verify that multiple low-resource language pairs can be utilized to improve rich-resource MT. Surprisingly, mRASP is even able to improve the translation quality on exotic languages that never occur in the pre-training corpus. Code, data, and pre-trained models are available at https://github.com/linzehui/mRASP.
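
The idea behind random aligned substitution can be sketched as word-level code-switching: each source word that has an entry in a bilingual dictionary is swapped for one of its translations with some probability, so that translation equivalents appear in the same contexts. This is a simplified illustration under our own assumptions (toy dictionary, independent per-token substitution, hypothetical function name), not mRASP's released preprocessing.

```python
import random


def random_aligned_substitution(tokens, dictionary, prob, rng):
    """Replace each token that has a dictionary entry with a random translation,
    independently with probability `prob`; other tokens are left untouched."""
    out = []
    for tok in tokens:
        candidates = dictionary.get(tok)
        if candidates and rng.random() < prob:
            out.append(rng.choice(candidates))
        else:
            out.append(tok)
    return out
```

With the toy dictionary `{"hello": ["bonjour", "hallo"], "world": ["monde", "Welt"]}` and `prob=1.0`, the sentence `["hello", "big", "world"]` keeps "big" and has both dictionary words replaced by French or German equivalents.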

preprint2021arXiv

Triangular Bidword Generation for Sponsored Search Auction

Sponsored search auction is a crucial component of modern search engines. It requires a set of candidate bidwords that advertisers can place bids on. Existing methods generate bidwords from search queries or advertisement content. However, they suffer from noise in <query, bidword> and <advertisement, bidword> pairs. In this paper, we propose a triangular bidword generation model (TRIDENT), which takes the high-quality data of paired <query, advertisement> as a supervision signal to indirectly guide the bidword generation process. Our proposed model is simple yet effective: by using bidwords as the bridge between search queries and advertisements, the generation of search queries, advertisements and bidwords can be jointly learned in the triangular training framework. This alleviates the problem that the bidword training data may be noisy. Experimental results, including automatic and human evaluations, show that our proposed TRIDENT can generate relevant and diverse bidwords for both search queries and advertisements. Our evaluation on real online data validates the effectiveness of TRIDENT's generated bidwords for product search.

preprint2020arXiv

ACMo: Angle-Calibrated Moment Methods for Stochastic Optimization

Due to its simplicity and outstanding ability to generalize, stochastic gradient descent (SGD) is still the most widely used optimization method despite its slow convergence. Meanwhile, adaptive methods have attracted increasing attention from the optimization and machine learning communities, both for their use of life-long information and for their profound and fundamental mathematical theory. Taking the best of both worlds is the most exciting and challenging question in the field of optimization for machine learning. Along this line, we revisit existing adaptive gradient methods from a novel perspective, refreshing the understanding of second moments. Our new perspective empowers us to attach the properties of second moments to the first moment iteration, and to propose a novel first moment optimizer, the Angle-Calibrated Moment method (ACMo). Our theoretical results show that ACMo is able to achieve the same convergence rate as mainstream adaptive methods. Furthermore, extensive experiments on CV and NLP tasks demonstrate that ACMo has convergence comparable to state-of-the-art Adam-type optimizers and achieves better generalization performance in most cases.

preprint2020arXiv

Adaptive Gradient Methods Can Be Provably Faster than SGD after Finite Epochs

Adaptive gradient methods have attracted much attention from the machine learning community due to their high efficiency. However, their acceleration effect in practice, especially in neural network training, is hard to analyze theoretically. The huge gap between theoretical convergence results and practical performance prevents further understanding of existing optimizers and the development of more advanced optimization methods. In this paper, we provide a novel analysis of adaptive gradient methods under an additional mild assumption, and revise AdaGrad so that it matches a better provable convergence rate. To find an $\epsilon$-approximate first-order stationary point in non-convex objectives, we prove that the revised AdaGrad with random shuffling achieves a $\tilde{O}(T^{-1/2})$ convergence rate, which is significantly improved by factors of $\tilde{O}(T^{-1/4})$ and $\tilde{O}(T^{-1/6})$ compared with existing adaptive gradient methods and random shuffling SGD, respectively. To the best of our knowledge, this is the first demonstration that adaptive gradient methods can deterministically be faster than SGD after finite epochs. Furthermore, we conduct comprehensive experiments to validate the additional mild assumption and the acceleration effect that comes from second moments and random shuffling.
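
For readers unfamiliar with the two ingredients, here is plain diagonal AdaGrad combined with random shuffling (a fresh permutation of the samples every epoch) on a toy least-squares problem. This is the standard AdaGrad update, not the paper's revised variant, whose modification the abstract does not spell out; the data, learning rate and function names are our own illustrative choices.

```python
import math
import random


def adagrad_random_shuffling(data, dim, lr=0.5, epochs=200, eps=1e-8, seed=0):
    """Standard diagonal AdaGrad on 0.5 * (w.x - y)^2, visiting the samples in a
    new random order every epoch (random shuffling, i.e. without replacement)."""
    rng = random.Random(seed)
    w = [0.0] * dim
    g_sq = [0.0] * dim  # running sum of squared gradients, per coordinate
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            residual = sum(wj * xj for wj, xj in zip(w, x)) - y
            for j in range(dim):
                g = residual * x[j]  # partial derivative at the current w
                g_sq[j] += g * g
                w[j] -= lr * g / (math.sqrt(g_sq[j]) + eps)
    return w


def mean_loss(w, data):
    return sum(0.5 * (sum(a * b for a, b in zip(w, x)) - y) ** 2 for x, y in data) / len(data)
```

On the consistent system `[((1.0, 0.0), 1.0), ((0.0, 1.0), 2.0), ((1.0, 1.0), 3.0)]`, the iterates converge to the exact solution `w = (1, 2)`. Note how the per-coordinate step `lr / sqrt(g_sq[j])` shrinks as squared gradients accumulate, which is the second-moment information the abstract refers to.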

preprint2020arXiv

An Analysis of Scatter Characteristics in X-ray CT Spectral Correction

X-ray scatter remains a major physics challenge in volumetric computed tomography (CT); its physical and statistical behaviors have commonly been leveraged to eliminate its impact on CT image quality. In this work, we conduct an in-depth derivation of how the scatter distribution and the scatter-to-primary ratio (SPR) change during spectral correction, leading to an interesting finding on the property of scatter: when spectral correction is applied before scatter is removed, the impact of SPR on a CT projection is scaled by the first derivative of the mapping function, while the scatter distribution in the transmission domain is scaled by the product of the first derivative of the mapping function and a natural exponential of the projection difference before and after the mapping. This characterization of scatter's behavior provides an analytic approach to compensating for the SPR and to approximating the change of the scatter distribution after spectral correction, even though both may be significantly distorted because the linearization mapping function in spectral correction can vary considerably from one detector pixel to another. We evaluate SPR compensation on a Catphan phantom and an anthropomorphic chest phantom to validate the characteristics of scatter. In addition, this scatter property is directly adopted in CT imaging using a spectral modulator with flying focal spot technology (SMFFS) as an example to demonstrate its potential in practical applications.
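
A first-order sketch of where the two scaling factors can come from, under our own assumptions rather than the paper's full derivation: intensities are flat-field normalized, P and S are the primary and scatter signals at one detector pixel, p = -ln P is the scatter-free projection, f is that pixel's spectral-correction (linearization) mapping, and SPR = S/P is small enough for a first-order Taylor expansion. The scatter after correction, S', is taken as the difference between the mapped transmissions with and without scatter.

```latex
\begin{aligned}
\tilde{p} &= -\ln(P + S) = p - \ln(1 + \mathrm{SPR}), \qquad p = -\ln P,\quad \mathrm{SPR} = S/P = S\,e^{p},\\
f(\tilde{p}) &\approx f(p) - f'(p)\,\ln(1 + \mathrm{SPR}) \approx f(p) - f'(p)\,\mathrm{SPR},\\
S' &= e^{-f(\tilde{p})} - e^{-f(p)} \approx e^{-f(p)}\left[(1 + \mathrm{SPR})^{f'(p)} - 1\right] \approx e^{-f(p)}\,f'(p)\,\mathrm{SPR} = f'(p)\,e^{\,p - f(p)}\,S.
\end{aligned}
```

The second line shows the SPR term in the projection scaled by f'(p); the last line shows the transmission-domain scatter scaled by f'(p) times the exponential of the projection difference p - f(p).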

preprint2020arXiv

An Empirical Evaluation of GDPR Compliance Violations in Android mHealth Apps

The purpose of the General Data Protection Regulation (GDPR) is to provide improved privacy protection. If an app controls personal data from users, it needs to comply with GDPR. However, GDPR lists general rules rather than exact step-by-step guidelines about how to develop an app that fulfills the requirements. Therefore, GDPR compliance violations may exist in current apps, which would pose severe privacy threats to app users. In this paper, we take mobile health applications (mHealth apps) as a lens to examine the status quo of GDPR compliance in Android apps. We first propose an automated system that bridges the semantic gap between the general rules of GDPR and app implementations by identifying the data practices declared in the app privacy policy and the data-relevant behaviors in the app code. Then, based on this system, we detect three kinds of GDPR compliance violations: incomplete privacy policies, inconsistent data collection, and insecure data transmission. We perform an empirical evaluation of 796 mHealth apps. The results reveal that 189 (23.7%) of them do not provide complete privacy policies. Moreover, 59 apps collect sensitive data through different measures, but 46 (77.9%) of them contain at least one inconsistent collection behavior. Even worse, among the 59 apps, only 8 try to ensure the transmission security of the collected data, and all of them contain at least one encryption or SSL misuse. Our work exposes severe privacy issues to raise awareness of privacy protection among app users and developers.

preprint2020arXiv

COVID-19 causes record decline in global CO2 emissions

The considerable cessation of human activities during the COVID-19 pandemic has affected global energy use and CO2 emissions. Here we show that global fossil CO2 emissions from January to April 2020 decreased by an unprecedented 7.8% (938 Mt CO2, with a 2-σ uncertainty of +6.8%) compared with the same period last year. In addition to other emerging estimates of COVID impacts based on monthly energy supply or estimated parameters, this study takes a further step by constructing near-real-time daily CO2 emission inventories based on activity data from power generation (for 29 countries), industry (for 73 countries), road transportation (for 406 cities), aviation and maritime transportation, and the commercial and residential sectors (for 206 countries). The estimates distinguish the decline of CO2 due to COVID-19 from daily, weekly and seasonal variations as well as holiday events. By sector, the COVID-related decreases in CO2 emissions were: road transportation (340.4 Mt CO2, -15.5%), power (292.5 Mt CO2, -6.4% compared to 2019), industry (136.2 Mt CO2, -4.4%), aviation (92.8 Mt CO2, -28.9%), residential (43.4 Mt CO2, -2.7%), and international shipping (35.9 Mt CO2, -15%). Regionally, decreases in China were the largest and earliest (234.5 Mt CO2, -6.9%), followed by Europe (EU-27 & UK) (138.3 Mt CO2, -12.0%) and the U.S. (162.4 Mt CO2, -9.5%). The declines in CO2 are consistent with regional nitrogen oxide concentrations observed by satellites and ground-based networks, but the calculated signal of emission decreases (about 1 Gt CO2) will have little impact (less than 0.13 ppm by April 30, 2020) on the observed global CO2 concentration. However, with the observed fast CO2 recovery in China and partial re-opening globally, our findings suggest that the longer-term effects on CO2 emissions are unknown and should be carefully monitored using multiple measures.

preprint2020arXiv

Dispersed Exponential Family Mixture VAEs for Interpretable Text Generation

Deep generative models are commonly used for generating images and text. Besides generation quality, interpretability is an important goal for these models. Variational auto-encoders (VAEs) with a Gaussian prior have been successfully applied to text generation, but it is hard to interpret the meaning of the latent variable. To enhance controllability and interpretability, one can replace the Gaussian prior with a mixture of Gaussian distributions (GM-VAE), whose mixture components could be related to hidden semantic aspects of the data. In this paper, we generalize this practice and introduce DEM-VAE, a class of models for text generation using VAEs with a mixture of exponential family distributions. Unfortunately, a standard variational training algorithm fails due to the mode-collapse problem. We theoretically identify the root cause of the problem and propose an effective algorithm to train DEM-VAE. Our method penalizes training with an extra dispersion term to induce a well-structured latent space. Experimental results show that our approach does obtain a meaningful space, and it outperforms strong baselines in text generation benchmarks. The code is available at https://github.com/wenxianxian/demvae.

preprint2020arXiv

Do You Have the Right Scissors? Tailoring Pre-trained Language Models via Monte-Carlo Methods

It is a common approach to pre-train a language model on a large corpus and fine-tune it on task-specific data. In practice, we observe that fine-tuning a pre-trained model on a small dataset may lead to over- and/or under-estimation problems. In this paper, we propose MC-Tailor, a novel method that alleviates this issue in text generation tasks by truncating and transferring probability mass from over-estimated regions to under-estimated ones. Experiments on a variety of text generation datasets show that MC-Tailor consistently and significantly outperforms the fine-tuning approach. Our code is available at this url.

preprint2020arXiv

Feature Location Benchmark for Decomposing and Reusing Android Apps

Software reuse enables developers to reuse architecture, programs and other software artifacts. Realizing systematic reuse in software brings many benefits for stakeholders, including lower maintenance effort, lower development costs, and shorter time to market. Unfortunately, implementing a framework for large-scale software reuse in Android apps remains a huge problem, given the complexity of the task and the lack of practical technical support from either tools or domain experts. Therefore, a feature location benchmark for apps will help developers either optimize their feature location techniques or reuse the assets created in the benchmark. In this paper, we release a feature location benchmark that can be used by developers who intend to compose software product lines (SPL) and realize reuse in apps. The benchmark not only contributes to the research community for reuse research, but also helps industry practitioners optimize their architecture and enhance modularity. In addition, we develop an Android Studio plugin named caIDE for developers to view and operate on the benchmark.

preprint2020arXiv

Generating Fluent Adversarial Examples for Natural Languages

Efficiently building an adversarial attacker for natural language processing (NLP) tasks is a real challenge. First, as the sentence space is discrete, it is difficult to make small perturbations along the direction of gradients. Second, the fluency of the generated examples cannot be guaranteed. In this paper, we propose MHA, which addresses both problems by performing Metropolis-Hastings sampling, whose proposal is designed with the guidance of gradients. Experiments on IMDB and SNLI show that our proposed MHA outperforms the baseline model in attacking capability. Adversarial training with MHA also leads to better robustness and performance.
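
MHA's gradient-guided proposal is not specified in this abstract, but the Metropolis-Hastings step it builds on is standard. The sketch below runs generic MH on a toy 3-state space with an asymmetric proposal matrix, which is exactly the case where the proposal correction q(x | x') / q(x' | x) matters. In an attack setting the states would be candidate sentences and the target an unnormalized fluency/attack score; that mapping, the toy numbers and the function name are our assumptions.

```python
import random


def metropolis_hastings(target, proposal, x0, n_steps, rng):
    """Generic MH on states 0..K-1.

    target[i]: unnormalized target probability of state i (all > 0).
    proposal[i][j]: probability of proposing j from i, i.e. q(j | i).
    """
    states = range(len(target))
    x = x0
    samples = []
    for _ in range(n_steps):
        x_new = rng.choices(states, weights=proposal[x])[0]  # draw from q(. | x)
        # Acceptance ratio with the proposal correction for asymmetric q.
        num = target[x_new] * proposal[x_new][x]
        den = target[x] * proposal[x][x_new]
        if rng.random() < min(1.0, num / den):
            x = x_new
        samples.append(x)
    return samples
```

With `target = [1.0, 2.0, 7.0]` and an asymmetric proposal such as `[[0.2, 0.5, 0.3], [0.3, 0.2, 0.5], [0.6, 0.2, 0.2]]`, the empirical state frequencies of a long chain approach 0.1, 0.2 and 0.7. Dropping the correction term would bias them toward whatever the proposal favors.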

preprint2020arXiv

Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation

Auto-regressive sequence generative models trained by Maximum Likelihood Estimation suffer from the exposure bias problem in practical finite-sample scenarios. The crux is that the number of training samples for Maximum Likelihood Estimation is usually limited and the input data distributions differ between the training and inference stages. Many methods have been proposed to solve this problem (Yu et al., 2017; Lu et al., 2018); they rely on sampling from the non-stationary model distribution and suffer from high variance or biased estimations. In this paper, we propose ψ-MLE, a new training scheme for auto-regressive sequence generative models, which is effective and stable when operating in the large sample space encountered in text generation. We derive our algorithm from a new perspective of self-augmentation and introduce bias correction with density ratio estimation. Extensive experimental results on synthetic data and real-world text generation tasks demonstrate that our method stably outperforms Maximum Likelihood Estimation and other state-of-the-art sequence generative models in terms of both quality and diversity.
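
The abstract does not say which density ratio estimator ψ-MLE uses. As a generic illustration only, here is the common classifier-based estimator: train a logistic classifier to separate data samples (label 1) from model samples (label 0); with equal class sizes, its logit estimates log p_data(x) / p_model(x). The 1-D Gaussian toy setting, plain full-batch gradient descent and function names are our own assumptions.

```python
import math
import random


def fit_logistic(features, labels, lr=1.0, iters=200):
    """Full-batch gradient descent on the mean logistic loss; returns weights."""
    w = [0.0] * len(features[0])
    n = len(features)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for phi, y in zip(features, labels):
            z = sum(wi * fi for wi, fi in zip(w, phi))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(label = 1 | x)
            for k, fk in enumerate(phi):
                grad[k] += (p - y) * fk / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def density_ratio(w, x):
    """Estimated p_data(x) / p_model(x), valid when both classes have equal size."""
    return math.exp(w[0] + w[1] * x)
```

If "data" samples come from N(1, 1) and "model" samples from N(0, 1), the true log ratio is x - 0.5, so fitting on features `(1.0, x)` should recover weights close to `(-0.5, 1.0)` up to sampling noise.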

preprint2020arXiv

Infomax Neural Joint Source-Channel Coding via Adversarial Bit Flip

Although Shannon theory states that it is asymptotically optimal to separate source and channel coding into two independent processes, in many practical communication scenarios this decomposition is limited by the finite bit length and the computational power available for decoding. Recently, neural joint source-channel coding (NECST) was proposed to sidestep this problem. While it leverages advances in amortized inference and deep learning to improve the encoding and decoding process, it still cannot always achieve compelling compression and error correction performance due to the limited robustness of its learned coding networks. In this paper, motivated by the inherent connections between neural joint source-channel coding and discrete representation learning, we propose a novel regularization method called Infomax Adversarial-Bit-Flip (IABF) to improve the stability and robustness of the neural joint source-channel coding scheme. More specifically, on the encoder side, we propose to explicitly maximize the mutual information between the codeword and the data, while on the decoder side, the amortized reconstruction is regularized within an adversarial framework. Extensive experiments on various real-world datasets show that IABF achieves state-of-the-art performance on both compression and error correction benchmarks and outperforms the baselines by a significant margin.

preprint2020arXiv

KdConv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation

Research on knowledge-driven conversational systems is largely limited by the lack of dialogue data consisting of multi-turn conversations on multiple topics with knowledge annotations. In this paper, we propose a Chinese multi-domain knowledge-driven conversation dataset, KdConv, which grounds the topics in multi-turn conversations to knowledge graphs. Our corpus contains 4.5K conversations from three domains (film, music, and travel), and 86K utterances with an average turn number of 19.0. These conversations contain in-depth discussions on related topics and natural transitions between multiple topics. To facilitate further research on this corpus, we provide several benchmark models. Comparative results show that the models can be enhanced by introducing background knowledge, yet there is still ample room for leveraging knowledge to model multi-turn conversations. Results also show obvious performance differences between domains, indicating that transfer learning and domain adaptation are worth further exploration. The corpus and benchmark models are publicly available.

preprint2020arXiv

LUXOR: An FPGA Logic Cell Architecture for Efficient Compressor Tree Implementations

We propose two tiers of modifications to FPGA logic cell architecture that deliver a variety of performance and utilization benefits with only minor area overheads. In the first tier, we augment existing commercial logic cell datapaths with a 6-input XOR gate to improve the expressiveness of each element while maintaining backward compatibility. This new architecture is vendor-agnostic, and we refer to it as LUXOR. We also consider a second tier of vendor-specific modifications to both Xilinx and Intel FPGAs, which we refer to as X-LUXOR+ and I-LUXOR+, respectively. We demonstrate that compressor tree synthesis using generalized parallel counters (GPCs) is further improved by the proposed modifications. Using both the Intel adaptive logic module and the Xilinx slice at the 65nm technology node for a comparative study, we show that the silicon area overhead is less than 0.5% for LUXOR and 5-6% for LUXOR+, while the delay increments are 1-6% and 3-9%, respectively. We demonstrate that LUXOR can deliver an average reduction of 13-19% in logic utilization on micro-benchmarks from a variety of domains. BNN benchmarks benefit the most, with an average reduction of 37-47% in logic utilization, which is due to the highly efficient mapping of the XnorPopcount operation onto our proposed LUXOR+ logic cells.
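
One way to see why a 6-input XOR pairs naturally with GPC-based compressor trees: the least significant output bit of a (6;3) GPC, which counts the ones among six input bits, is exactly the XOR (parity) of those six bits. The sketch below only models this bit-level identity in software; how LUXOR actually maps GPCs onto its cells is not described in the abstract, and the function names are hypothetical.

```python
from itertools import product


def gpc_6_3(bits):
    """(6;3) generalized parallel counter: count the ones among six input bits
    and return the 3-bit sum as (s2, s1, s0)."""
    if len(bits) != 6:
        raise ValueError("expected six input bits")
    total = sum(bits)  # 0..6 fits in 3 bits
    return (total >> 2) & 1, (total >> 1) & 1, total & 1


def xor6(bits):
    """Six-input XOR, i.e. the parity of the inputs."""
    acc = 0
    for b in bits:
        acc ^= b
    return acc


def check_all_inputs():
    """Exhaustively confirm, over all 64 inputs, that the GPC sum is correct
    and that its least significant bit equals the 6-input XOR."""
    for bits in product((0, 1), repeat=6):
        s2, s1, s0 = gpc_6_3(bits)
        if 4 * s2 + 2 * s1 + s0 != sum(bits) or s0 != xor6(bits):
            return False
    return True
```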

preprint2020arXiv

MajorityNets: BNNs Utilising Approximate Popcount for Improved Efficiency

Binarized neural networks (BNNs) have shown exciting potential for utilising neural networks in embedded implementations where area, energy and latency constraints are paramount. With BNNs, multiply-accumulate (MAC) operations can be simplified to XnorPopcount operations, leading to massive reductions in both memory and computation resources. Furthermore, multiple efficient implementations of BNNs have been reported on field-programmable gate arrays (FPGAs). This paper proposes a smaller, faster, more energy-efficient approximate replacement for the XnorPopcount operation, called XNorMaj, inspired by state-of-the-art FPGA look-up table schemes that benefit FPGA implementations. We show that XNorMaj is up to 2x more resource-efficient than the XnorPopcount operation. While the XNorMaj operation has a minor detrimental impact on accuracy, the resource savings enable us to use larger networks to recover the loss.
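
The exact XnorPopcount operation that XNorMaj approximates rests on a simple identity: if {-1, +1} values are stored as bits (1 for +1, 0 for -1), the dot product of two n-element vectors equals 2 * popcount(XNOR(a, w)) - n. The sketch below checks that identity against a direct signed dot product; it does not model XNorMaj itself, whose majority-based approximation is not detailed in the abstract. Function names are hypothetical.

```python
def xnor_popcount_dot(a: int, w: int, n: int) -> int:
    """Binarized dot product of two n-bit words (bit 1 = +1, bit 0 = -1)."""
    mask = (1 << n) - 1
    xnor = ~(a ^ w) & mask  # 1 wherever the two signs agree
    return 2 * bin(xnor).count("1") - n


def signed_dot(a: int, w: int, n: int) -> int:
    """Reference: decode both words into +/-1 vectors and multiply-accumulate."""
    def to_signs(v):
        return [1 if (v >> i) & 1 else -1 for i in range(n)]
    return sum(x * y for x, y in zip(to_signs(a), to_signs(w)))
```

For example, `xnor_popcount_dot(0b1011, 0b1001, 4)` returns 2: three of the four sign pairs agree, so the sum is 3 - 1.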

preprint2020arXiv

Network On Network for Tabular Data Classification in Real-world Applications

Tabular data is the most common data format adopted by our customers, ranging from retail and finance to E-commerce, and tabular data classification plays an essential role in their businesses. In this paper, we present Network On Network (NON), a practical tabular data classification model based on deep neural networks that provides accurate predictions. Various deep methods have been proposed and promising progress has been made. However, most of them use operations like neural networks and factorization machines to fuse the embeddings of different features directly, and linearly combine the outputs of those operations to get the final prediction. As a result, the intra-field information and the non-linear interactions between those operations (e.g., neural networks and factorization machines) are ignored. Intra-field information is the information that the features inside each field belong to the same field. NON is proposed to take full advantage of intra-field information and non-linear interactions. It consists of three components: a field-wise network at the bottom to capture the intra-field information, an across-field network in the middle to choose suitable operations in a data-driven manner, and an operation fusion network on top to deeply fuse the outputs of the chosen operations. Extensive experiments on six real-world datasets demonstrate that NON can significantly outperform state-of-the-art models. Furthermore, both qualitative and quantitative studies of the features in the embedding space show that NON can capture intra-field information effectively.

preprint2020arXiv

Screening piezoelectricity in determination of flexoelectric coefficient at nanoscale

In polar materials, piezoelectricity usually accompanies flexoelectricity, which is the linear response of polarization to a strain gradient. It is therefore hard to eliminate the piezoelectric effect when determining the pure flexoelectric response. In this work, we propose an analytical method to quantitatively characterize the flexoelectric coefficient at the nanoscale in piezoelectric materials by screening piezoelectricity. Our results show that flexoelectricity reduces nanopillar stiffness while piezoelectricity enhances it. By carefully designing the shape of the nanopillars and measuring their stiffness difference, the flexoelectric coefficient can be obtained with the piezoelectric contribution completely eliminated. This approach avoids measuring electrical properties under dynamic load, which helps reduce the challenge of flexoelectric measurement at the nanoscale. Our work will benefit the quantitative characterization of flexoelectric properties and the design of flexoelectric devices at the nanoscale.

preprint2020arXiv

SharinGAN: Combining Synthetic and Real Data for Unsupervised Geometry Estimation

We propose a novel method for combining synthetic and real images when training networks to determine geometric information from a single image. We suggest a method for mapping both image types into a single, shared domain. This is connected to a primary network for end-to-end training. Ideally, this results in images from two domains that present shared information to the primary network. Our experiments demonstrate significant improvements over the state-of-the-art in two important domains, surface normal estimation of human faces and monocular depth estimation for outdoor scenes, both in an unsupervised setting.

preprint2020arXiv

Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition

Despite the recent success of deep learning in continuous sign language recognition (CSLR), deep models typically focus on the most discriminative features, ignoring other potentially non-trivial and informative content. This characteristic heavily constrains their capability to learn the implicit visual grammars behind the collaboration of different visual cues (i.e., hand shape, facial expression and body posture). By injecting multi-cue learning into neural network design, we propose a spatial-temporal multi-cue (STMC) network to solve the vision-based sequence learning problem. Our STMC network consists of a spatial multi-cue (SMC) module and a temporal multi-cue (TMC) module. The SMC module is dedicated to spatial representation and explicitly decomposes visual features of different cues with the aid of a self-contained pose estimation branch. The TMC module models temporal correlations along two parallel paths, i.e., intra-cue and inter-cue, aiming to preserve the uniqueness and explore the collaboration of multiple cues. Finally, we design a joint optimization strategy to achieve end-to-end sequence learning of the STMC network. To validate its effectiveness, we perform experiments on three large-scale CSLR benchmarks: PHOENIX-2014, CSL and PHOENIX-2014-T. Experimental results demonstrate that the proposed method achieves new state-of-the-art performance on all three benchmarks.

preprint2020arXiv

The TianQin project: current progress on science and technology

TianQin is a planned space-based gravitational wave (GW) observatory consisting of three Earth-orbiting satellites with an orbital radius of about $10^5~{\rm km}$. The satellites will form an equilateral triangle constellation whose plane is nearly perpendicular to the ecliptic plane. TianQin aims to detect GWs between $10^{-4}~{\rm Hz}$ and $1~{\rm Hz}$ that can be generated by a wide variety of important astrophysical and cosmological sources, including the inspiral of Galactic ultra-compact binaries, the inspiral of stellar-mass black hole binaries, extreme mass ratio inspirals, the merger of massive black hole binaries, and possibly energetic processes in the very early universe or exotic sources such as cosmic strings. In order to start science operations around 2035, a roadmap called the 0123 plan is being used to bring the key technologies of TianQin to maturity, supported by the construction of a series of research facilities on the ground. Two major projects of the 0123 plan are being carried out. In this process, the team has created a new-generation $17~{\rm cm}$ single-body hollow corner-cube retro-reflector, which was launched with the QueQiao satellite on 21 May 2018; a new laser ranging station equipped with a $1.2~{\rm m}$ telescope has been constructed, and the station has successfully ranged to all five retro-reflectors on the Moon; and the TianQin-1 experimental satellite was launched on 20 December 2019, with first-round results showing that the satellite has exceeded all of its mission requirements.
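
A quick back-of-the-envelope check of what "an orbital radius of about 10^5 km" implies, assuming an idealized circular Keplerian orbit around Earth (no perturbations) and all three satellites on that same orbit, so the equilateral triangle is inscribed in the orbital circle. These are our simplifying assumptions, not mission design values.

```python
import math

GM_EARTH = 3.986004418e14  # m^3 s^-2, Earth's standard gravitational parameter


def circular_period_days(radius_m: float) -> float:
    """Kepler's third law for a circular orbit: T = 2*pi*sqrt(r^3 / GM), in days."""
    return 2 * math.pi * math.sqrt(radius_m ** 3 / GM_EARTH) / 86400.0


def triangle_arm_km(radius_km: float) -> float:
    """Side of an equilateral triangle inscribed in a circle of radius r: r*sqrt(3)."""
    return radius_km * math.sqrt(3)


radius_km = 1.0e5  # "about 10^5 km", from the abstract
print(f"period ~ {circular_period_days(radius_km * 1e3):.2f} days")
print(f"arm length ~ {triangle_arm_km(radius_km):.3g} km")
```

Under these assumptions the orbital period comes out at about 3.64 days and each arm of the triangle at roughly 1.7 x 10^5 km.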

preprint2020arXiv

Variational Template Machine for Data-to-Text Generation

How can descriptions be generated from structured data organized in tables? Existing approaches using neural encoder-decoder models often lack diversity. We claim that an open set of templates is crucial for enriching phrase constructions and realizing varied generations. Learning such templates is prohibitive since it often requires a large paired <table, description> corpus, which is seldom available. This paper explores the problem of automatically learning reusable "templates" from paired and non-paired data. We propose the variational template machine (VTM), a novel method to generate text descriptions from data tables. Our contributions include: a) we carefully devise a specific model architecture and losses to explicitly disentangle text template and semantic content information in the latent spaces, and b) we utilize both small parallel data and large raw text without aligned tables to enrich the template learning. Experiments on datasets from a variety of domains show that VTM is able to generate more diverse text while maintaining good fluency and quality.

preprint2020arXiv

Xiaomingbot: A Multilingual Robot News Reporter

This paper proposes the building of Xiaomingbot, an intelligent, multilingual and multimodal software robot equipped with four integral capabilities: news generation, news translation, news reading and avatar animation. The system summarizes Chinese news that it automatically generates from data tables. Next, it translates the summary or the full article into multiple languages and reads the multilingual rendition through synthesized speech. Notably, Xiaomingbot uses voice cloning technology to synthesize speech trained on a real person's voice data in one input language. The proposed system has several merits: it has an animated avatar and is able to generate and read multilingual news. Since it was put into practice, Xiaomingbot has written over 600,000 articles and gained over 150,000 followers on social media platforms.