Researcher profile

Xun Gong

Xun Gong contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2022arXiv

An Online Data-Driven Method for Microgrid Secondary Voltage and Frequency Control with Ensemble Koopman Modeling

Low inertia, nonlinearity and a high level of uncertainty (varying topologies and operating conditions) pose challenges to microgrid (MG) systemwide operation. This paper proposes an online adaptive Koopman operator optimal control (AKOOC) method for MG secondary voltage and frequency control. Unlike typical data-driven methods that are data-hungry and lack guaranteed stability, the proposed AKOOC requires no warm-up training yet with guaranteed bounded-input-bounded-output (BIBO) stability and even asymptotical stability under some mild conditions. The proposed AKOOC is developed based on an ensemble Koopman state space modeling with full basis functions that combines both linear and nonlinear bases without the need of event detection or switching. An iterative learning method is also developed to exploit model parameters, ensuring the effectiveness and the adaptiveness of the designed control. Simulation studies in the 4-bus (with detailed inner-loop control) MG system and the 34-bus MG system showed improved modeling accuracy and control, verifying the effectiveness of the proposed method subject to various changes of operating conditions even with time delay, measurement noise, and missing measurements.

preprint2022arXiv

Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition

Modern non-autoregressive~(NAR) speech recognition systems aim to accelerate the inference speed; however, they suffer from performance degradation compared with autoregressive~(AR) models as well as the huge model size issue. We propose a novel knowledge transfer and distillation architecture that leverages knowledge from AR models to improve the NAR performance while reducing the model's size. Frame- and sequence-level objectives are well-designed for transfer learning. To further boost the performance of NAR, a beam search method on Mask-CTC is developed to enlarge the search space during the inference stage. Experiments show that the proposed NAR beam search relatively reduces CER by over 5% on AISHELL-1 benchmark with a tolerable real-time-factor~(RTF) increment. By knowledge transfer, the NAR student who has the same size as the AR teacher obtains relative CER reductions of 8/16% on AISHELL-1 dev/test sets, and over 25% relative WER reductions on LibriSpeech test-clean/other sets. Moreover, the ~9x smaller NAR models achieve ~25% relative CER/WER reductions on both AISHELL-1 and LibriSpeech benchmarks with the proposed knowledge transfer and distillation.

preprint2022arXiv

Layer-wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition

Accent variability has posed a huge challenge to automatic speech recognition~(ASR) modeling. Although one-hot accent vector based adaptation systems are commonly used, they require prior knowledge about the target accent and cannot handle unseen accents. Furthermore, simply concatenating accent embeddings does not make good use of accent knowledge, which has limited improvements. In this work, we aim to tackle these problems with a novel layer-wise adaptation structure injected into the E2E ASR model encoder. The adapter layer encodes an arbitrary accent in the accent space and assists the ASR model in recognizing accented speech. Given an utterance, the adaptation structure extracts the corresponding accent information and transforms the input acoustic feature into an accent-related feature through the linear combination of all accent bases. We further explore the injection position of the adaptation layer, the number of accent bases, and different types of accent bases to achieve better accent adaptation. Experimental results show that the proposed adaptation structure brings 12\% and 10\% relative word error rate~(WER) reduction on the AESRC2020 accent dataset and the Librispeech dataset, respectively, compared to the baseline.

preprint2022arXiv

ReplaceBlock: An improved regularization method based on background information

Attention mechanism, being frequently used to train networks for better feature representations, can effectively disentangle the target object from irrelevant objects in the background. Given an arbitrary image, we find that the background's irrelevant objects are most likely to occlude/block the target object. We propose, based on this finding, a ReplaceBlock to simulate the situations when the target object is partially occluded by the objects that are deemed as background. Specifically, ReplaceBlock erases the target object in the image, and then generates a feature map with only irrelevant objects and background by the model. Finally, some regions in the background feature map are used to replace some regions of the target object in the original image feature map. In this way, ReplaceBlock can effectively simulate the feature map of the occluded image. The experimental results show that ReplaceBlock works better than DropBlock in regularizing convolutional networks.

preprint2022arXiv

Self-Supervised Implicit Attention: Guided Attention by The Model Itself

We propose Self-Supervised Implicit Attention (SSIA), a new approach that adaptively guides deep neural network models to gain attention by exploiting the properties of the models themselves. SSIA is a novel attention mechanism that does not require any extra parameters, computation, or memory access costs during inference, which is in contrast to existing attention mechanism. In short, by considering attention weights as higher-level semantic information, we reconsidered the implementation of existing attention mechanisms and further propose generating supervisory signals from higher network layers to guide lower network layers for parameter updates. We achieved this by building a self-supervised learning task using the hierarchical features of the network itself, which only works at the training stage. To verify the effectiveness of SSIA, we performed a particular implementation (called an SSIA block) in convolutional neural network models and validated it on several image classification datasets. The experimental results show that an SSIA block can significantly improve the model performance, even outperforms many popular attention methods that require additional parameters and computation costs, such as Squeeze-and-Excitation and Convolutional Block Attention Module. Our implementation will be available on GitHub.

preprint2022arXiv

The Fixed Sub-Center: A Better Way to Capture Data Complexity

Treating class with a single center may hardly capture data distribution complexities. Using multiple sub-centers is an alternative way to address this problem. However, highly correlated sub-classes, the classifier's parameters grow linearly with the number of classes, and lack of intra-class compactness are three typical issues that need to be addressed in existing multi-subclass methods. To this end, we propose to use Fixed Sub-Center (F-SC), which allows the model to create more discrepant sub-centers while saving memory and cutting computational costs considerably. The F-SC specifically, first samples a class center Ui for each class from a uniform distribution, and then generates a normal distribution for each class, where the mean is equal to Ui. Finally, the sub-centers are sampled based on the normal distribution corresponding to each class, and the sub-centers are fixed during the training process avoiding the overhead of gradient calculation. Moreover, F-SC penalizes the Euclidean distance between the samples and their corresponding sub-centers, it helps remain intra-compactness. The experimental results show that F-SC significantly improves the accuracy of both image classification and fine-grained recognition tasks.

preprint2021arXiv

Cyclic three-level-pulse-area theorem for enantioselective state transfer of chiral molecules

We derive a pulse-area theorem for a cyclic three-level system, an archetypal model for exploring enantioselective state transfer (ESST) in chiral molecules driven by three linearly polarized microwave pulses. By dividing the closed-loop excitation into two separate stages, we obtain both amplitude and phase conditions of three control fields to generate high fidelity of ESST. As a proof of principle, we apply this pulse-area theorem to the cyclohexylmethanol molecules ($\text{C}_{7}\text{H}_{14}\text{O}$), for which three rotational states are connected by the $a$-type, $b$-type, and $c$-type components of the transition dipole moments in both center-frequency resonant and detuned conditions. Our results show that two enantiomers with opposite handedness can be transferred to different target states by designing three microwave pulses that satisfy the amplitude and phase conditions at the transition frequencies. The corresponding control schemes are robust against the time delays between the two stages. We suggest that the two control fields used in the second stage should be applied simultaneously for practical applications. This work contributes an alternative pulse-area theorem to the field of quantum control, which has the potential to determine the chirality of enantiomers in a mixture.

preprint2020arXiv

Achievable Rates of Opportunistic Cognitive Radio Systems Using Reconfigurable Antennas with Imperfect Sensing and Channel Estimation

We consider an opportunistic cognitive radio (CR) system in which secondary transmitter (SUtx) is equipped with a reconfigurable antenna (RA). Utilizing the beam steering capability of the RA, we regard a design framework for integrated sector-based spectrum sensing and data communication. In this framework, SUtx senses the spectrum and detects the beam corresponding to active primary user's (PU) location. SUtx also sends training symbols (prior to data symbols), to enable channel estimation at secondary receiver (SUrx) and selection of the strongest beam between SUtx-SUrx for data transmission. We establish a lower bound on the achievable rates of SUtx-SUrx link, in the presence of spectrum sensing and channel estimation errors, and errors due to incorrect detection of the beam corresponding to PU's location and incorrect selection of the strongest beam for data transmission. We formulate a novel constrained optimization problem, aiming at maximizing the derived achievable rate lower bound subject to average transmit and interference power constraints. We optimize the durations of spatial spectrum sensing and channel training as well as data symbol transmission power. Our numerical results demonstrate that between optimizing spectrum sensing and channel training durations, the latter is more important for providing higher achievable rates.