Researcher profile

Xin Fang

Xin Fang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
15 works
0 followers
20 topics
4 close collaborators

Published work

15 published items

preprint · 2022 · arXiv

A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition

Unpaired data has been shown to be beneficial for low-resource automatic speech recognition (ASR), which can be involved in the design of hybrid models with multi-task training or language model dependent pre-training. In this work, we leverage unpaired data to train a general sequence-to-sequence model. Unpaired speech and text are used in the form of data pairs by generating the corresponding missing parts prior to model training. Inspired by the complementarity of the speech-PseudoLabel pair and the SynthesizedAudio-text pair in both acoustic features and linguistic features, we propose a complementary joint training (CJT) method that trains a model alternately with the two data pairs. Furthermore, label masking for pseudo-labels and gradient restriction for synthesized audio are proposed to further cope with the deviations from real data, termed CJT++. Experimental results show that compared to speech-only training, the proposed basic CJT achieves great performance improvements on clean/other test sets, and the CJT++ re-training yields further performance enhancements. It is also apparent that the proposed method outperforms the wav2vec2.0 model with the same model size and beam size, particularly in extreme low-resource cases.

preprint · 2022 · arXiv

A Noise-Robust Self-supervised Pre-training Model Based Speech Representation Learning for Automatic Speech Recognition

Wav2vec2.0 is a popular self-supervised pre-training framework for learning speech representations in the context of automatic speech recognition (ASR). It has been shown that wav2vec2.0 is robust against domain shift, while its noise robustness is still unclear. In this work, we therefore first analyze the noise robustness of wav2vec2.0 via experiments. We observe that wav2vec2.0 pre-trained on noisy data can obtain good representations and thus improve the ASR performance on the noisy test set, which however brings a performance degradation on the clean test set. To avoid this issue, we propose an enhanced wav2vec2.0 model. Specifically, the noisy speech and the corresponding clean version are fed into the same feature encoder, where the clean speech provides training targets for the model. Experimental results reveal that the proposed method not only improves the ASR performance on the noisy test set, surpassing the original wav2vec2.0, but also incurs only a tiny performance decrease on the clean test set. In addition, the effectiveness of the proposed method is demonstrated under different types of noise conditions.

preprint · 2022 · arXiv

Band degeneration and evolution in nonlinear triatomic chain superlattices

Nonlinear superlattices exhibit unique features allowing for wave manipulations. Despite the increasing attention received, the underlying physical mechanisms and the evolution process of the band structures and bandgaps in strongly nonlinear superlattices remain unclear. Here we establish and examine strongly nonlinear superlattice models (three triatomic models) to show the evolution process of typical nonlinear band structures based on analytical and numerical approaches. We find that the strongly nonlinear superlattices present particular band degeneration and bifurcation, accompanied by the vibration mode transfer in their unit cells. The evolution processes and the physical mechanisms of the band degeneration in different models are clarified with the consideration of the mode transfer. The observed degeneration may occur as the shifting, bifurcating, shortening, merging or disappearing of dispersion curves, all depending on the arrangement of the coupled nonlinear elements. Meanwhile, the dimension of the unit cell reduces, alongside changes in the frequency range and mechanisms (Bragg and local resonance) of the bandgaps. These findings answer some fundamental questions pertinent to the study of nonlinear periodic structures, nonlinear crystals and nonlinear metamaterials, which are of interest to the broad community of physics.

preprint · 2022 · arXiv

Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition

With the advance in self-supervised learning for audio and visual modalities, it has become possible to learn a robust audio-visual speech representation. This would be beneficial for improving the audio-visual speech recognition (AVSR) performance, as the multi-modal inputs contain richer information in principle. In this paper, based on existing self-supervised representation learning methods for the audio modality, we therefore propose an audio-visual representation learning approach. The proposed approach explores both the complementarity of audio-visual modalities and long-term context dependency using a transformer-based fusion module and a flexible masking strategy. After pre-training, the model is able to extract fused representations required by AVSR. Without loss of generality, it can be applied to single-modal tasks, e.g., audio or visual speech recognition, by simply masking out one modality in the fusion module. The proposed pre-trained model is evaluated on speech recognition and lipreading tasks using one or two modalities, where its superiority is revealed.

preprint · 2022 · arXiv

Seshadri stratification for Schubert varieties and Standard Monomial Theory

The theory of Seshadri stratifications has been developed by the authors with the intention to build up a new geometric approach towards a standard monomial theory for embedded projective varieties with certain nice properties. In this article, we investigate the Seshadri stratification on a Schubert variety arising from its Schubert subvarieties. We show that the standard monomial theory developed in [32] is compatible with this new strategy.

preprint · 2021 · arXiv

Transmission-and-Distribution Frequency Dynamic Co-Simulation Framework for Distributed Energy Resources Frequency Response

The rapid deployment of distributed energy resources (DERs) in distribution networks has brought challenges to balance the system and stabilize frequency. DERs have the ability to provide frequency regulation; however, existing dynamic frequency simulation tools, which were developed mainly for the transmission system, lack the capability to simulate distribution network dynamics with high penetrations of DERs. Although electromagnetic transient (EMT) simulation tools can simulate distribution network dynamics, their computational efficiency limits their use for large-scale transmission-and-distribution (T&D) simulations. This paper presents an efficient T&D dynamic frequency co-simulation framework for DER frequency response based on the HELICS platform and existing off-the-shelf simulators. The challenge of synchronizing frequency between the transmission network and DERs hosted in the distribution network is approached by detailed modeling of DERs in frequency dynamic models while DER phasor models are also preserved in the distribution networks. Thereby, local voltage constraints can be respected when dispatching the DER power for frequency response. The DER frequency responses (primary and secondary) are simulated in case studies to validate the proposed framework. Lastly, a fault-induced delayed voltage recovery (FIDVR) event of a large system is presented to demonstrate the efficiency and effectiveness of the overall framework.

preprint · 2021 · arXiv

Two-Stage Stochastic Optimization Frameworks to Aid in Decision-Making Under Uncertainty for Variable Resource Generators Participating in a Sequential Energy Market

Decisions for a variable renewable resource generator's commitment in the energy market are typically made in advance when little information is obtainable about wind availability and market prices. Much research has been published recommending various frameworks for addressing this issue. However, these frameworks are limited as they do not consider all markets a producer can participate in. Moreover, current stochastic programming models do not allow for uncertainty data to be updated as more accurate information becomes available. This work proposes two decision-making frameworks for a wind energy generator participating in day-ahead, intraday, reserve, and balancing markets. The first framework is a two-stage stochastic convex optimization approach, where both scenario-independent and scenario-dependent decisions are made concurrently. The second framework is a series of four two-stage stochastic optimization models wherein the results from each model feed into each subsequent model, allowing for scenarios to be updated as more information becomes available to the decision-maker. In the simulation experiments, the multi-phase framework performs better than the single-phase framework in every run, and results in an average profit increase of 7%. The proposed optimization frameworks aid in better decision-making while addressing uncertainty related to variable resource generators and maximize the return on investment.

preprint · 2020 · arXiv

Domain-Embeddings Based DGA Detection with Incremental Training Method

The DGA-based botnet, which uses Domain Generation Algorithms (DGAs) to evade supervision, has become one of the most destructive threats to network security. Over the past decades, a wealth of defense mechanisms focusing on domain features have emerged to address the problem. Nonetheless, DGA detection remains a daunting and challenging task due to the big data nature of Internet traffic and the fact that the linguistic features extracted only from the domain names are insufficient and adversaries could easily forge them to disturb detection. In this paper, we propose a novel DGA detection system which employs an incremental word-embeddings method to capture the interactions between end hosts and domains, characterize time-series patterns of DNS queries for each IP address and thereby explore temporal similarities between domains. We carefully modify the Word2Vec algorithm and leverage it to automatically learn dynamic and discriminative feature representations for over 1.9 million domains, and develop a simple classifier for distinguishing malicious domains from benign ones. Given the ability to identify temporal patterns of domains and update models incrementally, the proposed scheme makes progress towards adapting to the changing and evolving strategies of DGA domains. Our system is evaluated and compared with the state-of-the-art system FANCI and two deep-learning methods, CNN and LSTM, with data from a large university's network named TUNET. The results suggest that our system outperforms the strong competitors by a large margin on multiple metrics and meanwhile achieves a remarkable speed-up on model updating.

preprint · 2020 · arXiv

Weighted PBW degenerations and tropical flag varieties

We study algebraic, combinatorial and geometric aspects of weighted PBW-type degenerations of (partial) flag varieties in type $A$. These degenerations are labeled by degree functions lying in an explicitly defined polyhedral cone, which can be identified with a maximal cone in the tropical flag variety. Varying the degree function in the cone, we recover, for example, the classical flag variety, its abelian PBW degeneration, some of its linear degenerations and a particular toric degeneration.

preprint · 2019 · arXiv

Evaluating entropy rate of laser chaos and shot noise

Evaluating the entropy rate of high-dimensional chaos and shot noise from analog raw signals remains elusive and important in information security. We experimentally present an accurate assessment of the entropy rate for physical process randomness. The entropy generation of optical-feedback laser chaos and the physical randomness limit from shot noise are quantified and unambiguously discriminated using the growth rate of the average permutation entropy value in memory time. The permutation entropy difference of filtered laser chaos with varying embedding delay time is investigated experimentally and theoretically. High-resolution maps of the entropy difference are observed over the range of the injection-feedback parameter space. We also clarify an inverse relationship between the entropy rate and time delay signature of laser chaos over a wide range of parameters. Compared to the original chaos, the time delay signature is suppressed up to 95% with the minimum of 0.015 via frequency-band extractor, and the experiment agrees well with the theory. Our system provides a commendable entropy evaluation and source for physical random number generation.

preprint · 2019 · arXiv

Space-time Variant Self-growing Bandgap in Nonlinear Acoustic Metamaterial

Material band structure is a key foundation for various modern technologies, but it has been regarded as a space-time invariant feature. Acoustic metamaterials show extraordinary properties for processing elastic waves, but conventional realizations suffer from narrow bandgaps. Here we first report a nonlinear acoustic metamaterial whose band structure self-adapts to the propagation distance/time and whose bandgap exhibits a self-growing behaviour stemming from giant nonlinear interaction. This space-time self-modulating characteristic highlights an unconventional understanding of the band structure, and the self-growth generates an ultralow and ultrabroad bandgap that breaks through the limitation of the mass law for linear locally resonant bandgaps. We also elucidate the self-adaptive mechanisms. This first demonstration sheds light on conceiving advanced devices and metamaterials with broadband, space-time variant bandgaps for wave self-manipulation.