Researcher profile

Yulan Liu

Yulan Liu contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
6 works
0 followers
5 topics
4 close collaborators

Published work

6 published items

preprint | 2022 | arXiv

Calmness of partial perturbation to composite rank constraint systems and its applications

This paper is concerned with the calmness of a partial perturbation to the composite rank constraint system, an intersection of the rank constraint set and a general closed set, which is shown to be equivalent to a local Lipschitz-type error bound and also a global Lipschitz-type error bound under a certain compactness. Based on its lifted formulation, we derive two criteria for identifying those closed sets such that the associated partial perturbation possesses the calmness, and provide a collection of examples to demonstrate that the criteria are satisfied by common nonnegative and positive semidefinite rank constraint sets. Then, we use the calmness of this perturbation to obtain several global exact penalties for rank constrained optimization problems, and a family of equivalent DC surrogates for rank regularized problems.

preprint | 2022 | arXiv

Long-term scintillation studies of EPTA pulsars. I. Observations and basic results

Interstellar scintillation analysis of pulsars allows us to probe the small-scale distribution and inhomogeneities of the ionized interstellar medium. Our priority is to present the data set and the basic measurements of scintillation parameters of pulsars employing long-term scintillation observations carried out from 2011 January to 2020 August by the European Pulsar Timing Array radio telescopes in the 21-cm and 11-cm bands. Additionally, we aim to identify future possible lines of study using this long-term scintillation dataset. We present the long-term time series of $\nu_{\rm d}$ and $\tau_{\rm d}$ for 13 pulsars. Sanity checks and comparisons indicate that the scintillation parameters of our work and previously published works are mostly consistent. For two pulsars, PSRs J1857+0943 and J1939+2134, we were able to obtain measurements of $\nu_{\rm d}$ at both bands, which allows us to derive the time series of frequency scaling indices with a mean and a standard deviation of $2.82 \pm 1.95$ and $3.18 \pm 0.60$, respectively. We found some interesting features which will be studied in more detail in subsequent papers in this series: (i) in the time series of PSR J1939+2134, the scintillation bandwidth sharply increases or decreases in association with a sharp change of dispersion measure; (ii) PSR J0613$-$0200 and PSR J0636+5126 show a strong annual variation in the time series of $\tau_{\rm d}$; (iii) PSR J1939+2134 shows a weak anti-correlation between scintillation timescale and dispersion in WSRT data.
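
As a rough illustration of how a frequency scaling index can be derived from measurements at two bands, the sketch below assumes a simple power law in which the scintillation bandwidth scales as frequency to the power alpha. The function name, the band centre frequencies and the input values are hypothetical and are not taken from the paper.

```python
import math


def scaling_index(nu_d_low, freq_low, nu_d_high, freq_high):
    """Return alpha, assuming nu_d is proportional to freq**alpha.

    All arguments are hypothetical inputs: a scintillation bandwidth and
    an observing frequency for each of two bands, in consistent units.
    """
    values = (nu_d_low, freq_low, nu_d_high, freq_high)
    if min(values) <= 0:
        raise ValueError("bandwidths and frequencies must be positive")
    if freq_low == freq_high:
        raise ValueError("the two bands must differ in frequency")
    # Two-point slope of log(nu_d) against log(freq).
    return math.log(nu_d_high / nu_d_low) / math.log(freq_high / freq_low)


# Made-up numbers chosen only to keep the arithmetic easy to follow:
# the bandwidth grows 16-fold when the frequency doubles.
alpha = scaling_index(1.0, 1400.0, 16.0, 2800.0)
print(round(alpha, 6))
```

Applying such a function to each epoch with measurements in both bands would give a time series of indices, whose mean and standard deviation could then be summarised.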

preprint | 2022 | arXiv

Multi-turn RNN-T for streaming recognition of multi-party speech

Automatic speech recognition (ASR) of single-channel far-field recordings with an unknown number of speakers is traditionally tackled by cascaded modules. Recent research shows that end-to-end (E2E) multi-speaker ASR models can achieve superior recognition accuracy compared to modular systems. However, these models do not ensure real-time applicability due to their dependency on full audio context. This work takes real-time applicability as the first priority in model design and addresses a few challenges in previous work on multi-speaker recurrent neural network transducer (MS-RNN-T). First, we introduce on-the-fly overlapping speech simulation during training, yielding 14% relative word error rate (WER) improvement on the LibriSpeechMix test set. Second, we propose a novel multi-turn RNN-T (MT-RNN-T) model with an overlap-based target arrangement strategy that generalizes to an arbitrary number of speakers without changes in the model architecture. We investigate the impact of the maximum number of speakers seen during training on MT-RNN-T performance on the LibriCSS test set, and report 28% relative WER improvement over the two-speaker MS-RNN-T. Third, we experiment with a rich transcription strategy for joint recognition and segmentation of multi-party speech. Through an in-depth analysis, we discuss potential pitfalls of the proposed system as well as promising future research directions.

preprint | 2022 | arXiv

Pulsar scintillation studies with LOFAR. I. The census

Context. Interstellar scintillation (ISS) of pulsar emission can both serve as a probe of the ionised interstellar medium (IISM) and cause corruptions in pulsar timing experiments. Of particular interest are so-called scintillation arcs, which can be used to measure time-variable interstellar scattering delays directly, potentially allowing high-precision improvements to timing precision. Aims. The primary aim of this study is to carry out the first sizeable and self-consistent census of diffractive pulsar scintillation and scintillation-arc detectability at low frequencies, as a primer for larger-scale IISM studies and pulsar-timing related propagation studies with the LOw-Frequency ARray (LOFAR) High Band Antennae (HBA). Results. In this initial set of 31 sources, 15 allow full determination of the scintillation properties; nine of these show detectable scintillation arcs at 120-180 MHz. Eight of the observed sources show unresolved scintillation, and the final eight do not display diffractive scintillation. Some correlation between scintillation detectability and pulsar brightness and dispersion measure is apparent, although no clear cut-off values can be determined. Our measurements across a large fractional bandwidth allow a meaningful test of the frequency scaling of scintillation parameters, uncorrupted by influences from refractive scintillation variations. Conclusions. Our results indicate the powerful advantage and great potential of ISS studies at low frequencies and the complex dependence of scintillation detectability on parameters like pulsar brightness and interstellar dispersion. This work provides the first installment of a larger-scale census and longer-term monitoring of interstellar scintillation effects at low frequencies.

preprint | 2021 | arXiv

Streaming Multi-speaker ASR with RNN-T

Recent research shows end-to-end ASR systems can recognize overlapped speech from multiple speakers. However, all published works have assumed no latency constraints during inference, which does not hold for most voice assistant interactions. This work focuses on multi-speaker speech recognition based on a recurrent neural network transducer (RNN-T) that has been shown to provide high recognition accuracy in a low-latency online recognition regime. We investigate two approaches to multi-speaker model training of the RNN-T: deterministic output-target assignment and permutation invariant training. We show that guiding separation with speaker order labels in the former case enhances the high-level speaker tracking capability of RNN-T. Apart from that, with multistyle training on single- and multi-speaker utterances, the resulting models gain robustness against ambiguous numbers of speakers during inference. Our best model achieves a WER of 10.2% on simulated 2-speaker LibriSpeech data, which is competitive with the previously reported state-of-the-art nonstreaming model (10.3%), while the proposed model could be directly applied for streaming applications.
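
To make the permutation invariant training idea mentioned above more concrete, here is a minimal sketch of the general technique, not the paper's implementation. It assumes the per-pair losses have already been computed and stored in a square matrix; `pit_loss` and `pair_loss` are hypothetical names.

```python
from itertools import permutations


def pit_loss(pair_loss):
    """Pick the output-to-target assignment with the lowest total loss.

    pair_loss[i][j] is assumed to hold the loss of model output i scored
    against reference target j (hypothetical, precomputed values).
    Returns (total loss, assignment), where assignment[i] is the target
    index matched to output i.
    """
    n = len(pair_loss)

    def total(perm):
        return sum(pair_loss[i][perm[i]] for i in range(n))

    # Exhaustive search over assignments; fine for a handful of speakers.
    best = min(permutations(range(n)), key=total)
    return total(best), best


# Two outputs, two targets: the identity assignment is cheapest here.
print(pit_loss([[1.0, 5.0], [4.0, 0.5]]))  # (1.5, (0, 1))
```

A deterministic output-target assignment would instead fix the assignment in advance, so only one entry per row of the matrix would contribute to the loss.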

preprint | 2021 | arXiv

Using Synthetic Audio to Improve The Recognition of Out-Of-Vocabulary Words in End-To-End ASR Systems

Today, many state-of-the-art automatic speech recognition (ASR) systems apply all-neural models that map audio to word sequences trained end-to-end along one global optimisation criterion in a fully data driven fashion. These models allow high precision ASR for domains and words represented in the training material but have difficulties recognising words that are rarely or not at all represented during training, i.e. trending words and new named entities. In this paper, we use a text-to-speech (TTS) engine to provide synthetic audio for out-of-vocabulary (OOV) words. We aim to boost the recognition accuracy of a recurrent neural network transducer (RNN-T) on OOV words by using the extra audio-text pairs, while maintaining the performance on the non-OOV words. Different regularisation techniques are explored and the best performance is achieved by fine-tuning the RNN-T on both original training data and extra synthetic data with elastic weight consolidation (EWC) applied on the encoder. This yields a 57% relative word error rate (WER) reduction on utterances containing OOV words without any degradation on the whole test set.
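
For readers unfamiliar with elastic weight consolidation, the sketch below shows the commonly used quadratic penalty form, under the assumption that parameters, their pre-fine-tuning values and per-parameter importance weights are available as flat lists. The function and argument names are hypothetical, and the paper's exact formulation may differ.

```python
def ewc_penalty(params, anchor_params, fisher, lam=1.0):
    """Quadratic EWC-style penalty: 0.5 * lam * sum(F_i * (p_i - a_i)**2).

    params        : current parameter values (hypothetical flat list)
    anchor_params : values before fine-tuning, which the penalty pulls toward
    fisher        : per-parameter importance weights (for example, a
                    diagonal Fisher information estimate)
    lam           : penalty strength, an assumed hyperparameter
    """
    if not (len(params) == len(anchor_params) == len(fisher)):
        raise ValueError("all inputs must have the same length")
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


# Only the first parameter moved, so only it contributes to the penalty.
print(ewc_penalty([1.0, 2.0], [0.0, 2.0], [2.0, 3.0]))  # 1.0
```

In a fine-tuning setup, a term like this would be added to the task loss for the regularised part of the network, so that parameters judged important for the original data change less.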