Source author record

Yulan Liu

Yulan Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: Computation and Language, eess.AS, Sound, astro-ph.HE, math.OC

Catalog footprint

What is connected

8 works

5 topics

4 close collaborators

Preprint, 2022, arXiv

Calmness of partial perturbation to composite rank constraint systems and its applications

This paper is concerned with the calmness of a partial perturbation to the composite rank constraint system, an intersection of the rank constraint set and a general closed set, which is shown to be equivalent to a local Lipschitz-type error bound and also a global Lipschitz-type error bound under a certain compactness. Based on its lifted formulation, we derive two criteria for identifying those closed sets such that the associated partial perturbation possesses the calmness, and provide a collection of examples to demonstrate that the criteria are satisfied by common nonnegative and positive semidefinite rank constraint sets. Then, we use the calmness of this perturbation to obtain several global exact penalties for rank constrained optimization problems, and a family of equivalent DC surrogates for rank regularized problems.

Preprint, 2022, arXiv

Long-term scintillation studies of EPTA pulsars. I. Observations and basic results

Interstellar scintillation analysis of pulsars allows us to probe the small-scale distribution and inhomogeneities of the ionized interstellar medium. Our priority is to present the dataset and the basic measurements of scintillation parameters of pulsars, employing long-term scintillation observations carried out from 2011 January to 2020 August by the European Pulsar Timing Array radio telescopes in the 21-cm and 11-cm bands. Additionally, we aim to identify possible future lines of study using this long-term scintillation dataset. We present the long-term time series of $ν_{\rm d}$ and $τ_{\rm d}$ for 13 pulsars. Sanity checks and comparisons indicate that the scintillation parameters of our work and previously published works are mostly consistent. For two pulsars, PSRs J1857+0943 and J1939+2134, we were able to obtain measurements of $ν_{\rm d}$ at both bands, which allows us to derive the time series of frequency scaling indices with a mean and a standard deviation of $2.82 \pm 1.95$ and $3.18 \pm 0.60$, respectively. We found some interesting features which will be studied in more detail in subsequent papers in this series: (i) in the time series of PSR J1939+2134, the scintillation bandwidth sharply increases or decreases in association with a sharp change of dispersion measure; (ii) PSR J0613$-$0200 and PSR J0636+5126 show a strong annual variation in the time series of $τ_{\rm d}$; (iii) PSR J1939+2134 shows a weak anti-correlation between scintillation timescale and dispersion in WSRT data.
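As a rough illustration of how a frequency scaling index can be obtained from scintillation bandwidths measured at two bands, the sketch below assumes a simple power law, $ν_{\rm d} \propto f^{\alpha}$, and solves for $\alpha$ from one pair of measurements. The function name, the band centre frequencies and the sample bandwidths are hypothetical values chosen for illustration; none of them are taken from the paper, and the authors' actual fitting procedure may differ.

```python
import math


def scaling_index(nu_d_low, nu_d_high, f_low, f_high):
    """Estimate alpha in nu_d ∝ f**alpha from two (frequency, bandwidth) pairs.

    nu_d_low / nu_d_high: scintillation bandwidths at the lower / higher
    observing frequency (same units as each other).
    f_low / f_high: the two observing frequencies (same units as each other).
    """
    if min(nu_d_low, nu_d_high, f_low, f_high) <= 0:
        raise ValueError("bandwidths and frequencies must be positive")
    if f_low == f_high:
        raise ValueError("the two observing frequencies must differ")
    # Taking logs of nu_d_high / nu_d_low = (f_high / f_low) ** alpha
    return math.log(nu_d_high / nu_d_low) / math.log(f_high / f_low)


# Hypothetical inputs (assumptions, not measurements from the paper):
# approximate centre frequencies for the 21-cm and 11-cm bands, in MHz,
# and made-up scintillation bandwidths in MHz.
f_21cm_mhz = 1400.0
f_11cm_mhz = 2700.0
nu_d_21cm_mhz = 2.0
nu_d_11cm_mhz = 16.0

alpha = scaling_index(nu_d_21cm_mhz, nu_d_11cm_mhz, f_21cm_mhz, f_11cm_mhz)
print(f"alpha = {alpha:.2f}")
```

Applying this to each epoch with measurements at both bands would give a time series of indices, whose mean and standard deviation could then be summarised as in the abstract.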

Preprint, 2022, arXiv

Multi-turn RNN-T for streaming recognition of multi-party speech

Automatic speech recognition (ASR) of single-channel far-field recordings with an unknown number of speakers is traditionally tackled by cascaded modules. Recent research shows that end-to-end (E2E) multi-speaker ASR models can achieve superior recognition accuracy compared to modular systems. However, these models do not ensure real-time applicability due to their dependency on full audio context. This work takes real-time applicability as the first priority in model design and addresses a few challenges in previous work on the multi-speaker recurrent neural network transducer (MS-RNN-T). First, we introduce on-the-fly overlapping speech simulation during training, yielding a 14% relative word error rate (WER) improvement on the LibriSpeechMix test set. Second, we propose a novel multi-turn RNN-T (MT-RNN-T) model with an overlap-based target arrangement strategy that generalizes to an arbitrary number of speakers without changes in the model architecture. We investigate the impact of the maximum number of speakers seen during training on MT-RNN-T performance on the LibriCSS test set, and report a 28% relative WER improvement over the two-speaker MS-RNN-T. Third, we experiment with a rich transcription strategy for joint recognition and segmentation of multi-party speech. Through an in-depth analysis, we discuss potential pitfalls of the proposed system as well as promising future research directions.
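For readers unfamiliar with the metric, the following is a minimal sketch of the standard single-stream word error rate: the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. The function name and the sample sentences are hypothetical, and this is not the multi-speaker scoring procedure used in the paper, which would additionally need to handle speaker assignment.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.3f}")
```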

Preprint, 2022, arXiv

Pulsar scintillation studies with LOFAR. I. The census

Context. Interstellar scintillation (ISS) of pulsar emission can both serve as a probe of the ionised interstellar medium (IISM) and cause corruptions in pulsar timing experiments. Of particular interest are so-called scintillation arcs, which can be used to measure time-variable interstellar scattering delays directly, potentially allowing substantial improvements to timing precision. Aims. The primary aim of this study is to carry out the first sizeable and self-consistent census of diffractive pulsar scintillation and scintillation-arc detectability at low frequencies, as a primer for larger-scale IISM studies and pulsar-timing related propagation studies with the LOw-Frequency ARray (LOFAR) High Band Antennae (HBA). Results. In this initial set of 31 sources, 15 allow full determination of the scintillation properties; nine of these show detectable scintillation arcs at 120-180 MHz. Eight of the observed sources show unresolved scintillation, and the final eight do not display diffractive scintillation. Some correlation between scintillation detectability and pulsar brightness and dispersion measure is apparent, although no clear cut-off values can be determined. Our measurements across a large fractional bandwidth allow a meaningful test of the frequency scaling of scintillation parameters, uncorrupted by influences from refractive scintillation variations. Conclusions. Our results indicate the powerful advantage and great potential of ISS studies at low frequencies and the complex dependence of scintillation detectability on parameters like pulsar brightness and interstellar dispersion. This work provides the first installment of a larger-scale census and longer-term monitoring of interstellar scintillation effects at low frequencies.

Preprint, 2021, arXiv

Streaming Multi-speaker ASR with RNN-T

Recent research shows that end-to-end ASR systems can recognize overlapped speech from multiple speakers. However, all published works have assumed no latency constraints during inference, which does not hold for most voice assistant interactions. This work focuses on multi-speaker speech recognition based on a recurrent neural network transducer (RNN-T), which has been shown to provide high recognition accuracy in a low-latency online recognition regime. We investigate two approaches to multi-speaker model training of the RNN-T: deterministic output-target assignment and permutation invariant training. We show that guiding separation with speaker order labels in the former case enhances the high-level speaker tracking capability of RNN-T. Apart from that, with multistyle training on single- and multi-speaker utterances, the resulting models gain robustness against ambiguous numbers of speakers during inference. Our best model achieves a WER of 10.2% on simulated 2-speaker LibriSpeech data, which is competitive with the previously reported state-of-the-art nonstreaming model (10.3%), while the proposed model could be directly applied for streaming applications.

Preprint, 2021, arXiv

Using Synthetic Audio to Improve The Recognition of Out-Of-Vocabulary Words in End-To-End ASR Systems

Today, many state-of-the-art automatic speech recognition (ASR) systems apply all-neural models that map audio to word sequences trained end-to-end along one global optimisation criterion in a fully data driven fashion. These models allow high precision ASR for domains and words represented in the training material but have difficulties recognising words that are rarely or not at all represented during training, i.e. trending words and new named entities. In this paper, we use a text-to-speech (TTS) engine to provide synthetic audio for out-of-vocabulary (OOV) words. We aim to boost the recognition accuracy of a recurrent neural network transducer (RNN-T) on OOV words by using the extra audio-text pairs, while maintaining the performance on the non-OOV words. Different regularisation techniques are explored and the best performance is achieved by fine-tuning the RNN-T on both original training data and extra synthetic data with elastic weight consolidation (EWC) applied on the encoder. This yields a 57% relative word error rate (WER) reduction on utterances containing OOV words without any degradation on the whole test set.
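The distinction between a relative and an absolute WER reduction is easy to misread, so here is a small sketch of the arithmetic. The helper name and the two WER values are hypothetical and were picked only so that the relative reduction comes out near 57%; the abstract does not report the underlying absolute error rates.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Fraction by which new_wer improves on baseline_wer.

    Both arguments are word error rates in the same units (for example
    percent). A positive result means the new system makes fewer errors.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs in percent (assumptions, not figures from the paper).
baseline = 20.0
fine_tuned = 8.6

absolute = baseline - fine_tuned
relative = relative_wer_reduction(baseline, fine_tuned)
print(f"absolute reduction: {absolute:.1f} points")
print(f"relative reduction: {relative:.0%}")
```

With these made-up numbers the absolute reduction is 11.4 percentage points, while the relative reduction is about 57% of the baseline error rate.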

Preprint, 2016, arXiv

Locally upper Lipschitz of the perturbed KKT system of Ky Fan $k$-norm matrix conic optimization problems

This note is concerned with the nonlinear Ky Fan $k$-norm matrix conic optimization problems, which include the nuclear norm regularized minimization problem as a special case. For this class of nonpolyhedral matrix conic optimization problems, under the assumption that a stationary solution satisfies the second-order sufficient condition and the associated Lagrange multiplier satisfies the strict Robinson's CQ, we show that two classes of perturbed KKT systems are locally upper Lipschitz at the origin, which implies a local error bound for the distance from any point in a neighborhood of the corresponding KKT point to the whole set of KKT points.

Preprint, 2015, arXiv

The 2015 Sheffield System for Transcription of Multi-Genre Broadcast Media

We describe the University of Sheffield system for participation in the 2015 Multi-Genre Broadcast (MGB) challenge task of transcribing multi-genre broadcast shows. Transcription was one of four tasks proposed in the MGB challenge, with the aim of advancing the state of the art of automatic speech recognition, speaker diarisation and automatic alignment of subtitles for broadcast media. Four topics are investigated in this work: Data selection techniques for training with unreliable data, automatic speech segmentation of broadcast media shows, acoustic modelling and adaptation in highly variable environments, and language modelling of multi-genre shows. The final system operates in multiple passes, using an initial unadapted decoding stage to refine segmentation, followed by three adapted passes: a hybrid DNN pass with input features normalised by speaker-based cepstral normalisation, another hybrid stage with input features normalised by speaker feature-MLLR transformations, and finally a bottleneck-based tandem stage with noise and speaker factorisation. The combination of these three system outputs provides a final error rate of 27.5% on the official development set, consisting of 47 multi-genre shows.
