Researcher profile

Mingjie Chen

Mingjie Chen contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2022arXiv

Efficient Non-Autoregressive GAN Voice Conversion using VQWav2vec Features and Dynamic Convolution

It was shown recently that a combination of ASR and TTS models yield highly competitive performance on standard voice conversion tasks such as the Voice Conversion Challenge 2020 (VCC2020). To obtain good performance both models require pretraining on large amounts of data, thereby obtaining large models that are potentially inefficient in use. In this work we present a model that is significantly smaller and thereby faster in processing while obtaining equivalent performance. To achieve this the proposed model, Dynamic-GAN-VC (DYGAN-VC), uses a non-autoregressive structure and makes use of vector quantised embeddings obtained from a VQWav2vec model. Furthermore dynamic convolution is introduced to improve speech content modeling while requiring a small number of parameters. Objective and subjective evaluation was performed using the VCC2020 task, yielding MOS scores of up to 3.86, and character error rates as low as 4.3\%. This was achieved with approximately half the number of model parameters, and up to 8 times faster decoding speed.

preprint2022arXiv

On $\mathbb{F}_p$-roots of the Hilbert class polynomial modulo $p$

The Hilbert class polynomial $H_{\mathcal{O}}(x)\in \mathbb{Z}[x]$ attached to an order $\mathcal{O}$ in an imaginary quadratic field $K$ is the monic polynomial whose roots are precisely the distinct $j$-invariants of elliptic curves over $\mathbb{C}$ with complex multiplication by $\mathcal{O}$. Let $p$ be a prime inert in $K$ and strictly greater than $|\operatorname{disc}(\mathcal{O})|$. We show that the number of $\mathbb{F}_p$-roots of $H_\mathcal{O}(x)\!\! \pmod{p}$ is either zero or $|\operatorname{Pic}(\mathcal{O})[2]|$ by exhibiting a free and transitive action of $\operatorname{Pic}(\mathcal{O})[2]$ on the set of $\mathbb{F}_p$-roots of $H_\mathcal{O}(x)\!\! \pmod p$ whenever it is nonempty. We also provide a concrete criterion for the nonemptiness of the set of $\mathbb{F}_p$-roots. A similar result was first obtained by Xiao et al.~[Int. J. Number Theory, DOI: 10.1142/S1793042122500555] and generalized much further by Li et al.~[arXiv:2108.00168] (that covers the current result) with a different approach.

preprint2022arXiv

One-off Negative Sequential Pattern Mining

Negative sequential pattern mining (SPM) is an important SPM research topic. Unlike positive SPM, negative SPM can discover events that should have occurred but have not occurred, and it can be used for financial risk management and fraud detection. However, existing methods generally ignore the repetitions of the pattern and do not consider gap constraints, which can lead to mining results containing a large number of patterns that users are not interested in. To solve this problem, this paper discovers frequent one-off negative sequential patterns (ONPs). This problem has the following two characteristics. First, the support is calculated under the one-off condition, which means that any character in the sequence can only be used once at most. Second, the gap constraint can be given by the user. To efficiently mine patterns, this paper proposes the ONP-Miner algorithm, which employs depth-first and backtracking strategies to calculate the support. Therefore, ONP-Miner can effectively avoid creating redundant nodes and parent-child relationships. Moreover, to effectively reduce the number of candidate patterns, ONP-Miner uses pattern join and pruning strategies to generate and further prune the candidate patterns, respectively. Experimental results show that ONP-Miner not only improves the mining efficiency, but also has better mining performance than the state-of-the-art algorithms. More importantly, ONP mining can find more interesting patterns in traffic volume data to predict future traffic.

preprint2020arXiv

Unsupervised Acoustic Unit Representation Learning for Voice Conversion using WaveNet Auto-encoders

Unsupervised representation learning of speech has been of keen interest in recent years, which is for example evident in the wide interest of the ZeroSpeech challenges. This work presents a new method for learning frame level representations based on WaveNet auto-encoders. Of particular interest in the ZeroSpeech Challenge 2019 were models with discrete latent variable such as the Vector Quantized Variational Auto-Encoder (VQVAE). However these models generate speech with relatively poor quality. In this work we aim to address this with two approaches: first WaveNet is used as the decoder and to generate waveform data directly from the latent representation; second, the low complexity of latent representations is improved with two alternative disentanglement learning methods, namely instance normalization and sliced vector quantization. The method was developed and tested in the context of the recent ZeroSpeech challenge 2020. The system output submitted to the challenge obtained the top position for naturalness (Mean Opinion Score 4.06), top position for intelligibility (Character Error Rate 0.15), and third position for the quality of the representation (ABX test score 12.5). These and further analysis in this paper illustrates that quality of the converted speech and the acoustic units representation can be well balanced.