Researcher profile

Masato Mimura

Masato Mimura contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
11 works
0 followers
10 topics
4 close collaborators

Published work

11 published item(s)

Preprint, 2023, arXiv

Mixed commutator lengths, wreath products and general ranks

In the present paper, for a pair $(G,N)$ of a group $G$ and its normal subgroup $N$, we consider the mixed commutator length $\mathrm{cl}_{G,N}$ on the mixed commutator subgroup $[G,N]$. We focus on the setting of wreath products: $(G,N)=(\mathbb{Z}\wr \Gamma, \bigoplus_{\Gamma}\mathbb{Z})$. Then we determine mixed commutator lengths in terms of the general rank in the sense of Malcev. As a byproduct, when an abelian group $\Gamma$ is not locally cyclic, the ordinary commutator length $\mathrm{cl}_G$ does not coincide with $\mathrm{cl}_{G,N}$ on $[G,N]$ for the above pair. On the other hand, we prove that if $\Gamma$ is locally cyclic, then for every pair $(G,N)$ such that $1\to N\to G\to \Gamma\to 1$ is exact, $\mathrm{cl}_{G}$ and $\mathrm{cl}_{G,N}$ coincide on $[G,N]$. We also study the case of permutational wreath products when the group $\Gamma$ belongs to a certain class related to surface groups.

Preprint, 2022, arXiv

Bavard's duality theorem for mixed commutator length

Let $N$ be a normal subgroup of a group $G$. A quasimorphism $f$ on $N$ is $G$-invariant if $f(gxg^{-1}) = f(x)$ for every $g \in G$ and every $x \in N$. The goal in this paper is to establish Bavard's duality theorem for $G$-invariant quasimorphisms, which was previously proved by Kawasaki and Kimura in the case $N = [G,N]$. Our duality theorem provides a connection between $G$-invariant quasimorphisms and $(G,N)$-commutator lengths. Here for $x \in [G,N]$, the $(G,N)$-commutator length $\mathrm{cl}_{G,N}(x)$ of $x$ is the minimum number $n$ such that $x$ is a product of $n$ commutators which are written as $[g,h]$ with $g \in G$ and $h \in N$. In the proof, we give a geometric interpretation of $(G,N)$-commutator lengths. As an application of our Bavard duality, we obtain a sufficient condition on a pair $(G,N)$ under which $\mathrm{scl}_G$ and $\mathrm{scl}_{G,N}$ are bi-Lipschitzly equivalent on $[G,N]$.
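
For orientation, the quantities named in this abstract can be written out as follows. The first line restates the abstract's definition of $\mathrm{cl}_{G,N}$. The stabilization $\mathrm{scl}_{G,N}$ and the classical Bavard duality (the case $N = G$) are standard background supplied here as an assumption about notation, not text taken from the abstract. The symbols $\varphi$ and $D(\varphi)$ are introduced only for this illustration.

$$
\mathrm{cl}_{G,N}(x) = \min\bigl\{\, n : x = [g_1,h_1]\cdots[g_n,h_n],\ g_i \in G,\ h_i \in N \,\bigr\},
\qquad
\mathrm{scl}_{G,N}(x) = \lim_{n\to\infty} \frac{\mathrm{cl}_{G,N}(x^n)}{n},
$$

$$
\mathrm{scl}_{G}(x) = \sup_{\varphi} \frac{|\varphi(x)|}{2D(\varphi)},
\qquad
D(\varphi) = \sup_{a,b \in G} \bigl|\varphi(ab) - \varphi(a) - \varphi(b)\bigr|,
$$

where, in the classical statement, $x \in [G,G]$ and the supremum runs over homogeneous quasimorphisms $\varphi$ on $G$ with $D(\varphi) > 0$. The paper's theorem is the analogue of the second line with $G$-invariant quasimorphisms on $N$ in place of quasimorphisms on $G$.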

Preprint, 2022, arXiv

Constellations in prime elements of number fields

Given any number field, we prove that there exist arbitrarily shaped constellations consisting of pairwise non-associate prime elements of the ring of integers. This result extends the celebrated Green-Tao theorem on arithmetic progressions of rational primes and Tao's theorem on constellations of Gaussian primes. Furthermore, we prove a constellation theorem on prime representations of binary quadratic forms with integer coefficients. More precisely, for a non-degenerate primitive binary quadratic form $F$ which is not negative definite, there exist arbitrarily shaped constellations consisting of pairs of integers $(x,y)$ for which $F(x,y)$ is a rational prime. The latter theorem is obtained by extending the framework from the ring of integers to the pair of an order and its invertible fractional ideal.

Preprint, 2022, arXiv

Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM

Connectionist temporal classification (CTC)-based models are attractive in automatic speech recognition (ASR) because of their non-autoregressive nature. To take advantage of text-only data, language model (LM) integration approaches such as rescoring and shallow fusion have been widely used for CTC. However, they lose CTC's non-autoregressive nature because of the need for beam search, which slows down inference. In this study, we propose an error correction method with a phone-conditioned masked LM (PC-MLM). In the proposed method, less confident word tokens in a greedy decoded output from CTC are masked. PC-MLM then predicts these masked word tokens given unmasked words and phones supplementally predicted from CTC. We further extend it to Deletable PC-MLM in order to address insertion errors. Since both CTC and PC-MLM are non-autoregressive models, the method enables fast LM integration. Experimental evaluations on the Corpus of Spontaneous Japanese (CSJ) and TED-LIUM2 in a domain adaptation setting show that our proposed method outperformed rescoring and shallow fusion in terms of inference speed, and also in terms of recognition accuracy on CSJ.
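
The masking step described in this abstract can be sketched roughly as below. This is a minimal illustration, not the authors' implementation. The blank index, the `<mask>` token, the toy vocabulary, the 0.6 threshold, and the choice of "highest frame probability" as the confidence score are all assumptions made for the example. The PC-MLM that would then fill in the masked positions is not modelled.

```python
BLANK = 0        # index of the CTC blank symbol (assumed convention)
MASK = "<mask>"  # hypothetical mask token


def greedy_ctc_decode(frame_probs, vocab):
    """Greedy CTC decoding: best symbol per frame, merge repeats, drop blanks.

    Returns (token, confidence) pairs. Confidence is the highest frame
    probability among the frames merged into the token (an assumption).
    """
    decoded = []
    prev = BLANK
    for probs in frame_probs:
        idx = max(range(len(probs)), key=lambda i: probs[i])
        if idx != BLANK:
            if idx != prev:
                # a new token starts at this frame
                decoded.append([vocab[idx], probs[idx]])
            else:
                # same token continues: keep its best frame probability
                decoded[-1][1] = max(decoded[-1][1], probs[idx])
        prev = idx
    return [(tok, conf) for tok, conf in decoded]


def mask_low_confidence(decoded, threshold):
    """Replace tokens whose confidence is below the threshold with MASK."""
    return [tok if conf >= threshold else MASK for tok, conf in decoded]


# Toy example: six frames over a four-symbol vocabulary.
vocab = ["<blank>", "the", "cat", "sat"]
frames = [
    [0.10, 0.90, 0.00, 0.00],
    [0.10, 0.80, 0.10, 0.00],
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.10, 0.40, 0.30],
    [0.90, 0.00, 0.05, 0.05],
    [0.05, 0.00, 0.00, 0.95],
]
masked = mask_low_confidence(greedy_ctc_decode(frames, vocab), threshold=0.6)
print(masked)  # ['the', '<mask>', 'sat']
```

Because both steps are single passes over the frames and tokens, no beam search is involved, which is the property the abstract relies on for fast inference.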

Preprint, 2022, arXiv

On the spectrum and linear programming bound for hypergraphs

The spectrum of a graph is closely related to many graph parameters. In particular, the spectral gap of a regular graph, which is the difference between its valency and second eigenvalue, is widely seen as an algebraic measure of connectivity and plays a key role in the theory of expander graphs. In this paper, we extend previous work done for graphs and bipartite graphs and present a linear programming method for obtaining an upper bound on the order of a regular uniform hypergraph with prescribed distinct eigenvalues. Furthermore, we obtain a general upper bound on the order of a regular uniform hypergraph whose second eigenvalue is bounded by a given value. Our results improve and extend previous work done by Feng-Li (1996) on Alon-Boppana theorems for regular hypergraphs and by Dinitz-Schapira-Shahaf (2020) on the Moore or degree-diameter problem. We also determine the largest order of an $r$-regular $u$-uniform hypergraph with second eigenvalue at most $\theta$ for several parameters $(r,u,\theta)$. In particular, orthogonal arrays give the structure of the largest hypergraphs with second eigenvalue at most $1$ for every sufficiently large $r$. Moreover, we show that a generalized Moore geometry has the largest spectral gap among all hypergraphs of that order and degree.
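
As a small illustration of the spectral gap mentioned at the start of this abstract, the sketch below estimates the valency minus the second-largest adjacency eigenvalue for an ordinary regular graph (not a hypergraph). It is not the paper's linear programming method. The function names, the power-iteration approach, and the Petersen graph test case are choices made for this example. The code assumes a connected regular graph with more than two vertices and a start vector that is not orthogonal to the second eigenspace.

```python
import math


def spectral_gap(adj, k, iters=200):
    """Estimate k - lambda_2 for a connected k-regular graph.

    adj is a list of neighbour lists. Power iteration is run on A + k*I
    (whose eigenvalues are non-negative) restricted to the vectors
    orthogonal to the all-ones eigenvector.
    """
    n = len(adj)
    # start vector with zero mean, i.e. orthogonal to the all-ones vector
    v = [i - (n - 1) / 2 for i in range(n)]
    for _ in range(iters):
        w = [sum(v[j] for j in adj[i]) + k * v[i] for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]  # re-project to limit numerical drift
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of A at the unit vector v approximates lambda_2
    lam2 = sum(v[i] * sum(v[j] for j in adj[i]) for i in range(n))
    return k - lam2


def petersen():
    """Adjacency lists of the Petersen graph (3-regular, 10 vertices)."""
    adj = [[] for _ in range(10)]
    for i in range(5):
        edges = ((i, (i + 1) % 5), (i, i + 5), (5 + i, 5 + (i + 2) % 5))
        for a, b in edges:
            adj[a].append(b)
            adj[b].append(a)
    return adj


print(round(spectral_gap(petersen(), 3), 6))  # 2.0
```

The Petersen graph has adjacency eigenvalues 3, 1 and -2, so its spectral gap is 3 - 1 = 2, which the estimate reproduces.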

Preprint, 2020, arXiv

CTC-synchronous Training for Monotonic Attention Model

Monotonic chunkwise attention (MoChA) has been studied for online streaming automatic speech recognition (ASR) based on a sequence-to-sequence framework. In contrast to connectionist temporal classification (CTC), backward probabilities cannot be leveraged in the alignment marginalization process during training due to left-to-right dependency in the decoder. This results in the propagation of alignment errors to subsequent token generation. To address this problem, we propose CTC-synchronous training (CTC-ST), in which MoChA uses CTC alignments to learn optimal monotonic alignments. Reference CTC alignments are extracted from a CTC branch sharing the same encoder with the decoder. The entire model is jointly optimized so that the expected boundaries from MoChA are synchronized with the alignments. Experimental evaluations on the TEDLIUM release-2 and Librispeech corpora show that the proposed method significantly improves recognition, especially for long utterances. We also show that CTC-ST can bring out the full potential of SpecAugment for MoChA.

Preprint, 2020, arXiv

Distilling the Knowledge of BERT for Sequence-to-Sequence ASR

Attention-based sequence-to-sequence (seq2seq) models have achieved promising results in automatic speech recognition (ASR). However, as these models decode in a left-to-right way, they do not have access to context on the right. We leverage both left and right context by applying BERT as an external language model to seq2seq ASR through knowledge distillation. In our proposed method, BERT generates soft labels to guide the training of seq2seq ASR. Furthermore, we leverage context beyond the current utterance as input to BERT. Experimental evaluations show that our method significantly improves the ASR performance from the seq2seq baseline on the Corpus of Spontaneous Japanese (CSJ). Knowledge distillation from BERT outperforms that from a transformer LM that only looks at left context. We also show the effectiveness of leveraging context beyond the current utterance. Our method outperforms other LM application approaches such as n-best rescoring and shallow fusion, while it does not require extra inference cost.

Preprint, 2020, arXiv

End-to-end Music-mixed Speech Recognition

Automatic speech recognition (ASR) in multimedia content is one of the promising applications, but speech data in this kind of content are frequently mixed with background music, which is harmful for the performance of ASR. In this study, we propose a method for improving ASR with background music based on time-domain source separation. We utilize Conv-TasNet as a separation network, which has achieved state-of-the-art performance for multi-speaker source separation, to extract the speech signal from a speech-music mixture in the waveform domain. We also propose joint fine-tuning of a pre-trained Conv-TasNet front-end with an attention-based ASR back-end using both separation and ASR objectives. We evaluated our method through ASR experiments using speech data mixed with background music from a wide variety of Japanese animations. We show that time-domain speech-music separation drastically improves ASR performance of the back-end model trained with mixture data, and the joint optimization yielded a further significant WER reduction. The time-domain separation method outperformed a frequency-domain separation method, which reuses the phase information of the input mixture signal, both in simple cascading and joint training settings. We also demonstrate that our method works robustly for music interference from classical, jazz and popular genres.

Preprint, 2020, arXiv

Generative Adversarial Training Data Adaptation for Very Low-resource Automatic Speech Recognition

It is important to transcribe and archive speech data of endangered languages to preserve the heritage of verbal culture, and automatic speech recognition (ASR) is a powerful tool to facilitate this process. However, since endangered languages do not generally have large corpora with many speakers, the performance of ASR models trained on them is generally poor. Nevertheless, we are often left with a lot of recordings of spontaneous speech data that have to be transcribed. In this work, to mitigate this speaker sparsity problem, we propose to convert the whole training speech data and make it sound like the test speaker in order to develop a highly accurate ASR system for this speaker. For this purpose, we utilize a CycleGAN-based non-parallel voice conversion technology to forge labeled training data that is close to the test speaker's speech. We evaluated this speaker adaptation approach on two low-resource corpora, namely, Ainu and Mboshi. We obtained 35-60% relative improvement in phone error rate on the Ainu corpus, and 40% relative improvement was attained on the Mboshi corpus. This approach outperformed two conventional methods, namely unsupervised adaptation and multilingual training, on these two corpora.

Preprint, 2020, arXiv

Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language

Ainu is an unwritten language that has been spoken by the Ainu people, one of the ethnic groups in Japan. It is recognized as critically endangered by UNESCO, and archiving and documentation of its language heritage is of paramount importance. Although a considerable amount of voice recordings of Ainu folklore has been produced and accumulated to save their culture, only a very limited part of them has been transcribed so far. Thus, we started a project of automatic speech recognition (ASR) for the Ainu language in order to contribute to the development of annotated language archives. In this paper, we report speech corpus development and the structure and performance of end-to-end ASR for Ainu. We investigated four modeling units (phone, syllable, word piece, and word) and found that the syllable-based model performed best in terms of both word and phone recognition accuracy, which were about 60% and over 85% respectively in the speaker-open condition. Furthermore, word and phone accuracies of 80% and 90% were achieved in a speaker-closed setting. We also found that multilingual ASR training with additional speech corpora of English and Japanese further improves the speaker-open test accuracy.

Preprint, 2010, arXiv

On Quasi-homomorphisms and Commutators in the Special Linear Group over a Euclidean Ring

We prove that for any Euclidean ring R and n at least 6, Gamma=SL_n(R) has no unbounded quasi-homomorphisms. From Bavard's duality theorem, this means that the stable commutator length vanishes on Gamma. The result is particularly interesting for R = F[x] for a certain field F (such as the field C of complex numbers), because in this case the commutator length on Gamma is known to be unbounded. This answers a question of M. Abért and N. Monod for n at least 6.