Source author record

Masato Mimura

Masato Mimura appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math.GR, Computation and Language, eess.AS, math.CO, math.GT, math.OA, math.MG, math.FA, math.SP, Sound, Discrete Mathematics, Machine Learning, math.AT, math.KT, math.NT, math.RA

Catalog footprint

What is connected

18 works

16 topics

4 close collaborators


preprint, 2023, arXiv

Mixed commutator lengths, wreath products and general ranks

In the present paper, for a pair $(G,N)$ of a group $G$ and its normal subgroup $N$, we consider the mixed commutator length $\mathrm{cl}_{G,N}$ on the mixed commutator subgroup $[G,N]$. We focus on the setting of wreath products: $ (G,N)=(\mathbb{Z}\wr Γ, \bigoplus_Γ\mathbb{Z})$. Then we determine mixed commutator lengths in terms of the general rank in the sense of Malcev. As a byproduct, when an abelian group $Γ$ is not locally cyclic, the ordinary commutator length $\mathrm{cl}_G$ does not coincide with $\mathrm{cl}_{G,N}$ on $[G,N]$ for the above pair. On the other hand, we prove that if $Γ$ is locally cyclic, then for every pair $(G,N)$ such that $1\to N\to G\to Γ\to 1$ is exact, $\mathrm{cl}_{G}$ and $\mathrm{cl}_{G,N}$ coincide on $[G,N]$. We also study the case of permutational wreath products when the group $Γ$ belongs to a certain class related to surface groups.

preprint, 2022, arXiv

Bavard's duality theorem for mixed commutator length

Let $N$ be a normal subgroup of a group $G$. A quasimorphism $f$ on $N$ is $G$-invariant if $f(gxg^{-1}) = f(x)$ for every $g \in G$ and every $x \in N$. The goal in this paper is to establish Bavard's duality theorem of $G$-invariant quasimorphisms, which was previously proved by Kawasaki and Kimura in the case $N = [G,N]$. Our duality theorem provides a connection between $G$-invariant quasimorphisms and $(G,N)$-commutator lengths. Here for $x \in [G,N]$, the $(G,N)$-commutator length $\mathrm{cl}_{G,N}(x)$ of $x$ is the minimum number $n$ such that $x$ is a product of $n$ commutators which are written as $[g,h]$ with $g \in G$ and $h \in N$. In the proof, we give a geometric interpretation of $(G,N)$-commutator lengths. As an application of our Bavard duality, we obtain a sufficient condition on a pair $(G,N)$ under which $\mathrm{scl}_G$ and $\mathrm{scl}_{G,N}$ are bi-Lipschitzly equivalent on $[G,N]$.
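For orientation, the objects named in this abstract can be written out as follows. This is a sketch using the standard conventions for quasimorphisms and stable lengths; the defect symbol $D(f)$ and the limit formula for $\mathrm{scl}_{G,N}$ are assumed from common usage and are not stated in the abstract itself.

```latex
% A quasimorphism on N is a function f : N -> R with finite defect D(f):
\[
  D(f) \;=\; \sup_{x,y \in N} \bigl| f(xy) - f(x) - f(y) \bigr| \;<\; \infty .
\]
% G-invariance, as stated in the abstract:
\[
  f(gxg^{-1}) = f(x) \qquad \text{for all } g \in G,\ x \in N .
\]
% Mixed commutator length on [G,N], and its stabilization (assumed standard form):
\[
  \mathrm{cl}_{G,N}(x) \;=\; \min\bigl\{\, n : x = [g_1,h_1]\cdots[g_n,h_n],\ g_i \in G,\ h_i \in N \,\bigr\},
  \qquad
  \mathrm{scl}_{G,N}(x) \;=\; \lim_{n\to\infty} \frac{\mathrm{cl}_{G,N}(x^n)}{n} .
\]
```

The limit exists because $n \mapsto \mathrm{cl}_{G,N}(x^n)$ is subadditive. Taking $N = G$ recovers the ordinary commutator length $\mathrm{cl}_G$ and stable commutator length $\mathrm{scl}_G$.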

preprint, 2022, arXiv

Constellations in prime elements of number fields

Given any number field, we prove that there exist arbitrarily shaped constellations consisting of pairwise non-associate prime elements of the ring of integers. This result extends the celebrated Green-Tao theorem on arithmetic progressions of rational primes and Tao's theorem on constellations of Gaussian primes. Furthermore, we prove a constellation theorem on prime representations of binary quadratic forms with integer coefficients. More precisely, for a non-degenerate primitive binary quadratic form $F$ which is not negative definite, there exist arbitrarily shaped constellations consisting of pairs of integers $(x,y)$ for which $F(x,y)$ is a rational prime. The latter theorem is obtained by extending the framework from the ring of integers to the pair of an order and its invertible fractional ideal.

preprint, 2022, arXiv

Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM

Connectionist temporal classification (CTC)-based models are attractive in automatic speech recognition (ASR) because of their non-autoregressive nature. To take advantage of text-only data, language model (LM) integration approaches such as rescoring and shallow fusion have been widely used for CTC. However, they lose CTC's non-autoregressive nature because of the need for beam search, which slows down the inference speed. In this study, we propose an error correction method with phone-conditioned masked LM (PC-MLM). In the proposed method, less confident word tokens in a greedy decoded output from CTC are masked. PC-MLM then predicts these masked word tokens given unmasked words and phones supplementally predicted from CTC. We further extend it to Deletable PC-MLM in order to address insertion errors. Since both CTC and PC-MLM are non-autoregressive models, the method enables fast LM integration. Experimental evaluations on the Corpus of Spontaneous Japanese (CSJ) and TED-LIUM2 in a domain adaptation setting show that our proposed method outperformed rescoring and shallow fusion in terms of inference speed, and also in terms of recognition accuracy on CSJ.
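The masking step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the blank index `0`, the `[MASK]` string, the `0.9` threshold, and the choice of confidence score (the highest frame posterior within a merged run) are all assumptions made for illustration, and the phone-conditioned masked LM that would fill in the masked positions is omitted.

```python
from typing import Dict, List, Tuple

BLANK = 0          # assumed index of the CTC blank symbol
MASK = "[MASK]"    # assumed mask token string


def ctc_greedy_decode(frame_probs: List[List[float]]) -> List[Tuple[int, float]]:
    """Greedy CTC decoding: per-frame argmax, merge repeats, drop blanks.

    Returns (label, confidence) pairs. Confidence here is the highest
    frame-level probability within the merged run (an assumption).
    """
    decoded: List[Tuple[int, float]] = []
    prev = None
    for probs in frame_probs:
        label = max(range(len(probs)), key=probs.__getitem__)
        p = probs[label]
        if label == prev:
            if label != BLANK:
                # Same run of a non-blank label: keep the best confidence.
                tok, conf = decoded[-1]
                decoded[-1] = (tok, max(conf, p))
        elif label != BLANK:
            # A new non-blank run starts here.
            decoded.append((label, p))
        prev = label
    return decoded


def mask_low_confidence(decoded: List[Tuple[int, float]],
                        vocab: Dict[int, str],
                        threshold: float = 0.9) -> List[str]:
    """Replace tokens whose confidence is below the threshold with MASK."""
    return [vocab[lab] if conf >= threshold else MASK for lab, conf in decoded]


# Toy example: 5 frames over a 4-symbol vocabulary (index 0 is blank).
vocab = {1: "hello", 2: "world", 3: "word"}
frames = [
    [0.05, 0.95, 0.0, 0.0],   # "hello"
    [0.05, 0.95, 0.0, 0.0],   # repeated "hello", merged
    [0.9, 0.1, 0.0, 0.0],     # blank
    [0.1, 0.0, 0.5, 0.4],     # "world", low confidence
    [0.8, 0.1, 0.1, 0.0],     # blank
]
decoded = ctc_greedy_decode(frames)
masked = mask_low_confidence(decoded, vocab)
print(masked)
```

In this toy input the second token is decoded with probability 0.5, so it falls below the assumed threshold and is masked, while the first token is kept. A masked LM would then predict the masked position in a single non-autoregressive pass.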

preprint, 2022, arXiv

On the spectrum and linear programming bound for hypergraphs

The spectrum of a graph is closely related to many graph parameters. In particular, the spectral gap of a regular graph, which is the difference between its valency and second eigenvalue, is widely seen as an algebraic measure of connectivity and plays a key role in the theory of expander graphs. In this paper, we extend previous work done for graphs and bipartite graphs and present a linear programming method for obtaining an upper bound on the order of a regular uniform hypergraph with prescribed distinct eigenvalues. Furthermore, we obtain a general upper bound on the order of a regular uniform hypergraph whose second eigenvalue is bounded by a given value. Our results improve and extend previous work done by Feng-Li (1996) on Alon-Boppana theorems for regular hypergraphs and by Dinitz-Schapira-Shahaf (2020) on the Moore or degree-diameter problem. We also determine the largest order of an $r$-regular $u$-uniform hypergraph with second eigenvalue at most $θ$ for several parameters $(r,u,θ)$. In particular, orthogonal arrays give the structure of the largest hypergraphs with second eigenvalue at most $1$ for every sufficiently large $r$. Moreover, we show that a generalized Moore geometry has the largest spectral gap among all hypergraphs of that order and degree.

preprint, 2020, arXiv

CTC-synchronous Training for Monotonic Attention Model

Monotonic chunkwise attention (MoChA) has been studied for the online streaming automatic speech recognition (ASR) based on a sequence-to-sequence framework. In contrast to connectionist temporal classification (CTC), backward probabilities cannot be leveraged in the alignment marginalization process during training due to left-to-right dependency in the decoder. This results in the error propagation of alignments to subsequent token generation. To address this problem, we propose CTC-synchronous training (CTC-ST), in which MoChA uses CTC alignments to learn optimal monotonic alignments. Reference CTC alignments are extracted from a CTC branch sharing the same encoder with the decoder. The entire model is jointly optimized so that the expected boundaries from MoChA are synchronized with the alignments. Experimental evaluations of the TEDLIUM release-2 and Librispeech corpora show that the proposed method significantly improves recognition, especially for long utterances. We also show that CTC-ST can bring out the full potential of SpecAugment for MoChA.

preprint, 2020, arXiv

Distilling the Knowledge of BERT for Sequence-to-Sequence ASR

Attention-based sequence-to-sequence (seq2seq) models have achieved promising results in automatic speech recognition (ASR). However, as these models decode in a left-to-right way, they do not have access to context on the right. We leverage both left and right context by applying BERT as an external language model to seq2seq ASR through knowledge distillation. In our proposed method, BERT generates soft labels to guide the training of seq2seq ASR. Furthermore, we leverage context beyond the current utterance as input to BERT. Experimental evaluations show that our method significantly improves the ASR performance from the seq2seq baseline on the Corpus of Spontaneous Japanese (CSJ). Knowledge distillation from BERT outperforms that from a transformer LM that only looks at left context. We also show the effectiveness of leveraging context beyond the current utterance. Our method outperforms other LM application approaches such as n-best rescoring and shallow fusion, while it does not require extra inference cost.
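The idea of training on teacher soft labels can be sketched for a single output token as below. This is a generic knowledge-distillation loss, not the paper's exact objective: the interpolation weight `alpha`, the function names, and the use of cross-entropy against the teacher distribution are assumptions for illustration.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target: List[float], predicted: List[float]) -> float:
    """Cross-entropy H(target, predicted); terms with zero target are skipped."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0.0)


def distillation_loss(student_logits: List[float],
                      hard_label: int,
                      teacher_probs: List[float],
                      alpha: float = 0.5) -> float:
    """Interpolate the hard-label loss with the soft-label (teacher) loss.

    alpha = 0 gives ordinary cross-entropy training; alpha = 1 trains on the
    teacher's soft labels only. The weighting scheme is an assumption.
    """
    student = softmax(student_logits)
    one_hot = [1.0 if i == hard_label else 0.0 for i in range(len(student))]
    hard = cross_entropy(one_hot, student)
    soft = cross_entropy(teacher_probs, student)
    return (1.0 - alpha) * hard + alpha * soft


# Toy example: 4-word vocabulary, reference word at index 2, and a
# hypothetical teacher distribution standing in for BERT's prediction.
teacher = [0.1, 0.2, 0.3, 0.4]
loss = distillation_loss([0.0, 0.0, 0.0, 0.0], hard_label=2,
                         teacher_probs=teacher, alpha=0.3)
print(loss)
```

Because the soft labels are used only in the training loss, a student trained this way is decoded exactly as before, which is consistent with the abstract's remark that the method does not require extra inference cost.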

preprint, 2020, arXiv

End-to-end Music-mixed Speech Recognition

Automatic speech recognition (ASR) in multimedia content is one of the promising applications, but speech data in this kind of content are frequently mixed with background music, which is harmful for the performance of ASR. In this study, we propose a method for improving ASR with background music based on time-domain source separation. We utilize Conv-TasNet as a separation network, which has achieved state-of-the-art performance for multi-speaker source separation, to extract the speech signal from a speech-music mixture in the waveform domain. We also propose joint fine-tuning of a pre-trained Conv-TasNet front-end with an attention-based ASR back-end using both separation and ASR objectives. We evaluated our method through ASR experiments using speech data mixed with background music from a wide variety of Japanese animations. We show that time-domain speech-music separation drastically improves ASR performance of the back-end model trained with mixture data, and the joint optimization yielded a further significant WER reduction. The time-domain separation method outperformed a frequency-domain separation method, which reuses the phase information of the input mixture signal, both in simple cascading and joint training settings. We also demonstrate that our method works robustly for music interference from classical, jazz and popular genres.

preprint, 2020, arXiv

Generative Adversarial Training Data Adaptation for Very Low-resource Automatic Speech Recognition

It is important to transcribe and archive speech data of endangered languages for preserving heritages of verbal culture, and automatic speech recognition (ASR) is a powerful tool to facilitate this process. However, since endangered languages do not generally have large corpora with many speakers, the performance of ASR models trained on them is generally poor. Nevertheless, we are often left with a lot of recordings of spontaneous speech data that have to be transcribed. In this work, for mitigating this speaker sparsity problem, we propose to convert the whole training speech data and make it sound like the test speaker in order to develop a highly accurate ASR system for this speaker. For this purpose, we utilize a CycleGAN-based non-parallel voice conversion technology to forge labeled training data that is close to the test speaker's speech. We evaluated this speaker adaptation approach on two low-resource corpora, namely, Ainu and Mboshi. We obtained 35-60% relative improvement in phone error rate on the Ainu corpus, and 40% relative improvement was attained on the Mboshi corpus. This approach outperformed two conventional methods, namely unsupervised adaptation and multilingual training, with these two corpora.

preprint, 2020, arXiv

Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language

Ainu is an unwritten language that has been spoken by the Ainu people, who are one of the ethnic groups in Japan. It is recognized as critically endangered by UNESCO, and archiving and documentation of its language heritage is of paramount importance. Although a considerable amount of voice recordings of Ainu folklore has been produced and accumulated to save their culture, only quite limited parts of them have been transcribed so far. Thus, we started a project of automatic speech recognition (ASR) for the Ainu language in order to contribute to the development of annotated language archives. In this paper, we report speech corpus development and the structure and performance of end-to-end ASR for Ainu. We investigated four modeling units (phone, syllable, word piece, and word) and found that the syllable-based model performed best in terms of both word and phone recognition accuracy, which were about 60% and over 85%, respectively, in the speaker-open condition. Furthermore, word and phone accuracies of 80% and 90% have been achieved in a speaker-closed setting. We also found that multilingual ASR training with additional speech corpora of English and Japanese further improves the speaker-open test accuracy.

preprint, 2016, arXiv

Strong algebraization of fixed point properties

The following natural question arises from Shalom's innovational work (1999, Publ. IHES): "Can we establish an intrinsic criterion to synthesize relative fixed point properties into the whole fixed point property without assuming Bounded Generation?" This paper resolves this question in the affirmative. Our criterion works for ones with respect to certain classes of Busemann NPC spaces. It, moreover, suggests a further step toward constructing super-expanders from finite simple groups of Lie type.

preprint, 2015, arXiv

Multi-way expanders and imprimitive group actions on graphs

For n at least 2, the concept of n-way expanders was defined by various researchers. Bigger n gives a weaker notion in general, and 2-way expanders coincide with expanders in the usual sense. Koji Fujiwara asked whether these concepts are equivalent to that of ordinary expanders for all n for a sequence of Cayley graphs. In this paper, we answer his question in the affirmative. Furthermore, we obtain universal inequalities on multi-way isoperimetric constants on any finite connected vertex-transitive graph, and show that gaps between these constants imply the imprimitivity of the group action on the graph.

preprint, 2014, arXiv

Group approximation in Cayley topology and coarse geometry, Part III: Geometric property (T)

In this series of papers, we study correspondence between the following: (1) large scale structure of the metric space $\bigsqcup_m \mathrm{Cay}(G(m))$ consisting of Cayley graphs of finite groups with $k$ generators; (2) structure of groups which appear in the boundary of the set $\{G(m)\}_m$ in the space of $k$-marked groups. In this third part of the series, we show the correspondence among the metric properties "geometric property (T)" and "cohomological property (T)," and the group property "Kazhdan's property (T)." Geometric property (T) of Willett--Yu is stronger than being expander graphs. Cohomological property (T) is stronger than geometric property (T) for general coarse spaces.

preprint, 2014, arXiv

Sphere equivalence, Banach expanders, and extrapolation

We study the Banach spectral gap $\lambda_1(G;X,p)$ of finite graphs $G$ for pairs $(X,p)$ of Banach spaces and exponents. We define the notion of sphere equivalence between Banach spaces and show a generalization of Matousek's extrapolation for Banach spaces sphere equivalent to uniformly convex ones. As a byproduct, we prove that expanders are automatically expanders with respect to $(X,p)$ for any $X$ sphere equivalent to a uniformly curved Banach space and for any $p$ strictly bigger than 1.

preprint, 2011, arXiv

Fixed point property for universal lattice on Schatten classes

The special linear group G=SL_n(Z[x1,...,xk]) (n at least 3 and k finite) is called the universal lattice. Let n be at least 4, p be any real number in (1,\infty). The main result is the following: any finite index subgroup of G has the fixed point property with respect to every affine isometric action on the space of p-Schatten class operators. It is in addition shown that higher rank lattices have the same property. These results generalize previous theorems of the author and of Bader--Furman--Gelander--Monod, respectively, which treated the commutative Lp-setting.

preprint, 2011, arXiv

Property $(TT)$ modulo $T$ and homomorphism superrigidity into mapping class groups

Every homomorphism from finite index subgroups of universal lattices to mapping class groups of orientable surfaces (possibly with punctures), or to outer automorphism groups of finitely generated nonabelian free groups, must have finite image. Here the universal lattice denotes the special linear group G=SL_m(Z[x1,...,xk]) with m at least 3 and k finite. Moreover, the same results hold true if universal lattices are replaced with symplectic universal lattices Sp_{2m}(Z[x1,...,xk]) with m at least 2. These results can be regarded as a non-arithmetization of the theorems of Farb--Kaimanovich--Masur and Bridson--Wade. A certain measure equivalence analogue is also established. To show the statements above, we introduce a notion of property (TT)/T ("/T" stands for "modulo trivial part"), which is a weakening of property (TT) of N. Monod. Furthermore, symplectic universal lattices Sp_{2m}(Z[x1,...,xk]) with m at least 3 have the fixed point property for L^p-spaces for any p in (1,\infty).

preprint, 2010, arXiv

Fixed point properties and second bounded cohomology of universal lattices on Banach space

Let B be any Lp space for p in (1,\infty) or any Banach space isomorphic to a Hilbert space, and k be a nonnegative integer. We show that if n is at least 4, then the universal lattice Gamma = SL_n(Z[x1,...,xk]) has property (F_B) in the sense of Bader--Furman--Gelander--Monod. Namely, any affine isometric action of Gamma on B has a global fixed point. The property of having (F_B) for all B above is known to be strictly stronger than Kazhdan's property (T). We also define the following generalization of property (F_B) for a group: the boundedness property of all affine quasi-actions on B. We name it property (FF_B) and prove that the group Gamma above also has this property modulo trivial part. The conclusion above in particular implies that the comparison map in degree two H^2_b(Gamma; B) \to H^2(Gamma; B) from bounded to ordinary cohomology is injective, provided that the associated linear representation does not contain the trivial representation.

preprint, 2010, arXiv

On Quasi-homomorphisms and Commutators in the Special Linear Group over a Euclidean Ring

We prove that for any Euclidean ring R and n at least 6, Gamma=SL_n(R) has no unbounded quasi-homomorphisms. From Bavard's duality theorem, this means that the stable commutator length vanishes on Gamma. The result is particularly interesting for R = F[x] for a certain field F (such as the field C of complex numbers), because in this case the commutator length on Gamma is known to be unbounded. This answers a question of M. Abért and N. Monod for n at least 6.

