Researcher profile

Kohei Saijo

Kohei Saijo contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 15 - Unverified · Verification L1 · Unclaimed author
3 works
0 followers
2 topics
2 close collaborators

Published work

3 published items

Preprint · 2022 · arXiv

Independence-based Joint Dereverberation and Separation with Neural Source Model

We propose an independence-based joint dereverberation and separation method with a neural source model. We introduce a neural network in the framework of time-decorrelation iterative source steering, which is an extension of independent vector analysis to joint dereverberation and separation. The network is trained in an end-to-end manner with a permutation invariant loss on the time-domain separation output signals. Our proposed method can be applied in any situation with at least as many microphones as sources, regardless of their number. In experiments, we demonstrate that our method results in high performance in terms of both speech quality metrics and word error rate (WER), even for mixtures with a different number of speakers than training. Furthermore, the model, trained on synthetic mixtures, without any modifications, greatly reduces the WER on the recorded dataset LibriCSS.
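The permutation invariant loss mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain mean-squared-error criterion on time-domain signals and a brute-force search over source orderings; the function name `pit_mse` is hypothetical, and the loss actually used in the paper may differ.

```python
from itertools import permutations

import numpy as np


def pit_mse(estimates, references):
    """Return (lowest MSE, best ordering) over all source permutations.

    estimates, references: arrays of shape (n_sources, n_samples).
    MSE is an assumption made for this sketch, not the paper's criterion.
    """
    n_src = estimates.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in permutations(range(n_src)):
        # Reorder the estimates and compare them with the fixed references.
        loss = np.mean((estimates[list(perm)] - references) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


rng = np.random.default_rng(0)
refs = rng.standard_normal((3, 16000))
ests = refs[[2, 0, 1]]  # the same signals in a shuffled order
loss, perm = pit_mse(ests, refs)
```

Here the shuffled copy is matched back to the references with zero loss, which is the property that lets a separator be trained without fixing the output order in advance.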

Preprint · 2022 · arXiv

Remix-cycle-consistent Learning on Adversarially Learned Separator for Accurate and Stable Unsupervised Speech Separation

A new learning algorithm for speech separation networks is designed to explicitly reduce residual noise and artifacts in the separated signal in an unsupervised manner. Generative adversarial networks are known to be effective in constructing separation networks when the ground truth for the observed signal is inaccessible. Still, weak objectives aimed at distribution-to-distribution mapping make the learning unstable and limit their performance. This study introduces the remix-cycle-consistency loss as a more appropriate objective function and uses it to fine-tune adversarially learned source separation models. The remix-cycle-consistency loss is defined as the difference between the mixed speech observed at microphones and the pseudo-mixed speech obtained by alternating the process of separating the mixed sound and remixing its outputs with another combination. The minimization of this loss leads to an explicit reduction in the distortions in the output of the separation network. Experimental comparisons with multichannel speech separation demonstrated that the proposed method achieved high separation accuracy and learning stability comparable to supervised learning.
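One plausible reading of the remix-cycle-consistency loss described in this abstract is sketched below. It assumes two single-channel mixtures of two sources each, a squared-error distance, and no permutation ambiguity between the two separation passes; `toy_separator` is a hypothetical stand-in for a neural separator, and the exact formulation in the paper may differ.

```python
import numpy as np


def toy_separator(mixture):
    """Hypothetical separator: splits a signal into its even-indexed and
    odd-indexed samples (zeros elsewhere). Stands in for a neural network."""
    even = np.zeros_like(mixture)
    odd = np.zeros_like(mixture)
    even[0::2] = mixture[0::2]
    odd[1::2] = mixture[1::2]
    return even, odd


def remix_cycle_loss(x1, x2, separate):
    """Separate, remix across mixtures, separate again, remix back,
    then compare the reconstructions with the observed mixtures."""
    a1, b1 = separate(x1)
    a2, b2 = separate(x2)
    # Pseudo-mixtures: swap one separated source between the two mixtures.
    m1 = a1 + b2
    m2 = a2 + b1
    # Second separation pass on the pseudo-mixtures.
    a1_hat, b2_hat = separate(m1)
    a2_hat, b1_hat = separate(m2)
    # Undo the swap to reconstruct the original mixtures.
    x1_hat = a1_hat + b1_hat
    x2_hat = a2_hat + b2_hat
    return np.mean((x1 - x1_hat) ** 2) + np.mean((x2 - x2_hat) ** 2)


rng = np.random.default_rng(0)
x1 = rng.standard_normal(1000)
x2 = rng.standard_normal(1000)
clean_loss = remix_cycle_loss(x1, x2, toy_separator)


def lossy_separator(mixture):
    # A separator that attenuates its outputs, i.e. introduces distortion.
    return tuple(0.9 * s for s in toy_separator(mixture))


lossy_loss = remix_cycle_loss(x1, x2, lossy_separator)
```

In this toy setting a distortion-free separator closes the cycle exactly, while the attenuating one leaves a residual, which mirrors the abstract's claim that minimizing the loss reduces distortion in the separator output.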

Preprint · 2022 · arXiv

Spatial Loss for Unsupervised Multi-channel Source Separation

We propose a spatial loss for unsupervised multi-channel source separation. The proposed loss exploits the duality of direction of arrival (DOA) and beamforming: the steering and beamforming vectors should be aligned for the target source, but orthogonal for interfering ones. The spatial loss encourages consistency between the mixing and demixing systems from a classic DOA estimator and a neural separator, respectively. With the proposed loss, we train the neural separators based on minimum variance distortionless response (MVDR) beamforming and independent vector analysis (IVA). We also investigate the effectiveness of combining our spatial loss and a signal loss, which uses the outputs of blind source separation as the reference. We evaluate our proposed method on synthetic and recorded (LibriCSS) mixtures. We find that the spatial loss is most effective to train IVA-based separators. For the neural MVDR beamformer, it performs best when combined with a signal loss. On synthetic mixtures, the proposed unsupervised loss leads to the same performance as a supervised loss in terms of word error rate. On LibriCSS, we obtain close to state-of-the-art performance without any labeled training data.
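The duality stated in this abstract, where each beamforming vector is aligned with its own steering vector and orthogonal to the others, can be sketched for a single frequency bin. The sketch assumes a far-field uniform linear array and a simplified penalty on the deviation of W^H A from the identity matrix; the function names, the array geometry, and this particular penalty are illustrative assumptions, not the loss defined in the paper.

```python
import numpy as np


def steering_vectors(doas_rad, n_mics, spacing_m, freq_hz, c=343.0):
    """Far-field steering vectors of a uniform linear array.

    Returns an array of shape (n_mics, n_sources) for one frequency.
    """
    mic_idx = np.arange(n_mics)[:, None]
    delays = mic_idx * spacing_m * np.sin(np.asarray(doas_rad))[None, :] / c
    return np.exp(-2j * np.pi * freq_hz * delays)


def spatial_loss(demix, steering):
    """Mean squared deviation of W^H A from the identity.

    demix: (n_mics, n_sources), beamforming vectors w_k as columns.
    steering: (n_mics, n_sources), steering vectors a_k as columns.
    """
    gain = demix.conj().T @ steering  # gain[k, j] = w_k^H a_j
    target = np.eye(gain.shape[0])
    return np.mean(np.abs(gain - target) ** 2)


A = steering_vectors([-0.5, 0.6], n_mics=4, spacing_m=0.05, freq_hz=1000.0)
W_ideal = np.linalg.pinv(A).conj().T  # satisfies W^H A = I
W_das = A / A.shape[0]                # delay-and-sum: aligned, not orthogonal

ideal_loss = spatial_loss(W_ideal, A)
das_loss = spatial_loss(W_das, A)
```

The pseudo-inverse demixing matrix drives the penalty to numerical zero, whereas delay-and-sum weights pass each target with unit gain but leak the interferer, so their loss stays positive.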