Source author record

Shoichiro Saito

Shoichiro Saito appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


eess.AS, Sound, cond-mat.mtrl-sci, cond-mat.other, Machine Learning

Catalog footprint

9 works, 5 topics, 4 close collaborators


Preprint, 2022, arXiv

Echo-aware Adaptation of Sound Event Localization and Detection in Unknown Environments

Our goal is to develop a sound event localization and detection (SELD) system that works robustly in unknown environments. A SELD system trained on data from known environments degrades in an unknown environment because of environmental effects, such as reverberation and noise, that are not contained in the training data. Previous studies on related tasks have shown that domain adaptation methods are effective when data on the environment in which the system will be used is available, even without labels. However, adaptation to unknown environments remains a difficult task. In this study, we propose echo-aware feature refinement (EAR) for SELD, which suppresses environmental effects at the feature level by using additional spatial cues of the unknown environment obtained through measuring acoustic echoes. FOA-MEIR, an impulse response dataset containing over 100 environments, was recorded to validate the proposed method. Experiments on FOA-MEIR show that EAR effectively improves SELD performance in unknown environments.

Preprint, 2022, arXiv

Multi-view and Multi-modal Event Detection Utilizing Transformer-based Multi-sensor fusion

We tackle a challenging task: multi-view and multi-modal event detection, which detects events in a wide-range real environment by utilizing data from distributed cameras and microphones and their weak labels. In this task, distributed sensors are utilized complementarily to capture events that are difficult to capture with a single sensor, such as a series of actions of people moving in an intricate room, or communication between people located far apart in a room. For sensors to cooperate effectively in such a situation, the system should be able to exchange information among sensors and combine information that is useful for identifying events in a complementary manner. For such a mechanism, we propose Transformer-based multi-sensor fusion (MultiTrans), which combines multi-sensor data on the basis of the relationships between features of different viewpoints and modalities. In experiments using a dataset newly collected for this task, our proposed method using MultiTrans improved the event detection performance and outperformed the comparative methods.
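The idea of combining sensor data according to the relationships between their features can be illustrated with plain scaled dot-product self-attention. The sketch below is not the MultiTrans architecture from the abstract: it assumes one feature vector per sensor, uses identity query/key/value projections where a real Transformer would learn them, and averages the attended vectors at the end. The function names `softmax` and `fuse_sensors` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_sensors(features):
    """Fuse per-sensor feature vectors with self-attention.

    features: list of equal-length lists of floats, one per sensor.
    Assumption: identity projections stand in for learned Q/K/V matrices.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    attended = []
    for query in features:
        # Similarity of this sensor's features to every sensor's features.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        weights = softmax(scores)
        # Weighted sum of all sensors' features for this query sensor.
        attended.append([sum(w * value[i] for w, value in zip(weights, features))
                         for i in range(dim)])
    # Mean-pool the attended vectors into one fused representation.
    count = len(attended)
    return [sum(vec[i] for vec in attended) / count for i in range(dim)]
```

Each sensor attends to all sensors, so a sensor with a weak view of an event can borrow information from sensors whose features are similar to its own.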

Preprint, 2022, arXiv

Wearable SELD dataset: Dataset for sound event localization and detection using wearable devices around head

Sound event localization and detection (SELD) is a combined task of identifying a sound event and its direction. Deep neural networks (DNNs) are utilized to associate them with the sound signals observed by a microphone array. Although ambisonic microphones are popular in the SELD literature, they might limit the range of applications due to their predetermined geometry. Some applications (including those for pedestrians that perform SELD while walking) require a wearable microphone array whose geometry can be designed to suit the task. In this paper, for the development of such wearable SELD, we propose a dataset named the Wearable SELD dataset. It consists of data recorded by 24 microphones placed on a head and torso simulator (HATS) with some accessories mimicking wearable devices (glasses, earphones, and headphones). We also provide experimental results of SELD using the proposed dataset and SELDNet to investigate the effect of microphone configuration.

Preprint, 2020, arXiv

A Transformer-based Audio Captioning Model with Keyword Estimation

One of the problems with automated audio captioning (AAC) is the indeterminacy in word selection corresponding to the audio event/scene. Since one acoustic event/scene can be described with several words, it results in a combinatorial explosion of possible captions and difficulty in training. To solve this problem, we propose a Transformer-based audio-captioning model with keyword estimation called TRACKE. It simultaneously solves the word-selection indeterminacy problem with the main task of AAC while executing the sub-task of acoustic event detection/acoustic scene classification (i.e., keyword estimation). TRACKE estimates keywords, which comprise a word set corresponding to audio events/scenes in the input audio, and generates the caption while referring to the estimated keywords to reduce word-selection indeterminacy. Experimental results on a public AAC dataset indicate that TRACKE achieved state-of-the-art performance and successfully estimated both the caption and its keywords.

Preprint, 2020, arXiv

Sound Event Localization based on Sound Intensity Vector Refined By DNN-Based Denoising and Source Separation

We propose a direction-of-arrival (DOA) estimation method for sound event localization and detection (SELD). Direct estimation of DOA using a deep neural network (DNN), i.e., a completely data-driven approach, achieves high accuracy. However, there is a gap in accuracy between DOA estimation for single and overlapping sources because such approaches cannot incorporate physical knowledge. Meanwhile, although the accuracy of physics-based approaches is inferior to that of DNN-based approaches, they are robust to overlapping sources. In this study, we consider a combination of physics-based and DNN-based approaches: the sound intensity vectors (IVs) for physics-based DOA estimation are refined by DNN-based denoising and source separation. This method enables accurate DOA estimation for both single and overlapping sources using a spherical microphone array. Experimental results show that the proposed method achieves state-of-the-art DOA estimation accuracy on an open SELD dataset.
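The physics-based step mentioned in this abstract, DOA estimation from sound intensity vectors, can be sketched for first-order ambisonics (FOA) signals. This is only the generic intensity-vector estimate, not the paper's DNN-based refinement. It assumes an encoding in which a plane wave with spectrum s arriving from azimuth az and elevation el gives W = s, X = s cos(az) cos(el), Y = s sin(az) cos(el), Z = s sin(el), with channel normalization factors ignored. The function name `intensity_vector_doa` is hypothetical.

```python
import math


def intensity_vector_doa(w, x, y, z):
    """Estimate (azimuth, elevation) in radians from FOA time-frequency bins.

    w, x, y, z: equal-length sequences of complex STFT coefficients.
    Assumption: the encoding convention stated in the lead-in, under which
    Re(conj(W) * [X, Y, Z]) points toward the source.
    """
    ix = iy = iz = 0.0
    for wk, xk, yk, zk in zip(w, x, y, z):
        conj_w = wk.conjugate()
        # Accumulate the intensity vector over all bins.
        ix += (conj_w * xk).real
        iy += (conj_w * yk).real
        iz += (conj_w * zk).real
    azimuth = math.atan2(iy, ix)
    elevation = math.atan2(iz, math.hypot(ix, iy))
    return azimuth, elevation
```

With noise or overlapping sources the summed vector is biased, which is the weakness that denoising and source separation are meant to address before the angle is computed.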

Preprint, 2011, arXiv

First-Principles Study on Structural Properties of GeO$_2$ and SiO$_2$ under Compression and Expansion Pressure

A detailed analysis of the structural variations of three GeO$_2$ and SiO$_2$ polymorphs ($α$-quartz, $α$-cristobalite, and rutile) under compression and expansion pressure is reported. First-principles total-energy calculations reveal that the rutile structure is the most stable phase among the phases of GeO$_2$, while SiO$_2$ preferentially forms quartz. The GeO$_4$ tetrahedra of the quartz and cristobalite GeO$_2$ phases at the equilibrium volume are more significantly distorted than those of SiO$_2$. Moreover, in the case of quartz GeO$_2$ and cristobalite GeO$_2$, all O-Ge-O bond angles vary when the volume of the GeO$_2$ bulk changes from the equilibrium point, which causes further deformation of the tetrahedra. In contrast, the tilt angle formed by Si-O-Si in SiO$_2$ markedly changes. This flexibility of the O-Ge-O bonds reduces the stress at the Ge/GeO$_2$ interface due to the lattice-constant mismatch and results in the low-defect interface observed in the experiments [Matsubara et al., Appl. Phys. Lett. 93 (2008) 032104; Hosoi et al., Appl. Phys. Lett. 94 (2009) 202112].

Preprint, 2011, arXiv

New structural model for GeO2/Ge interface: A first-principles study

First-principles modeling of a GeO2/Ge(001) interface reveals that sixfold GeO2, which is derived from cristobalite and is different from rutile, dramatically reduces the lattice mismatch at the interface and is much more stable than the conventional fourfold interface. Since the grain boundary between fourfold and sixfold GeO2 is unstable, the sixfold GeO2 forms a large grain at the interface. On the contrary, a comparative study with SiO2 demonstrates that SiO2 maintains a fourfold structure. The sixfold GeO2/Ge interface is shown to be a consequence of the ground-state phase of GeO2. In addition, the electronic structure calculation reveals that sixfold GeO2 at the interface shifts the valence band maximum far from the interface toward the conduction band.
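The lattice mismatch discussed in this abstract is commonly quantified as the relative difference between the in-plane lattice constants of the overlayer and the substrate. The sketch below uses that standard definition; it is not taken from the paper, the function name `lattice_mismatch` is hypothetical, and no lattice constants for GeO2 or Ge are assumed.

```python
def lattice_mismatch(a_overlayer, a_substrate):
    """Relative lattice mismatch f = (a_overlayer - a_substrate) / a_substrate.

    Both arguments are in-plane lattice constants in the same unit.
    A positive value means the overlayer is compressed when it is forced
    to match the substrate; a negative value means it is stretched.
    """
    if a_substrate <= 0:
        raise ValueError("substrate lattice constant must be positive")
    return (a_overlayer - a_substrate) / a_substrate
```

A smaller magnitude of f means less strain energy has to be accommodated at the interface, for example by bond-angle changes or by defects.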

Preprint, 2010, arXiv

First-principles electronic-structure calculation of dangling bonds at Si/SiO$_2$ and Ge/GeO$_2$ interfaces

The absence of a clear electron spin-resonance signal from Ge dangling bonds (DBs) at Ge/GeO$_2$ interfaces is explored by means of first-principles electronic-structure calculations. Comparing the electronic structures of the DBs at Si/SiO$_2$ and Ge/GeO$_2$ interfaces, we found that the electronic structure of the Ge-DB is markedly different from that of the Si-DB; the Ge-DB states do not lie in the energy band gap of the Ge/GeO$_2$ interface, while the Si-DB states clearly appear in the gap of the Si/SiO$_2$ interface. In addition, the charge density distribution of the Ge-DB state spreads more widely than that of the Si-DB state. These features are explained by considering the metallic properties of the bonding network of the Ge/GeO$_2$ interface and the structural deformation of the Ge bulk at the Ge/GeO$_2$ interface due to the lattice-constant mismatch.

Preprint, 2009, arXiv

First-Principles Study for Evidence of Low Interface Defect Density at Ge/GeO$_2$ Interfaces

We present evidence for the low defect density at Ge/GeO$_2$ interfaces in terms of first-principles total-energy calculations. The energy advantages of atom emission from the Ge/GeO$_2$ interface to release the stress due to the lattice mismatch are compared with those from the Si/SiO$_2$ interface. The energy advantages for the Ge/GeO$_2$ interface are found to be smaller than those for the Si/SiO$_2$ interface because of the high flexibility of the bonding networks in GeO$_2$. Thus, the suppression of Ge-atom emission during the oxidation process leads to the improved electrical properties of the Ge/GeO$_2$ interfaces.
