Researcher profile

Takuya Higuchi

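The works listed for Takuya Higuchi cover voice trigger detection and keyword spotting, as well as attosecond-scale electron dynamics in graphene and other hexagonal 2D materials.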

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 19 (Unverified) · Verification L1 · Unclaimed author
5 works
0 followers
8 topics
4 close collaborators

Published work

5 published items

Preprint · 2022 · arXiv

Improving Voice Trigger Detection with Metric Learning

Voice trigger detection is an important task, which enables activating a voice assistant when a target user speaks a keyword phrase. A detector is typically trained on speech data independent of speaker information and used for the voice trigger detection task. However, such a speaker-independent voice trigger detector typically suffers from performance degradation on speech from underrepresented groups, such as accented speakers. In this work, we propose a novel voice trigger detector that can use a small number of utterances from a target speaker to improve detection accuracy. Our proposed model employs an encoder-decoder architecture. While the encoder performs speaker-independent voice trigger detection, similar to the conventional detector, the decoder predicts a personalized embedding for each utterance. A personalized voice trigger score is then obtained as a similarity score between the embeddings of enrollment utterances and a test utterance. The personalized embedding allows adapting to the target speaker's speech when computing the voice trigger score, hence improving voice trigger detection accuracy. Experimental results show that the proposed approach achieves a 38% relative reduction in the false rejection rate (FRR) compared to a baseline speaker-independent voice trigger model.
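The personalized score described above compares enrollment embeddings with a test embedding. As a minimal sketch, assuming cosine similarity against the mean of the enrollment embeddings (the abstract only says "a similarity score"), the scoring step could look like this. The function names `cosine_similarity` and `personalized_score` and the toy 3-dimensional embeddings are hypothetical stand-ins for the decoder outputs, not the authors' code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_u * norm_v)


def personalized_score(enrollment, test):
    """Score a test embedding against a speaker profile built by averaging
    the enrollment embeddings (hypothetical scoring rule)."""
    dim = len(test)
    centroid = [sum(e[i] for e in enrollment) / len(enrollment) for i in range(dim)]
    return cosine_similarity(centroid, test)


# Toy embeddings standing in for the decoder's per-utterance outputs.
enrollment = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
print(personalized_score(enrollment, [1.0, 0.05, 0.0]))  # close to 1
print(personalized_score(enrollment, [0.0, 0.0, 1.0]))   # 0.0
```

Averaging first keeps scoring to one comparison per test utterance; scoring against each enrollment utterance and averaging the similarities would be an equally plausible variant.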

Preprint · 2021 · arXiv

Dynamic curriculum learning via data parameters for noise robust keyword spotting

We propose dynamic curriculum learning via data parameters for noise robust keyword spotting. Data parameter learning has recently been introduced for image processing, where weight parameters, so-called data parameters, for target classes and instances are introduced and optimized along with model parameters. The data parameters scale logits and control importance over classes and instances during training, which enables automatic curriculum learning without additional annotations for training data. Similarly, in this paper, we propose using this curriculum learning approach for acoustic modeling, and train an acoustic model on clean and noisy utterances with the data parameters. The proposed approach automatically learns the difficulty of the classes and instances (e.g., due to a low speech-to-noise ratio (SNR)) in the gradient descent optimization and performs curriculum learning. This curriculum learning leads to an overall improvement in the accuracy of the acoustic model. We evaluate the effectiveness of the proposed approach on a keyword spotting task. Experimental results show a 7.7% relative reduction in the false reject ratio with the data parameters compared to a baseline model that is simply trained on the multi-condition dataset.
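The claim that data parameters "scale logits and control importance" can be made concrete. The sketch assumes the formulation in which a sample's logits are divided by a temperature equal to a class-level parameter plus an instance-level parameter. For softmax cross-entropy this gives dL/dσ = (z_target − E_p[z]) / σ². Gradient descent therefore lowers σ for easy samples and raises it for hard ones (for example, noisy utterances), which softens their logits and reduces their influence. The names and numbers below are illustrative, not taken from the paper.

```python
import math


def log_softmax_at(logits, index):
    """Numerically stable log-softmax of a single entry."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[index] - log_z


def data_parameter_loss(logits, target, class_sigma, instance_sigma):
    """Cross-entropy after dividing the logits by a temperature built from a
    class-level and an instance-level data parameter (assumed form)."""
    sigma = class_sigma[target] + instance_sigma
    return -log_softmax_at([z / sigma for z in logits], target)


def loss_grad_wrt_sigma(logits, target, sigma):
    """Analytic dL/dsigma = (z_target - E_p[z]) / sigma**2, where p is the
    softmax of the scaled logits z / sigma."""
    scaled = [z / sigma for z in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    expected = sum(w * z for w, z in zip(weights, logits)) / sum(weights)
    return (logits[target] - expected) / sigma ** 2


logits = [4.0, 0.0, -1.0]
class_sigma = [0.5, 0.5, 0.5]  # hypothetical per-class parameters
instance_sigma = 0.5           # hypothetical per-utterance parameter
print(loss_grad_wrt_sigma(logits, 0, 1.0) > 0)  # easy sample: sigma shrinks
print(loss_grad_wrt_sigma(logits, 2, 1.0) < 0)  # hard sample: sigma grows
```

In practice the parameters would be kept positive (for example, by learning their logarithms) and updated jointly with the model weights. This sketch only evaluates the gradient sign at a single point.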

Preprint · 2020 · arXiv

Attosecond-fast internal photoemission

The photoelectric effect has a sister process relevant in optoelectronics called internal photoemission. Here an electron is photoemitted from a metal into a semiconductor. While the photoelectric effect takes place within less than 100 attoseconds, the attosecond time scale has so far not been measured for internal photoemission. Based on the new method CHArge transfer time MEasurement via Laser pulse duration-dependent saturation fluEnce determinatiON, CHAMELEON, we show that the atomically thin semi-metal graphene coupled to bulk silicon carbide, forming a Schottky junction, allows charge transfer times as fast as (300 ± 200) attoseconds. These results are supported by a simple quantum mechanical model simulation. With the obtained cut-off bandwidth of 3.3 PHz for the charge transfer rate, this semimetal-semiconductor interface represents the first functional solid-state interface offering the speed and design space required for future light-wave signal processing.
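The two figures quoted in the abstract are mutually consistent. This check assumes the cut-off bandwidth is simply the inverse of the central charge transfer time; the abstract does not state the relation explicitly.

```latex
f_{c} \approx \frac{1}{\tau} = \frac{1}{300 \times 10^{-18}\,\mathrm{s}} \approx 3.3 \times 10^{15}\,\mathrm{Hz} = 3.3\,\mathrm{PHz}
```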

Preprint · 2020 · arXiv

Stacked 1D convolutional networks for end-to-end small footprint voice trigger detection

We propose a stacked 1D convolutional neural network (S1DCNN) for end-to-end small-footprint voice trigger detection in a streaming scenario. Voice trigger detection is an important speech application, with which users can activate their devices by simply saying a keyword or phrase. For privacy and latency reasons, a voice trigger detection system should run on an always-on processor on device. Therefore, small memory and compute costs are crucial for a voice trigger detection system. Recently, singular value decomposition filters (SVDFs) have been used for end-to-end voice trigger detection. The SVDFs approximate a fully-connected layer with a low-rank approximation, which reduces the number of model parameters. In this work, we propose S1DCNN as an alternative approach for end-to-end small-footprint voice trigger detection. An S1DCNN layer consists of a 1D convolution layer followed by a depth-wise 1D convolution layer. We show that the SVDF can be expressed as a special case of the S1DCNN layer. Experimental results show that the S1DCNN achieves a 19.0% relative false reject ratio (FRR) reduction with a similar model size and a similar time delay compared to the SVDF. By using longer time delays, the S1DCNN further improves the FRR by up to 12.2% relative.
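The abstract's claim that an SVDF is a special case of an S1DCNN layer can be checked numerically. This is a minimal NumPy sketch under two assumptions. First, a rank-1 SVDF projects each frame with a per-node feature filter, then weights a window of past projections with a per-node time filter. Second, the S1DCNN's first convolution uses a kernel size of 1. Convolutions here are non-streaming "valid" cross-correlations, and all shapes and names are hypothetical.

```python
import numpy as np


def conv1d(x, w):
    """Valid 1D convolution (cross-correlation) over time.
    x: (T, C_in) frames; w: (K, C_in, C_out) -> (T - K + 1, C_out)."""
    k = w.shape[0]
    steps = x.shape[0] - k + 1
    return np.stack([np.einsum("kc,kco->o", x[t:t + k], w) for t in range(steps)])


def depthwise_conv1d(x, w):
    """Each channel is filtered over time by its own kernel.
    x: (T, C); w: (K, C) -> (T - K + 1, C)."""
    k = w.shape[0]
    steps = x.shape[0] - k + 1
    return np.stack([(x[t:t + k] * w).sum(axis=0) for t in range(steps)])


def s1dcnn_layer(x, w_conv, w_dw):
    """A 1D convolution followed by a depth-wise 1D convolution."""
    return depthwise_conv1d(conv1d(x, w_conv), w_dw)


def svdf_layer(x, w_feature, w_time):
    """Rank-1 SVDF written out directly. Each node projects every frame with
    its feature filter, then weights K consecutive projections with its time
    filter. x: (T, F); w_feature: (F, N); w_time: (K, N)."""
    k, n_nodes = w_time.shape
    steps = x.shape[0] - k + 1
    out = np.zeros((steps, n_nodes))
    for t in range(steps):
        for n in range(n_nodes):
            for j in range(k):
                out[t, n] += w_time[j, n] * (x[t + j] @ w_feature[:, n])
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((20, 8))          # 20 frames of 8 features
w_feature = rng.standard_normal((8, 4))   # 4 nodes
w_time = rng.standard_normal((5, 4))      # 5-frame time filter per node
# With kernel size 1, the first convolution is a per-frame projection,
# so the S1DCNN layer reproduces the SVDF output exactly.
same = np.allclose(s1dcnn_layer(x, w_feature[None, :, :], w_time),
                   svdf_layer(x, w_feature, w_time))
print(same)  # True
```

Letting the first convolution's kernel span more than one frame is what makes the S1DCNN strictly more general than this SVDF configuration.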

Preprint · 2020 · arXiv

Sub-cycle temporal evolution of light-induced electron dynamics in hexagonal 2D materials

Two-dimensional materials with hexagonal symmetry, such as graphene and transition metal dichalcogenides, are unique materials to study light-field-controlled electron dynamics inside a solid. Around the K-point, the dispersion relation represents an ideal system to study intricately coupled intraband motion and interband (Landau-Zener) transitions driven by the optical field of phase-controlled few-cycle laser pulses. Based on the coupled nature of the intraband and interband processes, we have recently observed in graphene repeated coherent Landau-Zener transitions between the valence and conduction bands, separated by around half an optical period of ~1.3 fs [Higuchi et al., Nature 550, 224 (2017)]. Due to the low temporal symmetry of the applied laser pulse, a residual current density and a net electron polarization are formed. Here we show extended numerical data on the temporal evolution of the conduction band population of 2D materials with hexagonal symmetry during the light-matter interaction, yielding deep insight into attosecond-fast electron dynamics. In addition, we show that a residual ballistic current density is formed, which strongly increases when a band gap is introduced. Both the sub-cycle electron dynamics and the resulting residual current are relevant for the fundamental understanding and future applications of strongly driven electrons in two-dimensional materials, including graphene and transition metal dichalcogenide monolayers.
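The intraband half of the picture above can be sketched with a simplified model. An electron's crystal momentum follows the laser's vector potential (acceleration theorem) across a massless Dirac cone near the K-point. The interband gap it sees, 2ħv_F|k(t)|, is smallest each time the vector potential crosses zero, once every half optical period. Those gap minima are where Landau-Zener transitions are most likely in this picture.

Everything here is an illustrative assumption, not a value from the paper:

* the sign convention k(t) = k₀ + eA(t)/ħ;
* a Gaussian-envelope cosine pulse with an 800 nm carrier, a 4 fs envelope and a peak field of about 3 V/nm;
* v_F = 10⁶ m/s;
* a 0.1 nm⁻¹ momentum offset perpendicular to the field.

The model computes no transition probabilities.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J*s
E_CHARGE = 1.602176634e-19  # elementary charge, C
V_F = 1.0e6                 # approximate graphene Fermi velocity, m/s (assumed)


def vector_potential(t, a0, omega, tau, cep=0.0):
    """Hypothetical few-cycle pulse: Gaussian envelope times a cosine carrier."""
    return a0 * math.exp(-(t / tau) ** 2) * math.cos(omega * t + cep)


def gap_along_trajectory(times, k_par0, k_perp, a0, omega, tau):
    """Instantaneous interband gap 2*hbar*v_F*|k(t)| on a Dirac cone.
    The momentum along the field follows k_par(t) = k_par0 + e*A(t)/hbar
    (assumed sign convention); the perpendicular component stays fixed."""
    gaps = []
    for t in times:
        k_par = k_par0 + E_CHARGE * vector_potential(t, a0, omega, tau) / HBAR
        gaps.append(2.0 * HBAR * V_F * math.hypot(k_par, k_perp))
    return gaps


def gap_minima_times(times, gaps):
    """Times of strict local minima of the sampled gap."""
    return [times[i] for i in range(1, len(gaps) - 1)
            if gaps[i] < gaps[i - 1] and gaps[i] < gaps[i + 1]]


# Illustrative parameters (assumed, not taken from the paper).
omega = 2 * math.pi * 2.998e8 / 800e-9   # ~800 nm carrier, rad/s
tau = 4e-15                              # envelope width, s
a0 = 3e9 / omega                         # peak field ~3 V/nm
dt = 1e-17                               # 0.01 fs sampling step
times = [i * dt for i in range(-600, 601)]
gaps = gap_along_trajectory(times, 0.0, 1e8, a0, omega, tau)
minima = gap_minima_times(times, gaps)
spacings = [b - a for a, b in zip(minima, minima[1:])]
# Each spacing is close to half the optical period, pi / omega.
print([round(s * 1e15, 2) for s in spacings])
```

With these assumed parameters, the spacing between successive gap minima comes out near half an optical period, of the same order as the ~1.3 fs quoted in the abstract.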