Researcher profile

Bowon Lee

Bowon Lee contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
5 topics
4 close collaborators

Published work

2 published items

Preprint, 2022, arXiv

Integration of Pre-trained Networks with Continuous Token Interface for End-to-End Spoken Language Understanding

Most End-to-End (E2E) SLU networks leverage the pre-trained ASR networks but still lack the capability to understand the semantics of utterances, crucial for the SLU task. To solve this, recently proposed studies use pre-trained NLU networks. However, it is not trivial to fully utilize both pre-trained networks; many solutions were proposed, such as Knowledge Distillation, cross-modal shared embedding, and network integration with Interface. We propose a simple and robust integration method for the E2E SLU network with novel Interface, Continuous Token Interface (CTI), the junctional representation of the ASR and NLU networks when both networks are pre-trained with the same vocabulary. Because the only difference is the noise level, we directly feed the ASR network's output to the NLU network. Thus, we can train our SLU network in an E2E manner without additional modules, such as Gumbel-Softmax. We evaluate our model using SLURP, a challenging SLU dataset and achieve state-of-the-art scores on both intent classification and slot filling tasks. We also verify the NLU network, pre-trained with Masked Language Model, can utilize a noisy textual representation of CTI. Moreover, we show our model can be trained with multi-task learning from heterogeneous data even after integration with CTI.
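The abstract says the ASR output is fed directly to the NLU network when both share a vocabulary, without a sampling module such as Gumbel-Softmax. One common way to pass a continuous token representation between two such networks is to weight the NLU embedding table by the ASR token probabilities. The sketch below illustrates that general idea only. Whether CTI is computed exactly this way is not stated in the abstract, and the function names, shapes, and toy values are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one frame of vocabulary logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_token_embeddings(asr_logits, nlu_embedding):
    """Map ASR logits (frames x vocab) to NLU inputs (frames x dim).

    Assumption: the interface is a probability-weighted average of the
    NLU embedding rows. This is an illustration, not the paper's code.
    """
    dim = len(nlu_embedding[0])
    outputs = []
    for frame in asr_logits:
        probs = softmax(frame)
        outputs.append([
            sum(p * nlu_embedding[v][d] for v, p in enumerate(probs))
            for d in range(dim)
        ])
    return outputs

# Toy shared vocabulary of 3 tokens with 2-dimensional embeddings.
nlu_embedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# A confident ASR frame approaches the ordinary embedding lookup of token 0.
confident = soft_token_embeddings([[50.0, 0.0, 0.0]], nlu_embedding)

# A maximally uncertain frame yields the mean of all embedding rows.
uncertain = soft_token_embeddings([[0.0, 0.0, 0.0]], nlu_embedding)
```

In this toy setup a confident frame reduces to a plain lookup and an uncertain frame gives a blend of embeddings, which matches the abstract's framing that the ASR output differs from clean text only in its noise level.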

Preprint, 2011, arXiv

Open-loop multi-channel inversion of room impulse response

This paper considers methods for audio display in a CAVE-type virtual reality theater, a 3 m cube with displays covering all six rigid faces. Headphones are possible since the user's headgear continuously measures ear positions, but loudspeakers are preferable since they enhance the sense of total immersion. The proposed solution consists of open-loop acoustic point control. The transfer function, a matrix of room frequency responses from the loudspeakers to the ears of the user, is inverted using multi-channel inversion methods, to create exactly the desired sound field at the user's ears. The inverse transfer function is constructed from impulse responses simulated by the image source method. This technique is validated by measuring a 2x2 matrix transfer function, simulating a transfer function with the same geometry, and filtering the measured transfer function through the inverse of the simulation. Since accuracy of the image source method decreases with time, inversion performance is improved by windowing the simulated response prior to inversion. Parameters of the simulation and inversion are adjusted to minimize residual reverberant energy; the best-case dereverberation ratio is 10 dB.
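The inversion step described above can be illustrated at a single frequency bin, where the 2x2 transfer function from two loudspeakers to two ears is a complex matrix. The sketch below uses a Tikhonov-regularized inverse, which is a standard choice for multi-channel inversion but is an assumption here. The abstract does not say which inversion method or parameters the authors used, and the matrix values and the `beta` parameter are hypothetical. The image source simulation and the windowing step are not reproduced.

```python
def conj_transpose(m):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(a, b):
    """Product of two 2x2 complex matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse_2x2(m):
    """Closed-form inverse of a non-singular 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def regularized_inverse(h, beta=0.0):
    """Return (H^H H + beta I)^-1 H^H for one frequency bin.

    beta = 0 gives the exact inverse; beta > 0 limits filter gain
    where H is poorly conditioned (hypothetical tuning parameter).
    """
    h_herm = conj_transpose(h)
    gram = matmul(h_herm, h)
    gram[0][0] += beta
    gram[1][1] += beta
    return matmul(inverse_2x2(gram), h_herm)

def frobenius_norm(m):
    """Overall gain measure for a 2x2 filter matrix."""
    return sum(abs(m[i][j]) ** 2 for i in range(2) for j in range(2)) ** 0.5

# Hypothetical loudspeaker-to-ear responses at one frequency.
h = [[1.0 + 0.5j, 0.3 - 0.2j],
     [0.25 + 0.1j, 0.9 - 0.4j]]

g_exact = regularized_inverse(h, beta=0.0)
g_reg = regularized_inverse(h, beta=0.01)

# With beta = 0, passing the inverse filter through the room gives identity.
product = matmul(h, g_exact)
```

A full system would repeat this for every frequency bin of the transfer function and convert the result back to time-domain filters. Regularization trades exact reproduction at the ears for lower filter gain, which is one way to tolerate mismatch between a simulated and a measured response.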