Researcher profile

Shaojun Wang

Shaojun Wang contributes to research discovery and scholarly infrastructure.

Researcher; affiliation not imported; open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging); Verification L1; Unclaimed author
6 works
0 followers
8 topics
4 close collaborators

Published work

6 published items

preprint, 2022, arXiv

A Study of Different Ways to Use The Conformer Model For Spoken Language Understanding

SLU combines ASR and NLU capabilities to accomplish speech-to-intent understanding. In this paper, we compare different ways to combine ASR and NLU, in particular using a single Conformer model with different ways to use its components, to better understand the strengths and weaknesses of each approach. We find that it is not necessarily a choice between two-stage decoding and end-to-end systems which determines the best system for research or application. System optimization still entails carefully improving the performance of each component. It is difficult to prove that one direction is conclusively better than the other. In this paper, we also propose a novel connectionist temporal summarization (CTS) method to reduce the length of acoustic encoding sequences while improving the accuracy and processing speed of end-to-end models. This method achieves the same intent accuracy as the best two-stage SLU recognition with complicated and time-consuming decoding but does so at lower computational cost. This stacked end-to-end SLU system yields an intent accuracy of 93.97% for the SmartLights far-field set, 95.18% for the close-field set, and 99.71% for FluentSpeech.

preprint, 2022, arXiv

Adding Connectionist Temporal Summarization into Conformer to Improve Its Decoder Efficiency For Speech Recognition

The Conformer model is an excellent architecture for speech recognition modeling that effectively utilizes the hybrid losses of connectionist temporal classification (CTC) and attention to train model parameters. To improve the decoding efficiency of Conformer, we propose a novel connectionist temporal summarization (CTS) method that reduces the number of frames required for the attention decoder fed from the acoustic sequences generated by the encoder, thus reducing operations. However, to achieve such decoding improvements, we must fine-tune model parameters, as cross-attention observations are changed and thus require corresponding refinements. Our final experiments show that, with a beam width of 4, the decoding budget can be reduced by up to 20% for LibriSpeech and by 11% for FluentSpeech data, without losing ASR accuracy. An improvement in accuracy is even found for the LibriSpeech "test-other" set. The word error rate (WER) is reduced by 6% relative at a beam width of 1 and by 3% relative at a beam width of 4.

preprint, 2022, arXiv

Enhancing Dual-Encoders with Question and Answer Cross-Embeddings for Answer Retrieval

Dual-Encoders are a promising mechanism for answer retrieval in question answering (QA) systems. Currently, most conventional Dual-Encoders learn the semantic representations of questions and answers merely through a matching score. Researchers have proposed introducing QA interaction features into the scoring function, but at the cost of low efficiency in the inference stage. To keep the encoding of questions and answers independent during the inference stage, a variational auto-encoder has further been introduced to reconstruct answers (questions) from question (answer) embeddings as an auxiliary task that enhances QA interaction in representation learning during the training stage. However, the needs of text generation and answer retrieval are different, which makes training difficult. In this work, we propose a framework to enhance the Dual-Encoders model with question-answer cross-embeddings and a novel Geometry Alignment Mechanism (GAM) to align the geometry of embeddings from Dual-Encoders with that from Cross-Encoders. Extensive experimental results show that our framework significantly improves the Dual-Encoders model and outperforms the state-of-the-art method on multiple answer retrieval datasets.
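To illustrate the independent-encoding property the abstract refers to, here is a minimal sketch of dual-encoder retrieval. It is not the paper's model: the bag-of-words `encode` function is a toy stand-in for a learned encoder, the names `build_vocab`, `encode`, `score`, and `retrieve` and the sample answers are hypothetical, and the Geometry Alignment Mechanism is not implemented. It only shows that answers can be embedded once ahead of time and that a question is scored against them with a plain dot product.

```python
import math


def build_vocab(texts):
    """Assign one vector index to every distinct token (toy vocabulary)."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab):
    """Toy stand-in for a learned encoder: L2-normalized bag of words.

    Tokens outside the vocabulary are ignored.
    """
    vec = [0.0] * len(vocab)
    for token in text.lower().split():
        idx = vocab.get(token)
        if idx is not None:
            vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def score(q_vec, a_vec):
    """Matching score: dot product of two independently computed embeddings."""
    return sum(q * a for q, a in zip(q_vec, a_vec))


def retrieve(question, answers, answer_vecs, vocab):
    """Return the answer whose precomputed embedding best matches the question."""
    q_vec = encode(question, vocab)
    best = max(range(len(answers)), key=lambda i: score(q_vec, answer_vecs[i]))
    return answers[best]


# Hypothetical answer pool; embeddings are computed once, before any question arrives.
ANSWERS = [
    "paris is the capital of france",
    "a conformer combines convolution and self-attention",
]
VOCAB = build_vocab(ANSWERS)
ANSWER_VECS = [encode(a, VOCAB) for a in ANSWERS]

if __name__ == "__main__":
    print(retrieve("what is the capital of france", ANSWERS, ANSWER_VECS, VOCAB))
```

Because the question and the answers never pass through a shared network at inference time, the answer embeddings can be indexed offline. A cross-encoder, by contrast, would need to process every question-answer pair jointly.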

preprint, 2022, arXiv

Exciton diffusion and annihilation in nanophotonic Purcell landscapes

Excitons spread through diffusion and interact through exciton-exciton annihilation. Nanophotonics can counteract the resulting decrease in light emission. However, conventional enhancement treats emitters as immobile and noninteracting. Here, we go beyond the localized Purcell effect to exploit exciton dynamics. As interacting excitons diffuse through optical hotspots, the balance of excitonic and nanophotonic properties leads to either enhanced or suppressed photoluminescence. We identify the dominant enhancement mechanisms in the limits of high and low diffusion and annihilation to turn their detrimental impact into additional emission. Our guidelines are relevant for efficient and high-power light-emitting diodes and lasers based on monolayer semiconductors, perovskites, or organic crystals.

preprint, 2020, arXiv

BS-NAS: Broadening-and-Shrinking One-Shot NAS with Searchable Numbers of Channels

One-Shot methods have evolved into one of the most popular methods in Neural Architecture Search (NAS) due to weight sharing and single training of a supernet. However, existing methods generally suffer from two issues: a predetermined number of channels in each layer, which is suboptimal; and model averaging effects and poor ranking correlation caused by weight coupling and a continuously expanding search space. To explicitly address these issues, in this paper, a Broadening-and-Shrinking One-Shot NAS (BS-NAS) framework is proposed, in which 'broadening' refers to broadening the search space with a spring block enabling search for numbers of channels during training of the supernet, while 'shrinking' refers to a novel shrinking strategy gradually turning off those underperforming operations. The above innovations broaden the search space for wider representation and then shrink it by gradually removing underperforming operations, followed by an evolutionary algorithm to efficiently search for the optimal architecture. Extensive experiments on ImageNet illustrate the effectiveness of the proposed BS-NAS as well as the state-of-the-art performance.

preprint, 2020, arXiv

Collective Mie Exciton-Polaritons in an Atomically Thin Semiconductor

Optically induced Mie resonances in dielectric nanoantennas feature low dissipative losses and large resonant enhancement of both electric and magnetic fields. They offer an alternative platform to plasmonic resonances to study light-matter interactions from the weak to the strong coupling regimes. Here, we experimentally demonstrate the strong coupling of bright excitons in monolayer WS₂ with Mie surface lattice resonances (Mie-SLRs). We resolve both electric and magnetic Mie-SLRs of a Si nanoparticle array in angular dispersion measurements. At the zero detuning condition, the dispersion of electric Mie-SLRs (e-SLRs) exhibits a clear anti-crossing and a Rabi-splitting of 32 meV between the upper and lower polariton bands. The magnetic Mie-SLRs (m-SLRs) nearly cross the energy band of excitons. These results suggest that the field of m-SLRs is dominated by out-of-plane components that do not efficiently couple with the in-plane excitonic dipoles of the monolayer WS₂. In contrast, e-SLRs in dielectric nanoparticle arrays with relatively high quality factors (Q ~ 120) facilitate the formation of collective Mie exciton-polaritons, and may allow the development of novel polaritonic devices which can tailor the optoelectronic properties of atomically thin two-dimensional semiconductors.
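For readers unfamiliar with how an anti-crossing relates to a Rabi splitting, the textbook two-coupled-oscillator model gives a compact picture. This is a sketch under stated assumptions: damping is neglected, the symbols $E_X$ (exciton energy), $E_C$ (e-SLR energy), $\delta$ (detuning), and $g$ (coupling strength) are introduced here for illustration, and the abstract does not state which model the authors fitted.

```latex
% Lossless two-coupled-oscillator model (illustrative assumption, not taken from the abstract)
\begin{align}
  \delta &= E_C - E_X, \\
  E_{\pm} &= \frac{E_X + E_C}{2} \pm \frac{1}{2}\sqrt{(2g)^2 + \delta^2}, \\
  \hbar\Omega_R &= \left.(E_+ - E_-)\right|_{\delta = 0} = 2g .
\end{align}
% With the reported splitting \hbar\Omega_R = 32 meV, this lossless model gives g = 16 meV.
```

In this picture the m-SLRs, which nearly cross the exciton band, correspond to a coupling strength too small to open a resolvable gap.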