Source author record

Shaojun Wang

Shaojun Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computation and Language, eess.AS, Machine Learning, physics.optics, Sound, Artificial Intelligence, cond-mat.mes-hall, cond-mat.mtrl-sci

Catalog footprint

What is connected

7 works

8 topics

4 close collaborators

preprint, 2022, arXiv

A Study of Different Ways to Use The Conformer Model For Spoken Language Understanding

SLU combines ASR and NLU capabilities to accomplish speech-to-intent understanding. In this paper, we compare different ways to combine ASR and NLU, in particular using a single Conformer model with different ways to use its components, to better understand the strengths and weaknesses of each approach. We find that it is not necessarily a choice between two-stage decoding and end-to-end systems which determines the best system for research or application. System optimization still entails carefully improving the performance of each component. It is difficult to prove that one direction is conclusively better than the other. In this paper, we also propose a novel connectionist temporal summarization (CTS) method to reduce the length of acoustic encoding sequences while improving the accuracy and processing speed of end-to-end models. This method achieves the same intent accuracy as the best two-stage SLU recognition with complicated and time-consuming decoding but does so at lower computational cost. This stacked end-to-end SLU system yields an intent accuracy of 93.97% for the SmartLights far-field set, 95.18% for the close-field set, and 99.71% for FluentSpeech.

preprint, 2022, arXiv

Adding Connectionist Temporal Summarization into Conformer to Improve Its Decoder Efficiency For Speech Recognition

The Conformer model is an excellent architecture for speech recognition modeling that effectively utilizes the hybrid losses of connectionist temporal classification (CTC) and attention to train model parameters. To improve the decoding efficiency of Conformer, we propose a novel connectionist temporal summarization (CTS) method that reduces the number of frames required for the attention decoder fed from the acoustic sequences generated by the encoder, thus reducing operations. However, to achieve such decoding improvements, we must fine-tune model parameters, as cross-attention observations are changed and thus require corresponding refinements. Our final experiments show that, with a beam width of 4, the decoding budget can be reduced by up to 20% for LibriSpeech and by 11% for FluentSpeech data, without losing ASR accuracy. An improvement in accuracy is even found for the LibriSpeech "test-other" set. The word error rate (WER) is reduced by 6% relative at a beam width of 1 and by 3% relative at a beam width of 4.
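As a reading aid for the "relative" WER figures quoted above, here is a minimal sketch of how a relative reduction differs from an absolute one. The WER values are hypothetical and chosen only to illustrate the arithmetic; they are not results from the paper.

```python
def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` measured against `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# Hypothetical WER values in percent (assumptions, not taken from the paper).
baseline_wer = 10.0
improved_wer = 9.4

rel = relative_reduction(baseline_wer, improved_wer)
absolute = baseline_wer - improved_wer
print(f"absolute: {absolute:.1f} points, relative: {rel:.0%}")
```

With these illustrative numbers, a drop of 0.6 WER points corresponds to a 6% relative reduction.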

preprint, 2022, arXiv

Enhancing Dual-Encoders with Question and Answer Cross-Embeddings for Answer Retrieval

Dual-Encoders is a promising mechanism for answer retrieval in question answering (QA) systems. Currently, most conventional Dual-Encoders learn the semantic representations of questions and answers merely through a matching score. Researchers have proposed introducing QA interaction features into the scoring function, but at the cost of low efficiency in the inference stage. To keep the encoding of questions and answers independent during inference, a variational auto-encoder has further been introduced to reconstruct answers (questions) from question (answer) embeddings as an auxiliary task that enhances QA interaction during representation learning in the training stage. However, the needs of text generation and answer retrieval are different, which makes training difficult. In this work, we propose a framework to enhance the Dual-Encoders model with question-answer cross-embeddings and a novel Geometry Alignment Mechanism (GAM) to align the geometry of embeddings from Dual-Encoders with that from Cross-Encoders. Extensive experimental results show that our framework significantly improves the Dual-Encoders model and outperforms the state-of-the-art method on multiple answer retrieval datasets.
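The efficiency argument for independent encoding can be illustrated with a small sketch: answers are embedded once, and each query costs only one question encoding plus dot products. The `encode` function below is a hypothetical toy (hashed bag-of-words), standing in for a learned encoder; it is not the model from the paper, and the cross-embedding and GAM components are not shown.

```python
import math


def encode(text, dim=8):
    """Hypothetical toy encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[sum(ord(c) for c in token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def score(q_vec, a_vec):
    """Matching score: dot product of two independently computed embeddings."""
    return sum(q * a for q, a in zip(q_vec, a_vec))


def retrieve(question, answers, answer_vecs):
    """Encode the question once and rank the precomputed answer embeddings."""
    q_vec = encode(question)
    scores = [score(q_vec, a_vec) for a_vec in answer_vecs]
    best = max(range(len(answers)), key=scores.__getitem__)
    return answers[best], scores[best]


answers = [
    "paris is the capital of france",
    "water boils at 100 degrees celsius",
]
answer_vecs = [encode(a) for a in answers]  # computed once, before any query

best_answer, best_score = retrieve(
    "what is the capital of france", answers, answer_vecs
)
print(best_answer)
```

A cross-encoder would instead process each question-answer pair jointly, so nothing could be precomputed; that is the inference cost the abstract refers to.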

preprint, 2022, arXiv

Exciton diffusion and annihilation in nanophotonic Purcell landscapes

Excitons spread through diffusion and interact through exciton-exciton annihilation. Nanophotonics can counteract the resulting decrease in light emission. However, conventional enhancement treats emitters as immobile and noninteracting. Here, we go beyond the localized Purcell effect to exploit exciton dynamics. As interacting excitons diffuse through optical hotspots, the balance of excitonic and nanophotonic properties leads to either enhanced or suppressed photoluminescence. We identify the dominant enhancement mechanisms in the limits of high and low diffusion and annihilation to turn their detrimental impact into additional emission. Our guidelines are relevant for efficient and high-power light-emitting diodes and lasers based on monolayer semiconductors, perovskites, or organic crystals.
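For orientation, the interplay of diffusion, decay and annihilation described above is commonly written as a rate equation for the exciton density. The abstract does not state the model used, so the form below is a standard textbook sketch under assumed notation, not necessarily the authors' formulation:

```latex
\frac{\partial n(\mathbf{r}, t)}{\partial t}
  = D \, \nabla^{2} n(\mathbf{r}, t)
  - \frac{n(\mathbf{r}, t)}{\tau(\mathbf{r})}
  - \gamma \, n(\mathbf{r}, t)^{2}
```

Here $n$ is the exciton density, $D$ the diffusion coefficient, $\tau(\mathbf{r})$ the exciton lifetime (position dependent where the Purcell effect modifies the radiative rate), and $\gamma$ the exciton-exciton annihilation rate constant. All symbols are assumptions introduced for illustration.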

preprint, 2020, arXiv

BS-NAS: Broadening-and-Shrinking One-Shot NAS with Searchable Numbers of Channels

One-Shot methods have evolved into one of the most popular methods in Neural Architecture Search (NAS) due to weight sharing and single training of a supernet. However, existing methods generally suffer from two issues: a predetermined number of channels in each layer, which is suboptimal; and model averaging effects and poor ranking correlation caused by weight coupling and a continuously expanding search space. To explicitly address these issues, in this paper, a Broadening-and-Shrinking One-Shot NAS (BS-NAS) framework is proposed, in which 'broadening' refers to broadening the search space with a spring block that enables a search over the number of channels during training of the supernet, while 'shrinking' refers to a novel shrinking strategy that gradually turns off underperforming operations. The above innovations broaden the search space for wider representation and then shrink it by gradually removing underperforming operations, followed by an evolutionary algorithm to efficiently search for the optimal architecture. Extensive experiments on ImageNet illustrate the effectiveness of the proposed BS-NAS as well as its state-of-the-art performance.

preprint, 2020, arXiv

Collective Mie Exciton-Polaritons in an Atomically Thin Semiconductor

Optically induced Mie resonances in dielectric nanoantennas feature low dissipative losses and large resonant enhancement of both electric and magnetic fields. They offer an alternative platform to plasmonic resonances to study light-matter interactions from the weak to the strong coupling regimes. Here, we experimentally demonstrate the strong coupling of bright excitons in monolayer WS$_2$ with Mie surface lattice resonances (Mie-SLRs). We resolve both electric and magnetic Mie-SLRs of a Si nanoparticle array in angular dispersion measurements. At the zero detuning condition, the dispersion of electric Mie-SLRs (e-SLRs) exhibits a clear anti-crossing and a Rabi-splitting of 32 meV between the upper and lower polariton bands. The magnetic Mie-SLRs (m-SLRs) nearly cross the energy band of excitons. These results suggest that the field of m-SLRs is dominated by out-of-plane components that do not efficiently couple with the in-plane excitonic dipoles of the monolayer WS$_2$. In contrast, e-SLRs in dielectric nanoparticle arrays with relatively high quality factors (Q $\sim$ 120) facilitate the formation of collective Mie exciton-polaritons, and may allow the development of novel polaritonic devices which can tailor the optoelectronic properties of atomically thin two-dimensional semiconductors.
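The anti-crossing and Rabi splitting mentioned above can be illustrated with the standard lossless two-coupled-oscillator model, in which the splitting at zero detuning equals twice the coupling strength. This is a generic sketch, not the fitting procedure used in the paper; the exciton energy `E_X` is an illustrative assumption, and only the 32 meV splitting comes from the abstract.

```python
import math


def polariton_energies(e_exciton, e_resonance, g):
    """Lower and upper eigenenergies of two coupled lossless oscillators (meV)."""
    mean = 0.5 * (e_exciton + e_resonance)
    detuning = e_resonance - e_exciton
    half_gap = math.sqrt(g * g + 0.25 * detuning * detuning)
    return mean - half_gap, mean + half_gap


RABI_SPLITTING_MEV = 32.0      # value quoted in the abstract
g = RABI_SPLITTING_MEV / 2.0   # coupling strength when losses are neglected
E_X = 2000.0                   # illustrative exciton energy in meV (assumption)

# Zero detuning: the lattice resonance is tuned exactly to the exciton energy.
lower, upper = polariton_energies(E_X, E_X, g)
splitting = upper - lower

# Away from zero detuning the gap between the two branches only grows.
lower_d, upper_d = polariton_energies(E_X, E_X + 40.0, g)
splitting_detuned = upper_d - lower_d
print(splitting, splitting_detuned)
```

In this simplified picture the branches never cross for nonzero `g`, which is the anti-crossing signature; a mode with negligible coupling (`g` close to 0) would instead pass straight through the exciton energy.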

preprint, 2012, arXiv

Boltzmann Machine Learning with the Latent Maximum Entropy Principle

We present a new statistical learning paradigm for Boltzmann machines based on a new inference principle we have proposed: the latent maximum entropy principle (LME). LME is different both from Jaynes' maximum entropy principle and from standard maximum likelihood estimation. We demonstrate the LME principle by deriving new algorithms for Boltzmann machine parameter estimation, and show how a robust and fast new variant of the EM algorithm can be developed. Our experiments show that estimation based on LME generally yields better results than maximum likelihood estimation, particularly when inferring hidden units from small amounts of data.