Researcher profile

Nan Huo

Nan Huo contributes to research discovery and scholarly infrastructure.

Researcher, affiliation not imported, open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, unclaimed author
10 works
0 followers
9 topics
4 close collaborators

Published work

10 published items

Preprint, 2023, arXiv

Graphix-T5: Mixing Pre-Trained Transformers with Graph-Aware Layers for Text-to-SQL Parsing

The task of text-to-SQL parsing, which aims at converting natural language questions into executable SQL queries, has garnered increasing attention in recent years, as it can assist end users in efficiently extracting vital information from databases without the need for technical background. One of the major challenges in text-to-SQL parsing is domain generalization, i.e., how to generalize well to unseen databases. Recently, the pre-trained text-to-text transformer model, namely T5, though not specialized for text-to-SQL parsing, has achieved state-of-the-art performance on standard benchmarks targeting domain generalization. In this work, we explore ways to further augment the pre-trained T5 model with specialized components for text-to-SQL parsing. Such components are expected to introduce structural inductive bias into text-to-SQL parsers, thus improving the model's capacity for (potentially multi-hop) reasoning, which is critical for generating structure-rich SQL queries. To this end, we propose a new architecture, GRAPHIX-T5, a mixed model with the standard pre-trained transformer model augmented by some specially designed graph-aware layers. Extensive experiments and analysis demonstrate the effectiveness of GRAPHIX-T5 across four text-to-SQL benchmarks: SPIDER, SYN, REALISTIC and DK. GRAPHIX-T5 surpasses all other T5-based parsers by a significant margin, achieving new state-of-the-art performance. Notably, GRAPHIX-T5-large reaches performance superior to the original T5-large by 5.7% on exact match (EM) accuracy and 6.6% on execution accuracy (EX). This even outperforms T5-3B by 1.2% on EM and 1.5% on EX.

Preprint, 2022, arXiv

Automatic Meta-Path Discovery for Effective Graph-Based Recommendation

Heterogeneous Information Networks (HINs) are labeled graphs that depict relationships among different types of entities (e.g., users, movies and directors). For HINs, meta-path-based recommenders (MPRs) utilize meta-paths (i.e., abstract paths consisting of node and link types) to predict user preference, and have attracted a lot of attention due to their explainability and performance. We observe that the performance of MPRs is highly sensitive to the meta-paths they use, but existing works manually select the meta-paths from many possible ones. Thus, to discover effective meta-paths automatically, we propose the Reinforcement learning-based Meta-path Selection (RMS) framework. Specifically, we define a vector encoding for meta-paths and design a policy network to extend meta-paths. The policy network is trained based on the results of downstream recommendation tasks, and an early stopping approximation strategy is proposed to speed up training. RMS is a general model, and it can work with all existing MPRs. We also propose a new MPR called RMS-HRec, which uses an attention mechanism to aggregate information from the meta-paths. We conduct extensive experiments on real datasets. Compared with the manually selected meta-paths, the meta-paths identified by RMS consistently improve recommendation quality. Moreover, RMS-HRec outperforms state-of-the-art recommender systems by an average of 7% in hit ratio. The code and datasets are available at https://github.com/Stevenn9981/RMS-HRec.

Preprint, 2022, arXiv

Measurement-dependent erasure of distinguishability for the observation of interference in an unbalanced SU(1,1) interferometer

It is known that quantum interference can disappear with the mere possibility of distinguishability without actually performing the act. We create such distinguishability in an unbalanced SU(1,1) interferometer and indeed observe no interference in the direct photodetection of the outputs. On the other hand, such distinguishability can be erased with a projective measurement. Here, we report a method of homodyne detection that can also recover the interference effect. We find that it is the indistinguishability in amplitude measurement that leads to the recovery of interference, and the quantum nature of homodyne detection and the detector's slow response time both play an essential role. This is different from the quantum eraser schemes mentioned above. It demonstrates that quantum interference occurs in the measurement processes. With no need for path compensation, the unbalanced interferometers studied here should have practical applications in quantum metrology and sensing.

Preprint, 2022, arXiv

Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis

This paper introduces a new open-source platform named Muskits for end-to-end music processing, which mainly focuses on end-to-end singing voice synthesis (E2E-SVS). Muskits supports state-of-the-art SVS models, including RNN SVS, transformer SVS, and XiaoiceSing. The design of Muskits follows the style of widely used speech processing toolkits, ESPnet and Kaldi, for data preprocessing, training, and recipe pipelines. To the best of our knowledge, this toolkit is the first platform that allows a fair and highly reproducible comparison between several published works in SVS. In addition, we also demonstrate several advanced usages based on the toolkit functionalities, including multilingual training and transfer learning. This paper describes the major framework of Muskits, its functionalities, and experimental results in single-singer, multi-singer, multilingual, and transfer learning scenarios. The toolkit is publicly available at https://github.com/SJTMusicTeam/Muskits.

Preprint, 2022, arXiv

Temporal coherence of optical fields in the presence of entanglement

In classical coherence theory, coherence time is typically related to the bandwidth of the optical field. Narrowing the bandwidth will result in the lengthening of the coherence time. This will erase temporal distinguishability of photons due to time delay in pulsed photon interference. However, this is changed in an SU(1,1)-type quantum interferometer where quantum entanglement is involved. In this paper, we investigate how the temporal coherence of the fields in a pulse-pumped SU(1,1) interferometer changes with the bandwidth of optical filtering. We find that, because of the quantum entanglement, the coherence of the fields does not improve when optical filtering is applied, contrary to classical coherence theory, and that quantum entanglement plays a crucial role in quantum interference in addition to distinguishability.

Preprint, 2021, arXiv

Propagation of temporal mode multiplexed optical fields in fibers: influence of dispersion

Exploiting two interfering fields which are initially in the same temporal mode but with the spectra altered by propagating through different fibers, we characterize how the spectra of temporal modes change with the fiber-induced dispersion by measuring the fourth-order interference when the order number and bandwidth of temporal modes are varied. The experiment is done by launching a pulsed field in different temporal modes into an unbalanced Mach-Zehnder interferometer, in which the fiber lengths in the two arms are different. The results show that the mode mismatch of the two interfering fields, reflected by the visibility and pattern of interference, is not only dependent upon the amount of unbalanced dispersion but also related to the order number of the temporal mode. In particular, the two interfering fields may become orthogonal under a modest amount of unbalanced dispersion when the mode number of the fields is $k\geq2$. Moreover, we discuss how to recover the spectrally distorted temporal mode by measuring and compensating the transmission-induced dispersion. Our investigation paves the way for further investigating the distribution of temporally multiplexed quantum states in fiber networks.

Preprint, 2021, arXiv

Sequence-to-sequence Singing Voice Synthesis with Perceptual Entropy Loss

Neural network (NN) based singing voice synthesis (SVS) systems require sufficient data to train well and are prone to over-fitting due to data scarcity. However, we often encounter data limitations in building SVS systems because of high data acquisition and annotation costs. In this work, we propose a Perceptual Entropy (PE) loss derived from a psycho-acoustic hearing model to regularize the network. With a one-hour open-source singing voice database, we explore the impact of the PE loss on various mainstream sequence-to-sequence models, including RNN-based, transformer-based, and conformer-based models. Our experiments show that the PE loss can mitigate the over-fitting problem and significantly improve the synthesized singing quality, as reflected in objective and subjective evaluations.

Preprint, 2020, arXiv

Context-aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training

Mispronunciation detection is an essential component of Computer-Assisted Pronunciation Training (CAPT) systems. State-of-the-art mispronunciation detection models use Deep Neural Networks (DNN) for acoustic modeling, and a Goodness of Pronunciation (GOP) based algorithm for pronunciation scoring. However, GOP-based scoring models have two major limitations: (i) they depend on forced alignment, which splits the speech into phonetic segments and uses them independently for scoring, neglecting the transitions between phonemes within the segment; (ii) they focus only on phonetic segments, failing to consider the context effects across phonemes (such as liaison, omission, incomplete plosive sound, etc.). In this work, we propose the Context-aware Goodness of Pronunciation (CaGOP) scoring model. In particular, two factors, namely the transition factor and the duration factor, are injected into CaGOP scoring. The transition factor identifies the transitions between phonemes and applies them to weight the frame-wise GOP. Moreover, self-attention based phonetic duration modeling is proposed to introduce the duration factor into the scoring model. The proposed scoring model significantly outperforms baselines, achieving 20% and 12% relative improvement over the GOP model on phoneme-level and sentence-level mispronunciation detection, respectively.

Preprint, 2020, arXiv

Direct temporal mode measurement for the characterization of temporally multiplexed high dimensional quantum entanglement in continuous variables

Field-orthogonal temporal mode analysis of optical fields has recently been developed for a new framework of quantum information science. But so far, the exact profiles of the temporal modes are not known, which makes it difficult to achieve mode selection and de-multiplexing. Here, we report a novel method that directly measures the exact form of the temporal modes. This in turn enables us to make mode-orthogonal homodyne detection with mode-matched local oscillators. We apply the method to a pulse-pumped, specially engineered fiber parametric amplifier and demonstrate temporally multiplexed multi-dimensional quantum entanglement of continuous variables at telecom wavelength. The temporal mode characterization technique can be generalized to other pulse-excited systems to find their eigenmodes for multiplexing in the temporal domain.

Preprint, 2019, arXiv

Measuring the continuous variable quantum entanglement with a parametric amplifier assisted homodyne detection

The traditional method for measuring continuous-variable quantum entanglement relies on balanced homodyne detections, which are sensitive to vacuum quantum noise coupled in through losses resulting from many factors, such as the detector's quantum efficiency and mode mismatching between the detected field and the local oscillator. In this paper, we propose and analyze a new measurement method, which is realized by assisting the balanced homodyne detections with a high-gain phase-sensitive parametric amplifier. The employment of the high-gain parametric amplifier helps to tackle the vacuum quantum noise originating from detection losses. Moreover, because the high-gain parametric amplifier can couple two fields of different types in a phase-sensitive manner, the proposed scheme can be used to reveal quantum entanglement between two fields of different types by using only one balanced homodyne detection. Furthermore, detailed analysis shows that in the multi-mode case, the proposed scheme is also advantageous over the traditional method. Such a new measurement method should find wide applications in quantum information and quantum metrology involving measurement of continuous variables.