Source author record

Yuan Jiang

Yuan Jiang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

eess.AS, Machine Learning, Sound, Methodology, Applications, Computer Vision, eess.SP, math.ST, Statistics Theory, Artificial Intelligence, Computation and Language, Cryptography and Security, quant-ph, Social and Information Networks

Catalog footprint

What is connected

17 works

14 topics

4 close collaborators


preprint, 2024, arXiv

Physics-informed Machine Learning for Battery Pack Thermal Management

With the growing popularity of electric vehicles, the demand for lithium-ion batteries is increasing. Temperature significantly influences the performance and safety of batteries. Battery thermal management systems can effectively control battery temperature, thereby ensuring performance and safety. However, developing battery thermal management systems is time-consuming and costly, because data-driven models need extensive training datasets whose generation by finite element analysis incurs enormous computational cost. Therefore, a new approach to constructing surrogate models is needed in the era of AI. Physics-informed machine learning enforces physical laws in surrogate models, making it well suited to estimating battery pack temperature distribution. In this study, we first developed a 21700 battery pack indirect liquid cooling system with cold plates on the top and bottom and thermal paste surrounding the battery cells. Then, a simplified finite element model was built based on experimental results. Due to the high coolant flow rate, the cold plates can be considered constant-temperature boundaries, while the battery cells are the heat sources. A physics-informed convolutional neural network served as a surrogate model to estimate the temperature distribution of the battery pack. The loss function was constructed from the heat conduction equation, discretized with the finite difference method. The physics-informed loss function helped the training process converge with less data. As a result, the physics-informed convolutional neural network showed more than a 15 percent improvement in accuracy compared to the data-driven method with the same training data.
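The abstract does not give the exact form of the loss, so the following is only a minimal sketch of how a finite-difference heat-conduction residual can be added to a data loss. It assumes steady-state 2D conduction, k * laplacian(T) + q = 0, on a uniform grid. The function names, the conductivity `k`, the grid spacing `h` and the weighting factor `weight` are all illustrative assumptions, not the authors' implementation.

```python
def heat_residual_loss(T, q, k=1.0, h=1.0):
    """Mean squared residual of k * laplacian(T) + q = 0 over interior grid points.

    T and q are 2D lists (temperature field and heat source) of equal shape.
    The steady-state form and the 5-point stencil are assumptions of this sketch.
    """
    rows, cols = len(T), len(T[0])
    total, count = 0.0, 0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # 5-point finite-difference approximation of the Laplacian
            lap = (T[i + 1][j] + T[i - 1][j] + T[i][j + 1] + T[i][j - 1]
                   - 4.0 * T[i][j]) / (h * h)
            residual = k * lap + q[i][j]
            total += residual * residual
            count += 1
    return total / count


def physics_informed_loss(T_pred, T_ref, q, weight=1.0, k=1.0, h=1.0):
    """Data mismatch (mean squared error) plus a weighted physics residual."""
    n = len(T_pred) * len(T_pred[0])
    data = sum((p - r) ** 2
               for row_p, row_r in zip(T_pred, T_ref)
               for p, r in zip(row_p, row_r)) / n
    return data + weight * heat_residual_loss(T_pred, q, k, h)
```

A predicted field that already satisfies the discretized equation contributes nothing to the physics term, so that term only penalizes predictions that violate heat conduction, which is one plausible reason such a loss can help when labeled data are scarce.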

preprint, 2022, arXiv

Asymptotic Uncertainty of False Discovery Proportion

Multiple testing has been a popular topic in statistical research. Although much work has been done, controlling false discoveries remains a challenging task when the corresponding test statistics are dependent. Various methods have been proposed to estimate the false discovery proportion (FDP) under arbitrary dependence among the test statistics. One of the main ideas is to reduce arbitrary dependence to weak dependence and then to establish theoretically the strong consistency of the FDP and false discovery rate (FDR) under weak dependence. As a consequence, FDPs share the same asymptotic limit in the framework of weak dependence. We observe, however, that the asymptotic variance of the FDP may rely heavily on the dependence structure of the corresponding test statistics even when they are only weakly dependent, and it is of great practical value to quantify this variability, as it can serve as an indicator of the quality of the FDP estimate from the given data. As far as we are aware, research in this respect is still limited in the literature. In this paper, we first derive the asymptotic expansion of the FDP under mild regularity conditions and then examine how the asymptotic variance of the FDP varies under different dependence structures, both theoretically and numerically. Based on the observations in this study, we recommend that, in multiple testing performed with an FDP procedure, both the mean and the variance estimates of the FDP be reported to enrich the study outcome.
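For readers unfamiliar with the quantity, the FDP is the number of false rejections divided by the total number of rejections (taken as 0 when nothing is rejected). The sketch below computes it for a fixed p-value threshold and summarizes a collection of FDP values by mean and sample variance, in the spirit of the recommendation above. The function names and the fixed-threshold rule are assumptions for illustration. The paper's asymptotic variance formulas are not reproduced here.

```python
import statistics


def false_discovery_proportion(p_values, is_null, threshold):
    """FDP = V / max(R, 1): false rejections over total rejections."""
    rejected = [p <= threshold for p in p_values]
    total_rejections = sum(rejected)
    false_rejections = sum(1 for r, null in zip(rejected, is_null) if r and null)
    return false_rejections / max(total_rejections, 1)


def summarize_fdp(fdp_values):
    """Mean and sample variance of FDP values, e.g. across simulated replications."""
    return statistics.mean(fdp_values), statistics.variance(fdp_values)
```

In a simulation study, `false_discovery_proportion` would be evaluated once per replication, where the true null indicators are known, and `summarize_fdp` would then report both summaries.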

preprint, 2022, arXiv

Asymptotic Uncertainty of False Discovery Proportion for Dependent $t$-Tests

Multiple testing is a fundamental problem in high-dimensional statistical inference. Although many methods have been proposed to control false discoveries, it is still a challenging task when the tests are correlated with each other. To overcome this challenge, various methods have been proposed to estimate the false discovery rate (FDR) and/or the false discovery proportion (FDP) under arbitrary covariance among the test statistics. An interesting finding of these works is that the estimation of FDP and FDR under weak dependence is identical to that under independence. However, Mei et al. (2021) pointed out that, unlike FDR, the asymptotic variance of FDP can still differ drastically from that under independence, and the difference depends on the covariance structure among the test statistics. In this paper, we further extend this result from $z$-tests to $t$-tests when the marginal variances are unknown and need to be estimated. With weakly dependent $t$-tests, we show that FDP still converges to a fixed quantity unrelated to the dependence structure, and further derive the asymptotic expansion and uncertainty of FDP, leading to results similar to those in Mei et al. (2021). In addition, we develop an approximation method to efficiently evaluate the asymptotic variance of FDP for dependent $t$-tests. Through simulations and a real-data study, we examine how the asymptotic variance of FDP varies, as well as the performance of its estimators, under different dependence structures.

preprint, 2022, arXiv

Compositional Graphical Lasso Resolves the Impact of Parasitic Infection on Gut Microbial Interaction Networks in a Zebrafish Model

Understanding how microbes interact with each other is key to revealing the underlying role that microorganisms play in the host or environment and to identifying microorganisms as agents that can potentially alter the host or environment. For example, understanding how microbial interactions associate with parasitic infection can help identify potential drugs or diagnostic tests for parasitic infection. To unravel microbial interactions, existing tools often rely on graphical models that infer the conditional dependence of microbial abundances to represent their interactions. However, current methods do not simultaneously account for the discreteness, compositionality, and heterogeneity inherent to microbiome data. Thus, we build a new approach called "compositional graphical lasso" upon existing tools by incorporating the above characteristics into the graphical model explicitly. We illustrate the advantage of compositional graphical lasso over current methods under a variety of simulation scenarios and on a benchmark study, the Tara Oceans Project. Moreover, we present our results from the analysis of a dataset from the Zebrafish Parasite Infection Study. Our approach identifies changes in interaction degree between infected and uninfected individuals for three taxa, Photobacterium, Gemmobacter, and Paucibacter, which are inversely predicted by other methods. Further investigation of these method-specific taxa interaction changes reveals their biological plausibility. In particular, we speculate on the potential pathobiotic roles of Photobacterium and Gemmobacter in the zebrafish gut, and the potential probiotic role of Paucibacter. Collectively, our analyses demonstrate that compositional graphical lasso provides a powerful means of accurately resolving interactions between microbiota and can thus drive novel biological discovery.

preprint, 2022, arXiv

Learning to Solve Routing Problems via Distributionally Robust Optimization

Recent deep models for solving routing problems always assume a single distribution of nodes for training, which severely impairs their cross-distribution generalization ability. In this paper, we exploit group distributionally robust optimization (group DRO) to tackle this issue, jointly optimizing the weights for different groups of distributions and the parameters of the deep model in an interleaved manner during training. We also design a module based on a convolutional neural network, which allows the deep model to learn more informative latent patterns among the nodes. We evaluate the proposed approach on two well-known types of deep models, GCN and POMO. Experimental results on randomly synthesized instances and on instances from two benchmark datasets (i.e., TSPLib and CVRPLib) demonstrate that our approach can significantly improve cross-distribution generalization performance over the original models.
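The interleaved optimization can be illustrated with the commonly used group DRO weight update, in which each group's weight grows exponentially with its current loss and the weights are then renormalized. This is a sketch of that generic update, not necessarily the exact rule used in the paper. The function names and the `step_size` parameter are assumptions.

```python
import math


def update_group_weights(weights, group_losses, step_size=0.1):
    """Exponentiated-gradient step: up-weight groups with higher loss, then normalize."""
    scaled = [w * math.exp(step_size * loss)
              for w, loss in zip(weights, group_losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def robust_loss(weights, group_losses):
    """Weighted objective that the model parameters are trained to minimize."""
    return sum(w * loss for w, loss in zip(weights, group_losses))
```

In a training loop, each step would first compute the per-group losses (for example, one group per node distribution), call `update_group_weights`, and then take a gradient step on `robust_loss` with respect to the model parameters.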

preprint, 2022, arXiv

Neural Grapheme-to-Phoneme Conversion with Pre-trained Grapheme Models

Neural network models have achieved state-of-the-art performance on grapheme-to-phoneme (G2P) conversion. However, their performance relies on large-scale pronunciation dictionaries, which may not be available for a lot of languages. Inspired by the success of the pre-trained language model BERT, this paper proposes a pre-trained grapheme model called grapheme BERT (GBERT), which is built by self-supervised training on a large, language-specific word list with only grapheme information. Furthermore, two approaches are developed to incorporate GBERT into the state-of-the-art Transformer-based G2P model, i.e., fine-tuning GBERT or fusing GBERT into the Transformer model by attention. Experimental results on the Dutch, Serbo-Croatian, Bulgarian and Korean datasets of the SIGMORPHON 2021 G2P task confirm the effectiveness of our GBERT-based G2P models under both medium-resource and low-resource data conditions.

preprint, 2022, arXiv

Portable ground stations for space-to-ground quantum key distribution

Quantum key distribution (QKD) uses the fundamental principles of quantum mechanics to share unconditionally secure keys between distant users. Previous works based on the quantum science satellite "Micius" have initially demonstrated the feasibility of a global QKD network. However, the practical applications of space-based QKD still face many technical problems, such as the huge size and weight of the ground stations required to receive quantum signals. Here, we report space-to-ground QKD demonstrations based on portable receiving ground stations. The portable ground station weighs less than 100 kg, occupies less than 1 m$^{3}$ and takes no more than 12 hours to install; its weight, required space and deployment time are all about two orders of magnitude lower than those of previous systems. Moreover, the equipment is easy to handle and can be placed on the roofs of buildings in a metropolis. Secure keys have been successfully generated from the "Micius" satellite to these portable ground stations at six different places in China, and an average final secure key length of around 50 kb can be obtained during one passage. Our results pave the way for, and greatly accelerate the practical application of, space-based QKD.

preprint, 2022, arXiv

Reference-Invariant Inverse Covariance Estimation with Application to Microbial Network Recovery

The interactions between microbial taxa in microbiome data have attracted great research interest in the scientific community. In particular, several methods such as SPIEC-EASI, gCoda, and CD-trace have been proposed to model the conditional dependency between microbial taxa, in order to eliminate the detection of spurious correlations. However, all those methods are built upon the central log-ratio (CLR) transformation, which results in a degenerate covariance matrix and thus an undefined inverse covariance matrix as the estimate of the underlying network. Jiang et al. (2021) and Tian et al. (2022) proposed bias-corrected graphical lasso and compositional graphical lasso based on the additive log-ratio (ALR) transformation, which first selects a reference taxon and then computes the log ratios of the abundances of all the other taxa with respect to that of the reference. One concern with the ALR transformation is the invariance of the estimated network with respect to the choice of reference. In this paper, we first establish the reference-invariance property of a subnetwork of interest based on the ALR-transformed data. Then, we propose a reference-invariant version of the compositional graphical lasso by modifying the penalty in its objective function, penalizing only the invariant subnetwork. We validate the reference-invariance property of the proposed method under a variety of simulation scenarios as well as through an application to an oceanic microbiome data set.
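The two transformations can be sketched as follows. The CLR subtracts the mean log abundance from each log abundance, so the transformed components of every sample sum to zero. That linear constraint is what makes the covariance matrix degenerate. The ALR instead drops a reference taxon and expresses the others as log ratios against it. The function names and the `ref` argument are illustrative assumptions, and strictly positive abundances are assumed.

```python
import math


def clr(abundances):
    """Central log-ratio: log abundances centered by their mean (components sum to zero)."""
    logs = [math.log(v) for v in abundances]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def alr(abundances, ref=-1):
    """Additive log-ratio: log(x_i / x_ref) for every taxon except the reference."""
    ref_index = ref % len(abundances)
    log_ref = math.log(abundances[ref_index])
    return [math.log(v) - log_ref
            for i, v in enumerate(abundances) if i != ref_index]
```

For a sample with p taxa, `clr` returns p values constrained to sum to zero, while `alr` returns p - 1 unconstrained values whose meaning depends on the chosen reference, which is why invariance to that choice matters.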

preprint, 2022, arXiv

Stability Approach to Regularization Selection for Reduced-Rank Regression

The reduced-rank regression model is a popular model for handling multivariate responses and multiple predictors, and is widely used in biology, chemometrics, econometrics, engineering, and other fields. In reduced-rank regression modelling, a central objective is to estimate the rank of the coefficient matrix, which represents the number of effective latent factors in predicting the multivariate response. Although theoretical results such as rank estimation consistency have been established for various methods, in practice rank determination still relies on information-criterion-based methods such as AIC and BIC or subsampling-based methods such as cross validation. Unfortunately, the theoretical properties of these practical methods are largely unknown. In this paper, we present a novel method called StARS-RRR that selects the tuning parameter and then estimates the rank of the coefficient matrix for reduced-rank regression based on the stability approach. We prove that StARS-RRR achieves rank estimation consistency, i.e., the rank estimated with the tuning parameter selected by StARS-RRR is consistent for the true rank. Through a simulation study, we show that StARS-RRR outperforms other tuning parameter selection methods, including AIC, BIC, and cross validation, as it provides the most accurate estimated rank. In addition, when applied to a breast cancer dataset, StARS-RRR discovers a reasonable number of genetic pathways that affect DNA copy number variations and results in a smaller prediction error than the other methods with a random-splitting process.

preprint, 2020, arXiv

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

Automatic speaker verification (ASV) is one of the most natural and convenient means of biometric person recognition. Unfortunately, just like all other biometric systems, ASV is vulnerable to spoofing, also referred to as "presentation attacks." These vulnerabilities are generally unacceptable and call for spoofing countermeasures or "presentation attack detection" systems. In addition to impersonation, ASV systems are vulnerable to replay, speech synthesis, and voice conversion attacks. The ASVspoof 2019 edition is the first to consider all three spoofing attack types within a single challenge. While they originate from the same source database and same underlying protocol, they are explored in two specific use case scenarios. Spoofing attacks within a logical access (LA) scenario are generated with the latest speech synthesis and voice conversion technologies, including state-of-the-art neural acoustic and waveform model techniques. Replay spoofing attacks within a physical access (PA) scenario are generated through carefully controlled simulations that support much more revealing analysis than possible previously. Also new to the 2019 edition is the use of the tandem detection cost function metric, which reflects the impact of spoofing and countermeasures on the reliability of a fixed ASV system. This paper describes the database design, protocol, spoofing attack implementations, and baseline ASV and countermeasure results. It also describes a human assessment on spoofed data in logical access. It was demonstrated that the spoofing data in the ASVspoof 2019 database have varied degrees of perceived quality and similarity to the target speakers, including spoofed data that cannot be differentiated from bona-fide utterances even by human subjects.

preprint, 2020, arXiv

Sequence-to-Sequence Acoustic Modeling for Voice Conversion

In this paper, a neural network named Sequence-to-sequence ConvErsion NeTwork (SCENT) is presented for acoustic modeling in voice conversion. At the training stage, a SCENT model is estimated by aligning the feature sequences of source and target speakers implicitly using an attention mechanism. At the conversion stage, acoustic features and durations of source utterances are converted simultaneously using the unified acoustic model. Mel-scale spectrograms are adopted as acoustic features, which contain both excitation and vocal tract descriptions of speech signals. The bottleneck features extracted from source speech using an automatic speech recognition (ASR) model are appended as auxiliary input. A WaveNet vocoder conditioned on Mel-spectrograms is built to reconstruct waveforms from the outputs of the SCENT model. It is worth noting that our proposed method can achieve appropriate duration conversion, which is difficult in conventional methods. Experimental results show that our proposed method obtained better objective and subjective performance than the baseline methods using Gaussian mixture models (GMM) and deep neural networks (DNN) as acoustic models. The proposed method also outperformed our previous work, which achieved the top rank in Voice Conversion Challenge 2018. Ablation tests further confirmed the effectiveness of several components in our proposed method.

preprint, 2020, arXiv

Soft Gradient Boosting Machine

Gradient Boosting Machine (GBM) has proven to be a successful function approximator and has been widely used in a variety of areas. However, since the base learners must be trained in sequential order, it is infeasible to parallelize the training process among base learners for speed-up. In addition, under online or incremental learning settings, GBMs achieve sub-optimal performance because previously trained base learners cannot adapt to the environment once trained. In this work, we propose the soft Gradient Boosting Machine (sGBM), which wires multiple differentiable base learners together. By injecting both local and global objectives inspired by gradient boosting, all base learners can then be jointly optimized with linear speed-up. When differentiable soft decision trees are used as base learners, such a device can be regarded as an alternative version of (hard) gradient boosting decision trees with extra benefits. Experimental results show that sGBM enjoys much higher time efficiency with better accuracy, given the same base learner, in both online and offline settings.

preprint, 2020, arXiv

Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer

With the development of automatic speech recognition (ASR) and text-to-speech synthesis (TTS) techniques, it is intuitive to construct a voice conversion system by cascading an ASR and a TTS system. In this paper, we present an ASR-TTS method for voice conversion, which uses the iFLYTEK ASR engine to transcribe the source speech into text and a Transformer TTS model with a WaveNet vocoder to synthesize the converted speech from the decoded text. For the TTS model, we propose to use a prosody code to describe the prosody information, other than text and speaker information, contained in speech. A prosody encoder is used to extract the prosody code. During conversion, the source prosody is transferred to the converted speech by conditioning the Transformer TTS model on its code. Experiments were conducted to demonstrate the effectiveness of our proposed method. Our system also obtained the best naturalness and similarity in the mono-lingual task of Voice Conversion Challenge 2020.

preprint, 2018, arXiv

Improving Sequence-to-Sequence Acoustic Modeling by Adding Text-Supervision

This paper presents methods of making use of text supervision to improve the performance of sequence-to-sequence (seq2seq) voice conversion. Compared with conventional frame-to-frame voice conversion approaches, the seq2seq acoustic modeling method proposed in our previous work achieved higher naturalness and similarity. In this paper, we further improve its performance by utilizing the text transcriptions of parallel training data. First, a multi-task learning structure is designed which adds auxiliary classifiers to the middle layers of the seq2seq model and predicts linguistic labels as a secondary task. Second, a data-augmentation method is proposed which utilizes text alignment to produce extra parallel sequences for model training. Experiments are conducted to evaluate our proposed method with training sets of different sizes. Experimental results show that multi-task learning with linguistic labels is effective at reducing the errors of seq2seq voice conversion. The data-augmentation method can further improve the performance of seq2seq voice conversion when only 50 or 100 training utterances are available.

preprint, 2016, arXiv

Object Skeleton Extraction in Natural Images by Fusing Scale-associated Deep Side Outputs

Object skeleton is a useful cue for object detection, complementary to the object contour, as it provides a structural representation to describe the relationship among object parts. However, object skeleton extraction in natural images is a very challenging problem, as it requires the extractor to capture both local and global image context to determine the intrinsic scale of each skeleton pixel. Existing methods rely on per-pixel multi-scale feature computation, which results in difficult modeling and high time consumption. In this paper, we present a fully convolutional network with multiple scale-associated side outputs to address this problem. By observing the relationship between the receptive field sizes of the sequential stages in the network and the skeleton scales they can capture, we introduce a scale-associated side output to each stage. We impose supervision on different stages by guiding the scale-associated side outputs toward ground-truth skeletons of different scales. The responses of the multiple scale-associated side outputs are then fused in a scale-specific way to localize skeleton pixels with multiple scales effectively. Our method achieves promising results on two skeleton extraction datasets, and significantly outperforms other competitors.

preprint, 2016, arXiv

Shape Recognition by Bag of Skeleton-associated Contour Parts

Contour and skeleton are two complementary representations for shape recognition. However, combining them in a principled way is nontrivial, as they are generally abstracted by different structures (closed string vs. graph). This paper aims to address the shape recognition problem by combining contour and skeleton according to the correspondence between them. The correspondence provides a straightforward way to associate skeletal information with a shape contour. More specifically, we propose a new shape descriptor, named Skeleton-associated Shape Context (SSC), which captures the features of a contour fragment associated with skeletal information. Benefiting from the association, the proposed shape descriptor provides complementary geometric information from both contour and skeleton parts, including the spatial distribution and the thickness change along the shape part. To form a meaningful shape feature vector for an overall shape, the Bag of Features framework is applied to the SSC descriptors extracted from it. Finally, the shape feature vector is fed into a linear SVM classifier to recognize the shape. The encouraging experimental results demonstrate that the proposed way of combining contour and skeleton is effective for shape recognition and achieves state-of-the-art performance on several standard shape benchmarks.

preprint, 2015, arXiv

Multi-Label Active Learning from Crowds

Multi-label active learning is a hot topic in reducing labeling cost by optimally choosing the most valuable instance whose label is queried from an oracle. In this paper, we consider pool-based multi-label active learning under the crowdsourcing setting, where, during the active query process, instead of resorting to a high-cost oracle for the ground truth, multiple low-cost imperfect annotators with various expertise are available for labeling. To deal with this problem, we propose the MAC (Multi-label Active learning from Crowds) approach, which incorporates the local influence of label correlations to build a probabilistic model over the multi-label classifier and annotators. Based on this model, we can estimate the labels for instances as well as the expertise of each annotator. Then we propose instance selection and annotator selection criteria that consider the uncertainty/diversity of instances and the reliability of annotators, such that the most reliable annotator will be queried for the most valuable instances. Experimental results demonstrate the effectiveness of the proposed approach.
