Researcher profile

Xiaohui Zhang

Xiaohui Zhang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
15 works
0 followers
19 topics
4 close collaborators

Published work

15 published item(s)

preprint · 2026 · arXiv

Adaptive Robust Control for Uncertain Systems with Ellipsoid-Set Learning

Despite the celebrated success of stochastic control approaches for uncertain systems, such approaches are limited in the ability to handle non-Gaussian uncertainties. This work presents an adaptive robust control for linear uncertain systems, whose process noise, observation noise, and system states are depicted by ellipsoid sets rather than Gaussian distributions. We design an ellipsoid-set learning method to estimate the boundaries of state sets, and incorporate the learned sets into the control law derivation to reduce conservativeness in robust control. Further, we consider the parametric uncertainties in state-space matrices. Particularly, we assign finite candidates for the uncertain parameters, and construct a bank of candidate-conditional robust control problems for each candidate. We derive the final control law by aggregating the candidate-conditional control laws. In this way, we separate the control scheme into parallel robust controls, decoupling the learning and control, which otherwise renders the control unattainable. We demonstrate the effectiveness of the proposed control in numerical simulations in the cases of linear quadratic regulation and tracking control.

preprint · 2026 · arXiv

WavFlow: Audio Generation in Waveform Space

Modern audio generation predominantly relies on latent-space compression, introducing additional complexity and potential information loss. In this work, we challenge this paradigm with WavFlow, a framework that generates high-fidelity audio directly in raw waveform space without intermediate representations. To overcome the inherent difficulties of modeling high-dimensional and low-energy signals, we reshape audio into 2D token grids through waveform patchify and introduce amplitude lifting to align signal scales, enabling stable optimization via direct x-prediction in flow matching. To capture complex semantic alignment and temporal synchronization, we leverage an automated data pipeline to curate 5 million high-quality video-text-audio triplets, allowing the model to learn fine-grained acoustic patterns from scratch. Experimental results show that WavFlow achieves competitive performance on the video-to-audio benchmark VGGSound (FD_PaSST: 59.98, IS_PANNs: 17.40, DeSync: 0.44) and the text-to-audio benchmark AudioCaps (FD_PANNs: 10.63, IS_PANNs: 12.62), matching or exceeding the performance of established latent-based methods. Our work demonstrates that intermediate compression is not a prerequisite for high-quality synthesis, offering a simpler and more scalable alternative for multimodal audio generation.

preprint · 2022 · arXiv

A Consensus Algorithm Based on Risk Assessment Model for Permissioned Blockchain

Blockchain technology enables stakeholders to conduct trusted data sharing and exchange without a trusted centralized institution. These features make blockchain applications attractive for enhancing trustworthiness in very different contexts. Due to unique design concepts and outstanding performance, blockchain has become a popular research topic in industry and academia in recent years. Every participant is anonymous in a permissionless blockchain, as represented by cryptocurrency applications such as Bitcoin. In this situation, special incentive mechanisms, such as mined native cryptocurrency, are applied to solve the trust issues of permissionless blockchains. In many use cases, permissionless blockchains have bottlenecks in transaction throughput, which restricts further application in the real world. A permissioned blockchain can reach a consensus among a group of entities that do not fully trust one another. Unlike permissionless blockchains, the participants must be identified in permissioned blockchains. By relying on traditional crash fault-tolerant consensus protocols, permissioned blockchains can achieve high transaction throughput and low latency without sacrificing security. However, how to balance security and consensus efficiency remains an open issue that urgently needs to be solved in permissioned blockchains. As the core module of blockchain technology, the consensus algorithm plays a vital role in the performance of the blockchain system. Thus, this paper proposes a new consensus algorithm for permissioned blockchains, the Risk Assessment-based Consensus protocol (RAC), which combines the decentralized design concept with a risk-node assessment mechanism to address the imbalance among speed, scalability, and security.

preprint · 2022 · arXiv

A novel adversarial learning strategy for medical image classification

Deep learning (DL) techniques have been extensively utilized for medical image classification. Most DL-based classification networks are generally structured hierarchically and optimized through the minimization of a single loss function measured at the end of the networks. However, such a single loss design could potentially lead to optimization of one specific value of interest but fail to leverage informative features from intermediate layers that might benefit classification performance and reduce the risk of overfitting. Recently, auxiliary convolutional neural networks (AuxCNNs) have been employed on top of traditional classification networks to facilitate the training of intermediate layers to improve classification performance and robustness. In this study, we proposed an adversarial learning-based AuxCNN to support the training of deep neural networks for medical image classification. Two main innovations were adopted in our AuxCNN classification framework. First, the proposed AuxCNN architecture includes an image generator and an image discriminator for extracting more informative image features for medical image classification, motivated by the concept of generative adversarial network (GAN) and its impressive ability in approximating target data distribution. Second, a hybrid loss function is designed to guide the model training by incorporating different objectives of the classification network and AuxCNN to reduce overfitting. Comprehensive experimental studies demonstrated the superior classification performance of the proposed model. The effect of the network-related factors on classification performance was investigated.

preprint · 2022 · arXiv

Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR via Supernet

From wearables to powerful smart devices, modern automatic speech recognition (ASR) models run on a variety of edge devices with different computational budgets. To navigate the Pareto front of model accuracy vs model size, researchers are trapped in a dilemma of optimizing model accuracy by training and fine-tuning models for each individual edge device while keeping the training GPU-hours tractable. In this paper, we propose Omni-sparsity DNN, where a single neural network can be pruned to generate optimized models for a large range of model sizes. We develop training strategies for Omni-sparsity DNN that allow it to find models along the Pareto front of word-error-rate (WER) vs model size while keeping the training GPU-hours to no more than that of training a single model. We demonstrate the Omni-sparsity DNN with streaming E2E ASR models. Our results show large savings in training time and resources with similar or better accuracy on LibriSpeech compared to individually pruned sparse models: 2%-6.6% better WER on Test-other.

preprint · 2022 · arXiv

Parameterized Colorings And Labellings Of Graphs In Topological Coding

The advent of quantum computation is forcing us to reexamine the cryptosystems people use. We apply graph colorings from topological coding to modern information security and to future cryptography that must resist supercomputer and quantum computer attacks. Many of the techniques introduced here are associated with mathematical conjectures and NP-problems. We introduce a group of W-constraint (k,d)-total colorings, together with algorithms for realizing these colorings in some kinds of graphs, which are used to quickly generate public keys and private keys that resist quantum computing. These (k,d)-total colorings are: graceful (k,d)-total colorings, harmonious (k,d)-total colorings, (k,d)-edge-magic total colorings, (k,d)-graceful-difference total colorings and (k,d)-felicitous-difference total colorings. One useful tool is the Topcode-matrix, whose elements can be of many kinds, for example sets, graphs, or number-based strings. Most of the parameterized graphic colorings/labelings are defined here by Topcode-matrix algebra. From the application point of view, many of our coloring techniques are given by algorithms and are easily converted into programs.

preprint · 2020 · arXiv

Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks

Deep acoustic models typically receive features in the first layer of the network, and process increasingly abstract representations in the subsequent layers. Here, we propose to feed the input features at multiple depths in the acoustic model. As our motivation is to allow acoustic models to re-examine their input features in light of partial hypotheses, we introduce intermediate model heads and loss functions. We study this architecture in the context of deep Transformer networks, and we use an attention mechanism over both the previous layer activations and the input features. To train this model's intermediate output hypothesis, we apply the objective function at each layer right before feature re-use. We find that the use of such iterated loss significantly improves performance by itself, as well as enabling input feature re-use. We present results on both Librispeech and a large-scale video dataset, with relative improvements of 10-20% for Librispeech and 3.2-13% for videos.

preprint · 2020 · arXiv

Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces

In this work, we first show that on the widely used LibriSpeech benchmark, our transformer-based context-dependent connectionist temporal classification (CTC) system produces state-of-the-art results. We then show that using wordpieces as modeling units combined with CTC training, we can greatly simplify the engineering pipeline compared to conventional frame-based cross-entropy training by excluding all the GMM bootstrapping, decision tree building and force alignment steps, while still achieving very competitive word-error-rate. Additionally, using wordpieces as modeling units can significantly improve runtime efficiency since we can use larger stride without losing accuracy. We further confirm these findings on two internal VideoASR datasets: German, which is similar to English as a fusional language, and Turkish, which is an agglutinative language.

preprint · 2020 · arXiv

Multilingual Graphemic Hybrid ASR with Massive Data Augmentation

To develop high-performing ASR for low-resource languages, two approaches to address the lack of resources are to use data from multiple languages and to augment the training data by creating acoustic variations. In this work we present a single grapheme-based ASR model learned on 7 geographically proximal languages, using standard hybrid BLSTM-HMM acoustic models with a lattice-free MMI objective. We build the single ASR grapheme set by taking the union of the language-specific grapheme sets, and we find that such a multilingual graphemic hybrid ASR model can perform language-independent recognition on all 7 languages and substantially outperforms each monolingual ASR model. Secondly, we evaluate the efficacy of multiple data augmentation alternatives within each language, as well as their complementarity with multilingual modeling. Overall, we show that the proposed multilingual graphemic hybrid ASR with various data augmentation can not only recognize any language in the training set, but also provide large ASR performance improvements.
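
As a rough illustration of the grapheme-set union mentioned in this abstract, the following Python sketch merges per-language grapheme inventories into one shared set. The function name, language codes, and toy inventories are hypothetical placeholders and are not taken from the paper.

```python
from typing import Dict, Iterable, List


def build_shared_grapheme_set(per_language: Dict[str, Iterable[str]]) -> List[str]:
    """Return the sorted union of all language-specific grapheme sets."""
    shared = set()
    for graphemes in per_language.values():
        shared.update(graphemes)
    return sorted(shared)


# Hypothetical toy inventories (assumption: not the paper's languages or data).
inventories = {
    "lang_a": ["a", "b", "c"],
    "lang_b": ["b", "c", "d"],
    "lang_c": ["a", "e"],
}

shared_graphemes = build_shared_grapheme_set(inventories)
print(shared_graphemes)
```

A single output layer over this shared set lets one model emit graphemes for any of the training languages, which is the property the abstract calls language-independent recognition.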

preprint · 2020 · arXiv

On cyclic quadrilaterals in euclidean and hyperbolic geometries

Four points in positive order on the unit circle determine the vertices of a quadrilateral, which is considered either as a euclidean or as a hyperbolic quadrilateral depending on whether the lines connecting the vertices are euclidean or hyperbolic lines. In the case of hyperbolic lines, quadrilaterals of this type are called ideal quadrilaterals. Our main result gives a euclidean counterpart of an earlier result on the hyperbolic distances between the opposite sides of ideal quadrilaterals. The proof is based on computations involving hyperbolic geometry. We also found a new formula for the hyperbolic midpoint of a hyperbolic geodesic segment in the unit disk. As an application of some geometric properties, we provided a euclidean construction of the symmetrization of random four points on the unit circle with respect to a diameter which preserves the absolute cross ratio of quadruples.

preprint · 2020 · arXiv

On split regular Hom-Leibniz-Rinehart algebras

In this paper, we introduce the notion of the Hom-Leibniz-Rinehart algebra as an algebraic analogue of Hom-Leibniz algebroid, and prove that such an arbitrary split regular Hom-Leibniz-Rinehart algebra $L$ is of the form $L=U+\sum_{\gamma}I_{\gamma}$ with $U$ a subspace of a maximal abelian subalgebra $H$ and any $I_{\gamma}$, a well described ideal of $L$, satisfying $[I_{\gamma}, I_{\delta}]= 0$ if $[\gamma]\neq [\delta]$. In the sequel, we develop techniques of connections of roots and weights for split Hom-Leibniz-Rinehart algebras respectively. Finally, we study the structures of tight split regular Hom-Leibniz-Rinehart algebras.

preprint · 2020 · arXiv

The Hom-Long dimodule category and nonlinear equations

In this paper, we construct a new kind of braided monoidal category over two Hom-Hopf algebras $(H,\alpha)$ and $(B,\beta)$ and associate it with two nonlinear equations. We first introduce the notion of an $(H,B)$-Hom-Long dimodule and show that the Hom-Long dimodule category $^{B}_{H} \Bbb L$ is an autonomous category. Second, we prove that the category $^{B}_{H} \Bbb L$ is a braided monoidal category if $(H,\alpha)$ is quasitriangular and $(B,\beta)$ is coquasitriangular, and we obtain a solution of the quantum Yang-Baxter equation. Also, we show that the category $^{B}_{H} \Bbb L$ can be viewed as a subcategory of the Hom-Yetter-Drinfeld category $^{H\otimes B}_{H\otimes B} \Bbb {HYD}$. Finally, we obtain a solution of the Hom-Long equation from the Hom-Long dimodules.

preprint · 2020 · arXiv

Transformer-based Acoustic Modeling for Hybrid Speech Recognition

We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech recognition. Several modeling choices are discussed in this work, including various positional embedding methods and an iterated loss to enable training deep transformers. We also present a preliminary study of using limited right context in transformer models, which makes streaming applications possible. We demonstrate that on the widely used Librispeech benchmark, our transformer-based AM outperforms the best published hybrid result by 19% to 26% relative when the standard n-gram language model (LM) is used. Combined with a neural network LM for rescoring, our proposed approach achieves state-of-the-art results on Librispeech. Our findings are also confirmed on a much larger internal dataset.
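
For readers unfamiliar with "relative" improvements as quoted in these abstracts, the short Python sketch below shows the usual arithmetic for a relative word-error-rate (WER) reduction. The helper name and the WER values are hypothetical and chosen only to make the calculation easy to follow; they are not results from any of the listed papers.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers (assumption): a baseline at 5.0% WER and a new model at 4.0% WER.
baseline = 5.0
improved = 4.0
reduction = relative_wer_reduction(baseline, improved)
print(f"{reduction:.1f}% relative WER reduction")
```

With these toy values the absolute gain is 1.0 WER point, while the relative gain is 20%, which is why relative figures look larger than the raw difference.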