Source author record

Yuxuan Zhao

Yuxuan Zhao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Methodology Machine Learning Cryptography and Security eess.SY math.NA math.ST Systems and Control

Catalog footprint

What is connected

7works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Posterior Concentration of Bayesian Physics-Informed Neural Networks for Elliptic PDEs

We study the posterior contraction rate of Bayesian Physics-Informed Neural Networks (PINNs) for solving a general class of elliptic partial differential equations (PDEs). We focus on learning of the elliptic equation with a non-homogeneous Dirichlet boundary condition from independent and noisy measurements collected both inside the domain and on the boundary. Assuming that the PDE admits a strong solution in a Hölder space and using with a suitably constructed prior on the neural network weights, we prove that the posterior distribution concentrates around the exact solution at a near-minimax rate. Furthermore, the chosen prior is rate-adaptive: the posterior contracts at an (almost) optimal rate without prior knowledge of the smoothness level of the exact solution. Our results provide statistical guarantees for uncertainty quantification of PDEs via Bayesian PINNs.

preprint2022arXiv

gcimpute: A Package for Missing Data Imputation

This article introduces the Python package gcimpute for missing data imputation. gcimpute can impute missing data with many different variable types, including continuous, binary, ordinal, count, and truncated values, by modeling data as samples from a Gaussian copula model. This semiparametric model learns the marginal distribution of each variable to match the empirical distribution, yet describes the interactions between variables with a joint Gaussian that enables fast inference, imputation with confidence intervals, and multiple imputation. The package also provides specialized extensions to handle large datasets (with complexity linear in the number of observations) and streaming datasets (with online imputation). This article describes the underlying methodology and demonstrates how to use the software package.

preprint2021arXiv

Group Linear non-Gaussian Component Analysis with Applications to Neuroimaging

Independent component analysis (ICA) is an unsupervised learning method popular in functional magnetic resonance imaging (fMRI). Group ICA has been used to search for biomarkers in neurological disorders including autism spectrum disorder and dementia. However, current methods use a principal component analysis (PCA) step that may remove low-variance features. Linear non-Gaussian component analysis (LNGCA) enables simultaneous dimension reduction and feature estimation including low-variance features in single-subject fMRI. We present a group LNGCA model to extract group components shared by more than one subject and subject-specific components. To determine the total number of components in each subject, we propose a parametric resampling test that samples spatially correlated Gaussian noise to match the spatial dependence observed in data. In simulations, our estimated group components achieve higher accuracy compared to group ICA. We apply our method to a resting-state fMRI study on autism spectrum disorder in 342 children (252 typically developing, 90 with autism), where the group signals include resting-state networks. We find examples of group components that appear to exhibit different levels of temporal engagement in autism versus typically developing children, as revealed using group LNGCA. This novel approach to matrix decomposition is a promising direction for feature detection in neuroimaging.

preprint2021arXiv

Matrix Completion with Quantified Uncertainty through Low Rank Gaussian Copula

Modern large scale datasets are often plagued with missing entries. For tabular data with missing values, a flurry of imputation algorithms solve for a complete matrix which minimizes some penalized reconstruction error. However, almost none of them can estimate the uncertainty of its imputations. This paper proposes a probabilistic and scalable framework for missing value imputation with quantified uncertainty. Our model, the Low Rank Gaussian Copula, augments a standard probabilistic model, Probabilistic Principal Component Analysis, with marginal transformations for each column that allow the model to better match the distribution of the data. It naturally handles Boolean, ordinal, and real-valued observations and quantifies the uncertainty in each imputation. The time required to fit the model scales linearly with the number of rows and the number of columns in the dataset. Empirical results show the method yields state-of-the-art imputation accuracy across a wide range of data types, including those with high rank. Our uncertainty measure predicts imputation error well: entries with lower uncertainty do have lower imputation error (on average). Moreover, for real-valued data, the resulting confidence intervals are well-calibrated.

preprint2021arXiv

Technoeconomic Supplement of P2G Clusters with Hydrogen Pipeline for Coordinated Renewable Energy and HVDC Systems

Under the downward tendency of prices of renewable energy generators and upward trend of hydrogen demand, this paper studies the technoeconomic supplement of P2G clusters with hydrogen pipeline for HVDC to jointly consume renewable energy. First, the planning and operation constraints of large-capacity P2G clusters is established. On this basis, the multistage coordinated planning model of renewable energy, HVDCs, P2Gs and hydrogen pipelines is proposed considering both variability and uncertainty, rendering a distributionally robust chance-constrained (DRCC) program. Then this model is applied in the case study based on the real Inner Mongolia-Shandong system. Compared with energy transmission via HVDC only, P2G can provide operation supplement with its operational flexibility and long term economic supplement with increasing demand in high-valued transportation sector, which stimulates an extra 24 GW renewable energy exploration. Sensitivity analysis for both technical and economic factors further verifies the advantages of P2G in the presence of high variability due to renewable energy and downward tendency of prices of renewable energy generators. However, since the additional levelized cost of the P2G (0.04 RMB/kWh) is approximately twice the HVDC (0.02 RMB/kWh), P2G is more sensitive to uncertainty from both renewable energy and hydrogen demand.

preprint2020arXiv

Missing Value Imputation for Mixed Data via Gaussian Copula

Missing data imputation forms the first critical step of many data analysis pipelines. The challenge is greatest for mixed data sets, including real, Boolean, and ordinal data, where standard techniques for imputation fail basic sanity checks: for example, the imputed values may not follow the same distributions as the data. This paper proposes a new semiparametric algorithm to impute missing values, with no tuning parameters. The algorithm models mixed data as a Gaussian copula. This model can fit arbitrary marginals for continuous variables and can handle ordinal variables with many levels, including Boolean variables as a special case. We develop an efficient approximate EM algorithm to estimate copula parameters from incomplete mixed data. The resulting model reveals the statistical associations among variables. Experimental results on several synthetic and real datasets show superiority of our proposed algorithm to state-of-the-art imputation algorithms for mixed data.

preprint2020arXiv

Weak Links in Authentication Chains: A Large-scale Analysis of Email Sender Spoofing Attacks

As a fundamental communicative service, email is playing an important role in both individual and corporate communications, which also makes it one of the most frequently attack vectors. An email's authenticity is based on an authentication chain involving multiple protocols, roles and services, the inconsistency among which creates security threats. Thus, it depends on the weakest link of the chain, as any failed part can break the whole chain-based defense. This paper systematically analyzes the transmission of an email and identifies a series of new attacks capable of bypassing SPF, DKIM, DMARC and user-interface protections. In particular, by conducting a "cocktail" joint attack, more realistic emails can be forged to penetrate the celebrated email services, such as Gmail and Outlook. We conduct a large-scale experiment on 30 popular email services and 23 email clients, and find that all of them are vulnerable to certain types of new attacks. We have duly reported the identified vulnerabilities to the related email service providers, and received positive responses from 11 of them, including Gmail, Yahoo, iCloud and Alibaba. Furthermore, we propose key mitigating measures to defend against the new attacks. Therefore, this work is of great value for identifying email spoofing attacks and improving the email ecosystem's overall security.

Yuxuan Zhao

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

Posterior Concentration of Bayesian Physics-Informed Neural Networks for Elliptic PDEs

gcimpute: A Package for Missing Data Imputation

Group Linear non-Gaussian Component Analysis with Applications to Neuroimaging

Matrix Completion with Quantified Uncertainty through Low Rank Gaussian Copula

Technoeconomic Supplement of P2G Clusters with Hydrogen Pipeline for Coordinated Renewable Energy and HVDC Systems

Missing Value Imputation for Mixed Data via Gaussian Copula

Weak Links in Authentication Chains: A Large-scale Analysis of Email Sender Spoofing Attacks