Source author record

Yonghui Xiao

Yonghui Xiao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Databases Cryptography and Security Machine Learning Artificial Intelligence cs.CY

Catalog footprint

What is connected

10works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Federated Pruning: Improving Neural Network Efficiency with Federated Learning

Automatic Speech Recognition models require large amount of speech data for training, and the collection of such data often leads to privacy concerns. Federated learning has been widely used and is considered to be an effective decentralized technique by collaboratively learning a shared prediction model while keeping the data local on different clients devices. However, the limited computation and communication resources on clients devices present practical difficulties for large models. To overcome such challenges, we propose Federated Pruning to train a reduced model under the federated setting, while maintaining similar performance compared to the full model. Moreover, the vast amount of clients data can also be leveraged to improve the pruning results compared to centralized training. We explore different pruning schemes and provide empirical evidence of the effectiveness of our methods.

preprint2022arXiv

Online Model Compression for Federated Learning with Large Models

This paper addresses the challenges of training large neural network models under federated learning settings: high on-device memory usage and communication cost. The proposed Online Model Compression (OMC) provides a framework that stores model parameters in a compressed format and decompresses them only when needed. We use quantization as the compression method in this paper and propose three methods, (1) using per-variable transformation, (2) weight matrices only quantization, and (3) partial parameter quantization, to minimize the impact on model accuracy. According to our experiments on two recent neural networks for speech recognition and two different datasets, OMC can reduce memory usage and communication cost of model parameters by up to 59% while attaining comparable accuracy and training speed when compared with full-precision training.

preprint2020arXiv

PANDA: Policy-aware Location Privacy for Epidemic Surveillance

In this demonstration, we present a privacy-preserving epidemic surveillance system. Recently, many countries that suffer from coronavirus crises attempt to access citizen's location data to eliminate the outbreak. However, it raises privacy concerns and may open the doors to more invasive forms of surveillance in the name of public health. It also brings a challenge for privacy protection techniques: how can we leverage people's mobile data to help combat the pandemic without scarifying our location privacy. We demonstrate that we can have the best of the two worlds by implementing policy-based location privacy for epidemic surveillance. Specifically, we formalize the privacy policy using graphs in light of differential privacy, called policy graph. Our system has three primary functions for epidemic surveillance: location monitoring, epidemic analysis, and contact tracing. We provide an interactive tool allowing the attendees to explore and examine the usability of our system: (1) the utility of location monitor and disease transmission model estimation, (2) the procedure of contact tracing in our systems, and (3) the privacy-utility trade-offs w.r.t. different policy graphs. The attendees can find that it is possible to have the full functionality of epidemic surveillance while preserving location privacy.

preprint2020arXiv

PGLP: Customizable and Rigorous Location Privacy through Policy Graph

Location privacy has been extensively studied in the literature. However, existing location privacy models are either not rigorous or not customizable, which limits the trade-off between privacy and utility in many real-world applications. To address this issue, we propose a new location privacy notion called PGLP, i.e., \textit{Policy Graph based Location Privacy}, providing a rich interface to release private locations with customizable and rigorous privacy guarantee. First, we design the privacy metrics of PGLP by extending differential privacy. Specifically, we formalize a user's location privacy requirements using a \textit{location policy graph}, which is expressive and customizable. Second, we investigate how to satisfy an arbitrarily given location policy graph under adversarial knowledge. We find that a location policy graph may not always be viable and may suffer \textit{location exposure} when the attacker knows the user's mobility pattern. We propose efficient methods to detect location exposure and repair the policy graph with optimal utility. Third, we design a private location trace release framework that pipelines the detection of location exposure, policy graph repair, and private trajectory release with customizable and rigorous location privacy. Finally, we conduct experiments on real-world datasets to verify the effectiveness of the privacy-utility trade-off and the efficiency of the proposed algorithms.

preprint2020arXiv

Protecting Spatiotemporal Event Privacy in Continuous Location-Based Services

Location privacy-preserving mechanisms (LPPMs) have been extensively studied for protecting users' location privacy by releasing a perturbed location to third parties such as location-based service providers. However, when a user's perturbed locations are released continuously, existing LPPMs may not protect the sensitive information about the user's spatiotemporal activities, such as "visited hospital in the last week" or "regularly commuting between Address 1 and Address 2" (it is easy to infer that Addresses 1 and 2 may be home and office), which we call it \textit{spatiotemporal event}. In this paper, we first formally define {spatiotemporal event} as Boolean expressions between location and time predicates, and then we define $ ε$-\textit{spatiotemporal event privacy} by extending the notion of differential privacy. Second, to understand how much spatiotemporal event privacy that existing LPPMs can provide, we design computationally efficient algorithms to quantify the privacy leakage of state-of-the-art LPPMs when an adversary has prior knowledge of the user's initial probability over possible locations. It turns out that the existing LPPMs cannot adequately protect spatiotemporal event privacy. Third, we propose a framework, PriSTE, to transform an existing LPPM into one protecting spatiotemporal event privacy against adversaries with \textit{any} prior knowledge. Our experiments on real-life and synthetic data verified that the proposed method is effective and efficient.

preprint2017arXiv

Quantifying Differential Privacy under Temporal Correlations

Differential Privacy (DP) has received increased attention as a rigorous privacy framework. Existing studies employ traditional DP mechanisms (e.g., the Laplace mechanism) as primitives, which assume that the data are independent, or that adversaries do not have knowledge of the data correlations. However, continuously generated data in the real world tend to be temporally correlated, and such correlations can be acquired by adversaries. In this paper, we investigate the potential privacy loss of a traditional DP mechanism under temporal correlations in the context of continuous data release. First, we model the temporal correlations using Markov model and analyze the privacy leakage of a DP mechanism when adversaries have knowledge of such temporal correlations. Our analysis reveals that the privacy leakage of a DP mechanism may accumulate and increase over time. We call it temporal privacy leakage. Second, to measure such privacy leakage, we design an efficient algorithm for calculating it in polynomial time. Although the temporal privacy leakage may increase over time, we also show that its supremum may exist in some cases. Third, to bound the privacy loss, we propose mechanisms that convert any existing DP mechanism into one against temporal privacy leakage. Experiments with synthetic data confirm that our approach is efficient and effective.

preprint2016arXiv

DPHMM: Customizable Data Release with Differential Privacy via Hidden Markov Model

Hidden Markov model (HMM) has been well studied and extensively used. In this paper, we present DPHMM ({Differentially Private Hidden Markov Model}), an HMM embedded with a private data release mechanism, in which the privacy of the data is protected through a graph. Specifically, we treat every state in Markov model as a node, and use a graph to represent the privacy policy, in which "indistinguishability" between states is denoted by edges between nodes. Due to the temporal correlations in Markov model, we show that the graph may be reduced to a subgraph with disconnected nodes, which become unprotected and might be exposed. To detect such privacy risk, we define sensitivity hull and degree of protection based on the graph to capture the condition of information exposure. Then to tackle the detected exposure, we study how to build an optimal graph based on the existing graph. We also implement and evaluate the DPHMM on real-world datasets, showing that privacy and utility can be better tuned with customized policy graph.

preprint2015arXiv

Protecting Locations with Differential Privacy under Temporal Correlations

Concerns on location privacy frequently arise with the rapid development of GPS enabled devices and location-based applications. While spatial transformation techniques such as location perturbation or generalization have been studied extensively, most techniques rely on syntactic privacy models without rigorous privacy guarantee. Many of them only consider static scenarios or perturb the location at single timestamps without considering temporal correlations of a moving user's locations, and hence are vulnerable to various inference attacks. While differential privacy has been accepted as a standard for privacy protection, applying differential privacy in location based applications presents new challenges, as the protection needs to be enforced on the fly for a single user and needs to incorporate temporal correlations between a user's locations. In this paper, we propose a systematic solution to preserve location privacy with rigorous privacy guarantee. First, we propose a new definition, "$δ$-location set" based differential privacy, to account for the temporal correlations in location data. Second, we show that the well known $\ell_1$-norm sensitivity fails to capture the geometric sensitivity in multidimensional space and propose a new notion, sensitivity hull, based on which the error of differential privacy is bounded. Third, to obtain the optimal utility we present a planar isotropic mechanism (PIM) for location perturbation, which is the first mechanism achieving the lower bound of differential privacy. Experiments on real-world datasets also demonstrate that PIM significantly outperforms baseline approaches in data utility.

preprint2012arXiv

Bayesian inference under differential privacy

Bayesian inference is an important technique throughout statistics. The essence of Beyesian inference is to derive the posterior belief updated from prior belief by the learned information, which is a set of differentially private answers under differential privacy. Although Bayesian inference can be used in a variety of applications, it becomes theoretically hard to solve when the number of differentially private answers is large. To facilitate Bayesian inference under differential privacy, this paper proposes a systematic mechanism. The key step of the mechanism is the implementation of Bayesian updating with the best linear unbiased estimator derived by Gauss-Markov theorem. In addition, we also apply the proposed inference mechanism into an online queryanswering system, the novelty of which is that the utility for users is guaranteed by Bayesian inference in the form of credible interval and confidence level. Theoretical and experimental analysis are shown to demonstrate the efficiency and effectiveness of both inference mechanism and online query-answering system.

preprint2012arXiv

DPCube: Differentially Private Histogram Release through Multidimensional Partitioning

Differential privacy is a strong notion for protecting individual privacy in privacy preserving data analysis or publishing. In this paper, we study the problem of differentially private histogram release for random workloads. We study two multidimensional partitioning strategies including: 1) a baseline cell-based partitioning strategy for releasing an equi-width cell histogram, and 2) an innovative 2-phase kd-tree based partitioning strategy for releasing a v-optimal histogram. We formally analyze the utility of the released histograms and quantify the errors for answering linear queries such as counting queries. We formally characterize the property of the input data that will guarantee the optimality of the algorithm. Finally, we implement and experimentally evaluate several applications using the released histograms, including counting queries, classification, and blocking for record linkage and show the benefit of our approach.

Yonghui Xiao

What is connected

Connect this record

See the researcher in context

Building this map preview

10 published item(s)

Federated Pruning: Improving Neural Network Efficiency with Federated Learning

Online Model Compression for Federated Learning with Large Models

PANDA: Policy-aware Location Privacy for Epidemic Surveillance

PGLP: Customizable and Rigorous Location Privacy through Policy Graph

Protecting Spatiotemporal Event Privacy in Continuous Location-Based Services

Quantifying Differential Privacy under Temporal Correlations

DPHMM: Customizable Data Release with Differential Privacy via Hidden Markov Model

Protecting Locations with Differential Privacy under Temporal Correlations

Bayesian inference under differential privacy

DPCube: Differentially Private Histogram Release through Multidimensional Partitioning