Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
23 works
0 followers
20 topics
4 close collaborators

Published work

23 published item(s)

preprint, 2022, arXiv

A Framework for Understanding Model Extraction Attack and Defense

The privacy of machine learning models has become a significant concern in many emerging Machine-Learning-as-a-Service applications, where prediction services based on well-trained models are offered to users via pay-per-query. The lack of a defense mechanism can impose a high risk on the privacy of the server's model since an adversary could efficiently steal the model by querying only a few 'good' data points. The interplay between a server's defense and an adversary's attack inevitably leads to an arms race dilemma, as commonly seen in Adversarial Machine Learning. To study the fundamental tradeoffs between model utility from a benign user's view and privacy from an adversary's view, we develop new metrics to quantify such tradeoffs, analyze their theoretical properties, and develop an optimization problem to understand the optimal adversarial attack and defense strategies. The developed concepts and theory match the empirical findings on the 'equilibrium' between privacy and utility. In terms of optimization, the key ingredient that enables our results is a unified representation of the attack-defense problem as a min-max bi-level problem. The developed results will be demonstrated by examples and experiments.

preprint, 2022, arXiv

Asymptotic Critical Transmission Radii in Wireless Networks over a Convex Region

Critical transmission ranges (or radii) in wireless ad-hoc and sensor networks have been extensively investigated for various performance metrics such as connectivity, coverage, power assignment and energy consumption. However, the regions on which the networks are distributed are typically either squares or disks in existing works, which seriously limits the usage in real-life applications. In this article, we consider a convex region (i.e., a generalisation of squares and disks) on which wireless nodes are uniformly distributed. We investigate two types of critical transmission radii, defined in terms of k-connectivity and the minimum vertex degree, respectively, and establish their precise asymptotic distributions. The previous results obtained for squares or disks thus become special cases of this work. More importantly, our results reveal how the region shape affects the critical transmission ranges: it is the length of the boundary of the (fixed-area) region that completely determines the transmission ranges. Furthermore, by the isodiametric inequality, the smallest critical transmission ranges are achieved only when the regions are disks.

preprint, 2022, arXiv

Convergence Analysis of Structure-Preserving Numerical Methods Based on Slotboom Transformation for the Poisson–Nernst–Planck Equations

The analysis of structure-preserving numerical methods for the Poisson–Nernst–Planck (PNP) system has attracted growing interest in recent years. In this work, we provide an optimal rate convergence analysis and error estimate for finite difference schemes based on the Slotboom reformulation. Different options of mobility average at the staggered mesh points are considered in the finite-difference spatial discretization, such as the harmonic mean, geometric mean, arithmetic mean, and entropic mean. A semi-implicit temporal discretization is applied, which in turn results in a non-constant coefficient, positive-definite linear system at each time step. A higher order asymptotic expansion is applied in the consistency analysis, and such a higher order consistency estimate is necessary to control the discrete maximum norm of the concentration variables. In the convergence estimate, the harmonic mean is taken for the mobility average for simplicity, since it turns out to be convenient in the theoretical analysis; other options of mobility average would also lead to the desired error estimate, with more technical details involved. As a result, an optimal rate convergence analysis on concentrations, electric potential, and ionic fluxes is derived, which is the first such result for the structure-preserving numerical schemes based on the Slotboom reformulation. It is remarked that the convergence analysis leads to a theoretical justification of the conditional energy dissipation analysis, which relies on the maximum norm bounds of the concentration and the gradient of the electric potential. Some numerical results are also presented to demonstrate the accuracy and structure-preserving performance of the associated schemes.

preprint, 2022, arXiv

Federated Learning Challenges and Opportunities: An Outlook

Federated learning (FL) has been developed as a promising framework to leverage the resources of edge devices, enhance customers' privacy, comply with regulations, and reduce development costs. Although many methods and applications have been developed for FL, several critical challenges for practical FL systems remain unaddressed. This paper provides an outlook on FL development, categorized into five emerging directions of FL, namely algorithm foundation, personalization, hardware and security constraints, lifelong learning, and nonstandard data. Our unique perspectives are backed by practical observations from large-scale federated systems for edge devices.

preprint, 2022, arXiv

Interval Privacy: A Framework for Privacy-Preserving Data Collection

The emerging public awareness and government regulations of data privacy motivate new paradigms of collecting and analyzing data that are transparent and acceptable to data owners. We present a new concept of privacy and corresponding data formats, mechanisms, and theories for privatizing data during data collection. The privacy, named Interval Privacy, enforces the raw data conditional distribution on the privatized data to be the same as its unconditional distribution over a nontrivial support set. Correspondingly, the proposed privacy mechanism will record each data value as a random interval (or, more generally, a range) containing it. The proposed interval privacy mechanisms can be easily deployed through survey-based data collection interfaces, e.g., by asking a respondent whether their data value is within a randomly generated range. Another unique feature of interval mechanisms is that they obfuscate the truth but do not perturb it. Using a narrowed range to convey information is complementary to the popular paradigm of perturbing data. Also, the interval mechanisms can generate progressively refined information at the discretion of individuals, naturally leading to privacy-adaptive data collection. We develop different aspects of theory such as composition, robustness, distribution estimation, and regression learning from interval-valued data. Interval privacy provides a new perspective of human-centric data privacy where individuals have a perceptible, transparent, and simple way of sharing sensitive data.
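The survey-style mechanism described in this abstract can be illustrated with a minimal Python sketch. It assumes a bounded support [lo, hi] and a single cut point drawn uniformly and independently of the value; the function name `interval_response` and these choices are hypothetical and are not taken from the paper, and the sketch makes no claim to satisfy the paper's formal definition.

```python
import random


def interval_response(value, lo, hi, rng=random):
    """Report only which side of a random cut point the value falls on.

    Assumption (not from the paper): the cut point is uniform on [lo, hi]
    and drawn independently of the respondent's value.
    """
    if not lo <= value <= hi:
        raise ValueError("value must lie in [lo, hi]")
    cut = rng.uniform(lo, hi)
    # The respondent answers one yes/no question: "Is your value at most cut?"
    # The recorded interval contains the true value but does not perturb it.
    return (lo, cut) if value <= cut else (cut, hi)
```

Calling `interval_response(37.0, 0.0, 100.0)` returns an interval that contains 37.0; applying the same idea again inside the returned interval would give a progressively narrower range.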

preprint, 2022, arXiv

Is a Classification Procedure Good Enough? A Goodness-of-Fit Assessment Tool for Classification Learning

In recent years, many non-traditional classification methods, such as Random Forest, Boosting, and neural networks, have been widely used in applications. Their performance is typically measured in terms of classification accuracy. While the classification error rate and the like are important, they do not address a fundamental question: Is the classification method underfitted? To the best of our knowledge, there is no existing method that can assess the goodness-of-fit of a general classification procedure. Indeed, the lack of a parametric assumption makes it challenging to construct proper tests. To overcome this difficulty, we propose a methodology called BAGofT that splits the data into a training set and a validation set. First, the classification procedure to assess is applied to the training set, which is also used to adaptively find a data grouping that reveals the most severe regions of underfitting. Then, based on this grouping, we calculate a test statistic by comparing the estimated success probabilities and the actual observed responses from the validation set. The data splitting guarantees that the size of the test is controlled under the null hypothesis, and the power of the test goes to one as the sample size increases under the alternative hypothesis. For testing parametric classification models, the BAGofT has a broader scope than the existing methods since it is not restricted to specific parametric models (e.g., logistic regression). Extensive simulation studies show the utility of the BAGofT when assessing general classification procedures and its strengths over some existing methods when testing parametric classification models.

preprint, 2022, arXiv

On The Energy Statistics of Feature Maps in Pruning of Neural Networks with Skip-Connections

We propose a new structured pruning framework for compressing Deep Neural Networks (DNNs) with skip connections, based on measuring the statistical dependency of hidden layers and predicted outputs. The dependence measure defined by the energy statistics of hidden layers serves as a model-free measure of information between the feature maps and the output of the network. The estimated dependence measure is subsequently used to prune a collection of redundant and uninformative layers. Model-freeness of our measure guarantees that no parametric assumptions on the feature map distribution are required, making it computationally appealing for very high dimensional feature space in DNNs. Extensive numerical experiments on various architectures show the efficacy of the proposed pruning approach with competitive performance to state-of-the-art methods.

preprint, 2022, arXiv

Self-Aware Personalized Federated Learning

In the context of personalized federated learning (FL), the critical challenge is to balance local model improvement and global model tuning when the personal and global objectives may not be exactly aligned. Inspired by Bayesian hierarchical models, we develop a self-aware personalized FL method where each client can automatically balance the training of its local personal model and the global model that implicitly contributes to other clients' training. Such a balance is derived from the inter-client and intra-client uncertainty quantification. A larger inter-client variation implies more personalization is needed. Correspondingly, our method uses uncertainty-driven local training steps and an uncertainty-driven aggregation rule instead of conventional local fine-tuning and sample size-based aggregation. With experimental studies on synthetic data, Amazon Alexa audio data, and public datasets such as MNIST, FEMNIST, CIFAR10, and Sent140, we show that our proposed method can achieve significantly improved personalization performance compared with the existing counterparts.

preprint, 2022, arXiv

Targeted Cross-Validation

In many applications, we have access to the complete dataset but are only interested in the prediction of a particular region of predictor variables. A standard approach is to find the globally best modeling method from a set of candidate methods. However, it is perhaps rare in reality that one candidate method is uniformly better than the others. A natural approach for this scenario is to apply a weighted $L_2$ loss in performance assessment to reflect the region-specific interest. We propose a targeted cross-validation (TCV) to select models or procedures based on a general weighted $L_2$ loss. We show that the TCV is consistent in selecting the best performing candidate under the weighted $L_2$ loss. Experimental studies are used to demonstrate the use of TCV and its potential advantage over the global CV or the approach of using only local data for modeling a local region. Previous investigations on CV have relied on the condition that when the sample size is large enough, the ranking of two candidates stays the same. However, in many applications with the setup of changing data-generating processes or highly adaptive modeling methods, the relative performance of the methods is not static as the sample size varies. Even with a fixed data-generating process, it is possible that the ranking of two methods switches infinitely many times. In this work, we broaden the concept of the selection consistency by allowing the best candidate to switch as the sample size varies, and then establish the consistency of the TCV. This flexible framework can be applied to high-dimensional and complex machine learning scenarios where the relative performances of modeling procedures are dynamic.
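The role of the weighted $L_2$ loss in this abstract can be sketched in a few lines of Python. This is only an illustration of region-weighted scoring on held-out points: the names `weighted_l2_loss` and `select_candidate` are hypothetical, the candidates are assumed to be already fitted, and the data splitting and repetition used by the actual TCV procedure are not reproduced.

```python
def weighted_l2_loss(predict, xs, ys, weight):
    """Weighted mean squared error of `predict` on held-out points (xs, ys)."""
    weights = [weight(x) for x in xs]
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    squared = (w * (y - predict(x)) ** 2 for w, x, y in zip(weights, xs, ys))
    return sum(squared) / total


def select_candidate(candidates, xs, ys, weight):
    """Return the name of the candidate with the smallest weighted loss.

    `candidates` maps a name to an already-fitted prediction function
    (an assumption made for this sketch).
    """
    return min(
        candidates,
        key=lambda name: weighted_l2_loss(candidates[name], xs, ys, weight),
    )
```

With a weight function that is 1 inside the region of interest and 0 elsewhere, the selected candidate can differ from the one chosen under a constant weight, which is the situation the abstract describes.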

preprint, 2022, arXiv

The Rate of Convergence of Variation-Constrained Deep Neural Networks

Multi-layer feedforward networks have been used to approximate a wide range of nonlinear functions. An important and fundamental problem is to understand the learnability of a network model through its statistical risk, or the expected prediction error on future data. To the best of our knowledge, the rate of convergence of neural networks shown by existing works is bounded by at most the order of $n^{-1/4}$ for a sample size of $n$. In this paper, we show that a class of variation-constrained neural networks, with arbitrary width, can achieve near-parametric rate $n^{-1/2+\delta}$ for an arbitrarily small positive constant $\delta$. It is equivalent to $n^{-1+2\delta}$ under the mean squared error. This rate is also observed by numerical experiments. The result indicates that the neural function space needed for approximating smooth functions may not be as large as what is often perceived. Our result also provides insight into the phenomenon that deep neural networks do not easily suffer from overfitting when the number of neurons and learning parameters rapidly grow with $n$ or even surpass $n$. We also discuss the rate of convergence regarding other network parameters, including the input dimension, network layer, and coefficient norm.
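The equivalence between the two rates quoted in this abstract follows from squaring, assuming the first rate is stated on the root (unsquared) error scale; that reading is an assumption made here for illustration.

$$
\left(n^{-1/2+\delta}\right)^{2} = n^{2\left(-1/2+\delta\right)} = n^{-1+2\delta}
$$

For example, with $\delta = 0.05$ the two rates are $n^{-0.45}$ and $n^{-0.9}$.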

preprint, 2020, arXiv

Forecasting with Multiple Seasonality

A growing number of modern applications involve forecasting time series data that exhibit both short-time dynamics and long-time seasonality. Specifically, forecasting time series with multiple seasonality is a difficult task with comparatively fewer discussions. In this paper, we propose a two-stage method for time series with multiple seasonality, which does not require pre-determined seasonality periods. In the first stage, we generalize the classical seasonal autoregressive moving average (ARMA) model in multiple seasonality regime. In the second stage, we utilize an appropriate criterion for lag order selection. Simulation and empirical studies show the excellent predictive performance of our method, especially compared to a recently popular 'Facebook Prophet' model for time series.

preprint, 2020, arXiv

Imitation Privacy

In recent years, there have been many cloud-based machine learning services, where well-trained models are provided to users on a pay-per-query scheme through a prediction API. The emergence of these services motivates this work, where we will develop a general notion of model privacy named imitation privacy. We show the broad applicability of imitation privacy in classical query-response MLaaS scenarios and new multi-organizational learning scenarios. We also exemplify the fundamental difference between imitation privacy and the usual data-level privacy.

preprint, 2020, arXiv

Information Laundering for Model Privacy

In this work, we propose information laundering, a novel framework for enhancing model privacy. Unlike data privacy that concerns the protection of raw data information, model privacy aims to protect an already-learned model that is to be deployed for public use. The private model can be obtained from general learning methods, and its deployment means that it will return a deterministic or random response for a given input query. An information-laundered model consists of probabilistic components that deliberately maneuver the intended input and output for queries to the model, so the model's adversarial acquisition is less likely. Under the proposed framework, we develop an information-theoretic principle to quantify the fundamental tradeoffs between model utility and privacy leakage and derive the optimal design.

preprint, 2020, arXiv

IoT Connectivity Technologies and Applications: A Survey

The Internet of Things (IoT) is rapidly becoming an integral part of our lives and of multiple industries. The number of IoT-connected devices is expected to grow explosively and reach hundreds of billions during the next few years. To support such massive connectivity, various wireless technologies are investigated. In this survey, we provide a broad view of the existing wireless IoT connectivity technologies and discuss several new emerging technologies and solutions that can be effectively used to enable massive connectivity for IoT. In particular, we categorize the existing wireless IoT connectivity technologies based on coverage range and review diverse types of connectivity technologies with different specifications. We also point out key technical challenges of the existing connectivity technologies for enabling massive IoT connectivity. To address the challenges, we further review and discuss some examples of promising technologies such as compressive sensing (CS) random access, non-orthogonal multiple access (NOMA), and massive multiple input multiple output (mMIMO) based random access that could be employed in future standards for supporting IoT connectivity. Finally, a classification of IoT applications is considered in terms of various service requirements. For each group of classified applications, we outline its suitable IoT connectivity options.

preprint, 2020, arXiv

Speech Emotion Recognition with Dual-Sequence LSTM Architecture

Speech Emotion Recognition (SER) has emerged as a critical component of the next generation human-machine interfacing technologies. In this work, we propose a new dual-level model that predicts emotions based on both MFCC features and mel-spectrograms produced from raw audio signals. Each utterance is preprocessed into MFCC features and two mel-spectrograms at different time-frequency resolutions. A standard LSTM processes the MFCC features, while a novel LSTM architecture, denoted as Dual-Sequence LSTM (DS-LSTM), processes the two mel-spectrograms simultaneously. The outputs are later averaged to produce a final classification of the utterance. Our proposed model achieves, on average, a weighted accuracy of 72.7% and an unweighted accuracy of 73.3% (a 6% improvement over current state-of-the-art unimodal models) and is comparable with multimodal models that leverage textual information as well as audio signals.

preprint, 2020, arXiv

Structure-Preserving and Efficient Numerical Methods for Ion Transport

Ion transport, often described by the Poisson–Nernst–Planck (PNP) equations, is ubiquitous in electrochemical devices and many biological processes of significance. In this work, we develop conservative, positivity-preserving, energy dissipating, and implicit finite difference schemes for solving the multi-dimensional PNP equations with multiple ionic species. A central-differencing discretization based on harmonic-mean approximations is employed for the Nernst–Planck (NP) equations. The backward Euler discretization in time is employed to derive a fully implicit nonlinear system, which is efficiently solved by a newly proposed Newton's method. The improved computational efficiency of this Newton's method originates from the usage of the electrostatic potential as the iteration variable, rather than the unknowns of the nonlinear system that involves both the potential and concentration of multiple ionic species. Numerical analysis proves that the numerical schemes respect three desired analytical properties (conservation, positivity preserving, and energy dissipation) fully discretely. Based on advantages brought by the harmonic-mean approximations, we are able to establish an estimate of the upper bound of the condition numbers of coefficient matrices in linear systems that are solved iteratively. The solvability and stability of the linearized problem in Newton's method are rigorously established as well. Numerical tests are performed to confirm the anticipated numerical accuracy, computational efficiency, and structure-preserving properties of the developed schemes. Adaptive time stepping is implemented for further efficiency improvement. Finally, the proposed numerical approaches are applied to characterize ion transport subject to a sinusoidal applied potential.

preprint, 2020, arXiv

Towards Enabling Critical mMTC: A Review of URLLC within mMTC

Massive machine-type communication (mMTC) and ultra-reliable and low-latency communication (URLLC) are two key service types in the fifth-generation (5G) communication systems, pursuing scalability and reliability with low latency, respectively. These two extreme services are envisaged to agglomerate together into critical mMTC shortly with emerging use cases (e.g., wide-area disaster monitoring, wireless factory automation), creating new challenges to designing wireless systems beyond 5G. While conventional network slicing is effective in supporting a simple mixture of mMTC and URLLC, it is difficult to simultaneously guarantee the reliability, latency, and scalability requirements of critical mMTC (e.g., < 4 ms latency, $10^6$ devices/km$^2$ for factory automation) with limited radio resources. Furthermore, recently proposed solutions to scalable URLLC (e.g., machine learning aided URLLC for driverless vehicles) are ill-suited to critical mMTC whose machine type users have minimal energy budget and computing capability that should be (tightly) optimized for given tasks. To this end, our paper aims to characterize promising use cases of critical mMTC and search for their possible solutions. We first review the state-of-the-art (SOTA) technologies for separate mMTC and URLLC services and then identify key challenges from conflicting SOTA requirements, followed by potential approaches to prospective critical mMTC solutions at different layers.

preprint, 2019, arXiv

Deep Clustering of Compressed Variational Embeddings

Motivated by the ever-increasing demands for limited communication bandwidth and low-power consumption, we propose a new methodology, named joint Variational Autoencoders with Bernoulli mixture models (VAB), for performing clustering in the compressed data domain. The idea is to reduce the data dimension by Variational Autoencoders (VAEs) and group data representations by Bernoulli mixture models (BMMs). Once jointly trained for compression and clustering, the model can be decomposed into two parts: a data vendor that encodes the raw data into compressed data, and a data consumer that classifies the received (compressed) data. In this way, the data vendor benefits from data security and communication bandwidth, while the data consumer benefits from low computational complexity. To enable training using the gradient descent algorithm, we propose to use the Gumbel-Softmax distribution to resolve the infeasibility of the back-propagation algorithm when assessing categorical samples.
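The Gumbel-Softmax relaxation mentioned at the end of this abstract can be sketched in plain Python as follows. This is a generic illustration of the sampling step only, not the VAB model; the function name `gumbel_softmax_sample` and its parameters are hypothetical, and a real training setup would use an automatic-differentiation library so that gradients flow through the sample.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng=random):
    """Draw one relaxed (continuous) one-hot sample from categorical logits."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    noisy = []
    for logit in logits:
        u = rng.random()
        while u == 0.0:  # avoid log(0); rng.random() never returns 1.0
            u = rng.random()
        # Gumbel(0, 1) noise by inverse transform: g = -log(-log(u)).
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]
```

Lower temperatures push the output towards a one-hot vector, while higher temperatures give smoother vectors; the output always sums to one, so it can stand in for a categorical sample inside a differentiable computation.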

preprint, 2019, arXiv

DRASIC: Distributed Recurrent Autoencoder for Scalable Image Compression

We propose a new architecture for distributed image compression from a group of distributed data sources. The work is motivated by practical needs of data-driven codec design, low power consumption, robustness, and data privacy. The proposed architecture, which we refer to as Distributed Recurrent Autoencoder for Scalable Image Compression (DRASIC), is able to train distributed encoders and one joint decoder on correlated data sources. Its compression capability is much better than the method of training codecs separately. Meanwhile, the performance of our distributed system with 10 distributed sources is within 2 dB peak signal-to-noise ratio (PSNR) of the performance of a single codec trained with all data sources. We experiment with distributed sources with different correlations and show how well our data-driven methodology matches the Slepian-Wolf Theorem in Distributed Source Coding (DSC). To the best of our knowledge, this is the first data-driven DSC framework for general distributed code design with deep learning.

preprint, 2019, arXiv

Restricted Recurrent Neural Networks

Recurrent Neural Networks (RNNs) and their variations, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), have become standard building blocks for learning online data of sequential nature in many research areas, including natural language processing and speech data analysis. In this paper, we present a new methodology to significantly reduce the number of parameters in RNNs while maintaining performance that is comparable to or even better than classical RNNs. The new proposal, referred to as Restricted Recurrent Neural Network (RRNN), restricts the weight matrices corresponding to the input data and hidden states at each time step to share a large proportion of parameters. The new architecture can be regarded as a compression of its classical counterpart, but it does not require pre-training or sophisticated parameter fine-tuning, both of which are major issues in most existing compression techniques. Experiments on natural language modeling show that compared with its classical counterpart, the restricted recurrent architecture generally produces comparable results at about 50% compression rate. In particular, the Restricted LSTM can outperform classical RNN with even fewer parameters.

preprint, 2019, arXiv

Supervised Encoding for Discrete Representation Learning

Classical supervised classification tasks search for a nonlinear mapping that maps each encoded feature directly to a probability mass over the labels. Such a learning framework typically lacks the intuition that encoded features from the same class tend to be similar and thus has little interpretability for the learned features. In this paper, we propose a novel supervised learning model named Supervised-Encoding Quantizer (SEQ). The SEQ applies a quantizer to cluster and classify the encoded features. We found that the quantizer provides an interpretable graph where each cluster in the graph represents a class of data samples that have a particular style. We also trained a decoder that can decode convex combinations of the encoded features from similar and different clusters and provide guidance on style transfer between sub-classes.