Source author record

Shaolei Ren

Shaolei Ren appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning; Cryptography and Security; Computer Science and Game Theory; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; Hardware Architecture; Information Theory; math.IT; math.OC; Networking and Internet Architecture; Social and Information Networks

Catalog footprint

What is connected

21 works

11 topics

4 close collaborators

preprint 2022 arXiv

A Brain-Inspired Low-Dimensional Computing Classifier for Inference on Tiny Devices

By mimicking brain-like cognition and exploiting parallelism, hyperdimensional computing (HDC) classifiers have been emerging as a lightweight framework to achieve efficient on-device inference. Nonetheless, they have two fundamental drawbacks, heuristic training process and ultra-high dimension, which result in sub-optimal inference accuracy and large model sizes beyond the capability of tiny devices with stringent resource constraints. In this paper, we address these fundamental drawbacks and propose a low-dimensional computing (LDC) alternative. Specifically, by mapping our LDC classifier into an equivalent neural network, we optimize our model using a principled training approach. Most importantly, we can improve the inference accuracy while successfully reducing the ultra-high dimension of existing HDC models by orders of magnitude (e.g., 8000 vs. 4/64). We run experiments to evaluate our LDC classifier by considering different datasets for inference on tiny devices, and also implement different models on an FPGA platform for acceleration. The results highlight that our LDC classifier offers an overwhelming advantage over the existing brain-inspired HDC models and is particularly suitable for inference on tiny devices.

preprint 2022 arXiv

A Semi-Decoupled Approach to Fast and Optimal Hardware-Software Co-Design of Neural Accelerators

In view of the performance limitations of fully-decoupled designs for neural architectures and accelerators, hardware-software co-design has been emerging to fully reap the benefits of flexible design spaces and optimize neural network performance. Nonetheless, such co-design also enlarges the total search space to a practically infinite size and presents substantial challenges. While prior studies have focused on improving the search efficiency (e.g., via reinforcement learning), they commonly rely on co-searches over the entire architecture-accelerator design space. In this paper, we propose a semi-decoupled approach to reduce the size of the total design space by orders of magnitude, yet without losing optimality. We first perform neural architecture search to obtain a small set of optimal architectures for one accelerator candidate. Importantly, this is also the set of (close-to-)optimal architectures for other accelerator designs, based on the property that neural architectures' ranking orders in terms of inference latency and energy consumption on different accelerator designs are highly similar. Then, instead of considering all the possible architectures, we optimize the accelerator design only in combination with this small set of architectures, thus significantly reducing the total search cost. We validate our approach by conducting experiments on various architecture spaces for accelerator designs with different dataflows. Our results highlight that we can obtain the optimal design by only navigating over the reduced search space. The source code of this work is at https://github.com/Ren-Research/CoDesign.
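The ranking-similarity property mentioned in this abstract can be checked numerically. The following is a minimal sketch, assuming Spearman rank correlation as the similarity measure (the abstract does not say which metric the authors use) and using made-up latency values for five hypothetical architectures on two hypothetical accelerators.

```python
def ranks(values):
    """Return the 0-based rank of each value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical latencies (ms) of the same five architectures on two accelerators.
latency_accel_a = [12.0, 18.5, 25.1, 31.0, 44.2]
latency_accel_b = [30.2, 41.0, 39.5, 70.3, 95.8]

rho = spearman(latency_accel_a, latency_accel_b)
print(f"rank correlation: {rho:.2f}")
```

A correlation close to 1 means that architectures which rank well on one accelerator also rank well on the other, which is the condition under which a shortlist found on a single accelerator candidate remains useful for the rest.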

preprint 2022 arXiv

Expert-Calibrated Learning for Online Optimization with Switching Costs

We study online convex optimization with switching costs, a practically important but also extremely challenging problem due to the lack of complete offline information. By tapping into the power of machine learning (ML) based optimizers, ML-augmented online algorithms (also referred to as expert calibration in this paper) have been emerging as state of the art, with provable worst-case performance guarantees. Nonetheless, by using the standard practice of training an ML model as a standalone optimizer and plugging it into an ML-augmented algorithm, the average cost performance can be highly unsatisfactory. In order to address the "how to learn" challenge, we propose EC-L2O (expert-calibrated learning to optimize), which trains an ML-based optimizer by explicitly taking into account the downstream expert calibrator. To accomplish this, we propose a new differentiable expert calibrator that generalizes regularized online balanced descent and offers a provably better competitive ratio than pure ML predictions when the prediction error is large. For training, our loss function is a weighted sum of two different losses: one minimizing the average ML prediction error for better robustness, and the other minimizing the post-calibration average cost. We also provide theoretical analysis for EC-L2O, highlighting that expert calibration can even be beneficial for the average cost performance and that the high-percentile tail ratio of the cost achieved by EC-L2O to that of the offline optimal oracle (i.e., tail cost ratio) can be bounded. Finally, we test EC-L2O by running simulations for sustainable datacenter demand response. Our results demonstrate that EC-L2O can empirically achieve a lower average cost as well as a lower competitive ratio than the existing baseline algorithms.
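To see why switching costs make the online problem hard, it helps to evaluate the objective on a toy instance. The sketch below is illustrative only: it assumes a scalar action, a quadratic hitting cost, and an absolute-value switching cost, none of which are taken from the paper, and the function name `total_cost` is hypothetical.

```python
def total_cost(actions, targets, x0=0.0, switch_weight=1.0):
    """Sum of per-step hitting costs and switching costs.

    Hitting cost (assumed): (x_t - y_t)^2, where y_t is the step's minimizer.
    Switching cost (assumed): switch_weight * |x_t - x_{t-1}|.
    """
    cost, prev = 0.0, x0
    for x, y in zip(actions, targets):
        cost += (x - y) ** 2 + switch_weight * abs(x - prev)
        prev = x
    return cost


# Per-step minimizers that oscillate, so chasing them is expensive.
targets = [1.0, -1.0, 1.0, -1.0]

greedy = total_cost(targets, targets)          # always jump to the minimizer
stay_put = total_cost([0.0] * 4, targets)      # never move from x0

print(greedy, stay_put)
```

On this instance, chasing each step's minimizer pays nothing in hitting cost but a great deal in switching cost, while never moving pays the opposite way. An online algorithm has to balance the two without seeing future targets, which is the gap that ML predictions and a calibrating expert are meant to narrow.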

preprint 2022 arXiv

HDLock: Exploiting Privileged Encoding to Protect Hyperdimensional Computing Models against IP Stealing

Hyperdimensional Computing (HDC) is facing infringement issues due to straightforward computations. This work, for the first time, raises a critical vulnerability of HDC: an attacker can reverse engineer the entire model, requiring only the unindexed hypervector memory. To mitigate this attack, we propose a defense strategy, namely HDLock, which significantly increases the reasoning cost of encoding. Specifically, HDLock adds extra feature hypervector combination and permutation in the encoding module. Compared to the standard HDC model, a two-layer-key HDLock can increase the adversarial reasoning complexity by 10 orders of magnitude without inference accuracy loss, with only 21% latency overhead.

preprint 2022 arXiv

Informed Learning by Wide Neural Networks: Convergence, Generalization and Sampling Complexity

By integrating domain knowledge with labeled samples, informed machine learning has been emerging to improve the learning performance for a wide range of applications. Nonetheless, rigorous understanding of the role of injected domain knowledge has been under-explored. In this paper, we consider an informed deep neural network (DNN) with over-parameterization and domain knowledge integrated into its training objective function, and study how and why domain knowledge benefits the performance. Concretely, we quantitatively demonstrate the two benefits of domain knowledge in informed learning (regularizing the label-based supervision and supplementing the labeled samples) and reveal the trade-off between label and knowledge imperfectness in the bound of the population risk. Based on the theoretical analysis, we propose a generalized informed training objective to better exploit the benefits of knowledge and balance the label and knowledge imperfectness, which is validated by the population risk bound. Our analysis on sampling complexity sheds light on how to choose the hyper-parameters for informed learning, and further justifies the advantages of knowledge-informed learning.

preprint 2022 arXiv

LeHDC: Learning-Based Hyperdimensional Computing Classifier

Thanks to its tiny storage and efficient execution, hyperdimensional computing (HDC) is emerging as a lightweight learning framework on resource-constrained hardware. Nonetheless, existing HDC training relies on various heuristic methods, significantly limiting inference accuracy. In this paper, we propose a new HDC framework, called LeHDC, which leverages a principled learning approach to improve the model accuracy. Concretely, LeHDC maps the existing HDC framework into an equivalent Binary Neural Network architecture, and employs a corresponding training strategy to minimize the training loss. Experimental validation shows that LeHDC outperforms previous HDC training strategies and improves inference accuracy by over 15% on average compared to the baseline HDC.

preprint 2022 arXiv

ObfuNAS: A Neural Architecture Search-based DNN Obfuscation Approach

Malicious architecture extraction has been emerging as a crucial concern for deep neural network (DNN) security. As a defense, architecture obfuscation is proposed to remap the victim DNN to a different architecture. Nonetheless, we observe that, by extracting only an obfuscated DNN architecture, the adversary can still retrain a substitute model with high performance (e.g., accuracy), rendering the obfuscation techniques ineffective. To mitigate this under-explored vulnerability, we propose ObfuNAS, which converts the DNN architecture obfuscation into a neural architecture search (NAS) problem. Using a combination of function-preserving obfuscation strategies, ObfuNAS ensures that the obfuscated DNN architecture can only achieve lower accuracy than the victim. We validate the performance of ObfuNAS with open-source architecture datasets like NAS-Bench-101 and NAS-Bench-301. The experimental results demonstrate that ObfuNAS can successfully find the optimal mask for a victim model within a given FLOPs constraint, leading to up to 2.6% inference accuracy degradation for attackers with only 0.14x FLOPs overhead. The code is available at: https://github.com/Tongzhou0101/ObfuNAS.

preprint 2021 arXiv

A Quantitative Perspective on Values of Domain Knowledge for Machine Learning

With the exploding popularity of machine learning, domain knowledge in various forms has been playing a crucial role in improving the learning performance, especially when training data is limited. Nonetheless, there is little understanding of to what extent domain knowledge can affect a machine learning task from a quantitative perspective. To increase the transparency and rigorously explain the role of domain knowledge in machine learning, we study the problem of quantifying the values of domain knowledge in terms of its contribution to the learning performance in the context of informed machine learning. We propose a quantification method based on Shapley value that fairly attributes the overall learning performance improvement to different domain knowledge. We also present Monte-Carlo sampling to approximate the fair value of domain knowledge with a polynomial time complexity. We run experiments of injecting symbolic domain knowledge into semi-supervised learning tasks on both MNIST and CIFAR10 datasets, providing quantitative values of different symbolic knowledge and rigorously explaining how it affects the machine learning performance in terms of test accuracy.
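The Monte-Carlo approximation of Shapley values mentioned in this abstract is commonly done by sampling random orderings of the players and averaging each player's marginal contribution. The sketch below shows that generic permutation-sampling estimator, not the authors' implementation. The two knowledge sources (`rule_a`, `rule_b`) and their accuracy gains are hypothetical placeholders for a real "train with this subset of knowledge and measure test accuracy" routine.

```python
import random


def monte_carlo_shapley(players, value, num_permutations=2000, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled orderings of the players."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition = set()
        prev = value(frozenset(coalition))
        for p in order:
            coalition.add(p)
            cur = value(frozenset(coalition))
            totals[p] += cur - prev  # marginal contribution of p
            prev = cur
    return {p: t / num_permutations for p, t in totals.items()}


def toy_value(coalition):
    """Hypothetical accuracy improvement from a set of knowledge sources."""
    gains = {"rule_a": 0.04, "rule_b": 0.02}
    v = sum(gains[k] for k in coalition)
    if {"rule_a", "rule_b"} <= coalition:
        v += 0.02  # assumed synergy when both rules are used together
    return v


estimates = monte_carlo_shapley(["rule_a", "rule_b"], toy_value)
print(estimates)
```

Each sampled ordering costs one value-function evaluation per player, so the total work grows with the number of samples times the number of knowledge sources rather than with the exponential number of coalitions. For this toy game the exact Shapley values are 0.05 for `rule_a` and 0.03 for `rule_b`, and the estimates always sum to the value of the full coalition.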

preprint 2020 arXiv

A Note on Latency Variability of Deep Neural Networks for Mobile Inference

Running deep neural network (DNN) inference on mobile devices, i.e., mobile inference, has become a growing trend, making inference less dependent on network connections and keeping private data local. Prior studies on optimizing DNNs for mobile inference typically focus on the metric of average inference latency, thus implicitly assuming that mobile inference exhibits little latency variability. In this note, we conduct a preliminary measurement study on the latency variability of DNNs for mobile inference. We show that the inference latency variability can become quite significant in the presence of CPU resource contention. More interestingly, unlike the common belief that the relative performance superiority of DNNs on one device can carry over to another device and/or another level of resource contention, we highlight that a DNN model with better latency performance than another model can be outperformed by that model when resource contention becomes more severe or when running on another device. Thus, when optimizing DNN models for mobile inference, only measuring the average latency may not be adequate; instead, latency variability under various conditions should be accounted for, including but not limited to different devices and different levels of CPU resource contention considered in this note.

preprint 2020 arXiv

Adversarial Attacks on Brain-Inspired Hyperdimensional Computing-Based Classifiers

Being an emerging class of in-memory computing architecture, brain-inspired hyperdimensional computing (HDC) mimics brain cognition and leverages random hypervectors (i.e., vectors with a dimensionality of thousands or even more) to represent features and to perform classification tasks. The unique hypervector representation enables HDC classifiers to exhibit high energy efficiency, low inference latency and strong robustness against hardware-induced bit errors. Consequently, they have been increasingly recognized as an appealing alternative to or even replacement of traditional deep neural networks (DNNs) for local on-device classification, especially on low-power Internet of Things devices. Nonetheless, unlike their DNN counterparts, state-of-the-art designs for HDC classifiers are mostly security-oblivious, casting doubt on their safety and immunity to adversarial inputs. In this paper, we study for the first time adversarial attacks on HDC classifiers and highlight that HDC classifiers can be vulnerable to even minimally-perturbed adversarial samples. Concretely, using handwritten digit classification as an example, we construct an HDC classifier and formulate a grey-box attack problem, where an attacker's goal is to mislead the target HDC classifier to produce erroneous prediction labels while keeping the amount of added perturbation noise as small as possible. Then, we propose a modified genetic algorithm to generate adversarial samples within a reasonably small number of queries. Our results show that adversarial images generated by our algorithm can successfully mislead the HDC classifier to produce wrong prediction labels with a high probability (i.e., 78% when the HDC classifier uses a fixed majority rule for decision). Finally, we also present two defense strategies (adversarial training and retraining) to strengthen the security of HDC classifiers.
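For readers unfamiliar with HDC, the following minimal sketch shows one common textbook formulation of an HDC classifier: random bipolar hypervectors, binding by elementwise multiplication, bundling by summation, and classification by similarity. It is not the classifier or the attack from this paper. The dimensionality, the toy five-feature binary dataset, and all function names are assumptions chosen for illustration.

```python
import random

D = 2000  # hypervector dimensionality (assumed; real models often use more)
rng = random.Random(0)


def random_hv():
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(D)]


def encode(sample, pos_hvs, val_hvs):
    """Bind each feature's position and value hypervectors, bundle, then binarize."""
    acc = [0] * D
    for i, v in enumerate(sample):
        p, q = pos_hvs[i], val_hvs[v]
        for d in range(D):
            acc[d] += p[d] * q[d]
    return [1 if a >= 0 else -1 for a in acc]


def train(samples, labels, pos_hvs, val_hvs):
    """Class hypervector = elementwise sum of its encoded training samples."""
    classes = {}
    for s, y in zip(samples, labels):
        hv = encode(s, pos_hvs, val_hvs)
        acc = classes.setdefault(y, [0] * D)
        for d in range(D):
            acc[d] += hv[d]
    return classes


def predict(sample, classes, pos_hvs, val_hvs):
    """Return the class whose hypervector has the largest dot product with the query."""
    hv = encode(sample, pos_hvs, val_hvs)
    return max(classes, key=lambda y: sum(a * b for a, b in zip(classes[y], hv)))


# Toy data: five binary features per sample (hypothetical).
pos_hvs = [random_hv() for _ in range(5)]
val_hvs = {0: random_hv(), 1: random_hv()}
samples = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 0]]
labels = ["low", "low", "high", "high"]
classes = train(samples, labels, pos_hvs, val_hvs)

print(predict([0, 1, 0, 0, 0], classes, pos_hvs, val_hvs))
```

Because each sample is spread across thousands of near-orthogonal dimensions, a query that shares most features with a class's training samples stays close to that class's hypervector. The same property is what an adversary probes: small input changes flip a few bound terms, and enough of them can move the query across the decision boundary.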

preprint 2020 arXiv

Calibrating Deep Neural Network Classifiers on Out-of-Distribution Datasets

To increase the trustworthiness of deep neural network (DNN) classifiers, an accurate prediction confidence that represents the true likelihood of correctness is crucial. Towards this end, many post-hoc calibration methods have been proposed to leverage a lightweight model to map the target DNN's output layer into a calibrated confidence. Nonetheless, on an out-of-distribution (OOD) dataset in practice, the target DNN can often mis-classify samples with a high confidence, creating significant challenges for the existing calibration methods to produce an accurate confidence. In this paper, we propose a new post-hoc confidence calibration method, called CCAC (Confidence Calibration with an Auxiliary Class), for DNN classifiers on OOD datasets. The key novelty of CCAC is an auxiliary class in the calibration model which separates mis-classified samples from correctly classified ones, thus effectively mitigating the target DNN's being confidently wrong. We also propose a simplified version of CCAC to reduce free parameters and facilitate transfer to a new unseen dataset. Our experiments on different DNN models, datasets and applications show that CCAC can consistently outperform the prior post-hoc calibration methods.

preprint 2020 arXiv

Increasing Trustworthiness of Deep Neural Networks via Accuracy Monitoring

Inference accuracy of deep neural networks (DNNs) is a crucial performance metric, but can vary greatly in practice subject to actual test datasets and is typically unknown due to the lack of ground truth labels. This has raised significant concerns with trustworthiness of DNNs, especially in safety-critical applications. In this paper, we address trustworthiness of DNNs by using post-hoc processing to monitor the true inference accuracy on a user's dataset. Concretely, we propose a neural network-based accuracy monitor model, which only takes the deployed DNN's softmax probability output as its input and directly predicts if the DNN's prediction result is correct or not, thus leading to an estimate of the true inference accuracy. The accuracy monitor model can be pre-trained on a dataset relevant to the target application of interest, and only needs to actively label a small portion (1% in our experiments) of the user's dataset for model transfer. For estimation robustness, we further employ an ensemble of monitor models based on the Monte-Carlo dropout method. We evaluate our approach on different deployed DNN models for image classification and traffic sign detection over multiple datasets (including adversarial samples). The result shows that our accuracy monitor model provides a close-to-true accuracy estimation and outperforms the existing baseline methods.

preprint 2020 arXiv

Scaling Up Deep Neural Network Optimization for Edge Inference

Deep neural networks (DNNs) have been increasingly deployed on and integrated with edge devices, such as mobile phones, drones, robots and wearables. To run DNN inference directly on edge devices (a.k.a. edge inference) with a satisfactory performance, optimizing the DNN design (e.g., network architecture and quantization policy) is crucial. While state-of-the-art DNN designs have leveraged performance predictors to speed up the optimization process, they are device-specific (i.e., each predictor for only one target device) and hence cannot scale well in the presence of extremely diverse edge devices. Moreover, even with performance predictors, the optimizer (e.g., search-based optimization) can still be time-consuming when optimizing DNNs for many different devices. In this work, we propose two approaches to scaling up DNN optimization. In the first approach, we reuse the performance predictors built on a proxy device, and leverage the performance monotonicity to scale up the DNN optimization without re-building performance predictors for each different device. In the second approach, we build scalable performance predictors that can estimate the resulting performance (e.g., inference accuracy/latency/energy) given a DNN-device pair, and use a neural network-based automated optimizer that takes both device features and optimization parameters as input and then directly outputs the optimal DNN design without going through a lengthy optimization process for each individual device.

preprint 2020 arXiv

Your Noise, My Signal: Exploiting Switching Noise for Stealthy Data Exfiltration from Desktop Computers

Attacks based on power analysis have long existed and been studied, with some recent works focused on data exfiltration from victim systems without using conventional communications (e.g., WiFi). Nonetheless, prior works typically rely on intrusive direct power measurement, either by implanting meters in the power outlet or tapping into the power cable, thus jeopardizing the stealthiness of attacks. In this paper, we propose NoDE (Noise for Data Exfiltration), a new system for stealthy data exfiltration from enterprise desktop computers. Specifically, NoDE achieves data exfiltration over a building's power network by exploiting high-frequency voltage ripples (i.e., switching noises) generated by power factor correction circuits built into today's computers. Located at a distance and even in a different room, the receiver can non-intrusively measure the voltage of a power outlet to capture the high-frequency switching noises for online information decoding without supervised training/learning. To evaluate NoDE, we run experiments on seven different computers from top vendors, using top-brand power supply units. Our results show that for a single transmitter, NoDE achieves a rate of up to 28.48 bits/second at a distance of 90 feet (27.4 meters) without line of sight, demonstrating a practically stealthy threat. Based on the orthogonality of switching noise frequencies of different computers, we also demonstrate simultaneous data exfiltration from four computers using only one receiver. Finally, we present a few possible defenses, such as installing noise filters, and discuss their limitations.

preprint 2016 arXiv

Online Learning for Offloading and Autoscaling in Renewable-Powered Mobile Edge Computing

Mobile edge computing (a.k.a. fog computing) has recently emerged to enable in-situ processing of delay-sensitive applications at the edge of mobile networks. Providing grid power supply in support of mobile edge computing, however, is costly and even infeasible (in certain rugged or under-developed areas), thus mandating on-site renewable energy as a major or even sole power supply in increasingly many scenarios. Nonetheless, the high intermittency and unpredictability of renewable energy make it very challenging to deliver a high quality of service to users in renewable-powered mobile edge computing systems. In this paper, we address the challenge of incorporating renewables into mobile edge computing and propose an efficient reinforcement learning-based resource management algorithm, which learns on-the-fly the optimal policy of dynamic workload offloading (to centralized cloud) and edge server provisioning to minimize the long-term system cost (including both service delay and operational cost). Our online learning algorithm uses a decomposition of the (offline) value iteration and (online) reinforcement learning, thus achieving a significant improvement of learning rate and run-time performance when compared to standard reinforcement learning algorithms such as Q-learning.

preprint 2015 arXiv

Greening Multi-Tenant Data Center Demand Response

Data centers have emerged as promising resources for demand response, particularly for emergency demand response (EDR), which saves the power grid from incurring blackouts during emergency situations. However, currently, data centers typically participate in EDR by turning on backup (diesel) generators, which is both expensive and environmentally unfriendly. In this paper, we focus on "greening" demand response in multi-tenant data centers, i.e., colocation data centers, by designing a pricing mechanism through which the data center operator can efficiently extract load reductions from tenants during emergency periods to fulfill energy reduction requirement for EDR. In particular, we propose a pricing mechanism for both mandatory and voluntary EDR programs, ColoEDR, that is based on parameterized supply function bidding and provides provably near-optimal efficiency guarantees, both when tenants are price-taking and when they are price-anticipating. In addition to analytic results, we extend the literature on supply function mechanism design, and evaluate ColoEDR using trace-based simulation studies. These validate the efficiency analysis and conclude that the pricing mechanism is both beneficial to the environment and to the data center operator (by decreasing the need for backup diesel generation), while also aiding tenants (by providing payments for load reductions).

preprint 2014 arXiv

Energy-Efficient Flow Scheduling and Routing with Hard Deadlines in Data Center Networks

The power consumption of the enormous number of network devices in data centers has emerged as a big concern to data center operators. Despite many traffic-engineering-based solutions, very little attention has been paid to performance-guaranteed energy-saving schemes. In this paper, we propose a novel energy-saving model for data center networks by scheduling and routing "deadline-constrained flows", where the transmission of every flow has to be accomplished before a rigorous deadline, this being the most critical requirement in production data center networks. Based on speed scaling and power-down energy-saving strategies for network devices, we aim to explore the most energy-efficient way of scheduling and routing flows on the network, as well as determining the transmission speed for every flow. We consider two general versions of the problem. For the version of only flow scheduling where routes of flows are pre-given, we show that it can be solved polynomially and we develop an optimal combinatorial algorithm for it. For the version of joint flow scheduling and routing, we prove that it is strongly NP-hard and cannot have a Fully Polynomial-Time Approximation Scheme (FPTAS) unless P=NP. Based on a relaxation and randomized rounding technique, we provide an efficient approximation algorithm which can guarantee a provable performance ratio with respect to a polynomial of the total number of flows.

preprint 2012 arXiv

Entry and Spectrum Sharing Scheme Selection in Femtocell Markets

Focusing on a femtocell communications market, we study the entrant network service provider's (NSP's) long-term decision: whether to enter the market and which spectrum sharing technology to select to maximize its profit. This long-term decision is closely related to the entrant's pricing strategy and the users' aggregate demand, which we model as medium-term and short-term decisions, respectively. We consider two markets, one with no incumbent and the other with one incumbent. For both markets, we show the existence and uniqueness of an equilibrium point in the user subscription dynamics, and provide a sufficient condition for the convergence of the dynamics. For the market with no incumbent, we derive upper and lower bounds on the optimal price and market share that maximize the entrant's revenue, based on which the entrant selects an available technology to maximize its long-term profit. For the market with one incumbent, we model competition between the two NSPs as a non-cooperative game, in which the incumbent and the entrant choose their market shares independently, and provide a sufficient condition that guarantees the existence of at least one pure Nash equilibrium. Finally, we formalize the problem of entry and spectrum sharing scheme selection for the entrant and provide numerical results to complement our analysis.

preprint 2012 arXiv

User Subscription, Revenue Maximization, and Competition in Communications Markets

An updated version of this paper (but with a different title) can be found at arXiv:1204.4262

preprint 2011 arXiv

Business Mode Selection in Digital Content Markets

In this paper, we consider a two-sided digital content market, and study which of the two business modes, i.e., Business-to-Customer (B2C) and Customer-to-Customer (C2C), should be selected and when it should be selected. The considered market is managed by an intermediary, through which content producers can sell their contents to consumers. The intermediary can select B2C or C2C as its business mode, while the content producers and consumers are rational agents that maximize their own utilities. The content producers are differentiated by their content qualities. First, given the intermediary's business mode, we show that there always exists a unique equilibrium at which neither the content producers nor the consumers change their decisions. Moreover, if there are a sufficiently large number of consumers, then the decision process based on the content producers' naive expectation can reach the unique equilibrium. Next, we show that in a market with only one intermediary, C2C should be selected if the intermediary aims at maximizing its profit. Then, by considering a particular scenario where the contents are not highly substitutable, we prove that when the intermediary chooses to maximize the social welfare, C2C should be selected if the content producers can receive sufficient compensation for content sales, and B2C should be selected otherwise.

preprint 2010 arXiv

Distributed Power Allocation in Multi-User Multi-Channel Relay Networks

This paper has been withdrawn by the authors as they feel it inappropriate to publish this paper for the time being.
