Source author record

Ting He

Ting He appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Networking and Internet Architecture Machine Learning math.OC Performance Distributed, Parallel, and Cluster Computing Data Structures and Algorithms Cryptography and Security Information Theory math.IT q-fin.MF

Catalog footprint

What is connected

19works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2025arXiv

Time-varying Mixing Matrix Design for Energy-efficient Decentralized Federated Learning

We consider the design of mixing matrices to minimize the operation cost for decentralized federated learning (DFL) in wireless networks, with focus on minimizing the maximum per-node energy consumption. As a critical hyperparameter for DFL, the mixing matrix controls both the convergence rate and the needs of agent-to-agent communications, and has thus been studied extensively. However, existing designs mostly focused on minimizing the communication time, leaving open the minimization of per-node energy consumption that is critical for energy-constrained devices. This work addresses this gap through a theoretically-justified solution for mixing matrix design that aims at minimizing the maximum per-node energy consumption until convergence, while taking into account the broadcast nature of wireless communications. Based on a novel convergence theorem that allows arbitrarily time-varying mixing matrices, we propose a multi-phase design framework that activates time-varying communication topologies under optimized budgets to trade off the per-iteration energy consumption and the convergence rate while balancing the energy consumption across nodes. Our evaluations based on real data have validated the efficacy of the proposed solution in combining the low energy consumption of sparse mixing matrices and the fast convergence of dense mixing matrices.

preprint2022arXiv

Communication-efficient k-Means for Edge-based Machine Learning

We consider the problem of computing the k-means centers for a large high-dimensional dataset in the context of edge-based machine learning, where data sources offload machine learning computation to nearby edge servers. k-Means computation is fundamental to many data analytics, and the capability of computing provably accurate k-means centers by leveraging the computation power of the edge servers, at a low communication and computation cost to the data sources, will greatly improve the performance of these analytics. We propose to let the data sources send small summaries, generated by joint dimensionality reduction (DR), cardinality reduction (CR), and quantization (QT), to support approximate k-means computation at reduced complexity and communication cost. By analyzing the complexity, the communication cost, and the approximation error of k-means algorithms based on carefully designed composition of DR/CR/QT methods, we show that: (i) it is possible to compute near-optimal k-means centers at a near-linear complexity and a constant or logarithmic communication cost, (ii) the order of applying DR and CR significantly affects the complexity and the communication cost, and (iii) combining DR/CR methods with a properly configured quantizer can further reduce the communication cost without compromising the other performance metrics. Our theoretical analysis has been validated through experiments based on real datasets.

preprint2022arXiv

Joint Coreset Construction and Quantization for Distributed Machine Learning

Coresets are small, weighted summaries of larger datasets, aiming at providing provable error bounds for machine learning (ML) tasks while significantly reducing the communication and computation costs. To achieve a better trade-off between ML error bounds and costs, we propose the first framework to incorporate quantization techniques into the process of coreset construction. Specifically, we theoretically analyze the ML error bounds caused by a combination of coreset construction and quantization. Based on that, we formulate an optimization problem to minimize the ML error under a fixed budget of communication cost. To improve the scalability for large datasets, we identify two proxies of the original objective function, for which efficient algorithms are developed. For the case of data on multiple nodes, we further design a novel algorithm to allocate the communication budget to the nodes while minimizing the overall ML error. Through extensive experiments on multiple real-world datasets, we demonstrate the effectiveness and efficiency of our proposed algorithms for a variety of ML tasks. In particular, our algorithms have achieved more than 90% data reduction with less than 10% degradation in ML performance in most cases.

preprint2022arXiv

Preventing Outages under Coordinated Cyber-Physical Attack with Secured PMUs

Due to the potentially severe consequences of coordinated cyber-physical attacks (CCPA), the design of defenses has gained significant attention. A popular approach is to eliminate the existence of attacks by either securing existing sensors or deploying secured PMUs. In this work, we improve this approach by lowering the defense target from eliminating attacks to preventing outages and reducing the required number of PMUs. To this end, we formulate the problem of PMU Placement for Outage Prevention (PPOP) under DC power flow model as a tri-level non-linear optimization problem and transform it into a bi-level mixed-integer linear programming (MILP) problem. Then, we propose an alternating optimization framework to solve PPOP by iteratively adding constraints, for which we develop two constraint generation algorithms. In addition, for large-scale grids, we propose a polynomial-time heuristic algorithm to obtain suboptimal solutions. Next, we extend our solution to achieve the defense goal under AC power flow model. Finally, we evaluate our algorithm on IEEE 30-bus, 57-bus, 118-bus, and 300-bus systems, which demonstrates the potential of the proposed approach in greatly reducing the required number of PMUs.

preprint2021arXiv

Power Grid State Estimation under General Cyber-Physical Attacks

Effective defense against cyber-physical attacks in power grid requires the capability of accurate damage assessment within the attacked area. While some solutions have been proposed to recover the phase angles and the link status (i.e., breaker status) within the attacked area, existing solutions made the limiting assumption that the grid stays connected after the attack. To fill this gap, we study the problem of recovering the phase angles and the link status under a general cyber-physical attack that may partition the grid into islands. To this end, we (i) show that the existing solutions and recovery conditions still hold if the post-attack power injections in the attacked area are known, and (ii) propose a linear programming-based algorithm that can perfectly recover the link status under certain conditions even if the post-attack power injections are unknown. Our numerical evaluations based on the Polish power grid demonstrate that the proposed algorithm is highly accurate in localizing failed links once the phase angles are known.

preprint2021arXiv

Verifiable Failure Localization in Smart Grid under Cyber-Physical Attacks

Cyber-physical attacks impose a significant threat to the smart grid, as the cyber attack makes it difficult to identify the actual damage caused by the physical attack. To defend against such attacks, various inference-based solutions have been proposed to estimate the states of grid elements (e.g., transmission lines) from measurements outside the attacked area, out of which a few have provided theoretical conditions for guaranteed accuracy. However, these conditions are usually based on the ground truth states and thus not verifiable in practice. To solve this problem, we develop (i) verifiable conditions that can be tested based on only observable information, and (ii) efficient algorithms for verifying the states of links (i.e., transmission lines) within the attacked area based on these conditions. Our numerical evaluations based on the Polish power grid and IEEE 300-bus system demonstrate that the proposed algorithms are highly successful in verifying the states of truly failed links, and can thus greatly help in prioritizing repairs during the recovery process.

preprint2020arXiv

Additive Link Metrics Identification: Proof of Selected Lemmas and Propositions

This is a technical report, containing all the lemma and proposition proofs in paper "Topological Constraints on Identifying Additive Link Metrics via End-to-end Paths Measurements" by Liang Ma, Ting He, Kin K. Leung, Don Towsley, and Ananthram Swami, published in Annual Conference of The International Technology Alliance (ACITA), 2012.

preprint2020arXiv

Nonparametric Predictive Inference for Asian options

Asian option, as one of the path-dependent exotic options, is widely traded in the energy market, either for speculation or hedging. However, it is hard to price, especially the one with the arithmetic average price. The traditional trading procedure is either too restrictive by assuming the distribution of the underlying asset or less rigorous by using the approximation. It is attractive to infer the Asian option price with few assumptions of the underlying asset distribution and adopt to the historical data with a nonparametric method. In this paper, we present a novel approach to price the Asian option from an imprecise statistical aspect. Nonparametric Predictive Inference (NPI) is applied to infer the average value of the future underlying asset price, which attempts to make the prediction reflecting more uncertainty because of the limited information. A rational pairwise trading criterion is also proposed in this paper for the Asian options comparison, as a risk measure. The NPI method for the Asian option is illustrated in several examples by using the simulation techniques or the empirical data from the energy market.

preprint2020arXiv

Online Learning of Facility Locations

In this paper, we provide a rigorous theoretical investigation of an online learning version of the Facility Location problem which is motivated by emerging problems in real-world applications. In our formulation, we are given a set of sites and an online sequence of user requests. At each trial, the learner selects a subset of sites and then incurs a cost for each selected site and an additional cost which is the price of the user's connection to the nearest site in the selected subset. The problem may be solved by an application of the well-known Hedge algorithm. This would, however, require time and space exponential in the number of the given sites, which motivates our design of a novel quasi-linear time algorithm for this problem, with good theoretical guarantees on its performance.

preprint2020arXiv

Robust Coreset Construction for Distributed Machine Learning

Coreset, which is a summary of the original dataset in the form of a small weighted set in the same sample space, provides a promising approach to enable machine learning over distributed data. Although viewed as a proxy of the original dataset, each coreset is only designed to approximate the cost function of a specific machine learning problem, and thus different coresets are often required to solve different machine learning problems, increasing the communication overhead. We resolve this dilemma by developing robust coreset construction algorithms that can support a variety of machine learning problems. Motivated by empirical evidence that suitably-weighted k-clustering centers provide a robust coreset, we harden the observation by establishing theoretical conditions under which the coreset provides a guaranteed approximation for a broad range of machine learning problems, and developing both centralized and distributed algorithms to generate coresets satisfying the conditions. The robustness of the proposed algorithms is verified through extensive experiments on diverse datasets with respect to both supervised and unsupervised learning problems.

preprint2020arXiv

Sharing Models or Coresets: A Study based on Membership Inference Attack

Distributed machine learning generally aims at training a global model based on distributed data without collecting all the data to a centralized location, where two different approaches have been proposed: collecting and aggregating local models (federated learning) and collecting and training over representative data summaries (coreset). While each approach preserves data privacy to some extent thanks to not sharing the raw data, the exact extent of protection is unclear under sophisticated attacks that try to infer the raw data from the shared information. We present the first comparison between the two approaches in terms of target model accuracy, communication cost, and data privacy, where the last is measured by the accuracy of a state-of-the-art attack strategy called the membership inference attack. Our experiments quantify the accuracy-privacy-cost tradeoff of each approach, and reveal a nontrivial comparison that can be used to guide the design of model training processes.

preprint2016arXiv

Dynamic Service Placement for Mobile Micro-Clouds with Predicted Future Costs

Mobile micro-clouds are promising for enabling performance-critical cloud applications. However, one challenge therein is the dynamics at the network edge. In this paper, we study how to place service instances to cope with these dynamics, where multiple users and service instances coexist in the system. Our goal is to find the optimal placement (configuration) of instances to minimize the average cost over time, leveraging the ability of predicting future cost parameters with known accuracy. We first propose an offline algorithm that solves for the optimal configuration in a specific look-ahead time-window. Then, we propose an online approximation algorithm with polynomial time-complexity to find the placement in real-time whenever an instance arrives. We analytically show that the online algorithm is $O(1)$-competitive for a broad family of cost functions. Afterwards, the impact of prediction errors is considered and a method for finding the optimal look-ahead window size is proposed, which minimizes an upper bound of the average actual cost. The effectiveness of the proposed approach is evaluated by simulations with both synthetic and real-world (San Francisco taxi) user-mobility traces. The theoretical methodology used in this paper can potentially be applied to a larger class of dynamic resource allocation problems.

preprint2015arXiv

Mobility-Induced Service Migration in Mobile Micro-Clouds

Mobile micro-cloud is an emerging technology in distributed computing, which is aimed at providing seamless computing/data access to the edge of the network when a centralized service may suffer from poor connectivity and long latency. Different from the traditional cloud, a mobile micro-cloud is smaller and deployed closer to users, typically attached to a cellular basestation or wireless network access point. Due to the relatively small coverage area of each basestation or access point, when a user moves across areas covered by different basestations or access points which are attached to different micro-clouds, issues of service performance and service migration become important. In this paper, we consider such migration issues. We model the general problem as a Markov decision process (MDP), and show that, in the special case where the mobile user follows a one-dimensional asymmetric random walk mobility model, the optimal policy for service migration is a threshold policy. We obtain the analytical solution for the cost resulting from arbitrary thresholds, and then propose an algorithm for finding the optimal thresholds. The proposed algorithm is more efficient than standard mechanisms for solving MDPs.

preprint2015arXiv

Network Capability in Localizing Node Failures via End-to-end Path Measurements

We investigate the capability of localizing node failures in communication networks from binary states (normal/failed) of end-to-end paths. Given a set of nodes of interest, uniquely localizing failures within this set requires that different observable path states associate with different node failure events. However, this condition is difficult to test on large networks due to the need to enumerate all possible node failures. Our first contribution is a set of sufficient/necessary conditions for identifying a bounded number of failures within an arbitrary node set that can be tested in polynomial time. In addition to network topology and locations of monitors, our conditions also incorporate constraints imposed by the probing mechanism used. We consider three probing mechanisms that differ according to whether measurement paths are (i) arbitrarily controllable, (ii) controllable but cycle-free, or (iii) uncontrollable (determined by the default routing protocol). Our second contribution is to quantify the capability of failure localization through (1) the maximum number of failures (anywhere in the network) such that failures within a given node set can be uniquely localized, and (2) the largest node set within which failures can be uniquely localized under a given bound on the total number of failures. Both measures in (1-2) can be converted into functions of a per-node property, which can be computed efficiently based on the above sufficient/necessary conditions. We demonstrate how measures (1-2) proposed for quantifying failure localization capability can be used to evaluate the impact of various parameters, including topology, number of monitors, and probing mechanisms.

preprint2014arXiv

On the Complexity of Optimal Routing and Content Caching in Heterogeneous Networks

We investigate the problem of optimal request routing and content caching in a heterogeneous network supporting in-network content caching with the goal of minimizing average content access delay. Here, content can either be accessed directly from a back-end server (where content resides permanently) or be obtained from one of multiple in-network caches. To access a piece of content, a user must decide whether to route its request to a cache or to the back-end server. Additionally, caches must decide which content to cache. We investigate the problem complexity of two problem formulations, where the direct path to the back-end server is modeled as i) a congestion-sensitive or ii) a congestion-insensitive path, reflecting whether or not the delay of the uncached path to the back-end server depends on the user request load, respectively. We show that the problem is NP-complete in both cases. We prove that under the congestion-insensitive model the problem can be solved optimally in polynomial time if each piece of content is requested by only one user, or when there are at most two caches in the network. We also identify a structural property of the user-cache graph that potentially makes the problem NP-complete. For the congestion-sensitive model, we prove that the problem remains NP-complete even if there is only one cache in the network and each content is requested by only one user. We show that approximate solutions can be found for both models within a (1-1/e) factor of the optimal solution, and demonstrate a greedy algorithm that is found to be within 1% of optimal for small problem sizes. Through trace-driven simulations we evaluate the performance of our greedy algorithms, which show up to a 50% reduction in average delay over solutions based on LRU content caching.

preprint2014arXiv

Optimal Caching and Routing in Hybrid Networks

Hybrid networks consisting of MANET nodes and cellular infrastructure have been recently proposed to improve the performance of military networks. Prior work has demonstrated the benefits of in-network content caching in a wired, Internet context. We investigate the problem of developing optimal routing and caching policies in a hybrid network supporting in-network caching with the goal of minimizing overall content-access delay. Here, needed content may always be accessed at a back-end server via the cellular infrastructure; alternatively, content may also be accessed via cache-equipped "cluster" nodes within the MANET. To access content, MANET nodes must thus decide whether to route to in-MANET cluster nodes or to back-end servers via the cellular infrastructure; the in-MANET cluster nodes must additionally decide which content to cache. We model the cellular path as either i) a congestion-insensitive fixed-delay path or ii) a congestion-sensitive path modeled as an M/M/1 queue. We demonstrate that under the assumption of stationary, independent requests, it is optimal to adopt static caching (i.e., to keep a cache's content fixed over time) based on content popularity. We also show that it is optimal to route to in-MANET caches for content cached there, but to route requests for remaining content via the cellular infrastructure for the congestion-insensitive case and to split traffic between the in-MANET caches and cellular infrastructure for the congestion-sensitive case. We develop a simple distributed algorithm for the joint routing/caching problem and demonstrate its efficacy via simulation.

preprint2011arXiv

Optimal Deadline Scheduling with Commitment

We consider an online preemptive scheduling problem where jobs with deadlines arrive sporadically. A commitment requirement is imposed such that the scheduler has to either accept or decline a job immediately upon arrival. The scheduler's decision to accept an arriving job constitutes a contract with the customer; if the accepted job is not completed by its deadline as promised, the scheduler loses the value of the corresponding job and has to pay an additional penalty depending on the amount of unfinished workload. The objective of the online scheduler is to maximize the overall profit, i.e., the total value of the admitted jobs completed before their deadlines less the penalty paid for the admitted jobs that miss their deadlines. We show that the maximum competitive ratio is $3-2\sqrt{2}$ and propose a simple online algorithm to achieve this competitive ratio. The optimal scheduling includes a threshold admission and a greedy scheduling policies. The proposed algorithm has direct applications to the charging of plug-in hybrid electrical vehicles (PHEV) at garages or parking lots.

preprint2011arXiv

Seeing Through Black Boxes : Tracking Transactions through Queues under Monitoring Resource Constraints

The problem of optimal allocation of monitoring resources for tracking transactions progressing through a distributed system, modeled as a queueing network, is considered. Two forms of monitoring information are considered, viz., locally unique transaction identifiers, and arrival and departure timestamps of transactions at each processing queue. The timestamps are assumed available at all the queues but in the absence of identifiers, only enable imprecise tracking since parallel processing can result in out-of-order departures. On the other hand, identifiers enable precise tracking but are not available without proper instrumentation. Given an instrumentation budget, only a subset of queues can be selected for production of identifiers, while the remaining queues have to resort to imprecise tracking using timestamps. The goal is then to optimally allocate the instrumentation budget to maximize the overall tracking accuracy. The challenge is that the optimal allocation strategy depends on accuracies of timestamp-based tracking at different queues, which has complex dependencies on the arrival and service processes, and the queueing discipline. We propose two simple heuristics for allocation by predicting the order of timestamp-based tracking accuracies of different queues. We derive sufficient conditions for these heuristics to achieve optimality through the notion of stochastic comparison of queues. Simulations show that our heuristics are close to optimality, even when the parameters deviate from these conditions.

preprint2011arXiv

The Embedding Capacity of Information Flows Under Renewal Traffic

Given two independent point processes and a certain rule for matching points between them, what is the fraction of matched points over infinitely long streams? In many application contexts, e.g., secure networking, a meaningful matching rule is that of a maximum causal delay, and the problem is related to embedding a flow of packets in cover traffic such that no traffic analysis can detect it. We study the best undetectable embedding policy and the corresponding maximum flow rate ---that we call the embedding capacity--- under the assumption that the cover traffic can be modeled as arbitrary renewal processes. We find that computing the embedding capacity requires the inversion of very structured linear systems that, for a broad range of renewal models encountered in practice, admits a fully analytical expression in terms of the renewal function of the processes. Our main theoretical contribution is a simple closed form of such relationship. This result enables us to explore properties of the embedding capacity, obtaining closed-form solutions for selected distribution families and a suite of sufficient conditions on the capacity ordering. We evaluate our solution on real network traces, which shows a noticeable match for tight delay constraints. A gap between the predicted and the actual embedding capacities appears for looser constraints, and further investigation reveals that it is caused by inaccuracy of the renewal traffic model rather than of the solution itself.

Ting He

What is connected

Connect this record

See the researcher in context

Building this map preview

19 published item(s)

Time-varying Mixing Matrix Design for Energy-efficient Decentralized Federated Learning

Communication-efficient k-Means for Edge-based Machine Learning

Joint Coreset Construction and Quantization for Distributed Machine Learning

Preventing Outages under Coordinated Cyber-Physical Attack with Secured PMUs

Power Grid State Estimation under General Cyber-Physical Attacks

Verifiable Failure Localization in Smart Grid under Cyber-Physical Attacks

Additive Link Metrics Identification: Proof of Selected Lemmas and Propositions

Nonparametric Predictive Inference for Asian options

Online Learning of Facility Locations

Robust Coreset Construction for Distributed Machine Learning

Sharing Models or Coresets: A Study based on Membership Inference Attack

Dynamic Service Placement for Mobile Micro-Clouds with Predicted Future Costs

Mobility-Induced Service Migration in Mobile Micro-Clouds

Network Capability in Localizing Node Failures via End-to-end Path Measurements

On the Complexity of Optimal Routing and Content Caching in Heterogeneous Networks

Optimal Caching and Routing in Hybrid Networks

Optimal Deadline Scheduling with Commitment

Seeing Through Black Boxes : Tracking Transactions through Queues under Monitoring Resource Constraints

The Embedding Capacity of Information Flows Under Renewal Traffic