Source author record

Jiannong Cao

Jiannong Cao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Artificial Intelligence; Distributed, Parallel, and Cluster Computing; Networking and Internet Architecture; Machine Learning; Multiagent Systems; Robotics; Computer Science and Game Theory; Computer Vision; Computers and Society (cs.CY); Databases; Formal Languages and Automata Theory; Social and Information Networks

Catalog footprint

What is connected

13 works

12 topics

4 close collaborators

preprint, 2026, arXiv

GroupNL: Low-Resource and Robust CNN Design over Cloud and Device

Deploying Convolutional Neural Network (CNN) models on ubiquitous Internet of Things (IoT) devices in a cloud-assisted manner to provide users with a variety of high-quality services has become mainstream. Most existing studies speed up cloud training and on-device inference by reducing the number of convolution (Conv) parameters and floating-point operations (FLOPs). However, they usually employ two or more lightweight operations (e.g., depthwise Conv, $1\times1$ cheap Conv) to replace a Conv, which can still limit the model's speedup even with fewer parameters and FLOPs. To this end, we propose the Grouped NonLinear transformation generation method (GroupNL), leveraging data-agnostic, hyperparameters-fixed, and lightweight Nonlinear Transformation Functions (NLFs) to generate diversified feature maps on demand via grouping, thereby reducing resource consumption while improving the robustness of CNNs. First, in a GroupNL Conv layer, a small set of feature maps, i.e., seed feature maps, is generated by the seed Conv operation. Then, we split the seed feature maps into several groups, each with a set of different NLFs, to generate the required number of diversified feature maps with tensor manipulation operators and nonlinear processing in a lightweight manner without additional Conv operations. We further introduce a sparse GroupNL Conv that speeds up computation by suitably designing the seed Conv groups between the input channels and the seed feature maps. Experiments conducted on benchmarks and on-device resource measurements demonstrate that the GroupNL Conv is an impressive alternative to Conv layers in baseline models. Specifically, on the Icons-50 dataset, the accuracy of GroupNL-ResNet-18 is 2.86% higher than that of ResNet-18; on the ImageNet-C dataset, the accuracy of GroupNL-EfficientNet-ES is about 1.1% higher than that of EfficientNet-ES.
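The grouping step described in this abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the choice of nonlinear functions (sin, tanh, ReLU, square), and the decision to keep the seed maps in the output are all assumptions made for illustration. Feature maps are represented as nested lists, and no convolution is performed after the seeds are given.

```python
import math

def apply_nlf(fmap, fn):
    """Apply a scalar nonlinear function element-wise to one 2D feature map."""
    return [[fn(v) for v in row] for row in fmap]

def group_nl_expand(seed_maps, nlf_groups):
    """Expand seed feature maps with per-group nonlinear functions.

    seed_maps  : list of 2D feature maps (the output of a seed Conv, assumed given)
    nlf_groups : one list of scalar functions per group (hypothetical choices)
    Returns the seeds followed by every (function, seed) map of each group.
    """
    n_groups = len(nlf_groups)
    if n_groups == 0 or len(seed_maps) % n_groups != 0:
        raise ValueError("seed maps must split evenly into groups")
    size = len(seed_maps) // n_groups
    out = list(seed_maps)  # assumption: the seed maps are kept in the output
    for g, fns in enumerate(nlf_groups):
        group = seed_maps[g * size:(g + 1) * size]
        for fn in fns:
            out.extend(apply_nlf(m, fn) for m in group)
    return out

# Four identical 2x2 seed maps, two groups, two fixed functions per group.
seeds = [[[0.0, 1.0], [2.0, 3.0]] for _ in range(4)]
nlf_groups = [
    [math.sin, math.tanh],
    [lambda v: max(v, 0.0), lambda v: v * v],
]
maps = group_nl_expand(seeds, nlf_groups)
```

With 4 seeds split into 2 groups of 2 and 2 functions per group, the sketch yields 4 + 2*2 + 2*2 = 12 maps, and the only operations beyond the seeds are element-wise function calls.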

preprint, 2023, arXiv

AdaptCL: Adaptive Continual Learning for Tackling Heterogeneity in Sequential Datasets

Managing heterogeneous datasets that vary in complexity, size, and similarity in continual learning presents a significant challenge. Task-agnostic continual learning is necessary to address this challenge, as datasets with varying similarity pose difficulties in distinguishing task boundaries. Conventional task-agnostic continual learning practices typically rely on rehearsal or regularization techniques. However, rehearsal methods may struggle with varying dataset sizes and regulating the importance of old and new data due to rigid buffer sizes. Meanwhile, regularization methods apply generic constraints to promote generalization but can hinder performance when dealing with dissimilar datasets lacking shared features, necessitating a more adaptive approach. In this paper, we propose AdaptCL, a novel adaptive continual learning method to tackle heterogeneity in sequential datasets. AdaptCL employs fine-grained data-driven pruning to adapt to variations in data complexity and dataset size. It also utilizes task-agnostic parameter isolation to mitigate the impact of varying degrees of catastrophic forgetting caused by differences in data similarity. Through a two-pronged case study approach, we evaluate AdaptCL on both datasets of MNIST Variants and DomainNet, as well as datasets from different domains. The latter include both large-scale, diverse binary-class datasets and few-shot, multi-class datasets. Across all these scenarios, AdaptCL consistently exhibits robust performance, demonstrating its flexibility and general applicability in handling heterogeneous datasets.

preprint, 2022, arXiv

EaaS: A Service-Oriented Edge Computing Framework Towards Distributed Intelligence

Edge computing has become a popular paradigm where services and applications are deployed at the network edge closer to the data sources. It provides applications with outstanding benefits, including reduced response latency and enhanced privacy protection. For emerging advanced applications, such as autonomous vehicles, industrial IoT, and metaverse, further research is needed. This is because such applications demand ultra-low latency, hyper-connectivity, and dynamic and reliable service provision, while existing approaches are inadequate to address the new challenges. Hence, we envision that the future edge computing is moving towards distributed intelligence, where heterogeneous edge nodes collaborate to provide services in large-scale and geo-distributed edge infrastructure. We thereby propose Edge-as-a-Service (EaaS) to enable distributed intelligence. EaaS jointly manages large-scale cross-node edge resources and facilitates edge autonomy, edge-to-edge collaboration, and resource elasticity. These features enable flexible deployment of services and ubiquitous computation and intelligence. We first give an overview of existing edge computing studies and discuss their limitations to articulate the motivation for proposing EaaS. Then, we describe the details of EaaS, including the physical architecture, proposed software framework, and benefits of EaaS. Various application scenarios, such as real-time video surveillance, smart building, and metaverse, are presented to illustrate the significance and potential of EaaS. Finally, we discuss several challenging issues of EaaS to inspire more research towards this new edge computing framework.

preprint, 2022, arXiv

From Multi-agent to Multi-robot: A Scalable Training and Evaluation Platform for Multi-robot Reinforcement Learning

Multi-agent reinforcement learning (MARL) has attracted extensive attention from academia and industry in the past few decades. One of the fundamental problems in MARL is how to evaluate different approaches comprehensively. Most existing MARL methods are evaluated in either video games or simplistic simulated scenarios. It remains unknown how these methods perform in real-world scenarios, especially multi-robot systems. This paper introduces a scalable emulation platform for multi-robot reinforcement learning (MRRL) called SMART to meet this need. Specifically, SMART consists of two components: 1) a simulation environment that provides a variety of complex interaction scenarios for training and 2) a real-world multi-robot system for realistic performance evaluation. In addition, SMART offers agent-environment APIs that are plug-and-play for algorithm implementation. To illustrate the practicality of our platform, we conduct a case study on the cooperative driving lane change scenario. Building on the case study, we summarize several unique challenges of MRRL that have rarely been considered before. Finally, we open-source the simulation environments, associated benchmark tasks, and state-of-the-art baselines to encourage and empower MRRL research.

preprint, 2022, arXiv

Hierarchical Reinforcement Learning with Opponent Modeling for Distributed Multi-agent Cooperation

Many real-world applications can be formulated as multi-agent cooperation problems, such as network packet routing and coordination of autonomous vehicles. The emergence of deep reinforcement learning (DRL) provides a promising approach for multi-agent cooperation through the interaction of the agents and environments. However, traditional DRL solutions suffer from the high dimensions of multiple agents with continuous action space during policy search. Besides, the dynamicity of agents' policies makes the training non-stationary. To tackle the issues, we propose a hierarchical reinforcement learning approach with high-level decision-making and low-level individual control for efficient policy search. In particular, the cooperation of multiple agents can be learned in high-level discrete action space efficiently. At the same time, the low-level individual control can be reduced to single-agent reinforcement learning. In addition to hierarchical reinforcement learning, we propose an opponent modeling network to model other agents' policies during the learning process. In contrast to end-to-end DRL approaches, our approach reduces the learning complexity by decomposing the overall task into sub-tasks in a hierarchical way. To evaluate the efficiency of our approach, we conduct a real-world case study in the cooperative lane change scenario. Both simulation and real-world experiments show the superiority of our approach in the collision rate and convergence speed.

preprint, 2021, arXiv

E-Tree Learning: A Novel Decentralized Model Learning Framework for Edge AI

Traditionally, AI models are trained on the central cloud with data collected from end devices. This leads to high communication cost, long response time, and privacy concerns. Recently, edge-empowered AI, namely Edge AI, has been proposed to support AI model learning and deployment at the network edge, closer to the data sources. Existing research, including federated learning, adopts a centralized architecture for model learning in which a central server aggregates the model updates from the clients/workers. The centralized architecture has drawbacks such as performance bottlenecks, poor scalability, and a single point of failure. In this paper, we propose a novel decentralized model learning approach, namely E-Tree, which makes use of a well-designed tree structure imposed on the edge devices. The tree structure and the locations and orders of aggregation on the tree are optimally designed to improve the training convergence and model accuracy. In particular, we design an efficient device clustering algorithm, named KMA, for E-Tree by taking into account the data distribution on the devices as well as the network distance. Evaluation results show that E-Tree significantly outperforms benchmark approaches such as federated learning and Gossip learning under non-IID data in terms of model accuracy and convergence.
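The idea of aggregating model updates along a tree instead of at one central server can be sketched as a bottom-up weighted average. The sketch below is an illustration under stated assumptions, not the E-Tree algorithm itself: the dictionary-based node layout, the key names `weights`, `n`, and `children`, and the sample-count weighting are hypothetical, and the KMA clustering and the aggregation schedule from the paper are not modeled.

```python
def aggregate(node):
    """Aggregate model parameters bottom-up over a tree of edge devices.

    A leaf is {"weights": [...], "n": sample_count}; an inner node is
    {"children": [...]}. Returns (averaged_weights, total_samples), where
    each child contributes in proportion to the samples below it
    (an assumed weighting, in the style of federated averaging).
    """
    children = node.get("children")
    if not children:
        return list(node["weights"]), node["n"]
    total = 0
    acc = None
    for child in children:
        w, n = aggregate(child)
        if acc is None:
            acc = [0.0] * len(w)
        for i, v in enumerate(w):
            acc[i] += v * n
        total += n
    return [v / total for v in acc], total

# Two clusters under one root: the first cluster has two devices.
tree = {
    "children": [
        {"children": [
            {"weights": [1.0, 0.0], "n": 1},
            {"weights": [3.0, 2.0], "n": 3},
        ]},
        {"weights": [0.0, 0.0], "n": 4},
    ]
}
root_weights, root_samples = aggregate(tree)
```

Each inner node only needs the results of its own children, so no single node receives updates from every device directly.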

preprint, 2020, arXiv

Data Dissemination Using Interest Tree in Socially Aware Networking

Socially aware networking (SAN) exploits social characteristics of mobile users to streamline data dissemination protocols in opportunistic environments. Existing protocols in this area have utilized various social features such as user interests, social similarity, and community structure to improve the performance of data dissemination. However, the interrelationship between user interests and its impact on the efficiency of data dissemination has not been explored sufficiently. In this paper, we analyze various kinds of relationships between user interests and model them using a layer-based structure in order to form social communities in the SAN paradigm. We propose Int-Tree, an Interest-Tree based scheme which uses the relationship between user interests to improve the performance of data dissemination. The core of Int-Tree is the interest-tree, a tree-based community structure that combines two social features, i.e., density of a community and social tie, to support data dissemination. The simulation results show that Int-Tree achieves a higher delivery ratio and lower overhead than two benchmark protocols, PROPHET and Epidemic routing. In addition, Int-Tree operates with an average of 1.36 hops and tolerable latency with respect to buffer size, time to live (TTL), and simulation duration. Finally, Int-Tree maintains stable performance across various parameter settings.

preprint, 2020, arXiv

EPARS: Early Prediction of At-risk Students with Online and Offline Learning Behaviors

Early prediction of students at risk (STAR) is an effective and significant means to provide timely intervention for dropout and suicide. Existing work mostly relies on either online or offline learning behaviors alone, which are not comprehensive enough to capture the whole learning process and lead to unsatisfactory prediction performance. We propose a novel algorithm (EPARS) that predicts STAR early in a semester by modeling online and offline learning behaviors. The online behaviors come from the log of activities when students use the online learning management system. The offline behaviors derive from the check-in records of the library. Our main observations are twofold. First, unlike good students, STAR barely have regular and clear study routines. We devised a multi-scale bag-of-regularity method to extract the regularity of learning behaviors that is robust to sparse data. Second, friends of STAR are more likely to be at risk. We constructed a co-occurrence network to approximate the underlying social network and encode the social homophily as features through network embedding. To validate the proposed algorithm, extensive experiments have been conducted at an Asian university with 15,503 undergraduate students. The results indicate that EPARS outperforms baselines by 14.62% to 38.22% in predicting STAR.

preprint, 2015, arXiv

Almost Strong Consistency: "Good Enough" in Distributed Storage Systems

A consistency/latency tradeoff arises as soon as a distributed storage system replicates data. For low latency, modern storage systems often settle for weak consistency conditions, which provide little or, even worse, no guarantee of data consistency. In this paper we propose the notion of almost strong consistency as a better balance in the consistency/latency tradeoff. It provides both deterministically bounded staleness of data versions for each read and probabilistic quantification of the rate of "reading stale values", while achieving low latency. In the context of distributed storage systems, we investigate almost strong consistency in terms of 2-atomicity. Our 2AM (2-Atomicity Maintenance) algorithm completes both reads and writes in one communication round-trip, and guarantees that each read obtains a value within the latest 2 versions. To quantify the rate of "reading stale values", we decompose the so-called "old-new inversion" phenomenon into concurrency patterns and read-write patterns, and propose a stochastic queueing model and a "timed balls-into-bins model" to analyze them, respectively. The theoretical analysis not only demonstrates that "old-new inversions" rarely occur as expected, but also reveals that the read-write pattern dominates in guaranteeing such rare data inconsistencies. These are further confirmed by the experimental results, showing that 2-atomicity is "good enough" in distributed storage systems by achieving low latency, bounded staleness, and rare data inconsistencies.
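The bounded-staleness guarantee ("each read obtains a value within the latest 2 versions") can be made concrete with a small checker. This is a simplified sketch, not the 2AM algorithm: it assumes a single writer issuing consecutively increasing version numbers and a history that is already totally ordered, so concurrency between operations is not modeled. The function names `max_staleness` and `is_k_atomic` are hypothetical.

```python
def max_staleness(history):
    """Return the largest gap between the latest written version and a read.

    history is a list of ("w", version) and ("r", version) pairs in the order
    the operations took effect (assumption: no overlapping operations).
    A staleness of 0 means every read returned the latest version.
    """
    latest = 0  # version 0 stands for the initial value
    worst = 0
    for op, version in history:
        if op == "w":
            if version != latest + 1:
                raise ValueError("writes must use consecutive versions")
            latest = version
        elif op == "r":
            if version > latest:
                raise ValueError("read returned a version not yet written")
            worst = max(worst, latest - version)
        else:
            raise ValueError("unknown operation: %r" % (op,))
    return worst

def is_k_atomic(history, k):
    """True if every read returns one of the latest k versions."""
    return max_staleness(history) < k

ok = [("w", 1), ("r", 1), ("w", 2), ("r", 1), ("w", 3), ("r", 2)]
bad = ok + [("r", 1)]  # version 1 is two versions behind version 3
```

In this simplified setting, `ok` is 2-atomic but not 1-atomic (one read is a single version behind), while `bad` violates 2-atomicity.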

preprint, 2015, arXiv

Understanding the Timed Distributed Trace of a Partially Synchronous System at Runtime

Understanding the timed distributed trace of a cyber-physical system at runtime has gained broad attention; it is often achieved by verifying properties over the observed trace of system execution. However, this verification faces severe challenges. First, in realistic settings, the computing entities only have imperfectly synchronized clocks. A proper timing model is essential to the interpretation of the trace of system execution. Second, the specification should be able to express properties with real-time constraints despite the asynchrony, and the semantics should be interpreted over the currently-observed and continuously-growing trace. To address these challenges, we propose PARO - the partially synchronous system observation framework, which i) adopts the partially synchronous model of time, and introduces the lattice and the timed automata theories to model the trace of system execution; ii) adopts a tailored subset of TCTL to specify temporal properties, and defines the 3-valued semantics to interpret the properties over the currently-observed finite trace; iii) constructs the timed automaton corresponding to the trace at runtime, and reduces the satisfaction of the 3-valued semantics over finite traces to that of the classical boolean semantics over infinite traces. PARO is implemented over MIPA - the open-source middleware we developed. Performance measurements show the cost-effectiveness of PARO in different settings of key environmental factors.

preprint, 2013, arXiv

Verifying PRAM Consistency over Read/Write Traces of Data Replicas

Data replication technologies enable efficient and highly-available data access, and have thus attracted growing interest in both academia and industry. However, data replication introduces the problem of data consistency. Modern commercial data replication systems often provide weak consistency for high availability under certain failure scenarios. An important weak consistency model is Pipelined-RAM (PRAM) consistency. It allows different processes to hold different views of data. To determine whether a data replication system indeed provides PRAM consistency, we study the problem of Verifying PRAM Consistency over read/write traces (or VPC, for short). We first identify four variants of VPC according to a) whether there are Multiple shared variables (or one Single variable), and b) whether write operations can assign Duplicate values (or only Unique values) for each shared variable; the four variants are labeled VPC-SU, VPC-MU, VPC-SD, and VPC-MD. Second, we present a simple VPC-MU algorithm, called RW-CLOSURE. It constructs an operation graph $\mathcal{G}$ by iteratively adding edges according to three rules. Its time complexity is $O(n^5)$, where $n$ is the number of operations in the trace. Third, we present an improved VPC-MU algorithm, called READ-CENTRIC, with time complexity $O(n^4)$. Basically, it attempts to construct the operation graph $\mathcal{G}$ in an incremental and efficient way. Its correctness is based on that of RW-CLOSURE. Finally, we prove that VPC-SD (and hence VPC-MD) is $\sf{NP}$-complete by reducing the strongly $\sf{NP}$-complete problem 3-PARTITION to it.
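To make the verification problem concrete, the sketch below checks PRAM consistency by brute force on small traces. It is neither RW-CLOSURE nor READ-CENTRIC, and it runs in exponential time. It uses the standard definition: for every process, there must be a legal serialization of that process's operations together with all writes of the other processes, respecting each process's program order. The trace layout, the function names, and the initial value 0 are assumptions.

```python
def _serializable(seqs, init):
    """Search for a legal interleaving that keeps each sequence's own order."""
    dead = set()  # (positions, memory) states already shown to fail

    def dfs(pos, mem):
        if all(i == len(s) for i, s in zip(pos, seqs)):
            return True
        key = (pos, tuple(sorted(mem.items())))
        if key in dead:
            return False
        for k, s in enumerate(seqs):
            if pos[k] == len(s):
                continue
            op, var, val = s[pos[k]]
            nxt = pos[:k] + (pos[k] + 1,) + pos[k + 1:]
            if op == "w":
                if dfs(nxt, {**mem, var: val}):
                    return True
            elif mem.get(var, init) == val and dfs(nxt, mem):
                # a read is legal only if it returns the current value
                return True
        dead.add(key)
        return False

    return dfs(tuple(0 for _ in seqs), {})

def pram_consistent(trace, init=0):
    """trace maps a process name to its list of (op, variable, value) tuples."""
    for p in trace:
        # p's view: all of p's operations plus only the writes of the others
        seqs = [ops if q == p else [o for o in ops if o[0] == "w"]
                for q, ops in trace.items()]
        if not _serializable(seqs, init):
            return False
    return True

# Two readers may observe writes of *different* processes in different orders.
allowed = {
    "p1": [("w", "x", 1)],
    "p2": [("w", "x", 2)],
    "p3": [("r", "x", 1), ("r", "x", 2)],
    "p4": [("r", "x", 2), ("r", "x", 1)],
}
# Writes of the *same* process must be observed in program order.
violating = {
    "p1": [("w", "x", 1), ("w", "x", 2)],
    "p2": [("r", "x", 2), ("r", "x", 1)],
}
```

The first trace is PRAM consistent because each reader can order the two independent writes differently; the second is not, because no view can place the write of 2 before the write of 1.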

preprint, 2012, arXiv

MLLS: Minimum Length Link Scheduling Under Physical Interference Model

We study a fundamental problem called Minimum Length Link Scheduling (MLLS), which is crucial to the efficient operation of wireless networks. Given a set of communication links of arbitrary length spread in a wireless network, each with one unit of traffic demand, the MLLS problem seeks a schedule for all links (to satisfy all demands) with the minimum number of time-slots such that the links assigned to the same time-slot do not conflict with each other under the physical interference model. In this paper, we explore this problem under three important transmission power control settings: linear power control, uniform power control, and arbitrary power control. We design a suite of new scheduling algorithms and conduct explicit complexity analysis to demonstrate their efficiency. Our algorithms can account for the presence of background noise in wireless networks. We also investigate the fractional case of the MLLS problem, where each link has a fractional demand. We propose an efficient greedy algorithm with an approximation ratio of at most $(K+1)^{2}\omega$.
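The scheduling constraint can be illustrated with a first-fit heuristic under uniform power. This sketch is not one of the paper's algorithms and carries no approximation guarantee. It assumes the usual form of the physical interference model, in which a link succeeds when its signal-to-interference-plus-noise ratio (SINR) is at least a threshold; the parameter names `alpha` (path-loss exponent), `beta` (SINR threshold), and `noise`, along with their values, are illustrative.

```python
import math

def sinr(i, slot, links, power=1.0, alpha=3.0, noise=1e-3):
    """SINR at the receiver of link i when every link in slot transmits."""
    sender, receiver = links[i]
    signal = power / math.dist(sender, receiver) ** alpha
    interference = sum(
        power / math.dist(links[j][0], receiver) ** alpha
        for j in slot if j != i
    )
    return signal / (noise + interference)

def first_fit_schedule(links, beta=5.0, **model):
    """Assign each link to the first time-slot that stays feasible.

    links is a list of (sender_xy, receiver_xy) pairs with distinct points.
    Links are processed from shortest to longest; a link that fits nowhere
    opens a new slot (even if it would be infeasible on its own).
    """
    order = sorted(range(len(links)), key=lambda i: math.dist(*links[i]))
    slots = []
    for i in order:
        for slot in slots:
            trial = slot + [i]
            if all(sinr(j, trial, links, **model) >= beta for j in trial):
                slot.append(i)
                break
        else:
            slots.append([i])
    return slots

links = [
    ((0.0, 0.0), (1.0, 0.0)),      # link 0
    ((100.0, 0.0), (101.0, 0.0)),  # link 1: far from link 0
    ((0.0, 1.0), (1.0, 1.0)),      # link 2: right next to link 0
]
schedule = first_fit_schedule(links)
```

Links 0 and 1 are far enough apart to share a slot, while link 2 would push the SINR of link 0 below the threshold and is placed in a second slot.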

preprint, 2011, arXiv

Design of a Sliding Window over Asynchronous Event Streams

The proliferation of sensing and monitoring applications motivates the adoption of the event stream model of computation. Though sliding windows are widely used to facilitate effective event stream processing, they are greatly challenged when the event sources are distributed and asynchronous. To address this challenge, we first show that the snapshots of the asynchronous event streams within the sliding window form a convex distributive lattice (denoted by Lat-Win). Then we propose an algorithm to maintain Lat-Win at runtime. The Lat-Win maintenance algorithm is implemented and evaluated on the open-source context-aware middleware we developed. The evaluation results first show the necessity of adopting sliding windows over asynchronous event streams. Then they show the performance of detecting specified predicates within Lat-Win, even when faced with dynamic changes in the computing environment.
