Researcher profile

Shangguang Wang

Shangguang Wang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
11 works
0 followers
8 topics
4 close collaborators

Published work

11 published items

Preprint, 2026, arXiv

GroupNL: Low-Resource and Robust CNN Design over Cloud and Device

Deploying Convolutional Neural Network (CNN) models on ubiquitous Internet of Things (IoT) devices in a cloud-assisted manner to provide users with a variety of high-quality services has become mainstream. Most existing studies speed up cloud training and on-device inference by reducing the number of convolution (Conv) parameters and floating-point operations (FLOPs). However, they usually employ two or more lightweight operations (e.g., depthwise Conv, $1\times1$ cheap Conv) to replace a Conv, which can still limit the model's speedup even with fewer parameters and FLOPs. To this end, we propose the Grouped NonLinear transformation generation method (GroupNL), leveraging data-agnostic, hyperparameter-fixed, and lightweight Nonlinear Transformation Functions (NLFs) to generate diversified feature maps on demand via grouping, thereby reducing resource consumption while improving the robustness of CNNs. First, in a GroupNL Conv layer, a small set of feature maps, i.e., seed feature maps, is generated by the seed Conv operation. Then, we split the seed feature maps into several groups, each with a set of different NLFs, to generate the required number of diversified feature maps with tensor manipulation operators and nonlinear processing in a lightweight manner without additional Conv operations. We further introduce a sparse GroupNL Conv that speeds up computation by suitably choosing the seed Conv groups between the number of input channels and the number of seed feature maps. Experiments conducted on benchmarks and on-device resource measurements demonstrate that the GroupNL Conv is an impressive alternative to Conv layers in baseline models. Specifically, on the Icons-50 dataset, the accuracy of GroupNL-ResNet-18 is 2.86% higher than that of ResNet-18; on the ImageNet-C dataset, the accuracy of GroupNL-EfficientNet-ES is about 1.1% higher than that of EfficientNet-ES.
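
The grouping step described in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the particular NLFs, the rule that assigns NLFs to groups, the concatenation of seed maps with generated maps, and the names `NLFS`, `apply_nlf`, and `group_nl_expand` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

# Hypothetical nonlinear transformation functions (NLFs). The abstract does
# not say which functions are used; these are placeholders.
NLFS = [
    math.sin,
    math.tanh,
    lambda x: x * x,
    lambda x: max(x, 0.0),
]


def apply_nlf(fn, feature_map):
    """Apply a scalar function element-wise to a 2-D feature map."""
    return [[fn(value) for value in row] for row in feature_map]


def group_nl_expand(seed_maps, num_groups, per_seed):
    """Expand seed feature maps into more maps without extra convolutions.

    seed_maps: list of 2-D feature maps (lists of rows of floats).
    num_groups: number of groups the seed maps are split into.
    per_seed: number of maps generated from each seed map.
    Returns the seed maps followed by the generated maps (an assumption).
    """
    if num_groups <= 0 or len(seed_maps) % num_groups != 0:
        raise ValueError("seed maps must split evenly into groups")
    if per_seed > len(NLFS):
        raise ValueError("not enough distinct NLFs for per_seed")
    group_size = len(seed_maps) // num_groups
    generated = []
    for index, feature_map in enumerate(seed_maps):
        group = index // group_size
        for j in range(per_seed):
            # Assumed assignment rule: each group starts at a different
            # offset in NLFS, so different groups use different NLF sets.
            fn = NLFS[(group + j) % len(NLFS)]
            generated.append(apply_nlf(fn, feature_map))
    return seed_maps + generated


seeds = [[[0.0, 1.0]], [[-2.0, 3.0]]]  # two 1x2 seed feature maps
expanded = group_nl_expand(seeds, num_groups=2, per_seed=2)
print(len(expanded))
```

With two seed maps and two generated maps per seed, the call yields six feature maps while only the seed maps would require a convolution.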

Preprint, 2026, arXiv

Knowledge Poisoning Attacks on Medical Multi-Modal Retrieval-Augmented Generation

Retrieval-augmented generation (RAG) is a widely adopted paradigm for enhancing LLMs in medical applications by incorporating expert multimodal knowledge during generation. However, the underlying retrieval databases may naturally contain, or be intentionally injected with, adversarial knowledge, which can perturb model outputs and undermine system reliability. To investigate this risk, prior studies have explored knowledge poisoning attacks in medical RAG systems. Nevertheless, most of them rely on the strong assumption that adversaries possess prior knowledge of user queries, which is unrealistic in deployments and substantially limits their practical applicability. In this paper, we propose M³Att, a knowledge-poisoning framework designed for medical multimodal RAG systems, assuming only limited distribution knowledge of the underlying database. Our core idea is to inject covert misinformation into textual data while using paired visual data as a query-agnostic trigger to promote retrieval. We first propose a unified framework that introduces imperceptible perturbations to visual inputs to manipulate retrieval probabilities. In addition, because LLMs carry prior medical knowledge, naively poisoned medical content with explicit factual errors can be corrected during generation. Thus, we leverage the inherent ambiguity of medical diagnosis and design a covert misinformation injection strategy that degrades diagnostic accuracy while evading model self-correction. Experiments on five LLMs and datasets demonstrate that M³Att consistently produces clinically plausible yet incorrect generations. Code: https://github.com/ypr17/M3Att.

Preprint, 2022, arXiv

Benchmarking of DL Libraries and Models on Mobile Devices

Deploying deep learning (DL) on mobile devices has been a notable trend in recent years. To support fast inference of on-device DL, DL libraries play a critical role, as algorithms and hardware do. Unfortunately, no prior work has examined the ecosystem of modern DL libs in depth and provided quantitative results on their performance. In this paper, we first build a comprehensive benchmark that includes 6 representative DL libs and 15 diversified DL models. We then perform extensive experiments on 10 mobile devices, which help reveal a complete landscape of the current mobile DL libs ecosystem. For example, we find that the best-performing DL lib is severely fragmented across different models and hardware, and the gap between those DL libs can be rather huge. In fact, the impacts of DL libs can overwhelm the optimizations from algorithms or hardware, e.g., model quantization and GPU/DSP-based heterogeneous computing. Finally, based on these observations, we summarize practical implications for different roles in the DL lib ecosystem.

Preprint, 2022, arXiv

Device-centric Federated Analytics At Ease

Nowadays, mobile devices generate high-volume and privacy-sensitive data, which are better kept on devices and queried on demand. However, data analysts still lack a uniform way to harness such distributed on-device data. In this paper, we propose a data querying system, Deck, that enables flexible device-centric federated analytics. The key idea of Deck is to bypass the app developers and allow the data analysts to directly submit their analytics code to run on devices, through a centralized query coordinator service. Deck provides a list of standard APIs to data analysts and handles most of the device-specific tasks underneath. Deck further incorporates two key techniques: (i) a hybrid permission checking mechanism and mandatory cross-device aggregation to ensure data privacy; (ii) a zero-knowledge statistical model that judiciously trades off query delay and query resource expenditure on devices. We fully implement Deck and plug it into 20 popular Android apps. An in-the-wild deployment with 1,642 volunteers shows that Deck significantly reduces the query delay by up to 30x compared to baselines. Our microbenchmarks also demonstrate that the standalone overhead of Deck is negligible.
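
To make the idea of mandatory cross-device aggregation concrete, here is a minimal sketch of a coordinator-side step that releases only an aggregate and withholds it when too few devices responded. This is not Deck's API: the function name `aggregate_across_devices`, the `min_devices` threshold, and the choice of sum and mean as the aggregate are assumptions for illustration, and the abstract does not describe how Deck enforces aggregation.

```python
def aggregate_across_devices(device_results, min_devices=3):
    """Combine per-device numeric results into a single aggregate.

    device_results: dict mapping a device identifier to one numeric result.
    min_devices: hypothetical minimum number of contributing devices below
        which nothing is released (an assumed privacy threshold).
    Returns a dict with only aggregate values, or None if too few devices
    contributed. Per-device values are never returned.
    """
    count = len(device_results)
    if count < min_devices:
        return None
    total = sum(device_results.values())
    return {"devices": count, "sum": total, "mean": total / count}


# Three devices report a locally computed statistic.
print(aggregate_across_devices({"d1": 1.0, "d2": 2.0, "d3": 3.0}))
# Two devices are below the assumed threshold, so nothing is released.
print(aggregate_across_devices({"d1": 1.0, "d2": 2.0}))
```

The design choice illustrated is that the analyst only ever sees the combined value, so a single device's data cannot be read directly from a query result.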

Preprint, 2022, arXiv

Federated Neural Architecture Search

To preserve user privacy while enabling mobile intelligence, techniques have been proposed to train deep neural networks on decentralized data. However, training over decentralized data makes the design of neural architectures, already difficult, even harder. Such difficulty is further amplified when designing and deploying different neural architectures for heterogeneous mobile platforms. In this work, we propose incorporating automatic neural architecture search into decentralized training, as a new DNN training paradigm called Federated Neural Architecture Search, namely federated NAS. To deal with the primary challenge of limited on-client computational and communication resources, we present FedNAS, a highly optimized framework for efficient federated NAS. FedNAS fully exploits the key opportunity of insufficient model candidate re-training during the architecture search process, and incorporates three key optimizations: parallel candidates training on partial clients, early dropping candidates with inferior performance, and dynamic round numbers. Tested on large-scale datasets and typical CNN architectures, FedNAS achieves model accuracy comparable to that of a state-of-the-art NAS algorithm that trains models with centralized data, and also reduces the client cost by up to two orders of magnitude compared to a straightforward design of federated NAS.
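
The early-dropping optimization mentioned in this abstract can be sketched as a simple elimination loop. This is an illustrative sketch under stated assumptions, not FedNAS itself: the function `search_with_early_dropping`, the `train_round` callback, and the fixed `keep_fraction` are hypothetical, and the abstract does not give the actual dropping criterion or round schedule.

```python
def search_with_early_dropping(candidates, train_round, rounds, keep_fraction=0.5):
    """Train candidates round by round and drop the weakest after each round.

    candidates: iterable of hashable candidate architecture identifiers.
    train_round: hypothetical callback (candidate, round_index) -> score,
        standing in for one round of partial training and evaluation.
    rounds: maximum number of rounds to run.
    keep_fraction: assumed share of candidates kept after every round.
    Returns the best surviving candidate.
    """
    alive = list(candidates)
    if not alive:
        raise ValueError("at least one candidate is required")
    for round_index in range(rounds):
        scores = {c: train_round(c, round_index) for c in alive}
        # Rank the surviving candidates by their latest score.
        alive.sort(key=lambda c: scores[c], reverse=True)
        keep = max(1, int(len(alive) * keep_fraction))
        alive = alive[:keep]  # candidates with inferior scores are dropped
        if len(alive) == 1:
            break
    return alive[0]


# Toy scores standing in for validation accuracy after each round.
base = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.3}
best = search_with_early_dropping(base, lambda c, r: base[c] + 0.01 * r, rounds=3)
print(best)  # prints "b"
```

Dropping weak candidates early means later rounds spend client compute and communication only on the candidates still in contention.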

Preprint, 2022, arXiv

From Earth to Space: A First Deployment of 5G Core Network on Satellite

Recent developments in the aerospace industry have led to a dramatic reduction in the manufacturing and launch costs of low Earth orbit satellites. This trend enables a paradigm shift toward satellite-terrestrial integrated networks with global coverage. In particular, the integration of 5G communication systems and satellites has the potential to restructure next-generation mobile networks. By leveraging network function virtualization and network slicing, orbital 5G core networks will facilitate the coordination and management of network functions in satellite-terrestrial integrated networks. We are the first to deploy a lightweight 5G core network on a real-world satellite to investigate its feasibility. We conducted experiments to validate the onboard 5G core network functions. The validated procedures include registration and session setup procedures. The results show that the 5G core network can function normally and generate correct signaling.

Preprint, 2022, arXiv

Incorporating Distributed DRL into Storage Resource Optimization of Space-Air-Ground Integrated Wireless Communication Network

Space-air-ground integrated network (SAGIN) is a new type of wireless network mode. The effective management of SAGIN resources is a prerequisite for high-reliability communication. However, the storage capacity of the space-air network segment is extremely limited. The air servers also do not have sufficient storage resources to centrally accommodate the information uploaded by each edge server. This raises the problem of how to coordinate the storage resources of SAGIN. This paper proposes a SAGIN storage resource management algorithm based on distributed deep reinforcement learning (DRL). The resource management process is modeled as a Markov decision model. In each edge physical domain, we extract the network attributes represented by storage resources for the agent to build a training environment, so as to realize distributed training. In addition, we propose a SAGIN resource management framework based on distributed DRL. Simulation results show that the agent trains effectively. Compared with other algorithms, the resource allocation revenue and user request acceptance rate of the proposed algorithm are increased by about 18.15% and 8.35%, respectively. Moreover, the proposed algorithm adapts flexibly to changes in resource conditions.

Preprint, 2022, arXiv

Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading

This paper proposes Mandheling, the first system that enables highly resource-efficient on-device training by orchestrating mixed-precision training with on-chip Digital Signal Processing (DSP) offloading. Mandheling fully exploits the advantages of DSP in integer-based numerical calculation through four novel techniques: (1) a CPU-DSP co-scheduling scheme to mitigate the overhead from DSP-unfriendly operators; (2) a self-adaptive rescaling algorithm to reduce the overhead of dynamic rescaling in backward propagation; (3) a batch-splitting algorithm to improve the DSP cache efficiency; (4) a DSP-compute subgraph reusing mechanism to eliminate the preparation overhead on DSP. We have fully implemented Mandheling and demonstrate its effectiveness through extensive experiments. The results show that, compared to the state-of-the-art DNN engines from TFLite and MNN, Mandheling reduces the per-batch training time by 5.5$\times$ and the energy consumption by 8.9$\times$ on average. In end-to-end training tasks, Mandheling reduces convergence time by up to 10.7$\times$ and energy consumption by up to 13.1$\times$, with only 1.9%-2.7% accuracy loss compared to the FP32 precision setting.

Preprint, 2022, arXiv

Towards Sustainable Satellite Edge Computing

Low Earth Orbit (LEO) satellites have recently developed rapidly, and satellite edge computing has emerged to address the limitations of the bent-pipe architecture in existing satellite systems. Introducing energy-consuming computing components in satellite edge computing increases the depth of battery discharge. This shortens battery life and affects the satellites' operation in orbit. In this paper, we aim to extend battery life by minimizing the depth of discharge for Earth observation missions. Facing the challenges of wireless uncertainty and energy harvesting dynamics, our work develops an online energy scheduling algorithm within an online convex optimization framework. Our algorithm achieves sub-linear regret, and the constraint violation asymptotically approaches zero. Simulation results show that our algorithm can reduce the depth of discharge significantly.

Preprint, 2021, arXiv

Tiansuan Constellation: An Open Research Platform

Satellite networks are the first step toward interstellar voyages. They can provide Internet connectivity everywhere on Earth, where most areas cannot access the Internet through terrestrial infrastructure because of geographic inaccessibility and high cost. The space industry is seeing a rise in large low-earth-orbit satellite constellations that aim to achieve universal connectivity. The research community also urgently needs to conduct leading research to bridge the connectivity divide. Researchers currently conduct their work by simulation, which is far from enough. However, experiments on real satellites are blocked by the high threshold of space technology, such as deployment cost and unknown risks. To resolve this dilemma and contribute to universal connectivity, we build an open research platform, the Tiansuan constellation, to support experiments on real satellite networks. We discuss the potential research topics that would benefit from the Tiansuan constellation. We provide two case studies that have already been deployed on two experimental satellites of the Tiansuan constellation.

Preprint, 2020, arXiv

Cooperative Service Caching and Workload Scheduling in Mobile Edge Computing

Mobile edge computing helps reduce service response time and core network traffic by pushing cloud functionalities to the network edge. Equipped with storage and computation capacities, edge nodes can cache services of resource-intensive and delay-sensitive mobile applications and process the corresponding computation tasks without outsourcing to central clouds. However, the heterogeneity of edge resource capacities and the inconsistency of edge storage and computation capacities make it difficult to fully and jointly utilize the storage and computation capacities when there is no cooperation among edge nodes. To address this issue, we consider cooperation among edge nodes and investigate cooperative service caching and workload scheduling in mobile edge computing. This problem can be formulated as a mixed integer nonlinear programming problem, which has non-polynomial computation complexity. To overcome the challenges of subproblem coupling, computation-communication tradeoff, and edge node heterogeneity, we develop an iterative algorithm called ICE. This algorithm is designed based on Gibbs sampling, which has provably near-optimal results, and the idea of water filling, which has polynomial computation complexity. Simulations are conducted and the results demonstrate that our algorithm can jointly reduce the service response time and the outsourcing traffic compared with the benchmark algorithms.
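
For readers unfamiliar with the water-filling idea that this abstract mentions, the sketch below shows the textbook version: a fixed budget is poured over bins with different base levels until the filled bins reach a common level. It is not the ICE algorithm, and how ICE applies water filling to workload scheduling is not described in the abstract. The function name `water_filling` and the interpretation of base levels (for example, existing load per edge node) are assumptions for illustration.

```python
def water_filling(base_levels, budget):
    """Textbook water filling: equalize the filled level across bins.

    base_levels: list of non-negative base levels, one per bin.
    budget: total non-negative amount to distribute.
    Returns (level, allocation), where allocation[i] = max(0, level - base_levels[i])
    and the allocations sum to the budget.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    order = sorted(range(len(base_levels)), key=lambda i: base_levels[i])
    level = 0.0
    prefix = 0.0
    for k, i in enumerate(order, start=1):
        prefix += base_levels[i]
        # Common level if the budget were spread over the k lowest bins.
        candidate = (budget + prefix) / k
        if k == 1 or candidate > base_levels[i]:
            level = candidate
        else:
            break  # the k-th bin already sits above the water level
    allocation = [max(0.0, level - b) for b in base_levels]
    return level, allocation


level, allocation = water_filling([1.0, 2.0, 5.0], budget=3.0)
print(level, allocation)  # prints 3.0 [2.0, 1.0, 0.0]
```

Sorting dominates the cost, so the procedure runs in O(n log n) time, which is consistent with the polynomial complexity the abstract attributes to the water-filling idea.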