Source author record

Minghong Fang

Minghong Fang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Cryptography and Security Distributed, Parallel, and Cluster Computing Networking and Internet Architecture Information Retrieval Artificial Intelligence Human-Computer Interaction

Catalog footprint

What is connected

8works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Network Digital Untwinning: Towards Backward Optimization of Digital Twins

Network digital twins (NDTs) are transforming network management by offering precise virtual replicas of physical network systems. However, their reliance on diverse and sensitive data introduces significant challenges related to data management, regulatory compliance, and user privacy. In scenarios where selective data removal is necessary, such as device deactivation, network reconfiguration, or regulatory compliance, traditional approaches often fall short of preserving the integrity of the twin model. To address this gap, we introduce a network digital untwinning framework that enables the targeted removal of deprecated NDT contributions while maintaining model integrity. Our approach comprises two complementary components: Single Request Untwinning (\algO) and Parallel Request Untwinning (\algM) mechanisms. \algO leverages connectivity metrics based on geographical proximity, data distribution, and network-level attributes to identify and remove the target NDT along with its propagating influence. This is achieved through an optimally selected rollback checkpoint augmented with injected Gaussian noise, followed by a precise remapping phase. \algM extends this mechanism to efficiently handle multiple removal requests by clustering NDTs with similar attributes and performing a coordinated rollback and untwinning schedule. We provide theoretical guarantees on model indistinguishability from scratch-built twins, and validate the framework through extensive experiments on real-world traffic data, demonstrating its effectiveness and operational efficiency.

preprint2026arXiv

Practical Poisoning Attacks against Retrieval-Augmented Generation

Large language models (LLMs) have demonstrated impressive natural language processing abilities but face challenges such as hallucination and outdated knowledge. Retrieval-Augmented Generation (RAG) has emerged as a state-of-the-art approach to mitigate these issues. While RAG enhances LLM outputs, it remains vulnerable to poisoning attacks. Recent studies show that injecting poisoned text into the knowledge database can compromise RAG systems, but most existing attacks assume that the attacker can insert a sufficient number of poisoned texts per query to outnumber correct-answer texts in retrieval, an assumption that is often unrealistic. To address this limitation, we propose CorruptRAG, a practical poisoning attack against RAG systems in which the attacker injects only a single poisoned text, enhancing both feasibility and stealth. Extensive experiments conducted on multiple large-scale datasets demonstrate that CorruptRAG achieves higher attack success rates than existing baselines.

preprint2022arXiv

FLTrust: Byzantine-robust Federated Learning via Trust Bootstrapping

Byzantine-robust federated learning aims to enable a service provider to learn an accurate global model when a bounded number of clients are malicious. The key idea of existing Byzantine-robust federated learning methods is that the service provider performs statistical analysis among the clients' local model updates and removes suspicious ones, before aggregating them to update the global model. However, malicious clients can still corrupt the global models in these methods via sending carefully crafted local model updates to the service provider. The fundamental reason is that there is no root of trust in existing federated learning methods. In this work, we bridge the gap via proposing FLTrust, a new federated learning method in which the service provider itself bootstraps trust. In particular, the service provider itself collects a clean small training dataset (called root dataset) for the learning task and the service provider maintains a model (called server model) based on it to bootstrap trust. In each iteration, the service provider first assigns a trust score to each local model update from the clients, where a local model update has a lower trust score if its direction deviates more from the direction of the server model update. Then, the service provider normalizes the magnitudes of the local model updates such that they lie in the same hyper-sphere as the server model update in the vector space. Our normalization limits the impact of malicious local model updates with large magnitudes. Finally, the service provider computes the average of the normalized local model updates weighted by their trust scores as a global model update, which is used to update the global model. Our extensive evaluations on six datasets from different domains show that our FLTrust is secure against both existing attacks and strong adaptive attacks.

preprint2022arXiv

NET-FLEET: Achieving Linear Convergence Speedup for Fully Decentralized Federated Learning with Heterogeneous Data

Federated learning (FL) has received a surge of interest in recent years thanks to its benefits in data privacy protection, efficient communication, and parallel data processing. Also, with appropriate algorithmic designs, one could achieve the desirable linear speedup for convergence effect in FL. However, most existing works on FL are limited to systems with i.i.d. data and centralized parameter servers and results on decentralized FL with heterogeneous datasets remains limited. Moreover, whether or not the linear speedup for convergence is achievable under fully decentralized FL with data heterogeneity remains an open question. In this paper, we address these challenges by proposing a new algorithm, called NET-FLEET, for fully decentralized FL systems with data heterogeneity. The key idea of our algorithm is to enhance the local update scheme in FL (originally intended for communication efficiency) by incorporating a recursive gradient correction technique to handle heterogeneous datasets. We show that, under appropriate parameter settings, the proposed NET-FLEET algorithm achieves a linear speedup for convergence. We further conduct extensive numerical experiments to evaluate the performance of the proposed NET-FLEET algorithm and verify our theoretical findings.

preprint2021arXiv

Data Poisoning Attacks and Defenses to Crowdsourcing Systems

A key challenge of big data analytics is how to collect a large volume of (labeled) data. Crowdsourcing aims to address this challenge via aggregating and estimating high-quality data (e.g., sentiment label for text) from pervasive clients/users. Existing studies on crowdsourcing focus on designing new methods to improve the aggregated data quality from unreliable/noisy clients. However, the security aspects of such crowdsourcing systems remain under-explored to date. We aim to bridge this gap in this work. Specifically, we show that crowdsourcing is vulnerable to data poisoning attacks, in which malicious clients provide carefully crafted data to corrupt the aggregated data. We formulate our proposed data poisoning attacks as an optimization problem that maximizes the error of the aggregated data. Our evaluation results on one synthetic and two real-world benchmark datasets demonstrate that the proposed attacks can substantially increase the estimation errors of the aggregated data. We also propose two defenses to reduce the impact of malicious clients. Our empirical results show that the proposed defenses can substantially reduce the estimation errors of the data poisoning attacks.

preprint2020arXiv

Influence Function based Data Poisoning Attacks to Top-N Recommender Systems

Recommender system is an essential component of web services to engage users. Popular recommender systems model user preferences and item properties using a large amount of crowdsourced user-item interaction data, e.g., rating scores; then top-$N$ items that match the best with a user's preference are recommended to the user. In this work, we show that an attacker can launch a data poisoning attack to a recommender system to make recommendations as the attacker desires via injecting fake users with carefully crafted user-item interaction data. Specifically, an attacker can trick a recommender system to recommend a target item to as many normal users as possible. We focus on matrix factorization based recommender systems because they have been widely deployed in industry. Given the number of fake users the attacker can inject, we formulate the crafting of rating scores for the fake users as an optimization problem. However, this optimization problem is challenging to solve as it is a non-convex integer programming problem. To address the challenge, we develop several techniques to approximately solve the optimization problem. For instance, we leverage influence function to select a subset of normal users who are influential to the recommendations and solve our formulated optimization problem based on these influential users. Our results show that our attacks are effective and outperform existing methods.

preprint2020arXiv

Private and Communication-Efficient Edge Learning: A Sparse Differential Gaussian-Masking Distributed SGD Approach

With rise of machine learning (ML) and the proliferation of smart mobile devices, recent years have witnessed a surge of interest in performing ML in wireless edge networks. In this paper, we consider the problem of jointly improving data privacy and communication efficiency of distributed edge learning, both of which are critical performance metrics in wireless edge network computing. Toward this end, we propose a new decentralized stochastic gradient method with sparse differential Gaussian-masked stochastic gradients (SDM-DSGD) for non-convex distributed edge learning. Our main contributions are three-fold: i) We theoretically establish the privacy and communication efficiency performance guarantee of our SDM-DSGD method, which outperforms all existing works; ii) We show that SDM-DSGD improves the fundamental training-privacy trade-off by {\em two orders of magnitude} compared with the state-of-the-art. iii) We reveal theoretical insights and offer practical design guidelines for the interactions between privacy preservation and communication efficiency, two conflicting performance goals. We conduct extensive experiments with a variety of learning models on MNIST and CIFAR-10 datasets to verify our theoretical findings. Collectively, our results contribute to the theory and algorithm design for distributed edge learning.

preprint2020arXiv

Toward Low-Cost and Stable Blockchain Networks

Envisioned to be the future of secured distributed systems, blockchain networks have received increasing attention from both the industry and academia in recent years. However, blockchain mining processes demand high hardware costs and consume a vast amount of energy (studies have shown that the amount of energy consumed in Bitcoin mining is almost the same as the electricity used in Ireland). To address the high mining cost problem of blockchain networks, in this paper, we propose a blockchain mining resources allocation algorithm to reduce the mining cost in PoW-based (proof-of-work-based) blockchain networks. We first propose an analytical queueing model for general blockchain networks. In our queueing model, transactions arrive randomly to the queue and are served in a batch manner with unknown service rate probability distribution and agnostic to any priority mechanism. Then, we leverage the Lyapunov optimization techniques to propose a dynamic mining resources allocation algorithm (DMRA), which is parameterized by a tuning parameter $K>0$. We show that our algorithm achieves an $[O(1/K), O(K)]$ cost-optimality-gap-vs-delay tradeoff. Our simulation results also demonstrate the effectiveness of DMRA in reducing mining costs.

Minghong Fang

What is connected

Connect this record

See the researcher in context

Building this map preview

8 published item(s)

Network Digital Untwinning: Towards Backward Optimization of Digital Twins

Practical Poisoning Attacks against Retrieval-Augmented Generation

FLTrust: Byzantine-robust Federated Learning via Trust Bootstrapping

NET-FLEET: Achieving Linear Convergence Speedup for Fully Decentralized Federated Learning with Heterogeneous Data

Data Poisoning Attacks and Defenses to Crowdsourcing Systems

Influence Function based Data Poisoning Attacks to Top-N Recommender Systems

Private and Communication-Efficient Edge Learning: A Sparse Differential Gaussian-Masking Distributed SGD Approach

Toward Low-Cost and Stable Blockchain Networks