Source author record

Ziyao Liu

Ziyao Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Cryptography and Security Distributed, Parallel, and Cluster Computing Artificial Intelligence Machine Learning

Catalog footprint

What is connected

6works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Efficient Privacy-Preserving Retrieval Augmented Generation with Distance-Preserving Encryption

RAG has emerged as a key technique for enhancing response quality of LLMs without high computational cost. In traditional architectures, RAG services are provided by a single entity that hosts the dataset within a trusted local environment. However, individuals or small organizations often lack the resources to maintain data storage servers, leading them to rely on outsourced cloud storage. This dependence on untrusted third-party services introduces privacy risks. Embedding-based retrieval mechanisms, commonly used in RAG systems, are vulnerable to privacy leakage such as vector-to-text reconstruction attacks and structural leakage via vector analysis. Several privacy-preserving RAG techniques have been proposed but most existing approaches rely on partially homomorphic encryption, which incurs substantial computational overhead. To address these challenges, we propose an efficient privacy-preserving RAG framework (ppRAG) tailored for untrusted cloud environments that defends against vector-to-text attack, vector analysis, and query analysis. We propose Conditional Approximate Distance-Comparison-Preserving Symmetric Encryption (CAPRISE) that encrypts embeddings while still allowing the cloud to compute similarity between an encrypted query and the encrypted database embeddings. CAPRISE preserves only the relative distance ordering between the encrypted query and each encrypted database embedding, without exposing inter-database distances, thereby enhancing both privacy and efficiency. To mitigate query analysis, we introduce DP by perturbing the query embedding prior to encryption, preventing the cloud from inferring sensitive patterns. Experimental results show that ppRAG achieves efficient processing throughput, high retrieval accuracy, strong privacy guarantees, making it a practical solution for resource-constrained users seeking secure cloud-augmented LLMs.

preprint2022arXiv

Efficient Dropout-resilient Aggregation for Privacy-preserving Machine Learning

With the increasing adoption of data-hungry machine learning algorithms, personal data privacy has emerged as one of the key concerns that could hinder the success of digital transformation. As such, Privacy-Preserving Machine Learning (PPML) has received much attention from both academia and industry. However, organizations are faced with the dilemma that, on the one hand, they are encouraged to share data to enhance ML performance, but on the other hand, they could potentially be breaching the relevant data privacy regulations. Practical PPML typically allows multiple participants to individually train their ML models, which are then aggregated to construct a global model in a privacy-preserving manner, e.g., based on multi-party computation or homomorphic encryption. Nevertheless, in most important applications of large-scale PPML, e.g., by aggregating clients' gradients to update a global model for federated learning, such as consumer behavior modeling of mobile application services, some participants are inevitably resource-constrained mobile devices, which may drop out of the PPML system due to their mobility nature. Therefore, the resilience of privacy-preserving aggregation has become an important problem to be tackled. In this paper, we propose a scalable privacy-preserving aggregation scheme that can tolerate dropout by participants at any time, and is secure against both semi-honest and active malicious adversaries by setting proper system parameters. By replacing communication-intensive building blocks with a seed homomorphic pseudo-random generator, and relying on the additive homomorphic property of Shamir secret sharing scheme, our scheme outperforms state-of-the-art schemes by up to 6.37$\times$ in runtime and provides a stronger dropout-resilience. The simplicity of our scheme makes it attractive both for implementation and for further improvements.

preprint2022arXiv

Privacy-Preserving Aggregation in Federated Learning: A Survey

Over the recent years, with the increasing adoption of Federated Learning (FL) algorithms and growing concerns over personal data privacy, Privacy-Preserving Federated Learning (PPFL) has attracted tremendous attention from both academia and industry. Practical PPFL typically allows multiple participants to individually train their machine learning models, which are then aggregated to construct a global model in a privacy-preserving manner. As such, Privacy-Preserving Aggregation (PPAgg) as the key protocol in PPFL has received substantial research interest. This survey aims to fill the gap between a large number of studies on PPFL, where PPAgg is adopted to provide a privacy guarantee, and the lack of a comprehensive survey on the PPAgg protocols applied in FL systems. In this survey, we review the PPAgg protocols proposed to address privacy and security issues in FL systems. The focus is placed on the construction of PPAgg protocols with an extensive analysis of the advantages and disadvantages of these selected PPAgg protocols and solutions. Additionally, we discuss the open-source FL frameworks that support PPAgg. Finally, we highlight important challenges and future research directions for applying PPAgg to FL systems and the combination of PPAgg with other technologies for further security improvement.

preprint2022arXiv

Privacy-preserving Anomaly Detection in Cloud Manufacturing via Federated Transformer

With the rapid development of cloud manufacturing, industrial production with edge computing as the core architecture has been greatly developed. However, edge devices often suffer from abnormalities and failures in industrial production. Therefore, detecting these abnormal situations timely and accurately is crucial for cloud manufacturing. As such, a straightforward solution is that the edge device uploads the data to the cloud for anomaly detection. However, Industry 4.0 puts forward higher requirements for data privacy and security so that it is unrealistic to upload data from edge devices directly to the cloud. Considering the above-mentioned severe challenges, this paper customizes a weakly-supervised edge computing anomaly detection framework, i.e., Federated Learning-based Transformer framework (\textit{FedAnomaly}), to deal with the anomaly detection problem in cloud manufacturing. Specifically, we introduce federated learning (FL) framework that allows edge devices to train an anomaly detection model in collaboration with the cloud without compromising privacy. To boost the privacy performance of the framework, we add differential privacy noise to the uploaded features. To further improve the ability of edge devices to extract abnormal features, we use the Transformer to extract the feature representation of abnormal data. In this context, we design a novel collaborative learning protocol to promote efficient collaboration between FL and Transformer. Furthermore, extensive case studies on four benchmark data sets verify the effectiveness of the proposed framework. To the best of our knowledge, this is the first time integrating FL and Transformer to deal with anomaly detection problems in cloud manufacturing.

preprint2022arXiv

Understanding Security in Smart City Domains From the ANT-centric Perspective

A city is a large human settlement that serves the people who live there, and a smart city is a concept of how cities might better serve their residents through new forms of technology. In this paper, we focus on four major smart city domains according to Maslow's hierarchy of needs: smart utility, smart transportation, smart homes, and smart healthcare. Numerous IoT applications have been developed to achieve the intelligence that we desire in our smart domains, ranging from personal gadgets such as health trackers and smart watches to large-scale industrial IoT systems such as nuclear and energy management systems. However, many of the existing smart city IoT solutions can be made better by considering the suitability of their security strategies. Inappropriate system security designs generally occur in two scenarios: first, system designers recognize the importance of security but are unsure of where, when, or how to implement it; and second, system designers try to fit traditional security designs to meet the smart city security context. Thus, the objective of this paper is to provide application designers with the missing security link they may need to improve their security designs. By evaluating the specific context of each smart city domain and the context-specific security requirements, we aim to provide directions on when, where, and how they should implement security strategies and the possible security challenges they need to consider. In addition, we present a new perspective on security issues in smart cities from a data-centric viewpoint by referring to the reference architecture, the Activity-Network-Things (ANT)-centric architecture, built upon the concept of "security in a zero-trust environment". By doing so, we reduce the security risks posed by new system interactions or unanticipated user behaviors while avoiding the hassle of regularly upgrading security models.

preprint2021arXiv

MPC-enabled Privacy-Preserving Neural Network Training against Malicious Attack

The application of secure multiparty computation (MPC) in machine learning, especially privacy-preserving neural network training, has attracted tremendous attention from the research community in recent years. MPC enables several data owners to jointly train a neural network while preserving the data privacy of each participant. However, most of the previous works focus on semi-honest threat model that cannot withstand fraudulent messages sent by malicious participants. In this paper, we propose an approach for constructing efficient $n$-party protocols for secure neural network training that can provide security for all honest participants even when a majority of the parties are malicious. Compared to the other designs that provide semi-honest security in a dishonest majority setting, our actively secure neural network training incurs affordable efficiency overheads of around 2X and 2.7X in LAN and WAN settings, respectively. Besides, we propose a scheme to allow additive shares defined over an integer ring $\mathbb{Z}_N$ to be securely converted to additive shares over a finite field $\mathbb{Z}_Q$, which may be of independent interest. Such conversion scheme is essential in securely and correctly converting shared Beaver triples defined over an integer ring generated in the preprocessing phase to triples defined over a field to be used in the calculation in the online phase.

Ziyao Liu

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

Efficient Privacy-Preserving Retrieval Augmented Generation with Distance-Preserving Encryption

Efficient Dropout-resilient Aggregation for Privacy-preserving Machine Learning

Privacy-Preserving Aggregation in Federated Learning: A Survey

Privacy-preserving Anomaly Detection in Cloud Manufacturing via Federated Transformer

Understanding Security in Smart City Domains From the ANT-centric Perspective

MPC-enabled Privacy-Preserving Neural Network Training against Malicious Attack