Source author record

Ximeng Liu

Ximeng Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Cryptography and Security; Machine Learning; Computer Vision; Artificial Intelligence; cond-mat.mtrl-sci; Data Structures and Algorithms; Discrete Mathematics; Distributed, Parallel, and Cluster Computing; Networking and Internet Architecture; Neural and Evolutionary Computing

Catalog footprint

What is connected

16 works

10 topics

4 close collaborators

preprint, 2026, arXiv

NeuroRisk: Physics-Informed Neural Optimization for Risk-Aware Traffic Engineering

In production Wide-Area Networks (WANs), correlated failures dominate availability losses, forcing operators to reserve large safety margins that leave substantial capacity underutilized. Achieving high utilization under strict availability targets therefore requires risk-aware Traffic Engineering (TE) over dozens to hundreds of probabilistic failure scenarios, yet solving this problem at operational timescales remains elusive. We demonstrate that existing risk-aware formulations can be unified under an embedded Sort-and-Select structure, exposing a fundamental trade-off between expressiveness and tractability: classical optimizers either restrict scenario selection for efficiency or incur prohibitive decomposition costs. While deep learning appears promising, prior Deep TE methods mainly target maximum link utilization and rely on scaling-based feasibility, which fundamentally breaks under explicit capacity constraints and scenario-dependent risk. We present NeuroRisk, a physics-informed deep unrolled optimizer that exploits the structure of Sort-and-Select. NeuroRisk enforces feasibility via gated edge-local reservations and represents scenario sets through permutation-invariant, gradient-aligned cues. Evaluations on production-style WANs show that NeuroRisk achieves small optimality gaps relative to the solver with speedups of orders of magnitude ($10^2$ to $10^5\times$) on risk objectives, while outperforming neural baselines on nominal throughput.
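
The abstract does not spell out the Sort-and-Select structure. As a rough illustration of what sorting scenarios and selecting a tail can look like, the sketch below computes Conditional Value-at-Risk (CVaR) over discrete failure scenarios. Choosing CVaR, and the names `cvar`, `losses`, `probs` and `beta`, are assumptions made for this example and are not taken from NeuroRisk.

```python
def cvar(losses, probs, beta):
    """Expected loss over the worst (1 - beta) probability mass of the scenarios.

    losses[i] is the loss in scenario i and probs[i] its probability.
    Requires 0 <= beta < 1 and probabilities that sum to 1.
    """
    tail = 1.0 - beta
    remaining = tail
    total = 0.0
    # Sort: visit scenarios from the largest loss to the smallest.
    for loss, p in sorted(zip(losses, probs), reverse=True):
        # Select: take only as much probability mass as the tail still needs.
        take = min(p, remaining)
        total += loss * take
        remaining -= take
        if remaining <= 0:
            break
    return total / tail


# Hypothetical scenario set: no failure, one link down, two correlated links down.
example_losses = [0.0, 10.0, 40.0]
example_probs = [0.90, 0.08, 0.02]
example_cvar = cvar(example_losses, example_probs, beta=0.95)
```

With these made-up numbers the worst 5% of probability mass consists of the 40-unit scenario (2%) and part of the 10-unit scenario (3%), so the result is their weighted average.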

preprint, 2022, arXiv

Backdoor Defense with Machine Unlearning

Backdoor injection attacks are an emerging threat to the security of neural networks; however, effective defense methods against them remain limited. In this paper, we propose BAERASE, a novel method that can erase the backdoor injected into the victim model through machine unlearning. Specifically, BAERASE implements backdoor defense in two key steps. First, trigger pattern recovery is conducted to extract the trigger patterns that infect the victim model. Here, the trigger pattern recovery problem is equivalent to extracting an unknown noise distribution from the victim model, which can be easily resolved by an entropy maximization based generative model. Subsequently, BAERASE leverages these recovered trigger patterns to reverse the backdoor injection procedure and induce the victim model to erase the polluted memories through a newly designed gradient ascent based machine unlearning method. Compared with previous machine unlearning solutions, the proposed approach removes the reliance on full access to the training data for retraining and is more effective at backdoor erasing than existing fine-tuning or pruning methods. Moreover, experiments show that BAERASE lowers the attack success rates of three kinds of state-of-the-art backdoor attacks by 99% on average on four benchmark datasets.

preprint, 2022, arXiv

Defense against Backdoor Attacks via Identifying and Purifying Bad Neurons

The opacity of neural networks leads to their vulnerability to backdoor attacks, where hidden attention of infected neurons is triggered to override normal predictions with attacker-chosen ones. In this paper, we propose a novel backdoor defense method to mark and purify the infected neurons in backdoored neural networks. Specifically, we first define a new metric, called benign salience. By combining the first-order gradient to retain the connections between neurons, benign salience can identify the infected neurons with higher accuracy than the metric commonly used in backdoor defense. Then, a new Adaptive Regularization (AR) mechanism is proposed to assist in purifying these identified infected neurons via fine-tuning. Because it adapts to different magnitudes of parameters, AR can provide faster and more stable convergence than the common regularization mechanism in neuron purifying. Extensive experimental results demonstrate that our method can erase the backdoor in neural networks with negligible performance degradation.

preprint, 2022, arXiv

Enhance transferability of adversarial examples with model architecture

Transferability of adversarial examples is of critical importance for launching black-box adversarial attacks, where attackers are only allowed to access the output of the target model. However, under such a challenging but practical setting, the crafted adversarial examples are prone to overfitting to the proxy model employed, and therefore transfer poorly. In this paper, we suggest alleviating the overfitting issue from a novel perspective, i.e., designing a fitted model architecture. Specifically, delving into the root cause of poor transferability, we decompose and reconstruct the existing model architecture into an effective model architecture, namely the multi-track model architecture (MMA). Adversarial examples crafted on the MMA minimize the effect of model-specific features and are directed toward the vulnerable directions shared by diverse architectures. Extensive experimental evaluation demonstrates that the transferability of adversarial examples based on the MMA significantly surpasses that of other state-of-the-art model architectures by up to 40% with comparable overhead.

preprint, 2022, arXiv

Evolution as a Service: A Privacy-Preserving Genetic Algorithm for Combinatorial Optimization

Evolutionary algorithms (EAs), such as the genetic algorithm (GA), offer an elegant way to handle combinatorial optimization problems (COPs). However, limited by expertise and resources, most users are not able to implement EAs to solve COPs. An intuitive and promising solution is to outsource evolutionary operations to a cloud server, but this raises privacy concerns. To this end, this paper proposes a novel computing paradigm, evolution as a service (EaaS), where a cloud server renders evolutionary computation services for users without sacrificing users' privacy. Inspired by the idea of EaaS, this paper designs PEGA, a novel privacy-preserving GA for COPs. Specifically, PEGA enables users to outsource COPs to a cloud server holding a competitive GA and to approximate the optimal solution in a privacy-preserving manner. PEGA features the following characteristics. First, any user without expertise and sufficient resources can solve her COPs. Second, PEGA does not leak the contents of optimization problems, i.e., users' privacy. Third, PEGA has the same capability as the conventional GA to approximate the optimal solution. We implement PEGA in a twin-server architecture and evaluate it on the traveling salesman problem (TSP, a widely known COP). In particular, we use encryption to protect users' privacy and carefully design a suite of secure computing protocols to support the evolutionary operators of GA on encrypted data. Privacy analysis demonstrates that PEGA does not disclose the contents of the COP to the cloud server. Experimental evaluation results on four TSP datasets show that PEGA is as effective as the conventional GA in approximating the optimal solution.

preprint, 2022, arXiv

Federated Learning based on Defending Against Data Poisoning Attacks in IoT

The rapidly expanding number of Internet of Things (IoT) devices is generating huge quantities of data, but data privacy and security are exposed in IoT devices, especially in autonomous driving systems. Federated learning (FL) is a paradigm that addresses data privacy, security, access rights, and access to heterogeneous message issues by integrating a global model based on distributed nodes. However, data poisoning attacks on FL can undermine these benefits, destroying the global model's availability and disrupting model training. To avoid these issues, we build a hierarchical defense data poisoning (HDDP) system framework to defend against data poisoning attacks in FL, which monitors the local model of each node via anomaly detection to remove malicious clients. Depending on whether the poisoning defense server has a trusted test dataset, we design the local model test voting (LMTV) and Kullback-Leibler divergence anomaly parameters detection (KLAD) algorithms to defend against label-flipping poisoning attacks. Specifically, in LMTV the trusted test dataset is used to obtain evaluation results for each classification in order to recognize malicious clients. More importantly, in KLAD we adopt the Kullback-Leibler divergence to measure the similarity between local models without the trusted test dataset. Finally, in extensive evaluations against various label-flipping poisoning attacks, the LMTV and KLAD algorithms achieve successful defense ratios of 100% and 40% to 85%, respectively, under different detection situations.
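
The idea of comparing local models with the Kullback-Leibler divergence when no trusted test set exists can be sketched as follows. This is not the KLAD algorithm: turning parameter vectors into distributions with a softmax, scoring each client by its median divergence to the others, and the names `flag_outliers` and `threshold` are all assumptions made for illustration.

```python
import math
import statistics


def softmax(values):
    """Map a parameter vector to a discrete probability distribution."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def flag_outliers(client_params, threshold):
    """Return indices of clients whose median KL divergence to the others exceeds threshold."""
    dists = [softmax(params) for params in client_params]
    flagged = []
    for i, p in enumerate(dists):
        scores = [kl_divergence(p, q) for j, q in enumerate(dists) if j != i]
        # The median keeps a single poisoned peer from inflating an honest client's score.
        if statistics.median(scores) > threshold:
            flagged.append(i)
    return flagged


# Hypothetical local updates: three similar clients and one that deviates strongly.
clients = [
    [1.0, 2.0, 3.0],
    [1.1, 2.0, 2.9],
    [0.9, 2.1, 3.0],
    [3.0, 2.0, -4.0],
]
suspects = flag_outliers(clients, threshold=0.5)
```

In this toy data only the last client is far from the majority, so it is the only index flagged.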

preprint, 2022, arXiv

Generation Matrix: An Embeddable Matrix Representation for Hierarchical Trees

Studying hierarchical trees by starting from their local structures is a common research method. However, the cumbersome analysis and description make this naive method hard to adapt to increasingly complex hierarchical tree problems. To improve the efficiency of hierarchical tree research, we propose an embeddable matrix representation for hierarchical trees, called Generation Matrix. It transforms the abstract hierarchical tree into a concrete matrix representation so that the hierarchical tree can be studied as a whole, which dramatically reduces the complexity of research. Mathematical analysis shows that Generation Matrix can simulate various recursive algorithms without accessing local structures and provides a variety of interpretable matrix operations to support the research of hierarchical trees. Applying Generation Matrix to differentially private hierarchical tree release, we propose a Generation Matrix-based optimally consistent release algorithm (GMC). It provides an exceptionally concise process description, so that its core steps can be described as a simple matrix expression rather than as multiple complicated recursive processes like existing algorithms. Our experiments show that GMC takes only a few seconds to complete a release for large-scale datasets with more than 10 million nodes. The calculation efficiency is increased by up to 100 times compared with state-of-the-art schemes.

preprint, 2022, arXiv

The Component Diagnosability of General Networks

Processor failures in a multiprocessor system have a negative impact on its distributed computing efficiency. Because of the rapid expansion of multiprocessor systems, the importance of fault diagnosis is becoming increasingly prominent. The $h$-component diagnosability of $G$, denoted by $ct_{h}(G)$, is the maximum number of nodes of a faulty set $F$ that can be correctly identified in a system, where the number of components in $G-F$ is at least $h$. In this paper, we determine the $(h+1)$-component diagnosability of general networks under the PMC model and the MM$^{*}$ model. As applications, the component diagnosability is explored for some well-known networks, including complete cubic networks, hierarchical cubic networks, generalized exchanged hypercubes, dual-cube-like networks, hierarchical hypercubes, Cayley graphs generated by transposition trees (except star graphs), and DQcube. Furthermore, we provide some comparison results between the component diagnosability and other fault diagnosabilities.
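
The condition on the number of components in $G-F$ can be checked directly on small graphs. The sketch below counts the connected components left after removing a faulty set; it illustrates only that counting step, not the diagnosability computation under the PMC or MM$^{*}$ model. The function name and the 3-dimensional hypercube example are assumptions chosen for illustration.

```python
from collections import deque


def components_after_removal(adjacency, faulty):
    """Count the connected components of G - F using breadth-first search.

    adjacency maps each node to an iterable of its neighbours;
    faulty is the set F of nodes to remove.
    """
    faulty = set(faulty)
    seen = set()
    count = 0
    for start in adjacency:
        if start in faulty or start in seen:
            continue
        # Every unvisited, non-faulty node starts a new component.
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in faulty and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return count


# 3-dimensional hypercube: nodes 0..7, neighbours differ in exactly one bit.
q3 = {v: [v ^ (1 << bit) for bit in range(3)] for v in range(8)}

# Removing all neighbours of node 0 isolates it from the rest of the cube.
parts = components_after_removal(q3, {1, 2, 4})
```

Here `parts` counts node 0 on its own plus the connected remainder {3, 5, 6, 7}.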

preprint, 2021, arXiv

Robust Single-step Adversarial Training with Regularizer

The high training cost caused by multi-step adversarial example generation is a major challenge in adversarial training. Previous methods try to reduce the computational burden of adversarial training using single-step adversarial example generation schemes, which effectively improve efficiency but also introduce the problem of catastrophic overfitting, where the robust accuracy against the Fast Gradient Sign Method (FGSM) can reach nearly 100% whereas the robust accuracy against Projected Gradient Descent (PGD) suddenly drops to 0% over a single epoch. To address this problem, we propose a novel Fast Gradient Sign Method with PGD Regularization (FGSMPR) to boost the efficiency of adversarial training without catastrophic overfitting. Our core idea is that single-step adversarial training cannot learn robust internal representations of FGSM and PGD adversarial examples. Therefore, we design a PGD regularization term to encourage similar embeddings of FGSM and PGD adversarial examples. The experiments demonstrate that our proposed method can train a robust deep network for L$_\infty$-perturbations with FGSM adversarial training and reduce the gap to multi-step adversarial training.
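
To make the single-step versus multi-step distinction concrete, the sketch below generates FGSM and PGD perturbations for a tiny logistic-regression model and measures how far apart the two adversarial examples land. It is not the FGSMPR method: the linear model, the use of the logit as a stand-in for an embedding, the squared-distance penalty and all function names are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def input_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    error = sigmoid(logit(w, b, x)) - y
    return [error * wi for wi in w]


def sign(value):
    return (value > 0) - (value < 0)


def fgsm(w, b, x, y, eps):
    """Single step of size eps along the sign of the input gradient."""
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


def pgd(w, b, x, y, eps, step, iters):
    """Several small signed steps, each projected back into the L-infinity ball around x."""
    adv = list(x)
    for _ in range(iters):
        grad = input_gradient(w, b, adv, y)
        adv = [ai + step * sign(gi) for ai, gi in zip(adv, grad)]
        adv = [min(max(ai, xi - eps), xi + eps) for ai, xi in zip(adv, x)]
    return adv


def embedding_gap(w, b, x, y, eps, step, iters):
    """Squared distance between the logits of the FGSM and PGD examples (stand-in penalty)."""
    z_fgsm = logit(w, b, fgsm(w, b, x, y, eps))
    z_pgd = logit(w, b, pgd(w, b, x, y, eps, step, iters))
    return (z_fgsm - z_pgd) ** 2


# Hypothetical two-feature model and a single positive example.
weights, bias = [2.0, -1.0], 0.0
sample, label = [0.5, 0.5], 1
gap = embedding_gap(weights, bias, sample, label, eps=0.1, step=0.05, iters=4)
```

For a linear model the gradient sign never changes, so FGSM and PGD reach the same corner of the perturbation ball and the gap is essentially zero. In a deep network the two can diverge, which is the situation a term of this kind is meant to penalize.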

preprint, 2021, arXiv

When Crowdsensing Meets Federated Learning: Privacy-Preserving Mobile Crowdsensing System

Mobile crowdsensing (MCS) is an emerging sensing data collection pattern with scalability, low deployment cost, and distributed characteristics. Traditional MCS systems suffer from privacy concerns and unfair reward distribution. Moreover, existing privacy-preserving MCS solutions usually focus on the privacy protection of data collection rather than that of data processing. To tackle these problems, in this paper we integrate federated learning (FL) into MCS and propose a privacy-preserving MCS system, called CrowdFL. Specifically, in order to protect privacy, participants locally process sensing data via federated learning and only upload encrypted training models. In particular, a privacy-preserving federated averaging algorithm is proposed to average encrypted training models. To reduce the computation and communication overhead of handling dropped participants, discard and retransmission strategies are designed. In addition, a privacy-preserving posted pricing incentive mechanism is designed, which tries to break the dilemma between privacy protection and data evaluation. Theoretical analysis and experimental evaluation on a practical MCS application demonstrate that the proposed CrowdFL can effectively protect participants' privacy and is feasible and efficient.

preprint, 2020, arXiv

Boosting Privately: Privacy-Preserving Federated Extreme Boosting for Mobile Crowdsensing

Recently, Google and 24 other institutions proposed a series of open challenges for federated learning (FL), which include application expansion and homomorphic encryption (HE). The former aims to expand the machine learning models to which FL is applicable. The latter focuses on who holds the secret key when applying HE to FL. In the naive HE scheme, the server holds the secret key. Such a setting causes a serious problem: if the server does not conduct aggregation before decryption, it has a chance to access a user's update. Inspired by the two challenges, we propose FedXGB, a federated extreme gradient boosting (XGBoost) scheme supporting forced aggregation. FedXGB mainly achieves the following two breakthroughs. First, FedXGB involves a new HE based secure aggregation scheme for FL. By combining the advantages of secret sharing and homomorphic encryption, the algorithm can solve the second challenge mentioned above, and is robust to user dropout. Second, FedXGB extends FL to a new machine learning model by applying the secure aggregation scheme to the classification and regression tree building of XGBoost. Moreover, we conduct a comprehensive theoretical analysis and extensive experiments to evaluate the security, effectiveness, and efficiency of FedXGB. The results indicate that FedXGB achieves less than 1% accuracy loss compared with the original XGBoost, and can provide about 23.9% runtime and 33.3% communication reduction for HE based model update aggregation of FL.

preprint, 2020, arXiv

Cloud-based Federated Boosting for Mobile Crowdsensing

The application of federated extreme gradient boosting to mobile crowdsensing apps brings several benefits, in particular high efficiency and classification performance. However, it also brings a new challenge for data and model privacy protection. Besides being vulnerable to Generative Adversarial Network (GAN) based user data reconstruction attacks, no existing architecture considers how to preserve model privacy. In this paper, we propose a secret sharing based federated learning architecture, FedXGB, to achieve privacy-preserving extreme gradient boosting for mobile crowdsensing. Specifically, we first build a secure classification and regression tree (CART) of XGBoost using secret sharing. Then, we propose a secure prediction protocol to protect the model privacy of XGBoost in mobile crowdsensing. We conduct a comprehensive theoretical analysis and extensive experiments to evaluate the security, effectiveness, and efficiency of FedXGB. The results indicate that FedXGB is secure against honest-but-curious adversaries and attains less than 1% accuracy loss compared with the original XGBoost model.
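
As background for the secret sharing mentioned above, the sketch below shows plain additive secret sharing used to sum private integers without revealing any single one. It is not the FedXGB protocol: the modulus `PRIME`, the function names and the three-party setup are assumptions made for illustration.

```python
import secrets

# Hypothetical field modulus; inputs must be non-negative integers below it.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split value into n_parties random shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recover the secret by adding all shares modulo PRIME."""
    return sum(shares) % PRIME


def secure_sum(values, n_parties):
    """Sum the users' values while each party only ever sees random-looking shares."""
    per_party = [0] * n_parties
    for value in values:
        for index, piece in enumerate(share(value, n_parties)):
            # Each party accumulates the shares it receives, one per user.
            per_party[index] = (per_party[index] + piece) % PRIME
    return reconstruct(per_party)


# Three hypothetical users contribute integer-encoded statistics.
total = secure_sum([3, 5, 9], n_parties=3)
```

Real-valued quantities such as gradients would first need a fixed-point integer encoding, which is left out here.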

preprint, 2020, arXiv

Droidetec: Android Malware Detection and Malicious Code Localization through Deep Learning

Android malware detection is a critical step towards building a trustworthy system. In particular, manually searching for potentially malicious code has plagued program analysts for a long time. In this paper, we propose Droidetec, a deep learning based method for Android malware detection and malicious code localization, which models an application program as a natural language sequence. Droidetec adopts a novel feature extraction method to derive behavior sequences from Android applications. Based on that, a bi-directional Long Short Term Memory network is used for malware detection. Each unit in the extracted behavior sequence is represented as a vector, which allows Droidetec to automatically analyze the semantics of sequence segments and eventually locate the malicious code. Experiments with 9616 malicious and 11982 benign programs show that Droidetec reaches an accuracy of 97.22% and an F1-score of 98.21%. Overall, Droidetec has a hit rate of 91% in correctly locating malicious code segments.

preprint, 2020, arXiv

Privacy-preserving Medical Treatment System through Nondeterministic Finite Automata

In this paper, we propose a privacy-preserving medical treatment system using nondeterministic finite automata (NFA), hereafter referred to as P-Med, designed for the remote medical environment. P-Med makes use of the nondeterministic transition characteristic of NFA to flexibly represent the medical model, which includes illness states, treatment methods and the state transitions caused by applying different treatment methods. A medical model is encrypted and outsourced to the cloud to deliver telemedicine services. Using P-Med, patient-centric diagnosis and treatment can be made on-the-fly while protecting the confidentiality of a patient's illness states and treatment recommendation results. Moreover, a new privacy-preserving NFA evaluation method is given in P-Med to get a confidential match result for the evaluation of an encrypted NFA and an encrypted data set, which avoids the cumbersome inner state transition determination. We demonstrate that P-Med realizes treatment procedure recommendation without privacy leakage to unauthorized parties. We conduct extensive experiments and analyses to evaluate efficiency.
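
The way an NFA can represent illness states and treatment-driven transitions can be sketched in plaintext as follows. This leaves out everything that makes P-Med privacy-preserving (no encryption, no cloud), and the state names, treatment names and functions are assumptions made for illustration.

```python
def nfa_run(transitions, start_states, inputs):
    """Return the set of states reachable after consuming all input symbols.

    transitions maps (state, symbol) to the set of possible next states,
    so one treatment may lead to several outcomes at once.
    """
    current = set(start_states)
    for symbol in inputs:
        current = {
            nxt
            for state in current
            for nxt in transitions.get((state, symbol), ())
        }
    return current


def nfa_accepts(transitions, start_states, accept_states, inputs):
    """True if at least one possible path ends in an accepting state."""
    return bool(nfa_run(transitions, start_states, inputs) & set(accept_states))


# Hypothetical medical model: a drug may or may not improve the patient.
model = {
    ("ill", "drug_a"): {"ill", "improving"},
    ("ill", "rest"): {"ill"},
    ("improving", "rest"): {"recovered"},
}

outcomes = nfa_run(model, {"ill"}, ["drug_a", "rest"])
can_recover = nfa_accepts(model, {"ill"}, {"recovered"}, ["drug_a", "rest"])
```

Tracking a set of current states is what lets a single treatment sequence cover several possible courses of the illness.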

preprint, 2020, arXiv

VerifyTL: Secure and Verifiable Collaborative Transfer Learning

Getting access to labelled datasets in certain sensitive application domains can be challenging. Hence, one often resorts to transfer learning to transfer knowledge learned from a source domain with sufficient labelled data to a target domain with limited labelled data. However, most existing transfer learning techniques focus only on one-way transfer, which brings no benefit to the source domain. In addition, there is the risk of a covert adversary corrupting a number of domains, which can result in inaccurate prediction or privacy leakage. In this paper we construct a secure and Verifiable collaborative Transfer Learning scheme, VerifyTL, to support two-way transfer learning over potentially untrusted datasets by improving knowledge transfer from a target domain to a source domain. Further, we equip VerifyTL with a cross transfer unit and a weave transfer unit employing SPDZ computation to provide privacy guarantees and verification in the two-domain setting and the multi-domain setting, respectively. Thus, VerifyTL is secure against a covert adversary that can compromise up to n-1 out of n data domains. We analyze the security of VerifyTL and evaluate its performance over two real-world datasets. Experimental results show that VerifyTL achieves significant performance gains over existing secure learning schemes.

preprint, 2016, arXiv

Role of Pressure in the Growth of Hexagonal Boron Nitride Thin Films from Ammonia-Borane

We analyze the optical, chemical, and electrical properties of chemical vapor deposition (CVD) grown hexagonal boron nitride (h-BN) using the precursor ammonia-borane ($H_3N-BH_3$) as a function of $Ar/H_2$ background pressure ($P_{TOT}$). Films grown at $P_{TOT}$ less than 2.0 Torr are uniform in thickness, highly crystalline, and consist solely of h-BN. At larger $P_{TOT}$, with constant precursor flow, the growth rate increases, but the resulting h-BN is more amorphous, disordered, and $sp^3$ bonded. We attribute these changes in h-BN grown at high pressure to incomplete thermolysis of the $H_3N-BH_3$ precursor from a passivated Cu catalyst. A similar increase in h-BN growth rate and amorphization is observed even at low $P_{TOT}$ if the $H_3N-BH_3$ partial pressure is initially greater than the background pressure $P_{TOT}$ at the beginning of growth. h-BN growth using the $H_3N-BH_3$ precursor can reproducibly give large-area, crystalline h-BN thin films, provided that the total pressure is under 2.0 Torr and the precursor flux is well controlled.
