Researcher profile

Yan Kang

Yan Kang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2022arXiv

Batch Label Inference and Replacement Attacks in Black-Boxed Vertical Federated Learning

In a vertical federated learning (VFL) scenario where features and model are split into different parties, communications of sample-specific updates are required for correct gradient calculations but can be used to deduce important sample-level label information. An immediate defense strategy is to protect sample-level messages communicated with Homomorphic Encryption (HE), and in this way only the batch-averaged local gradients are exposed to each party (termed black-boxed VFL). In this paper, we first explore the possibility of recovering labels in the vertical federated learning setting with HE-protected communication, and show that private labels can be reconstructed with high accuracy by training a gradient inversion model. Furthermore, we show that label replacement backdoor attacks can be conducted in black-boxed VFL by directly replacing encrypted communicated messages (termed gradient-replacement attack). As it is a common presumption that batch-averaged information is safe to share, batch label inference and replacement attacks are a severe challenge to VFL. To defend against batch label inference attack, we further evaluate several defense strategies, including confusional autoencoder (CoAE), a technique we proposed based on autoencoder and entropy regularization. We demonstrate that label inference and replacement attacks can be successfully blocked by this technique without hurting as much main task accuracy as compared to existing methods.

preprint2022arXiv

FuncFooler: A Practical Black-box Attack Against Learning-based Binary Code Similarity Detection Methods

The binary code similarity detection (BCSD) method measures the similarity of two binary executable codes. Recently, the learning-based BCSD methods have achieved great success, outperforming traditional BCSD in detection accuracy and efficiency. However, the existing studies are rather sparse on the adversarial vulnerability of the learning-based BCSD methods, which cause hazards in security-related applications. To evaluate the adversarial robustness, this paper designs an efficient and black-box adversarial code generation algorithm, namely, FuncFooler. FuncFooler constrains the adversarial codes 1) to keep unchanged the program's control flow graph (CFG), and 2) to preserve the same semantic meaning. Specifically, FuncFooler consecutively 1) determines vulnerable candidates in the malicious code, 2) chooses and inserts the adversarial instructions from the benign code, and 3) corrects the semantic side effect of the adversarial code to meet the constraints. Empirically, our FuncFooler can successfully attack the three learning-based BCSD models, including SAFE, Asm2Vec, and jTrans, which calls into question whether the learning-based BCSD is desirable.

preprint2020arXiv

A Communication Efficient Collaborative Learning Framework for Distributed Features

We introduce a collaborative learning framework allowing multiple parties having different sets of attributes about the same user to jointly build models without exposing their raw data or model parameters. In particular, we propose a Federated Stochastic Block Coordinate Descent (FedBCD) algorithm, in which each party conducts multiple local updates before each communication to effectively reduce the number of communication rounds among parties, a principal bottleneck for collaborative learning problems. We analyze theoretically the impact of the number of local updates and show that when the batch size, sample size, and the local iterations are selected appropriately, within $T$ iterations, the algorithm performs $\mathcal{O}(\sqrt{T})$ communication rounds and achieves some $\mathcal{O}(1/\sqrt{T})$ accuracy (measured by the average of the gradient norm squared). The approach is supported by our empirical evaluations on a variety of tasks and datasets, demonstrating advantages over stochastic gradient descent (SGD) approaches.

preprint2020arXiv

Secure Federated Transfer Learning

Machine learning relies on the availability of a vast amount of data for training. However, in reality, most data are scattered across different organizations and cannot be easily integrated under many legal and practical constraints. In this paper, we introduce a new technique and framework, known as federated transfer learning (FTL), to improve statistical models under a data federation. The federation allows knowledge to be shared without compromising user privacy, and enables complimentary knowledge to be transferred in the network. As a result, a target-domain party can build more flexible and powerful models by leveraging rich labels from a source-domain party. A secure transfer cross validation approach is also proposed to guard the FTL performance under the federation. The framework requires minimal modifications to the existing model structure and provides the same level of accuracy as the non-privacy-preserving approach. This framework is very flexible and can be effectively adapted to various secure multi-party machine learning tasks.