Source author record

Yongzheng Zhang

Yongzheng Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Cryptography and Security, math.RA, math.RT, Software Engineering

Catalog footprint: 5 works, 4 topics, 4 close collaborators

Preprint, 2022, arXiv

BinMLM: Binary Authorship Verification with Flow-aware Mixture-of-Shared Language Model

Binary authorship analysis is a significant problem in many software engineering applications. In this paper, we formulate a binary authorship verification task to accurately reflect the real-world working process of software forensic experts. The task is to determine, from a small set of support samples, whether an anonymous binary was developed by a specific programmer; the actual developer may not belong to the known candidate set and may instead come from the wild. We propose an effective binary authorship verification framework, BinMLM. BinMLM trains an RNN language model on consecutive opcode traces extracted from the control-flow graph (CFG) to characterize the candidate developers' programming styles. We build a mixture-of-shared architecture with multiple shared encoders and author-specific gate layers, which can learn the developers' combination preferences of universal programming patterns and alleviate the problem of low training resources. Through an optimization pipeline of external pre-training, joint training, and fine-tuning, our framework can eliminate additional noise and accurately distill developers' unique styles. Extensive experiments show that BinMLM achieves promising results on Google Code Jam (GCJ) and Codeforces datasets with different numbers of programmers and supporting samples. It significantly outperforms the baselines built on the state-of-the-art feature set (4.73% to 19.46% improvement) and remains robust in multi-author collaboration scenarios. Furthermore, BinMLM can perform organization-level verification on a real-world APT malware dataset, which can provide valuable auxiliary information for exploring the group behind the APT attack.
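The idea of shared encoders combined through author-specific gates can be illustrated with a minimal sketch. This is not the BinMLM implementation: the function names, the toy encoder outputs, and the gate logits below are all hypothetical, and the real model learns these values with an RNN rather than taking them as fixed lists.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mix_shared_encoders(encoder_outputs, gate_logits):
    """Weight the outputs of the shared encoders by one author's gate.

    encoder_outputs: one vector per shared encoder (same for all authors).
    gate_logits: one score per shared encoder (specific to one author).
    """
    weights = softmax(gate_logits)
    dim = len(encoder_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, encoder_outputs))
        for i in range(dim)
    ]

# Hypothetical toy values: three shared encoders, two authors.
shared_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
author_gates = {
    "author_a": [2.0, 0.0, 0.0],  # prefers the first shared pattern
    "author_b": [0.0, 0.0, 2.0],  # prefers the third shared pattern
}
mixed = {
    name: mix_shared_encoders(shared_outputs, logits)
    for name, logits in author_gates.items()
}
```

Because every author reuses the same encoder outputs and differs only in the gate weights, each author adds few parameters of their own, which is one plausible reading of how such a design eases the low-resource problem the abstract mentions.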

Preprint, 2022, arXiv

Inter-BIN: Interaction-based Cross-architecture IoT Binary Similarity Comparison

The big wave of Internet of Things (IoT) malware reflects the fragility of the current IoT ecosystem. Research has found that IoT malware can spread quickly on devices of different processor architectures, which draws our attention to cross-architecture binary similarity comparison technology. The goal of binary similarity comparison is to determine whether the semantics of two binary snippets are similar. Existing learning-based approaches usually learn the representations of binary code snippets individually and perform similarity matching based on a distance metric, without considering inter-binary semantic interactions. Moreover, they often rely on a large-scale external code corpus for instruction embedding pre-training, which is heavyweight and prone to the out-of-vocabulary (OOV) problem. In this paper, we propose an interaction-based cross-architecture IoT binary similarity comparison system, Inter-BIN. Our key insight is to introduce interaction between instruction sequences through a co-attention mechanism, which can flexibly perform soft alignment of semantically related instructions from different architectures. We also design a lightweight multi-feature fusion-based instruction embedding method, which can avoid the heavy workload and the OOV problem of previous approaches. Extensive experiments show that Inter-BIN can significantly outperform state-of-the-art approaches on cross-architecture binary similarity comparison tasks of different input granularities. Furthermore, we present an IoT malware function matching dataset from real network environments, CrossMal, containing 1,878,437 cross-architecture reuse function pairs. Experimental results on CrossMal show that Inter-BIN is practical and scalable on real-world binary similarity comparison collections.
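Soft alignment through co-attention can be sketched generically as follows. This is an illustrative dot-product co-attention, not the Inter-BIN architecture: the function names and the two-dimensional toy embeddings are hypothetical, and the paper's actual affinity function and embedding method are not specified in the abstract.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]

def co_attend(seq_a, seq_b):
    """Return (a_to_b, b_to_a): for each instruction vector in one sequence,
    an attention-weighted summary of the other sequence."""
    # Affinity between every instruction pair across the two sequences.
    affinity = [[dot(a, b) for b in seq_b] for a in seq_a]
    # Each row attends over seq_b; each column attends over seq_a.
    a_to_b = [weighted_sum(softmax(row), seq_b) for row in affinity]
    b_to_a = [
        weighted_sum(softmax([affinity[i][j] for i in range(len(seq_a))]), seq_a)
        for j in range(len(seq_b))
    ]
    return a_to_b, b_to_a

# Hypothetical embeddings for two instruction sequences of different lengths,
# standing in for the same function compiled for two architectures.
seq_arch_1 = [[4.0, 0.0], [0.0, 4.0]]
seq_arch_2 = [[0.0, 4.0], [4.0, 0.0], [0.0, 4.0]]
a_to_b, b_to_a = co_attend(seq_arch_1, seq_arch_2)
```

In this toy case the first instruction of `seq_arch_1` attends almost entirely to the second instruction of `seq_arch_2`, its closest counterpart, even though the sequences differ in length and order.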

Preprint, 2022, arXiv

Multi-relational Instruction Association Graph for Cross-architecture Binary Similarity Comparison

Cross-architecture binary similarity comparison is essential in many security applications. Recently, researchers have proposed learning-based approaches to improve comparison performance. They adopted a paradigm of instruction pre-training, individual binary encoding, and distance-based similarity comparison. However, instruction embeddings pre-trained on an external code corpus are not universal in diverse real-world applications. Moreover, separately encoding cross-architecture binaries accumulates the semantic gap between instruction sets, limiting the comparison accuracy. This paper proposes a novel cross-architecture binary similarity comparison approach based on a multi-relational instruction association graph. We associate mono-architecture instruction tokens with context relevance and cross-architecture tokens with potential semantic correlations from different perspectives. Then we use a relational graph convolutional network (R-GCN) to perform type-specific graph information propagation. Our approach can bridge the gap in the cross-architecture instruction representation spaces while avoiding the external pre-training workload. We conduct extensive experiments on basic block-level and function-level datasets to demonstrate the superiority of our approach. Furthermore, evaluations on a large-scale real-world IoT malware reuse function collection show that our approach is valuable for identifying malware propagated on IoT devices of various architectures.

Preprint, 2015, arXiv

The matrix representation of the first cohomology of $\frak{gl}_{0|2}$ with coefficients in the generalized Witt Lie superalgebra

This paper is primarily concerned with the first cohomology of $\frak{gl}_{0|2}$ with coefficients in the generalized Witt Lie superalgebra, where $\frak{gl}_{0|2}$ is a subalgebra of the general linear Lie superalgebra. The derivations and inner derivations from $\frak{gl}_{0|2}$ into submodules of the generalized Witt Lie superalgebra are represented by matrices, respectively. Then the first cohomology of $\frak{gl}_{0|2}$ with coefficients in the generalized Witt Lie superalgebra is completely determined by matrices.

Preprint, 2014, arXiv

Some properties of generalized reduced Verma modules over $\mathbb{Z}$-graded modular Lie superalgebras

This paper is primarily concerned with generalized reduced Verma modules over $\mathbb{Z}$-graded modular Lie superalgebras. Some properties of the generalized reduced Verma modules and the coinduced modules are obtained. Moreover, the invariant forms on the generalized reduced Verma modules are considered. In particular, we prove that the generalized reduced Verma module is isomorphic to the mixed product for modules of $\mathbb{Z}$-graded modular Lie superalgebras of Cartan type.
