Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
23 works
0 followers
12 topics
4 close collaborators

Published work

23 published item(s)

preprint · 2026 · arXiv

Provable Fairness Repair for Deep Neural Networks

Deep neural networks (DNNs) suffer from ethical issues such as individual discrimination. In response, extensive NN repair techniques have been developed to adjust models and mitigate such undesired behaviors. However, existing fairness repair methods are typically data-centric and often lack provable guarantees and generalization to unseen samples. To overcome these limitations, we propose ProF, a novel fairness repair framework with provable guarantees. The key intuition of ProF is to leverage interval bound propagation (a widely used NN verification technique) to soundly capture model outputs over the whole set $S(\mathbf{x})$ around a biased sample $\mathbf{x}$. The derived bounds are used to guide fairness repair, which encourages the model to produce consistent outputs on $S(\mathbf{x})$. Specifically, we integrate fairness constraints and model modifications into a unified constraint-solving formulation, which can be transformed into a Mixed-Integer Linear Programming (MILP) problem solvable by off-the-shelf solvers. The solution to the MILP problem effectively induces a repaired model with guaranteed fairness over the whole set $S(\mathbf{x})$. We evaluate ProF on four widely used benchmark datasets and demonstrate that it achieves provable fairness repair, with generalization of up to 95.93% on full datasets and 93.16% on the entire input space. Notably, ProF can be easily configured to support multiple sensitive attributes and more practical fairness definitions, while providing provable repair guarantees and delivering around 90% fairness improvement. Our code is available at https://github.com/nninjn/ProF.
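To make the interval bound propagation step concrete, here is a minimal sketch of how a box of inputs can be pushed through an affine layer and a ReLU to obtain sound output bounds. The toy weights `W1`, `b1`, `W2`, `b2` and the function names are hypothetical and are not taken from ProF; the MILP-based repair itself is not shown.

```python
def affine_bounds(lower, upper, weights, bias):
    """Propagate the box [lower, upper] through y = W x + b."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l   # a non-negative weight keeps the interval orientation
                hi += w * u
            else:
                lo += w * u   # a negative weight flips it
                hi += w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


# Hypothetical 2-2-1 network used only for illustration.
W1 = [[1.0, -1.0], [0.5, 2.0]]
b1 = [0.0, -1.0]
W2 = [[1.0, -1.0]]
b2 = [0.5]


def output_bounds(lower, upper):
    """Sound (possibly loose) bounds on the network output over the input box."""
    lo, hi = affine_bounds(lower, upper, W1, b1)
    lo, hi = relu_bounds(lo, hi)
    return affine_bounds(lo, hi, W2, b2)


lo, hi = output_bounds([0.0, 0.0], [1.0, 1.0])
```

Passing the same vector as both bounds collapses the box to a point, so `output_bounds(x, x)` returns the exact network output for `x`; every such point inside the box must fall within `[lo, hi]`.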

preprint · 2026 · arXiv

Traffic-MoE: A Sparse Foundation Model for Network Traffic Analysis

While pre-trained large models have achieved state-of-the-art performance in network traffic analysis, their prohibitive computational costs hinder deployment in real-time, throughput-sensitive network defense environments. This work bridges the gap between advanced representation learning and practical network protection by introducing Traffic-MoE, a sparse foundation model optimized for high-efficiency real-time inference. By dynamically routing traffic tokens to a small subset of specialized experts, Traffic-MoE effectively decouples model capacity from computational overhead. Extensive evaluations across three security-oriented tasks demonstrate that Traffic-MoE achieves up to a 12.38% improvement in detection performance compared to leading dense competitors. Crucially, it delivers a 91.62% increase in throughput, reduces inference latency by 47.81%, and cuts peak GPU memory consumption by 38.72%. Beyond efficiency, Traffic-MoE exhibits superior robustness against adversarial traffic shaping and maintains high detection efficacy in few-shot scenarios, establishing a new paradigm for scalable and resilient network traffic analysis.
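The idea of routing each token to a small subset of experts can be illustrated with a generic top-k gating sketch. This is not Traffic-MoE's router, whose details the abstract does not give; the function names, the softmax over the selected logits, and the scalar "experts" are assumptions chosen for brevity.

```python
import math


def top_k_route(gate_logits, k):
    """Pick the k experts with the largest gate logits and softmax-normalise their weights."""
    chosen = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    peak = max(gate_logits[i] for i in chosen)
    exps = [math.exp(gate_logits[i] - peak) for i in chosen]  # subtract the peak for stability
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(token, gate_logits, experts, k=2):
    """Evaluate only the selected experts and mix their outputs by gate weight."""
    return sum(weight * experts[i](token) for i, weight in top_k_route(gate_logits, k))


# Hypothetical scalar experts; a real model would use small feed-forward networks.
experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
routes = top_k_route([0.0, 2.0, 2.0, -1.0], k=2)
output = moe_forward(1.0, [0.0, 2.0, 2.0, -1.0], experts, k=2)
```

Only `k` of the four experts run per token, which is how capacity (the number of experts) is decoupled from per-token compute.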

preprint · 2024 · arXiv

Multi-Modal Representation Learning for Molecular Property Prediction: Sequence, Graph, Geometry

Molecular property prediction refers to the task of labeling molecules with some biochemical properties, playing a pivotal role in the drug discovery and design process. Recently, with the advancement of machine learning, deep learning-based molecular property prediction has emerged as a solution to the resource-intensive nature of traditional methods, garnering significant attention. Among them, molecular representation learning is the key factor for molecular property prediction performance, and many sequence-based, graph-based, and geometry-based methods have been proposed. However, the majority of existing studies focus solely on one modality for learning molecular representations, failing to comprehensively capture molecular characteristics and information. In this paper, a novel multi-modal representation learning model called SGGRL, which integrates the sequence, graph, and geometry characteristics, is proposed for molecular property prediction. Specifically, we design a fusion layer to fuse the representations of different modalities. Furthermore, to ensure consistency across modalities, SGGRL is trained to maximize the similarity of representations for the same molecule while minimizing similarity for different molecules. To verify the effectiveness of SGGRL, seven molecular datasets and several baselines are used for evaluation and comparison. The experimental results demonstrate that SGGRL consistently outperforms the baselines in most cases. This further underscores the capability of SGGRL to comprehensively capture molecular information. Overall, the proposed SGGRL model showcases its potential to revolutionize molecular property prediction by leveraging multi-modal representation learning to extract diverse and comprehensive molecular insights. Our code is released at https://github.com/Vencent-Won/SGGRL.
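The cross-modal consistency objective (pull together representations of the same molecule, push apart those of different molecules) resembles a standard contrastive loss. The sketch below uses an InfoNCE-style formulation with cosine similarity as an assumption; the abstract does not state SGGRL's exact loss, and the names and the temperature value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def contrastive_loss(view_a, view_b, temperature=0.5):
    """Row i of view_a and row i of view_b are the same molecule in two modalities.

    For each molecule, the matching pair is the positive and every other
    molecule in the batch is a negative.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]  # negative log-softmax of the positive pair
    return total / n


# Toy embeddings: aligned modalities versus modalities with swapped molecules.
graph_view = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(graph_view, [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss(graph_view, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when the two modalities agree on each molecule and large when they do not, which is the behaviour the training objective rewards.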

preprint · 2022 · arXiv

Dual-channel Early Warning Framework for Ethereum Ponzi Schemes

Blockchain technology supports the generation and recording of transactions, and maintains the fairness and openness of the cryptocurrency system. However, many fraudsters use smart contracts to create fraudulent Ponzi schemes for profit on Ethereum, which seriously affects financial security. Most existing Ponzi scheme detection techniques suffer from two major limitations: they are not designed for temporal early warning, and they fail to fuse multi-source information. Together, these lead to lagging and unsatisfactory detection of Ethereum Ponzi schemes. In this paper, we propose a dual-channel early warning framework for Ethereum Ponzi schemes, named Ponzi-Warning, which performs feature extraction and fusion on both code and transaction levels. Moreover, we present a temporal evolution augmentation strategy for generating transaction graph sequences, which can effectively increase the data scale and introduce temporal information. Comprehensive experiments on our Ponzi scheme datasets demonstrate the effectiveness and timeliness of our framework for detecting Ponzi contract accounts.

preprint · 2022 · arXiv

Graph-Based Similarity of Neural Network Representations

Understanding the black-box representations in Deep Neural Networks (DNN) is an essential problem in deep learning. In this work, we propose Graph-Based Similarity (GBS) to measure the similarity of layer features. Contrary to previous works that compute the similarity directly on the feature maps, GBS measures the correlation based on the graph constructed with hidden layer outputs. By treating each input sample as a node and the corresponding layer output similarity as edges, we construct the graph of DNN representations for each layer. The similarity between graphs of layers identifies the correspondences between representations of models trained in different datasets and initializations. We demonstrate and prove the invariance property of GBS, including invariance to orthogonal transformation and invariance to isotropic scaling, and compare GBS with CKA. GBS shows state-of-the-art performance in reflecting the similarity and provides insights on explaining the adversarial sample behavior on the hidden layer space.

preprint · 2022 · arXiv

Heterogeneous Feature Augmentation for Ponzi Detection in Ethereum

While blockchain technology triggers new industrial and technological revolutions, it also brings new challenges. Recently, a large number of new scams with a "blockchain" sock-puppet continue to emerge, such as Ponzi schemes, money laundering, etc., seriously threatening financial security. Existing fraud detection methods in blockchain mainly concentrate on manual feature and graph analytics, which first construct a homogeneous transaction graph using partial blockchain data and then use graph analytics to detect anomaly, resulting in a loss of pattern information. In this paper, we mainly focus on Ponzi scheme detection and propose HFAug, a generic Heterogeneous Feature Augmentation module that can capture the heterogeneous information associated with account behavior patterns and can be combined with existing Ponzi detection methods. HFAug learns the metapath-based behavior characteristics in an auxiliary heterogeneous interaction graph, and aggregates the heterogeneous features to corresponding account nodes in the homogeneous one where the Ponzi detection methods are performed. Comprehensive experimental results demonstrate that our HFAug can help existing Ponzi detection methods achieve significant performance improvement on Ethereum datasets, suggesting the effectiveness of heterogeneous information on detecting Ponzi schemes.

preprint · 2022 · arXiv

Hyperspectral Imaging for cherry tomato

Cherry tomato (Solanum lycopersicum) is popular with consumers all over the world due to its special flavor. Soluble solids content (SSC) and firmness are two key metrics for evaluating product quality. In this work, we develop non-destructive testing techniques for SSC and fruit firmness based on hyperspectral images and a corresponding deep learning regression model. Hyperspectral reflectance images of over 200 tomato fruits are acquired with a spectrum ranging from 400 to 1000 nm. The acquired hyperspectral images are corrected and the spectral information is extracted. A novel one-dimensional (1D) convolutional ResNet (Con1dResNet) based regression model is proposed and compared with state-of-the-art techniques. Experimental results show that, with a relatively large number of samples, our technique is 26.4% better than the state-of-the-art technique for SSC and 33.7% better for firmness. The results of this study indicate the application potential of hyperspectral imaging in SSC and firmness detection, which provides a new option for non-destructive testing of cherry tomato fruit quality in the future.

preprint · 2022 · arXiv

Improving robustness of language models from a geometry-aware perspective

Recent studies have found that removing the norm-bounded projection and increasing search steps in adversarial training can significantly improve robustness. However, we observe that a too large number of search steps can hurt accuracy. We aim to obtain strong robustness efficiently using fewer steps. Through a toy experiment, we find that perturbing the clean data to the decision boundary but not crossing it does not degrade the test accuracy. Inspired by this, we propose friendly adversarial data augmentation (FADA) to generate friendly adversarial data. On top of FADA, we propose geometry-aware adversarial training (GAT) to perform adversarial training on friendly adversarial data so that we can save a large number of search steps. Comprehensive experiments across two widely used datasets and three pre-trained language models demonstrate that GAT can obtain stronger robustness via fewer steps. In addition, we provide extensive empirical results and in-depth analyses on robustness to facilitate future studies.

preprint · 2022 · arXiv

Null Model-Based Data Augmentation for Graph Classification

In network science, the null model is typically used to generate a series of graphs based on randomization as a term of comparison to verify whether a network in question displays some non-trivial features such as community structure. Since such non-trivial features play a significant role in graph classification, the null model could be useful for network data augmentation to enhance classification performance. In this paper, we propose a novel technique that combines the null model with data augmentation for graph classification. Moreover, we propose four standard null model-based augmentation methods and four approximate null model-based augmentation methods to verify and improve the performance of our graph classification technique. Our experiments demonstrate that the proposed augmentation technique has significantly achieved general improvement on the tested datasets. In addition, we find that the standard null model-based augmentation methods always outperform the approximate ones, depending on the design mechanisms of the null models. Our results indicate that the choice of non-trivial features is significant for increasing the performance of augmentation models for different network structures, which also provides a new perspective of data augmentation for studying various graph classification methods.
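As an illustration of how a null model can generate augmented graphs, the sketch below applies degree-preserving edge swaps, a common randomization that keeps every node's degree fixed. It is an assumption that this corresponds to any of the eight augmentation methods in the paper, since the abstract does not define them; the function names are hypothetical.

```python
import random


def degree_sequence(edges):
    """Map each node to its degree in an undirected edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree


def degree_preserving_rewire(edges, n_swaps, seed=0):
    """Randomise a simple undirected graph by swapping edge endpoints.

    Each swap turns (a, b), (c, d) into (a, d), (c, b), so all degrees stay the same.
    Swaps that would create a self-loop or a duplicate edge are rejected.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # the two edges share a node
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # the swap would duplicate an existing edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


ring = [(i, (i + 1) % 8) for i in range(8)]
augmented = degree_preserving_rewire(ring, n_swaps=5, seed=1)
```

Each call with a different seed yields another graph with the same degree sequence, which could then be added to the training set under the original graph's label.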

preprint · 2022 · arXiv

Phishing Fraud Detection on Ethereum using Graph Neural Network

Blockchain has widespread applications in the financial field but has also attracted increasing cybercrimes. Recently, phishing fraud has emerged as a major threat to blockchain security, calling for the development of effective regulatory strategies. Nowadays network science has been widely used in modeling Ethereum transaction data, further introducing the network representation learning technology to analyze the transaction patterns. In this paper, we consider phishing detection as a graph classification task and propose an end-to-end Phishing Detection Graph Neural Network framework (PDGNN). Specifically, we first construct a lightweight Ethereum transaction network and extract transaction subgraphs of collected phishing accounts. Then we propose an end-to-end detection model based on Chebyshev-GCN to precisely distinguish between normal and phishing accounts. Extensive experiments on five Ethereum datasets demonstrate that our PDGNN significantly outperforms general phishing detection methods and scales well in large transaction networks.

preprint · 2022 · arXiv

SubGraph Networks based Entity Alignment for Cross-lingual Knowledge Graph

Entity alignment is the task of finding entities representing the same real-world object in two knowledge graphs (KGs). Cross-lingual knowledge graph entity alignment aims to discover the cross-lingual links in multi-language KGs, which is of great significance to NLP applications and multi-language KG fusion. In the task of aligning cross-language knowledge graphs, the structures of the two graphs are very similar, and the equivalent entities often have the same subgraph structure characteristics. The traditional GCN method neglects to obtain structural features through representative parts of the original graph, and the adjacency matrix alone is not enough to effectively represent the structural features of the graph. In this paper, we introduce the subgraph network (SGN) method into the GCN-based cross-lingual KG entity alignment method. In this method, we extract the first-order subgraphs of the KGs to expand the structural features of the original graph, which enhances the representation ability of the entity embedding and improves the alignment accuracy. Experiments show that the proposed method outperforms the state-of-the-art GCN-based method.

preprint · 2022 · arXiv

Targeted k-node Collapse Problem: Towards Understanding the Robustness of Local k-core Structure

The concept of k-core, which indicates the largest induced subgraph where each node has k or more neighbors, plays a significant role in measuring the cohesiveness and the engagement of a network, and it is exploited in diverse applications, e.g., network analysis, anomaly detection, community detection, etc. Recent works have demonstrated the vulnerability of k-core under malicious perturbations which focuses on removing the minimal number of edges to make a whole k-core structure collapse. However, to the best of our knowledge, there is no existing research concentrating on how many edges should be removed at least to make an arbitrary node in k-core collapse. Therefore, in this paper, we make the first attempt to study the Targeted k-node Collapse Problem (TNCP) with four novel contributions. Firstly, we offer the general definition of TNCP problem with the proof of its NP-hardness. Secondly, in order to address the TNCP problem, we propose a heuristic algorithm named TNC and its improved version named ATNC for implementations on large-scale networks. After that, the experiments on 16 real-world networks across various domains verify the superiority of our proposed algorithms over 4 baseline methods along with detailed comparisons and analyses. Finally, the significance of TNCP problem for precisely evaluating the resilience of k-core structures in networks is validated.
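The k-core definition and its fragility under edge removal can be shown with the standard peeling procedure: repeatedly delete nodes with fewer than k neighbors. This sketch only computes the k-core and demonstrates one collapse on a toy graph; it is not the TNC or ATNC heuristic, and the function names are hypothetical.

```python
def k_core(adjacency, k):
    """Return the node set of the k-core of an undirected graph given as {node: set_of_neighbours}."""
    degree = {v: len(neighbours) for v, neighbours in adjacency.items()}
    alive = set(adjacency)
    stack = [v for v in alive if degree[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue
        alive.remove(v)
        for u in adjacency[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] < k:
                    stack.append(u)  # u can no longer stay in the k-core
    return alive


def remove_edge(adjacency, u, v):
    """Return a copy of the graph without the undirected edge (u, v)."""
    copy = {node: set(neighbours) for node, neighbours in adjacency.items()}
    copy[u].discard(v)
    copy[v].discard(u)
    return copy


# Toy graph: a 4-clique on nodes 0..3 plus a pendant node 4 attached to node 0.
graph = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}}
core_before = k_core(graph, 3)
core_after = k_core(remove_edge(graph, 0, 1), 3)
```

In this example a single edge removal pushes target node 1 out of the 3-core, and the cascade then empties the whole core, which is the kind of local vulnerability the collapse problem measures.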

preprint · 2022 · arXiv

TSGN: Transaction Subgraph Networks Assisting Phishing Detection in Ethereum

Due to the decentralized and public nature of the Blockchain ecosystem, the malicious activities on the Ethereum platform impose immeasurable losses on users. Existing phishing scam detection methods mostly rely only on the analysis of original transaction networks, which makes it difficult to dig deeply into the transaction patterns hidden in the network structure of transaction interactions. In this paper, we propose a Transaction SubGraph Network (TSGN) based phishing account identification framework for Ethereum. We first extract transaction subgraphs for target accounts and then expand these subgraphs into corresponding TSGNs based on different mapping mechanisms. In order to make our model incorporate more important information about real transactions, we encode the transaction attributes into the modeling process of TSGNs, yielding two variants of TSGN, i.e., Directed-TSGN and Temporal-TSGN, which can be applied to different attributed networks. In particular, by introducing TSGN into multi-edge transaction networks, the proposed Multiple-TSGN model is able to preserve the temporal transaction flow information and capture the significant topological patterns of phishing scams, while reducing the time complexity of modeling large-scale networks. Extensive experimental results show that TSGN models can provide more potential information to improve the performance of phishing detection by incorporating graph representation learning.

preprint · 2022 · arXiv

Understanding the Dynamics of DNNs Using Graph Modularity

There are good arguments to support the claim that deep neural networks (DNNs) capture better feature representations than the previous hand-crafted feature engineering, which leads to a significant performance improvement. In this paper, we move a tiny step towards understanding the dynamics of feature representations over layers. Specifically, we model the process of class separation of intermediate representations in pre-trained DNNs as the evolution of communities in dynamic graphs. Then, we introduce modularity, a generic metric in graph theory, to quantify the evolution of communities. In the preliminary experiment, we find that modularity roughly tends to increase as the layer goes deeper and the degradation and plateau arise when the model complexity is great relative to the dataset. Through an asymptotic analysis, we prove that modularity can be broadly used for different applications. For example, modularity provides new insights to quantify the difference between feature representations. More crucially, we demonstrate that the degradation and plateau in modularity curves represent redundant layers in DNNs and can be pruned with minimal impact on performance, which provides theoretical guidance for layer pruning. Our code is available at https://github.com/yaolu-zjut/Dynamic-Graphs-Construction.
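For readers unfamiliar with the metric, the sketch below computes Newman modularity for an unweighted, undirected graph with a given partition. Here nodes stand for samples and communities for class labels, in line with the abstract; how the graph is built from layer features is not shown and is assumed to be given. The function name is hypothetical.

```python
def modularity(edges, community_of):
    """Newman modularity Q = sum over communities of (L_c / m) - (d_c / 2m)^2.

    L_c is the number of edges inside community c, d_c is the total degree of
    its nodes, and m is the number of edges in the graph.
    """
    m = len(edges)
    internal_edges = {}
    degree_sum = {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal_edges[cu] = internal_edges.get(cu, 0) + 1
    return sum(
        internal_edges.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two triangles joined by one bridge edge, with one class per triangle.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
by_class = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q_separated = modularity(edges, by_class)
q_single = modularity(edges, {node: "a" for node in range(6)})
```

A higher value means samples of the same class are more densely connected to each other than chance would predict, so tracking it layer by layer gives a curve of class separation.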

preprint · 2021 · arXiv

CLPVG: Circular limited penetrable visibility graph as a new network model for time series

Visibility Graph (VG) transforms time series into graphs, facilitating signal processing by advanced graph data mining algorithms. In this paper, based on the classic Limited Penetrable Visibility Graph (LPVG) method, we propose a novel nonlinear mapping method named Circular Limited Penetrable Visibility Graph (CLPVG). Tests of the degree distribution and clustering coefficient of the graphs generated from typical time series validate that our CLPVG is able to effectively capture the important features of time series and has better anti-noise ability than traditional LPVG. The experiments on real-world time-series datasets of radio signals and electroencephalograms (EEG) also suggest that the structural features provided by CLPVG, rather than LPVG, are more useful for time-series classification, leading to higher accuracy. This classification performance can be further enhanced through structural feature expansion by adopting Subgraph Networks (SGN). All of these results validate the effectiveness of our CLPVG model.
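As background for the mapping, here is a minimal sketch of the classic (linear) limited penetrable visibility graph that CLPVG builds on. Two samples are connected when at most `penetrable` intermediate samples reach or cross the straight line between them; `penetrable=0` gives the ordinary visibility graph. The circular variant proposed in the paper is not shown, and the function name is hypothetical.

```python
def lpvg_edges(series, penetrable=0):
    """Return the edge list of the limited penetrable visibility graph of a time series."""
    edges = []
    n = len(series)
    for i in range(n - 1):
        for j in range(i + 1, n):
            blockers = 0
            for k in range(i + 1, j):
                # Height of the line of sight from sample i to sample j at position k.
                sight = series[i] + (series[j] - series[i]) * (k - i) / (j - i)
                if series[k] >= sight:
                    blockers += 1
            if blockers <= penetrable:
                edges.append((i, j))
    return edges


signal = [3, 1, 2, 1, 3]
vg = lpvg_edges(signal)                  # ordinary visibility graph
lpvg = lpvg_edges(signal, penetrable=1)  # one blocking sample is tolerated
```

This brute-force version costs O(n^3) time, which is fine for short series; allowing a few blockers adds edges that survive small noisy spikes, which is the motivation for the penetrable variants.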

preprint · 2021 · arXiv

Identity Inference on Blockchain using Graph Neural Network

The anonymity of blockchain has accelerated the growth of illegal activities and criminal behaviors on cryptocurrency platforms. Although decentralization is one of the typical characteristics of blockchain, we urgently call for effective regulation to detect these illegal behaviors to ensure the safety and stability of user transactions. Identity inference, which aims to make a preliminary inference about account identity, plays a significant role in blockchain security. As a common tool, graph mining techniques can effectively represent the interactive information between accounts and be used for identity inference. However, existing methods cannot balance scalability and end-to-end architecture, resulting in high computational consumption and weak feature representation. In this paper, we present a novel approach to analyze user behavior from the perspective of the transaction subgraph, which naturally transforms the identity inference task into a graph classification problem and effectively avoids computation on large-scale graphs. Furthermore, we propose a generic end-to-end graph neural network model, named $\text{I}^2 \text{BGNN}$, which can accept a subgraph as input and learn a function mapping the transaction subgraph pattern to account identity, achieving de-anonymization. Extensive experiments on the EOSG and ETHG datasets demonstrate that the proposed method achieves state-of-the-art performance in identity inference.

preprint · 2021 · arXiv

MITNet: GAN Enhanced Magnetic Induction Tomography Based on Complex CNN

Magnetic induction tomography (MIT) is an efficient solution for long-term brain disease monitoring, which focuses on reconstructing bio-impedance distribution inside the human brain using non-intrusive electromagnetic fields. However, high-quality brain image reconstruction remains challenging since reconstructing images from the measured weak signals is a highly non-linear and ill-conditioned problem. In this work, we propose a generative adversarial network (GAN) enhanced MIT technique, named MITNet, based on a complex convolutional neural network (CNN). The experimental results on the real-world dataset validate the performance of our technique, which outperforms the state-of-the-art method by 25.27%.

preprint · 2021 · arXiv

Multiscale Evolutionary Perturbation Attack on Community Detection

Community detection, aiming to group nodes based on their connections, plays an important role in network analysis, since communities, treated as meta-nodes, allow us to create a large-scale map of a network to simplify its analysis. However, for privacy reasons, we may want to prevent communities from being discovered in certain cases, leading to the topic of community deception. In this paper, we formalize this community detection attack problem at three scales, including global attack (macroscale), target community attack (mesoscale) and target node attack (microscale). We treat this as an optimization problem and further propose a novel Evolutionary Perturbation Attack (EPA) method, where we generate adversarial networks to realize the community detection attack. Numerical experiments validate that our EPA can successfully attack network community algorithms at all three scales, i.e., hide target nodes or communities and further disturb the community structure of the whole network by only changing a small fraction of links. By comparison, our EPA performs better than a number of baseline attack methods on six synthetic networks and three real-world networks. More interestingly, although our EPA is based on the Louvain algorithm, it is also effective in attacking other community detection algorithms, validating its good transferability.

preprint · 2021 · arXiv

Sampling Subgraph Network with Application to Graph Classification

Graphs are naturally used to describe the structures of various real-world systems in biology, society, computer science etc., where subgraphs or motifs as basic blocks play an important role in function expression and information processing. However, existing research focuses on the basic statistics of certain motifs, largely ignoring the connection patterns among them. Recently, a subgraph network (SGN) model was proposed to study the potential structure among motifs, and it was found that the integration of SGN can enhance a series of graph classification methods. However, the SGN model lacks diversity and has quite high time complexity, making it difficult to apply widely in practice. In this paper, we introduce sampling strategies into SGN, and design a novel sampling subgraph network model, which is scale-controllable and of higher diversity. We also present a hierarchical feature fusion framework to integrate the structural features of diverse sampling SGNs, so as to improve the performance of graph classification. Extensive experiments demonstrate that, compared with the SGN model, our new model indeed has much lower time complexity (reduced by two orders of magnitude) and can better enhance a series of graph classification methods (doubling the performance enhancement).
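To give a feel for what a subgraph network is, the sketch below builds a first-order SGN under the assumption that it coincides with the line graph: each edge of the original graph becomes a node, and two such nodes are linked when the edges share an endpoint. The sampling strategies and feature fusion proposed in the paper are not shown, and the function name is hypothetical.

```python
from itertools import combinations


def first_order_sgn(edges):
    """Map each edge to a node and link two nodes when the underlying edges share an endpoint."""
    nodes = [tuple(sorted(edge)) for edge in edges]
    links = [(e, f) for e, f in combinations(nodes, 2) if set(e) & set(f)]
    return nodes, links


# A path 0-1-2-3 becomes a shorter path; a star with three leaves becomes a triangle.
path_nodes, path_links = first_order_sgn([(0, 1), (1, 2), (2, 3)])
star_nodes, star_links = first_order_sgn([(0, 1), (0, 2), (0, 3)])
```

The pairwise comparison over all edges is quadratic in the number of edges, which hints at why the full construction is expensive on dense graphs and why sampling a subset of the structure helps.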

preprint · 2021 · arXiv

Temporal-Amount Snapshot MultiGraph for Ethereum Transaction Tracking

With the wide application of blockchain in the financial field, the rise of various types of cybercrimes has brought great challenges to the security of blockchain. In order to better understand this emerging market and explore more efficient countermeasures for effective supervision, it is imperative to track transactions on blockchain-based systems. Due to the openness of Ethereum, we can easily access the publicly available transaction records, model them as a complex network, and further study the problem of transaction tracking via link prediction, which provides a deeper understanding of Ethereum transactions from a network perspective. Specifically, we introduce an embedding based link prediction framework that is composed of temporal-amount snapshot multigraph (TASMG) and present temporal-amount walk (TAW). By taking the realistic rules and features of transaction networks into consideration, we propose TASMG to model Ethereum transaction records as a temporal-amount network and then present TAW to effectively embed accounts via their transaction records, which integrates temporal and amount information of the proposed network. Experimental results demonstrate the superiority of the proposed framework in learning more informative representations and could be an effective method for transaction tracking.

preprint · 2020 · arXiv

Adversarial Attacks to Scale-Free Networks: Testing the Robustness of Physical Criteria

Adversarial attacks have recently alarmed the artificial intelligence community, since many machine learning algorithms were found vulnerable to malicious attacks. This paper studies adversarial attacks on scale-free networks to test their robustness in terms of statistical measures. In addition to the well-known random link rewiring (RLR) attack, two heuristic attacks are formulated and simulated: degree-addition-based link rewiring (DALR) and degree-interval-based link rewiring (DILR). These three strategies are applied to attack a number of strong scale-free networks of various sizes generated from the Barabási-Albert model. It is found that both DALR and DILR are more effective than RLR, in the sense that rewiring a smaller number of links can succeed in the same attack. However, DILR is as concealed as RLR in the sense that they both are constructed by introducing a relatively small number of changes on several typical structural properties such as average shortest path-length, average clustering coefficient, and average diagonal distance. The results of this paper suggest that one has to be very careful when classifying a network as scale-free, given the effects of adversarial attacks.

preprint · 2020 · arXiv

Data Augmentation for Graph Classification

Graph classification, which aims to identify the category labels of graphs, plays a significant role in drug classification, toxicity detection, protein analysis etc. However, the limitation of scale of benchmark datasets makes it easy for graph classification models to fall into over-fitting and undergeneralization. Towards this, we introduce data augmentation on graphs and present two heuristic algorithms: random mapping and motif-similarity mapping, to generate more weakly labeled data for small-scale benchmark datasets via heuristic modification of graph structures. Furthermore, we propose a generic model evolution framework, M-Evolve, which combines graph augmentation, data filtration and model retraining to optimize pre-trained graph classifiers. Experiments conducted on six benchmark datasets demonstrate that M-Evolve helps existing graph classification models alleviate over-fitting when training on small-scale benchmark datasets and yields an average improvement of 3-12% accuracy on graph classification tasks.

preprint · 2020 · arXiv

Time-aware Gradient Attack on Dynamic Network Link Prediction

In network link prediction, it is possible to hide a target link from being predicted with a small perturbation on network structure. This observation may be exploited in many real world scenarios, for example, to preserve privacy, or to exploit financial security. There have been many recent studies to generate adversarial examples to mislead deep learning models on graph data. However, none of the previous work has considered the dynamic nature of real-world systems. In this work, we present the first study of adversarial attack on dynamic network link prediction (DNLP). The proposed attack method, namely time-aware gradient attack (TGA), utilizes the gradient information generated by deep dynamic network embedding (DDNE) across different snapshots to rewire a few links, so as to make DDNE fail to predict target links. We implement TGA in two ways: one is based on traversal search, namely TGA-Tra; and the other is simplified with greedy search for efficiency, namely TGA-Gre. We conduct comprehensive experiments which show the outstanding performance of TGA in attacking DNLP algorithms.